# DIS08 / OR92 Data Modeling: Python - Data structures

# Introduction to Data Structures

---

**Note:** Some examples and assignments in this notebook are taken from the following sources.
- https://docs.python.org/3/tutorial/datastructures.html
- https://realpython.com/python-data-structures/
- https://www.geeksforgeeks.org/python-data-structures/
- https://automatetheboringstuff.com/


---

## What are data structures?
Data structures are ways of organizing and storing data to enable efficient access and modification. They provide a means to manage large amounts of data and are fundamental to designing efficient algorithms. 

In Python, data structures can be broadly categorized into built-in types (like lists, tuples, sets, and dictionaries) and custom types created using classes.

## Importance of choosing the right data structure
Selecting the appropriate data structure is crucial for optimizing the performance of a program. The choice affects:
- **Memory usage:** Some data structures require more memory than others.
- **Speed of operations:** Different structures offer varying performance for insertion, deletion, searching, and updating operations.
- **Code clarity:** The right structure can make code easier to read and maintain.
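To make the speed point concrete, here is a rough sketch that uses the standard-library `timeit` module to compare membership tests on a list and on a set holding the same values. The collection size and repeat count are arbitrary choices, and the exact timings depend on your machine.

```python
import timeit

# The same 100,000 integers stored in two different structures
numbers_list = list(range(100_000))
numbers_set = set(numbers_list)

# Look for a value near the end: the list is scanned element by element,
# while the set uses a hash-table lookup
target = 99_999
list_time = timeit.timeit(lambda: target in numbers_list, number=100)
set_time = timeit.timeit(lambda: target in numbers_set, number=100)

print(f"list membership: {list_time:.4f} s")
print(f"set membership:  {set_time:.4f} s")
```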

## Built-in vs. custom data structures
- **Built-in data structures** are provided by Python and include lists, tuples, sets, and dictionaries. They are highly optimized and easy to use.
- **Custom data structures** can be implemented using classes to meet specific needs, such as stacks, queues, linked lists, trees, and graphs.

Choosing between built-in and custom data structures depends on the problem requirements and the trade-offs involved.
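As a minimal sketch of a custom data structure, the hypothetical `Node` and `LinkedList` classes below implement a singly linked list that supports prepending and iteration. It is for illustration only; in everyday Python code a built-in list or `collections.deque` is usually the better choice.

```python
class Node:
    """A single element of a linked list: a value plus a reference to the next node."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedList:
    """Hypothetical minimal singly linked list built from Node objects."""
    def __init__(self):
        self.head = None  # Empty list: no first node yet

    def prepend(self, value):
        """Insert a value at the front in O(1) time."""
        self.head = Node(value, self.head)

    def __iter__(self):
        """Walk the chain of nodes from head to tail."""
        current = self.head
        while current is not None:
            yield current.value
            current = current.next


numbers = LinkedList()
for n in [3, 2, 1]:
    numbers.prepend(n)

print(list(numbers))  # [1, 2, 3]
```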



## Lists

### Characteristics
- **Ordered**: The elements in a list have a specific order, which is preserved.
- **Mutable**: Lists can be modified after their creation (e.g., adding, removing, or changing elements).
- **Allows Duplicates**: A list can contain multiple instances of the same value.

### Common Operations

#### Indexing

Access elements by their position in the list (starting at index 0).

In [1]:
# Creating a list with some duplicate values
my_list = [1, 2, 3, 4, 2, 5]
print(my_list)  

# Accessing elements
first_element = my_list[0]   # First element
last_element = my_list[-1]   # Last element

print(f"First element: {first_element}, Last element: {last_element}")

[1, 2, 3, 4, 2, 5]
First element: 1, Last element: 5


#### Slicing

Retrieve a subset of the list using slicing.

In [3]:
# Slicing the list
sub_list = my_list[1:4]  # Elements from index 1 to 3
print(sub_list) 

[2, 3, 4]
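Slices also accept an optional step, `list[start:stop:step]`, and either bound can be omitted. The sketch below redefines the list with its initial values so that it runs on its own.

```python
my_list = [1, 2, 3, 4, 2, 5]

print(my_list[:3])    # [1, 2, 3]           first three elements
print(my_list[3:])    # [4, 2, 5]           from index 3 to the end
print(my_list[::2])   # [1, 3, 2]           every second element
print(my_list[::-1])  # [5, 2, 4, 3, 2, 1]  reversed copy

# A slice is a new list, so changing it leaves the original untouched
copied = my_list[:]
copied[0] = 100
print(my_list[0])     # 1
```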


#### Appending

Add an element to the end of the list using append().

In [5]:
# Appending a new element
my_list.append(6)
print(my_list)  

[1, 2, 3, 4, 2, 5, 6]


#### Inserting

Insert an element at a specific position using insert().

In [7]:
# Inserting an element at index 2
my_list.insert(2, 99)
print(my_list)  

[1, 2, 99, 3, 4, 2, 5, 6]
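Besides `append()` and `insert()`, lists offer `extend()` for adding several elements at once and `remove()`, `pop()` and `del` for deleting them. Note that `remove()` deletes only the first matching value. The sketch starts from a copy of the list above so that it is self-contained.

```python
my_list = [1, 2, 99, 3, 4, 2, 5, 6]

my_list.extend([7, 8])    # Append every element of another iterable
print(my_list)            # [1, 2, 99, 3, 4, 2, 5, 6, 7, 8]

my_list.remove(2)         # Remove the first occurrence of the value 2
print(my_list)            # [1, 99, 3, 4, 2, 5, 6, 7, 8]

removed = my_list.pop(1)  # Remove and return the element at index 1
print(removed)            # 99

del my_list[-1]           # Delete the last element by index
print(my_list)            # [1, 3, 4, 2, 5, 6, 7]
```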


#### Counting, sorting, reversing, etc.

In [9]:
fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']
fruits

['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']

In [11]:
fruits.count('apple')

2

In [13]:
fruits.index('banana')

3

In [15]:
fruits.index('banana', 4)  # Find next banana starting at position 4

6

In [17]:
fruits.reverse()
fruits

['banana', 'apple', 'kiwi', 'banana', 'pear', 'apple', 'orange']
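The heading also mentions sorting: `list.sort()` sorts the list in place and returns `None`, whereas the built-in `sorted()` returns a new list; both accept `key` and `reverse` arguments. A self-contained sketch with the same fruit list:

```python
fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']

# sorted() leaves the original list unchanged and returns a new one;
# ties keep their original order because Python's sort is stable
by_length = sorted(fruits, key=len)
print(by_length)  # ['pear', 'kiwi', 'apple', 'apple', 'orange', 'banana', 'banana']

# sort() modifies the list itself
fruits.sort()
print(fruits)     # ['apple', 'apple', 'banana', 'banana', 'kiwi', 'orange', 'pear']

fruits.sort(reverse=True)
print(fruits)     # ['pear', 'orange', 'kiwi', 'banana', 'banana', 'apple', 'apple']
```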

#### Using Lists as Stacks

A stack is a linear data structure that follows the Last In, First Out (LIFO) principle, meaning the last element added to the stack is the first one to be removed. It is commonly used for tasks like managing function calls, undo operations, and evaluating expressions.

In Python, a stack can be implemented directly with a list: append() pushes an item onto the top and pop() removes and returns the topmost item, both in amortized O(1) time. The deque (from the collections module) supports the same append() and pop() calls with guaranteed O(1) performance at both ends; its advantage matters mainly when items also have to be removed from the front, as in a queue.

**See also:** https://en.wikipedia.org/wiki/Stack_(abstract_data_type)

In [20]:
stack = [3, 4, 5]
stack

[3, 4, 5]

In [22]:
stack.append(6)
stack

[3, 4, 5, 6]

In [24]:
stack.append(7)
stack

[3, 4, 5, 6, 7]

In [26]:
stack.pop()

7

In [28]:
stack.pop()

6
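One classic stack application is checking whether the brackets in an expression are balanced: every opening bracket is pushed, and each closing bracket must match the most recently pushed one. The function name `is_balanced` is our own choice for this sketch.

```python
def is_balanced(expression):
    """Return True if (), [] and {} are properly nested in the expression."""
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for char in expression:
        if char in '([{':
            stack.append(char)  # Push the opening bracket
        elif char in pairs:
            # A closing bracket must match the top of the stack
            if not stack or stack.pop() != pairs[char]:
                return False
    return not stack  # Leftover opening brackets mean unbalanced

print(is_balanced("(1 + 2) * [3 - {4 / 5}]"))  # True
print(is_balanced("(1 + 2]"))                  # False
print(is_balanced("((1 + 2)"))                 # False
```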

#### Using Lists as Queues

A queue is a linear data structure that follows the First In, First Out (FIFO) principle, meaning the first element added is the first one to be removed. It is commonly used for tasks like task scheduling, breadth-first search, and managing shared resources.

In Python, a queue can be implemented using a list, though it is inefficient for frequent operations due to O(n) complexity when removing the first element. A more efficient approach is to use the deque (from the collections module), which provides O(1) operations for both appending and removing elements from either end. For thread-safe queues, the queue.Queue class from the queue module can be used.

**See also:** https://en.wikipedia.org/wiki/Queue_(abstract_data_type)

In [31]:
from collections import deque

queue = deque(["Eric", "John", "Michael"])
queue

deque(['Eric', 'John', 'Michael'])

In [33]:
queue.append("Terry")
queue

deque(['Eric', 'John', 'Michael', 'Terry'])

In [35]:
queue.append("Graham")
queue

deque(['Eric', 'John', 'Michael', 'Terry', 'Graham'])

In [37]:
queue.popleft()
queue

deque(['John', 'Michael', 'Terry', 'Graham'])

In [39]:
queue.popleft()
queue

deque(['Michael', 'Terry', 'Graham'])
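For completeness, here is a minimal sketch of the thread-safe `queue.Queue` mentioned above. It is used from a single thread here, just to show the FIFO order of `put()` and `get()`.

```python
import queue

q = queue.Queue()
for name in ["Eric", "John", "Michael"]:
    q.put(name)   # Enqueue at the back

print(q.get())    # Eric  (first in, first out)
print(q.get())    # John
print(q.qsize())  # 1
```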

### List comprehension

List comprehension is a concise and elegant way to create and transform lists in Python. It provides a syntactic shortcut for generating new lists by applying an expression to each item in an existing iterable (like a list, range, or string) and optionally filtering elements based on a condition. 

In [42]:
squares = []
for x in range(10):
    squares.append(x**2)

squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [44]:
squares = list(map(lambda x: x**2, range(10)))
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [46]:
squares = [x**2 for x in range(10)]
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
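The description above also mentions filtering. An `if` clause at the end keeps only the items that satisfy a condition, a conditional expression at the front chooses a result per item, and several `for` clauses can be nested:

```python
# Squares of the even numbers only
even_squares = [x**2 for x in range(10) if x % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# Conditional expression: transform every item, choosing the result per item
labels = ['even' if x % 2 == 0 else 'odd' for x in range(4)]
print(labels)        # ['even', 'odd', 'even', 'odd']

# Nested loops: all (x, y) pairs with x != y
pairs = [(x, y) for x in [1, 2, 3] for y in [3, 1, 4] if x != y]
print(pairs)         # [(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
```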

## Tuples

A tuple is an immutable sequence, meaning its elements cannot be reassigned after creation. Tuples are created by separating values with commas, optionally enclosed in parentheses: (1, 2, 3) and 1, 2, 3 are both tuples. They are commonly used to represent fixed collections of items, like coordinates or the multiple return values of a function. Tuples can hold heterogeneous data types, and because they are immutable, a tuple whose elements are all hashable (e.g., numbers, strings, or other such tuples) can be used as a dictionary key or set element.

A tuple consists of a number of values separated by commas, for instance:

In [49]:
t = 12345, 54321, 'hello!'
t[0]

12345

In [51]:
# Tuples may be nested:
u = t, (1, 2, 3, 4, 5)
u

((12345, 54321, 'hello!'), (1, 2, 3, 4, 5))

In [53]:
# Tuples are immutable:
t[0] = 88888

TypeError: 'tuple' object does not support item assignment

In [55]:
# but they can contain mutable objects:
v = ([1, 2, 3], [3, 2, 1])
v

([1, 2, 3], [3, 2, 1])
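A few more tuple details in a self-contained sketch: a one-element tuple needs a trailing comma, tuples can be unpacked into variables, and a tuple works as a dictionary key only if all of its elements are hashable. The coordinates used as keys are made-up example values.

```python
single = (42,)       # The trailing comma makes this a tuple
not_a_tuple = (42)   # Just the integer 42 in parentheses
print(type(single), type(not_a_tuple))  # <class 'tuple'> <class 'int'>

# Tuple unpacking: assign each element to its own variable
x, y, label = 12345, 54321, 'hello!'
print(label)         # hello!

# Tuples of immutable values work as dictionary keys, e.g. (lat, lng) pairs
stations = {(40.73, -73.99): 'start', (40.75, -74.00): 'end'}
print(stations[(40.73, -73.99)])  # start

# A tuple that contains lists cannot be hashed, so it cannot be a key
try:
    hash(([1, 2, 3], [3, 2, 1]))
    tuple_with_lists_hashable = True
except TypeError:
    tuple_with_lists_hashable = False  # Raised because lists are unhashable
print(tuple_with_lists_hashable)       # False
```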

## Sets

A set in Python is an unordered collection of unique elements, commonly used for membership testing, deduplication, and mathematical set operations (union, intersection, difference, and symmetric difference). The set itself is mutable, but its elements must be hashable (e.g., numbers, strings, or tuples of these). Sets are created with curly braces or the set() constructor, e.g., {1, 2, 3} or set([1, 2, 3]); note that {} on its own creates an empty dictionary, so an empty set must be written as set(). Sets do not support indexing or slicing because they are unordered, but membership checks are fast (O(1) on average) thanks to their underlying hash table. Python also provides frozenset, an immutable version of a set, which can be used as a dictionary key or as an element of another set.

In [58]:
basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}
basket

{'apple', 'banana', 'orange', 'pear'}

In [60]:
# Demonstrate set operations on unique letters from two words
a = set('abracadabra')
b = set('alacazam')

In [62]:
# unique letters in a
a                                  

{'a', 'b', 'c', 'd', 'r'}

In [64]:
# letters in a but not in b
a - b                              

{'b', 'd', 'r'}

In [66]:
# letters in a or b or both
a | b

{'a', 'b', 'c', 'd', 'l', 'm', 'r', 'z'}

In [68]:
# letters in both a and b
a & b                              


{'a', 'c'}

In [70]:
# letters in a or b but not both
a ^ b                              

{'b', 'd', 'l', 'm', 'r', 'z'}
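A few practical points about sets, sketched below: an empty set must be created with `set()` because `{}` is an empty dictionary, elements are added and removed with `add()` and `discard()`, and a `frozenset` is hashable and can therefore serve as a dictionary key.

```python
empty = set()
print(type({}), type(empty))  # <class 'dict'> <class 'set'>

basket = {'apple', 'orange', 'pear', 'banana'}
basket.add('kiwi')            # Adding an element that is already present changes nothing
basket.discard('pear')        # Unlike remove(), discard() does not fail if the element is missing
print('kiwi' in basket, 'pear' in basket)  # True False

# Deduplicate a list (the order of the result is not guaranteed)
unique_fruits = set(['apple', 'orange', 'apple', 'pear'])
print(len(unique_fruits))     # 3

# frozenset is immutable and hashable, so it can be a dictionary key
mixes = {frozenset({'red', 'blue'}): 'purple'}
print(mixes[frozenset({'blue', 'red'})])  # purple
```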

## Dictionaries

A dictionary in Python is a mutable collection that maps unique keys to values, making it ideal for storing and retrieving data by key. Since Python 3.7, dictionaries preserve insertion order, as the outputs below show. Dictionaries are defined using curly braces {} with key-value pairs separated by colons, e.g., {"name": "Alice", "age": 30}, or created with the dict() constructor. Keys and values can be of different types, but keys must be hashable (e.g., strings, numbers, or tuples containing only hashable elements). Dictionaries support adding, updating, and deleting key-value pairs, as well as built-in methods like .get(), .keys(), .values(), and .items() for accessing data. Their hash table implementation gives fast average-case lookups and makes them a cornerstone of Python’s data structures.

In [73]:
tel = {'jack': 4098, 'sape': 4139}
tel['guido'] = 4127
tel

{'jack': 4098, 'sape': 4139, 'guido': 4127}

In [75]:
tel['jack']

4098

In [77]:
del tel['sape']
tel

{'jack': 4098, 'guido': 4127}

In [79]:
tel['irv'] = 4127
tel

{'jack': 4098, 'guido': 4127, 'irv': 4127}

In [81]:
list(tel)

['jack', 'guido', 'irv']

In [83]:
sorted(tel)

['guido', 'irv', 'jack']

In [85]:
'guido' in tel

True

In [87]:
'jack' not in tel

False
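The methods named in the introduction can be sketched as follows, using a copy of the phone book as it looks after the cells above. `.get()` returns a default instead of raising `KeyError` for a missing key, and `.items()` yields key-value pairs for iteration. The dictionary comprehension at the end is an extra technique that the text does not cover.

```python
tel = {'jack': 4098, 'guido': 4127, 'irv': 4127}

print(tel.get('sape'))         # None  (no KeyError for a missing key)
print(tel.get('sape', 'n/a'))  # n/a   (custom default)

# Iterate over key-value pairs
for name, number in tel.items():
    print(f"{name}: {number}")

print(list(tel.keys()))        # ['jack', 'guido', 'irv']
print(list(tel.values()))      # [4098, 4127, 4127]

# Dictionary comprehension: invert the mapping (duplicate values collapse, the last one wins)
by_number = {number: name for name, number in tel.items()}
print(by_number)               # {4098: 'jack', 4127: 'irv'}
```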

## Self-defined classes and basic concepts of object-oriented programming

*Let's go to the library!*

**Base Class (Encapsulation)**

In [93]:
class Book:
    def __init__(self, title, author, isbn, copies=1):
        self._title = title  # Protected attribute
        self._author = author  # Protected attribute
        self._isbn = isbn  # Protected attribute
        self._copies = copies  # Protected attribute

    def display_info(self):
        """Public method to display book information."""
        print(f"Title: {self._title}")
        print(f"Author: {self._author}")
        print(f"ISBN: {self._isbn}")
        print(f"Copies Available: {self._copies}")

    def borrow_book(self):
        """Public method to borrow a book."""
        if self._copies > 0:
            self._copies -= 1
            print(f"Borrowed '{self._title}'. Remaining copies: {self._copies}")
        else:
            print(f"'{self._title}' is currently unavailable.")

    def return_book(self):
        """Public method to return a book."""
        self._copies += 1
        print(f"Returned '{self._title}'. Total copies: {self._copies}")

**Derived Class (Inheritance)**

In [96]:
class EBook(Book):
    def __init__(self, title, author, isbn, file_size, file_format, copies=1):
        super().__init__(title, author, isbn, copies)  # Call the constructor of the base class
        self._file_size = file_size  # Specific to EBook
        self._file_format = file_format  # Specific to EBook

    def display_info(self):
        """Overriding method to include eBook-specific details."""
        super().display_info()
        print(f"File Size: {self._file_size}MB")
        print(f"File Format: {self._file_format}")

**Polymorphism in action**

In [99]:
def book_info(book):
    """Function to demonstrate polymorphism."""
    book.display_info()

**Abstract Class**

In [102]:
from abc import ABC, abstractmethod

class LibraryMember(ABC):
    def __init__(self, name, member_id):
        self.name = name
        self.member_id = member_id

    @abstractmethod
    def borrow(self, item):
        pass

    @abstractmethod
    def return_item(self, item):
        pass
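To see what `ABC` and `@abstractmethod` enforce, the self-contained sketch below uses a hypothetical, minimal `Member` class modeled on `LibraryMember`: Python refuses to instantiate the abstract class itself, as well as any subclass that does not implement every abstract method.

```python
from abc import ABC, abstractmethod

class Member(ABC):
    @abstractmethod
    def borrow(self, item):
        pass

class IncompleteMember(Member):
    pass  # Does not implement borrow(), so it stays abstract

class CompleteMember(Member):
    def borrow(self, item):
        return f"borrowed {item}"

results = {}
for cls in (Member, IncompleteMember, CompleteMember):
    try:
        results[cls.__name__] = cls().borrow('a book')
    except TypeError:  # Raised when instantiating a class with abstract methods
        results[cls.__name__] = 'cannot be instantiated'

print(results)
```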

**Concrete class inheriting from abstract class**

In [105]:
class Student(LibraryMember):
    def __init__(self, name, member_id, borrowed_books=None):
        super().__init__(name, member_id)
        self.borrowed_books = borrowed_books if borrowed_books else []

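    def borrow(self, book):
        """Implements borrowing for a student."""
        if len(self.borrowed_books) < 3:  # Limit students to borrowing 3 books
            had_copy = book._copies > 0  # borrow_book() only lends if a copy is left
            book.borrow_book()
            if had_copy:  # Record the loan only if it actually happened
                self.borrowed_books.append(book)
        else:
            print(f"{self.name} has already borrowed 3 books!")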

    def return_item(self, book):
        """Implements returning a book."""
        if book in self.borrowed_books:
            book.return_book()
            self.borrowed_books.remove(book)
        else:
            print(f"{self.name} did not borrow '{book._title}'.")

    def display_borrowed_books(self):
        print(f"{self.name} has borrowed:")
        for book in self.borrowed_books:
            print(f"- {book._title}")

**Demonstration**

In [108]:
if __name__ == "__main__":
    # Create some book objects
    book1 = Book("The Great Gatsby", "F. Scott Fitzgerald", "123456789", 2)
    book2 = Book("1984", "George Orwell", "987654321", 1)
    ebook1 = EBook("Python Programming", "Guido van Rossum", "555666777", 5, "PDF")

    # Display information using polymorphism
    print("\n--- Book Information ---")
    book_info(book1)
    book_info(ebook1)

    # Create a student member
    student = Student("Alice", "S001")

    # Borrow books
    print("\n--- Borrowing Books ---")
    student.borrow(book1)
    student.borrow(book2)
    student.borrow(ebook1)

    # Try borrowing more than 3 books
    book3 = Book("To Kill a Mockingbird", "Harper Lee", "222333444", 1)
    student.borrow(book3)

    # Display borrowed books
    print("\n--- Borrowed Books ---")
    student.display_borrowed_books()

    # Return a book
    print("\n--- Returning a Book ---")
    student.return_item(book1)
    student.display_borrowed_books()

    # Borrow again
    print("\n--- Borrowing After Returning ---")
    student.borrow(book3)

    # Abstract base class demonstration
    print("\n--- Abstract Class Implementation ---")
    print(f"{student.name} (ID: {student.member_id}) is a library member.")


--- Book Information ---
Title: The Great Gatsby
Author: F. Scott Fitzgerald
ISBN: 123456789
Copies Available: 2
Title: Python Programming
Author: Guido van Rossum
ISBN: 555666777
Copies Available: 1
File Size: 5MB
File Format: PDF

--- Borrowing Books ---
Borrowed 'The Great Gatsby'. Remaining copies: 1
Borrowed '1984'. Remaining copies: 0
Borrowed 'Python Programming'. Remaining copies: 0
Alice has already borrowed 3 books!

--- Borrowed Books ---
Alice has borrowed:
- The Great Gatsby
- 1984
- Python Programming

--- Returning a Book ---
Returned 'The Great Gatsby'. Total copies: 2
Alice has borrowed:
- 1984
- Python Programming

--- Borrowing After Returning ---
Borrowed 'To Kill a Mockingbird'. Remaining copies: 0

--- Abstract Class Implementation ---
Alice (ID: S001) is a library member.


## Pandas' DataFrame class

*Renting bikes in NYC...*

In the following, we use data from Citi Bike.

> **Wikipedia:** Citi Bike is a privately owned public bicycle sharing system serving the New York City boroughs of the Bronx, Brooklyn, Manhattan, and Queens, as well as Jersey City and Hoboken, New Jersey. Named after lead sponsor Citigroup, it was operated by Motivate (formerly Alta Bicycle Share), with former Metropolitan Transportation Authority CEO Jay Walder as chief executive until September 30, 2018, when the company was acquired by Lyft. The system's bikes and stations use technology from Lyft. 

**Source:** https://en.wikipedia.org/wiki/Citi_Bike

Citi Bike publishes monthly trip data for its service, available from https://citibikenyc.com/system-data or, more specifically, https://s3.amazonaws.com/tripdata/index.html

In [111]:
!wget https://s3.amazonaws.com/tripdata/JC-202410-citibike-tripdata.csv.zip && unzip JC-202410-citibike-tripdata.csv.zip

'wget' is not recognized as an internal or external command,
operable program or batch file.


In [115]:
import pandas as pd

# Read station IDs as strings: they mix numeric-looking and text values (avoids a DtypeWarning)
df = pd.read_csv(r"C:\Users\user\Desktop\TH KÖLN\Semester 5\Daten Modellierung\202412-citibike-tripdata\202412-citibike-tripdata_1.csv", dtype={'start_station_id': str, 'end_station_id': str})
df


,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
0,B44E5B10AEE58AD0,classic_bike,2024-12-14 10:58:18.153,2024-12-14 11:11:11.308,Frederick Douglass Blvd & W 145 St,7954.12,E 138 St & 5 Ave,7809.13,40.823061,-73.941928,40.814490,-73.936153,member
1,BC252DC6A6011556,electric_bike,2024-12-12 14:46:12.473,2024-12-12 16:45:37.777,Madison Ave & E 99 St,7443.01,,,40.789485,-73.952429,40.780000,-73.960000,member
2,6FBE55EF6FE8736D,electric_bike,2024-12-11 07:55:18.770,2024-12-11 08:02:23.460,Columbia St & Kane St,4422.05,,,40.687632,-74.001626,40.690000,-74.000000,member
3,908890DE7FDCF9FE,electric_bike,2024-12-09 22:51:11.668,2024-12-09 22:57:43.495,E 13 St & 2 Ave,5820.08,E 10 St & 2 Ave,5746.02,40.731539,-73.985302,40.729708,-73.986598,member
4,D5D366379A4DC0A8,classic_bike,2024-12-10 18:48:40.063,2024-12-10 19:10:32.264,11 Ave & W 41 St,6726.01,E 25 St & 1 Ave,6004.07,40.760301,-73.998842,40.738177,-73.977387,member
...,...,...,...,...,...,...,...,...,...,...,...,...,...
999995,4D7A0F3A9B538327,classic_bike,2024-12-06 18:43:51.866,2024-12-06 18:50:29.033,5 Ave & E 30 St,6248.08,10 Ave & W 28 St,6459.04,40.745985,-73.986295,40.750664,-74.001768,member
999996,93C022D486F87ABC,classic_bike,2024-12-10 10:34:58.071,2024-12-10 10:51:49.151,Lafayette St & Grand St,5422.09,10 Ave & W 28 St,6459.04,40.720280,-73.998790,40.750664,-74.001768,member
999997,20A11C486859F19B,electric_bike,2024-12-03 14:02:29.375,2024-12-03 14:07:51.452,Lenox Ave & W 117 St,7655.22,W 110 St & Amsterdam Ave,7646.04,40.802557,-73.949078,40.802692,-73.962950,member
999998,4D27B49621858BF9,electric_bike,2024-12-05 07:03:08.210,2024-12-05 07:06:03.572,Watts St & Greenwich St,5578.02,West St & Chambers St,5329.03,40.724055,-74.009660,40.717548,-74.013221,casual


In [117]:
# Inspect columns with mixed data types
print(df[['start_station_id', 'end_station_id']].head())  # Columns 5 and 7: station IDs with mixed numeric/text values


  start_station_id end_station_id
0          7954.12        7809.13
1          7443.01            NaN
2          4422.05            NaN
3          5820.08        5746.02
4          6726.01        6004.07


In [119]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 13 columns):
 #   Column              Non-Null Count    Dtype  
---  ------              --------------    -----  
 0   ride_id             1000000 non-null  object 
 1   rideable_type       1000000 non-null  object 
 2   started_at          1000000 non-null  object 
 3   ended_at            1000000 non-null  object 
 4   start_station_name  999375 non-null   object 
 5   start_station_id    999375 non-null   object 
 6   end_station_name    996417 non-null   object 
 7   end_station_id      995525 non-null   object 
 8   start_lat           1000000 non-null  float64
 9   start_lng           1000000 non-null  float64
 10  end_lat             999795 non-null   float64
 11  end_lng             999795 non-null   float64
 12  member_casual       1000000 non-null  object 
dtypes: float64(4), object(9)
memory usage: 99.2+ MB
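`df.info()` shows every text column stored as `object`. As a sketch of one common optimization, columns with only a few distinct values, such as `rideable_type` and `member_casual`, can be converted to pandas' `category` dtype. The small frame below is made up from the category labels seen above rather than loaded from the file, so the byte counts only illustrate the effect.

```python
import pandas as pd

# Made-up sample using the labels seen above, repeated to mimic a larger column
sample = pd.DataFrame({
    'rideable_type': ['classic_bike', 'electric_bike', 'electric_bike', 'classic_bike'] * 2500,
    'member_casual': ['member', 'member', 'casual', 'member'] * 2500,
})

before = sample.memory_usage(deep=True).sum()
# category stores each distinct label once plus a small integer code per row
sample = sample.astype({'rideable_type': 'category', 'member_casual': 'category'})
after = sample.memory_usage(deep=True).sum()

print(sample.dtypes)
print(f"{before:,} bytes -> {after:,} bytes")
```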


In [121]:
df.columns

Index(['ride_id', 'rideable_type', 'started_at', 'ended_at',
       'start_station_name', 'start_station_id', 'end_station_name',
       'end_station_id', 'start_lat', 'start_lng', 'end_lat', 'end_lng',
       'member_casual'],
      dtype='object')

In [123]:
df.head(n=10)

,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
0,B44E5B10AEE58AD0,classic_bike,2024-12-14 10:58:18.153,2024-12-14 11:11:11.308,Frederick Douglass Blvd & W 145 St,7954.12,E 138 St & 5 Ave,7809.13,40.823061,-73.941928,40.81449,-73.936153,member
1,BC252DC6A6011556,electric_bike,2024-12-12 14:46:12.473,2024-12-12 16:45:37.777,Madison Ave & E 99 St,7443.01,,,40.789485,-73.952429,40.78,-73.96,member
2,6FBE55EF6FE8736D,electric_bike,2024-12-11 07:55:18.770,2024-12-11 08:02:23.460,Columbia St & Kane St,4422.05,,,40.687632,-74.001626,40.69,-74.0,member
3,908890DE7FDCF9FE,electric_bike,2024-12-09 22:51:11.668,2024-12-09 22:57:43.495,E 13 St & 2 Ave,5820.08,E 10 St & 2 Ave,5746.02,40.731539,-73.985302,40.729708,-73.986598,member
4,D5D366379A4DC0A8,classic_bike,2024-12-10 18:48:40.063,2024-12-10 19:10:32.264,11 Ave & W 41 St,6726.01,E 25 St & 1 Ave,6004.07,40.760301,-73.998842,40.738177,-73.977387,member
5,D56FA800710E6478,classic_bike,2024-12-03 13:14:09.026,2024-12-03 13:16:23.278,E 13 St & 2 Ave,5820.08,E 10 St & 2 Ave,5746.02,40.731539,-73.985302,40.729708,-73.986598,member
6,DF7648016BCEECD1,electric_bike,2024-12-13 16:07:22.623,2024-12-13 16:14:25.927,Bond St & Fulton St,4479.06,Columbia Heights & Cranberry St,4829.01,40.689622,-73.983043,40.700379,-73.995481,member
7,B3D30FB1C434D756,classic_bike,2024-12-13 10:19:37.918,2024-12-13 10:22:25.468,W 24 St & 7 Ave,6257.03,W 25 St & 9 Ave,6339.06,40.744876,-73.995299,40.747833,-74.000572,member
8,35B761A9266DBB9E,classic_bike,2024-12-14 18:13:21.420,2024-12-14 18:18:06.067,W 24 St & 7 Ave,6257.03,W 25 St & 9 Ave,6339.06,40.744876,-73.995299,40.747833,-74.000572,member
9,BA1E0BE59444AA29,electric_bike,2024-12-10 18:03:56.635,2024-12-10 18:06:46.676,W 24 St & 7 Ave,6257.03,W 25 St & 9 Ave,6339.06,40.744876,-73.995299,40.747833,-74.000572,member


In [125]:
df.tail(n=10)

,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
999990,7CC1A828C23BC380,classic_bike,2024-12-12 09:37:03.823,2024-12-12 09:42:43.090,E 17 St & Broadway,5980.1,E 31 St & 3 Ave,6239.08,40.737006,-73.990134,40.743943,-73.979661,member
999991,71E92C5FF72D8AAE,classic_bike,2024-12-12 06:51:31.858,2024-12-12 06:57:59.346,Washington St & Gansevoort St,6039.06,10 Ave & W 28 St,6459.04,40.739323,-74.008119,40.750664,-74.001768,member
999992,284CA3F988EC441C,electric_bike,2024-12-09 18:19:48.103,2024-12-09 18:29:23.536,E 17 St & Broadway,5980.1,E 31 St & 3 Ave,6239.08,40.737006,-73.990134,40.743943,-73.979661,casual
999993,21687933F8277992,classic_bike,2024-12-03 16:30:44.051,2024-12-03 16:41:10.112,Washington St & Gansevoort St,6039.06,West St & Chambers St,5329.03,40.739323,-74.008119,40.717548,-74.013221,member
999994,7F9F2EB83A791E29,electric_bike,2024-12-10 11:09:55.259,2024-12-10 11:13:49.980,Washington St & Gansevoort St,6039.06,10 Ave & W 28 St,6459.04,40.739323,-74.008119,40.750664,-74.001768,member
999995,4D7A0F3A9B538327,classic_bike,2024-12-06 18:43:51.866,2024-12-06 18:50:29.033,5 Ave & E 30 St,6248.08,10 Ave & W 28 St,6459.04,40.745985,-73.986295,40.750664,-74.001768,member
999996,93C022D486F87ABC,classic_bike,2024-12-10 10:34:58.071,2024-12-10 10:51:49.151,Lafayette St & Grand St,5422.09,10 Ave & W 28 St,6459.04,40.72028,-73.99879,40.750664,-74.001768,member
999997,20A11C486859F19B,electric_bike,2024-12-03 14:02:29.375,2024-12-03 14:07:51.452,Lenox Ave & W 117 St,7655.22,W 110 St & Amsterdam Ave,7646.04,40.802557,-73.949078,40.802692,-73.96295,member
999998,4D27B49621858BF9,electric_bike,2024-12-05 07:03:08.210,2024-12-05 07:06:03.572,Watts St & Greenwich St,5578.02,West St & Chambers St,5329.03,40.724055,-74.00966,40.717548,-74.013221,casual
999999,AA99DCFF9F464CC0,electric_bike,2024-12-09 08:33:59.397,2024-12-09 08:46:22.344,Washington Ave & E 174 St,8277.03,Courtlandt Ave & E 149 St,7840.05,40.843079,-73.900216,40.816402,-73.919549,member


In [127]:
df.describe() # Caution: only numeric columns are summarized, and statistics such as the mean of coordinates are not always meaningful

,start_lat,start_lng,end_lat,end_lng
count,1000000.0,1000000.0,999795.0,999795.0
mean,40.737646,-73.970963,40.737673,-73.970899
std,0.040257,0.028802,0.081166,0.131326
min,40.633385,-74.026823,0.0,-74.071455
25%,40.713532,-73.991475,40.714211,-73.991449
50%,40.737815,-73.978985,40.73829,-73.979481
75%,40.760339,-73.954823,40.760301,-73.954823
max,40.8863,-73.84672,40.93,0.0
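The `describe()` output hints at a data-quality issue: `end_lat` has a minimum of 0.0 and `end_lng` a maximum of 0.0, which are not coordinates in the New York area and are likely placeholders for unknown end points. A sketch of how such rows could be flagged and excluded, on a tiny frame: the first two coordinate pairs are copied from rows shown above, and the zero row is invented for illustration.

```python
import pandas as pd

rides = pd.DataFrame({
    'end_lat': [40.814490, 40.729708, 0.0],
    'end_lng': [-73.936153, -73.986598, 0.0],
})

# Flag rows where either end coordinate is exactly 0.0
invalid = (rides['end_lat'] == 0) | (rides['end_lng'] == 0)
print(invalid.sum())            # 1

# Keep only the plausible rows
clean = rides[~invalid]
print(clean['end_lat'].min())   # 40.729708
```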


In [129]:
df.loc[0]

ride_id                                 B44E5B10AEE58AD0
rideable_type                               classic_bike
started_at                       2024-12-14 10:58:18.153
ended_at                         2024-12-14 11:11:11.308
start_station_name    Frederick Douglass Blvd & W 145 St
start_station_id                                 7954.12
end_station_name                        E 138 St & 5 Ave
end_station_id                                   7809.13
start_lat                                      40.823061
start_lng                                     -73.941928
end_lat                                         40.81449
end_lng                                       -73.936153
member_casual                                     member
Name: 0, dtype: object

In [131]:
df.iloc[0]

ride_id                                 B44E5B10AEE58AD0
rideable_type                               classic_bike
started_at                       2024-12-14 10:58:18.153
ended_at                         2024-12-14 11:11:11.308
start_station_name    Frederick Douglass Blvd & W 145 St
start_station_id                                 7954.12
end_station_name                        E 138 St & 5 Ave
end_station_id                                   7809.13
start_lat                                      40.823061
start_lng                                     -73.941928
end_lat                                         40.81449
end_lng                                       -73.936153
member_casual                                     member
Name: 0, dtype: object
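Here `loc[0]` and `iloc[0]` return the same row only because the DataFrame uses the default `RangeIndex`, where each row's label equals its position. `loc` selects by index label and `iloc` by integer position; the difference shows up as soon as rows are reordered, for example after sorting. A small sketch with made-up values:

```python
import pandas as pd

rides = pd.DataFrame({'ride_id': ['A', 'B', 'C'], 'duration_min': [13, 120, 7]})

# After sorting, the original labels travel with their rows: the index becomes [2, 0, 1]
by_duration = rides.sort_values(by='duration_min')

print(by_duration.loc[0, 'ride_id'])   # A  (row whose label is 0)
print(by_duration.iloc[0]['ride_id'])  # C  (first row by position)
```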

In [133]:
df.member_casual

0         member
1         member
2         member
3         member
4         member
           ...  
999995    member
999996    member
999997    member
999998    casual
999999    member
Name: member_casual, Length: 1000000, dtype: object
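A column such as `member_casual` becomes more informative when aggregated. A sketch on a small made-up frame that counts categories with `value_counts()` and selects rows with a boolean mask:

```python
import pandas as pd

rides = pd.DataFrame({
    'rideable_type': ['classic_bike', 'electric_bike', 'electric_bike', 'classic_bike', 'electric_bike'],
    'member_casual': ['member', 'member', 'casual', 'member', 'casual'],
})

# How many rides per user type? (member: 3, casual: 2)
print(rides['member_casual'].value_counts())

# Share of rides as fractions that sum to 1
print(rides['member_casual'].value_counts(normalize=True))

# Boolean mask: keep only the casual riders' trips
casual = rides[rides['member_casual'] == 'casual']
print(casual['rideable_type'].tolist())  # ['electric_bike', 'electric_bike']
```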

In [135]:
df.sort_values(by='started_at').head(n=1)

,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
415342,05474750E89F3978,classic_bike,2024-11-30 08:38:12.374,2024-12-01 09:38:06.894,43 Ave & 47 St,6209.05,,,40.744806,-73.91729,,,casual
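Sorting by `started_at` works here because the timestamps are stored as text in a fixed-width, year-first format, which sorts lexicographically in chronological order. Converting them with `pd.to_datetime()` makes date arithmetic possible, for example computing ride durations. The sketch below uses the timestamps of the first two rows shown earlier.

```python
import pandas as pd

rides = pd.DataFrame({
    'started_at': ['2024-12-14 10:58:18.153', '2024-12-12 14:46:12.473'],
    'ended_at':   ['2024-12-14 11:11:11.308', '2024-12-12 16:45:37.777'],
})

# Parse the text timestamps into datetime64 values
rides['started_at'] = pd.to_datetime(rides['started_at'])
rides['ended_at'] = pd.to_datetime(rides['ended_at'])

# Subtracting two datetime columns gives a Timedelta column; convert it to minutes
rides['duration_min'] = (rides['ended_at'] - rides['started_at']).dt.total_seconds() / 60
print(rides['duration_min'].round(1).tolist())  # [12.9, 119.4]
```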


## Lab assignments

### Comma Code

**Source:** https://automatetheboringstuff.com/2e/chapter4/

Say you have a list value like this:

```
spam = ['apples', 'bananas', 'tofu', 'cats']
``` 

Write a function that takes a list value as an argument and returns a string with all the items separated by a comma and a space, with 'and' inserted before the last item. For example, passing the previous spam list to the function would return 'apples, bananas, tofu, and cats'. But your function should be able to work with any list value passed to it. Be sure to test the case where an empty list [] is passed to your function.

In [140]:
def comma_code(items):
    """
    Converts a list into a string where items are separated by commas,
    with 'and' before the last item.
    """
    items = [str(item) for item in items]  # Allow non-string list values
    if not items:  # Handle empty list
        return ''
    elif len(items) == 1:  # Handle single item
        return items[0]
    elif len(items) == 2:  # Two items: no comma needed
        return f"{items[0]} and {items[1]}"
    else:
        return ', '.join(items[:-1]) + ', and ' + items[-1]

# Example usage
spam = ['apples', 'bananas', 'tofu', 'cats']
print(comma_code(spam))  # Expected output: "apples, bananas, tofu, and cats"

# Test cases
print(comma_code([]))                # Expected output: ""
print(comma_code(['apples']))        # Expected output: "apples"
print(comma_code(['apples', 'tofu'])) # Expected output: "apples and tofu"


apples, bananas, tofu, and cats

apples
apples and tofu


### Coin Flip Streaks

**Source:** https://automatetheboringstuff.com/2e/chapter4/

For this exercise, we’ll try doing an experiment. If you flip a coin 100 times and write down an “H” for each heads and “T” for each tails, you’ll create a list that looks like “T T T T H H H H T T.” If you ask a human to make up 100 random coin flips, you’ll probably end up with alternating head-tail results like “H T H T H H T H T T,” which looks random (to humans), but isn’t mathematically random. A human will almost never write down a streak of six heads or six tails in a row, even though it is highly likely to happen in truly random coin flips. Humans are predictably bad at being random.

Write a program to find out how often a streak of six heads or a streak of six tails comes up in a randomly generated list of heads and tails. Your program breaks up the experiment into two parts: the first part generates a list of randomly selected 'heads' and 'tails' values, and the second part checks if there is a streak in it. Put all of this code in a loop that repeats the experiment 10,000 times so we can find out what percentage of the coin flips contains a streak of six heads or tails in a row. As a hint, the function call random.randint(0, 1) will return a 0 value 50% of the time and a 1 value the other 50% of the time.

You can start with the following template:

In [144]:
import random

# Number of experiments that contain at least one streak of six
numberOfStreaks = 0

# Perform 10,000 experiments
for experimentNumber in range(10000):
    # Generate a list of 100 random 'H' or 'T' values
    flips = [random.choice(['H', 'T']) for _ in range(100)]
    
    # Check for streaks of 6 or more
    streak = 1  # Initialize streak counter
    for i in range(1, len(flips)):
        if flips[i] == flips[i - 1]:
            streak += 1
            if streak == 6:  # A streak of 6 is found
                numberOfStreaks += 1
                break  # No need to check further in this experiment
        else:
            streak = 1  # Reset streak counter

# Calculate the percentage of streaks
chanceOfStreak = (numberOfStreaks / 10000) * 100
print(f'Chance of streak: {chanceOfStreak:.2f}%')


Chance of streak: 80.31%
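An alternative sketch of the streak check uses `itertools.groupby`, which groups consecutive equal flips so that run lengths can be read off directly. The helper name `has_streak` and the fixed seed are our own choices; the estimate should land close to the value above, but it varies with the seed.

```python
import random
from itertools import groupby

def has_streak(flips, length=6):
    """Return True if flips contains `length` or more identical values in a row."""
    return any(len(list(group)) >= length for _, group in groupby(flips))

random.seed(42)  # Fixed seed so the run is reproducible
experiments = 10_000
hits = sum(
    has_streak([random.randint(0, 1) for _ in range(100)])  # 100 flips: 0 = tails, 1 = heads
    for _ in range(experiments)
)
print(f"Chance of streak: {hits / experiments * 100:.2f}%")
```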


### Back to the dungeon!

Remember [bashcrawl]() from the second lab exercise? Now we are going back to the "dungeon setting", but **this time you will actually implement parts of the game!**

#### Fantasy Game Inventory

**Source:** https://automatetheboringstuff.com/2e/chapter5/

You are creating a fantasy video game. The data structure to model the player’s inventory will be a dictionary where the keys are string values describing the item in the inventory and the value is an integer value detailing how many of that item the player has. For example, the dictionary value {'rope': 1, 'torch': 6, 'gold coin': 42, 'dagger': 1, 'arrow': 12} means the player has 1 rope, 6 torches, 42 gold coins, and so on.

Write a function named displayInventory() that would take any possible “inventory” and display it like the following:

```
Inventory:
12 arrow
42 gold coin
1 rope
6 torch
1 dagger
Total number of items: 62
``` 
**Hint:** You can use a for loop to loop through all the keys in a dictionary.

You can start with the following template:

In [147]:
stuff = {'rope': 1, 'torch': 6, 'gold coin': 42, 'dagger': 1, 'arrow': 12}

def displayInventory(inventory):
    print("Inventory:")
    item_total = 0
    for item, count in inventory.items():
        print(f"{count} {item}")
        item_total += count
    print("Total number of items: " + str(item_total))

displayInventory(stuff)


Inventory:
1 rope
6 torch
42 gold coin
1 dagger
12 arrow
Total number of items: 62


#### List to Dictionary Function for Fantasy Game 

**Source:** https://automatetheboringstuff.com/2e/chapter5/

Imagine that a vanquished dragon’s loot is represented as a list of strings like this:

```
dragonLoot = ['gold coin', 'dagger', 'gold coin', 'gold coin', 'ruby']
```

Write a function named addToInventory(inventory, addedItems), where the inventory parameter is a dictionary representing the player’s inventory (like in the previous project) and the addedItems parameter is a list like dragonLoot. The addToInventory() function should return a dictionary that represents the updated inventory. Note that the addedItems list can contain multiples of the same item. 

Updating the inventory {'gold coin': 42, 'rope': 1} with dragonLoot and printing the result with your displayInventory() function from the previous project would produce the following output:

```
Inventory:
45 gold coin
1 rope
1 ruby
1 dagger

Total number of items: 48
```

Your code could look something like this:


In [149]:
def addToInventory(inventory, addedItems):
    for item in addedItems:
        if item in inventory:
            inventory[item] += 1
        else:
            inventory[item] = 1
    return inventory

def displayInventory(inventory):
    print("Inventory:")
    item_total = 0
    for item, count in inventory.items():
        print(f"{count} {item}")
        item_total += count
    print("Total number of items: " + str(item_total))

# Existing inventory and dragon's loot
inv = {'gold coin': 42, 'rope': 1}
dragonLoot = ['gold coin', 'dagger', 'gold coin', 'gold coin', 'ruby']

# Update inventory with dragon's loot
inv = addToInventory(inv, dragonLoot)

# Display the updated inventory
displayInventory(inv)


Inventory:
45 gold coin
1 rope
1 dagger
1 ruby
Total number of items: 48
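As an alternative sketch, `collections.Counter` from the standard library counts list items directly and can be added to an existing inventory. Unlike `addToInventory()` above, this version returns a new dictionary instead of modifying the one passed in; the name `add_to_inventory_counter` is hypothetical.

```python
from collections import Counter

def add_to_inventory_counter(inventory, added_items):
    """Return a new inventory dict with added_items counted in."""
    updated = Counter(inventory)  # Start from the existing counts
    updated.update(added_items)   # Count each list element; duplicates add up
    return dict(updated)

inv = {'gold coin': 42, 'rope': 1}
dragon_loot = ['gold coin', 'dagger', 'gold coin', 'gold coin', 'ruby']
print(add_to_inventory_counter(inv, dragon_loot))
# {'gold coin': 45, 'rope': 1, 'dagger': 1, 'ruby': 1}
print(inv)  # {'gold coin': 42, 'rope': 1}  (original unchanged)
```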
