# DIS08 / OR92 Data Modeling: Python - Data structures

# Introduction to Data Structures

---

**Note:** Some examples and assignments in this notebook are taken from the following sources.
- https://docs.python.org/3/tutorial/datastructures.html
- https://realpython.com/python-data-structures/
- https://www.geeksforgeeks.org/python-data-structures/
- https://automatetheboringstuff.com/


---

## What are data structures?
Data structures are ways of organizing and storing data to enable efficient access and modification. They provide a means to manage large amounts of data and are fundamental to designing efficient algorithms. 

In Python, data structures can be broadly categorized into built-in types (like lists, tuples, sets, and dictionaries) and custom types created using classes.

## Importance of choosing the right data structure
Selecting the appropriate data structure is crucial for optimizing the performance of a program. The choice affects:
- **Memory usage:** Some data structures require more memory than others.
- **Speed of operations:** Different structures offer varying performance for insertion, deletion, searching, and updating operations.
- **Code clarity:** The right structure can make code easier to read and maintain.
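
To make the "speed of operations" point concrete, the following sketch times a membership test on a list and on a set holding the same values. The data size, the variable names, and the number of repetitions are arbitrary assumptions for illustration; absolute timings depend on the machine.

```python
import timeit

# Hypothetical data: the same 100,000 integers stored as a list and as a set
numbers_list = list(range(100_000))
numbers_set = set(numbers_list)

# Look up a value near the end of the list: the list is scanned element by
# element, while the set uses a hash lookup
target = 99_999
list_time = timeit.timeit(lambda: target in numbers_list, number=100)
set_time = timeit.timeit(lambda: target in numbers_set, number=100)

print(f"list: {list_time:.6f} s, set: {set_time:.6f} s")
```

On typical hardware the set lookup should be faster by several orders of magnitude, at the cost of the extra memory a hash table needs.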

## Built-in vs. custom data structures
- **Built-in data structures** are provided by Python and include lists, tuples, sets, and dictionaries. They are highly optimized and easy to use.
- **Custom data structures** can be implemented using classes to meet specific needs, such as stacks, queues, linked lists, trees, and graphs.

Choosing between built-in and custom data structures depends on the problem requirements and the trade-offs involved.
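
As an illustration of a custom data structure built from classes, here is a minimal sketch of a singly linked list. The class and method names (`Node`, `LinkedList`, `push_front`) are chosen for this example only and are not part of Python's standard library.

```python
class Node:
    """One element of a singly linked list (hypothetical helper class)."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


class LinkedList:
    """Minimal linked list that supports insertion at the front and iteration."""

    def __init__(self):
        self.head = None

    def push_front(self, value):
        # The new node points to the old head and becomes the new head
        self.head = Node(value, self.head)

    def __iter__(self):
        # Follow the next_node references until the end of the list
        node = self.head
        while node is not None:
            yield node.value
            node = node.next_node


numbers = LinkedList()
for n in (3, 2, 1):
    numbers.push_front(n)

print(list(numbers))  # [1, 2, 3]
```

For most everyday tasks a built-in list is the simpler choice; a hand-written structure like this is mainly useful when its specific trade-offs are needed.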



## Lists

### Characteristics
- **Ordered**: The elements in a list have a specific order, which is preserved.
- **Mutable**: Lists can be modified after their creation (e.g., adding, removing, or changing elements).
- **Allows Duplicates**: A list can contain multiple instances of the same value.

### Common Operations

#### Indexing

Access elements by their position in the list (starting at index 0).

In [1]:
# Creating a list with some duplicate values
my_list = [1, 2, 3, 4, 2, 5]
print(my_list)  

# Accessing elements
first_element = my_list[0]   # First element
last_element = my_list[-1]   # Last element

print(f"First element: {first_element}, Last element: {last_element}")

[1, 2, 3, 4, 2, 5]
First element: 1, Last element: 5


#### Slicing

Retrieve a subset of the list using slicing.

In [2]:
# Slicing the list
sub_list = my_list[1:4]  # Elements from index 1 to 3
print(sub_list) 

[2, 3, 4]
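
Slices also accept omitted bounds and an optional step. The short sketch below uses a separate example list (`letters`, an assumption for this illustration) so that `my_list` stays unchanged.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

first_three = letters[:3]    # start omitted: begin at index 0
from_third = letters[3:]     # stop omitted: run to the end
every_second = letters[::2]  # step of 2
reversed_copy = letters[::-1]  # negative step: reversed copy

print(first_three)    # ['a', 'b', 'c']
print(from_third)     # ['d', 'e', 'f']
print(every_second)   # ['a', 'c', 'e']
print(reversed_copy)  # ['f', 'e', 'd', 'c', 'b', 'a']
```

Slicing always returns a new list, so the original list is not modified.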


#### Appending

Add an element to the end of the list using append().

In [3]:
# Appending a new element
my_list.append(6)
print(my_list)  

[1, 2, 3, 4, 2, 5, 6]


#### Inserting

Insert an element at a specific position using insert().

In [4]:
# Inserting an element at index 2
my_list.insert(2, 99)
print(my_list)  

[1, 2, 99, 3, 4, 2, 5, 6]


#### Counting, sorting, reversing, etc.

In [5]:
fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']
fruits

['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']

In [6]:
fruits.count('apple')

2

In [7]:
fruits.index('banana')

3

In [8]:
fruits.index('banana', 4)  # Find next banana starting at position 4

6

In [9]:
fruits.reverse()
fruits

['banana', 'apple', 'kiwi', 'banana', 'pear', 'apple', 'orange']
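
Sorting is not shown in the cells above. A minimal sketch of the two standard options follows: the `sort()` method reorders a list in place, while the built-in `sorted()` returns a new list. The list is recreated here so that the example is self-contained.

```python
fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']

# sorted() returns a new list and leaves the original untouched;
# key=len sorts by string length (ties keep their original order)
by_length = sorted(fruits, key=len)
print(by_length)
# ['pear', 'kiwi', 'apple', 'apple', 'orange', 'banana', 'banana']

# sort() reorders the list in place and returns None
fruits.sort()
print(fruits)
# ['apple', 'apple', 'banana', 'banana', 'kiwi', 'orange', 'pear']
```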

#### Using Lists as Stacks

A stack is a linear data structure that follows the Last In, First Out (LIFO) principle, meaning the last element added to the stack is the first one to be removed. It is commonly used for tasks like managing function calls, undo operations, and evaluating expressions.

In Python, a stack can be implemented using a list, where append() adds an item to the top and pop() removes the topmost item. Both operations run in amortized O(1) time on a list, so a list is a reasonable choice for a stack. Alternatively, a deque (from the collections module) also provides O(1) append() and pop(), and is the better option when items must be added or removed at the front as well.

**See also:** https://en.wikipedia.org/wiki/Stack_(abstract_data_type)

In [10]:
stack = [3, 4, 5]
stack

[3, 4, 5]

In [11]:
stack.append(6)
stack

[3, 4, 5, 6]

In [12]:
stack.append(7)
stack

[3, 4, 5, 6, 7]

In [13]:
stack.pop()

7

In [14]:
stack.pop()

6
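
The deque alternative mentioned above can be sketched as follows. It repeats the same push and pop steps as the list-based cells, using only `append()` and `pop()` from `collections.deque`.

```python
from collections import deque

stack = deque([3, 4, 5])
stack.append(6)    # push 6 onto the top
stack.append(7)    # push 7 onto the top

top = stack.pop()  # remove and return the most recently added item
print(top)         # 7
print(stack)       # deque([3, 4, 5, 6])
```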

#### Using Lists as Queues

A queue is a linear data structure that follows the First In, First Out (FIFO) principle, meaning the first element added is the first one to be removed. It is commonly used for tasks like task scheduling, breadth-first search, and managing shared resources.

In Python, a queue can be implemented using a list, but removing the first element with pop(0) takes O(n) time because all remaining elements have to be shifted by one position. A more efficient approach is to use a deque (from the collections module), which provides O(1) operations for both appending and removing elements at either end. For thread-safe queues, the queue.Queue class from the queue module can be used.

**See also:** https://en.wikipedia.org/wiki/Queue_(abstract_data_type)

In [15]:
from collections import deque

queue = deque(["Eric", "John", "Michael"])
queue

deque(['Eric', 'John', 'Michael'])

In [16]:
queue.append("Terry")
queue

deque(['Eric', 'John', 'Michael', 'Terry'])

In [17]:
queue.append("Graham")
queue

deque(['Eric', 'John', 'Michael', 'Terry', 'Graham'])

In [18]:
queue.popleft()
queue

deque(['John', 'Michael', 'Terry', 'Graham'])

In [19]:
queue.popleft()
queue

deque(['Michael', 'Terry', 'Graham'])
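
For completeness, here is a minimal sketch of the thread-safe `queue.Queue` mentioned above, used in a single thread for simplicity. The variable names are assumptions for this example.

```python
from queue import Queue

tasks = Queue()
for name in ["Eric", "John", "Michael"]:
    tasks.put(name)       # enqueue at the back

first = tasks.get()       # dequeue from the front (FIFO)
remaining = tasks.qsize()

print(first)      # Eric
print(remaining)  # 2
```

Unlike a deque, `Queue.get()` blocks by default when the queue is empty, which is useful when one thread produces items and another consumes them.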

### List comprehension

List comprehension is a concise and elegant way to create and transform lists in Python. It provides a syntactic shortcut for generating new lists by applying an expression to each item in an existing iterable (like a list, range, or string) and optionally filtering elements based on a condition. 

In [20]:
squares = []
for x in range(10):
    squares.append(x**2)

squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [21]:
squares = list(map(lambda x: x**2, range(10)))
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [22]:
squares = [x**2 for x in range(10)]
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
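
The optional filtering condition is not used in the cells above. A short sketch with an `if` clause, plus one comprehension over a string:

```python
# Keep only the squares of even numbers
even_squares = [x**2 for x in range(10) if x % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# Comprehensions work on any iterable, for example a string
upper_letters = [ch.upper() for ch in "data"]
print(upper_letters)  # ['D', 'A', 'T', 'A']
```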

## Tuples

A tuple is an immutable sequence, meaning its elements cannot be reassigned after creation. Tuples are defined using parentheses () or simply by separating values with commas. For example, (1, 2, 3) or 1, 2, 3 are tuples. They are commonly used to represent fixed collections of items, like coordinates or function return values, and can hold heterogeneous data types. Because a tuple is hashable as long as all of its elements are hashable, it can serve as a dictionary key or set element, which a list cannot.

A tuple consists of a number of values separated by commas, for instance:

In [23]:
t = 12345, 54321, 'hello!'
t[0]

12345

In [24]:
# Tuples may be nested:
u = t, (1, 2, 3, 4, 5)
u

((12345, 54321, 'hello!'), (1, 2, 3, 4, 5))

In [25]:
# Tuples are immutable:
t[0] = 88888

TypeError: 'tuple' object does not support item assignment

In [26]:
# but they can contain mutable objects:
v = ([1, 2, 3], [3, 2, 1])
v

([1, 2, 3], [3, 2, 1])
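
The typical uses named above (function return values, dictionary keys) and the caveat about mutable contents can be sketched like this. The helper `min_max` and the dictionary `station_names` are hypothetical names for this example.

```python
def min_max(values):
    """Return two results at once as a tuple."""
    return min(values), max(values)

# Tuple unpacking assigns each element to its own name
low, high = min_max([4, 1, 9])
print(low, high)  # 1 9

# A tuple of hashable values can be used as a dictionary key
station_names = {(40.737604, -74.052478): "Oakland Ave"}
print(station_names[(40.737604, -74.052478)])  # Oakland Ave

# The tuple itself is immutable, but a list stored inside it can still change
v = ([1, 2, 3], [3, 2, 1])
v[0].append(4)
print(v)  # ([1, 2, 3, 4], [3, 2, 1])

# A tuple that contains a list is not hashable and cannot be a dictionary key
try:
    hash(v)
except TypeError as err:
    print(f"TypeError: {err}")
```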

## Sets

A set in Python is an unordered, mutable collection of unique elements, commonly used for membership testing, deduplication, and mathematical set operations (union, intersection, difference, and symmetric difference). The elements themselves must be hashable, which in practice means immutable values such as numbers, strings, or tuples. Sets are written with curly braces or built with the set() constructor, e.g., {1, 2, 3} or set([1, 2, 3]); an empty set must be written as set(), because {} creates an empty dictionary. Sets do not support indexing or slicing because they are unordered, but membership checks are fast (O(1) on average) thanks to the underlying hash table. Python also provides frozenset, an immutable version of a set, which can be used as a dictionary key or as an element of another set.

In [27]:
basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}
basket

{'apple', 'banana', 'orange', 'pear'}

In [28]:
# Demonstrate set operations on unique letters from two words
a = set('abracadabra')
b = set('alacazam')

In [29]:
# unique letters in a
a                                  

{'a', 'b', 'c', 'd', 'r'}

In [30]:
# letters in a but not in b
a - b                              

{'b', 'd', 'r'}

In [31]:
# letters in a or b or both
a | b

{'a', 'b', 'c', 'd', 'l', 'm', 'r', 'z'}

In [32]:
# letters in both a and b
a & b                              


{'a', 'c'}

In [33]:
# letters in a or b but not both
a ^ b                              

{'b', 'd', 'l', 'm', 'r', 'z'}
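
A few details of sets that are easy to get wrong are sketched below: creating an empty set, removing duplicates from a list, and using a `frozenset` as a dictionary key. All variable names are assumptions for this example.

```python
# {} creates an empty dictionary; an empty set needs set()
empty_set = set()
empty_dict = {}
print(type(empty_set).__name__, type(empty_dict).__name__)  # set dict

# Deduplication: converting a list to a set drops repeated values
unique_numbers = set([1, 2, 2, 3, 3, 3])
print(len(unique_numbers))  # 3

# A frozenset is hashable, so it can be a dictionary key
team = frozenset({'Eric', 'John'})
rooms = {team: "Room 1"}
print(rooms[frozenset({'John', 'Eric'})])  # Room 1

# Unhashable values such as lists cannot be set elements
try:
    invalid = {[1, 2]}
except TypeError as err:
    print(f"TypeError: {err}")
```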

## Dictionaries

A dictionary in Python is a mutable collection that maps unique keys to values, making it ideal for storing and retrieving data efficiently using key-based lookups. Since Python 3.7, dictionaries preserve the order in which keys were inserted. They are defined using curly braces {} with key-value pairs separated by colons, e.g., {"name": "Alice", "age": 30}, or created with the dict() constructor. Keys and values may be of mixed types, though keys must be hashable (e.g., strings, numbers, or tuples of hashable values). Dictionaries support adding, updating, and deleting key-value pairs, and offer built-in methods like .get(), .keys(), .values(), and .items() for accessing and manipulating data. Their efficient hash table implementation makes them a cornerstone of Python’s data structures.

In [34]:
tel = {'jack': 4098, 'sape': 4139}
tel['guido'] = 4127
tel

{'jack': 4098, 'sape': 4139, 'guido': 4127}

In [35]:
tel['jack']

4098

In [36]:
del tel['sape']
tel

{'jack': 4098, 'guido': 4127}

In [37]:
tel['irv'] = 4127
tel

{'jack': 4098, 'guido': 4127, 'irv': 4127}

In [38]:
list(tel)

['jack', 'guido', 'irv']

In [39]:
sorted(tel)

['guido', 'irv', 'jack']

In [40]:
'guido' in tel

True

In [41]:
'jack' not in tel

False
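
The methods `.get()` and `.items()` do not appear in the cells above. A minimal sketch, recreating the phone book so the example is self-contained:

```python
tel = {'jack': 4098, 'guido': 4127, 'irv': 4127}

# get() returns None (or a chosen default) instead of raising KeyError
missing = tel.get('sape')
fallback = tel.get('sape', 0)
print(missing, fallback)  # None 0

# items() yields (key, value) pairs in insertion order (Python 3.7+)
for name, number in tel.items():
    print(f"{name}: {number}")

# Dictionary comprehensions work like list comprehensions
squares = {x: x**2 for x in range(4)}
print(squares)  # {0: 0, 1: 1, 2: 4, 3: 9}
```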

## Self-defined classes and basic concepts of object-oriented programming

*Let's go to the library!*

**Base Class (Encapsulation)**

In [42]:
class Book:
    def __init__(self, title, author, isbn, copies=1):
        self._title = title  # Protected attribute
        self._author = author  # Protected attribute
        self._isbn = isbn  # Protected attribute
        self._copies = copies  # Protected attribute

    def display_info(self):
        """Public method to display book information."""
        print(f"Title: {self._title}")
        print(f"Author: {self._author}")
        print(f"ISBN: {self._isbn}")
        print(f"Copies Available: {self._copies}")

    def borrow_book(self):
        """Public method to borrow a book."""
        if self._copies > 0:
            self._copies -= 1
            print(f"Borrowed '{self._title}'. Remaining copies: {self._copies}")
        else:
            print(f"'{self._title}' is currently unavailable.")

    def return_book(self):
        """Public method to return a book."""
        self._copies += 1
        print(f"Returned '{self._title}'. Total copies: {self._copies}")

**Derived Class (Inheritance)**

In [43]:
class EBook(Book):
    def __init__(self, title, author, isbn, file_size, file_format, copies=1):
        super().__init__(title, author, isbn, copies)  # Call the constructor of the base class
        self._file_size = file_size  # Specific to EBook
        self._file_format = file_format  # Specific to EBook

    def display_info(self):
        """Overriding method to include eBook-specific details."""
        super().display_info()
        print(f"File Size: {self._file_size}MB")
        print(f"File Format: {self._file_format}")

**Polymorphism in action**

In [44]:
def book_info(book):
    """Function to demonstrate polymorphism."""
    book.display_info()

**Abstract Class**

In [45]:
from abc import ABC, abstractmethod

class LibraryMember(ABC):
    def __init__(self, name, member_id):
        self.name = name
        self.member_id = member_id

    @abstractmethod
    def borrow(self, book):
        pass

    @abstractmethod
    def return_item(self, book):
        pass
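
What makes a class abstract in practice is that it cannot be instantiated until every abstract method has been overridden. The sketch below shows this with two hypothetical classes (`Loanable` and `Magazine`) that are not part of the library example.

```python
from abc import ABC, abstractmethod


class Loanable(ABC):  # hypothetical abstract class for illustration
    @abstractmethod
    def loan_days(self):
        """Return how many days the item may be borrowed."""


class Magazine(Loanable):  # hypothetical concrete subclass
    def loan_days(self):
        return 7


# Instantiating the abstract class directly raises a TypeError
try:
    Loanable()
except TypeError as err:
    print(f"TypeError: {err}")

# The subclass implements every abstract method, so it can be instantiated
magazine = Magazine()
print(magazine.loan_days())  # 7
```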

**Concrete class inheriting from abstract class**

In [46]:
class Student(LibraryMember):
    def __init__(self, name, member_id, borrowed_books=None):
        super().__init__(name, member_id)
        self.borrowed_books = borrowed_books if borrowed_books else []

    def borrow(self, book):
        """Implements borrowing for a student."""
        if len(self.borrowed_books) < 3:  # Limit students to borrowing 3 books
            book.borrow_book()
            self.borrowed_books.append(book)
        else:
            print(f"{self.name} has already borrowed 3 books!")

    def return_item(self, book):
        """Implements returning a book."""
        if book in self.borrowed_books:
            book.return_book()
            self.borrowed_books.remove(book)
        else:
            print(f"{self.name} did not borrow '{book._title}'.")

    def display_borrowed_books(self):
        print(f"{self.name} has borrowed:")
        for book in self.borrowed_books:
            print(f"- {book._title}")

**Demonstration**

In [47]:
if __name__ == "__main__":
    # Create some book objects
    book1 = Book("The Great Gatsby", "F. Scott Fitzgerald", "123456789", 2)
    book2 = Book("1984", "George Orwell", "987654321", 1)
    ebook1 = EBook("Python Programming", "Guido van Rossum", "555666777", 5, "PDF")

    # Display information using polymorphism
    print("\n--- Book Information ---")
    book_info(book1)
    book_info(ebook1)

    # Create a student member
    student = Student("Alice", "S001")

    # Borrow books
    print("\n--- Borrowing Books ---")
    student.borrow(book1)
    student.borrow(book2)
    student.borrow(ebook1)

    # Try borrowing more than 3 books
    book3 = Book("To Kill a Mockingbird", "Harper Lee", "222333444", 1)
    student.borrow(book3)

    # Display borrowed books
    print("\n--- Borrowed Books ---")
    student.display_borrowed_books()

    # Return a book
    print("\n--- Returning a Book ---")
    student.return_item(book1)
    student.display_borrowed_books()

    # Borrow again
    print("\n--- Borrowing After Returning ---")
    student.borrow(book3)

    # Abstract base class demonstration
    print("\n--- Abstract Class Implementation ---")
    print(f"{student.name} (ID: {student.member_id}) is a library member.")


--- Book Information ---
Title: The Great Gatsby
Author: F. Scott Fitzgerald
ISBN: 123456789
Copies Available: 2
Title: Python Programming
Author: Guido van Rossum
ISBN: 555666777
Copies Available: 1
File Size: 5MB
File Format: PDF

--- Borrowing Books ---
Borrowed 'The Great Gatsby'. Remaining copies: 1
Borrowed '1984'. Remaining copies: 0
Borrowed 'Python Programming'. Remaining copies: 0
Alice has already borrowed 3 books!

--- Borrowed Books ---
Alice has borrowed:
- The Great Gatsby
- 1984
- Python Programming

--- Returning a Book ---
Returned 'The Great Gatsby'. Total copies: 2
Alice has borrowed:
- 1984
- Python Programming

--- Borrowing After Returning ---
Borrowed 'To Kill a Mockingbird'. Remaining copies: 0

--- Abstract Class Implementation ---
Alice (ID: S001) is a library member.


## Pandas' DataFrame class

*Renting bikes in NYC...*

In the following, we use data from Citi Bike.

> **Wikipedia:** Citi Bike is a privately owned public bicycle sharing system serving the New York City boroughs of the Bronx, Brooklyn, Manhattan, and Queens, as well as Jersey City and Hoboken, New Jersey. Named after lead sponsor Citigroup, it was operated by Motivate (formerly Alta Bicycle Share), with former Metropolitan Transportation Authority CEO Jay Walder as chief executive until September 30, 2018, when the company was acquired by Lyft. The system's bikes and stations use technology from Lyft. 

**Source:** https://en.wikipedia.org/wiki/Citi_Bike

Citi Bike provides monthly reports of their service usage that can be obtained from https://citibikenyc.com/system-data or, more specifically, https://s3.amazonaws.com/tripdata/index.html

In [None]:
!wget https://s3.amazonaws.com/tripdata/JC-202410-citibike-tripdata.csv.zip && unzip JC-202410-citibike-tripdata.csv.zip

In [50]:
import pandas as pd

df = pd.read_csv('JC-202410-citibike-tripdata.csv')
df

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
0,172DBBFC733F03CE,electric_bike,2024-10-10 14:54:24.572,2024-10-10 15:04:07.657,Oakland Ave,JC022,Stevens - River Ter & 6 St,HB602,40.737604,-74.052478,40.743133,-74.026989,member
1,D20BBA4860FE736C,electric_bike,2024-10-03 19:20:21.215,2024-10-03 19:31:46.511,Oakland Ave,JC022,Stevens - River Ter & 6 St,HB602,40.737604,-74.052478,40.743133,-74.026989,casual
2,86F89348995D0E6E,classic_bike,2024-10-20 12:14:56.318,2024-10-20 12:28:32.053,Oakland Ave,JC022,South Waterfront Walkway - Sinatra Dr & 1 St,HB103,40.737604,-74.052478,40.736982,-74.027781,casual
3,AA55A717B7EC1D10,classic_bike,2024-10-20 14:40:15.227,2024-10-20 14:55:39.100,Oakland Ave,JC022,Columbus Drive,JC014,40.737604,-74.052478,40.718355,-74.038914,member
4,C72953D91E986DA7,classic_bike,2024-10-20 08:37:03.280,2024-10-20 08:42:24.770,Brunswick & 6th,JC081,Washington St,JC098,40.726012,-74.050389,40.724294,-74.035483,member
...,...,...,...,...,...,...,...,...,...,...,...,...,...
118302,4FBD994F1648B362,electric_bike,2024-10-20 09:59:13.554,2024-10-20 10:08:13.706,Hoboken Ave at Monmouth St,JC105,River St & 1 St,HB609,40.735208,-74.046964,40.737215,-74.028865,casual
118303,204041F23ECD28F1,electric_bike,2024-10-02 18:02:16.294,2024-10-02 18:09:23.766,Hoboken Ave at Monmouth St,JC105,River St & 1 St,HB609,40.735208,-74.046964,40.737215,-74.028865,member
118304,49D3FDDA6932091C,classic_bike,2024-10-12 14:37:23.192,2024-10-12 14:45:56.088,Hoboken Ave at Monmouth St,JC105,River St & 1 St,HB609,40.735208,-74.046964,40.737215,-74.028865,member
118305,DABDE1AAF41AFACB,electric_bike,2024-10-20 13:31:30.895,2024-10-20 13:41:10.722,Hoboken Ave at Monmouth St,JC105,River St & 1 St,HB609,40.735208,-74.046964,40.737215,-74.028865,member


In [51]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 118307 entries, 0 to 118306
Data columns (total 13 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   ride_id             118307 non-null  object 
 1   rideable_type       118307 non-null  object 
 2   started_at          118307 non-null  object 
 3   ended_at            118307 non-null  object 
 4   start_station_name  118305 non-null  object 
 5   start_station_id    118305 non-null  object 
 6   end_station_name    118052 non-null  object 
 7   end_station_id      118017 non-null  object 
 8   start_lat           118307 non-null  float64
 9   start_lng           118307 non-null  float64
 10  end_lat             118281 non-null  float64
 11  end_lng             118281 non-null  float64
 12  member_casual       118307 non-null  object 
dtypes: float64(4), object(9)
memory usage: 11.7+ MB


In [52]:
df.columns

Index(['ride_id', 'rideable_type', 'started_at', 'ended_at',
       'start_station_name', 'start_station_id', 'end_station_name',
       'end_station_id', 'start_lat', 'start_lng', 'end_lat', 'end_lng',
       'member_casual'],
      dtype='object')

In [53]:
df.head(n=10)

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
0,172DBBFC733F03CE,electric_bike,2024-10-10 14:54:24.572,2024-10-10 15:04:07.657,Oakland Ave,JC022,Stevens - River Ter & 6 St,HB602,40.737604,-74.052478,40.743133,-74.026989,member
1,D20BBA4860FE736C,electric_bike,2024-10-03 19:20:21.215,2024-10-03 19:31:46.511,Oakland Ave,JC022,Stevens - River Ter & 6 St,HB602,40.737604,-74.052478,40.743133,-74.026989,casual
2,86F89348995D0E6E,classic_bike,2024-10-20 12:14:56.318,2024-10-20 12:28:32.053,Oakland Ave,JC022,South Waterfront Walkway - Sinatra Dr & 1 St,HB103,40.737604,-74.052478,40.736982,-74.027781,casual
3,AA55A717B7EC1D10,classic_bike,2024-10-20 14:40:15.227,2024-10-20 14:55:39.100,Oakland Ave,JC022,Columbus Drive,JC014,40.737604,-74.052478,40.718355,-74.038914,member
4,C72953D91E986DA7,classic_bike,2024-10-20 08:37:03.280,2024-10-20 08:42:24.770,Brunswick & 6th,JC081,Washington St,JC098,40.726012,-74.050389,40.724294,-74.035483,member
5,23A1827EA03A9AC2,electric_bike,2024-10-28 19:20:28.668,2024-10-28 19:25:28.448,Oakland Ave,JC022,Hoboken Ave at Monmouth St,JC105,40.737604,-74.052478,40.735208,-74.046964,member
6,6C0E882AE20AC640,electric_bike,2024-10-08 10:41:37.926,2024-10-08 10:44:23.533,Pershing Field,JC024,Leonard Gordon Park,JC080,40.742677,-74.051789,40.74591,-74.057271,member
7,FC4AEE485D39016D,electric_bike,2024-10-25 21:02:40.878,2024-10-25 21:11:00.117,Pershing Field,JC024,Leonard Gordon Park,JC080,40.742677,-74.051789,40.74591,-74.057271,member
8,3E4D96936A8660C6,electric_bike,2024-10-08 12:17:23.948,2024-10-08 12:20:47.035,City Hall - Washington St & 1 St,HB105,Stevens - River Ter & 6 St,HB602,40.73736,-74.03097,40.743133,-74.026989,member
9,9F2FBA8132468A3B,electric_bike,2024-10-28 11:00:40.004,2024-10-28 11:03:12.823,City Hall - Washington St & 1 St,HB105,Stevens - River Ter & 6 St,HB602,40.73736,-74.03097,40.743133,-74.026989,member


In [54]:
df.tail(n=10)

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
118297,87D28FEEE090BDBC,classic_bike,2024-10-02 18:10:41.961,2024-10-02 18:19:22.370,Hoboken Ave at Monmouth St,JC105,River St & 1 St,HB609,40.735208,-74.046964,40.737215,-74.028865,casual
118298,69DDFA6D0CF78A55,electric_bike,2024-10-18 12:37:21.345,2024-10-18 12:46:29.991,Hoboken Ave at Monmouth St,JC105,River St & 1 St,HB609,40.735208,-74.046964,40.737215,-74.028865,casual
118299,29D3BBE6248000EE,electric_bike,2024-10-19 15:08:56.937,2024-10-19 15:19:36.286,Hoboken Ave at Monmouth St,JC105,River St & 1 St,HB609,40.735208,-74.046964,40.737215,-74.028865,casual
118300,74CBA308A0656463,classic_bike,2024-10-05 16:11:51.606,2024-10-05 16:21:49.360,Hoboken Ave at Monmouth St,JC105,River St & 1 St,HB609,40.735208,-74.046964,40.737215,-74.028865,casual
118301,7D68602BFA844363,classic_bike,2024-10-07 17:45:19.472,2024-10-07 17:53:24.086,Hoboken Ave at Monmouth St,JC105,River St & 1 St,HB609,40.735208,-74.046964,40.737215,-74.028865,casual
118302,4FBD994F1648B362,electric_bike,2024-10-20 09:59:13.554,2024-10-20 10:08:13.706,Hoboken Ave at Monmouth St,JC105,River St & 1 St,HB609,40.735208,-74.046964,40.737215,-74.028865,casual
118303,204041F23ECD28F1,electric_bike,2024-10-02 18:02:16.294,2024-10-02 18:09:23.766,Hoboken Ave at Monmouth St,JC105,River St & 1 St,HB609,40.735208,-74.046964,40.737215,-74.028865,member
118304,49D3FDDA6932091C,classic_bike,2024-10-12 14:37:23.192,2024-10-12 14:45:56.088,Hoboken Ave at Monmouth St,JC105,River St & 1 St,HB609,40.735208,-74.046964,40.737215,-74.028865,member
118305,DABDE1AAF41AFACB,electric_bike,2024-10-20 13:31:30.895,2024-10-20 13:41:10.722,Hoboken Ave at Monmouth St,JC105,River St & 1 St,HB609,40.735208,-74.046964,40.737215,-74.028865,member
118306,144E2FB38A17608C,electric_bike,2024-10-01 10:42:19.480,2024-10-01 10:58:21.839,Hoboken Ave at Monmouth St,JC105,River St & 1 St,HB609,40.735208,-74.046964,40.737215,-74.028865,member


In [55]:
df.describe()  # Caution: describe() only summarizes the numeric columns, and statistics such as the mean of a latitude are of limited use

Unnamed: 0,start_lat,start_lng,end_lat,end_lng
count,118307.0,118307.0,118281.0,118281.0
mean,40.732992,-74.039525,40.732966,-74.03925
std,0.012156,0.011791,0.012236,0.011877
min,40.706575,-74.086701,40.686371,-74.1
25%,40.72163,-74.044247,40.721525,-74.044247
50%,40.735938,-74.036486,40.735938,-74.035865
75%,40.742659,-74.030377,40.742659,-74.030377
max,40.75453,-74.02402,40.85168,-73.903865


In [56]:
df.loc[0]

ride_id                         172DBBFC733F03CE
rideable_type                      electric_bike
started_at               2024-10-10 14:54:24.572
ended_at                 2024-10-10 15:04:07.657
start_station_name                   Oakland Ave
start_station_id                           JC022
end_station_name      Stevens - River Ter & 6 St
end_station_id                             HB602
start_lat                              40.737604
start_lng                             -74.052478
end_lat                                40.743133
end_lng                               -74.026989
member_casual                             member
Name: 0, dtype: object

In [57]:
df.iloc[0]

ride_id                         172DBBFC733F03CE
rideable_type                      electric_bike
started_at               2024-10-10 14:54:24.572
ended_at                 2024-10-10 15:04:07.657
start_station_name                   Oakland Ave
start_station_id                           JC022
end_station_name      Stevens - River Ter & 6 St
end_station_id                             HB602
start_lat                              40.737604
start_lng                             -74.052478
end_lat                                40.743133
end_lng                               -74.026989
member_casual                             member
Name: 0, dtype: object

In [58]:
df.member_casual

0         member
1         casual
2         casual
3         member
4         member
           ...  
118302    casual
118303    member
118304    member
118305    member
118306    member
Name: member_casual, Length: 118307, dtype: object

In [59]:
df.sort_values(by='started_at').head(n=1)

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
33419,0F1ED678EBA70EFD,classic_bike,2024-09-30 20:54:50.083,2024-10-01 18:59:02.214,Mama Johnson Field - 4 St & Jackson St,HB404,Adams St & 12 St,HB610,40.74314,-74.040041,40.751889,-74.033292,casual


## Lab assignments

### Comma Code

**Source:** https://automatetheboringstuff.com/2e/chapter4/

Say you have a list value like this:

```
spam = ['apples', 'bananas', 'tofu', 'cats']
``` 

Write a function that takes a list value as an argument and returns a string with all the items separated by a comma and a space, with "and" inserted before the last item. For example, passing the previous spam list to the function would return 'apples, bananas, tofu, and cats'. But your function should be able to work with any list value passed to it. Be sure to test the case where an empty list [] is passed to your function.

In [67]:
# Function that takes a list value as an argument and returns a string with all the items
def list_to_string(items):
    if not items:  # Check if the list is empty
        return ""
    elif len(items) == 1:  # Only one item in the list
        return str(items[0])
    elif len(items) == 2:  # Two items: no comma needed
        return f"{items[0]} and {items[1]}"
    else:  # Three or more items: comma-separated, with ", and" before the last item
        return ", ".join(str(item) for item in items[:-1]) + ", and " + str(items[-1])

# Test case, more than 2 items
spam = ['apples', 'bananas', 'tofu', 'cats']
print(list_to_string(spam))

# Test case, without item
empty_list = []
print(list_to_string(empty_list))

# Test case, only 1 item
single_item_list = ['apples']
print(list_to_string(single_item_list))

# Test case, exactly 2 items
two_items_list = ['apples', 'bananas']
print(list_to_string(two_items_list))

apples, bananas, tofu, and cats

apples
apples and bananas


### Coin Flip Streaks

**Source:** https://automatetheboringstuff.com/2e/chapter4/

For this exercise, we’ll try doing an experiment. If you flip a coin 100 times and write down an “H” for each heads and “T” for each tails, you’ll create a list that looks like “T T T T H H H H T T.” If you ask a human to make up 100 random coin flips, you’ll probably end up with alternating head-tail results like “H T H T H H T H T T,” which looks random (to humans), but isn’t mathematically random. A human will almost never write down a streak of six heads or six tails in a row, even though it is highly likely to happen in truly random coin flips. Humans are predictably bad at being random.

Write a program to find out how often a streak of six heads or a streak of six tails comes up in a randomly generated list of heads and tails. Your program breaks up the experiment into two parts: the first part generates a list of randomly selected 'heads' and 'tails' values, and the second part checks if there is a streak in it. Put all of this code in a loop that repeats the experiment 10,000 times so we can find out what percentage of the coin flips contains a streak of six heads or tails in a row. As a hint, the function call random.randint(0, 1) will return a 0 value 50% of the time and a 1 value the other 50% of the time.

You can start with the following template:

In [80]:
import random

def coin_flip():
    numberOfStreaks = 0

    for experimentNumber in range(10000):
        # Generate a list of 100 random coin flips ('H' for heads, 'T' for tails)
        coinFlips = [random.choice(['H', 'T']) for _ in range(100)]

        # Check for streaks of 6 in a row
        streak = 1
        for i in range(1, len(coinFlips)):
            if coinFlips[i] == coinFlips[i - 1]:
                streak += 1
                if streak == 6:  # Found a streak of 6
                    numberOfStreaks += 1
                    break
            else:
                streak = 1  # Reset streak counter

    # Calculate the chance of a streak
    chanceOfStreak = (numberOfStreaks / 10000) * 100
    return chanceOfStreak

# Call the function and print the result
print("Probability of a streak of 6 in a row in truly random sequences: %.2f%%" % coin_flip())

Probability of a streak of 6 in a row in truly random sequences: 81.50%


In [81]:
def coin_flip():
    numberOfStreaks = 0

    for experimentNumber in range(10000):
        # Generate a list of 100 random coin flips ('H' for heads, 'T' for tails)
        coinFlips = [random.choice(['H', 'T']) for _ in range(100)]

        # Check for streaks of 10 in a row
        streak = 1
        for i in range(1, len(coinFlips)):
            if coinFlips[i] == coinFlips[i - 1]:
                streak += 1
                if streak == 10:  # Found a streak of 10
                    numberOfStreaks += 1
                    break
            else:
                streak = 1  # Reset streak counter

    # Calculate the chance of a streak
    chanceOfStreak = (numberOfStreaks / 10000) * 100
    return chanceOfStreak

# Call the function and print the result
print("Probability of a streak of 10 in a row in truly random sequences: %.2f%%" % coin_flip())

Probability of a streak of 10 in a row in truly random sequences: 8.62%
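
The two cells above differ only in the streak length. One way to avoid the duplication is to pass the streak length as a parameter, as in the following sketch. The function name `streak_percentage`, its parameters, and the optional `seed` for reproducible runs are assumptions for this example, not part of the original exercise.

```python
import random


def streak_percentage(streak_length, flips=100, experiments=10_000, seed=None):
    """Return the percentage of experiments that contain a streak of the given length."""
    rng = random.Random(seed)  # separate generator so a seed gives repeatable runs
    hits = 0

    for _ in range(experiments):
        coin_flips = [rng.choice("HT") for _ in range(flips)]

        # Compare each flip with its predecessor and count the current run
        streak = 1
        for previous, current in zip(coin_flips, coin_flips[1:]):
            streak = streak + 1 if current == previous else 1
            if streak >= streak_length:
                hits += 1
                break

    return hits / experiments * 100


# Fewer experiments than the default keep this demonstration fast
for length in (6, 10):
    result = streak_percentage(length, experiments=2_000, seed=42)
    print(f"Streak of {length}: {result:.2f}%")
```

Because the results are random, the exact percentages vary between runs unless a seed is set.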


### Back to the dungeon!

Remember [bashcrawl]() from the second lab exercise? Now we are going back to the "dungeon setting", but **this time you will actually implement parts of the game!**

#### Fantasy Game Inventory

**Source:** https://automatetheboringstuff.com/2e/chapter5/

You are creating a fantasy video game. The data structure to model the player’s inventory will be a dictionary where the keys are string values describing the item in the inventory and the value is an integer value detailing how many of that item the player has. For example, the dictionary value {'rope': 1, 'torch': 6, 'gold coin': 42, 'dagger': 1, 'arrow': 12} means the player has 1 rope, 6 torches, 42 gold coins, and so on.

Write a function named displayInventory() that would take any possible “inventory” and display it like the following:

```
Inventory:
12 arrow
42 gold coin
1 rope
6 torch
1 dagger
Total number of items: 62
``` 
**Hint:** You can use a for loop to loop through all the keys in a dictionary.

You can start with the following template:

In [90]:
stuff = {'rope': 1, 'torch': 6, 'gold coin': 42, 'dagger': 1, 'arrow': 12}

def displayInventory(inventory):
    print("Inventory:")
    item_total = 0
    for k, v in inventory.items():
        print(f"{v} {k}")  # Display each count followed by the item name
        item_total += v       # Add the count to the total
    print("Total number of items: " + str(item_total))

displayInventory(stuff)

Inventory:
1 rope
6 torch
42 gold coin
1 dagger
12 arrow
Total number of items: 62


#### List to Dictionary Function for Fantasy Game 

**Source:** https://automatetheboringstuff.com/2e/chapter5/

Imagine that a vanquished dragon’s loot is represented as a list of strings like this:

```
dragonLoot = ['gold coin', 'dagger', 'gold coin', 'gold coin', 'ruby']
```

Write a function named addToInventory(inventory, addedItems), where the inventory parameter is a dictionary representing the player’s inventory (like in the previous project) and the addedItems parameter is a list like dragonLoot. The addToInventory() function should return a dictionary that represents the updated inventory. Note that the addedItems list can contain multiples of the same item. 

Starting from the inventory {'gold coin': 42, 'rope': 1} and adding the dragonLoot list above, the program (with your displayInventory() function from the previous project) would output the following:

```
Inventory:
45 gold coin
1 rope
1 ruby
1 dagger

Total number of items: 48
```

Your code could look something like this:


In [97]:
def addToInventory(inventory, addedItems):
    # Loop through each item in the addedItems list
    for item in addedItems:
        # If the item exists in the inventory, increase its count
        if item in inventory:
            inventory[item] += 1
        else:
            # Otherwise, add the item to the inventory with a count of 1
            inventory[item] = 1
    return inventory

def displayInventory(inventory):
    print("Inventory:")
    item_total = 0
    for item, count in inventory.items():
        print(f"{count} {item}")
        item_total += count
    print("Total number of items: " + str(item_total))

# Initial inventory and loot
inv = {'gold coin': 42, 'rope': 1}
dragonLoot = ['gold coin', 'dagger', 'gold coin', 'gold coin', 'ruby']

# Update inventory with the loot
inv = addToInventory(inv, dragonLoot)

# Display the updated inventory
displayInventory(inv)

Inventory:
45 gold coin
1 rope
1 dagger
1 ruby
Total number of items: 48
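
As an alternative to the explicit loop in addToInventory(), the standard library's `collections.Counter` can do the counting. This is a sketch of a different technique, not the solution the exercise asks for; the names `dragon_loot` and `updated` are chosen for this example.

```python
from collections import Counter

inv = {'gold coin': 42, 'rope': 1}
dragon_loot = ['gold coin', 'dagger', 'gold coin', 'gold coin', 'ruby']

# Counter is a dict subclass; update() with a list adds 1 per occurrence
updated = Counter(inv)
updated.update(dragon_loot)

print(dict(updated))
# {'gold coin': 45, 'rope': 1, 'dagger': 1, 'ruby': 1}
print(sum(updated.values()))  # 48
```

Unlike the loop version, this leaves the original `inv` dictionary unchanged because `Counter(inv)` creates a copy.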
