### Learning Objectives

By the end of this class, you should be able to:

1. Understand common Python data structures.
2. Apply CRUD operations to data structures.
3. Recognize use cases for data structures in data science.

CRUD stands for the four basic operations on stored data:

- Create
- Read
- Update
- Delete

## 1. Lists <a id="lists"></a>

In Python, lists are versatile and commonly used to store collections of items. You can create a list using square brackets `[]` or the `list()` function.

Lists are mutable, ordered, and typically homogeneous (the items are usually of the same type).
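
As a minimal sketch of what "mutable" and "ordered" mean in practice (the `scores` and `mixed` lists are made-up examples, not part of the lesson data):

```python
# hypothetical example data
scores = [70, 85, 90]

# ordered: items keep the position they were inserted at
first_score = scores[0]

# mutable: an item can be replaced in place
scores[1] = 88

# a list may mix types, although same-type items are the usual practice
mixed = [1, 'two', 3.0]

print(first_score, scores, mixed)
```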

### Creating a Python List <a id="lists-create"></a>

In [34]:
# creating a list using square brackets
shopping_list = ['Groceries', 'Cocacola', 'Milo', 'Bama', 'Floating Berries']

In [35]:
shopping_list

['Groceries', 'Cocacola', 'Milo', 'Bama', 'Floating Berries']

In [36]:
# create a list using the list function
shopping_list2 = list(('Perfume', 'Soap', 'Tissue paper', 'Body cream'))

In [37]:
shopping_list2

['Perfume', 'Soap', 'Tissue paper', 'Body cream']

In [38]:
type(shopping_list2)

list

In [39]:
# the list function also accepts an existing list (this makes a copy)
shopping_list3 = list(['Perfume', 'Soap', 'Tissue paper', 'Body cream'])

In [40]:
shopping_list3

['Perfume', 'Soap', 'Tissue paper', 'Body cream']

In [41]:
shopping_list4 = []

### Reading a List <a id="lists-read"></a>

In [42]:
print(shopping_list3[0])

Perfume


In [43]:
# slicing a list: first two items

print(shopping_list3[0:2])

['Perfume', 'Soap']


In [44]:
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [45]:
numbers[::2]

[0, 2, 4, 6, 8]

In [46]:
numbers[-1]

9

### Updating a List <a id="lists-update"></a>

In [47]:
dir(shopping_list4)

['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [48]:
shopping_list4.append('Soap')

In [49]:
shopping_list4.append('Perfume')

In [50]:
print(shopping_list4)

['Soap', 'Perfume']


In [51]:
shopping_list4.insert(0, 'Tissue Paper')

In [52]:
print(shopping_list4)

['Tissue Paper', 'Soap', 'Perfume']


In [53]:
shopping_list4.extend(['Groceries', 'Cocacola', 'Milo', 'Bama', 'Floating Berries'])

In [54]:
print(shopping_list4)

['Tissue Paper', 'Soap', 'Perfume', 'Groceries', 'Cocacola', 'Milo', 'Bama', 'Floating Berries']


In [55]:
# removing items from a list: remove, pop

In [56]:
shopping_list4.pop()

'Floating Berries'

In [57]:
print(shopping_list4)

['Tissue Paper', 'Soap', 'Perfume', 'Groceries', 'Cocacola', 'Milo', 'Bama']


In [58]:
shopping_list4.remove('Soap')

In [59]:
print(shopping_list4)

['Tissue Paper', 'Perfume', 'Groceries', 'Cocacola', 'Milo', 'Bama']


In [60]:
shopping_list.update

AttributeError: 'list' object has no attribute 'update'
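
Lists have no `update` method, as the error shows. A common way to change existing items, sketched here with a hypothetical `basket` list, is to assign to an index or a slice:

```python
# hypothetical list for illustration
basket = ['Soap', 'Perfume', 'Milo']

# replace a single item by index
basket[0] = 'Body cream'

# replace several items at once with slice assignment
basket[1:3] = ['Tissue paper', 'Cocacola']

print(basket)
```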

### Delete

In [None]:
shopping_list4.clear()

In [None]:
print(shopping_list4)

[]


### List: Use Cases in Data Science <a id="lists-use-cases"></a>

### Use Case: Storing and Manipulating Datasets

#### Background

In data science, we often work with datasets that contain various types of information, such as numerical values, text data, or categorical variables. Python lists provide a convenient way to store and manipulate these datasets.

#### Scenario

Suppose we have collected data on the monthly sales revenue (in thousands of dollars) for a small retail store over the past year. The dataset includes sales data for each month. We want to use a Python list to store and analyze this dataset.

#### Implementation

Let's create a Python list called `monthly_sales` to store our dataset. Each element in the list represents the monthly sales revenue for one month, starting from January (index 0) to December (index 11).

```python
# Creating a list to store monthly sales data
monthly_sales = [120, 135, 150, 145, 155, 160, 170, 180, 190, 200, 210, 220]
```


In [None]:
monthly_sales = [120, 135, 150, 145, 155, 160, 170, 180, 190, 200, 210, 220]

In [None]:
total_yearly_revenue = sum(monthly_sales)

In [None]:
print(total_yearly_revenue)

2035


In [None]:
# avg = total/no. of total

average_revenue = total_yearly_revenue / len(monthly_sales)

In [None]:
print(round(average_revenue, 1))

169.6
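
The same list can also answer simple questions such as which month had the highest or lowest revenue. This is a small sketch using only built-in functions; the variable names are illustrative:

```python
monthly_sales = [120, 135, 150, 145, 155, 160, 170, 180, 190, 200, 210, 220]

highest = max(monthly_sales)
lowest = min(monthly_sales)

# index() returns the position of the first match; add 1 for a 1-based month number
best_month = monthly_sales.index(highest) + 1
worst_month = monthly_sales.index(lowest) + 1

print(best_month, highest, worst_month, lowest)
```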


### Assignment

1. From the list (monthly_sales) above,
- create four separate lists, one for each quarter, named q1, q2, q3, and q4,
- report which quarter had the highest total revenue,
- report which quarter had the lowest.

In [None]:
# Your code here

2. Store the names of a few of your friends in a list called names.
- Print each person’s name by accessing each element in the list, one at a time.
- Start with the list you created, but instead of just printing each person’s name, print a message to them. The text of each message should be the same, but each message should be personalized with the person’s name. Hint: you may use f-strings.

In [None]:
# Your code here

3. You are managing a list of students' names in your class.
- Create a list called student_names containing the following student names as strings: 'Alice', 'Bob', 'Charlie', 'David', 'Eve'.
- Use the student_names list for the following operations:
  - a. Print the number of students in the class (the length of the list).
  - b. Print the first and last student names in the list.
  - c. Charlie has left the class. Remove his name from the list.
  - d. A new student, "Frank," has joined the class. Add his name to the list.
  - e. The class has been dissolved, and you need to remove all student names from the list.

In [None]:
# Your code here

## 2. Tuples <a id="tuples"></a>

### Create <a id="sets-create"></a>

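A tuple is created by listing elements separated by commas, usually enclosed in parentheses. The parentheses are optional, and the `tuple()` function converts another sequence into a tuple.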

In [None]:
person = ('Sunday', 'Data Science', 'Fall Session')

In [None]:
person2 = 'Victoria', 'Data Science', 'Fall Session'

In [None]:
print(type(person))

<class 'tuple'>


In [None]:
type(person2)

tuple

In [None]:
# create a tuple using the tuple function

person_tuple = tuple(['Victoria', 'Data Science', 'Fall Session'])

In [None]:
person_tuple

('Victoria', 'Data Science', 'Fall Session')

### Read <a id="sets-read"></a>

Reading from a tuple is straightforward by accessing its elements using indexing.

In [None]:
person2[0]

'Victoria'

In [None]:
person2[:2]

('Victoria', 'Data Science')

### Update <a id="sets-update"></a>

Tuples are immutable, so you cannot directly update them. To "update," you need to create a new tuple.

In [None]:
person3 = ('Chibuike', person[1], person[2])

In [None]:
person3

('Chibuike', 'Data Science', 'Fall Session')
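
To make the immutability concrete, here is a short sketch of what happens when an item is assigned directly, followed by one way to build the replacement tuple (the `updated` name is illustrative):

```python
person = ('Sunday', 'Data Science', 'Fall Session')

try:
    person[0] = 'Chibuike'   # item assignment is not supported on tuples
except TypeError as error:
    message = str(error)
    print(message)

# build a new tuple by concatenating a one-item tuple with a slice
updated = ('Chibuike',) + person[1:]
print(updated)
```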

### Delete <a id="sets-delete"></a>

As with any variable, you can delete a tuple using the `del` statement. This removes the name itself, so referring to it afterwards raises a `NameError`.

In [None]:
del person

In [None]:
person

NameError: name 'person' is not defined

### Tuples: Use Cases in Data Science <a id="tuples-use-cases"></a>

- Immutability: Tuples are immutable, which means their values cannot be changed after creation. They are suitable for representing data that should not be modified, like record data.
- Data Integrity: Tuples can be used to ensure data integrity, as their immutability prevents accidental changes.
- Multiple Values: Tuples are often used to store multiple related values together, such as (latitude, longitude) pairs for geographical data.
- Function Return Values: Tuples are frequently used to return multiple values from functions, making it easy to process and unpack results.
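
The last two points can be sketched as follows. The `min_max` helper and the coordinate values are hypothetical and only serve as an illustration:

```python
def min_max(values):
    """Return the smallest and largest value as a (min, max) tuple."""
    return min(values), max(values)

# illustrative (latitude, longitude) pair
location = (6.5244, 3.3792)
latitude, longitude = location  # tuple unpacking

# the returned tuple is unpacked into two names
lowest, highest = min_max([120, 135, 150, 145])
print(latitude, longitude, lowest, highest)
```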

## 3. Sets <a id="sets"></a>

Sets are a useful data type for a few specific situations. You will find them most useful for
de-duplicating lists or tuples and for finding the differences between multiple collections.

### Create <a id="sets-create"></a>

In [None]:
# using curly braces
my_set = {'a', 'b', 'b', 'c', 'd', 'd'}

In [None]:
print(my_set)

{'b', 'a', 'd', 'c'}


In [None]:
# using the set function
person_list = ['Chibuike', 'Data Science', 'Fall Session']
person_set = set(person_list)

print(person_set)

{'Chibuike', 'Data Science', 'Fall Session'}


### Read <a id="sets-read"></a>

Membership tests on a set are much faster than on a list. To find an item, a list is scanned
element by element until a match is found, so the cost grows with the length of the list. A set,
like a dictionary, uses hashing to go directly to where the item would be stored, so the lookup
takes roughly the same time regardless of size.

In [None]:
# using the in operator
'Fall Session' in person_set

True

In [None]:
# can we use indexing to get an item from the set?

person_set[0] # sets are unordered.

TypeError: 'set' object is not subscriptable

### Update <a id="sets-update"></a>

In [None]:
# using the add method to update a set
person_set.add('First Semester')

In [None]:
person_set

{'Chibuike', 'Data Science', 'Fall Session', 'First Semester'}

In [None]:
# using the update method to add multiple items
person_set.update(['Circle 113', 'Python'])

In [None]:
print(person_set)

{'First Semester', 'Chibuike', 'Circle 113', 'Data Science', 'Python', 'Fall Session'}


In [None]:
# using the remove method
person_set.remove('Python')

In [None]:
print(person_set)

{'First Semester', 'Chibuike', 'Circle 113', 'Data Science', 'Fall Session'}


In [None]:
# using pop and discard method
person_set.discard('Circle 113')

In [None]:
print(person_set)

{'First Semester', 'Chibuike', 'Data Science', 'Fall Session'}


In [None]:
person_set.pop()

'First Semester'
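
The three removal methods behave differently when things go wrong: `remove` raises a `KeyError` for a missing item, `discard` silently does nothing, and `pop` removes and returns an arbitrary item. A minimal sketch with a hypothetical `tools` set:

```python
tools = {'pandas', 'numpy'}

tools.discard('keras')      # missing item: nothing happens

try:
    tools.remove('keras')   # missing item: raises KeyError
    removed = True
except KeyError:
    removed = False

print(removed, tools)
```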

### Delete <a id="sets-delete"></a>

In [None]:
# using the del keyword deletes the set
del person_set

In [None]:
person_set

NameError: name 'person_set' is not defined

In [None]:
# using the clear method empties out the set
my_set.clear()

In [None]:
my_set

set()

In [None]:
my_set.add('4')

In [None]:
my_set = {'Toyin', 'Data Science'}

In [None]:
my_set

{'Data Science', 'Toyin'}

### Set operations <a id="sets-operate"></a>

- union() - Combines two sets and returns a new set
- intersection() - Returns a new set with the elements that are common between the two sets
- difference() - Returns a new set with elements that are not in the other set

In [None]:
first_set = {'python', 'ruby', 'javascript', 'pandas', 'numpy'}
second_set = {'pandas', 'numpy', 'seaborn'}

In [None]:
first_set.union(second_set)

{'javascript', 'numpy', 'pandas', 'python', 'ruby', 'seaborn'}

In [None]:
first_list = list(first_set)

In [None]:
first_list[0] = 'keras'

In [None]:
first_list

['keras', 'javascript', 'pandas', 'numpy', 'ruby']

In [None]:
first_tuple = tuple(first_list)

In [None]:
first_set = set(first_tuple)

In [None]:
first_set

{'javascript', 'keras', 'numpy', 'pandas', 'ruby'}

In [None]:
second_set

{'numpy', 'pandas', 'seaborn'}

In [None]:
first_set.intersection(second_set)

{'numpy', 'pandas'}

In [None]:
first_set.difference(second_set)

{'javascript', 'keras', 'ruby'}
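
The same operations are also available as operators, which some readers find easier to scan. This sketch repeats the sets used above and adds the symmetric difference (`^`), which is not covered in the list of methods:

```python
first_set = {'javascript', 'keras', 'numpy', 'pandas', 'ruby'}
second_set = {'numpy', 'pandas', 'seaborn'}

union = first_set | second_set            # same as first_set.union(second_set)
common = first_set & second_set           # same as first_set.intersection(second_set)
only_first = first_set - second_set       # same as first_set.difference(second_set)
either_not_both = first_set ^ second_set  # symmetric difference

print(union, common, only_first, either_not_both)
```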

### Sets: Use Cases in Data Science <a id="sets-use-cases"></a>

- Removing Duplicates: Sets are unordered collections of unique elements. They are often used to remove duplicates from data, ensuring that only unique values are retained.

In [None]:
data = [1, 2, 2, 3, 4, 4, 5]
unique_data = set(data)

- Membership Testing: Sets are efficient for checking whether an element is present in a dataset or not. This can be useful for filtering and data validation.

- Set Operations: Sets support set operations like union, intersection, and difference, which are valuable for data manipulation and analysis.
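
A short sketch of the first two use cases. Because a set does not keep order, `dict.fromkeys` is shown as one alternative when the first-seen order matters; the `allowed` values are hypothetical:

```python
data = [3, 1, 3, 2, 1]

unique_unordered = set(data)                 # order is not guaranteed
unique_in_order = list(dict.fromkeys(data))  # keeps first-seen order

# membership testing used as a validation filter
allowed = {1, 2}
valid = [x for x in data if x in allowed]

print(unique_in_order, valid)
```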

## 4. Dictionaries <a id="dictionaries"></a>

Dictionaries are another fundamental data type in Python. A dictionary is a collection of
key-value pairs. Some programming languages refer to them as hash tables or associative arrays.
They are described as a mapping from keys (hashable objects) to values (any object). Immutable
built-in objects such as strings, numbers, and tuples of immutable items are hashable (immutable
means unable to change).
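
As a minimal sketch of the hashability rule, a tuple works as a key while a list does not. The `locations` dictionary and its coordinates are illustrative only:

```python
# a tuple of numbers is immutable and hashable, so it can be a key
locations = {(6.5244, 3.3792): 'Lagos'}

try:
    locations[[9.0765, 7.3986]] = 'Abuja'   # a list is mutable, so it is not hashable
except TypeError as error:
    message = str(error)
    print(message)

print(locations[(6.5244, 3.3792)])
```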

### Creating a dictionary <a id="dictionaries-create"></a>

In [None]:
# using curly braces and inserting key, value pairs
student = {'firstname': 'Joshua',
           'lastname': 'Ohayi',
           'cirle': 113}


In [None]:
type(student)

dict

In [None]:
# using the dict function and passing a list of tuples
student2 = dict([('firstname', 'Akin-Johnson'), ('Cirle', 117), ('lastname', 'Oluwamayowa')])

In [None]:
student2

{'firstname': 'Akin-Johnson', 'Cirle': 117, 'lastname': 'Oluwamayowa'}

### Reading through a Dictionary <a id="dictionaries-read"></a>

In [None]:
# using the key
student['firstname']

'Joshua'

In [None]:
# using the in keyword
'firstname' in student

True

In [None]:
'fav_food' in student

False

In [None]:
# checking all items :dict.items()
student.items()

dict_items([('firstname', 'Joshua'), ('lastname', 'Ohayi'), ('cirle', 113)])

In [None]:
# checking all values :dict.values()
student.values()

dict_values(['Joshua', 'Ohayi', 113])

In [None]:
# using the get method to retrieve the value of a key in a dictionary
# returns the value if the key exists, otherwise the default 'Not Available'
student.get('fav_drink', 'Not Available')

'Not Available'

### Updating a Dictionary <a id="dictionaries-update"></a>

In [None]:
# assign a value through its key; if the key does not exist yet, a new key-value pair is created
student['fav_food'] = 'Groceries'

In [None]:
student

{'firstname': 'Joshua', 'lastname': 'Ohayi', 'cirle': 113, 'fav_food': 'Groceries'}

In [None]:
# using the update method.
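
The cell above has no code. A minimal sketch of `dict.update`, using a hypothetical `profile` dictionary so that the `student` example is left untouched:

```python
# hypothetical record, kept separate from the student dictionary
profile = {'firstname': 'Ada', 'track': 'Data Science'}

# update() overwrites existing keys and adds new ones in a single call
profile.update({'track': 'Data Engineering', 'session': 'Fall Session'})

print(profile)
```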

In [None]:
# using the pop method to remove a k:v pair
student.pop('cirle')

113

In [None]:
student

{'firstname': 'Joshua', 'lastname': 'Ohayi', 'fav_food': 'Groceries'}

In [None]:
# using the popitem method
student.popitem()

('fav_food', 'Groceries')

In [None]:
old_key = 'firstname'
new_key = 'alias'

# rename a key: pop the old key and store its value under the new key
student[new_key] = student.pop(old_key)

In [None]:
student

{'lastname': 'Ohayi', 'alias': 'Joshua'}

### Deleting a Dictionary <a id="dictionaries-delete"></a>

In [None]:
del student['circle']

KeyError: 'circle'

In [None]:
student

{'lastname': 'Ohayi', 'alias': 'Joshua'}

In [None]:
revenue = {'months':['jan', 'feb', 'mar',
                    'apr', 'may','jun', 'jul',
                    'aug', 'sept', 'oct',
                    'nov', 'dec'],
          'monthly_sales': [120, 135, 150, 145, 155, 160, 170, 180, 190, 200, 210, 220] }

In [None]:
revenue

{'months': ['jan',
  'feb',
  'mar',
  'apr',
  'may',
  'jun',
  'jul',
  'aug',
  'sept',
  'oct',
  'nov',
  'dec'],
 'monthly_sales': [120, 135, 150, 145, 155, 160, 170, 180, 190, 200, 210, 220]}

### Dictionaries: Use Cases in Data Science <a id="dictionaries-use-cases"></a>
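
One common pattern, sketched here with the same month and sales values as the `revenue` dictionary, is turning two parallel lists into a lookup table. The `sales_by_month` name is an assumption made for this example:

```python
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
          'jul', 'aug', 'sept', 'oct', 'nov', 'dec']
monthly_sales = [120, 135, 150, 145, 155, 160, 170, 180, 190, 200, 210, 220]

# pair each month with its sales figure
sales_by_month = dict(zip(months, monthly_sales))

# look a value up by key instead of remembering its position
march_sales = sales_by_month['mar']

# find the key with the largest value
best_month = max(sales_by_month, key=sales_by_month.get)

print(march_sales, best_month)
```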

## 5. DataFrames

pandas DataFrames are two-dimensional, labeled structures for handling tabular data.

In [None]:
import pandas as pd

In [None]:
revenue_df = pd.DataFrame(revenue)

In [None]:
revenue_df

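|    | months | monthly_sales |
|----|--------|---------------|
| 0  | jan    | 120           |
| 1  | feb    | 135           |
| 2  | mar    | 150           |
| 3  | apr    | 145           |
| 4  | may    | 155           |
| 5  | jun    | 160           |
| 6  | jul    | 170           |
| 7  | aug    | 180           |
| 8  | sept   | 190           |
| 9  | oct    | 200           |
| 10 | nov    | 210           |
| 11 | dec    | 220           |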


## Exercises

1. a) Create a dictionary representing a library catalog. The keys should be book titles, and the values should be the corresponding authors. Allow the user to input a book title and display the author's name.

   b) Extend the library catalog program to allow users to add new books to the catalog by providing the title and author.

In [None]:
libraRy = {"Jane Eyre": "Charlotte Brontë", "Wuthering Heights": "Emily Brontë", "Moby-Dick": "Herman Melville", "Beloved": "Toni Morrison"}

libraRy["The Color Purple"] = "Alice Walker"

libraRy

{'Jane Eyre': 'Charlotte Brontë',
 'Wuthering Heights': 'Emily Brontë',
 'Moby-Dick': 'Herman Melville',
 'Beloved': 'Toni Morrison',
 'The Color Purple': 'Alice Walker'}

In [None]:
#Giving the users a chance to add four books to the library
for i in range(4):
    book = input("Enter a book name")
    author = input("Enter the author's name")
    libraRy[book] = author
    

In [None]:
libraRy

{'Jane Eyre': 'Charlotte Brontë',
 'Wuthering Heights': 'Emily Brontë',
 'Moby-Dick': 'Herman Melville',
 'Beloved': 'Toni Morrison',
 'The Color Purple': 'Alice Walker'}
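
Part (a) also asks for a lookup by title. One possible sketch, written as functions so it can run without `input()`; the names `find_author` and `add_book` and the fallback message are assumptions, not part of the exercise:

```python
catalog = {"Jane Eyre": "Charlotte Brontë", "Moby-Dick": "Herman Melville"}

def find_author(books, title):
    """Return the author for title, or a fallback message if it is missing."""
    return books.get(title, 'Title not found in catalog')

def add_book(books, title, author):
    """Add a book, or overwrite the author if the title already exists."""
    books[title] = author

add_book(catalog, "Beloved", "Toni Morrison")
print(find_author(catalog, "Beloved"))
print(find_author(catalog, "Emma"))
```

In an interactive session the title would come from `input()`, for example `find_author(catalog, input("Enter a book title: "))`.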

2. Create a dictionary that represents a dataset of 10 students and their test scores (4 courses each). The keys should be student names and the values should be nested dictionaries, in which each key represents a course and each value the score for that course. Allow the user to input a student's name and display their average test score.

   Create a DataFrame from the student test scores dictionary using pandas. Display the DataFrame to the user.

In [None]:
studentTestScores = {'Joshua Ohayi': {'Mathematics': 25, 'English': 18, 'Chemistry': 22, 'Physics': 25}, 'Fidelia Achi': {'Mathematics': 13, 
                        'English': 26, 'Literature': 25, 'Government': 27}, 'Caleb Nelson': {'Mathematics': 22, 'English': 23, 'Literature': 23, 'Government': 25}, 
                        'Precious Achi': {'Mathematics': 10, 'English': 18, 'Commerce': 13, 'Economics': 20}, 'Oreoluwa Kosi': {'Mathematics': 28, 'English': 27, 'Chemistry': 29, 'Physics': 30},
                        'Abiola Rhema': {'Mathematics': 25, 'English': 18, 'Literature': 22, 'Physics': 25}, 'Chinonso Kinsley': {'Mathematics': 28, 'English': 29, 'Commerce': 25, 'Economics': 30},
                        'Wilson Ayeni': {'Mathematics': 24, 'English': 18, 'Literature': 22, 'Government': 25}, 'Chinanza Gideon': {'Mathematics': 21, 'English': 18, 'Chemistry': 12, 'Physics': 25},
                        'Tolu Dada': {'Mathematics': 25, 'English': 28, 'Chemistry': 22, 'Physics': 25}}

In [None]:
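userInput = input('Enter a student name').lower()
for name, scores in studentTestScores.items():
    # average the course scores only, skipping an 'Average' stored by an earlier run
    course_scores = [score for course, score in scores.items() if course != 'Average']
    average = sum(course_scores) / len(course_scores)
    if name.lower() == userInput:
        print(f'Average score for {name}: {average}')
    # store the average next to the course scores
    scores['Average'] = average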

Average score for Abiola Rhema: 22.5


In [None]:
studentTestScores

{'Joshua Ohayi': {'Mathematics': 25,
  'English': 18,
  'Chemistry': 22,
  'Physics': 25,
  'Average': 22.5},
 'Fidelia Achi': {'Mathematics': 13,
  'English': 26,
  'Literature': 25,
  'Government': 27,
  'Average': 22.75},
 'Caleb Nelson': {'Mathematics': 22,
  'English': 23,
  'Literature': 23,
  'Government': 25,
  'Average': 23.25},
 'Precious Achi': {'Mathematics': 10,
  'English': 18,
  'Commerce': 13,
  'Economics': 20,
  'Average': 15.25},
 'Oreoluwa Kosi': {'Mathematics': 28,
  'English': 27,
  'Chemistry': 29,
  'Physics': 30,
  'Average': 28.5},
 'Abiola Rhema': {'Mathematics': 25,
  'English': 18,
  'Literature': 22,
  'Physics': 25,
  'Average': 22.5},
 'Chinonso Kinsley': {'Mathematics': 28,
  'English': 29,
  'Commerce': 25,
  'Economics': 30,
  'Average': 28.0},
 'Wilson Ayeni': {'Mathematics': 24,
  'English': 18,
  'Literature': 22,
  'Government': 25,
  'Average': 22.25},
 'Chinanza Gideon': {'Mathematics': 21,
  'English': 18,
  'Chemistry': 12,
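  'Physics': 25,
  'Average': 19.0},
 'Tolu Dada': {'Mathematics': 25,
  'English': 28,
  'Chemistry': 22,
  'Physics': 25,
  'Average': 25.0}}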

In [None]:
import pandas as pd

studentTestScores_df = pd.DataFrame(studentTestScores)
studentTestScores_df

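|             | Joshua Ohayi | Fidelia Achi | Caleb Nelson | Precious Achi | Oreoluwa Kosi | Abiola Rhema | Chinonso Kinsley | Wilson Ayeni | Chinanza Gideon | Tolu Dada |
|-------------|------|-------|-------|-------|------|------|------|-------|------|------|
| Mathematics | 25.0 | 13.0  | 22.0  | 10.0  | 28.0 | 25.0 | 28.0 | 24.0  | 21.0 | 25.0 |
| English     | 18.0 | 26.0  | 23.0  | 18.0  | 27.0 | 18.0 | 29.0 | 18.0  | 18.0 | 28.0 |
| Chemistry   | 22.0 | NaN   | NaN   | NaN   | 29.0 | NaN  | NaN  | NaN   | 12.0 | 22.0 |
| Physics     | 25.0 | NaN   | NaN   | NaN   | 30.0 | 25.0 | NaN  | NaN   | 25.0 | 25.0 |
| Average     | 22.5 | 22.75 | 23.25 | 15.25 | 28.5 | 22.5 | 28.0 | 22.25 | 19.0 | 25.0 |
| Literature  | NaN  | 25.0  | 23.0  | NaN   | NaN  | 22.0 | NaN  | 22.0  | NaN  | NaN  |
| Government  | NaN  | 27.0  | 25.0  | NaN   | NaN  | NaN  | NaN  | 25.0  | NaN  | NaN  |
| Commerce    | NaN  | NaN   | NaN   | 13.0  | NaN  | NaN  | 25.0 | NaN   | NaN  | NaN  |
| Economics   | NaN  | NaN   | NaN   | 20.0  | NaN  | NaN  | 30.0 | NaN   | NaN  | NaN  |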


In [None]:
studentTestScores_df.isnull().sum()

Joshua Ohayi        4
Fidelia Achi        4
Caleb Nelson        4
Precious Achi       4
Oreoluwa Kosi       4
Abiola Rhema        4
Chinonso Kinsley    4
Wilson Ayeni        4
Chinanza Gideon     4
Tolu Dada           4
dtype: int64

3. Create a dictionary that represents an online store's inventory. The keys should be product names, and the values should be tuples containing the price and available quantity. Allow the user to input a product name, and display its price and availability.

    Extend the online store program to allow users to "purchase" products by specifying the product name and the quantity they want to buy. Update the inventory accordingly and display the total cost of the purchase.

In [68]:
invenTory = {"lip balm": (800, 10), 'biscuits': (200, 20), 'Food stuffs': (10000, 100), 'shoes': (20000, 100)}


In [69]:
# unit names (singular, plural) used when printing each product
units = {'lip balm': ('piece', 'pieces'), 'biscuits': ('piece', 'pieces'),
         'Food stuffs': ('pack', 'packs'), 'shoes': ('pair', 'pairs')}

def printInvenTory():
    # each value is a (price, quantity) tuple, unpacked directly in the loop
    for product, (price, quantity) in invenTory.items():
        unit, plural = units.get(product, ('piece', 'pieces'))
        print(f"product: {product}\nprice: ${price} per {unit}, {quantity} {plural} available\n")

printInvenTory()


product: lip balm
price: $800 per piece, 10 pieces available

product: biscuits
price: $200 per piece, 20 pieces available

product: Food stuffs
price: $10000 per pack, 100 packs available

product: shoes
price: $20000 per pair, 100 pairs available



In [71]:
purchase = input("Enter the product you want to buy").strip().lower()
quantity = int(input("Enter quantity you want to buy"))

# look the product up case-insensitively
matches = [name for name in invenTory if name.lower() == purchase]
if not matches:
    print('Product not found in inventory.')
else:
    product = matches[0]
    price, available = invenTory[product]
    if quantity > available:
        print(f'Only {available} of {product} available, come back another time.')
    else:
        # tuples are immutable, so store a new (price, quantity) tuple
        invenTory[product] = (price, available - quantity)
        print(f'You purchased {product}, quantity: {quantity}, total cost: ${price * quantity}\n')
        print('Available products in inventory\n')
        printInvenTory()

Product not found in inventory.


4. Create an empty dictionary to represent a glossary. Allow users to add new words and their definitions to the glossary.

    Implement a feature that lets users search for the definition of a word. If the word is not in the glossary, inform the user that the word is not found.

In [24]:
glossary = {}

In [None]:
dir(glossary)

['__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__ior__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__ror__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']
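
One possible sketch for this exercise, using `get` from the method list above. The function names, the lowercase-key convention, and the sample entry are assumptions made for illustration:

```python
def add_word(glossary, word, definition):
    """Store the definition under a lowercased key so lookups ignore case."""
    glossary[word.lower()] = definition

def look_up(glossary, word):
    """Return the definition, or a message if the word is not in the glossary."""
    return glossary.get(word.lower(), f"'{word}' not found in glossary")

terms = {}
add_word(terms, 'Tuple', 'An immutable, ordered sequence.')
print(look_up(terms, 'tuple'))
print(look_up(terms, 'set'))
```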