# Data Manipulation in Python

## Objectives

- Extract data from nested data structures
- Write functions to transform data
- Construct list and dictionary comprehensions

## First - Let's Review

Let's practice different forms of Python data manipulation... with bento boxes!

<img src='https://cdn.shopify.com/s/files/1/1610/3863/articles/What_is_Shokado_Bento_Box_a_Classic-Style_Bento_Box_Originated_from_Japanese_Kaiseki_Cuisine_1_1600x.jpg' alt='bento box image, image source: https://www.globalkitchenjapan.com/blogs/articles/would-you-like-to-have-one-for-special-occasions' width=600>

## Python Lists

### List Methods

Here are a few common list methods:

- `.append()`: adds the input element to the end of a list
- `.pop()`: removes and returns the element at the input index (the last element if no index is given)
- `.extend()`: adds the elements in the input iterable to the end of a list
- `.index()`: returns the index of the first occurrence of the argument in a list
- `.remove()`: removes the first element equal to the input value
- `.count()`: returns the number of occurrences of the input element in a list
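
As a quick illustration of the methods listed above, here is a minimal sketch on a made-up `snacks` list (the list and its contents are assumptions for demonstration, not part of the bento exercise):

```python
# Hypothetical list used only to demonstrate the list methods
snacks = ['mochi', 'pocky']

snacks.append('senbei')              # add one element to the end
snacks.extend(['dango', 'mochi'])    # add every element of an iterable

first_mochi = snacks.index('mochi')  # position of the first 'mochi'
mochi_count = snacks.count('mochi')  # how many times 'mochi' appears

last_item = snacks.pop()             # no index given: removes the last element
second_item = snacks.pop(1)          # removes the element at index 1
snacks.remove('senbei')              # removes the first element equal to 'senbei'

print(snacks)
```

Note that `.append()`, `.extend()`, and `.remove()` change the list in place and return `None`, while `.pop()` both changes the list and hands back the removed element.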

Let's practice with a few!

### Create our Bento Box!

Let's make a list, called `bento`, that captures some ingredients we'd like to see in our bento box lunch.

Some common ingredients for bento boxes include: chicken, fish, katsu curry, tofu, gyoza, edamame, salad, pickled cucumbers, boiled egg, broccoli, rice, udon noodles, yakisoba ... and these are just in more traditional bento boxes! The best part about a bento is that you can combine any lunch ingredients you like.

Pick 5 things you'd like in your bento, and put those in your `bento` list.

In [1]:
# Create your bento list
bento = ['salmon', 'rice', 'edamame', 'seaweed salad', 'dumplings']

Lists are ordered, meaning you can access an element by its index number:

In [2]:
type(bento)

In [3]:
len(bento)

In [4]:
# Run this cell without changes
bento[4]

In [5]:
# Try to get the last entry
bento[-1]

In [6]:
bento[5]

IndexError: list index out of range

Or you can grab ranges/slices of a list:

In [None]:
# Run this cell without changes
# Play around with these numbers, and start to build some understanding of 
# which elements are where exactly in the list
bento[2:]

In [None]:
bento

In [None]:
bento[:3]
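
To make the indexing and slicing rules concrete, here is a small sketch on a hypothetical `letters` list (an assumption for illustration, so the positions are easy to follow):

```python
# Hypothetical list: each element's position is easy to read off
letters = ['a', 'b', 'c', 'd', 'e']

first_three = letters[:3]    # indices 0, 1, 2 (the stop index is excluded)
from_third = letters[2:]     # index 2 through the end
middle = letters[1:4]        # indices 1, 2, 3
last = letters[-1]           # negative indices count from the end
all_but_last = letters[:-1]  # everything except the final element
too_far = letters[3:10]      # a slice past the end is truncated, not an error

print(first_three, from_third, middle, last, all_but_last, too_far)
```

Unlike `letters[10]`, which raises an `IndexError`, a slice that runs past the end of the list simply returns whatever elements exist.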

Add items to a list with `.append()` - add something else you like to your bento!

In [None]:
# Code here to add to your list
bento.append('kimchi')

In [None]:
bento

If you don't want to keep that last item, you can use `.pop()` to remove it.

In [None]:
# Code here to test that out
bento.pop()

In [None]:
# Now check what your list looks like - is that last item still there?
bento

In [None]:
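# .remove() deletes by value and raises a ValueError if the value is missing;
# 'kimchi' was already popped above, so check for it first
if 'kimchi' in bento:
    bento.remove('kimchi')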

In [None]:
bento

Now, let's put our bento box in a readable format using `join`:

In [None]:
bento[:-1]

In [None]:
# Pay attention to what the .join is doing
print("I'd like my bento to contain: " + ", ".join(bento[:-1]) + ", and " + bento[-1])

**Neat trick!** F-strings allow you to easily format strings to add variables or elements from an iterable (like a list). You can also use `.format()` in a similar way.

In [None]:
# F-string formatting is easier!
print(f"My bento box will include: {', '.join(bento[:-1])}, and {bento[-1]}.")

In [None]:
print(f"My bento box will include: {bento[0]} and {bento[1]}.")

In [None]:
# The above cell is the same as:
print("My bento box will include: {} and {}.".format(bento[0], bento[1]))

**Think about it:** How is the f-string/`format` working differently from the `join` we did before?

- If we use string formatting, we don't need to add separate strings together.
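
One way to combine `join` with f-strings is to wrap them in a small helper. The sketch below is only an illustration: the function name `describe_box` and its wording are hypothetical, and it handles short lists separately because the "`, and`" pattern only reads well with three or more items.

```python
def describe_box(items):
    """Return a sentence listing the items (hypothetical helper for illustration)."""
    if not items:
        return "My box is empty."
    if len(items) == 1:
        return f"My box contains {items[0]}."
    if len(items) == 2:
        return f"My box contains {items[0]} and {items[1]}."
    # join handles the separators; the f-string slots the pieces into place
    return f"My box contains {', '.join(items[:-1])}, and {items[-1]}."


print(describe_box(['rice', 'tofu', 'gyoza']))
print(describe_box(['rice', 'tofu']))
```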


## For Loops

Now let's say we want to capitalize each ingredient in our bento box. How could we do this without editing each one individually?

Well - to go over an iterable, like a list, we can use for loops!

In [None]:
text_str = 'some string here'
text_str.title()

In [None]:
# Write a for loop to capitalize each ingredient in our bento list
# for x in iterable:
#     f(x)
for item in bento:
    print(item.title())

In [None]:
# This raises an AttributeError: .title() is a string method, not a list method
bento.title()

In [None]:
item

In [None]:
bento

We can add conditionals to our loops as well! Let's create a new list, called `s_bento`, that contains only ingredients that have the letter `s` in them.

(don't have any ingredients with `s`? feel free to use another letter!)

In [None]:
# Write your for loop with a conditional

# Need to first define an empty list to become our new list
s_bento = [] 
r_bento = []
other_bento = []
# Now our loop
for ingredient in bento:
    if 's' in ingredient:
        s_bento.append(ingredient)
    if 'r' in ingredient:
        r_bento.append(ingredient)
    if 's' not in ingredient and 'r' not in ingredient:
        other_bento.append(ingredient)

In [None]:
# Check your work
s_bento

In [None]:
r_bento

In [None]:
other_bento

### List Comprehension

**Neat trick!** You can write one-line for loops!

List comprehensions are especially useful if you'd like to loop over something and output a new list - just like we did above!

The syntax is: `[f(x) for x in <iterable> if <condition>]`

In [None]:
# Change our loop to a list comprehension
s_bento = [ingredient for ingredient in bento if 's' in ingredient]
s_bento

In [None]:
# We could do the same with our earlier capitalization, too!
[ingredient.title() for ingredient in bento]

Do you _need_ to use list comprehension for this? Nope! But list comprehensions are often the better choice: the syntax is more compact, and they usually run a bit faster than the equivalent `for` loop with `.append()`. Also, you'll see them in other people's code, so you'll have to know how to work with them!
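
To see how the pieces of `[f(x) for x in <iterable> if <condition>]` map onto an ordinary loop, here is a side-by-side sketch. The `foods` list is a made-up example, and `.upper()` stands in for `f(x)`:

```python
# Hypothetical data for the comparison
foods = ['miso soup', 'rice', 'tempura', 'edamame']

# Loop version: create an empty list, filter, transform, append
loop_result = []
for food in foods:
    if 'e' in food:                      # <condition>
        loop_result.append(food.upper())  # f(x)

# Comprehension version: the same three pieces on one line
comp_result = [food.upper() for food in foods if 'e' in food]

print(loop_result == comp_result)
```

Both versions build the same list; the comprehension just moves the transformation to the front and the condition to the end.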

## Python Dictionaries

<img src='https://images.pexels.com/photos/270233/pexels-photo-270233.jpeg?auto=compress&cs=tinysrgb&dpr=2&w=500' alt='picture of a dictionary page' width=600>

No, not that kind! 

With your list above, someone would need to tell you that "rice" is the main and "salmon" is the protein. 

Dictionaries let you assign **key** and **value** pairs, which connect a key like "main" to a value like "rice". Rather than using an **index**, you use a **key** to return a value.

### Dictionary Methods

Make sure you're comfortable with the following dictionary methods:

- `.keys()`: returns a view of the dictionary's keys
- `.values()`: returns a view of the dictionary's values
- `.items()`: returns a view of the dictionary's key-value tuples
- `.get()`: returns the value for a specific key, or `None` (or a default you supply) if the key is missing, whereas `dict['key']` raises a `KeyError`
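
Here is a brief sketch of those four methods on a made-up `order` dictionary (the dictionary and the `'water'` default are assumptions for illustration). The views are wrapped in `list()` only to make them easy to print and compare:

```python
# Hypothetical dictionary used to demonstrate the methods
order = {'main': 'udon', 'side': 'gyoza'}

keys = list(order.keys())      # the keys, in insertion order
values = list(order.values())  # the values, in the same order
pairs = list(order.items())    # (key, value) tuples

missing = order.get('drink')             # key is absent, so this is None
fallback = order.get('drink', 'water')   # second argument is the default

print(keys, values, pairs, missing, fallback)
```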

Update your bento box to be a dictionary, called `bento_dict`. There are multiple ways to do this, but let's showcase how you can use your list and a new list of keys to zip your bento box together.

In [None]:
new_dict = {'key': 'value', 'key2': 'value2'}

In [None]:
# Here's an example of zipping two lists together to form a dictionary
example_bento_keys = ["ingredient1", "ingredient2", "ingredient3"]
example_bento_values = ["rice", "tempura", "miso soup"]

example_bento_dict = dict(zip(example_bento_keys, example_bento_values))

print(example_bento_dict)
print(type(example_bento_dict))

In [None]:
# Now let's do that! What does our current list look like?
bento

In [None]:
# Let's define keys for our bento
bento_keys = ['protein', 'main', 'vegetable1', 'vegetable2', 'side']

In [None]:
# Now create your bento_dict!
bento_dict = dict(zip(bento_keys, bento))

In [None]:
# Code here to check your work - check type, and print your dictionary
print(type(bento_dict))

print(bento_dict)

You use the key of the dictionary to access its value, for example `bento_dict['main']`:

In [None]:
bento_dict['protein']

In [None]:
# This raises a KeyError: dictionaries are accessed by key, not by position
bento_dict[0]

In [None]:
dict1 = {'key1': 20, 'key2': 30}

In [None]:
# This raises a KeyError because 'key3' is not in dict1
bracket_way = dict1['key3']
type(bracket_way)

In [None]:
# Potentially better way because it returns None rather than an error
#bracket_way = dict1['key3']
get_way = dict1.get('key3')
type(get_way)

In [None]:
get_way

In [None]:
# This raises a NameError: the failed lookup above never assigned bracket_way
bracket_way

Let's practice more loops - write a loop that prints the ingredient value, if `vegetable` is in the key.

In [None]:
bento_dict.items()

In [None]:
# Write your loop using .items() to unpack key, value pairs
for key, value in bento_dict.items():
    if 'vegetable' in key:
        print(value)

In [None]:
for x in bento_dict:
    print(x, bento_dict[x])

Now let's make a new dictionary, `veggie_dict`, that contains the number of the vegetable as the key and the capitalized ingredient as the value.

In [None]:
# Need to first define an empty dictionary to become our new dict
veggie_dict = {}

for key, value in bento_dict.items():
    if 'vegetable' in key:
        number = key[-1]
        veggie_dict[number] = value.title()

In [None]:
# Check your work!
veggie_dict

### Dictionary Comprehension

Guess what! Just like there's list comprehension to write one-line for loops to output a list, there's the same for dictionaries! This can allow you to take one dictionary and transform it into another.

The syntax is: `{f(key): f(value) for (key, value) in <dictionary>.items() if <condition>}`


In [None]:
# Change our loop to a dictionary comprehension

{k[-1]: v.title() for k, v in bento_dict.items() if 'vegetable' in k}

In [None]:
bento_dict.values()

In [None]:
# You can get creative with it too!
{f"Ingredient {i}": value for i, value in enumerate(bento_dict.values(), start=1)}
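
As a further sketch of the loop-to-comprehension translation, the example below uses a hypothetical `menu` dictionary. It also strips the `'vegetable'` prefix instead of taking the last character of the key; that is an assumed variation which keeps numbers like `10` intact, whereas `key[-1]` would return only `'0'`.

```python
# Hypothetical dictionary, including a two-digit vegetable number
menu = {'main': 'rice', 'vegetable1': 'broccoli',
        'vegetable2': 'edamame', 'vegetable10': 'daikon'}

# Loop version
veggies = {}
for key, value in menu.items():
    if key.startswith('vegetable'):
        number = key.replace('vegetable', '')  # keep everything after the prefix
        veggies[number] = value.title()

# Equivalent dictionary comprehension
veggies_comp = {key.replace('vegetable', ''): value.title()
                for key, value in menu.items()
                if key.startswith('vegetable')}

print(veggies_comp)
```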

## Nesting

![Dictionaries inside dictionaries](https://i.imgflip.com/3orgly.jpg)

Let's say we want to combine EVERYONE'S bento dictionaries - we can nest those dictionaries as a dictionary of dictionaries! That way, the key can be the person's name, and the value can be the dictionary that contains their bento box order.
 
Grab at least two other orders and create a dictionary of dictionaries, where the key is the name of the person ordering and the value is their bento dictionary:

In [None]:
# Can go ahead and paste at least two other dictionaries
james_bento = {
    'main': 'cheeseburger',
    'cheese': 'pepper jack',
    'side': 'french fries',
    'vegetable1': 'pickles',
    'vegetable2': 'onions',
    'drink': 'milkshake'}

hannah_bento = {
    "main": "salad",
    "protein": "tempura shrimp",
    "vegetable1": "radishes",
    "vegetable2": "cucumbers",
    "side": "tuna roll"}

In [None]:
# Code here to create your nested dictionaries
group_dict = {'Daniel': bento_dict, 'James': james_bento, 'Hannah': hannah_bento}

In [None]:
# Check your work
group_dict

In [None]:
group_dict.values()

Now, if we wanted a list of people who ordered bento boxes, we could grab a list of those names by using `.keys()`

In [None]:
# Code here to grab a list of who you have orders for
group = list(group_dict.keys())
group

In [None]:
# Check your work
type(group)

How would we access one of these `main`s?

In [None]:
# Access one dictionary's main
group_dict['James']['main']

In [None]:
group_dict.get('James').get('main')
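
One caveat with chaining `.get()`: if the outer key is missing, the first `.get()` returns `None`, and calling `.get()` on `None` raises an `AttributeError`. A possible workaround, sketched here on a hypothetical `orders` dictionary, is to supply an empty dictionary as the default for the outer lookup:

```python
# Hypothetical nested dictionary: Hannah has no 'main', and 'Nobody' has no order
orders = {'James': {'main': 'cheeseburger'},
          'Hannah': {'side': 'tuna roll'}}

james_main = orders['James']['main']  # both keys exist, so brackets are fine

# Inner key missing: .get() returns None instead of raising a KeyError
hannah_main = orders.get('Hannah', {}).get('main')

# Outer key missing: the {} default keeps the second .get() from failing
nobody_main = orders.get('Nobody', {}).get('main', 'no order')

print(james_main, hannah_main, nobody_main)
```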

Now let's write a loop to print the main ingredient in everyone's bento order. 

(This is easier if everyone named an ingredient `main` in their dictionary...)

In [None]:
list(group_dict.values())[0]

In [None]:
# Code here to write a for loop that prints each main
# Think about what we are looping through and if you need .items()
for order in group_dict.values():
    print(order['main'])

Just as we can put lists and dictionaries inside of other lists and dictionaries, we can also put comprehensions inside of other comprehensions!

In [None]:
# An example of nested comprehensions
{f"{name}'s vegetables": [v for k, v in order.items() if 'vegetable' in k]
 for name, order in group_dict.items()}

In [None]:
# But remember ... it's okay (and often easier) to write this out as a for loop first
# THEN you can condense into a comprehension more easily!

group_veggie_dict = {}

for name, order in group_dict.items():
    ingredient_list = []    
    for key, ingredient in order.items():        
        if 'vegetable' in key:
            ingredient_list.append(ingredient)
    group_veggie_dict[f"{name}'s vegetables"] = ingredient_list
    
# Check it
group_veggie_dict

## Functions

This aspect of Python is _incredibly_ useful! Writing your own functions can save you a TON of work - by _automating_ it.

### Creating Functions

The first line will read:

```python
def function_name():
```

Any arguments to the function will go in the parentheses, and you can set default arguments in those parentheses as well.
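
For instance, here is a minimal sketch of a function with one required argument and one default argument. The name `describe_order` and the default of `'rice'` are made up for illustration:

```python
def describe_order(name, main='rice'):
    """Return a sentence describing an order (hypothetical example function)."""
    return f"{name} ordered {main}."


default_order = describe_order('Hannah')                    # main falls back to 'rice'
custom_order = describe_order('James', 'udon')              # positional override
keyword_order = describe_order(main='tofu', name='Daniel')  # keywords in any order

print(default_order, custom_order, keyword_order)
```

Arguments with defaults must come after the arguments without them in the `def` line.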

Let's write a function that takes in both a nested dictionary of bento orders and the name of an ingredient type, and outputs a list of tuples pairing each person's name with the ingredients that match that type!

In [None]:
def find_ingredients(nested_dict, ingredient_type='main'):
    '''
    Function that takes in a dictionary, where names are keys and values are
    dictionaries of that person's bento order, and then checks which keys in
    the bento order dictionary match the provided string. The output is a list
    of tuples, with each person's name and a list of matched ingredients.
    
    Inputs:
        nested_dict : dictionary
        ingredient_type : string (default is 'main')
        
    Outputs:
        output_list : list of (name, ingredient_list) tuples
    '''
    output_list = []
    for name, order in nested_dict.items():
        ingredient_list = []
        for key, ingredient in order.items():
            if ingredient_type in key:
                ingredient_list.append(ingredient)
        output_list.append((name, ingredient_list))
    
    
    return output_list

In [None]:
# version that outputs dictionary instead of list
def find_ingredients_dict(nested_dict, ingredient_type='main'):
    '''
    Function that takes in a dictionary, where names are keys and values are
    dictionaries of that person's bento order, and then checks which keys in
    the bento order dictionary match the provided string. The output is a
    dictionary mapping each person's name to a list of matched ingredients.
    
    Inputs:
        nested_dict : dictionary
        ingredient_type : string (default is 'main')
        
    Outputs:
        output_dict : dictionary
    '''
    output_dict = {}
    for name, order in nested_dict.items():
        ingredient_list = []
        for key, ingredient in order.items():
            if ingredient_type in key:
                ingredient_list.append(ingredient)
        output_dict[name] = ingredient_list
    
    
    return output_dict

In [None]:
# Try it!
output = find_ingredients_dict(group_dict, 'side')
output

In [None]:
type(output['James'])

In [None]:
find_ingredients_dict(group_dict)
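
Once the loop version works, it can be condensed with the nested comprehension pattern from earlier. The sketch below is one possible rewrite, not a replacement for the function above: the name `find_ingredients_comp` and the `sample_orders` data are hypothetical.

```python
def find_ingredients_comp(nested_dict, ingredient_type='main'):
    """Map each name to the ingredients whose key contains ingredient_type."""
    return {
        name: [ingredient for key, ingredient in order.items()
               if ingredient_type in key]
        for name, order in nested_dict.items()
    }


# Hypothetical orders for trying the function out
sample_orders = {
    'Ana': {'main': 'rice', 'vegetable1': 'broccoli', 'vegetable2': 'edamame'},
    'Ben': {'main': 'udon', 'side': 'gyoza'},
}

veggies_by_person = find_ingredients_comp(sample_orders, 'vegetable')
mains_by_person = find_ingredients_comp(sample_orders)

print(veggies_by_person)
print(mains_by_person)
```

A person with no matching keys still appears in the result, with an empty list as the value.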

---

# Extra Practice Exercises

1) Use a list comprehension to extract the odd numbers from this set:

In [8]:
nums = set(range(1000))

In [9]:
# Your code here

odds = [x for x in nums if x % 2 != 0]
odds

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, ...]

<details>
    <summary>Answer
    </summary>
    <code>[num for num in nums if num % 2 == 1]</code>
    </details>

2) Use a list comprehension to take the first character of each string from the following list of words:

In [None]:
words = ['carbon', 'osmium', 'mercury', 'potassium', 'rhenium', 'einsteinium',
        'hydrogen', 'erbium', 'nitrogen', 'sulfur', 'iodine', 'oxygen', 'niobium']

In [None]:
# Your code here


<details>
    <summary>Answer
    </summary>
    <code>[word[0] for word in words]</code>
    </details>

3) Use a list comprehension to build a list of all the names that start with 'R' from the following list. Add a '?' to the end of each name.

In [None]:
names = ['Randy', 'Robert', 'Alex', 'Ranjit', 'Charlie', 'Richard', 'Ravdeep',
        'Vimal', 'Wu', 'Nelson']

In [None]:
# Your code here (couple ways to do this)

<details>
<summary>Answer
    </summary>
    <code>[name+'?' for name in names if name[0] == 'R']</code>
    </details>

4) From the list below, make a list of dictionaries where the key is the person's name and the value is the person's home phone number.

In [None]:
phone_nos = [{'name': 'greg', 'nums': {'home': 1234567, 'work': 7654321}},
             {'name': 'max', 'nums': {'home': 9876543, 'work': 1010001}},
             {'name': 'erin', 'nums': {'home': 3333333, 'work': 4444444}},
             {'name': 'joél', 'nums': {'home': 2222222, 'work': 5555555}},
             {'name': 'sean', 'nums': {'home': 9999999, 'work': 8888888}}]

In [None]:
# Your code here

<details>
    <summary>Answer</summary>
    <code>[{item['name']: item['nums']['home']} for item in phone_nos]</code>
    </details>

5) Using this `customers` dictionary, build a dictionary where the customers' names are the keys and the movies they've bought are the values.

In [None]:
customers = {
    'bill': {'purchases': {'movies': ['Terminator', 'Elf'],
                           'books': []},
             'id': 1},
    'dolph': {'purchases': {'movies': ['It Happened One Night'],
                            'books': ['The Far Side Gallery']},
              'id': 2},
    'pat': {'purchases': {'movies': [],
                          'books': ['Seinfeld and Philosophy', 'I Am a Bunny']},
            'id': 3}
}

In [None]:
# Your code here

<details>
    <summary>Answer</summary>
    <code>{customer: customers[customer]['purchases']['movies'] for customer in customers.keys()}</code> <br/>
    OR <br/>
    <code>{k: v['purchases']['movies'] for k, v in customers.items()}</code>
    </details>

6) Build a function that will return $2^n$ for an input integer $n$.

In [None]:
# Your code here

<details>
    <summary>Answer</summary>
    <code>
def expo(n):
    return 2**n</code>
    </details>

7) Build a function that will take in a list of phone numbers as strings and return the same as integers, removing any parentheses ('(' and ')'), hyphens ('-'), and spaces.

In [None]:
# Your code here

<details>
    <summary>Answer</summary>
    <code>
def int_phone(string_list):
    return [int(string.replace('(', '').replace(')', '').replace('-', '').replace(' ', ''))\
    for string in string_list]</code>
    </details>
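
As an alternative to chaining `.replace()` calls in exercise 7, the characters can be removed in a single pass with `str.translate`. This is only a sketch of another approach: the function name `int_phone_translate` and the sample numbers are made up. Keep in mind that converting to `int` drops any leading zeros.

```python
# Translation table that deletes parentheses, hyphens, and spaces
REMOVE_CHARS = str.maketrans('', '', '()- ')


def int_phone_translate(string_list):
    """Strip punctuation from each phone number string and convert it to an int."""
    return [int(number.translate(REMOVE_CHARS)) for number in string_list]


cleaned = int_phone_translate(['(555) 123-4567', '555-987-6543'])
print(cleaned)
```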