# Data Manipulation in Python

## Objectives

- Extract data from nested data structures
- Write functions to transform data
- Construct list and dictionary comprehensions

## First - Let's Review

Let's practice different forms of Python data manipulation... with bento boxes!

<img src='https://cdn.shopify.com/s/files/1/1610/3863/articles/What_is_Shokado_Bento_Box_a_Classic-Style_Bento_Box_Originated_from_Japanese_Kaiseki_Cuisine_1_1600x.jpg' alt='bento box image, image source: https://www.globalkitchenjapan.com/blogs/articles/would-you-like-to-have-one-for-special-occasions' width=600>

## Python Lists

### List Methods

Here are a few common list methods:

- `.append()`: adds the input element to the end of a list
- `.pop()`: removes and returns the element at the input index (or the last element if no index is given)
- `.extend()`: adds the elements in the input iterable to the end of a list
- `.index()`: returns the index of the first place in a list where the argument is found
- `.remove()`: removes the first occurrence of the input value
- `.count()`: returns the number of occurrences of the input element in a list
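
Here's a quick sketch of the methods from this list that we won't practice below. The `lunch` list is a made-up example, not part of the exercise:

```python
# A hypothetical lunch list, used only for this illustration
lunch = ["rice", "tofu", "rice"]

lunch.extend(["gyoza", "edamame"])   # add several elements at once
rice_count = lunch.count("rice")     # how many times "rice" appears
first_tofu = lunch.index("tofu")     # position of the first "tofu"
lunch.remove("rice")                 # removes only the first "rice"
last_item = lunch.pop()              # no index given: removes the last element
first_item = lunch.pop(0)            # index given: removes that element

print(lunch)                                          # ['rice', 'gyoza']
print(rice_count, first_tofu, last_item, first_item)  # 2 1 edamame tofu
```

Notice that `.remove()` and `.pop()` both change the list in place, but only `.pop()` hands the removed element back to you.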

Let's practice with a few!

### Create our Bento Box!

Let's make a list, called `bento`, that captures some ingredients we'd like to see in our bento box lunch.

Some common ingredients for bento boxes include: chicken, fish, katsu curry, tofu, gyoza, edamame, salad, pickled cucumbers, boiled egg, broccoli, rice, udon noodles, yakisoba ... and these are just in more traditional bento boxes! The best part about a bento is that you can combine any lunch ingredients you like.

Pick 5 things you'd like in your bento, and put those in your `bento` list.

In [1]:
# Create your bento list
bento = ["rice", "salmon", "edamame", "seaweed salad", "dumplings"]

Lists are ordered, meaning you can access an element by its index number (counting starts at 0):

In [2]:
# Run this cell without changes
bento[4]

'dumplings'

Or you can grab ranges/slices of a list:

In [3]:
# Run this cell without changes
# Play around with these numbers, and start to build some understanding of 
# which elements are where exactly in the list
bento[0:2]

['rice', 'salmon']
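
A few more slicing patterns, sketched on a made-up `example_bento` list so your own `bento` stays untouched:

```python
# A hypothetical bento list, used only for this illustration
example_bento = ["rice", "salmon", "edamame", "seaweed salad", "dumplings"]

first_two = example_bento[:2]      # start defaults to 0; the stop index is excluded
last_two = example_bento[-2:]      # negative indices count from the end
every_other = example_bento[::2]   # a third number sets the step size

print(first_two)     # ['rice', 'salmon']
print(last_two)      # ['seaweed salad', 'dumplings']
print(every_other)   # ['rice', 'edamame', 'dumplings']
```

A slice always returns a new list, so none of these change `example_bento` itself.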

Add items to a list with `.append()` - add something else you like to your bento!

In [4]:
# Code here to add to your list
bento.append("cucumbers")

In [5]:
bento_copy = bento

In [6]:
bento_copy

['rice', 'salmon', 'edamame', 'seaweed salad', 'dumplings', 'cucumbers']
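
One thing to watch out for: assigning a list to a new name with `=` does not make a copy, it just gives the same list a second name. Here's a small sketch with made-up lists (`original`, `alias`, and `shallow` are names chosen only for this illustration):

```python
# Hypothetical lists, used only for this illustration
original = ["rice", "salmon"]
alias = original           # same list object, second name
shallow = original.copy()  # new list with the same elements

original.append("cucumbers")

print(alias)     # ['rice', 'salmon', 'cucumbers'] - the alias sees the change
print(shallow)   # ['rice', 'salmon'] - the copy does not
print(alias is original, shallow is original)   # True False
```

If you want an independent list, `.copy()` (or a full slice like `original[:]`) is one way to get it.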

If you don't want to keep that last item, you can use `.pop()` to remove it.

In [7]:
# Code here to test that out
bento.pop()

'cucumbers'

In [8]:
# Now check what your list looks like - is that last item still there?
bento

['rice', 'salmon', 'edamame', 'seaweed salad', 'dumplings']

Now, let's put our bento box in a readable format using the string method `.join()`:

In [9]:
", ".join(bento[:-1])

'rice, salmon, edamame, seaweed salad'

In [10]:
# Run this cell without changes
print("I'd like my bento box to contain: " +
      ", ".join(bento[:-1]) + ", and " + bento[-1])

I'd like my bento box to contain: rice, salmon, edamame, seaweed salad, and dumplings


**Neat trick!** F-strings allow you to easily format strings to add variables or elements from an iterable (like a list). You can also use `.format()` in a similar way.

In [11]:
# Run this cell without changes
print(f"My bento box will include {bento[0]} and {bento[1]}.")

My bento box will include rice and salmon.


In [12]:
# The above cell is the same as:
print("My bento box will include {} and {}.".format(bento[0], bento[1]))

My bento box will include rice and salmon.


In [13]:
print("My bento box will include " + bento[0] + " and " + bento[1] + ".")

My bento box will include rice and salmon.


In [14]:
print(f"I'd like my bento box to contain: {', '.join(bento[:-1])}, and {bento[-1]}.")

I'd like my bento box to contain: rice, salmon, edamame, seaweed salad, and dumplings.


**Think about it:** How is the f-string/`format` working differently from the `+` we used before?

- 
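
One difference worth noticing shows up as soon as a value isn't a string. The sketch below uses made-up `item` and `quantity` variables:

```python
# Hypothetical values, used only for this illustration
item = "gyoza"
quantity = 3

# f-strings and .format() convert each value to text for us
with_fstring = f"I'd like {quantity} {item}."
with_format = "I'd like {} {}.".format(quantity, item)

# + only joins strings to strings, so the integer must be converted by hand
with_plus = "I'd like " + str(quantity) + " " + item + "."

# Without str(), + raises a TypeError
try:
    "I'd like " + quantity
except TypeError as error:
    print(f"TypeError: {error}")

print(with_fstring)   # I'd like 3 gyoza.
```

All three approaches build the same sentence here, but only the `+` version needs the explicit `str()` call.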


## For Loops

Now let's say we want to capitalize each ingredient in our bento box. How could we do this without editing each one individually?

Well - to go over an iterable, like a list, we can use for loops!

In [15]:
bento[0].title()

'Rice'

In [16]:
bento

['rice', 'salmon', 'edamame', 'seaweed salad', 'dumplings']

In [17]:
# Write a for loop to capitalize each ingredient in our bento list
# capital_bento = []
for bento_index_num, bento_item in enumerate(bento):
    print(bento_index_num)
    print(bento_item.capitalize())
#     bento[bento_index_num] = bento_item.capitalize()

0
Rice
1
Salmon
2
Edamame
3
Seaweed salad
4
Dumplings


In [18]:
bento

['rice', 'salmon', 'edamame', 'seaweed salad', 'dumplings']

We can add conditionals to our loops as well! Let's create a new list, called `s_bento`, that contains only ingredients that have the letter `s` in them.

(Don't have any ingredients with `s`? Feel free to use another letter!)

In [19]:
"s" in bento[-1]

True

In [20]:
# Write your for loop with a conditional

# Need to first define an empty list to become our new list
s_bento = []

# Now our loop
for food in bento: 
    if "s" in food:
        s_bento.append(food)
#     else:
#         continue

In [21]:
# Check your work
s_bento

['salmon', 'seaweed salad', 'dumplings']

### List Comprehension

**Neat trick!** You can write one-line for loops!

List comprehensions are especially useful if you'd like to loop over something and output a new list - just like we did above!

The syntax is: `[f(x) for x in <iterable> if <condition>]`

In [22]:
# Change our loop to a list comprehension
[food for food in bento if "s" in food]

['salmon', 'seaweed salad', 'dumplings']

In [23]:
# We could do the same with our earlier capitalization, too!
[food.capitalize() for food in bento]

['Rice', 'Salmon', 'Edamame', 'Seaweed salad', 'Dumplings']

In [24]:
[food.title() for food in bento if "s" in food]

['Salmon', 'Seaweed Salad', 'Dumplings']

Do you _need_ to use a list comprehension for this? Nope! But list comprehensions are more concise, and they often run a bit faster than the equivalent for loop with `.append()`. Also, you'll see them in other people's code, so you'll have to know how to work with them!
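
If you'd like to check the speed claim yourself, here's a rough sketch using the standard library's `timeit` module. The `foods` list and both function names are made up for this comparison, and the timings will vary by machine and Python version:

```python
import timeit

# A hypothetical, larger list so the timing difference is measurable
foods = ["rice", "salmon", "edamame", "seaweed salad", "dumplings"] * 1000


def with_loop():
    result = []
    for food in foods:
        if "s" in food:
            result.append(food)
    return result


def with_comprehension():
    return [food for food in foods if "s" in food]


# Both approaches build exactly the same list
same_result = with_loop() == with_comprehension()

# Time 200 runs of each approach
loop_seconds = timeit.timeit(with_loop, number=200)
comprehension_seconds = timeit.timeit(with_comprehension, number=200)

print(same_result)
print(f"loop: {loop_seconds:.4f}s, comprehension: {comprehension_seconds:.4f}s")
```

For a five-item bento the difference is far too small to matter, so pick whichever version is easier for you to read.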

## Python Dictionaries

<img src='https://images.pexels.com/photos/270233/pexels-photo-270233.jpeg?auto=compress&cs=tinysrgb&dpr=2&w=500' alt='picture of a dictionary page' width=600>

No, not that kind! 

With your list above, someone would need to tell you that "rice" is the main and "salmon" is the protein. 

Dictionaries let you assign **key** and **value** pairs, which connects a key like "main" to a value like "rice". Rather than using **indexing**, you use **keys** to return values.

### Dictionary Methods

Make sure you're comfortable with the following dictionary methods:

- `.keys()`: returns a view of the dictionary's keys
- `.values()`: returns a view of the dictionary's values
- `.items()`: returns a view of the dictionary's key-value pairs, as tuples
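
In Python 3, these methods return view objects rather than lists. You can loop over a view directly, but you need to wrap it in `list()` before indexing into it. A small sketch with a made-up `order` dictionary:

```python
# A hypothetical order, used only for this illustration
order = {"main": "rice", "protein": "tofu"}

keys_view = order.keys()
print(type(keys_view).__name__)   # dict_keys

# Views stay in sync with the dictionary they came from
order["side1"] = "edamame"
print(list(keys_view))            # ['main', 'protein', 'side1']

# Convert to a list when you need indexing
values_list = list(order.values())
items_list = list(order.items())
print(values_list[0])             # rice
print(items_list[0])              # ('main', 'rice')
```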

Update your bento box to be a dictionary, called `bento_dict`. There are multiple ways to do this, but let's showcase how you can use your list and a new list of keys to zip your bento box together.

In [25]:
# Here's an example of zipping two lists together to form a dictionary
example_bento_keys = ["ingredient1", "ingredient2", "ingredient3"]
example_bento_values = ["rice", "tempura", "miso soup"]

example_bento_dict = dict(zip(example_bento_keys, example_bento_values))

print(example_bento_dict)
print(type(example_bento_dict))

{'ingredient1': 'rice', 'ingredient2': 'tempura', 'ingredient3': 'miso soup'}
<class 'dict'>


In [26]:
# Now let's do that! What does our current list look like?
bento

['rice', 'salmon', 'edamame', 'seaweed salad', 'dumplings']

In [27]:
# Let's define keys for our bento
# Please use the keys 'protein' and 'main', and have at least 2 keys end in numbers!
# This will make later exercises easier!
bento_keys = ["main", "protein", "side1", "side2", "side3"]

In [28]:
dict(zip(bento_keys, bento))

{'main': 'rice',
 'protein': 'salmon',
 'side1': 'edamame',
 'side2': 'seaweed salad',
 'side3': 'dumplings'}

In [29]:
# Now create your bento_dict!
bento_dict = dict(zip(bento_keys, bento))

In [30]:
# Code here to check your work - check type, and print your dictionary

print(bento_dict)
print(type(bento_dict))

{'main': 'rice', 'protein': 'salmon', 'side1': 'edamame', 'side2': 'seaweed salad', 'side3': 'dumplings'}
<class 'dict'>


You use the key of the dictionary to access its value, for example `bento_dict['main']`.

In [31]:
# Practice accessing elements in your bento box
bento_dict['side1']

'edamame'

Let's practice more loops - write a loop that prints the ingredient value, if `side` is in the key.

In [32]:
# Write your loop
for key in bento_dict.keys():
    if 'side' in key:
        print(key)
        print(bento_dict[key])

side1
edamame
side2
seaweed salad
side3
dumplings


In [33]:
key

'side3'

In [34]:
for key, value in bento_dict.items():
    if 'side' in key:
        print(value)

edamame
seaweed salad
dumplings


Now let's make a new dictionary, `side_dict`, that contains the number of the side as the key and the capitalized ingredient as the value.

In [35]:
list(bento_dict.keys())[2]

'side1'

In [36]:
list(bento_dict.keys())[2][-1]

'1'

In [37]:
# Write your loop

# Need to first define an empty dictionary to become our new dict
side_dict = {}

for key, value in bento_dict.items():
    if "side" in key:
        side_num = key[-1]
        side_dict[side_num] = value
        

In [38]:
# Check your work!
side_dict

{'1': 'edamame', '2': 'seaweed salad', '3': 'dumplings'}

### Dictionary Comprehension

Guess what! Just like there's list comprehension to write one-line for loops to output a list, there's the same for dictionaries! This can allow you to take one dictionary and transform it into another.

The syntax is: `{f(key): f(value) for (key, value) in <dictionary>.items() if <condition>}`


In [39]:
# Change our loop to a dictionary comprehension
{key[-1]: value for key, value in bento_dict.items() if "side" in key}

{'1': 'edamame', '2': 'seaweed salad', '3': 'dumplings'}

In [40]:
# You can get creative with it too!
{key[-1]: value.title() for key, value in bento_dict.items() if "side" in key}

{'1': 'Edamame', '2': 'Seaweed Salad', '3': 'Dumplings'}

In [41]:
bento_dict

{'main': 'rice',
 'protein': 'salmon',
 'side1': 'edamame',
 'side2': 'seaweed salad',
 'side3': 'dumplings'}

## Nesting

![Dictionaries inside dictionaries](https://i.imgflip.com/3orgly.jpg)

Let's say we want to combine EVERYONE'S bento dictionaries - we can nest those dictionaries as a dictionary of dictionaries! That way, the key can be the person's name, and the value can be the dictionary that contains their bento box order.

Use Slack to send your dictionaries to each other (you'll want to send everyone the dictionary output, not the code you wrote if you used zip to create your dictionary). 

Grab at least two other orders and create a dictionary of dictionaries, where the key is the name of the person ordering and the value is their bento dictionary:

In [42]:
# Can go ahead and paste at least two other dictionaries
Andy_dict = {'main': 'Katsu Curry',
             'protien1': 'Gyoza',
             'vegetable1': 'Pickled Cucumbers',
             'main2': 'Rice',
             'protien2': 'Chicken'}

Juliet_dict = {'main': 'sushi',
               'side1': 'edamame',
               'side2': 'salad',
               'side3': 'rice',
               'protein': 'gyoza'}

In [43]:
# Code here to create your nested dictionaries
nested_bentos = {
    'Lindsey': bento_dict,
    'Andy': Andy_dict,
    'Juliet': Juliet_dict
}

In [44]:
# Check your work
nested_bentos

{'Lindsey': {'main': 'rice',
  'protein': 'salmon',
  'side1': 'edamame',
  'side2': 'seaweed salad',
  'side3': 'dumplings'},
 'Andy': {'main': 'Katsu Curry',
  'protien1': 'Gyoza',
  'vegetable1': 'Pickled Cucumbers',
  'main2': 'Rice',
  'protien2': 'Chicken'},
 'Juliet': {'main': 'sushi',
  'side1': 'edamame',
  'side2': 'salad',
  'side3': 'rice',
  'protein': 'gyoza'}}

Now, if we wanted a list of people who ordered bento boxes, we could grab a list of those names by using `.keys()`:

In [45]:
# Code here to grab a list of who you have orders for
bento_people = list(nested_bentos.keys())

In [46]:
# Check your work
bento_people

['Lindsey', 'Andy', 'Juliet']

How would we access one of these `main`s?

In [47]:
# Access one dictionary's main
nested_bentos['Andy']['main']

'Katsu Curry'

Now let's write a loop to print the main ingredient in everyone's bento order. 

(This is easier if everyone named an ingredient `main` in their dictionary...)

In [48]:
# Code here to write a for loop that prints each main
for bento_order in nested_bentos.values():
    print(bento_order['main'])

rice
Katsu Curry
sushi
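
If someone did not name an ingredient `main`, `bento_order['main']` raises a `KeyError`. One possible way around that is the dictionary method `.get()`, which returns a fallback value instead. The `orders` dictionary and the fallback text below are made up for this sketch:

```python
# Hypothetical orders; "Sam" did not label anything as 'main'
orders = {
    "Lindsey": {"main": "rice", "protein": "salmon"},
    "Sam": {"entree": "udon noodles", "side1": "salad"},
}

mains = {}
for name, order in orders.items():
    # order['main'] would raise a KeyError for Sam; .get() returns a fallback
    mains[name] = order.get("main", "no main listed")

print(mains)   # {'Lindsey': 'rice', 'Sam': 'no main listed'}
```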


Just as we can put lists and dictionaries inside of other lists and dictionaries, we can also put comprehensions inside of other comprehensions!

In [55]:
# An example of nested comprehensions
{f"{name}'s sides": [
    v for k, v in order.items() if 'side' in k] for name, order in nested_bentos.items()}

{"Lindsey's sides": ['edamame', 'seaweed salad', 'dumplings'],
 "Andy's sides": [],
 "Juliet's sides": ['edamame', 'salad', 'rice']}

In [57]:
# But remember ... it's okay (and often easier) to write this out as a for loop first
# THEN you can condense into a comprehension more easily!
group_side_dict = {}

for name, order in nested_bentos.items():
    side_list = []    
    for key, ingredient in order.items():        
        if 'side' in key:
            side_list.append(ingredient)
    group_side_dict[f"{name}'s sides"] = side_list
    
# Check it
group_side_dict

{"Lindsey's sides": ['edamame', 'seaweed salad', 'dumplings'],
 "Andy's sides": [],
 "Juliet's sides": ['edamame', 'salad', 'rice']}

## Functions

This aspect of Python is _incredibly_ useful! Writing your own functions can save you a TON of work - by _automating_ it.

### Creating Functions

The first line will read:

```python
def function_name():
    ...  # the indented function body goes here
```

Any arguments to the function will go in the parentheses, and you can set default arguments in those parentheses as well.
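
As a small sketch of arguments and default arguments, here's a made-up function (`describe_order` is not part of the exercise) with one required argument and one default:

```python
def describe_order(name, main="rice"):
    """Return a sentence describing a (hypothetical) bento order."""
    return f"{name} ordered {main}."


print(describe_order("Lindsey"))                # Lindsey ordered rice.
print(describe_order("Andy", "katsu curry"))    # Andy ordered katsu curry.
print(describe_order("Juliet", main="sushi"))   # Juliet ordered sushi.
```

The default is only used when the caller leaves that argument out, and arguments with defaults must come after the ones without.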

Let's write a function that will take in both a nested dictionary of bento orders and the name of an ingredient type, and output a list of tuples, each pairing a person's name with the ingredients that match that type!

In [51]:
nested_bentos

{'Lindsey': {'main': 'rice',
  'protein': 'salmon',
  'side1': 'edamame',
  'side2': 'seaweed salad',
  'side3': 'dumplings'},
 'Andy': {'main': 'Katsu Curry',
  'protien1': 'Gyoza',
  'vegetable1': 'Pickled Cucumbers',
  'main2': 'Rice',
  'protien2': 'Chicken'},
 'Juliet': {'main': 'sushi',
  'side1': 'edamame',
  'side2': 'salad',
  'side3': 'rice',
  'protein': 'gyoza'}}

In [52]:
for name, bento_order in nested_bentos.items():
    print(name)
    print(bento_order['main'])

Lindsey
rice
Andy
Katsu Curry
Juliet
sushi


In [53]:
def find_ingredients(nested_dictionary, ingredient_type='main'):
    '''
    Function that takes in a dictionary, where names are keys and values are
    dictionaries of that person's bento order, and then looks up the key in
    each bento order that exactly matches the provided string. The output is
    a list of tuples, with each person's name and their matched ingredient.
    Raises a KeyError if any order does not contain that key.

    Inputs:
        nested_dictionary : dictionary
        ingredient_type : string (default is 'main')

    Outputs:
        output_list : list of tuples
    '''
    output_list = []
    for name, bento_order in nested_dictionary.items():
#         print(name)
#         print(bento_order[ingredient_type])    
        output_list.append((name, bento_order[ingredient_type]))
    
    return output_list

In [54]:
# Try it!
find_ingredients(nested_bentos)

[('Lindsey', 'rice'), ('Andy', 'Katsu Curry'), ('Juliet', 'sushi')]

In [58]:
# More robust version would check if ingredient_type is in the text of the key!
# Example below:
def find_ingredients(nested_dictionary, ingredient_type='main'):
    '''
    Function that takes in a dictionary, where names are keys and values are
    dictionaries of that person's bento order, and then checks which keys in
    the bento order dictionary match the provided string. The output is a list
    of tuples, with each person's name and a list of matched ingredients.
    
    Inputs:
        nested_dictionary : dictionary
        ingredient_type : string (default is 'main')
        
    Outputs:
        output_list : list of tuples
    '''
    
    output_list = []
    
    for name, order in nested_dictionary.items():
        ingredient_list = []    
        for key, ingredient in order.items():        
            if ingredient_type in key:
                ingredient_list.append(ingredient)
        output_list.append((name, ingredient_list))
                
    return output_list

In [59]:
find_ingredients(nested_bentos)

[('Lindsey', ['rice']),
 ('Andy', ['Katsu Curry', 'Rice']),
 ('Juliet', ['sushi'])]
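
Once the loop version works, the same logic can be condensed with nested comprehensions. This is only one possible sketch: `find_ingredients_compact` and `sample_bentos` are made-up names, and the small sample dictionary stands in for `nested_bentos` so the block runs on its own.

```python
def find_ingredients_compact(nested_dictionary, ingredient_type="main"):
    """Comprehension-based variant of the loop version; the name is hypothetical."""
    return [
        (name, [ingredient for key, ingredient in order.items()
                if ingredient_type in key])
        for name, order in nested_dictionary.items()
    ]


# A small stand-in for nested_bentos, used only for this illustration
sample_bentos = {
    "Andy": {"main": "Katsu Curry", "main2": "Rice", "vegetable1": "Pickled Cucumbers"},
    "Juliet": {"main": "sushi", "side1": "edamame"},
}

print(find_ingredients_compact(sample_bentos))
# [('Andy', ['Katsu Curry', 'Rice']), ('Juliet', ['sushi'])]
print(find_ingredients_compact(sample_bentos, "side"))
# [('Andy', []), ('Juliet', ['edamame'])]
```

Whether this reads better than the loop is a judgment call. The nested version is shorter, but the loop makes each step easier to follow and debug.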

---

# Extra Practice Exercises

1) Use a list comprehension to extract the odd numbers from this set:

In [55]:
nums = set(range(1000))

In [56]:
# Your code here

<details>
    <summary>Answer
    </summary>
    <code>[num for num in nums if num % 2 == 1]</code>
    </details>

2) Use a list comprehension to take the first character of each string from the following list of words:

In [57]:
words = ['carbon', 'osmium', 'mercury', 'potassium', 'rhenium', 'einsteinium',
        'hydrogen', 'erbium', 'nitrogen', 'sulfur', 'iodine', 'oxygen', 'niobium']

In [58]:
# Your code here

<details>
    <summary>Answer
    </summary>
    <code>[word[0] for word in words]</code>
    </details>

3) Use a list comprehension to build a list of all the names that start with 'R' from the following list. Add a '?' to the end of each name.

In [59]:
names = ['Randy', 'Robert', 'Alex', 'Ranjit', 'Charlie', 'Richard', 'Ravdeep',
        'Vimal', 'Wu', 'Nelson']

In [60]:
# Your code here

<details>
<summary>Answer
    </summary>
    <code>[name+'?' for name in names if name[0] == 'R']</code>
    </details>

4) From the list below, make a list of dictionaries where the key is the person's name and the value is the person's home phone number.

In [61]:
phone_nos = [{'name': 'greg', 'nums': {'home': 1234567, 'work': 7654321}},
             {'name': 'max', 'nums': {'home': 9876543, 'work': 1010001}},
             {'name': 'erin', 'nums': {'home': 3333333, 'work': 4444444}},
             {'name': 'joél', 'nums': {'home': 2222222, 'work': 5555555}},
             {'name': 'sean', 'nums': {'home': 9999999, 'work': 8888888}}]

In [62]:
# Your code here

<details>
    <summary>Answer</summary>
    <code>[{item['name']: item['nums']['home']} for item in phone_nos]</code>
    </details>

5) Using this `customers` dictionary, build a dictionary where the customers' names are the keys and the movies they've bought are the values.

In [63]:
customers = {
    'bill': {'purchases': {'movies': ['Terminator', 'Elf'],
                           'books': []},
             'id': 1},
    'dolph': {'purchases': {'movies': ['It Happened One Night'],
                            'books': ['The Far Side Gallery']},
              'id': 2},
    'pat': {'purchases': {'movies': [],
                          'books': ['Seinfeld and Philosophy', 'I Am a Bunny']},
            'id': 3}
}

In [64]:
# Your code here

<details>
    <summary>Answer</summary>
    <code>{customer: customers[customer]['purchases']['movies'] for customer in customers.keys()}</code> <br/>
    OR <br/>
    <code>{k: v['purchases']['movies'] for k, v in customers.items()}</code>
    </details>

6) Build a function that will return $2^n$ for an input integer $n$.

In [65]:
# Your code here

<details>
    <summary>Answer</summary>
    <code>
def expo(n):
    return 2**n</code>
    </details>

7) Build a function that will take in a list of phone numbers as strings and return the same as integers, removing any parentheses ('(' and ')'), hyphens ('-'), and spaces.

In [66]:
# Your code here

<details>
    <summary>Answer</summary>
    <code>
def int_phone(string_list):
    return [int(string.replace('(', '').replace(')', '').replace('-', '').replace(' ', ''))\
    for string in string_list]</code>
    </details>

8) Build a function that returns the mode of a list of numbers.

In [67]:
# Your code here

<details>
    <summary>Answer</summary>
        <code>
def mode(lst):
    counts = {num: lst.count(num) for num in lst}
    return [num for num in counts.keys() if counts[num] == max(counts.values())]</code>
    </details>