# Data Manipulation in Python

## Objectives

- Construct list and dictionary comprehensions
- Extract data from nested data structures
- Write functions to transform data

Let's review some Python data manipulation... with NBA data!

![nba finals image from ESPN](https://a.espncdn.com/photo/2022/0408/nba_playoff-preview_16x9_608x342.jpg)

## Python Lists

### List Methods

Here are a few common list methods:

- `.append()`: adds the input element to the end of a list
- `.pop()`: removes and returns the element at the input index (the last element, if no index is given)
- `.extend()`: adds each element of the input iterable to the end of a list
- `.index()`: returns the index of the first occurrence of the input element
- `.remove()`: removes the first occurrence of the input value
- `.count()`: returns the number of occurrences of the input element in a list
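
Here's a quick sketch of those methods in action. The `scores` list is a made-up example, not part of the NBA data; the comments track how the list changes after each call.

```python
# A small, made-up list to demonstrate common list methods
scores = [10, 20, 30]

scores.append(20)            # add one element to the end -> [10, 20, 30, 20]
scores.extend([40, 50])      # add each element of an iterable -> [10, 20, 30, 20, 40, 50]
last = scores.pop()          # no index given: removes and returns the last element (50)
first = scores.pop(0)        # removes and returns the element at index 0 (10)
position = scores.index(30)  # first index where 30 appears (1 at this point)
scores.remove(20)            # removes only the FIRST 20 it finds
twenties = scores.count(20)  # how many 20s remain (1)

print(scores)                           # [30, 20, 40]
print(last, first, position, twenties)  # 50 10 1 1
```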

Let's practice with a few!

### Create Our NBA Team Roster

In a perfect world, which four teams would you have in the NBA semi-finals?

I've provided a list of NBA teams below. Let's create our dream list! Pick 4 teams you'd like to be in your semi-finals, and put those in your `nba_teams` list:


```
['Dallas Mavericks', 'Orlando Magic', 'San Antonio Spurs', 'Denver Nuggets', 'Brooklyn Nets', 'Washington Wizards', 'Golden State Warriors', 'Los Angeles Clippers', 'Los Angeles Lakers', 'Memphis Grizzlies', 'Milwaukee Bucks', 'Phoenix Suns', 'Miami Heat', 'Indiana Pacers', 'Sacramento Kings', 'Detroit Pistons', 'New York Knicks', 'Portland Trail Blazers', 'Oklahoma City Thunder', 'Cleveland Cavaliers', 'Toronto Raptors', 'New Orleans Pelicans', 'Charlotte Hornets', 'Atlanta Hawks', 'Minnesota Timberwolves', 'Boston Celtics', 'Houston Rockets', 'Chicago Bulls', 'Utah Jazz', 'Philadelphia 76ers']
```

(Don't know or care about the NBA? No worries, just pick four at random, or make some up!)

In [1]:
# Create your list of four teams
nba_teams = [
    'Boston Celtics',
    'Miami Heat',
    'Houston Rockets',
    'Golden State Warriors'
]

In [2]:
# Check the contents of your list
nba_teams

['Boston Celtics', 'Miami Heat', 'Houston Rockets', 'Golden State Warriors']

Since lists are ordered and indexed, you can access a single element by its index number, or a slice (a sub-list) by a range of index numbers.

In [3]:
# Access the first, aka zero-th, element of your list
nba_teams[0]

'Boston Celtics'

In [4]:
# Now access a slice: from index 1 up to, but not including, index 3
nba_teams[1:3]

['Miami Heat', 'Houston Rockets']
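
Two details worth knowing: a slice includes its start index but stops *before* its end index, and negative indices count back from the end of the list. A minimal sketch, using a copy of the example roster named `teams` so it doesn't overwrite your own `nba_teams`:

```python
# A copy of the example roster (named `teams` so it doesn't overwrite nba_teams)
teams = ['Boston Celtics', 'Miami Heat', 'Houston Rockets', 'Golden State Warriors']

print(teams[1:3])   # indices 1 and 2 only: ['Miami Heat', 'Houston Rockets']
print(teams[-1])    # negative index counts from the end: Golden State Warriors
print(teams[:2])    # omitted start means "from the beginning": the first two teams
print(teams[2:])    # omitted end means "through the end": the last two teams
print(teams[::-1])  # a step of -1 walks the list backwards
```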

## For Loops

What if we wanted to remove the spaces from each team name? Whenever we want to do something to each element of a collection (an *iterable*), we should consider using a loop. Here, let's use a `for` loop to remove the spaces from each team name.

In [5]:
# Always easier to test on a single element first
# Try removing the spaces from nba_teams[0]
nba_teams[0].replace(" ", "")

'BostonCeltics'

In [6]:
# Now that we can do one, let's try with a loop!
# Create a new list, nba_team_nospace, with team names without spaces
nba_team_nospace = []

for team in nba_teams:
    team_nospace = team.replace(" ", "")
    nba_team_nospace.append(team_nospace)

In [7]:
# Check your work
nba_team_nospace

['BostonCeltics', 'MiamiHeat', 'HoustonRockets', 'GoldenStateWarriors']

### List Comprehension

**Neat trick!** You can write one-line for loops!

List comprehensions are especially useful if you'd like to loop over something and output a new list - just like we did above!

The syntax is: `[f(x) for x in <iterable> if <condition>]`, where the `if <condition>` part is optional.

In [8]:
# Change our loop to a list comprehension
[team.replace(" ", "") for team in nba_teams]

['BostonCeltics', 'MiamiHeat', 'HoustonRockets', 'GoldenStateWarriors']
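
When you do include the `if <condition>` part, only the elements that pass the test are kept. For example, here's a sketch (using a copy of the example roster named `teams`) that keeps only the team names made of more than two words, with their spaces removed:

```python
teams = ['Boston Celtics', 'Miami Heat', 'Houston Rockets', 'Golden State Warriors']

# Keep only names with more than two words, then strip their spaces
long_names = [team.replace(" ", "") for team in teams if len(team.split()) > 2]
print(long_names)  # ['GoldenStateWarriors']
```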

Do you _need_ to use a list comprehension for this? Nope! But list comprehensions are often more concise than the equivalent loop, and they're usually a bit faster than a loop that calls `.append()` on each pass. Also, you'll see them in other people's code, so you'll need to know how to read them!

## Dictionaries

### Dictionary Methods

Make sure you're comfortable with the following dictionary methods:

- `.keys()`: returns a view of the dictionary's keys
- `.values()`: returns a view of the dictionary's values
- `.items()`: returns a view of the dictionary's `(key, value)` tuples
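
Strictly speaking, these methods return *view* objects (`dict_keys`, `dict_values`, `dict_items`) rather than lists: you can loop over them directly, and they reflect later changes to the dictionary, but you need `list()` if you want to index into them. A small sketch with a made-up `team_cities` dictionary:

```python
# A made-up dictionary to illustrate dictionary views
team_cities = {'Celtics': 'Boston', 'Heat': 'Miami'}

keys = team_cities.keys()
print(keys)                        # dict_keys(['Celtics', 'Heat'])
print(list(team_cities.values()))  # ['Boston', 'Miami']
print(list(team_cities.items()))   # [('Celtics', 'Boston'), ('Heat', 'Miami')]

# Views are live: a key added later shows up in the existing `keys` view
team_cities['Rockets'] = 'Houston'
print(list(keys))                  # ['Celtics', 'Heat', 'Rockets']
```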

What if we wanted to know the locations of our teams? Let's update our dream teams to be a dictionary, named `nba_dict`, where the key is the team name, and the value is the state where that team is located.

There are many ways to do this; let's use `zip()`!

In [9]:
nba_teams

['Boston Celtics', 'Miami Heat', 'Houston Rockets', 'Golden State Warriors']

In [10]:
# zip() pairs up two lists element by element
# So, create a state list in the same order as the teams list
state_list = ['Massachusetts', 'Florida', 'Texas', 'California']

In [11]:
# Now, create your nba_dict
nba_dict = dict(zip(nba_teams, state_list))

In [12]:
# Check your work
nba_dict

{'Boston Celtics': 'Massachusetts',
 'Miami Heat': 'Florida',
 'Houston Rockets': 'Texas',
 'Golden State Warriors': 'California'}
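
To see what happened, it helps to look at the intermediate step: `zip()` pairs up the elements of its inputs by position, producing tuples, and `dict()` then treats each `(key, value)` tuple as one entry. A sketch using copies of the example lists (named `teams` and `states`); note that `zip()` stops at the shortest input, so a missing state would silently drop a team:

```python
teams = ['Boston Celtics', 'Miami Heat', 'Houston Rockets', 'Golden State Warriors']
states = ['Massachusetts', 'Florida', 'Texas', 'California']

pairs = list(zip(teams, states))
print(pairs[0])  # ('Boston Celtics', 'Massachusetts')

# zip() stops at the shorter input: with only three states, the fourth team is dropped
short = dict(zip(teams, states[:3]))
print(len(short))  # 3
```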

Let's practice more loops: write a loop that prints the team name if that team is based in Texas (or pick another state that's in your dictionary!)

In [13]:
# Loop over your dictionary
# Note that we want to access both the key and the value...
for key, value in nba_dict.items():
    if value == 'Texas':
        print(key)

Houston Rockets


Now let's make a new dictionary, `tx_dict` (or change to whatever state you used above), where the key is the first word of the team name, and the value is the last word of the team name in all capital letters:

In [14]:
# Create your tx_dict
# Key is the first word of the team name, value is the last word in all caps
tx_dict = {}

for key, value in nba_dict.items():
    if value == 'Texas':
        tx_dict[key.split(" ")[0]] = key.split(" ")[-1].upper()

In [15]:
# Check your work
tx_dict

{'Houston': 'ROCKETS'}

### Dictionary Comprehension

Guess what! Just as a list comprehension lets you write a one-line loop that outputs a list, a dictionary comprehension does the same for dictionaries! This lets you transform one dictionary into another.

The syntax is: `{f(key): g(value) for key, value in <dictionary>.items() if <condition>}`, where the `if <condition>` part is optional.


In [16]:
# Change our loop to a dictionary comprehension
{key.split(" ")[0]: key.split(" ")[-1].upper() for key, value in nba_dict.items() if value == 'Texas'}

{'Houston': 'ROCKETS'}

As you might notice, this can get really long! How much you write on one line of code is up to you, but PEP 8, Python's official style guide, recommends a maximum of 79 characters per line.
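
One way to stay within that limit without giving up the comprehension is to break it across lines: inside brackets or braces, Python lets you continue onto the next line without a backslash. A sketch of the Texas comprehension from above, reformatted (`sample_dict` and `texas_teams` are illustrative names, so your own variables aren't overwritten):

```python
sample_dict = {
    'Boston Celtics': 'Massachusetts',
    'Miami Heat': 'Florida',
    'Houston Rockets': 'Texas',
    'Golden State Warriors': 'California',
}

# Same logic as the one-liner, split at natural boundaries for readability
texas_teams = {
    key.split(" ")[0]: key.split(" ")[-1].upper()
    for key, value in sample_dict.items()
    if value == 'Texas'
}
print(texas_teams)  # {'Houston': 'ROCKETS'}
```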

In [17]:
# You can get creative with it too!
{f"Team {x+1}": list(nba_dict.keys())[x] for x in range(len(nba_dict))}

{'Team 1': 'Boston Celtics',
 'Team 2': 'Miami Heat',
 'Team 3': 'Houston Rockets',
 'Team 4': 'Golden State Warriors'}
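
A perhaps more readable way to get the same result is `enumerate()`, which yields `(index, element)` pairs and accepts a `start` argument, so there's no need to rebuild `list(nba_dict.keys())` on every pass. A sketch, with `sample_dict` standing in for `nba_dict`:

```python
sample_dict = {
    'Boston Celtics': 'Massachusetts',
    'Miami Heat': 'Florida',
    'Houston Rockets': 'Texas',
    'Golden State Warriors': 'California',
}

# Looping over a dict yields its keys; enumerate numbers them starting at 1
numbered = {f"Team {i}": team for i, team in enumerate(sample_dict, start=1)}
print(numbered)
```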

## Nesting

There is more to a team than its name and the state it's based in. Let's play around with this new, nested dictionary, `nba_2022`, where the keys are the eight teams that reached the 2022 conference semifinals, and each value is a dictionary of details: the state, the conference (stored under the key `'Division'`), and the location as latitude and longitude.

In [18]:
nba_2022 = {
    'Miami Heat':{
        'State': 'Florida',
        'Division': 'East',
        'Lat': 25.781389,
        'Long': -80.188056
    },
    'Philadelphia 76ers':{
        'State': 'Pennsylvania',
        'Division': 'East',
        'Lat': 39.901111,
        'Long': -75.171944
    },
    'Milwaukee Bucks':{
        'State': 'Wisconsin',
        'Division': 'East',
        'Lat': 43.045028,
        'Long': -87.918167
    },
    'Boston Celtics':{
        'State': 'Massachusetts',
        'Division': 'East',
        'Lat': 42.366303,
        'Long': -71.062228
    },
    'Phoenix Suns':{
        'State': 'Arizona',
        'Division': 'West',
        'Lat': 33.445833,
        'Long': -112.071389
    },
    'Dallas Mavericks': {
        'State': 'Texas',
        'Division': 'West',
        'Lat': 32.790556,
        'Long': -96.810278
    },
    'Golden State Warriors':{
        'State': 'California',
        'Division': 'West',
        'Lat': 37.768056,
        'Long': -122.3875
    },
    'Memphis Grizzlies':{
        'State': 'Tennessee',
        'Division': 'West',
        'Lat': 35.138333,
        'Long': -90.050556
    }
} 

Now, if we wanted to check the team names, we could check the keys of this dictionary:

In [19]:
# Check out the keys of nba_2022
nba_2022.keys()

dict_keys(['Miami Heat', 'Philadelphia 76ers', 'Milwaukee Bucks', 'Boston Celtics', 'Phoenix Suns', 'Dallas Mavericks', 'Golden State Warriors', 'Memphis Grizzlies'])

How would we access one of those Divisions?

In [20]:
# Try for the Memphis Grizzlies
nba_2022['Memphis Grizzlies']['Division']

'West'
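
Square-bracket lookups raise a `KeyError` if a key is missing. When you're not sure a key exists, `dict.get()` returns `None` (or a default you supply) instead. A sketch using `nba_sample`, a trimmed-down, single-team copy of `nba_2022`:

```python
# A trimmed-down copy of nba_2022 with a single team
nba_sample = {
    'Memphis Grizzlies': {'State': 'Tennessee', 'Division': 'West',
                          'Lat': 35.138333, 'Long': -90.050556},
}

print(nba_sample['Memphis Grizzlies']['Division'])  # West

# 'Chicago Bulls' isn't a key: the outer .get() falls back to an empty dict,
# so the inner .get() returns its default instead of raising a KeyError
print(nba_sample.get('Chicago Bulls', {}).get('Division', 'Unknown'))  # Unknown
```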

Now let's write a loop that prints out the state for each team:

In [21]:
# Print out each team's state using a loop
for team_details in nba_2022.values():
    print(team_details['State'])

Florida
Pennsylvania
Wisconsin
Massachusetts
Arizona
Texas
California
Tennessee
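
Looping over nested values also lets you aggregate. For instance, here's a sketch that counts how many teams share each `'Division'` value, using `dict.get()` with a default of 0. The `nba_divisions` dictionary is an abbreviated copy of `nba_2022` that keeps only each team's `'Division'` detail:

```python
# Abbreviated copy of nba_2022, keeping only each team's 'Division' detail
nba_divisions = {
    'Miami Heat': {'Division': 'East'},
    'Philadelphia 76ers': {'Division': 'East'},
    'Milwaukee Bucks': {'Division': 'East'},
    'Boston Celtics': {'Division': 'East'},
    'Phoenix Suns': {'Division': 'West'},
    'Dallas Mavericks': {'Division': 'West'},
    'Golden State Warriors': {'Division': 'West'},
    'Memphis Grizzlies': {'Division': 'West'},
}

division_counts = {}
for team_details in nba_divisions.values():
    division = team_details['Division']
    # .get() returns 0 the first time we see a division
    division_counts[division] = division_counts.get(division, 0) + 1

print(division_counts)  # {'East': 4, 'West': 4}
```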


Just as we can put lists and dictionaries inside of other lists and dictionaries, we can also put comprehensions inside of other comprehensions!

In [22]:
# An example of nested comprehensions
{f"{name}'s Lat & Long": 
    [v for k,v in details.items() if "L" in k] for name, details in nba_2022.items()}

{"Miami Heat's Lat & Long": [25.781389, -80.188056],
 "Philadelphia 76ers's Lat & Long": [39.901111, -75.171944],
 "Milwaukee Bucks's Lat & Long": [43.045028, -87.918167],
 "Boston Celtics's Lat & Long": [42.366303, -71.062228],
 "Phoenix Suns's Lat & Long": [33.445833, -112.071389],
 "Dallas Mavericks's Lat & Long": [32.790556, -96.810278],
 "Golden State Warriors's Lat & Long": [37.768056, -122.3875],
 "Memphis Grizzlies's Lat & Long": [35.138333, -90.050556]}

In [23]:
# But remember: it's often easier to write this out as a for loop first,
# THEN you can condense into a comprehension more easily!

lat_long_dict = {}

for name, details in nba_2022.items():
    loc_list = []    
    for key, detail in details.items():        
        if 'L' in key:
            loc_list.append(detail)
    lat_long_dict[f"{name}'s Lat & Long"] = loc_list
    
# Check it
lat_long_dict

{"Miami Heat's Lat & Long": [25.781389, -80.188056],
 "Philadelphia 76ers's Lat & Long": [39.901111, -75.171944],
 "Milwaukee Bucks's Lat & Long": [43.045028, -87.918167],
 "Boston Celtics's Lat & Long": [42.366303, -71.062228],
 "Phoenix Suns's Lat & Long": [33.445833, -112.071389],
 "Dallas Mavericks's Lat & Long": [32.790556, -96.810278],
 "Golden State Warriors's Lat & Long": [37.768056, -122.3875],
 "Memphis Grizzlies's Lat & Long": [35.138333, -90.050556]}

## Functions

This aspect of Python is _incredibly_ useful! Writing your own functions can save you a TON of work - by _automating_ it.

### Creating Functions

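The first line of a function definition uses the `def` keyword, followed by the function's name, parentheses, and a colon. The function's body goes on the indented lines below it:

```python
def function_name():
    ...  # the indented body of the function goes here
```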

Any arguments to the function will go in the parentheses, and you can set default arguments in those parentheses as well.
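
For example, here's a small sketch of a function with one required argument and one argument with a default value (`describe_team` and its parameters are made up for illustration):

```python
def describe_team(team_name, state='Unknown'):
    """Return a short description of a team; state defaults to 'Unknown'."""
    return f"{team_name} ({state})"

print(describe_team('Miami Heat', 'Florida'))   # Miami Heat (Florida)
print(describe_team('Miami Heat'))              # Miami Heat (Unknown)
print(describe_team('Miami Heat', state='FL'))  # Miami Heat (FL): keyword arguments work too
```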

Let's write a function that takes in a nested dictionary of team details and a string describing which detail to look for, and outputs a list of tuples, each pairing a team's name with the details whose keys contain that string!

In [26]:
def find_team_details(nested_dictionary, detail_desc='State'):
    '''
    Function that takes in a dictionary, where team names are keys and values are
    dictionaries of details on that team, and then checks which keys in
    the nested team dictionary contain the provided string. The output is a list
    of tuples, each holding a team's name and a list of the matching details.
    
    Inputs:
        nested_dictionary : dictionary
        detail_desc : string (default is 'State')
        
    Outputs:
        output_list : list of tuples
    '''
    
    output_list = []
    
    for name, details in nested_dictionary.items():
        team_list = []    
        for key, detail in details.items():        
            if detail_desc in key:
                team_list.append(detail)
        output_list.append((name, team_list))
                
    return output_list

In [29]:
# Try it!
output = find_team_details(nba_2022, 'L')
output

[('Miami Heat', [25.781389, -80.188056]),
 ('Philadelphia 76ers', [39.901111, -75.171944]),
 ('Milwaukee Bucks', [43.045028, -87.918167]),
 ('Boston Celtics', [42.366303, -71.062228]),
 ('Phoenix Suns', [33.445833, -112.071389]),
 ('Dallas Mavericks', [32.790556, -96.810278]),
 ('Golden State Warriors', [37.768056, -122.3875]),
 ('Memphis Grizzlies', [35.138333, -90.050556])]
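
Passing `'L'` works here only because `'Lat'` and `'Long'` are the only keys containing a capital L; the same is true of the `"L" in k` test in the nested comprehension earlier. If the details also had a key such as `'League'` (hypothetical, not in `nba_2022`), its value would sneak in. A sketch comparing the substring test with naming the wanted keys explicitly, using a single-team `nba_sample` dictionary:

```python
# One nba_2022 entry plus a hypothetical 'League' key
nba_sample = {
    'Miami Heat': {'State': 'Florida', 'Division': 'East', 'League': 'NBA',
                   'Lat': 25.781389, 'Long': -80.188056},
}

# Substring test: 'League' also contains "L", so 'NBA' sneaks in
loose = {name: [detail for key, detail in details.items() if 'L' in key]
         for name, details in nba_sample.items()}

# Explicit test: only the two keys we actually want
explicit = {name: [details['Lat'], details['Long']]
            for name, details in nba_sample.items()}

print(loose)     # {'Miami Heat': ['NBA', 25.781389, -80.188056]}
print(explicit)  # {'Miami Heat': [25.781389, -80.188056]}
```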

In [30]:
type(output[0])

tuple
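
Since `find_team_details` is just a nested loop that builds lists, it can also be condensed into a nested comprehension, following the same loop-first-then-condense approach from the Nesting section. A sketch (`find_team_details_comp` is a hypothetical name, and `nba_sample` is a single-team copy of `nba_2022`):

```python
def find_team_details_comp(nested_dictionary, detail_desc='State'):
    """Comprehension version: a list of (team name, matching details) tuples."""
    return [
        (name, [detail for key, detail in details.items() if detail_desc in key])
        for name, details in nested_dictionary.items()
    ]

nba_sample = {'Miami Heat': {'State': 'Florida', 'Division': 'East',
                             'Lat': 25.781389, 'Long': -80.188056}}

print(find_team_details_comp(nba_sample, 'L'))  # [('Miami Heat', [25.781389, -80.188056])]
print(find_team_details_comp(nba_sample))       # [('Miami Heat', ['Florida'])]
```

Whether the loop or the comprehension is easier to read is a judgment call; the loop version is easier to extend with extra steps.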

---

# Extra Practice Exercises

1) Use a list comprehension to extract the odd numbers from this set (`nums`):

In [None]:
nums = set(range(1000))

In [None]:
# Your code here

<details>
    <summary>Answer
    </summary>
    <code>[num for num in nums if num % 2 == 1]</code>
    </details>

2) Use a list comprehension to take the first character of each string from the following list `words`:

In [None]:
words = ['carbon', 'osmium', 'mercury', 'potassium', 'rhenium', 'einsteinium',
        'hydrogen', 'erbium', 'nitrogen', 'sulfur', 'iodine', 'oxygen', 'niobium']

In [None]:
# Your code here

<details>
    <summary>Answer
    </summary>
    <code>[word[0] for word in words]</code>
    </details>

3) Use a list comprehension to build a list of all the names that start with 'R' from the following `names` list. Add a '?' to the end of each name.

In [None]:
names = ['Randy', 'Robert', 'Alex', 'Ranjit', 'Charlie', 'Richard', 'Ravdeep',
        'Vimal', 'Wu', 'Nelson']

In [None]:
# Your code here

<details>
<summary>Answer
    </summary>
    <code>[name+'?' for name in names if name[0] == 'R']</code>
    </details>

4) From the `phone_nos` list below, make a list of dictionaries where the key is the person's name and the value is the person's home phone number.

In [None]:
phone_nos = [{'name': 'greg', 'nums': {'home': 1234567, 'work': 7654321}},
             {'name': 'max', 'nums': {'home': 9876543, 'work': 1010001}},
             {'name': 'erin', 'nums': {'home': 3333333, 'work': 4444444}},
             {'name': 'joél', 'nums': {'home': 2222222, 'work': 5555555}},
             {'name': 'sean', 'nums': {'home': 9999999, 'work': 8888888}}]

In [None]:
# Your code here

<details>
    <summary>Answer</summary>
    <code>[{item['name']: item['nums']['home']} for item in phone_nos]</code>
    </details>

5) Using this `customers` dictionary, build a dictionary where the customers' names are the keys and the movies they've bought are the values.

In [None]:
customers = {
    'bill': {'purchases': {'movies': ['Terminator', 'Elf'],
                           'books': []},
             'id': 1},
    'dolph': {'purchases': {'movies': ['It Happened One Night'],
                            'books': ['The Far Side Gallery']},
              'id': 2},
    'pat': {'purchases': {'movies': [],
                          'books': ['Seinfeld and Philosophy', 'I Am a Bunny']},
            'id': 3}
}

In [None]:
# Your code here

<details>
    <summary>Answer</summary>
    <code>{customer: customers[customer]['purchases']['movies'] for customer in customers.keys()}</code> <br/>
    OR <br/>
    <code>{k: v['purchases']['movies'] for k, v in customers.items()}</code>
    </details>

6) Build a function that will return $2^n$ for an input integer $n$.

In [None]:
# Your code here

<details>
    <summary>Answer</summary>
    <pre><code>def expo(n):
    return 2**n</code></pre>
    </details>

7) Build a function that will take in a list of phone numbers as strings and return the same as integers, removing any parentheses ('(' and ')'), hyphens ('-'), and spaces.

In [None]:
# Your code here

<details>
    <summary>Answer</summary>
    <pre><code>def int_phone(string_list):
    return [int(string.replace('(', '').replace(')', '').replace('-', '').replace(' ', ''))
            for string in string_list]</code></pre>
    </details>

8) Build a function that returns the mode of a list of numbers. Since there can be a tie for the most common value, return the mode(s) as a list.

In [None]:
# Your code here

<details>
    <summary>Answer</summary>
    <pre><code>def mode(lst):
    counts = {num: lst.count(num) for num in lst}
    max_count = max(counts.values())
    return [num for num, count in counts.items() if count == max_count]</code></pre>
    </details>
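
The answer above calls `lst.count()` once per element, which rescans the whole list each time. As an alternative, the standard library's `collections.Counter` tallies everything in a single pass. A sketch (`mode_counter` is a hypothetical name, and it assumes a non-empty list):

```python
from collections import Counter

def mode_counter(lst):
    """Return a list of the most common value(s) in lst; assumes lst is non-empty."""
    counts = Counter(lst)             # maps each value to how often it appears
    max_count = max(counts.values())  # the highest frequency
    return [num for num, count in counts.items() if count == max_count]

print(mode_counter([1, 2, 2, 3, 3]))  # [2, 3]
print(mode_counter([4, 4, 5]))        # [4]
```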