<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:200%;
           font-family:Arial;letter-spacing:0.5px">

<p width = 20%, style="padding: 10px;
              color:white;">
Data Manipulation in Python              
</p>
</div>

Data Science Cohort Live NYC August 2023
<p>Phase 1: Topic 3</p>
<br>
<br>

<div align = "right">
<img src="images/flatiron-school-logo.png" align = "right" width="200"/>
</div>
    

# Objectives

- Construct list and dictionary comprehensions
- Extract data from nested data structures
- Write functions to transform data

# Lists

## List Methods

Make sure you're comfortable with the following list methods:

- `.append()`: adds the input element to the end of a list
- `.pop()`: removes and returns the element at the input index (the last element if no index is given)
- `.extend()`: adds the elements in the input iterable to the end of a list
- `.index()`: returns the index of the first place in a list where the argument is found
- `.remove()`: removes the first occurrence of the input value
- `.count()`: returns the number of occurrences of the input element in a list
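
Here is a quick sketch that exercises each of these methods in turn. The `fruits` list and the variable names are made up purely for illustration.

```python
# A small, made-up list used only for illustration.
fruits = ['apple', 'banana', 'cherry']

fruits.append('banana')                 # add one element to the end
fruits.extend(['date', 'elder'])        # add every element of an iterable
first_banana = fruits.index('banana')   # index of the first match
banana_count = fruits.count('banana')   # number of occurrences
fruits.remove('banana')                 # drops only the first 'banana'
last = fruits.pop()                     # no index: remove and return the last element
second = fruits.pop(1)                  # with index: remove and return that element

print(first_banana, banana_count, last, second, fruits)
```

Note that every method except `.index()` and `.count()` changes the list in place.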

Question: What's the difference between `.remove()` and `del`?

<details>
    <summary>
         Answer here  
    </summary>
    <code>.remove()</code> removes an element by value;<br/>
    <code>del</code> removes an element by position
</details>

## List Comprehension

List comprehension is a handy way of generating a new list from existing iterables.

Suppose I start with a simple list.

In [None]:
primes = [2, 3, 5, 7, 11, 13, 17, 19]


Now I want to build a new list that contains the double of each prime. I can do this with a list comprehension!

The syntax is: `[f(x) for x in <iterable> if <condition>]`. The `if <condition>` part is optional.

In [None]:
prime_doubles = [x*2 for x in primes]
prime_triples = [x*3 for x in primes]

In [None]:
prime_doubles

In [None]:
prime_triples = [x*3 for x in primes if x%3==0]
prime_triples

##### Aside: List Comprehensions Vs. `for`-Loops

Yes, I could do the same work with `for`-loops:

In [None]:
prime_doubles2 = []
for prime in primes:
    prime_doubles2.append(prime*2)
prime_doubles2

In [None]:
prime_doubles = [x*2 for x in primes]

In [None]:
prime_doubles == prime_doubles2

But list comprehensions are often the better choice: the syntax is more compact, and they usually run somewhat faster than the equivalent `for`-loop that calls `.append()`. You'll also see them in other people's code, so you'll need to know how to read them!
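
If you'd like to check the speed difference on your own machine, here is a minimal sketch using the standard library's `timeit` module. The two helper functions are hypothetical names introduced just for this comparison, and the exact timings will vary from run to run and machine to machine.

```python
import timeit

primes = [2, 3, 5, 7, 11, 13, 17, 19]

def doubles_with_loop(values):
    """Build the doubled list with an explicit for-loop."""
    result = []
    for value in values:
        result.append(value * 2)
    return result

def doubles_with_comprehension(values):
    """Build the same list with a list comprehension."""
    return [value * 2 for value in values]

# Run each version many times and report the total elapsed time in seconds.
loop_time = timeit.timeit(lambda: doubles_with_loop(primes), number=100_000)
comp_time = timeit.timeit(lambda: doubles_with_comprehension(primes), number=100_000)

print(f'loop: {loop_time:.3f}s, comprehension: {comp_time:.3f}s')
```

On a list this small the gap is tiny, so readability is usually the stronger reason to reach for a comprehension.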

### Another List Comprehension Example

We can use list comprehension to build a list from objects other than lists:

In [7]:
names = ('Alan Turing', 'Charles Babbage', 'Ada Lovelace',
        'Anita Borg', 'Steve Wozniak', 'Andrew Ng')

splits = [name.split() for name in names]
splits

[['Alan', 'Turing'],
 ['Charles', 'Babbage'],
 ['Ada', 'Lovelace'],
 ['Anita', 'Borg'],
 ['Steve', 'Wozniak'],
 ['Andrew', 'Ng']]

In [8]:
[name1[0]+'. '+name2[0]+'.' for [name1, name2] in splits]

['A. T.', 'C. B.', 'A. L.', 'A. B.', 'S. W.', 'A. N.']

In [9]:
[name[0][0]+'. '+ name[1][0] for name in splits]

['A. T', 'C. B', 'A. L', 'A. B', 'S. W', 'A. N']

### Exercises

1. Use a list comprehension to extract the odd numbers from this set:

In [11]:
nums = set(range(1000))

In [12]:
odds = [num for num in nums if num % 2 ==1]
odds[0:5]

[1, 3, 5, 7, 9]

<details>
    <summary>Answer
    </summary>

```python
[num for num in nums if num % 2 == 1]
```
</details>

2. Use a list comprehension to take the first character of each string from the following list of words.

In [5]:
words = ['carbon', 'osmium', 'mercury', 'potassium', 'rhenium', 'einsteinium',
        'hydrogen', 'erbium', 'nitrogen', 'sulfur', 'iodine', 'oxygen', 'niobium']

In [6]:
first_char = [word[0] for word in words]
first_char[0:5]

['c', 'o', 'm', 'p', 'r']

<details>
    <summary>Answer
    </summary>
    
```python
[word[0] for word in words]
```
</details>

3. Use a list comprehension to build a list of all the names that start with 'R' from the following list. Add a '?' to the end of each name.

In [10]:
names = ['Randy', 'Robert', 'Alex', 'Ranjit', 'Charlie', 'Richard', 'Ravdeep',
        'Vimal', 'Wu', 'Nelson']

In [13]:
R_names = [name+"?" for name in names if name[0] == 'R']
R_names

['Randy?', 'Robert?', 'Ranjit?', 'Richard?', 'Ravdeep?']

<details>
<summary>Answer
    </summary>
 
```python
[name+'?' for name in names if name[0] == 'R']
```
</details>

### What's the difference between lists and tuples?

In [None]:
a_tuple = (3, 8, 2, 5, 4, 2)
a_list = [3, 8, 2, 5, 4, 2]

<details>
    <summary>Answer
    </summary>
    <code>Tuples are immutable objects while lists are mutable.</code>
    </details>

In [16]:
primes_list = [2, 3, 5, 7, 11, 13, 17, 19]
primes_list[0]='e'
print(primes_list)

primes_tuple = (2, 3, 5, 7, 11, 13, 17, 19)
primes_tuple[0] = 'e'  # raises TypeError, so the print below never runs
print(primes_tuple)


['e', 3, 5, 7, 11, 13, 17, 19]


TypeError: 'tuple' object does not support item assignment
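
Here is a small sketch of what immutability means in practice. It catches the error instead of letting the cell fail, builds a new tuple rather than editing the old one, and uses tuples as dictionary keys, which lists cannot be. The `grid` dictionary is a made-up example.

```python
primes_tuple = (2, 3, 5, 7, 11, 13, 17, 19)

# Item assignment on a tuple raises TypeError; catch it so the script keeps going.
try:
    primes_tuple[0] = 'e'
except TypeError as err:
    message = str(err)
    print('Caught:', message)

# "Changing" a tuple really means building a new one from pieces of the old one.
new_tuple = ('e',) + primes_tuple[1:]
print(new_tuple)

# Tuples of hashable elements are hashable, so they can serve as dictionary keys.
grid = {(0, 0): 'origin', (1, 0): 'east'}
print(grid[(0, 0)])
```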

# Dictionaries

## Dictionary Methods

Make sure you're comfortable with the following dictionary methods:

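- `.keys()`: returns a view of the dictionary's keys
- `.values()`: returns a view of the dictionary's values
- `.items()`: returns a view of the dictionary's key-value pairs, each as a tuple

These views are iterable; wrap one in `list()` if you need an actual list.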
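
A short sketch of these three methods on a made-up `inventory` dictionary. It also shows that what they return stays in sync with the dictionary, which is one way they differ from plain lists.

```python
# A made-up dictionary used only for illustration.
inventory = {'apples': 3, 'pears': 5}

keys_view = inventory.keys()
inventory['plums'] = 7           # added after the view was created...

keys_list = list(keys_view)      # ...yet the view already includes 'plums'
total = sum(inventory.values())  # views can be passed to anything that iterates
pairs = list(inventory.items())  # a list of (key, value) tuples

print(keys_list, total, pairs)
```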

## Dictionary Comprehension

Much like list comprehension, I can use dictionary comprehension to build dictionaries from existing iterables.

In [17]:
my_dict = {'who': 'flatiron school', 'what': 'data science',
           'when': 'now', 'where': 'here', 'why': '$',
           'how': 'python'}

Remember that the `.items()` method will return a collection of key-value pairs:

In [18]:
my_dict.items()

dict_items([('who', 'flatiron school'), ('what', 'data science'), ('when', 'now'), ('where', 'here'), ('why', '$'), ('how', 'python')])

So I can use a pair of variables to range over it:

In [19]:
{k: v + '!' for k, v in my_dict.items() if k.startswith('w')}

{'who': 'flatiron school!',
 'what': 'data science!',
 'when': 'now!',
 'where': 'here!',
 'why': '$!'}

The same thing works for any collection of pairs:

In [1]:
{k**2: v**2 for k, v in [(0, 1), (2, 3), (4, 5)]}

{0: 1, 4: 9, 16: 25}
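
One common use of this pattern is swapping keys and values. The sketch below uses a made-up `state_names` dictionary and assumes the values are unique and hashable; if two keys shared a value, the later one would silently overwrite the earlier one.

```python
# A made-up dictionary used only for illustration.
state_names = {'NY': 'New York', 'CA': 'California', 'TX': 'Texas'}

# Loop over (key, value) pairs and emit them in the opposite order.
abbreviations = {name: abbr for abbr, name in state_names.items()}

print(abbreviations)
```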

#### Representing Data as a List of Dictionaries

In [3]:
xy_dict_list = [ {'x': 3, 'y': 18}, {'x': 4, 'y': 32}, {'x': 5, 'y': 50}, {'x': 6, 'y': 72}, {'x': 8, 'y': 128} ]

In [4]:
xy_dict_list[0]['y']

18

In [5]:
# Get me the y-value of the third entry
xy_dict_list[2]['y']

50

In [10]:
# Get me a list of all x values. Use a list comprehension.
print(xy_dict_list)
xs = [entry['x'] for entry in xy_dict_list]
xs

[{'x': 3, 'y': 18}, {'x': 4, 'y': 32}, {'x': 5, 'y': 50}, {'x': 6, 'y': 72}, {'x': 8, 'y': 128}]


[3, 4, 5, 6, 8]

<details>
    <summary>Answer</summary>

```python
[diction['x'] for diction in xy_dict_list]
```
</details>

### `zip`

Remember that `zip` is a handy way of pairing up two or more iterables:

In [11]:
dict(zip(range(5), ['apple', 'orange', 'banana', 'lime', 'blueberry']))

{0: 'apple', 1: 'orange', 2: 'banana', 3: 'lime', 4: 'blueberry'}

In [None]:
# Zipping multiple iterables together
tuple(zip(range(1, 5), 'a'*4, 'b'*4, 'c'*4, 'd'*4, 'e'*4))
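
Two behaviors of `zip` are worth knowing, sketched here with made-up lists: it stops as soon as the shortest input runs out, and `zip(*pairs)` undoes the pairing.

```python
letters = ['a', 'b', 'c', 'd']
numbers = [1, 2, 3]

# zip stops at the shortest iterable, so 'd' is silently dropped.
pairs = list(zip(letters, numbers))
print(pairs)

# Unpacking the pairs back into zip separates them into tuples again.
unzipped_letters, unzipped_numbers = zip(*pairs)
print(unzipped_letters, unzipped_numbers)
```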

#### Dictionary Comprehension Using `zip`

In [14]:
{k: v for k, v in zip(range(5), range(0, 10, 2))}

{0: 0, 1: 2, 2: 4, 3: 6, 4: 8}

In [15]:
scores = [.858, .873, .868]
{'model' + str(j+1): scores[j] for j in range(3)}

{'model1': 0.858, 'model2': 0.873, 'model3': 0.868}
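
The same dictionary can be built without indexing into `scores` by hand. This is one possible alternative using `enumerate`, whose `start=1` argument makes the counter begin at 1; the name `by_model` is made up for this sketch.

```python
scores = [.858, .873, .868]

# enumerate yields (counter, element) pairs, so no range() or scores[j] is needed.
by_model = {f'model{j}': score for j, score in enumerate(scores, start=1)}

print(by_model)
```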

### Exercises

1. Use a dictionary comprehension to pair up the countries in the first list with their corresponding capitals in the second list:

In [17]:
list1 = ['USA', 'France', 'Canada', 'Thailand']
list2 = ['Washington', 'Paris', 'Ottawa', 'Bangkok']

In [27]:
capitals = dict(zip(list1, list2))
capitals

{'USA': 'Washington',
 'France': 'Paris',
 'Canada': 'Ottawa',
 'Thailand': 'Bangkok'}

<details>
<summary>Answer
    </summary>

```python
{country: capital for (country, capital) in zip(list1, list2)}
# OR
dict(zip(list1, list2))
```
</details>

2. Use a dictionary comprehension to make each of the characters in the following list a key with the value 'fictional character'.

In [28]:
chars = ['Pinocchio', 'Batman', 'Gilgamesh', 'Neo']

In [29]:
{character: 'fictional character' for character in chars}

{'Pinocchio': 'fictional character',
 'Batman': 'fictional character',
 'Gilgamesh': 'fictional character',
 'Neo': 'fictional character'}

<details>
    <summary>Answer</summary>

```python
{char: 'fictional character' for char in chars}
```
</details>

# Nesting

Just as we can put lists and dictionaries inside of other lists and dictionaries, we can also put comprehensions inside of other comprehensions.

In [30]:
lists = [['morning', 'afternoon', 'night'], ['read', 'code', 'sleep']]

In [33]:
[[item[0] for item in small_list] for small_list in lists]

[['m', 'a', 'n'], ['r', 'c', 's']]
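
A comprehension can also hold two `for` clauses in a row, which flattens a nested list instead of preserving its shape. In this sketch the clauses read left to right in the same order as the equivalent nested `for`-loops; the names `flat` and `long_words` are made up.

```python
lists = [['morning', 'afternoon', 'night'], ['read', 'code', 'sleep']]

# Outer loop first, inner loop second: one flat list of every item.
flat = [item for small_list in lists for item in small_list]
print(flat)

# A trailing condition filters the flattened items.
long_words = [item for small_list in lists for item in small_list if len(item) > 4]
print(long_words)
```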

## Nested Structures

It will be well worth your while to practice accessing data in complex structures. Consider the following:

In [34]:
customers = {
    'bill': {'purchases': {'movies': ['Terminator', 'Elf'],
                           'books': []},
             'id': 1},
    'dolph': {'purchases': {'movies': ['It Happened One Night'],
                            'books': ['The Far Side Gallery']},
              'id': 2},
    'pat': {'purchases': {'movies': [],
                          'books': ['Seinfeld and Philosophy', 'I Am a Bunny']},
            'id': 3}
}

**Q**: How would we access 'I Am a Bunny'?
<br/>
**A**: The outermost "layer" has a name: 'customers', and that object is a dictionary:
<br/>
`customers`
<br/>
The key we are interested in is 'pat', since that's where 'I Am a Bunny' is located:
<br/>
`customers['pat']`
<br/>
The value corresponding to the key 'pat' is also a dictionary, and in this "lower-down" dictionary, the key we are interested in is 'purchases':
<br/>
`customers['pat']['purchases']`
<br/>
The value corresponding to the key 'purchases' is yet another dictionary, and here the key of interest is 'books':
<br/>
`customers['pat']['purchases']['books']`
<br/>
The value corresponding to the key 'books' is a list, and 'I Am a Bunny' is the second element in that list:
<br/>
`customers['pat']['purchases']['books'][1]`

In [37]:
customers['pat']['purchases']['books'][1]

'I Am a Bunny'
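
Chained square brackets raise a `KeyError` as soon as one level is missing. When that is a possibility, one option is to chain `.get()` with default values instead. The sketch below uses a trimmed copy of the data, and `books_for` is a hypothetical helper name.

```python
# A trimmed copy of the customers data, repeated so this cell runs on its own.
customers = {
    'bill': {'purchases': {'movies': ['Terminator', 'Elf'], 'books': []}, 'id': 1},
    'pat': {'purchases': {'movies': [],
                          'books': ['Seinfeld and Philosophy', 'I Am a Bunny']},
            'id': 3},
}

def books_for(name):
    """Return the customer's list of books, or [] if any level is missing."""
    # Each .get() falls back to an empty container, so the chain never raises KeyError.
    return customers.get(name, {}).get('purchases', {}).get('books', [])

print(books_for('pat'))
print(books_for('nobody'))
```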

## Exercises

1. From the list below, make a list of dictionaries where the key is the person's name and the value is the person's home phone number.

In [40]:
phone_nos = [{'name': 'greg', 'nums': {'home': 1234567, 'work': 7654321}},
          {'name': 'max', 'nums': {'home': 9876543, 'work': 1010001}},
            {'name': 'erin', 'nums': {'home': 3333333, 'work': 4444444}},
            {'name': 'joél', 'nums': {'home': 2222222, 'work': 5555555}},
            {'name': 'sean', 'nums': {'home': 9999999, 'work': 8888888}}]

In [42]:
people = [{item['name']: item['nums']['home']} for item in phone_nos]
people

[{'greg': 1234567},
 {'max': 9876543},
 {'erin': 3333333},
 {'joél': 2222222},
 {'sean': 9999999}]

<details>
    <summary>Answer</summary>

```python
[{item['name']: item['nums']['home']} for item in phone_nos]
```
</details>

2. From the customers dictionary above, build a dictionary where the customers' names are the keys and the movies they've bought are the values.

In [43]:
customers = {
    'bill': {'purchases': {'movies': ['Terminator', 'Elf'],
                           'books': []},
             'id': 1},
    'dolph': {'purchases': {'movies': ['It Happened One Night'],
                            'books': ['The Far Side Gallery']},
              'id': 2},
    'pat': {'purchases': {'movies': [],
                          'books': ['Seinfeld and Philosophy', 'I Am a Bunny']},
            'id': 3}
}

In [48]:
{k: purchase['purchases']['movies'] for k, purchase in customers.items()}

{'bill': ['Terminator', 'Elf'], 'dolph': ['It Happened One Night'], 'pat': []}

<details>
    <summary>Answer</summary>
    
```python
{customer: customers[customer]['purchases']['movies'] for customer in customers.keys()}
# OR
{k: v['purchases']['movies'] for k, v in customers.items()}
```
</details>

# Functions

This aspect of Python is _incredibly_ useful! Writing your own functions can save you a TON of work - by _automating_ it.

## Creating Functions

The first line of a function definition has this form:

```python
def function_name():
```

Any parameters the function accepts go inside the parentheses.

Try building a function that will automate the task of finding how many times a given number can be evenly divided by 2.

In [62]:
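def div2(x):
    y = 0
    # Keep halving while x is even (assumes x is a nonzero integer)
    while x % 2 == 0:
        x //= 2  # floor division keeps x an integer
        y += 1
    return y

div2(2048)  # example input: 2048 == 2**11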


11

<details>
    <summary>Answer</summary>
    
```python
def div2(num):
    ctr = 0
    while num % 2 == 0:
        num //= 2  # floor division keeps num an integer
        ctr += 1
    return ctr
```    
</details>

## Calling Functions

To _call_ a function, simply type its name, along with any necessary arguments in parentheses.

In [64]:
div2(28)

2
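
One edge case to watch: `0` is evenly divisible by 2 forever, so calling the function above with `0` would never return. Here is a sketch of a guarded variant. The name `div2_safe` and the choice to raise `ValueError` are assumptions made for this example, not part of the exercise.

```python
def div2_safe(num):
    """Count how many times num can be evenly divided by 2."""
    # Reject 0 (which would loop forever) and non-integers up front.
    if not isinstance(num, int) or num == 0:
        raise ValueError('num must be a nonzero integer')
    count = 0
    while num % 2 == 0:
        num //= 2   # floor division keeps num an integer
        count += 1
    return count

print(div2_safe(28))
print(div2_safe(7))
```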

## Default Argument Values

Sometimes we'll want the argument(s) of our function to have default values.

In [65]:
def cheers(person='aaron', job='data scientist', age=30):
    return f'Hooray for {person}. You\'re a {job} and you\'re {age}!'

In [66]:
cheers()

"Hooray for aaron. You're a data scientist and you're 30!"

In [67]:

cheers('greg', 'scientist', 130)

"Hooray for greg. You're a scientist and you're 130!"

In [68]:

cheers(job='scientist', age=130, person='greg')


"Hooray for greg. You're a scientist and you're 130!"

In [69]:
cheers('cristian', 'git enthusiast')

"Hooray for cristian. You're a git enthusiast and you're 30!"

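Positional and keyword arguments can also be mixed in one call, which is handy for overriding a later default while leaving an earlier one alone. A small sketch, repeating the `cheers` definition so it runs on its own:

```python
def cheers(person='aaron', job='data scientist', age=30):
    return f"Hooray for {person}. You're a {job} and you're {age}!"

# 'greg' fills person by position; age is set by name; job keeps its default.
message = cheers('greg', age=40)
print(message)

# Positional arguments must come before keyword arguments:
# cheers(age=40, 'greg') would be a SyntaxError.
```
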
# More Exercises

1. Build a function that will return $2^n$ for an input $n$.

In [73]:
def power(n):
    return 2**n

<details>
    <summary>Answer</summary>

```python   
def expo(n):
    return 2**n
```
</details>

2. Build a function that will take in a list of phone numbers as strings and return the same as integers, removing any parentheses ('(' and ')'), hyphens ('-'), and spaces.

In [74]:
v = ['(718-931-6749)', '(888)-345 4446']

In [84]:
def clean_phone(nums):
    numlist = []
    for num in nums:
        num = num.replace('(', '')
        num = num.replace('-', '')
        num = num.replace(')', "")
        num = num.replace(' ', "")
        numlist.append(int(num))
    return numlist

clean_phone(v)

[7189316749, 8883454446]

In [82]:
def int_phone(string_list):
    return [int(string.replace('(', '').replace(')', '').replace('-', '').replace(' ', ''))
            for string in string_list]

int_phone(v)

[7189316749, 8883454446]
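
As the list of characters to strip grows, chaining `.replace()` gets unwieldy. One possible alternative is a regular expression that removes everything that isn't a digit. The function name `int_phone_regex` is made up for this sketch, and note that it strips more than the exercise asks for: any non-digit character goes, not just parentheses, hyphens, and spaces.

```python
import re

def int_phone_regex(string_list):
    """Strip every non-digit character, then convert each string to an int."""
    # \D matches any single character that is not a digit.
    return [int(re.sub(r'\D', '', string)) for string in string_list]

v = ['(718-931-6749)', '(888)-345 4446']
print(int_phone_regex(v))
```

Keep in mind that converting to `int` drops any leading zeros, and a string with no digits at all would raise a `ValueError`.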

<details>
    <summary>Answer</summary>
    <code>
def int_phone(string_list):
    return [int(string.replace('(', '').replace(')', '').replace('-', '').replace(' ', ''))
            for string in string_list]</code>
    </details>