# Data Manipulation in Python

In [4]:
# No imports today



# Objectives

- Write functions to transform data
- Construct list and dictionary comprehensions
- Extract data from nested data structures

# Functions

This aspect of Python is _incredibly_ useful! Writing your own functions can save you a TON of work by _automating_ tasks you would otherwise repeat by hand.

## Creating Functions

The first line of a function definition reads:

```python
def function_name():
```

Any arguments to the function go inside the parentheses, and the indented lines beneath the `def` line form the function's body.
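As a minimal sketch (the function name `add_tax` and its parameters are hypothetical, chosen only for illustration), here is a complete definition: the `def` line with two arguments, an indented body, and a `return` statement that hands a value back to the caller.

```python
def add_tax(price, rate):
    # The parameters `price` and `rate` are listed in the parentheses on the def line;
    # the indented lines below form the function body.
    total = price * (1 + rate)
    return total  # hand the result back to the caller

print(add_tax(100, 0.5))  # 150.0
```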

Let's try building a function that will automate the task of finding how many times a given number can be evenly divided by 2.

In [169]:
# Let's code it!

def count_even(number):
    if number == 0:
        return 0  # Handle zero separately: 0 % 2 == 0 always, so the loop below would never end
    
    count = 0
    while number % 2 == 0:
        number //= 2  # Use integer division
        count += 1
    return count

    

## Calling Functions

To _call_ a function, simply type its name, along with any necessary arguments in parentheses.

In [171]:
# Let's call it!
print(count_even(4))

2
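The variable `count` inside `count_even` is local: it exists only while the function is running, so referring to it after the call raises a `NameError`. To use the result outside the function, store or print the return value. The sketch below repeats `count_even` so it runs on its own; the input `48` is just an arbitrary test value.

```python
def count_even(number):
    if number == 0:
        return 0  # 0 % 2 == 0 forever, so the loop below would never end
    count = 0
    while number % 2 == 0:
        number //= 2
        count += 1
    return count

result = count_even(48)  # 48 = 2**4 * 3
print(result)            # 4

try:
    print(count)  # `count` only existed inside the function call
except NameError as err:
    print('NameError:', err)
```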


## Default Argument Values

Sometimes we'll want the argument(s) of our function to have default values.

In [164]:
def cheers(person='aaron', job='data scientist', age=30):
    return f'Hooray for {person}. You\'re a {job} and you\'re {age}!'

In [165]:
cheers('greg', 'scientist', 130)

"Hooray for greg. You're a scientist and you're 130!"

In [166]:
cheers(job='scientist', age=130, person='greg')

"Hooray for greg. You're a scientist and you're 130!"

In [167]:
cheers('cristian', 'git enthusiast')

"Hooray for cristian. You're a git enthusiast and you're 30!"

In [168]:
cheers()

"Hooray for aaron. You're a data scientist and you're 30!"

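Positional and keyword arguments can also be mixed, as long as the positional ones come first; any parameter you skip keeps its default value. The sketch repeats `cheers` so it runs on its own.

```python
def cheers(person='aaron', job='data scientist', age=30):
    return f'Hooray for {person}. You\'re a {job} and you\'re {age}!'

# 'greg' fills `person` by position, `age` is set by keyword, `job` keeps its default
print(cheers('greg', age=130))
# Hooray for greg. You're a data scientist and you're 130!
```

The reverse order, such as `cheers(person='greg', 130)`, is a `SyntaxError`: a positional argument cannot follow a keyword argument.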
# Lists

## List Methods

Make sure you're comfortable with the following list methods:

- `.append()`: adds the input element to the end of a list
- `.pop()`: removes and returns the element at the input index (the last element if no index is given)
- `.extend()`: adds the elements of the input iterable to the end of a list
- `.index()`: returns the index of the first occurrence of the argument (raises `ValueError` if it is absent)
- `.remove()`: removes the first occurrence of the input value
- `.count()`: returns the number of occurrences of the input element in a list
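Here is a quick walk-through of these methods on a small throwaway list (the values are arbitrary); the comments track the list's contents after each step.

```python
nums = [3, 1, 4]

nums.append(1)          # nums is now [3, 1, 4, 1]
nums.extend([5, 9])     # nums is now [3, 1, 4, 1, 5, 9]
last = nums.pop()       # no index: removes and returns the last element, 9
second = nums.pop(1)    # removes and returns the element at index 1, which is 1
where = nums.index(4)   # nums is [3, 4, 1, 5], so 4 sits at index 1
ones = nums.count(1)    # 1 appears once
nums.remove(1)          # removes the first 1, by value
print(nums)             # [3, 4, 5]
```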

Question: What's the difference between `.remove()` and `del`?

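<details>
    <summary>
        Answer
    </summary>
    <code>.remove()</code> is a list method that removes the first element equal to a given value;<br/>
    <code>del</code> is a statement that removes an element (or a slice) by position, i.e. by index
    </details>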
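A small sketch of the difference, using an arbitrary list with a repeated value:

```python
letters = ['a', 'b', 'c', 'b', 'd']

letters.remove('b')  # by value: drops only the FIRST 'b'
print(letters)       # ['a', 'c', 'b', 'd']

del letters[0]       # by position: drops whatever sits at index 0
print(letters)       # ['c', 'b', 'd']

del letters[1:]      # del also accepts a slice
print(letters)       # ['c']
```

`.remove()` raises `ValueError` if the value is not in the list, while `del` raises `IndexError` for an out-of-range index.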

## List Comprehension

List comprehension is a handy way of generating a new list from existing iterables.

Suppose I start with a simple list.

In [6]:
primes = [2, 3, 5, 7, 11, 13, 17, 19]

What I want to do now is build a new list containing the double of each prime. I can do this with list comprehension!

The syntax is `[f(x) for x in <iterable> if <condition>]`, where the `if <condition>` clause is optional.

In [7]:
prime_doubles = [x*2 for x in primes]
prime_triples = [x*3 for x in primes]
prime_quadruples = [x*4 for x in primes]

In [8]:
prime_doubles

[4, 6, 10, 14, 22, 26, 34, 38]
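The optional `if` clause filters elements before the expression is applied. As a sketch (the cutoff of 10 is arbitrary), this doubles only the primes greater than 10:

```python
primes = [2, 3, 5, 7, 11, 13, 17, 19]

# Only elements that pass the `if` test reach the `x * 2` expression
big_prime_doubles = [x * 2 for x in primes if x > 10]
print(big_prime_doubles)  # [22, 26, 34, 38]
```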

##### Aside: List Comprehensions Vs. `for`-Loops

Yes, I could do the same work with `for`-loops:

In [9]:
prime_doubles2 = []
for prime in primes:
    prime_doubles2.append(prime*2)
prime_doubles2

[4, 6, 10, 14, 22, 26, 34, 38]

In [10]:
prime_doubles == prime_doubles2

True

But list comprehensions have advantages: the syntax is more concise, and in CPython they usually run somewhat faster than the equivalent `for`-loop with `.append()`. Also, you'll see them in other people's code, so you'll need to know how to work with them!
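To check the speed claim on your own machine, here is a rough benchmark sketch using the standard-library `timeit` module. The helper names `with_loop` and `with_comprehension` are hypothetical, and the timings depend on your hardware and Python version, so no particular numbers are promised.

```python
import timeit

def with_loop(n=1000):
    result = []
    for i in range(n):
        result.append(i * 2)
    return result

def with_comprehension(n=1000):
    return [i * 2 for i in range(n)]

# Both approaches build the same list
assert with_loop() == with_comprehension()

# Time 1,000 calls of each function (timeit accepts a zero-argument callable)
loop_time = timeit.timeit(with_loop, number=1000)
comp_time = timeit.timeit(with_comprehension, number=1000)
print(f'loop: {loop_time:.3f}s  comprehension: {comp_time:.3f}s')
```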

### Another List Comprehension Example

I can use list comprehension to build a list from iterables other than lists, such as this tuple of names:

In [11]:
names = ('Alan Turing', 'Charles Babbage', 'Ada Lovelace',
        'Anita Borg', 'Steve Wozniak', 'Andrew Ng')

splits = [name.split() for name in names]
splits

[['Alan', 'Turing'],
 ['Charles', 'Babbage'],
 ['Ada', 'Lovelace'],
 ['Anita', 'Borg'],
 ['Steve', 'Wozniak'],
 ['Andrew', 'Ng']]

In [12]:
[name1[0]+'. '+name2[0]+'.' for (name1, name2) in splits]

['A. T.', 'C. B.', 'A. L.', 'A. B.', 'S. W.', 'A. N.']
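The same initials can be built with an f-string, which some find easier to read than chained `+` concatenation. This sketch unpacks each split directly into `first` and `last` in the `for` clause:

```python
names = ('Alan Turing', 'Charles Babbage', 'Ada Lovelace',
         'Anita Borg', 'Steve Wozniak', 'Andrew Ng')

splits = [name.split() for name in names]

# Unpack each [first, last] pair, then format the initials with an f-string
initials = [f'{first[0]}. {last[0]}.' for first, last in splits]
print(initials)  # ['A. T.', 'C. B.', 'A. L.', 'A. B.', 'S. W.', 'A. N.']
```

The unpacking assumes every name splits into exactly two words; a three-word name would raise a `ValueError`.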

### Exercises

1. Use a list comprehension to extract the odd numbers from this set:

In [19]:
nums = set(range(1000))


In [22]:
num_odd = [x for x in nums if x % 2 == 1]
print(sorted(num_odd)[:10])  # sets are unordered, so sort before peeking at the first ten

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

<details>
    <summary>Answer
    </summary>
    <code>[num for num in nums if num % 2 == 1]</code>
    </details>

2. Use a list comprehension to take the first character of each string from the following list of words.

In [28]:
words = ['carbon', 'osmium', 'mercury', 'potassium', 'rhenium', 'einsteinium',
        'hydrogen', 'erbium', 'nitrogen', 'sulfur', 'iodine', 'oxygen', 'niobium']

In [35]:
first_chars = [word[0] for word in words]
print(first_chars)

['c', 'o', 'm', 'p', 'r', 'e', 'h', 'e', 'n', 's', 'i', 'o', 'n']


<details>
    <summary>Answer
    </summary>
    <code>[word[0] for word in words]</code>
    </details>

3. Use a list comprehension to build a list of all the names that start with 'R' from the following list. Add a '?' to the end of each name.

In [39]:
names = ['Randy', 'Robert', 'Alex', 'Ranjit', 'Charlie', 'Richard', 'Ravdeep',
        'Vimal', 'Wu', 'Nelson']

In [51]:
sign = '?'
r_names = [name + sign for name in names if name[0] == 'R']
print(r_names)

['Randy?', 'Robert?', 'Ranjit?', 'Richard?', 'Ravdeep?']


<details>
<summary>Answer
    </summary>
    <code>[name+'?' for name in names if name[0] == 'R']</code>
    </details>

# Dictionaries

## Dictionary Methods

Make sure you're comfortable with the following dictionary methods:

- `.keys()`: returns a view of the dictionary's keys
- `.values()`: returns a view of the dictionary's values
- `.items()`: returns a view of the dictionary's key-value pairs, as tuples
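These methods return view objects rather than lists. A view is live: it reflects later changes to the dictionary. Wrap it in `list()` when you need an actual list, for example to index into it. A sketch with an arbitrary toy dictionary:

```python
stock = {'apples': 3, 'pears': 5}

keys = stock.keys()       # a dict_keys view, not a list
print(keys)               # dict_keys(['apples', 'pears'])

stock['plums'] = 7        # the view sees the new key without being re-created
print(keys)               # dict_keys(['apples', 'pears', 'plums'])

print(list(stock.values()))  # [3, 5, 7]
print(list(stock.items()))   # [('apples', 3), ('pears', 5), ('plums', 7)]
```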

## Dictionary Comprehension

Much like list comprehension, I can use dictionary comprehension to build dictionaries from existing iterables.

In [52]:
my_dict = {'who': 'flatiron school', 'what': 'data science',
           'when': 'now', 'where': 'here', 'why': '$',
           'how': 'python'}

Remember that the `.items()` method returns a collection of (key, value) pairs, i.e. 2-tuples:

In [53]:
my_dict.items()

dict_items([('who', 'flatiron school'), ('what', 'data science'), ('when', 'now'), ('where', 'here'), ('why', '$'), ('how', 'python')])

So I can unpack each pair into two variables as I iterate over it:

In [54]:
{k: v + '!' for k, v in my_dict.items() if k.startswith('w')}

{'who': 'flatiron school!',
 'what': 'data science!',
 'when': 'now!',
 'where': 'here!',
 'why': '$!'}

The same unpacking works for any iterable of pairs:

In [55]:
{k**2: v**2 for k, v in [(0, 1), (2, 3), (4, 5)]}

{0: 1, 4: 9, 16: 25}

### `zip`

Remember that `zip` is a handy way of pairing up two or more iterables:

In [56]:
dict(zip(range(5), ['apple', 'orange', 'banana', 'lime', 'blueberry']))

{0: 'apple', 1: 'orange', 2: 'banana', 3: 'lime', 4: 'blueberry'}

In [57]:
# Zipping multiple iterables together
tuple(zip(range(1, 5), 'a'*4, 'b'*4, 'c'*4, 'd'*4, 'e'*4))

((1, 'a', 'b', 'c', 'd', 'e'),
 (2, 'a', 'b', 'c', 'd', 'e'),
 (3, 'a', 'b', 'c', 'd', 'e'),
 (4, 'a', 'b', 'c', 'd', 'e'))
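One caution: when the iterables have different lengths, `zip` stops as soon as the shortest one runs out and silently drops the leftovers. A sketch with deliberately mismatched lists:

```python
countries = ['USA', 'France', 'Canada']
capitals = ['Washington', 'Paris']  # one element short

# 'Canada' has no partner, so zip silently leaves it out
pairs = list(zip(countries, capitals))
print(pairs)  # [('USA', 'Washington'), ('France', 'Paris')]
```

On Python 3.10 and later, `zip(countries, capitals, strict=True)` raises a `ValueError` instead of dropping data.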

#### Dictionary Comprehension Using `zip`

In [58]:
{k: v for k, v in zip(range(5), range(0, 10, 2))}

{0: 0, 1: 2, 2: 4, 3: 6, 4: 8}

In [59]:
scores = [.858, .873, .868]
{'model' + str(j+1): scores[j] for j in range(3)}

{'model1': 0.858, 'model2': 0.873, 'model3': 0.868}
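An alternative to indexing with `range(3)` is the built-in `enumerate`, which yields (index, value) pairs; `start=1` makes the labels begin at `model1`. This avoids hard-coding the number of scores:

```python
scores = [.858, .873, .868]

named = {f'model{i}': score for i, score in enumerate(scores, start=1)}
print(named)  # {'model1': 0.858, 'model2': 0.873, 'model3': 0.868}
```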

### Exercises

1. Use a dictionary comprehension to pair up the countries in the first list with their corresponding capitals in the second list:

In [65]:
list1 = ['USA', 'France', 'Canada', 'Thailand']
list2 = ['Washington', 'Paris', 'Ottawa', 'Bangkok']

In [66]:
{k: v for k, v in zip(list1, list2)}
 

{'USA': 'Washington',
 'France': 'Paris',
 'Canada': 'Ottawa',
 'Thailand': 'Bangkok'}

<details>
<summary>Answer
    </summary>
    <code>{country: capital for (country, capital) in zip(list1, list2)}</code> <br/> OR <br/>
    <code>dict(zip(list1, list2))</code>
    </details>

2. Use a dictionary comprehension to make each of the characters in the following list a key with the value 'fictional character'.

In [70]:
chars = ['Pinocchio', 'Gilgamesh', 'Kumar Patel', 'Toby Flenderson']

In [79]:
{name: 'fictional character' for name in chars}

{'Pinocchio': 'fictional character',
 'Gilgamesh': 'fictional character',
 'Kumar Patel': 'fictional character',
 'Toby Flenderson': 'fictional character'}

<details>
    <summary>Answer</summary>
    <code>{char: 'fictional character' for char in chars}</code>
    </details>

# Nesting

Just as we can put lists and dictionaries inside of other lists and dictionaries, we can also put comprehensions inside of other comprehensions.

In [80]:
lists = [['morning', 'afternoon', 'night'], ['read', 'code', 'sleep']]

In [81]:
[[item[0] for item in small_list] for small_list in lists]

[['m', 'a', 'n'], ['r', 'c', 's']]
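A related pattern puts two `for` clauses in a single comprehension instead of nesting one comprehension inside another. The clauses read left to right like nested loops (outer loop first), so this sketch flattens `lists` into one list:

```python
lists = [['morning', 'afternoon', 'night'], ['read', 'code', 'sleep']]

# Equivalent to: for small_list in lists: for item in small_list: collect item
flat = [item for small_list in lists for item in small_list]
print(flat)  # ['morning', 'afternoon', 'night', 'read', 'code', 'sleep']
```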

## Nested Structures

It will be well worth your while to practice accessing data in complex structures. Consider the following:

In [103]:
customers = {
    'bill': {'purchases': {'movies': ['Terminator', 'Elf'],
                           'books': []},
             'id': 1},
    'dolph': {'purchases': {'movies': ['It Happened One Night'],
                            'books': ['The Far Side Gallery']},
              'id': 2},
    'pat': {'purchases': {'movies': [],
                          'books': ['Seinfeld and Philosophy', 'I Am a Bunny']},
            'id': 3}
}

**Q**: How would we access 'I Am a Bunny'?
<br/>
**A**: The outermost "layer" has a name: 'customers', and that object is a dictionary:
<br/>
`customers`
<br/>
The key we are interested in is 'pat', since that's where 'I Am a Bunny' is located:
<br/>
`customers['pat']`
<br/>
The value corresponding to the key 'pat' is also a dictionary, and in this "lower-down" dictionary, the key we are interested in is 'purchases':
<br/>
`customers['pat']['purchases']`
<br/>
The value corresponding to the key 'purchases' is yet another dictionary, and here the key of interest is 'books':
<br/>
`customers['pat']['purchases']['books']`
<br/>
The value corresponding to the key 'books' is a list, and 'I Am a Bunny' is the second element in that list:
<br/>
`customers['pat']['purchases']['books'][1]`

In [83]:
customers['pat']['purchases']['books'][1]

'I Am a Bunny'
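The same chain of keys can run inside a comprehension to pull data out of every customer at once. This sketch collects all books bought by anyone; it repeats `customers` so it runs on its own, and the order of the result relies on dictionaries preserving insertion order (guaranteed since Python 3.7).

```python
customers = {
    'bill': {'purchases': {'movies': ['Terminator', 'Elf'], 'books': []}, 'id': 1},
    'dolph': {'purchases': {'movies': ['It Happened One Night'],
                            'books': ['The Far Side Gallery']}, 'id': 2},
    'pat': {'purchases': {'movies': [],
                          'books': ['Seinfeld and Philosophy', 'I Am a Bunny']}, 'id': 3},
}

# Outer loop walks the customers; inner loop walks each customer's list of books.
# A customer with no books (bill) simply contributes nothing.
all_books = [book
             for info in customers.values()
             for book in info['purchases']['books']]
print(all_books)  # ['The Far Side Gallery', 'Seinfeld and Philosophy', 'I Am a Bunny']
```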

## Exercises

1. From the list below, make a list of dictionaries where the key is the person's name and the value is the person's home phone number.

In [89]:
phone_nos = [{'name': 'greg', 'nums': {'home': 1234567, 'work': 7654321}},
             {'name': 'max', 'nums': {'home': 9876543, 'work': 1010001}},
             {'name': 'erin', 'nums': {'home': 3333333, 'work': 4444444}},
             {'name': 'joél', 'nums': {'home': 2222222, 'work': 5555555}},
             {'name': 'sean', 'nums': {'home': 9999999, 'work': 8888888}}]

In [98]:
home_numbers = [{item['name']: item['nums']['home']} for item in phone_nos]
print(home_numbers)

[{'greg': 1234567}, {'max': 9876543}, {'erin': 3333333}, {'joél': 2222222}, {'sean': 9999999}]


<details>
    <summary>Answer</summary>
    <code>[{item['name']: item['nums']['home']} for item in phone_nos]</code>
    </details>

2. From the customers dictionary above, build a dictionary where the customers' names are the keys and the movies they've bought are the values.

In [109]:
{k: v['purchases']['movies'] for k,v in customers.items()}

{'bill': ['Terminator', 'Elf'], 'dolph': ['It Happened One Night'], 'pat': []}

<details>
    <summary>Answer</summary>
    <code>{customer: customers[customer]['purchases']['movies'] for customer in customers.keys()}</code> <br/>
    OR <br/>
    <code>{k: v['purchases']['movies'] for k, v in customers.items()}</code>
    </details>

# More Exercises

1. Build a function that will return $2^n$ for an input $n$.

In [187]:
def power_of_two(n):
    return 2 ** n

In [190]:
power_of_two(5)

32

<details>
    <summary>Answer</summary>
    <code>
def expo(n):
    return 2**n</code>
    </details>
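Since `expo` is an ordinary function, it drops straight into a list comprehension. Note also that a negative exponent makes `**` return a float:

```python
def expo(n):
    return 2 ** n

print([expo(n) for n in range(6)])  # [1, 2, 4, 8, 16, 32]
print(expo(-1))                     # 0.5
```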

2. Build a function that will take in a list of phone numbers as strings and return the same as integers, removing any parentheses ('(' and ')'), hyphens ('-'), and spaces.

In [234]:
def phone_numbers(number_strings):
    # Remove '(', ')', '-', and ' ' from each string, then convert it to an int
    return [int(s.replace('(', '').replace(')', '').replace('-', '').replace(' ', ''))
            for s in number_strings]

In [235]:
phone_numbers(['123-456(785)', '(555) 123-4567'])

[123456785, 5551234567]

<details>
    <summary>Answer</summary>
    <code>
def int_phone(string_list):
    return [int(string.replace('(', '').replace(')', '').replace('-', '').replace(' ', ''))\
    for string in string_list]</code>
    </details>
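A different technique, swapped in here as an alternative rather than the exercise's intended answer, keeps only the digit characters instead of listing every character to remove. The function name `int_phone_digits` is hypothetical, and the sketch assumes each string contains at least one ASCII digit.

```python
def int_phone_digits(string_list):
    # Keep only the digit characters of each string, then convert to int.
    # Unlike chained .replace() calls, this also drops dots, '+', and any other non-digit.
    return [int(''.join(ch for ch in s if ch.isdigit())) for s in string_list]

print(int_phone_digits(['(555) 123-4567', '555.987.6543']))  # [5551234567, 5559876543]
```

With either approach, converting to `int` discards leading zeros, so `'012-3456'` becomes `123456`.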

3. Build a function that returns the mode of a list of numbers.

In [242]:
def list_numbers(numbers):
    counts = {num: numbers.count(num) for num in numbers}
    max_count = max(counts.values())
    return [num for num in counts.keys() if counts[num] == max_count]




<details>
    <summary>Answer</summary>
        <code>
def mode(lst):
    counts = {num: lst.count(num) for num in lst}
    max_count = max(counts.values())  # compute the maximum once, not once per key
    return [num for num in counts.keys() if counts[num] == max_count]</code>
    </details>

In [244]:
list_numbers([10,12,12])

[12]
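Because `numbers.count(num)` rescans the whole list for every element, the dictionary-comprehension versions above do roughly n² work on a list of length n. A sketch using `collections.Counter` from the standard library tallies everything in one pass. The name `mode_counter` is hypothetical, and returning `[]` for an empty list is an assumption (the versions above raise `ValueError` from `max()` in that case).

```python
from collections import Counter

def mode_counter(numbers):
    # Counter tallies every value in a single pass over the list
    counts = Counter(numbers)
    if not counts:
        return []  # assumption: an empty input has no mode
    max_count = max(counts.values())
    # Return every value tied for the highest count, in first-seen order
    return [num for num, c in counts.items() if c == max_count]

print(mode_counter([10, 12, 12]))     # [12]
print(mode_counter([1, 2, 2, 3, 3]))  # [2, 3]
```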