## 1 Dictionaries and Frequency Tables

### 1.1 Storing Data

In the last lesson, we worked with a data set that stores information for 7,197 mobile apps:

|   | id | track_name | currency | user_rating | user_rating_ver | ver | cont_rating | prime_genre |
|----|------------|-------------------------|-------|-----------------|-----|-------------|-------------|-------------------|
| 0 | 284882215 | Facebook | USD | 3.5 | 3.5 | 95.0 | 4+ | Social Networking |
| 1 | 389801252 | Instagram | USD | 4.5 | 4.0 | 10.23 | 12+ | Photo & Video |
| 2 | 529479190 | Clash of Clans | USD | 4.5 | 4.5 | 9.24.12 | 9+ | Games |
| 3 | 420009108 | Temple Run | USD | 4.5 | 4.0 | 1.6.2 | 9+ | Games |
| 4 | 284035177 | Pandora - Music & Radio | USD | 4.0 | 4.5 | 8.4.1 | 12+ | Music |


The **cont_rating** column holds the content rating of each app. An app's content rating (also known as its maturity rating) indicates the minimum age required to use the app. The table below shows the unique content ratings in our data set, along with the number of apps with each rating:

| Content rating | Number of apps |
|----------------|----------------|
| 4+ | 4,433 |
| 9+ | 987 |
| 12+ | 1,155 |
| 17+ | 622 |

From the table above, we can see that:

- Most apps (4,433 apps) have a content rating of 4+ (only people of age four or more are allowed to use these apps).
- Apps with a content rating of 17+ are the fewest (622 apps).
- In the middle, we have the 9+ and 12+ apps — 987 apps have a content rating of 9+, and 1,155 apps have a rating of 12+.

If we wanted to save the data from the table above, we could use two lists or maybe a list of lists. We'll try this in the following exercise, while in the next sections we'll learn about **dictionaries** and explore a more efficient solution for storing the data above.


**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


- Store the data in the table above using two different lists.
  - Assign the list ['4+', '9+', '12+', '17+'] to a variable named **content_ratings**.
  - Assign the list [4433, 987, 1155, 622] to a variable named **numbers**.
- Store the data in the table above using a list of lists. Assign the list [['4+', '9+', '12+', '17+'], [4433, 987, 1155, 622]] to a variable named **content_rating_numbers**.



```python
# put your code here
```

### 1.2 Dictionaries

In the previous subsection, we saw a table that shows the unique content ratings in our data set, along with the number of apps specific to each rating:

| Content rating | Number of apps |
|----------------|----------------|
| 4+ | 4,433 |
| 9+ | 987 |
| 12+ | 1,155 |
| 17+ | 622 |

We stored the data above in two ways:

- Using two separate lists
- Using a single list of lists

<left><img width="400" src="https://drive.google.com/uc?export=view&id=18D4vQ-JERgYxeghnxlBEcxrzbYmb4dtR" /></left>

Looking at the lists above, it may not be immediately clear which content rating corresponds to which number — especially for someone who doesn't have enough context. We need to find a better way to map a content rating to its corresponding number.

Remember that each list element has an index number. Let's consider the **numbers** list:

<left><img width="300" src="https://drive.google.com/uc?export=view&id=1Rt6aBHg18wizfv1xdeJ5VEeMbONvcF0x" /></left>

What if we could transform the index numbers into content rating values? This way, the mapping between content ratings and their corresponding numbers would become much clearer.

<left><img width="300" src="https://drive.google.com/uc?export=view&id=1NTj5c5W6CEAND3ur18T5TUeDrDdBmT2x" /></left>

Fortunately, we can do this using a **dictionary**:

<left><img width="500" src="https://drive.google.com/uc?export=view&id=1N6Wu8WWqFCbAe_3BftmaTuisZwrP8c5Y" /></left>

To create the dictionary above, we:

- Mapped each content rating to its corresponding number by following an index:value pattern. For instance, to map a rating of '4+' to the number 4,433, we typed '4+': 4433 (notice the colon between '4+' and 4433). To map '9+' to 987, we typed '9+': 987, and so on.
- Typed the entire sequence of index:value pairs, and separated each with a comma: '4+': 4433, '9+': 987, '12+': 1155, '17+': 622
- Surrounded the sequence with curly braces: {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
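To see why the dictionary makes the mapping clearer, here's a minimal sketch comparing a lookup in the list of lists with a lookup in the dictionary. The list version uses the list method `index()`, which returns the position of the first matching element; the names `position`, `list_lookup`, and `dict_lookup` are illustrative, and dictionary lookups are covered properly in the next subsection.

```python
# The list-of-lists version: ratings in one inner list, numbers in the other
content_rating_numbers = [['4+', '9+', '12+', '17+'], [4433, 987, 1155, 622]]

# With lists, we first find the position of '9+' in the first inner list,
# then use that position to read the matching number from the second one
position = content_rating_numbers[0].index('9+')
list_lookup = content_rating_numbers[1][position]

# With a dictionary, each content rating is mapped directly to its number
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
dict_lookup = content_ratings['9+']

print(list_lookup)  # 987
print(dict_lookup)  # 987
```

Both approaches find the same number, but the dictionary version doesn't require us to keep two lists aligned by position.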



**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- Map content ratings to their corresponding numbers by recreating the dictionary above: {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}. Assign the dictionary to a variable named **content_ratings**.
- Print **content_ratings** and examine the output carefully. Has the order we used to create the dictionary been preserved? In other words, is the output identical to {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}? We'll discuss this further in the next subsection.

```python
# put your code here
```

### 1.3 Indexing


Remember from the previous subsection that using a dictionary allowed us to change the index numbers of a list to content rating values. This way, the mapping between content ratings and their corresponding numbers became much clearer.


To retrieve the individual values of the **content_ratings** dictionary, we can use the new indices. The way we retrieve individual dictionary values is identical to the way we retrieve individual list elements — we follow a **variable_name[index]** pattern:

<left><img width="450" src="https://drive.google.com/uc?export=view&id=1X_2J__qfnAyOwMpPZViQiuREro3xJd6V" /></left>

Depending on the Python version, the printed dictionary may not keep the order we used when creating it. This is contrary to what we've seen with lists, where the order is always preserved. In lists, there's a direct connection between the index of a value and the position of that value in the list. For instance, the index value 0 always retrieves the list element that's positioned first in a list. If order weren't preserved and list elements were constantly swapped, then the index value 0 would retrieve different list elements at different times, which is something we strongly want to avoid.

With dictionaries, there's no connection anymore between the index of a value and the position of that value in the dictionary, so the order becomes unimportant. For instance, the index value '4+' will retrieve the value 4433 no matter its position — 4433 could be the first element in the dictionary, the second, the fourth, it doesn't matter.

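Whether or not order is preserved within dictionaries depends on the version of Python we use: since Python 3.7, dictionaries are guaranteed to keep the order in which keys were inserted, while older versions make no such guarantee. We'll discuss versions later on in this course. Now, let's practice retrieving a few dictionary values.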
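A quick way to check that order doesn't matter for retrieval is to build the same key-value pairs in two different orders. This sketch assumes Python 3.7 or later; the names `ratings_a` and `ratings_b` are illustrative.

```python
# Same key-value pairs, typed in a different order
ratings_a = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
ratings_b = {'17+': 622, '12+': 1155, '9+': 987, '4+': 4433}

# Retrieving by key doesn't depend on position
print(ratings_a['4+'])  # 4433
print(ratings_b['4+'])  # 4433

# Dictionary equality compares key-value pairs, not order
print(ratings_a == ratings_b)  # True

# Lists, by contrast, compare element by element, in order
print([4433, 987] == [987, 4433])  # False
```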



**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- Retrieve values from the **content_ratings** dictionary.
  - Assign the value at index '9+' to a variable named **over_9**.
  - Assign the value at index '17+' to a variable named **over_17**.
- Print **over_9** and **over_17**.



```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

# put your code here
```

### 1.4 Alternative Way of Creating a Dictionary

We can create a dictionary and populate it with values by following these steps:

- We create an empty dictionary.
- We add values one by one to that empty dictionary.

Adding a value to a dictionary follows the pattern **dictionary_name[index] = value**. To add a value 4433 with an index '4+' to a dictionary named **content_ratings**, we need to use the code **content_ratings['4+'] = 4433**.

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1qRQdOytXhD5NrDegPwx1lNDA0kbz7HMw" /></left>
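Here's a minimal sketch of this technique, adding only two of the four pairs so the exercise below is still left to you. It assumes Python 3.7 or later, where the printed dictionary follows insertion order.

```python
# Step 1: start with an empty dictionary
content_ratings = {}

# Step 2: add key-value pairs one by one with dictionary_name[index] = value
content_ratings['4+'] = 4433
content_ratings['9+'] = 987

print(content_ratings)  # {'4+': 4433, '9+': 987}

# The result is the same as typing the dictionary out directly
print(content_ratings == {'4+': 4433, '9+': 987})  # True
```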


**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- Use the new technique we learned above to map content ratings to their corresponding numbers inside a dictionary.
  - Create an empty dictionary named **content_ratings**.
  - Add the **index:value** pairs one by one using the **dictionary_name[index] = value** technique. This should be the final form of the dictionary: {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}.
- Retrieve the value at index '12+' from the **content_ratings** dictionary, and assign it to a variable named **over_12_n_apps**.



```python
# put your code here
```

### 1.5 Key-Value Pairs

The index of a dictionary value is called a **key**. In '4+': 4433, the dictionary key is '4+', and the dictionary value is 4433. As a whole, '4+': 4433 is a **key-value** pair.

<left><img width="500" src="https://drive.google.com/uc?export=view&id=1iyS33fmYlwLIUOUe9SMzBo81iI70GFP7" /></left>

Dictionary values can be of any data type: strings, integers, floats, Booleans, lists, and even dictionaries.

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1uvYRBfFJ7eDW0E7OsyhoYOJgadric10Z" /></left>

Dictionary keys can be of almost any data type we've learned so far, except lists and dictionaries. 

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1qy-7YnMHr3X7EJQ2FQhLIU6efxKA4wTG" /></left>


If we use lists or dictionaries as dictionary keys, Python raises a **TypeError**:


<left><img width="500" src="https://drive.google.com/uc?export=view&id=1CK4MwOVFOBdvJ9MYVxb8PB7oxh59QKBz" /></left>


In the spirit of showing what happens behind the curtains, we explain below why this error is raised. Understanding this, however, isn't necessary for moving forward in this mission, so feel free to jump straight to the exercises.

To understand the error messages above, we have to take a brief look at what Python does behind the scenes. When we populate a dictionary, Python computes an integer for each key in the background, even if the key isn't an integer. This integer is called a hash value, and it's the same integer returned by the built-in **hash()** function:

<left><img width="200" src="https://drive.google.com/uc?export=view&id=1AS0GLDREehM8meuz4uXcPRBZL5vTr_Iu" /></left>

For reasons we'll understand later, the **hash()** function can't transform lists and dictionaries into integers, and raises a **TypeError** instead. Notice that the error messages are identical to the ones we got when we tried to use lists or dictionaries as keys.

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1nQsssulsgX2CqFMR3F0PRs8VDuJs-dRT" /></left>
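The sketch below reproduces this behavior with the built-in `hash()` function, using key types from the exercise further down. The exact integers printed for hashable keys are an implementation detail (string hashes, for instance, can change between runs), so only the fact that they are integers matters here; the list name `error_messages` is illustrative.

```python
# Hashable keys: hash() returns an integer for each of them
hashable_keys = [4, 1.5, 'string_key', True]
for key in hashable_keys:
    print(repr(key), hash(key))

# Lists and dictionaries: hash() raises a TypeError
error_messages = []
for key in [[1, 2, 3], {10: 'ten'}]:
    try:
        hash(key)
    except TypeError as err:
        error_messages.append(str(err))
        print(repr(key), 'raises TypeError:', err)

# Using a list as a dictionary key fails with the same kind of error
try:
    bad_dict = {[1, 2, 3]: 'a list'}
except TypeError as err:
    error_messages.append(str(err))
    print('As a dictionary key:', err)
```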


When we populate a dictionary, we also need to make sure each key in that dictionary is unique. If we use the same key for two or more values, Python keeps only the last value assigned to that key and discards the others, which means we lose data. We illustrate this in the diagram below, where we highlighted the identical keys with a distinct color:

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1m7k0ErHjWb2Am3zkR6IOiicSTqkL9RZa" /></left>
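Here's a minimal example of losing data through a repeated key. The duplicate '4+' entry with the value 622 is made up purely for illustration, and the printed order assumes Python 3.7 or later.

```python
# The key '4+' appears twice; only the last value (622) survives
ratings = {'4+': 4433, '9+': 987, '4+': 622}

print(ratings)       # {'4+': 622, '9+': 987}
print(len(ratings))  # 2
```

The value 4433 is gone without any error or warning, which is why duplicate keys are easy to miss.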

**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- Create the following dictionary and assign it to a variable named **d_1**:

```python
{'key_1': 'first_value', 
 'key_2': 2,
 'key_3': 3.14,
 'key_4': True,
 'key_5': [4,2,1],
 'key_6': {'inner_key' : 6}
 }
```

- Examine the code below and determine whether it'll raise an error. If you think it'll raise an error, assign the Boolean True to a variable named **error**; otherwise, assign False.

```python
{4: 'four',
1.5: 'one point five',
'string_key': 'string_value',
True: 'True',
[1,2,3]: 'a list',
{10: 'ten'}: 'a dictionary'}
```




```python
# put your code here
```

### 1.6 Checking for Membership

Previously, we worked with a small table showing the four unique content ratings in our data set, along with the number of apps corresponding to each rating.

| Content rating | Number of apps |
|----------------|----------------|
| 4+ | 4,433 |
| 9+ | 987 |
| 12+ | 1,155 |
| 17+ | 622 |

You might have wondered how we managed to count the number of apps for each unique content rating. How did we find out there are 4,433 apps with a 4+ content rating, or 622 apps with a 17+ rating? Part of the answer is that we used a technique that makes use of the special properties of dictionaries. The full answer is a bit lengthier, and we'll explore it over this subsection and the next — we'll learn how to count the number of apps for each unique content rating.

Once we've created a dictionary, we can check whether a certain value exists in the dictionary as a key. We can check, for instance, whether the value '12+' exists as a key in the dictionary {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}. To do that, we use the **in** operator.

<left><img width="600" src="https://drive.google.com/uc?export=view&id=1rPsn_uOr-jhY53oKpZnm8MfGCAwpY-SS" /></left>

An expression of the form **a_value** in **a_dictionary** always returns a Boolean value:

- **True** is returned if **a_value** exists in **a_dictionary** as a dictionary key.
- **False** is returned if **a_value** doesn't exist in **a_dictionary** as a dictionary key.
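A minimal sketch of membership checks on the **content_ratings** dictionary. Note that **in** only looks at keys, so a number stored as a value is not found.

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

print('12+' in content_ratings)  # True: '12+' is a key
print('18+' in content_ratings)  # False: there is no '18+' key
print(4433 in content_ratings)   # False: 4433 is a value, not a key
```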

**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


- Using the **in** operator, check whether the following values exist as dictionary keys in the **content_ratings** dictionary:

  - The string '9+'. Assign the output of the expression to a variable named **is_in_dictionary_1**.
  - The integer 987. Assign the output of the expression to a variable named **is_in_dictionary_2**.
- Combine the output of an expression containing **in** with an if statement. If the string '17+' exists as a dictionary key in **content_ratings**, then:
  - Assign the string "It exists" to a variable named **result**.
  - Print the **result** variable.

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
# put your code here
```

### 1.7 Counting with Dictionaries

Once we've created and populated a dictionary, we can update (change) the dictionary values. To update a dictionary value, we need to reference it by its corresponding dictionary key and then perform the updating operation we want. In the code example below, we:

- Change the value corresponding to the dictionary key '4+' from 4433 to 0.
- Add 13 to the value corresponding to the dictionary key '9+'.
- Subtract 1155 from the value corresponding to the dictionary key '12+'.
- Change the value corresponding to the dictionary key '17+' from 622 (integer) to '622' (string).

<left><img width="600" src="https://drive.google.com/uc?export=view&id=1BP16mBNq3qIJ3CoH0LefS8z2qu_7BtIz" /></left>
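The four updates listed above can be sketched in code as follows; the figure's exact code may differ. The `+=` and `-=` shorthands used here are equivalent to writing `content_ratings['9+'] = content_ratings['9+'] + 13` and so on.

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

content_ratings['4+'] = 0       # replace 4433 with 0
content_ratings['9+'] += 13     # 987 + 13 = 1000
content_ratings['12+'] -= 1155  # 1155 - 1155 = 0
content_ratings['17+'] = '622'  # the integer 622 becomes the string '622'

print(content_ratings)  # {'4+': 0, '9+': 1000, '12+': 0, '17+': '622'}
```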

We can combine updating dictionary values with what we know already to count how many times each unique content rating occurs in our data set. 

<left><img width="500" src="https://drive.google.com/uc?export=view&id=1qX7DQ5Zq3slleIwcczT7gD339NKtHr25" /></left>
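As a minimal sketch of this idea, we can count the ratings in the short list ['4+', '4+', '4+', '9+', '9+', '12+', '17+'] instead of the full data set (the name `ratings_list` is illustrative). The dictionary starts with every known rating mapped to 0, and each matching iteration adds 1.

```python
ratings_list = ['4+', '4+', '4+', '9+', '9+', '12+', '17+']

# The unique ratings are known in advance; every count starts at 0
content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}

for c_rating in ratings_list:
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1  # one more app with this rating

print(content_ratings)  # {'4+': 3, '9+': 2, '12+': 1, '17+': 1}
```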

**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- Count the number of times each unique content rating occurs in the data set.
  - Create a dictionary named **content_ratings** where the keys are the unique content ratings and the values are all 0 (the values of 0 are temporary at this point, and they'll be updated).
  - Loop through the **apps_data** list of lists. Make sure you don't include the header row. For each iteration of the loop:
    - Assign the content rating value to a variable named **c_rating**. The content rating is at index number 10 in each row.
    - Check whether **c_rating** exists as a key in **content_ratings**. If it exists, then increment the dictionary value at that key by 1 (the key is equivalent to the value stored in **c_rating**).
  - Outside the loop, print **content_ratings** to check whether the counting worked as expected.


```python
from csv import reader

# Open the data set and read it as a list of lists
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
apps_data = list(read_file)

# put your code here
```

### 1.8 Finding the Unique Values

Previously, we created the dictionary **{'4+': 0, '9+': 0, '12+': 0, '17+': 0}** before we looped over the data set to count the occurrence of each content rating. Unfortunately, this approach requires us to know beforehand the unique values we want to count.

Let's say we didn't know what the unique content ratings are. This means that we don't have enough information to create the dictionary **{'4+': 0, '9+': 0, '12+': 0, '17+': 0}**. We need to devise a way to extract this information.

Our data set has 7,197 rows, so it's impractical to go over each row manually and figure out what the unique content ratings are. As a workaround, we can modify the logic of the code we used in the previous subsection to find the unique values automatically.

Let's consider again the count we did for the list **['4+', '4+', '4+', '9+', '9+', '12+', '17+']**. To perform the count while finding the unique values automatically, we will:

- Create an empty dictionary named **content_ratings**.
- Loop through the list **['4+', '4+', '4+', '9+', '9+', '12+', '17+']**, and check for every iteration whether the iteration variable (**c_rating**) exists as a key in **content_ratings**.
  - If it exists, then increment the dictionary value at that key by 1.
  - Else (if it doesn't exist), create a new key-value pair in the **content_ratings** dictionary, where the dictionary key is the iteration variable (**c_rating**) and the dictionary value is 1.
  
<left><img width="500" src="https://drive.google.com/uc?export=view&id=1zPUxDMjDXCviHaWE1T9rpPMay-sFpKjv" /></left>
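The steps above can be sketched as follows, again on the short list (named `ratings_list` here for illustration). This time the dictionary starts empty, and each rating becomes a key the first time it's seen.

```python
ratings_list = ['4+', '4+', '4+', '9+', '9+', '12+', '17+']

content_ratings = {}  # no unique values known in advance

for c_rating in ratings_list:
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1  # seen before: add one to its count
    else:
        content_ratings[c_rating] = 1   # first occurrence: create the key

print(content_ratings)  # {'4+': 3, '9+': 2, '12+': 1, '17+': 1}
```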


**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


- Count the number of times each unique content rating occurs in the data set while finding the unique values automatically.
  - Create an empty dictionary named **content_ratings**.
  - Loop through the **apps_data** list of lists (make sure you don't include the header row). For each iteration of the loop:
    - Assign the content rating value to a variable named **c_rating**. The content rating is at index number 10.
    - Check whether **c_rating** exists as a key in **content_ratings**.
      - If it exists, then increment the dictionary value at that key by 1 (the key is equivalent to the value stored in **c_rating**).
      - Else, create a new key-value pair in the dictionary, where the dictionary key is **c_rating** and the dictionary value is 1.
  - Outside the loop, print **content_ratings** to check whether the counting worked as expected.
  

```python
from csv import reader

# Open the data set and read it as a list of lists
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
apps_data = list(read_file)

# put your code here
```

### 1.9 Looping over Dictionaries

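To transform frequencies into proportions, we divide each frequency by the total number of apps; to get percentages, we multiply each proportion by 100. For instance, the 4+ rating has a proportion of 4,433 / 7,197 ≈ 0.616, or about 61.6%. We can do this by updating the dictionary values individually with the required arithmetic operations.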

Updating each individual dictionary value can get more and more cumbersome as the dictionary length increases. For a dictionary with 20 key-value pairs, we'd have to manually update 20 dictionary values. Fortunately, we can speed up the process using a for loop.

Additionally, we'll often want to keep the original frequencies intact for later analysis and store the derived values in separate dictionaries. For instance, we might want three separate dictionaries: one storing frequencies, another storing proportions, and another storing percentages.

When we transform frequencies to proportions, we can create a new dictionary instead of overwriting the values in the initial dictionary. To do that, we can create a new empty dictionary and populate it within the loop:

<left><img width="600" src="https://drive.google.com/uc?export=view&id=1a0cUvxnG9V-lzcwI7K6BtPP7vTDB7oMh" /></left>
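A minimal sketch of this pattern for proportions, using the same frequencies and total as the exercise below. The name `proportions` is illustrative; note that looping over a dictionary with a for loop gives us its keys, which we then use to read each value.

```python
content_ratings = {'4+': 4433, '12+': 1155, '9+': 987, '17+': 622}
total_number_of_apps = 7197

proportions = {}  # a new dictionary, so content_ratings stays unchanged
for rating in content_ratings:  # each iteration gives one key
    frequency = content_ratings[rating]
    proportions[rating] = frequency / total_number_of_apps

print(content_ratings)  # the original frequencies, untouched
print(proportions)      # the corresponding proportions
```

Because the four frequencies add up to 7,197, the proportions add up to 1 (up to floating-point rounding).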


**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>



- Transform the frequencies inside **content_ratings** to proportions and percentages while creating separate dictionaries for each.
  - Assign the dictionary storing proportions to a variable named **c_ratings_proportions**.
  - Assign the dictionary storing percentages to a variable named **c_ratings_percentages**.
- Optional challenge: try to solve this exercise using a single for loop.

```python
content_ratings = {'4+': 4433, '12+': 1155, '9+': 987, '17+': 622}
total_number_of_apps = 7197

# put your code here
```

## 2 Functions: Fundamentals

### 2.1 Functions

A function is composed of a **header** (which contains the def statement), a **body**, and a **return** statement. Together, these three elements make up the **function's definition**. We'll often use the phrase *"inside the function's definition"* to refer to the function's body.


<left><img width="400" src="https://drive.google.com/uc?export=view&id=1CxFQDRrruzcf4rHZ696xIqK6i2x8P3Sf" /></left>


Notice we indented the body and the return statement four spaces to the right; recall that we did the same for the bodies of for loops and if statements. Technically, one space of indentation is enough, but the convention in the Python community is to use four spaces. This helps with readability: other people who follow this convention will be able to read your code more easily, and you'll be able to read theirs more easily.

```python
def square(a_number):
    squared_number = a_number * a_number
    return squared_number

square(2)
```

```
4
```

### 2.2 Extract Values From Any Column

Now that we've learned more about functions and how to create them, let's get back to our initial goal: **creating a function that generates frequency tables** for any column we want in our iOS apps data set.

Remember our data set is structured as a list of lists.

```python
from csv import reader
apps_data = list(reader(open('AppleStore.csv')))
print(apps_data[0])
```

```
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
```


To generate a frequency table for a certain column, we could:

- Extract the values of the column in a separate list.
- Generate a frequency table for the elements of that list.

One thing we can try is to create a separate function for each of these two tasks:

- A function that extracts the values for any column we want in a separate list; and
- A function that generates a frequency table for a list


Using the first function, we can extract the values for any column we want in a separate list. Then, we can pass the resulting list as an argument to the second function, which will output a frequency table for that list.

To extract the values from any column we want from our **apps_data** data set, we need to:

- Create an empty list.
- Loop through the **apps_data** data set (excluding the header row), and for each iteration:
  - Store the value from the column we want in a variable.
  - Append that value to the empty list we created outside the for loop.

Below, we see how to extract the values for the **cont_rating** column:

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1N4Zgz1U0eUGGj8N5t9QF7_wXOYh7pZUq" /></left>
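Here's a minimal sketch of these steps that runs without AppleStore.csv. It uses a small, hypothetical three-column list of lists (built from the first rows shown at the start of this lesson) in which the content rating sits at index 1 rather than 10; the names `sample_data` and `ratings` are illustrative.

```python
# Hypothetical miniature data set: a header row followed by data rows
sample_data = [
    ['track_name', 'cont_rating', 'prime_genre'],
    ['Facebook', '4+', 'Social Networking'],
    ['Instagram', '12+', 'Photo & Video'],
    ['Clash of Clans', '9+', 'Games'],
]

ratings = []                 # create an empty list
for row in sample_data[1:]:  # loop through the rows, skipping the header
    rating = row[1]          # store the value from the column we want
    ratings.append(rating)   # append it to the list

print(ratings)  # ['4+', '12+', '9+']
```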


Now let's create a function that extracts the values from any column we want. We'll work again with the iOS apps data set.

**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


- Write a function named **extract()** that can extract any column you want from the **apps_data** data set.
  - The function should take in the **index** number of a column as input (name the parameter as you want).
  - Inside the function's definition:
    - Create an empty list.
    - Loop through the **apps_data** data set (excluding the header) and extract only the value you want by using the parameter (which is expected to be an index number).
    - Append that value to the empty list.
  - Return the list containing the values of the column.
- Use the **extract()** function to extract the values in the **prime_genre** column and store them in a variable named **genres**. The index number of this column is 11.


```python
from csv import reader
apps_data = list(reader(open('AppleStore.csv')))

# put your code here
```

### 2.3 Creating Frequency Tables

In the previous exercise, we created the **extract()** function, which we can use to extract the values for any column we want from our **apps_data** data set. Remember that we want to create two functions:

- A function that extracts the values for any column we want in a separate list (we already created this function — it's the **extract()** function); and
- A function that generates a frequency table for a list.

In the following exercise, we'll create the second function. Remember that to create a frequency table for the elements of a list, we need to:

- Create an empty dictionary.
- Loop through that list and check for each iteration whether the iteration variable exists as a key in the dictionary created.
  - If it exists, then increment the dictionary value at that key by 1.
  - Else (if it doesn't exist), create a new key-value pair in the dictionary, where the dictionary key is the iteration variable, and the dictionary value is 1.

<left><img width="500" src="https://drive.google.com/uc?export=view&id=1zPUxDMjDXCviHaWE1T9rpPMay-sFpKjv" /></left>

**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


- Write a function named **freq_table()** that generates a frequency table for any list.
  - The function should take in a list as input.
  - Inside the function's body, write code that generates a frequency table for that list and stores the table in a dictionary.
  - Return the frequency table as a dictionary.
- Use the **freq_table()** function on the **genres** list (already defined in the previous exercise) to generate the frequency table for the **prime_genre** column. Store the frequency table in a variable named **genres_ft**.
- Feel free to experiment with the **extract()** and **freq_table()** functions to easily create frequency tables for any column you want.




```python
from csv import reader
apps_data = list(reader(open('AppleStore.csv')))

def extract(index):
    column = []
    for row in apps_data[1:]:  # skip the header row
        value = row[index]
        column.append(value)
    return column

genres = extract(11)

# put your code here
```

### 2.4 Returning Multiple Variables

Python allows us to build functions that return more than just one variable.

<left><img width="300" src="https://drive.google.com/uc?export=view&id=1C399Qz4V8TG1ubzlhLox6LC60TgEg2vS" /></left>

Above, we passed 15 and 5 as arguments to the **sum_and_difference()** function. The function returned (20, 10), where 20 is the sum, and 10 is the difference. The order of the returned values matches the order of the variables in the return statement.
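A minimal sketch of **sum_and_difference()** consistent with the description above. The parameter names `a` and `b` are assumptions, while **a_sum** and **difference** follow the return statement discussed in the next subsection.

```python
def sum_and_difference(a, b):
    a_sum = a + b
    difference = a - b
    return a_sum, difference  # two values, returned together

result = sum_and_difference(15, 5)
print(result)        # (20, 10)
print(type(result))  # <class 'tuple'>
```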


One thing you might have found a bit odd is the structure of the output (20, 10). (20, 10) is a **tuple**, which is a data type that is very similar to a list (recall that examples of data types include integers, strings, lists, dictionaries, etc.).

Like a list, a tuple is usually used for storing multiple values. Creating a tuple is similar to creating a list, except that we use parentheses instead of brackets.

<left><img width="200" src="https://drive.google.com/uc?export=view&id=1-Jq8LbWbEmrZU8LCaswk2Xs8ZDarMX61" /></left>


Like lists, tuples support positive and negative indexing.

<left><img width="200" src="https://drive.google.com/uc?export=view&id=15jTa9FCAGpQ3e2EbVbi5mUI2lEsKXU3S" /></left>
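A short sketch of creating a tuple and indexing it from both ends (the name `a_tuple` is illustrative):

```python
a_tuple = (4433, 987, 1155, 622)

print(a_tuple[0])    # 4433 (first element, positive index)
print(a_tuple[-1])   # 622 (last element, negative index)
print(len(a_tuple))  # 4
```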


The main difference between tuples and lists boils down to whether we can modify the existing values or not. In the case of tuples, we can't modify the existing values, while in the case of lists, we can. Below, we're trying to modify the first value of a list and a tuple.

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1TOzy38_rZu700He8PTKgoNHaqnFV58_F" /></left>

Tuples are called immutable data types because we can't change their state after they've been created. Conversely, lists are mutable data types because their state can be changed after they've been created. The only way we could modify tuples, and immutable data types in general, is by recreating them. This is a list of all the mutable and immutable data types we've learned so far.

<left><img width="200" src="https://drive.google.com/uc?export=view&id=15w1EWfYCVix4VCjmwcKJpURYkwC-k7sv" /></left>
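The sketch below contrasts the two behaviors and shows how to "modify" a tuple by recreating it. The names are illustrative, and the error message shown is the one CPython produces.

```python
a_list = [4433, 987]
a_tuple = (4433, 987)

a_list[0] = 0  # lists are mutable: this works
print(a_list)  # [0, 987]

try:
    a_tuple[0] = 0  # tuples are immutable: this raises a TypeError
except TypeError as err:
    tuple_error = str(err)
    print(tuple_error)  # 'tuple' object does not support item assignment

# To "change" a tuple, we create a new tuple and reassign the variable
a_tuple = (0, a_tuple[1])
print(a_tuple)  # (0, 987)
```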



**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- Edit the **open_dataset()** function (already written in the cell below) such that:
  - If the data set has a header, the function returns separately both the header and the rest of the data set.
  - Else (if there's no header), the function returns the entire data set.
- Use the updated **open_dataset()** function to open the **AppleStore.csv** file, which has a header row.
  - Assign the result to a variable named **all_data**.
  - Use tuple indexing to extract the header and the rest of the data set from the **all_data** tuple.
    - Assign the header to a variable named **header**.
    - Assign the rest of the data set to a variable named **apps_data**.

```python
from csv import reader

def open_dataset(file_name='AppleStore.csv', header=True):
    opened_file = open(file_name)
    read_file = reader(opened_file)
    data = list(read_file)

    if header:
        return data[1:]
    else:
        return data
```

### 2.5 More About Tuples


When we create a **tuple**, surrounding the values with parentheses is optional. It's enough to write the individual values and separate each with a comma. Below, we see two ways of creating a tuple (on the right, we're not using parentheses):

<left><img width="300" src="https://drive.google.com/uc?export=view&id=1ez3YUItTjvXiuAtAMNzDRfYzkqZMP_0i" /></left>
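A minimal check that both forms produce the same tuple (the names are illustrative):

```python
with_parentheses = (1, 2, 3)
without_parentheses = 1, 2, 3

print(type(without_parentheses))                # <class 'tuple'>
print(with_parentheses == without_parentheses)  # True
```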


With this in mind, remember the syntax we used in the return statement to return multiple values:

<left><img width="300" src="https://drive.google.com/uc?export=view&id=113U8vEj6VeVyZeUch798rIHlSQuAnNmK" /></left>

When we use **return a_sum, difference**, Python interprets **a_sum, difference** as a tuple and returns that tuple. This is why multiple variables are returned as tuples. If we wanted to return a **list** instead of a **tuple**, we would need to use brackets:

<left><img width="300" src="https://drive.google.com/uc?export=view&id=1mmSE9Q_-xhMEjQbg2NOsd6_sArFVQ8eV" /></left>


When we work with tuples, we can assign their individual elements to separate variables in a single line of code.

<left><img width="500" src="https://drive.google.com/uc?export=view&id=1EZs0gXkf2h1XuWtUjMJnM-R1-LHMlIeS" /></left>


We can do the same with lists — we can assign individual list elements to separate variables in a single line of code:

<left><img width="500" src="https://drive.google.com/uc?export=view&id=1ns9zpBTWmSetZAa_lR-W6LhEzI0K4Vhs" /></left>


We can use this variable assignment technique with functions that return multiple variables.


<left><img width="300" src="https://drive.google.com/uc?export=view&id=1ZXlnL9pzUPWC26_moEHUnCEMmu0o7fVk" /></left>
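The three cases above can be sketched together as follows. The body of **sum_and_difference()** here is an assumption consistent with its description in the previous subsection (parameter names `a` and `b` are illustrative). The number of variables on the left must match the number of elements; otherwise, Python raises a **ValueError**.

```python
# Assigning tuple elements to separate variables
a, b = (15, 5)

# The same works with lists
first, second = [4433, 987]

def sum_and_difference(a, b):
    a_sum = a + b
    difference = a - b
    return a_sum, difference

# Assigning the elements of the returned tuple in a single line
a_sum, difference = sum_and_difference(15, 5)
print(a_sum)       # 20
print(difference)  # 10
```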

Now let's get a bit of practice with this variable assignment technique.



**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


- Use the **open_dataset()** function to open the **AppleStore.csv** file, which has a header row.
  - Do the variable assignment step in a single line of code.
    - Assign the header to a variable named **header**.
    - Assign the rest of the data set to a variable named **apps_data**.

```python
from csv import reader

def open_dataset(file_name='AppleStore.csv', header=True):
    data = list(reader(open(file_name)))

    if header:
        return data[1:], data[0]
    else:
        return data
```

### 2.6 Functions — Code Running Quirks

So far, we've been using parameters and return statements for all of our functions. Note, however, that parameters and return statements are optional:


<left><img width="200" src="https://drive.google.com/uc?export=view&id=1cKeeNGbHNHLyKD8xaeOBCI7Fbun3JKye" /></left>

Functions without a return statement don't appear to return any value. Strictly speaking, however, they return a **None** value, which represents the absence of a value. The None value is an instance of the **NoneType** data type (just like 5.321 is an instance of the float data type).

<left><img width="200" src="https://drive.google.com/uc?export=view&id=17I8mLGFwmrOcV20DKabCijmaRTEwjE0h" /></left>
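We can check this ourselves by capturing what a function without a return statement gives back. The sketch below uses the **print_constant()** function discussed in this section; the **result** variable is our own addition:

```python
def print_constant():
    x = 3.14
    print(x)  # no return statement in this function

result = print_constant()  # prints 3.14
print(result)              # None
print(type(result))        # <class 'NoneType'>
```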

In the function above, notice also that we assigned 3.14 to a variable named **x**. Although we clearly defined **x**, it turns out that we can't access **x** outside the function definition — Python raises a **NameError** and says that **x** is not defined.


<left><img width="200" src="https://drive.google.com/uc?export=view&id=1_yAVF8jUOwFsi2fIjwwwjBH3tuqDxUc7" /></left>


**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


- Rewrite the **print_constant()** function above.
- Call the **print_constant()** function to make sure **x = 3.14** gets executed.
- Print the variable x using the **print()** function.
  - What do you notice about the output?
  - This may be totally unexpected, and we'll explain why this happens in the next subsection.

```python
# put your code here
```

### 2.7 Scopes — Global and Local

You might have found the error we got in the previous exercise completely unexpected. After all, we called the **print_constant()** function, which means that **x = 3.14** must have been executed. So why did we still get an error telling us that x is undefined?


```python
def print_constant():
    x = 3.14
    print(x)

print_constant()
x
```

```
3.14
NameError: name 'x' is not defined
```

When **print_constant()** is called, **x = 3.14** is indeed executed, but the quirk is that Python saves the **x** variable only *temporarily*. Python stores **x** in a kind of **temporary memory**, which is erased as soon as **print_constant()** finishes running.

This explains why **x** is still undefined even after **print_constant()** is called: the temporary memory associated with **print_constant()** is erased once the function finishes running, so that it can be freed up for later use.

This kind of temporary memory storage doesn't apply to code that runs outside function definitions. If we define **x = 3.14** in our **main program** (outside function definitions), we can use **x** later on without worrying that it has been erased from memory.

<left><img width="200" src="https://drive.google.com/uc?export=view&id=1-JrqTpgV-wAPlD1h9zHWmNcOYHeSrmwU" /></left>

The temporary memory associated with a function is isolated from the memory associated with the main program. The consequence of this is that we can initialize a variable **x = 10** in the main program, and then execute **x = 3.14** in the body of a function without overwriting the **x** variable of the main program.

<left><img width="200" src="https://drive.google.com/uc?export=view&id=1q_C8qIkTKZ6_vDsk4B1hSsUQ9O9WQGRf" /></left>
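The scenario just described can be sketched as follows, using the same **print_constant()** function as before. The **x** inside the function is a separate variable from the **x** in the main program:

```python
x = 10  # x in the main program

def print_constant():
    x = 3.14  # a separate x that lives only in the function's temporary memory
    print(x)

print_constant()  # prints 3.14
print(x)          # prints 10: the main program's x was not overwritten
```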

This memory isolation is useful because we don't have to worry about overwriting variables from the main program when we write functions, or vice versa. This is especially helpful in large programs, where it becomes difficult to remember all the variable names we've used.

This memory isolation also means that some variables can be accessed only from certain parts of a program. We've already seen in one of the examples above that we couldn't access **x** from the main program, because **x** was defined only inside the function definition, which is memory-isolated from the main program.

<left><img width="500" src="https://drive.google.com/uc?export=view&id=1jLszjnbBXf9DqiDHnhCN0LAhMIpUqoVg" /></left>


The part of a program where a variable can be accessed is often called **scope**. The variables defined in the **main program** are said to be in the **global scope**, while the variables defined inside a function are in the **local scope**.
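Here is a small sketch contrasting the two scopes. The names **x**, **y**, and **local_example()** are illustrative choices of ours; the **try**/**except** block simply catches the **NameError** so the program keeps running:

```python
x = 10  # defined in the main program: global scope

def local_example():
    y = 3.14  # defined inside the function: local scope
    return y

print(x)                # 10
print(local_example())  # 3.14

try:
    print(y)  # y existed only while local_example() was running
except NameError as error:
    print('NameError:', error)  # NameError: name 'y' is not defined
```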

Let's get a bit of practice to understand scopes better before resuming the discussion in the next subsection. For the exercise below, we've already defined three variables in the code cell: **e**, **a_sum**, and **length**.


**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


- Create a function named **exponential()** that takes in a single parameter named **x**.
  - Inside the function definition:
    - Assign 2.72 to a variable named **e**.
    - Print **e**.
  - The function should return **e** to the power of **x**.
  - Call the **exponential()** function with an argument of 5. Assign the result to a variable named **result**.
  - Hypothesize what you should see if you printed **e** in the main program after calling the **exponential()** function. Print **e** to confirm or reject your hypothesis.
- Create a new function named **divide()** that doesn't take any parameters, and then call the function.
  - Inside the function definition:
    - Print the **a_sum** variable.
    - Print the **length** variable.
  - The function should return the result of the division between **a_sum** and **length**.
  - Call the **divide()** function, and try to assign the result to a variable named **result_2**. Before running the code, hypothesize whether we'll get an error or not.

```python
e = 'mathematical constant'
a_sum = 1000
length = 50

# put your code here
```