# Dictionaries


## The dataset

In this section, we'll work with Los Angeles weather data from 2014. The dataset is a list of string elements that represent weather patterns:

```python
["Sunny",
 "Sunny",
 "Sunny",
 ...,
 "Fog"]
```
 
The list contains **365** elements. The first element is for the type of weather that occurred on January 1st, and the last element represents the type of weather that occurred on December 31st.

We've loaded the list into the **weather** variable.

In [1]:
# open the dataset
f = open("la_weather.csv", 'r')

# read the file into one string, split it into rows, then close the file
rows = f.read().split('\n')
f.close()
full_data = []

# split and get only the second column
for row in rows:
    full_data.append(row.split(",")[1])

# skip the header
weather = full_data[1:]

**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


1. Assign the first element of **weather** to **first_element** and display it using the **print()** function.
2. Assign the last element of **weather** to **last_element** and display it using the **print()** function.

In [7]:
first_element = weather[0]
last_element = weather[-1]
print(first_element)
print(last_element)

Sunny
Fog


## Dictionaries' structure

Let's say we have a set of students, along with their scores from a recent math test:

<img width="200" src="https://drive.google.com/uc?export=view&id=1HN8kr5NVSfBqKDKVbY3tnd5Dajq-UzkI">

To store the students' names and their scores, we could use two lists:

```python
students = ["Tom", "Jim", "Sue", "Ann"]
scores = [70, 80, 85, 75]
```

To figure out what score **Sue** got on the test, we'd first have to write a loop to find the index corresponding to the element **Sue** in the **students** list. We'd then have to find the value at that index in **scores**. Here's how we could do this:

```python
indexes = [0,1,2,3]
name = "Sue"
score = 0
for i in indexes:
    if students[i] == name:
        score = scores[i]
print(score)
```

This is a complex piece of code for a simple task; we just want to find the value associated with a name.

To accomplish this in an easier way, we can use a **dictionary**. **A dictionary is like a list in that it has indexes, but the indexes aren't necessarily sequential numbers**. We can create our own indexes with values of any data type, including strings.

While we create a new list with square brackets (**\[**), we create a new dictionary with curly braces (**{**). We can make an empty dictionary like this:

```python
scores = {}
```

To add values to an existing dictionary, we specify the index to the left of the equals sign, and the value it should have on the right side. We use square brackets (**\[**) to specify the index.

```python
scores["Tom"] = 70
```

Taken together, we call the index and value **key/value pairs**. In this mission, however, we'll refer to the dictionary values on the right side of the equals sign as elements, just like the elements in a list.

The code above will create the index **Tom** in the scores dictionary, and associate the element **70** with it. To look up the test score for **Tom**, we would simply write:

```python
scores["Tom"]
```

This would return the element **70** because we associated it with the index **Tom** in the dictionary **scores**. We use square brackets (**\[**) to add values to dictionaries or look up values.

We can add the rest of the students' **scores** in the same way:

```python
scores["Jim"] = 80
scores["Sue"] = 85
scores["Ann"] = 75
```
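As a quick check, the steps above can be put together and compared with the two-list approach. This is a minimal sketch that uses only the names and scores already shown; `sue_score` is an illustrative variable name, not part of the lesson.

```python
# Build the dictionary one key at a time, as in the text above.
scores = {}
scores["Tom"] = 70
scores["Jim"] = 80
scores["Sue"] = 85
scores["Ann"] = 75

# A single lookup replaces the loop over two parallel lists.
sue_score = scores["Sue"]
print(sue_score)  # 85
```

The lookup needs no index bookkeeping: the name itself is the index.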


## Practice Populating a Dictionary

Recall that to create a **dictionary**, we first define it with curly braces (**\{**), then add values for specific indexes. We call the values elements, and refer to the indexes as **dictionary keys**:

```python
students = {}
students["Jerry"] = 60
```

In the example above, we create an empty dictionary called **students**, then specify that the dictionary key **Jerry** should have the value **60**. To find the value (that's now an element) associated with the dictionary key **Jerry**, we'd look up **Jerry** in the dictionary **students**:

```python
print(students["Jerry"])
```

The code above would display 60.

A dictionary key can be a string, an integer, or a float, and a single dictionary can mix these key types:

```python
students[10] = 100
```
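To make the point about key types concrete, here is a small sketch that stores one key of each type in the same dictionary. The float key `2.5` and its value are made up for illustration.

```python
students = {}
students["Jerry"] = 60   # string key
students[10] = 100       # integer key
students[2.5] = 90       # float key (illustrative value)

# Each key is looked up with the same square-bracket syntax.
print(students["Jerry"])  # 60
print(students[10])       # 100
print(students[2.5])      # 90
```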

**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

1. Assign the value **1** to the key **Aquaman** in a new dictionary named **superhero_ranks**.
2. Assign the value **2** to the key **Superman** in **superhero_ranks**.

In [3]:
# put your code here
superhero_ranks = {}
superhero_ranks['Aquaman'] = 1
superhero_ranks['Superman'] = 2

superhero_ranks

{'Aquaman': 1, 'Superman': 2}

## Practice Indexing a Dictionary

We can look up values in dictionaries by using square brackets. When we pass in a dictionary key, we retrieve the value associated with that key:

```python
students["Tom"]
```

**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

1. Look up **FDR** in **president_ranks** and assign the result to a new variable **fdr_rank**.
2. Look up **Lincoln** in **president_ranks** and assign the result to a new variable **lincoln_rank**.
3. Look up **Aquaman** in **president_ranks** and assign the result to a new variable **aquaman_rank**.

In [5]:
president_ranks = {}
president_ranks["FDR"] = 1
president_ranks["Lincoln"] = 2
president_ranks["Aquaman"] = 3

# put your code here
fdr_rank = president_ranks["FDR"]
lincoln_rank = president_ranks["Lincoln"]
aquaman_rank = president_ranks["Aquaman"]

## Defining a Dictionary with Values

So far, we've created a dictionary and added elements to it in multiple steps:

```python
students = {}
students["Tom"] = 60
students["Jim"] = 70
```

This approach is cumbersome when we want to add multiple dictionary keys. Fortunately, we can create a dictionary and add elements to it in a single step:

```python
students = {
    "Tom": 60,
    "Jim": 70
}
```

In the example above, we create a dictionary, then specify that the key **Tom** should have the value **60**, and the key **Jim** should have the value **70**.

We do this by entering the dictionary key, then a colon (**:**), then the value. We separate each key/value pair with a comma. If we wanted to, we could add more students like this:

```python
students = {
    "Tom": 60,
    "Jim": 70,
    "Sue": 85,
    "Ann": 80
}
```

We can use this technique to specify as many key/value pairs as we'd like.


**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


1. Create a dictionary named **animals** with the following keys and values:
  - The key 7 corresponding to the value raven.
  - The key 8 corresponding to the value goose.
  - The key 9 corresponding to the value duck.
  
2. Create a dictionary named **times** with the following keys and values:

  - The key morning corresponding to the value 9.
  - The key afternoon corresponding to the value 14.
  - The key evening corresponding to the value 19.
  - The key night corresponding to the value 23.

In [None]:
random_values = {"key1": 10, "key2": "indubitably", "key3": "dataquest", 3: 5.6}
print(random_values)

# put your code here

{'key1': 10, 'key2': 'indubitably', 'key3': 'dataquest', 3: 5.6}


## Modifying Dictionary Values

We can modify the elements in a dictionary, just like we can with a list:

```python
students = {
    "Tom": 60,
    "Jim": 70
}
```

For example, we can replace the element we've associated with a key:

```python
students["Tom"] = 65
```

The code above would change the element for the key **Tom** to **65**.

We can also modify an existing element:

```python
students["Tom"] = students["Tom"] + 5
```

The code above would add 5 to the value for the key **Tom**.
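The two kinds of modification can be traced step by step. The sketch below reuses the **students** dictionary from above and also shows the `+=` shorthand, which is equivalent to writing the key on both sides of the equals sign.

```python
students = {
    "Tom": 60,
    "Jim": 70
}

students["Tom"] = 65                   # replace the element: Tom is now 65
students["Tom"] = students["Tom"] + 5  # modify the element: Tom is now 70
students["Jim"] += 5                   # shorthand for the line above: Jim is now 75

print(students["Tom"])  # 70
print(students["Jim"])  # 75
```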

**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

1. Add the key **Ann** and value 85 to the dictionary **students**.
2. Replace the value for the key **Tom** with 80.
3. Add 5 to the value for the key **Jim**.

In [6]:
students = {
    "Tom": 60,
    "Jim": 70
}

students["Ann"] = 85
students["Tom"] = 80
students["Jim"] = students["Jim"] + 5
students

{'Ann': 85, 'Jim': 75, 'Tom': 80}

## The In Operator and Dictionaries

In the last lesson, we used the **in** operator to check whether an element occurred in a list:

```python
animals = ["Cat", "Dog"]
found = "Cat" in animals
```

We can also use the **in** operator to check whether a **key** occurs in a dictionary:

```python
students = {
    "Tom": 60,
    "Jim": 70
}
```

**"Tom" in students** would return **True**, and **"Sue" in students** would return **False**.
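The sketch below runs these checks. It also illustrates that, for a dictionary, `in` looks at the keys only, not at the values; the variable names `tom_found`, `sue_found`, and `value_found` are illustrative.

```python
students = {
    "Tom": 60,
    "Jim": 70
}

tom_found = "Tom" in students   # True: "Tom" is a key
sue_found = "Sue" in students   # False: "Sue" is not a key
value_found = 60 in students    # False: 60 is a value, not a key

print(tom_found, sue_found, value_found)  # True False False
```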

**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

1. Check whether **jupiter** is a key in **planet_numbers**, and assign the resulting Boolean value to **jupiter_found**.
2. Check whether **earth** is a key in **planet_numbers**, and assign the resulting Boolean value to **earth_found**.

In [9]:
planet_numbers = {"mercury": 1, "venus": 2, "earth": 3, "mars": 4}

jupiter_found = "jupiter" in planet_numbers
print(jupiter_found)
earth_found = "earth" in planet_numbers
print(earth_found)

False
True


## The Else Statement

We learned about the **if statement** in a previous lesson. The **if statement** runs a segment of code if a condition is True:

```python
if temperature > 50:
    print("It's hot!")
```

In the code above, the **if statement** checks whether the variable **temperature** is greater than 50, and prints out **It's hot!** if it is.

We can also print a different message if the **temperature** is less than or equal to 50:

```python
if temperature > 50:
    print("It's hot!")
if temperature <= 50:
    print("It's cold!")
```

If **temperature** is greater than 50, we'll see **It's hot!**, and if it's less than or equal to 50, we'll see **It's cold!**. In other words, we print one statement when temperature > 50 equals **True**, and another statement when temperature > 50 equals **False**.

Performing different actions depending on whether a condition is true or false is a common scenario in programming. The else statement offers a simpler way to do this:

```python
if temperature > 50:
    print("It's hot!")
else:
    print("It's cold!")
```

The code above is much simpler than the previous example, but produces the same outcome. If **temperature > 50** is **True**, Python executes the code in the if statement block. If **temperature > 50** is **False**, it executes the code in the else statement block.

When using if/else statements, only one of the blocks will execute. That means the code above will only print **It's hot!** or **It's cold!** - never both.


**Else statements** allow us to simplify our code. Here's an example:

```python
scores = [80, 100, 60, 30]
high_scores = []
low_scores = []
for score in scores:
    if score > 70:
        high_scores.append(score)
    else:
        low_scores.append(score)
```

The code above will add a score to **high_scores** if score is greater than 70, and to **low_scores** otherwise.


**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

1. Append any names in **planet_names** that are longer than 5 characters to **long_names**. Otherwise, append the names to **short_names**. To accomplish this:

  - Loop through each item in **planet_names**.
  - Use the **len()** function to find the length of the item.
  - If the length is greater than 5, append the item to **long_names**.
  - Otherwise, append it to **short_names**.
2. When complete, **short_names** should contain any planet names less than 6 characters long, and **long_names** should contain any planet names 6 characters or longer.

In [12]:
planet_names = ["Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Neptune", "Uranus"]
short_names = []
long_names = []

for i in planet_names:
    if len(i) > 5:
        long_names.append(i)
    else:
        short_names.append(i)
long_names

['Mercury', 'Jupiter', 'Saturn', 'Neptune', 'Uranus']

## Counting with Dictionaries

We now have all the pieces we need to count how many times each element occurs in a list, using a dictionary to hold the counts. Let's practice our skills in the following exercise.

**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

1. Count the number of times that each element occurs in the list named **pantry** that appears in the code block below. You'll need to:

  - Create an empty dictionary named **pantry_counts**.
  - Loop through each item in **pantry**.
  - If the item appears in **pantry_counts**, add 1 to the value in **pantry_counts** for the item's key.
  - Otherwise, add the item to **pantry_counts** as a key, with the value **1**.
2. When finished, each item in **pantry** will have its own key in **pantry_counts**, and its value will be the number of times the item appears in **pantry**.

In [13]:
pantry = ["apple", "orange", "grape", "apple", "orange", "apple", "tomato", "potato", "grape"]

pantry_counts = {}
for i in pantry:
    if i in pantry_counts:
        pantry_counts[i] += 1
    else:
        pantry_counts[i] = 1
pantry_counts

{'apple': 3, 'grape': 2, 'orange': 2, 'potato': 1, 'tomato': 1}

## Counting the Weather

Now that we have some practice counting with dictionaries, it's time to count how often each type of weather occurs in the weather list.

**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

1. Count how many times each **type of weather** occurs in the **weather** list, and store the results in a new dictionary called **weather_counts**.
2. When finished, **weather_counts** should contain a key for each different **type of weather** in the **weather** list, along with its associated frequency. Here's a preview of how the result should format the **weather_counts** dictionary (note that you'll be using real values, rather than the dummy ones below):

```python
{
    'Fog': 0,
    'Fog-Rain': 0,
    ....
}
```





In [15]:
weather_counts = {}

for i in weather:
    if i in weather_counts:
        weather_counts[i] += 1
    else:
        weather_counts[i] = 1
weather_counts

{'Fog': 125, 'Fog-Rain': 4, 'Rain': 25, 'Sunny': 210, 'Thunderstorm': 1}
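One way to sanity-check the result is to add up the counts: since the list has 365 elements, the counts should total 365. The sketch below copies the output shown above into a literal so that it runs on its own; looping over a dictionary visits its keys, which is a Python behavior not yet covered in this lesson.

```python
# Counts copied from the output above, so this block runs without the CSV file.
weather_counts = {'Fog': 125, 'Fog-Rain': 4, 'Rain': 25, 'Sunny': 210, 'Thunderstorm': 1}

total_days = 0
for weather_type in weather_counts:          # iterates over the keys
    total_days += weather_counts[weather_type]

print(total_days)  # 365
```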

# Introduction to Functions

## Overview 

In this section, we will work with a data set of high-grossing movies from the [Internet Movie Database (IMDb)](http://www.imdb.com/). IMDb is an extensive online database of films, television programs, and video games. Our end goal is to create a dictionary that stores useful statistics from the **movie_metadata** data set. In order to do this, we will:

- Clean the data so that the information we need is easier to access
- Practice using dictionaries in more complex functions
- Learn how to write our own functions!


We'll be working with **movie_metadata.csv**, which is a CSV file. You may recall that CSV stands for "Comma-Separated Values", meaning that the values in the data set are separated by commas. Similar to previous missions where we worked with CSV files, we need to represent the CSV file in a structure that Python is familiar with, which would allow us to manipulate it. This time, we ask you to do all the necessary parsing.

Below is a table showing the first 5 movies in the list, although you may notice that the table has 6 rows. The first element of **movie_metadata** is not a movie, but a list of the attributes that the other elements have. This is called a header row, and we will deal with it later in the section.

| movie_title | director_name | color | duration | actor_1_name | language | country | title_year |
|----------------------------------------------|-------------------|-------|----------|-----------------|----------|---------|------------|
| Avatar | James Cameron | Color | 178 | CCH Pounder | English | USA | 2009 |
| Pirates of the Caribbean: At World's End | Gore Verbinski | Color | 169 | Johnny Depp | English | USA | 2007 |
| Spectre | Sam Mendes | Color | 148 | Christoph Waltz | English | UK | 2015 |
| The Dark Knight Rises | Christopher Nolan | Color | 164 | Tom Hardy | English | USA | 2012 |
| Star Wars VII: The Force Awakens | JJ Abrams | Color | 136 | Harrison Ford | English | USA | 2015 |

**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

1. Read **movie_metadata.csv** into a list of lists and assign to **movie_data**.
2. Open and read the file **movie_metadata.csv** into a **string variable**.
3. Split the data into rows on the newline character ("\n").
4. Create an empty list, **movie_data**.
5. Loop through each row, and split each row into a list on the comma character (","), and append it to **movie_data**.
6. Display the first 5 lists in **movie_data** using the **print()** function.

In [18]:
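f = open("movie_metadata.csv", 'r')
rows = f.read().split('\n')
f.close()

movie_data = []
for row in rows:
    # append (not extend) so that each row stays its own list
    movie_data.append(row.split(","))
print(movie_data[:5])

[['movie_title', 'director_name', 'color', 'duration', 'actor_1_name', 'language', 'country', 'title_year'], ['Avatar', 'James Cameron', 'Color', '178', 'CCH Pounder', 'English', 'USA', '2009'], ["Pirates of the Caribbean: At World's End", 'Gore Verbinski', 'Color', '169', 'Johnny Depp', 'English', 'USA', '2007'], ['Spectre', 'Sam Mendes', 'Color', '148', 'Christoph Waltz', 'English', 'UK', '2015'], ['The Dark Knight Rises', 'Christopher Nolan', 'Color', '164', 'Tom Hardy', 'English', 'USA', '2012']]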

## Motivating Functions

You may realize that parsing this file was no different from parsing any other file in the previous missions. You may also realize that we rarely write the same code twice; instead, we reuse code through **functions** such as **print()**, **type()**, and **open()**. Now, we formally define functions.

As we have seen before, a function is a packaged body of code that we can reuse by **calling** it with the relevant parameters. The parameters that a function takes are called the **inputs** of the function, and the result that it returns is called the **output**. For example, we called the **open()** function with two inputs (the strings **"movie_metadata.csv"** and **"r"**), and received an output: a **wrapper** object for the file **movie_metadata.csv**. All functions follow the same road map as **open()**: they take in **input(s)**, execute the code they contain, and return an **output**.

Because functions are **reusable**, we can package all the parsing we just did into one function. Then, we can call the function whenever we need to parse a file instead of having to rewrite the necessary code every time. We have been making extensive use of functions, so this should not be unfamiliar. If we had a **parser()** function, the code for the last instructions would be as concise as this:

```python
>>> movie_data = parser("movie_metadata.csv")
>>> print(movie_data[0:5])
[['movie_title', 'director_name', 'color', 'duration', 'actor_1_name', 'language', 'country', 'title_year'], ['Avatar', 'James Cameron', 'Color', '178', 'CCH Pounder', 'English', 'USA', '2009'], ["Pirates of the Caribbean: At World's End", 'Gore Verbinski', 'Color', '169', 'Johnny Depp', 'English', 'USA', '2007'], ['Spectre', 'Sam Mendes', 'Color', '148', 'Christoph Waltz', 'English', 'UK', '2015'], ['The Dark Knight Rises', 'Christopher Nolan', 'Color', '164', 'Tom Hardy', 'English', 'USA', '2012']]
```
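For illustration, here is one way such a **parser()** function might look. This is only a sketch under stated assumptions: the lesson does not define **parser()**, so the signature (a file name as input), the choice to skip blank lines, and the small temporary sample file are all assumptions made so that the block can run without the real dataset.

```python
import os
import tempfile

def parser(filename):
    """Read a comma-separated file and return its rows as a list of lists."""
    f = open(filename, "r")
    rows = f.read().split("\n")
    f.close()

    data = []
    for row in rows:
        if row == "":          # assumption: ignore blank lines such as a trailing newline
            continue
        data.append(row.split(","))
    return data

# Try it on a tiny temporary file (sample rows, not the full dataset).
sample_text = "movie_title,director_name\nAvatar,James Cameron\nSpectre,Sam Mendes\n"
tmp = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False)
tmp.write(sample_text)
tmp.close()

sample_data = parser(tmp.name)
os.remove(tmp.name)
print(sample_data)
```

Splitting on every comma is the same simplification used in the exercise above; it would break a value that itself contains a comma.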

Other than reusability, there are **3 main advantages of using functions**:

- They allow us to use other people's code without needing a deep understanding of how it was written (e.g., we use the **print()** function without reading the code inside it). We call this **information hiding**.
- They **break down complex logic** into smaller components or modules. Instead of writing very lengthy and complicated code, we can progress function by function. For example, if we were writing a larger piece of code, **parser()** as a function would be easier to manage than the inline code that does the same thing. This would make testing easier as well. We refer to this as **modularity**, which is especially important when working on teams. **Modularity makes it easier for someone else to read, understand, use, and build upon our code.**
- They streamline our code and make it easier **to maintain**. Programmers reuse the same functions in multiple situations across a project. This means that they generalize the function as much as possible to maximize its usefulness. We call this process **abstraction**, which is an important part of reducing our code's complexity, especially for larger projects.


Knowing the usefulness of functions, let's see how we can write our own functions.


Up until now, we used **built-in** functions: **functions that Python has defined for us**. However, our toolbox is not limited to this; we can write our own functions. The syntax for defining a function consists of 5 parts:

- **def keyword** - For Python to interpret the following code as a function
- **Name** - To refer to when we need to call the function later
- **Arguments** - Input value(s) that the function takes in
- **Body** - The code that the function executes
- **Return value** - The value that the function returns to the user when the function terminates

Let us examine the syntax further, using an example function that returns the first element of a list:

```python
def first_elt(input_lst):
    first = input_lst[0]
    return first
```

We start the function definition with the keyword **def**. We give the function a name that explains its use, in this case: **first_elt()**. Then, we name the single argument that the function takes. Here, we name the argument **input_lst**, suggesting to the user that the function must take in a list as input. In the next line, we define the body of the function, which consists of only one line in our case. This is the actual code that will be executed when the function is called. Finally, we use the keyword **return** to signify the end of the function, followed by the variable that we want returned to the user, in this case, **first**.

Here, we should take note of the **indentation** of the function. After the colon, we indent the remainder of the function by one level, which by convention is **4 spaces**. This tells Python which part of the code belongs to the function, so indentation changes how Python interprets the code. You may notice that if you type the first line of the function correctly, the notebook editor indents the next line automatically when you press Enter, so you do not need to press Tab or the space bar yourself. The diagram below shows this behavior; try it for yourself.

<img width="400" src="https://drive.google.com/uc?export=view&id=1AV7PYW5SrQUPClQwskn_81dqemm9r8aN">

One other thing to note is that **first** and **input_lst** are temporary (local) variables, which means that they are only accessible inside the function. If you attempt to use **first** outside the function, you will get an error saying that **first** is not defined, as in the example below:


<img width="300" src="https://drive.google.com/uc?export=view&id=19QaXF1ldAkF1rGIF2zzMZnAz40V89aql">


This error occurs because **first** does not have a defined value outside of the **first_elt()** function. To access the return value of the function, we assign the function call to a variable, so that the variable holds the function's return value. Here is the fix:

<img width="300" src="https://drive.google.com/uc?export=view&id=15No0sywSnKO9lHlel7NHMcMsk3ESKYVz">
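The same behavior can be reproduced in code. In the sketch below, the `try`/`except` construct (not covered in this lesson) is used only so that the script keeps running after the error; the list `animals` and the names `error_raised` and `first_animal` are illustrative.

```python
def first_elt(input_lst):
    first = input_lst[0]   # first exists only while the function runs
    return first

animals = ["Cat", "Dog"]

# Using the local name outside the function raises a NameError.
try:
    print(first)
    error_raised = False
except NameError:
    error_raised = True
    print("first is not defined outside first_elt()")

# The fix: capture the return value in a variable of our own.
first_animal = first_elt(animals)
print(first_animal)  # Cat
```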


**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


1. Write a function, with a definition, name, argument(s), body and return value, that returns a list containing the names of the movies in **movie_data**. This function is expected to behave similarly to **first_elt()**, but for multiple lists.

  - Give the function a name that describes what it does; **first_elts()** is a good example, but feel free to be creative.
  - Declare an empty list.
  - Use a for loop to extract the first element of each list, and append these elements to the empty list.
  - Return the list.
2. Assign the returned list to **movie_names**.
3. Display the first 5 elements of **movie_names** using the **print()** function.

In [None]:
# put your code here

## Functions with Multiple Return Paths

Even though we suggested **return** signifies the end of a function, a function can have multiple return statements. We can take advantage of this to add an **if statement** that returns one value if a certain criterion is met, and another value otherwise. For example, let's take a look at a function that checks whether or not the first element of a list is "blah":

```python
def is_blah(input_lst):
    if input_lst[0] == "blah":
        return True
    else:
        return False
```

Here, the indentation gives us the necessary intuition to decipher how the function works. If the list's first element is the string "blah", then the first return will execute, and the function will end there. However, if the first element is not "blah", then the function will continue without executing **return True**, because it doesn't enter that path. Instead, it will continue to the **else** path, and end by executing **return False.**
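Calling the function on two small lists exercises both return paths. The input lists and the names `starts_with_blah` and `starts_with_other` are made up for illustration.

```python
def is_blah(input_lst):
    if input_lst[0] == "blah":
        return True
    else:
        return False

starts_with_blah = is_blah(["blah", "other"])   # takes the if path
starts_with_other = is_blah(["other", "blah"])  # takes the else path

print(starts_with_blah)   # True
print(starts_with_other)  # False
```

Only the first element matters here: the second list contains "blah", but not at index 0, so the function returns **False**.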

Notice that there is a further layer of indentation after the **if** and **else** statements. The notebook editor can add this indentation automatically as well. Here is an example of auto indentation with a for loop:


<img width="400" src="https://drive.google.com/uc?export=view&id=12nK4Uhl6n4yYmyQgzIFdPUaAuFN_fG8f">

**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


1. Write a function named **is_usa()** that checks whether or not a movie was made in the United States.
  - Check the **movie_metadata.csv** file to see which column corresponds to the nationality of the movie. Don't forget to subtract one to find the true index of the column in the list.
  - Use an if statement to compare the right column of the list with the string "USA". The equality operation is case sensitive, so make sure to get the capitalization right.
  - Return True if the condition is met, and False otherwise.
2. Try it with a few movies in **movie_data**.
3. Call it on **wonder_woman** and store the result in **wonder_woman_usa**.


In [None]:
wonder_woman = ['Wonder Woman','Patty Jenkins','Color',141,'Gal Gadot','English','USA',2017]

# put your code here

## Functions with Multiple Arguments

The **is_usa()** function works, but its use is quite narrow. If we wanted to check whether the value in the 7th column is "UK" instead, we would have to write a completely separate function:

```python
def is_uk(input_lst):
    if input_lst[6] == "UK":
        return True
    else:
        return False
```

However, you can see that this function is almost the same as **is_usa()**, except for the string it checks for. This suggests that there is another layer of abstraction we can perform. If we could write a function that takes in two inputs, namely, the list and the string to check for, we could eliminate the inefficiency of writing the same code twice. Fortunately, we can do exactly that:

```python
def equals_str(input_lst, input_str):
    # index 6 is the country column, matching is_usa() and is_uk()
    if input_lst[6] == input_str:
        return True
    else:
        return False
```

Now, **is_usa(input_lst)** behaves the same way as **equals_str(input_lst,"USA")**, and **is_uk(input_lst)** behaves the same way as **equals_str(input_lst,"UK")**.

Because there is more than one argument in this function, the order in which we pass the arguments becomes important. For example, **equals_str(movie_data[4], "UK")** would be correct; however, **equals_str("UK",movie_data[4])** would not, because the function expects to get the list first and the string second.

If we want to override this, we have to use **named arguments** instead of the default **positional arguments**. If we explicitly write the names of the arguments as we provide them, their positions become unimportant. This means that **equals_str(input_str="UK",input_lst=movie_data[4])** does not result in an error. **Naming arguments** does not add any functionality, **but it can improve the readability of the code**, which is important if you are working on a team.
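The sketch below compares a positional call with a named call. It assumes that **equals_str()** checks the country column at index 6, as **is_usa()** does, and it uses the **wonder_woman** list from the exercises instead of **movie_data** so that it runs on its own; `positional_result` and `named_result` are illustrative names.

```python
def equals_str(input_lst, input_str):
    # assumption: index 6 is the country column, as in is_usa()
    if input_lst[6] == input_str:
        return True
    else:
        return False

wonder_woman = ['Wonder Woman', 'Patty Jenkins', 'Color', 141, 'Gal Gadot', 'English', 'USA', 2017]

# Positional: the list must come first and the string second.
positional_result = equals_str(wonder_woman, "USA")

# Named: the order no longer matters.
named_result = equals_str(input_str="USA", input_lst=wonder_woman)

print(positional_result, named_result)  # True True
```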

Finally, we can abstract out another layer by adding a third argument that determines which column of the list holds the attribute we want to check.

**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

1. Write a function **index_equals_str()** that takes in three arguments: **a list**, **an index** and **a string**, and checks whether that index of the list is equal to that string.

2. Call the function with a different order of the inputs, using named arguments.

3. Call the function on **wonder_woman** to check whether or not it is a movie in color, store it in **wonder_woman_in_color**, and **print()** the value.

In [None]:
wonder_woman = ['Wonder Woman','Patty Jenkins','Color',141,'Gal Gadot','English','USA',2017]

def is_usa(input_lst):
    if input_lst[6] == "USA":
        return True
    else:
        return False

# put your code here

## Optional Arguments

At the beginning of the section, we observed that the first row of **movie_metadata** is not an element of the data itself, but a list of the attributes that describe the data. Although this is useful, there is no need for this row to be a part of our **movie_data**, and it can actually produce misleading results. Let's say we want to count the number of movies in the list. Our intuition might be to do the following:

```python
def naive_counter(input_lst):
    num_elt = 0
    for each in input_lst:
        num_elt = num_elt + 1
    return num_elt
```

However, if we attempt to call this function with **movie_data** as its argument, we get a wrong answer:

```python
>>> print(naive_counter(movie_data))
4933
```

This is because the counter also counts the first item in the list, the header row. Of course, we can get around this by subtracting one from the result, but changing the function that way would make it unusable in cases where there is no header row. This does not generalize, so it is not a neat solution. Instead, we can use an argument that has a default value. We call this an **optional argument**. Optional arguments take on their default values unless the user provides a different value.

In this case, we'll give the argument that says whether or not there is a header row a default value of **False**, so the function counts every row unless told otherwise. When we encounter a dataset with a header row, like this one, we can call the counter and explicitly tell it that there is a header row. Let's modify the function to have this behavior by adding an optional parameter:

```python
def counter(input_lst, header_row=False):
    num_elt = 0
    if header_row:
        # skip the header row before counting
        input_lst = input_lst[1:]
    for each in input_lst:
        num_elt = num_elt + 1
    return num_elt
```

Now, the function will behave as we expected:

```python
>>> print(counter(movie_data))
4933
>>> print(counter(movie_data, True))
4932
```

If we are concerned about the readability of our code by co-workers, we can name the optional argument as well:

```python
>>> print(counter(movie_data, header_row = True))
4932
```

This way, if there are multiple optional arguments and you want to provide only a later one, you can name it and guarantee that Python understands which input corresponds to which argument.
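To illustrate the case of several optional arguments, the sketch below uses a hypothetical variant of the counter. The function name `flexible_counter`, the second optional argument `footer_row`, and the sample `rows` list are all invented for this example and are not part of the lesson.

```python
def flexible_counter(input_lst, header_row=False, footer_row=False):
    # footer_row is a hypothetical second optional argument, added for illustration
    num_elt = 0
    if header_row:
        input_lst = input_lst[1:]    # drop the first row
    if footer_row:
        input_lst = input_lst[:-1]   # drop the last row
    for each in input_lst:
        num_elt = num_elt + 1
    return num_elt

rows = [["header"], ["a"], ["b"], ["footer"]]

print(flexible_counter(rows))                    # 4
print(flexible_counter(rows, header_row=True))   # 3
print(flexible_counter(rows, footer_row=True))   # 3, header_row keeps its default
print(flexible_counter(rows, True, True))        # 2
```

Naming `footer_row` lets us set the second optional argument without also having to supply a value for `header_row`.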


**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


1. Write a function named **feature_counter()** that combines the logic of the **index_equals_str()** and **counter()** functions.
2. Use this function to find out how many of the movies were made in the USA, and store the value in **num_of_us_movies**.





In [None]:
# put your code here

## Calling a Function inside another Function

Now we have all the tools we need to create the statistics summary function we described at the beginning of this section. However, we would like **summary_statistics()** to be a function itself, and rewriting all of the code from **feature_counter()** inside **summary_statistics()** would defeat the purpose of using a function. You may remember that one of the big advantages of using a function is **abstraction**: it saves us from having to write the same code twice. In this vein, the last feature of functions that we will use is the ability to call a function inside another function.

The body of one function can include a call to another function. We have already seen something similar, because comparison operators such as the equality operator (**==**) and arithmetic operators such as plus (**+**) and minus (**-**) behave like functions as well: they take inputs and produce a result. Similarly, we can call built-in or user-created functions and assign their return values to a variable in the outer function, so that we can use that variable in the rest of the function.
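To make the comparison between operators and functions concrete, here is a small sketch using Python's standard **operator** module, which exposes the same operations as ordinary functions. The function **sum_is_equal()** is a hypothetical example, not part of the movie code:

```python
import operator

# The + and == operators have function equivalents in the standard library.
total = operator.add(2, 3)      # same as 2 + 3
same = operator.eq(total, 5)    # same as total == 5

# A user-defined function whose body calls other functions and
# assigns their return values to variables.
def sum_is_equal(a, b, expected):
    result = operator.add(a, b)
    return operator.eq(result, expected)

print(total)
print(same)
print(sum_is_equal(2, 3, 6))
```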

Let's say we want to build a function **list_counter()** that will count the elements in multiple lists, and make a separate list holding these values. This is how we want the function to operate:

```python
>>> lists = [["dog","cat","rabbit"],[1,2,3,4],[True]]
>>> list_count = (list_counter(lists))
>>> print(list_count)
[3, 4, 1]
```

Even though this seems like a complicated problem, because we have a counter function, it will not take more than 6 lines:

```python
def list_counter(input_lst):
    final_list = []
    for each in input_lst:
        num_elt = counter(each)
        final_list.append(num_elt)
    return final_list
```

As you can see, we called the user-defined function **counter()** and assigned its return value to **num_elt**. Each time the for loop starts, the counter will be called with a different argument (the current value assigned to **each**), and return a different value. Whenever we define a new function, we can call it inside another function using this syntax.

Similarly, we can make use of **feature_counter()** when we are building our final function. Now you are ready to build **summary_statistics()**!


**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


1. Write a **summary_statistics()** function that will take **movie_data** as input, and output a dictionary that will give useful numbers from the data.
  - Define **summary_statistics()** with one argument, an input list.
  - Use **feature_counter()** with the relevant arguments to count the following properties and assign the results to the corresponding variables.
    - Assign the number of movies made in **Japan** to **num_japan_films**.
    - Assign the number of movies in **color** to **num_color_films**.
    - Assign the number of movies in **English** to **num_films_in_english**.
  - Create a dictionary that associates the keys (**japan_films**, **color_films**, **films_in_english**) with the corresponding variables.
  - Return the dictionary.
2. Call the function with **movie_data** as its input, and store its value in **summary**.




In [None]:
def feature_counter(input_lst, index, input_str, header_row=False):
    num_elt = 0
    if header_row:
        input_lst = input_lst[1:]
    for each in input_lst:
        if each[index] == input_str:
            num_elt = num_elt + 1
    return num_elt
  
# put your code here

# Debugging Errors


## Types of Errors

As we begin to take greater advantage of functions to organize our code, it can become more complex. We need to better understand the kinds of mistakes we can make when writing it. We've talked briefly about errors, or mistakes that prevent our code from working as we expect, in previous missions. Now it's time to learn more about them.

The two main types of errors are:

- Syntax errors
- Runtime errors

Before code can be run, it must be parsed by the Python interpreter and organized into a data structure that represents the flow and complexity of the code we wrote. If the interpreter encounters any code that doesn't adhere to Python's language rules, it halts the parsing and reports an error. Rich coding environments like [Atom](https://atom.io/) and [Jupyter Notebook](https://jupyter.org/) help us prevent these types of errors through syntax highlighting, a feature that displays different parts of our code (such as brackets) in different colors. Syntax highlighting makes it easier to read our code and spot errors. Some examples of syntax errors include:

- Missing ending quotes or starting quotes
- Using improper indentation
- Using improper keywords

Runtime errors only occur when the code is actually running, which makes them harder to catch and prevent beforehand. Some examples of runtime errors include:

- Calling a function before it's defined
- Calling a method or attribute that the object doesn't contain
- Attempting to convert a value to an incompatible data type
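The difference between the two kinds of errors can be sketched with the built-in **compile()** function, which parses source code without running it. The source strings below are made up for illustration:

```python
# compile() only parses the source; it does not run it.
bad_syntax = 'the_answer = "42'
try:
    compile(bad_syntax, "<example>", "exec")
    syntax_ok = True
except SyntaxError:
    syntax_ok = False

# This source parses without complaint, even though say_hello() is
# called before it is defined. The problem only appears when it runs.
runs_too_early = "say_hello()\ndef say_hello():\n    print('hello')\n"
code = compile(runs_too_early, "<example>", "exec")
try:
    exec(code, {})
    runtime_error = None
except NameError as err:
    runtime_error = type(err).__name__

print(syntax_ok)
print(runtime_error)
```

The first snippet is rejected during parsing, while the second one is only rejected once Python actually tries to execute the call.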

Let's explore some more specific examples of these types of errors.



## Syntax Errors

Here's a simple example of a syntax error:

```python
# Missing ending quotes.
the_answer = "42
```

When we run the above line of code, we'll get back an error message describing the mistake we made, as well as the Python interpreter's best guess as to where it occurred:



In [None]:
the_answer = "42

SyntaxError: ignored



Python uses the **SyntaxError** class to represent syntax errors, and displays the error message after the colon. The interpreter may sometimes struggle to pinpoint the problematic code that caused the error. In the following code block, for example, we attempt to define the **find()** function, but misspell the **def** keyword as **de**:



In [None]:
# `def` keyword misspelled as `de`.
de find():
    print("42")

SyntaxError: ignored


The error message suggests that the mistake was in the function name, rather than the **def** keyword.


Sometimes the Python interpreter will return an **IndentationError** instead of a **SyntaxError**. This object represents a more specific kind of syntax error, which makes it easier to debug our code. We'll see an **IndentationError** when the indentation in our code is inconsistent. Here's an example:

```python
def find():
    print("42")
     print("what, really?")
```

Notice that the second print statement is indented differently than the first one (with one extra space). Extra indentation is only allowed when it starts a new code block, such as the body of an if statement or for loop, and that isn't the case here. Running this code returns the following error:



In [None]:
def find():
    print("42")
     print("what, really?")

IndentationError: ignored
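As a small sketch of how the two classes relate, the code below checks that **IndentationError** is a subclass of **SyntaxError** and parses the same function with the built-in **compile()** function to see which error class and line number are reported:

```python
# IndentationError is a more specific kind of SyntaxError.
is_subclass = issubclass(IndentationError, SyntaxError)

source = 'def find():\n    print("42")\n     print("what, really?")\n'
try:
    compile(source, "<example>", "exec")
    caught = None
    line_number = None
except SyntaxError as err:      # this clause also catches IndentationError
    caught = type(err).__name__
    line_number = err.lineno

print(is_subclass)
print(caught, line_number)
```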



Let's practice debugging and fixing syntax errors in the **first_elts()** function from the previous section.

**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

1. The function **first_elts()** contains multiple syntax errors. Scan and edit the code to resolve these errors.

In [None]:
def first_elts (input_lst):
elts = []
    for each in input_lst:
        elts.append(each0)
    retern elts

animals = [["dog","cat","rabbit"],["turtle","snake"],["sloth","penguin","bird"]]
first_animal = first_elts(animals)
print(first_animal)


IndentationError: ignored

## Runtime Errors

**Runtime errors** are very common. While code often works in predictable situations, runtime errors occur when it fails at handling a case the programmer didn't account for.

Because runtime errors occur when our code is running and can't be detected during parsing, they're more difficult to prevent than syntax errors. As you become more proficient in programming, however, you'll learn to identify potential runtime errors beforehand and prevent them from occurring. Python and most other programming languages include tools like error handling and automated tests that help you manage and reduce runtime errors. As your code becomes more complex, you'll learn how to incorporate error handling into the functions you write so that they fail gracefully and prevent certain negative behavior from occurring.
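As a minimal sketch of what failing gracefully can look like, the hypothetical helper **to_float()** below uses Python's **try** and **except** statements to catch a **ValueError** (a runtime error described later in this section) and return a fallback value instead of stopping the program:

```python
# Hypothetical helper: convert a string to a float, or fall back to a default.
def to_float(text, default=None):
    try:
        return float(text)
    except ValueError:
        # The string doesn't represent a number, so fail gracefully.
        return default

print(to_float("3.5"))
print(to_float("guardians"))
print(to_float("guardians", default=0.0))
```

Whether returning a default is appropriate depends on the task; in some cases it is better to let the error surface so the problem isn't hidden.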

[The documentation for Python](https://docs.python.org/3/library/exceptions.html#IndentationError) includes a full list of possible errors. If you glance at the hierarchy of errors [here](https://docs.python.org/3/library/exceptions.html#exception-hierarchy), the **SyntaxError** class is a small fraction of the entire tree of possibilities; the rest are runtime errors. Let's look at some examples of runtime errors.

## TypeError and ValueError

In the following code, we try to add a string to an integer. This raises a **TypeError** because Python doesn't define the **+** operator between an integer and a string.

In [None]:
forty_two = 42
forty_two + "42"

TypeError: ignored

You may have noticed that runtime errors look a little different. For example, the error appears twice (**TypeError** is in the top left and bottom left corners). In addition, the text **Traceback (most recent call last)** appears at the top right corner. The traceback displays all the code that was executed, **ending with the most recent call that actually caused the error**. While the code in this example is very simple, we'll explore some errors where the traceback is more useful later in this mission.
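Returning to the **TypeError** itself, one way to avoid it is to convert one of the values explicitly so that both operands have the same type. This is only a sketch; which conversion is right depends on whether we want text or a number:

```python
forty_two = 42

as_text = str(forty_two) + "42"      # string concatenation
as_number = forty_two + int("42")    # integer addition

print(as_text)
print(as_number)
```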

Another common runtime error is the **ValueError**, which is generated when the type is correct but the value is still improper. A **ValueError** is returned when we try to convert a string representing a non-numeric value into a numeric type, such as a float. Recall that we use the **float()** function to cast, or convert, a value to a float:



In [None]:
float("guardians")

ValueError: ignored

While trying to cast a string to a float isn't automatically an issue (which is why there was no TypeError), the specific value that we tried to cast was problematic. The **float()** function didn't know how to cast **"guardians"** into a float, and returned an instance of **ValueError** instead.
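The distinction between the two errors can be sketched with a hypothetical helper, **classify_float_error()**, that reports which error (if any) **float()** raises for a given input. A list is the wrong type for **float()**, while **"guardians"** is the right type with an unusable value:

```python
# Hypothetical helper that reports which error float() raises, if any.
def classify_float_error(value):
    try:
        float(value)
        return "ok"
    except TypeError:
        return "TypeError"
    except ValueError:
        return "ValueError"

print(classify_float_error("3.14"))
print(classify_float_error("guardians"))
print(classify_float_error([1, 2]))
```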



## IndexError and AttributeError

The **IndexError** is a common error that's raised when we try to access an index that doesn't exist in a list. Trying to access the fifth element in a list containing only three elements would raise this error, for example. Here's what this looks like:



In [None]:
lives = [1,2,3]
lives[4]

IndexError: ignored

Since there's no value at index four, an **IndexError** is returned, along with an arrow pointing to the problematic line of code. If we're working with a list and don't know its length, we can use the **len()** function to look up the number of elements before attempting to access them.
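A short sketch of that check, using the same **lives** list (the variable names **index**, **value**, and **last_life** are introduced here for illustration):

```python
lives = [1, 2, 3]
index = 4

# Only access the element if the index is within the list's length.
if index < len(lives):
    value = lives[index]
else:
    value = None

# The last valid index is always len(lives) - 1.
last_life = lives[len(lives) - 1]

print(value)
print(last_life)
```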

The final runtime error we'll explore is the **AttributeError.** This occurs when we try to call a method or attribute on an object that doesn't contain it. In the following code, we try to call the **split()** method on the **File handler** instance, instead of using the **read()** method to read it into a string first:



In [None]:
f = open("story.txt")
f.split(" ")

AttributeError: ignored

**TextIOWrapper** is a built-in Python object that represents the File handler. It does not contain the **split()** method. Since the Python interpreter couldn't find the **split()** method within the **TextIOWrapper** class, it returned an instance of **AttributeError.**
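The sketch below shows one way to confirm this with the built-in **hasattr()** function, and then reads the file into a string before splitting. To keep the example self-contained it writes its own small temporary file; the contents are made up and are not the course's **story.txt**:

```python
import os
import tempfile

# Create a small throwaway file so the example runs on its own.
path = os.path.join(tempfile.mkdtemp(), "story.txt")
with open(path, "w") as f:
    f.write("once upon a time")

with open(path) as f:
    handler_can_split = hasattr(f, "split")   # file objects lack split()
    story = f.read()                          # read the file into a string

string_can_split = hasattr(story, "split")    # strings do have split()
words = story.split(" ")

print(handler_can_split, string_can_split)
print(words)
```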


**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

1. Edit the default code and fix the errors:

- Access the first element in **lives** (instead of the fifth) and assign it to a new variable **first_life**.
- Use the **read()** method to read the file **story.txt** into a string named **story**.
- Use the **split()** method to split the **story** variable into strings separated by spaces and assign the result to **split_story**.
- Display **first_life** and **story**.

In [None]:
# Default code containing errors
lives = [1,2,3]
lives[4]

f = open("story.txt")
f.split(" ")

IndexError: ignored

## Traceback

When we call a function that uses other functions, our function calls become nested. This can make debugging harder, since the code that triggered the error is usually inside one of the called functions rather than in the line we wrote to make the call. In the following example, we try to call **summary_statistics()**, but one of the functions it calls has a problem.


<img width="900" src="https://drive.google.com/uc?export=view&id=1e545kmxFlT0iTLUAvAB4VXTTGmOCSUxW">


The traceback shows the series of function calls that occurred. The topmost function call is the highest level of code we wrote, and oftentimes the section we need to fix. The last function call is where the error actually occurred.
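The ordering of a traceback can also be inspected in code with the standard **traceback** module. In this sketch, **inner()**, **outer()**, and **call_chain()** are hypothetical functions standing in for nested calls like the ones above:

```python
import traceback

def inner():
    return [1, 2, 3][4]       # raises IndexError

def outer():
    return inner()

def call_chain():
    try:
        outer()
    except IndexError as err:
        # extract_tb() lists the frames from the oldest call to the newest.
        frames = traceback.extract_tb(err.__traceback__)
        return [frame.name for frame in frames]

chain = call_chain()
print(chain)
```

The first name in the list is the function where the call started, and the last name is the function where the error was actually raised.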


**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

1. Edit the default code and remove the error.

In [None]:
def summary_statistics(input_lst):
    num_japan_films = feature_counter(input_lst,6,"Japan",True)
    num_color_films = feature_counter(input_lst,2,"Color",True)
    num_films_in_english = feature_counter(input_lst,5,"English",True)
    summary_dict = {"japan_films" : num_japan_films, "color_films" : num_color_films, "films_in_english" : num_films_in_english}
    return summary_dict

def feature_counter(input_lst,index, input_str, header_row = False):
    num_elt = 0
    if header_row == Treu:
        input_lst = input_lst[1:len(input_lst)]
    for each in input_lst:
        if each[index] == input_str:
            num_elt = num_elt + 1
    return num_elt

summary = summary_statistics(movie_data)
print(summary)

# Challenge - Birth Dates In The United States



## Introduction to the dataset

This challenge uses the raw data behind the story *Some People Are Too Superstitious To Have A Baby On Friday The 13th*, which you can read [here](https://fivethirtyeight.com/features/some-people-are-too-superstitious-to-have-a-baby-on-friday-the-13th/).

We'll be working with the data set from the Centers for Disease Control and Prevention's National Center for Health Statistics (**"births.csv"**). The data set has the following structure:

- **year** - Year (1994 to 2003)
- **month** - Month (1 to 12)
- **date_of_month** - Day number of the month (1 to 31)
- **day_of_week** - Day of week, where 1 is Monday and 7 is Sunday
- **births** - Number of births


**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

1. Read the CSV file **"births.csv"** into a string.
2. Split the string on the newline character (**"\n"**).
3. Display the first 10 values in the resulting list.

## Converting Data Into A List Of Lists

While a list of strings helps us get a general picture of the dataset, we need to convert it to a more structured format to be able to analyze it. Specifically, we need to convert the dataset into a list of lists where each nested list contains integer values (not strings). We also need to remove the header row.

Here's what we want the data to look like:

```python
[ 
  [1994, 1, 1, 6, 8096],
  [1994, 1, 2, 7, 7772],
  [1994, 1, 3, 1, 10142],
  [1994, 1, 4, 2, 11248],
  [1994, 1, 5, 3, 11053],
...
]
```
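As a hint for the conversion step, here is a sketch that converts a single row string into a list of integers. It uses the first row shown above and assumes each row is a comma-separated string of whole numbers:

```python
row = "1994,1,1,6,8096"

# Split the row into strings, then convert each string to an integer.
string_fields = row.split(",")
int_fields = []
for field in string_fields:
    int_fields.append(int(field))

print(int_fields)
```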

**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

1. Create a function named **read_csv()** that:
  - Takes a single, required argument, a string representing the file name of the CSV file.
  - Reads the file into a string, splits the string on the newline character (**"\n"**), and removes the header row. Assign this list to **string_list** and create an empty list named **final_list**.
  - Uses a **for loop** to:
    - Iterate over **string_list**,
    - Create an empty list named **int_fields**,
    - Split each row on the comma delimiter (**,**) and assign the resulting list to **string_fields**,
    - Convert each value in **string_fields** to an integer and append it to **int_fields**,
    - Append **int_fields** to **final_list**.
  - Returns **final_list**.
2. Use the **read_csv()** function to read in the file **"births.csv"** and assign the result to **cdc_list**.
3. Display the first 10 rows of **cdc_list** to confirm it's a list of lists, containing only integer values, and no header row.

## Calculating Number Of Births Each Month

Now that the data is in a more usable format, we can start to analyze it. Let's calculate the total number of births that occurred in each month, across all of the years in the dataset. We'll create a dictionary where each key is a unique month and each value is the number of births that happened in that month, across all years:

```python
{  
   1: 3232517,
   2: 3018140,
   3: 3322069,
   4: 3185314,
   5: 3350907,
   6: 3296530,
   7: 3498783,
   8: 3525858,
   9: 3439698,
   10: 3378814,
   11: 3171647,
   12: 3301860
}
```
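Building a dictionary of totals like this follows a common accumulation pattern. The sketch below shows the pattern on made-up data (the **pairs** list and its values are not from the births dataset): if a key is already in the dictionary we add to its value, otherwise we create it.

```python
# Made-up [key, amount] pairs for illustration.
pairs = [["a", 2], ["b", 5], ["a", 3]]

totals = {}
for pair in pairs:
    key = pair[0]
    amount = pair[1]
    if key in totals:
        totals[key] = totals[key] + amount   # key seen before: add to it
    else:
        totals[key] = amount                 # first time: create the key

print(totals)
```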

**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


1. Create a function named **month_births()** that:
  - Takes a single, required argument, a list of lists.
  - Creates an empty dictionary, **births_per_month**, to store the monthly totals.
  - Uses a for loop to:
    - Iterate over the list of lists,
    - Extract the value in the month and births columns,
    - If the month value already exists as a key in **births_per_month**, the births value is added to the existing value,
    - If the month value doesn't exist as a key in **births_per_month**, it's created and the associated value is the births value.
  - After the loop, return the **births_per_month** dictionary.
2. Use the **month_births()** function to calculate the monthly totals for the dataset and assign the result to **cdc_month_births**. Display the dictionary.

## Calculating Number Of Births Each Day Of Week

Let's now create a function that calculates the total number of births for each unique day of the week. Here's what we want the dictionary to look like:

```python
{
  1: 5789166,
  2: 6446196,
  3: 6322855,
  4: 6288429,
  5: 6233657,
  6: 4562111,
  7: 4079723
}
```

**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


1. Create a function named **dow_births()** that takes a single, required argument (a list of lists) and returns a dictionary containing the total number of births for each unique value of the **day_of_week** column.
2. Use the **dow_births()** function to return the **day-of-week** totals for the dataset and assign the result to **cdc_day_births**. Display the dictionary.

## Creating A More General Function

You may have noticed that there was a lot of similarity between the two functions you just wrote. While we can also create separate functions to calculate the totals for the **year** and **date_of_month** columns, it's better to create a single function that works for any column and specify the column we want as a parameter each time we call the function.

**Exercise**

<left><img align="left" width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


1. Create a function named **calc_counts()** that:
  - Takes two required parameters:
    - **data**: a list of lists
    - **column**: the column number we want to calculate the totals for
  - Populates and returns a dictionary containing the total number of births for each unique value in the column at position **column**.
2. Use the **calc_counts()** function to:
  - Return the yearly totals for the dataset and assign the result to **cdc_year_births**.
  - Return the monthly totals for the dataset and assign the result to **cdc_month_births**.
  - Return the day-of-month totals for the dataset and assign the result to **cdc_dom_births**.
  - Return the day-of-week totals for the dataset and assign the result to **cdc_dow_births**.

## Next Steps

That's it for the challenge. Here are some suggestions for next steps:

1. Write a function that can calculate the **min** and **max** values for any dictionary that's passed in.
2. Write a function that extracts the same values across years and calculates the differences between consecutive values to show whether the number of births is increasing or decreasing.
  - For example, how did the number of births on Saturday change each year between 1994 and 2003?
3. Find a way to combine the CDC data with the SSA data, which you can find [here](https://github.com/fivethirtyeight/data/tree/master/births). Specifically, brainstorm ways to deal with the overlapping time periods in the datasets.