# Debugging Python code


Go through the exercises below.

- Each exercise contains some code with **one or more mistakes**.
- A mistake may raise an error, or it may silently produce a wrong result.
- There might be multiple ways to fix the mistakes.
- Improving the readability of the code is also encouraged.

In [1]:
# data creation
beatles = ["John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr"]

numbers = [1, 2, 3, 4, 5]

capitals = {"Germany": "Berlin",
            "Russia": "Moscow",
            "France": "Paris",
            "China": "Beijing",
            "Egypt": "Cairo",
            "Brazil": "Sao Paulo"
            }

top_profitable_films = {
    "Film": ["Avengers: Endgame", "Avatar", "Titanic", "Star Wars: The Force Awakens", "Jurassic World",
             "The Lion King", "The Avengers", "Frozen II", "Frozen", "Beauty and the Beast"],
    "Year": ["2019", "2007", "1997", "2015", "2015", "2019", "2012", "2019", "2013", "2017"],
    "Worldwide Gross (in billions)": ["2.798", "2.789", "2.194", "2.073", "1.673", "1.656", "1.519",
                                      "1.450", "1.276", "1.263"]
    }

**Logical error**: the capital of Brazil is actually Brasilia, not Sao Paulo!

In [3]:
capitals["Brazil"] = "Brasilia"

In [4]:
capitals

{'Germany': 'Berlin',
 'Russia': 'Moscow',
 'France': 'Paris',
 'China': 'Beijing',
 'Egypt': 'Cairo',
 'Brazil': 'Brasilia'}

## Exercise 1:

In [5]:
for c in Capitals.keys():
  print(f"{c} is the capital of {Capitals[c]}.")

NameError: name 'Capitals' is not defined

**Solution:**
1. The `NameError` with the message `name 'Capitals' is not defined` occurs because Python names are case-sensitive. The variable we defined is `capitals` (lowercase), so `Capitals` does not exist.

In [6]:
for c in capitals.keys():
  print(f"{c} is the capital of {capitals[c]}.")

Germany is the capital of Berlin.
Russia is the capital of Moscow.
France is the capital of Paris.
China is the capital of Beijing.
Egypt is the capital of Cairo.
Brazil is the capital of Brasilia.


2. We have to switch the country and the capital for the sentences to make sense.

In [7]:
for c in capitals.keys():
  print(f"{capitals[c]} is the capital of {c}.")

Berlin is the capital of Germany.
Moscow is the capital of Russia.
Paris is the capital of France.
Beijing is the capital of China.
Cairo is the capital of Egypt.
Brasilia is the capital of Brazil.


3. Optionally, we can give the loop variable a more meaningful name so that the code is easier to understand.

In [8]:
for country in capitals.keys():
  print(f"{capitals[country]} is the capital of {country}.")

Berlin is the capital of Germany.
Moscow is the capital of Russia.
Paris is the capital of France.
Beijing is the capital of China.
Cairo is the capital of Egypt.
Brasilia is the capital of Brazil.
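
A further readability option, sketched here under the assumption that we only need the country and capital pairs: `dict.items()` yields both values at once, so no dictionary lookup is needed inside the loop. The names `capitals_sample` and `sentences` are illustrative and not part of the exercise.

```python
# Illustrative subset of the `capitals` dictionary defined above
capitals_sample = {"Germany": "Berlin", "France": "Paris", "Brazil": "Brasilia"}

# items() yields (key, value) pairs, so each iteration unpacks country and capital together
sentences = [
    f"{capital} is the capital of {country}."
    for country, capital in capitals_sample.items()
]

for sentence in sentences:
    print(sentence)
```

Unpacking into `country` and `capital` also makes it harder to swap the two by accident, which was the logical mistake in this exercise.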


## Exercise 2:
Let's imagine we want to show our love for Ringo Starr by printing a love statement for him once for each element of the `numbers` list. For every Beatle who is not Ringo, we want to print a hate statement the same number of times. The output should look like this:

```
I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!
```



In [9]:
for beatle in beatles:
  if beatle = "Ringo Starr":
    for n in numbers:
      print(f"I love {beatle}!")
  if beatle != "Ringo Starr":
    print(f"I hate {beatle}!")
      print("\n")

SyntaxError: invalid syntax. Maybe you meant '==' or ':=' instead of '='? (<ipython-input-9-5581e587c023>, line 2)

**Solution:**

1. A single `=` sign is the assignment operator. To compare two values for equality, we need the comparison operator `==`:

In [10]:
for beatle in beatles:
  if beatle == "Ringo Starr":
    for n in numbers:
      print(f"I love {beatle}!")
  if beatle != "Ringo Starr":
    print(f"I hate {beatle}!")
      print("\n")

IndentationError: unexpected indent (<ipython-input-10-8ec2b6eca587>, line 7)

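2. The `print("\n")` statement needs to be properly indented: we align it with the two `if` statements so that it runs once per Beatle. The hate statement also has to be printed once per element of `numbers`, so we wrap it in its own `for n in numbers:` loop.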

In [11]:
for beatle in beatles:
  if beatle == "Ringo Starr":
    for n in numbers:
      print(f"I love {beatle}!")
  if beatle != "Ringo Starr":
    for n in numbers:
      print(f"I hate {beatle}!")
  print("\n")

I hate John Lennon!
I hate John Lennon!
I hate John Lennon!
I hate John Lennon!
I hate John Lennon!


I hate Paul McCartney!
I hate Paul McCartney!
I hate Paul McCartney!
I hate Paul McCartney!
I hate Paul McCartney!


I hate George Harrison!
I hate George Harrison!
I hate George Harrison!
I hate George Harrison!
I hate George Harrison!


I love Ringo Starr!
I love Ringo Starr!
I love Ringo Starr!
I love Ringo Starr!
I love Ringo Starr!




3. We want to iterate through `numbers` first, and then through `beatles`.

In [12]:
for n in numbers:
  for beatle in beatles:
    if beatle == "Ringo Starr":
      print(f"I love {beatle}!")
    if beatle != "Ringo Starr":
      print(f"I hate {beatle}!")
  print("\n")

I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!




4. Optionally, we can replace `if beatle != "Ringo Starr":` with `else:`, since the two conditions are mutually exclusive. This makes the code simpler and more elegant:

In [13]:
for n in numbers:
  for beatle in beatles:
    if beatle == "Ringo Starr":
      print(f"I love {beatle}!")
    else:
      print(f"I hate {beatle}!")
  print("\n")

I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!


I hate John Lennon!
I hate Paul McCartney!
I hate George Harrison!
I love Ringo Starr!
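
One more optional refactoring, given only as a sketch: the choice between "love" and "hate" can be moved into a small function, which keeps the nested loops short. `statement_for` and its `favourite` parameter are hypothetical names introduced for this illustration. The underscore as a loop variable signals that the values in `numbers` are never used, only how many there are.

```python
beatles = ["John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr"]
numbers = [1, 2, 3, 4, 5]


def statement_for(beatle, favourite="Ringo Starr"):
    """Return a love statement for the favourite Beatle and a hate statement otherwise."""
    feeling = "love" if beatle == favourite else "hate"
    return f"I {feeling} {beatle}!"


for _ in numbers:  # one block of statements per element of `numbers`
    for beatle in beatles:
        print(statement_for(beatle))
    print("\n")
```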




## Exercise 3:

In [14]:
top_profitable_films = pd.DataFrame(top_profitable_films)
top_profitable_films.head

NameError: name 'pd' is not defined

**Solution:**

1. We have not imported pandas yet, hence the `NameError` for `pd`.



In [2]:
import pandas as pd
top_profitable_films = pd.DataFrame(top_profitable_films)
top_profitable_films.head

<bound method NDFrame.head of                            Film  Year Worldwide Gross (in billions)
0             Avengers: Endgame  2019                         2.798
1                        Avatar  2007                         2.789
2                       Titanic  1997                         2.194
3  Star Wars: The Force Awakens  2015                         2.073
4                Jurassic World  2015                         1.673
5                 The Lion King  2019                         1.656
6                  The Avengers  2012                         1.519
7                     Frozen II  2019                         1.450
8                        Frozen  2013                         1.276
9          Beauty and the Beast  2017                         1.263>

2. It is not good practice to overwrite the variable that holds the original data when creating a new version of it. We want to preserve the dictionary `top_profitable_films`, so we give the DataFrame its own name.

In [2]:
import pandas as pd
top_films_df = pd.DataFrame(top_profitable_films)
top_films_df.head

<bound method NDFrame.head of                            Film  Year Worldwide Gross (in billions)
0             Avengers: Endgame  2019                         2.798
1                        Avatar  2007                         2.789
2                       Titanic  1997                         2.194
3  Star Wars: The Force Awakens  2015                         2.073
4                Jurassic World  2015                         1.673
5                 The Lion King  2019                         1.656
6                  The Avengers  2012                         1.519
7                     Frozen II  2019                         1.450
8                        Frozen  2013                         1.276
9          Beauty and the Beast  2017                         1.263>

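3. `head()` is a method, so it needs parentheses to be called. Without them, the notebook displays the method object itself (the `<bound method NDFrame.head of ...>` output above) instead of the first five rows.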


In [17]:
import pandas as pd
top_films_df = pd.DataFrame(top_profitable_films)
top_films_df.head()

| | Film | Year | Worldwide Gross (in billions) |
|---|---|---|---|
| 0 | Avengers: Endgame | 2019 | 2.798 |
| 1 | Avatar | 2007 | 2.789 |
| 2 | Titanic | 1997 | 2.194 |
| 3 | Star Wars: The Force Awakens | 2015 | 2.073 |
| 4 | Jurassic World | 2015 | 1.673 |
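
The difference between naming a method and calling it holds for any Python object, not only for pandas. A minimal sketch using a plain list (the variable names `method_object` and `call_result` are illustrative):

```python
beatles = ["John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr"]

# No parentheses: this is only a reference to the method, nothing is executed
method_object = beatles.copy

# Parentheses: the method runs and returns a new list
call_result = beatles.copy()

print(callable(method_object))     # the method object can still be called later
print(type(call_result).__name__)  # the call produced an actual list
```

When a cell shows `<bound method ...>` or `<built-in method ...>` instead of data, missing parentheses are a likely cause.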


## Exercise 4:

In [6]:
top_films_df

| | Film | Year | Worldwide Gross (in billions) |
|---|---|---|---|
| 0 | Avengers: Endgame | 2019 | 2.798 |
| 1 | Avatar | 2009 | 2.789 |
| 2 | Titanic | 1997 | 2.194 |
| 3 | Star Wars: The Force Awakens | 2015 | 2.073 |
| 4 | Jurassic World | 2015 | 1.673 |
| 5 | The Lion King | 2019 | 1.656 |
| 6 | The Avengers | 2012 | 1.519 |
| 7 | Frozen II | 2019 | 1.45 |
| 8 | Frozen | 2013 | 1.276 |
| 9 | Beauty and the Beast | 2017 | 1.263 |

We want to set the release year of Avatar to 2009 (the original data says 2007):


In [3]:
top_films_df[top_films_df["Film"]=="Avatar"]["Year"] = "2009"

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  top_films_df[top_films_df["Film"]=="Avatar"]["Year"] = "2009"


**Solution:**

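In general, when selecting rows and columns from a DataFrame, and always when overwriting a subset of its values, use a single `.loc[]` call instead of chained `[]` indexing. The chained form `df[mask]["Year"] = ...` assigns to a temporary copy, so the original DataFrame is left unchanged.

When fixing the infamous `SettingWithCopyWarning` (`A value is trying to be set on a copy of a slice from a DataFrame.`), it's better to have a fresh start, so create the DataFrame again.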

In [11]:
top_profitable_films

| | Film | Year | Worldwide Gross (in billions) |
|---|---|---|---|
| 0 | Avengers: Endgame | 2019 | 2.798 |
| 1 | Avatar | 2009 | 2.789 |
| 2 | Titanic | 1997 | 2.194 |
| 3 | Star Wars: The Force Awakens | 2015 | 2.073 |
| 4 | Jurassic World | 2015 | 1.673 |
| 5 | The Lion King | 2019 | 1.656 |
| 6 | The Avengers | 2012 | 1.519 |
| 7 | Frozen II | 2019 | 1.45 |
| 8 | Frozen | 2013 | 1.276 |
| 9 | Beauty and the Beast | 2017 | 1.263 |


In [5]:
# Fresh start: recreate the DataFrame from the original dictionary
top_films_df = pd.DataFrame(top_profitable_films)

top_films_df.loc[top_films_df["Film"]=="Avatar", "Year"] = "2009"

In [7]:
top_films_df.head(2)

| | Film | Year | Worldwide Gross (in billions) |
|---|---|---|---|
| 0 | Avengers: Endgame | 2019 | 2.798 |
| 1 | Avatar | 2009 | 2.789 |
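
The contrast between the two forms of assignment can be reproduced on a tiny DataFrame. This is a minimal sketch, assuming pandas is installed. The `films` DataFrame and the names `is_avatar`, `year_after_chained` and `year_after_loc` are made up for the illustration, and the warning is silenced only to keep the demonstration output clean.

```python
import warnings

import pandas as pd

films = pd.DataFrame({"Film": ["Avatar", "Titanic"], "Year": ["2007", "1997"]})
is_avatar = films["Film"] == "Avatar"

# Chained indexing: films[is_avatar] is a temporary copy, so the assignment is lost
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the pandas warning for this demonstration only
    films[is_avatar]["Year"] = "2009"
year_after_chained = films.loc[is_avatar, "Year"].iloc[0]

# A single .loc call selects rows and column in one step and writes to `films` itself
films.loc[is_avatar, "Year"] = "2009"
year_after_loc = films.loc[is_avatar, "Year"].iloc[0]

print(year_after_chained, year_after_loc)
```

The first assignment leaves the year untouched, while the `.loc` version updates it.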


## Exercise 5:

We want to get the average worldwide gross of all films:

In [21]:
top_films_df["Worldwide Gross (in billions)"].avg()

AttributeError: 'Series' object has no attribute 'avg'

**Solution:**

1. The `AttributeError: 'Series' object has no attribute 'avg'` tells us that the method we used does not exist for a pandas column (which is a `Series`). A quick search shows that the method we need is `mean()`:

In [22]:
top_films_df["Worldwide Gross (in billions)"].mean()

TypeError: Could not convert 2.7982.7892.1942.0731.6731.6561.5191.4501.2761.263 to numeric

2. From the `TypeError` and its message `Could not convert 2.7982.7892.1942.0731.6731.6561.5191.4501.2761.263 to numeric` we understand that these values are stored as strings rather than numbers, so pandas cannot compute their mean. Let's change the data type. For a one-off result, we can convert on the fly:

In [None]:
top_films_df["Worldwide Gross (in billions)"].astype(float).mean().round(3)

To make the change permanent, we convert the column and assign it back:

In [23]:
# Assigning by column name replaces the whole column, so its dtype becomes float64
top_films_df["Worldwide Gross (in billions)"] = pd.to_numeric(top_films_df["Worldwide Gross (in billions)"])
top_films_df["Worldwide Gross (in billions)"].mean()

1.8691000000000002

We can confirm that the column is now numeric:

In [25]:
top_films_df["Worldwide Gross (in billions)"]

0    2.798
1    2.789
2    2.194
3    2.073
4    1.673
5    1.656
6    1.519
7    1.450
8    1.276
9    1.263
Name: Worldwide Gross (in billions), dtype: float64

In [26]:
top_films_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Film                           10 non-null     object 
 1   Year                           10 non-null     object 
 2   Worldwide Gross (in billions)  10 non-null     float64
dtypes: float64(1), object(2)
memory usage: 368.0+ bytes
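
The odd-looking value in the `TypeError` message is most likely what you get when the strings are added together while the sum is computed. A plain-Python sketch of the same effect, using only the first three values (the variable names are illustrative):

```python
from statistics import mean

gross_as_text = ["2.798", "2.789", "2.194"]

# For strings, + means concatenation, which yields one long "number"
concatenated = gross_as_text[0] + gross_as_text[1] + gross_as_text[2]
print(concatenated)

# After converting each string to float, + is arithmetic addition and a mean can be computed
gross_as_numbers = [float(value) for value in gross_as_text]
average = round(mean(gross_as_numbers), 3)
print(average)
```

Numbers stored as text are a common cause of this kind of error, so checking the column types (for example with `info()`) is a useful first step whenever a numeric method fails.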