# Solutions to Lesson 2 Exercises

For each exercise, the solutions below show one possible way of solving it, but you might have used a different approach, and that's great! There is almost always more than one way to solve any particular problem in Python.

## Initial Setup

Since this notebook is in the `solutions` sub-folder, use the magic command `%cd` to go up one folder to the main project folder. This keeps the file paths the same as in the lessons:

In [1]:
%cd ..

C:\Users\jenfl\Projects\eoas-python


Import the `pandas` library:

In [2]:
import pandas

## Exercise 2

**a)** Read the file `data/gapminder_world_data_2018.csv` into a new DataFrame `world` and display the first 10 rows.

In [3]:
world = pandas.read_csv('data/gapminder_world_data_2018.csv')
world.head()

| | country | population | region | sub_region | income_group | life_expectancy | gdp_per_capita | children_per_woman | child_mortality | pop_density |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 36400000 | Asia | Southern Asia | Low | 58.7 | 1870 | 4.33 | 65.9 | 55.7 |
| 1 | Albania | 2930000 | Europe | Southern Europe | Upper middle | 78.0 | 12400 | 1.71 | 12.9 | 107.0 |
| 2 | Algeria | 42000000 | Africa | Northern Africa | Upper middle | 77.9 | 13700 | 2.64 | 23.1 | 17.6 |
| 3 | Angola | 30800000 | Africa | Sub-Saharan Africa | Lower middle | 65.2 | 5850 | 5.55 | 81.6 | 24.7 |
| 4 | Antigua and Barbuda | 103000 | Americas | Latin America and the Caribbean | High | 77.6 | 21000 | 2.03 | 7.89 | 234.0 |
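
Note that `head` returns 5 rows when called with no argument, which is why only 5 rows appear in the output above. To display the 10 rows the exercise asks for, pass the row count explicitly. Here is a minimal sketch on a small made-up DataFrame (`demo` is hypothetical, used only so the example runs without the CSV file):

```python
import pandas

# Hypothetical stand-in data so the example runs without the CSV file
demo = pandas.DataFrame({'value': range(20)})

first_five = demo.head()    # no argument: first 5 rows
first_ten = demo.head(10)   # explicit count: first 10 rows

print(len(first_five), len(first_ten))
```

With the workshop data, the equivalent call would be `world.head(10)`.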


**b)** How many rows and columns does `world` have?

In [4]:
world.shape

(178, 10)

`world` has 178 rows and 10 columns.
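
Because `shape` is a tuple of the form `(rows, columns)`, the two counts can also be unpacked into separate variables. Here is a minimal sketch using a small made-up DataFrame `demo` (not part of the workshop data):

```python
import pandas

# Hypothetical DataFrame with 3 rows and 2 columns
demo = pandas.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})

# shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = demo.shape
print(f'{n_rows} rows and {n_cols} columns')
```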

**c)** Display the names and data types of each column.

In [5]:
world.dtypes

country                object
population              int64
region                 object
sub_region             object
income_group           object
life_expectancy       float64
gdp_per_capita          int64
children_per_woman    float64
child_mortality       float64
pop_density           float64
dtype: object

**d)** Display summary statistics with the `describe` method. What are the lowest and highest populations? How about lowest/highest population densities? Any guesses which countries these might be? (We'll find out the answer in Lesson 4!)

In [6]:
world.describe()

| | population | life_expectancy | gdp_per_capita | children_per_woman | child_mortality | pop_density |
|---|---|---|---|---|---|---|
| count | 178.0 | 178.0 | 178.0 | 178.0 | 177.0 | 178.0 |
| mean | 42660720.0 | 72.653371 | 17892.808989 | 2.746573 | 29.767401 | 201.118202 |
| std | 151144900.0 | 7.339901 | 19330.555705 | 1.294496 | 29.613831 | 660.593969 |
| min | 95200.0 | 51.1 | 629.0 | 1.23 | 1.95 | 2.01 |
| 25% | 3207500.0 | 67.1 | 3572.5 | 1.75 | 7.41 | 30.85 |
| 50% | 9910000.0 | 74.05 | 12200.0 | 2.265 | 16.7 | 81.0 |
| 75% | 31700000.0 | 78.175 | 25350.0 | 3.6475 | 49.7 | 151.0 |
| max | 1420000000.0 | 84.2 | 121000.0 | 7.13 | 126.0 | 8270.0 |


- The lowest and highest populations are 95,200 and 1.42 billion, respectively.
- The lowest and highest population densities are 2.01 and 8270 people per km$^2$, respectively.
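
Instead of reading these values off the table by eye, you could also pull them out in code. The sketch below is one possible way, shown on a small made-up DataFrame (`demo` and its values are hypothetical): the result of `describe` is itself a DataFrame, so individual statistics can be looked up by row and column label with `.loc`, or computed directly with the `min` and `max` methods of a column.

```python
import pandas

# Hypothetical data, used only to illustrate the approach
demo = pandas.DataFrame({
    'country': ['A', 'B', 'C'],
    'population': [100, 2500, 40000],
})

# describe() returns a DataFrame indexed by statistic name
summary = demo.describe()
lowest = summary.loc['min', 'population']
highest = summary.loc['max', 'population']

# The same values, computed directly from the column
lowest_direct = demo['population'].min()
highest_direct = demo['population'].max()

print(lowest, highest)
```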

### Bonus exercises

**e) Data wrangling - dealing with header rows**

The file `data/raw/weather_YVR_1938.csv` contains the daily weather data for 1938, in the original format downloaded from Environment Canada. Open this file in the JupyterLab CSV viewer to see what it looks like.

> Note that the CSV viewer isn't able to parse the data correctly because of the extra header rows at the beginning.

- Now try reading the file into your notebook with `pandas.read_csv` and see what happens.

If you look at the documentation for `pandas.read_csv`, you'll see a `skiprows` input buried amongst a few dozen other inputs for this function. This input tells `read_csv` how many rows to skip at the beginning of the file. 
- Try reading `data/raw/weather_YVR_1938.csv` again, but this time using a value of `24` for the `skiprows` keyword argument, and display the first 5 rows of the resulting DataFrame.

In [7]:
weather_1938 = pandas.read_csv('data/raw/weather_YVR_1938.csv', skiprows=24)
weather_1938.head()

| | Date/Time | Year | Month | Day | Data Quality | Max Temp (°C) | Max Temp Flag | Min Temp (°C) | Min Temp Flag | Mean Temp (°C) | ... | Total Snow (cm) | Total Snow Flag | Total Precip (mm) | Total Precip Flag | Snow on Grnd (cm) | Snow on Grnd Flag | Dir of Max Gust (10s deg) | Dir of Max Gust Flag | Spd of Max Gust (km/h) | Spd of Max Gust Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1938-01-01 | 1938 | 1 | 1 | | 9.4 | | -0.6 | | 4.4 | ... | | M | 0.3 | | | | | | | |
| 1 | 1938-01-02 | 1938 | 1 | 2 | | 7.2 | | 1.7 | | 4.5 | ... | | M | 0.5 | | | | | | | |
| 2 | 1938-01-03 | 1938 | 1 | 3 | | 7.2 | | -3.9 | | 1.7 | ... | 0.0 | | 0.0 | | | | | | | |
| 3 | 1938-01-04 | 1938 | 1 | 4 | | 7.2 | | -2.8 | | 2.2 | ... | 0.0 | | 0.0 | | | | | | | |
| 4 | 1938-01-05 | 1938 | 1 | 5 | | 7.2 | | -2.8 | | 2.2 | ... | 0.0 | | 0.0 | | | | | | | |
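
To see the effect of `skiprows` in isolation, here is a minimal sketch that reads a tiny made-up "file" from a string instead of from disk. The file contents and the name `raw_text` are hypothetical and only loosely imitate the Environment Canada layout: three metadata lines, followed by the real column header and the data.

```python
import io

import pandas

# Hypothetical file contents: 3 metadata lines, then the header row and data
raw_text = (
    'Station Name,EXAMPLE STATION\n'
    'Province,EXAMPLE\n'
    'Legend,M = Missing\n'
    'Date/Time,Max Temp\n'
    '1938-01-01,9.4\n'
    '1938-01-02,7.2\n'
)

# skiprows=3 skips the first 3 lines, so the 4th line is used as the header
demo = pandas.read_csv(io.StringIO(raw_text), skiprows=3)
print(demo)
```

Without `skiprows`, `read_csv` would treat the first metadata line as the column header, which is why the real weather file needs the extra argument.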


**f) Importing a library from a `.py` file**

In the workshop folder, you'll see a file called `ecweather.py`. It is a Python *module*, which is a library contained in a single `.py` file (as opposed to a package, which is multiple `.py` files bundled together).

- Double-click on this file in the Files Sidebar of JupyterLab to view it in the text editor. You'll see that it contains two functions: `welcome` and `download_daily_data`.

You can import a library from a local `.py` file with the same syntax as any other library. The library name is just the file name minus the `.py` extension, so to import this library the syntax is:
```python
import ecweather
```

- Import `ecweather` into your notebook, and call the function `ecweather.welcome()` to test it. If everything worked, it should print a welcome message.

> We'll use the other function `download_daily_data` in a later bonus exercise!

In [8]:
import ecweather

In [9]:
ecweather.welcome()

Hello! Thank you for importing the ecweather library!
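
If you'd like to try this with a module of your own, the sketch below writes a tiny module file and then imports it. Everything in it is hypothetical (the module name `demo_greeter`, its `welcome` function, and the temporary folder), and it is not how `ecweather` itself is set up. The one general fact it relies on is that Python searches the folders listed in `sys.path` when it runs an `import` statement.

```python
import pathlib
import sys
import tempfile

# Write a tiny hypothetical module to a temporary folder
folder = tempfile.mkdtemp()
module_path = pathlib.Path(folder) / 'demo_greeter.py'
module_path.write_text("def welcome():\n    return 'Hello from demo_greeter!'\n")

# Python searches the folders in sys.path when importing,
# so add the temporary folder to the front of that list
sys.path.insert(0, folder)

import demo_greeter  # the library name is the file name minus ".py"

message = demo_greeter.welcome()
print(message)
```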


**g) Library nicknames** 

When you import a library you can give it a nickname. For example, the `pandas` library is commonly imported with the syntax:
```python
import pandas as pd
```
This gives a nickname `pd` to the library, which is used instead of the full name to save a bit of typing (we scientists tend to be very lazy when it comes to coding). So instead of `pandas.read_csv`, we would type `pd.read_csv`, and so on for any other functions or other items in the library.

- Import the `numpy` library and give it a nickname `np`.
- Use the `mean` function from this library to compute the mean of the list `[5, 6, 7, 8, 9, 10]`.

In [10]:
import numpy as np

In [11]:
np.mean([5, 6, 7, 8, 9, 10])

7.5
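
As a quick sanity check, the same result can be computed by hand with Python's built-in `sum` and `len` functions. A minimal sketch (the variable names are just for illustration):

```python
import numpy as np

values = [5, 6, 7, 8, 9, 10]

# Mean computed by hand: sum of the values divided by how many there are
manual_mean = sum(values) / len(values)

# Mean computed with the numpy library, via its nickname np
numpy_mean = np.mean(values)

print(manual_mean, numpy_mean)
```

Both approaches give 7.5 for this list.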