# Project - Play with DataFrames

## Goal of Project
- Master the pandas DataFrame

### Step 1: Import pandas
- Execute the cell below (SHIFT + ENTER)

In [2]:
import pandas as pd 

### Step 2: Read the data
- Use ```pd.read_csv()``` to read the file `files/population.csv`
- NOTE: Remember to assign the result to a variable (e.g., ```df```)

In [35]:
df = pd.read_csv("files/population.csv")

In [36]:
df


| | Country | Year | Population |
|---|---|---|---|
| 0 | Denmark | 2000 | 5.3 |
| 1 | Denmark | 2010 | 5.5 |
| 2 | Denmark | 2020 | 5.8 |
| 3 | Sweden | 2000 | 8.8 |
| 4 | Sweden | 2010 | 9.3 |
| 5 | Sweden | 2020 | 10.2 |
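If ```files/population.csv``` is not available, you can rebuild the same table from an in-memory string. This is a minimal sketch. It assumes the file has a header row ```Country,Year,Population``` followed by the six rows shown above. ```io.StringIO``` makes the string behave like an open file, so ```pd.read_csv()``` parses it the same way it would parse a file on disk.

```python
import io

import pandas as pd

# Assumed file contents, reconstructed from the table displayed above
csv_text = """Country,Year,Population
Denmark,2000,5.3
Denmark,2010,5.5
Denmark,2020,5.8
Sweden,2000,8.8
Sweden,2010,9.3
Sweden,2020,10.2
"""

# StringIO wraps the string in a file-like object that read_csv accepts
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (6, 3): six rows, three columns
print(df.columns.tolist())  # ['Country', 'Year', 'Population']
```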


### Step 3: Investigate the data types
- Use ```.dtypes``` 

In [8]:
df.dtypes

Country        object
Year            int64
Population    float64
dtype: object
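Reading ```.dtypes``` by eye works for a small table. In code, the predicates in ```pandas.api.types``` answer the same question. The sketch below rebuilds the table from Python lists, assumed to match the file. It checks whether ```Year``` still holds plain integers, which is what Step 4 changes.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_float_dtype, is_integer_dtype

# Same data as files/population.csv, built directly from lists
df = pd.DataFrame({
    "Country": ["Denmark"] * 3 + ["Sweden"] * 3,
    "Year": [2000, 2010, 2020] * 2,
    "Population": [5.3, 5.5, 5.8, 8.8, 9.3, 10.2],
})

print(is_integer_dtype(df["Year"]))         # True: whole numbers get an integer dtype
print(is_datetime64_any_dtype(df["Year"]))  # False: not yet converted to dates
print(is_float_dtype(df["Population"]))     # True

# Keep only the numeric columns
print(df.select_dtypes(include="number").columns.tolist())  # ['Year', 'Population']
```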

### Step 4: Convert Year to Datetime
- ```pd.to_datetime(...)```: converts the values to datetimes
- ```format='%Y'```: the format of the input; ```%Y``` is a four-digit year

In [13]:
df['Year'] = pd.to_datetime(df['Year'], format='%Y')

In [12]:
df

| | Country | Year | Population |
|---|---|---|---|
| 0 | Denmark | 2000-01-01 | 5.3 |
| 1 | Denmark | 2010-01-01 | 5.5 |
| 2 | Denmark | 2020-01-01 | 5.8 |
| 3 | Sweden | 2000-01-01 | 8.8 |
| 4 | Sweden | 2010-01-01 | 9.3 |
| 5 | Sweden | 2020-01-01 | 10.2 |


In [14]:
df.dtypes

Country               object
Year          datetime64[ns]
Population           float64
dtype: object
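Here is a minimal sketch of the conversion on its own. With ```format='%Y'```, each integer year is parsed as January 1 of that year. The ```.dt``` accessor recovers the year again when you need it for grouping or labels.

```python
import pandas as pd

years = pd.Series([2000, 2010, 2020])

# Parse each year; month and day default to January 1
dates = pd.to_datetime(years, format="%Y")
print(dates.iloc[0])  # 2000-01-01 00:00:00

# .dt exposes datetime properties such as year, month and day
print(dates.dt.year.tolist())  # [2000, 2010, 2020]
```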

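### Step 5: Scale Population from millions to absolute numbers
- The values are given in millions, so multiply by 1,000,000 to get the absolute population
- HINT: ```df['Population']*1000``` scales by 1000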

In [37]:
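# Population is given in millions; convert to absolute numbers
df['Population'] = df['Population'] * 1_000_000

In [38]:
df

| | Country | Year | Population |
|---|---|---|---|
| 0 | Denmark | 2000-01-01 | 5300000.0 |
| 1 | Denmark | 2010-01-01 | 5500000.0 |
| 2 | Denmark | 2020-01-01 | 5800000.0 |
| 3 | Sweden | 2000-01-01 | 8800000.0 |
| 4 | Sweden | 2010-01-01 | 9300000.0 |
| 5 | Sweden | 2020-01-01 | 10200000.0 |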
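Because an in-place assignment overwrites ```Population```, running that cell a second time multiplies by one million again. One way to make the step safe to re-run is to write the result to a new column. The column name ```Population_abs``` below is hypothetical and chosen only for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Denmark"] * 3 + ["Sweden"] * 3,
    "Population": [5.3, 5.5, 5.8, 8.8, 9.3, 10.2],  # in millions
})

# Hypothetical new column: the original (millions) column stays untouched,
# so repeating this assignment always gives the same result
df["Population_abs"] = df["Population"] * 1_000_000
df["Population_abs"] = df["Population"] * 1_000_000  # second run: values unchanged

print(df["Population_abs"].tolist())
# [5300000.0, 5500000.0, 5800000.0, 8800000.0, 9300000.0, 10200000.0]
```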


### Step 6: Calculate mean population for each country
- HINT: ```df.groupby('Country')``` groups the data; selecting ```[['Population']]``` before ```.mean()``` averages only the population column

In [31]:
df.groupby('Country')[['Population']].mean()

| Country | Population |
|---|---|
| Denmark | 5.533333e+06 |
| Sweden | 9.433333e+06 |


### Step 7: Replace Denmark with DNK
- For a string column, the ```.str``` accessor exposes pandas' vectorized string methods
    - These apply a string operation to every value in the column at once
    - HINT: ```df['Country'].str.replace('Denmark', 'DNK')```

In [39]:
df

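| | Country | Year | Population |
|---|---|---|---|
| 0 | Denmark | 2000-01-01 | 5300000.0 |
| 1 | Denmark | 2010-01-01 | 5500000.0 |
| 2 | Denmark | 2020-01-01 | 5800000.0 |
| 3 | Sweden | 2000-01-01 | 8800000.0 |
| 4 | Sweden | 2010-01-01 | 9300000.0 |
| 5 | Sweden | 2020-01-01 | 10200000.0 |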


In [42]:
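df['Country'] = df['Country'].str.replace('Denmark', 'DNK')

In [43]:
df

| | Country | Year | Population |
|---|---|---|---|
| 0 | DNK | 2000-01-01 | 5300000.0 |
| 1 | DNK | 2010-01-01 | 5500000.0 |
| 2 | DNK | 2020-01-01 | 5800000.0 |
| 3 | Sweden | 2000-01-01 | 8800000.0 |
| 4 | Sweden | 2010-01-01 | 9300000.0 |
| 5 | Sweden | 2020-01-01 | 10200000.0 |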


In [44]:
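df['Country'] = df['Country'].str.replace('Sweden', 'SWE')

In [45]:
df

| | Country | Year | Population |
|---|---|---|---|
| 0 | DNK | 2000-01-01 | 5300000.0 |
| 1 | DNK | 2010-01-01 | 5500000.0 |
| 2 | DNK | 2020-01-01 | 5800000.0 |
| 3 | SWE | 2000-01-01 | 8800000.0 |
| 4 | SWE | 2010-01-01 | 9300000.0 |
| 5 | SWE | 2020-01-01 | 10200000.0 |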
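The two ```str.replace``` calls can also be collapsed into one step. This is a minimal sketch, assuming the goal is ISO 3166-1 alpha-3 codes (```DNK``` for Denmark, ```SWE``` for Sweden). ```Series.replace``` with a dictionary swaps whole values, while ```.str.replace``` works on substrings inside each value. Passing ```regex=False``` makes the pattern a literal string.

```python
import pandas as pd

countries = pd.Series(["Denmark", "Denmark", "Sweden"])

# .str.replace works on substrings inside each value; regex=False treats the pattern literally
print(countries.str.replace("Denmark", "DNK", regex=False).tolist())
# ['DNK', 'DNK', 'Sweden']

# Series.replace with a dict swaps whole values, handling both countries in one call
codes = {"Denmark": "DNK", "Sweden": "SWE"}
print(countries.replace(codes).tolist())
# ['DNK', 'DNK', 'SWE']

# Other vectorized string methods chain the same way
print(countries.str.upper().str[:3].tolist())
# ['DEN', 'DEN', 'SWE']
```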
