# Pandas

This section is all about the pandas package. Are you ready? Let's go!

You will find some small tasks in the sections below. Most of the code is hidden, so you can only see the output.

> Try to figure it out yourself, or search for references. Being able to search for and find the information you need is an important skill that will benefit you and your career for a long time.





## Set up the environment

### Task: Import pandas under the name pd

In [1]:
import pandas as pd

## Data Creation

### Task
Find the nominal GDP data [here](https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)). For the top **10** countries, create a dataframe that contains the country names, each country's GDP in 2021 (reported in 2022) by the IMF in US dollars, and each country's population (you can find the data [here](https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population)).

In [33]:
# GDP in trillions of US dollars; population as a head count. Both Series share one country index.
countries = ["US", "China", "Germany", "Japan", "India", "UK", "France", "Italy", "Brazil", "Canada"]
gdp = pd.Series([26.95, 17.70, 4.43, 4.23, 3.73, 3.33, 3.05, 2.19, 2.13, 2.12], index=countries)
population = pd.Series([334233854, 1411750000, 84482267, 124310000, 1392329000, 67736802, 68221000, 58776233, 203062512, 40097761], index=countries)

In [34]:
gdp, population

(US         26.95
 China      17.70
 Germany     4.43
 Japan       4.23
 India       3.73
 UK          3.33
 France      3.05
 Italy       2.19
 Brazil      2.13
 Canada      2.12
 dtype: float64,
 US          334233854
 China      1411750000
 Germany      84482267
 Japan       124310000
 India      1392329000
 UK           67736802
 France       68221000
 Italy        58776233
 Brazil      203062512
 Canada       40097761
 dtype: int64)
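Another way to keep each country paired with its value is to build each Series from a dict. The minimal sketch below uses only the first three countries and also checks that the two Series share the same index before they are combined.

```python
import pandas as pd

# GDP in trillions of US dollars, keyed by country (first three rows only)
gdp = pd.Series({"US": 26.95, "China": 17.70, "Germany": 4.43})
# Population as a head count, using the same country keys
population = pd.Series({"US": 334233854, "China": 1411750000, "Germany": 84482267})

# Both Series should carry identical labels in the same order before combining
print(gdp.index.equals(population.index))  # True
print(population.dtype)                    # int64
```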

In [39]:
# Each Series becomes one row; the shared country index becomes the columns
df_country = pd.DataFrame([gdp, population])
df_country.index = ["GDP", "Population"]
df_country
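Building the dataframe from a list of Series makes each Series a row. Each country column then holds one float (GDP) and one integer (population), so pandas stores it as float64, which is why the population row is displayed in scientific notation further down. The sketch below, using three countries only, compares this with the column-wise dict constructor, which keeps the population as integers. Here the Series are given a `name` (an addition for this sketch), so the row labels come from those names.

```python
import pandas as pd

countries = ["US", "China", "Germany"]
gdp = pd.Series([26.95, 17.70, 4.43], index=countries, name="GDP")
population = pd.Series([334233854, 1411750000, 84482267], index=countries, name="Population")

# Row-wise: each Series becomes a row labelled by its name, and every
# country column mixes a float with an int, so it is stored as float64
rows = pd.DataFrame([gdp, population])
print(rows.loc["Population"].dtype)  # float64

# Column-wise: each Series becomes a column and keeps its own dtype
cols = pd.DataFrame({"GDP": gdp, "Population": population})
print(cols.dtypes["Population"])     # int64

# The float population values can be converted back to whole numbers when needed
print(rows.loc["Population"].astype("int64").tolist())
```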

## Data Accessing

### Task: List the countries and their GDP

In [20]:
df_country.loc["GDP"]

US         26.95
China      17.70
Germany     4.43
Japan       4.23
India       3.73
UK          3.33
France      3.05
Italy       2.19
Brazil      2.13
Canada      2.12
Name: GDP, dtype: float64
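`.loc` with a single label returns a Series (the output above ends with `Name: GDP`), while a list of labels returns a DataFrame. Here is a small sketch on a stand-in three-country frame laid out like `df_country`:

```python
import pandas as pd

# A small stand-in for df_country: rows are measures, columns are countries
df_country = pd.DataFrame(
    {"US": [26.95, 334233854.0], "China": [17.70, 1411750000.0], "Germany": [4.43, 84482267.0]},
    index=["GDP", "Population"],
)

row_as_series = df_country.loc["GDP"]   # one label -> Series named "GDP"
row_as_frame = df_country.loc[["GDP"]]  # list of labels -> one-row DataFrame

print(type(row_as_series).__name__, row_as_series.name)  # Series GDP
print(type(row_as_frame).__name__, row_as_frame.shape)   # DataFrame (1, 3)
```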

### Task: List the countries and their population

In [21]:
df_country.loc["Population"]

US         3.342339e+08
China      1.411750e+09
Germany    8.448227e+07
Japan      1.243100e+08
India      1.392329e+09
UK         6.773680e+07
France     6.822100e+07
Italy      5.877623e+07
Brazil     2.030625e+08
Canada     4.009776e+07
Name: Population, dtype: float64

### Task: List the first country with its GDP and population

In [24]:
df_country["US"]

GDP           2.695000e+01
Population    3.342339e+08
Name: US, dtype: float64
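`df_country["US"]` selects the column by its label. Because the task asks for the *first* country, a positional lookup with `.iloc` is a more literal answer that does not hard-code the label "US". Here is a sketch on a stand-in frame whose columns are ordered by GDP:

```python
import pandas as pd

# Stand-in for df_country with countries as columns, ordered by GDP
df_country = pd.DataFrame(
    {"US": [26.95, 334233854.0], "China": [17.70, 1411750000.0]},
    index=["GDP", "Population"],
)

first_by_position = df_country.iloc[:, 0]  # first column, whatever its label
first_by_label = df_country.loc[:, "US"]   # the same column, selected by name

print(first_by_position.name)                    # US
print(first_by_position.equals(first_by_label))  # True
```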

## Data Selection

### Task: Show the countries whose GDP is more than 5 trillion dollars
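> (Hint: the data source reports GDP in millions of US dollars, so if you store GDP in that unit the condition is GDP > 5,000,000. The dataframe in this notebook stores GDP in trillions, so the condition becomes GDP > 5.)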




In [52]:
# Transpose so that countries are rows and GDP/Population are columns
df = df_country.transpose()
df[df["GDP"] > 5]

Country,GDP,Population
US,26.95,334233900.0
China,17.7,1411750000.0


### Task: Show the countries whose GDP is more than 3 trillion dollars

In [55]:
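# GDP is stored in trillions of US dollars, so 3 trillion is simply 3
df[df["GDP"] > 3]

Country,GDP,Population
US,26.95,334233900.0
China,17.7,1411750000.0
Germany,4.43,84482270.0
Japan,4.23,124310000.0
India,3.73,1392329000.0
UK,3.33,67736800.0
France,3.05,68221000.0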
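The filters in this section all follow one pattern. A comparison such as `df["GDP"] > 5` produces a boolean Series aligned on the index, and indexing the dataframe with it keeps only the rows marked `True`. Below is a minimal sketch on a three-country stand-in frame (values copied from the Series defined earlier), with `DataFrame.query` as an equivalent string form.

```python
import pandas as pd

df = pd.DataFrame(
    {"GDP": [26.95, 17.70, 4.43], "Population": [334233854, 1411750000, 84482267]},
    index=["US", "China", "Germany"],
)

mask = df["GDP"] > 5        # boolean Series: US True, China True, Germany False
print(mask.tolist())        # [True, True, False]

rich = df[mask]             # keep only the rows where the mask is True
same = df.query("GDP > 5")  # the same filter written as an expression string
print(rich.index.tolist())  # ['US', 'China']
print(rich.equals(same))    # True
```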


### Task: Show the countries whose population is more than 1 billion

In [54]:
df[df["Population"] > 1 * 10 ** 9]

Country,GDP,Population
China,17.7,1411750000.0
India,3.73,1392329000.0


### Task: Show the countries whose population is less than 300 million

In [56]:
df[df["Population"] < 3 * 10 ** 8]

Country,GDP,Population
Germany,4.43,84482267.0
Japan,4.23,124310000.0
UK,3.33,67736802.0
France,3.05,68221000.0
Italy,2.19,58776233.0
Brazil,2.13,203062512.0
Canada,2.12,40097761.0


### Task: Show the countries whose population is between 300 million and 500 million

In [59]:
df[((3 * 10 ** 8) < df["Population"]) & (df["Population"] < (5 * 10 ** 8))]

Country,GDP,Population
US,26.95,334233854.0
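Two comparisons joined with `&` each need their own parentheses, because `&` binds more tightly than `<` in Python. The same range check can also be written with `Series.between`. Its `inclusive` argument controls whether the bounds themselves count, and `"neither"` matches the strict `<` on both sides used above. This sketch assumes pandas 1.3 or later, where `inclusive` accepts these strings.

```python
import pandas as pd

df = pd.DataFrame(
    {"Population": [334233854, 1411750000, 84482267]},
    index=["US", "China", "Germany"],
)

# Strict bounds on both sides, like (3e8 < x) & (x < 5e8)
in_range = df["Population"].between(3 * 10**8, 5 * 10**8, inclusive="neither")
print(df[in_range].index.tolist())  # ['US']
```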


## Data Manipulation

### Task: Create a new column, called "GDP per capita", based on the column "GDP" and the column "Population"

In [65]:
df["GDP per capita"] = [i/j for i,j in zip(df["GDP"].values, df["Population"].values)]
df

Unnamed: 0,GDP,Population,GDP per capita
US,26950000000000.0,334233900.0,80632.167201
China,17700000000000.0,1411750000.0,12537.6306
Germany,4430000000000.0,84482270.0,52437.039835
Japan,4230000000000.0,124310000.0,34027.833642
India,3730000000000.0,1392329000.0,2678.964526
UK,3330000000000.0,67736800.0,49160.868268
France,3050000000000.0,68221000.0,44707.641342
Italy,2190000000000.0,58776230.0,37259.95846
Brazil,2130000000000.0,203062500.0,10489.380728
Canada,2120000000000.0,40097760.0,52870.782486


In [64]:
df["GDP"] *= 10 ** 12
df

Unnamed: 0,GDP,Population,GDP per capita
US,26950000000000.0,334233900.0,8.063217e-08
China,17700000000000.0,1411750000.0,1.253763e-08
Germany,4430000000000.0,84482270.0,5.243704e-08
Japan,4230000000000.0,124310000.0,3.402783e-08
India,3730000000000.0,1392329000.0,2.678965e-09
UK,3330000000000.0,67736800.0,4.916087e-08
France,3050000000000.0,68221000.0,4.470764e-08
Italy,2190000000000.0,58776230.0,3.725996e-08
Brazil,2130000000000.0,203062500.0,1.048938e-08
Canada,2120000000000.0,40097760.0,5.287078e-08
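Because `df["GDP"] *= 10 ** 12` modifies `df` in place, the per-capita values depend on which cell ran first. Dividing before the conversion gives values around 1e-08 (trillions of dollars per person). One way to avoid that order dependence is `DataFrame.assign`, sketched below on a two-country stand-in frame. It returns a new frame and evaluates its keyword arguments in order, so the second lambda sees the converted GDP.

```python
import pandas as pd

# GDP in trillions of US dollars, population as a head count
df = pd.DataFrame(
    {"GDP": [26.95, 17.70], "Population": [334233854, 1411750000]},
    index=["US", "China"],
)

result = df.assign(
    GDP=lambda d: d["GDP"] * 10**12,  # trillions -> dollars
    # Evaluated after the line above, so d["GDP"] is already in dollars here
    **{"GDP per capita": lambda d: d["GDP"] / d["Population"]},
)

print(df["GDP"].tolist())                         # [26.95, 17.7] (df is unchanged)
print(round(result.loc["US", "GDP per capita"]))  # 80632
```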


## Data Understanding

### Task: For each of the three numeric columns, show the following statistics:


1.   Count
1.   Max
1.   Min
1.   Mean
1.   Median
1.   Quantiles
     1.   25% Quantile
     1.   50% Quantile
     1.   75% Quantile
1.   Variance
1.   Std
1.   Total




In [67]:
df.describe()

Statistic,GDP,Population,GDP per capita
count,10.0,10.0,10.0
mean,6986000000000.0,378499900.0,37680.226709
std,8414274000000.0,546589200.0,23768.963416
min,2120000000000.0,40097760.0,2678.964526
25%,2405000000000.0,67857850.0,17910.181361
50%,3530000000000.0,104396100.0,40983.799901
75%,4380000000000.0,301441000.0,51617.996943
max,26950000000000.0,1411750000.0,80632.167201
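`describe()` covers the count, mean, std, min, the 25%, 50% and 75% quantiles, and max, but not the variance or the total. Its 50% row doubles as the median. The sketch below computes the full list from the task with `DataFrame.agg` and `DataFrame.quantile`, using a three-country stand-in frame.

```python
import pandas as pd

df = pd.DataFrame(
    {"GDP": [26.95e12, 17.70e12, 4.43e12], "Population": [334233854, 1411750000, 84482267]},
    index=["US", "China", "Germany"],
)
df["GDP per capita"] = df["GDP"] / df["Population"]

# One row per statistic, one column per numeric column
stats = df.agg(["count", "max", "min", "mean", "median", "var", "std", "sum"])
quantiles = df.quantile([0.25, 0.5, 0.75])  # rows labelled 0.25, 0.5, 0.75
print(stats)
print(quantiles)
```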
