# Pandas exercises

## Import pandas

* Import pandas and numpy with the standard shorthand aliases ``pd`` and ``np``, respectively.

In [1]:
import pandas as pd
import numpy as np

## Creating pandas objects

* Create a pandas Series object with keys corresponding to the days of the week and values corresponding to the integer numbers from 0 to 6. Hint: you can construct the series starting from a dictionary.

In [2]:
dict_week = {"Monday": 0, "Tuesday": 1, "Wednesday": 2, "Thursday": 3, "Friday": 4, "Saturday": 5, "Sunday": 6}
ser_week = pd.Series(dict_week)
#ser_week
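As a small sketch of what the resulting Series looks like: the dictionary keys become the index (in insertion order), so each value can be reached either by its label or by its integer position. The variable names `first_labels`, `by_label` and `by_position` are only illustrative.

```python
import pandas as pd

dict_week = {"Monday": 0, "Tuesday": 1, "Wednesday": 2, "Thursday": 3,
             "Friday": 4, "Saturday": 5, "Sunday": 6}
ser_week = pd.Series(dict_week)

# The dictionary keys become the index, in insertion order
first_labels = list(ser_week.index[:3])

# Label-based and position-based access return the same element here
by_label = ser_week["Wednesday"]
by_position = ser_week.iloc[2]

print(first_labels, by_label, by_position)
```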

* Create a pandas DataFrame with 5 columns and 10 rows containing random numbers from the standard Gaussian distribution. Assign column names "C0", "C1", "C2", "C3", "C4". Do not specify row names.

In [3]:
# Generate the column names "C0", ..., "C4" programmatically
column_names = ["C"+str(i) for i in range(5)]
df_rand = pd.DataFrame(data=np.random.randn(10, 5), columns=column_names)
#df_rand
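Because the numbers are random, every run of the cell above gives a different table. One possible way to make the result reproducible is to draw from a seeded NumPy generator, as sketched below. The seed value 42 and the name `df_seeded` are arbitrary choices for this illustration, not part of the exercise.

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same numbers on every run (seed is arbitrary)
rng = np.random.default_rng(seed=42)

column_names = [f"C{i}" for i in range(5)]
df_seeded = pd.DataFrame(rng.standard_normal((10, 5)), columns=column_names)

# Without explicit row names, pandas falls back to a RangeIndex 0..9
print(df_seeded.shape)
print(df_seeded.index)
```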

* Create a pandas DataFrame with 5 columns and 10 rows containing random numbers from the standard Gaussian distribution. Assign column names "C0", "C1", "C2", "C3", "C4" and row names "R0", "R1", "R2", ..., "R9". Hint: you can use a list comprehension to generate the row and column names programmatically!

In [4]:
column_names = ["C"+str(i) for i in range(5)]
row_names = ["R"+str(i) for i in range(10)]
df_rand_with_rownames = pd.DataFrame(data=np.random.randn(10, 5), columns=column_names, index=row_names)
#df_rand_with_rownames

* Read the CSV file "worldcities.csv" into a pandas DataFrame. Assign it to a variable named ``df_cities``.

In [5]:
df_cities = pd.read_csv("worldcities.csv")
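To see what ``pd.read_csv`` does without needing the file, the sketch below parses a few lines of CSV text from memory. It assumes the file has a header row with the column names displayed in this notebook; the three data rows are the first rows of the data set as shown here, and `io.StringIO` merely stands in for the real file. The names `csv_text` and `df_sample` are illustrative.

```python
import io
import pandas as pd

# Three rows as displayed in this notebook; the exact file layout is an assumption
csv_text = """city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id
Tokyo,Tokyo,35.685,139.7514,Japan,JP,JPN,Tōkyō,primary,35676000.0,1392685764
New York,New York,40.6943,-73.9249,United States,US,USA,New York,,19354922.0,1840034016
Mexico City,Mexico City,19.4424,-99.131,Mexico,MX,MEX,Ciudad de México,primary,19028000.0,1484247881
"""

# StringIO lets read_csv treat the text like an open file
df_sample = pd.read_csv(io.StringIO(csv_text))

# Column types are inferred; empty fields (New York's "capital") become NaN
print(df_sample.dtypes)
print(df_sample["capital"].isna())
```

The empty ``capital`` field is read as a missing value, which is why ``NaN`` appears for cities that are not capitals.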

* Display the first 3 rows and the last 3 rows of ``df_cities``. Hint: use the ``head`` and ``tail`` methods of the DataFrame object!

In [6]:
df_cities.head(3)

,city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id
0,Tokyo,Tokyo,35.685,139.7514,Japan,JP,JPN,Tōkyō,primary,35676000.0,1392685764
1,New York,New York,40.6943,-73.9249,United States,US,USA,New York,,19354922.0,1840034016
2,Mexico City,Mexico City,19.4424,-99.131,Mexico,MX,MEX,Ciudad de México,primary,19028000.0,1484247881


In [7]:
df_cities.tail(3)

,city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id
15490,Ambarchik,Ambarchik,69.651,162.3336,Russia,RU,RUS,Sakha (Yakutiya),,0.0,1643739159
15491,Nordvik,Nordvik,74.0165,111.51,Russia,RU,RUS,Krasnoyarskiy Kray,,0.0,1643587468
15492,Ennadai,Ennadai,61.1333,-100.8833,Canada,CA,CAN,Nunavut,,0.0,1124019423
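For intuition, ``head(n)`` and ``tail(n)`` select the same rows as positional slices, and both default to 5 rows when called without an argument. The sketch below checks this on a toy DataFrame; the column name `value` and the ten integers are made up for the illustration.

```python
import pandas as pd

# Toy data: ten rows with the values 0..9
df = pd.DataFrame({"value": range(10)})

first_three = df.head(3)   # same rows as df.iloc[:3]
last_three = df.tail(3)    # same rows as df.iloc[-3:]
default = df.head()        # n defaults to 5

print(first_three.equals(df.iloc[:3]), last_three.equals(df.iloc[-3:]), len(default))
```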


## Indexing and slicing

* Create a pandas DataFrame containing the columns "city", "lat", "lng" of ``df_cities``

In [8]:
df_cities_geo = df_cities[["city", "lat", "lng"]]
df_cities_geo.head(5)

,city,lat,lng
0,Tokyo,35.685,139.7514
1,New York,40.6943,-73.9249
2,Mexico City,19.4424,-99.131
3,Mumbai,19.017,72.857
4,São Paulo,-23.5587,-46.625


* Create a pandas Series containing the column "city" of ``df_cities``

In [9]:
df_cities["city"]

0              Tokyo
1           New York
2        Mexico City
3             Mumbai
4          São Paulo
            ...     
15488    Timmiarmiut
15489    Cheremoshna
15490      Ambarchik
15491        Nordvik
15492        Ennadai
Name: city, Length: 15493, dtype: object

* What is the difference between ``df_cities["city"]`` and ``df_cities[["city"]]``?

In [10]:
type(df_cities[["city"]]), type(df_cities["city"])

(pandas.core.frame.DataFrame, pandas.core.series.Series)

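``df_cities[["city"]]`` (a list of column names inside the brackets) is a pandas DataFrame with a single column, whereas ``df_cities["city"]`` (a single column name) is a pandas Series!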
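The difference also shows up in the shape of the two objects. A minimal sketch on a three-row DataFrame, built here from values displayed earlier in this notebook (the names `as_series`, `as_frame` and `round_trip` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"city": ["Tokyo", "New York", "Mexico City"],
                   "lat": [35.685, 40.6943, 19.4424]})

as_series = df["city"]     # one-dimensional
as_frame = df[["city"]]    # two-dimensional, with one column

print(as_series.shape, as_frame.shape)  # (3,) and (3, 1)

# to_frame() turns a Series into a one-column DataFrame
round_trip = as_series.to_frame()
```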

* Set the column ``city_ascii`` as the row index of the DataFrame ``df_cities``

In [11]:
# df_cities = df_cities.set_index("city_ascii")
df_cities.set_index("city_ascii", inplace=True)

In [12]:
df_cities.head()

city_ascii,city,lat,lng,country,iso2,iso3,admin_name,capital,population,id
Tokyo,Tokyo,35.685,139.7514,Japan,JP,JPN,Tōkyō,primary,35676000.0,1392685764
New York,New York,40.6943,-73.9249,United States,US,USA,New York,,19354922.0,1840034016
Mexico City,Mexico City,19.4424,-99.131,Mexico,MX,MEX,Ciudad de México,primary,19028000.0,1484247881
Mumbai,Mumbai,19.017,72.857,India,IN,IND,Mahārāshtra,admin,18978000.0,1356226629
Sao Paulo,São Paulo,-23.5587,-46.625,Brazil,BR,BRA,São Paulo,admin,18845000.0,1076532519
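As a sketch of the two variants in the cell above: without ``inplace=True``, ``set_index`` leaves the original DataFrame untouched and returns a new one, and ``reset_index`` moves the index back into an ordinary column. The two-row DataFrame uses values displayed in this notebook; the names `indexed` and `restored` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Tokyo", "São Paulo"],
                   "city_ascii": ["Tokyo", "Sao Paulo"],
                   "lat": [35.685, -23.5587]})

indexed = df.set_index("city_ascii")   # new DataFrame; df itself is unchanged
restored = indexed.reset_index()       # index goes back to being the first column

print(indexed.index.name)
print(list(restored.columns))
```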


* Select the row of ``df_cities`` containing the data for Lugano using the ``.loc`` indexer

In [13]:
df_cities.loc["Lugano"]

city               Lugano
lat               46.0004
lng                8.9667
country       Switzerland
iso2                   CH
iso3                  CHE
admin_name         Ticino
capital               NaN
population         105388
id             1756503816
Name: Lugano, dtype: object
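The output above is a Series whose index holds the column names; its dtype is ``object`` because the row mixes numbers and strings. The sketch below, on a two-row DataFrame with values displayed in this notebook, contrasts three forms of ``.loc`` (the names `row`, `rows` and `lat` are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"lat": [46.0004, 35.685],
     "lng": [8.9667, 139.7514],
     "country": ["Switzerland", "Japan"]},
    index=pd.Index(["Lugano", "Tokyo"], name="city_ascii"),
)

row = df.loc["Lugano"]          # single label: a Series indexed by column names
rows = df.loc[["Lugano"]]       # list of labels: a DataFrame with one row
lat = df.loc["Lugano", "lat"]   # row label and column label: a scalar

print(type(row).__name__, type(rows).__name__, lat)
```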

* Select the first 3 rows and the last 3 columns of ``df_cities`` using the ``.iloc`` indexer

In [14]:
df_cities.iloc[:3, -3:]

city_ascii,capital,population,id
Tokyo,primary,35676000.0,1392685764
New York,,19354922.0,1840034016
Mexico City,primary,19028000.0,1484247881
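One detail worth keeping in mind: a slice in ``.iloc`` excludes its end position (as in plain Python), whereas a label slice in ``.loc`` includes its end label. A small sketch, using the three rows shown above (the names `by_position` and `by_label` are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"capital": ["primary", None, "primary"],
     "population": [35676000.0, 19354922.0, 19028000.0],
     "id": [1392685764, 1840034016, 1484247881]},
    index=["Tokyo", "New York", "Mexico City"],
)

# Positions 0 and 1; the end position 2 is excluded
by_position = df.iloc[:2, -2:]

# Labels up to and including "New York" and "id"
by_label = df.loc[:"New York", "population":"id"]

print(by_position.equals(by_label))
```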


## Boolean indexing

* Read the CSV file "worldcities.csv" into a pandas DataFrame again. Assign it to a variable named ``df_cities``.

In [15]:
df_cities = pd.read_csv('worldcities.csv') # from https://simplemaps.com/data/world-cities

* Find the latitude of Lugano (city_ascii == 'Lugano') and assign it to the variable ``lat_lugano``.

In [16]:
# Select the matching row(s) and the 'lat' column in one step, then take the first match
lat_lugano = df_cities.loc[df_cities['city_ascii'] == 'Lugano', 'lat'].iloc[0]
lat_lugano

46.0004
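Filtering by a condition always returns a Series, even when only one row matches, so one more step is needed to get a plain number. The sketch below shows two options on a three-row DataFrame with values displayed in this notebook: ``.iloc[0]`` takes the first match, while ``.item()`` additionally insists that there is exactly one. The names `matches`, `lat_first` and `lat_only` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"city_ascii": ["Tokyo", "Lugano", "Paris"],
                   "lat": [35.685, 46.0004, 48.8667]})

matches = df.loc[df["city_ascii"] == "Lugano", "lat"]  # still a Series

lat_first = matches.iloc[0]   # first match, even if there were several
lat_only = matches.item()     # raises ValueError unless exactly one match

print(len(matches), lat_first, lat_only)
```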

* Select all the cities north of Lugano (latitude > ``lat_lugano``)

In [17]:
df_north = df_cities[df_cities['lat'] > lat_lugano]
df_north.head(3)

,city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id
17,Moscow,Moscow,55.7522,37.6155,Russia,RU,RUS,Moskva,primary,10452000.0,1643318494
19,Paris,Paris,48.8667,2.3333,France,FR,FRA,Île-de-France,primary,9904000.0,1250015082
25,London,London,51.5,-0.1167,United Kingdom,GB,GBR,"London, City of",primary,8567000.0,1826645935
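Behind the scenes, the comparison produces a boolean Series with one entry per row, and indexing with it keeps the rows marked ``True`` together with their original row labels (which is why the output above starts at 17 rather than 0). A minimal sketch with four cities from this notebook (the names `mask`, `n_north` and `df_south` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"city": ["Tokyo", "Moscow", "Paris", "London"],
                   "lat": [35.685, 55.7522, 48.8667, 51.5]})
lat_lugano = 46.0004

mask = df["lat"] > lat_lugano   # boolean Series aligned with df
n_north = int(mask.sum())       # True counts as 1

df_north = df[mask]             # rows where the mask is True
df_south = df[~mask]            # ~ negates the mask

print(mask.tolist(), n_north)
```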



* Select all cities in Italy (country == 'Italy') that are north of Lugano

In [18]:
df_north_IT = df_cities[(df_cities['lat'] > lat_lugano) & (df_cities['country'] == 'Italy')]
df_north_IT

,city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id
2936,Udine,Udine,46.07,13.24,Italy,IT,ITA,Friuli-Venezia Giulia,minor,119009.0,1380396446
3122,Trento,Trento,46.0804,11.12,Italy,IT,ITA,Trentino-Alto Adige,admin,107808.0,1380953307
7068,Bolzano,Bolzano,46.5004,11.36,Italy,IT,ITA,Trentino-Alto Adige,minor,95895.0,1380677819
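When combining conditions, each comparison needs its own parentheses, because ``&`` binds more tightly than ``>`` and ``==``. The sketch below repeats the filter on five cities from this notebook and shows ``DataFrame.query`` as one possible alternative spelling; the names `with_mask`, `with_query` and `either` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Tokyo", "Paris", "Udine", "Trento", "Bolzano"],
    "lat": [35.685, 48.8667, 46.07, 46.0804, 46.5004],
    "country": ["Japan", "France", "Italy", "Italy", "Italy"],
})
lat_lugano = 46.0004

# Element-wise AND of two boolean Series
with_mask = df[(df["lat"] > lat_lugano) & (df["country"] == "Italy")]

# Same filter as a query string; @ refers to a Python variable
with_query = df.query("lat > @lat_lugano and country == 'Italy'")

# | is the element-wise OR
either = df[(df["lat"] > lat_lugano) | (df["country"] == "Italy")]

print(with_mask["city"].tolist())
print(either["city"].tolist())
```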
