# Pandas

Pandas is a powerful library for data manipulation and analysis in Python. It is widely used in a range of fields, including data science, finance, and statistics.

## 004. Iteration

### 004.000 Assets

Shared sample data and setup code, to keep typing in the exercises to a minimum.

| Name        | Age |
|-------------|-----|
| Mbappé      | 23  |
| De Bruyne   | 31  |
| Lewandowski | 33  |
| Benzema     | 34  |
| Messi       | 35  |

In [1]:
import sys
from pathlib import Path

current_dir = Path().resolve()
while current_dir != current_dir.parent and current_dir.name != "katas":
    current_dir = current_dir.parent
if current_dir != current_dir.parent:
    sys.path.append(current_dir.as_posix())

In [2]:
import pandas as pd
from lib.utils import fresh_df
from IPython.core.interactiveshell import InteractiveShell

pd.set_option('display.max_rows', None)
InteractiveShell.ast_node_interactivity = "all"

### 004.001 Using Itertuples

1. Use `itertuples` to print each row


In [10]:
df = fresh_df("002.tsv", id="Name")
# solution

for row in df.itertuples():
    row

Pandas(Index='Mbappé', DOB='1998-12-20')

Pandas(Index='De Bruyne', DOB='1991-06-28')

Pandas(Index='Lewandowski', DOB='1988-08-21')

Pandas(Index='Benzema', DOB='1987-12-19')

Pandas(Index='Messi', DOB='1987-06-24')
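
As a minimal self-contained sketch of the same idea: `itertuples` yields one namedtuple per row, so fields can be read by attribute. The inline DataFrame below is a stand-in built from the output above (it does not use `fresh_df` or `002.tsv`), and the variable names are illustrative.

```python
import pandas as pd

# Stand-in data copied from the output above; fresh_df is not used here.
players = pd.DataFrame(
    {"DOB": ["1998-12-20", "1991-06-28", "1988-08-21"]},
    index=pd.Index(["Mbappé", "De Bruyne", "Lewandowski"], name="Name"),
)

# Each row is a namedtuple: the index comes first as `Index`,
# followed by one field per column.
rows = [(row.Index, row.DOB) for row in players.itertuples()]
print(rows)

# index=False drops the index field; name=None yields plain tuples.
plain = list(players.itertuples(index=False, name=None))
print(plain)
```

Plain tuples are convenient when the rows are handed on to code that does not need the field names.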

### 004.002 Using iterrows

1. Use `iterrows` to print the index of each row, and then the row as a Series

In [11]:
df = fresh_df("002.tsv")
# solution

for index, row in df.iterrows():
    index
    row


0

Name        Mbappé
DOB     1998-12-20
Name: 0, dtype: object

1

Name     De Bruyne
DOB     1991-06-28
Name: 1, dtype: object

2

Name    Lewandowski
DOB      1988-08-21
Name: 2, dtype: object

3

Name       Benzema
DOB     1987-12-19
Name: 3, dtype: object

4

Name         Messi
DOB     1987-06-24
Name: 4, dtype: object
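
One property of `iterrows` worth keeping in mind is that each row arrives as a single Series, so dtypes are not preserved across columns. The sketch below illustrates this with stand-in data; the `Rating` column is hypothetical and exists only to put a float column next to an integer one.

```python
import pandas as pd

# Illustrative stand-in data; not loaded through fresh_df.
# The "Rating" column is hypothetical.
players = pd.DataFrame(
    {"Name": ["Mbappé", "Messi"], "Age": [23, 35], "Rating": [9.1, 9.0]}
)

# iterrows yields (index label, row as Series) pairs.
pairs = [(index, row["Name"]) for index, row in players.iterrows()]
print(pairs)

# A row Series has one dtype, so an int column next to a float column
# is upcast to float64 when read row by row.
_, numeric_row = next(players[["Age", "Rating"]].iterrows())
print(numeric_row.dtype)
```

When the original dtypes matter, `itertuples` is usually the safer choice, since it keeps each value as stored in its column.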

### 004.003 Using Items

1. Use `items` to print each column as a Series


In [12]:
df = fresh_df("002.tsv")
# solution

for name, column in df.items():
    name
    column

'Name'

0         Mbappé
1      De Bruyne
2    Lewandowski
3        Benzema
4          Messi
Name: Name, dtype: object

'DOB'

0    1998-12-20
1    1991-06-28
2    1988-08-21
3    1987-12-19
4    1987-06-24
Name: DOB, dtype: object
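
Because `items` hands over whole columns, it suits per-column computations. A small sketch under the same stand-in data (the `lengths` dictionary is an illustrative name, not part of the kata):

```python
import pandas as pd

# Stand-in data copied from the output above.
players = pd.DataFrame(
    {"Name": ["Mbappé", "De Bruyne"], "DOB": ["1998-12-20", "1991-06-28"]}
)

# items yields (column label, column as Series) pairs, in column order.
lengths = {}
for label, column in players.items():
    # Longest string in each column.
    lengths[label] = column.str.len().max()
print(lengths)
```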

### 004.004 Using Query

1. Add an 'Age' column holding each player's age in years
1. Use `query` to find all players older than 30
1. Write another query for all players older than 30 whose name is longer than 8 characters


In [16]:
from datetime import datetime
df = fresh_df("002.tsv")
# solution

1
# Approximate age in whole years: elapsed days // 365 (ignores leap days)
df["Age"] = df["DOB"].map(lambda cell: (datetime.now() - datetime.strptime(cell, "%Y-%m-%d")).days // 365)
df["Age"]

2
df.query("Age > 30")

3
df.query("Age > 30 and Name.str.len() > 8")

1

0    24
1    31
2    34
3    35
4    35
Name: Age, dtype: int64

2

|   | Name        | DOB        | Age |
|---|-------------|------------|-----|
| 1 | De Bruyne   | 1991-06-28 | 31  |
| 2 | Lewandowski | 1988-08-21 | 34  |
| 3 | Benzema     | 1987-12-19 | 35  |
| 4 | Messi       | 1987-06-24 | 35  |


3

|   | Name        | DOB        | Age |
|---|-------------|------------|-----|
| 1 | De Bruyne   | 1991-06-28 | 31  |
| 2 | Lewandowski | 1988-08-21 | 34  |
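
The age computation and both queries can also be sketched with vectorized datetime operations. This is one possible variant, not the kata's solution: it assumes a fixed reference date (`today`, chosen here only so the result is reproducible), computes calendar-exact ages, and passes `engine="python"` so that the `.str` accessor is evaluated by plain Python rather than by an optional expression backend.

```python
import pandas as pd

# Stand-in data copied from the outputs above.
players = pd.DataFrame(
    {
        "Name": ["Mbappé", "De Bruyne", "Lewandowski", "Benzema", "Messi"],
        "DOB": ["1998-12-20", "1991-06-28", "1988-08-21", "1987-12-19", "1987-06-24"],
    }
)

# Assumed fixed reference date so the ages are reproducible;
# the cell above uses datetime.now() instead.
today = pd.Timestamp("2023-01-15")
dob = pd.to_datetime(players["DOB"])

# Calendar-based age: subtract one year if this year's birthday is still ahead.
had_birthday = (dob.dt.month < today.month) | (
    (dob.dt.month == today.month) & (dob.dt.day <= today.day)
)
players["Age"] = today.year - dob.dt.year - (~had_birthday).astype(int)

# @name refers to a Python variable in the calling scope.
min_age = 30
older = players.query("Age > @min_age")

# Conditions can be combined with `and`; .str methods work on string columns.
long_names = players.query("Age > @min_age and Name.str.len() > 8", engine="python")
print(older)
print(long_names)
```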


### 004.003 Aggregation

1. Load the `minimal_player_list.csv` file
1. Find the mean, median, and max of Age and Wage across the complete dataset

In [18]:
1
datafile = "minimal_player_list.csv"
df = fresh_df(datafile, id="ID", sep=",")
df.head()
# solution

df[["Age", "Wage"]].agg(['mean',  'median', 'max'])


1

| ID     | Name               | Age | Nationality | Overall | Potential | Club                | Value     | Wage     | Position |
|--------|--------------------|-----|-------------|---------|-----------|---------------------|-----------|----------|----------|
| 231747 | Kylian Mbappé      | 23  | France      | 91      | 95        | Paris Saint Germain | 190500000 | 23000000 | ST       |
| 192985 | Kevin De Bruyne    | 31  | Belgium     | 91      | 91        | Manchester City     | 107500000 | 35000000 | CM       |
| 188545 | Robert Lewandowski | 33  | Poland      | 91      | 91        | FC Barcelona        | 84000000  | 42000000 | ST       |
| 165153 | Karim Benzema      | 34  | France      | 91      | 91        | Real Madrid         | 64000000  | 45000000 | CF       |
| 158023 | Lionel Messi       | 35  | Argentina   | 91      | 91        | Paris Saint Germain | 54000000  | 19500000 | RW       |


|        | Age       | Wage       |
|--------|-----------|------------|
| mean   | 26.341552 | 1308528.0  |
| median | 26.0      | 500000.0   |
| max    | 44.0      | 45000000.0 |
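
To see how `agg` shapes its result, here is a small sketch on five rows taken from the `head()` output above (so the numbers differ from the full-dataset summary). Passing a list of function names returns one row per function; passing a dict picks a function per column.

```python
import pandas as pd

# Five rows copied from the head() output above; not the complete dataset.
players = pd.DataFrame(
    {
        "Age": [23, 31, 33, 34, 35],
        "Wage": [23_000_000, 35_000_000, 42_000_000, 45_000_000, 19_500_000],
    }
)

# A list of names gives a DataFrame: one row per function, one column per field.
summary = players[["Age", "Wage"]].agg(["mean", "median", "max"])
print(summary)

# A dict gives one result per column, returned as a Series.
per_column = players.agg({"Age": "max", "Wage": "mean"})
print(per_column)
```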


### 004.005 Using Grouping

1. Group the rows by nationality and compute the maximum wage for each nationality. Use `reset_index` to turn the result into a DataFrame. Show its head
1. Merge the result back into the original DataFrame, so that each aggregate is paired with its original row(s). Show its head
1. Extract the "Name", "Nationality", and "Wage" fields. Show its head
1. Sort by Wage in descending order. Instead of `head`, take the first 20 rows and copy them to a new DataFrame
1. Format "Wage" with comma separators and no decimals. Hide the row numbers. Show the result


In [36]:
datafile = "minimal_player_list.csv"
df = fresh_df(datafile, id="ID", sep=",")
group_by_field = "Nationality"
# solution

1
agg = df.groupby(group_by_field)["Wage"].agg("max").reset_index()
agg.head()

2
agg = df.merge(agg, on=[group_by_field, "Wage"])
agg.head()

3
minimal = agg[["Name", group_by_field, "Wage"]]
minimal.head()

4
minimal = minimal.sort_values(by="Wage", ascending=False)[0:20].copy()
minimal

5
minimal.style.format('{:,.0f}', subset=["Wage"]).hide(axis="index")

1

|   | Nationality | Wage     |
|---|-------------|----------|
| 0 | Afghanistan | 100000   |
| 1 | Albania     | 6400000  |
| 2 | Algeria     | 20000000 |
| 3 | Andorra     | 300000   |
| 4 | Angola      | 3400000  |


2

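|   | Name               | Age | Nationality | Overall | Potential | Club                | Value     | Wage     | Position |
|---|--------------------|-----|-------------|---------|-----------|---------------------|-----------|----------|----------|
| 0 | Kevin De Bruyne    | 31  | Belgium     | 91      | 91        | Manchester City     | 107500000 | 35000000 | CM       |
| 1 | Robert Lewandowski | 33  | Poland      | 91      | 91        | FC Barcelona        | 84000000  | 42000000 | ST       |
| 2 | Karim Benzema      | 34  | France      | 91      | 91        | Real Madrid         | 64000000  | 45000000 | CF       |
| 3 | Lionel Messi       | 35  | Argentina   | 91      | 91        | Paris Saint Germain | 54000000  | 19500000 | RW       |
| 4 | Mohamed Salah      | 30  | Egypt       | 90      | 90        | Liverpool           | 115500000 | 27000000 | RW       |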


3

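|   | Name               | Nationality | Wage     |
|---|--------------------|-------------|----------|
| 0 | Kevin De Bruyne    | Belgium     | 35000000 |
| 1 | Robert Lewandowski | Poland      | 42000000 |
| 2 | Karim Benzema      | France      | 45000000 |
| 3 | Lionel Messi       | Argentina   | 19500000 |
| 4 | Mohamed Salah      | Egypt       | 27000000 |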


4

|    | Name                              | Nationality    | Wage     |
|----|-----------------------------------|----------------|----------|
| 2  | Karim Benzema                     | France         | 45000000 |
| 1  | Robert Lewandowski                | Poland         | 42000000 |
| 0  | Kevin De Bruyne                   | Belgium        | 35000000 |
| 12 | Toni Kroos                        | Germany        | 31000000 |
| 4  | Mohamed Salah                     | Egypt          | 27000000 |
| 10 | Bernardo Mota Carvalho e Silva    | Portugal       | 26000000 |
| 8  | Carlos Henrique Venancio Casimiro | Brazil         | 24000000 |
| 9  | Heung Min Son                     | Korea Republic | 24000000 |
| 6  | Harry Kane                        | England        | 24000000 |
| 5  | Erling Haaland                    | Norway         | 23000000 |


5

| Name                              | Nationality    | Wage     |
|-----------------------------------|----------------|----------|
| Karim Benzema                     | France         | 45000000 |
| Robert Lewandowski                | Poland         | 42000000 |
| Kevin De Bruyne                   | Belgium        | 35000000 |
| Toni Kroos                        | Germany        | 31000000 |
| Mohamed Salah                     | Egypt          | 27000000 |
| Bernardo Mota Carvalho e Silva    | Portugal       | 26000000 |
| Carlos Henrique Venancio Casimiro | Brazil         | 24000000 |
| Heung Min Son                     | Korea Republic | 24000000 |
| Harry Kane                        | England        | 24000000 |
| Erling Haaland                    | Norway         | 23000000 |
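
The group, merge, and sort steps can be traced on a handful of rows. The sketch below uses illustrative data (the names and wages appear in the outputs above, but the small frame itself is an assumption and not the real file). It also shows `transform("max")` as an alternative to the merge, which may be easier to read when only a row filter is needed.

```python
import pandas as pd

# Illustrative stand-in rows; not loaded from minimal_player_list.csv.
players = pd.DataFrame(
    {
        "Name": ["Mbappé", "Benzema", "De Bruyne", "Lukaku", "Messi"],
        "Nationality": ["France", "France", "Belgium", "Belgium", "Argentina"],
        "Wage": [23_000_000, 45_000_000, 35_000_000, 20_000_000, 19_500_000],
    }
)

# Step 1: one row per nationality with its maximum wage.
top_wage = players.groupby("Nationality")["Wage"].max().reset_index()

# Step 2: an inner merge on both keys keeps only the rows whose
# wage equals the maximum of their nationality.
top_earners = players.merge(top_wage, on=["Nationality", "Wage"])

# Steps 3-4: keep three fields, sort descending, take the first 20 rows.
top_earners = (
    top_earners[["Name", "Nationality", "Wage"]]
    .sort_values(by="Wage", ascending=False)
    .iloc[:20]
    .copy()
)
print(top_earners)

# Alternative without a merge: broadcast the group maximum to every row
# and compare it with the row's own wage.
is_top = players["Wage"] == players.groupby("Nationality")["Wage"].transform("max")
same_rows = players[is_top]
```

Note that players tied for the maximum wage within a group are all kept by either approach.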


### 004.006 Using Grouping

1. Repeat 004.005, but group by "Club" instead of "Nationality", and only consider players with "Wage > 2000000"
    1. Group the rows by Club and compute the maximum wage for each Club. Use `reset_index` to turn the result into a DataFrame. Show its head
    1. Merge the result back into the original DataFrame, so that each aggregate is paired with its original row(s). Show its head
    1. Extract the "Name", "Club", and "Wage" fields. Show its head
    1. Sort by Wage in descending order. Show its head
    1. Format "Wage" with comma separators and no decimals. Hide the row numbers. Show the result

In [38]:
datafile = "minimal_player_list.csv"
df = fresh_df(datafile, id="ID", sep=",")
group_by_field = "Club"
# solution

1
agg = df.query("Wage > 2000000").groupby(group_by_field)["Wage"].agg("max").reset_index()
agg.head()

2
agg = df.merge(agg, on=[group_by_field, "Wage"])
agg.head()

3
minimal = agg[["Name", group_by_field, "Wage"]]
minimal.head()

4
minimal = minimal.sort_values(by="Wage", ascending=False)[0:20].copy()
minimal

5
minimal.style.format('{:,.0f}', subset=["Wage"]).hide(axis="index")

1

|   | Club            | Wage    |
|---|-----------------|---------|
| 0 | AFC Bournemouth | 4700000 |
| 1 | AZ              | 2900000 |
| 2 | Adana Demirspor | 2300000 |
| 3 | Ajaccio         | 2200000 |
| 4 | Ajax            | 4200000 |


2

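|   | Name               | Age | Nationality | Overall | Potential | Club                | Value     | Wage     | Position |
|---|--------------------|-----|-------------|---------|-----------|---------------------|-----------|----------|----------|
| 0 | Kylian Mbappé      | 23  | France      | 91      | 95        | Paris Saint Germain | 190500000 | 23000000 | ST       |
| 1 | Kevin De Bruyne    | 31  | Belgium     | 91      | 91        | Manchester City     | 107500000 | 35000000 | CM       |
| 2 | Robert Lewandowski | 33  | Poland      | 91      | 91        | FC Barcelona        | 84000000  | 42000000 | ST       |
| 3 | Karim Benzema      | 34  | France      | 91      | 91        | Real Madrid         | 64000000  | 45000000 | CF       |
| 4 | Mohamed Salah      | 30  | Egypt       | 90      | 90        | Liverpool           | 115500000 | 27000000 | RW       |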


3

|   | Name               | Club                | Wage     |
|---|--------------------|---------------------|----------|
| 0 | Kylian Mbappé      | Paris Saint Germain | 23000000 |
| 1 | Kevin De Bruyne    | Manchester City     | 35000000 |
| 2 | Robert Lewandowski | FC Barcelona        | 42000000 |
| 3 | Karim Benzema      | Real Madrid         | 45000000 |
| 4 | Mohamed Salah      | Liverpool           | 27000000 |


4

|    | Name                              | Club                | Wage     |
|----|-----------------------------------|---------------------|----------|
| 3  | Karim Benzema                     | Real Madrid         | 45000000 |
| 2  | Robert Lewandowski                | FC Barcelona        | 42000000 |
| 1  | Kevin De Bruyne                   | Manchester City     | 35000000 |
| 4  | Mohamed Salah                     | Liverpool           | 27000000 |
| 6  | Harry Kane                        | Tottenham Hotspur   | 24000000 |
| 7  | Heung Min Son                     | Tottenham Hotspur   | 24000000 |
| 9  | Carlos Henrique Venancio Casimiro | Manchester United   | 24000000 |
| 0  | Kylian Mbappé                     | Paris Saint Germain | 23000000 |
| 10 | N'Golo Kanté                      | Chelsea             | 21000000 |
| 13 | Romelu Lukaku                     | Inter               | 20000000 |


5

| Name                              | Club                | Wage     |
|-----------------------------------|---------------------|----------|
| Karim Benzema                     | Real Madrid         | 45000000 |
| Robert Lewandowski                | FC Barcelona        | 42000000 |
| Kevin De Bruyne                   | Manchester City     | 35000000 |
| Mohamed Salah                     | Liverpool           | 27000000 |
| Harry Kane                        | Tottenham Hotspur   | 24000000 |
| Heung Min Son                     | Tottenham Hotspur   | 24000000 |
| Carlos Henrique Venancio Casimiro | Manchester United   | 24000000 |
| Kylian Mbappé                     | Paris Saint Germain | 23000000 |
| N'Golo Kanté                      | Chelsea             | 21000000 |
| Romelu Lukaku                     | Inter               | 20000000 |
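
Two details of this exercise are easy to miss: the wage filter is applied before grouping, and clubs with tied top earners contribute more than one row. The sketch below illustrates both on stand-in data. "Player X" and "Club Y" are hypothetical and exist only to show a row being filtered out. The format spec is the same one passed to the Styler above, applied here with plain `str.format` so the effect is visible as text.

```python
import pandas as pd

# Illustrative stand-in rows; "Player X" / "Club Y" are hypothetical.
players = pd.DataFrame(
    {
        "Name": ["Karim Benzema", "Harry Kane", "Heung Min Son", "Player X"],
        "Club": ["Real Madrid", "Tottenham Hotspur", "Tottenham Hotspur", "Club Y"],
        "Wage": [45_000_000, 24_000_000, 24_000_000, 500_000],
    }
)

# Filter first, so clubs without a high earner never form a group.
top_wage = (
    players.query("Wage > 2000000").groupby("Club")["Wage"].max().reset_index()
)

# Tied top earners of a club both match the (Club, Wage) key and are both kept.
top_earners = players.merge(top_wage, on=["Club", "Wage"]).sort_values(
    by="Wage", ascending=False
)

# '{:,.0f}' renders a number with comma separators and no decimals.
formatted = top_earners["Wage"].map("{:,.0f}".format)
print(top_earners.assign(Wage=formatted).to_string(index=False))
```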
