# Pandas

Pandas is a powerful library for data manipulation and analysis in Python. It is widely used in a range of fields, including data science, finance, and statistics.

## 002 Basic DataFrame manipulation

### 002.000 Assets

Shared setup, so that it does not have to be retyped in every exercise.

The content of the tsv file (`002.tsv`) is:

| Name        | DOB        |
|-------------|------------|
| Mbappé      | 1998-12-20 |
| De Bruyne   | 1991-06-28 |
| Lewandowski | 1988-08-21 |
| Benzema     | 1987-12-19 |
| Messi       | 1987-06-24 |

In [2]:
import sys
from pathlib import Path

current_dir = Path().resolve()
while current_dir != current_dir.parent and current_dir.name != "katas":
    current_dir = current_dir.parent
if current_dir != current_dir.parent:
    sys.path.append(current_dir.as_posix())


In [11]:
import pandas as pd
from lib.utils import fresh_df
from IPython.core.interactiveshell import InteractiveShell

pd.set_option('display.max_rows', None)
InteractiveShell.ast_node_interactivity = "all"
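
The `fresh_df` helper lives in `lib.utils`, which is not shown here. To follow along without that module, a minimal stand-in could look like the sketch below. The name `make_df`, its `index_col` parameter, and the inline `TSV` string are all hypothetical; the data is copied from the `Name` and `DOB` values that appear in the outputs of this section.

```python
from io import StringIO

import pandas as pd

# Inline, tab-separated copy of the data (assumed layout of 002.tsv).
TSV = (
    "Name\tDOB\n"
    "Mbappé\t1998-12-20\n"
    "De Bruyne\t1991-06-28\n"
    "Lewandowski\t1988-08-21\n"
    "Benzema\t1987-12-19\n"
    "Messi\t1987-06-24\n"
)


def make_df(index_col=None):
    """Hypothetical stand-in for fresh_df: parse the inline TSV into a new DataFrame."""
    return pd.read_csv(StringIO(TSV), sep="\t", index_col=index_col)


df = make_df()
print(df)
```

Calling `make_df("Name")` would use the `Name` column as the index, which is presumably what `fresh_df(..., id="Name")` does in the exercises.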

### 002.001 Indices

1. Print the df - note that it has an unnamed index column, created by Pandas
1. Rename the index column to 'ID', and print it again
1. Do it again, but differently

In [13]:
df = fresh_df(src="002.tsv")
# solution
df

df.index = df.index.rename("ID")
df

df.index.name = "ID"
df

|   | Name        | DOB        |
|---|-------------|------------|
| 0 | Mbappé      | 1998-12-20 |
| 1 | De Bruyne   | 1991-06-28 |
| 2 | Lewandowski | 1988-08-21 |
| 3 | Benzema     | 1987-12-19 |
| 4 | Messi       | 1987-06-24 |


| ID | Name        | DOB        |
|----|-------------|------------|
| 0  | Mbappé      | 1998-12-20 |
| 1  | De Bruyne   | 1991-06-28 |
| 2  | Lewandowski | 1988-08-21 |
| 3  | Benzema     | 1987-12-19 |
| 4  | Messi       | 1987-06-24 |
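
Both solutions above change the index name on the existing DataFrame. A third option, sketched here on a small hand-built DataFrame (an assumption, so the snippet runs without `fresh_df`), is `rename_axis`, which returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Mbappé", "De Bruyne"], "DOB": ["1998-12-20", "1991-06-28"]}
)

# The default RangeIndex has no name.
print(df.index.name)

# rename_axis returns a renamed copy; df itself is not modified.
named = df.rename_axis("ID")
print(named.index.name)
print(df.index.name)

# reset_index turns the named index into an ordinary "ID" column.
back = named.reset_index()
print(list(back.columns))
```

The non-mutating form is convenient in method chains, where reassigning `df.index.name` would require a separate statement.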


### 002.002 Basic cell access

1. Get a new DataFrame, with "Name" as the index column
1. Save "De Bruyne" to a var and use it to fetch the DOB with the loc method. Print result in format `f"{xxx} was born on {xxx}"`
1. Save the index corresponding to "Benzema" to a var and use it to fetch the DOB with the iloc method. Print result in format `f"{xxx} was born on {xxx}"`

In [15]:
df = fresh_df(src="002.tsv", id="Name")
df
# solution

player = "De Bruyne"
print(f"{player} was born on {df.loc[player, 'DOB']}")

# positional index of the "Benzema" row; DOB is column 0
player = 3
print(f"{df.index[player]} was born on {df.iloc[player, 0]}")


| Name        | DOB        |
|-------------|------------|
| Mbappé      | 1998-12-20 |
| De Bruyne   | 1991-06-28 |
| Lewandowski | 1988-08-21 |
| Benzema     | 1987-12-19 |
| Messi       | 1987-06-24 |


De Bruyne was born on 1991-06-28
Benzema was born on 1987-12-19
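
The solution hard-codes the row position `3` and the column position `0`. As a possible refinement, both can be looked up with `Index.get_loc`, so the code keeps working if rows or columns are reordered. The DataFrame below is built by hand (an assumption standing in for `fresh_df(src="002.tsv", id="Name")`).

```python
import pandas as pd

df = pd.DataFrame(
    {"DOB": ["1998-12-20", "1991-06-28", "1988-08-21", "1987-12-19", "1987-06-24"]},
    index=pd.Index(
        ["Mbappé", "De Bruyne", "Lewandowski", "Benzema", "Messi"], name="Name"
    ),
)

# Translate labels into integer positions instead of hard-coding them.
pos = df.index.get_loc("Benzema")
dob_col = df.columns.get_loc("DOB")
print(f"{df.index[pos]} was born on {df.iloc[pos, dob_col]}")

# For a single cell, at (label-based) and iat (position-based) are the
# scalar counterparts of loc and iloc.
print(df.at["De Bruyne", "DOB"])
print(df.iat[pos, dob_col])
```

`get_loc` returns a plain integer only when the label is unique in the index, which holds for this data.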


### 002.003 Columns

1. Get a new dataframe
1. Add a "Profession" column at the end, with "Footballer" as a value for all rows
1. Add a "Still Playing" column at the beginning, with True as value for all rows
1. Print the dataframe
1. Rename the "Still Playing" column as "Active"
1. Remove the "Profession" column
1. (No need to print the dataframe)


In [17]:
df = fresh_df(src="002.tsv")
df
# solution

df.insert(len(df.columns), "Profession", "Footballer")
df.insert(0, "Still Playing", True)
df

df.rename(columns={"Still Playing": "Active"}, inplace=True)

df.drop(columns=["Profession"], inplace=True)
df

|   | Name        | DOB        |
|---|-------------|------------|
| 0 | Mbappé      | 1998-12-20 |
| 1 | De Bruyne   | 1991-06-28 |
| 2 | Lewandowski | 1988-08-21 |
| 3 | Benzema     | 1987-12-19 |
| 4 | Messi       | 1987-06-24 |


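|   | Active | Name        | DOB        | Profession |
|---|--------|-------------|------------|------------|
| 0 | True   | Mbappé      | 1998-12-20 | Footballer |
| 1 | True   | De Bruyne   | 1991-06-28 | Footballer |
| 2 | True   | Lewandowski | 1988-08-21 | Footballer |
| 3 | True   | Benzema     | 1987-12-19 | Footballer |
| 4 | True   | Messi       | 1987-06-24 | Footballer |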


|   | Active | Name        | DOB        |
|---|--------|-------------|------------|
| 0 | True   | Mbappé      | 1998-12-20 |
| 1 | True   | De Bruyne   | 1991-06-28 |
| 2 | True   | Lewandowski | 1988-08-21 |
| 3 | True   | Benzema     | 1987-12-19 |
| 4 | True   | Messi       | 1987-06-24 |
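
The solution modifies `df` step by step with `inplace=True`. An alternative, sketched below on a hand-built DataFrame (an assumption replacing `fresh_df`), keeps the original intact: `assign` appends a column to a copy, and `rename` and `drop` each return a new DataFrame, so they can be chained.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Mbappé", "De Bruyne"], "DOB": ["1998-12-20", "1991-06-28"]}
)

# assign returns a copy with the new column appended at the end.
with_cols = df.assign(Profession="Footballer")

# insert has no non-mutating form, so it is applied to the copy.
with_cols.insert(0, "Still Playing", True)
print(list(with_cols.columns))

# rename and drop both return new DataFrames and can be chained.
result = with_cols.rename(columns={"Still Playing": "Active"}).drop(
    columns=["Profession"]
)
print(list(result.columns))
print(list(df.columns))  # the original is unchanged
```

Whether to mutate or chain is mostly a matter of style; the chained form makes it easier to keep intermediate states around for inspection.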


### 002.004 Applying transforms

1. Get a fresh DataFrame
1. Create a new column Age, by using 'apply' to derive the data from the DOB
1. Print it, and print the type of each column
1. Do the same, but using 'map'; this time convert the Age to a string.
1. Print it, and print the type of each column



In [21]:
from datetime import datetime

df = fresh_df(src="002.tsv")
df
# solution

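# row-wise apply: approximate age in whole years (days // 365 ignores leap days)
df["Age"] = df.apply(lambda row: (datetime.now() - datetime.strptime(row['DOB'], '%Y-%m-%d')).days // 365, axis=1)
df
df.dtypes

# element-wise map on the DOB column: same computation, converted to str
df["Age"] = df["DOB"].map(lambda cell: str((datetime.now() - datetime.strptime(cell, '%Y-%m-%d')).days // 365))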
df
df.dtypes

|   | Name        | DOB        |
|---|-------------|------------|
| 0 | Mbappé      | 1998-12-20 |
| 1 | De Bruyne   | 1991-06-28 |
| 2 | Lewandowski | 1988-08-21 |
| 3 | Benzema     | 1987-12-19 |
| 4 | Messi       | 1987-06-24 |


|   | Name        | DOB        | Age |
|---|-------------|------------|-----|
| 0 | Mbappé      | 1998-12-20 | 24  |
| 1 | De Bruyne   | 1991-06-28 | 31  |
| 2 | Lewandowski | 1988-08-21 | 34  |
| 3 | Benzema     | 1987-12-19 | 35  |
| 4 | Messi       | 1987-06-24 | 35  |


Name    object
DOB     object
Age      int64
dtype: object

|   | Name        | DOB        | Age |
|---|-------------|------------|-----|
| 0 | Mbappé      | 1998-12-20 | 24  |
| 1 | De Bruyne   | 1991-06-28 | 31  |
| 2 | Lewandowski | 1988-08-21 | 34  |
| 3 | Benzema     | 1987-12-19 | 35  |
| 4 | Messi       | 1987-06-24 | 35  |


Name    object
DOB     object
Age     object
dtype: object
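
Dividing the day count by 365 drifts slightly because of leap years, and `datetime.now()` makes the result depend on when the cell is run. As an alternative sketch, the age can be computed from calendar fields with vectorized operations instead of `apply` or `map`. The fixed reference date `today` is a hypothetical value chosen so the result is reproducible, and the DataFrame is built by hand rather than with `fresh_df`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Mbappé", "De Bruyne", "Lewandowski", "Benzema", "Messi"],
        "DOB": ["1998-12-20", "1991-06-28", "1988-08-21", "1987-12-19", "1987-06-24"],
    }
)

# Parse the DOB strings once into a datetime Series.
dob = pd.to_datetime(df["DOB"], format="%Y-%m-%d")

# Hypothetical fixed reference date; swap in pd.Timestamp.now() for a live value.
today = pd.Timestamp("2023-01-15")

# True where the birthday has already occurred in the reference year.
had_birthday = (dob.dt.month < today.month) | (
    (dob.dt.month == today.month) & (dob.dt.day <= today.day)
)

# Year difference, minus one where the birthday is still to come.
df["Age"] = today.year - dob.dt.year - (~had_birthday).astype(int)
print(df)

# String version, matching the second half of the exercise.
age_as_text = df["Age"].astype(str)
print(age_as_text.tolist())
```

With this reference date the ages agree with the output shown above; the vectorized form also avoids calling a Python lambda once per row.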