# Pandas

Pandas is a powerful library for data manipulation and analysis in Python. It is widely used in a range of fields, including data science, finance, and statistics.

## 002 Basic DataFrame manipulation

### 002.000 Assets

Some assets to avoid too much typing.

The content of the TSV file is:

| Name        | DOB        |
|-------------|------------|
| Mbappé      | 1998-12-20 |
| De Bruyne   | 1991-06-28 |
| Lewandowski | 1988-08-21 |
| Benzema     | 1987-12-19 |
| Messi       | 1987-06-24 |
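
If `002.tsv` is not at hand, it can be recreated. The sketch below assumes the file holds a `Name` column and a `DOB` column, as the printed DataFrames in this section show. The temporary directory and the `players` variable are illustrative choices, not part of the original notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Rows as they appear in the printed DataFrames of this section.
rows = [
    ("Mbappé", "1998-12-20"),
    ("De Bruyne", "1991-06-28"),
    ("Lewandowski", "1988-08-21"),
    ("Benzema", "1987-12-19"),
    ("Messi", "1987-06-24"),
]

# Header line plus one tab-separated line per player.
tsv_text = "Name\tDOB\n" + "".join(f"{name}\t{dob}\n" for name, dob in rows)

# Assumption: a temporary directory keeps this sketch free of side effects.
# In the notebook, write to Path("002.tsv") instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    tsv_path = Path(tmp_dir) / "002.tsv"
    tsv_path.write_text(tsv_text, encoding="utf-8")
    players = pd.read_csv(tsv_path, sep="\t")

print(players)
```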

In [13]:
%pip install --upgrade -q pip pandas

Note: you may need to restart the kernel to use updated packages.


In [7]:
import pandas as pd
from IPython.core.interactiveshell import InteractiveShell

pd.set_option('display.max_rows', None)
InteractiveShell.ast_node_interactivity = "all"

### 002.001 Indices

1. Print the DataFrame. Note that it has an unnamed index, created by pandas
1. Rename the index to 'ID'
1. Rename the index to 'ID' again, but do it differently

In [8]:
df = pd.read_csv("002.tsv", sep="\t")
# solution
1
df

2
df.index.rename("ID", inplace=True)
df

3
df.index.name = "ID"
df

1

|   | Name        | DOB        |
|---|-------------|------------|
| 0 | Mbappé      | 1998-12-20 |
| 1 | De Bruyne   | 1991-06-28 |
| 2 | Lewandowski | 1988-08-21 |
| 3 | Benzema     | 1987-12-19 |
| 4 | Messi       | 1987-06-24 |


2

| ID | Name        | DOB        |
|----|-------------|------------|
| 0  | Mbappé      | 1998-12-20 |
| 1  | De Bruyne   | 1991-06-28 |
| 2  | Lewandowski | 1988-08-21 |
| 3  | Benzema     | 1987-12-19 |
| 4  | Messi       | 1987-06-24 |


3

| ID | Name        | DOB        |
|----|-------------|------------|
| 0  | Mbappé      | 1998-12-20 |
| 1  | De Bruyne   | 1991-06-28 |
| 2  | Lewandowski | 1988-08-21 |
| 3  | Benzema     | 1987-12-19 |
| 4  | Messi       | 1987-06-24 |
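
Both solutions above modify the index in place. As a possible third variant, `rename_axis` returns a new DataFrame and leaves the original untouched. This is a minimal sketch: the in-memory `players` frame stands in for the TSV file and is an assumption made to keep the example self-contained.

```python
import pandas as pd

# Stand-in for pd.read_csv("002.tsv", sep="\t").
players = pd.DataFrame(
    {
        "Name": ["Mbappé", "De Bruyne", "Lewandowski", "Benzema", "Messi"],
        "DOB": ["1998-12-20", "1991-06-28", "1988-08-21", "1987-12-19", "1987-06-24"],
    }
)

# A freshly built frame has a default RangeIndex with no name.
print(players.index.name)  # None

# rename_axis names the index on a new DataFrame.
named = players.rename_axis("ID")
print(named.index.name)  # ID

# The original frame is unchanged.
print(players.index.name)  # None
```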


### 002.002 Basic cell access

1. Get a new DataFrame, with "Name" as the index column
1. Save "De Bruyne" to a variable and use it to fetch the DOB with the `loc` method. Print the result in the format `f"{xxx} was born on {xxx}"`
1. Save the index label "Benzema" to a variable and use it to find the integer position of the row. Print that position
1. Find the DOB with the `iloc` method using that integer position. Print the result in the format `f"{xxx} was born on {xxx}"`

In [9]:
1
df = pd.read_csv("002.tsv", sep="\t")
df.set_index("Name", inplace=True)
# f"{xxx} was born on {xxx}"
# solution

2
player = "De Bruyne"
f"{player} was born on {df.loc[player, 'DOB']}"

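3
player = "Benzema"
# get_loc turns the index label into the row's integer position
position = df.index.get_loc(player)
position

4
# column 0 is "DOB", because "Name" is now the index
f"{df.index[position]} was born on {df.iloc[position, 0]}"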


1

2

'De Bruyne was born on 1991-06-28'

3

3

4

'Benzema was born on 1987-12-19'
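
The difference between label-based and position-based access can be sketched on its own. The example below is illustrative and assumes an in-memory copy of the data. It also looks up the column position with `columns.get_loc` instead of hard-coding `0`; that is a design choice of this sketch, not something the exercise requires.

```python
import pandas as pd

# Stand-in for the TSV file, indexed by player name.
players = pd.DataFrame(
    {
        "Name": ["Mbappé", "De Bruyne", "Lewandowski", "Benzema", "Messi"],
        "DOB": ["1998-12-20", "1991-06-28", "1988-08-21", "1987-12-19", "1987-06-24"],
    }
).set_index("Name")

# loc takes labels: row label first, then column label.
dob_by_label = players.loc["De Bruyne", "DOB"]

# get_loc translates a label into an integer position.
row_pos = players.index.get_loc("Benzema")
col_pos = players.columns.get_loc("DOB")

# iloc takes integer positions only.
dob_by_position = players.iloc[row_pos, col_pos]

print(f"De Bruyne was born on {dob_by_label}")
print(f"{players.index[row_pos]} was born on {dob_by_position}")
```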

### 002.003 Columns

1. Get a new DataFrame
1. Insert some columns
   1. Add a "Profession" column at the end, with "footballer" as the value for all rows
   1. Add a "Still Playing" column at the beginning, with True as the value for all rows
1. Rename and remove
   1. Rename the "Still Playing" column to "Active"
   1. Remove the "Profession" column


In [10]:
1
df = pd.read_csv("002.tsv", sep="\t")
df
# solution

2
# the positional form would work too
# df.insert(len(df.columns), "Profession", "footballer")
df.insert(loc=len(df.columns), column="Profession", value="footballer")
df

3
df.insert(0, "Still Playing", True)
df

4
df.rename(columns={"Still Playing": "Active"}, inplace=True)
df.drop(columns=["Profession"], inplace=True)
df

1

|   | Name        | DOB        |
|---|-------------|------------|
| 0 | Mbappé      | 1998-12-20 |
| 1 | De Bruyne   | 1991-06-28 |
| 2 | Lewandowski | 1988-08-21 |
| 3 | Benzema     | 1987-12-19 |
| 4 | Messi       | 1987-06-24 |


2

|   | Name        | DOB        | Profession |
|---|-------------|------------|------------|
| 0 | Mbappé      | 1998-12-20 | footballer |
| 1 | De Bruyne   | 1991-06-28 | footballer |
| 2 | Lewandowski | 1988-08-21 | footballer |
| 3 | Benzema     | 1987-12-19 | footballer |
| 4 | Messi       | 1987-06-24 | footballer |


3

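|   | Still Playing | Name        | DOB        | Profession |
|---|---------------|-------------|------------|------------|
| 0 | True          | Mbappé      | 1998-12-20 | footballer |
| 1 | True          | De Bruyne   | 1991-06-28 | footballer |
| 2 | True          | Lewandowski | 1988-08-21 | footballer |
| 3 | True          | Benzema     | 1987-12-19 | footballer |
| 4 | True          | Messi       | 1987-06-24 | footballer |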


4

|   | Active | Name        | DOB        |
|---|--------|-------------|------------|
| 0 | True   | Mbappé      | 1998-12-20 |
| 1 | True   | De Bruyne   | 1991-06-28 |
| 2 | True   | Lewandowski | 1988-08-21 |
| 3 | True   | Benzema     | 1987-12-19 |
| 4 | True   | Messi       | 1987-06-24 |
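
The same column operations can also be written without `inplace=True`, so that each step returns a new DataFrame. This is one possible sketch, not the notebook's solution; the in-memory `players` frame and the intermediate variable names are assumptions. `insert` has no copying form, so it is applied to an explicit copy.

```python
import pandas as pd

# Stand-in for pd.read_csv("002.tsv", sep="\t").
players = pd.DataFrame(
    {
        "Name": ["Mbappé", "De Bruyne", "Lewandowski", "Benzema", "Messi"],
        "DOB": ["1998-12-20", "1991-06-28", "1988-08-21", "1987-12-19", "1987-06-24"],
    }
)

# assign returns a copy with the new column appended at the end.
with_profession = players.assign(Profession="footballer")

# insert always mutates its frame, so work on a copy to protect the input.
with_flag = with_profession.copy()
with_flag.insert(0, "Still Playing", True)

# rename and drop both return new frames, so they chain naturally.
result = with_flag.rename(columns={"Still Playing": "Active"}).drop(columns=["Profession"])

print(list(result.columns))   # ['Active', 'Name', 'DOB']
print(list(players.columns))  # ['Name', 'DOB']
```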


### 002.004 Applying transforms

1. Get a fresh DataFrame
1. Create a new column "Age", using `apply` to derive the data from the DOB. Remember that `DataFrame.apply` passes columns to the function by default; pass `axis=1` to receive rows
1. Print it, and print the type of each column
1. Do the same using `map`; this time convert the Age to a string
1. Print it, and print the type of each column



In [11]:
from datetime import datetime

1
df = pd.read_csv("002.tsv", sep="\t")
df
# solution

2
df["Age"] = df.apply(lambda row: (datetime.now() - datetime.strptime(row["DOB"], "%Y-%m-%d")).days // 365, axis=1)
df.dtypes
df

3
df["Age"] = df["DOB"].map(lambda cell: (datetime.now() - datetime.strptime(cell, "%Y-%m-%d")).days // 365).astype(str)
df.dtypes
df


1

|   | Name        | DOB        |
|---|-------------|------------|
| 0 | Mbappé      | 1998-12-20 |
| 1 | De Bruyne   | 1991-06-28 |
| 2 | Lewandowski | 1988-08-21 |
| 3 | Benzema     | 1987-12-19 |
| 4 | Messi       | 1987-06-24 |


2

Name    object
DOB     object
Age      int64
dtype: object

|   | Name        | DOB        | Age |
|---|-------------|------------|-----|
| 0 | Mbappé      | 1998-12-20 | 25  |
| 1 | De Bruyne   | 1991-06-28 | 32  |
| 2 | Lewandowski | 1988-08-21 | 35  |
| 3 | Benzema     | 1987-12-19 | 36  |
| 4 | Messi       | 1987-06-24 | 36  |


3

Name    object
DOB     object
Age     object
dtype: object

|   | Name        | DOB        | Age |
|---|-------------|------------|-----|
| 0 | Mbappé      | 1998-12-20 | 25  |
| 1 | De Bruyne   | 1991-06-28 | 32  |
| 2 | Lewandowski | 1988-08-21 | 35  |
| 3 | Benzema     | 1987-12-19 | 36  |
| 4 | Messi       | 1987-06-24 | 36  |
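
Dividing the day count by 365 ignores leap years, so the result can be off by one close to a birthday. A vectorized alternative, sketched below, parses the column with `pd.to_datetime` and compares calendar fields instead. The fixed `reference` date is an assumption chosen so the result is reproducible; `pd.Timestamp.now()` would give the age as of today.

```python
import pandas as pd

# Stand-in for pd.read_csv("002.tsv", sep="\t").
players = pd.DataFrame(
    {
        "Name": ["Mbappé", "De Bruyne", "Lewandowski", "Benzema", "Messi"],
        "DOB": ["1998-12-20", "1991-06-28", "1988-08-21", "1987-12-19", "1987-06-24"],
    }
)

# Assumption: a fixed reference date keeps the output stable between runs.
reference = pd.Timestamp("2024-01-01")

# Parse the whole column at once instead of calling strptime per row.
born = pd.to_datetime(players["DOB"], format="%Y-%m-%d")

# True where the birthday has already occurred in the reference year.
had_birthday = (born.dt.month < reference.month) | (
    (born.dt.month == reference.month) & (born.dt.day <= reference.day)
)

# Year difference, minus one where the birthday is still to come.
players["Age"] = reference.year - born.dt.year - (~had_birthday).astype(int)

print(players)
```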
