# Pandas

Pandas is a powerful library for data manipulation and analysis in Python. It is widely used in a range of fields, including data science, finance, and statistics.

## 002 Basic DataFrame manipulation

### 002.000 Assets

A shared helper that saves retyping the same setup in each exercise below.

The exercises read `002.tsv`, a tab-separated file with the following content:

| Name        | DOB        |
|-------------|------------|
| Mbappé      | 1998-12-20 |
| De Bruyne   | 1991-06-28 |
| Lewandowski | 1988-08-21 |
| Benzema     | 1987-12-19 |
| Messi       | 1987-06-24 |

In [98]:
import pandas as pd
from typing import Optional

def fresh_df(id: Optional[str] = ""):
    """Return a fresh DataFrame loaded from the tsv file.

    @param id - if it is the name of a column, that column becomes the index;
                if it is any other non-empty string, the default index is renamed to it;
                if not passed, the default index remains nameless
    """
    df = pd.read_csv("002.tsv", sep="\t")
    if id:
        if id in df.columns:
            df.set_index(id, inplace=True)
        else:
            df.index = df.index.rename(id)
    return df
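
For readers who do not have `002.tsv` at hand, the helper can be approximated with an in-memory copy of the data. This is a minimal sketch, not the notebook's own code: `TSV_TEXT` and `fresh_df_from_text` are hypothetical names, and the file content is assumed from the output printed in the exercises below.

```python
import io

import pandas as pd

# Assumed file content, reconstructed from the printed output in this section.
TSV_TEXT = (
    "Name\tDOB\n"
    "Mbappé\t1998-12-20\n"
    "De Bruyne\t1991-06-28\n"
    "Lewandowski\t1988-08-21\n"
    "Benzema\t1987-12-19\n"
    "Messi\t1987-06-24\n"
)


def fresh_df_from_text(id: str = "") -> pd.DataFrame:
    """Hypothetical in-memory counterpart of fresh_df, with the same id rules."""
    df = pd.read_csv(io.StringIO(TSV_TEXT), sep="\t")
    if id:
        if id in df.columns:
            # An existing column becomes the index.
            df = df.set_index(id)
        else:
            # Any other non-empty string names the default index.
            df = df.rename_axis(id)
    return df


default_df = fresh_df_from_text()
named_df = fresh_df_from_text("ID")
by_name_df = fresh_df_from_text("Name")
print(by_name_df)
```

Unlike the helper above, this sketch avoids `inplace=True` and reassigns the result instead; both styles yield the same DataFrame here.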

### 002.001 Indices

1. Print the df - note that it has an unnamed index column, created by Pandas
1. Rename the index column to 'ID', and print it again

In [100]:
df = fresh_df()
# solution
print(df)
df.index = df.index.rename("ID")
print(df)

          Name         DOB
0       Mbappé  1998-12-20
1    De Bruyne  1991-06-28
2  Lewandowski  1988-08-21
3      Benzema  1987-12-19
4        Messi  1987-06-24
           Name         DOB
ID                         
0        Mbappé  1998-12-20
1     De Bruyne  1991-06-28
2   Lewandowski  1988-08-21
3       Benzema  1987-12-19
4         Messi  1987-06-24
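
The index name can also be set in other ways. The sketch below compares `rename_axis`, which returns a new DataFrame, with assigning `index.name` directly, which changes the existing one. It uses a small stand-in frame built by a hypothetical `make_df` helper rather than the tsv file.

```python
import pandas as pd


def make_df() -> pd.DataFrame:
    """Hypothetical stand-in for fresh_df, using two of the rows shown above."""
    return pd.DataFrame(
        {"Name": ["Mbappé", "Messi"], "DOB": ["1998-12-20", "1987-06-24"]}
    )


# Option 1: rename_axis returns a new DataFrame and leaves the original alone.
original = make_df()
renamed = original.rename_axis("ID")

# Option 2: assigning index.name modifies the DataFrame's own index.
in_place = make_df()
in_place.index.name = "ID"

print(original.index.name)
print(renamed)
print(in_place)
```

`rename_axis` fits naturally into method chains, while assigning `index.name` is the shortest form when mutating the DataFrame is acceptable.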


### 002.002 Basic cell access

1. Get a new DataFrame, with "Name" as the index column
1. Save "De Bruyne" to a variable and use it to fetch the DOB with the loc method
1. Save the row position of "Benzema" to a variable and use it to fetch the DOB with the iloc method

In [101]:
df = fresh_df(id="Name")
print(df)
# solution
player = "De Bruyne"
print(f"{player} was born on {df.loc[player, 'DOB']}")
position = 3  # row position of Benzema
print(f"{df.index[position]} was born on {df.iloc[position, 0]}")


                    DOB
Name                   
Mbappé       1998-12-20
De Bruyne    1991-06-28
Lewandowski  1988-08-21
Benzema      1987-12-19
Messi        1987-06-24
De Bruyne was born on 1991-06-28
Benzema was born on 1987-12-19
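
The difference between label-based and position-based access can be sketched as follows. The frame is rebuilt inline so the example is self-contained, with the data assumed from the output above. `Index.get_loc` translates a label into a position, which avoids hard-coding the row number, and `at` / `iat` are the single-cell counterparts of `loc` / `iloc`.

```python
import pandas as pd

# Stand-in for fresh_df(id="Name"); values assumed from the printed output.
df = pd.DataFrame(
    {"DOB": ["1998-12-20", "1991-06-28", "1988-08-21", "1987-12-19", "1987-06-24"]},
    index=pd.Index(
        ["Mbappé", "De Bruyne", "Lewandowski", "Benzema", "Messi"], name="Name"
    ),
)

# Label-based: row label and column label.
by_label = df.loc["De Bruyne", "DOB"]

# Position-based: look the positions up instead of hard-coding them.
row_pos = df.index.get_loc("Benzema")
col_pos = df.columns.get_loc("DOB")
by_position = df.iloc[row_pos, col_pos]

# Single-cell accessors: at takes labels, iat takes integer positions.
cell_by_label = df.at["Messi", "DOB"]
cell_by_position = df.iat[0, 0]

print(by_label, row_pos, by_position, cell_by_label, cell_by_position)
```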


### 002.003 Columns

1. Get a new dataframe
1. Add a "Profession" column at the end, with "Footballer" as the value for all rows
1. Add a "Still Playing" column at the beginning, with True as the value for all rows
1. Rename the "Still Playing" column to "Active"
1. Print the dataframe
1. Remove the "Profession" column
1. Print the dataframe


In [111]:
df = fresh_df()
print(df)
# solution
df.insert(len(df.columns), "Profession", "Footballer")
df.insert(0, "Still Playing", True)
df.rename(columns={"Still Playing": "Active"}, inplace=True)
print(df)
df.drop(columns=["Profession"], inplace=True)
print(df)

          Name         DOB
0       Mbappé  1998-12-20
1    De Bruyne  1991-06-28
2  Lewandowski  1988-08-21
3      Benzema  1987-12-19
4        Messi  1987-06-24
   Active         Name         DOB  Profession
0    True       Mbappé  1998-12-20  Footballer
1    True    De Bruyne  1991-06-28  Footballer
2    True  Lewandowski  1988-08-21  Footballer
3    True      Benzema  1987-12-19  Footballer
4    True        Messi  1987-06-24  Footballer
   Active         Name         DOB
0    True       Mbappé  1998-12-20
1    True    De Bruyne  1991-06-28
2    True  Lewandowski  1988-08-21
3    True      Benzema  1987-12-19
4    True        Messi  1987-06-24
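
The same column operations can be written without `inplace=True`, as a chain of methods that each return a new DataFrame. This is one possible alternative, sketched on a two-row stand-in frame. Note that `assign` always appends new columns at the end, so placing "Active" first takes an explicit reorder.

```python
import pandas as pd

# Two-row stand-in for fresh_df(); values assumed from the output above.
df = pd.DataFrame({"Name": ["Mbappé", "Messi"], "DOB": ["1998-12-20", "1987-06-24"]})

# A column name containing a space cannot be a keyword argument,
# so it is passed to assign through dictionary unpacking.
with_columns = (
    df.assign(Profession="Footballer", **{"Still Playing": True})
    .rename(columns={"Still Playing": "Active"})
)

# assign appends at the end; select the columns in the desired order.
with_columns = with_columns[["Active", "Name", "DOB", "Profession"]]
print(with_columns)

# drop returns a new DataFrame; with_columns keeps its Profession column.
without_profession = with_columns.drop(columns="Profession")
print(without_profession)
```

`DataFrame.insert`, as used in the solution, remains the direct way to place a column at a specific position in an existing DataFrame.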


### 002.004 Applying transforms

1. Get a fresh DataFrame
1. Convert the DOB column to datetime
1. Create a new column Age, using 'apply' to derive the data from the DOB
1. Print the DataFrame, and print the type of each column
1. Compute Age again, this time using 'map', and convert it to string
1. Print the DataFrame, and print the type of each column



In [145]:
from datetime import datetime

df = fresh_df()
print(df)
# solution
df["DOB"] = pd.to_datetime(df['DOB'], format="%Y-%m-%d")

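now = datetime.now()
# Row-wise apply (axis=1): approximate age in whole years.
# Dividing by 365 ignores leap days, so the result can be off near a birthday.
df["Age"] = df.apply(lambda row: (now - row['DOB']).days // 365, axis=1)
print(df)
print(df.dtypes)

# Same computation with Series.map, which works on one column cell by cell
df["Age"] = df["DOB"].map(lambda cell: (now - cell).days // 365)
# Cast to str: the dtype is then reported as object
df["Age"] = df["Age"].astype(str)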
print(df)
print(df.dtypes)

          Name         DOB
0       Mbappé  1998-12-20
1    De Bruyne  1991-06-28
2  Lewandowski  1988-08-21
3      Benzema  1987-12-19
4        Messi  1987-06-24
          Name        DOB  Age
0       Mbappé 1998-12-20   24
1    De Bruyne 1991-06-28   31
2  Lewandowski 1988-08-21   34
3      Benzema 1987-12-19   35
4        Messi 1987-06-24   35
Name            object
DOB     datetime64[ns]
Age              int64
dtype: object
          Name        DOB Age
0       Mbappé 1998-12-20  24
1    De Bruyne 1991-06-28  31
2  Lewandowski 1988-08-21  34
3      Benzema 1987-12-19  35
4        Messi 1987-06-24  35
Name            object
DOB     datetime64[ns]
Age             object
dtype: object
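
Dividing the number of days by 365 ignores leap days, so the computed age can be one year too high shortly before a birthday. A calendar-based calculation avoids this. The sketch below is one possible approach, not part of the original exercise: `age_on` is a hypothetical helper, and the reference date is fixed so that the result is reproducible.

```python
from datetime import date

import pandas as pd


def age_on(dob: pd.Series, today: date) -> pd.Series:
    """Whole years between each date of birth and `today` (vectorised)."""
    # True where this year's birthday has already happened.
    had_birthday = (dob.dt.month < today.month) | (
        (dob.dt.month == today.month) & (dob.dt.day <= today.day)
    )
    # Subtract one year where the birthday is still to come.
    return today.year - dob.dt.year - (~had_birthday).astype(int)


dob = pd.to_datetime(
    pd.Series(["1998-12-20", "1991-06-28", "1987-06-24"]), format="%Y-%m-%d"
)
today = date(2023, 6, 20)  # fixed reference date, chosen for illustration

naive = (pd.Timestamp(today) - dob).dt.days // 365
exact = age_on(dob, today)

print(naive.tolist())  # [24, 32, 36]
print(exact.tolist())  # [24, 31, 35]
```

On this reference date the last two birthdays are a few days away, yet the accumulated leap days already push the naive result up by one year.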
