# Pandas

Pandas is a powerful library for data manipulation and analysis in Python. It is widely used in a range of fields, including data science, finance, and statistics.

## 003. Series Basics

A pandas Series supports a variety of data manipulation and analysis tasks.
For example, you can compute aggregates over its values, such as the sum,
mean, and standard deviation, or plot the data with the plotting methods
built into pandas.
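As a minimal sketch of the aggregates mentioned above, the following builds a Series from the player names and ages used throughout this notebook. The variable names (`age_series`, `total`, `average`, `spread`) are illustrative only, and plotting is left out to keep the example free of extra dependencies.

```python
import pandas as pd

# Names and ages as used throughout this notebook.
names = ["Mbappé", "De Bruyne", "Lewandowski", "Benzema", "Messi"]
ages = [23, 31, 33, 34, 35]

# Build a Series of ages labelled by player name.
age_series = pd.Series(ages, index=names, name="Age")

total = age_series.sum()     # sum of all ages
average = age_series.mean()  # arithmetic mean
spread = age_series.std()    # sample standard deviation (ddof=1 by default)

print(total, average, round(spread, 2))
```

Note that `Series.std()` uses the sample standard deviation by default; pass `ddof=0` if the population value is wanted.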

### 003.000 Assets

Some shared assets to save typing in the exercises below.

| Name        | Age |
|-------------|-----|
| Mbappé      | 23 |
| De Bruyne   | 31 |
| Lewandowski | 33 |
| Benzema     | 34 |
| Messi       | 35 |

In [15]:
import sys
from pathlib import Path

current_dir = Path().resolve()
while current_dir != current_dir.parent and current_dir.name != "katas":
    current_dir = current_dir.parent
if current_dir != current_dir.parent:
    sys.path.append(current_dir.as_posix())


In [16]:
import pandas as pd
from lib.utils import fresh_df
from IPython.core.interactiveshell import InteractiveShell

pd.set_option('display.max_rows', None)
InteractiveShell.ast_node_interactivity = "all"

names_as_list = ["Mbappé", "De Bruyne", "Lewandowski", "Benzema", "Messi"]
ages_as_list = [ 23,       31,          33,             34,        35]


### 003.001 Extract series from DataFrame

1. Load the TSV file into a DataFrame.
1. Extract the "DOB" column as a Series without using `loc`.
1. Show it with `head()`; notice that it carries both the row labels and the values.
1. Extract the dates of birth as a list and as an array.
1. Extract the row labels as a list and as an array.
1. Extract the "DOB" column as a Series again, this time using `loc`.


In [17]:
# 1
datafile = "002.tsv"
df = fresh_df(src=datafile, id="Name")
# solution


3

Name
Mbappé         1998-12-20
De Bruyne      1991-06-28
Lewandowski    1988-08-21
Benzema        1987-12-19
Messi          1987-06-24
Name: DOB, dtype: object

4

['1998-12-20', '1991-06-28', '1988-08-21', '1987-12-19', '1987-06-24']

array(['1998-12-20', '1991-06-28', '1988-08-21', '1987-12-19',
       '1987-06-24'], dtype=object)

5

array(['Mbappé', 'De Bruyne', 'Lewandowski', 'Benzema', 'Messi'],
      dtype=object)

['Mbappé', 'De Bruyne', 'Lewandowski', 'Benzema', 'Messi']

6

Name
Mbappé         1998-12-20
De Bruyne      1991-06-28
Lewandowski    1988-08-21
Benzema        1987-12-19
Messi          1987-06-24
Name: DOB, dtype: object
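One possible way to work through the steps of 003.001 is sketched below. It is not the notebook's own solution: because `fresh_df` and `002.tsv` are not shown here, the sketch assumes a DataFrame built inline from the dates of birth in the expected output, and all variable names are illustrative.

```python
import pandas as pd

# 1 (assumption): stand-in for the DataFrame loaded from the TSV file,
# rebuilt from the values shown in the expected output.
df = pd.DataFrame(
    {"DOB": ["1998-12-20", "1991-06-28", "1988-08-21", "1987-12-19", "1987-06-24"]},
    index=pd.Index(
        ["Mbappé", "De Bruyne", "Lewandowski", "Benzema", "Messi"], name="Name"
    ),
)

# 2: select the column with bracket indexing, which returns a Series
dobs = df["DOB"]

# 3: head() shows the row labels next to the values
print(dobs.head())

# 4: values as a plain Python list and as a NumPy array
dobs_list = dobs.tolist()
dobs_array = dobs.to_numpy()

# 5: row labels as a NumPy array and as a list
labels_array = dobs.index.to_numpy()
labels_list = dobs.index.tolist()

# 6: the same column, this time through loc (all rows, one column label)
dobs_loc = df.loc[:, "DOB"]
print(dobs_loc)
```

Both `df["DOB"]` and `df.loc[:, "DOB"]` return a Series that shares the DataFrame's index, which is why the two printed results look identical.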

### 003.002 Sort Series

1. Load the TSV file into a DataFrame.
1. Extract the "DOB" column as a Series by making a copy, otherwise it won't work. Print it.
1. Sort by row labels and print it.
1. Sort by values and print it.


In [18]:
datafile = "002.tsv"
df = fresh_df(src=datafile, id="Name")
# solution


Name
Mbappé         1998-12-20
De Bruyne      1991-06-28
Lewandowski    1988-08-21
Benzema        1987-12-19
Messi          1987-06-24
Name: DOB, dtype: object

Name
Benzema        1987-12-19
De Bruyne      1991-06-28
Lewandowski    1988-08-21
Mbappé         1998-12-20
Messi          1987-06-24
Name: DOB, dtype: object

Name
Mbappé         1998-12-20
De Bruyne      1991-06-28
Lewandowski    1988-08-21
Benzema        1987-12-19
Messi          1987-06-24
Name: DOB, dtype: object
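A hedged sketch of the sorting steps follows, again assuming an inline stand-in for the DataFrame that `fresh_df` would load. The last listing above matches descending order of the dates, so the sketch passes `ascending=False` for the value sort; that choice is an inference from the output, not something the exercise states.

```python
import pandas as pd

# Assumption: stand-in for the DataFrame loaded from the TSV file.
df = pd.DataFrame(
    {"DOB": ["1998-12-20", "1991-06-28", "1988-08-21", "1987-12-19", "1987-06-24"]},
    index=pd.Index(
        ["Mbappé", "De Bruyne", "Lewandowski", "Benzema", "Messi"], name="Name"
    ),
)

# Work on a copy so the sorted results are independent of the DataFrame.
dobs = df["DOB"].copy()
print(dobs)

# Sort by row labels (alphabetical by name).
by_label = dobs.sort_index()
print(by_label)

# Sort by values; ISO-formatted date strings sort chronologically.
# ascending=False is inferred from the expected output, not from the task text.
by_value = dobs.sort_values(ascending=False)
print(by_value)
```

`sort_index()` and `sort_values()` return new Series by default. The copy matters mostly for in-place sorting, which in some pandas versions raises an error when the Series is still a view of its parent DataFrame.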

### 003.003 File without header

1. Read the data file, which has no header row. After loading, make column 0 the index, replacing the automatically created integer index, and print the result.
1. Do the same, but specify at load time that column 0 is the index.


In [19]:
datafile = "katas.tsv"
# solution


1

| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| PANDAS 001 | 2023-01-06 08:00\|22:49 | 2023-01-08 13:11\|00:22 | 2023-01-12 00:52\|00:31 | 2023-01-17 23:55 \|00:23 | | |
| PANDAS 003 | 2023-01-13 23:41\|00:06 | 2023-01-18 22:02 \|00:05 | | | | |
| PANDAS 004 | 2023-01-20 22:41\|00:23 | | | | | |
| PANDAS 002 | 2023-01-06 13:08\|12:53 | 2023-01-07 23:05\|00:19 | 2023-01-10 21:45\|00:10 | 2023-01-12 20:16\|00:22 | 2023-01-16 23:29 \|00:10 | 2023-01-21 14:10 \|00:26 |


2

| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| PANDAS 001 | 2023-01-06 08:00\|22:49 | 2023-01-08 13:11\|00:22 | 2023-01-12 00:52\|00:31 | 2023-01-17 23:55 \|00:23 | | |
| PANDAS 003 | 2023-01-13 23:41\|00:06 | 2023-01-18 22:02 \|00:05 | | | | |
| PANDAS 004 | 2023-01-20 22:41\|00:23 | | | | | |
| PANDAS 002 | 2023-01-06 13:08\|12:53 | 2023-01-07 23:05\|00:19 | 2023-01-10 21:45\|00:10 | 2023-01-12 20:16\|00:22 | 2023-01-16 23:29 \|00:10 | 2023-01-21 14:10 \|00:26 |
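The two approaches for a headerless file can be sketched as follows. Since `katas.tsv` itself is not available here, the example assumes a small tab-separated sample held in memory, cut down to two data columns from the rows shown above; `sample`, `df1`, and `df2` are illustrative names.

```python
import io

import pandas as pd

# Assumption: a shortened, tab-separated stand-in for the headerless file.
sample = (
    "PANDAS 001\t2023-01-06 08:00|22:49\t2023-01-08 13:11|00:22\n"
    "PANDAS 003\t2023-01-13 23:41|00:06\t2023-01-18 22:02 |00:05\n"
    "PANDAS 004\t2023-01-20 22:41|00:23\t\n"
)

# 1: load without a header, then promote column 0 to the index.
#    header=None makes pandas label the columns 0, 1, 2, ...
df1 = pd.read_csv(io.StringIO(sample), sep="\t", header=None)
df1 = df1.set_index(0)  # replaces the default integer index
print(df1)

# 2: declare the index column directly while loading.
df2 = pd.read_csv(io.StringIO(sample), sep="\t", header=None, index_col=0)
print(df2)
```

Both routes should produce the same frame. Passing `index_col=0` simply avoids the intermediate DataFrame that carries the default integer index.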
