# Pandas

Pandas is a powerful library for data manipulation and analysis in Python. It is widely used in a range of fields, including data science, finance, and statistics.

## 003. Series Basics

A pandas Series supports a variety of data manipulation and
analysis tasks. For example, you can compute aggregates over the
data, such as the sum, mean, and standard deviation, or plot the data using the
built-in plotting functions in pandas.
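
As a minimal sketch of those calculations, the snippet below builds a Series from the player ages listed in the assets section and computes the three aggregates. The variable names are illustrative, and plotting is left out because it needs matplotlib installed.

```python
import pandas as pd

# Ages from the assets table, labelled by player name
ages = pd.Series(
    [23, 31, 33, 34, 35],
    index=["Mbappé", "De Bruyne", "Lewandowski", "Benzema", "Messi"],
    name="Age",
)

total = ages.sum()     # 156
average = ages.mean()  # 31.2
spread = ages.std()    # sample standard deviation (pandas defaults to ddof=1)

print(total, average, round(spread, 2))
```

Note that `Series.std()` divides by `n - 1` by default, unlike `numpy.std`, which divides by `n`.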

### 003.000 Assets

Some reusable assets to save typing:

| Name        | Age|
|-------------|----|
| Mbappé      | 23 |
| De Bruyne   | 31 |
| Lewandowski | 33 |
| Benzema     | 34 |
| Messi       | 35 |

In [5]:
import pandas as pd
from IPython.core.interactiveshell import InteractiveShell

pd.set_option('display.max_rows', None)
InteractiveShell.ast_node_interactivity = "all"


In [6]:
names_as_list = ["Mbappé", "De Bruyne", "Lewandowski", "Benzema", "Messi"]
ages_as_list = [ 23,       31,          33,             34,        35]



### 003.001 Extract series from DataFrame

1. Load the TSV file into a DataFrame. Extract the "DOB" column as a Series without using `loc`. Print it and notice that it shows both the row labels and the values
1. Extract the DOBs as a list, and as a NumPy array
1. Extract the row labels as a list, and as a NumPy array
1. Extract the "DOB" column as a Series, this time using `loc`


In [7]:
datafile = "002.tsv"
df = pd.read_csv(datafile, sep="\t")
df.set_index("Name", inplace=True)

# solution


1

Name
Mbappé         1998-12-20
De Bruyne      1991-06-28
Lewandowski    1988-08-21
Benzema        1987-12-19
Messi          1987-06-24
Name: DOB, dtype: object

2

['1998-12-20', '1991-06-28', '1988-08-21', '1987-12-19', '1987-06-24']

array(['1998-12-20', '1991-06-28', '1988-08-21', '1987-12-19',
       '1987-06-24'], dtype=object)

3

['Mbappé', 'De Bruyne', 'Lewandowski', 'Benzema', 'Messi']

array(['Mbappé', 'De Bruyne', 'Lewandowski', 'Benzema', 'Messi'],
      dtype=object)

4

Name
Mbappé         1998-12-20
De Bruyne      1991-06-28
Lewandowski    1988-08-21
Benzema        1987-12-19
Messi          1987-06-24
Name: DOB, dtype: object
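
One possible solution is sketched below. Because `002.tsv` itself is not shown, the snippet reads an in-memory stand-in that is assumed to have tab-separated `Name` and `DOB` columns. The variable names are illustrative.

```python
import io
import pandas as pd

# Stand-in for 002.tsv (assumed layout: tab-separated Name and DOB columns)
tsv_text = (
    "Name\tDOB\n"
    "Mbappé\t1998-12-20\n"
    "De Bruyne\t1991-06-28\n"
    "Lewandowski\t1988-08-21\n"
    "Benzema\t1987-12-19\n"
    "Messi\t1987-06-24\n"
)
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
df.set_index("Name", inplace=True)

# 1. Column as a Series, without loc
dob = df["DOB"]
print(dob)

# 2. Values as a list and as a NumPy array
dob_list = dob.to_list()
dob_array = dob.to_numpy()

# 3. Row labels as a list and as a NumPy array
labels_list = dob.index.to_list()
labels_array = dob.index.to_numpy()

# 4. Same column, this time with loc (all rows, one column label)
dob_loc = df.loc[:, "DOB"]
```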

### 003.002 Sort Series

1. Load the TSV file into a DataFrame. Extract the "DOB" column as a Series by making a copy; otherwise it won't work
1. Sort by row labels
1. Sort by values with the default ordering. Then sort again in reverse order
1. Load another DataFrame, this time without setting "Name" as the index. Print it, and then print it sorted by name


In [8]:
datafile = "002.tsv"
df = pd.read_csv(datafile, sep="\t")
df.set_index("Name", inplace=True)
# solution


1

Name
Mbappé         1998-12-20
De Bruyne      1991-06-28
Lewandowski    1988-08-21
Benzema        1987-12-19
Messi          1987-06-24
Name: DOB, dtype: object

2

Name
Benzema        1987-12-19
De Bruyne      1991-06-28
Lewandowski    1988-08-21
Mbappé         1998-12-20
Messi          1987-06-24
Name: DOB, dtype: object

3

Name
Messi          1987-06-24
Benzema        1987-12-19
Lewandowski    1988-08-21
De Bruyne      1991-06-28
Mbappé         1998-12-20
Name: DOB, dtype: object

Name
Mbappé         1998-12-20
De Bruyne      1991-06-28
Lewandowski    1988-08-21
Benzema        1987-12-19
Messi          1987-06-24
Name: DOB, dtype: object

4

|   | Name        | DOB        |
|---|-------------|------------|
| 0 | Mbappé      | 1998-12-20 |
| 1 | De Bruyne   | 1991-06-28 |
| 2 | Lewandowski | 1988-08-21 |
| 3 | Benzema     | 1987-12-19 |
| 4 | Messi       | 1987-06-24 |


|   | Name        | DOB        |
|---|-------------|------------|
| 3 | Benzema     | 1987-12-19 |
| 1 | De Bruyne   | 1991-06-28 |
| 2 | Lewandowski | 1988-08-21 |
| 0 | Mbappé      | 1998-12-20 |
| 4 | Messi       | 1987-06-24 |
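
A possible solution for the sorting steps is sketched below. As before, an in-memory stand-in replaces `002.tsv` (its two-column layout is an assumption), and the variable names are illustrative. The sketch uses the non-mutating `sort_index` and `sort_values`, which return new objects.

```python
import io
import pandas as pd

# Stand-in for 002.tsv (assumed layout: tab-separated Name and DOB columns)
tsv_text = (
    "Name\tDOB\n"
    "Mbappé\t1998-12-20\n"
    "De Bruyne\t1991-06-28\n"
    "Lewandowski\t1988-08-21\n"
    "Benzema\t1987-12-19\n"
    "Messi\t1987-06-24\n"
)
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
df.set_index("Name", inplace=True)

# 1. Independent copy of the column, so later changes cannot touch df
dob = df["DOB"].copy()

# 2. Sort by row labels (player names)
by_label = dob.sort_index()

# 3. Sort by values, ascending by default, then descending
by_value = dob.sort_values()
by_value_desc = dob.sort_values(ascending=False)

# 4. Reload without set_index, so the default integer index is kept
flat = pd.read_csv(io.StringIO(tsv_text), sep="\t")
by_name = flat.sort_values("Name")
print(by_name)
```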


### 003.003 File for seaborn

1. Read the `data_file`
   1. It doesn't have a header
   1. The index is the first column. Rename it to 'Name'
   1. The rows need to be sorted by index
1. Extract the first row as a Series into a variable `row`
   1. Ignore the NaN values
   1. ...and split each cell into a list, splitting by '|'
   1. ...and turn each item in the list into a column (Series)
   1. ...and save the result into a new DataFrame called `new_df`, rename the columns to "Time" and the value of `name`, then make "Time" the index
   1. Append ':00' to each value in the `name` column, and convert the column to timedelta
1. Create an empty DataFrame called `katas_df`. Then loop through each row of `df` and apply the transformations above to it. At the end of each iteration, append the newly created DataFrame to `katas_df`. Note that you don't need to set the index for each row's DataFrame
1. Make "Time" the index column, sort ascending by it, and save the result to the `save_to` CSV path


In [9]:
import os
import re
data_file = "katas.tsv"
name = "PANDAS XXX"
save_to = re.sub(r'/katas/.+$', '/katas/seaborn_katas/solutions/katas.tsv', os.path.abspath(''))
# solution


1

name,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
GOOGLEAPPS 001,2023-04-23 23:21|01:52,2023-04-24 23:46|01:07,2023-04-25 22:20|00:21,2023-05-04 00:05|00:22,2023-05-26 00:08|00:26,2023-08-03 23:11|00:44,2023-08-28 22:18|00:27,2023-09-29 23:59|00:31,,,,,,,,,,
PANDAS 001,2023-01-06 08:00|00:22,2023-01-08 13:11|00:22,2023-01-12 00:52|00:31,2023-01-17 23:55|00:23,2023-01-22 11:10|00:31,2023-01-26 21:02|00:25,2023-02-10 23:30|00:36,2023-03-14 23:59|00:47,2023-03-21 23:39|00:32,2023-03-29 00:42|00:23,2023-04-07 15:13|00:20,2023-04-22 13:28|00:20,2023-05-16 22:53|00:31,2023-08-03 18:19|00:28,2023-08-28 23:32|00:25,2023-12-08 08:20|00:39,,
PANDAS 002,2023-01-06 13:08|00:12,2023-01-07 23:05|00:19,2023-01-10 21:45|00:10,2023-01-12 20:16|00:22,2023-01-16 23:29|00:10,2023-01-21 14:10|00:26,2023-01-26 00:29|00:15,2023-02-01 23:00|00:10,2023-03-05 00:54|00:14,2023-03-19 21:21|00:14,2023-03-26 22:19|00:23,2023-04-06 01:54|00:12,2023-04-19 23:37|00:08,2023-05-16 00:45|00:11,2023-06-15 20:28|1082227:17,2023-06-15 20:39|00:10,2023-08-27 23:32|00:09,2023-12-20 09:02|00:13
PANDAS 003,2023-01-13 23:41|00:06,2023-01-18 22:02|00:05,2023-01-23 23:18|00:12,2023-01-28 11:43|01:19,2023-02-11 16:36|00:26,2023-03-07 01:38|00:29,2023-03-21 01:08|00:34,2023-03-26 03:16|00:22,2023-03-31 00:43|00:23,2023-04-10 12:56|00:20,2023-04-26 20:29|00:20,2023-05-20 16:19|00:19,2023-08-06 17:49|00:20,2023-09-08 20:59|00:22,,,,
PANDAS 004,2023-01-20 22:41|00:23,2023-01-24 22:58|00:25,2023-01-27 21:56|00:36,2023-01-31 22:18|00:20,2023-02-12 21:40|00:21,2023-03-15 21:01|00:22,2023-03-17 00:03|00:15,2023-03-25 01:14|00:18,2023-03-29 23:22|00:16,2023-04-06 21:35|00:15,2023-04-17 22:45|00:15,2023-05-02 21:06|00:14,2023-06-07 22:38|00:43,2023-08-27 22:14|00:24,2023-09-30 19:27|00:22,,,
PANDAS 006 anki,2023-05-13 14:35|06:00,2023-05-14 22:05|01:22,2023-06-08 23:59|01:30,2023-06-11 15:02|00:33,06-08-2023|00:32,,,,,,,,,,,,,
PANDAS 007 Car,2023-05-28 03:17|01:19,2023-05-29 00:11|00:38,2023-06-03 18:41|00:11,2023-06-07 01:01|00:14,06-08-2023|00:09,2023-08-21 01:05|00:09,2023-09-29 00:56|00:11,,,,,,,,,,,
PANDAS 008 anki nlp,2023-06-13 00:06|00:39,2023-06-14 23:43|02:02,2023-08-27 02:45|00:38,2023-09-23 01:45|00:21,,,,,,,,,,,,,,
PANDAS 009,2023-11-12 14:26|04:20,2023-12-05 08:42|00:28,2023-12-06 09:14|00:51,,,,,,,,,,,,,,,
PYTHON_DABAEZ 001,2023-09-19 23:38|00:19,2023-09-21 00:39|00:21,2023-12-22 08:39|00:23,,,,,,,,,,,,,,,
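
Step 1 could be approached as in the sketch below. The real `katas.tsv` is not reproduced here, so a two-row in-memory sample (an assumption, with values borrowed from the output above) stands in for it. The file is assumed to be tab-separated.

```python
import io
import pandas as pd

# Two-row stand-in for katas.tsv: no header, kata name first, then "start|duration" cells
raw = (
    "PANDAS 001\t2023-01-06 08:00|00:22\t2023-01-08 13:11|00:22\n"
    "GOOGLEAPPS 001\t2023-04-23 23:21|01:52\t\n"
)

# header=None keeps the first line as data; index_col=0 uses the first column as the index
df = pd.read_csv(io.StringIO(raw), sep="\t", header=None, index_col=0)
df.index.name = "Name"
df = df.sort_index()

# First row after sorting, as a Series named after its index label
row = df.iloc[0]
print(row)
```

With `header=None` and `index_col=0`, the remaining columns keep their integer positions as labels, which is why the output above is headed 1, 2, 3 and so on.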


2

1    2023-04-23 23:21|01:52
2    2023-04-24 23:46|01:07
3    2023-04-25 22:20|00:21
4    2023-05-04 00:05|00:22
5    2023-05-26 00:08|00:26
6    2023-08-03 23:11|00:44
7    2023-08-28 22:18|00:27
8    2023-09-29 23:59|00:31
Name: GOOGLEAPPS 001, dtype: object

1    [2023-04-23 23:21, 01:52]
2    [2023-04-24 23:46, 01:07]
3    [2023-04-25 22:20, 00:21]
4    [2023-05-04 00:05, 00:22]
5    [2023-05-26 00:08, 00:26]
6    [2023-08-03 23:11, 00:44]
7    [2023-08-28 22:18, 00:27]
8    [2023-09-29 23:59, 00:31]
Name: GOOGLEAPPS 001, dtype: object

|   | 0                | 1     |
|---|------------------|-------|
| 1 | 2023-04-23 23:21 | 01:52 |
| 2 | 2023-04-24 23:46 | 01:07 |
| 3 | 2023-04-25 22:20 | 00:21 |
| 4 | 2023-05-04 00:05 | 00:22 |
| 5 | 2023-05-26 00:08 | 00:26 |
| 6 | 2023-08-03 23:11 | 00:44 |
| 7 | 2023-08-28 22:18 | 00:27 |
| 8 | 2023-09-29 23:59 | 00:31 |


| Time             | PANDAS XXX      |
|------------------|-----------------|
| 2023-04-23 23:21 | 0 days 01:52:00 |
| 2023-04-24 23:46 | 0 days 01:07:00 |
| 2023-04-25 22:20 | 0 days 00:21:00 |
| 2023-05-04 00:05 | 0 days 00:22:00 |
| 2023-05-26 00:08 | 0 days 00:26:00 |
| 2023-08-03 23:11 | 0 days 00:44:00 |
| 2023-08-28 22:18 | 0 days 00:27:00 |
| 2023-09-29 23:59 | 0 days 00:31:00 |
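
The per-row transformation in step 2 might look like the sketch below. The `row` Series is built by hand here (an assumption, using two values from the output above plus one missing cell) so that the snippet runs on its own.

```python
import pandas as pd

name = "PANDAS XXX"

# Hand-built stand-in for the first row of df; the None mimics an empty trailing cell
row = pd.Series(
    ["2023-04-23 23:21|01:52", "2023-04-24 23:46|01:07", None],
    index=[1, 2, 3],
    name="GOOGLEAPPS 001",
)

# Drop missing cells, then split each "start|duration" string into a two-item list
cells = row.dropna().str.split("|")

# One column per list item; columns are labelled 0 and 1 at this point
new_df = pd.DataFrame(cells.to_list(), index=cells.index)

# Rename the columns and use the start time as the index
new_df.columns = ["Time", name]
new_df = new_df.set_index("Time")

# "01:52" (hours:minutes) becomes "01:52:00", which pd.to_timedelta can parse
new_df[name] = pd.to_timedelta(new_df[name] + ":00")
print(new_df)
```

Passing `expand=True` to `str.split` would produce the two columns in a single call; the sketch keeps the intermediate list step because the exercise asks for it.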


3

| Time | GOOGLEAPPS 001 | PANDAS 001 | PANDAS 002 | PANDAS 003 | PANDAS 004 | PANDAS 006 anki | PANDAS 007 Car | PANDAS 008 anki nlp | PANDAS 009 | PYTHON_DABAEZ 001 | PYTHON_FROM_DOCS 002 | PYTHON_REAL_PYTHON 001 | PYTHON_REAL_PYTHON 002 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 06-08-2023 | NaT | NaT | NaT | NaT | NaT | 0 days 00:32:00 | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 06-08-2023 | NaT | NaT | NaT | NaT | NaT | NaT | 0 days 00:09:00 | NaT | NaT | NaT | NaT | NaT | NaT |
| 2023-01-06 08:00 | NaT | 0 days 00:22:00 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 2023-01-06 13:08 | NaT | NaT | 0 days 00:12:00 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 2023-01-07 23:05 | NaT | NaT | 0 days 00:19:00 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 2023-01-08 13:11 | NaT | 0 days 00:22:00 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 2023-01-10 21:45 | NaT | NaT | 0 days 00:10:00 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 2023-01-12 00:52 | NaT | 0 days 00:31:00 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 2023-01-12 20:16 | NaT | NaT | 0 days 00:22:00 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
| 2023-01-13 23:41 | NaT | NaT | NaT | 0 days 00:06:00 | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT |
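
Steps 3 and 4 could be sketched as follows. Two things here are assumptions rather than part of the exercise: the in-memory sample standing in for `katas.tsv`, and the temporary output path standing in for `save_to`. The helper name `row_to_frame` is hypothetical. `DataFrame.append` was removed in pandas 2.0, so instead of appending to an empty `katas_df` inside the loop, this sketch collects the per-row frames in a list and combines them once with `pd.concat`.

```python
import io
import os
import tempfile
import pandas as pd

# Two-row stand-in for katas.tsv (no header, "start|duration" cells)
raw = (
    "PANDAS 001\t2023-01-06 08:00|00:22\t2023-01-08 13:11|00:22\n"
    "GOOGLEAPPS 001\t2023-04-23 23:21|01:52\t\n"
)
df = pd.read_csv(io.StringIO(raw), sep="\t", header=None, index_col=0).sort_index()


def row_to_frame(row):
    """Turn one kata row into a frame with a Time column and a duration column."""
    cells = row.dropna().str.split("|")
    frame = pd.DataFrame(cells.to_list(), columns=["Time", row.name])
    frame[row.name] = pd.to_timedelta(frame[row.name] + ":00")
    return frame  # index is left at its default, as the exercise notes


# One frame per kata; concat aligns on column names and fills the gaps with NaT
frames = [row_to_frame(row) for _, row in df.iterrows()]
katas_df = pd.concat(frames, ignore_index=True)

# Step 4: index by Time, sort ascending, and write to disk
katas_df = katas_df.set_index("Time").sort_index()
out_path = os.path.join(tempfile.gettempdir(), "katas.tsv")  # stand-in for save_to
katas_df.to_csv(out_path)
print(katas_df)
```

The index is sorted as plain strings, which is consistent with the `06-08-2023` rows appearing before the `2023-...` rows in the output above.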
