# Pandas Library
**What is pandas for?**  
To load, clean, and analyze tabular datasets, including very large ones. It covers much of what a spreadsheet such as Excel does, but through code, which makes the work repeatable.


In [11]:
import pandas as pd
import numpy as np
import os

### Dataframe Creation 
The DataFrame is one of the main objects in pandas and can be created from Python lists, dictionaries, or arrays generated with numpy.

In [12]:
trees_names = np.array([
    "Tree1",
    "Tree2",
    "Tree3",
    "Tree4",
    "Tree5",
    "Tree6",
    "Tree7",
    "Tree8",
    "Tree9",
    "Tree10",
    "Tree11",
    "Tree12",
    "Tree13",
    "Tree14",
    "Tree15",
])

facing = np.array([
    "South",
    "Est",
    "West",
    "South",
    "Est",
    "Est",
    "West",
    "Est",
    "Est",
    "North",
    "West",
    "North",
    "North",
    "West",
    "North"
])

height = np.array(
[
    12,
    14,
    16,
    11,
    15,
    17,
    16,
    15,
    16,
    14,
    15,
    13,
    14,
    12,
    14,
])
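# column_stack builds a single 2-D array, so the integer heights are cast to strings
data = np.column_stack([facing, height])
print(data)
# the tree names become the row index; the columns get the default labels 0 and 1
tree_df = pd.DataFrame(data, index=trees_names)
tree_df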

[['South' '12']
 ['Est' '14']
 ['West' '16']
 ['South' '11']
 ['Est' '15']
 ['Est' '17']
 ['West' '16']
 ['Est' '15']
 ['Est' '16']
 ['North' '14']
 ['West' '15']
 ['North' '13']
 ['North' '14']
 ['West' '12']
 ['North' '14']]


Unnamed: 0,0,1
Tree1,South,12
Tree2,Est,14
Tree3,West,16
Tree4,South,11
Tree5,Est,15
Tree6,Est,17
Tree7,West,16
Tree8,Est,15
Tree9,Est,16
Tree10,North,14


**Alternatively, we could have used a dictionary**; this way we can name the fields right away.

In [23]:
trees_df1 = pd.DataFrame({
    "Name": trees_names,
    "Facing": facing,
    "Heights": height
})
trees_df1

Unnamed: 0,Name,Facing,Heights
0,Tree1,South,12
1,Tree2,Est,14
2,Tree3,West,16
3,Tree4,South,11
4,Tree5,Est,15
5,Tree6,Est,17
6,Tree7,West,16
7,Tree8,Est,15
8,Tree9,Est,16
9,Tree10,North,14


**N.B.** Compare the two DataFrames: the first uses the tree names as its index, while the second has the default numeric index.

### Create a DataFrame 

Use these arrays:


In [14]:
# Array of student names
student_names = np.array([
    "Mario ",
    "Giulia ",
    "Luca ",
    "Anna ",
    "Marco ",
    "Chiara ",
    "Simone ",
    "Laura ",
    "Alessio ",
    "Elena ",
    "Federico ",
    "Valentina ",
    "Giovanni ",
    "Francesca ",
    "Roberto ",
    "Elisa ",
    "Davide ",
    "Martina ",
    "Stefano ",
    "Caterina "
])

# degree titles, in the same order as the names array
graduated_in = np.array([
    "Laurea in Economia",
    "Laurea in Ingegneria",
    "Laurea in Matematica",
    "Laurea in Lettere",
    "Laurea in Giurisprudenza",
    "Laurea in Psicologia",
    "Laurea in Scienze Politiche",
    "Laurea in Architettura",
    "Laurea in Medicina",
    "Laurea in Chimica",
    "Laurea in Fisica",
    "Laurea in Filosofia",
    "Laurea in Lettere",
    "Laurea in Storia",
    "Laurea in Informatica",
    "Laurea in Biologia",
    "Laurea in Scienze Motorie",
    "Laurea in Scienze della Comunicazione",
    "Laurea in Agraria",
    "Laurea in Scienze dell'Educazione"
])

# Array of graduation grades (random integers from 60 to 109; the upper bound is excluded)
graduation_grade = np.random.randint(60, 110, size=len(student_names)) 

print(graduation_grade, graduated_in, student_names)

# organize the data in a DataFrame

[ 77  92  85  68  76  83  66  84  66  76  63  89 107  82  74  78  80  60
  83  75] ['Laurea in Economia' 'Laurea in Ingegneria' 'Laurea in Matematica'
 'Laurea in Lettere' 'Laurea in Giurisprudenza' 'Laurea in Psicologia'
 'Laurea in Scienze Politiche' 'Laurea in Architettura'
 'Laurea in Medicina' 'Laurea in Chimica' 'Laurea in Fisica'
 'Laurea in Filosofia' 'Laurea in Lettere' 'Laurea in Storia'
 'Laurea in Informatica' 'Laurea in Biologia' 'Laurea in Scienze Motorie'
 'Laurea in Scienze della Comunicazione' 'Laurea in Agraria'
 "Laurea in Scienze dell'Educazione"] ['Mario ' 'Giulia ' 'Luca ' 'Anna ' 'Marco ' 'Chiara ' 'Simone ' 'Laura '
 'Alessio ' 'Elena ' 'Federico ' 'Valentina ' 'Giovanni ' 'Francesca '
 'Roberto ' 'Elisa ' 'Davide ' 'Martina ' 'Stefano ' 'Caterina ']


### Importing a DataFrame

A DataFrame can be imported from various sources. Typically, in practice, files in CSV format or spreadsheets (in .xls or .xlsx format) are imported.

To do this, you can use the functions `pd.read_excel()` and `pd.read_csv()`, passing the name, path, or URL of the file you are interested in.

**Note**: Importing Excel files can sometimes fail, either because a reader library is missing or because the file contains elements that pandas cannot interpret. pandas relies on `openpyxl` to read `.xlsx` files and on `xlrd` to read `.xls` files, so installing them with `pip install openpyxl xlrd` often resolves the issue. Other times, you may need to open the file and modify its content, or save it as a CSV.
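As a minimal sketch of how `pd.read_csv()` is called, the example below reads a small CSV from an in-memory string instead of a file. The content of `csv_text` is hypothetical (it is not one of the course files); the `sep` and `decimal` parameters are shown because some CSV files use `;` between fields and `,` as the decimal mark.

```python
import io
import pandas as pd

# Hypothetical CSV content: ';' separates the fields, ',' is the decimal mark
csv_text = "name;age;fare\nAnna;29;211,3375\nLuca;0,9167;151,55\n"

# sep and decimal tell read_csv how to split the fields and how to parse numbers
sample_df = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")
print(sample_df.dtypes)  # age and fare are parsed as floats, not as strings
```

Without `decimal=","`, values such as `211,3375` would be read as text and the column would get the `object` dtype.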


In [15]:
eta_dipendenti = pd.read_excel('https://github.com/pg-88/IFOA_ML_AI/raw/main/Risorse/Dipendenti_eta.xlsx')
print(eta_dipendenti.head(3))
larici_excel = pd.read_excel('https://github.com/pg-88/IFOA_ML_AI/raw/main/Risorse/Larici-analisiDati.xlsx')
larici_excel

    Dipendente    Età 
0             1     28
1             2     42
2             3     35


Unnamed: 0,Larice#,Circonferenza(cm),Altezza(m),Età(anni),Esposizione
0,Larice1,85,12,40,Sud
1,Larice2,92,14,45,Est
2,Larice3,105,16,50,Ovest
3,Larice4,80,11,38,Sud
4,Larice5,88,15,48,Est
5,Larice6,98,17,53,Est
6,Larice7,90,16,49,Ovest
7,Larice8,100,15,46,Est
8,Larice9,93,16,51,Est
9,Larice10,86,14,43,Nord


In [16]:
# https://github.com/pg-88/IFOA_ML_AI/raw/main/Risorse/dataset/titanic3.xls
titanic_df = pd.read_csv("https://raw.githubusercontent.com/pg-88/IFOA_ML_AI/main/Risorse/dataset/titanic3.csv")
titanic_df

Unnamed: 0,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
0,1,1,"Allen, Miss. Elisabeth Walton",female,29,0,0,24160,"211,3375",B5,S,2,,"St Louis, MO"
1,1,1,"Allison, Master. Hudson Trevor",male,"0,9167",1,2,113781,"151,5500",C22 C26,S,11,,"Montreal, PQ / Chesterville, ON"
2,1,0,"Allison, Miss. Helen Loraine",female,2,1,2,113781,"151,5500",C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
3,1,0,"Allison, Mr. Hudson Joshua Creighton",male,30,1,2,113781,"151,5500",C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON"
4,1,0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25,1,2,113781,"151,5500",C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1304,3,0,"Zabour, Miss. Hileni",female,"14,5",1,0,2665,"14,4542",,C,,328.0,
1305,3,0,"Zabour, Miss. Thamine",female,,1,0,2665,"14,4542",,C,,,
1306,3,0,"Zakarian, Mr. Mapriededer",male,"26,5",0,0,2656,"7,2250",,C,,304.0,
1307,3,0,"Zakarian, Mr. Ortin",male,27,0,0,2670,"7,2250",,C,,,


In [17]:
titanic_df = pd.get_dummies(data=titanic_df, columns=['sex'])
titanic_df

Unnamed: 0,pclass,survived,name,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest,sex_female,sex_male
0,1,1,"Allen, Miss. Elisabeth Walton",29,0,0,24160,"211,3375",B5,S,2,,"St Louis, MO",1,0
1,1,1,"Allison, Master. Hudson Trevor","0,9167",1,2,113781,"151,5500",C22 C26,S,11,,"Montreal, PQ / Chesterville, ON",0,1
2,1,0,"Allison, Miss. Helen Loraine",2,1,2,113781,"151,5500",C22 C26,S,,,"Montreal, PQ / Chesterville, ON",1,0
3,1,0,"Allison, Mr. Hudson Joshua Creighton",30,1,2,113781,"151,5500",C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON",0,1
4,1,0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",25,1,2,113781,"151,5500",C22 C26,S,,,"Montreal, PQ / Chesterville, ON",1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1304,3,0,"Zabour, Miss. Hileni","14,5",1,0,2665,"14,4542",,C,,328.0,,1,0
1305,3,0,"Zabour, Miss. Thamine",,1,0,2665,"14,4542",,C,,,,1,0
1306,3,0,"Zakarian, Mr. Mapriededer","26,5",0,0,2656,"7,2250",,C,,304.0,,0,1
1307,3,0,"Zakarian, Mr. Ortin",27,0,0,2670,"7,2250",,C,,,,0,1


## First Look at the Data
There are several methods that allow us to understand the structure and content of a DataFrame in Pandas. Here are some of the most common methods:

- `head()`: This method returns the first *n* rows of the DataFrame. By default, it returns the first 5 rows, but you can specify a different number by passing an argument to `head(n)`.

- `tail()`: This method returns the last *n* rows of the DataFrame. By default, it returns the last 5 rows, but you can specify a different number by passing an argument to `tail(n)`.

- `info()`: This method provides concise information about the DataFrame, including the number of rows and columns, column names, data types of each column, and the number of non-null values.

- `describe()`: This method provides descriptive statistics for the numerical columns of the DataFrame, such as count, mean, standard deviation, minimum, quartiles, and maximum values.

**There are also several attributes that help us understand the structure of the DataFrame**

- `shape`: This attribute returns a tuple representing the dimensions of the DataFrame, i.e., the number of rows and columns.

- `columns`: This attribute returns an `Index` object holding the column names of the DataFrame.

- `index`: This attribute returns an `Index` object holding the row labels of the DataFrame.
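The methods and attributes listed above can be tried on any DataFrame. The sketch below uses a small hypothetical DataFrame, `demo_df`, modelled on the trees example rather than the notebook's own variables.

```python
import pandas as pd

# Small hypothetical DataFrame, modelled on the trees example
demo_df = pd.DataFrame({
    "Name": ["Tree1", "Tree2", "Tree3", "Tree4"],
    "Facing": ["South", "East", "West", "South"],
    "Heights": [12, 14, 16, 11],
})

print(demo_df.head(2))        # first 2 rows
print(demo_df.tail(2))        # last 2 rows
demo_df.info()                # prints column names, dtypes and non-null counts
print(demo_df.describe())     # statistics for the numeric column only
print(demo_df.shape)          # (4, 3): rows, columns
print(list(demo_df.columns))  # ['Name', 'Facing', 'Heights']
print(demo_df.index)          # RangeIndex(start=0, stop=4, step=1)
```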


In [18]:
# try to use these methods on the trees df



## Index
When not explicitly assigned, the index is a range of integer numbers automatically assigned by pandas, similar to Excel row numbers.

In any case, it is possible to set a custom index using a field from the DataFrame with the method `set_index(<field_name>, drop=<True or False>)`. The `drop` parameter defaults to `True` and determines whether to remove the field from the DataFrame (using it only as the index) or to keep it in the DataFrame.

Alternatively, you can directly assign values to the `index` attribute.
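A minimal sketch of both approaches, on a hypothetical `demo_df`. Note that `set_index()` returns a new DataFrame and leaves the original untouched, while assigning to `index` modifies the DataFrame itself.

```python
import pandas as pd

# Hypothetical data, modelled on the trees example
demo_df = pd.DataFrame({
    "Name": ["Tree1", "Tree2", "Tree3"],
    "Heights": [12, 14, 16],
})

# set_index returns a new DataFrame; demo_df keeps its numeric index
by_name = demo_df.set_index("Name")           # "Name" is removed from the columns
kept = demo_df.set_index("Name", drop=False)  # "Name" also stays as a column

# Direct assignment changes the index of demo_df itself
demo_df.index = ["a", "b", "c"]
print(by_name)
print(demo_df)
```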


In [19]:
# assign an index for the tree df 



## Columns
For columns, we can assign values directly to the `columns` attribute. For example, to rename a single column, we can copy the current column names into a list with `df.columns.tolist()` (the `Index` returned by `df.columns` is immutable, so it cannot be edited in place), change the value at the desired position, and then assign the modified list back to `df.columns`.
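The renaming steps can be sketched as follows, on a hypothetical two-column `demo_df`. The column names are copied to a list first because the `Index` object behind `df.columns` does not allow item assignment.

```python
import pandas as pd

# Hypothetical data, modelled on the trees example
demo_df = pd.DataFrame({"Name": ["Tree1", "Tree2"], "Facing": ["South", "West"]})

# Copy the column names to a list, edit the list, then assign it back
new_columns = demo_df.columns.tolist()
new_columns[1] = "Orientation"
demo_df.columns = new_columns
print(demo_df.columns.tolist())  # ['Name', 'Orientation']
```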


In [20]:
# Rename the column Facing to Orientation


### Rename
The `rename()` method in Pandas allows you to rename row indices, column labels, or both within a DataFrame. It can be used to rename individual elements, all elements, or a selection of specific elements using mapping dictionaries.

- **Rename row indices**: You can use the `index` parameter to rename row indices. Specify a dictionary that maps the current row indices to the new row index names.
- **Rename columns**: Using the `columns` parameter, you can rename the columns in the DataFrame. Specify a dictionary that maps the current column names to the new column names.

If you set `inplace=True`, the change will be applied directly to the DataFrame without needing to assign the output to a new DataFrame. If `inplace=False` (the default), a new DataFrame with the renamed indices or columns will be returned without modifying the original.
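A short sketch of `rename()` with mapping dictionaries. The DataFrame `demo_df` and the new labels are hypothetical; the default column labels `0` and `1` mimic a DataFrame created without column names.

```python
import pandas as pd

# Hypothetical DataFrame with default column labels 0 and 1
demo_df = pd.DataFrame({0: ["South", "West"], 1: [12, 16]}, index=["Tree1", "Tree2"])

# Returns a new DataFrame; demo_df itself is not modified
renamed = demo_df.rename(columns={0: "Facing", 1: "Height"}, index={"Tree1": "T1"})
print(renamed)

# With inplace=True the change is applied to demo_df and nothing is returned
demo_df.rename(columns={0: "Facing"}, inplace=True)
print(demo_df.columns.tolist())  # ['Facing', 1]
```

Labels that are not in the dictionary, such as `1` in the second call, are left unchanged.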


In [22]:
# example 

print(tree_df) # has no column names, only the default labels 0 and 1

# try to add names for the columns

# tree_df.rename(#, inplace=True)
# larici_df


            0   1
Tree1   South  12
Tree2     Est  14
Tree3    West  16
Tree4   South  11
Tree5     Est  15
Tree6     Est  17
Tree7    West  16
Tree8     Est  15
Tree9     Est  16
Tree10  North  14
Tree11   West  15
Tree12  North  13
Tree13  North  14
Tree14   West  12
Tree15  North  14


## DataFrame - Series

DataFrames are made of Series. To get a Series from a DataFrame, just select one of its columns.

Working on a single Series is a useful way to understand the operations available on a DataFrame.

In [24]:
# extract the graduation grades Series from the students DataFrame

## Creating and Manipulating pd.Series
From a DataFrame we extract the particular field we need:


In [26]:
# let's load a DataFrame from a CSV file
oil = pd.read_csv("https://frenzy86.s3.eu-west-2.amazonaws.com/python/data/retail/oil.csv")
oil

Unnamed: 0,date,dcoilwtico
0,2013-01-01,
1,2013-01-02,93.14
2,2013-01-03,92.97
3,2013-01-04,93.12
4,2013-01-07,93.20
...,...,...
1213,2017-08-25,47.65
1214,2017-08-28,46.40
1215,2017-08-29,46.46
1216,2017-08-30,45.96


In [28]:
# get the price Series with attribute access
oil_serie = oil.dcoilwtico 
# or (even better) with brackets, which work for any column name
oil_serie = oil['dcoilwtico']
oil_serie


0         NaN
1       93.14
2       92.97
3       93.12
4       93.20
        ...  
1213    47.65
1214    46.40
1215    46.46
1216    45.96
1217    47.26
Name: dcoilwtico, Length: 1218, dtype: float64

In [34]:
# get a NumPy array from a DataFrame column
arr_from_df = np.array(oil['dcoilwtico'])
# oil_serie = pd.Series() # as with a DataFrame column, a Series can be given a name
oil_serie

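0         NaN
1       93.14
2       92.97
3       93.12
4       93.20
        ...  
1213    47.65
1214    46.40
1215    46.46
1216    45.96
1217    47.26
Name: dcoilwtico, Length: 1218, dtype: float64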

A Series is **not the same as** an array: it pairs the values, stored as a NumPy array, with an index.

In [35]:
print(oil_serie.values) # NumPy array of the values
print(oil_serie.index) # index of the Series
# oil_array
# oil_array.mean() 

[  nan 93.14 92.97 ... 46.46 45.96 47.26]
RangeIndex(start=0, stop=1218, step=1)


### Drop missing values 

Pandas uses different sentinel values to represent a missing value (also referred to as NA), depending on the data type.

`NaN` (Not a Number) is the sentinel used in numeric Series; `None` and `NaT` (for dates) are also treated as missing.

There are two main pandas methods for dealing with NA: `isna()` to find the missing values and `dropna()` to delete the rows that contain them.
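A minimal sketch of the two methods on a hypothetical `prices` Series (the values are borrowed from the oil data only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical Series with two missing values
prices = pd.Series([np.nan, 93.14, 92.97, np.nan, 93.20])

print(prices.isna())        # boolean Series: True where the value is missing
print(prices.isna().sum())  # number of missing values

# dropna returns a new Series without the missing rows; prices is unchanged
clean = prices.dropna()
print(clean)
```

The surviving rows keep their original index labels, so `clean` is indexed by 1, 2 and 4.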

In [31]:
# check whether there are any NA values and drop them


### Data Type

To find out the data type of a Series, read its `dtype` attribute.

[link to documentation for dtype](https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#dtypes)

In [18]:
# oil_serie.dtype



With `astype()` it is possible to convert (cast) a Series to a different data type.
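As a sketch on a hypothetical `prices` Series: casting floats to integers fails when the Series contains `NaN`, so one option (an assumption here, not necessarily the intended solution) is to drop the missing values first.

```python
import numpy as np
import pandas as pd

# Hypothetical price Series containing a missing value
prices = pd.Series([np.nan, 93.14, 92.97])
print(prices.dtype)  # float64

# NaN cannot be stored in a plain integer Series, so drop it before casting.
# astype returns a new Series; the cast truncates the decimals (93.14 -> 93)
as_int = prices.dropna().astype("int64")
print(as_int)
```

If rounding is preferred to truncation, call `round()` before `astype()`.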

In [19]:
# cast the price data to integer


## Accessing Values

To select the values of a Series, we use `iloc[]`, which locates the values by their integer position (0 for the first element), regardless of the index labels.
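A short sketch of positional selection with `iloc[]` on a hypothetical Series. Slices follow the usual Python rule: the start position is included, the stop position is excluded.

```python
import pandas as pd

# Hypothetical values
prices = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

first_two = prices.iloc[:2]   # positions 0 and 1
last_two = prices.iloc[-2:]   # the last two positions
middle = prices.iloc[2:5]     # positions 2, 3 and 4

# Statistics can be computed directly on the selection
print(middle.mean(), middle.median())  # 40.0 40.0
print(middle.mode())                   # a Series: there can be more than one mode
```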


In [36]:
# Select the first 10 elements of the series `oil_serie`

# Select the last 10 values

# Select values from the fifth to the twentieth and calculate the mean, mode, and median


In addition to `iloc`, there is also `loc`, which retrieves values based on the label of the index.
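A minimal sketch of label-based selection with `loc`. Both Series are hypothetical; in particular, the dated prices are made-up values, and using the dates as the index is an assumption about how the exercise below could be approached.

```python
import pandas as pd

# Series indexed by string labels
heights = pd.Series([12, 14, 16], index=["Tree1", "Tree2", "Tree3"])
print(heights.loc["Tree2"])          # 14
print(heights.loc["Tree1":"Tree2"])  # label slices include the end label

# Series indexed by dates (made-up prices)
prices = pd.Series(
    [93.14, 52.72, 36.81, 52.36],
    index=pd.to_datetime(["2013-01-02", "2015-01-02", "2016-01-04", "2017-01-03"]),
)
# With a datetime index, a year given as a string selects every row of that year
print(prices.loc["2015"])
```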


In [21]:
# select the prices from 2015



The `value_counts()` method counts the occurrences of values in a series; these are the absolute frequencies.
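For example, on a hypothetical Series of orientations:

```python
import pandas as pd

# Hypothetical orientations
facing = pd.Series(["South", "East", "West", "South", "East", "East"])

# Absolute frequencies, sorted from the most to the least frequent value
counts = facing.value_counts()
print(counts)

# normalize=True returns relative frequencies instead
relative = facing.value_counts(normalize=True)
print(relative)
```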


### Sorting Values

The `sort_values()` method orders the values, while `sort_index()` orders them by index. You can pass the arguments `inplace` (True or False) and `ascending` (True or False).
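A small sketch on hypothetical prices. To get the highest values first, sort in descending order with `ascending=False`.

```python
import pandas as pd

# Hypothetical prices
prices = pd.Series([47.65, 93.14, 46.40, 92.97])

# Descending order, then the first 2 rows: the 2 highest prices
highest = prices.sort_values(ascending=False).head(2)
print(highest)

# sort_index puts the selection back in index order
print(highest.sort_index())
```

Both methods return a new Series unless `inplace=True` is passed.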


In [22]:
# sort in descending order, then take the first 5 rows to get the 5 highest prices

## Filtering Values

We can create boolean arrays by performing boolean operations. This array can then be used to extract data from the original series.
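A minimal sketch of filtering with a boolean mask, on hypothetical values:

```python
import pandas as pd

# Hypothetical values; their mean is 30.0
prices = pd.Series([10.0, 20.0, 30.0, 60.0])

# The comparison yields a boolean Series with the same index as prices
mask = prices > prices.mean()
print(mask)

# Indexing with the mask keeps only the rows where it is True
above_mean = prices[mask]
print(above_mean)
```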


In [23]:
# filter all the values greater than the mean


## Other Functions

You can calculate maximum values, products, sums, and more...

Other useful methods include `serie.fillna(serie.median())`, which replaces every NaN with the median value.
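For instance, on a hypothetical Series with one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical Series; median() ignores the NaN, so the median is 3.0
serie = pd.Series([1.0, np.nan, 3.0, 8.0])

# fillna returns a new Series; serie itself still contains the NaN
filled = serie.fillna(serie.median())
print(filled)
print(serie.max(), serie.sum())  # 8.0 12.0 (NaN is skipped by default)
```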


## DataFrame

In [39]:
path = "https://frenzy86.s3.eu-west-2.amazonaws.com/python/data/retail/transactions.csv"

transactions = pd.read_csv(path)
transactions.dtypes

date            object
store_nbr        int64
transactions     int64
dtype: object

**loc** and **iloc** can take more than one argument: a row selector and a column selector.
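A sketch of row and column selection on a small DataFrame named `demo` (a hypothetical name), built from the first rows of the transactions data:

```python
import pandas as pd

demo = pd.DataFrame({
    "date": ["2013-01-01", "2013-01-02", "2013-01-02"],
    "store_nbr": [25, 1, 2],
    "transactions": [770, 2111, 2358],
})

first_two = demo.iloc[:2]           # rows by position, all columns
dates_by_pos = demo.iloc[:2, 0]     # rows 0-1, first column
dates_by_label = demo.loc[:1, "date"]  # labels 0 to 1 (end included), column "date"
print(dates_by_label)

# A single column is a Series, so Series methods such as sum() apply
print(demo["transactions"].sum())   # 5239
```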

In [25]:
# select the first 5 rows

# select only the dates from the first 5 rows


In [26]:
# calculate sum of store number column 


## Drop 

Sometimes we need to delete part of a DataFrame; the `drop()` method does this.

Its main parameters are `labels` and `axis`: `labels` selects what to remove, and `axis` indicates whether the labels refer to rows (axis 0, the default) or columns (axis 1).
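A minimal sketch of `drop()` on a hypothetical DataFrame `demo`, built from the first rows of the transactions data:

```python
import pandas as pd

demo = pd.DataFrame({
    "date": ["2013-01-01", "2013-01-02", "2013-01-02"],
    "store_nbr": [25, 1, 2],
    "transactions": [770, 2111, 2358],
})

# Remove the row whose index label is 0
without_first_row = demo.drop(labels=0, axis=0)

# Remove the column named "date"
without_date = demo.drop(labels="date", axis=1)

print(without_first_row)
print(without_date)
```

Both calls return a new DataFrame, so `demo` still has all its rows and columns afterwards.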


In [27]:
# delete first row
# transactions.drop() 
# delete column 'date'

### Removing Duplicates

The `drop_duplicates()` method removes duplicate rows. You can pass `subset`, the column (or list of columns) in which to look for duplicates (all columns by default), and `keep`, which specifies whether to keep the first occurrence of each value (`'first'`, the default) or the last (`'last'`).
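A short sketch of the effect of `subset` and `keep`, on a hypothetical DataFrame with made-up transaction counts:

```python
import pandas as pd

# Hypothetical data: each store appears twice
demo = pd.DataFrame({
    "store_nbr": [1, 2, 1, 2],
    "transactions": [2111, 2358, 1833, 2033],
})

# One row per store, keeping the first occurrence
first_rows = demo.drop_duplicates(subset="store_nbr", keep="first")

# One row per store, keeping the last occurrence
last_rows = demo.drop_duplicates(subset="store_nbr", keep="last")

print(first_rows)
print(last_rows)
```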


In [40]:
# drop duplicates on "store_nbr"

## Sorting

Since a DataFrame has multiple fields, sorting becomes more flexible: we can sort by several fields at once, each with its own direction.

The method is `sort_values()`: pass a list of column names to `by` and a matching list of booleans to `ascending`.
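A minimal sketch of sorting by two fields, on a hypothetical DataFrame `demo` with values taken from the transactions data:

```python
import pandas as pd

demo = pd.DataFrame({
    "date": ["2013-01-02", "2013-01-01", "2013-01-02"],
    "transactions": [2111, 770, 3487],
})

# Sort by date ascending; within the same date, by transactions descending
ordered = demo.sort_values(by=["date", "transactions"], ascending=[True, False])
print(ordered)
```

The rows keep their original index labels, so the sorted index reads 1, 2, 0.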

In [127]:
# sort by date ascending and by transactions descending
transactions

Unnamed: 0,date,store_nbr,transactions
0,2013-01-01,25,770
1,2013-01-02,1,2111
2,2013-01-02,2,2358
3,2013-01-02,3,3487
4,2013-01-02,4,1922
...,...,...,...
83483,2017-08-15,50,2804
83484,2017-08-15,51,1573
83485,2017-08-15,52,2255
83486,2017-08-15,53,932


## Creating New Columns

We can perform operations on Series and assign the result to the DataFrame to create new calculated fields.
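A minimal sketch of a calculated column, on a hypothetical DataFrame `demo`. The formula, each row's share of the total transactions expressed as a percentage, is an assumption: the notebook does not show the cell that produced the `percentuale` column in the output below.

```python
import pandas as pd

demo = pd.DataFrame({
    "store_nbr": [25, 1, 2],
    "transactions": [770, 2111, 2358],
})

# Operations on a Series are applied element by element;
# assigning the result to a new key creates the column
demo["percentuale"] = demo["transactions"] / demo["transactions"].sum() * 100
print(demo)
```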

Unnamed: 0,date,store_nbr,transactions,percentuale
0,2013-01-01,25,770,0.000544
1,2013-01-02,1,2111,0.001492
2,2013-01-02,2,2358,0.001667
3,2013-01-02,3,3487,0.002465
4,2013-01-02,4,1922,0.001359
...,...,...,...,...
83483,2017-08-15,50,2804,0.001982
83484,2017-08-15,51,1573,0.001112
83485,2017-08-15,52,2255,0.001594
83486,2017-08-15,53,932,0.000659


## Group By
Groups the data according to a given field.


In [139]:
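# mean of the numeric columns for each sex; groupby returns a new DataFrame
sex_means = titanic_df.groupby('sex').mean(numeric_only=True)
titanic_df  # the original DataFrame is unchanged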



Unnamed: 0,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
0,1,1,"Allen, Miss. Elisabeth Walton",female,29,0,0,24160,"211,3375",B5,S,2,,"St Louis, MO"
1,1,1,"Allison, Master. Hudson Trevor",male,"0,9167",1,2,113781,"151,5500",C22 C26,S,11,,"Montreal, PQ / Chesterville, ON"
2,1,0,"Allison, Miss. Helen Loraine",female,2,1,2,113781,"151,5500",C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
3,1,0,"Allison, Mr. Hudson Joshua Creighton",male,30,1,2,113781,"151,5500",C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON"
4,1,0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25,1,2,113781,"151,5500",C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1304,3,0,"Zabour, Miss. Hileni",female,"14,5",1,0,2665,"14,4542",,C,,328.0,
1305,3,0,"Zabour, Miss. Thamine",female,,1,0,2665,"14,4542",,C,,,
1306,3,0,"Zakarian, Mr. Mapriededer",male,"26,5",0,0,2656,"7,2250",,C,,304.0,
1307,3,0,"Zakarian, Mr. Ortin",male,27,0,0,2670,"7,2250",,C,,,


Aggregation

In [141]:
data = {'Nome': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob', 'Charlie'],
        'Città': ['Roma', 'Milano', 'Napoli', 'Roma', 'Milano', 'Napoli'],
        'Vendite': [100, 200, 150, 120, 250, 180],
        'Profitto': [20, 30, 25, 22, 35, 28]}

df = pd.DataFrame(data)
print(df)

      Nome   Città  Vendite  Profitto
0    Alice    Roma      100        20
1      Bob  Milano      200        30
2  Charlie  Napoli      150        25
3    Alice    Roma      120        22
4      Bob  Milano      250        35
5  Charlie  Napoli      180        28


In [142]:
# Group the data by name, then compute the sum of the sales (Vendite) and the mean of the profit (Profitto)
result = df.groupby('Nome').agg({'Vendite': 'sum', 'Profitto': 'mean'})

print(result)

         Vendite  Profitto
Nome                      
Alice        220      21.0
Bob          450      32.5
Charlie      330      26.5


In [140]:
titanic_df.pivot_table(index="name")



name,body,parch,pclass,sibsp,survived
"Abbing, Mr. Anthony",,0,3,0,0.0
"Abbott, Master. Eugene Joseph",,2,3,0,0.0
"Abbott, Mr. Rossmore Edward",190.0,1,3,1,0.0
"Abbott, Mrs. Stanton (Rosa Hunt)",,1,3,1,1.0
"Abelseth, Miss. Karen Marie",,0,3,0,1.0
...,...,...,...,...,...
"del Carlo, Mrs. Sebastiano (Argenia Genovesi)",,0,2,1,1.0
"van Billiard, Master. James William",,1,3,1,0.0
"van Billiard, Master. Walter John",1.0,1,3,1,0.0
"van Billiard, Mr. Austin Blyler",255.0,2,3,0,0.0


In [33]:
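# astype() returns a new Series; without an assignment this line has no lasting effect
titanic_df["sex"].astype('category')
# ages are stored as strings with a decimal comma (e.g. "0,9167"): convert them to float
serie_eta = titanic_df['age'].str.replace(',', '.').astype('float')
serie_eta_int = serie_eta.round(0)
# also not assigned: eta_int stays float, because NaN cannot be stored in a plain int column
serie_eta_int.dropna().astype('int')
titanic_df["eta_int"] = serie_eta_int
# count the passengers of each age and show the 20 highest ages
titanic_df['eta_int'].value_counts().sort_index().iloc[-20:]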

53.0     4
54.0    10
55.0     8
56.0     5
57.0     5
58.0     6
59.0     3
60.0     8
61.0     5
62.0     5
63.0     4
64.0     5
65.0     3
66.0     1
67.0     1
70.0     3
71.0     2
74.0     1
76.0     1
80.0     1
Name: eta_int, dtype: int64

In [43]:
titanic_df['eta_range'] = pd.cut(titanic_df['eta_int'], bins=[0, 10, 20, 30, 40, 50, 60, 100])
titanic_df[['age','sex','pclass', 'eta_range']]


Unnamed: 0,age,sex,pclass,eta_range
0,29,female,1,"(20.0, 30.0]"
1,"0,9167",male,1,"(0.0, 10.0]"
2,2,female,1,"(0.0, 10.0]"
3,30,male,1,"(20.0, 30.0]"
4,25,female,1,"(20.0, 30.0]"
...,...,...,...,...
1304,"14,5",female,3,"(10.0, 20.0]"
1305,,female,3,
1306,"26,5",male,3,"(20.0, 30.0]"
1307,27,male,3,"(20.0, 30.0]"


In [45]:
titanic_df.info()
titanic_df['fare']  # fares are still strings with a decimal comma, hence dtype object

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1309 entries, 0 to 1308
Data columns (total 16 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   pclass     1309 non-null   int64   
 1   survived   1309 non-null   int64   
 2   name       1309 non-null   object  
 3   sex        1309 non-null   object  
 4   age        1046 non-null   object  
 5   sibsp      1309 non-null   int64   
 6   parch      1309 non-null   int64   
 7   ticket     1309 non-null   object  
 8   fare       1308 non-null   object  
 9   cabin      295 non-null    object  
 10  embarked   1307 non-null   object  
 11  boat       486 non-null    object  
 12  body       121 non-null    float64 
 13  home.dest  745 non-null    object  
 14  eta_int    1046 non-null   float64 
 15  eta_range  1043 non-null   category
dtypes: category(1), float64(2), int64(4), object(9)
memory usage: 155.2+ KB


0       211,3375
1       151,5500
2       151,5500
3       151,5500
4       151,5500
          ...   
1304     14,4542
1305     14,4542
1306      7,2250
1307      7,2250
1308      7,8750
Name: fare, Length: 1309, dtype: object