# This is a Pandas tutorial from:

[Tutorial](https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/)

### Installation prep in the terminal:
1. mkdir pandas-tutorial, then cd into it
1. poetry init -n  
1. poetry add pandas jupyterlab  
1. poetry shell    
1. jupyter lab  

and we are here!

### We need to import pandas

In [21]:
import pandas as pd

#### Let's say we have a fruit stand that sells apples and oranges. We want to have a column for each fruit and a row for each customer purchase. To organize this as a dictionary for pandas we could do something like:

In [22]:
data = {'apples' : [2,3,0,1], 
       'oranges': [0,3,7,2]}
data

{'apples': [2, 3, 0, 1], 'oranges': [0, 3, 7, 2]}

### Pass the data to a constructor

In [23]:
purchases = pd.DataFrame(data)
purchases

,apples,oranges
0,2,0
1,3,3
2,0,7
3,1,2


### Change the original default index (0, 1, 2, 3)

In [24]:
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])
purchases

,apples,oranges
June,2,0
Robert,3,3
Lily,0,7
David,1,2


### **Loc**ate a row by its label

In [25]:
purchases.loc['June']

apples     2
oranges    0
Name: June, dtype: int64
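As a small sketch of how label-based lookup compares with position-based lookup, the cell below rebuilds the same `purchases` frame and reads it in a few ways. The use of `iloc` here is an addition for contrast, not something the steps above rely on.

```python
import pandas as pd

data = {'apples': [2, 3, 0, 1], 'oranges': [0, 3, 7, 2]}
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])

# Label-based: a single label returns that row as a Series
june = purchases.loc['June']

# Label-based: a list of labels returns a DataFrame with those rows
subset = purchases.loc[['Robert', 'Lily']]

# Position-based: iloc[0] is the first row, whatever its label is
first_row = purchases.iloc[0]

# Row label plus column label returns a single cell
lily_oranges = purchases.loc['Lily', 'oranges']
```

Here `first_row` and `june` hold the same values, because 'June' happens to be the first label in the index.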

### Load a CSV

In [40]:
movies_df = pd.read_csv('./IMDB-Movie-Data.csv', index_col='Title')
# movies_df.tail(10)
# movies_df.info()
movies_df.shape
# movies_df.columns

(1000, 11)
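To see what `index_col` does without needing the CSV file on disk, here is a minimal sketch that reads a two-row CSV from an in-memory string. The three-column layout is a simplification made up for this example; only the titles and values are borrowed from the dataset.

```python
import io

import pandas as pd

# A tiny stand-in for the real file (assumed layout, not the full dataset)
csv_text = "Title,Year,Rating\nPrometheus,2012,7.0\nSplit,2016,7.3\n"

# index_col='Title' turns the Title column into the row labels
sample_df = pd.read_csv(io.StringIO(csv_text), index_col='Title')

print(sample_df.shape)   # (2, 2): Title is the index, not a column
sample_df.info()         # info is a method, so it needs the parentheses
```

Because 'Title' became the index, rows can then be looked up by name, for example `sample_df.loc['Split']`.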

### Force having duplicates, by appending the dataframe to itself

In [27]:
# pd.concat() returns a new DataFrame without modifying the originals, so movies_df keeps its 1000 rows and only temp_df holds the doubled data.
# (DataFrame.append() did the same job but was removed in pandas 2.0.)
temp_df = pd.concat([movies_df, movies_df])
# movies_df.shape
temp_df.shape

(2000, 11)

In [28]:
# Like the step above, drop_duplicates() returns a copy of your DataFrame by default, this time with duplicates removed.
# Many pandas methods accept an inplace keyword argument; inplace=True modifies the DataFrame object in place instead.
# keep=False drops every duplicated row: if two rows are identical, both are dropped. Every row in temp_df appears twice, so nothing is left.
temp_df.drop_duplicates(inplace=True, keep=False)
temp_df.shape

(0, 11)

In [29]:
temp_df.shape

(0, 11)
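The effect of the `keep` argument is easier to see on a tiny frame. The sketch below uses made-up toy data (not the movies dataset) with one duplicated row and compares the three settings.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 1 are identical, row 2 is unique
toy_df = pd.DataFrame({'x': [1, 1, 2], 'y': [10, 10, 20]})

first = toy_df.drop_duplicates(keep='first')    # default: keep the first copy
last = toy_df.drop_duplicates(keep='last')      # keep the last copy
none_kept = toy_df.drop_duplicates(keep=False)  # drop every duplicated row

print(first.shape, last.shape, none_kept.shape)  # (2, 2) (2, 2) (1, 2)
print(toy_df.shape)                              # (3, 2): original untouched
```

Without `inplace=True`, each call returns a new DataFrame and `toy_df` keeps all three rows.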

# Column manipulation

In [41]:
movies_df.columns

Index(['Rank', 'Genre', 'Description', 'Director', 'Actors', 'Year',
       'Runtime (Minutes)', 'Rating', 'Votes', 'Revenue (Millions)',
       'Metascore'],
      dtype='object')

### Rename a column using dict
We don't want parentheses, so let's rename those:

In [47]:
# DataFrame.rename(columns={old_name: new_name, ...}, inplace=True)
# Only the columns listed in the dict are renamed; the rest are left as-is
movies_df.rename(columns={
    'Runtime (Minutes)' : 'Runtime',
    'Revenue (Millions)' : 'Revenue_Millions'
}, inplace=True)

movies_df.columns

Index(['Rank', 'Genre', 'Description', 'Director', 'Actors', 'Year', 'Runtime',
       'Rating', 'Votes', 'Revenue_Millions', 'Metascore'],
      dtype='object')
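One thing worth knowing about `rename` is that, by default, keys that match no column are silently ignored. The sketch below uses a small made-up frame to show this, along with the `errors='raise'` option that turns a typo into a `KeyError`.

```python
import pandas as pd

# Hypothetical two-column frame, reusing column names from the dataset
small_df = pd.DataFrame({'Runtime (Minutes)': [121, 124], 'Rating': [8.1, 7.0]})

# Without inplace=True, rename returns a new DataFrame
renamed = small_df.rename(columns={'Runtime (Minutes)': 'Runtime'})

# A key that matches no column (note the lowercase 'm') is ignored by default
unchanged = small_df.rename(columns={'Runtime (minutes)': 'Runtime'})

# errors='raise' makes the same typo fail loudly
typo_caught = False
try:
    small_df.rename(columns={'Runtime (minutes)': 'Runtime'}, errors='raise')
except KeyError:
    typo_caught = True

print(list(renamed.columns))    # ['Runtime', 'Rating']
print(list(unchanged.columns))  # ['Runtime (Minutes)', 'Rating']
```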

### Rename using a list

In [49]:
movies_df.columns = ['rank', 'genre', 'description', 'director', 'actors', 'year', 'runtime', 
                     'rating', 'votes', 'revenue_millions', 'metascore']
movies_df

Title,rank,genre,description,director,actors,year,runtime,rating,votes,revenue_millions,metascore
Guardians of the Galaxy,1,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
Split,3,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0
Sing,4,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,59.0
Suicide Squad,5,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727,325.02,40.0
...,...,...,...,...,...,...,...,...,...,...,...
Secret in Their Eyes,996,"Crime,Drama,Mystery","A tight-knit team of rising investigators, alo...",Billy Ray,"Chiwetel Ejiofor, Nicole Kidman, Julia Roberts...",2015,111,6.2,27585,,45.0
Hostel: Part II,997,Horror,Three American college students studying abroa...,Eli Roth,"Lauren German, Heather Matarazzo, Bijou Philli...",2007,94,5.5,73152,17.54,46.0
Step Up 2: The Streets,998,"Drama,Music,Romance",Romantic sparks occur between two dance studen...,Jon M. Chu,"Robert Hoffman, Briana Evigan, Cassie Ventura,...",2008,98,6.2,70699,58.01,50.0
Search Party,999,"Adventure,Comedy",A pair of friends embark on a mission to reuni...,Scot Armstrong,"Adam Pally, T.J. Miller, Thomas Middleditch,Sh...",2014,93,5.6,4881,,22.0


### Rename columns using a list comprehension

In [50]:
movies_df.columns = [col.upper() for col in movies_df]

movies_df.head(5)

Title,RANK,GENRE,DESCRIPTION,DIRECTOR,ACTORS,YEAR,RUNTIME,RATING,VOTES,REVENUE_MILLIONS,METASCORE
Guardians of the Galaxy,1,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
Split,3,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0
Sing,4,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,59.0
Suicide Squad,5,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727,325.02,40.0
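The list comprehension above can also be written with the string accessor on the column index. This is an alternative sketch on a made-up one-row frame, not a replacement for the cell above.

```python
import pandas as pd

# Hypothetical one-row frame with a few of the dataset's column names
mini_df = pd.DataFrame({'Rank': [1], 'Genre': ['Horror'], 'Revenue_Millions': [17.54]})

# Index.str applies a string method to every column name at once
mini_df.columns = mini_df.columns.str.upper()

print(list(mini_df.columns))  # ['RANK', 'GENRE', 'REVENUE_MILLIONS']
```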


# Missing Values
Missing values usually appear as Python's None or NumPy's np.nan.

There are two options for dealing with nulls:

1. Get rid of rows or columns with nulls
1. Replace nulls with non-null values, a technique known as imputation
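The two options listed above can be sketched on a small made-up frame. Filling with the column mean is just one possible imputation choice, assumed here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one null in each of two columns
nulls_df = pd.DataFrame({
    'year': [2014, 2012, 2016],
    'revenue': [333.13, np.nan, 138.12],
    'metascore': [76.0, 65.0, np.nan],
})

# Option 1: drop rows (default) or columns (axis=1) that contain any null
dropped_rows = nulls_df.dropna()
dropped_cols = nulls_df.dropna(axis=1)

# Option 2: imputation, here replacing each null with its column's mean
filled = nulls_df.fillna(nulls_df.mean())

print(dropped_rows.shape)  # (1, 3): only the first row has no nulls
print(dropped_cols.shape)  # (3, 1): only 'year' has no nulls
```

Dropping is simple but discards data, while imputation keeps every row at the cost of introducing estimated values.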

In [51]:
# Step 1: check for nulls
movies_df.isnull()

Title,RANK,GENRE,DESCRIPTION,DIRECTOR,ACTORS,YEAR,RUNTIME,RATING,VOTES,REVENUE_MILLIONS,METASCORE
Guardians of the Galaxy,False,False,False,False,False,False,False,False,False,False,False
Prometheus,False,False,False,False,False,False,False,False,False,False,False
Split,False,False,False,False,False,False,False,False,False,False,False
Sing,False,False,False,False,False,False,False,False,False,False,False
Suicide Squad,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...
Secret in Their Eyes,False,False,False,False,False,False,False,False,False,True,False
Hostel: Part II,False,False,False,False,False,False,False,False,False,False,False
Step Up 2: The Streets,False,False,False,False,False,False,False,False,False,False,False
Search Party,False,False,False,False,False,False,False,False,False,True,False
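A full table of True/False values is hard to scan, so a common next step is to count the nulls per column. The sketch below does this on a small made-up frame; the values are illustrative and not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two nulls in one column, one in the other
check_df = pd.DataFrame({
    'REVENUE_MILLIONS': [333.13, np.nan, np.nan],
    'METASCORE': [76.0, 65.0, np.nan],
})

# True counts as 1 when summed, so this gives the null count per column
null_counts = check_df.isnull().sum()

# Number of rows that contain at least one null
rows_with_nulls = check_df.isnull().any(axis=1).sum()

print(null_counts)
print(rows_with_nulls)  # 2
```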
