# Pandas simple tutorial
This is a short pandas tutorial I put together as a quick reference.

[Pandas API reference](https://pandas.pydata.org/pandas-docs/stable/reference/index.html)

## Important concepts:
* [Series](https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html#series) is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). 
* [DataFrame](https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html#dataframe) is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects.
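
As a minimal sketch of the two concepts above, the snippet below builds a Series and then a DataFrame from a dict of Series. The names `ages`, `cities` and `people` and their values are made up for illustration.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels.
ages = pd.Series([28, 42], index=['susan', 'bently'], name='age')
cities = pd.Series(['London', 'Kathmandu'], index=['susan', 'bently'], name='city')

# A DataFrame built from a dict of Series that share the same index.
people = pd.DataFrame({'age': ages, 'city': cities})

print(ages['bently'])         # look up a value by its label
print(type(people['city']))   # each column of a DataFrame is a Series
print(people.dtypes)          # columns can hold different types
```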

In [48]:
import pandas as pd

In [49]:
contacts = {
            'name': ['Susan Calvin', 'Bently Powell', 'Gregory Powell', 'Mike Donovan'],
            'city': ['London', 'Kathmandu', 'Moskow', 'Bangalore'],
            'phone': ['056152358', '096523995', '895712365', '886549702'],
            'age' : ['28', '42', '66', '67'],
            'e-mail': ['SusanCalvin@email.com', 'BentlyP@email.com', 'GregP14@email.com', 'MDonovan@email.com']
            }

In [50]:
# build a DataFrame from the dict; each key becomes a column
df = pd.DataFrame(contacts)

In [51]:
# display the DataFrame; in a notebook, ending a cell with the variable name renders it as a table
df

|   | name | city | phone | age | e-mail |
|---|---|---|---|---|---|
| 0 | Susan Calvin | London | 056152358 | 28 | SusanCalvin@email.com |
| 1 | Bently Powell | Kathmandu | 096523995 | 42 | BentlyP@email.com |
| 2 | Gregory Powell | Moskow | 895712365 | 66 | GregP14@email.com |
| 3 | Mike Donovan | Bangalore | 886549702 | 67 | MDonovan@email.com |


In [52]:
# shape of the df (rows, columns)
df.shape

(4, 5)

In [53]:
df.columns

Index(['name', 'city', 'phone', 'age', 'e-mail'], dtype='object')

### Working with columns

In [54]:
# select a column
df['name']

0      Susan Calvin
1     Bently Powell
2    Gregory Powell
3      Mike Donovan
Name: name, dtype: object

In [55]:
# select 2 or more columns by passing a list of names; notice the extra pair of brackets
df[['name', 'city']]

|   | name | city |
|---|---|---|
| 0 | Susan Calvin | London |
| 1 | Bently Powell | Kathmandu |
| 2 | Gregory Powell | Moskow |
| 3 | Mike Donovan | Bangalore |


In [56]:
# select an entry on a given column
df['e-mail'][1]

'BentlyP@email.com'
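
The lookup above chains two selections: the column first, then the row. As a sketch on a small made-up frame (`demo` is a hypothetical name), the same value can be read in a single call with `loc` or `at`. A single `loc` call is also the usual choice when assigning a value.

```python
import pandas as pd

demo = pd.DataFrame({'e-mail': ['a@email.com', 'b@email.com'], 'age': [30, 40]})

chained = demo['e-mail'][1]        # column first, then row label
with_loc = demo.loc[1, 'e-mail']   # row label and column label in one call
with_at = demo.at[1, 'e-mail']     # accessor meant for single values

# Assigning through loc updates the frame itself.
demo.loc[1, 'age'] = 41
print(chained, with_loc, with_at, demo.loc[1, 'age'])
```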

### Renaming columns

In [57]:
# note that columns is an Index object, which behaves much like a list
df.columns

Index(['name', 'city', 'phone', 'age', 'e-mail'], dtype='object')

In [58]:
# renaming all at once
df.columns = ['NAME', 'CITY', 'PHONE', 'AGE', 'E-MAIL']
df

|   | NAME | CITY | PHONE | AGE | E-MAIL |
|---|---|---|---|---|---|
| 0 | Susan Calvin | London | 056152358 | 28 | SusanCalvin@email.com |
| 1 | Bently Powell | Kathmandu | 096523995 | 42 | BentlyP@email.com |
| 2 | Gregory Powell | Moskow | 895712365 | 66 | GregP14@email.com |
| 3 | Mike Donovan | Bangalore | 886549702 | 67 | MDonovan@email.com |


In [59]:
# using list comprehension
df.columns = [x.lower() for x in df.columns]
df

|   | name | city | phone | age | e-mail |
|---|---|---|---|---|---|
| 0 | Susan Calvin | London | 056152358 | 28 | SusanCalvin@email.com |
| 1 | Bently Powell | Kathmandu | 096523995 | 42 | BentlyP@email.com |
| 2 | Gregory Powell | Moskow | 895712365 | 66 | GregP14@email.com |
| 3 | Mike Donovan | Bangalore | 886549702 | 67 | MDonovan@email.com |


In [60]:
# renaming selected columns with an {old_name: new_name} mapping
df.rename(columns = {'name': 'full_name', 'e-mail': 'email'}, inplace=True)
df

|   | full_name | city | phone | age | email |
|---|---|---|---|---|---|
| 0 | Susan Calvin | London | 056152358 | 28 | SusanCalvin@email.com |
| 1 | Bently Powell | Kathmandu | 096523995 | 42 | BentlyP@email.com |
| 2 | Gregory Powell | Moskow | 895712365 | 66 | GregP14@email.com |
| 3 | Mike Donovan | Bangalore | 886549702 | 67 | MDonovan@email.com |


### Working with rows
The two most common ways to access rows are **iloc** and **loc**.
* iloc selects by integer position (0, 1, 2, ...).
* loc selects by label: index labels for rows and column names for columns. It also accepts boolean masks.
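
The difference is easiest to see when the index labels are not 0, 1, 2. The sketch below uses a made-up frame (`demo`, with arbitrary labels 10, 20, 30) so that positions and labels cannot be confused.

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions.
demo = pd.DataFrame(
    {'name': ['Ann', 'Bob', 'Cid'], 'city': ['Oslo', 'Lima', 'Rome']},
    index=[10, 20, 30],
)

by_position = demo.iloc[0]    # first row, whatever its label is
by_label = demo.loc[10]       # row whose index label is 10
cell = demo.loc[20, 'city']   # single value: row label, column label

# Slices: iloc excludes the stop position, loc includes the stop label.
iloc_slice = demo.iloc[0:2]   # positions 0 and 1
loc_slice = demo.loc[10:30]   # labels 10, 20 and 30

print(by_position.equals(by_label))
print(cell)
print(len(iloc_slice), len(loc_slice))
```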

In [61]:
df.iloc[3]

full_name          Mike Donovan
city                  Bangalore
phone                 886549702
age                          67
email        MDonovan@email.com
Name: 3, dtype: object

In [62]:
# to select several rows or columns, pass lists (note the brackets); rows come back in the order requested
df.loc[[1,3,0], ['full_name', 'email']]

|   | full_name | email |
|---|---|---|
| 1 | Bently Powell | BentlyP@email.com |
| 3 | Mike Donovan | MDonovan@email.com |
| 0 | Susan Calvin | SusanCalvin@email.com |


In [63]:
# slicing needs no extra brackets; unlike standard Python slicing, loc includes the stop label
# [rows, columns]
df.loc[1:3, 'full_name':'phone']

|   | full_name | city | phone |
|---|---|---|---|
| 1 | Bently Powell | Kathmandu | 096523995 |
| 2 | Gregory Powell | Moskow | 895712365 |
| 3 | Mike Donovan | Bangalore | 886549702 |


### Indexing

In [64]:
# set the email column as the index; inplace=True modifies df instead of returning a new DataFrame
df.set_index('email', inplace=True)
df

| email | full_name | city | phone | age |
|---|---|---|---|---|
| SusanCalvin@email.com | Susan Calvin | London | 056152358 | 28 |
| BentlyP@email.com | Bently Powell | Kathmandu | 096523995 | 42 |
| GregP14@email.com | Gregory Powell | Moskow | 895712365 | 66 |
| MDonovan@email.com | Mike Donovan | Bangalore | 886549702 | 67 |


In [65]:
# email is now the index, so loc takes email labels instead of the old integer labels
df.loc['GregP14@email.com']

full_name    Gregory Powell
city                 Moskow
phone             895712365
age                      66
Name: GregP14@email.com, dtype: object

In [66]:
df.loc['MDonovan@email.com', ['full_name', 'phone']]

full_name    Mike Donovan
phone           886549702
Name: MDonovan@email.com, dtype: object

In [67]:
# iloc still works with integer positions
df.iloc[1]

full_name    Bently Powell
city             Kathmandu
phone            096523995
age                     42
Name: BentlyP@email.com, dtype: object

In [68]:
# sorting the index, ascending order is the default
df.sort_index(inplace=True)
df

| email | full_name | city | phone | age |
|---|---|---|---|---|
| BentlyP@email.com | Bently Powell | Kathmandu | 096523995 | 42 |
| GregP14@email.com | Gregory Powell | Moskow | 895712365 | 66 |
| MDonovan@email.com | Mike Donovan | Bangalore | 886549702 | 67 |
| SusanCalvin@email.com | Susan Calvin | London | 056152358 | 28 |


In [69]:
# sorting the index in descending order
df.sort_index(ascending=False, inplace=True)
df

| email | full_name | city | phone | age |
|---|---|---|---|---|
| SusanCalvin@email.com | Susan Calvin | London | 056152358 | 28 |
| MDonovan@email.com | Mike Donovan | Bangalore | 886549702 | 67 |
| GregP14@email.com | Gregory Powell | Moskow | 895712365 | 66 |
| BentlyP@email.com | Bently Powell | Kathmandu | 096523995 | 42 |


In [70]:
# to reset the index
df.reset_index(inplace=True)
df

|   | email | full_name | city | phone | age |
|---|---|---|---|---|---|
| 0 | SusanCalvin@email.com | Susan Calvin | London | 056152358 | 28 |
| 1 | MDonovan@email.com | Mike Donovan | Bangalore | 886549702 | 67 |
| 2 | GregP14@email.com | Gregory Powell | Moskow | 895712365 | 66 |
| 3 | BentlyP@email.com | Bently Powell | Kathmandu | 096523995 | 42 |


### Updating rows

In [72]:
df.loc[3]

email        BentlyP@email.com
full_name        Bently Powell
city                 Kathmandu
phone                096523995
age                         42
Name: 3, dtype: object

In [73]:
# updating every value in a row; the list must follow the column order
df.loc[3] = ['MHoward@email.com', 'Mike Howard', 'New Delhi', '225896337', '68']
df

|   | email | full_name | city | phone | age |
|---|---|---|---|---|---|
| 0 | SusanCalvin@email.com | Susan Calvin | London | 056152358 | 28 |
| 1 | MDonovan@email.com | Mike Donovan | Bangalore | 886549702 | 67 |
| 2 | GregP14@email.com | Gregory Powell | Moskow | 895712365 | 66 |
| 3 | MHoward@email.com | Mike Howard | New Delhi | 225896337 | 68 |


In [74]:
# updating selected items
df.loc[3, ['full_name', 'email']] = ['Mike Donovan', 'MDonovan@email.com']
df

|   | email | full_name | city | phone | age |
|---|---|---|---|---|---|
| 0 | SusanCalvin@email.com | Susan Calvin | London | 056152358 | 28 |
| 1 | MDonovan@email.com | Mike Donovan | Bangalore | 886549702 | 67 |
| 2 | GregP14@email.com | Gregory Powell | Moskow | 895712365 | 66 |
| 3 | MDonovan@email.com | Mike Donovan | New Delhi | 225896337 | 68 |


### Adding/removing Columns and rows

In [75]:
# split the full_name column into 2 new columns: first and last
# expand=True returns a DataFrame with one column per piece; expand=False (the default) returns a Series of lists
# https://www.geeksforgeeks.org/python-pandas-split-strings-into-two-list-columns-using-str-split/

df[['first', 'last']] = df['full_name'].str.split(' ', expand=True)
df

|   | email | full_name | city | phone | age | first | last |
|---|---|---|---|---|---|---|---|
| 0 | SusanCalvin@email.com | Susan Calvin | London | 056152358 | 28 | Susan | Calvin |
| 1 | MDonovan@email.com | Mike Donovan | Bangalore | 886549702 | 67 | Mike | Donovan |
| 2 | GregP14@email.com | Gregory Powell | Moskow | 895712365 | 66 | Gregory | Powell |
| 3 | MDonovan@email.com | Mike Donovan | New Delhi | 225896337 | 68 | Mike | Donovan |
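
To make the effect of `expand` concrete, here is a small sketch on a standalone Series. The names are invented for illustration, and the second one has three words on purpose. It also shows the `n` parameter of `str.split`, which caps the number of splits so that the result always fits into two columns.

```python
import pandas as pd

names = pd.Series(['Susan Calvin', 'Peter J. Bogert'])

as_lists = names.str.split(' ')                       # Series of lists
as_columns = names.str.split(' ', expand=True)        # DataFrame, one column per piece
two_cols = names.str.split(' ', n=1, expand=True)     # at most one split: two columns

print(as_lists[0])
print(as_columns.shape)
print(two_cols.iloc[1, 1])
```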


In [76]:
# adding items from 2 columns to form a new one
df['full_name_2'] = df['first'] + ' ' + df['last']
df

|   | email | full_name | city | phone | age | first | last | full_name_2 |
|---|---|---|---|---|---|---|---|---|
| 0 | SusanCalvin@email.com | Susan Calvin | London | 056152358 | 28 | Susan | Calvin | Susan Calvin |
| 1 | MDonovan@email.com | Mike Donovan | Bangalore | 886549702 | 67 | Mike | Donovan | Mike Donovan |
| 2 | GregP14@email.com | Gregory Powell | Moskow | 895712365 | 66 | Gregory | Powell | Gregory Powell |
| 3 | MDonovan@email.com | Mike Donovan | New Delhi | 225896337 | 68 | Mike | Donovan | Mike Donovan |


In [77]:
# removing columns
df.drop(columns=['full_name','full_name_2'], inplace=True)
df

|   | email | city | phone | age | first | last |
|---|---|---|---|---|---|---|
| 0 | SusanCalvin@email.com | London | 056152358 | 28 | Susan | Calvin |
| 1 | MDonovan@email.com | Bangalore | 886549702 | 67 | Mike | Donovan |
| 2 | GregP14@email.com | Moskow | 895712365 | 66 | Gregory | Powell |
| 3 | MDonovan@email.com | New Delhi | 225896337 | 68 | Mike | Donovan |


In [78]:
# removing rows; this returns a new DataFrame and leaves df unchanged unless inplace=True is passed
df.drop(index=[1, 3])

|   | email | city | phone | age | first | last |
|---|---|---|---|---|---|---|
| 0 | SusanCalvin@email.com | London | 056152358 | 28 | Susan | Calvin |
| 2 | GregP14@email.com | Moskow | 895712365 | 66 | Gregory | Powell |
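
Because `drop` was called without `inplace=True`, the result above is only displayed and `df` keeps all four rows. The sketch below, on a made-up frame named `demo`, shows that behaviour and the alternative of reassigning the result.

```python
import pandas as pd

demo = pd.DataFrame({'city': ['London', 'Bangalore', 'Moskow']})

trimmed = demo.drop(index=[1])    # returns a new DataFrame
print(len(demo), len(trimmed))    # the original still has every row

demo = demo.drop(index=[1])       # reassigning is an alternative to inplace=True
print(list(demo.index))           # the remaining labels are not renumbered
```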


### apply and applymap
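* apply() - on a Series, applies a function to each value; on a DataFrame, applies a function to each column or row
* applymap() - applies a function to every element of a DataFrame (deprecated since pandas 2.1 in favor of DataFrame.map())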
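
A short sketch of what the function receives in each case, using a made-up two-column frame (`demo`). The `hasattr` check is only there so the snippet runs on pandas versions with or without `DataFrame.map`.

```python
import pandas as pd

demo = pd.DataFrame({'first': ['Susan', 'Mike'], 'last': ['Calvin', 'Donovan']})

# Series.apply: the function receives one value at a time.
lengths = demo['first'].apply(len)

# DataFrame.apply: the function receives a whole column (axis=0, the default)...
longest_per_column = demo.apply(lambda col: col.str.len().max())
# ...or a whole row (axis=1).
full_names = demo.apply(lambda row: row['first'] + ' ' + row['last'], axis=1)

# Element-wise over the whole frame: DataFrame.map where available, else applymap.
elementwise = demo.map(len) if hasattr(demo, 'map') else demo.applymap(len)

print(lengths.tolist())
print(longest_per_column.tolist())
print(full_names.tolist())
print(elementwise.values.tolist())
```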

In [79]:
# lowercase all emails using apply; pass the function itself, without the trailing ()
df['email'] = df['email'].apply(str.lower) 
df

|   | email | city | phone | age | first | last |
|---|---|---|---|---|---|---|
| 0 | susancalvin@email.com | London | 056152358 | 28 | Susan | Calvin |
| 1 | mdonovan@email.com | Bangalore | 886549702 | 67 | Mike | Donovan |
| 2 | gregp14@email.com | Moskow | 895712365 | 66 | Gregory | Powell |
| 3 | mdonovan@email.com | New Delhi | 225896337 | 68 | Mike | Donovan |


In [80]:
df['email'].apply(len)

0    21
1    18
2    17
3    18
Name: email, dtype: int64

In [81]:
# can also be used with lambda functions
df['email'].apply(lambda x: x.upper())

0    SUSANCALVIN@EMAIL.COM
1       MDONOVAN@EMAIL.COM
2        GREGP14@EMAIL.COM
3       MDONOVAN@EMAIL.COM
Name: email, dtype: object

In [82]:
df.applymap(len)

|   | email | city | phone | age | first | last |
|---|---|---|---|---|---|---|
| 0 | 21 | 6 | 9 | 2 | 5 | 6 |
| 1 | 18 | 9 | 9 | 2 | 4 | 7 |
| 2 | 17 | 6 | 9 | 2 | 7 | 6 |
| 3 | 18 | 9 | 9 | 2 | 4 | 7 |


### Filtering and the &, |, ~ operators


In [83]:
flt_name = df['first'] == 'Mike'
df.loc[flt_name]

|   | email | city | phone | age | first | last |
|---|---|---|---|---|---|---|
| 1 | mdonovan@email.com | Bangalore | 886549702 | 67 | Mike | Donovan |
| 3 | mdonovan@email.com | New Delhi | 225896337 | 68 | Mike | Donovan |


#### Pandas boolean operators:
* and: &
* or: |
* not: ~
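
These operators work element by element on boolean Series, whereas Python's own `and`, `or` and `not` raise a `ValueError` when given a Series. The sketch below uses a made-up frame (`demo`) and also shows why each comparison needs its own parentheses.

```python
import pandas as pd

demo = pd.DataFrame({
    'first': ['Susan', 'Mike', 'Mike', 'Gregory'],
    'age': [28, 67, 68, 66],
})

is_mike = demo['first'] == 'Mike'
is_young = demo['age'] < 60

# Parentheses are required: & and | bind more tightly than == and >.
both = demo[(demo['first'] == 'Mike') & (demo['age'] > 67)]
either = demo[is_mike | is_young]
neither = demo[~(is_mike | is_young)]

print(len(both), len(either), len(neither))
```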

In [87]:
# Not operator
df.loc[~flt_name]

|   | email | city | phone | age | first | last |
|---|---|---|---|---|---|---|
| 0 | susancalvin@email.com | London | 056152358 | 28 | Susan | Calvin |
| 2 | gregp14@email.com | Moskow | 895712365 | 66 | Gregory | Powell |


In [88]:
flt = (df['first'] == 'Mike') & (df['city'] == 'Bangalore')
df.loc[flt]

|   | email | city | phone | age | first | last |
|---|---|---|---|---|---|---|
| 1 | mdonovan@email.com | Bangalore | 886549702 | 67 | Mike | Donovan |


In [89]:
flt = (df['first'] == 'Mike') | (df['age'] == '28')
df.loc[flt]

|   | email | city | phone | age | first | last |
|---|---|---|---|---|---|---|
| 0 | susancalvin@email.com | London | 056152358 | 28 | Susan | Calvin |
| 1 | mdonovan@email.com | Bangalore | 886549702 | 67 | Mike | Donovan |
| 3 | mdonovan@email.com | New Delhi | 225896337 | 68 | Mike | Donovan |


### Concatenating
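
This section has no worked example yet. As a minimal sketch, and assuming `pd.concat` is what the heading refers to, two small made-up frames can be stacked row-wise or placed side by side:

```python
import pandas as pd

first_batch = pd.DataFrame({'name': ['Susan Calvin'], 'city': ['London']})
second_batch = pd.DataFrame({'name': ['Mike Donovan'], 'city': ['Bangalore']})

# Stack rows; ignore_index=True renumbers the result from 0.
stacked = pd.concat([first_batch, second_batch], ignore_index=True)

# axis=1 places frames side by side, aligning rows on the index.
ages = pd.DataFrame({'age': [28, 67]})
side_by_side = pd.concat([stacked, ages], axis=1)

print(stacked.shape, list(side_by_side.columns))
```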

In [35]:
### types
# apply(sqrt) etc...
# ages are stored as strings, so convert them to integers before doing arithmetic
df['age'].astype(int).apply(lambda x: x**2)
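
The ages in the contacts data are strings, so arithmetic on them needs a type conversion first. A small sketch on a standalone Series (`ages` holds made-up values in the same format) covering the conversion, a square root through `apply`, and plain vectorised arithmetic:

```python
import math
import pandas as pd

ages = pd.Series(['28', '42', '66'])   # strings, like the age column

print(ages.dtype)                      # dtype of the raw string values
as_int = ages.astype(int)              # convert before doing arithmetic
roots = as_int.apply(math.sqrt)        # function applied to each number
squares = as_int ** 2                  # vectorised arithmetic, no apply needed

print(squares.tolist())
```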

In [None]:
### Map function
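
This cell is still a placeholder. As a sketch, assuming `Series.map` is the function meant here, it substitutes each value either through a dict or through a function. The lookup table `country_of` is hypothetical.

```python
import pandas as pd

cities = pd.Series(['London', 'Bangalore', 'Kathmandu'])

# Hypothetical lookup table, used only for illustration.
country_of = {'London': 'UK', 'Bangalore': 'India'}

countries = cities.map(country_of)   # values missing from the dict become NaN
shouted = cities.map(str.upper)      # map also accepts a function

print(countries.isna().tolist())
print(shouted.tolist())
```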

In [36]:
### Extra

In [37]:
#.tolist() .to_dict()
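
The two methods noted above convert pandas objects back into plain Python containers. A minimal sketch on a made-up frame (`demo`):

```python
import pandas as pd

demo = pd.DataFrame({'first': ['Susan', 'Mike'], 'age': [28, 67]})

first_names = demo['first'].tolist()          # plain Python list
as_columns = demo.to_dict()                   # {column: {index: value}}
as_records = demo.to_dict(orient='records')   # one dict per row

print(first_names)
print(as_columns)
print(as_records)
```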

In [38]:
# the same kind of transformation using the .str accessor
df['email'] = df['email'].str.upper() 
df

|   | email | city | phone | age | first | last |
|---|---|---|---|---|---|---|
| 0 | SUSANCALVIN@EMAIL.COM | London | 056152358 | 28 | Susan | Calvin |
| 1 | MDONOVAN@EMAIL.COM | Bangalore | 886549702 | 67 | Mike | Donovan |
| 2 | GREGP14@EMAIL.COM | Moskow | 895712365 | 66 | Gregory | Powell |
| 3 | MDONOVAN@EMAIL.COM | New Delhi | 225896337 | 68 | Mike | Donovan |
