# Pandas Library Cheat Sheet
### This notebook is my attempt to understand and learn the Pandas library.
***
<br>First of all, we are going to import the library:

In [47]:
import pandas as pd

***
# Dataframe Basics:

A DataFrame is the main object in pandas; it represents tabular data as rows and columns.
<br>To load a DataFrame into a variable, we use the following command:

In [48]:
df = pd.read_csv("C:/Users/mhyms/Documents/Files/Python/Dataframes/Doomsday survial rate.csv")

We pass the file's path as a string argument to the `read_csv()` function.
<br>In this case, the data is the survival rate of people after an apocalyptic scenario.

In [49]:
df

Unnamed: 0,Name;Date;Survival Rate;Number of People;Temperature;Bitcoin
0,Ankara;1.01.2021;0.5;150000;15;45000
1,London;1.02.2021;0.4;250000;25;85000
2,New York;1.03.2021;0.3;15000000;30;90000
3,Paris;1.04.2021;0.2;5400000;10;20000
4,Tokyo;1.05.2021;0.1;99000000;55;60000
5,Moscow;1.06.2021;0.0;34000;-20;150000
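
The output above shows every field packed into a single column, which suggests the file is semicolon-delimited while `read_csv()` splits on commas by default. Here is a minimal sketch of the difference, assuming that is the case. It uses `io.StringIO` with a few rows copied from the output as a stand-in for the real file, and the names `csv_text`, `one_column` and `parsed` are only illustrative.

```python
import io
import pandas as pd

# Stand-in for the CSV file: a few rows copied from the output above.
# The semicolon delimiter is an assumption inferred from that output.
csv_text = """Name;Date;Survival Rate;Number of People;Temperature;Bitcoin
Ankara;1.01.2021;0.5;150000;15;45000
London;1.02.2021;0.4;250000;25;85000
Moscow;1.06.2021;0.0;34000;-20;150000
"""

# Default separator (comma): each whole line ends up in one column.
one_column = pd.read_csv(io.StringIO(csv_text))
print(one_column.shape)

# sep=";" splits each line into its six fields.
parsed = pd.read_csv(io.StringIO(csv_text), sep=";")
print(parsed.shape)
print(parsed.columns.tolist())
```

With the real file, the same `sep=";"` argument would be passed together with the file path.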


### Now we are going to create a similar dataframe from a Python dictionary:

In [50]:
survivalRate = {
    'City Name' : ["Ankara", "London", "New York", "Paris", "Tokyo", "Moscow"],
    'Date' : ['1.01.2021', '2.01.2021', '3.01.2021', '4.01.2021', '5.01.2021', '6.01.2021'],
    'Survival Rate' : [0.5, 0.4, 0.3, 0.2, 0.1, 0.0],
    'Number of People' : [150000, 250000, 15000000, 540000, 99000000, 34000],
    'Temperature' : [15, 25, 30, 10, 55, -20]
}

dataFrame = pd.DataFrame(survivalRate)

In [71]:
dataFrame

Unnamed: 0,City Name,Date,Survival Rate,Number of People,Temperature
0,Ankara,1.01.2021,0.5,150000,15
1,London,2.01.2021,0.4,250000,25
2,New York,3.01.2021,0.3,15000000,30
3,Paris,4.01.2021,0.2,540000,10
4,Tokyo,5.01.2021,0.1,99000000,55
5,Moscow,6.01.2021,0.0,34000,-20


To get the shape of the dataframe:

In [52]:
rows, columns = dataFrame.shape
print("rows: ",rows)
print("columns: ", columns)

rows:  6
columns:  5


Datasets usually include a large amount of data, so viewing all of it at once is not ideal.
<br>To peek at the contents, we can use the `.head()` method.

In [53]:
dataFrame.head(2)

# This shows the first 2 rows. If we leave the argument out, it shows the first 5 rows by default.

Unnamed: 0,City Name,Date,Survival Rate,Number of People,Temperature
0,Ankara,1.01.2021,0.5,150000,15
1,London,2.01.2021,0.4,250000,25


To see the last 5 rows, we use the `.tail()` method.
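
As a small sketch of `.tail()`, the example below rebuilds a shortened version of the dataframe (two of the original columns) and takes the last two rows. The name `last_two` is only illustrative.

```python
import pandas as pd

# Shortened version of the dictionary used earlier in this notebook.
dataFrame = pd.DataFrame({
    'City Name': ["Ankara", "London", "New York", "Paris", "Tokyo", "Moscow"],
    'Temperature': [15, 25, 30, 10, 55, -20],
})

# Last 2 rows; the original index labels (4 and 5) are kept.
last_two = dataFrame.tail(2)
print(last_two)

# Without an argument, tail() returns the last 5 rows.
print(len(dataFrame.tail()))
```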

## Indexing and Slicing

We can use the same slicing syntax as Python lists to view specific rows of the dataset.
<br>For example, to show the rows from position 2 up to 4 (2 is included, 4 is not):

In [54]:
dataFrame[2:4]

Unnamed: 0,City Name,Date,Survival Rate,Number of People,Temperature
2,New York,3.01.2021,0.3,15000000,30
3,Paris,4.01.2021,0.2,540000,10


To see all rows, we can either use `dataFrame[:]` or simply type the name of the dataframe.
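
Row slicing can also be written with `.iloc`, which selects by position and additionally accepts a column slice. This is a minimal sketch on a shortened copy of the dataframe; the names `sliced`, `by_position` and `corner` are only illustrative.

```python
import pandas as pd

# Shortened version of the dictionary used earlier in this notebook.
dataFrame = pd.DataFrame({
    'City Name': ["Ankara", "London", "New York", "Paris", "Tokyo", "Moscow"],
    'Survival Rate': [0.5, 0.4, 0.3, 0.2, 0.1, 0.0],
    'Temperature': [15, 25, 30, 10, 55, -20],
})

# Plain slicing and .iloc slicing select the same rows (positions 2 and 3).
sliced = dataFrame[2:4]
by_position = dataFrame.iloc[2:4]
print(sliced.equals(by_position))

# .iloc can slice rows and columns together: rows 2-3, first two columns.
corner = dataFrame.iloc[2:4, 0:2]
print(corner)
```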

Accessing all of the column names:

In [55]:
dataFrame.columns

Index(['City Name', 'Date', 'Survival Rate', 'Number of People',
       'Temperature'],
      dtype='object')

Or accessing a single column, for example 'Date':

In [56]:
dataFrame['Date']

0    1.01.2021
1    2.01.2021
2    3.01.2021
3    4.01.2021
4    5.01.2021
5    6.01.2021
Name: Date, dtype: object

In [57]:
# Types
type(dataFrame['Survival Rate'])

pandas.core.series.Series

Showing only desired columns:

In [58]:
dataFrame[['City Name','Survival Rate', 'Number of People']]

Unnamed: 0,City Name,Survival Rate,Number of People
0,Ankara,0.5,150000
1,London,0.4,250000
2,New York,0.3,15000000
3,Paris,0.2,540000
4,Tokyo,0.1,99000000
5,Moscow,0.0,34000
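
The bracket style decides what comes back: a single column name returns a Series, while a list of names (double brackets) returns a DataFrame, even when the list has only one name. A minimal sketch on a shortened copy of the dataframe, with illustrative variable names:

```python
import pandas as pd

# Shortened version of the dictionary used earlier in this notebook.
dataFrame = pd.DataFrame({
    'City Name': ["Ankara", "London", "New York", "Paris", "Tokyo", "Moscow"],
    'Survival Rate': [0.5, 0.4, 0.3, 0.2, 0.1, 0.0],
})

# Single brackets with one name: a one-dimensional Series.
as_series = dataFrame['Survival Rate']
print(type(as_series).__name__)

# Double brackets (a list of names): a two-dimensional DataFrame.
as_frame = dataFrame[['Survival Rate']]
print(type(as_frame).__name__)
```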


## Operations

In [59]:
# Maximum in a column:
print('Maximum of the people: ', dataFrame['Number of People'].max())

# Minimum in a column:
print('Minimum of the people: ', dataFrame['Number of People'].min())

# Average of the column:
print('Average of the people: ', dataFrame['Number of People'].mean())

# Standard deviation in a column:
print('Standard Deviation of the people: ', dataFrame['Number of People'].std())

Maximum of the people:  99000000
Minimum of the people:  34000
Average of the people:  19162333.333333332
Standard Deviation of the people:  39555549.90979985
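
The same statistics can be collected in one call with `.agg()`, and `.idxmax()` gives the index label of the row holding the maximum. This is a sketch on a shortened copy of the dataframe; `summary` and `hottest_label` are illustrative names.

```python
import pandas as pd

# Shortened version of the dictionary used earlier in this notebook.
dataFrame = pd.DataFrame({
    'City Name': ["Ankara", "London", "New York", "Paris", "Tokyo", "Moscow"],
    'Number of People': [150000, 250000, 15000000, 540000, 99000000, 34000],
    'Temperature': [15, 25, 30, 10, 55, -20],
})

# Several statistics at once; the result is a Series indexed by function name.
summary = dataFrame['Number of People'].agg(['min', 'max', 'mean', 'std'])
print(summary)

# Index label of the row with the highest temperature, then its city.
hottest_label = dataFrame['Temperature'].idxmax()
print(dataFrame.loc[hottest_label, 'City Name'])
```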


There is also a `.describe()` method.
<br>It returns summary statistics for the numeric columns of the dataset:

In [60]:
dataFrame.describe()

Unnamed: 0,Survival Rate,Number of People,Temperature
count,6.0,6.0,6.0
mean,0.25,19162330.0,19.166667
std,0.187083,39555550.0,24.782386
min,0.0,34000.0,-20.0
25%,0.125,175000.0,11.25
50%,0.25,395000.0,20.0
75%,0.375,11385000.0,28.75
max,0.5,99000000.0,55.0
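
The text columns are left out of the table above. As a sketch, passing `include='all'` asks `.describe()` to cover every column, which adds rows such as `unique`, `top` and `freq` for the non-numeric ones. The example uses a shortened copy of the dataframe, and `full` is an illustrative name.

```python
import pandas as pd

# Shortened version of the dictionary used earlier in this notebook.
dataFrame = pd.DataFrame({
    'City Name': ["Ankara", "London", "New York", "Paris", "Tokyo", "Moscow"],
    'Temperature': [15, 25, 30, 10, 55, -20],
})

# include='all' summarizes text columns as well as numeric ones.
# Statistics that do not apply to a column are shown as NaN.
full = dataFrame.describe(include='all')
print(full)
```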


In [61]:
# Filtering

dataFrame[dataFrame.Temperature == dataFrame.Temperature.max()]

Unnamed: 0,City Name,Date,Survival Rate,Number of People,Temperature
4,Tokyo,5.01.2021,0.1,99000000,55


In [62]:
dataFrame[dataFrame.Temperature >= 25]

Unnamed: 0,City Name,Date,Survival Rate,Number of People,Temperature
1,London,2.01.2021,0.4,250000,25
2,New York,3.01.2021,0.3,15000000,30
4,Tokyo,5.01.2021,0.1,99000000,55
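
Filters can be combined with `&` (and) and `|` (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A minimal sketch on a shortened copy of the dataframe; the thresholds and variable names are only illustrative.

```python
import pandas as pd

# Shortened version of the dictionary used earlier in this notebook.
dataFrame = pd.DataFrame({
    'City Name': ["Ankara", "London", "New York", "Paris", "Tokyo", "Moscow"],
    'Survival Rate': [0.5, 0.4, 0.3, 0.2, 0.1, 0.0],
    'Number of People': [150000, 250000, 15000000, 540000, 99000000, 34000],
    'Temperature': [15, 25, 30, 10, 55, -20],
})

# AND: warm cities whose survival rate is below 0.4.
warm_and_risky = dataFrame[(dataFrame['Temperature'] >= 25) & (dataFrame['Survival Rate'] < 0.4)]
print(warm_and_risky)

# OR: freezing cities or cities with more than ten million people.
cold_or_huge = dataFrame[(dataFrame['Temperature'] < 0) | (dataFrame['Number of People'] > 10_000_000)]
print(cold_or_huge)
```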


In [63]:
# To print the date on which the temperature was at its maximum:

dataFrame['Date'][dataFrame.Temperature == dataFrame.Temperature.max()]

4    5.01.2021
Name: Date, dtype: object

We can also select several columns at once while filtering.
<br>For example:

In [64]:
dataFrame[['Date','Temperature']][dataFrame.Temperature == dataFrame.Temperature.max()]

# Note the double brackets: selecting several columns requires a list of column names.

Unnamed: 0,Date,Temperature
4,5.01.2021,55
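
The same selection can be written in a single step with `.loc`, which takes the row filter and the column list together. This is offered as an alternative sketch, not a replacement for the cell above; `mask` and `hottest` are illustrative names.

```python
import pandas as pd

# Shortened version of the dictionary used earlier in this notebook.
dataFrame = pd.DataFrame({
    'City Name': ["Ankara", "London", "New York", "Paris", "Tokyo", "Moscow"],
    'Date': ['1.01.2021', '2.01.2021', '3.01.2021', '4.01.2021', '5.01.2021', '6.01.2021'],
    'Temperature': [15, 25, 30, 10, 55, -20],
})

# Boolean mask marking the row(s) with the maximum temperature.
mask = dataFrame['Temperature'] == dataFrame['Temperature'].max()

# .loc[rows, columns]: filter rows and pick columns in one indexing step.
hottest = dataFrame.loc[mask, ['Date', 'Temperature']]
print(hottest)
```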


To see more operations:
<br><br>-> https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html
<br>-> https://pandas.pydata.org/docs/reference/api/pandas.Series.html

## set_index() and reset_index() methods

In [65]:
dataFrame.index

RangeIndex(start=0, stop=6, step=1)

To use a different column as the index, we call `set_index()`.
<br>Let's say we want to change the index to 'Date':
<br>`dataFrame.set_index('Date')`

By default, this method does not modify the original dataframe; it returns a new one.
<br>To modify the original in place, we pass the `inplace` parameter:
<br>`dataFrame.set_index('Date', inplace = True)`
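
A minimal sketch of that difference, on a shortened copy of the dataframe: without `inplace`, the original keeps its index, so the result has to be assigned to a name to be used. The name `indexed` is only illustrative.

```python
import pandas as pd

# Shortened version of the dictionary used earlier in this notebook.
dataFrame = pd.DataFrame({
    'City Name': ["Ankara", "London", "New York"],
    'Date': ['1.01.2021', '2.01.2021', '3.01.2021'],
})

# set_index() returns a new dataframe; the original is left untouched.
indexed = dataFrame.set_index('Date')

print(dataFrame.index)        # still the default RangeIndex
print(indexed.index.name)     # the new dataframe is indexed by 'Date'
print(indexed.columns.tolist())
```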


In [66]:
dataFrame.set_index('Date', inplace = True)

dataFrame

Date,City Name,Survival Rate,Number of People,Temperature
1.01.2021,Ankara,0.5,150000,15
2.01.2021,London,0.4,250000,25
3.01.2021,New York,0.3,15000000,30
4.01.2021,Paris,0.2,540000,10
5.01.2021,Tokyo,0.1,99000000,55
6.01.2021,Moscow,0.0,34000,-20


In [67]:
# To select a specific row by its index label:

dataFrame.loc['4.01.2021']

City Name            Paris
Survival Rate          0.2
Number of People    540000
Temperature             10
Name: 4.01.2021, dtype: object
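
For comparison, `.loc` looks rows up by index label while `.iloc` looks them up by position, and `.loc` can also take a column name to return a single value. A small sketch on a shortened copy of the dataframe, with illustrative variable names:

```python
import pandas as pd

# Shortened version of the dictionary used earlier in this notebook.
dataFrame = pd.DataFrame({
    'City Name': ["Ankara", "London", "New York", "Paris", "Tokyo", "Moscow"],
    'Date': ['1.01.2021', '2.01.2021', '3.01.2021', '4.01.2021', '5.01.2021', '6.01.2021'],
    'Temperature': [15, 25, 30, 10, 55, -20],
})
by_date = dataFrame.set_index('Date')

# By label: the row whose index value is '4.01.2021'.
row_by_label = by_date.loc['4.01.2021']

# By position: the fourth row (positions start at 0).
row_by_position = by_date.iloc[3]
print(row_by_label.equals(row_by_position))

# Label plus column name returns a single value.
single_value = by_date.loc['4.01.2021', 'City Name']
print(single_value)
```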

To restore the original numeric index:

In [68]:
dataFrame.reset_index(inplace = True)

In [69]:
dataFrame

Unnamed: 0,Date,City Name,Survival Rate,Number of People,Temperature
0,1.01.2021,Ankara,0.5,150000,15
1,2.01.2021,London,0.4,250000,25
2,3.01.2021,New York,0.3,15000000,30
3,4.01.2021,Paris,0.2,540000,10
4,5.01.2021,Tokyo,0.1,99000000,55
5,6.01.2021,Moscow,0.0,34000,-20
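
`reset_index()` is also handy after filtering, when the remaining rows keep their old labels. As a sketch, passing `drop=True` renumbers the rows from 0 and discards the old index instead of inserting it as a column. The names `warm` and `renumbered` are only illustrative.

```python
import pandas as pd

# Shortened version of the dictionary used earlier in this notebook.
dataFrame = pd.DataFrame({
    'City Name': ["Ankara", "London", "New York", "Paris", "Tokyo", "Moscow"],
    'Temperature': [15, 25, 30, 10, 55, -20],
})

# Filtering keeps the original labels, so the index has gaps.
warm = dataFrame[dataFrame['Temperature'] >= 25]
print(warm.index.tolist())

# drop=True discards the old labels rather than adding an 'index' column.
renumbered = warm.reset_index(drop=True)
print(renumbered.index.tolist())
print(renumbered.columns.tolist())
```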


### This is the end of tutorial 1
***
