# Pandas Library Cheat Sheet
### This notebook is my attempt to understand and learn the Pandas library.
***
<br>First of all, we are going to import the library:

In [19]:
import pandas as pd

***
# Dataframe Basics:

A DataFrame is the main object in pandas: a table that represents data as rows and columns.
<br>To load a CSV file into a DataFrame and store it in a variable, we use the following command:

In [20]:
df = pd.read_csv("C:/Users/mhyms/Documents/Files/Python/Dataframes/Doomsday survial rate.csv")

We pass the file's path as a string argument to the `read_csv()` function.
<br>In this case we are using the survival rates of people after an apocalyptic scenario.

In [24]:
df

Unnamed: 0,Name;Date;Survival Rate;Number of People
0,Ankara;1.01.2021;0.5;150000
1,London;1.02.2021;0.4;250000
2,New York;1.03.2021;0.3;15000000
3,Paris;1.04.2021;0.2;5400000
4,Tokyo;1.05.2021;0.1;99000000
5,Moscow;1.06.2021;0.0;34000
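
In the output above, every row ended up in a single column, which suggests the file is separated by semicolons rather than commas. Here is a minimal sketch of how this could be handled, assuming that is the case: the `sep=";"` argument tells `read_csv()` which delimiter to use. The `csv_text` string and the `df_fixed` name are stand-ins for illustration, built from the rows shown above instead of the real file.

```python
import io

import pandas as pd

# Stand-in for the real file: the same rows as in the output above.
csv_text = """Name;Date;Survival Rate;Number of People
Ankara;1.01.2021;0.5;150000
London;1.02.2021;0.4;250000
New York;1.03.2021;0.3;15000000
Paris;1.04.2021;0.2;5400000
Tokyo;1.05.2021;0.1;99000000
Moscow;1.06.2021;0.0;34000
"""

# sep=";" splits each line on semicolons instead of the default comma.
df_fixed = pd.read_csv(io.StringIO(csv_text), sep=";")

print(df_fixed.columns.tolist())
print(df_fixed.shape)
```

With the delimiter set, each field should land in its own column, so numeric columns such as 'Survival Rate' can be used in calculations.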


### Now we are going to create a similar dataframe from a Python dictionary:

In [25]:
survivalRate = {
    'City Name' : ["Ankara", "London", "New York", "Paris", "Tokyo"],
    'Date' : ['1.01.2021', '2.01.2021', '3.01.2021', '4.01.2021', '5.01.2021'],
    'Survival Rate' : [0.5, 0.4, 0.3, 0.2, 0.1],
    'Number of People' : [150000, 250000, 15000000, 540000, 99000000]
}

dataFrame = pd.DataFrame(survivalRate)

In [26]:
dataFrame

Unnamed: 0,City Name,Date,Survival Rate,Number of People
0,Ankara,1.01.2021,0.5,150000
1,London,2.01.2021,0.4,250000
2,New York,3.01.2021,0.3,15000000
3,Paris,4.01.2021,0.2,540000
4,Tokyo,5.01.2021,0.1,99000000


To get the shape of the dataframe:

In [30]:
rows, columns = dataFrame.shape
print("rows: ",rows)
print("columns: ", columns)

rows:  5
columns:  4


Datasets usually contain a large amount of data, so viewing all of it at once is not ideal.
<br>To peek at the contents, we can use the `.head()` method.

In [32]:
dataFrame.head(2)

# This shows the first 2 rows. If we leave the argument out, it shows the first 5 rows by default.

Unnamed: 0,City Name,Date,Survival Rate,Number of People
0,Ankara,1.01.2021,0.5,150000
1,London,2.01.2021,0.4,250000


To see the last rows instead, we use the `.tail()` method, which also returns 5 rows by default.
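
As a small sketch of `.tail()`, the cell below rebuilds the same dictionary-based dataframe so that it runs on its own. The name `last_two` is only used here for illustration.

```python
import pandas as pd

survivalRate = {
    'City Name': ["Ankara", "London", "New York", "Paris", "Tokyo"],
    'Date': ['1.01.2021', '2.01.2021', '3.01.2021', '4.01.2021', '5.01.2021'],
    'Survival Rate': [0.5, 0.4, 0.3, 0.2, 0.1],
    'Number of People': [150000, 250000, 15000000, 540000, 99000000]
}
dataFrame = pd.DataFrame(survivalRate)

# Last 2 rows; the original index labels (3 and 4) are kept.
last_two = dataFrame.tail(2)
print(last_two)

# With no argument, tail() returns up to the last 5 rows.
print(len(dataFrame.tail()))
```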

## Indexing and Slicing

We can use the same slicing syntax as Python lists to view specific rows of the dataset.
<br>For example, to display rows 2 to 4 (2 is included, 4 is not):

In [35]:
dataFrame[2:4]

Unnamed: 0,City Name,Date,Survival Rate,Number of People
2,New York,3.01.2021,0.3,15000000
3,Paris,4.01.2021,0.2,540000


To see all rows, we can either use `dataFrame[:]` or simply type the name of the dataframe.
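
Row slicing can also be written with `.iloc` (position-based) and `.loc` (label-based). The sketch below is an addition to the notes above: `.iloc[2:4]` behaves like the plain slice, while `.loc[2:4]` includes both end labels. The variable names are only for illustration.

```python
import pandas as pd

dataFrame = pd.DataFrame({
    'City Name': ["Ankara", "London", "New York", "Paris", "Tokyo"],
    'Survival Rate': [0.5, 0.4, 0.3, 0.2, 0.1],
})

by_position = dataFrame.iloc[2:4]  # positions 2 and 3 (4 is excluded)
by_label = dataFrame.loc[2:4]      # labels 2, 3 and 4 (4 is included)

print(by_position)
print(by_label)
```

With the default integer index the two look very similar, so it is worth remembering which one excludes the end of the range.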

Listing all of the column names:

In [36]:
dataFrame.columns

Index(['City Name', 'Date', 'Survival Rate', 'Number of People'], dtype='object')

Or accessing a single column, for example 'Date':

In [37]:
dataFrame['Date']

0    1.01.2021
1    2.01.2021
2    3.01.2021
3    4.01.2021
4    5.01.2021
Name: Date, dtype: object

In [40]:
# Types
type(dataFrame['Survival Rate'])

pandas.core.series.Series

Showing only desired columns:

In [41]:
dataFrame[['City Name','Survival Rate', 'Number of People']]

Unnamed: 0,City Name,Survival Rate,Number of People
0,Ankara,0.5,150000
1,London,0.4,250000
2,New York,0.3,15000000
3,Paris,0.2,540000
4,Tokyo,0.1,99000000
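
The difference between the two kinds of selection can be checked with `type()`. In this small sketch, single brackets with one column name give a `Series`, while double brackets (a list of names) give a `DataFrame`, even when the list has only one name. The variable names are only for illustration.

```python
import pandas as pd

dataFrame = pd.DataFrame({
    'City Name': ["Ankara", "London", "New York", "Paris", "Tokyo"],
    'Survival Rate': [0.5, 0.4, 0.3, 0.2, 0.1],
    'Number of People': [150000, 250000, 15000000, 540000, 99000000]
})

one_column = dataFrame['Survival Rate']      # a single name -> Series
as_table = dataFrame[['Survival Rate']]      # a list of names -> DataFrame

print(type(one_column))
print(type(as_table))
```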


## Operations

In [53]:
# Maximum in a column:
print('Maximum of the people: ', dataFrame['Number of People'].max())

# Minimum in a column:
print('Minimum of the people: ', dataFrame['Number of People'].min())

# Average of the column:
print('Average of the people: ', dataFrame['Number of People'].mean())

# Standard deviation in a column:
print('Standard Deviation of the people: ', dataFrame['Number of People'].std())

Maximum of the people:  99000000
Minimum of the people:  150000
Average of the people:  22988000.0
Standard Deviation of the people:  42965497.43689697
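
One detail about `.std()`: by default pandas computes the sample standard deviation (it divides by n - 1, i.e. `ddof=1`). The sketch below checks this by hand on the same numbers. The names `people`, `manual_sample_std` and `population_std` are only for illustration.

```python
import pandas as pd

people = pd.Series([150000, 250000, 15000000, 540000, 99000000],
                   name='Number of People')

# Default: sample standard deviation (ddof=1).
sample_std = people.std()

# The same value computed by hand: divide the squared deviations by n - 1.
mean = people.mean()
manual_sample_std = (((people - mean) ** 2).sum() / (len(people) - 1)) ** 0.5

# Population standard deviation: divide by n instead.
population_std = people.std(ddof=0)

print(sample_std)
print(manual_sample_std)
print(population_std)
```

If we need the population version, for example to compare with a tool that divides by n, we can pass `ddof=0`.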


There is also the `.describe()` method.
<br>It returns summary statistics for the numeric columns of the dataset:

In [54]:
dataFrame.describe()

Unnamed: 0,Survival Rate,Number of People
count,5.0,5.0
mean,0.3,22988000.0
std,0.158114,42965500.0
min,0.1,150000.0
25%,0.2,250000.0
50%,0.3,540000.0
75%,0.4,15000000.0
max,0.5,99000000.0
