# A Brief Intro to Pandas

### What does pandas do?

- Loads data from files into a common work environment. It can read formats such as .csv, .xlsx and HDF5 (.h5), as well as SQL databases.
- Provides data structures that have many built-in methods for interacting with data

-----
##### It helps in
- Analysing the data
- Manipulating the data
- Performing basic visualisations
- Performing feature engineering (which is nothing but manipulating data based on our analysis).

#### Data Types in Pandas
- Series     (1D)
- DataFrame  (2D)
- Panel      (multi-dimensional; deprecated in pandas 0.20 and removed in 0.25)

----

- In the real world we work almost entirely with Series and DataFrames, at least at the basic level. Panel is no longer available in current pandas releases.

#### Some Common points to note 
- Strings are typically stored with the `object` dtype in pandas.
- float is the numerical type you will meet most often, more so than integer, partly because data with missing values (NaN) is stored as float.
- There are several kinds of data, namely **_Categorical, Nominal, Ordinal_**, and it is important to handle each of them appropriately.
- There are **_Outliers_**, which affect how well conclusions drawn from the data generalise, so they should be handled properly.
- It is important to know which operations are **inplace** and which are not (_remind me if I forget to explain this_)
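
A minimal sketch of two of the points above, using made-up values rather than data from this notebook. It shows a missing value turning integers into floats, and the difference between a call that returns a new Series and one that modifies the Series in place. The variable names are only illustrative.

```python
import pandas as pd

# Made-up values, only for illustration
with_gap = pd.Series([1, 2, None])
print(with_gap.dtype)   # the missing value makes pandas store the numbers as floats

nums = pd.Series([3, 1, 2])

# Not in place: sort_values returns a new Series and leaves nums untouched
sorted_copy = nums.sort_values()
print(nums.tolist())

# In place: nums itself is reordered and the call returns None
result = nums.sort_values(inplace=True)
print(result)
print(nums.tolist())
```

A common slip is writing `nums = nums.sort_values(inplace=True)`, which leaves `nums` set to `None`.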

In [None]:
# importing necessary libraries with commonly used aliases

import pandas as pd

# don't worry about the following we'll discuss them later.
import numpy as np
from datetime import datetime

## Creation of a Pandas Series Object

In [None]:
# Creating a Series from a Python list
arr = list(range(10,21))
ser_arr = pd.Series(arr)
print(ser_arr)

In [None]:
ser_arr.dtype

In [None]:
ser_arr.shape

In [None]:
# Creating a Series using numpy array
np_arr = np.random.randn(10)
ser_np_arr = pd.Series(np_arr)
print(ser_np_arr)

In [None]:
# creating a Series using dictionary
dic = dict()
for i in range(10):
    dic[chr(ord('a')+i)] = i
print(dic)

ser_dic = pd.Series(dic)

print(ser_dic)

In [None]:
# changing the index of Series
ser_dic.index = [chr(ord('k')+i) for i in range(10)]
# if the number of index labels doesn't match the number of elements in the Series, pandas raises a ValueError.

print(ser_dic)
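
The length check mentioned in the comment above can be seen in a small sketch. The Series `s` and its labels are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical three-element Series
s = pd.Series([10, 20, 30])

# Three labels for three values: fine
s.index = ['x', 'y', 'z']

# Two labels for three values: pandas refuses the assignment
try:
    s.index = ['p', 'q']
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print(mismatch_error)
print(s.index.tolist())   # the earlier, valid index is still in place
```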

In [None]:
# creating a Series object with heterogeneous data
dic = {'Name':'Pardhu','Age':21,'Dept':'CSE','Sem':'VI'}
pardhu = pd.Series(dic)
print(pardhu)

In [None]:
#creating same series as above in a different way
pardhu_way2 = pd.Series(['Pardhu',21,'CSE','VI'], index=['Name','Age','Dept','Sem'])
print(pardhu_way2)

## Accessing elements from a Series Object

In [None]:
# the primary way is to access it by label, like a dictionary in python
print(pardhu['Name'])
print(ser_dic['m'])
print(ser_arr[6])

print('----------------')

# using the loc indexer --> label-based location
print(pardhu.loc['Name'])
print(ser_dic.loc['m'])
print(ser_arr.loc[6])

In [None]:
# every Series also supports 0-based positional access, whatever labels its index has.
# plain [] with an integer (e.g. pardhu[1]) on a non-integer index is deprecated in pandas 2.x,
# so use the iloc indexer --> integer (positional) location
print(pardhu.iloc[1])   # position 1, our index label 'Age'
print(ser_dic.iloc[4])  # position 4, our index label 'o'
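
The difference between label-based and positional access is easiest to see when the labels are integers that do not match the positions. The following is a small sketch with a made-up Series, not one of the objects defined earlier.

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from the positions 0, 1, 2
s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])

by_label = s.loc[20]      # looks up the label 20
by_position = s.iloc[0]   # looks up the first element
by_brackets = s[10]       # with an integer index, plain [] is label-based

print(by_label, by_position, by_brackets)
```

Here `s[0]` would raise a `KeyError`, since there is no label 0. Spelling out `loc` or `iloc` avoids that ambiguity.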

In [None]:
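# Accessing multiple elements
# for positions, pass the list to iloc; plain [] with integers on a non-integer index
# relies on a positional fallback that is deprecated in pandas 2.x
print(pardhu.iloc[[1,2,3]])
print('---------------')
print(ser_dic.iloc[[0,2,4,6,8]])
print('---------------')
print(ser_arr[[1,3,5,7,9]])   # fine with plain [] because ser_arr's labels are the integers 0..10
print(ser_arr.iloc[[1,3,5,7,9]])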

In [None]:
# Accessing multiple elements using slicing (same as slicing a list in python)
print(pardhu[1:3])
print('---------------')
print(ser_dic[0:9:2])
print('---------------')
print(ser_arr[1:10:2])
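
One detail worth knowing about slices: positional slices exclude the stop position, as in a Python list, while label slices through `loc` include the stop label. A short sketch with a made-up Series:

```python
import pandas as pd

# Hypothetical Series with string labels
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

by_position = s.iloc[0:2]     # positions 0 and 1; the stop position 2 is excluded
by_label = s.loc['a':'c']     # labels 'a' through 'c'; the stop label is included

print(by_position.index.tolist())
print(by_label.index.tolist())
```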

In [None]:
# first five elements of a series
print(ser_dic.head())

print('------------')

# if we pass an integer n to head, it would return first n rows
ser_dic.head(3)

In [None]:
# last five elements of a series
print(ser_dic.tail())

print('-------')

# if we pass an integer n to tail, it would return last n rows
print(ser_dic.tail(4))

## Modifying elements of a series

In [None]:
ser_dic

In [None]:
ser_dic['m'] = 11
ser_dic.head()

In [None]:
pardhu['Sem'] = 'VIII'
pardhu

In [None]:
ser_dic[['m','n','o','p']] = [11,12,13,14]
ser_dic.head(6)

In [None]:
ser_dic.iloc[2] = 17
ser_dic.head()

In [None]:
ser_dic.loc['k'] = 18
ser_dic.head()

## Mathematical operations

In [None]:
ser_dic

In [None]:
print('Sum:',ser_dic.sum())
print('Mean:',ser_dic.mean())
print('Standard Deviation:',ser_dic.std())

In [None]:
print(ser_dic.max())
print(ser_dic.min())

In [None]:
print(ser_dic.idxmax())
print(ser_dic.idxmin())

In [None]:
all_falses = pd.Series([0]*10)
print(all_falses)

In [None]:
# checking if all the elements are non zeros
print(all_falses.all())

# checking if any one of the elements is non zero
print(all_falses.any())
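
To see how `all()` and `any()` differ, it helps to compare three cases side by side. The three Series below are made up for this sketch.

```python
import pandas as pd

# Hypothetical examples: no non-zero values, some, and only non-zero values
zeros = pd.Series([0, 0, 0])
mixed = pd.Series([0, 3, 0])
ones = pd.Series([1, 2, 3])

for name, s in [('zeros', zeros), ('mixed', mixed), ('ones', ones)]:
    # all() needs every element to be truthy, any() needs at least one
    print(name, 'all:', s.all(), 'any:', s.any())
```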

In [None]:
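# let's try changing some values in all_falses
# the Series holds int64 values, so storing a string forces a change to the object dtype;
# pandas 2.1+ warns about that implicit upcast, so we convert explicitly first
all_falses = all_falses.astype(object)
all_falses[4] = 'Pardhu'
all_falses[6] = 7
all_falses[2] = True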

## Miscellaneous

In [None]:
# np.nan is used because the np.NaN alias was removed in NumPy 2.0
ser = pd.Series([1,2,3,np.nan,4,3,2,1,np.nan,5,6,4,3,np.nan,2,1])

In [None]:
ser.isnull()

In [None]:
ser.isnull().any()

In [None]:
ser.isnull().all()

In [None]:
# gives each distinct value in the Series only once (NaN included).
ser.unique()

In [None]:
# number of distinct elements (NaN is not counted by default)
ser.nunique()

In [None]:
# gives each element and the number of occurrences of that element (NaN is left out by default)
ser.value_counts()
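
By default `nunique()` and `value_counts()` skip missing values. Both accept a `dropna` parameter to change that. A small sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing values
s = pd.Series([1, 2, 2, np.nan, np.nan])

distinct_default = s.nunique()               # NaN is ignored
distinct_with_nan = s.nunique(dropna=False)  # NaN counts as one more value
counts_with_nan = s.value_counts(dropna=False)
missing_total = s.isnull().sum()             # number of missing entries

print(distinct_default, distinct_with_nan, missing_total)
print(counts_with_nan)
```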

In [None]:
ser==1

In [None]:
ser[ser<6]
# elements at index 3, 8 and 13 are missing because they are NaN (NaN < 6 is False),
# and the element at index 10 is missing because 6 is not < 6
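
Because any comparison with NaN is False, a filter silently drops missing values. If they should be kept, the condition can be combined with `isnull()`. A sketch on made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
s = pd.Series([1, np.nan, 6, 4])

# NaN < 6 evaluates to False, so the missing value is filtered out as well
below_six = s[s < 6]

# Combine masks with | (or) and & (and); each condition needs its own parentheses
below_six_or_missing = s[(s < 6) | s.isnull()]

print(below_six.index.tolist())
print(below_six_or_missing.index.tolist())
```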

In [None]:
ser_dic

In [None]:
# default is ascending
ser_dic = ser_dic.sort_values()
ser_dic

In [None]:
ser_dic = ser_dic.sort_index()
ser_dic

In [None]:
neg_pos = pd.Series(list(range(-5,5)))
neg_pos

In [None]:
neg_pos.abs()

In [None]:
neg_pos.add_prefix('X')

In [None]:
neg_pos.add_suffix('Y')

In [None]:
neg_pos.apply(lambda x: x**2 if x>0 else x**3)
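
`apply` calls the Python function once per element. For simple conditions like the one above, the same result can usually be written with vectorised operations. The sketch below uses `Series.where` on a small made-up Series as one possible alternative, not as the only way to do it.

```python
import pandas as pd

# Hypothetical small Series with negative, zero and positive values
s = pd.Series(range(-2, 3))

# Element by element: square the positive values, cube the rest
via_apply = s.apply(lambda x: x**2 if x > 0 else x**3)

# Vectorised: keep s**2 where s > 0, otherwise take s**3
via_where = (s**2).where(s > 0, s**3)

print(via_apply.tolist())
print(via_apply.equals(via_where))
```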

In [None]:
a = pd.Series(list(range(5)))
b = pd.Series(list(range(5,11)))
# Series.append was removed in pandas 2.0; pd.concat does the same job
print(pd.concat([a, b])) # doesn't ignore index, keeps each series' own index
print(pd.concat([a, b], ignore_index=True)) # creates a new index
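
Keeping each Series' own index when joining them can leave duplicate labels, which changes what a label lookup returns. The sketch below uses `pd.concat` (the replacement for `Series.append`, which was removed in pandas 2.0) on two made-up Series.

```python
import pandas as pd

# Hypothetical inputs, both with the default 0-based index
first = pd.Series([0, 1, 2])
second = pd.Series([3, 4])

stacked = pd.concat([first, second])                         # labels 0, 1, 2, 0, 1
renumbered = pd.concat([first, second], ignore_index=True)   # labels 0..4

print(stacked.index.is_unique)    # duplicate labels are present
print(stacked.loc[0].tolist())    # label 0 now matches two elements
print(renumbered.loc[3])          # a single element after renumbering
```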

In [None]:
a.astype('float')

In [None]:
a.between(1,10)

In [None]:
c = pd.Series(list(range(2,10)))

In [None]:
# all the elements that are less than lower are changed to lower
# all the elements that are greater than upper are changed to upper
# all the elements that are in between lower and upper are left as they are.
c.clip(lower=4, upper=7)
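
The three rules in the comments above can be checked against plain Python. The sketch rebuilds the same values as `c` under a different, made-up name so that it runs on its own.

```python
import pandas as pd

# Same values as c above (2 through 9), rebuilt so the sketch is self-contained
values = pd.Series(list(range(2, 10)))

clipped = values.clip(lower=4, upper=7)

# The same rule spelled out element by element with min and max
by_hand = [min(max(v, 4), 7) for v in values]

print(clipped.tolist())
print(clipped.tolist() == by_hand)
```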

In [None]:
# Gives cumulative max up to that index
c.cummax()

In [None]:
# Gives cumulative min up to that index
c.cummin()
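
Since `c` is already increasing, `cummax()` just returns `c` and `cummin()` repeats its first value. The behaviour is clearer on data that goes up and down, as in this sketch with made-up numbers.

```python
import pandas as pd

# Hypothetical non-monotonic data
s = pd.Series([3, 1, 4, 1, 5])

running_max = s.cummax()   # largest value seen so far at each position
running_min = s.cummin()   # smallest value seen so far at each position

print(running_max.tolist())
print(running_min.tolist())
```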

# Ok let's move to Data Frames now... Bye :)