
#**Pandas**

- Pandas is an open-source library built on top of NumPy. It extends NumPy's array features with labeled data structures (Series and DataFrame).
- It supports data cleaning, data preparation, and fast analysis.
- It improves performance and productivity compared with hand-written Python loops, because most operations run on NumPy arrays.
- It has some built-in visualisation features.

Installation: **pip install pandas**
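
A quick way to confirm the installation is to import both libraries and print their versions. The sketch below also shows that a Series keeps its values in a NumPy array; the names `s` and `values` are only illustrative.

```python
import numpy as np
import pandas as pd

# Print the installed versions to confirm that the install worked.
print("pandas:", pd.__version__)
print("numpy:", np.__version__)

# A Series keeps its values in a NumPy array.
s = pd.Series([1, 2, 3])
values = s.to_numpy()
print(type(values))
```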



####Topics

- Series
- Dataframe
- Missing Data
- GroupBy
- Merging, Joining, Concatenating
- Operations
- Data input & output

###**Series**

In [0]:
import numpy as np
import pandas as pd

In [0]:
#Sample objects (list, NumPy array, dict) used to create Series in the cells below.

data_1 = [50, 60, 70]
labels = ['a', 'b', 'c']
arr = np.array(data_1)
dic = {'a': 10, 'b': 20, 'c': 30}

In [3]:
pd.Series(data= data_1)  #Shows the values of data_1 with the default integer index (0, 1, 2).

0    50
1    60
2    70
dtype: int64

In [4]:
pd.Series(data= data_1, index= labels) #Set 'labels' list as index.

a    50
b    60
c    70
dtype: int64

In [8]:
pd.Series(data_1, labels) #The keyword names can be omitted; the positional order is (data, index).

a    50
b    60
c    70
dtype: int64

In [9]:
#Pass a NumPy array to pd.Series.

pd.Series(data=arr)

0    50
1    60
2    70
dtype: int64

In [11]:
pd.Series(arr, labels)  #Again, the keyword names can be omitted.

a    50
b    60
c    70
dtype: int64

In [13]:
#Pass a dictionary to pd.Series.

pd.Series(dic) #The keys become the index (left) and the values become the data (right).

a    10
b    20
c    30
dtype: int64
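
When a dictionary is passed together with an explicit index, only the requested labels are kept, in the requested order. The sketch below assumes a label `'z'` that is not a key of `dic`, so its value comes out as NaN; `picked` is an illustrative name.

```python
import pandas as pd

dic = {'a': 10, 'b': 20, 'c': 30}

# Keep only the listed labels, in this order.
# 'z' is not a key of dic, so its value is NaN and the dtype becomes float64.
picked = pd.Series(dic, index=['c', 'a', 'z'])
print(picked)
```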

A Series can hold any type of Python object, including built-in functions.

In [16]:
pd.Series(data = [print, len, sum])

0    <built-in function print>
1      <built-in function len>
2      <built-in function sum>
dtype: object

In [5]:
series1 = pd.Series([1, 2, 3, 4], ['Pizza', 'chicken', 'Rice', 'Ghee'])
print(series1) #Here the index labels are strings.

Pizza      1
chicken    2
Rice       3
Ghee       4
dtype: int64


In [51]:
series2 = pd.Series([2, 1, 4, 3], ['Rice', 'Pizza', 'Mutton', 'Ghee'])
print(series2)

Rice      2
Pizza     1
Mutton    4
Ghee      3
dtype: int64


In [52]:
#Look up a value by its index label.

series1['Rice']

3

In [53]:
series2['Ghee']

3

In [54]:
series1 + series2 #Values are added where the index labels match; a label found in only one Series gives NaN.

Ghee       7.0
Mutton     NaN
Pizza      2.0
Rice       5.0
chicken    NaN
dtype: float64
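
If NaN is not wanted for labels that appear in only one Series, one option is `Series.add` with `fill_value`, which treats the missing side as the given value. This is a minimal sketch using the same two Series; `total` is an illustrative name.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], ['Pizza', 'chicken', 'Rice', 'Ghee'])
series2 = pd.Series([2, 1, 4, 3], ['Rice', 'Pizza', 'Mutton', 'Ghee'])

# A label missing from one Series is treated as 0 instead of producing NaN.
total = series1.add(series2, fill_value=0)
print(total)
```

Here 'Mutton' and 'chicken' keep their single values (4.0 and 2.0) instead of becoming NaN.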

###**Dataframes**

In [0]:
from numpy.random import randn

In [0]:
np.random.seed(101) #Seeding the generator makes randn return the same random numbers on every run.
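
The effect of the seed can be checked by seeding twice and comparing the draws. A small sketch; `first` and `second` are illustrative names.

```python
import numpy as np
from numpy.random import randn

np.random.seed(101)
first = randn(3)

np.random.seed(101)  # reset the generator to the same starting state
second = randn(3)

# Both draws are identical because the generator was reset with the same seed.
print(np.array_equal(first, second))
```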

- Creating a DataFrame

In [0]:
df = pd.DataFrame(randn(5, 4), index=['A', 'B', 'C', 'D', 'E'], columns=['W', 'X', 'Y', 'Z'])

In [22]:
df

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509


In [25]:
type(df)

pandas.core.frame.DataFrame

- Indexing & Selection

In [27]:
df['W']  #Gets the W column.

A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64

In [28]:
type(df['W'])

pandas.core.series.Series

In [29]:
#Get multiple columns.
df[['W', 'Z']]

          W         Z
A  2.706850  0.503826
B  0.651118  0.605965
C -2.018168 -0.589001
D  0.188695  0.955057
E  0.190794  0.683509
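
The type of the key decides what comes back: a single column name returns a Series, while a list of names returns a DataFrame, even when the list has only one name. A minimal sketch on a small stand-in frame; `demo` and its values are hypothetical, not the `df` above.

```python
import pandas as pd

# Small stand-in frame with hypothetical values.
demo = pd.DataFrame({'W': [1.0, 2.0], 'Z': [3.0, 4.0]}, index=['A', 'B'])

as_series = demo['W']    # string key -> Series
as_frame = demo[['W']]   # list key   -> DataFrame with one column

print(type(as_series).__name__)
print(type(as_frame).__name__, as_frame.shape)
```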


In [34]:
#Create new columns

df['New_Column'] = df['W'] + df['X']  #New_Column (shown in the output) equals W + X.
df['New_Col'] = df['W'] + df['Z']  #A new column can be computed from any existing columns.
df

          W         X         Y         Z  New_Column   New_Col
A  2.706850  0.628133  0.907969  0.503826    3.334983  3.210676
B  0.651118 -0.319318 -0.848077  0.605965    0.331800  1.257083
C -2.018168  0.740122  0.528813 -0.589001   -1.278046 -2.607169
D  0.188695 -0.758872 -0.933237  0.955057   -0.570177  1.143752
E  0.190794  1.978757  2.605967  0.683509    2.169552  0.874303


In [37]:
#Delete a column.

df.drop('New_Column', axis=1) #The axis parameter selects rows or columns: axis=0 means rows and axis=1 means columns.

          W         X         Y         Z   New_Col
A  2.706850  0.628133  0.907969  0.503826  3.210676
B  0.651118 -0.319318 -0.848077  0.605965  1.257083
C -2.018168  0.740122  0.528813 -0.589001 -2.607169
D  0.188695 -0.758872 -0.933237  0.955057  1.143752
E  0.190794  1.978757  2.605967  0.683509  0.874303


In [39]:
df

          W         X         Y         Z  New_Column   New_Col
A  2.706850  0.628133  0.907969  0.503826    3.334983  3.210676
B  0.651118 -0.319318 -0.848077  0.605965    0.331800  1.257083
C -2.018168  0.740122  0.528813 -0.589001   -1.278046 -2.607169
D  0.188695 -0.758872 -0.933237  0.955057   -0.570177  1.143752
E  0.190794  1.978757  2.605967  0.683509    2.169552  0.874303


- The **drop** method does not modify the original DataFrame; it returns a new one, so data is not lost by accident while adjusting a dataset. To drop a column from the original DataFrame itself, set the **inplace** parameter to True.

In [0]:
df.drop('New_Column', axis=1, inplace=True)

In [41]:
df

          W         X         Y         Z   New_Col
A  2.706850  0.628133  0.907969  0.503826  3.210676
B  0.651118 -0.319318 -0.848077  0.605965  1.257083
C -2.018168  0.740122  0.528813 -0.589001 -2.607169
D  0.188695 -0.758872 -0.933237  0.955057  1.143752
E  0.190794  1.978757  2.605967  0.683509  0.874303
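
An alternative to `inplace=True` is to keep the frame that `drop` returns under a new name (or rebind the old one). The sketch below uses a small stand-in frame; `demo`, `trimmed`, and the `tmp` column are hypothetical names.

```python
import pandas as pd

# Small stand-in frame with hypothetical values.
demo = pd.DataFrame(
    {'W': [1.0, 2.0], 'X': [3.0, 4.0], 'tmp': [0.0, 0.0]},
    index=['A', 'B'],
)

# drop returns a new DataFrame; columns='tmp' is equivalent to ('tmp', axis=1).
trimmed = demo.drop(columns='tmp')

print(list(demo.columns))     # the original still has 'tmp'
print(list(trimmed.columns))  # the returned frame does not
```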


In [42]:
#Dropping a row.

df.drop('E') #axis=0 (rows) is the default, so it does not need to be passed.

          W         X         Y         Z   New_Col
A  2.706850  0.628133  0.907969  0.503826  3.210676
B  0.651118 -0.319318 -0.848077  0.605965  1.257083
C -2.018168  0.740122  0.528813 -0.589001 -2.607169
D  0.188695 -0.758872 -0.933237  0.955057  1.143752


In [43]:
df  #inplace=True was not used, so row E is still in the original DataFrame.

          W         X         Y         Z   New_Col
A  2.706850  0.628133  0.907969  0.503826  3.210676
B  0.651118 -0.319318 -0.848077  0.605965  1.257083
C -2.018168  0.740122  0.528813 -0.589001 -2.607169
D  0.188695 -0.758872 -0.933237  0.955057  1.143752
E  0.190794  1.978757  2.605967  0.683509  0.874303


In [46]:
df.shape #Returns (number of rows, number of columns).

(5, 5)

In [51]:
#Get rows from a dataframe.

df.loc['A']  #Note that a single row is also returned as a Series, just like a single column.

W          2.706850
X          0.628133
Y          0.907969
Z          0.503826
New_Col    3.210676
Name: A, dtype: float64

In [53]:
#Get multiple rows from a dataframe.

df.loc[['A', 'B']]

          W         X         Y         Z   New_Col
A  2.706850  0.628133  0.907969  0.503826  3.210676
B  0.651118 -0.319318 -0.848077  0.605965  1.257083


In [55]:
#Get a row from a dataframe by its integer position.

df.iloc[2] #Gets row C, because row C is at position 2 (counting from 0).

W         -2.018168
X          0.740122
Y          0.528813
Z         -0.589001
New_Col   -2.607169
Name: C, dtype: float64
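
One difference between the two accessors shows up with slices: a `loc` label slice includes its end label, while an `iloc` positional slice excludes its end position. A minimal sketch on a stand-in frame; `demo` and its values are hypothetical.

```python
import pandas as pd

# Small stand-in frame with hypothetical values.
demo = pd.DataFrame({'W': [1, 2, 3, 4, 5]}, index=['A', 'B', 'C', 'D', 'E'])

by_label = demo.loc['A':'C']   # label slice: the end label 'C' is included
by_position = demo.iloc[0:2]   # positional slice: position 2 is excluded

print(list(by_label.index))
print(list(by_position.index))
```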

In [56]:
#Get single cell from a dataframe.
df.loc['C', 'Z'] 

-0.5890005332865824

In [58]:
#Get multiple cells at a time.

print(df)
print('\n')
df.loc[['A', 'D'], ['X', 'Z']]

          W         X         Y         Z   New_Col
A  2.706850  0.628133  0.907969  0.503826  3.210676
B  0.651118 -0.319318 -0.848077  0.605965  1.257083
C -2.018168  0.740122  0.528813 -0.589001 -2.607169
D  0.188695 -0.758872 -0.933237  0.955057  1.143752
E  0.190794  1.978757  2.605967  0.683509  0.874303




          X         Z
A  0.628133  0.503826
D -0.758872  0.955057
