## Day 13 - Data Manipulation using Pandas - Part 1

- Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its two main data structures, Series and DataFrame.

##### Note:
1. First clean the environment: go to the "Kernel" menu --> "Restart & Clear Output".
2. To execute the code, click on a cell and press Ctrl + Enter.


## Key Features of Pandas
- Fast and efficient DataFrame object with default and customized indexing.
- Tools for loading data into in-memory data objects from different file formats.
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of data sets.
- Label-based slicing, indexing and subsetting of large data sets.
- Columns from a data structure can be deleted or inserted.
- Group by data for aggregation and transformations.
- High performance merging and joining of data.
- Time Series functionality.
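
As a small illustration of two of the features listed above (data alignment and handling of missing data), the sketch below adds two Series whose labels only partly overlap. The variable names and values are made up for this example.

```python
import pandas as pd

# Hypothetical data: two Series whose labels only partly overlap
s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

# Addition aligns on labels; a label present in only one Series gives NaN
total = s1 + s2
print(total)

# fillna replaces the missing values with a chosen default
filled = total.fillna(0)
print(filled)
```

Only the labels `b` and `c` exist in both Series, so only those two positions hold a real sum before `fillna` is applied.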


## 1. Import pandas library

In [None]:
# Import the pandas library under its conventional alias, pd.

import pandas as pd


## 2. Working with Series

- A Series is a one-dimensional labeled array that can hold values of any data type.

### 2.1 Create a series using list

In [None]:
import pandas as pd

a1 = [1, 3, 5, 7, 9, 2, 4, 6, 8]
a2 = pd.Series(a1)

print(a2)


### 2.2 Create a series with data and a custom index

In [None]:
import pandas as pd

a1 = [1, 3, 5, 7, 9, 2, 4, 6, 8]
a2 = ['a','b','c','d','e','f','g','h','i']
a3 = pd.Series(a1, index=a2)  # a1 supplies the data, a2 supplies the index labels
print(a3)

# Access the data in the series by index label
print('\n--------------')
print("a3['b'] -->", a3['b'])
print("a3['i'] -->", a3['i'])

# Try this also: uncomment and run it (the letters become the data, the numbers become the index)
#a3 = pd.Series(a2, a1)
#print(a3)
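
The effect of swapping the data and index arguments can be sketched as follows. The shorter lists and the names `by_letter` and `by_number` are illustrative only.

```python
import pandas as pd

values = [1, 3, 5]
labels = ['a', 'b', 'c']

by_letter = pd.Series(values, index=labels)   # letters label the numbers
by_number = pd.Series(labels, index=values)   # numbers label the letters

print(by_letter['b'])      # lookup by the label 'b'
print(by_number[3])        # lookup by the label 3, not by position
print(by_letter.iloc[0])   # lookup by position with iloc
```

With an integer index, square brackets look up the label rather than the position, which is why `iloc` is the safer choice when a position is intended.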


### 2.3 Creating a series using dictionary

In [None]:
import pandas as pd

d1 = {'Oranges':3, 'Apples':4, 'Mangoes':2, 'Banana':12}
d2 = pd.Series(d1)

print (d2)
print (type(d2))

### 2.4 Creating a series using nested list

In [None]:
import pandas as pd

a1 = [[1,3,5],[2,4,6]]
a2 = pd.Series(a1)

print (a2)
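
A nested list does not produce a two-dimensional Series. Each inner list becomes one element, as this short sketch shows (the variable name `nested` is illustrative).

```python
import pandas as pd

nested = pd.Series([[1, 3, 5], [2, 4, 6]])

print(len(nested))     # number of elements in the Series
print(nested.dtype)    # object, because each element is a Python list
print(nested[0])       # the first inner list
print(nested[0][2])    # index into that inner list
```

For tabular data with rows and columns, a DataFrame is usually the better fit.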

## 3. DataFrames
- A DataFrame is a two-dimensional data structure with labeled rows and columns.

### 3.1 Creating a data frame using dictionary

In [None]:
import pandas as pd

d1 = {'Age':[23,33,12,45],'Name':['Rahul','John','Robert','Sneha']}
d2 = pd.DataFrame(d1)

print(d2)


### 3.2 Creating a data frame using nested list

In [None]:
import pandas as pd

d1 = [[4,1900],[3,1600],[2,1100],[1,850]]
d2 = pd.DataFrame(d1, columns = ['Bedrooms','Area'])

print (d2)


### 3.3 Assigning indexes within a data frame

In [None]:
import pandas as pd

d1 = {'Name':['Ankit','Rishitha','Karthik','Vishnu'],'Marks':[78,67,98,56]}
d2 = pd.DataFrame(d1,index = ['Rank 2','Rank 3','Rank 1','Rank 4'])

print (d2)


### 3.4 Creating data frame using list of dictionaries

In [None]:
import pandas as pd

d1 = [{'A':65,'B':66},{'A':97,'B':98,'C':99}]
d2 = pd.DataFrame(d1)

print (d2)
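
When the dictionaries do not share the same keys, pandas fills the gaps with NaN. The sketch below reuses the same data under the illustrative names `rows` and `df` to make the missing value visible.

```python
import pandas as pd

rows = [{'A': 65, 'B': 66}, {'A': 97, 'B': 98, 'C': 99}]
df = pd.DataFrame(rows)

print(df.isna())        # True where a row had no value for that column
print(df['C'].dtype)    # float64, because NaN is a floating-point value
print(df.fillna(0))     # replace the missing value with 0
```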


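### 3.5 Creating a data frame using timestamp and categorical data
- Generate a small dataset with random values. Scalar entries, such as the timestamp in column B, are repeated for every row.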

In [None]:
#import numpy as np
import pandas as pd
import random as r

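d1 = pd.DataFrame({'A': range(1, 5),                                       # integers 1 to 4
                   'B': pd.Timestamp('20190305'),                          # one timestamp, repeated in every row
                   'C': [r.randint(10, 20) for _ in range(4)],             # random integers from 10 to 20 (both included)
                   'D': pd.Categorical(["Test", "Train", "Car", "Bike"]),  # categorical column
                   'E': 'Hello',                                           # scalar string, repeated in every row
                   'F': 1})                                                # scalar integer, repeated in every row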
print(d1)


In [None]:
d1.describe()  # Summarizes numeric columns by default (pandas 2.x also includes datetime columns)
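
To summarize non-numeric columns as well, `describe` accepts `include='all'`. A minimal sketch on a small, made-up frame (the names `df`, `numeric_only` and `everything` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'E': ['x', 'y', 'x', 'x']})

numeric_only = df.describe()              # default: the text column 'E' is left out
everything = df.describe(include='all')   # also summarizes 'E' (count, unique, top, freq)

print(numeric_only)
print(everything)
```

Statistics that do not apply to a column, such as the mean of a text column, appear as NaN in the combined summary.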

### 3.6 Accessing data from dataset - Part 1

In [None]:
type(d1)

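#d1[]      # SyntaxError: the brackets are empty
#d1[1]     # KeyError: there is no column labeled 1
#d1[1,]    # Error: (1,) is a tuple, not a column label

#d1[:]      # No error; shows all rows
#d1[1:4]    # No error; shows rows at positions 1 to 3 (the end position 4 is excluded)
#d1[0,2,3]  # Error: (0, 2, 3) is a tuple, not a column label
#d1[:,]     # Error: a tuple containing a slice is not a valid key

#d1[:]['A']       # No error; selects all rows of column A
#d1[:]['A','B']   # KeyError: ('A', 'B') is read as a single column label
#d1[:][['A','B']] # No error; selects all rows of columns A and B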
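
The rules behind these cases can be sketched in a runnable form. The frame below is a made-up stand-in for `d1` with fixed values, so the output is reproducible.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [10, 20, 30, 40],
                   'C': ['w', 'x', 'y', 'z']})

col = df['A']            # a single label selects one column (a Series)
cols = df[['A', 'B']]    # a list of labels selects several columns (a DataFrame)
rows = df[1:3]           # a slice selects rows by position; the end is excluded

print(type(col).__name__, type(cols).__name__)
print(rows)

try:
    df['A', 'B']         # a tuple is treated as one column label
except KeyError as err:
    print('KeyError:', err)
```

In short, plain square brackets take a column label, a list of column labels, or a row slice, and nothing else.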


### 3.7 Accessing data from dataset - Part 2 (using loc - Column Names)

In [None]:
d1
#d1.loc[]                     # SyntaxError
#d1.loc[,]                    # SyntaxError
#d1.loc[:,]                   # shows all rows of all columns
#d1.loc[:,[]]                 # shows the index only; no columns are selected

# Syntax --> loc[ ROW_Labels, COL_Names_in_List ]; label slices include both ends

#d1.loc[:,['A','B','D']]      # shows all rows of columns A, B, D
#d1.loc[:,]                   # shows all rows of all columns
#d1.loc[,]                    # SyntaxError

#d1.loc[2: , ['A','B','D']]   # shows rows from label 2 to the end of columns A, B, D
#d1.loc[2:,]                  # shows rows from label 2 to the end of all columns
#d1.loc[1:3,]                 # shows rows labeled 1 to 3 (3 included) of all columns
#d1.loc[1:3, ['A','B','D']]   # shows rows labeled 1 to 3 (3 included) of columns A, B, D
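
A runnable sketch of label-based selection with `loc`, using a made-up frame with fixed values in place of `d1`:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [10, 20, 30, 40],
                   'D': ['w', 'x', 'y', 'z']})

subset = df.loc[1:3, ['A', 'D']]   # row labels 1, 2 and 3; the end label is included
single = df.loc[2, 'B']            # one row label and one column label give a scalar

print(subset)
print(single)
```

Because `loc` works on labels, the slice `1:3` returns three rows here, unlike ordinary Python slicing.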


### 3.8 Accessing data from dataset - Part 3 (using iloc - Column position)

In [None]:
d1

# Syntax --> iloc[ ROW_Position, COL_Position ]; position slices exclude the end

#d1.iloc[:,1,2,3]              # Error: too many indexers
#d1.iloc[:,[1,2,3]]            # shows all rows of columns 1, 2, 3
#d1.iloc[:,]                   # shows all rows of all columns
#d1.iloc[,]                    # SyntaxError

#d1.iloc[2:, [0,1,3]]          # shows rows from position 2 to the end of columns 0, 1, 3
#d1.iloc[2:,]                  # shows rows from position 2 to the end of all columns
#d1.iloc[1:3,]                 # shows rows 1 and 2 (3 is excluded) of all columns
#d1.iloc[1:3, [0,1]]           # shows rows 1 and 2 of columns 0 and 1
#d1.iloc[1:3, :5]              # shows rows 1 and 2 of columns 0 to 4
#d1.iloc[1:3, 1:5]             # shows rows 1 and 2 of columns 1 to 4
#d1.iloc[1:3, :-2]             # shows rows 1 and 2 of all columns except the last two
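
A runnable sketch of position-based selection with `iloc`, again on a made-up frame with fixed values. It also contrasts the same slice under `loc` and `iloc`.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [10, 20, 30, 40],
                   'D': ['w', 'x', 'y', 'z']})

by_pos = df.iloc[1:3, [0, 2]]   # row positions 1 and 2 only; position 3 is excluded
trimmed = df.iloc[:, :-1]       # every column except the last one

print(by_pos)
print(trimmed)

# The same slice selects a different number of rows with loc and iloc
print(len(df.loc[1:3]), len(df.iloc[1:3]))
```

With the default integer index, `loc[1:3]` and `iloc[1:3]` look alike but differ by one row, which is a common source of off-by-one mistakes.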