# Objective: Pandas for Data Wrangling
<hr>

1. Introduction to Data Wrangling & Pandas
2. Series & DataFrames
3. Loading data into DataFrames
4. Indexing and selecting data
5. Merge, join, and concatenate dataframes
6. Reshaping and pivot tables
7. Working with text data
8. Working with missing data
9. Categorical data
10. Visualization
11. Computational tools
12. Group By: split-apply-combine
13. Time series / date functionality
14. Time deltas
15. Styling
16. Options and settings
17. Enhancing performance

<hr>

## 1. Introduction to Data Wrangling & Pandas
<hr>

### Data Wrangling
* Getting & Reading data from different sources.
* Cleaning Data
* Shaping & Structuring Data
* Storing Data

Many tools and libraries are available for data wrangling, for example the RapidMiner tool and the pandas library. Organizations often prefer libraries because they are more flexible.
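The four wrangling steps listed above can be sketched with pandas as follows. This is a minimal illustration, not a prescribed workflow: the CSV text, the column names (`city`, `year`, `sales`), and the variable names are all made up for the example.

```python
import io
import pandas as pd

# Hypothetical raw data: one duplicated row and one missing value.
raw_csv = """city,year,sales
Pune,2018,10
Pune,2018,10
Pune,2019,12
Delhi,2018,
Delhi,2019,9
"""

# 1. Get and read the data (an in-memory buffer stands in for a real file).
sales = pd.read_csv(io.StringIO(raw_csv))

# 2. Clean: drop exact duplicate rows, then rows with missing sales.
clean = sales.drop_duplicates().dropna(subset=["sales"])

# 3. Shape: one row per city, one column per year.
wide = clean.pivot(index="city", columns="year", values="sales")

# 4. Store: to_csv() returns CSV text; pass a path to write a file instead.
stored = wide.to_csv()
print(stored)
```

Each step returns a new object, so the raw `sales` table stays available for comparison with the cleaned and reshaped versions.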

### Pandas
* High-performance, easy-to-use open source library for data analysis
* Creates tabular data from different sources such as CSV, JSON, and databases
* Has utilities for descriptive statistics, aggregation, and handling missing data
* Provides database-style utilities such as merge and join
* Fast, programmable, and easy alternative to spreadsheets
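A few of the features above can be tried in a short sketch. The JSON text, the `depts` table, and all column names here are hypothetical; only the pandas calls themselves (`read_json`, `fillna`, `describe`, `groupby`, `merge`) are part of the library.

```python
import io
import pandas as pd

# Hypothetical JSON records; Bob's salary is missing (null).
json_text = """[
  {"name": "Ann", "dept": "eng", "salary": 100},
  {"name": "Bob", "dept": "eng", "salary": null},
  {"name": "Cid", "dept": "ops", "salary": 80}
]"""

# Build a table from a JSON source.
employees = pd.read_json(io.StringIO(json_text))

# Handle missing data: fill the gap with the column mean.
employees["salary"] = employees["salary"].fillna(employees["salary"].mean())

# Descriptive statistics and aggregation.
summary = employees["salary"].describe()
mean_by_dept = employees.groupby("dept")["salary"].mean()

# Database-style merge with a second (hypothetical) table.
depts = pd.DataFrame({"dept": ["eng", "ops"], "location": ["Pune", "Delhi"]})
merged = employees.merge(depts, on="dept", how="left")
print(merged)
```

Filling with the mean is only one possible choice for the missing value; dropping the row or using another statistic may suit other data better.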

## 2. Series & DataFrames

In [11]:
import pandas as pd
import numpy as np

In [2]:
pd.__version__

'0.24.2'

#### Series
* The Series data structure represents a column.
* Each column has a single data type.
* Combine multiple columns to create a table (i.e., a DataFrame).

In [5]:
ser1 = pd.Series(data=[1,2,3,4,5], index=list('abcde'))
ser1

a    1
b    2
c    3
d    4
e    5
dtype: int64

In [6]:
ser1.dtype

dtype('int64')

In [7]:
ser2 = pd.Series(data=[11,22,33,44,55], index=list('abcde'))
ser2

a    11
b    22
c    33
d    44
e    55
dtype: int64
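To illustrate the point that each Series carries one data type, the sketch below builds a few small Series and inspects their `dtype`. The variable names and values are made up for the example.

```python
import pandas as pd

ints = pd.Series([1, 2, 3], index=list("abc"))
floats = pd.Series([1.5, 2.5, 3.5], index=list("abc"))

# Mixing integers with a float upcasts the whole Series to a float dtype.
mixed = pd.Series([1, 2.5, 3], index=list("abc"))

# astype returns a new Series with the requested dtype; ints is unchanged.
as_float = ints.astype("float64")

print(ints.dtype, floats.dtype, mixed.dtype, as_float.dtype)
```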

#### DataFrame
* A DataFrame is a tabular representation of data.
* Combine multiple Series to create a DataFrame.
* Values that share the same index label belong to the same row.

In [8]:
df = pd.DataFrame({'A':ser1, 'B':ser2})

In [9]:
df

   A   B
a  1  11
b  2  22
c  3  33
d  4  44
e  5  55
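Rows are matched on index labels rather than on position. The sketch below uses two hypothetical Series (`left` and `right`) whose indexes only partly overlap, to show what happens when a label is missing from one of them.

```python
import pandas as pd

# Hypothetical Series with partly overlapping index labels.
left = pd.Series([1, 2, 3], index=list("abc"))
right = pd.Series([20, 30, 40], index=list("bcd"))

# The DataFrame index is the union of both indexes;
# a label missing from one Series yields NaN in that column.
aligned = pd.DataFrame({"A": left, "B": right})
print(aligned)
```

Because NaN is a floating-point value, both columns end up with a float dtype even though the inputs were integers.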


In [12]:
df = pd.DataFrame(data=np.random.randint(1,10,size=(10,10)), 
             index=list('ABCDEFGHIJ'), 
             columns=list('abcdefghij'))
df

   a  b  c  d  e  f  g  h  i  j
A  2  1  7  2  5  1  9  9  4  9
B  8  8  3  3  9  1  1  8  1  7
C  5  1  3  8  4  1  1  1  4  7
D  8  5  6  4  9  3  6  1  6  5
E  5  2  9  1  8  5  3  4  5  8
F  4  7  6  7  7  4  2  8  4  5
G  3  1  7  2  7  7  9  1  6  3
H  3  1  1  8  6  3  2  9  8  9
I  6  4  5  5  4  9  7  8  5  5
J  2  2  2  9  1  3  7  8  8  2
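The values above come from `np.random.randint`, so they change on every run. One way to get the same table each time is to seed NumPy's generator first, as in this sketch. The helper name `make_random_frame` and the seed value 42 are arbitrary choices for the example, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def make_random_frame(seed):
    """Build a 10x10 DataFrame of random integers, reproducible per seed."""
    np.random.seed(seed)
    # randint's upper bound is exclusive, so values fall in 1..9.
    values = np.random.randint(1, 10, size=(10, 10))
    return pd.DataFrame(values,
                        index=list("ABCDEFGHIJ"),
                        columns=list("abcdefghij"))

first = make_random_frame(42)
second = make_random_frame(42)

# The same seed gives identical tables.
print(first.equals(second))
```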
