# [Pandas](https://pandas.pydata.org/)

### WHAT IS IT?

![Pandas](Pandas_logo.svg)
 
 **[Pandas](https://pandas.pydata.org/)** takes its name from "panel data", a term for multi-dimensional data involving measurements over time. It was created in 2008 by **Wes McKinney**. Pandas is an open-source Python library built on top of NumPy. It offers data structures and operations for manipulating numerical tables and time series, and it is popular mainly because it makes importing and analyzing data much easier. Pandas is fast and offers high performance and productivity for users.
  
### Features of Pandas
  
  - Series and DataFrame objects
  - Handling of missing data
  - Data alignment
  - Group by functionality
  - Slicing, indexing, subsetting
  - Merging and joining
  - Reshaping
  - Hierarchical labeling of axes
  - Robust input/output tools
  - Time series-specific functionality
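
As a rough illustration of two of the features listed above (handling of missing data and group by), here is a minimal sketch. The `sales` table, its column names, and its values are invented for this example and do not come from any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; the column names are made up for illustration.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "amount": [100.0, np.nan, 50.0, 70.0],
})

# Handling of missing data: count the gaps, then fill them with 0.
missing_count = int(sales["amount"].isna().sum())
filled = sales.fillna({"amount": 0.0})

# Group by functionality: total amount per region.
totals = filled.groupby("region")["amount"].sum()

print(missing_count)
print(totals)
```

Filling with 0 is only one possible choice; depending on the data, dropping the rows with `dropna()` or filling with a mean may be more appropriate.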
  
### Pandas vs. NumPy

|Pandas  |NumPy |
|:---------:|:---------:|
|Tends to perform better than NumPy for 500K rows or more |  Tends to perform better for 50K rows or less |
|A pandas Series is more flexible: you can define your own labeled index and use it to access elements |  Elements in a NumPy array are accessed by their default integer position |
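
The indexing difference in the second row of the table can be sketched as follows. The labels `"x"`, `"y"`, `"z"` and the values are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

values = [10, 20, 30]

# NumPy: elements are reached by their integer position.
arr = np.array(values)
second_by_position = arr[1]

# pandas: the same values with a user-defined label index.
ser = pd.Series(values, index=["x", "y", "z"])
second_by_label = ser["y"]      # label-based access
second_by_iloc = ser.iloc[1]    # positional access is still available

print(second_by_position, second_by_label, second_by_iloc)
```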

### Kinds of data that suit Pandas

- Tabular data
- Arbitrary matrix
- Time series data
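
To make the time series case a little more concrete, here is a small sketch, assuming a hypothetical series of daily temperature readings (the dates and values are invented). It shows date-based slicing and a rolling mean.

```python
import pandas as pd

# Hypothetical daily temperature readings; dates and values are invented.
dates = pd.date_range("2024-01-01", periods=5, freq="D")
temps = pd.Series([20.5, 21.0, 19.8, 22.1, 21.4], index=dates)

# With a DatetimeIndex, .loc accepts date strings; both ends are inclusive.
window = temps.loc["2024-01-02":"2024-01-04"]

# Rolling mean over 2 consecutive days; the first entry has no predecessor,
# so it is NaN.
rolling = temps.rolling(window=2).mean()

print(window)
print(rolling)
```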

## DATA STRUCTURES IN PANDAS

### Series object 

- One-dimensional labeled array
- Contains data of similar or mixed types
- Can be created from different kinds of data: array, dictionary, scalar

In [15]:
# example of series object on pandas
# creating series object using list
import pandas as pd
data = [1,2,3,4]
series1 = pd.Series(data)
series1

0    1
1    2
2    3
3    4
dtype: int64

In [5]:
type(series1)

pandas.core.series.Series

In [9]:
# change the index of the series object
series1 = pd.Series(data, index = ['a','b','c','d'])
series1

a    1
b    2
c    3
d    4
dtype: int64
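
The cells above build a Series from a list. The list at the top of this section also mentions dictionaries and scalars as inputs, which could look roughly like this (the labels and values are arbitrary):

```python
import pandas as pd

# From a dictionary: the keys become the index labels.
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From a scalar: the value is repeated for every label in the index.
from_scalar = pd.Series(7, index=["a", "b", "c"])

# Mixed types are allowed; the dtype then falls back to `object`.
mixed = pd.Series([1, "two", 3.0])

print(from_dict)
print(from_scalar)
print(mixed.dtype)
```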

### DataFrame

- Two-dimensional labeled data structure with columns of potentially different types
- Features of a DataFrame: different column types, mutable size, labeled axes, arithmetic operations on rows and columns

In [12]:
# example dataframe on pandas
# creating dataframe using list
import pandas as pd
data = [1,2,3,4,5,6]
df = pd.DataFrame(data)
df

| |0|
|:---:|:---:|
|0|1|
|1|2|
|2|3|
|3|4|
|4|5|
|5|6|


In [14]:
# creating dataframe using dictionary
dictionary = {'fruits':['apples','banana','mangoes'],'count':[10,20,30]}
df = pd.DataFrame(dictionary)
df

| |fruits|count|
|:---:|:---:|:---:|
|0|apples|10|
|1|banana|20|
|2|mangoes|30|


In [18]:
# creating dataframe using a series
series = pd.Series([5,10], index = ['a','b'])
df = pd.DataFrame(series)
df

| |0|
|:---:|:---:|
|a|5|
|b|10|


In [27]:
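# creating dataframe using numpy array
import numpy as np
# a NumPy array holds a single dtype, so mixing ints and strings stores everything as strings
numpyarray = np.array([[5000,10000],['Jhon','Doe']])
# cast the salary row back to integers before building the dataframe
df = pd.DataFrame({'name':numpyarray[1],'salary':numpyarray[0].astype(int)})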
df

| |name|salary|
|:---:|:---:|:---:|
|0|Jhon|5000|
|1|Doe|10000|
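
The DataFrame features listed at the start of this section (different column types, mutable size, labeled axes, arithmetic on columns) can be seen together in a short sketch. The row labels `r1` to `r3` and the `price` values are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical inventory; row labels and prices are illustrative only.
df = pd.DataFrame(
    {"fruits": ["apples", "banana", "mangoes"], "count": [10, 20, 30]},
    index=["r1", "r2", "r3"],   # labeled rows
)

# Different column types: one text column and one integer column.
count_is_integer = pd.api.types.is_integer_dtype(df["count"])

# Mutable size: adding a column changes the shape of the existing frame.
df["price"] = [0.5, 0.25, 1.0]

# Arithmetic on columns is element-wise and aligned on the row labels.
df["value"] = df["count"] * df["price"]

print(df)
print(df.shape)
```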


## MERGE, JOIN, CONCATENATE

### Merge Operation

In [30]:
import pandas as pd
player = ['Player1','Player2','Player3']
point = [8,9,10]
title = ['Game1','Game2','Game3']
df1 = pd.DataFrame({'Player':player,'Point':point,'Title':title})
df1

| |Player|Point|Title|
|:---:|:---:|:---:|:---:|
|0|Player1|8|Game1|
|1|Player2|9|Game2|
|2|Player3|10|Game3|


In [33]:
player = ['Player1','Player5','Player6']
power = ['Punch','Kick','Elbow']
title = ['Game1','Game5','Game6']
df2 = pd.DataFrame({'Player':player,'Power':power,'Title':title})
df2

| |Player|Power|Title|
|:---:|:---:|:---:|:---:|
|0|Player1|Punch|Game1|
|1|Player5|Kick|Game5|
|2|Player6|Elbow|Game6|


In [42]:
# Inner merge
df1.merge(df2, on='Player', how='inner')

| |Player|Point|Title_x|Power|Title_y|
|:---:|:---:|:---:|:---:|:---:|:---:|
|0|Player1|8|Game1|Punch|Game1|


In [38]:
# Left merge
df1.merge(df2, on='Player', how='left')

| |Player|Point|Title_x|Power|Title_y|
|:---:|:---:|:---:|:---:|:---:|:---:|
|0|Player1|8|Game1|Punch|Game1|
|1|Player2|9|Game2|NaN|NaN|
|2|Player3|10|Game3|NaN|NaN|


In [44]:
# Right merge
df1.merge(df2, on='Player', how='right')

| |Player|Point|Title_x|Power|Title_y|
|:---:|:---:|:---:|:---:|:---:|:---:|
|0|Player1|8.0|Game1|Punch|Game1|
|1|Player5|NaN|NaN|Kick|Game5|
|2|Player6|NaN|NaN|Elbow|Game6|


In [45]:
# Outer merge
df1.merge(df2, on='Player', how='outer')

| |Player|Point|Title_x|Power|Title_y|
|:---:|:---:|:---:|:---:|:---:|:---:|
|0|Player1|8.0|Game1|Punch|Game1|
|1|Player2|9.0|Game2|NaN|NaN|
|2|Player3|10.0|Game3|NaN|NaN|
|3|Player5|NaN|NaN|Kick|Game5|
|4|Player6|NaN|NaN|Elbow|Game6|
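
In the merges above, `Title` exists in both frames but is not part of the key, so pandas keeps both copies as `Title_x` and `Title_y`. One possible variation, sketched here with the same two frames, is to merge on both shared columns and to pass `indicator=True` so that a `_merge` column records where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Player": ["Player1", "Player2", "Player3"],
    "Point": [8, 9, 10],
    "Title": ["Game1", "Game2", "Game3"],
})
df2 = pd.DataFrame({
    "Player": ["Player1", "Player5", "Player6"],
    "Power": ["Punch", "Kick", "Elbow"],
    "Title": ["Game1", "Game5", "Game6"],
})

# Merging on both shared columns keeps a single Title column
# instead of the Title_x / Title_y pair.
merged = df1.merge(df2, on=["Player", "Title"], how="outer", indicator=True)

# _merge is "both", "left_only", or "right_only" for each row.
print(merged)
```

Whether `Title` belongs in the key depends on the data: this only makes sense if a player and title together identify a row.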


### Join Operation

In [62]:
player = ['Player1','Player2','Player3']
point = ['Punch','Kick','Elbow']
title = ['Game1','Game2','Game3']
df3 = pd.DataFrame({'Player':player,'Points':point,'Title':title}, index = ['L1','L2','L3'])
df3

| |Player|Points|Title|
|:---:|:---:|:---:|:---:|
|L1|Player1|Punch|Game1|
|L2|Player2|Kick|Game2|
|L3|Player3|Elbow|Game3|


In [64]:
players = ['Player1','Player5','Player6']
power = ['Punch','Kick','Elbow']
titles = ['Game1','Game5','Game6']
df4 = pd.DataFrame({'Players':players,'Power':power,'Titles':titles}, index = ['L2','L3','L4'])
df4

| |Players|Power|Titles|
|:---:|:---:|:---:|:---:|
|L2|Player1|Punch|Game1|
|L3|Player5|Kick|Game5|
|L4|Player6|Elbow|Game6|


In [65]:
# Inner join
df3.join(df4, how='inner')

| |Player|Points|Title|Players|Power|Titles|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|L2|Player2|Kick|Game2|Player1|Punch|Game1|
|L3|Player3|Elbow|Game3|Player5|Kick|Game5|


In [66]:
# Left join
df3.join(df4, how='left')

| |Player|Points|Title|Players|Power|Titles|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|L1|Player1|Punch|Game1|NaN|NaN|NaN|
|L2|Player2|Kick|Game2|Player1|Punch|Game1|
|L3|Player3|Elbow|Game3|Player5|Kick|Game5|


In [67]:
# Right join
df3.join(df4, how='right')

| |Player|Points|Title|Players|Power|Titles|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|L2|Player2|Kick|Game2|Player1|Punch|Game1|
|L3|Player3|Elbow|Game3|Player5|Kick|Game5|
|L4|NaN|NaN|NaN|Player6|Elbow|Game6|


In [None]:
# Outer join
df3.join(df4, how='outer')
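
The outer join cell above was not run in the original notebook. As a self-contained sketch of what it does, the snippet below rebuilds `df3` and `df4` exactly as defined earlier and checks the shape of the result: an outer join keeps the union of the two indexes and fills the gaps with NaN.

```python
import pandas as pd

df3 = pd.DataFrame(
    {"Player": ["Player1", "Player2", "Player3"],
     "Points": ["Punch", "Kick", "Elbow"],
     "Title": ["Game1", "Game2", "Game3"]},
    index=["L1", "L2", "L3"],
)
df4 = pd.DataFrame(
    {"Players": ["Player1", "Player5", "Player6"],
     "Power": ["Punch", "Kick", "Elbow"],
     "Titles": ["Game1", "Game5", "Game6"]},
    index=["L2", "L3", "L4"],
)

# join() aligns on the index; how="outer" keeps every label from both frames.
joined = df3.join(df4, how="outer")

print(joined.index.tolist())
print(joined.shape)
```

Note that `join()` works here because the two frames share no column names. If they did, `lsuffix` and `rsuffix` would have to be supplied.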

### Concatenate

In [70]:
# concatenate two DataFrames in pandas
pd.concat([df3, df4])

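| |Player|Points|Title|Players|Power|Titles|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|L1|Player1|Punch|Game1|NaN|NaN|NaN|
|L2|Player2|Kick|Game2|NaN|NaN|NaN|
|L3|Player3|Elbow|Game3|NaN|NaN|NaN|
|L2|NaN|NaN|NaN|Player1|Punch|Game1|
|L3|NaN|NaN|NaN|Player5|Kick|Game5|
|L4|NaN|NaN|NaN|Player6|Elbow|Game6|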
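
In the result above the labels `L2` and `L3` each appear twice, because `pd.concat` stacks rows without touching the index. Two common ways to deal with that are sketched below. The small `top` and `bottom` frames are hypothetical stand-ins with matching columns, not the frames from the notebook.

```python
import pandas as pd

# Hypothetical frames with identical columns and one overlapping label.
top = pd.DataFrame({"Player": ["Player1", "Player2"], "Point": [8, 9]},
                   index=["L1", "L2"])
bottom = pd.DataFrame({"Player": ["Player5"], "Point": [7]}, index=["L2"])

# Plain row-wise concat keeps the original labels, so "L2" appears twice.
stacked = pd.concat([top, bottom])

# ignore_index=True discards the old labels and renumbers the rows 0..n-1.
renumbered = pd.concat([top, bottom], ignore_index=True)

# keys= adds an outer index level recording which frame each row came from.
tagged = pd.concat([top, bottom], keys=["first", "second"])

print(stacked.index.tolist())
print(renumbered.index.tolist())
print(tagged)
```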


## IMPORTING & ANALYZING DATASET