# Pandas

![](http://pandas.pydata.org/_static/pandas_logo.png)
[Pandas](http://pandas.pydata.org/) is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. Pandas is free software released under the three-clause BSD license. The name is derived from the term _panel data_, an econometrics term for multidimensional structured data sets.

## DataFrames


A data frame is like a table with rows and columns (e.g., as in SQL or Excel), except that:
  - The rows can be indexed by meaningful labels, not just row numbers (there is special support for labels such as categorical and time-series data).
  - Cells can store any Python object. As in SQL, each column has a single, homogeneous type.
  - Instead of "NULL", a missing value is called "NA". Unlike R, pandas' default column types only support NAs for some data types (basically floating-point numbers and `object`), but this is mostly a non-issue because pandas will "up-type" integers to `float64` when a missing value appears.
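The up-typing described above can be seen directly. This is a small sketch assuming a reasonably recent pandas; the nullable `"Int64"` dtype it ends with is available from pandas 0.24 onward and is shown only for comparison:

```python
import pandas as pd

# A column of plain integers gets an integer dtype (int64 on most platforms)
ints = pd.Series([1, 2, 3])
print(ints.dtype)

# One missing value "up-types" the column to float64, with NaN marking the NA
with_na = pd.Series([1, 2, None])
print(with_na.dtype)             # float64
print(with_na.isna().tolist())   # [False, False, True]

# The nullable "Int64" extension dtype keeps integers and stores the gap as pd.NA
nullable = pd.Series([1, 2, None], dtype="Int64")
print(nullable.dtype)            # Int64
print(nullable.isna().tolist())  # [False, False, True]
```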
  
Each column of a ```DataFrame``` is an individual ```Series``` (loosely speaking, a DataFrame behaves like a dictionary of Series). Each Series has a single, homogeneous type.
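A quick way to check both claims: selecting a column returns a `Series`, and a dict of those Series can be assembled back into the same `DataFrame`. The three rows are borrowed from the dataset used later in this notebook; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'State': ['Johor', 'Kedah', 'Kelantan'],
                   'Area': [19210, 9500, 15099]})

area_col = df['Area']   # selecting one column ...
print(type(area_col))   # ... gives a Series: <class 'pandas.core.series.Series'>
print(df.dtypes)        # one dtype per column

# The dict-of-Series view: pull every column out, then rebuild the frame from them
columns = {name: df[name] for name in df.columns}
rebuilt = pd.DataFrame(columns)
print(rebuilt.equals(df))  # True
```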

In [None]:
# import the pandas library as pd
import pandas as pd
# import matplotlib's pyplot module as plt (used for plotting)
import matplotlib.pyplot as plt
# import NumPy as np
import numpy as np

# Dataframe

## 1. Merge two lists using zip function
## 2. Create a Dataframe using a list
## 3. Create a Dataframe using Series
## 4. Dataframe Functions
### a. df.head()
### b. df.tail()
### c. df.info()
### d. df.describe()
### e. df.T
### f. df['Area']
### g. type(df['Area'])
### h. df.sort_values('Area', ascending=False)
### i. df['Area'].sum()
### j. df['Area'][9:11].sum()
### k. Create a graph using the matplotlib library

## 1. Merge two lists using zip function

Example 1

Let's make a dataset that consists of Malaysian states,
the size of each state in $km^2$, and a COVID-19 figure for each state.
Let's try to rank the states of Malaysia by land area, and figure out whether East
Malaysia is larger or smaller than West Malaysia.

Use the `zip` function to merge the lists together.

In [7]:
states = ['Johor', 'Kedah', 'Kelantan', 'Melaka', 'Negeri Sembilan', 'Pahang', 'Perak', 'Perlis', 'Penang', 'Sabah', 'Sarawak', 'Selangor', 'Terengganu']

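# land area of each state in km^2 (same order as states)
area = [19210, 9500, 15099, 1664, 6686, 36137, 21035, 821, 1048, 73631, 124450, 8104, 13035]

# COVID-19 figure for each state (same order as states)
covid19 = [16760, 5253, 2927, 3078, 11092, 2663, 5526, 154, 7275, 47139, 3499, 55369, 1613]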

In [8]:
type(states)

list

In [9]:
type(area)

list

In [11]:
type(covid19)

list

In [13]:
#zip?

In [16]:
# Area Data Set
state_area = zip(states, area)
state_area

<zip at 0x7c1cb7b0d180>

In [17]:
type(state_area)

zip
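`zip` returns a lazy iterator rather than a list: the pairs are produced on demand, and once they have been consumed the iterator is empty. That is why the next cell wraps it in `list(...)`. A short sketch using a subset of the states (the names `some_states` and `some_areas` are illustrative):

```python
some_states = ['Johor', 'Kedah', 'Kelantan']
some_areas = [19210, 9500, 15099]

pairs = zip(some_states, some_areas)
first_pass = list(pairs)   # consumes the iterator
second_pass = list(pairs)  # nothing left to consume

print(first_pass)   # [('Johor', 19210), ('Kedah', 9500), ('Kelantan', 15099)]
print(second_pass)  # []

# zip stops at the shortest input, so mismatched lengths silently drop items
truncated = list(zip(['a', 'b', 'c'], [1, 2]))
print(truncated)    # [('a', 1), ('b', 2)]
```

On Python 3.10 and later, `zip(..., strict=True)` raises a `ValueError` when the inputs have different lengths, which guards against silently dropping a state.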

In [18]:
# Area Data Set
state_area = list(zip(states, area))
state_area

[('Johor', 19210),
 ('Kedah', 9500),
 ('Kelantan', 15099),
 ('Melaka', 1664),
 ('Negeri Sembilan', 6686),
 ('Pahang', 36137),
 ('Perak', 21035),
 ('Perlis', 821),
 ('Penang', 1048),
 ('Sabah', 73631),
 ('Sarawak', 124450),
 ('Selangor', 8104),
 ('Terengganu', 13035)]

In [19]:
# Covid19 Data Set
state_covid19 = zip(states, covid19)
state_covid19

<zip at 0x7c1cb7cf2240>

In [20]:
# Covid19 Data Set
state_covid19 = list(zip(states, covid19))
state_covid19

[('Johor', 16760),
 ('Kedah', 5253),
 ('Kelantan', 2927),
 ('Melaka', 3078),
 ('Negeri Sembilan', 11092),
 ('Pahang', 2663),
 ('Perak', 5526),
 ('Perlis', 154),
 ('Penang', 7275),
 ('Sabah', 47139),
 ('Sarawak', 3499),
 ('Selangor', 55369),
 ('Terengganu', 1613)]

In [21]:
# State, Area, Covid19 Data Set
state_area_covid19 = zip(states, area, covid19)
state_area_covid19

<zip at 0x7c1cb7c69b80>

In [22]:
# State, area, covid19 Data Set
state_area_covid19 = list(zip(states, area, covid19))
state_area_covid19

[('Johor', 19210, 16760),
 ('Kedah', 9500, 5253),
 ('Kelantan', 15099, 2927),
 ('Melaka', 1664, 3078),
 ('Negeri Sembilan', 6686, 11092),
 ('Pahang', 36137, 2663),
 ('Perak', 21035, 5526),
 ('Perlis', 821, 154),
 ('Penang', 1048, 7275),
 ('Sabah', 73631, 47139),
 ('Sarawak', 124450, 3499),
 ('Selangor', 8104, 55369),
 ('Terengganu', 13035, 1613)]
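`zip` also works in reverse: unpacking a list of tuples with `*` passes each tuple as a separate argument, so the values are regrouped by position and one tuple per original list comes back. A sketch on the first three rows of the combined data set (the variable names are illustrative):

```python
# A few rows in the same (state, area, covid19) shape as state_area_covid19 above
rows = [('Johor', 19210, 16760), ('Kedah', 9500, 5253), ('Kelantan', 15099, 2927)]

# zip(*rows) regroups the i-th element of every tuple into the i-th output
names, areas, covid = zip(*rows)

print(names)  # ('Johor', 'Kedah', 'Kelantan')
print(areas)  # (19210, 9500, 15099)
print(covid)  # (16760, 5253, 2927)
```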

## 2. Create a Dataframe using a list

***dfl*** will be a ***DataFrame*** object. You can think of this object as holding the contents of the state lists in a format similar to an SQL table or an Excel spreadsheet. Let's take a look at the contents of ***dfl*** below.

In [23]:
dfl = pd.DataFrame(data=state_area, columns=['State', 'Area'])

In [25]:
dfl

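| | State | Area |
|---|---|---|
| 0 | Johor | 19210 |
| 1 | Kedah | 9500 |
| 2 | Kelantan | 15099 |
| 3 | Melaka | 1664 |
| 4 | Negeri Sembilan | 6686 |
| 5 | Pahang | 36137 |
| 6 | Perak | 21035 |
| 7 | Perlis | 821 |
| 8 | Penang | 1048 |
| 9 | Sabah | 73631 |
| 10 | Sarawak | 124450 |
| 11 | Selangor | 8104 |
| 12 | Terengganu | 13035 |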


In [26]:
dfl = pd.DataFrame(data=state_covid19, columns=['State', 'Covid19'])

In [27]:
dfl

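| | State | Covid19 |
|---|---|---|
| 0 | Johor | 16760 |
| 1 | Kedah | 5253 |
| 2 | Kelantan | 2927 |
| 3 | Melaka | 3078 |
| 4 | Negeri Sembilan | 11092 |
| 5 | Pahang | 2663 |
| 6 | Perak | 5526 |
| 7 | Perlis | 154 |
| 8 | Penang | 7275 |
| 9 | Sabah | 47139 |
| 10 | Sarawak | 3499 |
| 11 | Selangor | 55369 |
| 12 | Terengganu | 1613 |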


In [28]:
dfl = pd.DataFrame(data=state_area_covid19, columns=['State', 'Area', 'Covid19'])

In [29]:
dfl

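| | State | Area | Covid19 |
|---|---|---|---|
| 0 | Johor | 19210 | 16760 |
| 1 | Kedah | 9500 | 5253 |
| 2 | Kelantan | 15099 | 2927 |
| 3 | Melaka | 1664 | 3078 |
| 4 | Negeri Sembilan | 6686 | 11092 |
| 5 | Pahang | 36137 | 2663 |
| 6 | Perak | 21035 | 5526 |
| 7 | Perlis | 821 | 154 |
| 8 | Penang | 1048 | 7275 |
| 9 | Sabah | 73631 | 47139 |
| 10 | Sarawak | 124450 | 3499 |
| 11 | Selangor | 8104 | 55369 |
| 12 | Terengganu | 13035 | 1613 |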
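The same frame can also be built without `zip`, by passing a dict that maps each column name to its list of values. A small sketch on the first three states comparing the row-oriented and column-oriented constructions (the `sample_*` names are illustrative and chosen so they do not overwrite the full lists above):

```python
import pandas as pd

sample_states = ['Johor', 'Kedah', 'Kelantan']
sample_area = [19210, 9500, 15099]
sample_covid19 = [16760, 5253, 2927]

# Row-oriented: a list of tuples plus explicit column names (as in the cells above)
from_rows = pd.DataFrame(data=list(zip(sample_states, sample_area, sample_covid19)),
                         columns=['State', 'Area', 'Covid19'])

# Column-oriented: a dict mapping each column name to its list of values
from_columns = pd.DataFrame({'State': sample_states,
                             'Area': sample_area,
                             'Covid19': sample_covid19})

# Same columns, same values in the same order
same = from_rows.to_dict('list') == from_columns.to_dict('list')
print(same)  # True
```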


## 3. Create a Dataframe using Series

In [37]:
purchase_1 = pd.Series({'Name': 'Chris', 'Item Purchased': 'Dog Food', 'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn', 'Item Purchased': 'Kitty Litter', 'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod', 'Item Purchased': 'Bird Seed', 'Cost': 5.00})
dfs = pd.DataFrame([purchase_1, purchase_2, purchase_3])

In [38]:
dfs

| | Name | Item Purchased | Cost |
|---|---|---|---|
| 0 | Chris | Dog Food | 22.5 |
| 1 | Kevyn | Kitty Litter | 2.5 |
| 2 | Vinod | Bird Seed | 5.0 |
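When a list of `Series` is passed to `pd.DataFrame`, each `Series` becomes one row and the union of their index labels becomes the columns; a label missing from one `Series` is filled with NA. A sketch with a hypothetical fourth purchase (`purchase_x`, "Dana", "Fish Flakes" are made up for illustration) that has no `'Cost'` entry:

```python
import pandas as pd

purchase_1 = pd.Series({'Name': 'Chris', 'Item Purchased': 'Dog Food', 'Cost': 22.50})
# Hypothetical purchase with no 'Cost' entry, to show how gaps are filled
purchase_x = pd.Series({'Name': 'Dana', 'Item Purchased': 'Fish Flakes'})

# Each Series becomes one row; the union of their index labels becomes the columns
frame = pd.DataFrame([purchase_1, purchase_x])
print(frame)
print(frame['Cost'].isna().tolist())  # [False, True]
```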


In [39]:
purchase_1 = pd.Series({'Name': 'Chris', 'Item Purchased': 'Dog Food', 'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn', 'Item Purchased': 'Kitty Litter', 'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod', 'Item Purchased': 'Bird Seed', 'Cost': 5.00})
dfs = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 2', 'Store 3'])

In [40]:
dfs

| | Name | Item Purchased | Cost |
|---|---|---|---|
| Store 1 | Chris | Dog Food | 22.5 |
| Store 2 | Kevyn | Kitty Litter | 2.5 |
| Store 3 | Vinod | Bird Seed | 5.0 |
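With a custom index, rows can be selected by label through `.loc`, while `.iloc` still selects by position. A minimal sketch that rebuilds `dfs` so it runs on its own:

```python
import pandas as pd

purchase_1 = pd.Series({'Name': 'Chris', 'Item Purchased': 'Dog Food', 'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn', 'Item Purchased': 'Kitty Litter', 'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod', 'Item Purchased': 'Bird Seed', 'Cost': 5.00})
dfs = pd.DataFrame([purchase_1, purchase_2, purchase_3],
                   index=['Store 1', 'Store 2', 'Store 3'])

row = dfs.loc['Store 2']                # one row by label, returned as a Series
print(row['Name'])                      # Kevyn
print(dfs.loc['Store 1', 'Cost'])       # 22.5 (row label, column label)
print(dfs.iloc[-1]['Item Purchased'])   # Bird Seed (last row by position)
```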


## 4. Dataframe Functions

### Dataframe from list
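A minimal sketch of outline items a to k applied to the list-based frame. It names the frame `df` to match the outline (the cells above call it `dfl`), rebuilds it from the `states` and `area` lists so the cell runs on its own, and uses `.iloc` for item j to make the positional slice explicit. In this dataset, which lists only the 13 states, positions 9 and 10 are Sabah and Sarawak, so item j gives the area of East Malaysia:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the list-based frame so this cell is self-contained (named df, as in the outline)
states = ['Johor', 'Kedah', 'Kelantan', 'Melaka', 'Negeri Sembilan', 'Pahang', 'Perak',
          'Perlis', 'Penang', 'Sabah', 'Sarawak', 'Selangor', 'Terengganu']
area = [19210, 9500, 15099, 1664, 6686, 36137, 21035, 821, 1048, 73631, 124450, 8104, 13035]
df = pd.DataFrame(data=list(zip(states, area)), columns=['State', 'Area'])

print(df.head())       # a. first five rows
print(df.tail())       # b. last five rows
df.info()              # c. prints column names, dtypes and non-null counts
print(df.describe())   # d. summary statistics of the numeric 'Area' column
print(df.T)            # e. transpose: columns become rows

area_col = df['Area']  # f. select one column
print(type(area_col))  # g. <class 'pandas.core.series.Series'>

ranked = df.sort_values('Area', ascending=False)  # h. largest state first
print(ranked.head(3))

total_area = df['Area'].sum()            # i. total land area
east_area = df['Area'].iloc[9:11].sum()  # j. positions 9 and 10: Sabah and Sarawak
west_area = total_area - east_area
print(total_area, east_area, west_area)  # 330420 198081 132339

# k. horizontal bar chart of land area, largest state at the top
ax = ranked.plot.barh(x='State', y='Area', legend=False)
ax.invert_yaxis()
ax.set_xlabel('Area (km$^2$)')
plt.tight_layout()
plt.show()
```

With these figures, East Malaysia (Sabah and Sarawak, 198,081 km²) comes out larger than the 11 West Malaysian states in the list (132,339 km²).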

### Dataframe from Series

Exercise:

*   Show the first five rows
*   Show the last five rows
*   Show the info and description of the dataframe
*   Calculate the sum of sales for all stores
*   Calculate the amount spent by any two customers
*   Plot a suitable graph