# Pandas

The previous section covered NumPy, which is good at performing mathematical operations on arrays of numbers. Its major drawback is that it does not handle heterogeneous values well. Pandas dataframes help here: they can store different data types, and values can be referenced by label, like keys in a Python dict, instead of only by index position.
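As a rough illustration of that contrast, the sketch below stores the same mixed values first in a NumPy array and then in a dataframe. The column names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# A NumPy array has a single dtype, so mixing numbers and text
# forces every element to be stored as a string.
arr = np.array([[1, 'apple'], [2, 'pear']])
print(arr.dtype.kind)        # 'U' means unicode text

# A DataFrame keeps a separate dtype for each column.
frame = pd.DataFrame({'count': [1, 2], 'fruit': ['apple', 'pear']})
print(frame.dtypes)

# Values are addressed by label, much like dictionary keys.
print(frame['fruit'][0])     # apple
```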

[Link to Official Documentation](http://pandas.pydata.org/pandas-docs/version/0.23/dsintro.html)

## Series

A pandas Series is almost the same as a NumPy ndarray, with the additional ability to reference values by custom labels, like *keys* in a Python *dictionary*.

In [1]:
import numpy as np
import pandas as pd

In [10]:
#Example

series2 = pd.Series(data = [1,2,3], index = ['key1', 'key2', 'key3'])
series2

key1    1
key2    2
key3    3
dtype: int64
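To make the dictionary comparison concrete, here is a minimal sketch that reads values from the same Series by label and by position. The variable names `by_label`, `by_position` and `subset` are only illustrative.

```python
import pandas as pd

series2 = pd.Series(data=[1, 2, 3], index=['key1', 'key2', 'key3'])

by_label = series2['key2']          # lookup by custom label, like a dict key
by_position = series2.iloc[1]       # lookup by integer position, like an array
subset = series2[['key1', 'key3']]  # a list of labels returns a smaller Series

print(by_label, by_position)
print(subset)
```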

### Question 1

Convert a given dict to a pandas Series.

[**Hint:** Use **pd.Series**]

In [6]:
import pandas as pd
dict_pandas={'Pass':20,'Fail':5}
df=pd.Series(dict_pandas)
print(df)

Fail     5
Pass    20
dtype: int64
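A possible variation on this answer, sketched under the assumption that a specific label order is wanted: passing `index=` alongside the dict selects and orders the labels. The label `'Absent'` is made up here to show what happens when a key is missing.

```python
import pandas as pd

dict_pandas = {'Pass': 20, 'Fail': 5}

# Keys become the index labels and values become the data.
results = pd.Series(dict_pandas)

# Passing index= selects and orders the labels; a label that is
# missing from the dict ('Absent', made up here) is filled with NaN.
ordered = pd.Series(dict_pandas, index=['Fail', 'Pass', 'Absent'])
print(ordered)
```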


## Dataframes

A dataframe is a table with labeled columns which can hold different types of data in each column. 

In [8]:
# Example
d1 = {'a': [1,2,3], 'b': [3,4,5], 'c':[6,7,8] }
df1 = pd.DataFrame(d1)
df1

|   | a | b | c |
|---|---|---|---|
| 0 | 1 | 3 | 6 |
| 1 | 2 | 4 | 7 |
| 2 | 3 | 5 | 8 |
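The following sketch extends the example to show the "different types in each column" part of the definition. The added column `name` and its values are made up for illustration.

```python
import pandas as pd

d1 = {'a': [1, 2, 3], 'b': [3, 4, 5], 'c': [6, 7, 8]}
df1 = pd.DataFrame(d1)

# Each dict key becomes a column label; a single column is a Series.
col_a = df1['a']
print(type(col_a).__name__)   # Series

# Columns can hold different types: add a text column
# ('name' and its values are made up for this sketch).
df1['name'] = ['x', 'y', 'z']
print(df1.dtypes)
```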


### Question 3

Select the second row of the dataframe df1 above.

In [39]:
print(df1.iloc[1:2])

   a  b  c
1  2  4  7


### Question 4

Select column c in the second row of df1.

[**Hint:** To select by label, use **df.loc[row, column]**. To select by integer position, use **df.iloc[row, column]**. The older **df.ix[row, column]**, which mixed positions and labels, is deprecated and was removed in pandas 1.0, so prefer the first two.]



In [45]:
print(df1.loc[1, 'c'])

7
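The difference between label-based and position-based selection is easier to see when row labels are not the default integers. This is a small sketch in which the row labels `r1` to `r3` are invented for the purpose.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5], 'c': [6, 7, 8]})

# Relabel the rows so labels and positions differ
# (the labels 'r1'..'r3' are made up for this sketch).
df1.index = ['r1', 'r2', 'r3']

by_label = df1.loc['r2', 'c']     # row label, column label
by_position = df1.iloc[1, 2]      # row position 1, column position 2
second_row = df1.iloc[1]          # the whole second row as a Series

print(by_label, by_position)      # 7 7
print(second_row.tolist())        # [2, 4, 7]
```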


## Using Dataframes on a dataset

##### Using the mtcars dataset.

For the following questions, we will use the cars data from [Motor Trend Car Road Tests](http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html).

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). 


Details:

A data frame with 32 observations on 11 (numeric) variables.

| Column | Name | Description |
|---|---|---|
| [, 1] | mpg | Miles/(US) gallon |
| [, 2] | cyl | Number of cylinders |
| [, 3] | disp | Displacement (cu.in.) |
| [, 4] | hp | Gross horsepower |
| [, 5] | drat | Rear axle ratio |
| [, 6] | wt | Weight (1000 lbs) |
| [, 7] | qsec | 1/4 mile time |
| [, 8] | vs | Engine (0 = V-shaped, 1 = straight) |
| [, 9] | am | Transmission (0 = automatic, 1 = manual) |
| [,10] | gear | Number of forward gears |
| [,11] | carb | Number of carburetors |

In [49]:
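# Read the dataset from a CSV file using pandas.
# The raw string (r'...') keeps the backslashes in the Windows path from being read as escape sequences.
mtcars = pd.read_csv(r'C:\Users\panmishr\Downloads\mtcars.csv')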
mtcars.index = mtcars['model']
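Assigning `mtcars['model']` to the index keeps the model names both as the index and as an ordinary column, which is why `model` appears twice in the outputs in this notebook. One alternative, sketched here with an in-memory stand-in for the CSV file (two rows copied from the data; the name `cars` is illustrative), is to set the index while reading:

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk: two rows copied from the mtcars data.
csv_text = """model,mpg,cyl
Mazda RX4,21.0,6
Datsun 710,22.8,4
"""

# index_col uses the 'model' column as row labels while reading,
# so the names are not kept as a duplicate ordinary column.
cars = pd.read_csv(io.StringIO(csv_text), index_col='model')
print(cars)
print(cars.loc['Datsun 710', 'mpg'])   # 22.8
```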


The following questions are based on analysing this dataset using dataframes.

### Question 5

Check the type and dimensions of the given dataset, mtcars.


[**Hint:** Use **type()** and **df.shape**]

In [53]:
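print(type(mtcars))   # the kind of object
print(mtcars.shape)   # (rows, columns)
mtcars.dtypes         # the data type of each column

<class 'pandas.core.frame.DataFrame'>
(32, 12)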
model     object
mpg      float64
cyl        int64
disp     float64
hp         int64
drat     float64
wt       float64
qsec     float64
vs         int64
am         int64
gear       int64
carb       int64
dtype: object

### Question 6

Check the first 10 rows and last 10 rows of the given dataset, mtcars.

[**Hint:** Use **.head()** and **.tail()**]

In [55]:
mtcars.head(10)  # first 10 rows (not displayed: a cell shows only its last expression)
mtcars.tail(10)  # last 10 rows

| model (index) | model | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AMC Javelin | AMC Javelin | 15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.3 | 0 | 0 | 3 | 2 |
| Camaro Z28 | Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.84 | 15.41 | 0 | 0 | 3 | 4 |
| Pontiac Firebird | Pontiac Firebird | 19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 |
| Fiat X1-9 | Fiat X1-9 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.9 | 1 | 1 | 4 | 1 |
| Porsche 914-2 | Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.14 | 16.7 | 0 | 1 | 5 | 2 |
| Lotus Europa | Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.9 | 1 | 1 | 5 | 2 |
| Ford Pantera L | Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.17 | 14.5 | 0 | 1 | 5 | 4 |
| Ferrari Dino | Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.77 | 15.5 | 0 | 1 | 5 | 6 |
| Maserati Bora | Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.57 | 14.6 | 0 | 1 | 5 | 8 |
| Volvo 142E | Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.78 | 18.6 | 1 | 1 | 4 | 2 |
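Because a notebook cell displays only its last expression, a cell that calls both methods shows just one result. A minimal sketch of one way to see both, using a made-up ten-row frame named `numbers`:

```python
import pandas as pd

# Ten made-up rows so that head and tail return different slices.
numbers = pd.DataFrame({'n': range(10)})

first_three = numbers.head(3)     # rows 0, 1, 2
last_three = numbers.tail(3)      # rows 7, 8, 9

# print shows both results; a bare expression only shows the last one.
print(first_three)
print(last_three)
```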


### Question 7

Print all the column labels in the given dataset - mtcars.

[**Hint:** Use **df.columns**]

In [56]:
mtcars.columns

Index([u'model', u'mpg', u'cyl', u'disp', u'hp', u'drat', u'wt', u'qsec',
       u'vs', u'am', u'gear', u'carb'],
      dtype='object')

### Question 8

Select the first 6 rows and first 3 columns of the mtcars dataframe.

**Hint:** mtcars.iloc[:,:] gives all rows and columns in the dataset (mtcars.ix[:,:] did the same but is deprecated).

In [59]:
mtcars.iloc[0:6,[0,1,2]]

| model (index) | model | mpg | cyl |
|---|---|---|---|
| Mazda RX4 | Mazda RX4 | 21.0 | 6 |
| Mazda RX4 Wag | Mazda RX4 Wag | 21.0 | 6 |
| Datsun 710 | Datsun 710 | 22.8 | 4 |
| Hornet 4 Drive | Hornet 4 Drive | 21.4 | 6 |
| Hornet Sportabout | Hornet Sportabout | 18.7 | 8 |
| Valiant | Valiant | 18.1 | 6 |
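Positional selection with `.iloc` follows Python slicing rules, so the end position of a slice is excluded. The sketch below shows two equivalent ways to take a block of rows and columns from a small made-up frame (`grid` and its values are illustrative only).

```python
import pandas as pd

# A small made-up frame: 4 rows, 4 columns.
grid = pd.DataFrame({
    'w': [1, 2, 3, 4],
    'x': [5, 6, 7, 8],
    'y': [9, 10, 11, 12],
    'z': [13, 14, 15, 16],
})

# Positional slices exclude the end position, as in plain Python lists.
block = grid.iloc[0:2, 0:3]              # rows 0-1, columns 0-2
same_block = grid.iloc[:2, [0, 1, 2]]    # explicit column positions

print(block)
print(block.equals(same_block))          # True
```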


### Question 9

Select the rows from **Mazda RX4** to **Valiant** in the mtcars dataset and display only the mpg and cyl values of those cars.

**Hint:** Use **df.loc[rows, columns]** (the older **.ix** indexer is deprecated).

In [64]:
# Failed attempt: a tuple key is treated as one single label; a list of labels is needed instead.
df['mpg','cyl']

KeyError: ('mpg', 'cyl')

In [65]:
mtcars.iloc[:,[1,2]]

| model (index) | mpg | cyl |
|---|---|---|
| Mazda RX4 | 21.0 | 6 |
| Mazda RX4 Wag | 21.0 | 6 |
| Datsun 710 | 22.8 | 4 |
| Hornet 4 Drive | 21.4 | 6 |
| Hornet Sportabout | 18.7 | 8 |
| Valiant | 18.1 | 6 |
| Duster 360 | 14.3 | 8 |
| Merc 240D | 24.4 | 4 |
| Merc 230 | 22.8 | 4 |
| Merc 280 | 19.2 | 6 |


In [68]:
# Failed attempt: .loc treats the mpg values as row labels, and none of them exist in the index.
mtcars.loc[mtcars['mpg']]

KeyError: u'None of [model\nMazda RX4              21.0\nMazda RX4 Wag          21.0\nDatsun 710             22.8\nHornet 4 Drive         21.4\nHornet Sportabout      18.7\nValiant                18.1\nDuster 360             14.3\nMerc 240D              24.4\nMerc 230               22.8\nMerc 280               19.2\nMerc 280C              17.8\nMerc 450SE             16.4\nMerc 450SL             17.3\nMerc 450SLC            15.2\nCadillac Fleetwood     10.4\nLincoln Continental    10.4\nChrysler Imperial      14.7\nFiat 128               32.4\nHonda Civic            30.4\nToyota Corolla         33.9\nToyota Corona          21.5\nDodge Challenger       15.5\nAMC Javelin            15.2\nCamaro Z28             13.3\nPontiac Firebird       19.2\nFiat X1-9              27.3\nPorsche 914-2          26.0\nLotus Europa           30.4\nFord Pantera L         15.8\nFerrari Dino           19.7\nMaserati Bora          15.0\nVolvo 142E             21.4\nName: mpg, dtype: float64] are in the [index]'

In [70]:
# .loc slices rows by label (both end labels are included) and takes a list of column labels.
mtcars.loc['Mazda RX4':'Valiant', ['mpg', 'cyl']]

| model (index) | mpg | cyl |
|---|---|---|
| Mazda RX4 | 21.0 | 6 |
| Mazda RX4 Wag | 21.0 | 6 |
| Datsun 710 | 22.8 | 4 |
| Hornet 4 Drive | 21.4 | 6 |
| Hornet Sportabout | 18.7 | 8 |
| Valiant | 18.1 | 6 |
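Selecting a range of rows by name can be sketched with `.loc` and a label slice. Unlike positional slices, a label slice includes its end label. The frame below is a four-row stand-in built from values shown in this notebook, and the name `cars` is illustrative.

```python
import pandas as pd

# Four rows copied from mtcars, indexed by model name.
cars = pd.DataFrame(
    {'mpg': [21.0, 21.0, 22.8, 21.4], 'cyl': [6, 6, 4, 6]},
    index=['Mazda RX4', 'Mazda RX4 Wag', 'Datsun 710', 'Hornet 4 Drive'],
)

# Rows from 'Mazda RX4' through 'Datsun 710' (end label included),
# restricted to the listed columns.
subset = cars.loc['Mazda RX4':'Datsun 710', ['mpg', 'cyl']]
print(subset)
```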
