# Pandas

NumPy is good at performing math operations on n-dimensional arrays of numbers, but every element of an array shares a single data type, so it handles heterogeneous data poorly. Pandas dataframes fill that gap: each column can hold a different data type, and values can be looked up by label, much like keys in a Python dict, instead of only by integer position.

[Link to Official Documentation](http://pandas.pydata.org/pandas-docs/version/0.23/dsintro.html)

## Series

A pandas Series is much like a one-dimensional NumPy array, with one addition: each value carries a label, so values can be looked up by custom labels much like *keys* in a Python *dictionary*.

In [1]:
import numpy as np
import pandas as pd

In [3]:
#Example

series1 = pd.Series(data = [1,2,3], index = ['key1', 'key2', 'key3'])
series1

key1    1
key2    2
key3    3
dtype: int64
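The label lookup described above can be sketched as follows. The series name `prices` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical series for illustration; labels behave like dictionary keys.
prices = pd.Series(data=[10, 20, 30], index=["key1", "key2", "key3"])

by_label = prices["key2"]          # lookup by label, like a dict
by_position = prices.iloc[1]       # lookup by integer position, like an array
subset = prices[["key1", "key3"]]  # a list of labels returns a smaller Series

print(by_label, by_position)
print(subset)
```

Both lookups return the same element here because `key2` is the label of the second position.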

### Question 1

Create a dictionary with 3 key value pairs and convert it to series.

[**Hint:** Use **.Series**]

In [2]:
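# Build the dictionary first, then convert it; the keys become the index labels.
question1Dict = {1: "Finn", 2: "Lucas", 3: "Jacob"}
question1Series = pd.Series(question1Dict)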
print(question1Series)

1     Finn
2    Lucas
3    Jacob
dtype: object


You can use NumPy functions directly on a series.

### Question 2

Find the dot product of the two series: series1 from the example above and the series you created in Question 1.

[ **Hint:** Use **np.dot()** ]

In [5]:
print(np.dot(series1, question1Series))


FinnLucasLucasJacobJacobJacob
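The result above is a string because the second series holds strings: in Python, multiplying a string by an integer repeats it, and adding strings concatenates them. For comparison, here is a minimal numeric sketch. The series `weights` is a hypothetical stand-in, not part of the exercise.

```python
import numpy as np
import pandas as pd

series1 = pd.Series(data=[1, 2, 3], index=["key1", "key2", "key3"])
# Hypothetical numeric series sharing the same labels.
weights = pd.Series(data=[4, 5, 6], index=["key1", "key2", "key3"])

# 1*4 + 2*5 + 3*6
dot_np = np.dot(series1, weights)
# Series.dot matches values by label before multiplying.
dot_pd = series1.dot(weights)

print(dot_np, dot_pd)
```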


## Dataframes

A dataframe is a table with labeled columns which can hold different types of data in each column. 

In [6]:
# Example
d1 = {'a': [1,2,3], 'b': [3,4,5], 'c':[6,7,8] }
df1 = pd.DataFrame(d1)
df1

   a  b  c
0  1  3  6
1  2  4  7
2  3  5  8
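The example above uses only integers. To illustrate columns of different types, here is a small sketch; the dataframe `people` and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: each column has its own type.
people = pd.DataFrame({
    "name": ["Finn", "Lucas", "Jacob"],   # strings
    "age": [31, 27, 45],                  # integers
    "height_m": [1.80, 1.75, 1.68],       # floats
})

# dtypes lists the data type pandas chose for each column.
print(people.dtypes)
```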


### Question 3

Select the second row of the dataframe df1 above.



In [17]:
print(df1.loc[1])

a    2
b    4
c    7
Name: 1, dtype: int64


### Question 4

Select the value in column c of the second row of df1.

[ **Hint:** For using labels use **df.loc[row, column]**. For using numeric indexes use **df.iloc[]**. ]

In [27]:
print(df1.iloc[1, 2])

7
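As a short sketch of the difference between the two indexers mentioned in the hint, the following rebuilds df1 and reads the same cell both ways. The variable names other than `df1` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5], "c": [6, 7, 8]})

by_label = df1.loc[1, "c"]      # row label 1, column label "c"
by_position = df1.iloc[1, 2]    # second row, third column (zero-based)

# Label slices include the end label; position slices exclude the end position.
label_slice = df1.loc[0:1, "a":"b"]
position_slice = df1.iloc[0:1, 0:1]

print(by_label, by_position)
print(label_slice.shape, position_slice.shape)
```

The two lookups agree here only because the default row labels 0, 1, 2 happen to match the row positions.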


## Using Dataframes on a dataset

##### Using the mtcars dataset.

For the following questions, we will use the cars data from [Motor Trend Car Road Tests](http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html).

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). 


Details:

A data frame with 32 observations on 11 (numeric) variables.

| Column | Name | Description |
|---|---|---|
| 1 | mpg | Miles/(US) gallon |
| 2 | cyl | Number of cylinders |
| 3 | disp | Displacement (cu.in.) |
| 4 | hp | Gross horsepower |
| 5 | drat | Rear axle ratio |
| 6 | wt | Weight (1000 lbs) |
| 7 | qsec | 1/4 mile time |
| 8 | vs | Engine (0 = V-shaped, 1 = straight) |
| 9 | am | Transmission (0 = automatic, 1 = manual) |
| 10 | gear | Number of forward gears |
| 11 | carb | Number of carburetors |

In [28]:
# Read the dataset from a CSV file using pandas.
mtcars = pd.read_csv('mtcars.csv')
# Use the car name as the row index; the 'name' column itself is kept.
mtcars.index = mtcars['name']
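The loading step can be tried without the file using an in-memory CSV. This is only a sketch: `csv_text` is a two-row stand-in for mtcars.csv, and `set_index(..., drop=False)` is used here as an alternative way to index by name while keeping the column.

```python
import io
import pandas as pd

# Two-row stand-in for mtcars.csv, with mpg and cyl values from the dataset.
csv_text = "name,mpg,cyl\nMazda RX4,21.0,6\nDatsun 710,22.8,4\n"

cars = pd.read_csv(io.StringIO(csv_text))
# Keep "name" as a regular column and also use it as the row index.
cars = cars.set_index("name", drop=False)

print(cars.shape)
print(cars.loc["Datsun 710", "mpg"])
```

Because the name column is kept alongside the index, it still counts toward the number of columns reported by `shape`.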

The following questions are based on analysing this dataset using pandas dataframes.

### Question 5

Check the type and dimensions of the given dataset (mtcars).


[ **Hint:** Use **type()** and **df.shape** ]

In [32]:
df = mtcars
print(type(df))
print(df.shape)

<class 'pandas.core.frame.DataFrame'>
(32, 12)


### Question 6

Check the first 10 rows and last 10 rows of the given dataset (mtcars).

[ **Hint:** Use **.head()** and **.tail()** ]

In [35]:
print(df.head(10))
print(df.tail(10))

                                name   mpg  cyl   disp   hp  drat     wt  \
name                                                                       
Mazda RX4                  Mazda RX4  21.0    6  160.0  110  3.90  2.620   
Mazda RX4 Wag          Mazda RX4 Wag  21.0    6  160.0  110  3.90  2.875   
Datsun 710                Datsun 710  22.8    4  108.0   93  3.85  2.320   
Hornet 4 Drive        Hornet 4 Drive  21.4    6  258.0  110  3.08  3.215   
Hornet Sportabout  Hornet Sportabout  18.7    8  360.0  175  3.15  3.440   
Valiant                      Valiant  18.1    6  225.0  105  2.76  3.460   
Duster 360                Duster 360  14.3    8  360.0  245  3.21  3.570   
Merc 240D                  Merc 240D  24.4    4  146.7   62  3.69  3.190   
Merc 230                    Merc 230  22.8    4  140.8   95  3.92  3.150   
Merc 280                    Merc 280  19.2    6  167.6  123  3.92  3.440   

                    qsec  vs  am  gear  carb  
name                                    

### Question 7

Print all the column labels in the given dataset (mtcars).

In [37]:
print(list(df.columns))

['name', 'mpg', 'cyl', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']


### Question 8

Select the first 6 rows and first 3 columns of the mtcars dataset.

[ **Hint:** **mtcars.iloc[ : , : ]** gives all rows and columns in the dataset ]

In [42]:
print(df.iloc[:6, :3])

                                name   mpg  cyl
name                                           
Mazda RX4                  Mazda RX4  21.0    6
Mazda RX4 Wag          Mazda RX4 Wag  21.0    6
Datsun 710                Datsun 710  22.8    4
Hornet 4 Drive        Hornet 4 Drive  21.4    6
Hornet Sportabout  Hornet Sportabout  18.7    8
Valiant                      Valiant  18.1    6


### Question 9

Select rows from name **Mazda RX4** to **Valiant** in the mtcars dataset and display only mpg and cyl values of those cars. 

[ **Hint:** Use **iloc or loc** ].

In [45]:
print(df.iloc[:6, 1:3])

                    mpg  cyl
name                        
Mazda RX4          21.0    6
Mazda RX4 Wag      21.0    6
Datsun 710         22.8    4
Hornet 4 Drive     21.4    6
Hornet Sportabout  18.7    8
Valiant            18.1    6
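Since the question names the rows by car, the same selection can also be written with labels. The sketch below rebuilds the six rows shown above in a small dataframe called `sample` (a name chosen for illustration) so that it runs without the CSV file.

```python
import pandas as pd

# First six rows of mtcars, copied from the output above (mpg and cyl only).
sample = pd.DataFrame(
    {
        "mpg": [21.0, 21.0, 22.8, 21.4, 18.7, 18.1],
        "cyl": [6, 6, 4, 6, 8, 6],
    },
    index=pd.Index(
        ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
         "Hornet 4 Drive", "Hornet Sportabout", "Valiant"],
        name="name",
    ),
)

# Slice rows by label (end label included) and pick columns by name.
selection = sample.loc["Mazda RX4":"Valiant", ["mpg", "cyl"]]
print(selection)
```

Unlike the positional version, this does not depend on knowing that Valiant is the sixth row or that mpg and cyl sit in positions 1 and 2.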


### Question 10

Sort the dataframe by mpg (i.e. miles/gallon):

[ **Hint**: **inplace = True** sorts the dataframe itself instead of returning a sorted copy ]

In [46]:
sortedDf = df.sort_values("mpg")
print(sortedDf.iloc[:, 1])

name
Lincoln Continental    10.4
Cadillac Fleetwood     10.4
Camaro Z28             13.3
Duster 360             14.3
Chrysler Imperial      14.7
Maserati Bora          15.0
Merc 450SLC            15.2
AMC Javelin            15.2
Dodge Challenger       15.5
Ford Pantera L         15.8
Merc 450SE             16.4
Merc 450SL             17.3
Merc 280C              17.8
Valiant                18.1
Hornet Sportabout      18.7
Merc 280               19.2
Pontiac Firebird       19.2
Ferrari Dino           19.7
Mazda RX4              21.0
Mazda RX4 Wag          21.0
Hornet 4 Drive         21.4
Volvo 142E             21.4
Toyota Corona          21.5
Merc 230               22.8
Datsun 710             22.8
Merc 240D              24.4
Porsche 914-2          26.0
Fiat X1-9              27.3
Lotus Europa           30.4
Honda Civic            30.4
Fiat 128               32.4
Toyota Corolla         33.9
Name: mpg, dtype: float64
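The effect of the `inplace` hint can be sketched on a few rows. The dataframe `cars` below is a small illustrative sample using mpg values from the output above.

```python
import pandas as pd

# Small sample with mpg values taken from the mtcars output above.
cars = pd.DataFrame(
    {"mpg": [21.0, 22.8, 18.7, 18.1]},
    index=["Mazda RX4", "Datsun 710", "Hornet Sportabout", "Valiant"],
)

# Default: returns a new, sorted dataframe and leaves `cars` unchanged.
descending = cars.sort_values("mpg", ascending=False)

# inplace=True: reorders `cars` itself and returns None.
result = cars.sort_values("mpg", inplace=True)

print(descending.index[0], cars.index[0], result)
```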


### Question 11

Print the mean displacement and horsepower of the cars grouped by the number of cylinders.

In [49]:
# Group the rows by cylinder count, then average disp and hp within each group.
print(df.groupby("cyl")[["disp", "hp"]].mean())

           disp          hp
cyl                        
4    105.136364   82.636364
6    183.314286  122.285714
8    353.100000  209.214286


### Question 12

Create a new column in the dataframe whose value will be 1 if the car is of Toyota company and 0 otherwise.

In [53]:
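# 1 where the car name contains "Toyota", 0 otherwise.
df["toyota"] = df["name"].str.contains("Toyota").astype(int)
# Print the cars that were flagged.
for car in df.loc[df["toyota"] == 1, "name"]:
    print(car)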

Toyota Corolla
Toyota Corona


### Question 13

Define a function that will multiply all values in a column by 4, and apply it to the qsec column.
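One possible approach is sketched below, not a definitive answer. The function name `multiply_by_four` is a choice made here, and the small dataframe `cars` with its qsec values is a hypothetical stand-in for mtcars so that the example runs on its own.

```python
import pandas as pd

def multiply_by_four(value):
    """Return the given value multiplied by 4."""
    return value * 4

# Hypothetical stand-in for mtcars; the qsec values are illustrative only.
cars = pd.DataFrame({"qsec": [16.0, 17.5, 20.25]})

# Series.apply calls the function once per element of the column.
cars["qsec"] = cars["qsec"].apply(multiply_by_four)
print(cars["qsec"].tolist())
```

On the real dataset the same call would be `df["qsec"].apply(multiply_by_four)`. Writing `df["qsec"] * 4` gives the same values without a helper function.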