# Pandas

Pandas is an open-source Python library for high-performance data analysis and manipulation.

In [7]:
import numpy as np
import pandas as pd

We load the Boston Housing data into a Pandas DataFrame.

In [9]:
boston_df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv")
boston_df.head()

Unnamed: 0,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,medv
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24.0
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,21.6
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,5.33,36.2


In [10]:
boston_df.dtypes

crim       float64
zn         float64
indus      float64
chas         int64
nox        float64
rm         float64
age        float64
dis        float64
rad          int64
tax          int64
ptratio    float64
b          float64
lstat      float64
medv       float64
dtype: object

## Basic SQL Commands

We now run some standard SQL-like queries on the data (cf. [Comparison with SQL](http://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html)).

```
SELECT crim, rm
FROM boston_df
ORDER BY crim
LIMIT 10;
```

In [12]:
boston_df[['crim', 'rm']].sort_values(by='crim').head(10)

Unnamed: 0,crim,rm
0,0.00632,6.575
284,0.00906,7.088
285,0.01096,6.453
341,0.01301,7.241
55,0.01311,7.249
54,0.0136,5.888
195,0.01381,7.875
57,0.01432,6.816
194,0.01439,6.604
348,0.01501,6.635
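
For a sort followed by a limit, `DataFrame.nsmallest` returns the same rows in a single call. The sketch below is only illustrative: `sample_df` is a hypothetical hand-built frame holding a few `crim`/`rm` values copied from the outputs above, so it runs without downloading the dataset.

```python
import pandas as pd

# Hypothetical sample: a few rows copied from the Boston Housing outputs above.
sample_df = pd.DataFrame(
    {
        "crim": [0.00632, 0.02731, 0.00906, 0.01096, 0.01301],
        "rm": [6.575, 6.421, 7.088, 6.453, 7.241],
    },
    index=[0, 1, 284, 285, 341],
)

# Roughly: SELECT crim, rm FROM sample_df ORDER BY crim LIMIT 3
via_sort = sample_df[["crim", "rm"]].sort_values(by="crim").head(3)

# Same rows, without sorting the whole frame first.
via_nsmallest = sample_df.nsmallest(3, "crim")[["crim", "rm"]]

print(via_nsmallest)
```

Both forms keep the original row labels, which is why the index in the output above is not consecutive.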


```
SELECT *
FROM boston_df
WHERE crim < 0.015
  AND rad = 1;
```

In [15]:
boston_df[(boston_df.crim < 0.015) & (boston_df.rad == 1)]

Unnamed: 0,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,medv
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24.0
194,0.01439,60.0,2.93,0,0.401,6.604,18.8,6.2196,1,265,15.6,376.7,4.38,29.1
284,0.00906,90.0,2.97,0,0.4,7.088,20.8,7.3073,1,285,15.3,394.72,7.85,32.2
285,0.01096,55.0,2.25,0,0.389,6.453,31.9,7.3073,1,300,15.3,394.72,8.23,22.0
341,0.01301,35.0,1.52,0,0.442,7.241,49.3,7.0379,1,284,15.5,394.74,5.49,32.7
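
The same filter can also be written with `DataFrame.query`, which reads closer to the SQL `WHERE` clause. This is a minimal sketch on a hypothetical frame (`sample_df`) built from a few values in the output above, not on the full dataset.

```python
import pandas as pd

# Hypothetical sample: selected columns of a few rows from the outputs above.
sample_df = pd.DataFrame(
    {
        "crim": [0.00632, 0.02731, 0.01439, 0.00906],
        "rad": [1, 2, 1, 1],
        "medv": [24.0, 21.6, 29.1, 32.2],
    },
    index=[0, 1, 194, 284],
)

# Boolean-mask form: each condition needs its own parentheses,
# and conditions are combined with & (not the Python keyword `and`).
by_mask = sample_df[(sample_df.crim < 0.015) & (sample_df.rad == 1)]

# Query-string form: column names are used directly inside the expression.
by_query = sample_df.query("crim < 0.015 and rad == 1")

print(by_query)
```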


## Apply

Find the mean of all columns.

In [20]:
boston_df.mean()

crim         3.613524
zn          11.363636
indus       11.136779
chas         0.069170
nox          0.554695
rm           6.284634
age         68.574901
dis          3.795043
rad          9.549407
tax        408.237154
ptratio     18.455534
b          356.674032
lstat       12.653063
medv        22.532806
dtype: float64
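
Since this section is about `apply`, it may help to see how the built-in `mean()` relates to it. The sketch below assumes a small hypothetical frame (`sample_df`) with values taken from the first rows above; it computes column means both ways and then uses `axis=1` to apply a function per row instead.

```python
import pandas as pd

# Hypothetical sample: two columns from the first three rows shown earlier.
sample_df = pd.DataFrame({"crim": [0.00632, 0.02731, 0.02729], "rad": [1, 2, 2]})

# Built-in reduction: one mean per column.
col_means = sample_df.mean()

# The same result via apply: the function receives each column as a Series.
applied_means = sample_df.apply(lambda col: col.mean())

# With axis=1 the function receives each row instead.
row_means = sample_df.apply(lambda row: row.mean(), axis=1)

print(applied_means)
print(row_means)
```

The built-in reduction is usually the better choice when one exists; `apply` is more useful for custom functions.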

Add a new column that has double the value of the crime-rate column.

In [28]:
boston_df['double_crim'] = boston_df['crim'].apply(lambda x: x * 2.0)
boston_df[['crim', 'double_crim']].head(10)

Unnamed: 0,crim,double_crim
0,0.00632,0.01264
1,0.02731,0.05462
2,0.02729,0.05458
3,0.03237,0.06474
4,0.06905,0.1381
5,0.02985,0.0597
6,0.08829,0.17658
7,0.14455,0.2891
8,0.21124,0.42248
9,0.17004,0.34008
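
For simple arithmetic like this, a vectorized expression on the column gives the same values without calling a Python function per element. A minimal sketch, using a hypothetical three-row frame (`sample_df`) with `crim` values from the output above:

```python
import pandas as pd

# Hypothetical sample: the first three crim values from the output above.
sample_df = pd.DataFrame({"crim": [0.00632, 0.02731, 0.02729]})

# Element-wise apply: the lambda is called once per value.
sample_df["double_crim_apply"] = sample_df["crim"].apply(lambda x: x * 2.0)

# Vectorized form: the multiplication is applied to the whole column at once.
sample_df["double_crim"] = sample_df["crim"] * 2.0

print(sample_df)
```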
