# Tutorial: Machine Learning in scikit-learn
From Izaskun Mendia's GitHub repository (https://github.com/izmendi/)

![Machine learning](images/01_robot.png)

# PANDAS

In [None]:
%matplotlib inline
import pandas as pd
import numpy as np

# Series

In [None]:
index = ['a','b','c','d','e']

In [None]:
series = pd.Series(np.arange(5), index=index)
series

In [None]:
series['a'], series['b']

In [None]:
series[:2]

In [None]:
series[-2:]
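The cells above mix label access (`series['a']`) with positional slicing (`series[:2]`, `series[-2:]`). The explicit accessors `.loc` (by label) and `.iloc` (by position) make the intent unambiguous. They also slice differently: a label slice includes its end point, a positional slice does not. This is a minimal sketch on the same five-element Series. The variable names are our own, for illustration.

```python
import numpy as np
import pandas as pd

# Same Series as above: values 0..4 labelled 'a'..'e'
series = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])

# .loc selects by label; a label slice includes BOTH endpoints
by_label = series.loc['b':'d']      # labels b, c, d
# .iloc selects by integer position; the end position is EXCLUDED
by_position = series.iloc[1:3]      # positions 1, 2 -> labels b, c

print(by_label.tolist())     # [1, 2, 3]
print(by_position.tolist())  # [1, 2]
```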

# DataFrame

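* A DataFrame is dict-like: index it with a column name, or with a list of column names, to select columns
* Avoid integer column names; they are easily confused with the integer positions used by `.iloc`
* The first dimension (axis 0) is the index, i.e. the rows; the second dimension (axis 1) holds the columns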
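Here is a short sketch of the dict-like behaviour described in these bullets. A single column name gives back a Series, a list of names gives back a DataFrame, and `in` checks the column labels the way it checks the keys of a dict. It uses the same two columns as the next cell. The random values are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.random(5), 'number 2': np.random.random(5)})

# A single column name returns a Series...
col = df['a']
# ...while a list of column names returns a DataFrame
sub = df[['a', 'number 2']]

# 'in' tests the column labels, just as it tests the keys of a dict
has_a = 'a' in df

# shape is (rows, columns): axis 0 is the index, axis 1 holds the columns
print(type(col).__name__, type(sub).__name__, has_a, df.shape)
```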

In [None]:
df = pd.DataFrame({'a' : np.random.random(5), 'number 2' : np.random.random(5)})
print(df)

In [None]:
df['a']

In [None]:
df.loc[2]

In [None]:
df.iloc[2]

In [None]:
df.loc[2,['a']]

In [None]:
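df.loc[:3]  # .ix was removed in pandas 1.0; use .loc (labels, end inclusive) or .iloc (positions, end exclusive)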
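With the default integer index (0..4), labels and positions happen to coincide, so `df.loc[2]` and `df.iloc[2]` return the same row. Slicing still behaves differently, as this minimal sketch on a DataFrame shaped like the one above shows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.random(5), 'number 2': np.random.random(5)})

# Label slice: rows labelled 0..3, end inclusive -> 4 rows
first_by_label = df.loc[:3]
# Positional slice: positions 0..2, end exclusive -> 3 rows
first_by_pos = df.iloc[:3]

# .loc takes a row label plus a column label;
# .iloc needs integer positions on both axes
cell_loc = df.loc[2, 'a']
cell_iloc = df.iloc[2, 0]

print(len(first_by_label), len(first_by_pos), cell_loc == cell_iloc)
```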

# Summarize data

For more on the frequency strings (offset aliases) that pandas uses with time series, see [this link](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases)

In [None]:
df = pd.DataFrame(np.random.randint(0,100,size=(10, 4)), columns=list('ABCD'))
df.head()

In [None]:
df['A'].count()

In [None]:
df['A'].min()

In [None]:
df['A'].max()

In [None]:
df['A'].mean()
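Instead of calling `count`, `min`, `max` and `mean` one cell at a time, you can also request them together with `Series.agg`. This returns a single Series indexed by the function names. The sketch below assumes a frame built the same way as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(10, 4)), columns=list('ABCD'))

# One call computes several statistics of column 'A';
# the result is a Series indexed by the function names
stats = df['A'].agg(['count', 'min', 'max', 'mean'])
print(stats)
```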

In [None]:
df.min(axis=0)

In [None]:
df.min(axis=1)
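The `axis` argument names the dimension that gets collapsed. `axis=0` reduces down the rows and gives one value per column. `axis=1` reduces across the columns and gives one value per row. Random data makes this hard to check by eye, so this sketch uses a tiny hand-written frame of our own.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 8, 1], 'B': [5, 2, 9]})

# axis=0 collapses the rows: one result per column
col_min = df.min(axis=0)   # A -> 1, B -> 2
# axis=1 collapses the columns: one result per row
row_min = df.min(axis=1)   # row 0 -> 3, row 1 -> 2, row 2 -> 1

print(col_min.tolist())  # [1, 2]
print(row_min.tolist())  # [3, 2, 1]
```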

In [None]:
df.isnull().sum()  # number of missing values in each column
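Counting missing values needs `sum`, not `count`. `isnull()` returns a boolean frame with no NaN in it, so calling `count()` on it just returns the number of rows. Summing the booleans counts the `True` (missing) cells. Here is a small sketch with NaN values we planted for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [np.nan, np.nan, 6.0]})

# count() reports NON-missing values per column
non_missing = df.count()           # A -> 2, B -> 1
# isnull() gives booleans; summing them counts the missing cells
missing = df.isnull().sum()        # A -> 1, B -> 2
# isnull().count() just returns the number of rows for every column
rows = df.isnull().count()         # A -> 3, B -> 3

print(non_missing.tolist(), missing.tolist(), rows.tolist())
```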

**What can we do when data is missing?**

In [None]:
series = pd.Series([1,2,np.nan,4,np.nan,6,np.nan,8,np.nan,10])
series

In [None]:
series.isnull()

In [None]:
series.fillna(series.mean())

In [None]:
series.fillna(series.interpolate())

In [None]:
series.ffill()  # forward fill; fillna(method='pad') is deprecated in recent pandas

In [None]:
series.dropna()
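The four strategies above can be compared side by side. This sketch uses a short Series of our own, with NaN gaps chosen so that every result is easy to verify by hand. Mean filling inserts the same value everywhere. Linear interpolation uses the neighbours on both sides. Forward fill repeats the last valid value. `dropna` removes the gaps entirely, though it keeps the original index labels.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

filled_mean = s.fillna(s.mean())   # NaN -> mean of [1, 3, 5] = 3.0
interpolated = s.interpolate()     # linear between the neighbours on each side
forward = s.ffill()                # carry the last valid value forward
dropped = s.dropna()               # keep only non-missing entries (labels 0, 2, 4)

print(filled_mean.tolist())   # [1.0, 3.0, 3.0, 3.0, 5.0]
print(interpolated.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(forward.tolist())       # [1.0, 1.0, 3.0, 3.0, 5.0]
print(dropped.tolist())       # [1.0, 3.0, 5.0]
```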

**DataFrames and Series can be plotted directly (through matplotlib)**

In [None]:
df['A'].plot()

In [None]:
df.plot()

In [None]:
df.plot(kind='bar')
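`df.plot(...)` returns a matplotlib `Axes`, so the plot can be customised with ordinary matplotlib calls after pandas draws it. The sketch below assumes matplotlib is installed. It selects the non-interactive `Agg` backend so it also runs outside a notebook. The title and axis labels are our own placeholders.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: no window needed
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(10, 4)), columns=list('ABCD'))

# df.plot returns a matplotlib Axes, so the usual matplotlib methods apply
ax = df.plot(kind='bar', title='Random integers per row')
ax.set_xlabel('row label')
ax.set_ylabel('value')
```

Outside a notebook, `ax.figure.savefig(...)` writes the figure to a file.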

**Generate various summary statistics, excluding NaN values.**

In [None]:
df.describe()
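The random frame above has no missing values, so this tiny hand-made Series shows what "excluding NaN values" means in practice. The NaN is left out of `count` and does not affect the mean, extremes or quartiles either.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, np.nan, 4.0, 6.0])

summary = s.describe()
# count is 3, not 4: the NaN is skipped, and the statistics use only 2, 4 and 6
print(summary['count'], summary['mean'], summary['min'], summary['max'])  # 3.0 4.0 2.0 6.0
```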