# Machine Learning in Python

by [Piotr Migdał](http://p.migdal.pl/) & [Dominik Krzemiński](https://github.com/dokato/)

for El Passion, 2017

## 3.  Linear regression

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

%matplotlib inline

## Bicycles data

This dataset shows the number of bikes recorded on several streets in Warsaw, together with temperature, weekday and month.

source:

- Monika Pawłowska (code: github.com/pawlowska/shiny-server)

- original source: http://rowery.um.warszawa.pl/pomiary-ruchu-rowerowego

In [None]:
bicycles_weather_data = pd.read_csv("data/bicycles_weather.csv", index_col=0)

In [None]:
bicycles_weather_data.head()

In [None]:
bicycles_weather_data.describe()

## Linear regression

Linear regression models a linear relationship between a dependent variable y and one or more explanatory variables X. The case of one explanatory variable is called **simple linear regression**. With more than one explanatory variable, it is called **multiple linear regression**.

$$
y = a x + b
$$
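
For multiple linear regression with n explanatory variables, the model has one coefficient per variable (the indexed symbols $a_i$ and $x_i$ are notation introduced here for illustration):

$$
y = a_1 x_1 + a_2 x_2 + \dots + a_n x_n + b
$$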

!["xkcd"](https://imgs.xkcd.com/comics/linear_regression.png)

Analytical solutions exist:

- [Ordinary least squares](https://en.wikipedia.org/wiki/Ordinary_least_squares)

- [Ridge regression](https://en.wikipedia.org/wiki/Ridge_regression)

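but they are not always efficient to compute, e.g. when there are very many explanatory variables; iterative methods such as gradient descent are then a common alternative.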
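
For simple linear regression, the ordinary least squares solution can be written down directly: a = cov(x, y) / var(x) and b = mean(y) - a * mean(x). Below is a minimal sketch with NumPy. The data is synthetic and made up for illustration (it is not the bicycle dataset), and the names `x_demo`, `y_demo`, `a_hat` and `b_hat` are hypothetical.

```python
import numpy as np

# Synthetic data, made up for illustration: y = 2x + 1 plus Gaussian noise.
rng = np.random.RandomState(0)
x_demo = np.linspace(-10, 25, 50)
y_demo = 2.0 * x_demo + 1.0 + rng.normal(scale=3.0, size=x_demo.size)

# Closed-form ordinary least squares for one explanatory variable.
x_mean, y_mean = x_demo.mean(), y_demo.mean()
a_hat = np.sum((x_demo - x_mean) * (y_demo - y_mean)) / np.sum((x_demo - x_mean) ** 2)
b_hat = y_mean - a_hat * x_mean

# Cross-check against NumPy's degree-1 polynomial fit, which also minimises squared error.
a_ref, b_ref = np.polyfit(x_demo, y_demo, deg=1)

print('closed form: a={:.3f}, b={:.3f}'.format(a_hat, b_hat))
print('np.polyfit:  a={:.3f}, b={:.3f}'.format(a_ref, b_ref))
```

Both lines should print the same coefficients, close to the slope 2 and intercept 1 used to generate the data.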

Correlation:

!["source: wikipedia.org"](https://upload.wikimedia.org/wikipedia/commons/d/d4/Correlation_examples2.svg)
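
The Pearson correlation coefficient measures only the strength of a linear relationship. As the figure illustrates, it does not reflect the slope, and it can be zero for a strong non-linear dependence. A small sketch with NumPy, using synthetic data invented for this example (the helper `pearson_r` is hypothetical, a thin wrapper around `np.corrcoef`):

```python
import numpy as np

rng = np.random.RandomState(1)
t = np.linspace(-1, 1, 201)

# Synthetic examples, made up for illustration.
examples = {
    'noisy line': 3.0 * t + rng.normal(scale=0.5, size=t.size),  # linear trend plus noise
    'steep line': 10.0 * t,                                      # perfect line, steep slope
    'flat line': 0.1 * t,                                        # perfect line, gentle slope
    'parabola': t ** 2,                                          # strong but non-linear dependence
}

def pearson_r(u, v):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(u, v)[0, 1]

for name, values in examples.items():
    print('{:>10}: r = {:.3f}'.format(name, pearson_r(t, values)))
```

The two perfect lines give the same coefficient despite very different slopes, while the parabola gives a coefficient of essentially zero.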

Materials:

 - https://en.wikipedia.org/wiki/Linear_regression
 
 - http://setosa.io/ev/ordinary-least-squares-regression
 
 - http://onlinestatbook.com/2/regression/intro.html
 
 - https://www.youtube.com/watch?v=KsVBBJRb9TE
 


In [None]:
bicycles_weather_data.plot(x='temp', y=['Marszałkowska', 'Banacha', 'Wysockiego'], style='o', figsize=(7,8))
plt.xlim([-20,25])

scikit-learn documentation:

- http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html

- http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html

In [None]:
from sklearn import linear_model

We make a linear regression object.

In [None]:
linreg = linear_model.LinearRegression(fit_intercept=True)

In [None]:
street = 'Banacha'
bicycles_weather_subset = bicycles_weather_data[['temp', street]]
bicycles_weather_subset = bicycles_weather_subset.dropna()

In [None]:
x = bicycles_weather_subset['temp'].to_frame()
y = bicycles_weather_subset[street].to_frame()
linreg.fit(x, y)

print('Coefficients:\n a={:.3f}, b={:.3f}'.format(linreg.coef_[0][0], linreg.intercept_[0]))

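# Use y.values so the subtraction and the mean operate on plain NumPy arrays
# and np.mean returns a single number regardless of the pandas version.
print("Mean squared error: %.2f"
      % np.mean((linreg.predict(x) - y.values) ** 2))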
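
The mean squared error can also be computed with `mean_squared_error` from `sklearn.metrics`, and `LinearRegression.score` returns the coefficient of determination R², which, unlike the mean squared error, does not depend on the scale of y. A sketch on synthetic data made up for illustration (the names `x_demo`, `y_demo` and `demo_model` are hypothetical and separate from the bicycle variables):

```python
import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error

# Synthetic data, made up for illustration: a linear trend plus Gaussian noise.
rng = np.random.RandomState(0)
x_demo = rng.uniform(-10, 25, size=(100, 1))  # scikit-learn expects a 2-D feature array
y_demo = 40.0 * x_demo[:, 0] + 500.0 + rng.normal(scale=100.0, size=100)

demo_model = linear_model.LinearRegression(fit_intercept=True)
demo_model.fit(x_demo, y_demo)
y_pred = demo_model.predict(x_demo)

mse_manual = np.mean((y_pred - y_demo) ** 2)      # same formula as in the cell above
mse_sklearn = mean_squared_error(y_demo, y_pred)  # library equivalent
r2 = demo_model.score(x_demo, y_demo)             # coefficient of determination

print('MSE (manual):  {:.2f}'.format(mse_manual))
print('MSE (sklearn): {:.2f}'.format(mse_sklearn))
print('R^2:           {:.3f}'.format(r2))
```

Note that both numbers here are measured on the same data the model was fitted to, as in the rest of this section.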

We can plot the fitted line on top of the data.

In [None]:
bicycles_weather_data.plot(x='temp', y=street, style='o', figsize=(7,8))
plt.plot(x, linreg.predict(x), color='k', linewidth=3)
plt.xlim([-10,20])

### Exercises

(a) Find the street in the dataset that is closest to where you live or work, and perform linear regression for it.

(b) Find the street with the smallest _mean squared error_ of the fit.