## A look at some of the algorithms in the sklearn library

The scikit-learn library has a large collection of algorithms useful for machine learning and optimization.  Here are a few examples of the things you can do.

### Linear Regression

In [73]:
from sklearn import linear_model
import numpy as np
from bokeh.plotting import figure
from bokeh.io import show, output_notebook
from bokeh.models import HoverTool, ColumnDataSource

output_notebook()

Let's create some data for a linear model.  The basic rule is
$$
y = \sum_{i=1}^{3} a_{i}x_{i} + b + \epsilon
$$
where $\epsilon$ is a normally distributed random variable with mean 0 and standard deviation 1.  Let's suppose there are three features, the $a_{i}$ are $-3$, 1, and 2, the intercept is $b=0.7$, and the $x_{i}$ are uniformly distributed between $-10$ and 10.  We draw $N=100$ samples and use the `LinearRegression` class to fit the model.

In [27]:
N=100
x = np.random.uniform(-10,10,size=(N,3))
b = .7
y = x @ np.array([-3,1,2]) + b+ np.random.normal(0,1,N)
y_true = x @ np.array([-3,1,2]) + b
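# Euclidean norm of the noise divided by N (note: this is not the mean squared error)
noise_norm = np.linalg.norm(y-y_true)/N
print(noise_norm)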

0.11295100606377584
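
The value printed above is `np.linalg.norm(y - y_true) / N`, which is the Euclidean norm of the noise divided by $N$ rather than the usual mean squared error. The following sketch shows how the two relate; the fixed seed and the names `rng`, `norm_over_n`, `mse`, and `rmse` are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
N = 100
noise = rng.normal(0, 1, N)     # same noise model as above: mean 0, std 1

norm_over_n = np.linalg.norm(noise) / N  # the kind of quantity printed above
mse = np.mean(noise ** 2)                # mean squared error
rmse = np.sqrt(mse)                      # root mean squared error

# norm / N equals RMSE / sqrt(N), so it shrinks as N grows
# even when the noise level stays the same
print(norm_over_n, rmse / np.sqrt(N), mse)
```

With unit-variance noise the MSE should be close to 1, while the norm divided by $N$ is close to $1/\sqrt{N} = 0.1$, which is consistent with the value shown above.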


In [28]:
from sklearn.linear_model import LinearRegression

In [29]:
L = LinearRegression()

In [30]:
L.fit(x,y)

In [40]:
print(f"coeffs = {L.coef_}, intercept = {L.intercept_}")

coeffs = [-2.98395181  0.96378508  2.02432446], intercept = 0.5969070381280994


In [41]:
# Norm of the fit residuals divided by N, on the same scale as the noise value printed earlier
error = np.linalg.norm(L.predict(x)-y)/N

In [42]:
print(error)

0.10851932122175788
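
To see what the fit is doing, the same ordinary least squares problem can be solved directly with `np.linalg.lstsq` by appending a column of ones to carry the intercept. This is a sketch under assumptions: the data are regenerated with a fixed seed, and the names `rng`, `A`, `beta`, and `model` are illustrative rather than taken from the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
N = 100
x = rng.uniform(-10, 10, size=(N, 3))
y = x @ np.array([-3, 1, 2]) + 0.7 + rng.normal(0, 1, N)

# Append a column of ones so the intercept is estimated as a fourth coefficient
A = np.hstack([x, np.ones((N, 1))])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

model = LinearRegression().fit(x, y)

print("lstsq:  ", beta[:3], beta[3])
print("sklearn:", model.coef_, model.intercept_)
# Conventional mean squared error of the fitted model on the training data
print("MSE:", mean_squared_error(y, model.predict(x)))
```

The two sets of coefficients should agree up to floating-point error, since both minimize the same sum of squared residuals.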


### Classification

In [142]:
x0 = np.random.multivariate_normal([0, 0], [[1, .75],[.75, 1]], 100)
x1 = np.random.multivariate_normal([1, -4], [[1, 0],[0, 1]], 100)
x2 = np.random.multivariate_normal([-2,-2],[[1,-.5],[-.5,1]],100)

In [144]:
F = figure()
F.scatter(x=x0[:,0],y=x0[:,1],color='red',legend_label='x0')
F.scatter(x=x1[:,0],y=x1[:,1],color='blue',legend_label='x1')
F.scatter(x=x2[:,0],y=x2[:,1],color='green',legend_label='x2')

show(F)

In [148]:
from sklearn.neighbors import KNeighborsClassifier
colors = ['red','blue','green']
X = np.vstack([x0,x1,x2])
Y = np.array([0]*100+[1]*100+[2]*100)
training_data = ColumnDataSource(data=dict(x=X[:,0],y=X[:,1],color=['red']*100+['blue']*100+['green']*100))
KN = KNeighborsClassifier(n_neighbors=5)
KN.fit(X,Y)


In [146]:
y0 = np.random.multivariate_normal([0, 0], [[1, .75],[.75, 1]], 10)
y1 = np.random.multivariate_normal([1, -4], [[1, 0],[0, 1]], 10)
y2 = np.random.multivariate_normal([-2,-2],[[1,-.5],[-.5,1]],10)
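
The samples `y0`, `y1`, and `y2` come from the same three distributions as the training data, so they can serve as a small held-out test set. The sketch below shows one way to measure accuracy on such a set. The helper `sample`, the fixed seed, and the names `X_train`, `X_test`, and `clf` are hypothetical and introduced only for this example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
means = [[0, 0], [1, -4], [-2, -2]]
covs = [[[1, .75], [.75, 1]], [[1, 0], [0, 1]], [[1, -.5], [-.5, 1]]]

def sample(n):
    """Draw n points per class; return the stacked points and their labels."""
    pts = np.vstack([rng.multivariate_normal(m, c, n) for m, c in zip(means, covs)])
    labels = np.repeat(np.arange(3), n)
    return pts, labels

X_train, Y_train = sample(100)  # 100 training points per class
X_test, Y_test = sample(10)     # 10 held-out points per class

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, Y_train)
accuracy = clf.score(X_test, Y_test)  # fraction of test points classified correctly
print(accuracy)
```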

In [155]:
test = np.array([-1,-2]).reshape(1,2)
# Predict the class of the test point and find its 5 nearest training points
predicted = KN.predict(test)
neighbors = KN.kneighbors(test)

datadict = {'x':test[:,0],'y':test[:,1],'color':[colors[i] for i in predicted]}
source = ColumnDataSource(data=datadict)
print(source.to_df())
print(neighbors[1])
G=figure()
G.scatter(x='x',y='y',color='color',source=source,size=10)
G.scatter(x='x',y='y',color='color',source=training_data)
G.scatter(x=X[neighbors[1][0]][:,0],y=X[neighbors[1][0]][:,1],size=10,alpha=.3)

show(G)

   x  y  color
0 -1 -2  green
[[  1 233 213 227  32]]
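
The neighbor indices explain the prediction. Indices 1 and 32 fall in the first hundred training points (class 0, red), while 213, 227, and 233 fall in the last hundred (class 2, green), so green wins the vote 3 to 2. The sketch below reproduces this majority vote by hand. It regenerates the data with a fixed seed, so the specific indices will differ from those above, and the names `votes` and `manual_prediction` are illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
X = np.vstack([
    rng.multivariate_normal([0, 0], [[1, .75], [.75, 1]], 100),
    rng.multivariate_normal([1, -4], [[1, 0], [0, 1]], 100),
    rng.multivariate_normal([-2, -2], [[1, -.5], [-.5, 1]], 100),
])
Y = np.repeat(np.arange(3), 100)  # labels 0, 1, 2 with 100 points each

KN = KNeighborsClassifier(n_neighbors=5).fit(X, Y)

test = np.array([[-1, -2]])
distances, indices = KN.kneighbors(test)  # both arrays have shape (1, 5)

neighbor_labels = Y[indices[0]]                    # classes of the 5 nearest points
votes = np.bincount(neighbor_labels, minlength=3)  # number of votes per class
manual_prediction = np.argmax(votes)               # majority class

print(votes, manual_prediction, KN.predict(test)[0])
```

With the default uniform weights, the hand-computed vote should match `KN.predict`.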
