### Supervised learning
Supervised learning covers problems in which the data comes with additional attributes that we want to predict. Such a problem can be either:

- classification:
samples belong to two or more classes, and we want to learn from already labeled data how to predict the class of unlabeled data. An example of a classification problem is handwritten digit recognition, in which the aim is to assign each input vector to one of a finite number of discrete categories. Another way to think of classification is as a discrete (as opposed to continuous) form of supervised learning: there is a limited number of categories, and each of the n samples provided must be labeled with the correct category or class.
In one sentence: samples belong to two or more classes, and we use already labeled data to predict the labels of unlabeled data.
- regression:
if the desired output consists of one or more continuous variables, then the task is called regression. An example of a regression problem is predicting the length of a salmon as a function of its age and weight.
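
The difference between the two task types can be sketched in a few lines. This is a minimal illustration, not part of the original notebook: the classifier reuses the `SVC` settings that appear later in these notes, while the regression data is synthetic and made up here to stand in for the salmon example, and `LinearRegression` is an assumed choice of model.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC

# Classification: the target is one of a finite set of discrete classes.
digits = datasets.load_digits()
clf = SVC(gamma=0.001, C=100.0)
clf.fit(digits.data[:-10], digits.target[:-10])
predicted_classes = clf.predict(digits.data[-10:])
print(predicted_classes)   # integer class labels between 0 and 9

# Regression: the target is a continuous quantity.
# Synthetic (made-up) data: length depends linearly on age and weight,
# plus a little Gaussian noise.
rng = np.random.RandomState(0)
age = rng.uniform(1, 6, size=200)
weight = rng.uniform(1, 10, size=200)
length = 20.0 + 8.0 * age + 3.0 * weight + rng.normal(0, 1.0, size=200)

features = np.column_stack([age, weight])
reg = LinearRegression().fit(features, length)
predicted_lengths = reg.predict(features[:3])
print(predicted_lengths)   # real-valued predictions
```

The classifier can only ever return one of the ten digit labels, whereas the regressor returns arbitrary real numbers.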


## Unsupervised learning
Unsupervised learning covers problems in which the training data consists of a set of input vectors x without any corresponding target values. The goal in such problems may be to discover groups of similar examples within the data (clustering), to determine the distribution of data within the input space (density estimation), or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization. See the scikit-learn unsupervised learning page for details.
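
Two of the goals named above, clustering and projection to a low-dimensional space, might look as follows on the digits data. This is a sketch under assumptions: `KMeans` and `PCA` are chosen here for illustration only, and the number of clusters (10) is picked because the dataset is known to contain ten digits.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

digits = datasets.load_digits()
X = digits.data   # the targets are deliberately ignored

# Clustering: group similar samples without using any labels.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
print(cluster_ids[:10])

# Dimensionality reduction: project the 64 features down to 2 for plotting.
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)   # (1797, 2)
```

The cluster ids are arbitrary integers; they do not correspond to the digit each sample depicts.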

### Loading an example dataset
Machine learning is about learning some properties of a data set and applying them to new data. This is why a common practice when evaluating an algorithm is to split the data at hand into two sets: a training set, on which we learn data properties, and a testing set, on which we test those properties.

In [6]:
from sklearn import datasets
import numpy as np

iris = datasets.load_iris()
digits = datasets.load_digits()
print(digits.data)
m, n = np.shape(digits.data)  # m samples, n features per sample
print(m, n)

[[  0.   0.   5. ...,   0.   0.   0.]
 [  0.   0.   0. ...,  10.   0.   0.]
 [  0.   0.   0. ...,  16.   9.   0.]
 ..., 
 [  0.   0.   1. ...,   6.   0.   0.]
 [  0.   0.   2. ...,  12.   0.   0.]
 [  0.   0.  10. ...,  12.   1.   0.]]
1797 64
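
The training/testing split described in this section can be sketched with `train_test_split` from `sklearn.model_selection`. The notebook itself does not use this helper; the 25% test fraction and the fixed `random_state` are assumptions made for the example.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = datasets.load_digits()

# Hold out 25% of the samples for testing (the fraction is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

clf = SVC(gamma=0.001, C=100.0)
clf.fit(X_train, y_train)              # learn on the training set only
accuracy = clf.score(X_test, y_test)   # evaluate on samples never seen in fit
print(X_train.shape, X_test.shape)
print(accuracy)
```

Scoring on the held-out samples gives an estimate of how the model behaves on new data, which scoring on the training set cannot provide.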

In [7]:
# digits.data gives access to the features used to classify the digit samples;
# digits.target gives the ground-truth label of each sample:
digits.target

array([0, 1, 2, ..., 8, 9, 8])

In [8]:
digits.images[0]

array([[  0.,   0.,   5.,  13.,   9.,   1.,   0.,   0.],
       [  0.,   0.,  13.,  15.,  10.,  15.,   5.,   0.],
       [  0.,   3.,  15.,   2.,   0.,  11.,   8.,   0.],
       [  0.,   4.,  12.,   0.,   0.,   8.,   8.,   0.],
       [  0.,   5.,   8.,   0.,   0.,   9.,   8.,   0.],
       [  0.,   4.,  11.,   0.,   1.,  12.,   7.,   0.],
       [  0.,   2.,  14.,   5.,  10.,  12.,   0.,   0.],
       [  0.,   0.,   6.,  13.,  10.,   0.,   0.,   0.]])
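
The relationship between `digits.images` and `digits.data` can be checked directly: each 8x8 image is flattened into a row of 64 pixel values. The variable names below are illustrative only.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
n_samples = len(digits.images)

# Flatten each 8x8 image into a vector of 64 pixel intensities.
flattened = digits.images.reshape((n_samples, -1))
print(digits.images.shape)   # (1797, 8, 8)
print(flattened.shape)       # (1797, 64)

# The flattened images match the feature matrix used for learning.
same = np.array_equal(flattened, digits.data)
print(same)
```

This is the usual way to turn image-shaped data into the `(n_samples, n_features)` array that scikit-learn estimators expect.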

## Learning and predicting

In [25]:
from sklearn import svm
clf = svm.SVC(gamma=0.001, C=100.)
digits.data[:-1]

array([[  0.,   0.,   5., ...,   0.,   0.,   0.],
       [  0.,   0.,   0., ...,  10.,   0.,   0.],
       [  0.,   0.,   0., ...,  16.,   9.,   0.],
       ..., 
       [  0.,   0.,   6., ...,   6.,   0.,   0.],
       [  0.,   0.,   1., ...,   6.,   0.,   0.],
       [  0.,   0.,   2., ...,  12.,   0.,   0.]])

In [26]:
digits.target[:-1]

array([0, 1, 2, ..., 0, 8, 9])

In [29]:
clf.fit(digits.data[:-1], digits.target[:-1])

SVC(C=100.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.001, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [30]:
clf.predict(digits.data[-1:])

array([8])

### Model persistence

It is possible to save a model in scikit-learn by using Python's built-in persistence module, pickle:

In [32]:
from sklearn import svm
from sklearn import datasets
clf = svm.SVC()
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf.fit(X, y)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [33]:
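import pickle
s = pickle.dumps(clf)       # serialize the fitted model to a bytes string
clf2 = pickle.loads(s)      # restore it as a new object
clf2.predict(X[0:1])        # predict with the restored model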

array([0])

In [34]:
y[0]

0

In [37]:
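# joblib's dump/load is more efficient than pickle on big data, but it can only pickle to disk, not to a string.
# Recent scikit-learn releases no longer ship sklearn.externals.joblib; import the standalone joblib package instead.
import joblib
joblib.dump(clf, 'filename.pkl')
clf = joblib.load('filename.pkl')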
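
A round trip through disk can be verified by comparing predictions before and after loading. This sketch assumes the standalone `joblib` package is installed; the file name `svc_iris.joblib` and the use of a temporary directory are choices made here for illustration.

```python
import os
import tempfile

import joblib
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = SVC().fit(X, y)

# Write the fitted model into a temporary directory and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "svc_iris.joblib")  # hypothetical file name
    joblib.dump(clf, model_path)
    restored = joblib.load(model_path)

# The restored model should predict exactly like the original one.
same_predictions = bool((restored.predict(X) == clf.predict(X)).all())
print(same_predictions)
```

As with pickle, a saved model should only be loaded from a trusted source, and it is safest to load it with the same scikit-learn version that wrote it.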

### Conventions

In [38]:
import numpy as np
from sklearn import random_projection

In [39]:
rng = np.random.RandomState(0)
X = rng.rand(10, 2000)
X = np.array(X, dtype='float32')
X.dtype

dtype('float32')

In [40]:
transformer = random_projection.GaussianRandomProjection()
X_new = transformer.fit_transform(X)
X_new.dtype

dtype('float64')

In this example, X is float32, which is cast to float64 by fit_transform(X).

In [48]:
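from sklearn import datasets
from sklearn.svm import SVC
iris = datasets.load_iris()
clf = SVC()
clf.fit(iris.data, iris.target)  # first fit: integer class labels
list(clf.predict(iris.data[:3]))  # integer predictions (not displayed, since this is not the cell's last expression)
clf.fit(iris.data, iris.target_names[iris.target])  # second fit: string class names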

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [47]:
list(clf.predict(iris.data[:3]))

['setosa', 'setosa', 'setosa']

Here, the first predict() returns an integer array, since iris.target (an integer array) was used in fit. The second predict() returns a string array, since iris.target_names was used for fitting.

## Refitting and updating parameters

In [49]:
import numpy as np
from sklearn.svm import SVC
rng = np.random.RandomState(0)
X = rng.rand(100, 10)
y = rng.binomial(1, 0.5, 100)
X_test = rng.rand(5, 10)
clf = SVC()
clf.set_params(kernel='linear').fit(X, y)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [50]:
clf.predict(X_test)

array([1, 0, 1, 1, 0])

In [51]:
clf.set_params(kernel='rbf').fit(X, y)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [52]:
clf.predict(X_test)

array([0, 0, 0, 1, 0])
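
Hyper-parameters of an estimator can be changed after construction with `set_params()`, and calling `fit()` again overwrites whatever was learned by the previous fit. A small sketch of how one might inspect the current settings with `get_params()` is given below; the value `C=10.0` is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(100, 10)
y = rng.binomial(1, 0.5, 100)

clf = SVC()
clf.set_params(kernel="linear").fit(X, y)
print(clf.get_params()["kernel"])   # linear

# Several parameters can be updated at once; the model must then be refitted.
clf.set_params(kernel="rbf", C=10.0).fit(X, y)
params = clf.get_params()
print(params["kernel"], params["C"])
```

Because `set_params()` returns the estimator itself, it can be chained with `fit()` as the cells above do.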

### Multiclass vs. multilabel fitting

In [54]:
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer
X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
y = [0, 0, 1, 1, 2]
classif = OneVsRestClassifier(estimator=SVC(random_state=0))
classif.fit(X, y).predict(X)

array([0, 0, 1, 1, 2])

In [61]:
y = LabelBinarizer().fit_transform(y)
classif.fit(X, y).predict(X)

array([[1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 0],
       [1, 0, 1, 0, 0]])

Note: this output appears to come from running the cell after In [60] below, when y had already been rebound to the 5x5 multilabel indicator matrix, so it repeats that cell's result. Applied to the original y = [0, 0, 1, 1, 2], LabelBinarizer produces a 5x3 indicator matrix (one column per class), and predict() returns an array of that same shape.

In [60]:
from sklearn.preprocessing import MultiLabelBinarizer
y = [[0, 1], [0, 2], [1, 3], [0, 2, 3], [2, 4]]
y = MultiLabelBinarizer().fit_transform(y)
classif.fit(X, y).predict(X)

array([[1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 0],
       [1, 0, 1, 0, 0]])
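
To see what `MultiLabelBinarizer` does to the targets before fitting, the indicator matrix can be inspected and mapped back to label sets. This is a small sketch using the same label lists as the cell above; the variable names are illustrative.

```python
from sklearn.preprocessing import MultiLabelBinarizer

labels = [[0, 1], [0, 2], [1, 3], [0, 2, 3], [2, 4]]

mlb = MultiLabelBinarizer()
indicator = mlb.fit_transform(labels)

print(mlb.classes_)   # the label set discovered during fit
print(indicator)      # one row per sample, one column per label

# inverse_transform maps each indicator row back to a tuple of labels.
recovered = mlb.inverse_transform(indicator)
print(recovered)
```

Each row may contain several ones, which is what distinguishes a multilabel target from a multiclass target, where every sample has exactly one class.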