# Practicum in human intelligent information processing
**Prof. Tomohiro Shibata**

Kyushu Institute of Technology

## Content
* numpy
* matplotlib
* scipy

In [1]:
%doctest_mode 
%matplotlib inline 

Exception reporting mode: Plain
Doctest mode is: ON


In [2]:
import numpy as np
import matplotlib.pyplot as plt
 

**Example 1: creating a NumPy array using a built-in function**

In [None]:
a=np.arange(15).reshape(3,5)

In [None]:
a

In [None]:
a.shape

In [None]:
a.ndim

In [None]:
a.dtype.name

In [None]:
a.size

In [None]:
a.itemsize

In [None]:
type(a)

**numpy array declaration and common errors**

In [None]:
#a = np.array(1,2,3,4)    # WRONG
a = np.array([1,2,3,4])  # RIGHT
b = np.array([(1.5,2,3), (4,5,6)])


**The type of the array can also be explicitly specified at creation time**

In [None]:
c = np.array( [ [1,2], [3,4] ], dtype=complex )


**Some useful arrays**

In [None]:
q=np.zeros((3,4))
w=np.ones((2,3,4),dtype=np.int16) #dtype can also be specified
x=np.empty((2,3)) #uninitialized, output may vary

In [None]:
x

### To create a _sequence_ of numbers, **NumPy** provides a function analogous to _range_ that returns an _array_ instead of a list

In [None]:
np.arange(10,30,5)

In [None]:
np.arange(0,2,0.3) # it can accept float arguments

### When **arange** is used with floating point arguments, it is generally not possible to predict the number of elements obtained, due to the **finite floating point precision**. For this reason, it is usually better to use the function **linspace** that receives as an **_argument_** the number of elements that we want, instead of the step:

In [None]:
np.linspace(0,2,9) # 9 numbers from 0 to 2
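
The difference between the two functions can be checked directly. The following is a minimal sketch; the start, stop and step values are chosen only for illustration, and the length of the `arange` result is printed rather than assumed because it depends on floating point rounding.

```python
import numpy as np

# With a float step, the element count of arange depends on rounding,
# so print the length instead of assuming it.
stepped = np.arange(1.0, 1.3, 0.1)
print(len(stepped), stepped)

# linspace takes the element count directly and includes both endpoints.
spaced = np.linspace(1.0, 1.3, 4)
print(len(spaced), spaced)
```

With `linspace` the number of elements is fixed by the third argument, whatever the rounding of the endpoints.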

In [None]:
from numpy import pi

In [None]:
x = np.linspace( 0, 2*pi, 100 )        # useful to evaluate function at lots of points
f = np.sin(x)


### **Printing arrays**

In [None]:
a = np.arange(6)                         # 1d array
print(a)

In [None]:
b = np.arange(12).reshape(4,3)           # 2d array
print(b)

In [None]:
c = np.arange(24).reshape(2,3,4)         # 3d array
print(c)

### **If an array is too large to be printed, NumPy automatically skips the central part of the array and only prints the corners:**

In [None]:
print(np.arange(10000))

In [None]:
print(np.arange(10000).reshape(100,100))

### **To disable this behaviour and force NumPy to print the entire array, you can change the printing options using set_printoptions.**

In [None]:
#import sys; np.set_printoptions(threshold=sys.maxsize)  # threshold must be a number; current NumPy rejects 'nan'
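
If the full output is needed only once, the setting can be changed temporarily. This is a sketch that assumes a NumPy version providing the `np.printoptions` context manager; the array size of 2000 is an arbitrary choice for illustration.

```python
import sys
import numpy as np

big = np.arange(2000)

# Default settings summarise large arrays with an ellipsis.
summary = str(big)

# np.printoptions is a context manager, so the change is undone on exit.
with np.printoptions(threshold=sys.maxsize):
    full = str(big)

print("..." in summary, "..." in full)
```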

#### **Basic operations**

In [None]:
a = np.array( [20,30,40,50] )
b = np.arange( 4 )
b

In [None]:
c = a-b
c

In [None]:
b**2

In [None]:
10*np.sin(a)

In [None]:
a<35

#### **Unlike in many matrix languages, the product operator * operates elementwise in NumPy arrays. The matrix product can be performed using the dot function or method:**

In [None]:
A = np.array( [[1,1],
            [0,1]] )
B = np.array( [[2,0],
            [3,4]] )
A*B                         # elementwise product

In [None]:
A.dot(B)                    # matrix product

In [None]:
np.dot(A, B)                # another matrix product
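
In Python 3.5 and later, the `@` operator is a third way to write the matrix product. The sketch below reuses the two matrices from the cells above to contrast it with the elementwise `*`.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

elementwise = A * B   # multiplies matching entries
matrix = A @ B        # matrix product, same result as A.dot(B)

print(elementwise)
print(matrix)
```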

#### Some operations, such as += and *=, act in place to modify an existing array rather than create a new one.

In [None]:
a = np.ones((2,3), dtype=int)
b = np.random.random((2,3))
a *= 3
a

In [None]:
b += a
b

In [None]:
#a += b                  # b is not automatically converted to integer type

**When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise one (a behavior known as upcasting).**
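
Upcasting can be observed by inspecting `dtype` after each operation. This is a small sketch with arbitrarily chosen arrays; it also shows why the commented-out `a += b` above fails, since an in-place operation cannot change the type of the existing array.

```python
import numpy as np

a = np.ones(3, dtype=np.int32)
b = np.linspace(0, np.pi, 3)      # float64

c = a + b                          # result is upcast to float64
d = np.exp(c * 1j)                 # multiplying by a complex scalar upcasts again
print(c.dtype.name, d.dtype.name)

# In-place operations cannot upcast: the float result does not fit into the int array.
try:
    a += b
except TypeError as err:           # NumPy raises a TypeError subclass here
    print("in-place add failed:", type(err).__name__)
```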

# An introduction to machine learning with scikit-learn

### In this section, we introduce the machine learning vocabulary used throughout scikit-learn and give a simple learning example.

# **Machine learning: the problem setting**

In general, a learning problem considers a set of n samples of data and then tries to predict properties of unknown data. If each sample is more than a single number, for instance a multi-dimensional entry (aka multivariate data), it is said to have several attributes or features.

We can separate learning problems into a few large categories:



* **_supervised learning_**, in which the data comes with additional attributes that we want to predict. This problem can be either:


    * classification: samples belong to two or more classes and we want to learn from already labeled data how to predict the class of unlabeled data. An example of a classification problem is handwritten digit recognition, in which the aim is to assign each input vector to one of a finite number of discrete categories. Another way to think of classification is as a discrete (as opposed to continuous) form of supervised learning where one has a limited number of categories and, for each of the n samples provided, one tries to label it with the correct category or class.
    * regression: if the desired output consists of one or more continuous variables, then the task is called regression. An example of a regression problem would be the prediction of the length of a salmon as a function of its age and weight.


* **_unsupervised learning_**, in which the training data consists of a set of input vectors x without any corresponding target values. The goal in such problems may be to discover groups of similar examples within the data, where it is called clustering, or to determine the distribution of data within the input space, known as density estimation, or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization.
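
The three kinds of problem listed above can be sketched on a single dataset. The choice of estimators (`SVC`, `LinearRegression`, `KMeans`), the regression target (petal width predicted from the other three iris features) and the number of clusters are assumptions made only for illustration.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Classification: learn discrete class labels from labelled samples.
classifier = SVC().fit(X, y)
class_pred = classifier.predict(X[:5])

# Regression: predict a continuous value (here the fourth feature from the first three).
regressor = LinearRegression().fit(X[:, :3], X[:, 3])
width_pred = regressor.predict(X[:5, :3])

# Clustering (unsupervised): group samples without using the labels at all.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = kmeans.labels_[:5]

print(class_pred, width_pred, cluster_ids)
```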

### Loading an example dataset

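scikit-learn comes with a few standard datasets, for instance the **iris** and **digits** datasets for **classification** and the **boston house prices dataset** for **regression** (note that the Boston dataset was removed in scikit-learn 1.2; `load_diabetes` is one regression dataset that remains available).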

In [3]:
from sklearn import datasets
iris= datasets.load_iris()
digits= datasets.load_digits()

A dataset is a **dictionary**-like object that holds all the data and some metadata about the data. The data is stored in the **.data** member, which is an array of shape **(n_samples, n_features)**. In the case of a supervised problem, one or more response variables are stored in the **.target** member.
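
A quick way to see this layout is to list the members and print the array shapes. This is a minimal sketch using the digits dataset.

```python
from sklearn import datasets

digits = datasets.load_digits()

print(digits.keys())            # names of the available members
print(digits.data.shape)        # (n_samples, n_features)
print(digits.target.shape)      # one label per sample
```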

In [4]:
print(digits.data)

[[  0.   0.   5. ...,   0.   0.   0.]
 [  0.   0.   0. ...,  10.   0.   0.]
 [  0.   0.   0. ...,  16.   9.   0.]
 ..., 
 [  0.   0.   1. ...,   6.   0.   0.]
 [  0.   0.   2. ...,  12.   0.   0.]
 [  0.   0.  10. ...,  12.   1.   0.]]


In [5]:
print(digits.target)

[0 1 2 ..., 8 9 8]


### Shape of the data arrays
The data is always a 2D array, **shape (n_samples, n_features)**, although the original data may have had a different shape. In the case of the digits, each original sample is an image of shape (8, 8) and can be accessed using:

In [6]:
digits.images[0]

array([[  0.,   0.,   5.,  13.,   9.,   1.,   0.,   0.],
       [  0.,   0.,  13.,  15.,  10.,  15.,   5.,   0.],
       [  0.,   3.,  15.,   2.,   0.,  11.,   8.,   0.],
       [  0.,   4.,  12.,   0.,   0.,   8.,   8.,   0.],
       [  0.,   5.,   8.,   0.,   0.,   9.,   8.,   0.],
       [  0.,   4.,  11.,   0.,   1.,  12.,   7.,   0.],
       [  0.,   2.,  14.,   5.,  10.,  12.,   0.,   0.],
       [  0.,   0.,   6.,  13.,  10.,   0.,   0.,   0.]])
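
The relation between the 8x8 images and the rows of `.data` can be verified by flattening the images. This sketch only assumes the digits dataset loaded as above.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
n_samples = len(digits.images)

# Flatten each 8x8 image into a row of 64 features.
flat = digits.images.reshape((n_samples, -1))

print(digits.images.shape, flat.shape)
print(np.array_equal(flat, digits.data))
```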

## Learning and predicting
In the case of the digits dataset, the task is to predict, given an image, which digit it represents. We are given samples of each of the 10 possible classes (the digits zero through nine) on which we fit an estimator to be able to predict the classes to which unseen samples belong.
In scikit-learn, an estimator for classification is a Python object that implements the methods fit(X, y) and predict(T).

An example of an estimator is the class sklearn.svm.SVC that implements support vector classification. The constructor of an estimator takes as arguments the parameters of the model, but for the time being, we will consider the estimator as a black box:

In [7]:
from sklearn import svm
clf = svm.SVC(gamma=0.001,C=100)

In [8]:
clf.fit(digits.data[:-1],digits.target[:-1])

SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma=0.001, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [22]:
print (clf.predict(digits.data[-1:]))


[8]
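
Predicting a single held-out image says little about how well the estimator works. The sketch below applies the same `fit`/`predict` pattern to a larger held-out slice; the hold-out size of 100 is a hypothetical value chosen for illustration.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
n_test = 100  # hypothetical hold-out size chosen for illustration

clf = svm.SVC(gamma=0.001, C=100)
clf.fit(digits.data[:-n_test], digits.target[:-n_test])     # learn from all but the last 100

predicted = clf.predict(digits.data[-n_test:])               # predict the held-out samples
accuracy = (predicted == digits.target[-n_test:]).mean()     # fraction predicted correctly
print(accuracy)
```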


In [20]:
digits.target_names[1]

1

In [9]:
# Show the last digit as an 8x8 image; imshow needs a 2D array, so use digits.images rather than digits.data
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')

<matplotlib.image.AxesImage object at 0x00000000090C9978>

<matplotlib.figure.Figure object at 0x0000000008FEBC50>

## Model Persistence
It is possible to save a model in scikit-learn by using Python’s built-in persistence model, namely **pickle**:

In [23]:
from sklearn import svm
from sklearn import datasets

In [24]:
clf=svm.SVC()

In [25]:
iris=datasets.load_iris()

In [26]:
x,y=iris.data, iris.target

In [27]:
clf.fit(x,y)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [29]:
import pickle

In [37]:
with open("model.pkl", "wb") as f:  # pickle.dump needs a binary file object, not a file name
    pickle.dump(clf, f)
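
A minimal sketch of a complete pickle round trip follows. `pickle.dump` and `pickle.load` work on file objects opened in binary mode, while `pickle.dumps` and `pickle.loads` work on bytes in memory. The temporary directory is an assumption made so that the example does not write into the working directory.

```python
import os
import pickle
import tempfile

from sklearn import datasets, svm

iris = datasets.load_iris()
clf = svm.SVC().fit(iris.data, iris.target)

# In-memory round trip: dumps returns bytes, loads rebuilds the estimator.
clf_from_bytes = pickle.loads(pickle.dumps(clf))

# On-disk round trip: dump and load need file objects opened in binary mode.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")  # hypothetical temporary location
with open(path, "wb") as f:
    pickle.dump(clf, f)
with open(path, "rb") as f:
    clf_from_file = pickle.load(f)

print(list(clf_from_file.predict(iris.data[:3])))
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.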

# **Some Machine Learning Examples Using Python**

In [None]:
# Standard scientific Python imports
import matplotlib.pyplot as plt

In [None]:
# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics

In [None]:
# The digits dataset
digits = datasets.load_digits()


# The data that we are interested in is made of 8x8 images of digits, let's
# have a look at the first 3 images, stored in the `images` attribute of the
# dataset.  If we were working from image files, we could load them using
# pylab.imread.  Note that each image must have the same size. For these
# images, we know which digit they represent: it is given in the 'target' of
# the dataset.

In [None]:
images_and_labels = list(zip(digits.images, digits.target))

In [None]:
for index, (image, label) in enumerate(images_and_labels[:8]):
    plt.subplot(2, 4, index + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Training: %i' % label)
plt.show()

# To apply a classifier to this data, we need to flatten each image, to
# turn the data into a (samples, features) matrix:


In [None]:
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Create a classifier: a support vector classifier

In [None]:
classifier = svm.SVC(gamma=0.001)

# We learn the digits on the first half of the digits

In [None]:
classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])  # // keeps the slice index an integer

# Now predict the value of the digit on the second half:

In [None]:
expected = digits.target[n_samples // 2:]
predicted = classifier.predict(data[n_samples // 2:])

In [None]:
print("Classification report for classifier %s:\n%s\n"
      % (classifier, metrics.classification_report(expected, predicted)))

In [None]:
print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))
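
The confusion matrix also gives the overall accuracy: its rows are the true classes and its columns the predicted ones, so the diagonal counts the correct predictions. The sketch below repeats the half-and-half split from the cells above so that it runs on its own.

```python
from sklearn import datasets, metrics, svm

digits = datasets.load_digits()
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
half = n_samples // 2

classifier = svm.SVC(gamma=0.001)
classifier.fit(data[:half], digits.target[:half])
expected = digits.target[half:]
predicted = classifier.predict(data[half:])

cm = metrics.confusion_matrix(expected, predicted)

# Rows are true classes, columns are predicted classes,
# so the diagonal holds the correctly classified samples.
accuracy = cm.trace() / cm.sum()
print(accuracy, metrics.accuracy_score(expected, predicted))
```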

In [None]:
images_and_predictions = list(zip(digits.images[n_samples // 2:], predicted))

In [None]:
for index, (image, prediction) in enumerate(images_and_predictions[:4]):
    plt.subplot(2, 4, index + 5)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Prediction: %i' % prediction)

In [None]:
plt.show()

In [None]:
import pickle


In [None]:
from sklearn import datasets
from sklearn.svm import SVC
iris = datasets.load_iris()
clf = SVC()
clf.fit(iris.data, iris.target)  


In [None]:
iris.target

In [None]:
list(clf.predict(iris.data[:3]))


In [None]:
clf.fit(iris.data, iris.target_names[iris.target])  # refit using the class names as labels

In [None]:
list(clf.predict(iris.data[:3]))



In [14]:
iris.keys()

dict_keys(['target_names', 'data', 'target', 'DESCR', 'feature_names'])

In [None]:
iris.data