# Welcome to Artificial Intelligence, Part II

## This module is a sampler of machine learning and computer vision

## Today's agenda
* Basic concepts of machine learning
    * Basic terminology
    * Supervised learning concepts
    * Unsupervised learning concepts
    * Some machine learning algorithms
* Introduction to Jupyter Notebooks
* Introduction to NumPy
* Your lab!


# Machine Learning: The Basic Ideas
## Machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.

* It’s used to predict/detect --- Supervised learning
* It’s used to find patterns --- Unsupervised learning
* It’s used to compress data/find underlying relationships --- Dimensionality reduction
* Machine learning can also be thought of as algorithms to find statistical models to use for prediction or finding patterns in data


# Some Uses of Machine Learning
* Predicting prices in the stock market
* Predicting whether a person with such and such features will buy a product
* Recommender engines
* Anomaly Detection
* Find distinct customer bases for targeted marketing
* Object recognition and detection in images/video
* And many, many more!

# A Few Terms
* Training example: a member of a data set
* Feature: an aspect of a training example (aka, independent variable)
* Feature vector: an array of feature values for a given training example
* Label (aka, dependent variable, y-value): what we are trying to predict
* Model: A function that can be used to predict labels given feature vectors
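
To make these terms concrete, here is a small sketch using NumPy arrays. The feature values, the labels, and the threshold rule inside `model` are all made up for illustration; a real model would be learned from data rather than written by hand.

```python
import numpy as np

# Each row is one training example; each column is one feature.
# Hypothetical features: [square footage, number of bedrooms]
X = np.array([[1400, 3],
              [2100, 4],
              [ 800, 2]])

# One label per training example (hypothetical: 1 = sold, 0 = not sold)
y = np.array([1, 1, 0])

# The feature vector of the first training example
first_example = X[0]
print(first_example)

# A hand-written stand-in for a model: a function from a feature vector to a label
def model(feature_vector):
    return 1 if feature_vector[0] > 1000 else 0

predictions = np.array([model(row) for row in X])
print(predictions)
```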

# Supervised Learning

* Data has both feature vectors and labels
* Creates model to predict labels when only feature vector is known
* Examples: predict from emails which Enron employees will be prosecuted; predict housing prices from house features; detect whether a road curves left or right

## Below is a flow chart of the process for supervised learning

![supervised learning flowchart](imgs/supervised_learning.jpg)

* The parts with the blue arrows are the training part of the process. In training, we use labeled data and a machine learning algorithm to create a model that can make predictions
* The parts with the green arrows are the process of using an already trained model to make predictions on new data
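
The two halves of the flow chart can be sketched with a very simple classifier written in plain NumPy. This is only an illustration under assumptions: the nearest-centroid rule, the function names `train` and `predict`, and the data are all invented here. In the labs a scikit-learn algorithm would normally play this role.

```python
import numpy as np

def train(X, y):
    """Training step: learn one centroid (mean feature vector) per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(model, X_new):
    """Prediction step: give each new example the class of its closest centroid."""
    classes, centroids = model
    # distances has shape (number of new examples, number of classes)
    diffs = X_new[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    return classes[distances.argmin(axis=1)]

# Made-up labeled training data: two features per example, two classes
X_train = np.array([[1.0, 1.2], [0.8, 1.0], [3.0, 3.1], [3.2, 2.9]])
y_train = np.array([0, 0, 1, 1])

model = train(X_train, y_train)      # training: labeled data -> model

X_new = np.array([[0.9, 1.1], [3.1, 3.0]])
print(predict(model, X_new))         # prediction: new data -> labels
```

Note that `predict` never sees any labels; everything it knows comes from the model produced by `train`.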

# Classification versus Regression

* Classification: The label is a category (nondegreed)
* Regression: The label is quantitative, degreed
* A classification example: Predicting iris flower type:
  * Four features: petal length and width, sepal length and width
  * Three kinds: Iris Setosa, Iris Versicolour, Iris Virginica
  * The Iris dataset is an example dataset used in scikit-learn
  
  ![](imgs/iris.png)
  
* A regression example: predicting house price from square footage

  ![](imgs/houseprice.jpg)
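
As a rough sketch of the regression case, the snippet below fits a straight line to a handful of made-up (square footage, price) pairs with `np.polyfit` and uses it to predict a price. The numbers are invented for illustration, and a least-squares line is just one possible regression model.

```python
import numpy as np

# Made-up data: square footage (feature) and sale price in dollars (label)
sqft = np.array([1000.0, 1500.0, 2000.0, 2500.0])
price = np.array([200000.0, 300000.0, 400000.0, 500000.0])

# Fit price = slope * sqft + intercept by least squares
slope, intercept = np.polyfit(sqft, price, 1)

# The label is a quantity, so the prediction is a number rather than a category
predicted = slope * 1800 + intercept
print(predicted)
```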


# Unsupervised Learning

* The data is not labeled: the dependent variable (output variable) is not known
* Unsupervised algorithms try to find patterns in data
* Example: Pizza Hut delivery --- use clustering to find groups in unlabeled data
 ![](imgs/pizzahut.png)
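
Clustering can be sketched with a bare-bones k-means loop in NumPy. Everything here is an assumption made for illustration: the `kmeans` helper, the choice to start from the first `k` points, the fixed number of iterations, and the delivery coordinates. A library implementation such as the one in scikit-learn would normally be used instead.

```python
import numpy as np

def kmeans(points, k, n_iters=10):
    """A bare-bones k-means: returns (cluster index per point, cluster centers)."""
    # Simplifying assumption: use the first k points as the starting centers
    centers = points[:k].copy()
    for _ in range(n_iters):
        # Assign every point to its nearest center
        diffs = points[:, np.newaxis, :] - centers[np.newaxis, :, :]
        assignments = np.linalg.norm(diffs, axis=2).argmin(axis=1)
        # Move each center to the mean of the points assigned to it
        for j in range(k):
            members = points[assignments == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return assignments, centers

# Made-up, unlabeled delivery locations that form two obvious groups
locations = np.array([[0.0, 0.0], [10.0, 10.0], [0.5, 0.2],
                      [9.5, 10.3], [0.1, 0.6], [10.2, 9.8]])

assignments, centers = kmeans(locations, k=2)
print(assignments)
print(centers)
```

No labels are passed in anywhere; the groups come only from how close the points are to each other.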


# Some Algorithms

* We will be using the scikit-learn machine learning library
* Here are some of the algorithms it supports, in an advisory flow-chart
 ![](imgs/ml_map.png)

# Installing Scientific Computing Frameworks

## Using Python from Anaconda
* The Anaconda package is here: https://www.anaconda.com/download/#macos
* Anaconda includes most of what we need for data science: Jupyter, NumPy, scikit-learn, and matplotlib. It does not include OpenCV (we will install that later) or deep learning libraries




# Introduction to Jupyter Notebooks

* Jupyter notebooks are a way to do Python programming with interactive cells
* The cells can be run independently
* Jupyter also has Markdown cells: cells that are used for text, images, diagrams, etc., for presentations or reports
* Jupyter allows the rendering of graphs using Python code from the code cells
* Jupyter comes with Anaconda
* Jupyter works with either Python 3 or Python 2
* To start Jupyter, go to the command line and type: jupyter notebook. A webpage at localhost will open in your browser; that is your Jupyter notebook homepage. You can create new notebooks from that page using its simple GUI
* Jupyter programs are served to you via your browser; the server lives at localhost
* This document is a Jupyter notebook
* See demo of various features of Jupyter
* See https://www.cheatography.com/weidadeyue/cheat-sheets/jupyter-notebook/ for a Jupyter cheat sheet


# Introduction to NumPy

* NumPy is a package for scientific computing in Python
* It is often used for matrix manipulation using its most basic object, the NumPy array
* Many machine learning and computer vision libraries use NumPy arrays
* NumPy comes with Anaconda, though many people install it with pip as well
* See NumPy's quick start tutorial at https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
* Let's learn by doing!

* First, we need to install Anaconda for Python 3.6 on your machine: go to https://docs.anaconda.com/anaconda/install/ and follow the instructions

* Anaconda will install its own version of Python; your old Python installation won't be removed
* We will access the new Python and other tools through the Anaconda Navigator (see in-class demo)
* Now that Anaconda is installed and ready, let's try NumPy

In [45]:
# note: the print calls and outputs in this demo follow Python 3 behavior
# importing numpy with the usual alias
import numpy as np

a = np.array([1,2,3,4,5])
print(a)

#array of consecutive numbers
a = np.arange(12)
print(a)


[1 2 3 4 5]
[ 0  1  2  3  4  5  6  7  8  9 10 11]


In [46]:
#add arrays
a = np.arange(12)
b = np.arange(12)
print (a, b)
print (a + b)

#multiply by scalar
a * 3
# note: * does not do dot product in numpy


[ 0  1  2  3  4  5  6  7  8  9 10 11] [ 0  1  2  3  4  5  6  7  8  9 10 11]
[ 0  2  4  6  8 10 12 14 16 18 20 22]


array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27, 30, 33])

In [47]:
#get types of elements of array
print (a.dtype)
#get shape
print (a.shape, np.shape(a))
#get size
print (a.size, np.size(a))

int64
(12,) (12,)
12 12


In [48]:
#Using the array constructor
a = np.array([2, 3, 4])
a.dtype
b = np.array([1.2, 3.5, 5.1])
b.dtype

#The type of the array can also be explicitly specified at 
#creation time
c = np.array([ [1, 2], [3, 4] ], dtype=complex)
print(c)


[[ 1.+0.j  2.+0.j]
 [ 3.+0.j  4.+0.j]]
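
Besides setting the type at creation time, an existing array can be converted with `astype`. The short sketch below reuses the values from the cell above; the variable name `b_int` is introduced only for this example.

```python
import numpy as np

a = np.array([2, 3, 4])
b = np.array([1.2, 3.5, 5.1])

# astype returns a new array of the requested type; b itself is unchanged.
# Converting float to int drops the fractional part.
b_int = b.astype(int)
print(b_int)
print(b.dtype, b_int.dtype)

# Mixing an integer array with a float array gives a float result
print((a + b).dtype)
```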


In [49]:
#fill all values in an array
a.fill(0)
print(a)
a[:] = 1
print(a)
print("a matrix of zeros {}".format(np.zeros((3, 2))))
print("a matrix of ones {}".format(np.ones((3, 2))))

#a random matrix
print("a random matrix")
r = np.random.rand(2, 3)
print(r)

[0 0 0]
[1 1 1]
a matrix of zeros [[ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]
a matrix of ones [[ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]]
a random matrix
[[ 0.73899662  0.54693874  0.6597939 ]
 [ 0.86816173  0.78311543  0.64483266]]


In [50]:
#creating arrays, continued
#6 evenly spaced numbers from 0 to 50 (both ends included)
x = np.linspace(0, 50, 6)
print(x)
# use a math function to transform
print(np.sin(x))

[  0.  10.  20.  30.  40.  50.]
[ 0.         -0.54402111  0.91294525 -0.98803162  0.74511316 -0.26237485]
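
`linspace` is easy to confuse with `arange`. A small side-by-side sketch of the difference (the variable names are arbitrary):

```python
import numpy as np

# linspace: you choose how many points; both endpoints are included by default
x = np.linspace(0, 50, 6)
print(x)

# arange: you choose the step; the stop value is excluded
y = np.arange(0, 50, 10)
print(y)

# To reach 50 with arange, the stop has to go past it
z = np.arange(0, 51, 10)
print(z)
```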


In [51]:
# arange
a = np.arange(6)
print(a)
b = np.arange(12).reshape(4,3)
print(b)
# won't work: print(b.reshape(5,5)), not enough elements

# first arg, beg(incl), second, end(non-incl), third, step
c = np.arange(1, 13, 2)
print(c)


[0 1 2 3 4 5]
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
[ 1  3  5  7  9 11]


#### Basic operations

In [52]:
a = np.array( [20, 30, 40, 50] )
print(a)
b = np.arange(4)
print(b)
c = a - b
print(c)
print(b**2)
print(10 * np.sin(a))
print(a < 35)
a += 2
print(a)

[20 30 40 50]
[0 1 2 3]
[20 29 38 47]
[0 1 4 9]
[ 9.12945251 -9.88031624  7.4511316  -2.62374854]
[ True  True False False]
[22 32 42 52]


In [53]:
a = np.array( [[1, 1], [0, 1]] )
b = np.array( [[2, 0], [3, 4]] )
print(a * b)   # elementwise product 
print(a.dot(b)) # matrix product

[[2 0]
 [0 4]]
[[5 4]
 [3 4]]
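
The matrix product can be written in more than one way. The sketch below assumes Python 3.5 or later, where the `@` operator is available; on older versions only the `dot` forms work.

```python
import numpy as np

a = np.array([[1, 1], [0, 1]])
b = np.array([[2, 0], [3, 4]])

# Three ways to write the same matrix product
p1 = a.dot(b)
p2 = np.dot(a, b)
p3 = a @ b          # requires Python 3.5 or later

print(np.array_equal(p1, p2) and np.array_equal(p1, p3))

# Unlike the elementwise product, the matrix product depends on the order
print(np.array_equal(a * b, b * a))
print(np.array_equal(a @ b, b @ a))
```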


In [54]:
# random numbers from 0 to .9999...
# sums, max, min
a = np.random.random((2,3))
print("a: {}\nsum: {}\nmin: {}\nmax {}".format(a, a.sum(), a.min(), a.max()))


a: [[ 0.03433018  0.57760635  0.45429204]
 [ 0.55493083  0.48544341  0.04987378]]
sum: 2.156476603207885
min: 0.0343301843554642
max 0.5776063524973211


In [55]:
#for matrices
b = np.arange(12).reshape(3,4)

sumb = b.sum(axis=0)                            # sum of each column
minb = b.min(axis=1)                            # min of each row
cumsumb = b.cumsum(axis=1)                         # cumulative sum along each row

print("b:\n{}\nsumb: {}\nminb: {}\ncumsumb: \n{}".format(b, sumb, minb, cumsumb))

b:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
sumb: [12 15 18 21]
minb: [0 4 8]
cumsumb: 
[[ 0  1  3  6]
 [ 4  9 15 22]
 [ 8 17 27 38]]
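
The `axis` argument matters a lot in machine learning code, where rows are usually training examples and columns are features. A minimal sketch with a made-up feature matrix `X`:

```python
import numpy as np

# Made-up feature matrix: 4 training examples (rows), 2 features (columns)
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0],
              [4.0, 400.0]])

feature_means = X.mean(axis=0)   # one value per column (feature)
example_sums = X.sum(axis=1)     # one value per row (training example)
print(feature_means)
print(example_sums)

# Subtracting the per-feature means centers each column at zero
centered = X - feature_means
print(centered.mean(axis=0))
```

A handy way to remember it: `axis=0` collapses the rows and leaves one result per column, while `axis=1` collapses the columns and leaves one result per row.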


#### Some useful NumPy functions (not all of them are universal functions, strictly speaking):
all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, inv, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose, var, vdot, vectorize, where

see: https://docs.scipy.org/doc/numpy/reference/ufuncs.html
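
A short sketch of what "universal" means in practice: a universal function (ufunc) works element by element, while functions such as `max` or `sort` operate on the array as a whole. The array values are arbitrary.

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])

# A ufunc works element by element and keeps the shape
roots = np.sqrt(a)
print(roots)

# np.maximum is also a ufunc: it compares element by element
print(np.maximum(a, 5.0))

# By contrast, max reduces the whole array to a single value
print(a.max())

# ufuncs are instances of np.ufunc; ordinary functions such as np.sort are not
print(isinstance(np.sqrt, np.ufunc), isinstance(np.sort, np.ufunc))
```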


### Slicing

In [56]:
# Single dimension
a = np.arange(10)**3
print(a[2])
print(a[2:5])
a[:6:2] = -1000
print(a) 
# equivalent to a[0:6:2] = -1000; 
# from start to position 6, exclusive,
# set every 2nd element to -1000

print(a[ : :-1])                                
# reversed a


8
[ 8 27 64]
[-1000     1 -1000    27 -1000   125   216   343   512   729]
[  729   512   343   216   125 -1000    27 -1000     1 -1000]


In [57]:
a = np.arange(2, 30)
print(a)
#last 2 elements
print(a[-2:])
#every other element
#the syntax: [first:last:step]
# if first is blank, start at the beginning; if last is blank, go through the end
print(a[::2])

[ 2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
 27 28 29]
[28 29]
[ 2  4  6  8 10 12 14 16 18 20 22 24 26 28]
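
To check how the defaults in `[first:last:step]` behave, the sketch below compares a few shorthand slices with their spelled-out forms on the same array.

```python
import numpy as np

a = np.arange(2, 30)

# Omitted parts of [first:last:step] fall back to defaults
print(np.array_equal(a[::2], a[0:len(a):2]))    # every other element
print(np.array_equal(a[-2:], a[len(a) - 2:]))   # last two elements
print(np.array_equal(a[:5], a[0:5:1]))          # first five elements

# A negative step walks backwards from the end
backwards = a[::-3]
print(backwards)
```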


In [58]:
# Multidimensional
def f(x, y):
    return 10 * x + y

b = np.fromfunction(f,(5, 4),dtype=int)
print("b:", b)
print("b[2, 3]:", b[2, 3])
print("b[0:5, 1]:", b[0:5, 1])                       # each row in the second column of b
print("b[ : ,1]:", b[ : ,1])                       # equivalent to the previous example
print("b[1:3, : ]:", b[1:3, : ])                      # each column in the second and third row of b

# When fewer indices are provided than the number of axes, the missing indices are considered complete slices:
print("b[-1]:", b[-1])                                  # the last row. Equivalent to b[-1,:]

b: [[ 0  1  2  3]
 [10 11 12 13]
 [20 21 22 23]
 [30 31 32 33]
 [40 41 42 43]]
b[2, 3]: 23
b[0:5, 1]: [ 1 11 21 31 41]
b[ : ,1]: [ 1 11 21 31 41]
b[1:3, : ]: [[10 11 12 13]
 [20 21 22 23]]
b[-1]: [40 41 42 43]


The expression within brackets in b[i] is treated as an i followed by as many instances of : as needed to represent the remaining axes. NumPy also allows you to write this using dots as b[i,...].

The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is a rank 5 array (i.e., it has 5 axes), then

x[1,2,...] is equivalent to x[1,2,:,:,:],
x[...,3] to x[:,:,:,:,3] and
x[4,...,5,:] to x[4,:,:,5,:].
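
These equivalences can be checked directly. The array below is an arbitrary 5-axis example; its shape (5, 3, 2, 6, 4) was chosen only so that the indices used above are all valid.

```python
import numpy as np

# An arbitrary array with 5 axes, filled with consecutive integers
x = np.arange(5 * 3 * 2 * 6 * 4).reshape(5, 3, 2, 6, 4)

print(np.array_equal(x[1, 2, ...], x[1, 2, :, :, :]))
print(np.array_equal(x[..., 3], x[:, :, :, :, 3]))
print(np.array_equal(x[4, ..., 5, :], x[4, :, :, 5, :]))

# The shapes show which axes were consumed by the integer indices
print(x[1, 2, ...].shape, x[..., 3].shape, x[4, ..., 5, :].shape)
```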

In [59]:
c = np.array( [[[  0,  1,  2],               # a 3D array (two stacked 2D arrays)
                 [ 10, 12, 13]],
                [[100,101,102],
                [110,112,113]]])
print(c.shape)
print(c[1,...])                                 # same as c[1,:,:] or c[1]
print(c[...,2])                                   # same as c[:,:,2]

(2, 2, 3)
[[100 101 102]
 [110 112 113]]
[[  2  13]
 [102 113]]


## Shape Manipulation

In [60]:
# Changing shape

a = np.floor(10 * np.random.random((3, 4)))
print("a:", a)
print("a.shape:", a.shape)
print("a.ravel():", a.ravel())  # returns the array, flattened
print("a.reshape(6,2):", a.reshape(6,2))  # returns the array with a modified shape
# if -1 is used for one of the dimensions, e.g. a.reshape(6, -1), NumPy figures out what it has to be

print("a.T:", a.T)  # returns the array, transposed
print("a.T.shape:", a.T.shape)
print("a.shape:", a.shape)



a: [[ 2.  5.  2.  3.]
 [ 2.  6.  9.  6.]
 [ 5.  5.  0.  0.]]
a.shape: (3, 4)
a.ravel(): [ 2.  5.  2.  3.  2.  6.  9.  6.  5.  5.  0.  0.]
a.reshape(6,2): [[ 2.  5.]
 [ 2.  3.]
 [ 2.  6.]
 [ 9.  6.]
 [ 5.  5.]
 [ 0.  0.]]
a.T: [[ 2.  2.  5.]
 [ 5.  6.  5.]
 [ 2.  9.  0.]
 [ 3.  6.  0.]]
a.T.shape: (4, 3)
a.shape: (3, 4)
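
A short sketch of how `reshape` handles `-1` and what happens when the requested shape does not fit the number of elements. The array here is just `np.arange(12)`.

```python
import numpy as np

a = np.arange(12)

# -1 asks NumPy to work out that dimension from the total number of elements
print(a.reshape(3, -1).shape)
print(a.reshape(-1, 6).shape)
print(a.reshape(2, 3, -1).shape)

# reshape returns a new array object; the original keeps its shape
print(a.shape)

# A shape that does not match the element count raises ValueError
try:
    a.reshape(5, 5)
except ValueError as err:
    print("reshape failed:", err)
```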


In [61]:
# Stacking arrays

a = np.floor(10 * np.random.random((2,2)))
print("a:\n", a)
b = np.floor(10 * np.random.random((2,2)))
print("b:\n", b)
print("np.vstack((a,b)):\n", np.vstack((a,b)))
print("np.hstack((a,b)):\n", np.hstack((a,b)))

a:
 [[ 5.  4.]
 [ 8.  0.]]
b:
 [[ 4.  7.]
 [ 2.  6.]]
np.vstack((a,b)):
 [[ 5.  4.]
 [ 8.  0.]
 [ 4.  7.]
 [ 2.  6.]]
np.hstack((a,b)):
 [[ 5.  4.  4.  7.]
 [ 8.  0.  2.  6.]]
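
For 2D arrays like these, `vstack` and `hstack` behave like `np.concatenate` along axis 0 and axis 1. The sketch below checks this using the values printed above, fixed here so the example is reproducible.

```python
import numpy as np

a = np.array([[5.0, 4.0], [8.0, 0.0]])
b = np.array([[4.0, 7.0], [2.0, 6.0]])

# vstack stacks along axis 0 (rows); hstack stacks along axis 1 (columns)
print(np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0)))
print(np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1)))

# The resulting shapes
print(np.vstack((a, b)).shape, np.hstack((a, b)).shape)
```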


### Functions and Methods Overview
from https://docs.scipy.org/doc/numpy-dev/user/quickstart.html

Here is a list of some useful NumPy function and method names, ordered by category. See Routines for the full list.

##### Array Creation
arange, array, copy, empty, empty_like, eye, fromfile, fromfunction, identity, linspace, logspace, mgrid, ogrid, ones, ones_like, r_, zeros, zeros_like
##### Conversions
ndarray.astype, atleast_1d, atleast_2d, atleast_3d, mat
##### Manipulations
array_split, column_stack, concatenate, diagonal, dsplit, dstack, hsplit, hstack, ndarray.item, newaxis, ravel, repeat, reshape, resize, squeeze, swapaxes, take, transpose, vsplit, vstack
##### Questions
all, any, nonzero, where
##### Ordering
argmax, argmin, argsort, max, min, ptp, searchsorted, sort
##### Operations
choose, compress, cumprod, cumsum, inner, ndarray.fill, imag, prod, put, putmask, real, sum
##### Basic Statistics
cov, mean, std, var
##### Basic Linear Algebra
cross, dot, outer, linalg.svd, vdot

#### Indexing with Boolean Arrays

In [66]:
a = np.arange(12).reshape(3,4)
b = a > 4
print(b)                                          # b is a boolean array with a's shape
print(a[b])                                     # 1d array with the selected elements

#This property can be very useful in assignments:

a[b] = 0                                   # All elements of 'a' greater than 4 become 0
print(a)

#or a shortcut
a = np.arange(25).reshape(5,5)
a[a > 4] = -1
print(a)

# modifying specified values
a[a > 0] += 5
print(a)


[[False False False False]
 [False  True  True  True]
 [ True  True  True  True]]
[ 5  6  7  8  9 10 11]
[[0 1 2 3]
 [4 0 0 0]
 [0 0 0 0]]
[[ 0  1  2  3  4]
 [-1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1]]
[[ 0  6  7  8  9]
 [-1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1]]
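
Boolean indexing is especially useful with labeled data, where a condition on the labels selects whole rows of the feature matrix. The sketch below uses made-up features `X` and labels `y`.

```python
import numpy as np

# Made-up data: one feature vector per row, one label per row
X = np.array([[5.1, 3.5],
              [6.2, 2.9],
              [4.9, 3.0],
              [6.7, 3.1]])
y = np.array([0, 1, 0, 1])

# A boolean array built from the labels selects whole rows of X
class_1_rows = X[y == 1]
print(class_1_rows)

# Conditions can be combined with & (and) and | (or); the parentheses are required
mask = (X[:, 0] > 5.0) & (y == 0)
print(X[mask])

# np.where picks between two values element by element without modifying y
print(np.where(y == 1, "yes", "no"))
```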
