# Welcome to Artificial Intelligence, Part II

## This module will cover a sampler of machine learning and computer vision topics

## Today's agenda
* Basic concepts of machine learning
    * Basic terminology
    * Supervised learning concepts
    * Unsupervised learning concepts
    * Some machine learning algorithms
* Introduction to Jupyter Notebooks
* Introduction to NumPy
* Your lab!


# Machine Learning: The Basic Ideas
## Machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.

* It’s used to predict/detect --- Supervised learning
* It’s used to find patterns --- Unsupervised learning
* It’s used to compress data/find underlying relationships --- Dimensionality reduction
* Machine learning: algorithms to find statistical models


# A Few Terms
* Training example: a member of a data set
* Feature: an aspect of a training example (aka, independent variable)
* Feature vector: an array of feature values for a given training example
* Label (aka, dependent variable, y-value): what we are trying to predict
* Model: A function that can be used to predict labels given feature vectors
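
To make the vocabulary concrete, here is a minimal NumPy sketch. The tiny data set and the `predict` rule are hypothetical, made up for illustration: each row of `X` is one training example, each column is a feature, a single row is a feature vector, `y` holds the labels, and `predict` plays the role of a (very crude) model.

```python
import numpy as np

# Hypothetical toy data set: 4 training examples, 2 features each
# (say, square footage in thousands and number of bedrooms).
X = np.array([[1.0, 2],
              [1.5, 3],
              [2.0, 3],
              [2.5, 4]])

# Labels (dependent variable): hypothetical prices in units of $100k.
y = np.array([2.0, 2.9, 4.1, 5.0])

# One feature vector = one row of X.
first_feature_vector = X[0]
print("feature vector of example 0:", first_feature_vector)
print("label of example 0:", y[0])

# A model is just a function from a feature vector to a predicted label.
# This hypothetical rule ignores the second feature and doubles the first.
def predict(feature_vector):
    return 2.0 * feature_vector[0]

print("prediction for example 0:", predict(first_feature_vector))
```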

# Supervised Learning

* Data has both feature vectors and labels
* Creates a model to predict labels when only the feature vector is known
* Examples: predict which Enron employees will be prosecuted from their emails; predict housing prices from house features; detect whether the road curves left or right

## Below is a flow chart of the process for supervised learning

![supervised learning flowchart](imgs/supervised_learning.jpg)

* The parts with the blue arrows are the training part of the process. In training, we use labeled data and a machine learning algorithm to create a model that can make predictions
* The parts with the green arrows are the process of using an already trained model to make predictions on new data
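
The two halves of the flowchart (train on labeled data, then predict on new data) can be sketched with NumPy alone. This uses a nearest-centroid classifier, a deliberately simple stand-in for the scikit-learn algorithms used later; the function names `fit` and `predict` and the toy data are hypothetical.

```python
import numpy as np

def fit(X, y):
    """Training (blue arrows): learn one centroid (mean feature vector) per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(model, X_new):
    """Prediction (green arrows): assign each new example to the nearest centroid."""
    classes, centroids = model
    # distances[i, j] = Euclidean distance from new example i to centroid j
    diffs = X_new[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    return classes[np.argmin(distances, axis=1)]

# Hypothetical labeled training data: two features, two classes (0 and 1)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]])
y_train = np.array([0, 0, 1, 1])

model = fit(X_train, y_train)

# New data: only the feature vectors are known
X_new = np.array([[0.9, 1.1], [4.1, 3.9]])
print(predict(model, X_new))  # [0 1]
```

Any real classifier follows the same shape: a training step that turns labeled data into model parameters, and a prediction step that uses only those parameters and new feature vectors.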

# Classification versus Regression

* Classification: the label is a category (a discrete class, not a measured quantity)
* Regression: the label is a quantity (a continuous numeric value)
* A classification example: predicting iris flower type
  * Four features: petal length and width, sepal length and width
  * Three classes: Iris Setosa, Iris Versicolour, Iris Virginica
  * The Iris dataset is a standard example data set used in scikit-learn
  
  ![](imgs/iris.png)
  
* A regression example: predicting house price from square footage

  ![](imgs/houseprice.jpg)
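
The house-price example can be sketched as a straight-line fit. The four data points below are made up for illustration, and `np.polyfit` with degree 1 is used here as a simple ordinary least-squares regression; it is not necessarily the method behind the figure.

```python
import numpy as np

# Hypothetical training data: square footage (feature) and sale price (label)
sqft = np.array([1000, 1500, 2000, 2500])
price = np.array([205000, 270000, 355000, 420000])

# Fit price = slope * sqft + intercept by ordinary least squares
slope, intercept = np.polyfit(sqft, price, 1)
print(slope, intercept)  # approximately 146.0 and 57000.0

# Use the model to predict the price of an unseen 1800 sq ft house
predicted = slope * 1800 + intercept
print(predicted)  # approximately 319800.0
```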


# Unsupervised Learning

* The data is not labeled: the dependent variable (output variable) is not known
* Unsupervised algorithms try to find patterns in data
* Example: Pizza Hut deliveries, where clustering finds groups in unlabeled data
 ![](imgs/pizzahut.png)
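
Clustering of this kind can be sketched with k-means, one common clustering algorithm (the figure does not say which algorithm was used). The delivery coordinates are hypothetical, and `kmeans` is a small hand-written helper, not a library function; note that no labels are given, only points.

```python
import numpy as np

def kmeans(points, k, n_iters=10, seed=0):
    """Plain k-means: alternate between assigning points and moving centers."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random as the initial centers
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: index of the nearest center for every point
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: move each center to the mean of its assigned points
        # (keep the old center if a cluster ends up empty)
        centers = np.array([points[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

# Hypothetical (x, y) delivery addresses forming two neighborhoods
deliveries = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 1.4],
                       [8.0, 8.0], [8.5, 7.5], [7.8, 8.3]])

labels, centers = kmeans(deliveries, k=2)
print(labels)   # the cluster index found for each delivery
print(centers)  # one center per cluster, e.g. a candidate store location
```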


# Some Algorithms

* We will be using the scikit-learn machine learning library
* Here are some of the algorithms it supports, in an advisory flow-chart
 ![](imgs/ml_map.png)

# Introduction to Jupyter Notebooks

* Jupyter notebooks are a way to do Python programming with interactive cells
* The cells can be run independently
* Jupyter also has Markdown cells: cells used for text, images, diagrams, etc., for presentations or reports
* Jupyter allows the rendering of graphs using Python code from the code cells
* Jupyter comes with Anaconda
* Jupyter works with either Python 3 or Python 2
* To start Jupyter, go to the command line and type `jupyter notebook`. A page served from localhost will open in your browser; that is your Jupyter notebook home page, and you can create new notebooks from it using the simple GUI there
* Notebooks are served to your browser by a Jupyter server running on your own machine (at localhost)
* This document is a Jupyter notebook
* See demo of various features of Jupyter
* See https://www.cheatography.com/weidadeyue/cheat-sheets/jupyter-notebook/ for a Jupyter cheat sheet


# Very Brief Intro to Linear Algebra

# Introduction to NumPy

* NumPy is a package for scientific computing in Python
* It is often used for matrix manipulation using its most basic object, the NumPy array
* Many machine learning and computer vision libraries use NumPy arrays
* NumPy comes with Anaconda, though many people install it with pip as well
* See NumPy's quickstart tutorial at https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
* Let's learn by doing!

* First, you need to install Anaconda on your machine

For Windows, follow the instructions on this site: http://mathalope.co.uk/2015/05/07/opencv-python-how-to-install-opencv-python-package-to-anaconda-windows/

**Step-by-step instructions for Windows users are coming!**

* Now that Anaconda is installed and ready, let's try NumPy

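```python
# note: this demo was written for Python 2; the __future__ import below
# makes print() behave the same way in Python 2 and Python 3
from __future__ import print_function

# importing numpy with the usual alias
import numpy as np

a = np.array([1, 2, 3, 4, 5])

print(a)

# array of consecutive numbers
a = np.arange(12)

print(a)
```

```
[1 2 3 4 5]
[ 0  1  2  3  4  5  6  7  8  9 10 11]
```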


```python
# add arrays elementwise
a = np.arange(12)
b = np.arange(12)
print(a, b)
print(a + b)

# multiply by a scalar
a * 3
# note: * multiplies elementwise; it does not compute a dot product in NumPy
```

```
[ 0  1  2  3  4  5  6  7  8  9 10 11] [ 0  1  2  3  4  5  6  7  8  9 10 11]
[ 0  2  4  6  8 10 12 14 16 18 20 22]
```

```
array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27, 30, 33])
```
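
Since `*` is elementwise, a dot product needs `np.dot` (or the equivalent array method `.dot`). A short sketch with two small vectors, `u` and `v`, chosen for illustration:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

print(u * v)         # elementwise product: [ 4 10 18]
print(np.dot(u, v))  # dot product: 1*4 + 2*5 + 3*6 = 32
print(u.dot(v))      # same dot product via the array method: 32
```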

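```python
# get the element type of the array
print(a.dtype)  # int64 on most 64-bit Linux/macOS systems; may be int32 on Windows
# get the shape (a tuple with the length of each dimension)
print(a.shape, np.shape(a))
# get the total number of elements
print(a.size, np.size(a))
```

```
int64
(12,) (12,)
12 12
```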
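
The shape `(12,)` above means a 1-D array of 12 elements. A short sketch of `reshape`, which gives the same elements a new shape as long as the total size is unchanged; the variable name `m` is just for illustration:

```python
import numpy as np

# reshape returns a new array object (a view when possible); size must stay 12
m = np.arange(12).reshape(3, 4)
print(m)
print(m.shape, m.size, m.ndim)  # (3, 4) 12 2

# -1 asks NumPy to infer that dimension from the total size
print(np.arange(12).reshape(2, -1).shape)  # (2, 6)
```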


```python
# fill all values in an array
a.fill(0)
print(a)
a[:] = 1
print(a)
print(np.zeros((3, 2)))
print(np.ones((3, 2)))
# np.empty allocates without initializing: values are whatever was in memory
# (they happened to be zeros in this run)
print(np.empty((4, 5)))
# beware of type conversion: a float assigned into an integer array
# is truncated to that array's integer dtype
# a random matrix (uniform values in [0, 1))
r = np.random.rand(3, 3)
print(r)
```

```
[0 0 0 0 0 0 0 0 0 0 0 0]
[1 1 1 1 1 1 1 1 1 1 1 1]
[[ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]
[[ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]]
[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]
[[ 0.77557269  0.51734378  0.72375147]
 [ 0.69857473  0.56313202  0.5610686 ]
 [ 0.9907688   0.32696968  0.82852391]]
```
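
The type-conversion warning can be seen directly. A minimal sketch, using an illustrative `int32` array named `ints`: assigning a float silently drops the fractional part (truncating toward zero), and the array keeps its integer dtype until you convert it explicitly with `astype`.

```python
import numpy as np

ints = np.zeros(3, dtype=np.int32)
ints[0] = 2.7      # float assigned into an int array: fractional part dropped
ints[1] = -2.7     # truncation is toward zero, so this becomes -2
print(ints)        # [ 2 -2  0]
print(ints.dtype)  # int32: assignment does not change the array's dtype

# Explicit conversion creates a new float array that can hold 2.7
floats = ints.astype(np.float64)
floats[0] = 2.7
print(floats)
```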


```python
# fill all values in an array, continued
# 101 evenly spaced numbers from 0 to 50, endpoints included (step 0.5)
x = np.linspace(0, 50, 101)
print(x)
# apply a math function elementwise to transform the array
np.sin(x)
```

```
[  0.    0.5   1.    1.5   2.    2.5   3.    3.5   4.    4.5   5.    5.5
   6.    6.5   7.    7.5   8.    8.5   9.    9.5  10.   10.5  11.   11.5
  12.   12.5  13.   13.5  14.   14.5  15.   15.5  16.   16.5  17.   17.5
  18.   18.5  19.   19.5  20.   20.5  21.   21.5  22.   22.5  23.   23.5
  24.   24.5  25.   25.5  26.   26.5  27.   27.5  28.   28.5  29.   29.5
  30.   30.5  31.   31.5  32.   32.5  33.   33.5  34.   34.5  35.   35.5
  36.   36.5  37.   37.5  38.   38.5  39.   39.5  40.   40.5  41.   41.5
  42.   42.5  43.   43.5  44.   44.5  45.   45.5  46.   46.5  47.   47.5
  48.   48.5  49.   49.5  50. ]
```

```
array([ 0.        ,  0.47942554,  0.84147098,  0.99749499,  0.90929743,
        0.59847214,  0.14112001, -0.35078323, -0.7568025 , -0.97753012,
       -0.95892427, -0.70554033, -0.2794155 ,  0.21511999,  0.6569866 ,
        0.93799998,  0.98935825,  0.79848711,  0.41211849, -0.07515112,
       -0.54402111, -0.87969576, -0.99999021, -0.87545217, -0.53657292,
       -0.0663219 ,  0.42016704,  0.80378443,  0.99060736,  0.93489506,
        0.65028784,  0.20646748, -0.28790332, -0.71178534, -0.96139749,
       -0.97562601, -0.75098725, -0.34248062,  0.14987721,  0.60553987,
        0.91294525,  0.99682979,  0.83665564,  0.471639  , -0.00885131,
       -0.48717451, -0.8462204 , -0.99808203, -0.90557836, -0.59135753,
       -0.13235175,  0.35905835,  0.76255845,  0.97935764,  0.95637593,
        0.69924003,  0.27090579, -0.22375564, -0.66363388, -0.94103141,
       -0.98803162, -0.79312724, -0.40403765,  0.08397446,  0.55142668,
        0.88387042,  0.99991186,  0.87114   ,  0.52908269,  0.05
```
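
The third argument of `linspace` is a count of points, not a step, which is easy to mix up. A short sketch contrasting it with `np.arange`, which takes a step and excludes the stop value:

```python
import numpy as np

x = np.linspace(0, 50, 101)         # 101 points, both endpoints included
print(len(x), x[1] - x[0], x[-1])   # 101 0.5 50.0

# arange takes a step instead of a count and stops before the stop value
y = np.arange(0, 50, 0.5)
print(len(y), y[-1])                # 100 49.5
```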

## Slicing

```python
a = np.arange(2, 30)
print(a)
# last 2 elements
print(a[-2:])
# every other element
# the syntax is [start:stop:step], and stop is exclusive;
# an omitted start/stop means the beginning/end of the array, an omitted step means 1
print(a[::2])
```

```
[ 2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
 27 28 29]
[28 29]
[ 2  4  6  8 10 12 14 16 18 20 22 24 26 28]
```
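
A few more slicing patterns, sketched on the same `a` plus an illustrative 3x4 array `m`: an explicit stop (which is excluded), a negative step to reverse, and one slice per axis for 2-D arrays.

```python
import numpy as np

a = np.arange(2, 30)
print(a[:3])        # first 3 elements: [2 3 4]
print(a[3:6])       # indices 3, 4, 5 (stop is exclusive): [5 6 7]
print(a[::-1][:3])  # a negative step reverses the array: [29 28 27]

# Two-dimensional arrays take one slice (or index) per axis
m = np.arange(12).reshape(3, 4)
print(m[1, :])      # second row: [4 5 6 7]
print(m[:, 0])      # first column: [0 4 8]
```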


```python
a = np.array([[1, 1], [0, 1]])
b = np.array([[2, 0], [3, 4]])
print(a * b)     # elementwise product
print(a.dot(b))  # matrix product
```

```
[[2 0]
 [0 4]]
[[5 4]
 [3 4]]
```
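
In Python 3.5 and later, the same matrix product can also be written with the `@` operator or `np.matmul`. A short sketch reusing the matrices above, which also shows that the order of the factors matters:

```python
import numpy as np

a = np.array([[1, 1], [0, 1]])
b = np.array([[2, 0], [3, 4]])

# Equivalent spellings of the matrix product for 2-D arrays
print(a @ b)            # [[5 4]
                        #  [3 4]]
print(np.matmul(a, b))
print(np.dot(a, b))

# Matrix multiplication is not commutative
print(b @ a)            # [[2 2]
                        #  [3 7]]
```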


```python
# upcasting to the more general type
a = np.ones(3, dtype=np.int32)  # dtype int32
b = np.linspace(0, np.pi, 3)
b.dtype.name  # 'float64'
c = a + b
c.dtype.name  # 'float64'
```

```
'float64'
```
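
A short sketch of the same upcasting rule with a couple more type combinations; the names `i32`, `i64` and `f64` are illustrative. `np.result_type` reports the resulting dtype without doing any arithmetic.

```python
import numpy as np

i32 = np.ones(3, dtype=np.int32)
i64 = np.ones(3, dtype=np.int64)
f64 = np.linspace(0, np.pi, 3)

print((i32 + i64).dtype)  # int64: the wider integer type wins
print((i32 + f64).dtype)  # float64: integers are upcast to float
print(np.result_type(np.int32, np.float64))  # float64, computed from the types alone
```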