# Welcome to Artificial Intelligence, Part II

## This module offers a sampler of machine learning and computer vision

## Today's agenda
* Basic concepts of machine learning
    * Basic terminology
    * Supervised learning concepts
    * Unsupervised learning concepts
    * Some machine learning algorithms
* Introduction to Jupyter Notebooks
* Introduction to NumPy
* Your lab!


# Machine Learning: The Basic Ideas
## Machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.

* It’s used to predict/detect --- Supervised learning
* It’s used to find patterns --- Unsupervised learning
* It’s used to compress data/find underlying relationships --- Dimensionality reduction
* Machine learning: algorithms to find statistical models


# A Few Terms
* Training example: a member of a data set
* Feature: an aspect of a training example (aka, independent variable)
* Feature vector: an array of feature values for a given training example
* Label (aka, dependent variable, y-value): what we are trying to predict
* Model: A function that can be used to predict labels given feature vectors
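
To make these terms concrete, here is a minimal sketch using a made-up data set. The numbers, the feature choice (square footage and bedrooms), and the names `X`, `y`, and `model` are illustrative assumptions, and the "model" is a hand-written rule rather than one learned from data.

```python
import numpy as np

# Each row is one training example; each column is one feature.
# Hypothetical features: square footage, number of bedrooms.
X = np.array([
    [1400.0, 3.0],
    [2000.0, 4.0],
    [850.0, 2.0],
])

# One label per training example (hypothetical sale price in dollars).
y = np.array([240000.0, 330000.0, 150000.0])

# The feature vector of the first training example.
first_example = X[0]

def model(feature_vector):
    """A toy, hand-written model: maps a feature vector to a predicted label."""
    square_feet, bedrooms = feature_vector
    return 150.0 * square_feet + 10000.0 * bedrooms

print(X.shape)               # (3, 2): 3 training examples, 2 features each
print(first_example)         # the feature vector of example 0
print(model(first_example))  # the predicted label for example 0
```

In real use, the model would be produced by a learning algorithm from `X` and `y` rather than written by hand.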

# Supervised Learning

* Data has both feature vectors and labels
* Creates a model to predict labels when only the feature vector is known
* Examples: predicting from emails which Enron employees will be prosecuted; predicting housing prices from house features; detecting whether the road curves left or right

## Below is a flow chart of the process for supervised learning

![supervised learning flowchart](imgs/supervised_learning.jpg)

* The parts with the blue arrows are the training part of the process. In training, we use labeled data and a machine learning algorithm to create a model that can make predictions
* The parts with the green arrows are the process of using an already trained model to make predictions on new data
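
The two halves of the flow chart can be sketched as two functions: one that trains a model from labeled data, and one that uses the trained model on new data. This is a minimal sketch under stated assumptions: the nearest-centroid rule, the function names `train` and `predict`, and the data are all chosen for illustration, and plain NumPy is used to keep the example self-contained.

```python
import numpy as np

def train(features, labels):
    """Training step: compute one centroid (mean feature vector) per class."""
    classes = np.unique(labels)
    centroids = np.array([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(model, new_features):
    """Prediction step: give each new example the class of its nearest centroid."""
    classes, centroids = model
    # distances[i, j] = distance from new example i to centroid j
    differences = new_features[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    distances = np.linalg.norm(differences, axis=2)
    return classes[np.argmin(distances, axis=1)]

# Made-up labeled training data: two features per example, labels 0 or 1.
X_train = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
                    [4.0, 4.2], [3.9, 4.1], [4.2, 3.8]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Training: labeled data + algorithm -> model
model = train(X_train, y_train)

# Prediction: trained model + new feature vectors -> predicted labels
X_new = np.array([[1.0, 1.0], [4.0, 4.0]])
predictions = predict(model, X_new)
print(predictions)  # [0 1]
```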

# Classification versus Regression

* Classification: The label is a category (a discrete class, with no notion of degree)
* Regression: The label is a quantity (a numeric value that varies by degree)
* A classification example: Predicting iris flower type:
  * Four features: Petal length and width, sepal length and width
  * Three kinds: Iris Setosa, Iris Versicolour, Iris Virginica
  * The Iris dataset is a standard example used in scikit-learn
  
  ![](imgs/iris.png)
  
* A regression example: predicting house price from square footage

  ![](imgs/houseprice.jpg)
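
As a rough sketch of the regression case, the snippet below fits a straight line to made-up square-footage and price data with `np.polyfit` and then predicts a price for a new house. The data values and the name `predict_price` are assumptions for illustration only; a classifier would return a category (such as an iris species) instead of a number.

```python
import numpy as np

# Made-up training data: the feature is square footage, the label is price.
square_feet = np.array([1000.0, 1500.0, 2000.0, 2500.0])
price = np.array([150000.0, 200000.0, 250000.0, 300000.0])

# Fit a degree-1 polynomial (a line); coefficients come back highest degree first.
slope, intercept = np.polyfit(square_feet, price, deg=1)

def predict_price(sqft):
    """The fitted model: a quantitative label from one feature."""
    return slope * sqft + intercept

predicted = predict_price(1800.0)
print(round(predicted))  # 230000
```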


# Unsupervised Learning

* The data is not labeled: the dependent variable (output variable) is not known
* Unsupervised algorithms try to find patterns in data
* Example: Pizza Hut delivery --- use clustering to find groups in unlabeled data
 ![](imgs/pizzahut.png)
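
Clustering can be sketched with a bare-bones k-means loop in NumPy. This is only an illustration under stated assumptions: the points stand in for hypothetical customer locations, the function name `kmeans` and its parameters are invented here, and the starting centroids are picked by hand so the result is repeatable.

```python
import numpy as np

def kmeans(points, initial_centroids, n_iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    centroids = initial_centroids.astype(float).copy()
    for _ in range(n_iterations):
        differences = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
        distances = np.linalg.norm(differences, axis=2)
        assignments = np.argmin(distances, axis=1)
        for k in range(len(centroids)):
            members = points[assignments == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return assignments, centroids

# Unlabeled, made-up 2-D points (for example, customer locations).
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                   [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])

# Hand-picked starting centroids: the first and fourth points.
initial = points[[0, 3]]

assignments, centroids = kmeans(points, initial)
print(assignments)  # [0 0 0 1 1 1]
print(centroids)
```

No labels are supplied anywhere; the group numbers come entirely from the structure of the data.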


# Some Algorithms

* We will be using the scikit-learn machine learning library
* Here are some of the algorithms it supports, in an advisory flow-chart
 ![](imgs/ml_map.png)

# Installing Scientific Computing Frameworks

## Using Python from Anaconda
* The Anaconda package is here: https://www.anaconda.com/download/#macos
* Anaconda includes everything we need for data science: Jupyter, NumPy, scikit-learn, and matplotlib. It does not include OpenCV (we will install that later) or deep learning libraries




# Introduction to Jupyter Notebooks

* Jupyter notebooks are a way to do Python programming with interactive cells
* The cells can be run independently
* Jupyter also has Markdown cells: cells that are used for text, images, diagrams, etc., for presentations or reports
* Jupyter allows the rendering of graphs using Python code from the code cells
* Jupyter comes with Anaconda
* Jupyter works with either Python 3 or Python 2
* To start Jupyter, go to the command line and type: jupyter notebook. A web page at localhost will open in your browser; that is your Jupyter notebook home page. You can create new notebooks from that page using its GUI
* Jupyter programs are served to you via your browser; the server lives at localhost
* This document is a Jupyter notebook
* See demo of various features of Jupyter
* See https://www.cheatography.com/weidadeyue/cheat-sheets/jupyter-notebook/ for a Jupyter cheat sheet


# Very Brief Intro to Linear Algebra

# Introduction to NumPy

* NumPy is a package for scientific computing in Python
* It is often used for matrix manipulation using its most basic object, the NumPy array
* Many machine learning and computer vision libraries use NumPy arrays
* NumPy comes with Anaconda, though many people install it with pip as well
* See NumPy's quick start tutorial at https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
* Let's learn by doing!

* First, we need to install Anaconda on your machine

For Windows, we will follow the instructions on this site: http://mathalope.co.uk/2015/05/07/opencv-python-how-to-install-opencv-python-package-to-anaconda-windows/

**Step-by-step instructions coming for Windows users!**

* Now that Anaconda is installed and ready, let's try out NumPy

In [3]:
# note, I will be using python 2 for this demo
# importing numpy with the usual alias
import numpy as np

a = np.array([1,2,3,4,5])

print(a)

#array of consecutive numbers
a = np.arange(12)

print(a)


[1 2 3 4 5]
[ 0  1  2  3  4  5  6  7  8  9 10 11]


In [4]:
#add arrays
a = np.arange(12)
b = np.arange(12)
print (a, b)
print (a + b)

#multiply by scalar
a * 3
# note: * does not do dot product in numpy


[ 0  1  2  3  4  5  6  7  8  9 10 11] [ 0  1  2  3  4  5  6  7  8  9 10 11]
[ 0  2  4  6  8 10 12 14 16 18 20 22]


array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27, 30, 33])

In [5]:
#get types of elements of array
print (a.dtype)
#get shape
print (a.shape, np.shape(a))
#get size
print (a.size, np.size(a))

int64
(12,) (12,)
12 12


In [13]:
#Using the array constructor
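a = np.array([2, 3, 4])
a.dtype  # integer input gives an integer dtype (int64 on this machine)
b = np.array([1.2, 3.5, 5.1])
b.dtype  # floating-point input gives float64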

#The type of the array can also be explicitly specified at 
#creation time
c = np.array([ [1, 2], [3, 4] ], dtype=complex)
print(c)


[[ 1.+0.j  2.+0.j]
 [ 3.+0.j  4.+0.j]]


In [29]:
#fill all values in an array
a.fill(0)
print(a)
a[:] = 1
print(a)
print("a matrix of zeros {}".format(np.zeros((3, 2))))
print("a matrix of ones {}".format(np.ones((3, 2))))

#a random matrix
print("a random matrix")
r = np.random.rand(2, 3)
print(r)

[0 0 0]
[1 1 1]
a matrix of zeros [[ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]
a matrix of ones [[ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]]
a random matrix
[[ 0.32908572  0.90604583  0.01593509]
 [ 0.52042461  0.14939923  0.55723097]]


In [42]:
#evenly spaced values with linspace
#6 numbers from 0 to 50, endpoints included
x = np.linspace(0, 50, 6)
print(x)
# use a math function to transform
print(np.sin(x))

[  0.  10.  20.  30.  40.  50.]
[ 0.         -0.54402111  0.91294525 -0.98803162  0.74511316 -0.26237485]


In [48]:
# arange
a = np.arange(6)
print(a)
b = np.arange(12).reshape(4,3)
print(b)
# won't work, since 12 elements cannot fill a 5x5 array: print(b.reshape(5,5))

# first arg, beg(incl), second, end(non-incl), third, step
c = np.arange(1, 13, 2)
print(c)


[0 1 2 3 4 5]
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
[ 1  3  5  7  9 11]
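
The rule behind reshape is that the total number of elements must stay the same. The sketch below (variable names such as `inferred` and `reshape_failed` are illustrative) shows a few valid shapes for 12 elements, the use of -1 to let NumPy infer one dimension, and the error raised for an impossible shape.

```python
import numpy as np

b = np.arange(12)

# Any shape whose dimensions multiply to 12 is allowed.
print(b.reshape(4, 3).shape)  # (4, 3)
print(b.reshape(2, 6).shape)  # (2, 6)

# -1 asks NumPy to work out that dimension from the element count.
inferred = b.reshape(3, -1)
print(inferred.shape)         # (3, 4)

# 5 * 5 = 25 elements are needed, but only 12 exist, so this raises ValueError.
try:
    b.reshape(5, 5)
except ValueError as error:
    reshape_failed = True
    print("reshape failed:", error)
else:
    reshape_failed = False
```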


Basic operations

In [50]:
a = np.array( [20, 30, 40, 50] )
print(a)
b = np.arange(4)
print(b)
c = a-b
print(c)
print(b**2)
print(10 * np.sin(a))
print(a < 35)
a += 2
print(a)

[20 30 40 50]
[0 1 2 3]
[20 29 38 47]
[0 1 4 9]
[ 9.12945251 -9.88031624  7.4511316  -2.62374854]
[ True  True False False]
[22 32 42 52]
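
A comparison such as a < 35 produces a boolean array, and that array can itself be used to select or count elements. A small sketch, with the names `mask`, `selected`, `count`, and `capped` chosen here for illustration:

```python
import numpy as np

a = np.array([20, 30, 40, 50])

# Elementwise comparison gives one True/False per element.
mask = a < 35
print(mask)      # [ True  True False False]

# Boolean indexing keeps only the elements where the mask is True.
selected = a[mask]
print(selected)  # [20 30]

# True counts as 1, so summing the mask counts the matches.
count = mask.sum()
print(count)     # 2

# np.where picks from a where the condition holds, and uses 35 elsewhere.
capped = np.where(a < 35, a, 35)
print(capped)    # [20 30 35 35]
```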


Slicing

In [11]:
a = np.arange(2, 30)
print(a)
#last 2 elements
print(a[-2:])
#every other element
#the syntax: [first:last:step], where last is excluded
# if first or last is blank, the slice runs from the start or through the end of the array
print(a[::2])

[ 2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
 27 28 29]
[28 29]
[ 2  4  6  8 10 12 14 16 18 20 22 24 26 28]
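
The same slice syntax extends to two-dimensional arrays, with one slice per axis separated by a comma. The sketch below also shows a point that often surprises newcomers: a slice is a view onto the original data, so writing through it changes the original array. Variable names are illustrative.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
print(m)

# One index or slice per axis: [rows, columns]
print(m[0])         # first row: [0 1 2 3]
print(m[:, 1])      # second column: [1 5 9]

# Rows 1 onward, every other column.
block = m[1:, ::2]
print(block)        # [[ 4  6]
                    #  [ 8 10]]

# A slice is a view: assigning through it modifies m itself.
view = m[0, :2]
view[:] = -1
print(m[0])         # [-1 -1  2  3]

# Use copy() when an independent array is needed.
independent = m[1].copy()
independent[:] = 0
print(m[1])         # [4 5 6 7], unchanged
```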


In [10]:
a = np.array( [[1, 1], [0, 1]] )
b = np.array( [[2, 0], [3, 4]] )
print(a * b)   # elementwise product 
print(a.dot(b)) # matrix product

[[2 0]
 [0 4]]
[[5 4]
 [3 4]]
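
Besides the dot method, NumPy offers two other spellings of the matrix product. The sketch below assumes Python 3.5 or later for the @ operator; the variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 1], [0, 1]])
b = np.array([[2, 0], [3, 4]])

product_dot = a.dot(b)            # method form
product_at = a @ b                # operator form (Python 3.5 or later)
product_matmul = np.matmul(a, b)  # function form

print(product_at)                 # [[5 4]
                                  #  [3 4]]

# A matrix times a 1-D array gives a 1-D array.
v = np.array([1, 2])
matrix_vector = a @ v
print(matrix_vector)              # [3 2]

# Unlike the elementwise product, the matrix product depends on order.
print(np.array_equal(a * b, b * a))  # True
print(np.array_equal(a @ b, b @ a))  # False
```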


In [46]:
#upcasting to a more general type
a = np.ones(3, dtype=np.int32) #dtype int32

#using type coercion
b = np.linspace(0, np.pi, 3)
b.dtype.name #'float64'
c = a + b
c.dtype.name #'float64'

'float64'
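
Upcasting can be checked without doing any arithmetic by asking NumPy which dtype a combination would produce. The sketch below uses `np.result_type` for that, and `astype` for an explicit conversion back to integers, which discards the fractional part. Variable names such as `combined` and `truncated` are illustrative.

```python
import numpy as np

# Ask NumPy which dtype results from mixing int32 and float64.
combined = np.result_type(np.int32, np.float64)
print(combined)         # float64

a = np.ones(3, dtype=np.int32)
b = np.linspace(0, np.pi, 3)
c = a + b               # upcast to float64 so no information is lost
print(c.dtype.name)     # float64

# Converting back to int32 must be requested explicitly; it drops the fraction.
truncated = c.astype(np.int32)
print(truncated)        # [1 2 4]
```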