ml-nanodegree

Course Notes

Notes and projects from the Udacity Machine Learning Nanodegree program.

0. Foundations

  1. Artificial Intelligence.
  2. Machine Learning.
  3. Data Science.

1. Model Evaluation and Validation

Project 1: Predict Boston House Prices

  • In this project we use Decision Tree Regression to predict house prices.
  • We have 506 examples in our dataset and 13 features for each example.
  • We explore the data and discuss the nature of the problem.
  • We discuss the available performance metrics and choose an appropriate one.
  • We split our dataset into training and testing sets and discuss grid search and cross-validation (a minimal sketch of this pipeline follows the list).
  • We look at learning curves and model complexity graphs to inform our choice of complexity for the model.
  • Having chosen a model we make a prediction for the price of a house given its features.
  • We sanity check the predicted price by comparing it with the statistics for our dataset, and use k-nearest neighbours to compare it with the 10 most similar houses.
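
A minimal sketch of this pipeline in scikit-learn. The arrays below are random placeholders standing in for the real 506×13 Boston data, and the depth range is illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

# Placeholder data standing in for the 506 examples x 13 features and prices
X = np.random.rand(506, 13)
y = np.random.rand(506)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Cross-validated grid search over the tree's maximum depth (model complexity)
grid = GridSearchCV(DecisionTreeRegressor(random_state=0),
                    param_grid={"max_depth": list(range(1, 11))},
                    scoring="r2", cv=5)
grid.fit(X_train, y_train)

best = grid.best_estimator_
print("best max_depth:", grid.best_params_["max_depth"])
print("test R^2:", r2_score(y_test, best.predict(X_test)))
```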

2. Supervised Learning

Project 2: Build a Student Intervention System

  • In this project we are interested in building a system which detects when a student is at risk of failing and would benefit from early intervention.
  • We have data for 395 students and 30 features for each example in addition to whether they passed or failed.
  • We discuss the nature of the data we have.
  • The data requires a fair amount of processing, transforming features to binary and removing redundant data.
  • We discuss the performance metric F1 score and how to make appropriate choices to ensure it is meaningful.
  • We discuss how to split the data into training and testing sets to account for the class imbalance.
  • We calculate baseline performance metrics for our classifier.
  • We choose 3 models (Support Vector Machine, K-Nearest Neighbours and Bernoulli Naive Bayes), discuss the pros and cons of each and choose our final model based on performance (Bernoulli Naive Bayes).
  • We discuss how our chosen model works and fine-tune it, reporting the final F1 score and the parameters that yielded it (a minimal version of the model is sketched after this list).
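
A minimal version of the final model, hedged: the features and labels below are synthetic stand-ins for the 395-student dataset, and `alpha` is the default smoothing value, not the tuned one:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(395, 30))  # stand-in binary features
y = rng.integers(0, 2, size=395)        # stand-in pass/fail labels

# A stratified split preserves the pass/fail imbalance in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

clf = BernoulliNB(alpha=1.0)  # alpha is the Laplace smoothing parameter
clf.fit(X_train, y_train)
print("F1:", f1_score(y_test, clf.predict(X_test)))
```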

3. Unsupervised Learning

Project 3: Creating Customer Segments

  • In this project we are interested in segmenting the customers of a wholesale retailer so that when making changes, the retailer can assess the impact on the different types of customers independently.
  • The retailer has data for 440 customers, specifically their spending in 6 product categories (Fresh, Milk, Grocery, Frozen, Detergents_Paper, Delicatessen).
  • We look for relationships between the features:
    1. We see if we can predict one feature given the others using a Decision Tree Regressor.
    2. We look at the feature distributions and scatter plots for all pairs of features looking for correlation.
  • Noting that all the distributions are positively skewed, we transform our data into log space.
  • We use Tukey's method to identify outliers and from these select which data points to remove from our analysis.
  • We compute a correlation matrix and visualise it as a heat map.
  • We use principal component analysis to reduce the dimensionality of the problem from 6 to 2 (these 2 dimensions explain 73% of the variance in the data).
  • We use K-means and Gaussian Mixture Model clustering algorithms in our reduced space to find customer segments (a compressed sketch of this pipeline follows the list).
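
A compressed sketch of this pipeline. The spending figures below are synthetic log-normal stand-ins for the real 440-customer data, and the parameters are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

cols = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicatessen"]
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.lognormal(mean=8.0, sigma=1.0, size=(440, 6)), columns=cols)

log_data = np.log(data)  # undo the positive skew

# Tukey's method: flag points more than 1.5 IQRs outside the quartiles
outliers = set()
for col in log_data.columns:
    q1, q3 = np.percentile(log_data[col], [25, 75])
    step = 1.5 * (q3 - q1)
    flagged = log_data.index[(log_data[col] < q1 - step) | (log_data[col] > q3 + step)]
    outliers.update(flagged)
good_data = log_data.drop(index=sorted(outliers))

# Reduce the 6 spending dimensions to 2 principal components
pca = PCA(n_components=2).fit(good_data)
reduced = pca.transform(good_data)
print("explained variance:", pca.explained_variance_ratio_.sum())

# Cluster in the reduced space to find customer segments
gmm = GaussianMixture(n_components=2, random_state=0).fit(reduced)
segments = gmm.predict(reduced)
```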

4. Reinforcement Learning

Project 4: Train a Smart Cab

  • In this project we use reinforcement learning to teach a cab how to efficiently navigate an idealised grid-like city to get from A to B within a specified time limit.
  • The cab must decide, based on its inputs, whether to turn left, turn right, go straight or stay put. The inputs include:
    • the direction it would ideally move in next to get to the destination as quickly as possible (left / right / straight),
    • the traffic light colour (red / green),
    • location of other vehicles at the intersection and the direction they intend to travel in (US right of way rules apply),
    • time remaining.
  • We implement an initial (not so smart) cab which navigates the city randomly.
  • We reduce the state space by considering which information is necessary in order to know if actions are legal.
  • We implement Q-learning and discuss the choices for the initial action (the core update rule is sketched after this list).
  • We apply reinforcement learning and tweak parameters (discount factor, initial Q-values) to improve performance.
  • We find and discuss the optimal policy and analyse the penalty/reward structure.
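
The core Q-learning update in isolation, as a sketch: the state encoding, reward value and hyperparameters below are illustrative assumptions, not the project's tuned values:

```python
import random
from collections import defaultdict

ACTIONS = [None, "forward", "left", "right"]  # stay put or move
alpha, gamma, epsilon = 0.5, 0.3, 0.1         # learning rate, discount factor, exploration rate
Q = defaultdict(float)                        # Q-table, zero-initialised by default

def choose_action(state):
    # Epsilon-greedy: mostly exploit the current Q-values, occasionally explore
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    # Q-learning: move Q(s, a) toward reward + gamma * max_a' Q(s', a')
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Illustrative step: a state might pair the waypoint with the light colour
s = ("forward", "green")
a = choose_action(s)
update(s, a, reward=2.0, next_state=("left", "red"))
```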

5. Capstone Project

Project 5: Digit Recognition in Natural Scene Images

  • We design a system which, given a natural image of a digit, is capable of recognising the digit at the centre of it.
  • We use neural networks, and more specifically TensorFlow, to do this.
  • We train and test our system on the Street View House Numbers (SVHN) dataset obtained from Google Street View images.
  • The dataset is large, containing 630,420 labelled colour images, each a 32x32x3 array.
  • We download and extract the data.
  • We choose our performance metric and explore the data, looking at the distribution of digits to find a lower bound for it; we also note human performance on the dataset (98%).
  • We convert the images to greyscale to reduce their size, normalise them, and save the data in a format that makes it possible to access it as NumPy arrays.
  • We discuss the algorithms and techniques we use: deep neural networks, stochastic gradient descent, Xavier weight initialisation, L2 regularisation and dropout.
  • Our general approach is to start with a simple model and gradually increase its complexity. The size of the dataset is such that, once the model was complex enough, training the network became computationally too expensive and we were constrained by our hardware.
  • A three-layer network is sufficient to get 93% accuracy; a network in this style is sketched below.
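
A hedged sketch of a network in this style, written against the modern tf.keras API rather than the original TensorFlow code; the layer widths and regularisation strength are assumptions (Dense layers default to Glorot, i.e. Xavier, initialisation):

```python
import tensorflow as tf

l2 = tf.keras.regularizers.l2(1e-4)  # assumed L2 strength

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 1)),  # 32x32 greyscale images
    tf.keras.layers.Flatten(),
    # Three hidden layers with L2 regularisation and dropout, per the text
    tf.keras.layers.Dense(1024, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(512, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(256, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit class
])
model.compile(optimizer="sgd",  # stochastic gradient descent, as in the project
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```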
