personal_notes.txt

=============
Introduction:
=============
Machine learning grew out of work in AI.
It is a new capability for computers.
Examples:

- Database mining -- large datasets from the growth of automation and the web, e.g. web click data, medical records, biology, engineering
- Applications we can't program by hand -- autonomous helicopter, handwriting recognition, NLP (learning by parsing text), computer vision (learning by parsing images)
- Self-customizing programs -- Amazon, Netflix recommendations
- Understanding human learning (brain, real AI)

==========================
What is machine learning?
==========================
"A computer program is said to learn from experience E with respect to some task T, and some performance measure P, if its performance on T, as measured by P, improves with experience E"
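A spam filter is a basic example: the program learns from the actions we take on our mail (whether or not we mark a message as spam), so the next time a similar message arrives it tries to put it in the spam folder. Here T is classifying mail as spam or not spam, E is watching us label mail, and P is the fraction of mail classified correctly.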
Machine learning algorithms:

- Supervised learning
- Unsupervised learning

Others:

- Reinforcement learning
- Recommender systems

====================
Supervised learning
====================
In supervised learning the "right answers" are given: every training example comes with its correct output.
Example: housing price prediction.
This is a regression problem. Regression means we predict a continuous-valued output (here, the price).
Example: breast-cancer study.
This kind of problem is a classification problem. We predict a discrete-valued output.
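
A small sketch of the difference between the two problem types. The data values are made up, and the nearest-neighbour rule in `predict_nearest` (a hypothetical helper) is only a stand-in for a real learning algorithm, chosen because it is short. The point is that the "right answers" in the training data are continuous numbers for regression and discrete classes for classification.

```python
# Each training example is (input, right answer). All values are invented.
housing = [(1000, 200.0), (1500, 290.5), (2000, 410.0)]  # (size, price): continuous label
tumors = [(1.0, 0), (2.5, 0), (4.0, 1), (5.5, 1)]        # (size, class): discrete label (assumed 0/1 coding)


def predict_nearest(examples, x_new):
    """Return the label of the training example whose input is closest to x_new."""
    _, y = min(examples, key=lambda ex: abs(ex[0] - x_new))
    return y


price = predict_nearest(housing, 1600)  # regression-style output: a price
label = predict_nearest(tumors, 4.2)    # classification-style output: a class
print(price, label)
```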

=====================
Unsupervised learning
=====================
Unlike supervised learning, we are not given any right answers. We are just given data that has no labels and asked to find some structure in it. One kind of unsupervised algorithm breaks the data into clusters; this is called a clustering algorithm. Examples of unsupervised learning: Google News, understanding genome data, organizing large computer clusters, social network analysis, the cocktail party algorithm (separating audio sources).
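
A minimal sketch of clustering on unlabeled data. These notes do not name a specific algorithm, so k-means is an assumed choice here, restricted to one-dimensional points with made-up values and hand-picked starting centers.

```python
def kmeans_1d(points, centers, iterations=10):
    """Group unlabeled numbers around the nearest of the given starting centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [
            sum(c) / len(c) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
    return centers, clusters


points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]  # made-up data, no labels
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)   # [1.5, 9.5]
print(clusters)  # [[1.0, 1.5, 2.0], [9.0, 9.5, 10.0]]
```

Nothing tells the algorithm which group a point belongs to; the two groups come out of the data alone.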
The Octave/Matlab programming environment is a great way to write and prototype learning algorithms. Using these environments can reduce an algorithm to a few lines of code. For example, a linear algebra function such as svd is built in, so prototyping with it is much easier and faster in Octave/Matlab than in C++/Java.

===================================
Linear regression with one variable
===================================
Model representation
Notation
m = number of training examples (the number of rows in the table)
x's = "input" variables/features
y's = "output" variable/"target" variable

(x,y) - a single training example
(x(i), y(i)) - the ith training example, i.e. the ith row of the table
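
The notation can be sketched in code as follows. The numbers are illustrative only, and `example` is a hypothetical helper that converts the 1-based index i used in these notes to Python's 0-based list index.

```python
# Training set as a table: each row is one (x, y) pair, e.g. (house size, price).
training_set = [(2104, 460), (1416, 232), (1534, 315), (852, 178)]

m = len(training_set)  # m = number of training examples (rows)


def example(i):
    """Return (x(i), y(i)), the ith training example, with i starting at 1."""
    return training_set[i - 1]


x_2, y_2 = example(2)
print(m, x_2, y_2)  # 4 1416 232
```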

Example:
Training set -> Learning algorithm -> h (hypothesis)

                      size of the house -> h (hypothesis) -> estimated price 

h is a function which maps x's to y's
* "Hypothesis" is not a great name for this function, but it has been used since the early days of ML, so it has stuck.

When designing a learning algorithm, we need to decide how to represent this hypothesis function.

How to represent h?
h(x) = theta 0 + theta 1 * x

When we plot x against y, this function is a straight line.
This is called linear regression with one variable (x), or univariate linear regression
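
As a sketch, the hypothesis is just a two-parameter function. The parameter values below are arbitrary and chosen only to make the arithmetic easy to check.

```python
def h(x, theta0, theta1):
    """Univariate linear hypothesis: h(x) = theta 0 + theta 1 * x."""
    return theta0 + theta1 * x


# theta 0 is the intercept, theta 1 is the slope of the line.
print(h(4, theta0=1.0, theta1=0.5))  # 3.0
print(h(0, theta0=1.0, theta1=0.5))  # 1.0 (at x = 0 the prediction is theta 0)
```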

==============
Cost function
==============
The objective of linear regression is to minimize the cost function. The cost function lets us figure out how to fit the best straight line to our data.
We have a training set and the hypothesis h(x) = theta 0 + theta 1 * x; the theta's are called parameters. The question is how to choose the values of theta 0 and theta 1.
In linear regression we have a training set, and we want to come up with values for theta 0 and theta 1 so that the resulting straight line fits the data well. So, choose theta 0 and theta 1 so that h(x) is close to y for our training examples (x,y).
Mathematically, in linear regression we choose theta 0 and theta 1 to minimize the squared difference between the predicted price of each house and the price at which the house actually sold.
To summarize: find the values of theta 0 and theta 1 for which the hypothesis h(x) = theta 0 + theta 1 * x minimizes J(theta 0, theta 1). J(theta 0, theta 1) is called the cost function:
J(theta 0, theta 1) = 1/(2m) * sum over i = 1..m of (h(x(i)) - y(i))^2
This cost function is also called the squared error function. It is the most commonly used cost function for regression problems.
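
A minimal sketch of the squared error cost: sum the squared differences between h(x(i)) and y(i) over all m training examples, then divide by 2m. The training set is made up so that the line y = x fits it exactly; the function name `cost` is an assumption.

```python
def cost(theta0, theta1, training_set):
    """Squared error cost J(theta 0, theta 1) for h(x) = theta 0 + theta 1 * x."""
    m = len(training_set)
    total = sum((theta0 + theta1 * x - y) ** 2 for x, y in training_set)
    return total / (2 * m)


training_set = [(1, 1), (2, 2), (3, 3)]  # made-up data lying on the line y = x

print(cost(0.0, 1.0, training_set))  # 0.0, a perfect fit
print(cost(0.0, 0.5, training_set))  # about 0.58, a worse fit
print(cost(0.0, 0.0, training_set))  # about 2.33, worse still
```

The further the line is from the data, the larger J gets, which is why minimizing J picks the best-fitting line.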