In [None]:
# Reference: the ISLR book,
# "An Introduction to Statistical Learning"

In [None]:
# Read Chapters 1 and 2 of ISLR

In [11]:
# - Machine Learning is a method of data analysis that automates analytical model building.
# - Using algorithms that iteratively learn from data, machine learning allows computers to find
# hidden insights without being explicitly programmed where to look.

In [5]:
# What is it used for?
# - Fraud detection
# - Web search results
# - Real-time ads on web pages
# - Credit scoring and next-best offers
# - Prediction of equipment failures
# - New pricing models
# - Network intrusion detection
# - Recommendation engines
# - Customer segmentation
# - Text sentiment analysis
# - Predicting customer churn
# - Pattern and image recognition
# - Email spam filtering
# - Financial modeling

In [6]:
# Machine Learning Process
# - Data Acquisition
# - Data Cleaning
# - Split Data into Train Data and Test Data
# - Train Model on the Train Data
# - Evaluate Model on the Test Data
# - Deploy Model

In [7]:
# Types of Machine Learning
# - There are 3 main types of Machine Learning algorithms:
#     - Supervised Learning
#     - Unsupervised Learning
#     - Reinforcement Learning

In [12]:
# Supervised Learning
# - You have labeled data and are trying to predict a label based on known features.
# - Algorithms are trained using labeled examples, such as an input where the desired output is known.
# - Ex: a piece of equipment could have data points labeled either "F" (failed) or "R" (runs).
# - The learning algorithm receives a set of inputs along with the corresponding correct outputs,
#   and the algorithm learns by comparing its actual output with the correct outputs to find errors.
# - Through methods like classification, regression, prediction and gradient boosting, supervised learning
#   uses patterns to predict the value of the label on additional unlabeled data.
# - Supervised learning is commonly used in applications where historical data predicts likely future events.
# - Ex: It can anticipate when credit card transactions are likely to be fraudulent or which insurance
#       customer is likely to file a claim.
# - Or it can attempt to predict the price of a house from its features, using houses for which we have
#   historical price data.
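
# A minimal sketch of the "F"/"R" equipment example above. The sensor readings
# (temperature, vibration) and the choice of DecisionTreeClassifier are
# assumptions made for illustration only; they are not from the course material.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical labeled readings: [temperature, vibration] for each machine.
X = [[60, 0.20], [65, 0.30], [62, 0.25],   # machines that kept running
     [95, 0.90], [98, 1.10], [92, 0.95]]   # machines that failed
y = ["R", "R", "R", "F", "F", "F"]          # known labels (the "right answers")

# Train on the labeled examples.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Predict labels for new, unlabeled readings.
new_readings = [[61, 0.22], [97, 1.00]]
predicted = clf.predict(new_readings)
print(predicted)
```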

In [13]:
# Unsupervised Learning
# - You have unlabeled data and are trying to group together similar data points based on their features.
# - Unsupervised learning is used on data that has no historical labels.
# - The system is not told the "right answer". The algorithm must figure out what is being shown.
# - The goal is to explore the data and find some structure within it.
# - Ex: It can find the main attributes that separate customer segments from each other.
# - Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering
#   and singular value decomposition.
# - These algorithms are also used to segment text topics, recommend items and identify data outliers.
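
# A minimal sketch of grouping unlabeled points with k-means clustering.
# The six 2-D points and n_clusters=2 are assumptions chosen so that the two
# groups are obvious; real data rarely separates this cleanly.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled data: two visually separate blobs, no labels given.
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])

# fit_predict learns the clusters and returns a cluster index for each row.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# The cluster numbers themselves are arbitrary; only the grouping matters.
print(labels)
```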


In [17]:
# Reinforcement Learning
# - The algorithm learns to perform an action from experience.
# - Reinforcement learning is often used for robotics, gaming and navigation.
# - With reinforcement learning, the algorithm discovers through trial and error which actions
#   yield the greatest rewards.
# - This type of learning has three primary components:
#     - the agent (the learner or decision maker)
#     - the environment (everything the agent interacts with)
#     - the actions (what the agent can do)
# - The objective is for the agent to choose actions that maximize the expected reward over a given
#   amount of time.
# - The agent will reach the goal much faster by following a good policy.
# - So the goal in reinforcement learning is to learn the best policy.
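
# A minimal sketch of trial-and-error learning, using an epsilon-greedy agent on
# a multi-armed bandit. This is a deliberately simplified setting (no states,
# only actions and rewards); the reward means, epsilon and step count are all
# assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment: each action pays a noisy reward around a hidden mean.
true_means = np.array([0.2, 0.5, 0.8])
n_actions = len(true_means)

estimates = np.zeros(n_actions)         # agent's running reward estimate per action
counts = np.zeros(n_actions, dtype=int)  # how often each action was tried
epsilon = 0.1                            # fraction of steps spent exploring

for step in range(2000):
    # Policy: explore a random action sometimes, otherwise exploit the best estimate.
    if rng.random() < epsilon:
        action = int(rng.integers(n_actions))
    else:
        action = int(np.argmax(estimates))

    # The environment responds with a reward for the chosen action.
    reward = rng.normal(true_means[action], 0.1)

    # Incremental average: move the estimate toward the observed reward.
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

best_action = int(np.argmax(estimates))
print(best_action, counts)
```

# After enough trials the agent's estimates approach the hidden means, so the
# greedy choice settles on the action with the highest expected reward.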


In [18]:
# Scikit Learn Package
# - It's one of the most popular machine learning packages for Python and has a lot of algorithms built in
# - Install: pip install scikit-learn
# - Every algorithm is exposed in scikit-learn via an "Estimator" object
# - Import the model, the general form is:
#      - from sklearn.family import Model
#      - Ex: from sklearn.linear_model import LinearRegression

In [19]:
# Estimator Parameters:
# - All the parameters of an estimator can be set when it is instantiated, and have
#   suitable default values
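# - Ex: model = LinearRegression(fit_intercept=False)
#       print(model)

#       LinearRegression(fit_intercept=False)
# - Note: the older normalize=True argument was removed from LinearRegression in scikit-learn 1.2;
#   scale features in a separate preprocessing step (for example StandardScaler) instead.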
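
# A minimal sketch of inspecting and changing estimator parameters with
# get_params() and set_params(). fit_intercept is used here only because it is a
# long-standing LinearRegression parameter; the default values printed may differ
# between scikit-learn versions.

```python
from sklearn.linear_model import LinearRegression

# Parameters are passed as keyword arguments at instantiation.
model = LinearRegression(fit_intercept=False)

# get_params() returns a dict of every parameter, including the defaults.
params = model.get_params()
print(params)

# set_params() changes parameters on an existing estimator.
model.set_params(fit_intercept=True)
updated = model.get_params()["fit_intercept"]
print(updated)
```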

In [20]:
import numpy as np
from sklearn.model_selection import train_test_split

In [21]:
x, y = np.arange(10).reshape((5, 2)), range(5)

In [22]:
x

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [24]:
list(y)

[0, 1, 2, 3, 4]

In [35]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

In [36]:
x_train

array([[8, 9],
       [0, 1],
       [6, 7]])

In [37]:
x_test

array([[2, 3],
       [4, 5]])

In [38]:
y_train

[4, 0, 3]

In [39]:
y_test

[1, 2]
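
# The split above is random, so the rows shown will change from run to run.
# A minimal sketch of making it repeatable with the random_state argument; the
# seed value 42 is an arbitrary assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(10).reshape((5, 2))
y = np.arange(5)

# Same seed -> same shuffle -> same train/test rows on every run.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42
)
x_train_again, x_test_again, _, _ = train_test_split(
    x, y, test_size=0.3, random_state=42
)

# With 5 rows and test_size=0.3, 2 rows go to the test set and 3 to training.
print(x_train.shape, x_test.shape)
```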

In [41]:
# model.fit(x_train,y_train)
# Now the model has been fit and trained on the training data

In [42]:
# We get the predicted values using the predict method:
#     predictions = model.predict(x_test)
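
# A minimal sketch that puts the split, fit and predict steps together on the
# toy arrays used above. LinearRegression and random_state=0 are assumptions for
# illustration; any supervised estimator follows the same pattern.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Toy data: y is an exact linear function of the features (y = x[:, 0] / 2).
x = np.arange(10).reshape((5, 2))
y = np.arange(5)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

model = LinearRegression()
model.fit(x_train, y_train)          # learn from the training data only

predictions = model.predict(x_test)  # predict for rows the model has not seen
print(predictions, y_test)
```

# Because the toy relationship is exactly linear, the predictions match y_test
# up to floating-point error; real data will not fit this neatly.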

In [43]:
# The evaluation method depends on the type of machine learning task we are solving
# - Ex: Regression, Classification, Clustering etc.
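
# A minimal sketch of task-specific evaluation with sklearn.metrics: accuracy for
# a classifier and mean squared error for a regressor. The true and predicted
# values below are made up for illustration.

```python
from sklearn.metrics import accuracy_score, mean_squared_error

# Classification: fraction of labels predicted correctly.
y_true_cls = ["F", "R", "R", "F"]
y_pred_cls = ["F", "R", "F", "F"]    # one mistake out of four
accuracy = accuracy_score(y_true_cls, y_pred_cls)
print(accuracy)  # 0.75

# Regression: average squared difference between prediction and truth.
y_true_reg = [3.0, 5.0, 7.0]
y_pred_reg = [2.5, 5.0, 8.0]         # errors of 0.5, 0.0 and 1.0
mse = mean_squared_error(y_true_reg, y_pred_reg)
print(mse)
```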

In [44]:
# - model.fit() : fit training data
# - For supervised learning applications, this accepts two arguments: the data X and the labels y
#   (Ex: model.fit(x,y))
# - For unsupervised learning applications, this accepts only a single argument, the data X
#   (Ex: model.fit(x))

In [46]:
# model.predict() :
#     Given a trained model, predict the label of a new set of data. This method accepts one argument,
#     the new data x_new
#     Ex: model.predict(x_new)
#     and returns the learned label for each object in the array
# - Available in supervised estimators

In [47]:
# model.predict_proba() :
#     For classification problems, some estimators also provide this method, which returns the
#     probability that a new observation has each categorical label. In this case, the label with
#     the highest probability is returned by model.predict()
# - Available in supervised estimators
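
# A minimal sketch of predict_proba() next to predict(), using LogisticRegression
# on a made-up one-feature dataset (both the data and the estimator choice are
# assumptions for illustration).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: small feature values are class 0, large values are class 1.
X = np.array([[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

X_new = np.array([[1.5], [9.5]])
proba = clf.predict_proba(X_new)   # one row per sample, one column per class
labels = clf.predict(X_new)        # the class with the highest probability

# Columns of proba follow the order of clf.classes_.
print(clf.classes_)
print(proba)
print(labels)
```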

In [48]:
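# model.score():
#     For classification or regression problems, most estimators implement a score method.
#     For classifiers the default score is mean accuracy (between 0 and 1); for regressors it is
#     R^2, which is at most 1 and can be negative for a very poor fit.
#     A larger score indicates a better fit.
# - Available in supervised estimators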
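
# A minimal sketch of what score() returns by default: R^2 for a regressor and
# mean accuracy for a classifier. The toy data and the two estimators are
# assumptions for illustration. Note that R^2 is not bounded below by 0.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

X = np.arange(10).reshape(-1, 1)

# Regressor: score() is R^2. A perfect linear fit scores (essentially) 1.
y_reg = 2.0 * X.ravel() + 1.0
reg = LinearRegression().fit(X, y_reg)
r2_good = reg.score(X, y_reg)

# Scoring the same model against badly mismatched targets gives a negative R^2.
r2_bad = reg.score(X, -y_reg)

# Classifier: score() is mean accuracy, always between 0 and 1.
y_cls = (X.ravel() >= 5).astype(int)
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y_cls)
accuracy = clf.score(X, y_cls)

print(r2_good, r2_bad, accuracy)
```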

In [49]:
# model.predict()
#     predict labels in clustering algorithms
# - Available in Unsupervised estimators 
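
# A minimal sketch of predict() on a clustering estimator: after fitting KMeans,
# new points are assigned to the nearest learned cluster center. The points and
# n_clusters=2 are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled training points forming two groups.
X_train = np.array([[0.0, 0.0], [0.5, 0.2], [0.1, 0.4],
                    [10.0, 10.0], [9.6, 10.3], [10.2, 9.7]])

# Unsupervised fit: only the data is passed, no labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

# predict() returns the cluster index for each new point.
X_new = np.array([[0.3, 0.1], [9.9, 10.1]])
new_labels = kmeans.predict(X_new)

print(new_labels)
print(kmeans.cluster_centers_)
```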

In [50]:
# model.transform()
#     Given an unsupervised model, transform new data into the new basis. This also accepts one
#     argument x_new and returns the new representation of the data based on the unsupervised model
# - Available in unsupervised estimators

In [51]:
# model.fit_transform()
#     Some estimators implement this method, which more efficiently performs a fit and a transform
#     on the same input data.
# - Available in unsupervised estimators
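
# A minimal sketch of transform() and fit_transform(), using PCA to reduce the
# toy 2-column array from earlier to 1 column. PCA and n_components=1 are
# assumptions for illustration; other transformers follow the same pattern.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]])

# fit_transform(): learn the new basis and project X in a single call.
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)

# Equivalent two-step form: fit() first, then transform() the same data.
X_two_step = PCA(n_components=1).fit(X).transform(X)

# transform(): project new data using the basis already learned from X.
X_new = np.array([[10.0, 11.0]])
X_new_reduced = pca.transform(X_new)

print(X_reduced.shape, X_new_reduced.shape)
```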

In [52]:
# Main problem types:
# - Classification
# - Regression
# - Clustering
# - Dimensionality reduction

In [None]:
# Supervised learning: the machine is presented with a set of inputs and their expected outputs;
#   later, given a new input, it predicts the output.
# Unsupervised learning: the machine aims to find patterns within a dataset without explicit input
#   from a human as to what these patterns might look like.