# Machine Learning

ML is the field of study that gives computers the ability to learn without being explicitly programmed.

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

-> Training set - Examples that the system uses to learn. Each training example is called a training instance (or sample).

-> Model - Part of the ML system that learns and makes predictions. eg: Random Forests

## Why ML?

-> For problems for which existing solutions require a lot of fine tuning or long lists of rules

A spam filter based on ML techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham examples. The program will be shorter, easier to maintain and most likely more accurate. If we hand-code a spam filter instead, we need to write a lot of complex rules.
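The frequency comparison described above can be sketched in a few lines. This is only an illustration, not a real spam filter: the function name `spam_indicator_words`, the `min_ratio` threshold, the add-one smoothing and the tiny example emails are all assumptions made for the sketch.

```python
from collections import Counter

def spam_indicator_words(spam_emails, ham_emails, min_ratio=2.0):
    """Return words that are relatively more frequent in spam than in ham.

    Hypothetical helper: maps each word to (spam frequency / ham frequency).
    """
    spam_counts = Counter(w for e in spam_emails for w in e.lower().split())
    ham_counts = Counter(w for e in ham_emails for w in e.lower().split())
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    vocab_size = len(set(spam_counts) | set(ham_counts))

    indicators = {}
    for word, count in spam_counts.items():
        # Add-one smoothing so that words never seen in ham do not divide by zero.
        spam_freq = (count + 1) / (spam_total + vocab_size)
        ham_freq = (ham_counts[word] + 1) / (ham_total + vocab_size)
        ratio = spam_freq / ham_freq
        if ratio >= min_ratio:
            indicators[word] = ratio
    return indicators

# Made-up training examples, for illustration only.
spam = ["win free money now", "free prize 4u", "claim free money 4u"]
ham = ["meeting notes for monday", "lunch on monday", "project notes attached now"]
indicators = spam_indicator_words(spam, ham)
```

Here "free" and "4u" come out as spam indicators, while "now", which appears equally often in both sets, does not. Nobody wrote a rule for any of these words; they were picked out from the examples.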

-> Automatically adapting to change. In fluctuating environments, an ML system can easily be retrained on new data, always keeping it up to date.

Spammers notice all emails with "4U" are blocked. They start using "For U" instead. If spammers keep working around your spam filter, you will need to keep writing new rules forever. An ML-based spam filter automatically notices that "For U" has become unusually frequent in spam flagged by users, and it starts flagging such emails without our intervention.

-> For problems that are too complex for traditional approaches or have no known algorithm.

Speech recognition. eg: a program capable of distinguishing the words "one" and "two". Complexity - Different people, in noisy environments and in dozens of languages. Best solution is an algorithm that learns by itself given many example recordings for each word.

-> Help humans learn. Getting insights about complex problems and large amounts of data.

Inspecting a trained spam filter will reveal the list of words and combinations of words that it believes are the best predictors of spam. Sometimes it reveals unsuspected correlations or new trends. Digging into large amounts of data to discover hidden patterns is called data mining, and ML excels at it.

## Applications

1. Image classification, Segmentation (using CNNs or Transformers)

Analyzing images of products on a production line to automatically classify them

Detecting tumors in brain scans

2. Text classification, Summarization (using NLP techniques like RNNs, CNNs and Transformers)

Automatically classifying news articles

Automatically flagging offensive comments on discussion forums.

Summarizing long documents automatically

Creating a chat bot or personal assistant

3. Regression

Forecasting company's revenue next year based on many performance metrics.

4. Speech recognition (using RNNs, CNNs, Transformers)

Making your app react to voice commands

5. Anomaly detection (Using Isolation Forest, Gaussian Mixture Model)

Detecting credit card fraud

6. Clustering (Using K-means, DBSCAN)

Segmenting clients based on their purchases so that you can design different marketing strategy for each segment.

7. Data visualization (using dimensionality reduction techniques)

Representing a complex high dimensional dataset in a clear and insightful diagram.

8. Recommender system (neural network)

Recommending a product that a client may be interested in based on past purchases

9. Reinforcement learning - Branch of ML that trains agents to pick the actions that will maximize their rewards over time within a given environment. eg: AlphaGo program

Building an intelligent bot for a game.



# Types of ML systems

**1. How they are supervised during training**

-> Supervised Learning

Training set we feed to the algorithm contains the desired solutions, called labels or targets. eg: classification - spam filter, regression - car price predictor

-> Unsupervised Learning

Training data is unlabelled.

eg: clustering algorithm to try to detect groups of similar visitors of a blog website. It might notice 40% of visitors are teenagers who love comic books and generally read your blog after school.
We can then target posts for each group.

Visualization algorithms

Dimensionality reduction - simplify the data without losing too much information. One way is to merge several correlated features into one. eg: a car's mileage is strongly correlated with a car's age. This is feature extraction.

Anomaly detection - detecting unusual credit card transactions, automatically removing outliers, catching manufacturing defects. System will be shown mostly normal instances during training so it learns to recognize them. When a new instance is seen, it can tell whether it looks like a normal one or an anomaly.
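A minimal sketch of the idea, assuming a single numeric feature (a transaction amount) and a simple mean/standard-deviation profile. The function names, the threshold of 3 standard deviations and the amounts are illustrative assumptions; real systems would typically use something like Isolation Forest or a Gaussian Mixture Model.

```python
from statistics import mean, stdev

def fit_normal_profile(normal_values):
    """'Training': summarize mostly normal instances by their mean and spread."""
    return mean(normal_values), stdev(normal_values)

def is_anomaly(value, profile, threshold=3.0):
    """Flag a new instance that lies more than `threshold` std devs from the mean."""
    mu, sigma = profile
    return abs(value - mu) / sigma > threshold

# Made-up transaction amounts that we assume are all normal.
normal_amounts = [20.0, 25.0, 22.0, 19.0, 24.0, 21.0, 23.0, 26.0]
profile = fit_normal_profile(normal_amounts)

typical = is_anomaly(23.0, profile)    # False: close to what was seen in training
unusual = is_anomaly(500.0, profile)   # True: far outside the learned range
```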

Novelty detection - Aims to detect new instances that look different from all instances in the training set. This requires a clean training set devoid of any instance that you would like the algorithm to detect.

Association rule learning - Goal is to dig into large amounts of data and discover interesting relations between attributes. Suppose in a supermarket, sales log may reveal people who buy bread and jam also tend to buy butter. So we may want to place these items close to one another.
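One common way to quantify such a relation is with support and confidence. The sketch below computes both for the rule {bread, jam} -> {butter}; the transactions are made up and the helper names are assumptions for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that also
    contain the consequent. Assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical sales log: one set of items per purchase.
transactions = [
    {"bread", "jam", "butter"},
    {"bread", "jam", "butter", "milk"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter"},
]

rule_support = support(transactions, {"bread", "jam", "butter"})
rule_confidence = confidence(transactions, {"bread", "jam"}, {"butter"})
```

In this toy log, 2 of 5 purchases contain all three items (support 0.4), and 2 of the 3 purchases with bread and jam also contain butter (confidence about 0.67).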

-> Semi supervised Learning

Data is partially labelled. Labelling data is time consuming. Plenty of unlabelled instances and few labelled instances. eg: Google Photos

The service clusters the photos and recognizes that person A appears in photos 1, 5, 11. Add one label per person and it will be able to name everyone in every photo.

Most semi-supervised algorithms are combinations of unsupervised and supervised algorithms. Clustering is used to group similar instances together. Then every unlabelled instance can be labelled with the most common label in its cluster. Once the whole dataset is labelled, we can use any supervised algorithm.
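The cluster-then-label idea can be sketched as follows. It assumes one numeric feature, a tiny hand-written 1-D k-means with fixed starting centroids, and only two labelled instances; all names and values are hypothetical.

```python
from collections import Counter

def kmeans_1d(values, centroids, n_iters=10):
    """Tiny 1-D k-means; returns the cluster index assigned to each value."""
    centroids = list(centroids)
    assignments = [0] * len(values)
    for _ in range(n_iters):
        # Assign each value to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return assignments

def propagate_labels(assignments, known_labels):
    """known_labels maps instance index -> label for the few labelled instances.
    Every instance receives the most common known label in its cluster."""
    votes = {}
    for idx, label in known_labels.items():
        votes.setdefault(assignments[idx], Counter())[label] += 1
    return [
        votes[a].most_common(1)[0][0] if a in votes else None
        for a in assignments
    ]

values = [1.0, 1.2, 0.8, 1.1, 9.0, 9.3, 8.7, 9.1]   # mostly unlabelled data
assignments = kmeans_1d(values, centroids=[0.0, 10.0])
labels = propagate_labels(assignments, known_labels={0: "A", 5: "B"})
```

With just two labelled instances, all eight end up labelled ("A" for the first four, "B" for the last four), and the result could then be fed to any supervised algorithm.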

-> Self supervised Learning

Generating a fully labelled dataset from a fully unlabelled one. eg: Large dataset of unlabelled images. We can randomly mask a small part of each image and then train a model to recover the original image. During training, the masked images are used as the inputs to the model and the original images are used as the labels.
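The dataset-generation step can be sketched like this, with short lists of numbers standing in for images. The function name, the mask fraction, the mask value and the fixed seed are assumptions made for the illustration; training the recovery model itself is not shown.

```python
import random

def make_masked_pairs(samples, mask_fraction=0.25, mask_value=0, seed=0):
    """Turn unlabelled samples into (input, label) pairs by masking.

    The masked copy is the model input; the untouched original is the label.
    """
    rng = random.Random(seed)
    pairs = []
    for sample in samples:
        n_masked = max(1, int(len(sample) * mask_fraction))
        masked_positions = set(rng.sample(range(len(sample)), n_masked))
        masked = [
            mask_value if i in masked_positions else x
            for i, x in enumerate(sample)
        ]
        pairs.append((masked, list(sample)))
    return pairs

# Two made-up "images", flattened to 8 pixel values each.
unlabelled = [[5, 3, 8, 1, 9, 2, 7, 4], [6, 6, 2, 9, 1, 3, 5, 8]]
pairs = make_masked_pairs(unlabelled)
```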

-> Reinforcement Learning

A learning system, called an agent, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards). It must then learn by itself what the best strategy, called a policy, is to get the most reward over time. A policy defines what action the agent should choose when it's in a given situation. eg: the AlphaGo program by DeepMind beat the world's top-ranked player at the game of Go.
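To make agent, reward and policy concrete, here is a minimal sketch using tabular Q-learning, which is one standard RL algorithm and not necessarily what any particular system uses. The environment is a made-up 5-cell corridor with a reward of 1 for reaching the last cell and no penalties; the agent explores with purely random actions. All names and hyperparameters are assumptions.

```python
import random

N_STATES = 5            # corridor cells 0..4; reaching cell 4 ends the episode
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # move left or move right

def step(state, action):
    """Environment: apply an action, return (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values q[state][action_index] from trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            a = rng.randrange(len(ACTIONS))          # explore with random actions
            next_state, reward = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# The policy: in each non-terminal state, choose the action with the highest value.
policy = [
    ACTIONS[max(range(len(ACTIONS)), key=lambda a: q_table[s][a])]
    for s in range(GOAL)
]
```

The learned policy is "move right" (+1) in every cell, which is the quickest way to collect the reward.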

**2. Whether or not they can learn incrementally on the fly.**

-> Batch Learning (Offline Learning)

Trained using all the available data. Takes a lot of time and computing resources. Typically done offline. First the system is trained and then launched into production and runs without learning anymore. It just applies what it has learned.

Model's performance tends to decay slowly over time - model rot or data drift. Solution - regularly retrain the model on up to date data. To adapt to change, simply update the data and train a new version of the system from scratch as often as needed.

-> Online Learning

Train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is cheap and fast, so the system can learn about new data on the fly. Useful for systems that need to adapt to change extremely rapidly, eg: detecting new patterns in the stock market. Also good if computing resources are limited, eg: a model trained on a mobile device. It can also be used to train models on huge datasets that cannot fit in one machine's main memory - out-of-core learning.

How fast they should adapt to changing data - learning rate. High learning rate - System will rapidly adapt to new data but it will also tend to quickly forget the old data. Low learning rate - System will learn more slowly but will be less sensitive to noise in new data.
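The effect of the learning rate can be seen with about the simplest online learner possible: a running estimate of a value, updated one instance at a time. This is a sketch under assumed names and made-up data, in which the stream shifts from 10 to 20 partway through.

```python
def online_mean(stream, learning_rate, estimate=0.0):
    """Update an estimate incrementally, one data instance at a time."""
    for x in stream:
        # Gradient step on the squared error (x - estimate)**2 / 2:
        # nudge the estimate toward the newest instance.
        estimate += learning_rate * (x - estimate)
    return estimate

stream = [10.0] * 20 + [20.0] * 5   # the data shifts from 10 to 20

fast = online_mean(stream, learning_rate=0.9)   # adapts quickly, forgets old data
slow = online_mean(stream, learning_rate=0.1)   # adapts slowly, remembers old data
```

After only five instances of the new value, the high learning rate estimate is already very close to 20, while the low learning rate estimate is still around 13.4. The same inertia is what makes the low learning rate less sensitive to a few noisy instances.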

Challenge - If bad data (eg: from a malfunctioning sensor or robot, or from someone trying to game the system by spamming a search engine to rank high in search results) is fed to the system, its performance will degrade quickly. If it's a live system, clients will notice. Need to monitor the system closely and promptly switch learning off when there's a performance drop.

**3. How they generalize: whether they work by simply comparing new data points to known data points, or instead by detecting patterns in the training data and building a predictive model.**

-> Instance based Learning

System learns the examples by heart then generalizes to new cases by using a similarity measure to compare them to the learned examples. eg: In spam filtering, similarity measure between 2 emails could be the count of words they have in common
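A minimal sketch of this approach, using the count of shared words as the similarity measure and labelling a new email like its most similar stored example. The emails and function names are made up for illustration.

```python
def similarity(email_a, email_b):
    """Similarity measure: number of distinct words the two emails share."""
    return len(set(email_a.lower().split()) & set(email_b.lower().split()))

def classify(new_email, labelled_emails):
    """Give the new email the label of the most similar stored example."""
    _, best_label = max(
        labelled_emails, key=lambda pair: similarity(new_email, pair[0])
    )
    return best_label

# The "learned by heart" examples: (text, label) pairs.
known = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting notes for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

first = classify("claim your free money", known)          # "spam"
second = classify("notes for the monday meeting", known)  # "ham"
```

There is no training step beyond storing the examples; all the work happens at prediction time.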

-> Model based Learning

Build a model of training examples and then use that model to make predictions. Training - Running an algorithm to find the model parameters that will make it best fit the training data and hopefully make good predictions on new data.
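As a sketch of what "finding the model parameters" means, the code below fits a straight line y = slope * x + intercept to a few made-up points using the closed-form least-squares solution, then uses the fitted model on a new input. The data and names are assumptions.

```python
def fit_line(xs, ys):
    """Training: find the slope and intercept that minimize squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Inference: apply the learned parameters to a new instance."""
    return slope * x + intercept

# Made-up training data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

slope, intercept = fit_line(xs, ys)
prediction = predict(5.0, slope, intercept)
```

For this data the fitted parameters are roughly slope 1.94 and intercept 1.15, giving a prediction of about 10.85 for x = 5. Unlike instance-based learning, the training examples can be discarded once the two parameters are known.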

# Challenges of ML (bad model or bad data)

**-> Bad data**

1. Insufficient quantity of training data
2. Non representative training data

Too small sample - Sampling noise. Large samples but nonrepresentative if sampling method is flawed - Sampling bias.

3. Poor quality data
4. Irrelevant features

Coming up with good features is called feature engineering - 3 steps: feature selection (selecting the most useful features), feature extraction (combining existing features to produce a more useful one), creating new features by gathering new data

**-> Bad model**

1. Overfitting the training data

Overgeneralizing. When the model is too complex relative to the amount and noisiness of the training data.

2. Underfitting the training data

Model is too simple to learn the underlying structure of the data.
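Both failure modes can be seen by comparing training error with error on held-out data. The sketch below is a toy under stated assumptions: the data follows y = 2x plus a fixed alternating +1/-1 pattern standing in for random noise, and the three models (a constant, a straight line, and a memorizer that returns the target of the nearest training point) are hand-picked to be too simple, about right, and too complex.

```python
def mse(model, xs, ys):
    """Mean squared error of a model (a function x -> prediction)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_constant(xs, ys):
    """Too simple: always predicts the mean training target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """About right for this data: least-squares straight line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(xs, ys):
    """Too complex: memorizes every training point, noise included."""
    return lambda x: min(zip(xs, ys), key=lambda p: abs(p[0] - x))[1]

# Made-up data: y = 2x plus alternating +1/-1 "noise".
train_x = [float(i) for i in range(8)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]
val_x = [i + 0.4 for i in range(7)]
val_y = [2 * x + (-1 if i % 2 == 0 else 1) for i, x in enumerate(val_x)]

# results[name] = (training error, validation error)
results = {
    name: (mse(model, train_x, train_y), mse(model, val_x, val_y))
    for name, model in [
        ("constant", fit_constant(train_x, train_y)),
        ("line", fit_line(train_x, train_y)),
        ("memorizer", fit_memorizer(train_x, train_y)),
    ]
}
```

The constant model has a large error on both sets (underfitting). The memorizer has zero training error but a clearly worse validation error than the line (overfitting). The line has a small, similar error on both.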

