#### Module 1: Fundamentals of Machine Learning - Intro to SciKit Learn

**What is Machine Learning (ML)?**

- The study of computer programs (algorithms) that can learn by example
- ML algorithms can generalize from existing examples of a task
    - After seeing a training set of labeled images, an image classifier can figure out how to apply labels accurately to new, previously unseen images

- Algorithms learn rules from *labeled examples*
- A set of labeled examples used for learning is called training data
- The learned rules should also generalize, so that they correctly recognize or predict new examples not in the training set
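
Learning rules from training data and then checking how well they generalize to unseen examples can be sketched with scikit-learn. This is a minimal sketch rather than the course's own code: the iris dataset, the k-nearest-neighbors classifier, and the 75/25 split are all assumptions chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Labeled examples: X holds the samples, y holds the labels.
# (Iris is a stand-in dataset, assumed here for illustration.)
X, y = load_iris(return_X_y=True)

# Hold out 25% of the examples so the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Learn rules from the training data only.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

# Generalization check: accuracy on examples not in the training set.
test_accuracy = clf.score(X_test, y_test)
print(f"Accuracy on unseen examples: {test_accuracy:.2f}")
```

The held-out accuracy, not the training accuracy, is what indicates whether the learned rules generalize.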

ML models learn from experience:
- Labeled examples (Email spam detection)
- User feedback (Clicks on a search page)
- Surrounding environment (self-driving cars)

Machine learning brings together statistics, computer science, and more:

- Statistical methods
    - Infer conclusions from data
    - Estimate reliability of predictions
- Computer science
    - Large-scale computing architectures
    - Algorithms for capturing, manipulating, indexing, combining, retrieving, and performing predictions on data
    - Software pipelines that manage the complexity of multiple subtasks
- Economics, biology, psychology
    - How can an individual or system efficiently improve their performance in a given environment?
    - What is learning and how can it be optimized?

 **Key Concepts in Machine Learning**

- Supervised machine learning: Learn to predict target values from labeled data.
    - SML Classification (target values are discrete classes)
    - SML Regression (target values are continuous values)
    - Training set $(X, Y)$, where $X=[x_1,x_2,\dots,x_i,\dots,x_n]$ are the samples and $Y=[y_1,y_2,\dots,y_i,\dots,y_n]$ are the target values (labels)
    - Classifier $f: X \to Y$: At training time, the classifier uses labeled examples to learn rules for recognizing each class (for example, each fruit type).
    - Prediction: After training, at prediction time, the trained model uses the learned rules to predict the type of new instances
    - Explicit labels: Human judges/annotators
    - Implicit labels: Clicking and reading the "Mackinac Island" result can be an implicit label that tells a search engine "Mackinac Island" is especially relevant to that query for that specific user

- Unsupervised machine learning: Find structure in unlabeled data
   - Find groups of similar instances in the data (Clustering)
   - Find unusual patterns (outlier detection)
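
To contrast with supervised learning, the clustering case above can be sketched as follows. This is an illustrative sketch under stated assumptions: the data are synthetic points from `make_blobs`, and `KMeans` with three clusters is just one possible clustering method. Note that the true group labels are discarded, so the algorithm only ever sees `X`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic, unlabeled data: the generated labels are thrown away on purpose.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# Find groups of similar instances without using any target values.
# (n_clusters=3 is an assumption; in practice the number of groups is unknown.)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Each sample now carries a cluster index assigned by the algorithm.
print(cluster_ids[:10])
```

Unlike `fit(X, y)` in the supervised case, `fit_predict(X)` takes no labels: the structure is inferred from the samples alone.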

**A Basic Machine Learning Workflow**

1. Representation: Choose:
   - A feature representation
   - Type of classifier to use
2. Evaluation: Choose:
   - What criterion distinguishes good vs bad classifiers
3. Optimization: Choose:
   - How to search for settings/parameters that give the best classifier for this evaluation criterion

$1 \to 2 \to 3 \to 2 \to 3 ...$
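
The three workflow steps can be sketched in scikit-learn roughly as below. This is one possible mapping, not the only one: the iris dataset, the k-nearest-neighbors classifier, accuracy as the criterion, and the candidate values of `n_neighbors` are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Representation: the features (here, the raw measurements) and the
#    type of classifier to use.
model = KNeighborsClassifier()

# 2. Evaluation: the criterion that separates good from bad classifiers.
scoring = "accuracy"

# 3. Optimization: search the parameter settings for the best classifier
#    under that criterion. Each candidate is evaluated with 5-fold
#    cross-validation, which is the repeated 2 -> 3 loop.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}
search = GridSearchCV(model, param_grid, scoring=scoring, cv=5)
search.fit(X_train, y_train)

best_k = search.best_params_["n_neighbors"]
held_out_accuracy = search.score(X_test, y_test)
print(f"best n_neighbors: {best_k}, held-out accuracy: {held_out_accuracy:.2f}")
```

The final score is computed on data that played no part in the search, so it is a fairer estimate than the cross-validation score used to pick the parameters.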

**Feature Representations**

Examples:

 - Email: A list of words with their frequency counts.
 - Picture: A matrix of color values (pixels)
 - Sea Creatures: A set of attribute values

In [4]:
pip install seaborn

In [6]:
pip install graphviz

In [7]:
pip install scikit-learn

In [8]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn
import graphviz
import sklearn as sk