# SciKit-Learn

[tutorial](https://scikit-learn.org/stable/tutorial/basic/tutorial.html)

Machine learning was formerly the preserve of advanced computer science research, but tools such as SciKit-Learn and TensorFlow have opened up the field. These tools allow data analysts to begin building predictive analytics, and because they are open source, no additional commercial budget is required.

SciKit-Learn is built on matplotlib, NumPy and SciPy, so the core functions will be familiar.

These notes are largely extensions of what is in the SciKit-Learn tutorials.

## Core methods of machine learning in SciKit-Learn

### Classification
Assigning an object to one of a set of known categories. Applications include image recognition, spam detection, some of the main techniques behind autonomous vehicles, and software that detects potential heart problems.
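
As a rough illustration of classification, the sketch below trains a support vector classifier on the bundled digits dataset and checks it against held-out images. The choice of `SVC`, the `gamma` value and the 75/25 split are assumptions made for this example, not something these notes prescribe.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the 8x8 handwritten digit images (already flattened to 64 features).
digits = datasets.load_digits()

# Hold back a quarter of the samples so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# gamma=0.001 is an illustrative setting, not a tuned value.
clf = SVC(gamma=0.001)
clf.fit(X_train, y_train)

# score() returns the fraction of test samples classified correctly.
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

The same `fit` / `predict` / `score` pattern applies to most other classifiers in the library.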

### Regression
Predicting a continuous-valued attribute associated with an object. Core uses include predicting future stock prices or economic performance.
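
A minimal regression sketch, assuming a simple linear relationship: the data here is synthetic (generated from `y = 3x + 2` plus noise, values invented for illustration), and `LinearRegression` is just one of several regressors that could be used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a straight line with some Gaussian noise added.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))          # one feature, 100 samples
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=100)

# Fit the model, then inspect the recovered slope and intercept.
reg = LinearRegression().fit(X, y)
print("slope:", reg.coef_[0], "intercept:", reg.intercept_)

# Predict the continuous target for a new, unseen input.
prediction = reg.predict([[12.0]])
print("prediction at x=12:", prediction[0])
```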

### Clustering
Grouping similar objects into segments or sets automatically, without predefined labels.
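
One way to try clustering is k-means on the iris measurements, ignoring the species labels. This is a sketch under the assumption that three clusters are wanted; `n_clusters=3` and `n_init=10` are illustrative choices.

```python
from sklearn import datasets
from sklearn.cluster import KMeans

iris = datasets.load_iris()

# Ask for three groups; the algorithm never sees iris.target.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(iris.data)

# Each sample now carries a cluster index (0, 1 or 2).
print(labels[:10])
# One centre per cluster, expressed in the original four features.
print(kmeans.cluster_centers_)
```

The cluster indices are arbitrary, so cluster 0 need not correspond to species 0.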

### Dimensionality Reduction
Reducing the number of variables needed to make a particular decision: which key features most affect a dataset? For example, consider which measurements are most relevant in Fisher's iris dataset. This often helps reduce a large body of information to its most relevant parts.
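
As a sketch of the idea, principal component analysis (PCA) can compress the four iris measurements into two derived variables. PCA is an assumption here, chosen as a common technique; note that it builds new combinations of the original features rather than picking a subset of them.

```python
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()

# Project the 4-dimensional measurements onto 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(iris.data)

print(X_2d.shape)
# Fraction of the total variance captured by each component.
print(pca.explained_variance_ratio_)
```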

### Model Selection
Comparing and choosing parameters and models.
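
A small sketch of parameter comparison using a cross-validated grid search. The estimator (`SVC`), the candidate values in `param_grid` and the 5-fold setting are all illustrative assumptions.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = datasets.load_iris()

# Candidate settings to compare; every combination is tried.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

# Each combination is scored with 5-fold cross-validation.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(iris.data, iris.target)

print("best parameters:", search.best_params_)
print("best mean CV accuracy:", search.best_score_)
```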

### Preprocessing
Feature extraction and normalisation of datasets, or signal processing. The example Brian used: instead of bringing in the entire heartbeat recording, calculate the distance between beats and use that as the input.
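
The heartbeat example might look something like the sketch below. The beat timestamps are invented for illustration, and the use of `np.diff` followed by `StandardScaler` is an assumption about how the feature could be extracted and normalised.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical times (in seconds) at which each beat was detected.
beat_times = np.array([0.00, 0.81, 1.63, 2.42, 3.25, 4.05])

# Feature extraction: keep only the interval between consecutive beats.
beat_intervals = np.diff(beat_times)

# scikit-learn expects a 2D array of shape (n_samples, n_features).
features = beat_intervals.reshape(-1, 1)

# Normalisation: rescale to zero mean and unit variance.
scaled = StandardScaler().fit_transform(features)
print(beat_intervals)
print(scaled.ravel())
```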

```python
from sklearn import datasets

# Load two of the example datasets bundled with the library.
iris = datasets.load_iris()
digits = datasets.load_digits()

print(iris)
```

{'data': array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
     