# The "Understand data science for machine learning" learning path
https://learn.microsoft.com/en-us/training/paths/understand-machine-learning/

## Mod 01: Introduction to machine learning
https://learn.microsoft.com/en-us/training/modules/introduction-to-machine-learning/

Date: 2024-09-22

#### Notes

### Introduction
https://learn.microsoft.com/en-us/training/modules/introduction-to-machine-learning/1-introduction

### What are machine learning models?
https://learn.microsoft.com/en-us/training/modules/introduction-to-machine-learning/2-what-are-ml-models

### Exercise - Create a machine learning model
https://learn.microsoft.com/en-us/training/modules/introduction-to-machine-learning/3-exercise-create-ml-model

We've learned that models are computer code that processes information to make a prediction or a decision. Here, we train a model to guess a comfortable boot size for a dog, based on the size of the harness that fits it.

In the following examples, there's no need to edit any code. Try to read it, understand it, then press the Run button to run it. As always with these notebooks, it's vitally important that these code blocks are run in the correct order, and nothing is missed.

#### Preparing data

In [7]:
import pandas
import wget


In [8]:
# !wget https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/graphing.py
wget.download('https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/graphing.py')
print('downloaded graphing.py')


downloaded graphing.py


In [10]:
# !wget https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/Data/doggy-boot-harness.csv
wget.download('https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/Data/doggy-boot-harness.csv')
print('downloaded doggy-boot-harness.csv')


downloaded doggy-boot-harness.csv
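
The two download cells above rely on the third-party `wget` package. As a rough alternative, the sketch below does the same job with only the standard library and skips the download when the file is already on disk. The helper name `download_if_missing` is hypothetical and is not part of the course material.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_if_missing(url: str, filename: str) -> Path:
    """Download url to filename unless that file already exists.

    Hypothetical helper for illustration; not part of the notebook.
    """
    target = Path(filename)
    if target.exists():
        print(f"{target.name} already present, skipping download")
    else:
        # urlretrieve ships with Python, so nothing extra needs installing
        urlretrieve(url, str(target))
        print(f"downloaded {target.name}")
    return target
```

Calling `download_if_missing(url, 'graphing.py')` with the URL from the cell above should then behave like the `wget.download` call, without fetching the file again on a re-run.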


In [12]:
# !pip install statsmodels


In [14]:
# Make a dictionary of data for boot sizes
# and harness sizes in cm
data = {
    'boot_size': [
        39, 38, 37, 39, 38, 35, 37, 36, 35, 40,
        40, 36, 38, 39, 42, 42, 36, 36, 35, 41,
        42, 38, 37, 35, 40, 36, 35, 39, 41, 37,
        35, 41, 39, 41, 42, 42, 36, 37, 37, 39,
        42, 35, 36, 41, 41, 41, 39, 39, 35, 39,
    ],
    'harness_size': [
        58, 58, 52, 58, 57, 52, 55, 53, 49, 54,
        59, 56, 53, 58, 57, 58, 56, 51, 50, 59,
        59, 59, 55, 50, 55, 52, 53, 54, 61, 56,
        55, 60, 57, 56, 61, 58, 53, 57, 57, 55,
        60, 51, 52, 56, 55, 57, 58, 57, 51, 59,
    ],
}

# Convert it into a table using pandas
dataset = pandas.DataFrame(data)

# Show the first five rows of the data
# In plain Python we would write
# print(dataset.head())
# but in a Jupyter notebook, the value of the last
# expression in a cell is displayed as a formatted table
dataset.head()


   boot_size  harness_size
0         39            58
1         38            58
2         37            52
3         39            58
4         38            57


#### Select a model

The first thing we need to do is select a model. We're just getting started, so let's start with a very simple model called OLS (ordinary least squares). With a single input such as harness size, this is just a straight line (sometimes called a trendline).
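
To make "a straight line" concrete, here is a minimal sketch of what such a model computes once it has parameters. The function name and the two parameter values are made up for illustration; they are not the values that training would find.

```python
def predict_boot_size(harness_size: float, slope: float, offset: float) -> float:
    """Straight-line model: boot_size = slope * harness_size + offset."""
    return slope * harness_size + offset


# Hypothetical, hand-picked parameter values for illustration only
example_slope = 0.5
example_offset = 10.0

print(predict_boot_size(58, example_slope, example_offset))  # prints 39.0
```

Training is the step that replaces these hand-picked numbers with values chosen to match the data.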

Let's use an existing library to create our model, but we won't train it yet.

In [15]:
# Load a library to do the hard work for us
import statsmodels.formula.api as smf

# First, we define our formula using a special syntax
# This says that boot_size is explained by harness_size
formula = 'boot_size ~ harness_size'

# Create the model, but don't train it yet
model = smf.ols(formula=formula, data=dataset)

# Note that we have created our model but it does not 
# have internal parameters set yet
if not hasattr(model, 'params'):
    print("model selected but it does not have parameters set. We need to train it.")



model selected but it does not have parameters set. We need to train it.


#### Train our model

Our OLS model has two parameters (a slope and an offset, also called the intercept), but these haven't been set yet. We need to train (fit) the model to find the values that let it estimate a dog's boot size from its harness size.
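
As an aside, for a single feature the least-squares slope and offset can be written down directly. The sketch below applies those textbook formulas with NumPy to the first five rows only, so its numbers will differ from a fit on all 50 rows. It is meant to show what "finding these values" involves, not to replace the library call.

```python
import numpy as np

# First five rows of the notebook's dataset (a small sample, so the
# result will differ from a fit on the full data)
harness_size = np.array([58, 58, 52, 58, 57], dtype=float)
boot_size = np.array([39, 38, 37, 39, 38], dtype=float)

# Closed-form least-squares estimates for one feature:
#   slope  = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
#   offset = mean_y - slope * mean_x
x_centered = harness_size - harness_size.mean()
y_centered = boot_size - boot_size.mean()
slope = (x_centered * y_centered).sum() / (x_centered ** 2).sum()
offset = boot_size.mean() - slope * harness_size.mean()

print(f"slope={slope:.3f}, offset={offset:.3f}")
```

A library fit on the same five rows should agree with these two numbers up to floating-point rounding.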

The following code fits our model to data you've now seen: