This lab will create a very simple Azure ML experiment based on car data.
The lab will use Linear Regression to predict the price of a car based on its features (brand, doors, bhp, etc.).
The lab is split into two parts. The first deals with training the machine learning experiment; the second deals with publishing it as a predictive experiment and calling the API endpoint.
In this part, we'll create a training experiment in Machine Learning studio.
In this section, we'll create a new blank experiment and upload our data.
- Sign in to Azure Machine Learning Studio using this short link: http://aiday.info/MLStudio
- Datasets > New > From Local File > Car prices.csv
- Experiments > New > Blank experiment
- Save the experiment as Car Price Prediction using the bottom command bar
In this section, we'll add the data as the starting point in our experiment.
The ML Studio uses a drag and drop canvas where you drag modules from the left side navigation and drop them onto the canvas. You then 'stitch' modules together by dragging a line between the input/output ports on the modules. The ports are the small circles at the top (input) and bottom (output) of the modules.
- Drag Saved Datasets > My DataSets > Car prices.csv to the canvas
- Visualise the dataset by right-clicking the output port > Visualise
It is important that machine learning is performed on clean, uniform, 'prepared' data.
The Car prices.csv data has some missing values, so it is not ready for machine learning yet.
In this section, we'll use the 'Clean missing data' module to remove any rows that have missing data. The output will be a clean, prepared data set that is ready for machine learning.
- Drag the Transforms > Clean missing data module (or just search for it)
- Connect the output port of Car prices.csv to the input port of Clean missing data
- Click on Clean missing data and use the right side panel to set the Cleaning mode = "Remove entire row"
- Run the experiment using the bottom command bar (the green arrow) and observe the green ticks. This indicates that everything is working as it should be
- Right-click > Visualise the output port of Clean missing data and note that the rows with missing data have been removed.
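
For readers who want to see the idea behind the "Remove entire row" cleaning mode outside the Studio canvas, here is a minimal sketch in plain Python. It is not what the module runs internally. The sample rows are made up for illustration and are not taken from Car prices.csv, and the helper name `remove_rows_with_missing_values` is hypothetical.

```python
import csv
import io

# Made-up sample in the spirit of Car prices.csv (assumption: not the real data).
SAMPLE_CSV = """make,fuel,doors,bhp,price
audi,diesel,four,150,23000
bmw,petrol,,180,30000
toyota,petrol,four,,9000
honda,petrol,two,100,8500
"""


def remove_rows_with_missing_values(rows):
    """Keep only the rows in which every field has a non-empty value."""
    return [
        row
        for row in rows
        if all(value is not None and value.strip() != "" for value in row.values())
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
clean_rows = remove_rows_with_missing_values(rows)
print(len(rows), "rows before cleaning,", len(clean_rows), "rows after")
```

Dropping whole rows is the simplest cleaning strategy. It works here because only a few rows are incomplete, so little training data is lost.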
Before we can apply machine learning algorithms, we must reserve some data to test what the algorithms learnt (i.e. compare the predicted car price to the actual one).
We'll use the Split Data module to split the data into 75% for training and 25% for testing.
- Drag the Data Transformation > Sample & Split > Split Data module (or search for it)
- Connect the output port of Clean Missing Data to the input port of Split Data
- Click on 'Split Data' and use the right side panel to set Fraction of rows in the first output dataset to 0.75
- Run the experiment and observe the green ticks.

The left output port of the Split Data module now represents a random 75% of the data and the right output port represents a random 25%.
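
The split itself can be sketched in a few lines of plain Python. This is an illustration of a random 75/25 split, not the Split Data module's implementation. The function name `split_rows`, the fixed seed, and the stand-in rows are all assumptions made for the example.

```python
import random


def split_rows(rows, fraction=0.75, seed=0):
    """Shuffle a copy of rows and return (first_part, second_part).

    `fraction` plays the role of "Fraction of rows in the first output dataset".
    The seed is fixed only so the example is repeatable.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(200))  # stand-in for 200 cleaned rows (hypothetical)
train_rows, test_rows = split_rows(rows)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

Shuffling before cutting matters: if the file happened to be sorted by make or by price, taking the first 75% without shuffling would give a test set that looks nothing like the training set.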
Linear regression is a good fit for this task because we are predicting a single numeric value (the price) from the car's features.
In this section, we'll add the Linear Regression algorithm to the experiment.
- Drag the Machine Learning > Initialize Model > Regression > Linear Regression module (or just search for it)
- Place it next to the Split Data module
We want to train on the price field. This means we want to use Linear Regression to learn which factors in the data affect the price, and then use those factors to predict the price for each car. The predicted price is called the 'Scored Label'.
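
As a rough sketch of what "learning which factors affect the price" means, a linear regression model predicts the price as a weighted sum of the features. The symbols below are illustrative and do not come from the lab: $\hat{y}$ is the predicted price, $x_i$ are the features in numeric form, and $w_i$ are the weights learnt during training.

```latex
\hat{y} = w_0 + \sum_{i=1}^{n} w_i x_i
\qquad\text{with}\qquad
(w_0, \dots, w_n) = \arg\min_{w} \sum_{j=1}^{m} \bigl( y_j - \hat{y}_j \bigr)^2
```

Here $y_j$ is the actual price of the $j$-th training car and $m$ is the number of training rows. Text features such as make or fuel have to be turned into numbers before they can be used in this formula; how that is done is outside the scope of this lab.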
- Drag the Machine Learning > Train > Train Model module (or search for it)
- Connect the left input port of Train Model to the output port of Linear Regression
- Connect the right input port of Train Model to the left output port of Split Data
- Click on Train Model and click Launch column selector in the right side panel
- Add price as a selected column
- Run the experiment and observe the green ticks
At this point we can see that we're using the Linear Regression algorithm to train on price using 75% of the data set.
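
To make the training step more concrete, here is a minimal sketch of ordinary least squares with a single feature, written in plain Python. It is a deliberate simplification: the experiment trains on all the columns, whereas this example uses only bhp. The training values are made up, and `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply the fitted line to a new feature value."""
    intercept, slope = model
    return intercept + slope * x


# Made-up training data (assumption): bhp and the corresponding price.
bhp = [70, 100, 120, 150, 200]
price = [7000, 10000, 12000, 15000, 20000]

model = fit_line(bhp, price)
print("predicted price for 160 bhp:", predict(model, 160))
```

Selecting price in the column selector corresponds to choosing `ys` here: it tells the training step which column is the target and leaves the remaining columns as features.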
We now need to score the model by running the trained model against the remaining 25% of the data to see how accurate the price prediction is.
- Drag the Machine Learning > Score > Score Model module (or search for it)
- Connect the left input port of Score Model to the output port of Train Model
- Connect the right input port of Score Model to the right output port of Split Data
- Run the experiment and observe the green ticks
- Right-click > Visualise the output port of Score Model
- Compare the price to the scored label. This shows that the predicted price (i.e. scored label) is in the right 'ball park' compared to the actual price.
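
Comparing price to the scored label by eye can be backed up with a single number. The sketch below computes the mean absolute error, which is one possible measure and is not something the lab itself asks for. The three scored rows are made-up values, not output from the experiment.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up held-out rows (assumption): (actual price, scored label).
scored_rows = [(23000, 21500.0), (9000, 9400.0), (15000, 14100.0)]

actual_prices = [actual for actual, _ in scored_rows]
scored_labels = [label for _, label in scored_rows]

mae = mean_absolute_error(actual_prices, scored_labels)
print("mean absolute error:", round(mae, 2))
```

A small error relative to the typical car price suggests the predictions are in the right 'ball park'. A large one suggests the model needs more data or better features.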
We now have a functional training experiment.
In this section, we'll convert the training experiment to a predictive experiment and test the API with some new data.
So far the experiment has just been a 'training experiment'. We now need to convert it to a model that can be used to score new data.
- Run the experiment and observe the green ticks
- Using the bottom command bar, open the Setup Web Service menu and choose Predictive Web Service
- Run the new predictive experiment (this takes approx 30 seconds)
- Using the bottom command bar, click Deploy Web Service. The experiment will now be deployed and, when it completes, you'll see a screen containing the endpoint, key and some test interfaces
We'll now use our deployed predictive experiment to test some new car data and get a new predicted price.
- Using the left navigation panel, go to Web Services > Car Price Prediction [Predictive Exp]
- Click Test (preview). This is in the Test column for the request/response endpoint (not the big blue button, but the small link next to it)
- Complete the Input1 form with the following data
  - make = audi
  - fuel = diesel
  - doors = four
  - body = hatchback
  - drive = fwd
  - weight = 1900
  - engine-size = 150
  - bhp = 150
  - mpg = 55
  - price = 23000
- Click Test Request-Response
- Observe that scored labels (the predicted price) is lower than the actual price of £23,000. We know the model is right because it is an Audi and therefore it is overpriced :)
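
The same test can be made from code by calling the request/response endpoint directly. The sketch below only builds the HTTP request and does not send it. It rests on several assumptions that you should check against the API help page of your own web service: the endpoint URL and API key are placeholders, the web service input is assumed to be named `input1`, and the `Inputs` / `ColumnNames` / `Values` payload shape is assumed rather than confirmed by this lab. `build_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Placeholders (assumptions): copy the real values from the web service screen.
ENDPOINT_URL = "https://example.invalid/execute?api-version=2.0"
API_KEY = "YOUR_API_KEY"


def build_request(url, api_key, car):
    """Build (but do not send) a POST request that scores one car."""
    payload = {
        "Inputs": {
            "input1": {  # assumed input name; check your service's API help page
                "ColumnNames": list(car.keys()),
                "Values": [[str(value) for value in car.values()]],
            }
        },
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# The same values as the test form above.
car = {
    "make": "audi", "fuel": "diesel", "doors": "four", "body": "hatchback",
    "drive": "fwd", "weight": 1900, "engine-size": 150, "bhp": 150,
    "mpg": 55, "price": 23000,
}

request = build_request(ENDPOINT_URL, API_KEY, car)
print(request.get_method(), request.full_url)
# To send it once the placeholders are replaced:
# response = urllib.request.urlopen(request).read()
```

Building the request separately from sending it makes it easy to inspect the JSON body before any call is made, which helps when the service rejects a request because of a misspelt column name.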