# Regression model for Taxi fares using NimbusML

Regression is a supervised machine learning task. A regression model predicts a continuous numeric output. For instance, predicting the fare of a taxi trip or the price of a car is a regression problem.

## Verify your NimbusML version

In [1]:
import nimbusml
from nimbusml import FileDataStream, DataSchema
print("nimbusml version: ", nimbusml.__version__)

nimbusml version:  1.7.0


## Load your data

In [3]:
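# A FileDataStream streams rows from the file instead of loading everything into memory,
# which can make it faster than a pandas DataFrame on large datasets.
# Want to know more? https://arxiv.org/pdf/1905.05715.pdf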
ds_train = FileDataStream.read_csv("../datasets/taxi/taxi-fare-train.csv")
ds_test = FileDataStream.read_csv("../datasets/taxi/taxi-fare-test.csv")
ds_train.head(5)

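|   | VendorId | RateCode | PassengerCount | TripTime | TripDistance | PaymentType | FareAmount |
|---|----------|----------|----------------|----------|--------------|-------------|------------|
| 0 | CMT      | 1        | 1              | 1271     | 3.8          | CRD         | 17.5       |
| 1 | CMT      | 1        | 1              | 474      | 1.5          | CRD         | 8.0        |
| 2 | CMT      | 1        | 1              | 637      | 1.4          | CRD         | 8.5        |
| 3 | CMT      | 1        | 1              | 181      | 0.6          | CSH         | 4.5        |
| 4 | CMT      | 1        | 1              | 661      | 1.1          | CRD         | 8.5        |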


## Verify the schema of your data 

In [None]:
# Want to know more about the Schema?
# https://docs.microsoft.com/en-us/nimbusml/concepts/schema

# Want to know more about data types in NimbusML?
# https://docs.microsoft.com/en-us/nimbusml/concepts/types
ds_train.schema

#### This is the inferred schema. Does it look right to you?

In [None]:
# Override the inferred type of some columns. The defaults are generic guesses,
# and we know more about the problem than the loader does.
ds_train.schema['RateCode'].type = "TX"
ds_train.schema['PassengerCount'].type = "R4"
ds_train.schema['TripTime'].type = "R4"
ds_train.schema['TripDistance'].type = "R4"
ds_train.schema['FareAmount'].type = "R4"  # the label column, referenced by name rather than by index
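
To make the type codes concrete: in the schema notation used here, `TX` is text and `R4` is a 32-bit float. The sketch below is plain Python, not NimbusML. The `CONVERTERS` table, the `COLUMN_TYPES` table, and the `cast_row` helper are hypothetical names used only to illustrate what the overrides above ask the loader to do with each column. It assumes `VendorId` and `PaymentType` stay text and `FareAmount` is numeric.

```python
import csv
import io

# Hypothetical mapping from schema type codes to Python converters.
# TX = text, R4 = 32-bit float (a Python float is used here as a stand-in).
CONVERTERS = {"TX": str, "R4": float}

# Assumed column types, mirroring the overrides in the cell above.
COLUMN_TYPES = {
    "VendorId": "TX",
    "RateCode": "TX",
    "PassengerCount": "R4",
    "TripTime": "R4",
    "TripDistance": "R4",
    "PaymentType": "TX",
    "FareAmount": "R4",
}


def cast_row(row):
    """Convert every field of a CSV row to the type declared for its column."""
    return {name: CONVERTERS[COLUMN_TYPES[name]](value) for name, value in row.items()}


# One row taken from the sample output shown earlier in this notebook.
sample = io.StringIO(
    "VendorId,RateCode,PassengerCount,TripTime,TripDistance,PaymentType,FareAmount\n"
    "CMT,1,1,1271,3.8,CRD,17.5\n"
)
typed_rows = [cast_row(row) for row in csv.DictReader(sample)]
print(typed_rows[0])
```

Note that `RateCode` stays a string even though it looks numeric: it is a category label, so treating it as text lets a one-hot encoder handle it later.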

In [None]:
ds_train.schema

In [None]:
# You can also set the schema at load time:
# schema = DataSchema.read_schema("../datasets/taxi/taxi-fare-train.csv", collapse=False, sep=',')
# Modify the schema, then load. Here the test set simply reuses the training schema.
ds_test = FileDataStream.read_csv("../datasets/taxi/taxi-fare-test.csv", schema=ds_train.schema)

In [None]:
ds_test.schema

## Data transformations pipeline for NimbusML model

In [None]:
from nimbusml import Pipeline
from nimbusml.feature_extraction.categorical import OneHotVectorizer
from nimbusml.preprocessing.normalization import MeanVarianceScaler

# https://docs.microsoft.com/en-us/python/api/nimbusml/nimbusml.feature_extraction.categorical.onehotvectorizer?view=nimbusml-py-latest
# The << operator tells each transform which columns to work on.
# One-hot encode the categorical columns.
onv = OneHotVectorizer() << ['VendorId', 'RateCode', 'PaymentType']
# Scale the numeric columns.
mvs = MeanVarianceScaler() << ['PassengerCount', 'TripTime', 'TripDistance']

preprocess_pipeline = Pipeline([onv, mvs])
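
As a rough illustration of what these two transforms do to a column, here is a minimal plain-Python sketch using values from the sample rows shown earlier. It is not NimbusML code: `one_hot` and `standardize` are hypothetical helpers, and `standardize` implements textbook standardization (subtract the mean, divide by the standard deviation). The exact output of `MeanVarianceScaler` may differ depending on its options, so treat the numbers as indicative only.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Map each value to an indicator vector over the sorted distinct categories."""
    categories = sorted(set(values))
    vectors = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, vectors


def standardize(values):
    """Textbook standardization: subtract the mean, divide by the population std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Values taken from the five sample rows displayed earlier.
payment_type = ["CRD", "CRD", "CRD", "CSH", "CRD"]
trip_distance = [3.8, 1.5, 1.4, 0.6, 1.1]

categories, encoded = one_hot(payment_type)
scaled = standardize(trip_distance)

print(categories)
print(encoded)
print(scaled)
```

After encoding, each category becomes its own 0/1 feature, and after scaling, the numeric column has zero mean and unit variance, which keeps features with large raw ranges (such as `TripTime`) from dominating the others.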

## Make it a training pipeline

In [None]:
from nimbusml.linear_model import OrdinaryLeastSquaresRegressor

olsr = OrdinaryLeastSquaresRegressor(
    feature=['VendorId', 'RateCode', 'PassengerCount', 'TripTime', 'TripDistance', 'PaymentType'],
    label='FareAmount')

training_pipeline = preprocess_pipeline.clone()

In [None]:
training_pipeline.append(olsr)
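
For intuition about what an ordinary least squares regressor estimates, the sketch below fits a single-feature line, fare as a function of trip distance, using the closed-form solution on the five sample rows shown earlier. It is plain Python with a hypothetical `fit_line` helper, not the NimbusML implementation, which works on all the features at once.

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# TripDistance and FareAmount from the five sample rows displayed earlier.
trip_distance = [3.8, 1.5, 1.4, 0.6, 1.1]
fare_amount = [17.5, 8.0, 8.5, 4.5, 8.5]

intercept, slope = fit_line(trip_distance, fare_amount)
residuals = [y - (intercept + slope * x) for x, y in zip(trip_distance, fare_amount)]

print(f"fare ~ {intercept:.2f} + {slope:.2f} * distance")
```

The fitted line minimizes the sum of squared residuals, and with an intercept term the residuals always sum to zero.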

## Fit both pipelines

In [None]:
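# Fit the pipelines. No label argument is passed: the preprocessing transforms
# do not use a label, and the regressor already names its label column ('FareAmount').
preprocess_pipeline.fit(ds_train)
training_pipeline.fit(ds_train)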

print("preprocess pipeline:", preprocess_pipeline)
print("training pipeline:", training_pipeline)

## Observe the transformed data

In [None]:
preprocess_pipeline.transform(ds_train)

## Measure the performance on the training and test set

In [None]:
# Evaluate on the training set
metrics, scores = training_pipeline.test(ds_train, output_scores=True)
# print(scores)  # uncomment to inspect the per-row scores
metrics

In [None]:
# Evaluate on the held-out test set
metrics, scores = training_pipeline.test(ds_test, output_scores=True)
# print(scores)  # uncomment to inspect the per-row scores
metrics
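
When reading regression metrics, it helps to know how the common ones are computed. The following is a minimal plain-Python sketch of mean absolute error, mean squared error, root mean squared error, and R squared. The `regression_metrics` helper and the `predicted` values are made up for illustration. They are not output from the model above, and the metric names NimbusML reports may be labeled differently.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Compute common regression metrics from paired actual/predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": mae,                      # average absolute error
        "mse": mse,                      # average squared error
        "rmse": sqrt(mse),               # same unit as the label
        "r_squared": 1 - ss_res / ss_tot,  # share of variance explained
    }


actual = [17.5, 8.0, 8.5, 4.5, 8.5]     # FareAmount from the sample rows
predicted = [17.0, 8.5, 8.5, 5.0, 8.0]  # hypothetical predictions

m = regression_metrics(actual, predicted)
print(m)
```

A large gap between the training-set and test-set values of these metrics is a typical sign of overfitting.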

## Visualize the pipeline

In [None]:
# If this cell fails:
# - make sure you have installed Graphviz: https://graphviz.gitlab.io/download/
# - make sure you have run set_path_graphviz.bat
from nimbusml.utils.exports import img_export_pipeline
figure = img_export_pipeline(training_pipeline, ds_train)
figure

## Save the model

In [None]:
print("Saving the ML.NET Model as a file...")

model_file_path = "../models/ml_net_taxi_python.zip"
training_pipeline.save_model(model_file_path)

print("The model was saved to: {}".format(model_file_path))