# Regression model for taxi fares using NimbusML

Regression is a type of supervised machine learning task. A regression model predicts a continuous numeric output rather than a category. For instance, predicting the fare of a taxi trip or the price of a car is a regression problem.
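
To make the idea concrete, here is a minimal sketch of a regression fit in plain Python, independent of NimbusML. The data is made up for illustration (the fares are constructed to follow an exact line), and `fit_line` and `predict_fare` are hypothetical helper names, not part of any library.

```python
# Toy data, invented for illustration: trip distance (miles) and fare (dollars).
distances = [1.0, 2.0, 3.0, 4.0, 5.0]
fares = [5.0, 7.5, 10.0, 12.5, 15.0]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(distances, fares)


def predict_fare(distance):
    """Predict a continuous fare value for a trip distance not seen in training."""
    return intercept + slope * distance


print(predict_fare(6.0))
```

The output is a number on a continuous scale, which is what distinguishes regression from classification. The rest of this notebook does the same kind of fit with several input columns instead of one.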

## Verify your NimbusML version

In [None]:
import nimbusml
print(nimbusml.__version__)

## Load your data

In [None]:
from nimbusml import FileDataStream

# A FileDataStream streams rows from the file instead of loading everything
# into memory, which can make it faster than a pandas DataFrame.
# Want to know more? https://arxiv.org/pdf/1905.05715.pdf
ds_train = FileDataStream.read_csv("./datasets/taxi/taxi-fare-train.csv")
ds_test = FileDataStream.read_csv("./datasets/taxi/taxi-fare-test.csv")
ds_train.head(5)

## Verify the schema of your data 

In [None]:
# Want to know more about the Schema?
# https://docs.microsoft.com/en-us/nimbusml/concepts/schema

# Want to know more about data types in NimbusML?
# https://docs.microsoft.com/en-us/nimbusml/concepts/types
ds_train.schema

## Data transformations pipeline for NimbusML model

In [None]:
from nimbusml import Pipeline
from nimbusml.feature_extraction.categorical import OneHotVectorizer
from nimbusml.preprocessing.normalization import MeanVarianceScaler

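# https://docs.microsoft.com/en-us/python/api/nimbusml/nimbusml.feature_extraction.categorical.onehotvectorizer?view=nimbusml-py-latest
# The << operator selects the columns each transform is applied to.
# One-hot encode the categorical columns.
onv = OneHotVectorizer() << ['vendor_id', 'rate_code', 'payment_type']
# Normalize the numeric columns.
mvs = MeanVarianceScaler() << ['passenger_count', 'trip_time_in_secs', 'trip_distance']

# Chain both transforms into a single preprocessing pipeline.
dataProcessPipeline = Pipeline([onv, mvs])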
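
As a rough illustration of what these two kinds of transform do conceptually, the sketch below implements the textbook versions in plain Python. It is not NimbusML's implementation, which may differ in details such as category ordering or how zeros are handled. The helper names `one_hot` and `mean_variance_scale` and the sample values are assumptions made for this example.

```python
import math


def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector,
    with one slot per distinct category (sorted alphabetically here)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded


def mean_variance_scale(values):
    """Rescale values to zero mean and unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


# Illustrative sample values, not read from the taxi dataset.
payment_types = ["CRD", "CSH", "CRD"]
categories, encoded = one_hot(payment_types)
print(categories, encoded)

trip_distances = [1.0, 2.0, 3.0]
scaled = mean_variance_scale(trip_distances)
print(scaled)
```

One categorical column becomes several indicator columns, which is why the transformed dataset has more columns than the original file.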

In [None]:
result = dataProcessPipeline.fit_transform(ds_train, 'y')
print(dataProcessPipeline)
result

In [None]:
# Inspect the column at index 16; columns 0 through 15 are used as features below.
result.columns[16]

In [None]:
from nimbusml.linear_model import OrdinaryLeastSquaresRegressor

# Use the first 16 transformed columns as features and fare_amount as the label.
olsr = OrdinaryLeastSquaresRegressor(feature=list(result.columns[0:16]), label='fare_amount')

In [None]:
trainingPipeline = dataProcessPipeline.append(olsr)