<h1>Taxi Demand Prediction</h1>

Data source: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

<h3> DATA DICTIONARY </h3>
<table>
<th>Field Name <th>Description
<tr> <td>VendorID <td>A code indicating the TPEP provider that provided the record.
1= Creative Mobile Technologies, LLC; 2= VeriFone Inc.
<tr>
<td>tpep_pickup_datetime <td> The date and time when the meter was engaged.
<tr>
<td>tpep_dropoff_datetime <td>The date and time when the meter was disengaged.
<tr>
<td>Passenger_count <td>The number of passengers in the vehicle.
This is a driver-entered value.
<tr>
<td>Trip_distance <td>The elapsed trip distance in miles reported by the taximeter.
<tr>
<td>PULocationID <td>TLC Taxi Zone in which the taximeter was engaged
<tr><td>DOLocationID <td>TLC Taxi Zone in which the taximeter was disengaged
<tr><td>RateCodeID <td>The final rate code in effect at the end of the trip.
1= Standard rate
2=JFK
3=Newark
4=Nassau or Westchester
5=Negotiated fare
6=Group ride
<tr>
<td>Store_and_fwd_flag <td>This flag indicates whether the trip record was held in vehicle
memory before sending to the vendor, aka “store and forward,”
because the vehicle did not have a connection to the server.
Y= store and forward trip
N= not a store and forward trip
<tr>
<td>Payment_type <td>A numeric code signifying how the passenger paid for the trip.
1= Credit card
2= Cash
3= No charge
4= Dispute
5= Unknown
6= Voided trip
<tr><td>
Fare_amount <td>The time-and-distance fare calculated by the meter.
<tr><td>
Extra <td>Miscellaneous extras and surcharges. Currently, this only includes
the $0.50 and $1 rush hour and overnight charges.
<tr><td>
MTA_tax <td>$0.50 MTA tax that is automatically triggered based on the metered
rate in use.
<tr><td>
Improvement_surcharge <td>$0.30 improvement surcharge assessed trips at the flag drop. The
improvement surcharge began being levied in 2015.
<tr><td>
Tip_amount <td>Tip amount – This field is automatically populated for credit card
tips. Cash tips are not included.
<tr><td>
Tolls_amount <td>Total amount of all tolls paid in trip.
<tr><td>
Total_amount <td>The total amount charged to passengers. Does not include cash tips.
<tr><td>
Congestion_Surcharge <td>Total amount collected in trip for NYS congestion surcharge.
<tr><td>
Airport_fee <td>$1.25 for pick up only at LaGuardia and John F. Kennedy Airports
</table>
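
The coded fields in the dictionary above can be turned into readable labels with a plain lookup. The following is a minimal sketch on toy data, assuming the code tables listed in the dictionary; the names `PAYMENT_TYPES`, `RATE_CODES`, `decode` and `toy` are illustrative and are not part of the dataset.

```python
import pandas as pd

# Code tables copied from the data dictionary above.
PAYMENT_TYPES = {1: "Credit card", 2: "Cash", 3: "No charge",
                 4: "Dispute", 5: "Unknown", 6: "Voided trip"}
RATE_CODES = {1: "Standard rate", 2: "JFK", 3: "Newark",
              4: "Nassau or Westchester", 5: "Negotiated fare", 6: "Group ride"}

def decode(codes: pd.Series, table: dict) -> pd.Series:
    """Map numeric codes to labels; codes missing from the table become NaN."""
    return codes.map(table)

# Toy rows (made up): payment type 0 and rate code 99 are not in the dictionary.
toy = pd.DataFrame({"payment_type": [1, 2, 0], "RatecodeID": [1.0, 99.0, 2.0]})
toy["payment_label"] = decode(toy["payment_type"], PAYMENT_TYPES)
toy["rate_label"] = decode(toy["RatecodeID"], RATE_CODES)
print(toy)
```

Codes that fall outside the dictionary surface as NaN labels, which makes them easy to count before deciding how to clean them.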

In [129]:
import pandas as pd

In [130]:
%pip install pyarrow
%pip install fastparquet

Note: you may need to restart the kernel to use updated packages.


In [131]:
df = pd.read_parquet("C:/Users/GarimaJi/Downloads/yellow_tripdata_2022-01.parquet")

In [132]:
df.columns

Index(['VendorID', 'tpep_pickup_datetime', 'tpep_dropoff_datetime',
       'passenger_count', 'trip_distance', 'RatecodeID', 'store_and_fwd_flag',
       'PULocationID', 'DOLocationID', 'payment_type', 'fare_amount', 'extra',
       'mta_tax', 'tip_amount', 'tolls_amount', 'improvement_surcharge',
       'total_amount', 'congestion_surcharge', 'airport_fee'],
      dtype='object')

**Columns with Null Values**:
<ol>
<li>passenger_count</li>
<li>RatecodeID</li>
<li>store_and_fwd_flag</li>
<li>congestion_surcharge</li>
<li>airport_fee</li>
</ol>

In [133]:
df.loc[df['VendorID'].isnull()]

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee
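
Instead of checking one column at a time, the number of missing values in every column can be read in a single call. This is a minimal sketch on a toy frame; the values are made up, and `toy` and `null_counts` are hypothetical names.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "VendorID": [1, 2, 2],
    "passenger_count": [1.0, np.nan, np.nan],
    "airport_fee": [0.0, np.nan, 1.25],
})

# isnull() marks missing cells; summing the booleans counts them per column.
null_counts = toy.isnull().sum()

# Show only the columns that actually contain missing values.
print(null_counts[null_counts > 0])
```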


<h3>Handling Null Values of Airport_fee and Congestion Surcharge </h3>

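Airport fee: all NaN values are replaced by zero.

Congestion surcharge: each NaN value is replaced by the part of the total amount that the other charges do not account for:
$$
\text{congestion surcharge} = \text{total amount} - (\text{fare amount} + \text{extra} + \text{MTA tax} + \text{tip amount} + \text{tolls amount} + \text{improvement surcharge})
$$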
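
As a worked check of the residual formula, the sketch below applies it to two trips whose amounts are copied from rows 3 and 2463926 of this notebook's data. Treating the second trip's surcharge as missing before the fill is an assumption; `COMPONENTS`, `toy` and `residual` are illustrative names.

```python
import numpy as np
import pandas as pd

# Charges that are subtracted from the total in the formula above.
COMPONENTS = ["fare_amount", "extra", "mta_tax", "tip_amount",
              "tolls_amount", "improvement_surcharge"]

toy = pd.DataFrame({
    "fare_amount": [8.00, 8.00],
    "extra": [0.5, 0.0],
    "mta_tax": [0.5, 0.5],
    "tip_amount": [0.00, 2.39],
    "tolls_amount": [0.0, 0.0],
    "improvement_surcharge": [0.3, 0.3],
    "total_amount": [11.80, 13.69],
    "congestion_surcharge": [2.5, np.nan],  # second value assumed missing
})

# Residual = total minus the sum of the listed charges, rounded to cents.
residual = (toy["total_amount"] - toy[COMPONENTS].sum(axis=1)).round(2)

# fillna aligns on the index, so only the missing entries take the residual.
toy["congestion_surcharge"] = toy["congestion_surcharge"].fillna(residual)
print(toy["congestion_surcharge"])
```

Because the formula leaves out `airport_fee`, the residual of a trip that carries an airport fee would include that fee as well, so the filled value is an approximation rather than an exact surcharge.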

In [134]:
df['airport_fee']= df['airport_fee'].fillna(0)
df.loc[df['airport_fee'].isnull()]

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee


In [135]:
df['congestion_surcharge']= df['congestion_surcharge'].fillna(df['total_amount']-(df['fare_amount']+df['extra']+df['mta_tax']+df['tip_amount']+df['tolls_amount']+df['improvement_surcharge']))
df

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee
0,1,2022-01-01 00:35:40,2022-01-01 00:53:29,2.0,3.80,1.0,N,142,236,1,14.50,3.0,0.5,3.65,0.0,0.3,21.95,2.5,0.0
1,1,2022-01-01 00:33:43,2022-01-01 00:42:07,1.0,2.10,1.0,N,236,42,1,8.00,0.5,0.5,4.00,0.0,0.3,13.30,0.0,0.0
2,2,2022-01-01 00:53:21,2022-01-01 01:02:19,1.0,0.97,1.0,N,166,166,1,7.50,0.5,0.5,1.76,0.0,0.3,10.56,0.0,0.0
3,2,2022-01-01 00:25:21,2022-01-01 00:35:23,1.0,1.09,1.0,N,114,68,2,8.00,0.5,0.5,0.00,0.0,0.3,11.80,2.5,0.0
4,2,2022-01-01 00:36:48,2022-01-01 01:14:20,1.0,4.30,1.0,N,68,163,1,23.50,0.5,0.5,3.00,0.0,0.3,30.30,2.5,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2463926,2,2022-01-31 23:36:53,2022-01-31 23:42:51,,1.32,,,90,170,0,8.00,0.0,0.5,2.39,0.0,0.3,13.69,2.5,0.0
2463927,2,2022-01-31 23:44:22,2022-01-31 23:55:01,,4.19,,,107,75,0,16.80,0.0,0.5,4.35,0.0,0.3,24.45,2.5,0.0
2463928,2,2022-01-31 23:39:00,2022-01-31 23:50:00,,2.10,,,113,246,0,11.22,0.0,0.5,2.00,0.0,0.3,16.52,2.5,0.0
2463929,2,2022-01-31 23:36:42,2022-01-31 23:48:45,,2.92,,,148,164,0,12.40,0.0,0.5,0.00,0.0,0.3,15.70,2.5,0.0


In [136]:
df=df.drop('store_and_fwd_flag',axis=1)

In [137]:
df.loc[df['passenger_count'].isnull()]
df.iloc[[2395142]]

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee
2395142,2,2022-01-02 15:41:00,2022-01-02 15:42:00,,0.01,,140,140,0,48.61,0.0,0.5,0.0,0.0,0.3,51.91,2.5,0.0


In [138]:
import numpy as np
average=df['total_amount']/df['trip_distance']


In [139]:
mean=np.mean(average[np.isfinite(average)])
mean

11.198534962613284

In [140]:
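from numpy import ceil

# Parenthesize the comparison: `&` binds tighter than `==`, so without the
# parentheses the mask is evaluated as `(isnull & trip_distance) == 0`.
df.loc[df['passenger_count'].isnull() & (df['trip_distance'] == 0), 'passenger_count'] = 0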

df.loc[df['passenger_count'].isnull()]


Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee
2392428,2,2022-01-01 00:50:00,2022-01-01 00:54:00,,1.00,,68,246,0,13.20,0.0,0.5,1.75,0.0,0.3,18.25,2.5,0.0
2392429,2,2022-01-01 00:49:24,2022-01-01 01:27:36,,13.31,,257,223,0,44.87,0.0,0.5,10.05,0.0,0.3,55.72,0.0,0.0
2392430,2,2022-01-01 00:42:00,2022-01-01 00:56:00,,2.87,,143,236,0,13.23,0.0,0.5,3.51,0.0,0.3,20.04,2.5,0.0
2392431,2,2022-01-01 00:40:00,2022-01-01 00:55:00,,3.24,,143,262,0,14.19,0.0,0.5,3.72,0.0,0.3,21.21,2.5,0.0
2392432,2,2022-01-01 00:40:00,2022-01-01 00:52:00,,2.19,,239,166,0,13.20,0.0,0.5,5.25,0.0,0.3,21.75,2.5,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2463926,2,2022-01-31 23:36:53,2022-01-31 23:42:51,,1.32,,90,170,0,8.00,0.0,0.5,2.39,0.0,0.3,13.69,2.5,0.0
2463927,2,2022-01-31 23:44:22,2022-01-31 23:55:01,,4.19,,107,75,0,16.80,0.0,0.5,4.35,0.0,0.3,24.45,2.5,0.0
2463928,2,2022-01-31 23:39:00,2022-01-31 23:50:00,,2.10,,113,246,0,11.22,0.0,0.5,2.00,0.0,0.3,16.52,2.5,0.0
2463929,2,2022-01-31 23:36:42,2022-01-31 23:48:45,,2.92,,148,164,0,12.40,0.0,0.5,0.00,0.0,0.3,15.70,2.5,0.0


In [141]:
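# Estimate the remaining missing counts from each trip's fare per mile relative to the mean fare per mile
df.loc[df['passenger_count'].isnull(), 'passenger_count'] = ceil(df['total_amount'] / (mean * df['trip_distance']))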

df.loc[df['passenger_count'].isnull()]

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee
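
The imputation steps for `passenger_count` can be sketched end to end on a toy frame. The amounts below are made up, and `toy`, `per_mile`, `mean_per_mile`, `missing`, `zero_dist` and `rest` are hypothetical names; the cap of 8 mirrors the limit this notebook applies to the column.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "passenger_count": [1.0, np.nan, np.nan, np.nan],
    "trip_distance":   [2.0, 0.0, 1.0, 4.0],
    "total_amount":    [20.0, 5.0, 25.0, 20.0],
})

# Fare per mile; zero-distance trips give inf and are left out of the mean.
per_mile = toy["total_amount"] / toy["trip_distance"]
mean_per_mile = per_mile[np.isfinite(per_mile)].mean()

missing = toy["passenger_count"].isnull()
zero_dist = toy["trip_distance"] == 0

# Missing count on a zero-distance trip: set to zero passengers.
toy.loc[missing & zero_dist, "passenger_count"] = 0

# Other missing counts: fare per mile relative to the mean, rounded up, capped at 8.
rest = missing & ~zero_dist
toy.loc[rest, "passenger_count"] = np.ceil(per_mile[rest] / mean_per_mile).clip(upper=8)
print(toy)
```

Building the two boolean masks separately before combining them avoids any dependence on operator precedence between `&` and `==`.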


In [142]:
df.loc[df['passenger_count'].isnull()]
df.loc[df['passenger_count']== 0.0]

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee
0,1,2022-01-01 00:35:40,2022-01-01 00:53:29,0.0,3.80,1.0,142,236,1,14.50,3.0,0.5,3.65,0.0,0.3,21.95,2.5,0.0
1,1,2022-01-01 00:33:43,2022-01-01 00:42:07,0.0,2.10,1.0,236,42,1,8.00,0.5,0.5,4.00,0.0,0.3,13.30,0.0,0.0
2,2,2022-01-01 00:53:21,2022-01-01 01:02:19,0.0,0.97,1.0,166,166,1,7.50,0.5,0.5,1.76,0.0,0.3,10.56,0.0,0.0
3,2,2022-01-01 00:25:21,2022-01-01 00:35:23,0.0,1.09,1.0,114,68,2,8.00,0.5,0.5,0.00,0.0,0.3,11.80,2.5,0.0
4,2,2022-01-01 00:36:48,2022-01-01 01:14:20,0.0,4.30,1.0,68,163,1,23.50,0.5,0.5,3.00,0.0,0.3,30.30,2.5,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2463840,2,2022-01-31 22:59:00,2022-01-31 22:59:08,0.0,0.00,,48,48,0,8.10,0.0,0.5,2.00,0.0,0.3,13.40,2.5,0.0
2463869,1,2022-01-31 23:38:03,2022-01-31 23:57:57,0.0,0.00,,238,136,0,24.20,0.5,0.5,4.20,0.0,0.3,34.20,4.5,0.0
2463872,2,2022-01-31 23:57:46,2022-02-01 00:06:05,0.0,0.00,,234,161,0,9.92,0.0,0.5,4.23,0.0,0.3,17.45,2.5,0.0
2463873,2,2022-01-31 23:38:02,2022-01-31 23:49:32,0.0,0.00,,68,170,0,10.18,0.0,0.5,1.00,0.0,0.3,14.48,2.5,0.0


In [143]:
df['passenger_count']= df['passenger_count'].fillna(0)
df.loc[df['passenger_count'].isnull()]

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee


In [144]:
# Cap passenger counts at 8
df.loc[df['passenger_count'] > 8, 'passenger_count'] = 8

In [145]:
df.loc[df['passenger_count'] == df['passenger_count'].max()]

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee
2392459,6,2022-01-01 00:01:45,2022-01-01 00:01:48,8.0,0.15,,265,129,0,15.20,0.0,0.5,0.00,0.0,0.3,16.00,0.0,0.0
2392466,2,2022-01-01 00:44:47,2022-01-01 00:44:59,8.0,0.08,,239,239,0,13.20,0.0,0.5,3.50,0.0,0.3,20.00,2.5,0.0
2392610,2,2022-01-01 01:57:00,2022-01-01 01:58:32,8.0,0.10,,256,256,0,13.20,0.0,0.5,3.00,0.0,0.3,17.00,0.0,0.0
2392710,2,2022-01-01 01:36:00,2022-01-01 01:39:00,8.0,0.29,,249,249,0,20.18,0.0,0.5,2.00,0.0,0.3,25.48,2.5,0.0
2392737,2,2022-01-01 01:01:00,2022-01-01 01:29:00,8.0,0.02,,90,90,0,13.70,0.0,0.5,3.15,0.0,0.3,20.15,2.5,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2462400,2,2022-01-31 15:21:00,2022-01-31 15:26:00,8.0,0.49,,107,234,0,39.16,0.0,0.5,10.64,0.0,0.3,53.10,2.5,0.0
2462866,2,2022-01-31 17:16:13,2022-01-31 17:16:42,8.0,0.11,,234,234,0,8.69,0.0,0.5,1.91,0.0,0.3,13.90,2.5,0.0
2462975,2,2022-01-31 18:36:02,2022-01-31 18:39:44,8.0,0.28,,263,262,0,21.54,0.0,0.5,6.72,0.0,0.3,31.56,2.5,0.0
2463461,2,2022-01-31 19:31:49,2022-01-31 19:32:07,8.0,0.04,,237,237,0,7.70,0.0,0.5,2.33,0.0,0.3,13.33,2.5,0.0


In [146]:
df.loc[df['RatecodeID'] == df['RatecodeID'].max()]

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee
3587,1,2022-01-01 00:10:40,2022-01-01 00:56:08,0.0,0.0,99.0,49,136,1,52.2,0.0,0.5,0.0,6.55,0.3,59.55,0.0,0.0
8172,1,2022-01-01 01:11:42,2022-01-01 01:34:48,0.0,4.2,99.0,254,69,1,23.2,0.0,0.5,0.0,0.00,0.3,24.00,0.0,0.0
8173,1,2022-01-01 01:49:58,2022-01-01 02:12:27,0.0,5.1,99.0,213,41,1,23.2,0.0,0.5,0.0,0.00,0.3,24.00,0.0,0.0
8523,1,2022-01-01 01:02:36,2022-01-01 01:36:11,0.0,0.0,99.0,77,127,1,53.2,0.0,0.5,0.0,6.55,0.3,60.55,0.0,0.0
8524,1,2022-01-01 01:53:03,2022-01-01 02:06:10,0.0,1.6,99.0,74,238,1,15.2,0.0,0.5,0.0,0.00,0.3,16.00,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2386991,1,2022-01-31 21:24:42,2022-01-31 21:59:39,0.0,16.6,99.0,222,65,1,55.2,0.0,0.5,0.0,0.00,0.3,56.00,0.0,0.0
2389700,1,2022-01-31 22:17:46,2022-01-31 22:54:00,0.0,0.0,99.0,153,202,1,31.2,0.0,0.5,0.0,0.00,0.3,32.00,0.0,0.0
2390925,1,2022-01-31 23:29:09,2022-02-01 00:13:12,0.0,0.0,99.0,90,23,1,53.2,0.0,0.5,0.0,19.65,0.3,73.65,0.0,0.0
2391807,1,2022-01-31 23:12:04,2022-02-01 00:17:44,0.0,8.6,99.0,48,20,1,35.2,0.0,0.5,0.0,0.00,0.3,36.00,0.0,0.0


In [147]:
# Missing rate codes become 0; codes above 6 (such as 99) are capped at 6
df['RatecodeID'] = df['RatecodeID'].fillna(0)
df.loc[df['RatecodeID'] > 6, 'RatecodeID'] = 6
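
Capping at 6 relabels undocumented codes such as 99 as "Group ride". One possible alternative, shown here only as a sketch and not as the approach used in this notebook, keeps the six documented codes and sends everything else to 0. `VALID_RATE_CODES`, `rate` and `cleaned` are hypothetical names.

```python
import numpy as np
import pandas as pd

VALID_RATE_CODES = [1, 2, 3, 4, 5, 6]  # codes listed in the data dictionary

# Toy column: two valid codes, one undocumented code, one missing value, one valid code.
rate = pd.Series([1.0, 2.0, 99.0, np.nan, 6.0], name="RatecodeID")

# Keep documented codes; replace NaN and undocumented codes with 0.
cleaned = rate.where(rate.isin(VALID_RATE_CODES), 0).astype("int64")
print(cleaned.tolist())
```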


In [148]:
# for i,row in df.iterrows():
#     grp = row((df['RatecodeID']).notnull())
#     if(df.at[i,'RatecodeID']%6.0==0.0):
#         df.at[i,'RatecodeID']=6.0
#     elif(df.at[i,'RatecodeID']%5.0==0.0):
#         df.at[i,'RatecodeID']=5.0
#     elif(df.at[i,'RatecodeID']%4.0==0.0):
#         df.at[i,'RatecodeID']=4.0
#     elif(df.at[i,'RatecodeID']%3.0==0.0):
#         df.at[i,'RatecodeID']=3.0
#     elif(df.at[i,'RatecodeID']%2.0==0.0):
#         df.at[i,'RatecodeID']=2.0
#     else:
#         df.at[i,'RatecodeID']=1.0
        

In [149]:
df.dtypes


VendorID                          int64
tpep_pickup_datetime     datetime64[ns]
tpep_dropoff_datetime    datetime64[ns]
passenger_count                 float64
trip_distance                   float64
RatecodeID                      float64
PULocationID                      int64
DOLocationID                      int64
payment_type                      int64
fare_amount                     float64
extra                           float64
mta_tax                         float64
tip_amount                      float64
tolls_amount                    float64
improvement_surcharge           float64
total_amount                    float64
congestion_surcharge            float64
airport_fee                     float64
dtype: object

In [150]:
df.loc[df['RatecodeID'].isnull()]

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee


In [151]:
import datetime
import time

In [152]:
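def convert_to_unix(s):
    # return time.mktime(datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S").timetuple())
    # Seconds since the Unix epoch. The epoch literal has no trailing 'Z' because
    # NumPy deprecates parsing timezone-aware datetime strings.
    return (s - np.datetime64('1970-01-01T00:00:00')) / np.timedelta64(1, 's')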


In [153]:
duration = df[['tpep_pickup_datetime', 'tpep_dropoff_datetime']]
# pickups and dropoffs to unix time
duration_pickup = [convert_to_unix(x) for x in duration['tpep_pickup_datetime'].values]
duration_drop = [convert_to_unix(x) for x in duration['tpep_dropoff_datetime'].values]
# calculate duration of trips
durations = (np.array(duration_drop) - np.array(duration_pickup))/float(60)

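
The same quantities can also be computed column-wise, without a Python-level loop over the rows. This is a minimal sketch using the pickup and dropoff times of the first two trips shown in this notebook; `toy`, `trip_minutes` and `pickup_unix` are illustrative names.

```python
import pandas as pd

toy = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime(
        ["2022-01-01 00:35:40", "2022-01-01 00:33:43"]),
    "tpep_dropoff_datetime": pd.to_datetime(
        ["2022-01-01 00:53:29", "2022-01-01 00:42:07"]),
})

# Subtracting datetime columns yields timedeltas; convert them to minutes.
trip_minutes = (
    toy["tpep_dropoff_datetime"] - toy["tpep_pickup_datetime"]
).dt.total_seconds() / 60

# Seconds since the Unix epoch for each pickup.
pickup_unix = (
    toy["tpep_pickup_datetime"] - pd.Timestamp("1970-01-01")
) / pd.Timedelta(seconds=1)

print(trip_minutes.tolist(), pickup_unix.tolist())
```

On a frame with millions of rows, vectorized arithmetic of this kind is typically much faster than converting each timestamp in a list comprehension.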


In [154]:
# append durations of trips and speed in miles/hr to a new dataframe
new_frame = df[['passenger_count', 'trip_distance', 'PULocationID','DOLocationID','total_amount']].copy()
new_frame['trip_time'] = durations
new_frame['pickup_times']= duration_pickup
new_frame['Speed'] = 60 *(new_frame['trip_distance']/new_frame['trip_time'])

In [155]:
new_frame

Unnamed: 0,passenger_count,trip_distance,PULocationID,DOLocationID,total_amount,trip_time,pickup_times,Speed
0,0.0,3.80,142,236,21.95,17.816667,1.640997e+09,12.797007
1,0.0,2.10,236,42,13.30,8.400000,1.640997e+09,15.000000
2,0.0,0.97,166,166,10.56,8.966667,1.640998e+09,6.490706
3,0.0,1.09,114,68,11.80,10.033333,1.640997e+09,6.518272
4,0.0,4.30,68,163,30.30,37.533333,1.640997e+09,6.873890
...,...,...,...,...,...,...,...,...
2463926,1.0,1.32,90,170,13.69,5.966667,1.643672e+09,13.273743
2463927,1.0,4.19,107,75,24.45,10.650000,1.643673e+09,23.605634
2463928,1.0,2.10,113,246,16.52,11.000000,1.643672e+09,11.454545
2463929,1.0,2.92,148,164,15.70,12.050000,1.643672e+09,14.539419
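
Trips with a recorded duration of zero make the speed formula divide by zero. The sketch below, on toy rows, marks such results as missing so that they can be filtered later; the first two rows reuse the distances and durations of the first two trips above, the last two are made up, and `toy` and `speed` are hypothetical names.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "trip_distance": [3.80, 2.10, 0.50, 0.00],   # miles
    "trip_time":     [1069 / 60, 8.4, 0.0, 0.0], # minutes
})

# Miles per minute times 60 gives miles per hour.
speed = 60 * toy["trip_distance"] / toy["trip_time"]

# x/0 gives inf and 0/0 gives NaN; turn both into NaN.
toy["Speed"] = speed.replace([np.inf, -np.inf], np.nan)
print(toy)
```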


In [156]:
# from datetime import datetime
# dtpick = df['tpep_pickup_datetime']
# dtdrop=df['tpep_dropoff_datetime']
# tpick = (dtpick - np.datetime64('1970-01-01T00:00:00Z')) / np.timedelta64(1, 's')
# tpick
# tdrop = (dtdrop - np.datetime64('1970-01-01T00:00:00Z')) / np.timedelta64(1, 's')
# tdrop
# duration=(tdrop-tpick)/float(60)
# duration
# new_df=df[['passenger_count','trip_distance','PULocationID','DOLocationID','total_amount']]
# new_df['trip_times']=duration
# new_df['pickup_times']=tpick
# new_df['speed']=60*(new_df['trip_distance']/new_df['trip_times'])



<h3>Z-Score Normalization</h3>

In [157]:
from scipy.stats import zscore

In [164]:
data = new_frame['total_amount']
# Calculate the z-score of each total amount
result = zscore(data)
# Overwrite the column with its standardized values
new_frame['total_amount'] = result
# print("Z-score array: ",result)
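
For reference, the z-score of a value $x$ is $z = (x - \mu) / \sigma$, where $\mu$ is the mean and $\sigma$ the standard deviation of the column. The sketch below computes it by hand with NumPy on the first five `total_amount` values shown earlier; it uses the population standard deviation (`ddof=0`), which is also the default of `scipy.stats.zscore`. The names `amounts` and `z` are illustrative.

```python
import numpy as np

# First five total_amount values from the trip table in this notebook.
amounts = np.array([21.95, 13.30, 10.56, 11.80, 30.30])

# z = (x - mean) / std, using the population standard deviation (ddof=0).
z = (amounts - amounts.mean()) / amounts.std(ddof=0)
print(z)
```

After the transformation the values have mean 0 and standard deviation 1, up to floating-point error.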

<h3>Train and Test Split</h3>

In [165]:
from sklearn.model_selection import train_test_split
X= df.loc[:, ['passenger_count', 'trip_distance' ]]
y= df.loc[:, ['PULocationID']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
print(X_train)
print(y_train)

         passenger_count  trip_distance
2044952              0.0           0.53
1494259              0.0           1.57
366609               0.0          10.50
96145                0.0           1.20
1335765              0.0           2.34
...                  ...            ...
804318               0.0           0.92
41987                0.0           1.50
1566988              0.0           0.58
1472658              0.0           1.89
1722377              0.0           1.73

[1971144 rows x 2 columns]
         PULocationID
2044952           239
1494259            50
366609            222
96145             113
1335765            68
...               ...
804318             90
41987             211
1566988           239
1472658           140
1722377           161

[1971144 rows x 1 columns]


<h3>Linear Regression</h3>

In [166]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
regressor = LinearRegression()
regressor.fit(X_train, y_train)

In [167]:
# Predict on the held-out set, then report the fitted coefficients and the test error
y_pred = regressor.predict(X_test)
print("Coefficient : ", regressor.coef_)
print("MSE", mean_squared_error(y_test, y_pred))

Coefficient :  [[ 2.04048803e+00 -4.02456244e-04]]
MSE 4279.157474009512
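
An MSE value is easier to judge next to a trivial baseline that always predicts the training mean of the target. The following sketch runs that comparison on synthetic data, not on the taxi data; the features, target, and the choice of `random_state=0` (used only to make the split repeatable) are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 2))            # synthetic features
y = 3.0 * X[:, 0] + rng.normal(0, 1, size=500)   # synthetic target

# Fixing random_state makes the split, and hence the scores, reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
model_mse = mean_squared_error(y_test, model.predict(X_test))

# Baseline: predict the training mean for every test row.
baseline_pred = np.full_like(y_test, y_train.mean())
baseline_mse = mean_squared_error(y_test, baseline_pred)

print("model MSE:", model_mse)
print("baseline MSE:", baseline_mse)
```

If a model's MSE is close to the baseline's, the features carry little information about the target, which is worth checking before adding further features or trying other models.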


In [168]:
# from sklearn.preprocessing import MinMaxScaler

In [169]:
# data=new_frame['Speed']
# scaler=MinMaxScaler()
# print(scaler.fit(data))
# MinMaxScaler()
# print(scaler.transform(data))