### Introduction

This notebook covers regression with an artificial neural network (ANN) built in PyTorch. The task is to predict the taxi fare for rides within New York City.

__Feature Info:__

__1).__ __pickup_datetime__:- _The timestamp at which the passenger was picked up._ 

__2).__ __fare_amount__:- _The fare charged, in USD._

__3).__ __fare_class__:- _Class of the fare (an integer label; only 0 and 1 appear in the rows shown below)._

__4).__ __pickup_latitude__:- _Latitude of Pickup Location._

__5).__ __pickup_longitude__:- _Longitude of Pickup Location._

__6).__ __dropoff_latitude__:- _Latitude of Dropoff Location._

__7).__ __dropoff_longitude__:- _Longitude of Dropoff Location._

__8).__ __passenger_count__:- _Number of passengers in the ride._
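`pickup_datetime` is stored as text such as `2010-04-19 08:17:56 UTC` (it appears as `object` dtype in the `get_info` output), so it has to be parsed before any time-based information can be used. A minimal sketch with the standard library, assuming every value follows that exact format with a literal ` UTC` suffix; `parse_pickup_datetime` is a hypothetical helper, not part of `ann_taxi_functions`:

```python
from datetime import datetime, timezone


def parse_pickup_datetime(value: str) -> datetime:
    """Parse a timestamp like '2010-04-19 08:17:56 UTC' into a timezone-aware datetime."""
    # ' UTC' is matched as literal text, so the UTC timezone is attached explicitly.
    naive = datetime.strptime(value, "%Y-%m-%d %H:%M:%S UTC")
    return naive.replace(tzinfo=timezone.utc)


ts = parse_pickup_datetime("2010-04-19 08:17:56 UTC")
print(ts.isoformat())         # 2010-04-19T08:17:56+00:00
print(ts.hour, ts.weekday())  # 8 0  (hour of day, Monday == 0)
```

For a whole column, `pd.to_datetime(data['pickup_datetime'])` should perform the equivalent conversion in pandas.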

### Setup

In [1]:
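```python
# Reload imported modules (such as ann_taxi_functions) automatically when their source changes
%load_ext autoreload
%autoreload 2
```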

In [16]:
```python
from ann_taxi_functions import (get_info, read_csv_file, get_shape,
                                convert_to_category_type, haversine_distance)
```
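The helpers imported from `ann_taxi_functions` are not defined in this notebook. As a rough sketch of what they might do, assuming each is a thin wrapper around pandas (the `columns` parameter of `convert_to_category_type` is a guess, since that function is not called in this section):

```python
import pandas as pd

# Hypothetical stand-ins for the helpers in ann_taxi_functions (not shown in
# this notebook); each is assumed to be a thin wrapper around pandas.


def read_csv_file(file_name: str) -> pd.DataFrame:
    """Load a CSV file into a DataFrame."""
    return pd.read_csv(file_name)


def get_shape(data: pd.DataFrame) -> tuple:
    """Return (number of rows, number of columns)."""
    return data.shape


def get_info(data: pd.DataFrame) -> None:
    """Print column names, non-null counts, dtypes and memory usage."""
    data.info()


def convert_to_category_type(data: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of `data` with the given columns cast to the 'category' dtype."""
    data = data.copy()
    for col in columns:
        data[col] = data[col].astype("category")
    return data
```

The `category` dtype stores each distinct value once alongside integer codes (available via `.cat.codes`), which is convenient when a column is later fed to a model as categorical input.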

### Reading the file.

In [3]:
```python
data = read_csv_file(file_name='NYCTaxiFares.csv')
data.head()
```

|   | pickup_datetime | fare_amount | fare_class | pickup_longitude | pickup_latitude | dropoff_longitude | dropoff_latitude | passenger_count |
|---|---|---|---|---|---|---|---|---|
| 0 | 2010-04-19 08:17:56 UTC | 6.5 | 0 | -73.992365 | 40.730521 | -73.975499 | 40.744746 | 1 |
| 1 | 2010-04-17 15:43:53 UTC | 6.9 | 0 | -73.990078 | 40.740558 | -73.974232 | 40.744114 | 1 |
| 2 | 2010-04-17 11:23:26 UTC | 10.1 | 1 | -73.994149 | 40.751118 | -73.960064 | 40.766235 | 2 |
| 3 | 2010-04-11 21:25:03 UTC | 8.9 | 0 | -73.990485 | 40.756422 | -73.971205 | 40.748192 | 1 |
| 4 | 2010-04-17 02:19:01 UTC | 19.7 | 1 | -73.990976 | 40.734202 | -73.905956 | 40.743115 | 1 |


### Getting the shape of the data.

In [17]:
```python
get_shape(data=data)
```

(120000, 9)

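### Getting info on the data.

All 9 columns listed here, and the shape `(120000, 9)` reported above, already include `Distance in Kms`, which is only computed in the haversine step further down; the cells were not run top to bottom (see the `In [n]` execution counters).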

In [13]:
```python
get_info(data=data)
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 120000 entries, 0 to 119999
Data columns (total 9 columns):
 #   Column             Non-Null Count   Dtype  
---  ------             --------------   -----  
 0   pickup_datetime    120000 non-null  object 
 1   fare_amount        120000 non-null  float64
 2   fare_class         120000 non-null  int64  
 3   pickup_longitude   120000 non-null  float64
 4   pickup_latitude    120000 non-null  float64
 5   dropoff_longitude  120000 non-null  float64
 6   dropoff_latitude   120000 non-null  float64
 7   passenger_count    120000 non-null  int64  
 8   Distance in Kms    120000 non-null  float64
dtypes: float64(6), int64(2), object(1)
memory usage: 8.2+ MB
```
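The `memory usage: 8.2+ MB` line can be checked by hand. A minimal sketch, assuming a 64-bit build: each `float64`/`int64` value takes 8 bytes, the `object` column is counted as one 8-byte pointer per row, and pandas divides by 1024² when it prints "MB". The trailing `+` means the strings referenced by the `object` column (`pickup_datetime`) are not included; `data.info(memory_usage='deep')` would count them.

```python
# Back-of-the-envelope check of the "memory usage: 8.2+ MB" line from get_info.
rows, cols = 120_000, 9
bytes_per_value = 8  # float64/int64 values, and object-column pointers on a 64-bit build

total_bytes = rows * cols * bytes_per_value  # the RangeIndex adds only a few bytes
mib = total_bytes / 1024**2

print(total_bytes)   # 8640000
print(f"{mib:.1f}")  # 8.2
```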


### Applying the haversine formula to calculate distance in km.

In [14]:
```python
# Great-circle distance between pickup and dropoff points, stored as a new column
data['Distance in Kms'] = haversine_distance(lat1=data['pickup_latitude'], lon1=data['pickup_longitude'],
                                             lat2=data['dropoff_latitude'], lon2=data['dropoff_longitude'])

data
```

|   | pickup_datetime | fare_amount | fare_class | pickup_longitude | pickup_latitude | dropoff_longitude | dropoff_latitude | passenger_count | Distance in Kms |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010-04-19 08:17:56 UTC | 6.5 | 0 | -73.992365 | 40.730521 | -73.975499 | 40.744746 | 1 | 2.13 |
| 1 | 2010-04-17 15:43:53 UTC | 6.9 | 0 | -73.990078 | 40.740558 | -73.974232 | 40.744114 | 1 | 1.39 |
| 2 | 2010-04-17 11:23:26 UTC | 10.1 | 1 | -73.994149 | 40.751118 | -73.960064 | 40.766235 | 2 | 3.33 |
| 3 | 2010-04-11 21:25:03 UTC | 8.9 | 0 | -73.990485 | 40.756422 | -73.971205 | 40.748192 | 1 | 1.86 |
| 4 | 2010-04-17 02:19:01 UTC | 19.7 | 1 | -73.990976 | 40.734202 | -73.905956 | 40.743115 | 1 | 7.23 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 119995 | 2010-04-18 14:33:03 UTC | 15.3 | 1 | -73.955857 | 40.784590 | -73.981941 | 40.736789 | 1 | 5.75 |
| 119996 | 2010-04-23 10:27:48 UTC | 15.3 | 1 | -73.996329 | 40.772727 | -74.049890 | 40.740413 | 1 | 5.77 |
| 119997 | 2010-04-18 18:50:40 UTC | 12.5 | 1 | -73.988574 | 40.749772 | -74.011541 | 40.707799 | 3 | 5.05 |
| 119998 | 2010-04-13 08:14:44 UTC | 4.9 | 0 | -74.004449 | 40.724529 | -73.992697 | 40.730765 | 1 | 1.21 |
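The haversine formula gives the great-circle distance between two points from their latitudes φ and longitudes λ: a = sin²(Δφ/2) + cos φ₁ · cos φ₂ · sin²(Δλ/2), and d = 2R · arcsin(√a). The body of `haversine_distance` is not shown in this notebook, so the following is only a scalar sketch using `math`, assuming a mean Earth radius R = 6371 km (this value reproduces the 2.13 km and 7.23 km shown for rows 0 and 4). `haversine_km` is a hypothetical name; a column-wise version would apply the equivalent NumPy functions to whole Series.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the helper's actual constant is not shown


def haversine_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)  # difference in latitude
    dlmb = math.radians(lon2 - lon1)  # difference in longitude
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


# Rows 0 and 4 of the data set (pickup lat/lon, dropoff lat/lon)
print(round(haversine_km(40.730521, -73.992365, 40.744746, -73.975499), 2))  # 2.13
print(round(haversine_km(40.734202, -73.990976, 40.743115, -73.905956), 2))  # 7.23
```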
