# Objective
- Create 3 features: **travel_distance, day_of_week, pct_of_day**

The objective of this notebook is to show you how to handle DATES and COORDINATES.

By: Hugo Lopes

In [13]:
%matplotlib inline

import numpy as np 
import pandas as pd
from subprocess import check_output
from tqdm import tqdm
import matplotlib.pyplot as plt
print(check_output(["ls", "../input"]).decode("utf8"))

In [6]:
df = pd.read_csv('../input/train.csv', parse_dates=['pickup_datetime', 'dropoff_datetime'])
df.head()

Let us reduce the dataset a bit for the following steps (to a random 10% sample of the rows)...

In [7]:
df = df.sample(round(df.shape[0]*0.10), random_state=3435).sort_index()
df.shape

## New Feature: Distance (between pickup and dropoff)

In [11]:
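# ref: https://stackoverflow.com/questions/19412462/getting-distance-between-two-points-based-on-latitude-longitude
# Note: geopy.distance.vincenty was removed in geopy 2.0; geodesic replaces it.
import geopy.distance

# Extract the coordinates once as (latitude, longitude) arrays instead of slicing the DataFrame per row
pickups = df[['pickup_latitude', 'pickup_longitude']].values
dropoffs = df[['dropoff_latitude', 'dropoff_longitude']].values

distance_feature = np.zeros(df.shape[0])
for k in range(df.shape[0]):
    distance_feature[k] = geopy.distance.geodesic(tuple(pickups[k]), tuple(dropoffs[k])).km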

In [21]:
# Distance in km
df['travel_distance'] = distance_feature
df['travel_distance'].head()
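
The row-by-row loop above can be slow on large samples. As a rough alternative, here is a minimal sketch of a vectorized haversine distance with NumPy. Note that this swaps in a different technique: haversine assumes a spherical Earth (the mean radius of 6371 km is an assumption here), so the values can differ from geopy's ellipsoidal distance by a fraction of a percent. The name `haversine_km` is hypothetical and not part of geopy.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Works on scalars or on whole arrays at once (made-up coordinates for illustration)
lat_a = np.array([0.0, 0.0])
lon_a = np.array([0.0, 0.0])
lat_b = np.array([1.0, 0.0])
lon_b = np.array([0.0, 180.0])
distances = haversine_km(lat_a, lon_a, lat_b, lon_b)
print(distances)
```

On the notebook's DataFrame this could be applied to whole columns in one call, e.g. `haversine_km(df['pickup_latitude'], df['pickup_longitude'], df['dropoff_latitude'], df['dropoff_longitude'])`.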

## New feature: Day of Week

In [20]:
# The day of the week with Monday=0, Sunday=6
df['day_of_week'] = df['pickup_datetime'].dt.dayofweek
df['day_of_week'].head()
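
To check the Monday=0, Sunday=6 convention, here is a small sketch on a few made-up timestamps (they are not taken from the dataset). The `is_weekend` flag is a hypothetical extra feature, shown only as one way this column could be expanded.

```python
import pandas as pd

# Hypothetical sample timestamps: a Monday, a Saturday and a Sunday
sample = pd.Series(pd.to_datetime([
    '2016-03-14 17:24:55',
    '2016-03-19 08:00:00',
    '2016-03-20 23:59:00',
]))

day_of_week = sample.dt.dayofweek   # Monday=0, Sunday=6
is_weekend = day_of_week >= 5       # Saturday (5) or Sunday (6)

print(day_of_week.tolist())  # [0, 5, 6]
print(is_weekend.tolist())   # [False, True, True]
```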

## New Feature: Percentage of the Day
The fraction of the day already elapsed at pickup time (0 = midnight, 0.5 = noon). Useful for telling apart night, morning, rush hour, etc.

In [23]:
# Minutes since midnight divided by the 1440 minutes in a day (value in [0, 1))
df['pct_of_day'] = (df['pickup_datetime'].dt.hour*60 + df['pickup_datetime'].dt.minute) / 1440
df['pct_of_day'].head()
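
As a quick sanity check of the formula above, here is a minimal sketch on a few made-up pickup times (not taken from the dataset). It assumes the same minutes-since-midnight definition and ignores seconds, as the cell above does.

```python
import pandas as pd

# Hypothetical pickup times: midnight, 06:00, noon and 18:30
sample = pd.Series(pd.to_datetime([
    '2016-03-14 00:00:00',
    '2016-03-14 06:00:00',
    '2016-03-14 12:00:00',
    '2016-03-14 18:30:00',
]))

# Minutes since midnight over the 1440 minutes in a day
pct_of_day = (sample.dt.hour * 60 + sample.dt.minute) / 1440

print(pct_of_day.round(4).tolist())  # [0.0, 0.25, 0.5, 0.7708]
```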

# Summary
Three features were created:  
- 'travel_distance'  
- 'day_of_week'  
- 'pct_of_day'  

They can be expanded! It is just a matter of exploring them! I hope you find this useful! Time to go to bed :P