# The Effects of the Coronavirus on the NYC Taxi Industry

The goal of this project is to answer a number of questions about the effects of the coronavirus on travel by taxi in NYC.

# Import

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import datetime, date
import warnings

warnings.filterwarnings("ignore")

In [2]:
taxi_2019 = pd.read_csv('C:\\Users\\15164\\Desktop\\nyc-taxis-vs-covid\\data\\taxi_2019.csv')
taxi_2020 = pd.read_csv('C:\\Users\\15164\\Desktop\\nyc-taxis-vs-covid\\data\\taxi_2020.csv')

In [3]:
taxi_2019.head(2)

Unnamed: 0,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,PULocationID,DOLocationID,payment_type,fare_amount,tip_amount,tolls_amount,total_amount,day_of_week,Start_Zone,End_Zone
0,2019-03-22 08:22:39,2019-03-22 08:32:50,1.0,1.45,170,107,1.0,8.5,2.95,0.0,14.75,Friday,Murray Hill-Queens,Gravesend
1,2019-03-21 15:31:46,2019-03-21 15:55:19,1.0,2.0,186,163,1.0,15.0,1.5,0.0,20.8,Thursday,Port Richmond,Midtown South


In [15]:
taxi_2020.head(2)

Unnamed: 0,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,PULocationID,DOLocationID,payment_type,fare_amount,tip_amount,tolls_amount,total_amount,day_of_week,Start_Zone,End_Zone
0,2020-03-08 00:00:33,2020-03-08 00:11:14,2.0,1.06,148,4,1.0,8.0,2.36,0.0,14.16,Sunday,Madison,Arden Heights
1,2020-03-08 00:02:10,2020-03-08 00:06:24,1.0,0.73,74,41,2.0,5.0,0.0,0.0,6.3,Sunday,East Harlem South,Central Harlem North


# Question 1:
The first questions this analysis will answer are the following:

**1-** What was the most expensive trip before the pandemic and between what zones did it occur? 

**2-** What was the least expensive trip before the pandemic and where did it occur?

**3-** Conversely, what were the most and least expensive trips during the pandemic and between which zones did they occur?

### 1- What was the most expensive trip before the pandemic and between what zones?

In [6]:
most_expensive = taxi_2019.sort_values('total_amount', ascending=False)

In [7]:
most_expensive.head(1)

Unnamed: 0,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,PULocationID,DOLocationID,payment_type,fare_amount,tip_amount,tolls_amount,total_amount,day_of_week,Start_Zone,End_Zone
2014869,2019-03-15 19:00:45,2019-03-15 19:12:48,1.0,0.0,233,79,3.0,943274.8,141492.02,0.0,1084772.17,Friday,Union Sq,East Williamsburg


As we can see above, according to our data, **the most expensive trip in 2019 occurred on March 15th and was for a total amount of a whopping $1,084,772.17. It was between Union Square and East Williamsburg.**

It is important to note, however, that while this trip was recorded, **it was not paid for**: according to our data dictionary, a payment_type of '3.0' means that there was no charge. So while I will keep note of the most expensive record, I'd prefer to see what the most expensive **paid** trip was.
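One way to look for the most expensive paid trip is to drop the "no charge" records before taking the maximum. The snippet below is only a minimal sketch: the `toy` rows are made up for illustration, and the `most_expensive_paid` helper and `NO_CHARGE` constant are hypothetical names, not part of the original analysis.

```python
import pandas as pd

NO_CHARGE = 3  # payment_type code for "no charge", per the data dictionary


def most_expensive_paid(trips: pd.DataFrame) -> pd.DataFrame:
    """Return the single highest-total row among trips that were charged."""
    paid = trips[trips["payment_type"] != NO_CHARGE]
    return paid.nlargest(1, "total_amount")


# Made-up rows for illustration only; these are not taken from taxi_2019.
toy = pd.DataFrame(
    {
        "payment_type": [3.0, 1.0, 2.0],
        "total_amount": [5000.0, 75.5, 20.8],
        "Start_Zone": ["A", "B", "C"],
        "End_Zone": ["D", "E", "F"],
    }
)

top_paid = most_expensive_paid(toy)
print(top_paid)
```

On the real data the same call would be `most_expensive_paid(taxi_2019)`, assuming the column names shown in the output above.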

### 2 - What was the least expensive trip before the pandemic and where did it occur?

To answer this question, I'll first select the trips that have a total amount greater than $0.00, because a total of $0.00 or less most likely means that no actual trip was taken or that there was an error in the data entry.

Then I'll sort my data in order from lowest 'total_amount' to highest and display the lowest entry only.

In [8]:
least_expensive = taxi_2019.loc[taxi_2019['total_amount'] > 0]

In [9]:
least_expensive = least_expensive.sort_values('total_amount', ascending=True)

In [10]:
least_expensive.head(1)

Unnamed: 0,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,PULocationID,DOLocationID,payment_type,fare_amount,tip_amount,tolls_amount,total_amount,day_of_week,Start_Zone,End_Zone
24063600,2019-06-15 08:54:15,2019-06-15 08:54:59,1.0,0.0,236,236,2.0,0.01,0.0,0.0,0.01,Saturday,Upper East Side South,Upper East Side South


According to this, the least expensive trip was **$0.01 and occurred within one zone only, the Upper East Side South**. When examining the pickup and drop-off times, I notice that there is only a 44-second difference between the two, which tells me that while the meter was started, there most likely was no trip taken; the rider (or driver) probably changed their mind before taking off.
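The duration check described here can also be done in code rather than by eye. The following is a minimal sketch, assuming the two datetime columns are stored as strings as in the CSV output; `add_duration_seconds` and the `duration_seconds` column are hypothetical names introduced for illustration.

```python
import pandas as pd


def add_duration_seconds(trips: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of trips with an extra duration_seconds column."""
    out = trips.copy()
    pickup = pd.to_datetime(out["tpep_pickup_datetime"])
    dropoff = pd.to_datetime(out["tpep_dropoff_datetime"])
    out["duration_seconds"] = (dropoff - pickup).dt.total_seconds()
    return out


# The pickup and drop-off timestamps of the $0.01 record shown above.
example = pd.DataFrame(
    {
        "tpep_pickup_datetime": ["2019-06-15 08:54:15"],
        "tpep_dropoff_datetime": ["2019-06-15 08:54:59"],
    }
)

with_duration = add_duration_seconds(example)
print(with_duration["duration_seconds"].iloc[0])  # 44.0
```

A column like this would also make it possible to filter out very short records directly, although the threshold for what counts as a real trip is a judgment call.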

In theory, this result answers our question. However, since the question is "what is the least expensive ***trip***" and barely a trip was actually taken, I've decided to define what a trip actually means by filtering my data again to keep only the trips that have different pickup and drop-off locations:

In [12]:
least_expensive = least_expensive.loc[least_expensive['PULocationID'] != least_expensive['DOLocationID']]

In [13]:
least_expensive.head(1)

Unnamed: 0,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,PULocationID,DOLocationID,payment_type,fare_amount,tip_amount,tolls_amount,total_amount,day_of_week,Start_Zone,End_Zone
3304069,2019-03-21 05:21:43,2019-03-21 05:22:44,1.0,0.0,181,25,2.0,0.01,0.0,0.0,0.01,Thursday,Parkchester,Borough Park


And now we have a more appropriate answer to our question, which is: the cheapest trip occurred on March 21, 2019, lasted for a minute and one second, and **cost a mere $0.01. It occurred between the two zones of Parkchester and Borough Park.**

### 3- What are the most and least expensive trips to occur during the pandemic?

#### Most expensive:

In [16]:
most_expensive = taxi_2020.sort_values('total_amount', ascending=False)

In [17]:
most_expensive.head(1)

Unnamed: 0,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,PULocationID,DOLocationID,payment_type,fare_amount,tip_amount,tolls_amount,total_amount,day_of_week,Start_Zone,End_Zone
366384,2020-03-10 09:58:11,2020-03-10 10:09:46,1.0,0.0,193,193,2.0,2.5,0.0,0.0,1000003.8,Tuesday,Randalls Island,Randalls Island


As seen above, the most expensive trip during the peak of the pandemic was for **$1,000,003.80 and, though it lasted for over 11 minutes, was in one zone only, Randalls Island.**

I will also have a look at the most expensive trip with a different pick up and drop off location:

In [18]:
most_expensive2 = most_expensive.loc[most_expensive['PULocationID'] != most_expensive['DOLocationID']]
most_expensive2.head(1)

Unnamed: 0,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,PULocationID,DOLocationID,payment_type,fare_amount,tip_amount,tolls_amount,total_amount,day_of_week,Start_Zone,End_Zone
5619447,2020-10-07 10:35:56,2020-10-07 10:40:14,1.0,0.7,41,42,3.0,998310.03,0.0,0.0,998325.61,Wednesday,Central Harlem North,Central Park


**The most expensive trip with a drop-off location different from the pickup location was for a total of $998,325.61 (with a fare amount of $998,310.03) and was between Central Harlem North and Central Park.** It lasted for just over 4 minutes. However, since this trip was **not paid for** ('payment_type' 3.0 means 'no charge'), I will look at the most expensive paid trip:

In [36]:
most_expensive2 = most_expensive2.loc[most_expensive2['payment_type'] != 3]
most_expensive2.head(1)

MemoryError: Unable to allocate 480. MiB for an array with shape (7, 8989064) and data type float64
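The cell above failed with a `MemoryError` instead of returning a row, so the most expensive paid trip for 2020 is not shown here. The message refers to a block of 7 float64 columns by 8,989,064 rows, which suggests that filtering the full frame with `.loc` tried to copy every column at once. One possible lower-memory approach, sketched below on made-up rows, is to build a boolean mask, look up the label of the maximum in the `total_amount` column alone, and then select just that one row. The helper name `most_expensive_paid_cross_zone` is hypothetical, and the sketch assumes the index labels are unique.

```python
import pandas as pd

NO_CHARGE = 3  # payment_type code for "no charge", per the data dictionary


def most_expensive_paid_cross_zone(trips: pd.DataFrame) -> pd.DataFrame:
    """Highest-total charged trip whose pickup and drop-off zones differ."""
    mask = (trips["PULocationID"] != trips["DOLocationID"]) & (
        trips["payment_type"] != NO_CHARGE
    )
    # Only the total_amount column is copied here, not the whole frame.
    # idxmax raises ValueError if no row passes the mask.
    top_label = trips.loc[mask, "total_amount"].idxmax()
    return trips.loc[[top_label]]


# Made-up rows for illustration only; these are not taken from taxi_2020.
toy = pd.DataFrame(
    {
        "PULocationID": [193, 41, 170, 74],
        "DOLocationID": [193, 42, 107, 41],
        "payment_type": [2.0, 3.0, 1.0, 2.0],
        "total_amount": [900.0, 800.0, 14.75, 6.3],
    }
)

top_row = most_expensive_paid_cross_zone(toy)
print(top_row)
```

Whether this avoids the error on the full dataset depends on the memory available, so it is a suggestion rather than a guaranteed fix.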

#### Least Expensive:

In [19]:
least_expensive = taxi_2020.loc[taxi_2020['total_amount'] > 0]
least_expensive = least_expensive.sort_values('total_amount', ascending=True)
least_expensive.head(1)

Unnamed: 0,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,PULocationID,DOLocationID,payment_type,fare_amount,tip_amount,tolls_amount,total_amount,day_of_week,Start_Zone,End_Zone
2806051,2020-07-17 21:38:05,2020-07-17 21:39:40,1.0,0.0,229,229,3.0,0.11,0.0,0.0,0.11,Friday,Times Sq/Theatre District,Times Sq/Theatre District


Again, the least expensive trip occurred within **one zone only, Times Sq/Theatre District, and was $0.11**. While this record lasted just over a minute, there was most likely no actual trip taken; the rider (or driver) probably changed their mind before taking off.

So again, I will find the least expensive trip with different pick up and drop off locations:

In [20]:
least_expensive = least_expensive.loc[least_expensive['PULocationID'] != least_expensive['DOLocationID']]
least_expensive.head(1)

Unnamed: 0,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,PULocationID,DOLocationID,payment_type,fare_amount,tip_amount,tolls_amount,total_amount,day_of_week,Start_Zone,End_Zone
8020365,2020-11-24 05:13:17,2020-11-24 05:15:07,1.0,1.8,56,82,3.0,0.0,0.0,0.0,0.3,Tuesday,Corona,Elmhurst/Maspeth


**The cheapest trip occurred on November 24th, 2020, lasted just under 2 minutes, and cost a mere $0.30. It occurred between the two zones of Corona and Elmhurst/Maspeth.**

### Are there any major differences between pre-pandemic and peak pandemic most and least expensive trips?

There was some difference between the extremes of 2019 and 2020: **$84,768.37** between the most expensive trips and **$0.29** between the cheapest. However, there isn't enough information to determine whether COVID-19 influenced these specific charges.
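For reference, the two gaps can be reproduced from the totals reported above: the most expensive records of each year, and the cheapest trips with different pickup and drop-off zones. This is a small sketch using `Decimal` to avoid floating-point rounding noise; the variable names are illustrative.

```python
from decimal import Decimal

# Totals copied from the outputs above.
most_2019 = Decimal("1084772.17")   # Union Sq to East Williamsburg
most_2020 = Decimal("1000003.80")   # Randalls Island
least_2019 = Decimal("0.01")        # Parkchester to Borough Park
least_2020 = Decimal("0.30")        # Corona to Elmhurst/Maspeth

most_gap = most_2019 - most_2020
least_gap = least_2020 - least_2019

print(most_gap)   # 84768.37
print(least_gap)  # 0.29
```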

# Question 3:
**1-** What is the most popular payment method?

**2-** Did the pandemic affect the payment methods?