# Maximizing revenue for taxi cab drivers through payment type analysis

## Problem statement

In the fast-paced taxi booking sector, maximizing revenue is essential for long-term success and driver satisfaction.

Our goal is to use data-driven insights to help taxi drivers maximize their revenue. The analysis focuses on the relationship between payment type and fare amount, and asks whether the payment method is associated with a difference in fares.

## Objective

This project's main goal is to run an A/B test examining the relationship between the total fare and the method of payment. We use descriptive statistics and hypothesis testing in Python to extract insights that can help taxi drivers increase their earnings. In particular, we want to find out whether fares differ significantly between customers who pay by credit card and customers who pay with cash.
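As a rough illustration of the kind of hypothesis test described above, the sketch below compares two groups of fares with a two-sample t-test. It assumes Welch's variant (unequal variances) and a 0.05 significance level, neither of which is specified in this section, and the fares are synthetic values generated for the example rather than taken from the taxi dataset. The names `card_fares` and `cash_fares` are hypothetical.

```python
import numpy as np
import scipy.stats as st

# Synthetic fares, purely illustrative (NOT drawn from the taxi dataset).
rng = np.random.default_rng(seed=0)
card_fares = rng.normal(loc=13.0, scale=6.0, size=500)  # hypothetical card payers
cash_fares = rng.normal(loc=12.0, scale=6.0, size=500)  # hypothetical cash payers

# H0: the mean fare is the same for both payment types.
# H1: the mean fares differ.
# equal_var=False selects Welch's t-test, which does not assume equal variances
# (an assumption made for this sketch).
t_stat, p_value = st.ttest_ind(card_fares, cash_fares, equal_var=False)

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

On the real data, the two arrays would be the `fare_amount` values for each payment type, and the choice of test should follow from what the descriptive statistics show about the fare distributions.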

## Research question

**Is there a relationship between total fare amount and payment type?**

Can we nudge customers towards payment methods that generate higher revenue for drivers, without negatively impacting customer experience?

## Downloading dataset

In [1]:
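from urllib.request import urlretrieve

# CSV export hosted on data.cityofnewyork.us
url = "https://data.cityofnewyork.us/api/views/kxp8-n2sj/rows.csv?accessType=DOWNLOAD"
# urlretrieve saves the file locally and returns a (local_path, headers) tuple,
# not the data itself
data = urlretrieve(url, "yellow taxi trip.csv")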

## Importing libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as st

import warnings
# Keep notebook output tidy; note that this hides ALL warnings, including deprecations
warnings.filterwarnings('ignore')

## Loading dataset

In [None]:
%%time

df = pd.read_csv("yellow taxi trip.csv")

In [None]:
df.head()

## Exploratory data analysis

In [None]:
df.shape

In [None]:
df.columns

In [None]:
df.info()

In [None]:
# Parse the pickup and dropoff timestamps into datetime64 columns
df['tpep_pickup_datetime'] = pd.to_datetime(df['tpep_pickup_datetime'])
df['tpep_dropoff_datetime'] = pd.to_datetime(df['tpep_dropoff_datetime'])

In [None]:
# Subtracting two datetime columns yields a Timedelta column
df['duration'] = df['tpep_dropoff_datetime'] - df['tpep_pickup_datetime']
# Convert the Timedelta to trip duration in minutes
df['duration'] = df['duration'].dt.total_seconds() / 60
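To make the duration step concrete, here is a minimal sketch of the same calculation on a two-row frame. The timestamps are made up for illustration and the frame name `trips` is hypothetical; only the column names match the dataset used here.

```python
import pandas as pd

# Two made-up trips; the second one crosses midnight.
trips = pd.DataFrame({
    "tpep_pickup_datetime": ["2020-01-01 00:28:15", "2020-01-01 23:50:00"],
    "tpep_dropoff_datetime": ["2020-01-01 00:33:03", "2020-01-02 00:05:30"],
})

# Parse the strings into datetime64 columns.
for col in ["tpep_pickup_datetime", "tpep_dropoff_datetime"]:
    trips[col] = pd.to_datetime(trips[col])

# datetime - datetime gives a Timedelta; total_seconds() / 60 converts it to minutes.
delta = trips["tpep_dropoff_datetime"] - trips["tpep_pickup_datetime"]
trips["duration"] = delta.dt.total_seconds() / 60

print(trips["duration"].tolist())
```

The first trip lasts 4 minutes 48 seconds (4.8 minutes) and the second lasts 15.5 minutes. Using `total_seconds()` rather than the `.dt.seconds` component keeps the result correct for trips longer than a day, since `.dt.seconds` drops the days part of the difference.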

In [None]:
# Keep only the columns needed for the fare vs. payment-type analysis
df = df[['passenger_count', 'payment_type', 'fare_amount', 'trip_distance', 'duration']]