# G2M insight for Cab Investment firm 

## Table of Contents
1. [Business problem](#1)<br>
2. [Hypothesis](#2)
3. [Data Understanding and Preparation](#3)
4. [Exploratory Data Analysis](#4)
5. [Profit Analysis](#5)
6. [Demand Analysis](#6)
7. [Client Analysis](#7)
8. [Conclusion](#8)

<a id="1"></a> <br>
## 1. Business problem

The main purpose of this notebook is to understand the market before investing in the cab industry, following a Go-to-Market (G2M) strategy. Go-to-market strategies tend to focus on the short term, but effective ones also consider how any immediate success can be sustained over a longer period.

<a id="2"></a> <br>
## 2. Hypothesis
 
1. How does the profit change over time?
2. How does the percentage of profitable trips change by the city?
3. How does average profit change by holidays?
4. How does the demand of the cab industry change over time?
5. How does the demand vary with age?
6. Loyalty of customers
7. Fluctuations of payment methods

<a id="3"></a> <br>
## 3. Data Understanding and Preparation

#### Importing necessary packages

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

#### Cloning the dataset from Github

In [2]:
!git clone https://github.com/DataGlacier/DataSets.git

#### Loading all the datasets

In [3]:
cab_data = pd.read_csv('./DataSets/Cab_Data.csv')
city = pd.read_csv('./DataSets/City.csv')
customer_id = pd.read_csv('./DataSets/Customer_ID.csv')
transaction_id = pd.read_csv('./DataSets/Transaction_ID.csv')
holiday_data = pd.read_csv('../input/us-holiday-dates-2004-2021/US Holiday Dates (2004-2021).csv')

#### CAB DATASET

In [4]:
print('rows = ',cab_data.shape[0], 'Columns = ', cab_data.shape[1])
print(cab_data.dtypes)
cab_data.head()

In [5]:
# Time Analysis - Unit Conversion
def to_date_format(n):
    # Convert a serial day number to a datetime, counting from 1899-12-30.
    return datetime(1899, 12, 30) + timedelta(n - 1)

cab_data['Date of Travel'] = cab_data['Date of Travel'].apply(to_date_format)
cab_data.head()
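
The row-by-row conversion above can also be written as one vectorized call. The sketch below is a minimal illustration on made-up serial numbers, assuming `Date of Travel` holds Excel-style serial days; the name `serial_days` and its values are hypothetical. It applies the standard convention (1899-12-30 plus `n` days), whereas `to_date_format` subtracts one extra day, so the two results differ by one day.

```python
import pandas as pd

# Hypothetical sample of Excel-style serial day numbers (not taken from the dataset).
serial_days = pd.Series([42370, 42371, 43465])

# Standard convention: serial n corresponds to 1899-12-30 plus n days.
dates = pd.to_datetime(serial_days, unit="D", origin=pd.Timestamp("1899-12-30"))

# Shifting back by one day reproduces the `n - 1` behaviour of to_date_format.
dates_shifted = dates - pd.Timedelta(days=1)

print(dates.dt.strftime("%Y-%m-%d").tolist())
print(dates_shifted.dt.strftime("%Y-%m-%d").tolist())
```

Which of the two conventions matches the source system is worth checking against a trip whose calendar date is known.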

In [6]:
cab_data=cab_data.sort_values(by=['Date of Travel'])
cab_data=cab_data.reset_index(drop=True)

print('rows =',cab_data.shape[0], 'columns =', cab_data.shape[1])
cab_data.head(10)

In [7]:
round(cab_data.describe(include='all',datetime_is_numeric=True),2)

In [8]:
cab_data.dtypes

#### CITY DATASET

In [9]:
print('rows = ',city.shape[0], 'Columns = ', city.shape[1])
city.head(20)

In [10]:
city.dtypes

In [11]:
city['Population'] = [x.replace(',','') for x in city['Population']]
city['Users'] = [x.replace(',','') for x in city['Users']]
city['Population'] = city['Population'].astype(float)
city['Users'] = city['Users'].astype(float)
city.dtypes

In [12]:
city.head()

#### CUSTOMER ID DATASET

In [13]:
print('rows = ',customer_id.shape[0], 'Columns = ', customer_id.shape[1])
customer_id.head()

In [14]:
customer_id.dtypes

Every column already has the correct data type.

#### TRANSACTION ID DATASET

In [15]:
print('rows = ',transaction_id.shape[0], 'Columns = ', transaction_id.shape[1])
transaction_id.head()

In [16]:
transaction_id.dtypes

Every column already has the correct data type.

#### HOLIDAY DATASET

In [17]:
print('rows = ',holiday_data.shape[0], 'Columns = ', holiday_data.shape[1])
holiday_data.head()

In [18]:
holiday_data.dtypes

In [19]:
holiday_data['Date'] = pd.to_datetime(holiday_data['Date'])
holiday_data.dtypes

#### MASTER DATASET

In [20]:
cab_data.head(1)

In [21]:
city.head(1)

In [22]:
customer_id.head(1)

In [23]:
transaction_id.head(1)

In [24]:
holiday_data.head(1)

`cab_data` covers 31/01/2016 to 31/12/2018 and `holiday_data` covers 2004 to 2021, so `holiday_data` contains every holiday date needed for the analysis.

In [25]:
cab_data['is_holiday'] = cab_data['Date of Travel'].isin(holiday_data['Date']).astype(bool)
cab_data.head()

In [26]:
master_df = cab_data.merge(transaction_id, on= 'Transaction ID').merge(customer_id,on ='Customer ID').merge(city, on = 'City')
print('rows = ',master_df.shape[0], 'Columns = ', master_df.shape[1])
master_df.head()
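
A chained merge like the one above can silently duplicate rows if a key is not unique. The sketch below is one hedged way to guard against that, using the `validate` argument of `DataFrame.merge` on toy frames; the frames `trips`, `transactions` and `customers` and their values are made up for illustration and only mirror the key columns of the real tables.

```python
import pandas as pd

# Toy stand-ins for cab_data, transaction_id and customer_id (made-up values).
trips = pd.DataFrame({"Transaction ID": [1, 2, 3],
                      "Price Charged": [10.0, 20.0, 30.0]})
transactions = pd.DataFrame({"Transaction ID": [1, 2, 3],
                             "Customer ID": [100, 100, 200]})
customers = pd.DataFrame({"Customer ID": [100, 200], "Age": [30, 45]})

# validate= raises pandas.errors.MergeError when the key relationship is not
# the expected one, which guards against silently duplicated rows.
merged = (
    trips.merge(transactions, on="Transaction ID", validate="one_to_one")
         .merge(customers, on="Customer ID", validate="many_to_one")
)

# In this toy case the inner joins neither add nor drop trips.
rows_preserved = len(merged) == len(trips)
print(merged)
print("rows preserved:", rows_preserved)
```

Comparing the row count before and after the merge, as in the last lines, is a cheap additional check on the real data.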

<a id="4"></a> <br>
## 4. Exploratory Data Analysis

#### Description of the Dataset

In [27]:
round(master_df.describe(datetime_is_numeric=True, include='all'),2)

#### FEATURES ANALYSIS

#### Distance Travelled (Km)

In [28]:
plt.figure(figsize=(9,6))
ax=sns.histplot(data=master_df, x="KM Travelled",bins=60, color= 'yellow')
plt.title('Km Travelled Distribution', fontsize=15)
plt.ylabel('Frequency')
plt.xlabel('Km Travelled')

#### Charged price

In [29]:
plt.figure(figsize=(9,6))
sns.histplot(data=master_df, x="Price Charged",bins=60 , color= 'lime')
plt.title('Price Charged Distribution', fontsize=15)
plt.ylabel('Frequency')
plt.xlabel('Price Charged')

#### Cost of trip

In [30]:
plt.figure(figsize=(9,6))
sns.histplot(data= master_df,x="Cost of Trip",bins=60 , color= 'red')
plt.title('Cost of Trip Distribution', fontsize=20)
plt.ylabel('Frequency')
plt.xlabel('Cost of Trip')

#### Payment Mode

In [31]:
plt.figure(figsize=(9,6))
sns.countplot(data=master_df, x="Payment_Mode", palette='rocket')
plt.title("Payment's Countplot", fontsize=15)
plt.ylabel('Frequency')
plt.xlabel('Payment Mode')

#### Gender

In [32]:
plt.figure(figsize=(9,6))
sns.countplot(data=master_df, x="Gender", palette='rocket')
plt.title('Gender', fontsize=15)
plt.ylabel('Frequency')
plt.xlabel('Gender')

#### Age distribution

In [33]:
plt.figure(figsize=(9,6))
sns.histplot(data= master_df,x="Age",bins=48 , color= 'blue')
plt.title('Age', fontsize=15)
plt.ylabel('Frequency')
plt.xlabel('Age')

#### Cost per Km

In [34]:
plt.figure(figsize=(15,6))
sns.lineplot(data=master_df, x="KM Travelled",y='Cost of Trip', color= 'green')
plt.title('Cost of Trip Vs KM Travelled', fontsize=15)
plt.xlabel('KM Travelled')
plt.ylabel('Cost of Trip')
plt.show()

#### Holiday Data

In [35]:
plt.figure(figsize=(9,6))
sns.countplot(data=master_df, x="is_holiday", palette='rocket')
plt.title('Holiday Data', fontsize=15)
plt.ylabel('Frequency')
plt.xlabel('is_holiday')

#### CORRELATIONS

In [36]:
master_df.corr()

#### Check Data Types

In [37]:
master_df.dtypes

In [38]:
master_df['Income (USD/Month)'] = master_df['Income (USD/Month)'].astype(float)
master_df['Population'] = master_df['Population'].astype(int)
master_df['Users'] = master_df['Users'].astype(int)
master_df.dtypes

Now, every feature has an appropriate data type.

#### Missing Values

In [39]:
master_df.isnull().sum()

There are no missing values.

#### Check Duplicates

In [40]:
duplicated_rows = master_df[master_df.duplicated()]
print('The number of duplicated rows', duplicated_rows.shape[0])

#### OUTLIERS

In [41]:
plt.figure(figsize=(18,9))

plt.subplot(1,2,1)
sns.set(font_scale = 1.3)
sns.boxplot(data = master_df, y = 'Company', x = "Price Charged", dodge=False)


plt.subplot(1,2,2)
sns.set(font_scale = 1.3)
sns.boxplot(data = master_df, y = 'Gender', x = "Income (USD/Month)", dodge=False)


plt.tight_layout()
plt.show()

According to the box plots, the `Price Charged` feature contains outliers. However, since we do not have enough information about the components that make up `Price Charged`, these values are kept rather than treated as errors.
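
To put a number on what the box plots show, the points beyond the whiskers can be counted with the 1.5 × IQR rule, which is the default whisker length in seaborn box plots. The sketch below is a minimal illustration on made-up prices; the helper name `iqr_outlier_mask` is hypothetical and not part of the original analysis.

```python
import pandas as pd

def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Made-up prices for illustration only.
prices = pd.Series([100.0, 120.0, 130.0, 140.0, 150.0, 1000.0])
mask = iqr_outlier_mask(prices)

print("flagged values:", prices[mask].tolist())
print("share flagged:", mask.mean())
```

Applied per company to `master_df['Price Charged']`, the mask would only flag rows for inspection; it does not decide whether they should be removed.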

#### Time Series

Create a copy of the master dataframe with 'Date of Travel' as its index so that the time series visualizations work properly.

In [42]:
master_df['Year of Travel'] = master_df['Date of Travel'].dt.year
master_df['Month of Travel'] = master_df['Date of Travel'].dt.month
master_df['Day of Travel'] = master_df['Date of Travel'].dt.day
master_df['Profit'] = master_df['Price Charged'] - master_df['Cost of Trip']

master_st=master_df.set_index('Date of Travel')
master_st.sort_values('Date of Travel').head()

#### YELLOW CAB VS PINK CAB

Creating separate datasets for each company.

#### Yellow Cab

In [43]:
yellow_cab_st= master_st[master_st['Company']=='Yellow Cab']
print(yellow_cab_st.shape)
yellow_cab_st.head()

In [44]:
yellow_cab= master_df[master_df.Company.isin(['Yellow Cab'])]
print(yellow_cab.shape)
yellow_cab.head()

#### Pink Cab

In [45]:
pink_cab_st= master_st[master_st.Company.isin(['Pink Cab'])]
print(pink_cab_st.shape)
pink_cab_st.head()

In [46]:
pink_cab= master_df[master_df.Company.isin(['Pink Cab'])]
print(pink_cab.shape)
pink_cab.head()

#### PROFIT SUMMARY

In [47]:
p=master_df.groupby(['Company', 'Year of Travel']).Profit.sum().to_frame('Profit')
p.head(6)

In [48]:
q=master_df.groupby(['Company', 'Year of Travel'])['Price Charged'].sum().to_frame('Price')
q.head()

In [49]:
q['%Profit'] =(p['Profit']*100) /q['Price']
q.head(6)

<a id="5"></a> <br>
## 5. Profit Analysis

#### Annual Profits

In [50]:
y=yellow_cab_st.Profit.resample('Y').sum()
ypy= pd.DataFrame(y)

p=pink_cab_st.Profit.resample('Y').sum()
ppy= pd.DataFrame(p)

print(ypy)
print(ppy)

y=yellow_cab_st.Profit.resample('m').sum()
ypm= pd.DataFrame(y)
p=pink_cab_st.Profit.resample('m').sum()
ppm= pd.DataFrame(p)

print(ypm.head())
print(ppm.head())

In [51]:
ypy['ProfitORides']= ypy['Profit']/yellow_cab['Date of Travel'].value_counts().resample('Y').sum()
ppy['ProfitORides']= ppy['Profit']/pink_cab['Date of Travel'].value_counts().resample('Y').sum()
ypm['ProfitORides']= ypm['Profit']/yellow_cab['Date of Travel'].value_counts().resample('m').sum()
ppm['ProfitORides']= ppm['Profit']/pink_cab['Date of Travel'].value_counts().resample('m').sum()
ypy['ProfitOKM']= ypy['Profit']/yellow_cab_st['KM Travelled'].resample('Y').sum()
ppy['ProfitOKM']= ppy['Profit']/pink_cab_st['KM Travelled'].resample('Y').sum()
ypm['ProfitOKM']= ypm['Profit']/yellow_cab_st['KM Travelled'].resample('m').sum()
ppm['ProfitOKM']= ppm['Profit']/pink_cab_st['KM Travelled'].resample('m').sum()

In [52]:
plt.figure(figsize=(15,6))

fig= yellow_cab_st.Profit.resample('Y').sum().plot.line(color = '#FFD801',label='Yellow Cab Company',linewidth=3, marker='o')
fig=pink_cab_st.Profit.resample('Y').sum().plot.line(color = '#FF1493',label='Pink Cab Company',linewidth=3, marker='o')
plt.ylabel('Profit [Millions]', fontsize=20)
plt.title('Annual Profits',fontsize=25)
plt.xlabel('Year',  fontsize=20)
plt.xticks(rotation=0, fontsize=15)
plt.yticks(rotation=0, fontsize=15)
plt.legend(loc='best', shadow=True, fontsize=15)

In [53]:
total_profits_sum = round(master_df.groupby(['Company']).Profit.sum().to_frame('Total Profit'),0)
total_profits_sum

In [54]:
yellow_total = total_profits_sum.loc['Yellow Cab', 'Total Profit']
pink_total = total_profits_sum.loc['Pink Cab', 'Total Profit']
print("It seems that Yellow Cab's profits are", round(yellow_total / pink_total), "times higher than Pink Cab's over the last 3 years.")

#### Monthly Profits

In [55]:
plt.figure(figsize=(30,9))

fig= yellow_cab_st.Profit.resample('m').sum().plot.line(color = '#FFD801',label='Yellow Cab Company',linewidth=3, marker ='o',fontsize=25)
fig=pink_cab_st.Profit.resample('m').sum().plot.line(color = '#FF1493',label='Pink Cab Company',linewidth=3, marker='o', fontsize=25)
plt.xticks(rotation=0, fontsize=25)
plt.yticks(rotation=0, fontsize=25)
plt.ylabel('Profit [Millions]',fontsize=30)
plt.title('Monthly Profits',fontsize=40)
plt.xlabel('Month',fontsize=30)
plt.legend(loc='best', shadow=True, fontsize=30)

In [56]:
dpm=master_df.groupby(['Company','Month of Travel'])['Profit'].sum().to_frame('Profit')
dpm=dpm.reset_index(level='Month of Travel')
dpm=dpm.reset_index(level='Company')
dpm.head()

In [57]:
y=yellow_cab_st.Profit.resample('m').sum().to_frame('Profit')
p=pink_cab_st.Profit.resample('m').sum().to_frame('Profit')

In [58]:
print('The average monthly profit of Yellow Cab Company is', round(y.Profit.mean(),1), 'and its standard deviation is', round(y.Profit.std(),2), 'so the standard deviation as a percentage of the mean is', round((y.Profit.std() /y.Profit.mean())*100,2),'%')
print('The average monthly profit of Pink Cab Company is', round(p.Profit.mean(),1), 'and its standard deviation is', round(p.Profit.std(),2), 'so the standard deviation as a percentage of the mean is', round((p.Profit.std() /p.Profit.mean())*100,2),'%')

Over the months, Yellow Cab Company's profits are the more stable of the two: their standard deviation is 23.08% of the mean monthly profit, compared with 61.22% for Pink Cab Company.
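
The percentages quoted here are the standard deviation divided by the mean, a ratio often called the coefficient of variation. The sketch below shows the calculation on made-up monthly profits; the helper `coefficient_of_variation` and both sample series are hypothetical.

```python
import pandas as pd

def coefficient_of_variation(series: pd.Series) -> float:
    """Sample standard deviation expressed as a percentage of the mean."""
    return series.std() / series.mean() * 100

# Made-up monthly profits for two hypothetical companies.
steady = pd.Series([90.0, 100.0, 110.0])
volatile = pd.Series([50.0, 100.0, 150.0])

cv_steady = coefficient_of_variation(steady)
cv_volatile = coefficient_of_variation(volatile)

print(round(cv_steady, 2), round(cv_volatile, 2))
```

Because the ratio is unit-free, it allows a comparison of stability between companies whose profit levels differ by a large factor.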

### Average profits over Rides

Profit per Ride is an indicator that measures how efficient the company is, in terms of operational costs.

`Profit per Ride = (Total Profits over a certain Period of Time) / ( Number of Rides over that period of Time)`
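
As a small worked example of this formula, the sketch below computes profit per ride for each company and year on made-up trips. The column names follow `master_df`; the values are invented for illustration.

```python
import pandas as pd

# Made-up trips; column names follow master_df.
trips = pd.DataFrame({
    "Company": ["Yellow Cab", "Yellow Cab", "Pink Cab", "Pink Cab"],
    "Year of Travel": [2016, 2016, 2016, 2017],
    "Price Charged": [50.0, 70.0, 30.0, 40.0],
    "Cost of Trip": [30.0, 40.0, 25.0, 30.0],
})
trips["Profit"] = trips["Price Charged"] - trips["Cost of Trip"]

# Total profit in the period divided by the number of rides in that period.
grouped = trips.groupby(["Company", "Year of Travel"])["Profit"]
profit_per_ride = grouped.sum() / grouped.count()

print(profit_per_ride)
```

Since total profit divided by the number of rides is the mean profit per ride, `grouped.mean()` gives the same numbers in one step.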

In [59]:
plt.figure(figsize=(30,12))

plt.subplot(1,2,2)
x1= ypy.ProfitORides.resample('Y').sum()
x2= ppy.ProfitORides.resample('Y').sum()
plt.bar(x= x1.index.strftime('%Y'), height='ProfitORides', data = ypy,color = '#FFD801',label='Yellow Cab Company')
plt.bar(x= x2.index.strftime('%Y'), height='ProfitORides', data = ppy, color = '#FF1493',label='Pink Cab Company')
plt.xticks(rotation=0, fontsize=25)
plt.yticks(rotation=0, fontsize=25)
plt.title('Annual Profits over Rides',fontsize=40)
plt.ylabel('Profit / Rides [USD]',fontsize=30)
plt.xlabel('Year',fontsize=30)
plt.legend(loc='upper right', shadow=True, fontsize=18)

plt.subplot(1,2,1)
x1= ypy.ProfitORides.resample('Y').sum()
x2= ppy.ProfitORides.resample('Y').sum()
plt.plot(x1.index.strftime('%Y'),'ProfitORides', data = ypy,color = '#FFD801',label='Yellow Cab Company',linewidth=3, marker = 'o')
plt.plot(x2.index.strftime('%Y'),'ProfitORides', data = ppy, color = '#FF1493',label='Pink Cab Company',linewidth=3, marker='o')
plt.xticks(rotation=0, fontsize=25)
plt.yticks(rotation=0, fontsize=25)
plt.title('Annual Profits over Rides',fontsize=40)
plt.ylabel('Profit / Rides [USD]', fontsize=30)
plt.xlabel('Year', fontsize=30)
plt.legend(loc='upper right', shadow=True, fontsize=18)

Profit per ride decreases over time in both companies. 

#### Yearly POR per Company

Here is the function to add labels on top of each bar in a bar chart.

In [60]:
def add_value_labels(ax, spacing=5):
    """Add labels to the end of each bar in a bar chart.

    Arguments:
        ax (matplotlib.axes.Axes): The matplotlib object containing the axes
            of the plot to annotate.
        spacing (int): The distance between the labels and the bars.
    """

    # For each bar: Place a label
    for rect in ax.patches:
        # Get X and Y placement of label from rect.
        y_value = rect.get_height()
        x_value = rect.get_x() + rect.get_width() / 2

        # Number of points between bar and label. Change to your liking.
        space = spacing
        # Vertical alignment for positive values
        va = 'bottom'

        # If value of bar is negative: Place label below bar
        if y_value < 0:
            # Invert space to place label below
            space *= -1
            # Vertically align label at top
            va = 'top'

        # Use Y value as label, formatted as whole dollars
        label = "${:.0f}".format(y_value)

        # Create annotation
        ax.annotate(
            label,                      # Use `label` as label
            (x_value, y_value),         # Place label at end of the bar
            xytext=(0, space),          # Vertically shift label by `space`
            textcoords="offset points", # Interpret `xytext` as offset in points
            ha='center',                # Horizontally center label
            va=va,                      # Vertically align label differently for
                                        # positive and negative values.
            fontsize=3)

In [61]:
ypy_new = ypy.reset_index(level='Date of Travel')
ypy_new.head()

In [62]:
g=sns.catplot(x='Date of Travel',y='ProfitORides',data=ypy_new,kind='bar')

ax = g.facet_axis(0,0)
for p in ax.patches:
    ax.text(p.get_x() + 0.125, 
            p.get_height() * 1.02, 
            "${:.2f}".format(p.get_height()), 
            color='black', rotation='horizontal', size='small')
años = ['2016','2017','2018']
mapeado = range(len(años))
plt.xticks(mapeado, años, rotation =0)
plt.title('Yearly POR Yellow Cab Company',fontsize=15)
plt.ylabel('Profit / Rides [USD]')
plt.xlabel('Year')

In [63]:
ppy_new = ppy.reset_index('Date of Travel')
ppy_new

In [64]:
g=sns.catplot(x='Date of Travel',y='ProfitORides',data=ppy_new,kind='bar')

ax = g.facet_axis(0,0)
for p in ax.patches:
    ax.text(p.get_x() + 0.125, 
            p.get_height() * 1.02, 
            "${:.2f}".format(p.get_height()), 
            color='black', rotation='horizontal', size='small')
years = ['2016','2017','2018']
noy = range(len(years))
plt.xticks(noy, years, rotation =0)
plt.title('Yearly POR Pink Cab Company',fontsize=15)
plt.ylabel('Profit / Rides [USD]')
plt.xlabel('Year')

We can conclude that Yellow Cab's profit per ride is higher than Pink Cab's in each of the three years.

#### Monthly profits over Rides

In [65]:
plt.figure(figsize=(30,9))

x1= ypm.ProfitORides.resample('m').sum()
x2= ppm.ProfitORides.resample('m').sum()
plt.plot(x1.index,'ProfitORides', data = ypm,color = '#FFD801', linewidth = 3,label='Yellow Cab Company', marker='o')
plt.plot(x2.index,'ProfitORides', data = ppm, color = '#FF1493',linewidth = 3,label='Pink Cab Company', marker='o')
plt.xticks(rotation=0,fontsize=20)
plt.yticks(rotation=0,fontsize=20)
plt.title('Monthly Profits over Rides',fontsize=30)
plt.ylabel('Profit / Rides [USD]',fontsize=25)
plt.xlabel('Month',fontsize=25)
plt.legend(loc='upper right', shadow=True, fontsize=20)

### Average profits over KM

#### Annual profits over KM

In [66]:
plt.figure(figsize=(30,9))

plt.subplot(1,2,2)
x1= ypy.ProfitOKM.resample('Y').sum()
x2= ppy.ProfitOKM.resample('Y').sum()
plt.bar(x= x1.index.strftime('%Y'), height='ProfitOKM', data = ypy,color = '#FFD801',label='Yellow Cab Company')
plt.bar(x= x2.index.strftime('%Y'), height='ProfitOKM', data = ppy, color = '#FF1493',label='Pink Cab Company')
plt.xticks(rotation=0,fontsize=20)
plt.title('Annual Profits over KM',fontsize=30)
plt.ylabel('Profit / KM [USD]',fontsize=25)
plt.xlabel('Year',fontsize=25)
plt.xticks(rotation=0,fontsize=20)
plt.yticks(rotation=0,fontsize=20)
plt.legend(loc='upper right', shadow=True,fontsize=20)

plt.subplot(1,2,1)
x1= ypy.ProfitOKM.resample('Y').sum()
x2= ppy.ProfitOKM.resample('Y').sum()
plt.ylabel('Profit / KM [USD]', fontsize=20)
plt.plot(x1.index.strftime('%Y'),'ProfitOKM', data = ypy,color = '#FFD801',label='Yellow Cab Company',linewidth=3,marker='o')
plt.plot(x2.index.strftime('%Y'),'ProfitOKM', data = ppy, color = '#FF1493',label='Pink Cab Company',linewidth=3,marker='o')
plt.title('Annual Profits over KM',fontsize=30)
plt.ylabel('Profit / KM [USD]',fontsize=25)
plt.xlabel('Year',fontsize=25)
plt.xticks(rotation=0,fontsize=20)
plt.yticks(rotation=0,fontsize=20)
plt.legend(loc='upper right', shadow=True,fontsize=20)

Profit per Km decreases over time in both companies.

#### Monthly profit over KM

In [67]:
plt.figure(figsize=(30,9))

x1= ypm.ProfitOKM.resample('m').sum()
x2= ppm.ProfitOKM.resample('m').sum()
plt.ylabel('Profit / KM [USD]', fontsize=25)
plt.plot(x1.index,'ProfitOKM', data = ypm,color = '#FFD801',label='Yellow Cab Company',linewidth=3, marker='o')
plt.plot(x2.index,'ProfitOKM', data = ppm, color = '#FF1493',label='Pink Cab Company',linewidth=3,marker='o')
plt.xticks(rotation=0,fontsize=20)
plt.yticks(rotation=0,fontsize=20)
plt.title('Monthly Profits over KM',fontsize=30)
plt.ylabel('Profit / KM [USD]',fontsize=25)
plt.xlabel('Month',fontsize=25)
plt.legend(loc='upper right', shadow=True,fontsize=20)

Profit per Km is higher for Yellow Cab than for Pink Cab in every month.

### Total profits per City

In [68]:
ppc= yellow_cab.groupby('City').Profit.sum()
ppc= pd.DataFrame(ppc)
ppc = ppc.sort_values(by='Profit', ascending= False )
ppc.head()

plt.figure(figsize=(22,11))
fig= ppc.Profit.plot.bar(color = '#FFD801',edgecolor='black',linewidth=1.5)
plt.xticks(rotation=45,fontsize=20)
plt.yticks(rotation=0,fontsize=20)

plt.ylabel('Profits [Millions of USD]', fontsize=25)
plt.xlabel('Cities', fontsize=25)
plt.title('Yellow Cab Profits over City',fontsize=30)
add_value_labels(fig)

In [69]:
ppc= pink_cab.groupby('City').Profit.sum()
ppc= pd.DataFrame(ppc)
ppc = ppc.sort_values(by='Profit', ascending= False )
ppc.head()

plt.figure(figsize=(22,11))
fig= ppc.Profit.plot.bar(color = '#FF1493',edgecolor='black',linewidth=1.5)
plt.xticks(rotation=45,fontsize=20)
plt.yticks(rotation=0,fontsize=20)

plt.ylabel('Profits [Millions of USD]', fontsize=25)
plt.xlabel('Cities', fontsize=25)
plt.title('Pink Cab Profits over City',fontsize=30)
add_value_labels(fig)

In [70]:
h=master_df.groupby(['Company','City']).Profit.sum().to_frame('Profit Over City')
h=h.reset_index(level='City', col_level=1)
h=h.reset_index(level='Company', col_level=1)
h = h.sort_values(by='Profit Over City', ascending= False )

In [71]:
# Drop the New York rows of both companies (index labels 28 and 9 after the resets above).
h=h.drop([28,9],axis=0)
h.head()
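
Dropping rows by index label depends on the row order produced by the earlier `groupby`. A label-based filter is less fragile; the sketch below shows one on a made-up table with the same columns as `h`. The city spelling `"NEW YORK NY"` is an assumption and should be checked against the actual `City` values.

```python
import pandas as pd

# Made-up profit table with the same columns as `h`.
profits = pd.DataFrame({
    "Company": ["Yellow Cab", "Pink Cab", "Yellow Cab", "Pink Cab"],
    "City": ["NEW YORK NY", "NEW YORK NY", "BOSTON MA", "BOSTON MA"],
    "Profit Over City": [900.0, 300.0, 120.0, 80.0],
})

# Assumption: the city label contains "NEW YORK"; adjust to the real spelling.
is_new_york = profits["City"].str.contains("NEW YORK", case=False)
without_new_york = profits[~is_new_york]

print(without_new_york)
```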

In [72]:
g=sns.catplot(x='City',y='Profit Over City',data=h,kind='bar',palette = 'rocket',col='Company', height=9, aspect=1.2)

# plt.* calls only act on the last facet, so style each facet through its own Axes.
for ax, company in zip(g.axes.flat, g.col_names):
    for p in ax.patches:
        ax.text(p.get_x() + 0.3,
                p.get_height(),
                "  {:.0f}".format(p.get_height()),
                color='black', rotation='vertical', size='large')
    ax.set_ylabel('Profits [Millions of USD]', fontsize=25)
    ax.set_xlabel('Cities', fontsize=25)
    ax.set_title('{} Profits over City'.format(company), fontsize=30)
    ax.tick_params(axis='x', labelrotation=80, labelsize=15)
    ax.tick_params(axis='y', labelsize=20)
plt.show()


New York has been removed for both cab companies in this analysis to give a clearer view of the profits in the other cities.

From these totals, Yellow Cab appears to hold the greater share of profit in every city.


#### Citywise profitable rides (percentage) 

In [73]:
yellow_rides = pd.DataFrame(yellow_cab['City'])
yellow_rides['is_profitable'] = yellow_cab['Profit'] > 0
yellow_rides = yellow_rides.groupby(['City','is_profitable']).size().to_frame()
yellow_rides = yellow_rides.rename(columns = {0:'count'})
yellow_rides = yellow_rides.pivot_table('count', ['City'], 'is_profitable')

yellow_rides = yellow_rides.fillna(0)

yellow_rides['profitability_%'] = round(((yellow_rides[True]/(yellow_rides[True]+yellow_rides[False]))*100),2)
yellow_rides = yellow_rides.reset_index(level='City', col_level=1)
yellow_rides = yellow_rides.rename_axis(None, axis=1)
yellow_rides = yellow_rides.sort_values(by='profitability_%', ascending= False)

#list(yellow_rides.columns)
yellow_rides

In [74]:
yellow_rides_i = yellow_rides.set_index('City')
yellow_rides_i

In [75]:
pink_rides = pd.DataFrame(pink_cab['City'])
pink_rides['is_profitable'] = pink_cab['Profit'] > 0
pink_rides = pink_rides.groupby(['City','is_profitable']).size().to_frame()
pink_rides = pink_rides.rename(columns = {0:'count'})
pink_rides = pink_rides.pivot_table('count', ['City'], 'is_profitable')

pink_rides = pink_rides.fillna(0)
pink_rides['profitability_%'] = round(((pink_rides[True]/(pink_rides[True]+pink_rides[False]))*100),2)
pink_rides = pink_rides.reset_index(level='City', col_level=1)
pink_rides = pink_rides.rename_axis(None, axis=1)
pink_rides = pink_rides.sort_values(by='profitability_%', ascending= False )

#list(yellow_rides.columns)
pink_rides

In [76]:
pink_rides_i = pink_rides.set_index('City')
pink_rides_i

In [77]:
plt.figure(figsize=(30,9))
plt.bar(x= yellow_rides_i.index, height='profitability_%', data = yellow_rides_i,edgecolor = 'black',color = '#FFD801',linewidth=1,label='Yellow Cab')
plt.bar(x= pink_rides_i.index, height='profitability_%', data = pink_rides_i,edgecolor = 'black',color = '#FF1493',linewidth=1,label='Pink Cab')
plt.xticks(rotation=45,fontsize=20)
plt.yticks(rotation=0,fontsize=20)
plt.ylabel('Profitable Rides  [%]', fontsize=25)
plt.title('Citywise Profitable rides percentage ', fontsize = 30)
plt.legend(loc='upper right', shadow=True)
plt.xlabel('Cities', fontsize=25)

In [78]:
g=sns.catplot(x='City',y='profitability_%',data=yellow_rides,kind='bar',palette = 'rocket', height=8.27, aspect=18/8.27)

ax = g.facet_axis(0,0)
for p in ax.patches:
    ax.text(p.get_x() + 0.195, 
            p.get_height() * 1.02, 
            "{:.0f}%".format(p.get_height()), 
            color='black', rotation='horizontal', size='large')
plt.ylabel('Profitable [ % ]', fontsize=16)
plt.xlabel('Cities', fontsize=16)
plt.title('Yellow Cab Profitable Rides %',fontsize=20)
plt.xticks(rotation=45)

plt.show()

In [79]:
g=sns.catplot(x='City',y='profitability_%',data=pink_rides,kind='bar',palette = 'rocket', height=8.27, aspect=18/8.27)

ax = g.facet_axis(0,0)
for p in ax.patches:
    ax.text(p.get_x() + 0.195, 
            p.get_height() * 1.02, 
            "{:.0f}%".format(p.get_height()), 
            color='black', rotation='horizontal', size='large')
plt.ylabel('Profitable [ % ]', fontsize=16)
plt.xlabel('Cities', fontsize=16)
plt.title('Pink Cab Profitable Rides %',fontsize=20)
plt.xticks(rotation=45)
plt.show()

We can now address the second hypothesis. Assume that a company performs well in a city when more than 80% of its rides there are profitable.<br>
Then we can say that:<br>
<br>
The percentage of profitable rides varies by city, and Yellow Cab performs well according to this analysis, maintaining a high level of profitable rides in every city.
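
The share of profitable rides and the 80% threshold can also be computed in a few lines. The sketch below is a compact alternative to the pivot-table approach, shown on made-up trips; the city names `A` and `B` and the profit values are hypothetical.

```python
import pandas as pd

# Made-up trips; column names follow yellow_cab / pink_cab.
trips = pd.DataFrame({
    "City": ["A", "A", "A", "A", "A", "B", "B"],
    "Profit": [5.0, 3.0, 2.0, 1.0, 4.0, 4.0, -2.0],
})

# The mean of a boolean column is the fraction of True values.
profitable_share = (trips["Profit"] > 0).groupby(trips["City"]).mean() * 100

# Threshold taken from the assumption stated in the text.
performs_well = profitable_share > 80

print(profitable_share.round(2))
print(performs_well)
```

This form also avoids a missing `False` column when every ride in the data is profitable.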




#### Profits on holidays

In [80]:
# Number of trips per travel day, together with that day's holiday flag.
y_hc = yellow_cab.groupby(['Date of Travel','is_holiday']).is_holiday.count().to_frame('trips')
y_hc = y_hc.reset_index()

p_hc = pink_cab.groupby(['Date of Travel','is_holiday']).is_holiday.count().to_frame('trips')
p_hc = p_hc.reset_index()

# Number of days per holiday flag, ordered False then True. The column layout is
# fixed explicitly: 'index' holds the flag and 'is_holiday' holds the day count.
y_count = y_hc.groupby('is_holiday').size().rename_axis('index').reset_index(name='is_holiday')
p_count = p_hc.groupby('is_holiday').size().rename_axis('index').reset_index(name='is_holiday')

In [81]:
y_pc = yellow_cab.groupby(['Date of Travel','is_holiday']).Profit.sum().to_frame('profit')
y_pc=y_pc.reset_index(level='is_holiday', col_level=1)
y_pc=y_pc.reset_index(level='Date of Travel', col_level=1)
y_profit = y_pc.groupby(['is_holiday']).profit.sum()

p_pc = pink_cab.groupby(['Date of Travel','is_holiday']).Profit.sum().to_frame('profit')
p_pc=p_pc.reset_index(level='is_holiday', col_level=1)
p_pc=p_pc.reset_index(level='Date of Travel', col_level=1)
p_profit = p_pc.groupby(['is_holiday']).profit.sum()

y_profit.to_frame()
p_profit.to_frame()
y_profit=y_profit.reset_index()
p_profit=p_profit.reset_index()

In [82]:
y_profit['avg_profit_per_day']= y_profit['profit']/y_count['is_holiday']
y_profit.insert(0,'Company' ,"Yellow_cab")
y_profit

In [83]:
p_profit['avg_profit_per_day']= p_profit['profit']/p_count['is_holiday']
p_profit.insert(0,'Company' ,'pink_cab')
p_profit

In [84]:
holiday_profits = pd.concat([y_profit, p_profit], ignore_index=True)
holiday_profits

In [85]:
g=sns.catplot(x='is_holiday',y='avg_profit_per_day',data=holiday_profits,kind='bar',hue='Company',palette=sns.color_palette(['#FFD801','#FF1493']), height=8.27, aspect=11.7/8.27)

ax = g.facet_axis(0,0)
for p in ax.patches:
    ax.text(p.get_x() + 0.015, 
            p.get_height() * 1.02, 
            "{:.0f}".format(p.get_height()), 
            color='black', rotation='horizontal', size='medium')
plt.title('Average profit per day',fontsize=30)
plt.ylabel('Average Profit',fontsize=25)
plt.xlabel('Is Holiday?',fontsize=25)
plt.xticks(rotation=0,fontsize=20)
plt.yticks(rotation=0,fontsize=20)
plt.show()

For Yellow Cab, the average profit per day is lower on holidays than on normal days. For Pink Cab, the average profit per day is slightly higher on holidays than on normal days.
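
The "average profit per day" compared here is the total profit of each calendar day, averaged separately over holidays and normal days. The sketch below reproduces that two-step calculation on made-up trips; the dates and profits are invented, and the column names follow the company data frames.

```python
import pandas as pd

# Made-up trips; column names follow the company data frames.
trips = pd.DataFrame({
    "Date of Travel": pd.to_datetime(
        ["2016-07-04", "2016-07-04", "2016-07-05", "2016-07-06"]),
    "is_holiday": [True, True, False, False],
    "Profit": [10.0, 20.0, 12.0, 8.0],
})

# Step 1: total profit for each calendar day.
daily = (trips.groupby(["Date of Travel", "is_holiday"])["Profit"]
              .sum()
              .reset_index())

# Step 2: average of those daily totals for holidays and for normal days.
avg_profit_per_day = daily.groupby("is_holiday")["Profit"].mean()

print(avg_profit_per_day)
```

Averaging daily totals rather than individual trips keeps the comparison fair even though there are far fewer holidays than normal days.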

<a id="6"></a> <br>
## 6. Demand Analysis

### Demand

#### Yearly Demand

In [86]:
plt.figure(figsize=(30,12))

yellow_cab['Date of Travel'].value_counts().resample('Y').sum().plot.line(color = '#FFD801',linewidth =3,marker='o')
pink_cab['Date of Travel'].value_counts().resample('Y').sum().plot.line(color = '#FF1493',linewidth =3,marker='o')
plt.legend(['Yellow Cab', 'Pink Cab'],fontsize=20)
plt.title('Yearly Demand of each Company',fontsize=30)
plt.ylabel('Demand [No. of Trips]',fontsize=25)
plt.xlabel('Year',fontsize=25)
plt.xticks(rotation=0,fontsize=20)
plt.yticks(rotation=0,fontsize=20)

plt.show()

Yellow Cab's yearly demand stays nearly 4 times higher than Pink Cab's throughout the period.

#### Monthly Demand

In [87]:
plt.figure(figsize=(30,12))

yellow_cab['Date of Travel'].value_counts().resample('m').sum().plot.line(color = '#FFD801',label='Yellow Cab Company',linewidth=3, marker='o')
pink_cab['Date of Travel'].value_counts().resample('m').sum().plot.line(color = '#FF1493',label='Pink Cab Company',linewidth=3,marker='o')
plt.legend(['Yellow Cab', 'Pink Cab'],fontsize=20)
plt.title('Monthly historical demand of each Company',fontsize=30)
plt.ylabel('Demand [Nº of Trips]',fontsize=25)
plt.xlabel('Month',fontsize=25)
plt.xticks(rotation=0,fontsize=20)
plt.yticks(rotation=0,fontsize=20)
plt.show()

Demand follows a seasonal pattern over the year.

#### Monthly Average Demand

In [88]:
dpm=master_df.groupby(['Company','Month of Travel'])['Transaction ID'].agg(['count'])
dpm=dpm.reset_index(level='Month of Travel', col_level=1)
dpm=dpm.reset_index(level='Company', col_level=1)
dpm = dpm.sort_values(by='count', ascending= False)
dpm.head()

The function below labels each bar with a plain number (without the $ prefix used in the Profit section).

In [89]:
def add_value_labels1(ax, spacing=5):
    """Add labels to the end of each bar in a bar chart.

    Arguments:
        ax (matplotlib.axes.Axes): The matplotlib object containing the axes
            of the plot to annotate.
        spacing (int): The distance between the labels and the bars.
    """

    # For each bar: Place a label
    for rect in ax.patches:
        # Get X and Y placement of label from rect.
        y_value = rect.get_height()
        x_value = rect.get_x() + rect.get_width() / 2

        # Number of points between bar and label. Change to your liking.
        space = spacing
        # Vertical alignment for positive values
        va = 'bottom'

        # If value of bar is negative: Place label below bar
        if y_value < 0:
            # Invert space to place label below
            space *= -1
            # Vertically align label at top
            va = 'top'

        # Use Y value as label, formatted as a whole number
        label = "{:.0f}".format(y_value)

        # Create annotation
        ax.annotate(
            label,                      # Use `label` as label
            (x_value, y_value),         # Place label at end of the bar
            xytext=(0, space),          # Vertically shift label by `space`
            textcoords="offset points", # Interpret `xytext` as offset in points
            ha='center',                # Horizontally center label
            va=va,                      # Vertically align label differently for
                                        # positive and negative values.
            fontsize=12)

In [90]:
g=sns.catplot(x='Month of Travel',y='count',data=dpm,kind='bar',hue='Company',palette=sns.color_palette(['#FFD801', '#FF1493']), height=8.27, aspect=11.7/8.27)

ax = g.facet_axis(0,0)
for p in ax.patches:
    ax.text(p.get_x() + 0.015, 
            p.get_height() * 1.02, 
            "{:.0f}".format(p.get_height()), 
            color='black', rotation='horizontal', size='medium')
plt.title('Monthly average demand of each Company',fontsize=30)
plt.ylabel('Demand [No. of Trips]',fontsize=25)
plt.xlabel('Month',fontsize=25)
plt.xticks(rotation=0,fontsize=20)
plt.yticks(rotation=0,fontsize=20)
plt.show()

The highest demand is in December.

### Demand Agewise

In [91]:
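# Inclusive, non-overlapping age classes; the strict inequalities used before
# silently dropped customers aged exactly 18, 25, 26, 40, 41 and 60.
a1=master_df[master_df['Age'].between(18, 25)]
a2=master_df[master_df['Age'].between(26, 40)]
a3=master_df[master_df['Age'].between(41, 60)]
a4=master_df[(master_df['Age']>60)]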

a11= a1.groupby(['Company','Year of Travel'])['Customer ID'].agg(['count'])
a11.columns=['Total Customers']
a11=a11.reset_index(level='Year of Travel', col_level=1)
a11=a11.reset_index(level='Company', col_level=1)
a11['Class']= '18-25'

a22= a2.groupby(['Company','Year of Travel'])['Customer ID'].agg(['count'])
a22.columns=['Total Customers']
a22=a22.reset_index(level='Year of Travel', col_level=1)
a22=a22.reset_index(level='Company', col_level=1)
a22['Class']= '26-40'

a33= a3.groupby(['Company','Year of Travel'])['Customer ID'].agg(['count'])
a33.columns=['Total Customers']
a33=a33.reset_index(level='Year of Travel', col_level=1)
a33=a33.reset_index(level='Company', col_level=1)
a33['Class']= '41-60'

a44= a4.groupby(['Company','Year of Travel'])['Customer ID'].agg(['count'])
a44.columns=['Total Customers']
a44=a44.reset_index(level='Year of Travel', col_level=1)
a44=a44.reset_index(level='Company', col_level=1)
a44['Class']= '60+'

agegroup=a11
agegroup=pd.concat([agegroup,a22,a33,a44])
agegroup['Total Customers']=agegroup['Total Customers'].astype(int)
agegroup.head(5)

In [92]:
g=sns.catplot(x='Year of Travel',y='Total Customers',data=agegroup,col= 'Company',kind='bar',hue='Class',palette = 'rocket', height=9, aspect=1.2)

ax = g.facet_axis(0,0)
for p in ax.patches:
    ax.text(p.get_x() + 0.015, 
            p.get_height() * 1.02, 
            "{:.0f}".format(p.get_height()), 
            color='black', rotation='horizontal', size='medium')
ax = g.facet_axis(0,1)
for p in ax.patches:
    ax.text(p.get_x() + 0.015, 
            p.get_height() * 1.02, 
            "{:.0f}".format(p.get_height()), 
            color='black', rotation='horizontal', size='medium')
plt.show()

Each year, both companies have the most customers in the 26-40 age class.
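
The four per-class blocks above can also be written as a single binning step. The sketch below is an illustrative alternative rather than the notebook's own code: it applies `pd.cut` to a small made-up frame (`toy`) that only borrows the column names of `master_df`, and it assumes inclusive integer age classes 18-25, 26-40, 41-60 and 60+ (61 and older).

```python
import pandas as pd

# Hypothetical stand-in for master_df; only the column names match the notebook.
toy = pd.DataFrame({
    'Company': ['Yellow Cab', 'Yellow Cab', 'Pink Cab', 'Pink Cab', 'Yellow Cab', 'Pink Cab'],
    'Year of Travel': [2016, 2016, 2016, 2017, 2017, 2017],
    'Customer ID': [1, 2, 3, 4, 5, 6],
    'Age': [18, 25, 26, 40, 41, 65],
})

# Right-inclusive intervals: (17, 25], (25, 40], (40, 60], (60, inf]
bins = [17, 25, 40, 60, float('inf')]
labels = ['18-25', '26-40', '41-60', '60+']
toy['Class'] = pd.cut(toy['Age'], bins=bins, labels=labels)

# One group-by replaces the four copy-pasted blocks
agegroup_alt = (
    toy.groupby(['Company', 'Year of Travel', 'Class'], observed=True)['Customer ID']
       .count()
       .rename('Total Customers')
       .reset_index()
)
print(agegroup_alt)
```

Because every age falls into exactly one interval, boundary ages such as 25 or 40 cannot be lost between classes.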

<a id="7"></a> <br>
## 7. Client Analysis

#### Loyalty Rates

To analyze loyalty rates, let's define two classes:

**1. Medium loyalty Customers-->**   Customers who took at least 10 rides in a year.<br>
**2. High loyalty Customers-->**     Customers who took at least 10 rides in a month.
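
As a minimal sketch of the yearly definition, assuming a customer qualifies with 10 or more rides with one company in a calendar year (the `>= 10` comparison used in the notebook's code), one can count rides per customer and then count the customers that reach the threshold. The `rides` frame and the `THRESHOLD` name are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical toy data: customer 1 takes 12 Yellow Cab rides in 2016,
# customer 2 takes 3, and customer 3 takes 10 Pink Cab rides in 2016.
rides = pd.DataFrame({
    'Customer ID': [1] * 12 + [2] * 3 + [3] * 10,
    'Company': ['Yellow Cab'] * 15 + ['Pink Cab'] * 10,
    'Year of Travel': [2016] * 25,
})

THRESHOLD = 10  # assumed cut-off for a "loyal" customer

# Rides per (company, year, customer)
per_customer = rides.groupby(['Company', 'Year of Travel', 'Customer ID']).size()

# Number of customers at or above the threshold, per company and year
loyal_counts = (
    per_customer[per_customer >= THRESHOLD]
    .groupby(level=['Company', 'Year of Travel'])
    .size()
)
print(loyal_counts)
```

The monthly class follows the same pattern with a month column added to the group-by keys.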

In [93]:
years=[2016,2017,2018]
c10r_y=[]
c10r_p=[]
for year in years:
  # Rides per customer for each company in the given year
  yellow_year=master_df[(master_df['Year of Travel']==year) & (master_df['Company']=='Yellow Cab')].groupby('Customer ID')['Company'].agg(['count'])
  yellow_year.reset_index(inplace=True)
  # Keep only customers with at least 10 rides (the filtered frame must be assigned)
  yellow_year=yellow_year[yellow_year['count']>=10]
  c10r_y.append(len(yellow_year))
  pink_year=master_df[(master_df['Year of Travel']==year) & (master_df['Company']=='Pink Cab')].groupby('Customer ID')['Company'].agg(['count'])
  pink_year.reset_index(inplace=True)
  pink_year=pink_year[pink_year['count']>=10]
  c10r_p.append(len(pink_year))

plt.figure(figsize=(12,9))
X = np.arange(3)
plt.bar(X+0.00,c10r_y,color='#FFD801', label='Yellow Cab', width= 0.25,edgecolor='black')
plt.bar(X+0.25,c10r_p,color='#FF1493',label='Pink Cab',width = 0.25,edgecolor='black')
plt.xticks(X+0.15,['2016','2017','2018'])
leg=plt.gca().legend(loc='center left',bbox_to_anchor = (1,0.5))
plt.setp(leg.get_texts(), color='black')
plt.title('Medium loyalty Customers',fontsize=30)
plt.ylabel('No. of Clients')
plt.xlabel('Year')
plt.show()

In [94]:
months=[1,2,3,4,5,6,7,8,9,10,11,12]
c10r_ym=[]
c10r_pm=[]
for month in months:
  # Note: rides are pooled by calendar month across all years in master_df
  yellow_year=master_df[(master_df['Month of Travel']==month) & (master_df['Company']=='Yellow Cab')].groupby('Customer ID')['Company'].agg(['count'])
  yellow_year.reset_index(inplace=True)
  # Keep only customers with at least 10 rides (the filtered frame must be assigned)
  yellow_year=yellow_year[yellow_year['count']>=10]
  c10r_ym.append(len(yellow_year))
  pink_year=master_df[(master_df['Month of Travel']==month) & (master_df['Company']=='Pink Cab')].groupby('Customer ID')['Company'].agg(['count'])
  pink_year.reset_index(inplace=True)
  pink_year=pink_year[pink_year['count']>=10]
  c10r_pm.append(len(pink_year))

plt.figure(figsize=(12,9))
X = np.arange(12)
plt.bar(X+0.00,c10r_ym,color='#FFD801', label='Yellow Cab', width= 0.25,edgecolor='black')
plt.bar(X+0.25,c10r_pm,color='#FF1493',label='Pink Cab',width = 0.25,edgecolor='black')
plt.xticks(X+0.15,['JAN','FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC'])
leg=plt.gca().legend(loc='center left',bbox_to_anchor = (1,0.5))
plt.setp(leg.get_texts(), color='black')
plt.title('High loyalty Customers',fontsize=20)
plt.ylabel('No. of Clients')
plt.xlabel('Month')
plt.show()

In [95]:
c10r_ym=pd.DataFrame(c10r_ym)
c10r_pm=pd.DataFrame(c10r_pm)

In [96]:
c10r_pm.columns=['High Loyalty Clients']
c10r_ym.columns=['High Loyalty Clients']

In [97]:
c10r_ym['x_index'] = c10r_ym.index
c10r_ym.head()

In [98]:
c10r_pm['x_index'] = c10r_pm.index
c10r_pm.head()

In [99]:
g=sns.catplot(x='x_index',y='High Loyalty Clients',data=c10r_ym,kind='bar',palette = 'rocket', height=8.27, aspect=11.7/8.27)

ax = g.facet_axis(0,0)
for p in ax.patches:
    ax.text(p.get_x() + 0.115, 
            p.get_height() * 1.02, 
            "{:.0f}".format(p.get_height()), 
            color='black', rotation='horizontal', size='small')
plt.title("Yellow Cab - High Loyalty Clients",fontsize=20)
plt.ylabel('No. of Clients')
plt.xlabel('Month')
# Bars sit at integer positions 0..11, so the tick labels need no offset
plt.xticks(np.arange(12),['JAN','FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC'])
plt.show()

In [100]:
g=sns.catplot(x= 'x_index',y='High Loyalty Clients',data=c10r_pm,kind='bar',palette = 'rocket', height=8.27, aspect=11.7/8.27)

ax = g.facet_axis(0,0)
for p in ax.patches:
    ax.text(p.get_x() + 0.115, 
            p.get_height() * 1.02, 
            "{:.0f}".format(p.get_height()), 
            color='black', rotation='horizontal', size='small')
plt.title("Pink Cab - High Loyalty Clients",fontsize=20)
plt.ylabel('No. of Clients')
plt.xlabel('Month')
# Bars sit at integer positions 0..11, so the tick labels need no offset
plt.xticks(np.arange(12),['JAN','FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC'])
plt.show()

Yellow Cab is doing better than Pink Cab in both loyalty classes.

### Payment Mode Distribution

#### Payment Mode Distribution Yearly

In [101]:
u=master_df.groupby(['Year of Travel'])['Transaction ID'].agg(['count'])
u.columns = ['RidesPerYear']
u=u.reset_index(level='Year of Travel', col_level=1)
payment=master_df.groupby(['Year of Travel','Payment_Mode'])['Transaction ID'].agg(['count'])
payment=payment.reset_index(level='Year of Travel', col_level=1)
payment=payment.reset_index(level='Payment_Mode', col_level=1)
payment=payment.merge(u,on= 'Year of Travel')
payment.head()
payment1=payment
payment1['per']=payment1['count']/payment1['RidesPerYear']
payment.head()

In [102]:
g=sns.catplot(x='Year of Travel',y='per',data=payment1,kind='bar',hue='Payment_Mode',palette="rocket", height=8.27, aspect=11.7/8.27)
plt.title('Percentage of Payment Mode Yearly',fontsize=20)
plt.xlabel('Year')
plt.ylabel('Percentage')
ax = g.facet_axis(0,0)
for p in ax.patches:
    ax.text(p.get_x() + 0.115, 
            p.get_height() * 1.02, 
            "{:.2f}%".format(p.get_height()*100), 
            color='black', rotation='horizontal', size='large')
plt.show()

Minimal deviations in the payment method are observed over time.
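
The share plotted above is the number of rides paid with a given mode divided by all rides in that year. As a hedged sketch of the same calculation without the intermediate merge, a group-wise `transform('sum')` can supply the yearly total. The `toy` frame and its `Card`/`Cash` values are made up for illustration; only the column names follow `master_df`.

```python
import pandas as pd

# Hypothetical toy transactions: 2016 has 3 card and 1 cash ride, 2017 has 1 of each
toy = pd.DataFrame({
    'Year of Travel': [2016, 2016, 2016, 2016, 2017, 2017],
    'Payment_Mode': ['Card', 'Card', 'Card', 'Cash', 'Card', 'Cash'],
    'Transaction ID': [1, 2, 3, 4, 5, 6],
})

# Rides per (year, payment mode)
counts = (
    toy.groupby(['Year of Travel', 'Payment_Mode'])['Transaction ID']
       .count()
       .rename('count')
       .reset_index()
)

# Divide by the yearly total, broadcast back onto each row
counts['per'] = counts['count'] / counts.groupby('Year of Travel')['count'].transform('sum')
print(counts)
```

Within each year the `per` values sum to 1, which is a quick sanity check on the shares.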

#### Payment Mode Distribution citywise

In [103]:
payment2=master_df.groupby(['Year of Travel','Payment_Mode','City'])['Transaction ID'].agg(['count'])
payment2=payment2.reset_index(level='Year of Travel', col_level=1)
payment2=payment2.reset_index(level='Payment_Mode', col_level=1)
payment2=payment2.reset_index(level='City', col_level=1)
payment2 = payment2.sort_values(by='count', ascending= False )
payment2.head()

In [104]:
fig, ax = plt.subplots(figsize = (15, 10))

sns.barplot(x='City',y='count',data=payment2,hue='Payment_Mode',palette="rocket")
plt.title('No. of Payments citywise',fontsize=20)
plt.xlabel('City')
plt.ylabel('No. of Transactions')
plt.xticks(rotation=70)
plt.show()

Minimal deviations of the payment method are observed in each city.

#### Payment Mode Distribution agewise

In [105]:
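# Inclusive, non-overlapping age classes; the strict inequalities used before
# silently dropped customers aged exactly 18, 25, 26, 40, 41 and 60.
a1=master_df[master_df['Age'].between(18, 25)]
a2=master_df[master_df['Age'].between(26, 40)]
a3=master_df[master_df['Age'].between(41, 60)]
a4=master_df[(master_df['Age']>60)]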

a11= a1.groupby(['Payment_Mode'])['Customer ID'].agg(['count'])
a11.columns=['cash/card']
a11=a11.reset_index(level='Payment_Mode', col_level=1)
a11['Class']= '18-25'


a22= a2.groupby(['Payment_Mode'])['Customer ID'].agg(['count'])
a22.columns=['cash/card']
a22=a22.reset_index(level='Payment_Mode', col_level=1)
a22['Class']= '26-40'


a33= a3.groupby(['Payment_Mode'])['Customer ID'].agg(['count'])
a33.columns=['cash/card']
a33=a33.reset_index(level='Payment_Mode', col_level=1)
a33['Class']= '41-60'


a44= a4.groupby(['Payment_Mode'])['Customer ID'].agg(['count'])
a44.columns=['cash/card']
a44=a44.reset_index(level='Payment_Mode', col_level=1)
a44['Class']= '60+'

payage=a11
payage=pd.concat([payage,a22,a33,a44])
payage.head()

# Total rides per age class; select the count column so the text column
# 'Payment_Mode' is not included in the sum
x=payage.groupby('Class')['cash/card'].sum().rename('Total').reset_index()
x.head()

payage=payage.merge(x,on='Class')
payage['percentage'] = round(((payage['cash/card']/payage['Total'])*100),2)
payage.head()

In [106]:
# catplot creates its own figure, so no separate plt.figure() call is needed
g=sns.catplot(x='Payment_Mode',y='percentage',data=payage,kind='bar',hue='Class', palette='rocket', height=8.27, aspect=11.7/8.27)
ax = g.facet_axis(0,0)
for p in ax.patches:
    ax.text(p.get_x() + 0.035, 
            p.get_height() * 1.02, 
            "{:.2f}%".format(p.get_height()), 
            color='black', rotation='horizontal', size='small')
plt.title('Percentage of Payment Mode Agewise',fontsize=20)
plt.xlabel('Payment Mode')
plt.ylabel('Percentage')
plt.show()

Minimal deviations of the payment method are observed with respect to the age of the clients.
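
One way to put a number on "minimal deviations" is the spread of the card share across age classes. The sketch below assumes a per-ride frame with a `Class` column and `Card`/`Cash` payment values; the data is made up and `spread` is a hypothetical helper name, so the figures say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical per-ride data: 2 of 4 rides by card in one class, 3 of 5 in the other
toy = pd.DataFrame({
    'Class': ['18-25'] * 4 + ['26-40'] * 5,
    'Payment_Mode': ['Card', 'Card', 'Cash', 'Cash',
                     'Card', 'Card', 'Card', 'Cash', 'Cash'],
})

# Row-normalised cross-tabulation: each age class sums to 100 %
share = pd.crosstab(toy['Class'], toy['Payment_Mode'], normalize='index') * 100

# Difference between the highest and lowest card share, in percentage points
spread = share['Card'].max() - share['Card'].min()
print(share.round(2))
print(spread)
```

A spread of only a few percentage points on the real data would support the observation above.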

<a id="8"></a> <br>
## 8. Conclusion

**1. How does the profit change over time?**<br>

- According to the analysis, Yellow Cab's earnings are more stable than Pink Cab's.
- Yellow Cab's profit per ride is higher than Pink Cab's over the three years.
- Profit per km decreases over time for both companies.
- Yellow Cab's profit per km is higher than Pink Cab's in every month.

**2. How does the percentage of profitable trips change by the city?**<br>

- The percentage of profitable rides varies by city.
- Yellow Cab has the greater market share in every city.

**3. How does average profit change by holidays?**<br>

- For Yellow Cab, the average profit on holidays is lower than the average profit on normal days.
- Pink Cab earned a slightly higher profit on holidays than on normal days.

**4. How does the demand of the cab industry change over time?**<br>

- Yellow Cab's yearly demand is about 4 times greater than Pink Cab's yearly demand.
- Demand is seasonal throughout the year for both cab companies.

**5. How the demand varies according to age?**<br>

- Each year, both companies have the most customers in the 26-40 age class.

**6. Loyalty of customers**<br>

- Yellow Cab is doing better than Pink Cab in both loyalty classes.

**7. Fluctuations of payment methods**<br>

- Minimal deviations in the payment method are observed over time.
- Minimal deviations of the payment method are observed in each city.
- Minimal deviations of the payment method are observed with respect to the age of the clients.


### According to the overall analysis, Yellow Cab is a better investment than Pink Cab.