## Introduction
The Datathon is a weekend-long competition where you are challenged to work on a real-world business case from different areas of Machine Learning, AI, and Data Science.

## Context
Argo Solutions is a leading technology company in Latin America that develops solutions to facilitate expense management and corporate travel, using technology as an enabler of these processes. Our team is committed to simplifying our customers' routine, providing an efficient, innovative and seamless experience.

## Challenge
In this competition, we provided a dataset simulating real corporate travel systems - focusing on flights and hotels.
Competitors must analyze this set with over one thousand users and 250 thousand travels to produce insights. How can Argo offer the best travel experience for its customers? Explore, invent and surprise us! See an online BI report.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

## Framework
* Product Summary
* Metrics to improve
* Use Cases 
* Solutions
* Validation Metrics

First, I will describe what I think Argo does, who its customers are, and what issues the product has, then suggest a few ideas on how to solve them.

## Product Summary
* Argo is an online travel agent like Expedia, Agoda, or Trip.com
* Its target audience is corporate travellers, i.e. business travel
* The app provides a platform for flight and hotel bookings, e.g. Skyscanner + Booking.com

## Metrics to improve
* Customer base
* Revenue
* Retention
* Conversion
* User engagement

These are some example high-level business metrics that correlate with customer experience. For this analysis, I focus on improving conversion, as I can see an [alarming decline in travel bookings](https://datastudio.google.com/u/0/reporting/14gCXooYzrbL5WUnspGlBTDriW_jZvXgY/page/JuJ1?s=nRM_CsIvVuI) over time!

## Use Cases
* We saw earlier that travel bookings have steadily declined over time, from triple digits in 2019 to single digits in 2023. This is highly problematic, and I would like to diagnose the root cause of the decline.
* From [exploratory data analysis](https://public.tableau.com/profile/malcolmng#!/vizhome/ArgoSolutions/Dashboard), we see that the decline in bookings (and revenue) is happening at the same rate across all dimensions, regardless of company, agency, gender, fare class, origin, and destination. Further analysis across different measures shows that the number of users (the customer base) has also been declining. Therefore we can conclude that the decline in sales is driven by lower volume, which in turn is driven by a dwindling customer base. We need to address this issue across all customer segments and flights.
* Let's assume one possible use case is that business travellers need to book hotels together with flights. This is a reasonable assumption, as most companies allow travel to be expensed. In this case, it would be useful to dive into all paired bookings and their trend over time.
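
One way to check whether the decline is uniform across a dimension is to index each segment's yearly bookings to its first year and compare the resulting curves. The sketch below uses a tiny made-up table rather than the competition data; the column names (`travelCode`, `userCode`, `agency`, `date`) are assumptions about the flights table, and the numbers are purely illustrative.

```python
import pandas as pd

# Toy stand-in for the flights table (illustrative values, not real data)
toy = pd.DataFrame({
    "travelCode": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "userCode":   [1, 2, 3, 4, 1, 2, 3, 1, 2],
    "agency":     ["A", "A", "B", "B", "A", "B", "B", "A", "B"],
    "date": pd.to_datetime([
        "2019-01-05", "2019-02-10", "2019-03-15", "2019-04-20",
        "2020-01-05", "2020-02-10", "2020-03-15",
        "2021-01-05", "2021-02-10",
    ]),
})
toy["year"] = toy["date"].dt.year

# Distinct bookings per segment (rows) and year (columns)
yearly = toy.groupby(["agency", "year"])["travelCode"].nunique().unstack("year")

# Index every segment to its first year so the decline rates are comparable
indexed = yearly.div(yearly.iloc[:, 0], axis=0)

# Distinct active users per year, to see whether the customer base shrinks too
active_users = toy.groupby("year")["userCode"].nunique()

print(indexed)
print(active_users)
```

If the indexed curves for every segment roughly overlap, the decline is not specific to any one segment, which is the pattern described above.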

## Solutions
Possible solutions to improve the customer experience for paired bookings (flights + hotels), prioritized by impact on our selected business objective of conversions (bookings) versus effort (actionability):
* attractive CTA for bundled hotel + flight packages (with discount) on the home screen (top of funnel), à la Expedia
* prompt the user with a recommended list of hotels at the destination upon checkout completion for flight-only bookings, as airlines do
* force all users to book bundled hotel + flight packages, as a travel agency would

## Validation Metrics
Since our goal is to improve conversions, I would use the following metrics to measure the success of my solution:
* total number of bookings over time
* proportion of bookings that are hotel + flight bundles vs flights only over time
* drop-off rates along the funnel from home -> flights -> hotels -> checkout etc.
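
The bundle-share metric above could be computed along these lines. This is a minimal sketch on made-up tables, assuming that a trip is identified by `travelCode`, that a trip may span several flight rows, and that a trip counts as bundled when its `travelCode` also appears in the hotels table; `flights_toy` and `hotels_toy` are hypothetical names.

```python
import pandas as pd

# Toy stand-ins for the flights and hotels tables (illustrative values only)
flights_toy = pd.DataFrame({
    "travelCode": [1, 1, 2, 2, 3, 3, 4, 4],
    "date": pd.to_datetime([
        "2019-01-03", "2019-01-06", "2019-01-10", "2019-01-12",
        "2019-02-01", "2019-02-04", "2019-02-15", "2019-02-18",
    ]),
})
hotels_toy = pd.DataFrame({"travelCode": [1, 3, 4]})

# One row per trip, dated by its first flight
trips = flights_toy.groupby("travelCode", as_index=False)["date"].min()

# A trip is a bundle if it also has at least one hotel booking
trips["has_hotel"] = trips["travelCode"].isin(hotels_toy["travelCode"])

# Share of bundled trips per calendar month
trips["month"] = trips["date"].dt.to_period("M")
bundle_share = trips.groupby("month")["has_hotel"].mean()

print(bundle_share)
```

Tracking this share alongside the total booking count separates a change in the booking mix from a change in overall volume.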

I would validate the solution with either a pre/post analysis or an A/B test on a selected sample of users.
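
For the A/B test, one common way to compare conversion rates between a control and a treatment group is a two-proportion z-test. The sketch below is one possible approach, not a prescribed method; the function name `two_proportion_ztest` and the visitor and conversion counts are hypothetical.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: control (A) vs users shown the bundle CTA (B)
z, p = two_proportion_ztest(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The significance threshold and sample size should be fixed before the experiment starts, so the result is not influenced by when the data is inspected.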

In [None]:
import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
print("Setup Complete")

import numpy as np
from datetime import datetime

In [None]:
# import data tables
users = pd.read_csv('/kaggle/input/argodatathon2019/users.csv')
hotels = pd.read_csv('/kaggle/input/argodatathon2019/hotels.csv')
flights = pd.read_csv('/kaggle/input/argodatathon2019/flights.csv')

In [None]:
users.head()

In [None]:
hotels.head()

In [None]:
flights.head()

In [None]:
# join tables
df = pd.merge(flights, hotels, how='inner', on=['userCode', 'travelCode'])
df.head()

In [None]:
# data aggregation: parse the flight date once, then derive month and year from it
df['date'] = pd.to_datetime(df['date_x'], format='%m/%d/%Y')
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
# df.groupby(['year', 'month']).count()
# use the parsed date as the index so the frame can be resampled by month
df.set_index('date', inplace=True)

In [None]:
# visualize
# count distinct trips per calendar month ('M' = month end; pandas >= 2.2 prefers 'ME')
bookings = df['travelCode'].resample('M').nunique()

# Set the width and height of the figure
plt.figure(figsize=(14,6))

# Add title
plt.title("Monthly travel bookings (flight + hotel)")

# Line chart showing monthly number of paired travel bookings
sns.lineplot(data=bookings)

# Add label for horizontal axis
plt.xlabel(bookings.index.name)