# TravelTide


## Overview
This report offers a deep dive into your customers' travel behaviors and preferences. By analyzing your extensive data, we aim to uncover what drives your travelers and how to keep them engaged. Our approach segments customers into distinct groups or "traveler tribes," allowing for the customization of rewards and incentives that feel uniquely tailored to each group. This analysis distills vast data into clear, actionable metrics that shed light on travel patterns, service preferences, and responsiveness to promotions. The ultimate goal is to enhance customer loyalty and drive business success by keeping your travelers engaged with your services.

## Approach and Data Analysis
This section outlines how customer segmentation can be applied within TravelTide to create targeted rewards and incentives. By categorizing customers based on their behaviors and preferences, we can significantly improve customer satisfaction and loyalty.

Our objective is to leverage segmentation to deliver personalized perks that boost loyalty, engagement, and overall satisfaction. We aim to offer at least five specific perks:

## Reward Offerings

- Exclusive Discounts
- 1 Night Free Hotel with Flight
- No Cancellation Fees
- Free Hotel Meal
- Free Checked Bag

## Data Processing and Metrics Creation
Starting with a dataset of 50,570 sessions, we distilled the information into 5,998 unique customers, using PostgreSQL to generate key metrics. This process was essential for simplifying the raw data into manageable and insightful variables.

Note: The SQL script files are in the SQL folder.

## Key Metrics for Analysis
To determine the most suitable perks for each customer, we focused on the following aggregated metrics:

- Flight/Hotel/Both Preferences
- Age
- Age Group
- Conversion Rate
- Cancellation Rate
- Total Sessions
- Total Trips Booked
- Engagement (Click Efficiency)
- User Activity Level
- Average Checked Bags
- Discount Responsiveness
- Proportions of Discounts (Flights/Hotels/Both)
- Average Offers Received
- Flight Hunter Index
- Hotel Hunter Index
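
As a rough illustration of how session-level rows can be collapsed into per-customer metrics such as those listed above, here is a minimal Python sketch. The project itself generates these metrics in PostgreSQL; the field names (`booked`, `cancelled`) and the formulas (conversion rate as booked sessions divided by total sessions, cancellation rate as cancellations divided by booked sessions) are assumptions made for this example, not the definitions used in the SQL scripts.

```python
from collections import defaultdict


def aggregate_user_metrics(sessions):
    """Collapse session records into one metrics dict per user.

    Each session is a dict with the hypothetical keys
    'user_id', 'booked' (bool) and 'cancelled' (bool).
    """
    totals = defaultdict(lambda: {"sessions": 0, "trips": 0, "cancellations": 0})
    for session in sessions:
        user = totals[session["user_id"]]
        user["sessions"] += 1
        if session["booked"]:
            user["trips"] += 1
        if session["cancelled"]:
            user["cancellations"] += 1

    metrics = {}
    for user_id, t in totals.items():
        metrics[user_id] = {
            "total_sessions": t["sessions"],
            "total_trips": t["trips"],
            # Assumed definition: share of sessions that ended in a booking.
            "conversion_rate": t["trips"] / t["sessions"],
            # Assumed definition: share of bookings that were cancelled;
            # 0.0 when the user never booked, to avoid dividing by zero.
            "cancellation_rate": (
                t["cancellations"] / t["trips"] if t["trips"] else 0.0
            ),
        }
    return metrics


# Tiny made-up sample: user 1 has four sessions, user 2 has two.
sample_sessions = [
    {"user_id": 1, "booked": True, "cancelled": False},
    {"user_id": 1, "booked": True, "cancelled": True},
    {"user_id": 1, "booked": False, "cancelled": False},
    {"user_id": 1, "booked": False, "cancelled": False},
    {"user_id": 2, "booked": False, "cancelled": False},
    {"user_id": 2, "booked": False, "cancelled": False},
]
user_metrics = aggregate_user_metrics(sample_sessions)
```

Guarding the cancellation rate against a zero denominator matters here because many customers browse without ever booking.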
  
We are still refining the calculation of the Flight Hunter Index. It requires the distance between two geographic points, measured along the Earth's curved surface rather than as a straight line. For this we use the geodesic function from the geopy.distance module.
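
To make the distance step concrete, the sketch below computes a great-circle distance with the haversine formula, using only the standard library. This is a swapped-in spherical approximation, not the geopy geodesic calculation the project uses (geopy's geodesic works on an ellipsoidal Earth model, so results typically differ by a fraction of a percent). The function names and the row-handling logic are hypothetical; the column names are taken from the fullData output in this notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a sphere is an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def row_distance_km(row):
    """Distance for one record, or None when any coordinate is missing.

    `row` is a dict-like record; sessions without a flight have no
    destination coordinates, so missing values must be handled.
    """
    coords = [
        row.get("home_airport_lat"),
        row.get("home_airport_lon"),
        row.get("destination_airport_lat"),
        row.get("destination_airport_lon"),
    ]
    if any(c is None or math.isnan(c) for c in coords):
        return None
    return haversine_km(*coords)


# Example record with made-up destination coordinates.
example_row = {
    "home_airport_lat": 40.777,
    "home_airport_lon": -73.872,
    "destination_airport_lat": 25.793,
    "destination_airport_lon": -80.291,
}
example_distance = row_distance_km(example_row)
```

Returning None for incomplete rows keeps browsing-only sessions from producing spurious distances when the function is applied row by row.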

## Packages to Install
- pip install scikit-learn
- pip install geopy


In [8]:
import pandas as pd
import os
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from datetime import datetime
# custom functions 
import Support.dbSupport as dbs
import Support.calSupport as cs

## Database Connection Setup

In [3]:
current_dir = os.getcwd()

In [4]:
dbs.check_tables()

['users', 'hotels', 'flights', 'sessions']

In [5]:
dbs.table_row_count()

users: 1020926 records
hotels: 1918617 records
flights: 1901038 records
sessions: 5408063 records


In [6]:
# Path to the full session-level query (Full_data.sql)
sql_file_path_full = os.path.join(current_dir, 'SQL', 'Full_data.sql')
# Path to the combined per-customer metrics query
sql_file_path_com = os.path.join(current_dir, 'SQL', 'Combained.sql')



In [7]:
CombainedData = dbs.execute_sql_file(sql_file_path_com)
fullData = dbs.execute_sql_file(sql_file_path_full)
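
The internals of dbs.execute_sql_file are not shown in this notebook. The following is a hypothetical sketch of what such a helper might do: read a query from a .sql file, run it, and return the rows. It uses Python's built-in sqlite3 with an in-memory database as a stand-in for PostgreSQL so that it runs without a server, and it returns a list of dicts, whereas the project's helper evidently returns a pandas DataFrame. The name `run_sql_file` is an assumption for illustration.

```python
import sqlite3
from pathlib import Path


def run_sql_file(conn, sql_path):
    """Run the single SELECT statement stored in `sql_path`.

    Returns the result as a list of dicts keyed by column name.
    """
    query = Path(sql_path).read_text(encoding="utf-8")
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def make_demo_connection():
    """Create an in-memory database with a tiny made-up sessions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sessions (user_id INTEGER, booked INTEGER)")
    conn.executemany(
        "INSERT INTO sessions VALUES (?, ?)",
        [(1, 1), (1, 0), (2, 0)],
    )
    return conn
```

Keeping the queries in separate .sql files, as this project does, lets the same SQL be run from the notebook or directly in a database client.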

In [9]:
CombainedData.head()

| Unnamed: 0 | user_id | birthdate | gender | married | has_children | home_country | home_city | age | age_group | latest_session | ... | average_hotel_discount | average_flight_discount | flight_discount_proportion | hotel_discount_proportion | both_discount_proportion | discount_responsiveness | total_hotel_usd_spent | total_flight_usd_spent | total_usd_spent | hotel_hunter_index |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 23557 | 1958-12-08 | F | True | False | usa | new york | 65.0 | 65+ | 2023-07-14 | ... | 0.175 | 0.15 | 0.083333 | 0.166667 | 0.0 | 0.083333 | 563.0 | 1344.96 | 1907.96 | 0.004796 |
| 1 | 94883 | 1972-03-16 | F | True | False | usa | kansas city | 52.0 | 45-54 | 2023-05-28 | ... | 0.075 | 0.1 | 0.083333 | 0.166667 | 0.0 | 0.027778 | 230.0 | 5354.86 | 5584.86 | 0.0 |
| 2 | 101486 | 1972-12-07 | F | True | True | usa | tacoma | 51.0 | 45-54 | 2023-07-18 | ... | 0.0 | 0.075 | 0.307692 | 0.0 | 0.0 | 0.0 | 1195.0 | 5994.28 | 7189.28 | 0.0 |
| 3 | 101961 | 1980-09-14 | F | True | False | usa | boston | 43.0 | 35-44 | 2023-06-22 | ... | 0.1 | 0.133333 | 0.25 | 0.083333 | 0.0 | 0.011905 | 1052.0 | 1929.2 | 2981.2 | 0.0 |
| 4 | 106907 | 1978-11-17 | F | True | True | usa | miami | 45.0 | 45-54 | 2023-07-27 | ... | 0.2 | 0.15 | 0.071429 | 0.071429 | 0.0 | 0.017857 | 1185.0 | 27969.63 | 29154.63 | 0.001623 |


In [10]:
CombainedData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5998 entries, 0 to 5997
Data columns (total 32 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   user_id                     5998 non-null   int64  
 1   birthdate                   5998 non-null   object 
 2   gender                      5998 non-null   object 
 3   married                     5998 non-null   bool   
 4   has_children                5998 non-null   bool   
 5   home_country                5998 non-null   object 
 6   home_city                   5998 non-null   object 
 7   age                         5998 non-null   float64
 8   age_group                   5998 non-null   object 
 9   latest_session              5998 non-null   object 
 10  total_trips                 5998 non-null   int64  
 11  total_cancellations         5998 non-null   int64  
 12  total_sessions              5998 non-null   int64  
 13  total_cancellation_rate     5998 

In [11]:
fullData.head()

| Unnamed: 0 | user_id | trip_id | birthdate | gender | married | has_children | home_country | home_city | sign_up_date | f_discount | ... | hotel_per_room_usd | f_destination | f_return_booked | f_timespent | f_checked_bags | home_airport_lat | home_airport_lon | destination_airport_lat | destination_airport_lon | base_fare_usd |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 23557 |  | 1958-12-08 | F | True | False | usa | new york | 2021-07-22 | False | ... |  |  |  | NaT |  | 40.777 | -73.872 |  |  | 0.0 |
| 1 | 23557 |  | 1958-12-08 | F | True | False | usa | new york | 2021-07-22 | False | ... |  |  |  | NaT |  | 40.777 | -73.872 |  |  | 0.0 |
| 2 | 23557 |  | 1958-12-08 | F | True | False | usa | new york | 2021-07-22 | False | ... |  |  |  | NaT |  | 40.777 | -73.872 |  |  | 0.0 |
| 3 | 23557 |  | 1958-12-08 | F | True | False | usa | new york | 2021-07-22 | False | ... |  |  |  | NaT |  | 40.777 | -73.872 |  |  | 0.0 |
| 4 | 23557 |  | 1958-12-08 | F | True | False | usa | new york | 2021-07-22 | False | ... |  |  |  | NaT |  | 40.777 | -73.872 |  |  | 0.0 |


In [None]:
# Apply the distance calculation function to the DataFrame fullData
fullData['distance'] = fullData.apply(cs.calculate_travel_distance, axis=1)