# Project Title
## Tourism Experience Analytics Using Regression, Classification and Recommendation System

## GitHub Link for this project: https://github.com/shabbu8111999/Tourism_Experience_Analytics

## Problem Statement
### Tourism platforms generate large volumes of data related to user visits, attraction details, travel modes, and ratings. However, this data is often underutilized and stored across multiple datasets, making it difficult to extract meaningful insights and provide personalized experiences to users. The objective of this project is to analyze tourism-related data by integrating multiple datasets into a single consolidated dataset and applying machine learning techniques to:

### - Predict the rating a user is likely to give to a tourist attraction.

### - Classify the user's visit mode (such as Business, Family, Couples, or Friends) based on historical travel behavior and demographic information.

### - Recommend tourist attractions to users based on their preferences and past interactions.

### By leveraging data cleaning, exploratory data analysis, and machine learning models, this project aims to help tourism businesses improve customer satisfaction, enable data-driven decision-making, and deliver personalized attraction recommendations.

## Importing Necessary Libraries

In [1]:
# For System Operations
import os
import sys

# For Data Manipulation
import numpy as np
import pandas as pd

# For Visualization
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
sns.set(style="whitegrid")
plt.style.use("seaborn-v0_8")

# Statistical Analysis and Hypothesis Testing
from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Data Preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import (
    LabelEncoder,
    OneHotEncoder,
    StandardScaler
)

# Models
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestClassifier

# Model Evaluation
from sklearn.metrics import (
    mean_squared_error,
    r2_score,
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    classification_report,
    confusion_matrix
)

# Utilities
from tqdm import tqdm

import warnings
warnings.filterwarnings('ignore')

In [2]:
# Resolving the absolute path of the project root (one level up).
# Note: this only returns the path; it does not change the working directory.
os.path.abspath('../')

'c:\\Users\\H P\\Desktop\\Tourism_Experience_Analytics'
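Since `os.path.abspath` only returns a string, the data paths can be built from it explicitly instead of relying on relative paths. The following is a minimal sketch, assuming the notebook runs from a folder one level below the project root and that the data folder is named `data` (as the `read_excel` calls in the next section suggest). `PROJECT_ROOT`, `DATA_DIR`, and `data_path` are illustrative names, not part of the original notebook.

```python
import os

# Assumption: the current working directory is one level below the project root.
PROJECT_ROOT = os.path.abspath(os.path.join(os.getcwd(), ".."))
DATA_DIR = os.path.join(PROJECT_ROOT, "data")


def data_path(filename):
    """Return the absolute path of a file inside the data folder (hypothetical helper)."""
    return os.path.join(DATA_DIR, filename)


print(data_path("User.xlsx"))
```

With a helper like this, the loading calls would work regardless of which folder the notebook is launched from, as long as the project layout stays the same.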

## Dataset Loading and Initial Inspection

### Loading all the Datasets

In [4]:
# Loading Data
user_df = pd.read_excel('../data/User.xlsx')
transaction_df = pd.read_excel('../data/Transaction.xlsx')
type_df = pd.read_excel('../data/Type.xlsx')
region_df = pd.read_excel('../data/Region.xlsx')
mode_df = pd.read_excel('../data/Mode.xlsx')
item_df = pd.read_excel('../data/Item.xlsx')
country_df = pd.read_excel('../data/Country.xlsx')
continent_df = pd.read_excel('../data/Continent.xlsx')
city_df = pd.read_excel('../data/City.xlsx')

print("Datasets Loaded Successfully")

Datasets Loaded Successfully


#### All datasets loaded successfully. I used the openpyxl package to load the .xlsx files.
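The nine `read_excel` calls could also be written as a single loop. This is only a sketch: `DATASET_NAMES`, `load_datasets`, and the optional `reader` argument are hypothetical names introduced here, and it assumes every file is named `<Name>.xlsx` inside one folder, as in the cell above. The `engine="openpyxl"` argument makes the dependency on openpyxl explicit.

```python
import os

import pandas as pd

# File stems taken from the loading cell above.
DATASET_NAMES = [
    "User", "Transaction", "Type", "Region", "Mode",
    "Item", "Country", "Continent", "City",
]


def load_datasets(data_dir, reader=None):
    """Load every dataset into a dict keyed by name (hypothetical helper)."""
    if reader is None:
        # Default: read .xlsx files with pandas, naming the openpyxl engine explicitly.
        def reader(path):
            return pd.read_excel(path, engine="openpyxl")

    return {
        name: reader(os.path.join(data_dir, f"{name}.xlsx"))
        for name in DATASET_NAMES
    }


# Example call (requires the actual files to exist):
# datasets = load_datasets("../data")
```

The `reader` parameter exists only so the function can be tried without the real files; in the notebook the default would be used.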

### Structural Inspection

In [5]:
# Shape (rows, columns) of each dataset

datasets = {
    "User": user_df,
    "Transaction": transaction_df,
    "Type": type_df,
    "Region": region_df,
    "Mode": mode_df,
    "Item": item_df,
    "Country": country_df,
    "Continent": continent_df,
    "City": city_df
}

for name, df in datasets.items():
    print(f"{name} Dataset Shape: {df.shape}")

User Dataset Shape: (33530, 5)
Transaction Dataset Shape: (52930, 7)
Type Dataset Shape: (17, 2)
Region Dataset Shape: (22, 3)
Mode Dataset Shape: (6, 2)
Item Dataset Shape: (30, 5)
Country Dataset Shape: (165, 3)
Continent Dataset Shape: (6, 2)
City Dataset Shape: (9143, 3)


#### There are 9 datasets in total, each with a different number of rows and columns. Transaction (52,930 rows) and User (33,530 rows) are the largest, while lookup tables such as Mode and Continent have only 6 rows each.
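To compare sizes at a glance, the shapes can be collected into one sorted table rather than printed line by line. A minimal sketch on toy data follows; `shape_summary` and the `toy` dictionary are illustrative and stand in for the `datasets` dictionary built above.

```python
import pandas as pd


def shape_summary(datasets):
    """Return one row per dataset with its row and column counts, largest first."""
    rows = [
        {"Dataset": name, "Rows": df.shape[0], "Columns": df.shape[1]}
        for name, df in datasets.items()
    ]
    return (
        pd.DataFrame(rows)
        .sort_values("Rows", ascending=False)
        .reset_index(drop=True)
    )


# Toy stand-ins for the real DataFrames (values are illustrative only).
toy = {
    "Mode": pd.DataFrame({"VisitModeId": [0, 1], "VisitMode": ["-", "Business"]}),
    "Continent": pd.DataFrame(
        {"ContinentId": [0, 1, 2], "Continent": ["-", "Africa", "America"]}
    ),
}

print(shape_summary(toy))
```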

### Column Level Inspection

In [6]:
# Column wise overview of each dataset
for name, df in datasets.items():
    print(f"\n{name} Dataset Columns:")
    print(df.columns.tolist())


User Dataset Columns:
['UserId', 'ContinentId', 'RegionId', 'CountryId', 'CityId']

Transaction Dataset Columns:
['TransactionId', 'UserId', 'VisitYear', 'VisitMonth', 'VisitMode', 'AttractionId', 'Rating']

Type Dataset Columns:
['AttractionTypeId', 'AttractionType']

Region Dataset Columns:
['Region', 'RegionId', 'ContinentId']

Mode Dataset Columns:
['VisitModeId', 'VisitMode']

Item Dataset Columns:
['AttractionId', 'AttractionCityId', 'AttractionTypeId', 'Attraction', 'AttractionAddress']

Country Dataset Columns:
['CountryId', 'Country', 'RegionId']

Continent Dataset Columns:
['ContinentId', 'Continent']

City Dataset Columns:
['CityId', 'CityName', 'CountryId']


#### These are the column names of each dataset. I used a loop to print each dataset's columns as a list, which makes them easy to scan.
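The column names hint at how the datasets relate: `UserId` links Transaction to User, `AttractionId` links Transaction to Item, and the integer `VisitMode` in Transaction appears to correspond to `VisitModeId` in Mode. The sketch below shows one way such a consolidation could look. The join keys are assumptions inferred from the column names, the rows are toy values, and `VisitModeName` is a hypothetical column name introduced to avoid a clash between the two `VisitMode` columns.

```python
import pandas as pd

# Toy rows only; the real DataFrames come from the Excel files.
transaction = pd.DataFrame({
    "TransactionId": [3, 8],
    "UserId": [14, 16],
    "VisitMode": [2, 4],
    "AttractionId": [640, 369],
    "Rating": [5, 3],
})
user = pd.DataFrame({"UserId": [14, 16], "ContinentId": [5, 3]})
mode = pd.DataFrame({"VisitModeId": [2, 4], "VisitMode": ["Couples", "Friends"]})
item = pd.DataFrame({
    "AttractionId": [369, 640],
    "AttractionTypeId": [13, 63],
    "Attraction": ["Kuta Beach - Bali", "Sacred Monkey Forest Sanctuary"],
})

# Rename the label column so it does not collide with Transaction.VisitMode.
mode_lookup = mode.rename(columns={"VisitMode": "VisitModeName"})

merged = (
    transaction
    .merge(user, on="UserId", how="left")          # assumed key: UserId
    .merge(item, on="AttractionId", how="left")    # assumed key: AttractionId
    .merge(mode_lookup, left_on="VisitMode", right_on="VisitModeId", how="left")
    .drop(columns="VisitModeId")
)

print(merged[["TransactionId", "Attraction", "VisitModeName", "Rating"]])
```

Left joins keep every transaction even when a lookup row is missing, which makes unmatched keys visible as missing values instead of silently dropping rows.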

### Sample Rows

In [7]:
# First few rows of each dataset
for name, df in datasets.items():
    print(f"\n{name} Dataset Preview:")
    display(df.head())


User Dataset Preview:


Unnamed: 0,UserId,ContinentId,RegionId,CountryId,CityId
0,14,5,20,155,220.0
1,16,3,14,101,3098.0
2,20,4,15,109,4303.0
3,23,1,4,22,154.0
4,25,3,14,101,3098.0



Transaction Dataset Preview:


Unnamed: 0,TransactionId,UserId,VisitYear,VisitMonth,VisitMode,AttractionId,Rating
0,3,70456,2022,10,2,640,5
1,8,7567,2022,10,4,640,5
2,9,79069,2022,10,3,640,5
3,10,31019,2022,10,3,640,3
4,15,43611,2022,10,2,640,3



Type Dataset Preview:


Unnamed: 0,AttractionTypeId,AttractionType
0,2,Ancient Ruins
1,10,Ballets
2,13,Beaches
3,19,Caverns & Caves
4,34,Flea & Street Markets



Region Dataset Preview:


Unnamed: 0,Region,RegionId,ContinentId
0,-,0,0
1,Central Africa,1,1
2,East Africa,2,1
3,North Africa,3,1
4,Southern Africa,4,1



Mode Dataset Preview:


Unnamed: 0,VisitModeId,VisitMode
0,0,-
1,1,Business
2,2,Couples
3,3,Family
4,4,Friends



Item Dataset Preview:


Unnamed: 0,AttractionId,AttractionCityId,AttractionTypeId,Attraction,AttractionAddress
0,369,1,13,Kuta Beach - Bali,Kuta
1,481,1,13,Nusa Dua Beach,"Semenanjung Nusa Dua, Nusa Dua 80517 Indonesia"
2,640,1,63,Sacred Monkey Forest Sanctuary,"Jl. Monkey Forest, Ubud 80571 Indonesia"
3,650,1,13,Sanur Beach,Sanur
4,673,1,13,Seminyak Beach,Seminyak



Country Dataset Preview:


Unnamed: 0,CountryId,Country,RegionId
0,0,-,0
1,1,Cameroon,1
2,2,Chad,1
3,3,Rwanda,1
4,4,Ethiopia,2



Continent Dataset Preview:


Unnamed: 0,ContinentId,Continent
0,0,-
1,1,Africa
2,2,America
3,3,Asia
4,4,Australia & Oceania



City Dataset Preview:


Unnamed: 0,CityId,CityName,CountryId
0,0,-,0
1,1,Douala,1
2,2,South Region,1
3,3,N'Djamena,2
4,4,Kigali,3


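#### These are the first five rows of each dataset, showing every column and its values. Several lookup tables (Region, Mode, Country, Continent, City) begin with a placeholder row whose Id is 0 and whose name is "-".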
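Two details in these previews may deserve a closer look. `CityId` in the User dataset is displayed as a float (for example `220.0`), which in pandas often, though not always, means an otherwise integer column contains missing values. The lookup tables also contain a placeholder entry named "-". The sketch below shows how both could be checked; the data is toy data, and the missing value in it is invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Toy data; the NaN is an assumption made for illustration, not an observed value.
user = pd.DataFrame({"UserId": [14, 16, 20], "CityId": [220.0, np.nan, 4303.0]})
continent = pd.DataFrame(
    {"ContinentId": [0, 1, 2], "Continent": ["-", "Africa", "America"]}
)

# Data types: a float ID column is a hint to look for missing values.
print(user.dtypes)

# Count of missing values per column.
missing_counts = user.isna().sum()
print(missing_counts)

# Rows that hold the "-" placeholder instead of a real name.
placeholder_rows = continent[continent["Continent"] == "-"]
print(placeholder_rows)
```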

In [8]:
# Last few rows of each dataset
for name, df in datasets.items():
    print(f"\n{name} Dataset Last 5 Rows:")
    display(df.tail())


User Dataset Last 5 Rows:


Unnamed: 0,UserId,ContinentId,RegionId,CountryId,CityId
33525,88179,5,21,162,7833.0
33526,88185,3,12,80,2534.0
33527,88187,3,12,88,2604.0
33528,88189,5,17,131,6129.0
33529,88190,5,21,159,7494.0



Transaction Dataset Last 5 Rows:


Unnamed: 0,TransactionId,UserId,VisitYear,VisitMonth,VisitMode,AttractionId,Rating
52925,211227,87100,2018,9,2,1297,4
52926,211238,88112,2016,2,2,1297,5
52927,211239,88112,2016,2,2,1297,4
52928,211240,88112,2016,2,2,1297,4
52929,211241,88112,2016,2,2,1297,5



Type Dataset Last 5 Rows:


Unnamed: 0,AttractionTypeId,AttractionType
12,82,Spas
13,84,Speciality Museums
14,91,Volcanos
15,92,Water Parks
16,93,Waterfalls



Region Dataset Last 5 Rows:


Unnamed: 0,Region,RegionId,ContinentId
17,Central Europe,17,5
18,Eastern Europe,18,5
19,Northern Europe,19,5
20,Southern Europe,20,5
21,Western Europe,21,5



Mode Dataset Last 5 Rows:


Unnamed: 0,VisitModeId,VisitMode
1,1,Business
2,2,Couples
3,3,Family
4,4,Friends
5,5,Solo



Item Dataset Last 5 Rows:


Unnamed: 0,AttractionId,AttractionCityId,AttractionTypeId,Attraction,AttractionAddress
25,1225,3,2,Ratu Boko Temple,Yogyakarta
26,1238,3,2,Sewu Temple,Yogyakarta
27,1278,3,45,Ullen Sentalu Museum,"Jl. Boyong Taman Wisata, 55581 Indonesia"
28,1280,3,72,Water Castle (Tamansari),"Jl. Taman, 55133 Indonesia"
29,1297,3,44,Yogyakarta Palace,Yogyakarta



Country Dataset Last 5 Rows:


Unnamed: 0,CountryId,Country,RegionId
160,160,Ireland,21
161,161,Monaco,21
162,162,Netherlands,21
163,163,United Kingdom,21
164,164,Yemen,12



Continent Dataset Last 5 Rows:


Unnamed: 0,ContinentId,Continent
1,1,Africa
2,2,America
3,3,Asia
4,4,Australia & Oceania
5,5,Europe



City Dataset Last 5 Rows:


Unnamed: 0,CityId,CityName,CountryId
9138,9138,Yeovil,163
9139,9139,York,163
9140,9140,Yorkshire,163
9141,9141,Zaandam,163
9142,9142,Sanaa,164


#### These are the last five rows of each dataset, showing every column and its values.