Data Mining and Machine Learning Project

In this project, we applied the techniques learned in our data mining and machine learning courses to a hotel bookings dataset. The GitHub repository includes the datasets used and the notebooks containing the code for the models.

Dataset Description:

The dataset contains bookings from two hotels: a resort hotel (H1) and a city hotel (H2). It has 31 variables describing 40,060 observations for H1 and 79,330 observations for H2; each observation represents one hotel booking. Because these are real hotel data, all data elements that could identify the hotel or its customers were removed.

Project Phases:

Phase 1: Data Preprocessing:

We used pandas to construct new features and to convert categorical variables into dummy variables.
The result is a dataset, "hotel_data.csv", that is ready for the machine learning algorithms.
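The preprocessing step can be sketched as follows. This is a minimal illustration, not the project's notebook code: it assumes column names from the public hotel bookings data (`hotel`, `meal`, `stays_in_weekend_nights`, `stays_in_week_nights`, `is_canceled`), uses a tiny invented frame in place of hotel_bookings.csv, and the `total_nights` feature and `preprocess` helper are hypothetical examples of an engineered feature and an encoding step.

```python
import pandas as pd

# Tiny stand-in for hotel_bookings.csv (column names assumed; values invented).
raw = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel", "City Hotel"],
    "meal": ["BB", "HB", "BB"],
    "stays_in_weekend_nights": [2, 0, 1],
    "stays_in_week_nights": [3, 2, 4],
    "is_canceled": [0, 1, 0],
})

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add an engineered feature, then one-hot encode categoricals."""
    out = df.copy()
    # Hypothetical engineered feature: total length of stay in nights.
    out["total_nights"] = out["stays_in_weekend_nights"] + out["stays_in_week_nights"]
    # Replace each categorical column with 0/1 dummy columns, e.g. "hotel_City Hotel".
    categorical = ["hotel", "meal"]
    return pd.get_dummies(out, columns=categorical, dtype=int)

hotel_data = preprocess(raw)
# hotel_data.to_csv("hotel_data.csv", index=False)  # one way to write the model-ready file
print(hotel_data.columns.tolist())
```

Passing `dtype=int` keeps the dummy columns as 0/1 integers rather than booleans, which most scikit-learn estimators accept directly.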

Phase 2: Exploratory Data Analysis:

We used pandas to compute summary statistics for the date variables, the categorical variables, and the numeric (integer and float) variables.
We then plotted the distribution of cancellations by hotel type and the distribution of cancellations by number of adults.
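The summaries and the counts behind these two plots could look roughly like this in pandas. The rows are invented, and the column names (`hotel`, `is_canceled`, `adults`, `adr`, `reservation_status_date`) are assumed from the public hotel bookings data rather than confirmed by this README. The plotting library used in the project is not stated, so the sketch stops at the cross-tabulated counts.

```python
import pandas as pd

# Invented sample rows; column names assumed from the public hotel bookings data.
df = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "is_canceled": [0, 1, 0, 1, 1],
    "adults": [2, 2, 1, 3, 2],
    "adr": [95.0, 110.5, 80.0, 150.0, 99.9],
    "reservation_status_date": pd.to_datetime(
        ["2015-07-01", "2015-07-03", "2016-01-10", "2016-05-20", "2017-08-31"]),
})

# Date variables: earliest and latest value per column.
print(df.select_dtypes(include="datetime").agg(["min", "max"]))

# Categorical variables: count, number of unique values, most frequent value and its frequency.
print(df[["hotel"]].describe())

# Numeric (integer and float) variables: mean, standard deviation, quartiles, etc.
print(df.select_dtypes(include="number").describe())

# Counts behind the plots: cancellations by hotel type and by number of adults.
by_hotel = pd.crosstab(df["hotel"], df["is_canceled"])
by_adults = pd.crosstab(df["adults"], df["is_canceled"])
print(by_hotel)
print(by_adults)
# With matplotlib installed, by_hotel.plot(kind="bar") draws a grouped bar chart of these counts.
```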

Phase 3: Classification:

We trained three classification models to predict cancellations: logistic regression, a support vector machine (SVM), and K-nearest neighbors (KNN).
KNN performed best, with 96% accuracy and 96% precision.
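A comparison of the three classifiers might be set up as below. This sketch assumes scikit-learn and uses synthetic data from `make_classification` in place of hotel_data.csv, so its scores will not reproduce the 96% figures above. The scaling step, `n_neighbors=5`, and the 75/25 split are illustrative choices, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed features (X) and the cancellation label (y).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Standardize inside each pipeline; SVM and KNN are sensitive to feature scale.
models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Precision is computed for the positive class (label 1, i.e. "canceled").
    scores[name] = (accuracy_score(y_test, pred), precision_score(y_test, pred))
    print(f"{name}: accuracy={scores[name][0]:.2f}, precision={scores[name][1]:.2f}")

best = max(scores, key=lambda name: scores[name][0])
print("best by accuracy:", best)
```

Fitting the scaler inside each pipeline means it learns its statistics from the training split only, which avoids leaking test-set information into the models.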

Phase 4: Clustering using K-Means:

We used the K-Means algorithm to segment customers on three variables and to identify which segment offers the most potential profit for the hotel. Four clusters gave the best silhouette score.
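Choosing the number of clusters by silhouette score can be sketched with scikit-learn as follows. The README does not name the three segmentation variables, so this example uses three synthetic features from `make_blobs`. The candidate range k = 2..8 and the scaling step are assumptions. On the synthetic data the selected k need not match the four clusters found in the project.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the three (unnamed) segmentation variables.
X, _ = make_blobs(n_samples=500, n_features=3, centers=4, random_state=0)
X = StandardScaler().fit_transform(X)  # put the three variables on a common scale

silhouettes = {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = km.fit_predict(X)
    # Mean silhouette over all points; values closer to 1 mean tighter, better-separated clusters.
    silhouettes[k] = silhouette_score(X, labels)

best_k = max(silhouettes, key=silhouettes.get)
print("silhouette by k:", {k: round(float(s), 3) for k, s in silhouettes.items()})
print("best k:", best_k)

# Refit with the chosen k to assign a segment label to each customer row.
segments = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
```

To compare segments by potential profit, one could then group the original rows by `segments` and average a revenue-like column (for example `adr`, an assumed column name).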

Repository Files:

K-means.ipynb
KNN.ipynb
LogisticReg.ipynb
Part1.ipynb
README.md
SVMclassifier.ipynb
hotel_bookings.csv
hotel_data.csv