# Implementation of machine learning on Vacation Data

## Import the required Libraries

In [None]:
# Import numpy
import numpy as np

# Import pandas
import pandas as pd

# Import matplotlib
import matplotlib.pyplot as plt

# Import seaborn
import seaborn as sns

### Set Background Color for graphs

In [None]:
bg_color = '#a6ecf5'
graph_color = 'green'

## Load the dataset and perform data overview

In [None]:
# Read the data using pandas
df_vacations_data = pd.read_excel('vacation_data.xlsx')

In [None]:
# Show the first five observations
df_vacations_data.head()

In [None]:
# Show the last five observations
df_vacations_data.tail()

In [None]:
# Show the size of the data
df_vacations_data.shape

**INTERPRETATION**
 - We have 1000 observations
 - We have 32 attributes

In [None]:
# Show the columns of the data
df_vacations_data.columns

**COLUMN INFO**
- Gender: Gender of the person who is going for vacation (e.g., Male, Female, Non-binary, etc.).
- Age: Age of the individual, likely in years.
- Education: Level of education attained by the individual (e.g., High School, Bachelor’s Degree, Master’s Degree, etc.).
- Occupation: Job or profession of the individual.
- State: State or region where the individual resides.
- Relationship.Status: Relationship status of the individual (e.g., Single, Married, Divorced, etc.).
- Obligation: Indicates some form of obligation (e.g., financial, familial, or other responsibilities). Exact meaning depends on the dataset context.
- Obligation2: Another column related to obligations, possibly a secondary or additional type of obligation.
- NEP: Likely an acronym or specific term related to the dataset. Without additional context, it’s unclear what this represents (e.g., "Net Effective Price," "Non-Exempt Personnel," etc.).
- Vacation.Behaviour: Describes the individual’s behavior or preferences related to vacations (e.g., frequency, type of vacations, etc.).
- rest and relax: Preference for seeking rest and relaxation during vacations.
- luxury / be spoilt: Preference for luxury or being pampered during vacations.
- do sports: Preference for engaging in sports or physical activities during vacations.
- excitement, a challenge: Preference for seeking excitement or challenges during vacations.
- not exceed planned budget: Preference for staying within a planned budget during vacations.
- realise creativity: Preference for activities that allow the individual to express or realize their creativity during vacations.
- fun and entertainment: Preference for fun and entertainment during vacations.
- good company: Preference for spending vacations with good company (e.g., friends, family).
- health and beauty: Preference for activities related to health and beauty during vacations (e.g., spa treatments, wellness activities).
- free-and-easy-going: Preference for a relaxed, unstructured, or spontaneous vacation style.
- entertainment facilities: Preference for vacations that include access to entertainment facilities (e.g., resorts, theme parks).
- not care about prices: Indicates that the individual does not prioritize cost when planning vacations.
- life style of the local people: Preference for experiencing or learning about the lifestyle of local people during vacations.
- intense experience of nature: Preference for immersive or intense experiences in nature during vacations.
- cosiness/familiar atmosphere: Preference for cozy or familiar environments during vacations.
- maintain unspoilt surroundings: Preference for vacations that prioritize preserving or enjoying unspoiled natural surroundings.
- everything organised: Preference for vacations where everything is pre-organized or planned.
- unspoilt nature/natural landscape: Preference for vacations that focus on unspoiled natural landscapes.
- cultural offers: Preference for vacations that include cultural activities or experiences (e.g., museums, historical sites).
- change of surroundings: Preference for vacations that provide a change of environment or scenery.
- Income(k$): The individual’s income, likely in thousands of dollars (k$).
- Expenditure: The individual’s expenditure, possibly related to vacations or general spending.

In [None]:
# Show the datatype of the data (Information of the data)
df_vacations_data.info()

**INTERPRETATION**
 - We have 7 numerical columns
 - We have 25 categorical columns
 - Space utilized by the data is 250.1+ KB
 - We have null records in the data
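The counts above can also be derived in code instead of being read off the `info()` output. The following is a minimal sketch on a small made-up frame (the `toy` data and its values are assumptions, not the vacation data):

```python
import pandas as pd

# Toy frame standing in for the vacation data (values are made up)
toy = pd.DataFrame({
    "Age": [25, 40, 61],
    "NEP": [3.2, None, 4.1],
    "Gender": ["Male", "Female", "Female"],
    "State": ["NSW", None, "VIC"],
})

# Number of numerical and non-numerical (categorical) columns
n_numeric = toy.select_dtypes(include="number").shape[1]
n_categorical = toy.select_dtypes(exclude="number").shape[1]

# Names of the columns that contain at least one null value
cols_with_nulls = toy.columns[toy.isnull().any()].tolist()

print(n_numeric, n_categorical, cols_with_nulls)
```

Applied to `df_vacations_data`, the same three expressions should reproduce the figures listed in the interpretation.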

In [None]:
# Showcase the basic statistics of the data
df_vacations_data.describe().T

**INTERPRETATION**
 - The summary statistics do not show any obvious outliers in the data
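A claim about outliers can be checked numerically. The sketch below uses the common 1.5 x IQR rule, which is an assumption here (the notebook does not say which criterion it applies), and runs on made-up values:

```python
import pandas as pd

def iqr_outlier_counts(frame: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum()

# Made-up values for illustration only
toy = pd.DataFrame({
    "Age": [18, 30, 35, 40, 45, 50, 105],
    "NEP": [3.0, 3.2, 3.5, 3.6, 3.8, 4.0, 4.1],
})
outlier_counts = iqr_outlier_counts(toy)
print(outlier_counts)
```

Running `iqr_outlier_counts(df_vacations_data)` would give a per-column count that can back up or correct the interpretation.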

## Data Preprocessing

### Data Cleaning

In [None]:
# Show first five observations of the data
df_vacations_data.head()

#### Perform the renaming action

In [None]:
# Show the columns in the data
df_vacations_data.columns

**INTERPRETATION**
 - The column names are already well structured, so no renaming is needed
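Later cells refer to names such as `relationship_status`, `obligation_rating` and `vacation_behaviour`, which differ from the names in the COLUMN INFO list. If the loaded file still carries the original names, a rename along the following lines would be needed. The mapping is a guess at how the two sets of names correspond, not something the notebook confirms:

```python
import pandas as pd

# Hypothetical mapping: left-hand names follow the COLUMN INFO list,
# right-hand names are the ones used by later cells
rename_map = {
    "Relationship.Status": "relationship_status",
    "Obligation": "obligation_rating",
    "Obligation2": "obligation_category",
    "Vacation.Behaviour": "vacation_behaviour",
    "rest and relax": "rest_and_relax",
}

# Empty toy frame that only carries column names
toy = pd.DataFrame(columns=["Gender", "Relationship.Status", "Obligation",
                            "Obligation2", "Vacation.Behaviour", "rest and relax"])

# Columns missing from the mapping (here: Gender) are left unchanged
renamed = toy.rename(columns=rename_map)
print(list(renamed.columns))
```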

### Null value treatment

In [None]:
# Fetch the null records present in the data (count)
df_vacations_data.isnull().sum()

**INTERPRETATION**
 - We have 8 missing records in Education column
 - We have 59 missing records in Occupation column
 - We have 8 missing records in relationship_status column
 - We have 25 missing records in vacation_behaviour column
 - We have 800 missing records in Income_Dollar_k column
 - We have 800 missing records in Expenditure_dollar_k column

In [None]:
# Find the percentage of the missing records
df_vacations_data.isnull().sum()/len(df_vacations_data) * 100

**INTERPRETATION**
 - First rule (few missing values, so drop the affected observations): applies to Education, Occupation, relationship_status and vacation_behaviour
 - Second rule (a moderate share of missing values, so impute them): no column falls into this category
 - Third rule (most values missing, so drop the column): applies to Income_Dollar_k and Expenditure_dollar_k
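The three rules can be written as a small decision function. The cut-offs below (10% and 50%) are hypothetical, since the notebook does not state its thresholds; the missing counts are the ones reported above:

```python
import pandas as pd

# Hypothetical cut-offs; the notebook does not state the thresholds behind its rules
DROP_ROWS_BELOW = 10.0    # percent missing: drop the affected rows
DROP_COLUMN_ABOVE = 50.0  # percent missing: drop the whole column

def decide(pct: float) -> str:
    """Map a missing-value percentage to one of the three treatments."""
    if pct == 0:
        return "keep"
    if pct < DROP_ROWS_BELOW:
        return "drop rows"
    if pct > DROP_COLUMN_ABOVE:
        return "drop column"
    return "impute"

# Missing counts reported above, out of 1000 observations
missing_counts = pd.Series({
    "Education": 8,
    "Occupation": 59,
    "relationship_status": 8,
    "vacation_behaviour": 25,
    "Income_Dollar_k": 800,
    "Expenditure_dollar_k": 800,
})
missing_pct = missing_counts / 1000 * 100
plan = missing_pct.map(decide)
print(plan)
```

With these assumed cut-offs the plan matches the interpretation: four columns lose rows, two columns are dropped, and nothing is imputed.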

In [None]:
# Perform the action of attributes removal
df_vacations_data.drop(['Income(k$)', 'Expenditure'], axis=1, inplace=True)

In [None]:
# Perform the action of observation removal
df_vacations_data.dropna(inplace=True)

In [None]:
# Verify that no null values remain
df_vacations_data.isnull().sum()/len(df_vacations_data)*100

In [None]:
# Check the changing number of observations in data due to null value treatment
df_vacations_data.shape

**INTERPRETATION**
 - Dropping the null records has little impact on the size of the data: only 88 of the original observations are removed
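As a quick check of that figure, the arithmetic can be spelled out using the 1000 original observations and the 88 dropped ones reported in this notebook:

```python
# Figures taken from the interpretations in this notebook
original_rows = 1000
rows_dropped = 88

remaining_rows = original_rows - rows_dropped
share_dropped = rows_dropped / original_rows * 100

print(f"{remaining_rows} rows remain; {share_dropped:.1f}% of observations were dropped")
```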

## Null value treatment done

## EDA (Exploratory Data Analysis)

## Univariate Analysis

In [None]:
# Show the first five observations of the data
df_vacations_data.head()

In [None]:
# Segregate the data based on the data type

# Numerical Data
df_numerical = df_vacations_data.select_dtypes(include='number')

# Show the first five observations of Numerical Data
df_numerical.head()

In [None]:
# Categorical Data
df_categorical = df_vacations_data.select_dtypes(include='object')

# Show the first five observations of Categorical Data
df_categorical.head()

### Perform the Univariate Analysis on numerical data

In [None]:
# Show the first five observations of Numerical Data
df_numerical.head()

In [None]:
# Show the columns present in the numerical data
df_numerical.columns

##### Age

In [None]:
# Show the minimum
df_numerical.Age.min()

In [None]:
# Show the maximum
df_numerical.Age.max()

In [None]:
# Show the average(mean)
round(df_numerical.Age.mean(), 3)

In [None]:
# Check the distribution(KDE Plot)
df_numerical.Age.plot(kind='kde', color=graph_color)
ax = plt.gca()
ax.set_facecolor(bg_color)
plt.title('Distribution of Age column')
plt.xlabel('Age')
plt.grid()
plt.show()

**INTERPRETATION**
 - The minimum age of a person going on vacation is 18 and the maximum is 105
 - The average age of a person going on vacation is 44.225
 - The distribution is multimodal: the two main age groups among travellers are 25 to 50 and 55 to 70
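The minimum, maximum and mean are computed one cell at a time for every numerical column. A more compact alternative, sketched here on made-up values, is to aggregate all columns in one call:

```python
import pandas as pd

# Made-up values standing in for df_numerical
toy_numerical = pd.DataFrame({
    "Age": [18, 44, 105],
    "NEP": [1.5, 3.5, 5.0],
})

# One row per column, with min, max and mean rounded to three decimals
summary = toy_numerical.agg(["min", "max", "mean"]).round(3).T
print(summary)
```

Calling the same expression on `df_numerical` would produce the figures quoted in each interpretation in a single table.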

In [None]:
# Show the columns in the numerical data
df_numerical.columns

##### Education

In [None]:
# Show the minimum
df_numerical.Education.min()

In [None]:
# Show the maximum
df_numerical.Education.max()

In [None]:
# Show the average(mean)
round(df_numerical.Education.mean(), 3)

In [None]:
# Check the distribution(KDE Plot)
df_numerical.Education.plot(kind='kde', color=graph_color)
ax = plt.gca()
ax.set_facecolor(bg_color)
plt.title('Distribution of Education column')
plt.xlabel('Education')
plt.grid()
plt.show()

**INTERPRETATION**
 - The minimum education level among people going on vacation is 1.000.
 - The maximum education level is 8.000.
 - The average education level is 4.884.
 - The distribution is bimodal (two peaks).
 - The two main education groups among travellers are levels 2-4 and 6-8.
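Education takes a small number of integer levels, so a frequency table per level may be easier to read than a KDE curve, which smooths between levels. A minimal sketch on made-up values:

```python
import pandas as pd

# Made-up education levels for illustration
toy_education = pd.Series([2, 3, 3, 4, 6, 7, 7, 8, 3, 7], name="Education")

# Count per level, ordered by level rather than by frequency
level_counts = toy_education.value_counts().sort_index()

# Same information as a percentage of all observations
level_share = (level_counts / level_counts.sum() * 100).round(1)

print(level_counts)
print(level_share)
```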

##### obligation_rating

In [None]:
# Show the minimum
df_numerical.obligation_rating.min()

In [None]:
# Show the maximum
df_numerical.obligation_rating.max()

In [None]:
# Show the average(mean)
round(df_numerical.obligation_rating.mean(), 3)

In [None]:
# Check the distribution(KDE Plot)
df_numerical.obligation_rating.plot(kind='kde', color=graph_color)
ax = plt.gca()
ax.set_facecolor(bg_color)
plt.title('Distribution of obligation_rating column')
plt.xlabel('Obligation Rating')
plt.grid()
plt.show()

**INTERPRETATION**
 - The minimum obligation_rating is 1.000.
 - The maximum obligation_rating is 5.000.
 - The average obligation_rating is 3.735.
 - The distribution is unimodal (a single peak).
 - Most respondents have an obligation_rating between 3.0 and 4.6.

##### NEP

In [None]:
# Show the minimum
df_numerical.NEP.min()

In [None]:
# Show the maximum
df_numerical.NEP.max()

In [None]:
# Show the average(mean)
round(df_numerical.NEP.mean(), 3)

In [None]:
# Check the distribution(KDE Plot)
df_numerical.NEP.plot(kind='kde', color=graph_color)
ax = plt.gca()
ax.set_facecolor(bg_color)
plt.title('Distribution of NEP column')
plt.xlabel('NEP')
plt.grid()
plt.show()

**INTERPRETATION**
 - The minimum NEP is 1.733.
 - The maximum NEP is 5.000.
 - The average NEP is 3.647.
 - The distribution is unimodal (a single peak).
 - Most respondents have an NEP between 3.0 and 4.3.

##### vacation_behaviour

In [None]:
# Show the minimum
df_numerical.vacation_behaviour.min()

In [None]:
# Show the maximum
df_numerical.vacation_behaviour.max()

In [None]:
# Show the average(mean)
round(df_numerical.vacation_behaviour.mean(), 3)

In [None]:
# Check the distribution(KDE Plot)
df_numerical.vacation_behaviour.plot(kind='kde', color=graph_color)
ax = plt.gca()
ax.set_facecolor(bg_color)
plt.title('Distribution of vacation_behaviour column')
plt.xlabel('Vacation Behaviour')
plt.grid()
plt.show()

**INTERPRETATION**
 - The minimum vacation_behaviour is 1.392.
 - The maximum vacation_behaviour is 4.766.
 - The average vacation_behaviour is 2.962.
 - The distribution is unimodal (a single peak).
 - Most respondents have a vacation_behaviour between 2 and 4.

### Univariate analysis on categorical data

In [None]:
# Show the first five observations of categorical data
df_categorical.head()

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### Gender

In [None]:
# Show the count
df_categorical.Gender.value_counts()

In [None]:
# Create a visualization
df_categorical.Gender.value_counts().plot(kind='bar', color=graph_color)
ax = plt.gca()
ax.set_facecolor(bg_color)
plt.title('Count of Gender column')
plt.xlabel('Gender')
plt.grid()
plt.xticks(rotation=0)
plt.show()

**INTERPRETATION**
 - The Gender column contains 482 Male and 430 Female records.
 - The data is fairly balanced across genders
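"Fairly balanced" can be quantified as a share per category. The sketch below uses the two counts reported above:

```python
import pandas as pd

# Counts taken from the interpretation above
gender_counts = pd.Series({"Male": 482, "Female": 430})

# Share of each category in percent
gender_share = (gender_counts / gender_counts.sum() * 100).round(1)

# Ratio of the largest to the smallest category (1.0 would be perfectly balanced)
imbalance_ratio = gender_counts.max() / gender_counts.min()

print(gender_share)
print(round(imbalance_ratio, 2))
```

On the full column, `df_categorical.Gender.value_counts(normalize=True)` gives the same shares directly.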

In [None]:
# Show the columns present in the categorical dataframe
df_categorical.columns

##### Occupation

In [None]:
# Show the count
df_categorical.Occupation.value_counts()

In [None]:
# Create a visualization
df_categorical.Occupation.value_counts().plot(kind='bar', color=graph_color, figsize=(12,5))
ax = plt.gca()
ax.set_facecolor(bg_color)
plt.title('Count of Occupation column')
plt.xlabel('Occupation')
plt.grid()
plt.show()

**INTERPRETATION**
 - The graph shows that the professional and manager or administrator categories dominate the data
 - Labourers and transport workers contribute the least

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### State

In [None]:
# Show the count
df_categorical.State.value_counts()

In [None]:
# Create a visualization
df_categorical.State.value_counts().plot(kind='bar', color=graph_color, figsize=(10,5))
ax = plt.gca()
ax.set_facecolor(bg_color)
plt.title('Count of State column')
plt.xlabel('States')
plt.grid()
plt.show()

**INTERPRETATION**
 - The graph shows that NSW, VIC and QLD dominate the data
 - NT, Tas and ACT contribute the least

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### relationship_status

In [None]:
# Show the count
df_categorical.relationship_status.value_counts()

In [None]:
# Create a visualization
df_categorical.relationship_status.value_counts().plot(kind='bar', color=graph_color, figsize=(12,5))
ax = plt.gca()
ax.set_facecolor(bg_color)
plt.title('Count of relationship_status column')
plt.xlabel('relationship_status')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - Married people form the largest group going on vacation
 - Widowed people form the smallest group

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### obligation_category

In [None]:
# Show the count
df_categorical.obligation_category.value_counts()

In [None]:
# Create a visualization
df_categorical.obligation_category.value_counts().plot(kind='bar', color=graph_color, figsize=(12,5))
ax = plt.gca()
ax.set_facecolor(bg_color)
plt.title('Count of obligation_category column')
plt.xlabel('obligation_category')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - Data is balanced

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### rest_and_relax

In [None]:
# Show the count
df_categorical.rest_and_relax.value_counts()

In [None]:
# Create a visualization
df_categorical.rest_and_relax.value_counts().plot(kind='bar', color=graph_color, figsize=(8,5))
ax = plt.gca()
ax.set_facecolor(bg_color)
plt.title('Count of rest_and_relax column')
plt.xlabel('rest_and_relax')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - Data is imbalanced
 - In this data, 825 respondents value rest and relaxation on vacation; the remaining respondents do not.
 - Rest and relaxation is therefore a preference for most respondents
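Since many of the categorical columns hold the same two answers, their balance can be compared in one table rather than column by column. This is a sketch on made-up data; the `Yes`/`No` labels and the three column names are assumptions for illustration:

```python
import pandas as pd

# Made-up answers for three of the preference columns
toy = pd.DataFrame({
    "rest_and_relax": ["Yes", "Yes", "Yes", "No"],
    "do_sports": ["No", "No", "Yes", "No"],
    "good_company": ["Yes", "No", "Yes", "No"],
})

# One row per column, one column per answer, values in percent
share_table = pd.DataFrame(
    {col: toy[col].value_counts(normalize=True) for col in toy.columns}
).T.fillna(0).mul(100).round(1)

print(share_table)
```

A row close to 50/50 indicates a balanced column, while a row such as 75/25 indicates an imbalanced one.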

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### luxury_or_be_spoilt

In [None]:
# Show the count
df_categorical.luxury_or_be_spoilt.value_counts()

In [None]:
# Create a visualization
df_categorical.luxury_or_be_spoilt.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of luxury_or_be_spoilt column')
plt.xlabel('luxury_or_be_spoilt')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - Data is imbalanced
 - In this data, 254 respondents prefer luxury or being spoilt on vacation; the remaining respondents do not.
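The bar-chart cell is repeated almost unchanged for every categorical column. One way to avoid the repetition is a small helper; `plot_category_counts` is a hypothetical name, and the sketch reuses the `bg_color` and `graph_color` values defined at the top of the notebook:

```python
import matplotlib.pyplot as plt
import pandas as pd

bg_color = '#a6ecf5'
graph_color = 'green'

def plot_category_counts(series, figsize=(8, 5)):
    """Draw a bar chart of value counts in the notebook's colours and return the Axes."""
    fig, ax = plt.subplots(figsize=figsize)
    series.value_counts().plot(kind='bar', color=graph_color, ax=ax)
    ax.set_facecolor(bg_color)
    ax.set_title(f'Count of {series.name} column')
    ax.set_xlabel(series.name)
    ax.tick_params(axis='x', labelrotation=0)  # keep the category labels horizontal
    ax.grid()
    return ax

# Made-up answers for illustration
toy = pd.Series(['No', 'Yes', 'No', 'No'], name='luxury_or_be_spoilt')
ax = plot_category_counts(toy)
```

In the notebook, a call such as `plot_category_counts(df_categorical.do_sports)` followed by `plt.show()` would replace one full plotting cell.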

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### do_sports

In [None]:
# Show the count
df_categorical.do_sports.value_counts()

In [None]:
# Create a visualization
df_categorical.do_sports.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of do_sports column')
plt.xlabel('do_sports')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - Data is imbalanced
 - In this data, most respondents do not want to do sports on vacation

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### excitement_and_challenge

In [None]:
# Show the count
df_categorical.excitement_and_challenge.value_counts()

In [None]:
# Create a visualization
df_categorical.excitement_and_challenge.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of excitement_and_challenge column')
plt.xlabel('excitement_and_challenge')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - Most respondents are not looking for excitement or a challenge
 - In this data, there are 601 'No' and 311 'Yes'

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### not_exceed_planned_budget

In [None]:
# Show the count
df_categorical.not_exceed_planned_budget.value_counts()

In [None]:
# Create a visualization
df_categorical.not_exceed_planned_budget.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of not_exceed_planned_budget column')
plt.xlabel('not_exceed_planned_budget')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - The data is quite balanced
 - About as many respondents want to stay within their planned budget as do not

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### realise_creativity

In [None]:
# Show the count
df_categorical.realise_creativity.value_counts()

In [None]:
# Create a visualization
df_categorical.realise_creativity.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of realise_creativity column')
plt.xlabel('realise_creativity')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - The data is imbalanced
 - Only a minority of respondents list realising creativity as a vacation preference

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### fun_and_entertainment

In [None]:
# Show the count
df_categorical.fun_and_entertainment.value_counts()

In [None]:
# Create a visualization
df_categorical.fun_and_entertainment.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of fun_and_entertainment column')
plt.xlabel('fun_and_entertainment')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - The data is well-balanced, indicating a relatively even distribution across the categories.
 - Roughly equal numbers of respondents do and do not list fun and entertainment as a vacation preference.

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### good_company

In [None]:
# Show the count
df_categorical.good_company.value_counts()

In [None]:
# Create a visualization
df_categorical.good_company.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of good_company column')
plt.xlabel('good_company')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - The data is well-balanced, showing a fairly even distribution across the categories.
 - About as many respondents value good company on vacation as do not.

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### health_and_beauty

In [None]:
# Show the count
df_categorical.health_and_beauty.value_counts()

In [None]:
# Create a visualization
df_categorical.health_and_beauty.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of health_and_beauty column')
plt.xlabel('health_and_beauty')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - The data is imbalanced.
 - Most tourists do not focus on health and beauty.

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### free_and_easy_going

In [None]:
# Show the count
df_categorical.free_and_easy_going.value_counts()

In [None]:
# Create a visualization
df_categorical.free_and_easy_going.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of free_and_easy_going column')
plt.xlabel('free_and_easy_going')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - The data is quite balanced
 - About as many respondents prefer a free-and-easy-going vacation as do not

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### entertainment_facilities

In [None]:
# Show the count
df_categorical.entertainment_facilities.value_counts()

In [None]:
# Create a visualization
df_categorical.entertainment_facilities.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of entertainment_facilities column')
plt.xlabel('entertainment_facilities')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - The data is imbalanced.
 - Most tourists do not list entertainment facilities as a vacation preference.

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### not_care_about_prices

In [None]:
# Show the count
df_categorical.not_care_about_prices.value_counts()

In [None]:
# Create a visualization
df_categorical.not_care_about_prices.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of not_care_about_prices column')
plt.xlabel('not_care_about_prices')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - The data is imbalanced.
 - Most tourists do not care about prices during their trip.

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### life_style_of_the_local_people

In [None]:
# Show the count
df_categorical.life_style_of_the_local_people.value_counts()

In [None]:
# Create a visualization
df_categorical.life_style_of_the_local_people.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of life_style_of_the_local_people column')
plt.xlabel('life_style_of_the_local_people')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - The data is well-balanced
 - Almost equal numbers of tourists are and are not interested in the lifestyle of the local people.

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### intense_experience_of_nature

In [None]:
# Show the count
df_categorical.intense_experience_of_nature.value_counts()

In [None]:
# Create a visualization
df_categorical.intense_experience_of_nature.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of intense_experience_of_nature column')
plt.xlabel('intense_experience_of_nature')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - The data is imbalanced.
 - Most tourists do not seek an intense experience of nature.

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### cosiness/familiar_atmosphere

In [None]:
# Show the count
df_categorical['cosiness/familiar_atmosphere'].value_counts()

In [None]:
# Create a visualization
df_categorical['cosiness/familiar_atmosphere'].value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of cosiness/familiar_atmosphere column')
plt.xlabel('cosiness/familiar_atmosphere')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - The data is imbalanced.
 - Most tourists do not look for cosiness or a familiar atmosphere during their trip.

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### maintain_unspoilt_surroundings

In [None]:
# Show the count
df_categorical.maintain_unspoilt_surroundings.value_counts()

In [None]:
# Create a visualization
df_categorical.maintain_unspoilt_surroundings.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of maintain_unspoilt_surroundings column')
plt.xlabel('maintain_unspoilt_surroundings')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - The data is imbalanced.
 - The majority of respondents do not list maintaining unspoilt surroundings as a preference.

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### everything_organised

In [None]:
# Show the count
df_categorical.everything_organised.value_counts()

In [None]:
# Create a visualization
df_categorical.everything_organised.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of everything_organised column')
plt.xlabel('everything_organised')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - The data is imbalanced
 - Most tourists do not prefer a vacation where everything is organised.

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### unspoilt_nature/natural_landscape

In [None]:
# Show the count
df_categorical['unspoilt_nature/natural_landscape'].value_counts()

In [None]:
# Create a visualization
df_categorical['unspoilt_nature/natural_landscape'].value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of unspoilt_nature/natural_landscape column')
plt.xlabel('unspoilt_nature/natural_landscape')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - The data is imbalanced.
 - Most tourists do not prioritise unspoilt nature or natural landscapes.

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### cultural_offers

In [None]:
# Show the count
df_categorical.cultural_offers.value_counts()

In [None]:
# Create a visualization
df_categorical.cultural_offers.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of cultural_offers column')
plt.xlabel('cultural_offers')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - The data is imbalanced
 - Most tourists do not seek cultural offers while travelling.

In [None]:
# Show the columns present in categorical data frame
df_categorical.columns

##### change_of_surroundings

In [None]:
# Show the count
df_categorical.change_of_surroundings.value_counts()

In [None]:
# Create a visualization
df_categorical.change_of_surroundings.value_counts().plot(kind='bar', color='green', figsize=(8,5))
ax = plt.gca()
ax.set_facecolor('#a6ecf5')
plt.title('Count of change_of_surroundings column')
plt.xlabel('change_of_surroundings')
plt.xticks(rotation=0)
plt.grid()
plt.show()

**INTERPRETATION**
 - The data is imbalanced
 - Most tourists go on vacation for a change of surroundings

## Bivariate Analysis

In [None]:
# Show first five records of the dataframe
df_vacations_data.head()

In [None]:
# Step 1: Find the correlation matrix
df_numerical.corr()

**INTERPRETATION**
 - There is no strong correlation between the numerical variables in the given data
 
**NOTE**
 - Just for demonstration we are going to plot correlation between `obligation_rating` and `vacation_behaviour`
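Rather than scanning the correlation matrix by eye, the strongest pair can be extracted programmatically. The helper name `strongest_pair` and the toy data are assumptions made for this sketch:

```python
import numpy as np
import pandas as pd

def strongest_pair(frame: pd.DataFrame):
    """Return the pair of numeric columns with the largest absolute correlation."""
    corr = frame.corr(numeric_only=True).abs()
    # Keep only the upper triangle so each pair appears once and the diagonal is ignored
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs.idxmax(), pairs.max()

# Made-up values: "a" and "b" move together, "c" does not
toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8.5],
    "c": [4, 1, 3, 2],
})
pair, value = strongest_pair(toy)
print(pair, round(value, 3))
```

If `strongest_pair(df_numerical)` returns a low value, that supports the statement that no pair of numerical variables is strongly correlated.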

##### obligation_rating vs vacation_behaviour

In [None]:
# Plot the scatter between given variables
sns.scatterplot(x=df_numerical.obligation_rating, y=df_numerical.vacation_behaviour, color=graph_color)
plt.show()

### Bivariate analysis on one numeric and one categorical variable

In [None]:
# Show the column present in categorical data frame
df_categorical.columns

In [None]:
# Show the column present in numerical data frame
df_numerical.columns

#### Gender, Age

In [None]:
# Plot the mean age per gender (mean is the default estimator)
sns.barplot(x='Gender', y='Age', data=df_vacations_data, color=graph_color)
ax = plt.gca()
ax.set_facecolor(bg_color)
plt.title('Mean Age by Gender')
plt.grid()
plt.show()

In [None]:
# Plot the minimum age per gender
sns.barplot(x='Gender', y='Age', data=df_vacations_data, estimator=min, color=graph_color)
ax = plt.gca()
ax.set_facecolor(bg_color)
plt.title('Minimum Age by Gender')
plt.grid()
plt.show()

In [None]:
# Plot the maximum age per gender
sns.barplot(x='Gender', y='Age', data=df_vacations_data, estimator=max, color=graph_color)
ax = plt.gca()
ax.set_facecolor(bg_color)
plt.title('Maximum Age by Gender')
plt.grid()
plt.show()

**INTERPRETATION**
 - The mean age is balanced across the gender categories.
 - The minimum age is also similar for both genders, but the maximum age behaves unusually.
 - The maximum age for females is more than 100, while for males it is less than 80.
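The three bar charts can be cross-checked with a single grouped aggregation that returns the exact numbers. A minimal sketch on made-up ages:

```python
import pandas as pd

# Made-up records for illustration
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Age": [18, 19, 79, 105],
})

# Minimum, mean and maximum age for each gender in one table
age_by_gender = toy.groupby("Gender")["Age"].agg(["min", "mean", "max"])
print(age_by_gender)
```

Running the same `groupby` on `df_vacations_data` gives the values behind each bar, which makes an unusual maximum easier to spot.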