# Tourism Analysis in East Kalimantan during Eid Holiday 2023

In this notebook, we will analyze tourism data for East Kalimantan during the Eid (Lebaran) holiday in 2023. The aim is to understand the trends and anomalies in the data and to identify the most popular tourist destinations. The analysis is presented in a simple and accessible way, using charts, graphs, and descriptions to help tell the story behind the data.

## Data Loading

First, we need to load the data from the provided URL. We will use the pandas library, which is a powerful tool for data manipulation and analysis in Python.

In [None]:
import pandas as pd

data_url = 'https://raw.githubusercontent.com/sya2rawie/data-kunjungan/main/DATA%20LIBURAN%20LEBARAN%20Kaltim%202023%20fix.csv'
df = pd.read_csv(data_url)
df.head()

## Data Cleaning

Before we can analyze the data, we need to make sure it is in the right format. The 'JUMLAH WISATAWAN LEBARAN 2023' column (number of tourists during Lebaran 2023) is stored as text with commas as thousands separators, so we'll convert it to integers for easier analysis. We'll also rename the columns to English.

In [None]:
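# Remove thousands separators (commas) and convert to integer.
# astype(str) keeps this working even if pandas has already parsed the column as numbers.
df['JUMLAH WISATAWAN LEBARAN 2023'] = (
    df['JUMLAH WISATAWAN LEBARAN 2023'].astype(str).str.replace(',', '', regex=False).astype(int)
)

# Rename columns to English (assumes the CSV has exactly these four columns, in this order)
df.columns = ['City', 'Tourist Destination', 'Number of Tourists 2023', 'Description']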
df.head()
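`astype(int)` fails as soon as a single entry is missing or is not a number. A minimal sketch of a more tolerant conversion, using a hypothetical helper `to_int_count` (not part of the original notebook) that turns such entries into missing values (`<NA>`) instead of raising an error:

```python
import pandas as pd


def to_int_count(series: pd.Series) -> pd.Series:
    """Convert visitor counts such as '1,234' or 1234 to nullable integers.

    Hypothetical helper: entries that cannot be parsed become <NA>
    instead of raising an error.
    """
    # Work on the text form so numeric and string columns are handled alike
    cleaned = series.astype(str).str.replace(',', '', regex=False)
    # errors='coerce' turns unparseable entries (e.g. 'None') into NaN
    numeric = pd.to_numeric(cleaned, errors='coerce')
    # The nullable Int64 dtype holds whole numbers while allowing missing values
    return numeric.astype('Int64')


counts = to_int_count(pd.Series(['1,234', '56', None]))
print(counts.tolist())
```

Applied to the notebook, this would replace the `.astype(int)` conversion of the `'JUMLAH WISATAWAN LEBARAN 2023'` column. It assumes counts are whole numbers; a value such as `'12.5'` would still raise when cast to `Int64`.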

## Data Analysis

Now that our data is clean, we can start analyzing it. We'll begin with the total number of tourists in each city. Here 'City' covers both regencies (kabupaten) and cities (kota), following the dataset's 'KABUPATEN / KOTA' column. The totals show which regions are the most popular.

In [None]:
city_totals = df.groupby('City')['Number of Tourists 2023'].sum().sort_values(ascending=False)
city_totals

From the data, we can see that Kutai Kertanegara is the most visited city, followed by Penajam Paser Utara and Berau. This could be due to the presence of popular tourist destinations in these cities.
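Raw totals are easier to compare when expressed as a share of all holiday visits. A minimal sketch with made-up numbers for three hypothetical regions (not the real figures); in the notebook, `city_totals` would take the place of `example_totals`:

```python
import pandas as pd

# Hypothetical totals per region, standing in for `city_totals`
example_totals = pd.Series({'Region A': 5000, 'Region B': 3000, 'Region C': 2000})

# Share of all holiday visits, as a percentage rounded to one decimal
example_share = (example_totals / example_totals.sum() * 100).round(1)
print(example_share)
```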

Next, let's look at the top 10 tourist destinations across all cities.

In [None]:
top_destinations = df.sort_values('Number of Tourists 2023', ascending=False).head(10)
top_destinations

The table above shows the top 10 tourist destinations in East Kalimantan during the Eid holiday in 2023. The most visited destination is Pantai Panrita Lopi in Kutai Kertanegara, followed by Salo Loang + Pejala (Tanjung Jumlai) in Penajam Paser Utara and PANTI MANGGAR SEGARA SARI in Balikpapan.
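Sorting and taking `.head(10)` silently drops any destination tied with tenth place. A sketch using `DataFrame.nlargest`, whose `keep='all'` option retains such ties, on hypothetical destinations:

```python
import pandas as pd

# Hypothetical destinations and visitor counts (not from the dataset)
example = pd.DataFrame({
    'Tourist Destination': ['Beach A', 'Park B', 'Lake C', 'Museum D'],
    'Number of Tourists 2023': [900, 700, 700, 300],
})

# Similar to sort_values(..., ascending=False).head(2); keeps the first of tied rows
top_two = example.nlargest(2, 'Number of Tourists 2023')

# keep='all' also returns rows tied with the last place, so it may exceed n rows
top_two_with_ties = example.nlargest(2, 'Number of Tourists 2023', keep='all')

print(top_two['Tourist Destination'].tolist())
print(top_two_with_ties['Tourist Destination'].tolist())
```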

Next, let's visualize the top 10 destinations as a horizontal bar chart to compare their visitor numbers more easily.

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
plt.barh(top_destinations['Tourist Destination'], top_destinations['Number of Tourists 2023'], color='skyblue')
plt.xlabel('Number of Tourists 2023')
plt.title('Top 10 Tourist Destinations in East Kalimantan during Eid Holiday 2023')
plt.gca().invert_yaxis()
plt.show()

## Distribution of Tourists Across Destinations

To see how tourists are distributed across the destinations within each city, we'll calculate the average number of tourists per destination in each city. Read alongside the totals, this gives a rough sense of whether a city's visitors go to a few large destinations or to many smaller ones: a high total combined with a low average, for example, points to many modestly visited destinations.

In [None]:
average_tourists_per_destination = df.groupby('City')['Number of Tourists 2023'].mean().sort_values(ascending=False)
average_tourists_per_destination
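An average on its own cannot show concentration: a region with one very busy destination and several quiet ones can have the same mean as a region where visitors are evenly spread. One simple complement, sketched here on made-up numbers, is the share of each region's visitors that went to its busiest destination (a measure we introduce for illustration; it is not in the original analysis):

```python
import pandas as pd

# Hypothetical data: two regions with the same average but different spread
example = pd.DataFrame({
    'City': ['Region A', 'Region A', 'Region A', 'Region B', 'Region B', 'Region B'],
    'Number of Tourists 2023': [100, 100, 100, 280, 10, 10],
})

grouped = example.groupby('City')['Number of Tourists 2023']

summary = pd.DataFrame({
    'mean': grouped.mean(),
    # Share of the region's visitors that went to its single busiest destination
    'top_share': grouped.max() / grouped.sum(),
})
print(summary)
```

Both regions average 100 tourists per destination, but Region B's busiest destination takes over 90% of its visitors.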

## Visualizing Tourist Numbers and Distribution

We'll create two bar charts: one showing the total number of tourists in each city, and another showing the average number of tourists per destination in each city. This will help us compare the overall popularity of the cities with the distribution of tourists within the cities.

In [None]:
plt.figure(figsize=(14, 6))

plt.subplot(1, 2, 1)
plt.barh(city_totals.index, city_totals, color='skyblue')
plt.xlabel('Total Number of Tourists 2023')
plt.title('Total Number of Tourists in Each City')
plt.gca().invert_yaxis()

plt.subplot(1, 2, 2)
plt.barh(average_tourists_per_destination.index, average_tourists_per_destination, color='skyblue')
plt.xlabel('Average Number of Tourists per Destination')
plt.title('Average Number of Tourists per Destination in Each City')
plt.gca().invert_yaxis()

plt.tight_layout()
plt.show()

## Adding Visit Dates

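The dataset gives one total per destination for the whole holiday, and the visit period is 23 to 25 April 2023. To look at the holiday day by day, we'll add a 'Visit Date' column and assume that each destination's visitors were spread evenly over these three days, dividing each total by three. This split is an assumption rather than observed data, so the daily values reflect it and not real day-to-day variation. After that, we'll create a heatmap of the number of tourists visiting each city over these three days.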

In [None]:
import numpy as np

# Create a new dataframe with one row per destination per day,
# splitting each destination's total evenly across the three days (an assumption)
dates = pd.date_range(start='2023-04-23', end='2023-04-25')
df_dates = pd.DataFrame({'Visit Date': np.repeat(dates, df.shape[0]),
                         'City': np.tile(df['City'], len(dates)),
                         'Tourist Destination': np.tile(df['Tourist Destination'], len(dates)),
                         'Number of Tourists': np.tile(df['Number of Tourists 2023'], len(dates)) / len(dates)})

df_dates.head()
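The pairing of `np.repeat` (each date repeated once per destination) and `np.tile` (the destination list repeated once per date) is what keeps the rows aligned. The sketch below checks this on two hypothetical destinations and confirms that the three daily rows add back up to each original total:

```python
import numpy as np
import pandas as pd

# Two hypothetical destinations (not from the dataset)
example = pd.DataFrame({
    'Tourist Destination': ['Beach A', 'Park B'],
    'Number of Tourists 2023': [300, 90],
})
dates = pd.date_range(start='2023-04-23', end='2023-04-25')

# np.repeat gives each date once per destination: d1, d1, d2, d2, d3, d3
# np.tile repeats the destination list once per date: A, B, A, B, A, B
example_dates = pd.DataFrame({
    'Visit Date': np.repeat(dates, len(example)),
    'Tourist Destination': np.tile(example['Tourist Destination'], len(dates)),
    'Number of Tourists': np.tile(example['Number of Tourists 2023'], len(dates)) / len(dates),
})

# Summing the three daily rows recovers each destination's original total
daily_sums = example_dates.groupby('Tourist Destination')['Number of Tourists'].sum()
print(daily_sums)
```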


Now that we have added the visit dates to our data, we can create a heatmap to visualize the number of tourists visiting each city over these three days. For this, we'll use the seaborn library, which provides a function to create heatmaps.

In [None]:
import seaborn as sns

# Pivot the data for the heatmap
heatmap_data = df_dates.pivot_table(index='City', columns='Visit Date', values='Number of Tourists', aggfunc='sum')

# Create the heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(heatmap_data, cmap='YlGnBu', annot=True, fmt='.0f')
plt.title('Number of Tourists Visiting Each City Over Time')
plt.show()
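Because each destination's total was divided evenly across the three days, every column of this heatmap holds the same values, so it cannot reveal day-to-day changes. A quick check of that property on hypothetical, evenly split data:

```python
import numpy as np
import pandas as pd

# Hypothetical evenly split data for two regions over three days
dates = pd.date_range(start='2023-04-23', end='2023-04-25')
example_dates = pd.DataFrame({
    'Visit Date': np.repeat(dates, 2),
    'City': np.tile(['Region A', 'Region B'], len(dates)),
    'Number of Tourists': np.tile([600, 150], len(dates)) / len(dates),
})

example_heatmap = example_dates.pivot_table(
    index='City', columns='Visit Date', values='Number of Tourists', aggfunc='sum'
)

# With an even split, each row has a single distinct value across all dates
identical = example_heatmap.nunique(axis=1).eq(1).all()
print(example_heatmap)
print('All days identical:', identical)
```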


## Boxplot for Tourist Numbers

A boxplot is a standardized way of displaying the distribution of data based on a five-number summary: 'minimum', first quartile (Q1), median, third quartile (Q3), and 'maximum'. By default, the whiskers reach only the most extreme points within 1.5 × the interquartile range (Q3 − Q1) of the box, and points beyond them are drawn individually as outliers. A boxplot therefore shows outliers and their values, how tightly the data are grouped, whether the data are symmetric, and whether and how they are skewed.

Let's create a boxplot of the number of tourists per destination in each city.

In [None]:
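plt.figure(figsize=(10, 8))
# Use the per-destination totals in df: the daily rows in df_dates are the same totals
# divided by three, which would only rescale the plot.
# A single color is used because passing palette without hue is deprecated in recent seaborn.
sns.boxplot(x='Number of Tourists 2023', y='City', data=df, color='skyblue')
plt.title('Boxplot of Number of Tourists per Destination in Each City')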
plt.show()
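The sketch below computes the quartiles and the default 1.5 × IQR outlier limits (Tukey's rule, the default `whis=1.5` in matplotlib and seaborn boxplots) on hypothetical visitor counts for one region:

```python
import numpy as np

# Hypothetical visitor counts for the destinations of one region
counts = np.array([120, 150, 180, 200, 230, 260, 2500])

# np.percentile uses linear interpolation between data points by default
q1, median, q3 = np.percentile(counts, [25, 50, 75])
iqr = q3 - q1

# Whiskers stop at the most extreme points inside these fences
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = counts[(counts < lower_fence) | (counts > upper_fence)]

print('Q1, median, Q3:', q1, median, q3)
print('Outliers:', outliers)
```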

## Creating a Geographical Map

To visualize the number of tourists in each city on a geographical map, we'll use the folium library, which creates interactive maps. The dataset has no coordinates, so we first define approximate center coordinates for each regency and city. The dataset's region names are in Indonesian and partly abbreviated (for example 'Kutim' for East Kutai and 'Mahulu' for Mahakam Ulu), so we map them to the English names used in the coordinates table and then merge the two tables once.

Please note that the map might not display correctly in this notebook due to rendering issues. In that case, you can view the map in a separate web browser.

In [None]:
# Approximate center coordinates of each regency/city.
# Balikpapan City and North Penajam Paser Regency share the same point here, so their markers overlap.
city_coordinates = pd.DataFrame({
    'City': ['Berau Regency', 'West Kutai Regency', 'Kutai Kartanegara Regency', 'East Kutai Regency',
             'Mahakam Ulu Regency', 'Paser Regency', 'North Penajam Paser Regency',
             'Balikpapan City', 'Bontang City', 'Samarinda City'],
    'Latitude': [2.167, -0.167, 0.25, 0.667, 1, -1.667, -1.25, -1.25, 0.167, -0.5],
    'Longitude': [117.583, 115.5, 116.833, 117, 114.5, 116.417, 116.833, 116.833, 117.5, 117.167]
})

# Map the region names used in the dataset to the names in city_coordinates
city_mapping = {
    'Kutai Kertanegara': 'Kutai Kartanegara Regency',
    'Balikpapan': 'Balikpapan City',
    'Samarinda': 'Samarinda City',
    'Kutai Barat': 'West Kutai Regency',
    'Paser': 'Paser Regency',
    'Penajam Paser Utara': 'North Penajam Paser Regency',
    'Bontang': 'Bontang City',
    'Berau': 'Berau Regency',
    'Mahulu': 'Mahakam Ulu Regency',
    'Kutim': 'East Kutai Regency'
}
df_dates['City'] = df_dates['City'].replace(city_mapping)

# Drop coordinates left over from an earlier run, so re-running this cell
# does not create duplicate Latitude_x/Latitude_y columns
df_dates = df_dates.drop(columns=['Latitude', 'Longitude'], errors='ignore')

# Merge once; how='left' keeps every row and leaves NaN where a name has no coordinates
df_dates = pd.merge(df_dates, city_coordinates, on='City', how='left')

# List any city names that still have no coordinates
print('Cities without coordinates:', df_dates.loc[df_dates['Latitude'].isna(), 'City'].unique())

df_dates.head()
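To find region names that fail to match the coordinates table, `pd.merge` also accepts `indicator=True`, which adds a `_merge` column marking rows found only in the left table. A sketch with hypothetical rows (the coordinates shown are taken from the table above):

```python
import pandas as pd

# Hypothetical tourist rows and a coordinates table missing one name
visits = pd.DataFrame({'City': ['Berau Regency', 'Samarinda City', 'Kutim']})
coords = pd.DataFrame({
    'City': ['Berau Regency', 'Samarinda City'],
    'Latitude': [2.167, -0.5],
    'Longitude': [117.583, 117.167],
})

# indicator=True adds '_merge' with values 'both', 'left_only' or 'right_only'
checked = pd.merge(visits, coords, on='City', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'City'].unique()
print('Names without coordinates:', list(unmatched))
```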

In [None]:
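import folium

# Create a map centered on East Kalimantan
m = folium.Map(location=[-0.50, 116.50], zoom_start=7)

# One marker per city: sum the daily rows so each popup shows the holiday total.
# groupby drops rows with missing coordinates, which folium cannot place.
city_points = df_dates.groupby(['City', 'Latitude', 'Longitude'], as_index=False)['Number of Tourists'].sum()

for _, row in city_points.iterrows():
    folium.Marker(location=[row['Latitude'], row['Longitude']],
                  popup=f"{row['City']}: {row['Number of Tourists']:,.0f} tourists",
                  icon=folium.Icon(color='blue')).add_to(m)

m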

## Creating a Radar Chart

A radar chart is a graphical method of displaying multivariate data in the form of a two-dimensional chart of three or more quantitative variables represented on axes starting from the same point. It's also known as a spider chart.

In our case, we can use a radar chart to compare the number of tourists in different cities. Each axis represents a city, and the distance from the center along that axis represents the number of tourists in that city.

In [None]:
import matplotlib.pyplot as plt
from math import pi

# Group the data by city and calculate the total number of tourists in each city
city_data = df_dates.groupby('City')['Number of Tourists'].sum().reset_index()

# Number of variables we're plotting (one axis per city)
num_vars = len(city_data['City'])

# Compute the angle of each axis (a circle is divided into 2*pi radians)
angles = [n / float(num_vars) * 2 * pi for n in range(num_vars)]
angles += angles[:1]  # repeat the first angle to close the shape

# Initialize the spider plot on its own figure
fig, ax = plt.subplots(figsize=(8, 8), subplot_kw={'polar': True})

# Draw one axis per city and add labels
ax.set_xticks(angles[:-1])
ax.set_xticklabels(city_data['City'], color='grey', size=8)

# Values per city, with the first value repeated to close the shape
values = city_data['Number of Tourists'].tolist()
values += values[:1]

# Scale the radial axis to the largest total so no value is clipped
ax.set_rlabel_position(0)
ax.set_ylim(0, max(values) * 1.1)

# Plot the outline and fill the area
ax.plot(angles, values, linewidth=1, linestyle='solid')
ax.fill(angles, values, 'b', alpha=0.1)

plt.show()
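The angle arithmetic is the core of the radar chart: n axes split the full circle of 2π radians into equal steps, and the first point is appended again so the outline returns to its start. A sketch with four hypothetical regions:

```python
from math import pi

# Hypothetical totals for four regions
values = [400, 250, 300, 150]
num_vars = len(values)

# One evenly spaced angle per region, starting at 0 radians
angles = [n / float(num_vars) * 2 * pi for n in range(num_vars)]

# Repeat the first point at the end so the plotted outline closes
angles_closed = angles + angles[:1]
values_closed = values + values[:1]

print([round(a, 3) for a in angles_closed])
```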


In [None]:
# The raw CSV contains no per-visit date column, so filter on the
# 'Visit Date' column created earlier in df_dates instead.
# Because each total was spread evenly over 23-25 April, this keeps every row.
holiday_period = df_dates[df_dates['Visit Date'].between('2023-04-23', '2023-04-25')]

# Total number of tourists per city over the holiday period
holiday_period.groupby('City')['Number of Tourists'].sum().sort_values(ascending=False)


In [None]:
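# Reload the raw CSV: the cells below use its original Indonesian column names
# (KABUPATEN / KOTA = regency/city, DESTINASI WISATA = tourist destination,
#  JUMLAH WISATAWAN LEBARAN 2023 = number of tourists during Lebaran 2023)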
df = pd.read_csv('https://raw.githubusercontent.com/sya2rawie/data-kunjungan/main/DATA%20LIBURAN%20LEBARAN%20Kaltim%202023%20fix.csv')

# Print the column names
print(df.columns)

In [None]:
# Display the entire dataframe
df

In [None]:
# Display the entire dataframe with pandas option to show all rows
pd.set_option('display.max_rows', None)
df


In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

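# Convert 'JUMLAH WISATAWAN LEBARAN 2023' to numeric by removing the thousands separators
df['JUMLAH WISATAWAN LEBARAN 2023'] = df['JUMLAH WISATAWAN LEBARAN 2023'].astype(str).str.replace(',', '', regex=False).astype(float)

# Pivot to one row per city and one column per destination. Each destination is normally
# listed under a single city, so most cells are empty and are left blank in the heatmap.
heatmap_data = df.pivot_table(values='JUMLAH WISATAWAN LEBARAN 2023', index='KABUPATEN / KOTA', columns='DESTINASI WISATA')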

# Draw the heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(heatmap_data, cmap='YlGnBu')
plt.title('Heatmap of Tourist Numbers during Lebaran 2023')
plt.show()

## Creating a Bar Chart

A bar chart presents categorical data as rectangular bars whose heights or lengths are proportional to the values they represent. The bars can be vertical or horizontal; a vertical bar chart is sometimes called a column chart.

We'll create a bar chart with pandas' plotting interface, which is built on matplotlib. Each bar shows the total number of tourists in one city.

In [None]:
import matplotlib.pyplot as plt

# Calculate the total number of tourists in each city
city_totals = df.groupby('KABUPATEN / KOTA')['JUMLAH WISATAWAN LEBARAN 2023'].sum()

# Create a bar chart
city_totals.plot(kind='bar', figsize=(10, 6))
plt.xlabel('City')
plt.ylabel('Total Number of Tourists')
plt.title('Total Number of Tourists in Each City')
plt.show()

In [None]:
# Convert 'JUMLAH WISATAWAN LEBARAN 2023' to numeric
df['JUMLAH WISATAWAN LEBARAN 2023'] = pd.to_numeric(df['JUMLAH WISATAWAN LEBARAN 2023'], errors='coerce')

# Calculate the total number of tourists in each city
city_totals = df.groupby('KABUPATEN / KOTA')['JUMLAH WISATAWAN LEBARAN 2023'].sum()

# Create a bar chart
city_totals.plot(kind='bar', figsize=(10, 6))
plt.xlabel('City')
plt.ylabel('Total Number of Tourists')
plt.title('Total Number of Tourists in Each City')
plt.show()

## Creating a Violin Plot

A violin plot shows the distribution of the data together with its estimated probability density. It combines a box plot with a kernel density plot that is mirrored on both sides, so the outline shows the shape of the distribution. In seaborn's default style, the white dot marks the median, the thick bar in the center shows the interquartile range, and the thin line extending from it reaches the upper and lower adjacent values (the most extreme points within 1.5 × IQR of the box). The width of the violin at each level shows how common visitor counts around that value are among the city's destinations.

We'll create a violin plot with the seaborn library to show the distribution of the number of tourists per destination in each city.

In [None]:
# Create a violin plot
plt.figure(figsize=(10, 6))
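# cut=0 ends each violin at the observed minimum and maximum, so the density is not drawn below zero tourists
sns.violinplot(x='KABUPATEN / KOTA', y='JUMLAH WISATAWAN LEBARAN 2023', data=df, cut=0)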
plt.xlabel('City')
plt.ylabel('Number of Tourists')
plt.title('Distribution of Number of Tourists in Each City')
plt.xticks(rotation=90)
plt.show()
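The width of a violin comes from a kernel density estimate: each observation contributes a small Gaussian "bump", and the bumps are averaged. A NumPy-only sketch of that idea, assuming Scott's bandwidth rule (seaborn's default bandwidth method, `'scott'`); it illustrates the technique on hypothetical counts rather than reproducing seaborn's exact rendering:

```python
import numpy as np


def gaussian_kde_scott(samples, grid):
    """Evaluate a 1-D Gaussian kernel density estimate on `grid`.

    Sketch only: bandwidth = sample std (ddof=1) * n ** (-1/5),
    which is Scott's rule in one dimension.
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, evaluated at every grid point
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    # Average the bumps so the density integrates to 1
    return kernels.mean(axis=1)


# Hypothetical visitor counts for one region's destinations
counts = np.array([120, 150, 180, 200, 230, 260, 2500])
grid = np.linspace(counts.min() - 3000, counts.max() + 3000, 4001)
density = gaussian_kde_scott(counts, grid)

# Riemann-sum check: the area under the density should be close to 1
area = density.sum() * (grid[1] - grid[0])
print(round(area, 4))
```

The grid extends below zero here on purpose, to show the full area of the estimate; this is the region that `cut=0` hides in the violin plot.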

## Creating a Stacked Bar Graph

A stacked bar graph uses bars to compare categories of data, with the added ability to break each bar down into the parts of a whole. Each bar represents a whole, and its segments represent the parts or categories that make up that whole, with a different color for each category.

We'll create a stacked bar graph with pandas' plotting interface, which is built on matplotlib. Each bar shows the total number of tourists in one city, broken down by destination.

In [None]:
# Create a pivot table with cities as rows, destinations as columns, and summed tourist numbers as values
stacked_bar_data = df.pivot_table(index='KABUPATEN / KOTA', columns='DESTINASI WISATA', values='JUMLAH WISATAWAN LEBARAN 2023', aggfunc='sum', fill_value=0)

# Create a stacked bar graph; the legend goes outside the axes because there are many destinations
ax = stacked_bar_data.plot(kind='bar', stacked=True, figsize=(10, 6))
ax.legend(bbox_to_anchor=(1.02, 1), loc='upper left', fontsize=6, ncol=2)
plt.xlabel('City')
plt.ylabel('Total Number of Tourists')
plt.title('Total Number of Tourists in Each City, Broken Down by Destination')
plt.show()
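Each segment of a stacked bar is one destination, so the full bar height is that city's total; summing (`aggfunc='sum'`) is therefore the natural aggregation for the pivot, whereas `pivot_table` averages by default. A sketch on hypothetical rows that checks the bar heights against a plain group-by total:

```python
import pandas as pd

# Hypothetical rows: each destination belongs to exactly one region
example = pd.DataFrame({
    'KABUPATEN / KOTA': ['Region A', 'Region A', 'Region B'],
    'DESTINASI WISATA': ['Beach A', 'Park B', 'Lake C'],
    'JUMLAH WISATAWAN LEBARAN 2023': [500, 300, 400],
})

stacked = example.pivot_table(
    index='KABUPATEN / KOTA', columns='DESTINASI WISATA',
    values='JUMLAH WISATAWAN LEBARAN 2023', aggfunc='sum', fill_value=0,
)

# The height of each stacked bar is the row sum, i.e. the region total
bar_heights = stacked.sum(axis=1)
region_totals = example.groupby('KABUPATEN / KOTA')['JUMLAH WISATAWAN LEBARAN 2023'].sum()
print((bar_heights == region_totals).all())
```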
