In [1]:
import pandas as pd
import numpy as np

<h1 align="center">Data Wrangling: Import and Cleaning (2016)</h1>

In [2]:
file_paths = {
    'jan': r'C:\Users\Ivan\Desktop\tourism_analysis\data\2016\januar_2016.xls',
    'feb': r'C:\Users\Ivan\Desktop\tourism_analysis\data\2016\februar_2016.xls',
    'mar': r'C:\Users\Ivan\Desktop\tourism_analysis\data\2016\mart_2016.xls',
    'apr': r'C:\Users\Ivan\Desktop\tourism_analysis\data\2016\april_2016.xls',
    'maj': r'C:\Users\Ivan\Desktop\tourism_analysis\data\2016\maj_2016.xls',
    'jun': r'C:\Users\Ivan\Desktop\tourism_analysis\data\2016\jun_2016.xls',
    'jul': r'C:\Users\Ivan\Desktop\tourism_analysis\data\2016\jul_2016.xls',
    'avg': r'C:\Users\Ivan\Desktop\tourism_analysis\data\2016\avgust_2016.xls',
    'sep': r'C:\Users\Ivan\Desktop\tourism_analysis\data\2016\septembar_2016.xls',
    'okt': r'C:\Users\Ivan\Desktop\tourism_analysis\data\2016\oktobar_2016.xls',
    'nov': r'C:\Users\Ivan\Desktop\tourism_analysis\data\2016\novembar_2016.xls',
    'dec': r'C:\Users\Ivan\Desktop\tourism_analysis\data\2016\decembar_2016.xls'
}
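
The twelve paths differ only in the file name, so the same dictionary could also be built from a base directory. The sketch below is one possible way to do that; `build_file_paths` and `MONTH_FILES` are hypothetical names introduced for illustration, and the `'data'` directory in the example call is an assumption.

```python
from pathlib import Path

# Month key -> file-name stem, copied from the `file_paths` dictionary.
MONTH_FILES = {
    'jan': 'januar', 'feb': 'februar', 'mar': 'mart', 'apr': 'april',
    'maj': 'maj', 'jun': 'jun', 'jul': 'jul', 'avg': 'avgust',
    'sep': 'septembar', 'okt': 'oktobar', 'nov': 'novembar', 'dec': 'decembar',
}

def build_file_paths(data_dir, year=2016):
    """Hypothetical helper: return {month_key: Path} for one year's monthly files."""
    year_dir = Path(data_dir) / str(year)
    return {key: year_dir / f'{stem}_{year}.xls' for key, stem in MONTH_FILES.items()}

# Example call with an assumed relative data directory.
example_paths = build_file_paths('data')
```

Keeping the base directory in one place makes the notebook easier to run on another machine, since only the argument to `build_file_paths` has to change.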

<h3>Table 1: Tourists by Accommodation Type</h3> <br>

The code defines a dictionary `file_paths` that maps each month to the path of its 2016 tourism data file. It then reads the first two data rows of `Sheet6` (tourist arrivals) from each file and cleans them by renaming columns, dropping unused columns, and standardizing the accommodation type labels. The cleaned data is stored in a dictionary `tourists_by_acc_dict`, where the key is the month abbreviation and the value is a pandas DataFrame containing the cleaned data for that month. Finally, starting from the January DataFrame, the code concatenates each of the remaining months and sums the number of tourists by accommodation type to create a summary DataFrame `tourists_by_acc_2016`. The same steps are then repeated for `Sheet7` (overnight stays), and the two results are combined into one table.
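
The label clean-up and the yearly sum can be tried on made-up data first. In the sketch below the raw labels and the numbers are assumptions for illustration (the original `Sheet6` labels are not shown in this notebook), and `normalize_accommodation` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def normalize_accommodation(df):
    """Map raw row labels to the two category names used in this notebook."""
    out = df.copy()
    labels = out['tip_smjestaja'].astype(str)
    # regex=False: treat the search strings as plain substrings.
    out.loc[labels.str.contains('Kolektivni', regex=False), 'tip_smjestaja'] = 'Kolektivni smještaj'
    out.loc[labels.str.contains('smještaj 2', regex=False), 'tip_smjestaja'] = 'Privatni smještaj'
    return out

# Made-up monthly frames standing in for the Sheet6 extracts.
monthly = {
    'jan': pd.DataFrame({'tip_smjestaja': ['Kolektivni smještaj 1)', 'Individualni smještaj 2)'],
                         'dolasci_turista': [100, 20]}),
    'feb': pd.DataFrame({'tip_smjestaja': ['Kolektivni smještaj 1)', 'Individualni smještaj 2)'],
                         'dolasci_turista': [150, 30]}),
}

# Clean every month, stack the rows once, then sum per accommodation type.
cleaned = [normalize_accommodation(df) for df in monthly.values()]
totals = (pd.concat(cleaned, ignore_index=True)
            .groupby('tip_smjestaja', as_index=False)['dolasci_turista']
            .sum())
```

Concatenating all months once and grouping a single time gives the same totals as adding one month per loop iteration, with fewer intermediate DataFrames.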

In [3]:
tourists_by_acc_dict = {}
for month, path in file_paths.items():
    tourists_by_acc_dict[f'{month}'] = pd.read_excel(path, sheet_name='Sheet6', header=4, nrows=2)
    
for month, df in tourists_by_acc_dict.items():
    df.rename(columns={
            df.columns[0]: 'tip_smjestaja',
            df.columns[1]: 'dolasci_turista'},
            inplace=True
    )
    df.drop(columns=[df.columns[2], df.columns[3]],
            inplace=True)
    filt = df['tip_smjestaja'].str.contains('Kolektivni')
    filt1 = df['tip_smjestaja'].str.contains('smještaj 2')
    df.loc[filt, 'tip_smjestaja'] = 'Kolektivni smještaj'
    df.loc[filt1, 'tip_smjestaja'] = 'Privatni smještaj'

In [4]:
df1 = tourists_by_acc_dict['jan'].copy()
for key in file_paths:
    if key != 'jan':
        df1 = pd.concat([df1, tourists_by_acc_dict[key]], ignore_index=True).groupby('tip_smjestaja', as_index=False).sum()

In [5]:
tourists_by_acc_2016 = df1.copy()
tourists_by_acc_2016.head()

,tip_smjestaja,dolasci_turista
0,Kolektivni smještaj,808788
1,Privatni smještaj,1005029


In [6]:
tourists_by_acc_dict = {}
for month, path in file_paths.items():
    tourists_by_acc_dict[f'{month}'] = pd.read_excel(path, sheet_name='Sheet7', header=4, nrows=2)
    
for month, df in tourists_by_acc_dict.items():
    df.rename(columns={
            df.columns[0]: 'tip_smjestaja',
            df.columns[1]: 'nocenje_turista'},
            inplace=True
    )
    df.drop(columns=[df.columns[2], df.columns[3]],
            inplace=True)
    filt = df['tip_smjestaja'].str.contains('Kolektivni')
    filt1 = df['tip_smjestaja'].str.contains('smještaj 2')
    df.loc[filt, 'tip_smjestaja'] = 'Kolektivni smještaj'
    df.loc[filt1, 'tip_smjestaja'] = 'Privatni smještaj'

In [7]:
df1 = tourists_by_acc_dict['jan'].copy()
for key in file_paths:
    if key != 'jan':
        df1 = pd.concat([df1, tourists_by_acc_dict[key]], ignore_index=True).groupby('tip_smjestaja', as_index=False).sum()

In [8]:
tourists_by_acc_2016 = pd.concat([tourists_by_acc_2016, df1], ignore_index=True).groupby('tip_smjestaja', as_index=False).sum()
tourists_by_acc_2016[['dolasci_turista', 'nocenje_turista']] = tourists_by_acc_2016[['dolasci_turista', 'nocenje_turista']].astype('int')

In [31]:
tourists_by_acc_2016.loc[tourists_by_acc_2016['tip_smjestaja'].str.contains('Privatni'), 'tip_smjestaja'] = 'Individualni turistički smještaj'

In [34]:
tourists_by_acc_2016

,tip_smjestaja,dolasci_turista,nocenje_turista
0,Kolektivni smještaj,808788,3521897
1,Individualni turistički smještaj,1005029,7728108


In [33]:
tourists_by_acc_2016.to_csv(fr'C:\Users\Ivan\Desktop\tourism_analysis\data\cleaned_data\final_data_accomodation\tourists_by_accomodation_2016.csv')

<h3>Table 2: Tourists by Municipality</h3><br>

The code reads `Sheet4` from each monthly Excel file and cleans it: it renames the columns, drops the first data row and the unused columns, fills missing values with 0, and replaces dashes with zeros. It then sums the monthly figures by municipality into a single DataFrame and saves it as a CSV file. The resulting DataFrame contains the number of foreign and domestic tourist arrivals and overnight stays for each municipality in Montenegro in 2016.
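
The placeholder handling can be checked on a small example before running it on the real sheets. The sketch below uses made-up numbers and a hypothetical helper `clean_counts`. It swaps in `pd.to_numeric(..., errors='coerce')` instead of `replace('-', 0)`, so any non-numeric text (not only a dash) becomes 0, which is an assumption about what is wanted.

```python
import pandas as pd

VALUE_COLS = ['dolasci_stranih_turista', 'dolasci_domacih_turista']

def clean_counts(df, value_cols):
    """Turn '-' placeholders and missing cells into 0 and make the columns integer."""
    out = df.copy()
    for col in value_cols:
        # errors='coerce' turns '-' (and any other non-numeric text) into NaN.
        out[col] = pd.to_numeric(out[col], errors='coerce').fillna(0).astype(int)
    return out

# Made-up monthly frames; the second one lists the municipalities in another order.
jan = pd.DataFrame({'opstina': ['Bar', 'Budva'],
                    'dolasci_stranih_turista': [10, '-'],
                    'dolasci_domacih_turista': [None, 5]})
feb = pd.DataFrame({'opstina': ['Budva', 'Bar'],
                    'dolasci_stranih_turista': [7, 3],
                    'dolasci_domacih_turista': [1, '-']})

# Stack the cleaned months and sum per municipality.
yearly = (pd.concat([clean_counts(jan, VALUE_COLS), clean_counts(feb, VALUE_COLS)],
                    ignore_index=True)
            .groupby('opstina', as_index=False)[VALUE_COLS]
            .sum())
```

Converting the columns to integers before summing avoids mixed string and number columns, which can otherwise make the grouped sum fail or give unexpected results.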

In [10]:
tourists_by_mun_dict = {}
for month, path in file_paths.items():
    tourists_by_mun_dict[f'{month}'] = pd.read_excel(path, sheet_name='Sheet4', header=3)

for month, df in tourists_by_mun_dict.items():
    df.rename(columns={
            df.columns[0]: 'opstina',
            df.columns[1]: 'dolasci_stranih_turista',
            df.columns[2]: 'dolasci_domacih_turista',
            df.columns[5]: 'nocenje_stranih_turista',
            df.columns[6]: 'nocenje_domacih_turista'},
            inplace=True
    )
    df.drop(labels=0, inplace=True)
    df.drop(columns=[df.columns[3], df.columns[4], df.columns[7], df.columns[8]],
            inplace=True)
    tourists_by_mun_dict[month] = tourists_by_mun_dict[month].fillna(0)
    tourists_by_mun_dict[month] = tourists_by_mun_dict[month].replace('-', 0)

In [11]:
df1 = tourists_by_mun_dict['jan'].copy()
for key in file_paths:
    if key != 'jan':
        df1 = pd.concat([df1, tourists_by_mun_dict[key]], ignore_index=True).groupby('opstina', as_index=False).sum()
        

In [12]:
tourists_by_municipality_2016 = df1.copy()
tourists_by_municipality_2016.head()

,opstina,dolasci_stranih_turista,dolasci_domacih_turista,nocenje_stranih_turista,nocenje_domacih_turista
0,Andrijevica,57,259,72,319
1,Bar,178426,10690,1487979,42611
2,Berane,1391,990,1806,1293
3,Bijelo Polje,4126,2749,8698,5052
4,Budva,741870,64601,4659848,345069


In [19]:
tourists_by_municipality_2016.to_csv(fr'C:\Users\Ivan\Desktop\tourism_analysis\data\cleaned_data\final_data_municipality\tourists_by_municipality_2016.csv', index=False)

<h3>Table 3: Tourists by Country</h3><br>
This code reads `Sheet5` from each monthly Excel file into a dictionary of DataFrames with tourist arrivals and overnight stays by country of origin. It cleans each DataFrame by renaming columns, dropping unused columns, removing the aggregate rows (`Strani turisti`, `Evropa`, `Vanevropske zemlje`), filling missing values with 0, and replacing dashes with zeros. The code then adds the monthly figures together into a single DataFrame for 2016, which is saved as a CSV file. The total number of tourist arrivals for 2016 is also displayed.
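
Adding monthly columns row by row gives correct totals only if every month lists the same countries in the same order. A variant that matches rows by country name is sketched below. The country rows and numbers are made up for illustration, and whether the source sheets really differ in row order is not known here.

```python
import pandas as pd

# Made-up monthly frames; February lists the same countries in a different order.
jan = pd.DataFrame({'zemlja_porijekla': ['Srbija', 'Rusija'],
                    'dolasci': [10, 4],
                    'nocenja': [30, 20]})
feb = pd.DataFrame({'zemlja_porijekla': ['Rusija', 'Srbija'],
                    'dolasci': [6, 1],
                    'nocenja': [40, 2]})

# Stack the months and sum per country, so row position no longer matters.
by_country = (pd.concat([jan, feb], ignore_index=True)
                .groupby('zemlja_porijekla', as_index=False)[['dolasci', 'nocenja']]
                .sum())
```

Grouping by `zemlja_porijekla` also keeps a country that appears in only some months, whereas position-based addition would produce `NaN` or misaligned sums in that case.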

In [13]:
tourists_by_country_dict = {}
for month, path in file_paths.items():
    tourists_by_country_dict[f'{month}'] = pd.read_excel(path, sheet_name='Sheet5', header=1)

for month, df in tourists_by_country_dict.items():
    df.rename(columns={
        df.columns[0]: 'zemlja_porijekla', 
        df.columns[1]: 'dolasci',
        df.columns[3]: 'nocenja'},
        inplace=True)
    
    df.drop(columns=[df.columns[2], df.columns[4]],
            inplace=True)
    
    filt = (~df['zemlja_porijekla'].isin(['Strani turisti', 'Evropa', 'Vanevropske zemlje']))
    tourists_by_country_dict[month] = df[filt]
    tourists_by_country_dict[month] = tourists_by_country_dict[month].reset_index(drop=True)
    tourists_by_country_dict[month] = tourists_by_country_dict[month].fillna(0)
    tourists_by_country_dict[month] = tourists_by_country_dict[month].replace('-', 0)

In [14]:
df1 = tourists_by_country_dict['jan'].copy()
for key in file_paths:
    if key != 'jan':
        df1['dolasci'] = df1['dolasci'] + tourists_by_country_dict[key]['dolasci']
        df1['nocenja'] = df1['nocenja'] + tourists_by_country_dict[key]['nocenja']

In [20]:
tourists_by_country_2016 = df1.copy()
tourists_by_country_2016['dolasci'].sum()

1662121

In [17]:
tourists_by_country_2016.to_csv(fr'C:/Users/Ivan/Desktop/tourism_analysis/data/cleaned_data/final_data_country/tourists_by_country_2016.csv')