### Sustainable Tourism Project
For this project we were asked to deliver a dataset, in JSON format, covering 4 of the most popular tourist attractions in Barcelona, each linked to its 5 closest points of cultural interest. We used an Open Data BCN dataset as reference: [Punts d'interès cultural de la ciutat de Barcelona](https://opendata-ajuntament.barcelona.cat/data/ca/dataset/punts-informacio-turistica).
The data was cleaned and processed in an Excel file, where pictures and descriptions of the sites were added and irrelevant columns were removed. The xlsx file was saved to Google Drive, then loaded into this notebook as a pandas DataFrame for further processing.


In [1]:

import pandas as pd

# Mount Google Drive to access files
from google.colab import drive
drive.mount('/content/drive')

# Specify the path to the Excel file
xlsx_file_path = '/content/drive/MyDrive/AA DS Hackathon/Data/dfTop15.xlsx'

# Read the Excel file into a DataFrame
df = pd.read_excel(xlsx_file_path)

# Print the DataFrame to verify the data
print(df)

Mounted at /content/drive
      register_id                                               name  \
0    ﻿92086011949                                         Park Güell   
1    ﻿92086004491  Temple Expiatori de la Sagrada Família - Basílica   
2    ﻿75990352709              L'Aquàrium de Barcelona - Aspro Parks   
3       ﻿61171409                                   Zoo de Barcelona   
4    ﻿92086009153                              Telefèric de Montjuïc   
..            ...                                                ...   
866  ﻿99400477467                                    Villa Esperanza   
867  ﻿75990539136                                 World Trade Center   
868  ﻿99400390600                       Xemeneia de Can Galta Cremat   
869  ﻿99400477410        Xemeneia de l'antiga Fábrica "La Porcelana"   
870  ﻿99400119148                                      Zona de Banys   

                                               Picture  \
0    https://www.cataloniahotels.com/es/blog/wp-con

In [2]:
df

| | register_id | name | Picture | Description | Top 15 | Visitantes | lat | lng |
|---|---|---|---|---|---|---|---|---|
| 0 | 92086011949 | Park Güell | https://www.cataloniahotels.com/es/blog/wp-con... | El parque Güell es un parque público con jard... | SI | 1506642.0 | 41.413509 | 2.153127 |
| 1 | 92086004491 | Temple Expiatori de la Sagrada Família - Basílica | https://images.photowall.com/products/63181/fa... | El Templo Expiatorio de la Sagrada Familia, co... | SI | 1011241.0 | 41.403953 | 2.173926 |
| 2 | 75990352709 | L'Aquàrium de Barcelona - Aspro Parks | https://estaticos-cdn.elperiodico.com/clip/0b6... | L'Aquàrium de Barcelona, situado en el Puerto ... | SI | 692754.0 | 41.376805 | 2.184384 |
| 3 | 61171409 | Zoo de Barcelona | https://www.turismoencatalunya.es/images/3-ele... | El Parque Zoológico de Barcelona es un zoológi... | SI | 621137.0 | 41.386154 | 2.187187 |
| 4 | 92086009153 | Telefèric de Montjuïc | | | SI | 548986.0 | 41.368851 | 2.163051 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 866 | 99400477467 | Villa Esperanza | | | NO | | 41.413632 | 2.146376 |
| 867 | 75990539136 | World Trade Center | | | NO | | 41.371091 | 2.182364 |
| 868 | 99400390600 | Xemeneia de Can Galta Cremat | | | NO | | 41.440636 | 2.187715 |
| 869 | 99400477410 | Xemeneia de l'antiga Fábrica "La Porcelana" | | | NO | | 41.373027 | 2.144526 |


The dataset has the following columns:

* register_id: unique identifier of each record.
* name: name of the attraction or site.
* Picture: URL of an image of the attraction, where one was added.
* Description: short description of the attraction, where one was added.
* Top 15: SI if the attraction is among the top 15 attractions, NO otherwise.
* Visitantes: number of visitors to the attraction.
* lat: latitude of the attraction's location.
* lng: longitude of the attraction's location.
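A few quick checks can confirm that the columns behave as described. The following is a minimal sketch, not part of the original pipeline: it uses a handful of rows copied from the table above as a stand-in for the full DataFrame, and the names `sample`, `top_count`, `missing_visitors` and `coords_ok` are our own.

```python
import pandas as pd

# Rows copied from the table above; a stand-in for the full DataFrame.
sample = pd.DataFrame(
    {
        "name": ["Park Güell", "Zoo de Barcelona", "Telefèric de Montjuïc", "Villa Esperanza"],
        "Top 15": ["SI", "SI", "SI", "NO"],
        "Visitantes": [1506642.0, 621137.0, 548986.0, None],
        "lat": [41.413509, 41.386154, 41.368851, 41.413632],
        "lng": [2.153127, 2.187187, 2.163051, 2.146376],
    }
)

# How many rows are flagged as top attractions?
top_count = int((sample["Top 15"] == "SI").sum())

# How many rows have no visitor count?
missing_visitors = int(sample["Visitantes"].isna().sum())

# Are all coordinates within valid latitude/longitude ranges?
coords_ok = bool(
    sample["lat"].between(-90, 90).all() and sample["lng"].between(-180, 180).all()
)

print(top_count, missing_visitors, coords_ok)
```

Running the same three checks on the full `df` would show how many rows carry a visitor count and whether any coordinates need fixing before distances are computed.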

The register_id column is not relevant for the deliverable, so it will be dropped.







In [3]:
df = df.drop("register_id", axis=1)


In [4]:
df

| | name | Picture | Description | Top 15 | Visitantes | lat | lng |
|---|---|---|---|---|---|---|---|
| 0 | Park Güell | https://www.cataloniahotels.com/es/blog/wp-con... | El parque Güell es un parque público con jard... | SI | 1506642.0 | 41.413509 | 2.153127 |
| 1 | Temple Expiatori de la Sagrada Família - Basílica | https://images.photowall.com/products/63181/fa... | El Templo Expiatorio de la Sagrada Familia, co... | SI | 1011241.0 | 41.403953 | 2.173926 |
| 2 | L'Aquàrium de Barcelona - Aspro Parks | https://estaticos-cdn.elperiodico.com/clip/0b6... | L'Aquàrium de Barcelona, situado en el Puerto ... | SI | 692754.0 | 41.376805 | 2.184384 |
| 3 | Zoo de Barcelona | https://www.turismoencatalunya.es/images/3-ele... | El Parque Zoológico de Barcelona es un zoológi... | SI | 621137.0 | 41.386154 | 2.187187 |
| 4 | Telefèric de Montjuïc | | | SI | 548986.0 | 41.368851 | 2.163051 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 866 | Villa Esperanza | | | NO | | 41.413632 | 2.146376 |
| 867 | World Trade Center | | | NO | | 41.371091 | 2.182364 |
| 868 | Xemeneia de Can Galta Cremat | | | NO | | 41.440636 | 2.187715 |
| 869 | Xemeneia de l'antiga Fábrica "La Porcelana" | | | NO | | 41.373027 | 2.144526 |


The final step filters the DataFrame for each of the four selected attractions (matching by name and requiring 'SI' in the Top 15 column), calculates the distance from that attraction to the sites in the dataset, selects the 5 closest ones, converts the result to JSON, and saves a separate JSON file for each attraction.

In [5]:
from geopy.distance import geodesic

attractions = ["Sagrada Família", "L'Aquàrium", "Park Güell", "Zoo"]

for attraction in attractions:
    # Filter for Top 15 attractions whose name contains the search term (literal match, not a regex)
    filtered_attractions = df[df["name"].str.contains(attraction, regex=False) & (df["Top 15"] == "SI")]

    if not filtered_attractions.empty:
        origin = filtered_attractions.iloc[0]
        origin_coords = (origin["lat"], origin["lng"])

        # Compute the geodesic distance (km) to every other site, excluding the
        # attraction itself so that it is not returned as its own nearest neighbour
        others = df.drop(index=origin.name).copy()
        others["distance"] = others.apply(
            lambda row: geodesic(origin_coords, (row["lat"], row["lng"])).km, axis=1
        )
        closest_attractions = others.nsmallest(5, "distance")

        # Convert the 5 closest sites to JSON
        json_data = closest_attractions.to_json(orient="records")

        # Save the JSON data to a file
        file_path = f"/content/drive/MyDrive/AA DS Hackathon/Data/closest_{attraction.lower()}_attractions.json"
        with open(file_path, "w") as json_file:
            json_file.write(json_data)

        print(f"JSON file for {attraction} saved successfully.")
    else:
        print(f"No attractions found with {attraction} in the 'name' column and 'SI' in the 'Top 15' column.")

JSON file for Sagrada Família saved successfully.
JSON file for L'Aquàrium saved successfully.
JSON file for Park Güell saved successfully.
JSON file for Zoo saved successfully.
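The nearest-neighbour search in the loop above can also be sketched as a small reusable function. This is an illustration under stated assumptions, not the notebook's method: it swaps geopy's ellipsoidal `geodesic` for the spherical haversine formula (which can differ by a fraction of a percent), matches the attraction by exact name, and excludes the attraction from its own neighbour list. The names `haversine_km`, `closest_points` and `sites` are hypothetical, and the coordinates are copied from the table shown earlier.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km; lat2/lng2 may be whole pandas columns."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def closest_points(sites, name, n=5):
    """Return the n sites nearest to the site called `name`, excluding itself."""
    origin = sites.loc[sites["name"] == name].iloc[0]
    others = sites.loc[sites["name"] != name].copy()
    others["distance"] = haversine_km(
        origin["lat"], origin["lng"], others["lat"], others["lng"]
    )
    return others.nsmallest(n, "distance")


# Sample rows copied from the DataFrame shown earlier.
sites = pd.DataFrame(
    {
        "name": [
            "Park Güell",
            "Temple Expiatori de la Sagrada Família - Basílica",
            "L'Aquàrium de Barcelona - Aspro Parks",
            "Zoo de Barcelona",
            "Telefèric de Montjuïc",
            "Villa Esperanza",
        ],
        "lat": [41.413509, 41.403953, 41.376805, 41.386154, 41.368851, 41.413632],
        "lng": [2.153127, 2.173926, 2.184384, 2.187187, 2.163051, 2.146376],
    }
)

nearest = closest_points(sites, "Park Güell", n=2)
print(nearest[["name", "distance"]])
```

Because the formula is applied to whole columns at once, this avoids the row-by-row `apply` call, which may matter on datasets much larger than this one.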


Once we have the 4 requested JSON files, we are ready to hand them over to the back end.
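Before the handover, it may be worth checking that a file round-trips cleanly. The sketch below is hypothetical: it writes a small records-oriented JSON file to a temporary directory and reads it back the way a consumer might. The `load_closest` helper, the `EXPECTED_KEYS` set and the file name are our assumptions, and the distance values are placeholders rather than computed results.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Assumed minimum set of fields a consumer would rely on.
EXPECTED_KEYS = {"name", "lat", "lng", "distance"}


def load_closest(path):
    """Load a records-oriented JSON file and check each record's fields."""
    with open(path, encoding="utf-8") as fh:
        records = json.load(fh)
    for record in records:
        missing = EXPECTED_KEYS - record.keys()
        if missing:
            raise ValueError(f"Record {record.get('name')!r} lacks {sorted(missing)}")
    return records


sample = pd.DataFrame(
    {
        "name": ["Villa Esperanza", "Temple Expiatori de la Sagrada Família - Basílica"],
        "lat": [41.413632, 41.403953],
        "lng": [2.146376, 2.173926],
        "distance": [0.0, 0.0],  # placeholders, not computed distances
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "closest_sample_attractions.json"
    # Same serialisation call as in the notebook: one JSON object per row.
    path.write_text(sample.to_json(orient="records"), encoding="utf-8")
    records = load_closest(path)

print(len(records), records[1]["name"])
```

By default `to_json` escapes non-ASCII characters, so accented names such as "Família" are stored as `\u` escapes and restored when the file is parsed.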