<h1 style="color:Blue; font-size:40px;">Data Science in Action</h1>

In this final phase, you will bring everything together by integrating your preprocessed dataset and transforming it into **meaningful insights or tools**. This is
your opportunity to be creative: choose techniques you find most appropriate and
develop something that would **genuinely help people compare European cities
and decide where to live**.

Your work will be evaluated not only on technical execution but also on the
**relevance, originality, and clarity** of your analysis or tool. Higher levels of
complexity, critical thinking, and real-world applicability will be rewarded.

## **Group Members**

- Diogo Gonçalves - 20241817
- Gustavo Franco - 20241816
- João Marques - 20241771
- Juan Mendes - 20241804

## **Objectives**

You are **free to choose your own direction**, but here are some possible starting points:

- **Recommendation System:**  
  Allow users to specify the characteristics they care about (e.g., cost of living, salary, safety, education opportunities) and suggest the cities that best match their profile.

- **Interactive Dashboard:**  
  Combine your dataset with additional external sources (e.g., education, healthcare, transportation) and design a dashboard that allows users to visually explore and compare cities.

- **Comparative Analysis:**  
  Perform a deep dive into one or more dimensions (such as cost of living vs. salary, unemployment vs. GDP, population demographics, etc.) and present clear visualizations and interpretations that highlight meaningful trade-offs.
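The Recommendation System idea can be prototyped with a simple weighted score: scale each criterion to a common range, flip the ones where lower is better (such as cost of living), and rank cities by the weighted sum. The sketch below is one possible approach, not a prescribed method. It assumes the column names used in the dashboard cell ("Average Monthly Salary", "Average Cost of Living"). `recommend_cities` is a hypothetical helper, min-max normalisation is just one common choice, and the example rows are made up.

```python
import pandas as pd


def recommend_cities(df, weights, higher_is_better, top_n=5):
    """Rank rows of `df` by a weighted sum of min-max normalised columns.

    weights: {column: positive importance}
    higher_is_better: {column: bool}; columns not listed default to True
    """
    scores = pd.Series(0.0, index=df.index)
    for col, weight in weights.items():
        values = df[col].astype(float)
        span = values.max() - values.min()
        if pd.isna(span) or span == 0:
            continue  # an empty or constant column cannot separate cities
        norm = (values - values.min()) / span  # scale to [0, 1]
        if not higher_is_better.get(col, True):
            norm = 1.0 - norm  # e.g. a lower cost of living scores higher
        scores += weight * norm.fillna(0.0)  # missing values score 0
    ranked = df.assign(Score=scores / sum(weights.values()))
    return ranked.sort_values("Score", ascending=False).head(top_n)


# Made-up example data, not real city figures
cities = pd.DataFrame({
    "City": ["A", "B", "C"],
    "Average Monthly Salary": [3000, 2000, 2800],
    "Average Cost of Living": [2000, 1000, 1500],
})
top = recommend_cities(
    cities,
    weights={"Average Monthly Salary": 2, "Average Cost of Living": 1},
    higher_is_better={"Average Cost of Living": False},
    top_n=2,
)
print(top[["City", "Score"]])
```

Here salary counts twice as much as cost. City C ranks first because it combines a high salary with a mid-range cost. The user-facing part of a recommender would mainly collect the `weights` dictionary.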


## **Install libraries**

```python
# %pip installs into the environment of the running kernel
%pip install dash plotly pandas
```

Collecting dash
  Downloading dash-3.3.0-py3-none-any.whl.metadata (11 kB)
Collecting retrying (from dash)
  Downloading retrying-1.4.2-py3-none-any.whl.metadata (5.5 kB)
Downloading dash-3.3.0-py3-none-any.whl (7.9 MB)

## **Import Data**

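```python
import pandas as pd

# Load the preprocessed city dataset; the first CSV column is used as the index
data_copy = pd.read_csv("city_data_copy.csv", index_col=0)
```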
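Before building on `data_copy`, it can help to confirm that the CSV actually contains the columns the dashboard cell relies on and that the numeric ones were parsed as numbers. The following is a minimal sketch. `check_city_data` is a hypothetical helper, the column list is copied from the comment in the dashboard cell, and the toy frame uses placeholder values.

```python
import pandas as pd

# Column list copied from the comment in the dashboard cell
REQUIRED_COLUMNS = [
    "City", "Country", "Average Monthly Salary", "Average Cost of Living",
    "Population", "Lat_dd", "Lon_dd",
]
NUMERIC_COLUMNS = ["Average Monthly Salary", "Average Cost of Living", "Population"]


def check_city_data(df):
    """Return (missing columns, numeric columns that were read as non-numeric)."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    not_numeric = [
        c for c in NUMERIC_COLUMNS
        if c in df.columns and not pd.api.types.is_numeric_dtype(df[c])
    ]
    return missing, not_numeric


# Toy frame with placeholder values: salary stored as text, several columns absent
sample = pd.DataFrame({
    "City": ["Example City"],
    "Country": ["Example Country"],
    "Average Monthly Salary": ["1500"],
})
missing, not_numeric = check_city_data(sample)
print("Missing:", missing)
print("Not numeric:", not_numeric)
```

In the notebook, `check_city_data(data_copy)` should return two empty lists before the dashboard is launched.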

## **Web Scraping**

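```python
def safety_index(city):
    """Return a safety index for `city` scraped from an external source.

    Placeholder: the scraping logic (browser/session setup and page
    parsing) has not been written yet.
    """
    raise NotImplementedError("safety_index scraping is not implemented yet")
```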
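Once `safety_index` (or any other external lookup) returns a number per city, the results still have to be joined back onto `data_copy`. The following is a hedged sketch that assumes the scraper takes a city name and returns a number. `add_external_column` and the "Safety Index" column name are hypothetical, and the scores in the usage lines are made up. Each city is looked up once. Lookup errors such as network failures (`OSError`), bad values (`ValueError`) and missing keys (`LookupError`) become missing values. `NotImplementedError` is deliberately not caught, so an unfinished scraper still fails loudly.

```python
import pandas as pd


def add_external_column(df, fetch, column="Safety Index"):
    """Return a copy of `df` with one externally fetched value per city.

    fetch: any callable city_name -> number (or None if unavailable),
    e.g. a finished safety_index scraper.
    """
    cache = {}
    for city in df["City"].dropna().unique():
        try:
            cache[city] = fetch(city)
        except (OSError, ValueError, LookupError):
            cache[city] = None  # keep going if one lookup fails
    out = df.copy()
    # Cities without a usable value end up as NaN
    out[column] = pd.to_numeric(out["City"].map(cache), errors="coerce")
    return out


# Made-up scores standing in for scraped values
fake_scores = {"City A": 70.5, "City B": 55.0}
demo = pd.DataFrame({"City": ["City A", "City B", "City C"]})
with_safety = add_external_column(demo, lambda c: fake_scores[c])
print(with_safety)
```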

## **Interactive dashboard**

```python
# Import libraries
import dash
from dash import Dash, dcc, html, Input, Output
import plotly.express as px
import pandas as pd

# Assume your dataset is already loaded as 'data_copy'
# For demonstration, ensure 'data_copy' has the following columns:
# "City", "Country", "Average Monthly Salary", "Average Cost of Living", "Population", "Lat_dd", "Lon_dd"

# Initialize the app
app = Dash(__name__)

# App layout
app.layout = html.Div([
    html.H1("City Comparison Dashboard"),

    html.Div([
        html.Label("Select Country:"),
        dcc.Dropdown(
            id='country-dropdown',
            options=[{'label': c, 'value': c} for c in sorted(data_copy["Country"].dropna().unique())],
            multi=True,
            placeholder="Select one or more countries"
        ),
    ], style={'width': '30%', 'display': 'inline-block', 'verticalAlign': 'top'}),

    html.Div([
        html.Label("Minimum Salary (€):"),
        dcc.Slider(
            id='salary-slider',
            min=int(data_copy["Average Monthly Salary"].min() or 0),
            max=int(data_copy["Average Monthly Salary"].max() or 5000),
            step=100,
            value=2000,
            marks={0: '0', 2000: '2000', 4000: '4000', 6000: '6000'}
        ),
    ], style={'width': '60%', 'padding': '0px 20px 20px 20px'}),

    html.Div([
        html.Label("Maximum Cost of Living (€):"),
        dcc.Slider(
            id='cost-slider',
            min=int(data_copy["Average Cost of Living"].min() or 0),
            max=int(data_copy["Average Cost of Living"].max() or 5000),
            step=50,
            value=1600,
            marks={0: '0', 1000: '1000', 2000: '2000', 3000: '3000'}
        ),
    ], style={'width': '60%', 'padding': '0px 20px 20px 20px'}),

    dcc.Graph(id='city-scatter')
])

# Callback to update chart based on filters
@app.callback(
    Output('city-scatter', 'figure'),
    Input('country-dropdown', 'value'),
    Input('salary-slider', 'value'),
    Input('cost-slider', 'value')
)
def update_scatter(selected_countries, min_salary, max_cost):
    # Filter data based on user input
    df = data_copy.copy()

    if selected_countries:
        df = df[df["Country"].isin(selected_countries)]
    df = df[(df["Average Monthly Salary"] >= min_salary) & 
            (df["Average Cost of Living"] <= max_cost)]

    # Create scatter plot
    fig = px.scatter(
        df,
        x="Average Cost of Living",
        y="Average Monthly Salary",
        color="Country",
        size="Population",
        hover_name="City",
        hover_data=["Country", "Population", "Average Monthly Salary", "Average Cost of Living"],
        title="Cities: Salary vs Cost of Living"
    )
    fig.update_layout(transition_duration=500)
    return fig

# Run the app
if __name__ == '__main__':
    app.run(debug=True)
```
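The filtering in `update_scatter` is ordinary pandas, so it can be moved into a plain function and checked without launching the Dash server. The sketch below shows that refactor. `filter_cities` is a hypothetical name, the filters mirror the callback exactly, and the demo rows are made up.

```python
import pandas as pd


def filter_cities(df, selected_countries, min_salary, max_cost):
    """Apply the same filters as update_scatter, with no Dash dependency."""
    out = df
    if selected_countries:  # None or [] keeps every country
        out = out[out["Country"].isin(selected_countries)]
    return out[
        (out["Average Monthly Salary"] >= min_salary)
        & (out["Average Cost of Living"] <= max_cost)
    ]


# Made-up rows for a quick check
demo = pd.DataFrame({
    "City": ["A", "B", "C"],
    "Country": ["X", "X", "Y"],
    "Average Monthly Salary": [2500, 1800, 3000],
    "Average Cost of Living": [1500, 1200, 1700],
})
only_x = filter_cities(demo, ["X"], 2000, 1600)["City"].tolist()
all_countries = filter_cities(demo, None, 2000, 1800)["City"].tolist()
print(only_x, all_countries)
```

Inside the callback, `df = filter_cities(data_copy, selected_countries, min_salary, max_cost)` could then replace the inline filtering, leaving the callback responsible only for building the figure.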

