# PIPELINE 3

This script is the final step of my project: it displays insights from the collected data and verifies that the previous pipelines work. It creates three distinct dashboards, each built with Dash, a Python library for analytical web applications. Each dashboard displays aggregated data from bike-sharing stations in a different graphical format: a bar chart, a pie chart, or a line chart.

Each dashboard is hosted on its own Dash server and port (8095, 9096, and 8099), which avoids conflicts and makes every dashboard reachable from a browser. Dash callbacks, triggered by Interval components every 2 seconds, run MongoDB aggregation pipelines to refresh the charts with the latest data; the results are converted into pandas DataFrames and plotted with Plotly Express.

Each script begins by establishing a connection with MongoDB through MongoClient, specifically accessing the sensor_data database. From there, it interfaces with different collections: sensor_data for the bar and pie charts, and aggregated_hourly_departures_per_station for the line chart.
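
The cells below write the connection string inline. As a minimal sketch of an alternative, the URI could be assembled from environment variables. The variable names (MONGO_USER, MONGO_PASSWORD, MONGO_HOST, MONGO_PORT) and the helper build_mongo_uri are hypothetical and not part of the project; the defaults simply mirror the values used in this notebook.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(env=None):
    """Assemble a MongoDB URI from environment-style settings.

    The variable names are hypothetical; the defaults mirror the
    values that appear in this notebook.
    """
    env = os.environ if env is None else env
    # Percent-escape the credentials so characters such as '@' or '/' stay valid.
    user = quote_plus(env.get("MONGO_USER", "mongoadmin"))
    password = quote_plus(env.get("MONGO_PASSWORD", "secret"))
    host = env.get("MONGO_HOST", "localhost")
    port = env.get("MONGO_PORT", "27017")
    return f"mongodb://{user}:{password}@{host}:{port}/"


mongo_uri = build_mongo_uri()
```

The resulting string can be passed to MongoClient exactly like the inline one, which keeps credentials out of the notebook when the variables are set.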

In [1]:
from dash import Dash, dcc, html
from dash.dependencies import Input, Output
import plotly.express as px
from pymongo import MongoClient
import pandas as pd

## Plotly Bar chart

The code below connects to MongoDB, processes data from the sensor_data collection, and creates a bar chart that updates every 2 seconds.

The chart shows the top 20 stations by change in the number of available bicycles. A MongoDB aggregation pipeline takes the 150 most recent records of each station, compares the newest reading with the oldest one, and orders the stations by the size of that change, giving a clear view of the fluctuations in bicycle availability.

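**In the resulting chart, bars pointing upwards indicate a net increase in available bicycles (bicycles arriving at the station), while bars pointing downwards indicate a net decrease (bicycles leaving the station), because the plotted value is the most recent reading minus the oldest one.**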
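
To make the ranking logic concrete, here is a pure-Python sketch of roughly what the aggregation pipeline computes. It is an illustration, not the code the dashboard runs: the function name and the flat record keys (station_id, timestamp, total_bikes_available) are assumptions chosen for the example, and the sample readings are invented.

```python
from collections import defaultdict


def top_stations_by_change(readings, window=150, top_n=20):
    """Rank stations by the absolute change in available bikes.

    `readings` is an iterable of dicts with the hypothetical flat keys
    'station_id', 'timestamp' and 'total_bikes_available'.
    """
    by_station = defaultdict(list)
    for r in readings:
        by_station[r["station_id"]].append(r)

    changes = []
    for station_id, rows in by_station.items():
        # Keep the `window` most recent readings, newest first.
        recent = sorted(rows, key=lambda r: r["timestamp"], reverse=True)[:window]
        newest, oldest = recent[0], recent[-1]
        diff = newest["total_bikes_available"] - oldest["total_bikes_available"]
        changes.append({"station_id": station_id, "difference": diff})

    # Largest swings first, regardless of direction.
    changes.sort(key=lambda c: abs(c["difference"]), reverse=True)
    return changes[:top_n]


# Invented sample readings: station A loses bikes, station B gains some.
sample = [
    {"station_id": "A", "timestamp": 1, "total_bikes_available": 10},
    {"station_id": "A", "timestamp": 2, "total_bikes_available": 4},
    {"station_id": "B", "timestamp": 1, "total_bikes_available": 3},
    {"station_id": "B", "timestamp": 2, "total_bikes_available": 5},
]
ranking = top_stations_by_change(sample)
```

In this sketch a positive difference means the newest reading is higher than the oldest one, and a negative difference means it is lower.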

In [2]:
# Connect to MongoDB
mongo_uri = "mongodb://mongoadmin:secret@localhost:27017/"
client = MongoClient(mongo_uri)
db = client['sensor_data']
collection_sensor_data = db['sensor_data']

# Initialize the Dash app
app1 = Dash(__name__)

# App layout with a Graph component for the bar chart
app1.layout = html.Div([
    dcc.Graph(id='live-update-graph', style={'height': '500px'}),
    dcc.Interval(
        id='interval-component-live',
        interval=2000,  # Refresh every 2 seconds
        n_intervals=0
    )
])

# Callback that updates the bar chart
@app1.callback(Output('live-update-graph', 'figure'),
              [Input('interval-component-live', 'n_intervals')])
def update_graph_live(n):
    pipeline = [
        {"$sort": {"timestamp": -1}},
        {"$group": {
            "_id": "$metadata.station_id",
            "records": {"$push": "$$ROOT"}
        }},
        {"$project": {
            "last_records": {"$slice": ["$records", 150]}
        }},
        {"$addFields": {
            "first": {"$arrayElemAt": ["$last_records.total_bikes_available", -1]},
            "last": {"$arrayElemAt": ["$last_records.total_bikes_available", 0]},
            "station_name": {"$arrayElemAt": ["$last_records.metadata.name", 0]}
        }},
        {"$match": {"station_name": {"$type": "string", "$ne": ""}}},  # Exclude stations whose name is missing, not a string, or empty
        {"$addFields": {
            "difference": {"$subtract": ["$last", "$first"]},
            "absolute_difference": {"$abs": {"$subtract": ["$last", "$first"]}}
        }},
        {"$sort": {"absolute_difference": -1}},
        {"$limit": 20}
    ]
    aggregated_data = collection_sensor_data.aggregate(pipeline)

    data = []
    for doc in aggregated_data:
        # Shift each value one unit away from zero; a net change of 0 is therefore plotted as 1
        adjusted_difference = doc['difference'] + 1 if doc['difference'] >= 0 else doc['difference'] - 1
        data.append({
            'Station Name': doc['station_name'],
            'Adjusted Bikes Available Change': adjusted_difference,
            'First Record Timestamp': doc['last_records'][-1]['timestamp'],
            'Last Record Timestamp': doc['last_records'][0]['timestamp']
        })

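    df = pd.DataFrame(data)
    if df.empty:
        # No station matched: return an empty figure instead of failing on the sort below
        return px.bar(title="No data available")
    df = df.sort_values('Station Name')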

    fig = px.bar(df, x='Station Name', y='Adjusted Bikes Available Change',
                 title="Top 20 Stations by Activity (Near-Real-Time Values)",
                 labels={"Adjusted Bikes Available Change": "Bikes Available Change"},
                 color='Adjusted Bikes Available Change',
                 hover_data=['First Record Timestamp', 'Last Record Timestamp'],
                 color_continuous_scale=px.colors.sequential.Viridis,
                 range_y=[-10, 10])

    fig.update_layout(xaxis_tickangle=-45, xaxis_tickfont=dict(size=10))

    return fig

# Start the server
if __name__ == '__main__':
    app1.run_server(debug=True, port=8095)  # Port changed to avoid conflicts
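
A note on writing `$match` filters as Python dicts: when several conditions target the same field, they need to share one operator document, because a dict literal silently keeps only the last value for a repeated key. The short sketch below shows the difference using plain dicts, with no database involved; the variable names are chosen for the example.

```python
# A dict literal with a repeated key keeps only the last value,
# so the "$ne" condition is lost without any error being raised.
collapsed = {"station_name": {"$ne": ""}, "station_name": {"$type": "string"}}

# Putting both operators in one document keeps every condition.
combined = {"station_name": {"$type": "string", "$ne": ""}}

print(collapsed)
print(combined)
```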


## Plotly Pie chart

Data from the sensor_data collection are processed to generate a pie chart representing the top 20 stations by total number of departures. Departures are aggregated and sorted by station, providing an immediate overview of the most active stations.
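
As a rough illustration of the group, sum, sort, and limit steps described above, the same result can be computed in plain Python. This is a sketch under assumptions: the helper name is hypothetical, the records are reduced to the two fields the pipeline reads, and the station names are invented.

```python
from collections import Counter


def top_departure_stations(records, top_n=20):
    """Sum 'departures' per station name and return the largest totals.

    `records` is an iterable of dicts shaped like the documents in
    sensor_data, reduced here to the two fields the pipeline reads.
    """
    totals = Counter()
    for rec in records:
        totals[rec["metadata"]["name"]] += rec.get("departures", 0)
    # most_common sorts by total, descending, like $sort followed by $limit.
    return totals.most_common(top_n)


# Invented sample documents.
sample = [
    {"metadata": {"name": "Central"}, "departures": 3},
    {"metadata": {"name": "Harbor"}, "departures": 1},
    {"metadata": {"name": "Central"}, "departures": 2},
]
print(top_departure_stations(sample, top_n=2))  # [('Central', 5), ('Harbor', 1)]
```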

In [3]:
# Connect to MongoDB
mongo_uri = "mongodb://mongoadmin:secret@localhost:27017/"
client = MongoClient(mongo_uri)
db = client['sensor_data']

# Initialize the Dash app
app2 = Dash(__name__)

# App layout with a Graph component for the pie chart
app2.layout = html.Div([
    dcc.Graph(id='pie-chart'),
    dcc.Interval(
        id='interval-component',
        interval=2000,  # Refresh every 2 seconds
        n_intervals=0
    )
])

# Callback that updates the pie chart
@app2.callback(Output('pie-chart', 'figure'),
              [Input('interval-component', 'n_intervals')])
def update_pie_chart(n):
    aggregation_pipeline = [
        {"$group": {
            "_id": "$metadata.name",
            "total_partenze": {"$sum": "$departures"}
        }},
        {"$sort": {"total_partenze": -1}},
        {"$limit": 20}
    ]
    top_stations = list(db['sensor_data'].aggregate(aggregation_pipeline))
    df = pd.DataFrame(top_stations)
    
    if not df.empty:
        fig = px.pie(df, names='_id', values='total_partenze', title='Top 20 Stations by total departures')
    else:
        fig = px.pie(title="No data available")
    
    return fig

# Start the server
if __name__ == '__main__':
    app2.run_server(debug=True, port=9096)  # Port changed to avoid conflicts


## Plotly Line chart

Data from the aggregated_hourly_departures_per_station collection are processed to plot a line chart that shows hourly departures from the top 20 stations over the last available day. This visualization leverages the aggregated hourly data to offer detailed insights into the trend of departures throughout the day.
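
The selection behind this chart has three steps: find the most recent date, rank the stations by their total departures on that date, and keep the hourly rows of the top stations. The sketch below reproduces those steps in plain Python. The helper name is hypothetical, the field names follow the ones read by the code in this section, and the sample rows (including the string dates) are invented; the actual type of the date field in the collection is not assumed here.

```python
from collections import Counter


def hourly_series_for_top_stations(rows, top_n=20):
    """Return (latest_date, {station: [(hour, departures), ...]}).

    `rows` uses the field names read by the notebook code:
    'date', 'station_name', 'ora' (hour) and 'cnt_partenze' (departures).
    """
    if not rows:
        return None, {}
    latest = max(r["date"] for r in rows)
    day_rows = [r for r in rows if r["date"] == latest]

    # Rank stations by their total departures on that day.
    totals = Counter()
    for r in day_rows:
        totals[r["station_name"]] += r["cnt_partenze"]
    top = {name for name, _ in totals.most_common(top_n)}

    # Build one hour-ordered series per selected station.
    series = {}
    for r in sorted(day_rows, key=lambda r: (r["station_name"], r["ora"])):
        if r["station_name"] in top:
            series.setdefault(r["station_name"], []).append((r["ora"], r["cnt_partenze"]))
    return latest, series


# Invented sample rows covering two days and two stations.
sample = [
    {"date": "2024-01-01", "station_name": "A", "ora": 8, "cnt_partenze": 9},
    {"date": "2024-01-02", "station_name": "A", "ora": 9, "cnt_partenze": 4},
    {"date": "2024-01-02", "station_name": "A", "ora": 8, "cnt_partenze": 2},
    {"date": "2024-01-02", "station_name": "B", "ora": 8, "cnt_partenze": 1},
]
latest, series = hourly_series_for_top_stations(sample, top_n=1)
```

Each entry of the returned mapping corresponds to one line in the chart, with the hour on the x-axis and the departures on the y-axis.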

In [4]:
from dash import Dash, dcc, html
from dash.dependencies import Input, Output
import plotly.express as px
from pymongo import MongoClient
import pandas as pd

# Connect to MongoDB
mongo_uri = "mongodb://mongoadmin:secret@localhost:27017/"
client = MongoClient(mongo_uri)
db = client['sensor_data']
collection_aggregated_hourly_sensor_per_station = db['aggregated_hourly_departures_per_station']

# Initialize the Dash app
app = Dash(__name__)

# App layout with a Graph component for the line chart
app.layout = html.Div([
    dcc.Graph(id='line-chart'),
    dcc.Interval(
        id='interval-component-line',
        interval=2000,  # Refresh every 2 seconds
        n_intervals=0
    )
])

# Callback that updates the line chart
@app.callback(Output('line-chart', 'figure'),
              [Input('interval-component-line', 'n_intervals')])
def update_line_chart(n):
    most_recent_date_doc = collection_aggregated_hourly_sensor_per_station.find_one(sort=[("date", -1)])
    if most_recent_date_doc:
        most_recent_date = most_recent_date_doc["date"]
    else:
        return px.line(title="No data available")

    aggregation_pipeline = [
        {"$match": {"date": most_recent_date}},
        {"$group": {
            "_id": "$station_name",
            "total_partenze": {"$sum": "$cnt_partenze"}
        }},
        {"$sort": {"total_partenze": -1}},
        {"$limit": 20}
    ]
    top_stations = list(collection_aggregated_hourly_sensor_per_station.aggregate(aggregation_pipeline))
    top_station_names = [station['_id'] for station in top_stations]

    filtered_documents = collection_aggregated_hourly_sensor_per_station.find({
        "date": most_recent_date,
        "station_name": {"$in": top_station_names}
    }).sort([("station_name", 1), ("ora", 1)])

    df = pd.DataFrame(list(filtered_documents))
    if df.empty:
        return px.line(title="No data available")

    df.rename(columns={'ora': 'Ora', 'station_name': 'Nome Stazione', 'cnt_partenze': 'Cnt Partenze', 'date': 'Date'}, inplace=True)

    fig = px.line(df, x='Ora', y='Cnt Partenze', color='Nome Stazione',
                  title=f"Top 20 Stations by Hourly Departures - Date: {most_recent_date}",
                  labels={"Cnt Partenze": "Number of Departures", "Ora": "Hour of the Day", "Nome Stazione": "Station Name"})

    return fig

# Start the server
if __name__ == '__main__':
    app.run_server(debug=True, port=8099)  # Port changed to avoid conflicts
