# Visualization of open data on education inscriptions Catalonia

- [Configuration](#Configuration)
- [Data Handling](#Data-handling)
- [Configuration of Visualization](#Visualization-configuration)
- [Filtering and Updating Data](#Filtering-and-updating-the-data)
- [Application](#Generation-of-Bokeh-application)
- [ToDo's](#ToDo's)

Data: https://analisi.transparenciacatalunya.cat/Educaci-/Estad-stica-de-l-assignaci-de-places-en-el-proc-s-/99md-r3rq/about_data

## Libraries needed for processing data and visualizing with Bokeh

In [1]:
from os.path import dirname, join

import numpy as np
import pandas as pd
import requests
import logging
import sys
import math


from bokeh.io import show, output_notebook, curdoc
from bokeh.layouts import column, row
from bokeh.models import ColumnDataSource, Div, Select, Slider, TextInput, AutocompleteInput, Button, Jitter, WheelZoomTool, BoxZoomTool, ResetTool, PanTool, TapTool, NumeralTickFormatter
from bokeh.plotting import figure
from bokeh.transform import cumsum

from io import StringIO
from datetime import datetime



## Configuration
[To Index](#Visualization-of-open-data-on-education-inscriptions-Catalonia)


### VARS

In [2]:
END_YEAR = 2024
START_YEAR = 2016

### Logger


In [3]:
# Create a logger
logger = logging.getLogger(__name__)
logger.setLevel(logging.ERROR)

# Create a handler that writes log messages to stdout
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.ERROR)

# Create a formatter and add it to the handler
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)

# Add the handler to the logger (left disabled here, so the formatter above is not applied)
# logger.addHandler(handler)

# Test the logger
logger.info("This is a test log message")
logger.warning("This is a warning")
logger.error("This is an error")

This is an error
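
Because `logger.addHandler(handler)` is commented out above, the error message is presumably printed by the fallback handler of `logging`, without the configured format. The following is a minimal sketch of what attaching the handler would change. It writes to an in-memory buffer instead of `sys.stdout` so the result can be inspected; the logger name `demo_places` and the buffer are assumptions made for this illustration and are not part of the notebook.

```python
import io
import logging

# Hypothetical logger name, used only for this sketch
demo_logger = logging.getLogger("demo_places")
demo_logger.setLevel(logging.ERROR)
demo_logger.propagate = False  # keep the demo output away from the root logger

# An in-memory stream stands in for sys.stdout so the output can be checked
buffer = io.StringIO()
demo_handler = logging.StreamHandler(buffer)
demo_handler.setLevel(logging.ERROR)
demo_handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
demo_logger.addHandler(demo_handler)

demo_logger.info("This is a test log message")  # below ERROR: dropped
demo_logger.warning("This is a warning")        # below ERROR: dropped
demo_logger.error("This is an error")           # emitted with the format above

print(buffer.getvalue(), end="")
```

Only the error line reaches the handler, because both the logger and the handler are set to `logging.ERROR`.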


### Configuration of the output of the Notebook

- output_file(): The output will be saved to an HTML file, which is also opened in a new browser window or tab
- output_notebook(): The output will be inline in the associated notebook output cell

In [4]:
output_notebook() # The output will be inline in the associated notebook output cell

## Data handling
[To Index](#Visualization-of-open-data-on-education-inscriptions-Catalonia)


### Data from API

Loading CSV data from an API with pagination support. The `load_csv_from_api` function allows for efficient retrieval of large datasets by fetching data in chunks and supporting a maximum row limit. It handles pagination, combines the data into a single DataFrame, and saves the result as a CSV file.

In [5]:
def load_csv_from_api(url, chunk_size=1000, max_rows=None):
    """
    Loads CSV data from an API with pagination support.

    This function retrieves data from the specified API URL in chunks, combines them
    into a single DataFrame, and optionally limits the total number of rows fetched.
    The resulting dataset is saved as a CSV file.

    Args:
        url (str): The base URL of the API endpoint.
        chunk_size (int, optional): The number of rows to fetch in each API call. Defaults to 1000.
        max_rows (int, optional): The maximum number of rows to fetch in total. Defaults to None (no limit).

    Returns:
        pandas.DataFrame: The complete dataset fetched from the API.

    Raises:
        requests.RequestException: If there's an error in making the API request.
    """
    offset = 0
    all_data = []
    total_rows_fetched = 0
    
    logger.info(f"Starting data retrieval from {url}")
    logger.info(f"Chunk size: {chunk_size}, Max rows: {max_rows if max_rows else 'No limit'}")
    
    while True:
        # Construct the URL with offset and limit parameters
        paginated_url = f"{url}?$offset={offset}&$limit={chunk_size}"
        
        # Make the API request
        response = requests.get(paginated_url)
        
        # Check if the request was successful
        if response.status_code == 200:
            # Convert the response content to a pandas DataFrame
            chunk = pd.read_csv(StringIO(response.text))
            
            # If the chunk is empty, we've reached the end of the data
            if chunk.empty:
                logger.info("Received empty chunk. Ending pagination.")
                break
            
            # Append the chunk to our list of DataFrames
            all_data.append(chunk)
            
            # Update total rows fetched
            total_rows_fetched += len(chunk)
            logger.info(f"Fetched {len(chunk)} rows. Total rows so far: {total_rows_fetched}")
            
            # Increment the offset for the next request
            offset += chunk_size
            
            # If we've reached the maximum number of rows, stop
            if max_rows and total_rows_fetched >= max_rows:
                logger.info(f"Reached or exceeded max rows ({max_rows}). Stopping pagination.")
                break
        else:
            logger.error(f"Error fetching data: HTTP {response.status_code}")
            break
    
    # Combine all chunks into a single DataFrame
    logger.info("Combining all fetched data into a single DataFrame")
    full_dataset = pd.concat(all_data, ignore_index=True)
    
    # If max_rows was specified, trim the dataset
    if max_rows and len(full_dataset) > max_rows:
        logger.info(f"Trimming dataset to {max_rows} rows")
        full_dataset = full_dataset.head(max_rows)
    
    logger.info(f"Final dataset size: {len(full_dataset)} rows")
    
    # Save the dataset as a CSV file
    current_date = datetime.now().strftime("%Y%m%d")
    output_dir = '../data/'  # named output_dir to avoid shadowing os.path.dirname imported above
    filename = f"Estadistica_places_{current_date}.csv"
    full_dataset.to_csv(output_dir + filename, index=False)
    logger.info(f"Dataset saved to {output_dir + filename}")
    
    return full_dataset
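
To illustrate the pagination scheme used by `load_csv_from_api`, the sketch below only builds the sequence of request URLs, without contacting the API. The helper `paginated_urls` and its `total_rows` argument are hypothetical names introduced for this example; the `$offset` and `$limit` parameters are the ones used in the function above.

```python
def paginated_urls(url, chunk_size=1000, total_rows=2500):
    """Yield the URLs that offset/limit pagination would request (hypothetical helper)."""
    for offset in range(0, total_rows, chunk_size):
        yield f"{url}?$offset={offset}&$limit={chunk_size}"


base = "https://analisi.transparenciacatalunya.cat/resource/99md-r3rq.csv"
urls = list(paginated_urls(base, chunk_size=1000, total_rows=2500))

# One URL per chunk: offsets 0, 1000 and 2000
for u in urls:
    print(u)
```

In the real function the total number of rows is not known in advance, so the loop stops when a chunk comes back empty or when `max_rows` is reached.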

### Data processing

The `data_processing` function loads data either from a CSV file or an API, performs various data transformations, and computes additional metrics. It's designed to handle educational data, particularly focusing on school assignments and offerings.


In [6]:
def data_processing(source='file'):
    """
    Processes educational statistics data from either a CSV file or an API.

    This function loads the data, performs various transformations including
    data type conversions, computes new metrics, and assigns colors to schools
    based on their nature (public or private).

    Args:
        source (str, optional): The source of the data. Either 'file' to load from a CSV,
                                or 'API' to fetch from a specified URL. Defaults to 'file'.

    Returns:
        pd.DataFrame: The processed DataFrame with additional columns and transformations.

    Raises:
        ValueError: If an invalid source is specified.
    """
    if source == 'file':
        # Load the dataset from the specified CSV file
        places = pd.read_csv('../../data/Estadistica_places_20241007.csv')
    elif source == 'API':
        url = "https://analisi.transparenciacatalunya.cat/resource/99md-r3rq.csv"
        places = load_csv_from_api(url, chunk_size=1000)
        logger.info(f"Loaded {len(places)} rows of data")
    else:
        raise ValueError("Invalid source specified. Use 'file' or 'API'.")

    # Create a new column "Any" with the year extracted from the "curs" column
    places['Any'] = places['curs'].str.split('/', expand=True)[0].astype(int)

    # Convert specified columns to string type
    string_columns = ['codi_centre', 'nivell', 'nom_comarca']
    for col in string_columns:
        places[col] = places[col].astype(str)

    # Replace all NaN values in the dataset with '-'
    places = places.replace(np.nan, '-', regex=True)

    # Compute the percentage of offered places covered by first-request and other-request
    # assignments; rows with zero offered places get 0 to avoid dividing by zero
    places["Perc_assignacions_1a"] = np.where(places["places_ofertades_a_la"] == 0, 0,
                                              (places["assignacions_1a_peticio"] / places["places_ofertades_a_la"]) * 100)
    places["Perc_assignacions_Altres"] = np.where(places["places_ofertades_a_la"] == 0, 0,
                                                   (places["assignacions_altres_peticions"] / places["places_ofertades_a_la"]) * 100)

    # Assign colors to schools based on the "nom_naturalesa" column
    places["color"] = np.where(places["nom_naturalesa"] == "Privat", "#1d2f6f", "#db2b39")

    return places
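
As a worked illustration of two of the transformations in `data_processing`, the sketch below applies the year extraction and the first-request percentage to a toy DataFrame. The column names are the ones used above; the values are made up for the example.

```python
import numpy as np
import pandas as pd

# Toy rows with the same column names as the dataset (values are made up)
toy = pd.DataFrame({
    "curs": ["2022/2023", "2023/2024", "2023/2024"],
    "places_ofertades_a_la": [50, 0, 20],
    "assignacions_1a_peticio": [40, 0, 25],
})

# Start year of the school year, as in data_processing
toy["Any"] = toy["curs"].str.split("/", expand=True)[0].astype(int)

# Share of offered places assigned to first requests; 0 when nothing is offered
toy["Perc_assignacions_1a"] = np.where(
    toy["places_ofertades_a_la"] == 0,
    0,
    toy["assignacions_1a_peticio"] / toy["places_ofertades_a_la"] * 100,
)

print(toy[["Any", "places_ofertades_a_la", "Perc_assignacions_1a"]])
```

The last toy row has more first-request assignments than offered places, so its percentage is above 100: the computation guards against division by zero but does not cap the result.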

In [7]:
places = data_processing()


In [8]:
places.iloc[10]


curs                                             2023/2024
codi_centre                                        8000049
denominaci_completa                           Escola Fabra
codi_naturalesa                                          1
nom_naturalesa                                      Públic
codi_titularitat                                         1
nom_titularitat                     Departament d'Educació
codi_delegaci                                          508
nom_delegaci                     Maresme - Vallès Oriental
codi_comarca                                            21
nom_comarca                                        Maresme
codi_municipi_5                                       8003
codi_municipi_6                                      80039
nom_municipi                                        Alella
codi_districte_municipal                                 0
nom_dm                                                   -
coordenades_utm_x                                 441402

## Visualization configuration
The following functions configure the visualizations:
1. get_axis_map(): dictionary that maps the labels of the dynamic axes to DataFrame columns
2. get_tooltips(): list of the fields that are shown when hovering over the plot
3. create_controls(): creates all the input controls for filtering and configuring the plot
4. create_scatter_plot(), create_bar_plot(), and create_pie_chart(): generate the plots
   
[To Index](#Visualization-of-open-data-on-education-inscriptions-Catalonia)



In [9]:
def get_axis_map():
    """
    Returns a dictionary mapping axis labels (shown in the dropdowns) to DataFrame column names.
    """
    
    return {
        "assignacions": "assignacions",
        "assignacions_1a_peticio": "assignacions_1a_peticio",
        "assignacions_altres_peticions": "assignacions_altres_peticions",
        "Index assignacions: 1a Petició": "Perc_assignacions_1a",
        "Index assignacions_altres_peticions": "Perc_assignacions_Altres",
    }

In [10]:
def get_tooltips():
    """
    Returns the tooltips for data plotted.
    
    """
    return [
        #("Title", "@title"),
        ("Any", "@any"),
        ("Denominació", "@denominacio"),
        #("Comarca", "@comarca"),
        #("Municipi", "@municipi"),
        ("Ensenyament", "@ensenyament" + " - " + "@nivell"),
        ("Places Inicials / Ofertades", "@places_inicials / @places_ofertades"),
        ("Assignacions", "@assignacions"),
        ("Assignacions 1a", "@assignacions_1a" + " (@perc_assignacions_1a%)"),
        ("Assignacions Altres", "@assignacions_altres" + " (@perc_assignacions_altres%)")
    ]
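
Each `@name` in a tooltip template refers to a column of the plot's data source, so a misspelled name would not resolve to any value. The sketch below is one possible sanity check, using only the standard library: it extracts the `@name` references and compares them with the keys later assigned to `source.data` in `update_scatter_plot`. The helper `tooltip_fields` is a hypothetical name, and the regular expression only handles the simple `@name` form used here.

```python
import re

# Tooltip templates copied from get_tooltips() above
tooltips = [
    ("Any", "@any"),
    ("Denominació", "@denominacio"),
    ("Ensenyament", "@ensenyament - @nivell"),
    ("Places Inicials / Ofertades", "@places_inicials / @places_ofertades"),
    ("Assignacions", "@assignacions"),
    ("Assignacions 1a", "@assignacions_1a (@perc_assignacions_1a%)"),
    ("Assignacions Altres", "@assignacions_altres (@perc_assignacions_altres%)"),
]

# Keys assigned to source.data in update_scatter_plot
source_keys = {
    "x", "y", "color", "any", "places_inicials", "places_ofertades",
    "assignacions", "assignacions_1a", "assignacions_altres", "denominacio",
    "comarca", "municipi", "districte", "ensenyament", "nivell",
    "perc_assignacions_1a", "perc_assignacions_altres",
}


def tooltip_fields(tooltips):
    """Collect the names referenced as @name in the tooltip templates."""
    fields = set()
    for _label, template in tooltips:
        fields.update(re.findall(r"@(\w+)", template))
    return fields


missing = tooltip_fields(tooltips) - source_keys
print(missing or "all tooltip fields are present in the data source")
```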



### Interactive control elements for a data visualization interface

The `create_controls` function generates various input controls such as sliders, dropdowns, and text inputs
for filtering and manipulating educational data. These controls are designed to work with Bokeh, a Python 
interactive visualization library.

In [11]:
def create_controls(places):
    """
    Creates and returns the input controls for filtering and manipulating educational data.

    This function generates a set of Bokeh widgets that allow users to interactively filter
    and explore the educational data. It includes controls for year range, educational level,
    geographic location, and visualization axes.

    Args:
        places (pd.DataFrame): The DataFrame containing the educational data.

    Returns:
        tuple: A tuple containing the following Bokeh widget objects:
            - min_year (Slider): Control for selecting the start year.
            - max_year (Slider): Control for selecting the end year.
            - nom_ensenyament (Select): Dropdown for selecting the type of education.
            - nivell (Select): Dropdown for selecting the educational level.
            - comarca (Select): Dropdown for selecting the county.
            - municipi (Select): Dropdown for selecting the municipality.
            - districte (Select): Dropdown for selecting the district.
            - centre (Select): Dropdown for selecting the school center.
            - cast (AutocompleteInput): Autocomplete input for searching school names.
            - x_axis (Select): Dropdown for selecting the X-axis metric.
            - y_axis (Select): Dropdown for selecting the Y-axis metric.
    """
    # Create year range sliders
    min_year = Slider(title="Start year", start=START_YEAR, end=END_YEAR, value=2023, step=1)
    max_year = Slider(title="End year", start=START_YEAR, end=END_YEAR, value=2024, step=1)

    # Create dropdowns for educational attributes
    nom_ensenyament = Select(options=sorted(places['nom_ensenyament'].unique()), 
                             value=places['nom_ensenyament'][0], 
                             title='Ensenyament')
    nivell = Select(options=sorted(places['nivell'].unique()), 
                    value=places['nivell'][0], 
                    title='nivell')

    # Create dropdowns for geographic attributes
    comarca = Select(options=sorted(places['nom_comarca'].unique()), 
                     value=None, 
                     title='Comarca')
    municipi = Select(options=sorted(places['nom_municipi'].unique()), 
                      value=None, 
                      title='Municipi')
    districte = Select(options=sorted([x for x in list(places['nom_dm'].unique()) if str(x) != 'nan']), 
                       value=None, 
                       title='Districte')
    centre = Select(title="Nom del centre", 
                    value="", 
                    options=sorted(places['denominaci_completa'].unique()))

    # Create autocomplete input for school names
    cast = AutocompleteInput(title="Nom del centre", 
                             completions=list(places['denominaci_completa'].unique()), 
                             placeholder="Search or select")

    # Create dropdowns for selecting visualization axes
    x_axis = Select(title="X Axis", 
                    options=sorted(get_axis_map().keys()), 
                    value="Index assignacions: 1a Petició")
    y_axis = Select(title="Y Axis", 
                    options=sorted(get_axis_map().keys()), 
                    value="Index assignacions_altres_peticions")

    return min_year, max_year, nom_ensenyament, nivell, comarca, municipi, districte, centre, cast, x_axis, y_axis


### Scatter plot functionalities

The `create_scatter_plot` function generates a customized Bokeh figure object that represents
a scatter plot with jittered points. This plot is designed to visualize educational data
with interactive tooltips and customizable aesthetics.

In [12]:
def create_scatter_plot(source, tooltips):
    """
    Creates and returns a customized Bokeh Scatter Plot.

    This function generates a Bokeh figure object representing a scatter plot
    with jittered points to avoid overlap. The plot is designed to be responsive
    and includes interactive tooltips for data exploration.

    Args:
        source (ColumnDataSource): The Bokeh ColumnDataSource containing the data to be plotted.
        tooltips (list): A list of tuples defining the content of the tooltips.

    Returns:
        figure: A Bokeh figure object representing the scatter plot.

    Note:
        The function assumes that 'x' and 'y' columns exist in the source data,
        as well as a 'color' column for point colors.
    """
    # Create a new Bokeh figure
    p = figure(height=600, 
               title="", 
               toolbar_location="above", 
               tooltips=tooltips, 
               sizing_mode="stretch_width",
               x_range=(-10, 110),  # Set fixed x-axis range
               y_range=(-10, 110),  # Set fixed y-axis range
               tools=[WheelZoomTool(), BoxZoomTool(), ResetTool(), PanTool()])  # Add tools
  

    # Add scatter plot with jittered points
    p.circle(x={'field': 'x', 'transform': Jitter(width=2.5)},
             y={'field': 'y', 'transform': Jitter(width=2.5)},
             source=source,
             size=10,
             color="color",
             fill_alpha=0.6,
             line_color=None)

    # Customize axis labels and ticks
    #p.xaxis.axis_label = "X Axis (0-100)"
    #p.yaxis.axis_label = "Y Axis (0-100)"
    
    # Set tick marks at intervals of 10
    p.xaxis.ticker = list(range(0, 401, 10))
    p.yaxis.ticker = list(range(0, 401, 10))

    return p
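
The effect of the jitter transform can be approximated outside Bokeh. The sketch below is only an illustration in NumPy, assuming Bokeh's default uniform distribution, where `width` is the full width of the interval from which the random offset is drawn. The sample values and the seed are made up.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
width = 2.5  # same width as in create_scatter_plot

# Percentages that would otherwise be drawn on top of each other
x = np.array([0.0, 50.0, 50.0, 100.0])

# Uniform offsets in [-width / 2, width / 2], one per point
offsets = rng.uniform(-width / 2, width / 2, size=x.shape)
x_jittered = x + offsets

print(np.round(x_jittered, 2))
```

Under this assumption a point at 0 or 100 moves by at most 1.25 units, which stays well inside the fixed axis range of (-10, 110).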

### Bar plot functionalities

The `create_bar_plot` function generates a customized Bokeh figure object that represents
a bar plot showing the number of assignations over years. This plot is designed to visualize
educational assignment data with three categories of assignations.

In [13]:
def create_bar_plot(source):
    """
    Creates and returns a customized Bokeh Bar Plot for visualizing assignations over years.

    This function generates a Bokeh figure object representing a bar plot with three sets
    of vertical bars, each representing a different category of assignation. The plot includes
    a legend with a mute feature for interactivity.

    Args:
        source (ColumnDataSource): The Bokeh ColumnDataSource containing the data to be plotted.
                                   Expected to have columns: 'year', 'year_offset_1', 'year_offset_2',
                                   'assignacions', 'assignacions_1a', and 'assignacions_altres'.

    Returns:
        figure: A Bokeh figure object representing the bar plot.

    Note:
        The function assumes specific column names in the source data for years and different
        types of assignations. The 'year_offset_1' and 'year_offset_2' columns are used to
        position the bars side by side.
    """
    # Create a new Bokeh figure for the bar plot
    bar = figure(height=400,
                 title="Assignations over Years", 
                 x_axis_label='Year', 
                 y_axis_label='Number of Assignations',
                 toolbar_location=None,
                 sizing_mode="stretch_width",
                 x_axis_type="linear")  # Ensure the x-axis uses linear scaling
    
    # Create three sets of vertical bars for each category of assignation
    bar.vbar(x='year_offset_1', 
             top='assignacions', 
             width=0.2, 
             color='#edf6f9', 
             source=source, 
             legend_label='assignacions', 
             muted_alpha=0.2)

    bar.vbar(x='year', 
             top='assignacions_1a', 
             width=0.2, 
             color='#83c5be', 
             source=source, 
             legend_label='assignacions_1a_peticio', 
             muted_alpha=0.2)

    bar.vbar(x='year_offset_2', 
             top='assignacions_altres', 
             width=0.2, 
             color='#006d77', 
             source=source, 
             legend_label='assignacions_altres_peticions', 
             muted_alpha=0.2)
    
    # Set the x-axis formatter to display the years as integers
    bar.xaxis.formatter = NumeralTickFormatter(format="0")  # Ensures no decimals on the x-axis

    # Configure the legend
    bar.legend.location = "top_left"
    bar.legend.click_policy = "mute"
    
    return bar

### Pie chart functionalities

The `create_pie_chart` function generates a customized Bokeh figure object that represents
a pie chart showing the proportion of public and private schools. This plot is designed 
to provide a quick visual comparison between different types of schools in the dataset.

In [14]:
def create_pie_chart(source):
    """
    Creates and returns a customized Bokeh pie chart for visualizing the proportion of public and private schools.

    This function generates a Bokeh figure object representing a pie chart. The chart includes
    interactive hover tooltips and a legend. The chart is designed to be compact and focused,
    with unnecessary elements like axes and grids removed.

    Args:
        source (ColumnDataSource): The Bokeh ColumnDataSource containing the data to be plotted.
                                   Expected to have columns: 'type', 'value', 'angle', and 'color'.

    Returns:
        figure: A Bokeh figure object representing the pie chart.

    Note:
        The function assumes specific column names in the source data for school types,
        their values, angles for the pie segments, and colors. The 'angle' column should
        contain pre-calculated angles for each pie segment.
    """
    # Create a new Bokeh figure for the pie chart
    p = figure(height=400,
               title="Public vs Private Schools", 
               toolbar_location=None,
               tools="hover", 
               tooltips="@type: @value", 
               x_range=(-0.5, 1.0))
    
    # Add a wedge glyph to create the pie chart
    p.wedge(x=0, y=1, radius=0.4,
            start_angle=cumsum('angle', include_zero=True), 
            end_angle=cumsum('angle'),
            line_color="white", 
            fill_color='color', 
            legend_field='type', 
            source=source)
    
    # Configure the plot appearance
    p.axis.axis_label = None
    p.axis.visible = False
    p.grid.grid_line_color = None
    
    return p
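
The docstring expects a pre-calculated `angle` column. A common way to derive it, shown here as a sketch and not necessarily the way this notebook does it, is to give each category a share of the full circle proportional to its value. The counts below are made up; the colors are the ones assigned in `data_processing`.

```python
from math import pi

import pandas as pd

# Made-up counts of schools by type
pie_data = pd.DataFrame({
    "type": ["Públic", "Privat"],
    "value": [30, 10],
})

# Each wedge spans a fraction of 2*pi radians proportional to its value
pie_data["angle"] = pie_data["value"] / pie_data["value"].sum() * 2 * pi

# Same colors as the scatter plot: private schools in blue, the rest in red
pie_data["color"] = pie_data["type"].map({"Privat": "#1d2f6f", "Públic": "#db2b39"})

print(pie_data)
```

A DataFrame with these four columns can then be wrapped in a `ColumnDataSource` and passed to `create_pie_chart`.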

## Filtering and updating the data

[To Index](#Visualization-of-open-data-on-education-inscriptions-Catalonia)

### Filtering

Three functions keep the filters and the displayed data in sync:

1. `filter_places` filters the DataFrame of educational institutions by location (comarca, municipi, districte), type of education, educational level, school name, and year range.

2. `update_municipi_and_districte_options` updates the options of the interdependent 'Municipi' (municipality) and 'Districte' (district) dropdowns based on the selected 'Comarca' (county).

3. `update_school_names` restricts the available school names to the selected geographic areas (Comarca, Municipi, and Districte) and updates the corresponding input widgets.


In [15]:
def filter_places(places, comarca_val, municipi_val, districte_val, nom_ensenyament_val, nivell_val, cast_val, min_year, max_year):
    """
    Filters the educational data based on the selected input control values.

    This function applies multiple filters to the input DataFrame based on various
    criteria such as location, type of education, educational level, school name,
    and year range. It provides a flexible way to subset the data for further analysis
    or visualization.

    Args:
        places (pd.DataFrame): The original DataFrame containing all educational data.
        comarca_val (str): The selected comarca (county) value.
        municipi_val (str): The selected municipi (municipality) value.
        districte_val (str): The selected districte (district) value.
        nom_ensenyament_val (str): The selected type of education.
        nivell_val (str): The selected educational level.
        cast_val (str): The search string for school names.
        min_year (int): The minimum year for the date range filter.
        max_year (int): The maximum year for the date range filter.

    Returns:
        pd.DataFrame: A filtered DataFrame containing only the rows that match all specified criteria.

    Note:
        The function assumes that the 'places' DataFrame has columns corresponding to all
        the filter criteria (e.g., 'nom_ensenyament', 'nivell', 'Any', etc.).
    """
    # Apply base filters (type of education, level, and year range)
    selected = places[
        (places['nom_ensenyament'] == nom_ensenyament_val) &
        (places['nivell'] == nivell_val) & 
        (places.Any >= min_year) &
        (places.Any <= max_year)
    ]

    # Apply location filters if values are provided
    if comarca_val:
        selected = selected[selected['nom_comarca'] == comarca_val]
    if municipi_val:
        selected = selected[selected['nom_municipi'] == municipi_val]
    if districte_val:
        selected = selected[selected['nom_dm'] == districte_val]

    # Apply school name filter if a search string is provided
    if cast_val:
        selected = selected[selected['denominaci_completa'].str.contains(cast_val, case=False, na=False)]

    return selected
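
One detail of the school name filter is worth illustrating: `str.contains` interprets the search text as a regular expression by default, so characters such as `.` act as wildcards. The sketch below compares the default behaviour with literal matching (`regex=False`) on made-up school names. Whether literal matching is preferable here is a design choice, not something the notebook prescribes.

```python
import pandas as pd

# Made-up school names, including a missing value
names = pd.Series(["Col·legi St. Josep", "Escola Stella", None])
search = "St."

# Default: the dot matches any character, so "Ste" in "Stella" also matches
regex_match = names.str.contains(search, case=False, na=False)

# Literal matching: only names that really contain "St." match
literal_match = names.str.contains(search, case=False, na=False, regex=False)

print(regex_match.tolist())
print(literal_match.tolist())
```

In both calls `na=False` makes missing names count as non-matching, as in `filter_places`.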

In [16]:
def update_municipi_and_districte_options(places, comarca_val, municipi, districte):
    """
    Updates the available options for 'Municipi' and 'Districte' dropdowns based on the selected 'Comarca'.

    This function filters the available municipalities based on the selected Comarca, and then
    filters the available Districtes based on the selected Municipi. It updates the options
    of the respective Bokeh Select widgets accordingly.

    Args:
        places (pd.DataFrame): The DataFrame containing all the data, including geographic information.
        comarca_val (str): The currently selected comarca value.
        municipi (Select): The Bokeh Select widget for municipi selection.
        districte (Select): The Bokeh Select widget for districte selection.

    Note:
        This function assumes that the 'places' DataFrame has 'nom_comarca', 'nom_municipi', and 'nom_dm'
        columns for county, municipality, and district names respectively.
    """
    # Update municipality options based on selected county
    municipis = places[places['nom_comarca'] == comarca_val]['nom_municipi'].unique()
    municipi.options = list(np.sort(municipis))

    # Update district options based on selected municipality
    districtes = places[places['nom_municipi'] == municipi.value]['nom_dm'].unique()
    districte.options = list(districtes)
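
The cascading logic can be tried without any widgets. The sketch below reproduces the two lookups on a toy DataFrame, using plain variables in place of the Bokeh `Select` objects; the rows are made up for the example.

```python
import pandas as pd

# Toy geographic data with the same column names as the dataset
toy = pd.DataFrame({
    "nom_comarca": ["Maresme", "Maresme", "Barcelonès", "Barcelonès"],
    "nom_municipi": ["Mataró", "Alella", "Barcelona", "Barcelona"],
    "nom_dm": ["-", "-", "Gràcia", "Eixample"],
})

# Step 1: municipalities that belong to the selected county
comarca_val = "Barcelonès"
municipi_options = sorted(toy[toy["nom_comarca"] == comarca_val]["nom_municipi"].unique())

# Step 2: districts that belong to the selected municipality
municipi_val = municipi_options[0]
districte_options = sorted(toy[toy["nom_municipi"] == municipi_val]["nom_dm"].unique())

print(municipi_options)
print(districte_options)
```

Note that the function above reads `municipi.value` at the time of the call, so the district list reflects whichever municipality is selected at that moment.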

In [17]:
def update_school_names(places: pd.DataFrame, comarca_val: str, municipi_val: str, districte_val: str, cast: AutocompleteInput, centre: Select):
    """
    Filters and updates the available school names based on the selected Comarca, Municipi, and Districte.

    This function applies geographic filters to the dataset and updates the completions for the
    AutocompleteInput widget and options for the Select widget with the filtered list of school names.

    Args:
        places (pd.DataFrame): The DataFrame containing all the data, including school and geographic information.
        comarca_val (str): The currently selected comarca (county) value.
        municipi_val (str): The currently selected municipi (municipality) value.
        districte_val (str): The currently selected districte (district) value.
        cast (AutocompleteInput): The Bokeh AutocompleteInput widget for school name search.
        centre (Select): The Bokeh Select widget for school selection.

    Note:
        This function assumes that the 'places' DataFrame has 'nom_comarca', 'nom_municipi', 'nom_dm',
        and 'denominaci_completa' columns for county, municipality, district, and school names respectively.
    """
    # Create a copy of the DataFrame to avoid modifying the original
    filtered_places = places.copy()

    # Apply filters based on selected geographic areas
    if comarca_val:
        filtered_places = filtered_places[filtered_places['nom_comarca'] == comarca_val]
    
    if municipi_val:
        filtered_places = filtered_places[filtered_places['nom_municipi'] == municipi_val]

    if districte_val:
        filtered_places = filtered_places[filtered_places['nom_dm'] == districte_val]

    # Get the filtered list of unique school names
    school_names = list(filtered_places['denominaci_completa'].unique())

    # Update the completions for the AutocompleteInput
    cast.completions = school_names

    # Update the options for the Select widget
    centre.options = school_names

#### Clear filters
The `clear_filters` function resets the filter widgets (sliders, dropdowns, and the autocomplete input).

The year sliders return to the full `START_YEAR` to `END_YEAR` range, the education type and level
dropdowns return to their first option, and the geographic selections and the school name search
are cleared. This lets users drop all applied filters in one step.

In [18]:
def clear_filters(
    min_year: Slider,
    max_year: Slider,
    nom_ensenyament: Select,
    nivell: Select,
    comarca: Select,
    municipi: Select,
    districte: Select,
    cast: AutocompleteInput
):
    """
    Resets all filters to their default values.

    This function takes various Bokeh widget objects as input and resets each to a predetermined
    default value. It's typically used to provide a "clear all filters" functionality in a
    data visualization interface.

    Args:
        min_year (Slider): The slider for the minimum year.
        max_year (Slider): The slider for the maximum year.
        nom_ensenyament (Select): The dropdown for selecting the type of education.
        nivell (Select): The dropdown for selecting the educational level.
        comarca (Select): The dropdown for selecting the comarca (county).
        municipi (Select): The dropdown for selecting the municipi (municipality).
        districte (Select): The dropdown for selecting the districte (district).
        cast (AutocompleteInput): The autocomplete input for school name search.

    Note:
        This function assumes specific default values for some widgets. Adjust these values
        if different defaults are needed for your application.
    """
    # Reset year range sliders
    min_year.value = START_YEAR
    max_year.value = END_YEAR

    # Reset education type and level dropdowns to first option
    nom_ensenyament.value = nom_ensenyament.options[0]
    nivell.value = nivell.options[0]

    # Clear geographic selections
    comarca.value = None
    municipi.value = None
    districte.value = None

    # Clear school name search
    cast.value = ""

### Update scatter plot

The functions `update_scatter_plot`, `update_bar_plot`, and `update_pie_chart`
refresh their plots from the user-selected axes and the filtered data. This allows
exploring the assignment metrics across different regions and educational levels.

In [19]:
def update_scatter_plot(source, df, x_axis, y_axis, axis_map, plot):
    """
    Updates the plot data and properties based on the filtered data.

    Args:
        source (ColumnDataSource): The data source for the plot.
        df (pandas.DataFrame): The filtered dataframe containing the data to be plotted.
        x_axis (Select): The Bokeh Select widget for choosing the x-axis.
        y_axis (Select): The Bokeh Select widget for choosing the y-axis.
        axis_map (dict): A dictionary mapping axis labels to column names.
        plot (Figure): The Bokeh plot object to be updated.

    Returns:
        None
    """
    x_name = axis_map[x_axis.value]
    y_name = axis_map[y_axis.value]
    plot.xaxis.axis_label = x_axis.value
    plot.yaxis.axis_label = y_axis.value
    plot.title.text = f"{len(df)} items selected"
    source.data = dict(
        x=df[x_name],
        y=df[y_name],
        color=df["color"],
        any=df["Any"],
        places_inicials=df["oferta_inicial_places"],
        places_ofertades=df["places_ofertades_a_la"],
        assignacions=df["assignacions"],
        assignacions_1a=df["assignacions_1a_peticio"],
        assignacions_altres=df["assignacions_altres_peticions"],
        denominacio=df["denominaci_completa"],
        comarca=df["nom_comarca"],
        municipi=df["nom_municipi"],
        districte=df["nom_dm"],
        ensenyament=df["nom_ensenyament"],
        nivell=df["nivell"],
        perc_assignacions_1a=df["Perc_assignacions_1a"],
        perc_assignacions_altres=df["Perc_assignacions_Altres"]
    )
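
The axis lookup in `update_scatter_plot` can be tried on its own. The sketch below is a minimal illustration, not the notebook's code: the axis labels, the sample rows, and the helper `scatter_columns` are hypothetical, and the real mapping comes from `get_axis_map()`. It only shows how a label chosen in a dropdown resolves to a dataframe column before the data source is rebuilt.

```python
import pandas as pd

# Hypothetical axis labels; the notebook's real mapping comes from get_axis_map().
axis_map = {
    "Initial places": "oferta_inicial_places",
    "Assignments": "assignacions",
}

# Made-up rows, using column names that appear in the notebook.
df = pd.DataFrame({
    "oferta_inicial_places": [50, 25, 60],
    "assignacions": [48, 20, 60],
})


def scatter_columns(df, axis_map, x_label, y_label):
    """Return the x/y columns that a scatter data source would receive."""
    return {
        "x": df[axis_map[x_label]].tolist(),
        "y": df[axis_map[y_label]].tolist(),
    }


data = scatter_columns(df, axis_map, "Initial places", "Assignments")
print(data)
```

Because the whole `data` dictionary is replaced in one assignment, all columns stay the same length, which is what a `ColumnDataSource` expects.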

### Update bar plot

In [20]:
def update_bar_plot(bar_source, selected_df, denominacio):
    """
    Updates the bar plot data based on the selected 'denominacio' (school).

    This function filters the data for a specific school, calculates assignment statistics
    by year, and prepares the data for a side-by-side bar plot visualization.

    Args:
        bar_source (ColumnDataSource): The data source for the bar plot.
        selected_df (pandas.DataFrame): The dataframe containing the selected data.
        denominacio (str): The name of the selected school.

    Returns:
        None
    """
    # Filter the dataframe for the selected school
    df_filtered = selected_df[selected_df['denominaci_completa'] == denominacio]
    
    # Group the data by year and calculate the sum of assignments
    grouped = df_filtered.groupby('Any').agg({
        'assignacions': 'sum',
        'assignacions_1a_peticio': 'sum',
        'assignacions_altres_peticions': 'sum'
    }).reset_index()

    # Calculate offset x-values for side-by-side bars
    year_offset_1 = grouped['Any'] - 0.21
    year_offset_2 = grouped['Any'] + 0.21

    # Update the data for the bar plot
    bar_source.data = dict(
        year=grouped['Any'],
        assignacions=grouped['assignacions'],
        assignacions_1a=grouped['assignacions_1a_peticio'],
        assignacions_altres=grouped['assignacions_altres_peticions'],
        year_offset_1=year_offset_1,
        year_offset_2=year_offset_2
    )
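
The grouping and offset step can be checked on a small table. This is a sketch under stated assumptions: the rows and school names are invented, and `yearly_bars` is a hypothetical helper that mirrors the logic of `update_bar_plot` without touching a Bokeh data source.

```python
import pandas as pd

# Made-up rows; column names follow the notebook's dataframe.
rows = pd.DataFrame({
    "denominaci_completa": ["School A", "School A", "School A", "School B"],
    "Any": [2022, 2022, 2023, 2022],
    "assignacions_1a_peticio": [18, 20, 40, 25],
    "assignacions_altres_peticions": [2, 5, 10, 5],
})


def yearly_bars(df, school, offset=0.21):
    """Sum assignments per year for one school and compute bar positions."""
    one_school = df[df["denominaci_completa"] == school]
    value_cols = ["assignacions_1a_peticio", "assignacions_altres_peticions"]
    grouped = one_school.groupby("Any")[value_cols].sum().reset_index()
    return {
        "year": grouped["Any"].tolist(),
        "assignacions_1a": grouped["assignacions_1a_peticio"].tolist(),
        "assignacions_altres": grouped["assignacions_altres_peticions"].tolist(),
        # Shift each series left/right of the year so the bars sit side by side.
        "year_offset_1": (grouped["Any"] - offset).tolist(),
        "year_offset_2": (grouped["Any"] + offset).tolist(),
    }


bars = yearly_bars(rows, "School A")
print(bars)
```

The offset of 0.21 is taken from the function above; for the two series not to overlap, each bar has to be drawn narrower than twice that offset.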

### Update pie chart

In [21]:
def update_pie_chart(pie_source, selected_df, districte_val, municipi_val, comarca_val, nom_ensenyament_val, nivell_val):
    """
    Update the pie chart data based on the selected filters.

    This function filters the data based on various geographic and educational criteria,
    then calculates the distribution of public and private educational offerings.

    Args:
        pie_source (ColumnDataSource): The data source for the pie chart.
        selected_df (pandas.DataFrame): The dataframe containing the selected data.
        districte_val (str): The selected district value.
        municipi_val (str): The selected municipality value.
        comarca_val (str): The selected comarca (county) value.
        nom_ensenyament_val (str): The selected type of education.
        nivell_val (str): The selected educational level.

    Returns:
        None
    """
    # Apply filters based on geographic criteria
    if districte_val:
        filtered_df = selected_df[selected_df['nom_dm'] == districte_val]    
    elif municipi_val:
        filtered_df = selected_df[selected_df['nom_municipi'] == municipi_val]
    elif comarca_val:
        filtered_df = selected_df[selected_df['nom_comarca'] == comarca_val]
    else:
        filtered_df = selected_df

    # Apply filters based on educational criteria
    if nom_ensenyament_val:
        filtered_df = filtered_df[filtered_df['nom_ensenyament'] == nom_ensenyament_val]
    if nivell_val:
        filtered_df = filtered_df[filtered_df['nivell'] == nivell_val]

    # Calculate the sum of initial places for public and private institutions
    public_count = filtered_df[filtered_df['nom_naturalesa'] == 'Públic']['oferta_inicial_places'].sum()
    private_count = filtered_df[filtered_df['nom_naturalesa'] == 'Privat']['oferta_inicial_places'].sum()

    # Log the counts for debugging purposes
    logger.info(f"Public {public_count}")
    logger.info(f"Private {private_count}")

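    # Calculate total and angles for the pie chart; guard against an empty
    # selection, where the total is 0 and the share is undefined
    total = public_count + private_count
    public_angle = 2 * math.pi * (public_count / total) if total else 0.0
    private_angle = 2 * math.pi * (private_count / total) if total else 0.0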
    
    # Update the ColumnDataSource with new data for the pie chart
    pie_source.data = dict(
        type=['Public', 'Private'],
        value=[public_count, private_count],
        angle=[public_angle, private_angle],
        color=['#db2b39', '#1d2f6f']
    )
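
The angle arithmetic is easy to verify by hand. The sketch below uses only the standard library; `wedge_angles` and `wedge_bounds` are hypothetical helpers and the counts are invented. The second helper shows the cumulative start and end angles that a wedge glyph needs. The notebook imports `cumsum` from `bokeh.transform`, which can serve that purpose, but whether `create_pie_chart` uses it is an assumption.

```python
import math


def wedge_angles(values):
    """Convert counts into angles (radians) that together span a full circle."""
    total = sum(values)
    if total == 0:
        # Empty selection: no share can be computed, so draw nothing.
        return [0.0 for _ in values]
    return [2 * math.pi * v / total for v in values]


def wedge_bounds(angles):
    """Return (start, end) pairs so that each wedge begins where the last ended."""
    bounds = []
    start = 0.0
    for angle in angles:
        bounds.append((start, start + angle))
        start += angle
    return bounds


# Invented counts: 300 public places and 100 private places.
angles = wedge_angles([300, 100])
bounds = wedge_bounds(angles)
print(angles, bounds)
```

With 300 public and 100 private places, the public wedge covers three quarters of the circle and the private wedge the remaining quarter.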

# Generation of Bokeh application

This cell contains the main function `bkapp` for building and rendering a Bokeh application.
The application visualizes educational data in Catalonia, including interactive scatter plots,
bar charts, and pie charts. It allows users to filter and explore data based on various
criteria such as region, educational level, and school type.

[To Index](#Visualization-of-open-data-on-education-inscriptions-Catalonia)


In [22]:
def bkapp(doc):
    """
    Main function to build and render the Bokeh application.

    This function sets up the entire Bokeh application, including data loading,
    widget creation, plot generation, and event handling.

    Args:
        doc (Document): The Bokeh document to which the application will be added.

    Returns:
        None
    """
    # Load and process the data
    places = data_processing()
    axis_map = get_axis_map()

    # Create input controls
    min_year, max_year, nom_ensenyament, nivell, comarca, municipi, districte, centre, cast, x_axis, y_axis = create_controls(places)

    # Initialize data sources
    scatter_source = ColumnDataSource(data=dict(x=[], y=[], color=[], title=[], year=[], revenue=[], alpha=[]))
    bar_source = ColumnDataSource(data=dict(year=[], assignacions=[], assignacions_1a=[], assignacions_altres=[], year_offset_1=[], year_offset_2=[]))
    pie_source = ColumnDataSource(data=dict(type=[], value=[], angle=[], color=[]))
    tooltips = get_tooltips()

    # Create the scatter plot
    scatter = create_scatter_plot(scatter_source, tooltips)

    # Create the bar plot
    bar = create_bar_plot(bar_source)

    # Create the pie chart
    pie = create_pie_chart(pie_source)

    def update():
        """Update all plots based on the selected filters."""
        df = filter_places(
            places, comarca.value, municipi.value, districte.value,
            nom_ensenyament.value, nivell.value, cast.value.strip(),
            min_year.value, max_year.value
        )
        update_municipi_and_districte_options(places, comarca.value, municipi, districte)
        update_scatter_plot(scatter_source, df, x_axis, y_axis, axis_map, scatter)
        if cast.value.strip():
            update_bar_plot(bar_source, df, cast.value.strip())
        # A widget object is always truthy, so test its value instead
        if centre.value:
            update_bar_plot(bar_source, df, centre.value)
        logger.info(f"Districte {districte.value}")
        logger.info(f"Municipi {municipi.value}")
        logger.info(f"Comarca {comarca.value}")
        update_pie_chart(pie_source, df, 
                         districte.value, municipi.value, comarca.value, 
                         nom_ensenyament.value, nivell.value)

    def update_school_filter(attr, old, new):
        """Update school names based on the selected comarca, municipi, and districte."""
        update_school_names(places, comarca.value, municipi.value, districte.value, cast, centre)

    # Attach event listeners to input controls
    controls = [comarca, 
                municipi, 
                districte, 
                nom_ensenyament, 
                nivell, 
                min_year, 
                max_year, 
                centre, 
                # cast, 
                # x_axis, 
                # y_axis
               ]
    for control in controls:
        control.on_change('value', lambda attr, old, new: update())

    # Attach listeners to Comarca, Municipi, and Districte to update the school names (cast)
    comarca.on_change('value', update_school_filter)
    municipi.on_change('value', update_school_filter)
    districte.on_change('value', update_school_filter)
    
    # Create Clear Filters Button
    clear_button = Button(label="Clear All Filters", button_type="success")
    
    def clear():
        """Clear all filters and update the plot."""
        clear_filters(min_year, max_year, nom_ensenyament, nivell, comarca, municipi, districte, cast)
        update()  # Update plot to reflect cleared filters

    clear_button.on_click(clear)

    # Arrange layout and add to the document
    inputs = column(*controls, width=300, height=400, sizing_mode="fixed")
    layout = column(row(clear_button), 
                    row(inputs, scatter, sizing_mode="stretch_width"), 
                    row(pie, bar, sizing_mode="stretch_width"),
                    # row(bar,  sizing_mode="stretch_width"), 
                    sizing_mode="stretch_width", height=1000)

    #layout = column(
    #row(clear_button),
    #row(inputs, p, sizing_mode="inherit"),  # First row
    #row(pie),  # Second row with bar and pie plots
    #row(bar),  # Second row with bar and pie plots
    #sizing_mode="stretch_width",
    #height=800)

    # Initial data load
    update()

    # Add the layout to the document
    doc.add_root(layout)
    doc.title = "Escoles"

In [23]:
show(bkapp, notebook_url="localhost:8888") # displaying the layout
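
The listener wiring inside `bkapp` can be tried without a running server. The sketch below is an illustration under assumptions: the two `Select` widgets and their options are hypothetical stand-ins for the notebook's controls, and `update` only records the filter state instead of redrawing plots. It relies on Bokeh invoking Python `on_change` callbacks when a property is set from Python code.

```python
from bokeh.models import Select

# Hypothetical stand-ins for two of the notebook's filter widgets.
comarca = Select(title="Comarca", value="A", options=["A", "B"])
nivell = Select(title="Nivell", value="1", options=["1", "2"])

calls = []


def update():
    """Stand-in for the app's update(): record the current filter state."""
    calls.append((comarca.value, nivell.value))


# Same pattern as in bkapp: every control triggers one shared update().
# The callback must accept (attr, old, new), even if it ignores them.
for control in (comarca, nivell):
    control.on_change("value", lambda attr, old, new: update())

# Changing a value from Python fires the registered callback.
comarca.value = "B"
nivell.value = "2"
print(calls)
```

Because the lambda ignores its arguments and reads widget values inside `update`, a single callback can serve every control, at the cost of recomputing all plots on any change.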

# ToDo's
- Add a plot that compares public and private schools
- When a point is clicked, filter by the school in question

[To Index](#Visualization-of-open-data-on-education-inscriptions-Catalonia)
