# NECTA PSLE Dashboard

## 04-dashboard
### Tasks
1. Read in and prepare data
2. Set up back-end: input controls, output figures and tables, callbacks
3. Set up front-end: layout components
4. Main dashboard code: instantiate, define layout, run app

#### Inputs:
- 03-feature-extraction.csv (17900, 44)
- gadm41_TZA_1.json (Tanzania region polygons) from [GADM](https://gadm.org/download_country.html)
- 04-region_links.csv (26, 3)

#### Outputs:
- Public web app at: https://lonnychen.pythonanywhere.com
    - Short link at: https://bit.ly/psle2022mvp

In [None]:
#Data handling
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', 50)
import json

#Plotly Dash, added to Anaconda
from dash import Dash, dcc, html, callback, Input, Output, dash_table
import plotly.express as px
from dash.dash_table.Format import Format, Scheme, Trim
import dash_bootstrap_components as dbc
from dash.exceptions import PreventUpdate

#Custom modules
from dashboard_config import initial_zoom, school_map_zoom, region_map_zoom, tamisemi_url
from dashboard_config import color_options, color_options_short, category_orders, floats_to_round, hover_data, custom_data
from dashboard_utils import filter_df_or_all, convert_council_name, round_floats
from config import labels_5tile
from data_cleaning_special import assign_grade

### 1. Read in and prepare data

**Steps:**
1. Schools data
    - Read in CSV from data preparation steps
    - Filter Government schools
2. Regions data
    - Read in region polygons GeoJSON downloaded from [GADM - Tanzania ADM1](https://gadm.org/download_country.html)
    - Read in manual regions URL links

**Learnings:** (🧑🏻‍💻📚😎⚠️)
- 😎 Once loaded with `json.load`, (Geo)JSON files are just nested Python dictionaries (and lists), so they can be accessed and modified that way!
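A minimal sketch of that idea, using a hypothetical two-feature FeatureCollection inlined as a string (geometry omitted) in place of the GADM file: after parsing, features are list items and their properties are plain dict entries, so renaming a region is ordinary dictionary assignment.

```python
import json

# Hypothetical miniature FeatureCollection; the real GADM file also carries polygon geometry
raw = """
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature", "properties": {"NAME_1": "Arusha"}, "geometry": null},
   {"type": "Feature", "properties": {"NAME_1": "DaresSalaam"}, "geometry": null}
 ]}
"""
geojson = json.loads(raw)  # nested dicts and lists; JSON null becomes None

# Rename by matching the old value rather than relying on a fixed list position
for feature in geojson["features"]:
    if feature["properties"]["NAME_1"] == "DaresSalaam":
        feature["properties"]["NAME_1"] = "Dar es Salaam"

names = [feature["properties"]["NAME_1"] for feature in geojson["features"]]
print(names)  # ['Arusha', 'Dar es Salaam']
```

Matching on the old value keeps the fix correct even if a future GADM release reorders its features.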

In [None]:
#1.1 Schools data

#Read in deployable school data
df = pd.read_csv('../data/deployable/03-feature-extraction.csv', index_col='school_id')
df.shape #(17900, 44)

#Separate Gov
dfg = df[df['SCHOOL OWNERSHIP'] == 'Government']
dfg.shape #(16361, 44)
dfg = dfg.reset_index() #need school_id as a column
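The Government filter and `reset_index` step can be seen on a toy frame. The IDs and the `'Private'` ownership value are hypothetical placeholders (the real non-Government label may differ): the boolean mask keeps matching rows, and `reset_index` turns the `school_id` index back into an ordinary column, as the comment above requires.

```python
import pandas as pd

# Toy stand-in for 03-feature-extraction.csv read with index_col='school_id'
df = pd.DataFrame(
    {"SCHOOL OWNERSHIP": ["Government", "Private", "Government"],  # 'Private' is a placeholder
     "average_300": [180.0, 240.0, 150.0]},
    index=pd.Index(["S1", "S2", "S3"], name="school_id"),
)

# Boolean mask keeps Government rows; reset_index moves school_id into a column
dfg = df[df["SCHOOL OWNERSHIP"] == "Government"].reset_index()
print(dfg.columns.tolist())  # ['school_id', 'SCHOOL OWNERSHIP', 'average_300']
print(dfg["school_id"].tolist())  # ['S1', 'S3']
```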

In [None]:
#1.2 Regions data

#Read in region GeoJSON
with open('../data/raw/geojson/gadm41_TZA_1.json', 'r') as f:
    tza_adm1_geojson = json.load(f)
    
#Region name fix to match NECTA/TAMISEMI
tza_adm1_geojson['features'][1]['properties']['NAME_1'] = 'Dar es Salaam' #was 'DaresSalaam'

#Read in manual regions URL links
df_regionlinks = pd.read_csv('../data/manual/04-region_links.csv')

In [None]:
df_regionlinks.shape #(26, 3)
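Because the regions map later joins data to polygons with `featureidkey='properties.NAME_1'`, a region name in the school data with no identically spelled GeoJSON feature is simply not drawn. A small check along these lines would catch spelling mismatches such as the Dar es Salaam one before plotting. The helper name `unmatched_regions` and the toy inputs are hypothetical.

```python
def unmatched_regions(data_regions, geojson, key="NAME_1"):
    """Return data region names with no GeoJSON feature whose properties[key] matches exactly."""
    geo_names = {feature["properties"][key] for feature in geojson["features"]}
    return sorted(set(data_regions) - geo_names)

# Toy inputs: one spelling differs between the data and the GeoJSON
toy_geojson = {"features": [{"properties": {"NAME_1": "Arusha"}},
                            {"properties": {"NAME_1": "DaresSalaam"}}]}
missing = unmatched_regions(["Arusha", "Dar es Salaam"], toy_geojson)
print(missing)  # ['Dar es Salaam']
```

In the notebook this could be called as `unmatched_regions(dfg.region_name.unique(), tza_adm1_geojson)`, which should return an empty list once the rename has been applied.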

### 2. Set up back-end

**Steps:**

NOTE: in each per-tab list of components, the Schools tab is at index [0] and the Regions tab at index [1]

1. Create control components: data filters, color coding
2. Create dynamic map renderers and tables
3. **Define callback functions**
    1. Data filter interactions
    2. MAPS created dynamically based on user controls
    3. TABLES are updated based on maps' `clickData`

**Learnings:** (🧑🏻‍💻📚😎⚠️)
- 🧑🏻‍💻 Plotly `choropleth_mapbox` goes hand-in-hand with Pandas `groupby('region_name')`: aggregate the school DATA into one row of statistics per region first, otherwise each region polygon just shows the values of the last school in that region
- 🧑🏻‍💻 Python list comprehensions are handy for turning a list of dictionary keys into a list of values, e.g. the `Property` labels of each DataTable
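A minimal sketch of the groupby-then-choropleth idea on hypothetical toy rows: `pd.NamedAgg` gives each output column a name and an aggregation, and the pass rate is recomputed from the summed counts, as the regions callback does for `pct_passed`.

```python
import pandas as pd

# Toy school-level rows (hypothetical values)
schools = pd.DataFrame({
    "region_name": ["Region A", "Region A", "Region B"],
    "council_name": ["Council A1", "Council A2", "Council B1"],
    "num_sitters": [100, 50, 80],
    "num_passed": [90, 25, 40],
    "average_300": [200.0, 150.0, 180.0],
})

# One row per region: counts are summed, scores averaged
regions = schools.groupby("region_name").agg(
    num_councils=pd.NamedAgg(column="council_name", aggfunc="nunique"),
    num_sitters=pd.NamedAgg(column="num_sitters", aggfunc="sum"),
    num_passed=pd.NamedAgg(column="num_passed", aggfunc="sum"),
    average_300=pd.NamedAgg(column="average_300", aggfunc="mean"),
).reset_index()

# Pass rate from the summed counts, not an average of school percentages
regions["pct_passed"] = regions["num_passed"] / regions["num_sitters"] * 100
print(regions)
```

Region A's pass rate comes out as 115/150 ≈ 76.7%, whereas averaging its two school rates (90% and 50%) would give 70%.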

In [None]:
######################
#1. Control components
######################

#Filter controls
region_dropdowns = list()
for i in [0,1]:
    region_dropdowns.append(
        dcc.Dropdown(
            options=np.append('ALL REGIONS', np.sort(dfg.region_name.unique())),
            value=['ALL REGIONS'],
            multi=True,
            placeholder='Type or select a region'
        )
    )                       
council_dropdown = dcc.Dropdown(
    value=['ALL COUNCILS'],
    multi=True,
    placeholder='Type or select a council'
)
school_dropdown = dcc.Dropdown(
    value=['ALL SCHOOLS'],
    multi=True,
    placeholder='Type or select a school'
)

#Filter dynamic statuses
region_labels = list()
for i in [0,1]:
    region_labels.append(dbc.Label())
council_label = dbc.Label()
school_label = dbc.Label()

#Color control
color_dropdowns = list()
for i in [0,1]:
    color_dropdowns.append(
        dcc.Dropdown(
            options=color_options,
            value='average_300',
            placeholder='Type or select a color variable',
            style={'width': '65%'}
        )
    )

#DataTable dynamic outputs
DT_title_labels = list()
DT_info_labels = list()
DT_url_links = list()
DT_map_links = list()
for i in [0,1]:
    DT_title_labels.append(dbc.Label(style={'fontWeight': 'bold'}))
    DT_info_labels.append(dbc.Label(style={'fontSize': '12px'}))
    DT_url_links.append(html.A('PSLE Results', target="_blank"))
    DT_map_links.append(html.A('Geospatial Features', target='_blank'))

In [None]:
####################################
#2. Dynamic map renderers and tables
####################################

#PER-TAB LISTS
map_graphs = list()
click_data_table_sets = list()

for i in [0,1]:
    #dcc.Graph
    map_graphs.append(dcc.Graph(figure={}, responsive=True))

    #dash_table.DataTable
    #Series of DataTables: PSLE results, TAMISEMI data, geospatial features
    click_data_tables = list()
    for _ in range(3): #separate loop variable, so the outer tab index i is not shadowed
        click_data_tables.append(dash_table.DataTable(
                columns = [{'id': 'Property', 'name': 'Property', 'type': 'any'},
                           {'id': 'Value', 'name': 'Value', 'type': 'numeric', 'format': Format(precision=2, group=',', scheme=Scheme.fixed, trim=Trim.yes)}],
                css=[{'selector': 'tr:first-child', #hide the header row
                      'rule': 'display: none;'}],
                style_cell={
                    'whiteSpace': 'normal',
                    'height': 'auto',
                    'font_size': '12px'},
                style_cell_conditional=[
                    {'if': {'column_id': 'Property'},
                     'width': '75%', 'textAlign': 'left'},
                    {'if': {'column_id': 'Value'},
                     'width': '25%'},
                ]
            )
        )
    click_data_table_sets.append(click_data_tables)
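The `Value` column's `Format(precision=2, group=',', scheme=Scheme.fixed, trim=Trim.yes)` roughly corresponds to the d3-format specifier `,.2~f`: thousands separators, two fixed decimals, then insignificant trailing zeros trimmed. The plain-Python approximation below (hypothetical helper `format_value`) is for intuition only; the actual rendering happens in the browser with d3-format, and rounding of exact halves may differ.

```python
def format_value(x: float) -> str:
    """Approximate ',.2~f': group thousands, fix 2 decimals, trim trailing zeros."""
    s = f"{x:,.2f}"
    if "." in s:
        s = s.rstrip("0").rstrip(".")  # '1,200.00' -> '1,200.' -> '1,200'
    return s

for x in [1234.5, 3.0, 1200.0, 0.456]:
    print(x, "->", format_value(x))
```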

In [None]:
##################################################
#3a. Callback functions - Data filter interactions
##################################################

#Region selection filters council options
@callback(
    Output(council_dropdown, 'options'),
    Input(region_dropdowns[0], 'value')
)
def set_council_options(sel_regions):
    #Filter regions (or all) to get council list
    if 'ALL REGIONS' in sel_regions:
        sel_regions = np.sort(dfg.region_name.unique())
    councils = np.sort(dfg[dfg['region_name'].isin(sel_regions)].council_name.unique())
    return np.append('ALL COUNCILS', councils)

#Region and council selections filter school options
@callback(
    Output(school_dropdown, 'options'),
    Input(region_dropdowns[0], 'value'),
    Input(council_dropdown, 'value')
)
def set_school_options(sel_regions, sel_councils):
    #Filter DataFrame for selected regions (or all) 
    dfgr = filter_df_or_all(dfg, 'region_name', sel_regions, 'ALL REGIONS')
    
    #Filter councils (or all) to get school list
    if 'ALL COUNCILS' in sel_councils:
        sel_councils = np.sort(dfgr.council_name.unique())
    schools = np.sort(dfgr[dfgr['council_name'].isin(sel_councils)].school_name.unique())
    return np.append('ALL SCHOOLS', schools)
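`filter_df_or_all` is imported from the project's `dashboard_utils` module, whose source isn't shown here. Judging only from how it is called, it probably behaves like the hypothetical sketch below: the 'ALL ...' sentinel passes the DataFrame through unchanged, otherwise rows are kept with `isin`.

```python
import pandas as pd

def filter_or_all_sketch(df, column, selections, all_label):
    """Hypothetical stand-in for dashboard_utils.filter_df_or_all."""
    if all_label in selections:
        return df  # sentinel selected: no filtering
    return df[df[column].isin(selections)]

toy = pd.DataFrame({"region_name": ["Region A", "Region B", "Region C"]})
print(len(filter_or_all_sketch(toy, "region_name", ["ALL REGIONS"], "ALL REGIONS")))  # 3
print(filter_or_all_sketch(toy, "region_name", ["Region B"], "ALL REGIONS")["region_name"].tolist())  # ['Region B']
```

In this sketch, selecting the sentinel together with specific regions still returns everything, matching the `if 'ALL REGIONS' in sel_regions` logic in `set_council_options`.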

In [None]:
######################################
#3b. Callback functions - Dynamic MAPS
######################################

#Schools MAP and filter status numbers updates - multiple outputs!
@callback(
    Output(map_graphs[0], 'figure'),
    Output(region_labels[0], 'children'),
    Output(council_label, 'children'),
    Output(school_label, 'children'),
    Input(region_dropdowns[0], 'value'),
    Input(council_dropdown, 'value'),
    Input(school_dropdown, 'value'),
    Input(color_dropdowns[0], 'value')
)
def update_school_graph_selected(sel_regions, sel_councils, sel_schools, color_input):
    
    #Avoid blank map when values cleared
    if not (sel_regions and sel_councils and sel_schools):
        raise PreventUpdate
    
    #Successive filtering of DataFrames for selected regions, councils, schools
    dfgr = filter_df_or_all(dfg, 'region_name', sel_regions, 'ALL REGIONS')
    dfgrc = filter_df_or_all(dfgr, 'council_name', sel_councils, 'ALL COUNCILS')
    dfgrcs = filter_df_or_all(dfgrc, 'school_name', sel_schools, 'ALL SCHOOLS')
    
    #Dynamically calculate filtered PSLE average quintile
    df_fig = dfgrcs.copy() #else SettingWithCopyWarning
    if len(df_fig) >= 5:
        df_fig['average_5tile_filtered'] = pd.qcut(df_fig['average_300'], 5, labels=labels_5tile)
    else:
        df_fig['average_5tile_filtered'] = np.nan
    
    #Round floats here instead of via hovertemplate (the color column changes dynamically)
    df_fig['pct_passed'] = df_fig['pct_passed'] * 100
    df_fig = round_floats(df_fig, floats_to_round, 2)
    
    #Calculate number of each filter from actual filtered DataFrames
    num_regions = dfgr['region_name'].nunique()
    num_councils = dfgrc['council_name'].nunique()
    num_schools = len(df_fig) #must use length because of non-unique school names
    
    #Calculate new centre based on filtered DataFrame
    lat_lon_centre = {'lat': df_fig['LATITUDE fix'].mean(), 'lon': df_fig['LONGITUDE fix'].mean()}
    
    #Plot MAP
    fig = px.scatter_mapbox(
        df_fig,
        lat='LATITUDE fix',
        lon='LONGITUDE fix',
        color=color_input,
        color_continuous_scale=px.colors.sequential.Viridis_r, #_r = reverse
        color_discrete_sequence=px.colors.sequential.Jet, #Ordinal
        labels=color_options_short, #Shortened from color dropdowns
        category_orders=category_orders,
        size='num_sitters',
        #text='school_name', #Marker text issue with OSM base map
        hover_name='school_name',
        hover_data=hover_data[0],
        custom_data=custom_data[0], #for DataTable
        zoom=initial_zoom,
        center=lat_lon_centre,
        mapbox_style='open-street-map',
        #width = 800,
        #height = 1000,
    )
    fig.update_layout(
        margin={'r': 0, 't': 10, 'l': 0, 'b': 0}, #top margin
        legend={'x': 0, 'title': None}, #remove title to stabilize positioning on map
        coloraxis={'colorbar': {'x': 0, 'title': None}}, #remove title
        uirevision=True
    )    
    #fig.update_traces() #mode="markers+text", hovertemplate=...
    #fig.show()

    return fig, f'Regions ({num_regions})', f'Councils ({num_councils})', f'Schools ({num_schools})'
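The filtered-quintile step can be tried in isolation. The labels here are hypothetical stand-ins for `labels_5tile` (imported from `config`, not shown). `pd.qcut` splits the values into five equal-count bins; the `len >= 5` guard mirrors the callback. With very few or heavily repeated values the quantile edges can coincide, in which case `qcut` raises `ValueError` (see its `duplicates` parameter).

```python
import numpy as np
import pandas as pd

labels = ["Q1", "Q2", "Q3", "Q4", "Q5"]  # hypothetical stand-in for labels_5tile

def add_quintiles(df, col="average_300"):
    """Attach equal-count quintile labels, or NaN when fewer than 5 rows remain."""
    out = df.copy()  # avoid writing into a slice of the caller's frame
    if len(out) >= 5:
        out["average_5tile_filtered"] = pd.qcut(out[col], 5, labels=labels)
    else:
        out["average_5tile_filtered"] = np.nan
    return out

scores = pd.DataFrame({"average_300": [float(v) for v in range(1, 11)]})
print(add_quintiles(scores)["average_5tile_filtered"].tolist())
print(add_quintiles(scores.head(3))["average_5tile_filtered"].isna().all())  # True
```

With the ten scores 1 to 10, each quintile label lands on exactly two schools.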

In [None]:
#Regions MAP 
@callback(
    Output(map_graphs[1], 'figure'),
    Output(region_labels[1], 'children'),
    Input(region_dropdowns[1], 'value'),
    Input(color_dropdowns[1], 'value')
)
def update_region_graph(sel_regions, color_input):

    #Avoid blank map when value cleared
    if not sel_regions:
        raise PreventUpdate
    
    #TEMP, later: user control of mean or median
    stat_input = 'mean'
        
    #Region-grouped data - pre-filtering
    dfg_regions = dfg.groupby('region_name').agg(
        #Info
        num_councils = pd.NamedAgg(column='council_name', aggfunc='nunique'),
        num_schools = pd.NamedAgg(column='school_name', aggfunc='count'),
        TOTAL_STUDENTS = pd.NamedAgg(column='TOTAL STUDENTS', aggfunc='sum'),
        #Results (y)
        num_sitters = pd.NamedAgg(column='num_sitters', aggfunc='sum'),
        num_passed = pd.NamedAgg(column='num_passed', aggfunc='sum'),
        average_300 = pd.NamedAgg(column='average_300', aggfunc=stat_input),
        #Resources (Xi)
        PTR = pd.NamedAgg(column='PTR', aggfunc=stat_input),
        PBR_std7 = pd.NamedAgg(column='PBR_std7', aggfunc=stat_input),
        BPR_std7 = pd.NamedAgg(column='BPR_std7', aggfunc=stat_input),
        CG_per_student = pd.NamedAgg(column='CG_per_student', aggfunc=stat_input),
        #Demographics/Geography (Xd)
        approx_ages_mean = pd.NamedAgg(column='approx_ages_mean', aggfunc=stat_input),
        pop_3km = pd.NamedAgg(column='pop_3km', aggfunc=stat_input),
        d_closest = pd.NamedAgg(column='d_closest', aggfunc=stat_input),
        d_council_hq = pd.NamedAgg(column='d_council_hq', aggfunc=stat_input),
        lat = pd.NamedAgg(column='LATITUDE fix', aggfunc=stat_input),
        lon = pd.NamedAgg(column='LONGITUDE fix', aggfunc=stat_input)
    ).reset_index()
    dfg_regions['CG_per_student'] = dfg_regions['CG_per_student'].round()
    dfg_regions['pop_3km'] = dfg_regions['pop_3km'].round()
    dfg_regions['grade'] = dfg_regions['average_300'].apply(assign_grade)
    dfg_regions['pct_passed'] = dfg_regions['num_passed'] / dfg_regions['num_sitters'] * 100
    dfg_regions['region_name_full'] = dfg_regions['region_name'] + ' Region'
    #Bug workaround: cast the Categorical quintile labels from qcut to object dtype with .astype('object')
    dfg_regions['average_5tile'] = pd.qcut(dfg_regions['average_300'], 5, labels=labels_5tile).astype('object')

    #Filtering of regions DataFrame for selected regions
    dfgr_regions = filter_df_or_all(dfg_regions, 'region_name', sel_regions, 'ALL REGIONS').copy() #else SettingWithCopyWarning
    if len(dfgr_regions) >= 5:
        dfgr_regions['average_5tile_filtered'] = pd.qcut(dfgr_regions['average_300'], 5, labels=labels_5tile)
    else:
        dfgr_regions['average_5tile_filtered'] = np.nan
    lat_lon_centre = {'lat': dfgr_regions['lat'].mean(), 'lon': dfgr_regions['lon'].mean()}
    num_regions = dfgr_regions['region_name'].nunique()
    
    #Merge region URL links
    df_fig = dfgr_regions.merge(df_regionlinks, how='left', on='region_name')
    
    #Round floats here instead of via hovertemplate (the color column changes dynamically)
    df_fig = round_floats(df_fig, floats_to_round, 2)
    
    #Plot MAP
    fig = px.choropleth_mapbox(
        df_fig,
        locations='region_name',
        geojson=tza_adm1_geojson,
        featureidkey='properties.NAME_1',
        color=color_input,
        color_continuous_scale=px.colors.sequential.Viridis_r, #_r = reverse
        color_discrete_sequence=px.colors.sequential.Jet, #Ordinal
        labels=color_options_short, #Shortened from color dropdowns
        category_orders=category_orders,
        opacity=0.3,
        hover_name='region_name_full',
        hover_data=hover_data[1],
        custom_data=custom_data[1],
        zoom=initial_zoom,
        center=lat_lon_centre,
        mapbox_style='open-street-map'
    )

    fig.update_layout(
        margin={'r': 0, 't': 10, 'l': 0, 'b': 0}, #top margin
        legend={'x': 0, 'title': None}, #remove title to stabilize positioning on map
        coloraxis={'colorbar': {'x': 0, 'title': None}}, #remove title
        uirevision=True
    )
    
    return fig, f'Regions ({num_regions})'
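The regions callback left-merges the manual `04-region_links.csv` onto the aggregated regions, so every region keeps its row even when a link is missing (the link column is just NaN there). One way to spot such gaps is `merge(..., indicator=True)`. In this sketch the frames are toy data and the `url` column name is hypothetical, since the CSV's three columns aren't listed.

```python
import pandas as pd

regions = pd.DataFrame({"region_name": ["Region A", "Region B", "Region C"],
                        "average_300": [175.0, 180.0, 160.0]})
links = pd.DataFrame({"region_name": ["Region A", "Region B"],
                      "url": ["https://example.org/a", "https://example.org/b"]})  # hypothetical

# how='left' keeps all regions; indicator=True adds a '_merge' column
merged = regions.merge(links, how="left", on="region_name", indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", "region_name"].tolist()
print(missing)  # ['Region C']
```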

In [None]:
########################################
#3c. Callback functions - Dynamic TABLES
########################################

#Click on a school marker > fill the school title, info text, links and data tables from the clicked point's "custom_data"
@callback(
    Output(DT_title_labels[0], 'children'),
    Output(DT_info_labels[0], 'children'),
    Output(DT_url_links[0], 'href'),
    Output(click_data_table_sets[0][0], 'data'),
    Output(click_data_table_sets[0][1], 'data'),
    Output(DT_map_links[0], 'href'),
    Output(click_data_table_sets[0][2], 'data'),
    Input(map_graphs[0], 'clickData')
)
def update_school_table(clickData):
    if clickData:
        click_customdata = clickData['points'][0]['customdata']
        #print(click_customdata)
        school_title_string = f'{click_customdata[0]} - {click_customdata[1]}'
        council_string = convert_council_name(click_customdata[3])
        school_info_string = [html.Div(f'{click_customdata[2]} Ward, {council_string}, {click_customdata[4]} Region'),
                              html.Div(f'Type: {click_customdata[5]}, Total students: {click_customdata[6]}')]
        school_data_psle = pd.DataFrame({'Property': [color_options_short[x] for x in custom_data[0][8:14]], 'Value': click_customdata[8:14]}).to_dict('records')
        school_data_tamisemi = pd.DataFrame({'Property': [color_options_short[x] for x in custom_data[0][14:19]], 'Value': click_customdata[14:19]}).to_dict('records')
        school_map_url = f'https://www.openstreetmap.org/#map={school_map_zoom}/{click_customdata[19]}/{click_customdata[20]}'
        school_data_geo = pd.DataFrame({'Property': [color_options_short[x] for x in custom_data[0][21:24]], 'Value': click_customdata[21:24]}).to_dict('records')
        return school_title_string, school_info_string, click_customdata[7], school_data_psle, school_data_tamisemi, school_map_url, school_data_geo    
    else:
        #No click yet (initial load): clickData is None, so there is no ['points'] to read
        return [None]*7

#Click on a region polygon > fill the region title, info text, links and data tables from the clicked point's "custom_data"
@callback(
    Output(DT_title_labels[1], 'children'),
    Output(DT_info_labels[1], 'children'),
    Output(DT_url_links[1], 'href'),
    Output(click_data_table_sets[1][0], 'data'),
    Output(click_data_table_sets[1][1], 'data'),
    Output(DT_map_links[1], 'href'),
    Output(click_data_table_sets[1][2], 'data'),
    Input(map_graphs[1], 'clickData')
)
def update_region_table(clickData):
    if clickData:
        click_customdata = clickData['points'][0]['customdata']
        #print(click_customdata)
        region_title_string = f'{click_customdata[0]}\n'
        region_info_string = [html.Div(f'Councils: {click_customdata[1]}, Schools: {click_customdata[2]}'),
                              html.Div(f'Total students: {click_customdata[3]:,}')]
        region_data_psle = pd.DataFrame({'Property': [color_options_short[x] for x in custom_data[1][5:11]], 'Value': click_customdata[5:11]}).to_dict('records')
        region_data_tamisemi = pd.DataFrame({'Property': [color_options_short[x] for x in custom_data[1][11:16]], 'Value': click_customdata[11:16]}).to_dict('records')
        region_data_geo = pd.DataFrame({'Property': [color_options_short[x] for x in custom_data[1][17:20]], 'Value': click_customdata[17:20]}).to_dict('records')
        return region_title_string, region_info_string, click_customdata[4], region_data_psle, region_data_tamisemi, click_customdata[16], region_data_geo    
    else:
        #No click yet (initial load): clickData is None, so there is no ['points'] to read
        return [None]*7
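Both table callbacks follow the same pattern: take the first clicked point's `customdata` list, slice it, and pair each value with a short label looked up through a list comprehension. The sketch below does this without pandas on a mocked `clickData` dict; the column list and label mapping are hypothetical placeholders for `custom_data` and `color_options_short` from `dashboard_config`. It produces the same list-of-dicts shape as `pd.DataFrame(...).to_dict('records')`.

```python
def click_to_records(click_data, columns, labels, start, stop):
    """Build DataTable records from one slice of the clicked point's customdata."""
    if not click_data:  # no click yet: Dash passes None
        return None
    values = click_data["points"][0]["customdata"]
    return [{"Property": labels[col], "Value": val}
            for col, val in zip(columns[start:stop], values[start:stop])]

# Hypothetical placeholders for custom_data[...] and color_options_short
columns = ["school_name", "num_sitters", "pct_passed", "average_300"]
labels = {"num_sitters": "Sitters", "pct_passed": "% passed", "average_300": "Average /300"}
mock_click = {"points": [{"customdata": ["Example School", 80, 92.5, 210.4]}]}

print(click_to_records(mock_click, columns, labels, 1, 4))
print(click_to_records(None, columns, labels, 1, 4))  # None
```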

### 3. Set up front-end

**Steps:**

1. Set up various `dbc` components for layouts:
    - Card
    - Tab(s), Container > Row > Col
    - High-level layout structure

In [None]:
#Define CARDS
school_data_card = dbc.Card(
    dbc.CardBody([
        DT_title_labels[0],
        html.Br(),
        DT_info_labels[0],
        html.Br(),
        DT_url_links[0],
        click_data_table_sets[0][0],
        html.A('Resources Data', href=tamisemi_url, target="_blank"),
        click_data_table_sets[0][1],
        DT_map_links[0],
        click_data_table_sets[0][2]
    ])
)

region_data_card = dbc.Card(
    dbc.CardBody([
        DT_title_labels[1],
        html.Br(),
        DT_info_labels[1],
        html.Br(),
        DT_url_links[1],
        click_data_table_sets[1][0],
        html.A('Resources Data', href=tamisemi_url, target="_blank"),
        click_data_table_sets[1][1],
        DT_map_links[1],
        click_data_table_sets[1][2]
    ])
)

#Define TABS
tab0_content = dbc.Container([
    dbc.Row([
        dbc.Col([
            region_labels[0],
            region_dropdowns[0]],
            width=4),
        dbc.Col([
            council_label,
            council_dropdown],
            width=4),
        dbc.Col([
            school_label,
            school_dropdown],
            width=4),
    ], style={'marginTop': '25px', 'marginBottom': '25px'}),
    dbc.Row([
        dbc.Col([
            dbc.Label('Color schools by'),
            color_dropdowns[0],
            map_graphs[0]],
            width=8, style={'fontSize': '14px'}),
        dbc.Col(school_data_card, width=4)
    ])],
    fluid=True
)

tab1_content = dbc.Container([
    dbc.Row([
        dbc.Col([
            region_labels[1],
            region_dropdowns[1]],
            width=4)
    ], style={'marginTop': '25px', 'marginBottom': '25px'}),
    dbc.Row([
        dbc.Col([
            dbc.Label('Color regions by'),
            color_dropdowns[1],
            map_graphs[1]],
            width=8, style={'fontSize': '14px'}),
        dbc.Col(region_data_card, width=4)
    ])],
    fluid=True
)

#Define high-level STRUCTURE
title = html.H2('NECTA PSLE Dashboard 2022')
tagline = html.H6("Tanzania's Primary School Leaving Examination (PSLE) results linked with resources data and geospatial features")
layout_elements = [
    html.Div([title, tagline]),
    dbc.Tabs([
        dbc.Tab(tab0_content, label='Schools', tab_id="schools_tab"),
        dbc.Tab(tab1_content, label='Regions', tab_id='regions_tab')],
        id='tabs',
        active_tab='schools_tab',
    ),   
]

### 4. Main dashboard code

**Steps:**

1. Instantiate app
2. Define app layout from elements (above)
3. Run app
    - Runs locally in this Jupyter Notebook, or from a terminal with `python app.py`
    - TEMP deployment: [lonnychen.pythonanywhere.com](https://lonnychen.pythonanywhere.com)

In [None]:
#1. Initialize the app (Dash constructor)
#app = Dash(__name__)
app = Dash(external_stylesheets=[dbc.themes.BOOTSTRAP])

#2. App layout
app.layout = html.Div(layout_elements)
    
#3. Run the app
if __name__ == '__main__':
    #app.run(debug=True)
    app.run(debug=False) #the older app.run_server alias was removed in Dash 3