# NECTA PSLE Dashboard

## 04-dashboard
### Tasks
1. Read in and prepare data
2. Set up back-end: input controls, output figures and tables, callbacks
3. Set up front-end: layout components
4. Main dashboard code: instantiate, define layout, run app

#### Inputs:
- 03-feature-extraction.csv (17900, 44)

#### Outputs:
- Public web app at: https://lonnychen.pythonanywhere.com
    - Short link at: https://bit.ly/psle2022mvp

In [None]:
#Data handling
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', 50)
import json

#Plotly Dash, added to Anaconda
from dash import Dash, dcc, html, callback, Input, Output, dash_table
import plotly.express as px
from dash.dash_table.Format import Format, Scheme, Trim
import dash_bootstrap_components as dbc
from dash.exceptions import PreventUpdate

#Custom modules
from dashboard_config import initial_zoom, school_map_zoom, color_values, color_labels, category_orders, labels, hover_data, custom_data, tamisemi_url
from dashboard_utils import filter_df_or_all, convert_council_name
from config import labels_5tile

### 1. Read in and prepare data

**Steps:**
1. Read in schools CSV from data preparation steps
2. Read in region polygons GeoJSON downloaded from [GADM - Tanzania ADM1](https://gadm.org/download_country.html)

**Learnings:** (🧑🏻‍💻📚😎⚠️)
- 😎 Once loaded with `json.load`, a (Geo)JSON file is just a big Python dictionary (nested dicts and lists) and can be accessed/modified that way!
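
As a minimal sketch of that idea, the snippet below renames a region by searching the features for a matching `NAME_1`, rather than relying on a fixed list position. The two-feature GeoJSON string and the `rename_region` helper are illustrative assumptions, not part of the GADM file or of this project's modules.

```python
import json

# Tiny invented stand-in for a GADM ADM1 file (the real file has one feature per region)
geojson_text = """
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature", "properties": {"NAME_1": "Arusha"}, "geometry": null},
   {"type": "Feature", "properties": {"NAME_1": "DaresSalaam"}, "geometry": null}
 ]}
"""
geojson = json.loads(geojson_text)  # now a plain dict of nested dicts and lists

def rename_region(geojson, old_name, new_name, key="NAME_1"):
    """Hypothetical helper: rename matching features in place, return the count changed."""
    changed = 0
    for feature in geojson["features"]:
        if feature["properties"].get(key) == old_name:
            feature["properties"][key] = new_name
            changed += 1
    return changed

n_changed = rename_region(geojson, "DaresSalaam", "Dar es Salaam")
region_names = [f["properties"]["NAME_1"] for f in geojson["features"]]
print(n_changed, region_names)
```

Checking that the returned count is exactly 1 is a cheap guard against the fix silently missing if the feature order or spelling changes in a later GADM release.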

In [None]:
#Read in deployable school data
df = pd.read_csv('../data/deployable/03-feature-extraction.csv', index_col='school_id')
df.shape #(17900, 44)

#Keep government schools only
dfg = df[df['SCHOOL OWNERSHIP'] == 'Government']
dfg.shape #(16361, 44)
dfg = dfg.reset_index() #need school_id as a column

#Read in region GeoJSON
with open('../data/raw/geojson/gadm41_TZA_1.json', 'r') as f:
    tza_adm1_geojson = json.load(f)
    
#Region name fix to match NECTA/TAMISEMI
tza_adm1_geojson['features'][1]['properties']['NAME_1'] = 'Dar es Salaam' #was 'DaresSalaam'

### 2. Set up back-end

**Steps:**

NOTE: per-tab lists are indexed with the schools tab at `[0]` and the regions tab at `[1]`

1. Create control components: data filters, color coding, mean/median (regions only)
2. Create dynamic map renderers, and tables
3. **Define callback functions**
    - Control component interactions
    - Maps created dynamically based on user controls
    - Tables are updated based on maps' `clickData`
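
Two of the plain-data steps behind these callbacks can be sketched without Dash. The notebook imports `filter_df_or_all` from `dashboard_utils`, whose source is not shown here, so `filter_or_all` below is a hypothetical stand-in that assumes an "ALL ..." sentinel means "no filtering". `click_to_records` is likewise a hypothetical helper, and all school names and values are invented.

```python
import pandas as pd

def filter_or_all(df, column, selected, all_label):
    """Hypothetical stand-in: keep rows whose `column` is in `selected`,
    or every row when the 'ALL ...' sentinel is among the selections."""
    if all_label in selected:
        return df
    return df[df[column].isin(selected)]

def click_to_records(click_data, field_names, start, stop):
    """Turn one slice of a clicked point's customdata into DataTable rows."""
    if not click_data:  # no click yet: clickData is None
        return None
    values = click_data["points"][0]["customdata"][start:stop]
    return [{"Property": name, "Value": value}
            for name, value in zip(field_names[start:stop], values)]

# Invented toy data
schools = pd.DataFrame({
    "region_name": ["Arusha", "Arusha", "Dodoma"],
    "council_name": ["Arusha CC", "Meru DC", "Dodoma CC"],
    "school_name": ["A", "B", "C"],
})

# Successive filtering: regions first, then councils within those regions
by_region = filter_or_all(schools, "region_name", ["Arusha"], "ALL REGIONS")
by_council = filter_or_all(by_region, "council_name", ["ALL COUNCILS"], "ALL COUNCILS")

# Shape of the clickData payload that a map click sends to the callback
fields = ["school_name", "average_300", "pct_passed"]
click = {"points": [{"customdata": ["A", 182.5, 91.0]}]}
records = click_to_records(click, fields, 1, 3)
print(list(by_council["school_name"]), records)
```

Filtering each level from the previous level's result keeps the council and school choices consistent with the regions already selected.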

**Learnings:** (🧑🏻‍💻📚😎⚠️)
- 🧑🏻‍💻 Plotly `choropleth_mapbox` goes hand-in-hand with Pandas `groupby('region_name')`: aggregate the school-level data first so each region is colored by a real regional statistic. Without aggregation, each region takes the value of the last school in that region.
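
A minimal sketch of that aggregation step, using invented toy values and a hypothetical `region_stats` helper, shows how school rows collapse to one row per region before plotting:

```python
import pandas as pd

# Toy school-level rows (invented values)
schools = pd.DataFrame({
    "region_name": ["Arusha", "Arusha", "Dodoma", "Dodoma"],
    "school_name": ["A", "B", "C", "D"],
    "average_300": [200.0, 100.0, 150.0, 170.0],
    "PTR": [40.0, 60.0, 55.0, 45.0],
})

def region_stats(df, stat="mean"):
    """Hypothetical helper: one row per region, with `stat` being 'mean' or 'median'."""
    return df.groupby("region_name").agg(
        schools_n=pd.NamedAgg(column="school_name", aggfunc="count"),
        average_300=pd.NamedAgg(column="average_300", aggfunc=stat),
        PTR=pd.NamedAgg(column="PTR", aggfunc=stat),
    ).reset_index()

regions = region_stats(schools, "mean")
print(regions)
```

The resulting frame has exactly one row per `region_name`, which is the shape `choropleth_mapbox` needs when `locations='region_name'` is matched against the GeoJSON `featureidkey`.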

In [None]:
#1. Control components - OLD
#PER-TAB LISTS
color_radio = list()
context_checklist = list()
region_checklist = list()
stat_radio = list()

#SCHOOLS TAB [0]
#TEMP: empty string just to be consistent with indexing
color_radio.append('')
context_checklist.append('')
region_checklist.append('')
stat_radio.append('') 

#REGIONS TAB [1]
color_radio.append(dcc.RadioItems(options=['average_300', 'PTR', 'pop_3km'], value='average_300', inline=True))
context_checklist.append(dcc.Checklist(options=dfg.context.unique(), value=dfg.context.unique(), inline=True))
region_checklist.append(dcc.Checklist(options=np.sort(dfg.region_name.unique()), value=dfg.region_name.unique(), inline=True))
stat_radio.append(dcc.RadioItems(options=['mean', 'median'], value='mean', inline=True))

In [None]:
#1. Control components - NEW
#Filter controls
region_dropdown = dcc.Dropdown(
    options=np.append('ALL REGIONS', np.sort(dfg.region_name.unique())),
    value=['ALL REGIONS'],
    multi=True,
    placeholder='Type or select a region'
)
council_dropdown = dcc.Dropdown(
    value=['ALL COUNCILS'],
    multi=True,
    placeholder='Type or select a council'
)
school_dropdown = dcc.Dropdown(
    value=['ALL SCHOOLS'],
    multi=True,
    placeholder='Type or select a school'
)

#Filter dynamic statuses
region_label = dbc.Label()
council_label = dbc.Label()
school_label = dbc.Label()

#Color control
color_dropdown = dcc.Dropdown(
    #options=color_labels,
    options=color_values,
    value='average_300',
    placeholder='Type or select a color variable'
)

#DataTable dynamic outputs
school_title_label = dbc.Label(style={"fontWeight": "bold"}) #Dash style keys are camelCase
school_info_label = dbc.Label(style={"fontSize": "12px"})
school_url_link = html.A('PSLE Results', target="_blank")
school_map_link = html.A('Geospatial Features', target="_blank")

In [None]:
#2. Maps and tables - OLD
#PER-TAB LISTS
map_graph = list()
click_data_table = list()

for i in [0,1]:
    #Graph
    map_graph.append(dcc.Graph(figure={}))

    #DataTable
    click_data_table.append(dash_table.DataTable(
        columns = [{'id': 'Field', 'name': 'Field', 'type': 'any'},
                   {'id': 'Value', 'name': 'Value', 'type': 'numeric', 'format': Format(precision=2, group=',', scheme=Scheme.fixed, trim=Trim.yes)}],
        style_header = {'display': 'none'},
        #style_table={'height': '480px', 'overflowY': 'auto'},
        style_table={'overflowY': 'auto'},
        style_cell={
            'height': 'auto',
            # all three widths are needed
            'minWidth': '150px', 'width': '150px', 'maxWidth': '150px',
            'whiteSpace': 'normal',
            'overflow': 'scroll',
            #'overflow-wrap': 'anywhere'
            'font_size': '12px'
        })
    )

In [None]:
#2. Maps and tables - NEW
#Series of school DataTables: PSLE results, TAMISEMI data, geospatial features
school_data_tables = list()
for i in range(0, 3):
    school_data_tables.append(dash_table.DataTable(
            columns = [{'id': 'Property', 'name': 'Property', 'type': 'any'},
                       {'id': 'Value', 'name': 'Value', 'type': 'numeric', 'format': Format(precision=2, group=',', scheme=Scheme.fixed, trim=Trim.yes)}],
            #style_header = {'display': 'none'},
            css=[{'selector': 'tr:first-child',
                  'rule': 'display: none;'}], #hides the header row
            #style_table={'width': '300px', 'overflowY': 'auto'},
            style_cell={
                'height': 'auto',
                # all three widths are needed
                'minWidth': '120px', 'width': '120px', 'maxWidth': '120px',
                'whiteSpace': 'normal',
                #'overflow': 'scroll',
                'overflow': 'hidden',
                #'overflow-wrap': 'anywhere'
                'font_size': '12px'}
        )
    )

In [None]:
#3. Callback functions - NEW

#Control interaction callbacks
@callback(
    Output(council_dropdown, 'options'),
    Input(region_dropdown, 'value')
)
def set_council_options(sel_regions):
    #Filter regions (or all) to get council list
    sel_regions = sel_regions or [] #guard in case a cleared dropdown sends None
    if 'ALL REGIONS' in sel_regions:
        sel_regions = np.sort(dfg.region_name.unique())
    councils = np.sort(dfg[dfg['region_name'].isin(sel_regions)].council_name.unique())
    return np.append('ALL COUNCILS', councils)
        
@callback(
    Output(school_dropdown, 'options'),
    Input(region_dropdown, 'value'),
    Input(council_dropdown, 'value')
)
def set_school_options(sel_regions, sel_councils):
    #Filter DataFrame for selected regions (or all) 
    dfgr = filter_df_or_all(dfg, 'region_name', sel_regions, 'ALL REGIONS')
    
    #Filter councils (or all) to get school list
    if 'ALL COUNCILS' in sel_councils:
        sel_councils = np.sort(dfgr.council_name.unique())
    schools = np.sort(dfgr[dfgr['council_name'].isin(sel_councils)].school_name.unique())
    return np.append('ALL SCHOOLS', schools)
    
#Map and filter numbers updates - multiple outputs!
@callback(
    Output(map_graph[0], 'figure'),
    Output(region_label, 'children'),
    Output(council_label, 'children'),
    Output(school_label, 'children'),
    Input(region_dropdown, 'value'),
    Input(council_dropdown, 'value'),
    Input(school_dropdown, 'value'),
    Input(color_dropdown, 'value')
)
def update_school_graph_selected(sel_regions, sel_councils, sel_schools, color_input):
    
    #Avoid blank map when no value selected in any input
    if not (sel_regions and sel_councils and sel_schools):
        raise PreventUpdate
    
    #Successive filtering of DataFrames for selected regions, councils, schools
    dfgr = filter_df_or_all(dfg, 'region_name', sel_regions, 'ALL REGIONS')
    dfgrc = filter_df_or_all(dfgr, 'council_name', sel_councils, 'ALL COUNCILS')
    dfgrcs = filter_df_or_all(dfgrc, 'school_name', sel_schools, 'ALL SCHOOLS')
    
    #Dynamically calculate filtered PSLE average quintile
    df_fig = dfgrcs.copy() #else SettingWithCopyWarning
    if len(df_fig) >= 5:
        df_fig['average_5tile_filtered'] = pd.qcut(df_fig['average_300'], 5, labels=labels_5tile)
    else:
        df_fig['average_5tile_filtered'] = np.nan
    
    #Calculate number of each filter from actual filtered DataFrames
    num_regions = dfgr['region_name'].nunique()
    num_councils = dfgrc['council_name'].nunique()
    num_schools = len(df_fig) #must use length because of non-unique school names
    
    #Calculate new centre based on filtered DataFrame
    lat_lon_centre = {'lat': df_fig['LATITUDE fix'].mean(), 'lon': df_fig['LONGITUDE fix'].mean()}
    
    #Plot MAP
    fig = px.scatter_mapbox(
                        df_fig,
                        lat='LATITUDE fix',
                        lon='LONGITUDE fix',
                        color=color_input,
                        color_discrete_sequence=px.colors.sequential.Jet, #Ordinal
                        #labels=color_labels, #Full label names too long
                        category_orders=category_orders,
                        size='num_sitters',
                        #text='school_name', #Marker text issue with OSM base map
                        hover_name='school_name',
                        hover_data=hover_data[0],
                        custom_data=custom_data[0], #for DataTable
                        zoom=initial_zoom,
                        center=lat_lon_centre,
                        mapbox_style='open-street-map',
                        #width = 800,
                        #height = 500,
    )
    fig.update_layout(
        uirevision = True,
        #margin={"r":0,"t":0,"l":0,"b":0}
        margin={"b":0}
    )
    fig.update_traces()#mode="markers+text"
    #fig.show()

    return fig, f'Regions ({num_regions})', f'Councils ({num_councils})', f'Schools ({num_schools})'

# Callbacks Schools 2: School coordinates @CLICK > school data table
@callback(
    Output(school_title_label, 'children'),
    Output(school_info_label, 'children'),
    Output(school_url_link, 'href'),
    Output(school_data_tables[0], 'data'),
    Output(school_data_tables[1], 'data'),
    Output(school_map_link, 'href'),
    Output(school_data_tables[2], 'data'),
    Input(map_graph[0], 'clickData')
)
def update_school_table(clickData):
    if clickData:
        click_customdata = clickData['points'][0]['customdata']
        #print(click_customdata)
        school_title_string = f'{click_customdata[0]} - {click_customdata[1]}'
        council_string = convert_council_name(click_customdata[3])
        school_info_string = [html.Div(f'{click_customdata[2]} Ward, {council_string}, {click_customdata[4]} Region'),
                              html.Div(f'Type: {click_customdata[5]}, Total students: {click_customdata[6]}')]
        school_data_psle = pd.DataFrame({'Property': custom_data[0][8:12], 'Value': click_customdata[8:12]}).to_dict('records')
        school_data_tamisemi = pd.DataFrame({'Property': custom_data[0][12:17], 'Value': click_customdata[12:17]}).to_dict('records')
        school_map_url = f'https://www.openstreetmap.org/#map={school_map_zoom}/{click_customdata[17]}/{click_customdata[18]}'
        school_data_geo = pd.DataFrame({'Property': custom_data[0][19:22], 'Value': click_customdata[19:22]}).to_dict('records')
        return school_title_string, school_info_string, click_customdata[7], school_data_psle, school_data_tamisemi, school_map_url, school_data_geo    
    else:
        #No click yet (clickData is None on initial load), so there is no ['points'] to read
        return [None]*7

In [None]:
#3. Callback functions - OLD
# Callbacks Regions 1: @Color/filter >region polygons
@callback(
    Output(map_graph[1], 'figure'),
    Input(color_radio[1], 'value'),
    Input(stat_radio[1], 'value'),
    Input(context_checklist[1], 'value'),
    Input(region_checklist[1], 'value')
)
def update_region_graph(color_input, stat_input, context_input, region_input):

    #Prepare data
    dfg2 = dfg[(dfg['context'].isin(context_input)) & (dfg['region_name'].isin(region_input))]
    df_fig = dfg2.groupby('region_name').agg(
        #Basic
        schools_n = pd.NamedAgg(column='school_name', aggfunc='count'),
        councils_n = pd.NamedAgg(column='council_name', aggfunc='nunique'),
        students_sum = pd.NamedAgg(column='TOTAL STUDENTS', aggfunc='sum'),
        #Results (y)
        sitters_sum = pd.NamedAgg(column='num_sitters', aggfunc='sum'),
        average_300 = pd.NamedAgg(column='average_300', aggfunc=stat_input),
        pct_passed = pd.NamedAgg(column='pct_passed', aggfunc=stat_input),
        #Resources (Xi)
        PTR = pd.NamedAgg(column='PTR', aggfunc=stat_input),
        PBR_std7 = pd.NamedAgg(column='PBR_std7', aggfunc=stat_input),
        CG_per_student = pd.NamedAgg(column='CG_per_student', aggfunc=stat_input),
        #Demographics/Geography (Xd)
        approx_ages_mean = pd.NamedAgg(column='approx_ages_mean', aggfunc=stat_input),
        pop_3km = pd.NamedAgg(column='pop_3km', aggfunc=stat_input),
        d_closest = pd.NamedAgg(column='d_closest', aggfunc=stat_input),
        d_council_hq = pd.NamedAgg(column='d_council_hq', aggfunc=stat_input)
    ).reset_index()
    
    #Plot MAP
    fig = px.choropleth_mapbox(
        df_fig,
        locations='region_name',
        geojson=tza_adm1_geojson,
        featureidkey='properties.NAME_1',
        color=color_input,
        #color_discrete_sequence=px.colors.sequential.Jet, #Ordinal
        opacity=0.5,
        hover_name='region_name',
        hover_data=hover_data[1],
        custom_data=custom_data[1],
        labels=labels
    )

    #TEMP
    lat_lon_centre = {'lat': dfg['LATITUDE fix'].mean(), 'lon': dfg['LONGITUDE fix'].mean()}
    
    fig.update_layout(
        mapbox = {'style': 'open-street-map', 'center': lat_lon_centre, 'zoom': initial_zoom},
        title = 'Primary School Leaving Examination (PSLE) 2022 Results - Regions',
        #width = 1000, height = 600,
        uirevision = True
        #margin={"r":0,"t":0,"l":0,"b":0}
    )

    fig.update_traces()#mode="markers+text"
    #fig.show()
    return fig

# Callbacks Regions 2: Region polygons @CLICK >region data table
@callback(
    Output(click_data_table[1], 'data'),
    Input(map_graph[1], 'clickData')
)
def update_region_table(clickData):
    if clickData:
        click_customdata = clickData['points'][0]['customdata']
        school_data_DT = pd.DataFrame({'Field': custom_data[1][0:14], 'Value': click_customdata[0:14]}).to_dict('records')
        return school_data_DT
    else:
        #No click yet (clickData is None on initial load), so there is no ['points'] to read
        return None

### 3. Set up front-end

**Steps:**

1. Set up various `dbc` components for layouts:
    - Card
    - Tab(s), Container > Row > Col
    - High-level structure

In [None]:
#Define CARDS
school_map_card = dbc.Card(
    dbc.CardBody([
        map_graph[0]
    ])
)
school_data_card = dbc.Card(
    dbc.CardBody([
        school_title_label,
        school_info_label,
        school_url_link,
        school_data_tables[0],
        html.A('TAMISEMI Data', href=tamisemi_url, target="_blank"),
        school_data_tables[1],
        school_map_link,
        school_data_tables[2],
    ])
)
region_map_card = dbc.Card(
    dbc.CardBody([
        map_graph[1]
    ])
)

#Define TABS
tab0_content = dbc.Container([
    dbc.Row([
        dbc.Col([
            region_label,
            region_dropdown],
            width=4),
        dbc.Col([
            council_label,
            council_dropdown],
            width=4),
        dbc.Col([
            school_label,
            school_dropdown],
            width=4)
    ]),
    dbc.Row([
        dbc.Col([
            dbc.Label('Color variable'),
            color_dropdown],
            width=4)
    ]),
    dbc.Row([
        dbc.Col(school_map_card, width=8),
        dbc.Col(school_data_card, width=4)
    ])],
    fluid=True
)

tab1_content = dbc.Container([
    dbc.Row([
        dbc.Col([
            dbc.Label('Choose color data'),
            color_radio[1]],
            width='auto'),
        dbc.Col([
            dbc.Label('Choose statistic'),
            stat_radio[1]],
            width='auto'),
        dbc.Col([
            dbc.Label('Filter contexts'),
            context_checklist[1]],
            width='auto')
    ]),
    dbc.Row([
        dbc.Col([
            dbc.Label('Filter regions'),
            region_checklist[1]],
            width=True)
    ]),
    dbc.Row([
        dbc.Col(region_map_card, width=7),
        dbc.Col(click_data_table[1], width=5)
    ])],
    fluid=True
)

#Define high-level STRUCTURE
title = html.H2('Tanzania NECTA PSLE Dashboard 2022')
hr = html.Hr()
layout_elements = [
    html.Div([title, hr]),
    dbc.Tabs([
        dbc.Tab(tab0_content, label='Schools', tab_id="schools_tab"),
        dbc.Tab(tab1_content, label='Regions', tab_id='regions_tab')],
        id='tabs',
        active_tab='schools_tab',
    ),   
]

### 4. Main dashboard code

**Steps:**

1. Instantiate app
2. Define app layout from elements (above)
3. Run app
    - Runs locally in this Jupyter Notebook, or from a terminal with `python app.py`
    - TEMP deployment: [lonnychen.pythonanywhere.com](https://lonnychen.pythonanywhere.com)

In [None]:
#1. Initialize the app (Dash constructor)
#app = Dash(__name__)
app = Dash(external_stylesheets=[dbc.themes.BOOTSTRAP])

#2. App layout
app.layout = html.Div(layout_elements)
    
#3. Run the app
if __name__ == '__main__':
    app.run(debug=True) #app.run replaces app.run_server in newer Dash releases