# osm-hydro: plot routines #


This notebook provides a number of examples of how to plot the outputs of the OpenStreetMap data quality assessments in osm-hydro. We also provide some functionality to plot spatial data (saved in GeoJSON files) interactively, which helps to inspect the data in detail in a graphical map environment.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import os
import pandas as pd
import numpy as np

### Plotting functions ###
Below, we define all the functions to plot data. Please run these first! The plot routines all use example data from the "sample_data" folder in the GitHub repository.

In [None]:
def plot_dm_dq(ax, df, title, set_xlabels=True, colors=['g', 'y', 'orange', 'r']):
    """
    Plots a stacked bar plot from a pandas dataframe.
    Inputs:
        ax: matplotlib axis to plot on
        df: pandas dataframe (each row is one stacked layer and its index label is
            used for the legend; each column is one bar on the x-axis)
        title: axis title
        set_xlabels=True: whether to plot x-axis labels
        colors=['g', 'y', 'orange', 'r']: list of colors
    
    Returns:
        artists: list of handles to bar objects, which can be used for a legend
        
    """
    width = 0.35
    names = df.keys()
    index = np.arange(len(names))
    bottom = np.zeros(len(index))
    artists = []
    for (_, row), col in zip(df.iterrows(), colors):
        count = np.array(row, dtype='int')  # number of features per column for this row
        # align='center' puts each bar on its index, whatever the matplotlib default is
        artist = ax.bar(index, count, width, bottom=bottom, color=col, linewidth=0,
                        align='center')
        artists.append(artist[0])
        bottom += count

    ax.set_xticks(index)
    if set_xlabels:
        ax.set_xticklabels(tuple(names), rotation='vertical')
    else:
        ax.set_xticklabels('')
    ax.set(ylabel='', title=title)
    return artists

def dq_data_model_bar_plot(xls_fn):
    """
    Reads in an excel file with data model quality results
    A graph for each sheet is prepared. Each sheet contains results for a geographical area
    """
    def fn_template(fn, suffix='_{:s}', extension='.png'):
        """
        Strips the extension from a file name and returns a template (a format
        function) for making a set of file names
        """
        return (os.path.splitext(fn)[0] + suffix + extension).format
        
    xls = pd.ExcelFile(xls_fn)
    nplots = len(xls.sheet_names)
    nrows = int(np.round(np.sqrt(nplots)))
    ncols = int(np.ceil(float(nplots)/nrows))
    fig = plt.figure(figsize=(16,8))
    
    plt.subplots_adjust(bottom=0.2, hspace=0.25)
    for n, name in enumerate(xls.sheet_names):
        df = xls.parse(name).set_index('validation')
        ax = plt.subplot(nrows, ncols, n + 1)
        # only the last ncols subplots get x-axis labels
        set_xlabels = n >= nplots - ncols
        artists = plot_dm_dq(ax, df, name, set_xlabels)
    fig.legend(artists, df.index, loc='upper center', bbox_to_anchor=(0.4, 0.05),
              fancybox=True, shadow=True, ncol=4)
            
    return fig    
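
The stacking in `plot_dm_dq` boils down to drawing one layer of bars per row and raising the `bottom` of each next layer by the counts already drawn. Below is a minimal, self-contained sketch of that idea. The table, its row labels and its column names are made up for illustration and are not taken from the sample data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# hypothetical counts: rows are validation classes, columns are tags
demo = pd.DataFrame(
    {"width": [5, 2, 1], "depth": [3, 3, 2]},
    index=["valid", "invalid", "missing"],
)

fig, ax = plt.subplots()
x = np.arange(len(demo.columns))
bottom = np.zeros(len(x))
layers = []
for (label, row), color in zip(demo.iterrows(), ["g", "orange", "r"]):
    counts = row.to_numpy(dtype=float)
    # each layer starts where the previous layers ended
    layers.append(ax.bar(x, counts, 0.35, bottom=bottom, color=color, label=label))
    bottom = bottom + counts

ax.set_xticks(x)
ax.set_xticklabels(demo.columns)
ax.legend()
```

After the loop, `bottom` holds the total height of every bar, which is a quick way to check that no row was dropped.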
        

## plot the data model check ##
Below we demonstrate how to plot the results of the data model quality check. The results are reported in Excel files that contain one sheet per geographic area checked. In our example, we have checked all Ramani Huria wards in Dar es Salaam, so we make one subplot showing the tag completeness of each individual ward. We use the functions defined above to prepare these plots: we only need to provide an Excel file, and we get back a figure with all the results.
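
The subplot grid is sized from the number of sheets: `dq_data_model_bar_plot` rounds the square root of the sheet count to get the number of rows, then takes just enough columns to fit every sheet. The sketch below reproduces that arithmetic with the standard library only. `grid_shape` is a hypothetical helper name, not part of osm-hydro, and it assumes at least one sheet.

```python
import math

def grid_shape(nplots):
    """Return (nrows, ncols) of a near-square grid that fits nplots subplots."""
    nrows = int(round(math.sqrt(nplots)))
    ncols = int(math.ceil(float(nplots) / nrows))
    return nrows, ncols

# print the grid that a few different sheet counts would produce
for n in (1, 2, 6, 7, 12):
    print(n, grid_shape(n))
```

The grid may have empty cells in its last row, for instance when seven sheets are placed on a three by three grid.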

In [None]:
report_folder = os.path.abspath(os.path.join('..', 'sample_data', 'report_files'))
xls_fn = os.path.join(report_folder, 'data_model_channels_report.xlsx')

# function below does the actual plotting
fig = dq_data_model_bar_plot(xls_fn)

# save figures as nice looking PDF and PNG files                                
fig.savefig(os.path.join(os.path.split(xls_fn)[0], 'data_model_channels.pdf'))
fig.savefig(os.path.join(os.path.split(xls_fn)[0], 'data_model_channels.png'), bbox_inches='tight', dpi=300)


## plot crossings ##
We also check the completeness of waterway and highway crossings. For each geographical area, this check reports how many features contain crossing information (versus how many do not) and counts the number of crossings of each type. Both the validity of the crossing information and the crossing types are plotted below. We read the Excel file and make one stacked bar plot of correctly mapped versus missing crossing information, and another stacked bar plot of the different crossing types.
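
To make the layout of the report concrete, the sketch below builds a small stand-in table and splits it the same way the plotting cell does: the first two rows describe whether crossing information is present, the remaining rows count crossings per type. The row labels, area names and numbers are invented for illustration, and having one column per area is an assumption; the real report may be organised differently.

```python
import pandas as pd

# hypothetical stand-in for the crossings report (assumed: one column per area)
report = pd.DataFrame(
    {
        "validation": ["crossing info present", "crossing info missing",
                       "bridge", "culvert"],
        "area_a": [12, 3, 7, 5],
        "area_b": [4, 6, 1, 3],
    }
).set_index("validation")

validity = report.iloc[:2]  # rows that go into the 'Correct' panel
types = report.iloc[2:]     # rows that go into the 'Type' panel

# share of features that carry crossing information, per area
share_present = validity.iloc[0] / validity.sum()
print(share_present.round(2))
```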

In [None]:
# read the report file
xls_fn = os.path.join(report_folder, 'crossings_report.xlsx')
df = pd.read_excel(xls_fn).set_index('validation')
df

In [None]:
# now we make a plot.
fig = plt.figure(figsize=(10,4))
fig.subplots_adjust(bottom=0.4)
ax = fig.add_subplot(121)
artists_corr = plot_dm_dq(ax, df.iloc[0:2], 'Correct', colors=['g', 'r'])  # first two rows: validity
ax = fig.add_subplot(122)
artists_type = plot_dm_dq(ax, df.iloc[2:], 'Type', colors=['b', 'orange'])  # remaining rows: types
artists = artists_corr + artists_type
fig.legend(artists, df.index, loc='upper center', bbox_to_anchor=(0.42, 0.1),
           fancybox=True, ncol=4, fontsize=8.)

fig.savefig(os.path.join(os.path.split(xls_fn)[0], 'crossings.pdf'))
fig.savefig(os.path.join(os.path.split(xls_fn)[0], 'crossings.png'), bbox_inches='tight', dpi=300)
