# CSC337: Coursework 1

## Task 1: Design 2
### Box plots to show the age of global power plants by primary fuel type
<b>Visual Design Type:</b> Box plot

<b>Name of Tool:</b> Altair

<b>Country:</b> Global aggregate

<b>Year:</b> N/A

<b>Visual Mappings:</b>
+ Box plots show the distribution of power plant age for each fuel type:
    + the whiskers show the <i>relative</i> minimum and maximum ages of the plants
    + the filled box illustrates the interquartile range (IQR) of the age
    + the black line in the box indicates the median age
+ Colour is used to distinguish renewable from non-renewable energy sources

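Additionally, the following design decision was considered:
+ The whiskers do not show the true minimum/maximum ages of the plants, because extremely old plants were skewing the distributions. Instead, each whisker extends to the most extreme plant age that lies within 0.5 × IQR of the box edge (the first or third quartile) for that fuel type; plants beyond that range are not drawn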
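
As a rough illustration of the whisker rule described above, the sketch below computes the whisker ends for one group of ages. The helper name `whisker_bounds` and the sample ages are hypothetical, and pandas' default (linear) quantile interpolation is assumed, which may differ slightly from the quartile method used by the charting library.

```python
import pandas as pd


def whisker_bounds(ages, extent=0.5):
    """Return the (lower, upper) whisker ends for one group of ages."""
    q1 = ages.quantile(0.25)
    q3 = ages.quantile(0.75)
    iqr = q3 - q1

    # whiskers may reach at most extent * IQR beyond the box edges
    low_limit = q1 - extent * iqr
    high_limit = q3 + extent * iqr

    # each whisker ends at the most extreme observation still inside the limits
    inside = ages[(ages >= low_limit) & (ages <= high_limit)]
    return float(inside.min()), float(inside.max())


# hypothetical ages for a single fuel type, including one very old plant
sample_ages = pd.Series([5, 8, 10, 12, 15, 18, 20, 95])
lower, upper = whisker_bounds(sample_ages)
print(lower, upper)
```

With these made-up ages, the 95-year-old plant falls outside the limit and so does not stretch the upper whisker, which is the effect the 0.5 extent is meant to achieve.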

In [5]:
import pandas as pd
import altair as alt
import datetime as dt

Debug options

In [6]:
# font to use for chart labels
__CHART_FONT__ = 'Circular'

# DEBUG: disable Altair's maximum row limit (very large data sets can cripple chart performance)
alt.data_transformers.disable_max_rows()

# DEBUG: set max rows/columns in pandas table previewer
# pd.set_option('display.max_columns', None)
# pd.set_option('display.max_rows', 1000)

DataTransformerRegistry.enable('default')

Load the GPPD data set and sequentially derive the required information

In [7]:
# load the GPPD data set into a pandas DataFrame
d_plants = pd.read_csv('../data/global_power_plant_database.csv')

# keep only necessary columns; copy so later edits do not act on a slice of d_plants
d_boxplot = d_plants[['commissioning_year', 'primary_fuel']].copy()

# remove plants with no recorded commissioning year
d_boxplot = d_boxplot.dropna(subset=['commissioning_year'])

# remove plants of type storage because they don't generate power
d_boxplot = d_boxplot[d_boxplot['primary_fuel'] != 'Storage'].copy()

# calculate age of plants in years, relative to the current year
d_boxplot['age'] = dt.datetime.now().year - d_boxplot['commissioning_year']

# flag whether each fuel type is renewable (used to colour the box plots)
d_boxplot['Renewable'] = d_boxplot['primary_fuel'].isin(['Biomass', 'Geothermal', 'Hydro', 'Solar', 'Wind'])

# replace renewability True/False with Yes/No for better presentation
# (mapped on the Renewable column only, so no other column is affected)
d_boxplot['Renewable'] = d_boxplot['Renewable'].map({True: 'Yes', False: 'No'})


Draw the visualisation

In [8]:
alt.Chart(
    data=d_boxplot,
    height=750,
    width=1000,
    padding=20,
    title=alt.TitleParams(
        text='Age of Power Plants by Fuel Type',
        fontSize=22,
        font=__CHART_FONT__
    )
).mark_boxplot(
    size=50,
    extent=0.5,
    outliers=False,
    median=alt.MarkConfig(
        stroke='black'
    )
).encode(
    x=alt.X(
        'primary_fuel:N',
        axis=alt.Axis(
            title="Fuel type",
            titleFontSize=18,
            titleFont=__CHART_FONT__,
            labelFontSize=14,
            labelFont=__CHART_FONT__,
            labelAngle=45
        ),
        # NOTE: this sort no longer orders the fuel types by renewability
        sort=alt.SortField(
            field='Renewable'
        )
    ),
    y=alt.Y(
        'age:Q',
        axis=alt.Axis(
            title='Age (years)',
            titleFontSize=18,
            titleFont=__CHART_FONT__,
            labelFontSize=14,
            labelFont=__CHART_FONT__,
            tickCount=20
        )
    ),
    color=alt.Color(
        # colour boxes red/blue based on renewability
        'Renewable:N',
        scale=alt.Scale(
            range=['#FF5722', '#00BCD4']
        ),
        legend=alt.Legend(
            titleFontSize=18,
            titleFont=__CHART_FONT__,
            labelFontSize=16,
            labelFont=__CHART_FONT__
        )
    )
)
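
The comment in the x encoding notes that sorting by the `Renewable` field no longer takes effect. One possible workaround, offered here as an untested assumption rather than a confirmed fix, is to build the fuel order explicitly in pandas and hand the chart a plain list. The `sample` frame below is a hypothetical stand-in for `d_boxplot`, and `fuel_order` is an illustrative name.

```python
import pandas as pd

# hypothetical stand-in for d_boxplot with the two columns the ordering needs
sample = pd.DataFrame({
    'primary_fuel': ['Coal', 'Wind', 'Gas', 'Hydro', 'Coal', 'Solar'],
    'Renewable': ['No', 'Yes', 'No', 'Yes', 'No', 'Yes'],
})

# one row per fuel type, ordered by renewability first and then alphabetically
fuel_order = (
    sample[['primary_fuel', 'Renewable']]
    .drop_duplicates()
    .sort_values(['Renewable', 'primary_fuel'])['primary_fuel']
    .tolist()
)
print(fuel_order)
```

The resulting list could then be passed as `sort=fuel_order` in place of the `alt.SortField(...)` argument to `alt.X`, since an explicit array of values is an accepted sort order for a nominal field. This groups the non-renewable fuels together ahead of the renewable ones without relying on a field-based sort.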