# Data Clock with Plotly and Pandas in Python

I find the [data clock](https://pro.arcgis.com/en/pro-app/latest/help/analysis/geoprocessing/charts/data-clock.htm) in ArcGIS Pro a nice way to visualize seasonal patterns in time series. The rings of the chart show the larger, cyclic time unit (e.g. year), while each ring is divided into smaller units shown as wedges. Of course, I wanted to do the same with open source tools, and this turned out to be possible in Python with [Pandas](https://pandas.pydata.org/) and [Plotly](https://plotly.com/python/).

> In QGIS, the chart can be generated directly from the Processing Toolbox with my [QGIS Plugin Data Clock](https://github.com/florianneukirchen/qgis_data_clock).

## Figure Factory

```python
# Required imports

import calendar
import pandas as pd
import numpy as np
import plotly.graph_objects as go
```

This [comment](https://github.com/plotly/plotly.py/issues/2024#issuecomment-569959100) on the Plotly issue tracker put me on the right track. We can use `go.Barpolar` to draw the chart; however, some data processing is required to translate each date into theta (angle) and r (radius). A pandas DataFrame makes the processing much easier.
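A minimal sketch of that translation step, using made-up dates rather than the article's data: the year becomes the ring, the month name becomes the wedge, and `groupby` counts the rows per bin. Converting the ring to strings is what lets `go.Barpolar` treat r as categorical, as the factory function does. This only illustrates the idea and is not a replacement for the factory.

```python
import pandas as pd

# Made-up example dates (an assumption for illustration, not the article's data)
dates = pd.Series(pd.to_datetime([
    "2020-01-15", "2020-01-20", "2020-03-02", "2021-01-07", "2021-12-24",
]))

# Ring = year, wedge = month name; count the rows that fall into each bin
binned = (
    pd.DataFrame({"ring": dates.dt.year, "wedge": dates.dt.strftime("%B")})
    .groupby(["ring", "wedge"])
    .size()
    .rename("count")
    .reset_index()
)

# As strings, these columns could be passed to go.Barpolar as categorical r and theta
r = binned["ring"].astype(str)
theta = binned["wedge"]
print(binned)
```

Each row of `binned` corresponds to one wedge in one ring, and `count` would become the marker color.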

I define a figure factory function that is easy to use; its `mode` parameter selects the combination of rings and wedges.

```python
from datetime import date


def dataclock(df, date_column, mode="YM", agg='count', agg_column=None, colorscale=None, title=None, colorbar=False):
    """
    Figure factory for data clock

    For visualisation of seasonal/cyclic time series data.
    The rings of the chart show the larger, cyclic time unit
    (e.g. year), while each ring is divided into
    smaller units shown as wedges.

    The data is binned into these wedges and the color is determined
    by the count of data rows or by an aggregation function
    (e.g. 'sum', 'mean', 'median') on a specified column.

    The following combinations of rings and wedges are implemented:
    Year-Month, Year-Week, Year-Day, Week-Day, Day-Hour.

    :param df: DataFrame
    :param date_column: Name of a datetime column
    :param mode: Codes mapping time units to rings and wedges:
        "YM" (Year-Month), "YW" (Year-Week), "YD" (Year-Day), "WD" (Week-Day), "DH" (Day-Hour)
    :param agg: Optional aggregate function to be used in combination with parameter agg_column
    :param agg_column: Name of a numerical column for aggregation with parameter agg
    :param colorscale: Optional, name of a plotly color scale, e.g. 'Viridis', 'Magma', 'YlGn'
    :param title: String, optionally set a title.
    :param colorbar: Bool, optionally show a color bar

    :return: Plotly figure object
    """

    df = df.copy()

    # Without an aggregation column, count the rows via the date column
    if not agg_column:
        agg_column = date_column

    try:
        df['year'] = df[date_column].dt.year
    except AttributeError:
        raise ValueError("Date column must be of type datetime")

    hovertemplate = agg.title() + ': %{marker.color}<extra></extra>'
    customdata = None
    categoryarray = None

    if mode == "YM":
        df['ring'] = df['year']
        df['wedge'] = df[date_column].dt.strftime('%B') # Name of month
        hovertemplate = '%{r}<br>%{theta}<br>' + hovertemplate
        categoryarray = list(calendar.month_name)[1:] # For proper sorting of categories
        wedgerange = categoryarray # All categories in the wedges, for filling data gaps
    elif mode == "YW":
        df['ring'] = df['year']
        df['wedge'] = df[date_column].dt.isocalendar().week
        # Fold the few days of ISO week 53 into week 52
        df.loc[df['wedge'] == 53, 'wedge'] = 52
        wedgerange = range(1, 53)
        categoryarray = [str(x) for x in range(1, 53)]
        hovertemplate = '%{r}<br>Week %{theta}<br>' + hovertemplate
    elif mode == "YD":
        df['ring'] = df['year']
        df['wedge'] = df[date_column].dt.dayofyear
        # Leap years: fold day 366 (Dec 31) into day 365
        # TODO: Is there a better solution?
        df.loc[df['wedge'] == 366, 'wedge'] = 365
        categoryarray = [str(x) for x in range(1, 366)]
        wedgerange = range(1, 366)
        hovertemplate = '%{r}<br>Day %{theta}<br>' + hovertemplate
    elif mode == "WD":
        # Use ISO year * 100 + ISO week as identifier for the ring; pairing the
        # ISO week with the calendar year would merge days around New Year
        iso = df[date_column].dt.isocalendar()
        df['ring'] = (iso['year'] * 100 + iso['week']).astype(str)
        df['wedge'] = df[date_column].dt.strftime('%A')
        hovertemplate = '%{customdata|%Y-%m-%d}<br>%{theta}, Week %{customdata|%V}<br>' + hovertemplate
        categoryarray = list(calendar.day_name)
        wedgerange = categoryarray
    elif mode == "DH":
        df['ring'] = df[date_column].dt.dayofyear + df[date_column].dt.year * 1000
        df['wedge'] = df[date_column].dt.hour
        hovertemplate = '%{customdata|%Y-%m-%d}<br>%{theta}:00 h<br>' + hovertemplate
        wedgerange = range(24)
    else:
        raise ValueError("Invalid mode")

    df = df.groupby(['ring', 'wedge']).agg({agg_column: agg}).reset_index()

    # Make sure all combinations are present (even if the value will be NaN);
    # otherwise wedges will be shown in the wrong ring
    full_index = pd.MultiIndex.from_product([df['ring'].unique(), wedgerange], names=['ring', 'wedge'])
    df = df.set_index(['ring', 'wedge']).reindex(full_index).reset_index()

    # Set customdata: reconstructing the date from ring and wedge makes sure
    # that hover info is also shown when the data is NaN
    if mode == "DH":
        customdata = pd.to_datetime(df['ring'].astype(str), format='%Y%j')
    elif mode == "WD":
        # Ring is ISO year * 100 + ISO week, wedge is the weekday name (Monday = 1)
        weekday_number = {name: i + 1 for i, name in enumerate(calendar.day_name)}
        customdata = pd.to_datetime([
            date.fromisocalendar(int(ring) // 100, int(ring) % 100, weekday_number[wedge])
            for ring, wedge in zip(df['ring'], df['wedge'])
        ])

    # Passing theta and r as string makes them categorical;
    # numerical values would be interpreted as angles and radii
    fig = go.Figure(go.Barpolar(
        r=df['ring'].astype(str),
        theta=df['wedge'].astype(str),
        marker_color=df[agg_column],
        marker_colorscale=colorscale,
        hovertemplate=hovertemplate,
        customdata=customdata,
        ))

    fig.update_layout(
        title=title,
        polar_bargap=0,
        polar_radialaxis_visible=False,
        polar_radialaxis_type='linear',
        polar_angularaxis_direction='clockwise',
        )

    if colorbar:
        # The color bar is only drawn when marker.showscale is enabled
        fig.update_traces(
            marker_showscale=True,
            marker_colorbar_thickness=20,
        )
    if categoryarray:
        fig.update_layout(
            polar_angularaxis_categoryarray=categoryarray,
            polar_angularaxis_categoryorder='array',
            )

    # No ticks for YD mode, as it is too crowded
    if mode == 'YD':
        fig.update_layout(
            polar_angularaxis_visible=False,
        )

    return fig
```
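The reindexing step near the end of `dataclock` is easy to overlook. The following sketch, with made-up counts, shows its effect in isolation: `pd.MultiIndex.from_product` builds every (ring, wedge) pair, and `reindex` adds the missing pair as a row with NaN, so each ring ends up with the same set of wedges.

```python
import pandas as pd

# Made-up aggregated counts: the 2021 ring has no row for "February"
agg = pd.DataFrame({
    "ring": [2020, 2020, 2021],
    "wedge": ["January", "February", "January"],
    "count": [3, 1, 2],
})

# All wedges every ring should contain
wedgerange = ["January", "February"]
full_index = pd.MultiIndex.from_product(
    [agg["ring"].unique(), wedgerange], names=["ring", "wedge"]
)

# Missing (ring, wedge) pairs come back as rows with NaN in "count"
filled = agg.set_index(["ring", "wedge"]).reindex(full_index).reset_index()
print(filled)
```

Without the filled-in row, the 2021 ring would have one bar fewer, and the categorical layout could no longer line the bars up per ring.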

## Example Data

As example data, I will use: 
- A subset of the air quality data that is used in the [Pandas Time Series Tutorial](https://pandas.pydata.org/pandas-docs/version/1.0/getting_started/intro_tutorials/09_timeseries.html). The original data is from [openaq](https://openaq.org/).
- Selected columns of the piracy incident data used in the tutorial [Animating Time Series Data (QGIS3)](https://www.qgistutorials.com/en/docs/3/animating_time_series.html), originally provided by the National Geospatial-Intelligence Agency’s [Maritime Safety Information portal](https://msi.nga.mil/NGAPortal/MSI.portal).

```python
air = pd.read_csv('data/air_quality_no2_paris.csv')
air['date.utc'] = pd.to_datetime(air['date.utc'])
air.head()
```

|   | location | city | country | date.utc | parameter | value | unit |
|---|---|---|---|---|---|---|---|
| 0 | FR04014 | Paris | FR | 2019-06-21 00:00:00+00:00 | no2 | 20.0 | µg/m³ |
| 1 | FR04014 | Paris | FR | 2019-06-20 23:00:00+00:00 | no2 | 21.8 | µg/m³ |
| 2 | FR04014 | Paris | FR | 2019-06-20 22:00:00+00:00 | no2 | 26.5 | µg/m³ |
| 3 | FR04014 | Paris | FR | 2019-06-20 21:00:00+00:00 | no2 | 24.9 | µg/m³ |
| 4 | FR04014 | Paris | FR | 2019-06-20 20:00:00+00:00 | no2 | 21.4 | µg/m³ |


```python
pirates = pd.read_csv('data/pirates.csv')
pirates['dateofocc'] = pd.to_datetime(pirates['dateofocc'])
pirates.head()
```

|   | dateofocc | victim_l_D | victim_l | lon | lat |
|---|---|---|---|---|---|
| 0 | 2013-03-05 | Tanker | 9 | 3.383333 | 6.45 |
| 1 | 2015-07-31 | Fishing Vessel | 4 | 112.0 | 16.5 |
| 2 | 2015-07-24 | Tanker | 9 | 70.216667 | 23.016667 |
| 3 | 2015-07-23 | Tanker | 9 | 70.216667 | 23.016667 |
| 4 | 2015-08-08 | Tanker | 9 | 101.983333 | 2.05 |


## Plot Year - Month

```python
fig = dataclock(pirates, 'dateofocc', title="Pirate Attacks per Month, 2001-2017")

fig.update_layout(
    autosize=False,
    width=500,
    height=500,
)

fig.show()
```

## Plot Year - Week

```python
fig = dataclock(pirates, 'dateofocc', 'YW', colorscale='Magma', title="Pirate Attacks per Week, 2001-2017")

fig.update_layout(
    autosize=False,
    width=500,
    height=500,
)

fig.show()
```
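The `YW` and `YD` modes fold overflow units into the last wedge: ISO week 53 becomes week 52, and day 366 of a leap year becomes day 365. The sketch below (pandas only, with example dates chosen for illustration) shows which dates are affected. It also shows that 2021-01-01 belongs to ISO week 53 of ISO year 2020; since the `YW` mode uses the calendar year for rings, such days land in week 52 of the following year's ring.

```python
import pandas as pd

# Example dates chosen to hit the edge cases (plus one ordinary date for contrast)
dates = pd.Series(pd.to_datetime(["2020-12-31", "2021-01-01", "2021-06-15"]))

iso = dates.dt.isocalendar()  # DataFrame with columns year, week, day
day_of_year = dates.dt.dayofyear

# Same folding as in the YW and YD branches of dataclock()
week_wedge = iso["week"].where(iso["week"] != 53, 52)
day_wedge = day_of_year.where(day_of_year != 366, 365)

print(pd.DataFrame({
    "date": dates,
    "calendar_year": dates.dt.year,
    "iso_year": iso["year"],
    "iso_week": iso["week"],
    "week_wedge": week_wedge,
    "day_of_year": day_of_year,
    "day_wedge": day_wedge,
}))
```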

## Plot Week - Day, aggregating with sum

```python
subset = pirates.set_index('dateofocc').sort_index().loc['2010-07-01':'2010-12-31'].reset_index()

fig = dataclock(subset, 'dateofocc', 'WD', agg="sum", agg_column="victim_l", colorscale='Viridis', title='Number of Victims of Pirate Attacks, second half of 2010')

fig.update_layout(
    autosize=False,
    width=500,
    height=500,
)

fig.show()
```
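The `WD` mode builds a ring identifier from a year and the ISO week. Which year is paired with the week matters near New Year, because the last days of December can already belong to week 1 of the next ISO year. A small pandas check with two example dates:

```python
import pandas as pd

# Two example dates that both fall into an ISO week numbered 1
dates = pd.Series(pd.to_datetime(["2019-01-02", "2019-12-30"]))
iso = dates.dt.isocalendar()

# Calendar year * 100 + ISO week: both dates get the same identifier
calendar_ring = dates.dt.year * 100 + iso["week"]

# ISO year * 100 + ISO week: the two dates land in different rings
iso_ring = iso["year"] * 100 + iso["week"]

print(pd.DataFrame({"date": dates, "calendar_ring": calendar_ring, "iso_ring": iso_ring}))
```

With the calendar year, 2019-01-02 and 2019-12-30 share the identifier 201901; with the ISO year from `isocalendar()` they do not.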

## Plot Day - Hour with Median

```python
# Use a subset covering a few days
fig = dataclock(air.iloc[51:240], 'date.utc', 'DH', 'median', 'value', colorscale='Plotly3', title='Hourly NO<sub>2</sub> concentration in Paris')

fig.update_layout(
    autosize=False,
    width=500,
    height=500,
)

fig.show()
```
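In `DH` mode, each ring is a single day identified by the integer `year * 1000 + day of year` (e.g. 2019172), and the hover date is rebuilt from it with the format `'%Y%j'`. A quick sketch of that round trip with three made-up timestamps:

```python
import pandas as pd

# Made-up hourly timestamps spanning two days
timestamps = pd.Series(pd.to_datetime([
    "2019-06-21 05:00", "2019-06-21 17:00", "2019-06-22 05:00",
]))

# Ring id as in DH mode: year * 1000 + day of year; wedge is the hour
ring = timestamps.dt.year * 1000 + timestamps.dt.dayofyear
wedge = timestamps.dt.hour

# '%Y%j' (year + zero-padded day of year) parses the ring id back into a date
ring_dates = pd.to_datetime(ring.astype(str), format="%Y%j")
print(pd.DataFrame({"ring": ring, "wedge": wedge, "ring_date": ring_dates}))
```

Because the day of year is at most 366, multiplying the year by 1000 keeps the identifier unique and sortable.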

## Plot Week - Day with Median

```python
fig = dataclock(air, 'date.utc', 'WD', 'median', 'value', colorscale='Magma', title='NO<sub>2</sub> concentration in Paris')

fig.update_layout(
    autosize=False,
    width=500,
    height=500,
)

fig.show()
```