# Introduction

This is an attempt to recreate Florence Nightingale's famous "coxcomb" plot of British deaths during the Crimean War.

> DIAGRAMS are of great utility for illustrating certain questions of vital statistics by conveying ideas on the subject through the eye, which cannot be so readily grasped when contained in figures. This aid has therefore been called in to give greater clearness to the numerical results in the body Report and in the Appendix.<br>&mdash; From the Introduction

Two more quotes on the value of statistics (found secondhand):

> Full and minute statistical details are to the lawgiver, as the chart, the compass, and the lead to the navigator. <br>&mdash; Lord Brougham

> All the pleasures prove<br>That facts and figures can supply<br>Unto the Statist's ravished eye.<br>&mdash; *Punch*

# The Viz

<img src="the-viz.png" width="800">

Nightingale, Florence. 1858. *Notes on Matters Affecting the Health, Efficiency, and Hospital Administration of the British Army: Founded Chiefly on the Experience of the Late War*. London: printed by Harrison. http://archive.org/details/b20387118. Page 310&ndash;11.

<hr>

<div style="font-family:serif; font-size:16pt; width:550px; font-style: italic;padding:2rem; background:#f4e6cf;">
<p style="padding-left: 22px; text-indent: -22px;">The Areas of the blue, red, & black wedges are each measured from the centre as the common vertex.</p>
<p style="padding-left: 22px; text-indent: -22px;">The blue wedges measured from the centre of the circle represent area for area the deaths from Preventable or Mitigable Zymotic diseases, the red wedges measured from the centre the deaths from wounds, & the black wedges measured from the centre the deaths from all other causes.</p>
<p style="padding-left: 22px; text-indent: -22px;">The black line across the red triangle in Nov. 1854 marks the boundary of the deaths from all other causes during the month.</p>
<p style="padding-left: 22px; text-indent: -22px;"> In October 1854, & April 1855, the black area coincides with the red, in January & February 1856, the blue coincides with the black.</p>
<p style="padding-left: 22px; text-indent: -22px;"> The entire areas may be compared by following the blue, the red, & the black lines enclosing them.</p>
</div>

# The Data

## Source 1

<img src="the-data.png" width="800">

*Mortality of the British Army: At Home and Abroad, and during the Russian War, as Compared with the Mortality of the Civil Population in England; Illustrated by Tables and Diagrams*. 1858. London. http://hdl.handle.net/2027/njp.32101075698199. 

## Source 2

<img src="the-data2.png" width="800">

> Nightingale, Florence. 1858. *Notes on Matters Affecting the Health, Efficiency, and Hospital Administration of the British Army: Founded Chiefly on the Experience of the Late War*. London: printed by Harrison. http://archive.org/details/b20387118. Page 315.


# Set Up 

In [1]:
import pandas as pd
import plotly.express as px  # bundled with plotly since v4; the standalone plotly_express package is deprecated
import plotly.graph_objects as go
import io
import re
import math

## Enter the data

In [2]:
data_str = """
YEAR, MONTH, STRENGTH, DISEASE, INJURY, OTHER, DISEASE_RATE, INJURY_RATE, OTHER_RATE
1854, 4, 8571, 1, 0, 5, 1.4, 0.0, 7.0
1854, 5, 23333, 12, 0,9, 6.2, 0.0, 4.6
1854, 6, 28333, 11, 0, 6, 4.7, 0.0, 2.5
1854, 7, 28722, 359, 0, 23, 150.0, 0.0, 9.6
1854, 8, 30246, 828, 1, 30, 328.5, 0.4, 11.9
1854, 9, 30290, 788, 81, 70, 312.2, 32.1, 27.7 
1854, 10, 30643, 503, 132, 128, 197.0, 51.7, 50.1 
1854, 11, 29736, 844, 287, 106, 340.6, 115.8, 42.8
1854, 12, 32779, 1725, 114, 131, 631.5, 41.7, 48.0
1855, 1, 32393, 2761, 83, 324, 1022.8, 30.7, 120.0
1855, 2, 30919, 2120, 42, 361, 822.8, 16.3, 140.1
1855, 3, 30107, 1205, 32, 172, 480.3, 12.8, 68.6
"""
#1855, 4, 32252, 477, 48, 57
#1855, 5, 35473, 508, 49, 37
#1855, 6, 38863, 802, 209, 31
#1855, 7, 42647, 382, 134, 33
#1855, 8, 44614, 483, 164, 25
#1855, 9, 47751, 189, 276, 20
#1855, 10, 46852, 128, 53, 18
#1855, 11, 37853, 178, 33, 32
#1855, 12, 43217, 91, 18, 28
#1856, 1, 44212, 42, 2, 48
#1856, 2, 43485, 24, 0, 19
#1856, 3, 46140, 15, 0, 35

## Import the data

In [3]:
df = pd.read_csv(io.StringIO(data_str), skipinitialspace=True).set_index(['YEAR','MONTH'])

In [4]:
df.head()

| YEAR | MONTH | STRENGTH | DISEASE | INJURY | OTHER | DISEASE_RATE | INJURY_RATE | OTHER_RATE |
|---|---|---|---|---|---|---|---|---|
| 1854 | 4 | 8571 | 1 | 0 | 5 | 1.4 | 0.0 | 7.0 |
| 1854 | 5 | 23333 | 12 | 0 | 9 | 6.2 | 0.0 | 4.6 |
| 1854 | 6 | 28333 | 11 | 0 | 6 | 4.7 | 0.0 | 2.5 |
| 1854 | 7 | 28722 | 359 | 0 | 23 | 150.0 | 0.0 | 9.6 |
| 1854 | 8 | 30246 | 828 | 1 | 30 | 328.5 | 0.4 | 11.9 |


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 12 entries, (1854, 4) to (1855, 3)
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   STRENGTH      12 non-null     int64  
 1   DISEASE       12 non-null     int64  
 2   INJURY        12 non-null     int64  
 3   OTHER         12 non-null     int64  
 4   DISEASE_RATE  12 non-null     float64
 5   INJURY_RATE   12 non-null     float64
 6   OTHER_RATE    12 non-null     float64
dtypes: float64(3), int64(4)
memory usage: 915.0 bytes
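
The notebook does not say how the `*_RATE` columns were derived, but the entered figures are consistent with an annualized rate per 1,000 of strength, i.e. `deaths / STRENGTH * 1000 * 12`. The sketch below checks that reading against a few of the rows entered above. The helper name `annual_rate_per_1000` is hypothetical, and the interpretation is an assumption rather than something stated in the sources.

```python
# Rows copied from the data cell: (strength, deaths, recorded rate).
ROWS = [
    (8571, 1, 1.4),         # 1854/04, disease
    (8571, 5, 7.0),         # 1854/04, other
    (28722, 359, 150.0),    # 1854/07, disease
    (30246, 828, 328.5),    # 1854/08, disease
    (29736, 287, 115.8),    # 1854/11, injury
    (32393, 2761, 1022.8),  # 1855/01, disease
]

def annual_rate_per_1000(deaths, strength):
    """Assumed formula: monthly deaths scaled to a yearly rate per 1,000 men."""
    return deaths / strength * 1000 * 12

for strength, deaths, recorded in ROWS:
    computed = annual_rate_per_1000(deaths, strength)
    print(f"{deaths:>5} deaths / {strength} men -> {computed:7.1f} (recorded {recorded})")
```

Inside the notebook, the same check could be written against the DataFrame as `df[['DISEASE', 'INJURY', 'OTHER']].div(df['STRENGTH'], axis=0) * 12000`.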


## Create date labels

In [6]:
# Build "YYYY/MM" labels from the (YEAR, MONTH) index
DATES = [f"{year}/{month:02d}" for year, month in df.index]

In [7]:
DATES

['1854/04',
 '1854/05',
 '1854/06',
 '1854/07',
 '1854/08',
 '1854/09',
 '1854/10',
 '1854/11',
 '1854/12',
 '1855/01',
 '1855/02',
 '1855/03']

# Compute Areas

## Define the formula 
<div style="float:left;margin-top:1rem;"><img src="circle.png" width="250"></div> 

$r = \sqrt{\dfrac{A}{(\theta/2)}}$

$B = \theta/2$

$r = \sqrt{\dfrac{A}{B}}$
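
For reference, the formula follows from the area of a circular sector, with $\theta$ measured in radians. The worked case below assumes the 12 equal wedges used in this notebook.

$A = \dfrac{\theta}{2\pi}\,\pi r^2 = \dfrac{\theta}{2}\,r^2 \quad\Rightarrow\quad r = \sqrt{\dfrac{A}{\theta/2}}$

$\theta = \dfrac{2\pi}{12} = \dfrac{\pi}{6}, \qquad B = \dfrac{\theta}{2} = \dfrac{\pi}{12} \approx 0.2618, \qquad r = \sqrt{\dfrac{12A}{\pi}}$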

## Define function

In [130]:
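def get_radius(A, slices=12, factor=1):
    # Radius of a wedge of area A when the circle is cut into `slices` equal wedges.
    # Note: B is half the wedge angle in degrees, not radians, so every radius is
    # scaled by the same constant; the relative wedge areas are unaffected.
    B = (360 / slices) / 2
    r = math.sqrt(A / B) * factor
    return r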
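
A note on units: `get_radius` divides by half the wedge angle in degrees (15 for 12 slices), while the sector-area formula assumes radians. The sketch below compares it with a hypothetical radian-based variant, `get_radius_rad` (not part of the notebook), to suggest that the two differ only by a constant factor, so the relative sizes of the wedges should be unchanged.

```python
import math

def get_radius(A, slices=12, factor=1):
    # Same logic as the notebook's function: half-angle in degrees.
    B = (360 / slices) / 2
    return math.sqrt(A / B) * factor

def get_radius_rad(A, slices=12, factor=1):
    # Hypothetical variant: half-angle in radians, matching A = (theta / 2) * r**2.
    B = (2 * math.pi / slices) / 2
    return math.sqrt(A / B) * factor

# A few monthly disease counts from the data above.
areas = [1, 359, 828, 2761]

# Ratio of the two radii for each value; each is approximately sqrt(pi / 180).
ratios = [get_radius(a) / get_radius_rad(a) for a in areas]
print(ratios)

# With the radian version, the wedge area recovers the input value.
theta = 2 * math.pi / 12
recovered = [(theta / 2) * get_radius_rad(a) ** 2 for a in areas]
print(recovered)
```

Because the ratio is the same for every value, the degree-based radii produce the same picture up to overall scale, which the `factor` argument can absorb.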

## Create analytic columns

In [131]:
COLS = ['INJURY','OTHER','DISEASE']
R_COLS = [col+'_RATE' for col in COLS]
I_COLS = [col[0] for col in COLS]
Ir_COLS = [col[0]+'r' for col in COLS]

In [128]:
for i, col in enumerate(COLS):
    df[I_COLS[i]] = df[col].apply(get_radius)
    df[Ir_COLS[i]] = df[R_COLS[i]].apply(get_radius)

In [132]:
df

| YEAR | MONTH | STRENGTH | DISEASE | INJURY | OTHER | DISEASE_RATE | INJURY_RATE | OTHER_RATE | I | Ir | O | Or | D | Dr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1854 | 4 | 8571 | 1 | 0 | 5 | 1.4 | 0.0 | 7.0 | 0.0 | 0.0 | 0.57735 | 0.68313 | 0.258199 | 0.305505 |
| 1854 | 5 | 23333 | 12 | 0 | 9 | 6.2 | 0.0 | 4.6 | 0.0 | 0.0 | 0.774597 | 0.553775 | 0.894427 | 0.64291 |
| 1854 | 6 | 28333 | 11 | 0 | 6 | 4.7 | 0.0 | 2.5 | 0.0 | 0.0 | 0.632456 | 0.408248 | 0.856349 | 0.559762 |
| 1854 | 7 | 28722 | 359 | 0 | 23 | 150.0 | 0.0 | 9.6 | 0.0 | 0.0 | 1.238278 | 0.8 | 4.892171 | 3.162278 |
| 1854 | 8 | 30246 | 828 | 1 | 30 | 328.5 | 0.4 | 11.9 | 0.258199 | 0.163299 | 1.414214 | 0.890693 | 7.42967 | 4.679744 |
| 1854 | 9 | 30290 | 788 | 81 | 70 | 312.2 | 32.1 | 27.7 | 2.32379 | 1.462874 | 2.160247 | 1.358921 | 7.247988 | 4.562163 |
| 1854 | 10 | 30643 | 503 | 132 | 128 | 197.0 | 51.7 | 50.1 | 2.966479 | 1.85652 | 2.921187 | 1.827567 | 5.790797 | 3.623994 |
| 1854 | 11 | 29736 | 844 | 287 | 106 | 340.6 | 115.8 | 42.8 | 4.374167 | 2.778489 | 2.65832 | 1.689181 | 7.501111 | 4.765151 |
| 1854 | 12 | 32779 | 1725 | 114 | 131 | 631.5 | 41.7 | 48.0 | 2.75681 | 1.667333 | 2.955221 | 1.788854 | 10.723805 | 6.488451 |
| 1855 | 1 | 32393 | 2761 | 83 | 324 | 1022.8 | 30.7 | 120.0 | 2.352304 | 1.430618 | 4.64758 | 2.828427 | 13.567117 | 8.257522 |


# Visualize

## Define viz functions

In [12]:
def heatmap(mycols):
    global df
    matrix = df.reset_index()[mycols]
    matrix.index = DATES
    fig = go.Figure(data=go.Heatmap(
            z=matrix,
            x=matrix.columns,
            y=matrix.index,
            colorscale='Reds',
            reversescale=False))
    fig.update_layout(
        title='Mortality by cause and month',
        xaxis_nticks=36)
    fig.show()

In [14]:
def bar(mycols):
    global df
    mydf = df[mycols].reset_index().iloc[:,2:]
    mydf.index = DATES
    fig = px.bar(mydf)
    fig.show()

In [164]:
def get_thin(mydf, as_sum=False):
    mydf.index = DATES
    mydf = mydf.T.unstack().to_frame().reset_index()\
        .rename(columns={'level_0':'DATE', 'level_1':'CAUSE', 0:'COUNT'})
    mydf = mydf.set_index('DATE')
    if as_sum:
        mydf = mydf.groupby(['CAUSE']).sum()
    return mydf

In [16]:
def pie(mycols):
    global df
    mydf = df[mycols]
    thin = get_thin(mydf, 1).reset_index()
    fig = px.pie(thin, names='CAUSE', values='COUNT')
    fig.show()

In [159]:
def polar(mycols):
    global df
    mydf = df[mycols]
    thin = get_thin(mydf).reset_index()
    fig = px.bar_polar(thin,
             r='COUNT',
             color='CAUSE',
             theta='DATE',
             barmode='overlay',
             hover_data=['DATE'],
             color_discrete_sequence=['red','gray','lightblue'],
             start_angle=162.5,
#              title="Causes of Mortality in the Army in the East"
            )
    fig.show()

In [158]:
def polar2(mycols):
    global df
    colors = ['red','gray','lightblue']
    mydf = df[mycols]
    mydf = mydf.sort_index(ascending=False).reset_index()
    fig = go.Figure()
    for i, col in enumerate(mycols):
        fig.add_trace(
            go.Barpolar(
                r=mydf[col].to_list(),
                name=col,
                marker_color=colors[i]))
    fig.update_traces(opacity=.5)
    fig.update_traces(base='stacked')
    fig.update_traces(theta=list(reversed(DATES)))
    fig.update_layout(
#         title='Causes of Mortality in the Army in the East',
        font_size=16,
        legend_font_size=16,
        polar_angularaxis_rotation=193
    )
    fig.show()

## Exhibits

### Pie

In [165]:
pie(R_COLS)

### Bar

In [135]:
bar(R_COLS)

### Heatmap

In [136]:
heatmap(R_COLS)

### Polar

In [145]:
polar(R_COLS)

In [151]:
polar(I_COLS)

In [146]:
polar(Ir_COLS)

### Polar 2

In [160]:
polar2(R_COLS)

In [161]:
polar2(I_COLS)

### The Correct Version

In [162]:
polar2(Ir_COLS)

<img src="the-viz.png" width="800">
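
The reason the square-root columns (`Ir_COLS`) give the faithful version can be checked with a little arithmetic: a wedge's area grows with the square of its radius, so plotting the raw rate as the radius exaggerates differences. A minimal sketch, using two disease rates from the data above and assuming 12 equal wedges with the angle in radians:

```python
import math

THETA = 2 * math.pi / 12  # angle of one of 12 equal wedges, in radians

def wedge_area(r):
    """Area of a wedge of radius r and angle THETA."""
    return (THETA / 2) * r ** 2

low, high = 1.4, 1022.8  # DISEASE_RATE for 1854/04 and 1855/01

# Raw rate used directly as the radius: areas scale with the square of the rate.
naive_ratio = wedge_area(high) / wedge_area(low)

# Square root of the rate used as the radius: areas scale with the rate itself.
sqrt_ratio = wedge_area(math.sqrt(high)) / wedge_area(math.sqrt(low))

print(f"rate ratio:        {high / low:,.0f}")
print(f"naive area ratio:  {naive_ratio:,.0f}")
print(f"sqrt-radius ratio: {sqrt_ratio:,.0f}")
```

The two rates differ by a factor of roughly 730. Used directly as radii they would give wedges whose areas differ by that factor squared (over 500,000), whereas the square-root radii keep the area ratio equal to the rate ratio.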

# Questions

* Why did she choose April 1854 as her starting point (instead of Jan 1855)? Did she select a date that would show a dramatic contrast? Her first month records just 1 death from disease, which is telling. This also explains why she chose a circular diagram.
* Why not simply show a pie chart of aggregate statistics?
* Why not use a bar chart?
* Why are there geographic names on the diagram? (Crimea and Bulgaria are at 0&deg; and 90&deg; respectively.)
* Did she correct the diagram?

In [None]:
x = 4264

In [None]:
import math

In [None]:
math.sqrt(x)