# Executive Summary
<p>
It's hard to characterize the populations of clinicians most likely to contract COVID-19.  It is possible to get a feel for the breakdown by specialty/occupation by using online lists set up as memorials to healthcare workers who have died of COVID-19.  The data is likely incomplete and biased, but may suggest a direction for better studies.
</p>
<p>
Offhand, I would have expected Emergency Medicine personnel to dominate the list.  In data for several countries, nurses, healthcare assistants and support workers consistently make up a large share of the fallen.  Interestingly, though, the most common specialty among physicians is General Practice, not Emergency Medicine.  Admittedly this may be partly due to how I have coded the occupations, but I don't think that entirely accounts for the effect.  It may be worth looking into ways to better protect General Practitioners.

# Introduction
<p>
To do a proper job on this task, one would need highly detailed data on each clinician infected with COVID-19. But for privacy reasons, that data is highly unlikely to be made available to amateurs like us.  Also, it's not clear that even the experts, such as the US CDC, have good data on this (see below).
</p>
<p>
A fallback might be to do a careful search of local newspapers and published obituaries.  This is a big job.  It could be crowdsourced if the idea is found interesting enough.
</p>
<p>
As a fallback-to-the-fallback, one could use several online memorial lists of health workers who have succumbed to COVID-19.  The data collection processes are obscure and probably biased, but the result could at least be a pointer to things to look at with better data, and give some idea whether it's worth proceeding with better data collection.
</p>
<p>
The published lists I've found so far require a lot of hand-work to make them machine readable, but I've done this.  There are also copyright questions (one web site has already refused to allow me to make their list available on Kaggle).
</p>
<p>
The published memorial lists do give the occupation or medical specialty of the deceased, which is what I plan to concentrate on.

# Published Studies on Clinicians Dying of COVID-19
<p>
A recent CDC study:
</p>
<p>
Characteristics of Health Care Personnel with COVID-19 — United States, February 12–April 9, 2020
</p>
<p>
https://www.cdc.gov/mmwr/volumes/69/wr/mm6915e6.htm
</p>
<p>
The authors note that because healthcare-worker status was available for only 16% of reported cases nationwide, their data likely covers only a subset of infected healthcare personnel.
</p>
<p>
There is a Chinese study which does break down deaths by medical specialty:
</p>
<p>
Death from Covid-19 of 23 Health Care Workers in China
</p>
<p>
https://www.nejm.org/doi/full/10.1056/NEJMc2005696?rss=searchAndBrowse


# The Memorial Lists
<p>
All of the following lists appear to be updated periodically.  All of them include at least some retired health care personnel (HCP), some of whom may no longer have been working as HCP. They all also invite readers to submit names of other deceased healthcare workers, which should improve completeness but may also introduce biases.
</p>
<p>
The Guardian newspaper in the UK has been tracking healthcare worker deaths there.  Their list includes not just physicians, but many support personnel as well:
</p>
<p>
Doctors, nurses, porters, volunteers: the UK health workers who have died from Covid-19
</p>
<p>
https://www.theguardian.com/world/2020/apr/16/doctors-nurses-porters-volunteers-the-uk-health-workers-who-have-died-from-covid-19
</p>
<p>
The Italian organization Federazione Nazionale degli Ordini dei Medici Chirurghi e degli Odontoiatri (FNOMCeO) (National Federation of the Orders of Medical Doctors and Dentists) has a list which includes only physicians and some dentists, not nurses or support personnel.  They do not list ages at time of death:
</p>
<p>
Elenco dei Medici caduti nel corso dell’epidemia di Covid-19 (List of doctors who fell during the Covid-19 epidemic):
</p>
<p>
 https://portale.fnomceo.it/elenco-dei-medici-caduti-nel-corso-dellepidemia-di-covid-19/
</p>
<p>
The commercial website Medscape has a list covering multiple countries.  Its Italian entries largely overlap with the FNOMCeO list.  It gives ages and locations, but not dates of death. Unfortunately, when I asked for permission to redistribute the information from their list on Kaggle, they refused.  I will include some charts based on this data, but I can't share the data itself.
</p>
<p>
In Memoriam: Healthcare Workers Who Have Died of COVID-19
</p>
<p>
https://www.medscape.com/viewarticle/927976

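Because the Italian entries in the Medscape list largely overlap with the FNOMCeO list, any analysis that combines the two would double-count people. A rough way to estimate the overlap is to normalize names and intersect them. The sketch below is a hypothetical helper, not part of this notebook's pipeline; it assumes each list is a DataFrame with a `name` column (the Guardian and FNOMCeO files used below have one; the Medscape column name is an assumption). Exact matching on normalized names will miss spelling variants and surname-first ordering, and may merge different people who share a name.

```python
import re
import unicodedata

import pandas as pd


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Anything that is not a plain ASCII letter or whitespace becomes a space
    letters_only = re.sub(r"[^a-z\s]", " ", no_accents.lower())
    return " ".join(letters_only.split())


def name_overlap(list_a: pd.DataFrame, list_b: pd.DataFrame, col: str = "name") -> set:
    """Return the normalized names that appear in both lists (exact match only)."""
    names_a = {normalize_name(n) for n in list_a[col].dropna()}
    names_b = {normalize_name(n) for n in list_b[col].dropna()}
    return names_a & names_b


# Usage with made-up names
a = pd.DataFrame({"name": ["Mario Rossi", "Luca Bianchi"]})
b = pd.DataFrame({"name": ["ROSSI, Mario", "mario  rossì", "Anna Verdi"]})
print(name_overlap(a, b))  # {'mario rossi'}
```

Note that "ROSSI, Mario" is not matched because of the reversed name order; sorting the tokens inside `normalize_name` would catch that case at the cost of more false matches.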

# Caveats
<p>
For each dataset I have attempted to create a regularized version of the occupation/specialty information, so that I can calculate the fraction of the total physicians in a given country in each occupation/specialty.  I'm not an expert on healthcare systems around the world, and I don't speak Italian, so much of this is surmise/guesswork.

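The regularization step amounts to mapping each free-text occupation string onto a shared category that can be matched against the national totals. A minimal sketch of that kind of mapping follows; the raw strings, the category labels and the `OCCUPATION_MAP` dictionary are hypothetical examples, not the coding actually used for these lists. Strings that are not in the map are flagged as `'Unmapped'` so they can be reviewed by hand rather than silently dropped.

```python
import pandas as pd

# Hypothetical mapping from raw occupation strings (as they might appear in a
# memorial list) to regularized categories. Labels are illustrative only.
OCCUPATION_MAP = {
    "gp": "General practitioners",
    "general practitioner": "General practitioners",
    "family doctor": "General practitioners",
    "staff nurse": "Nurses",
    "nurse": "Nurses",
    "a&e consultant": "Emergency medicine",
    "healthcare assistant": "Health care assistants",
}


def regularize_occupation(raw: pd.Series) -> pd.Series:
    """Map raw occupation strings to regularized categories.

    Matching is case-insensitive and ignores surrounding whitespace;
    anything not in OCCUPATION_MAP (including missing values) becomes 'Unmapped'.
    """
    cleaned = raw.fillna("").str.strip().str.lower()
    return cleaned.map(OCCUPATION_MAP).fillna("Unmapped")


# Usage with made-up example strings
raw = pd.Series(["GP", " Staff Nurse", "Porter", None])
print(regularize_occupation(raw).tolist())
# ['General practitioners', 'Nurses', 'Unmapped', 'Unmapped']
```

An explicit dictionary is crude, but it keeps every coding decision visible and easy to audit, which matters when much of the mapping is guesswork.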
# UK Data from the Guardian list
<p>
The following plot shows the raw counts of workers by specialty/occupation.
</p>
<p>
Note that the largest bars are Nurses and Health Care Assistants.  The third-largest bar is General Practitioners, which is the largest bar of any physician specialty.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

#
#  Read list derived from The Guardian's memorial list of healthcare workers
#  who died of COVID-19
#
guardianList = pd.read_csv('../input/list-of-uk-health-workers-dead-from-covid19/guardian_list.csv', encoding='cp1252')

#
#  Plot raw number of deceased workers
#
fig,axes = plt.subplots(1,1,figsize=(20,10))
plot1 = guardianList['MED_SPEC'].value_counts().plot(kind='bar')
plt.title('Guardian UK raw # Health Workers who died of COVID-19 by specialty', fontsize=24)
plt.ylabel('number of workers', fontsize=20)
plt.xticks(fontsize=20)
plt.tight_layout()


<p>
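The following plot shows, for each specialty/occupation, the number of deceased workers in the Guardian list as a fraction of all UK workers in that specialty/occupation.  I don't have total-population data for every occupation, so not all of them appear in this plot.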
</p>
<p>
Several specialties have high bars compared to the raw totals plot, but are small in absolute numbers.  I'll show a table of fractions with the absolute numbers, below.

In [None]:
#
# Read the EU medical and nursing specialty lists
# and combine them into one dataframe
#
medSpecsByCountry = pd.read_csv('../input/eu-physicians-by-medical-specialty/hlth_rs_spec_1_Data_fixed.csv')
medSpecsByCountry = medSpecsByCountry.rename(columns={'total_number_of_docs': 'total_number_of_workers'})  # make count column names match

nurseSpecsByCountry = pd.read_csv('../input/eu-nursing-and-caring-professionals/hlth_rs_prsns_1_Data_fixed.csv')
nurseSpecsByCountry = nurseSpecsByCountry.query('WSTATUS == "Practising" and UNIT == "Number"')  # practising nurses only
nurseSpecsByCountry = nurseSpecsByCountry.drop(columns=['WSTATUS'])  # column doesn't exist in doctors specialty list
nurseSpecsByCountry = nurseSpecsByCountry.rename(columns={'total_number_of_nurses': 'total_number_of_workers',
                                                          'ISCO08': 'MED_SPEC'})  # make column names match the doctors list
# DataFrame.append was removed in pandas 2.0, so stack the two tables with pd.concat
medSpecsByCountry = pd.concat([medSpecsByCountry, nurseSpecsByCountry], ignore_index=True)
#
#  Extract totals by occupation for just the UK, for just 2016.
#
specsUK = medSpecsByCountry.query("GEO == 'United Kingdom' and TIME == 2016")  # data is spotty after 2016
#
#  Count deceased workers in the Guardian list by specialty/occupation
#
specCountsDF = (guardianList.groupby('MED_SPEC')['name'].count()
                .rename('deceased_workers')
                .reset_index())
#
# Calculate the fraction of workers in each specialty/occupation who have died
#
fracDF = specsUK.merge(specCountsDF, on='MED_SPEC')
fracDF = fracDF.assign(fraction_deceased=fracDF.deceased_workers / fracDF.total_number_of_workers)

fig, axes = plt.subplots(1, 1, figsize=(20, 10))
axes.bar(fracDF.MED_SPEC, fracDF.fraction_deceased)
plt.xticks(rotation='vertical', fontsize=20)
plt.title('Fraction of UK Health Workers in Guardian list who died of COVID-19, by specialty', fontsize=24)
plt.ylabel('Fraction of workers', fontsize=20)
plt.tight_layout()

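The occupations missing from the fraction plot are the ones that `merge` drops: its default is an inner join, so any specialty in the memorial list without a matching total in the EU data silently disappears. A sketch for listing which specialties are dropped uses `merge(..., how='left', indicator=True)`. The `fraction_deceased` helper and the toy frames below are hypothetical and made up for illustration; they only mirror the `MED_SPEC`, `deceased_workers` and `total_number_of_workers` columns used above.

```python
import pandas as pd


def fraction_deceased(counts: pd.DataFrame, totals: pd.DataFrame) -> tuple:
    """Return (fractions for matched specialties, list of unmatched specialties).

    counts: columns MED_SPEC, deceased_workers
    totals: columns MED_SPEC, total_number_of_workers
    """
    merged = counts.merge(totals, on="MED_SPEC", how="left", indicator=True)
    # Rows marked 'left_only' have no total-population figure to divide by
    unmatched = merged.loc[merged["_merge"] == "left_only", "MED_SPEC"].tolist()
    matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
    matched = matched.assign(
        fraction_deceased=matched.deceased_workers / matched.total_number_of_workers
    )
    return matched, unmatched


# Toy data (hypothetical numbers, not taken from any of the lists)
counts = pd.DataFrame({"MED_SPEC": ["A", "B", "C"], "deceased_workers": [2, 5, 1]})
totals = pd.DataFrame({"MED_SPEC": ["A", "B"], "total_number_of_workers": [100, 1000]})
fracs, missing = fraction_deceased(counts, totals)
print(missing)  # ['C']
print(fracs[["MED_SPEC", "fraction_deceased"]])
```

Printing the unmatched list is also a quick check on the occupation coding: a specialty that should have a total but shows up as unmatched usually means its label differs between the two files.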
<p>
Here's a table of fractions with the absolute numbers.  This only shows specialties where there's total-population data.  Note that some specialties with high fractions have low numbers of deaths, while Nurses and General Practitioners have comparatively small bars because of the large total population in those specialties.

In [None]:
UK_summary_table = fracDF[['MED_SPEC','deceased_workers','total_number_of_workers','fraction_deceased']]
UK_summary_table

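A high fraction based on one or two deaths is much less certain than a lower fraction based on dozens. One way to make that visible, which is not part of the original analysis, is to attach a binomial confidence interval to each fraction. The sketch below computes the Wilson score interval; `wilson_interval` and `add_intervals` are hypothetical helpers. It treats each listed death as a binomial "success" out of the specialty's total workforce, which ignores the incompleteness and selection bias of the memorial lists described above, so these intervals understate the real uncertainty.

```python
import math

import pandas as pd


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion k/n (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clip to [0, 1] to guard against floating-point round-off at the edges
    return max(0.0, center - half), min(1.0, center + half)


def add_intervals(table: pd.DataFrame, k_col: str, n_col: str) -> pd.DataFrame:
    """Return a copy of table with ci_low / ci_high columns added."""
    bounds = [wilson_interval(int(k), int(n)) for k, n in zip(table[k_col], table[n_col])]
    out = table.copy()
    out["ci_low"] = [lo for lo, _ in bounds]
    out["ci_high"] = [hi for _, hi in bounds]
    return out


# Usage on the summary tables built in this notebook, for example:
# add_intervals(UK_summary_table, "deceased_workers", "total_number_of_workers")
# add_intervals(italy_summary_table, "deceased_docs", "total_number_of_docs")
```

The Wilson interval is used here rather than the simple normal approximation because it stays sensible when counts are very small or zero, which is exactly the situation for the rarer specialties.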
# Italian Data from the FNOMCeO List
<p>
The following plot shows the raw counts of workers by specialty/occupation.

In [None]:
fnomceoList = pd.read_csv('../input/list-of-doctors-in-italy-dead-from-covid19/fnomceo_memorial_list.csv', encoding='cp1252')

#
#  Plot raw number of deceased workers
#
fig,axes = plt.subplots(1,1,figsize=(20,10))
plot1 = fnomceoList['MED_SPEC'].value_counts().plot(kind='bar')
plt.title('FNOMCeO #physicians who died of COVID-19 by EU specialties', fontsize=24)
plt.ylabel('number of physicians', fontsize=20)
plt.xticks(fontsize=20)
plt.tight_layout()


In [None]:
#
#  Read the EU physician totals and keep Italy, 2016
#
specsByCountry = pd.read_csv('../input/eu-physicians-by-medical-specialty/hlth_rs_spec_1_Data_fixed.csv')
specsItaly = specsByCountry.query("GEO == 'Italy' and TIME == 2016")  # data is spotty after 2016
#
#  Count deceased physicians in the FNOMCeO list by specialty
#
specCountsDF = (fnomceoList.groupby('MED_SPEC')['name'].count()
                .rename('deceased_docs')
                .reset_index())
#
#  Calculate the fraction of physicians in each specialty who have died
#
fracDF = specsItaly.merge(specCountsDF, on='MED_SPEC')
fracDF = fracDF.assign(fraction_deceased=fracDF.deceased_docs / fracDF.total_number_of_docs)

fig, axes = plt.subplots(1, 1, figsize=(20, 10))
axes.bar(fracDF.MED_SPEC, fracDF.fraction_deceased)
plt.xticks(rotation='vertical', fontsize=20)
plt.title('Fraction of Italian physicians in FNOMCeO list who died of COVID-19 by EU specialties', fontsize=24)
plt.ylabel('Fraction of physicians', fontsize=20)
plt.tight_layout()

In [None]:
italy_summary_table = fracDF[['MED_SPEC','deceased_docs','total_number_of_docs']]
italy_summary_table

# USA Data from the Medscape list
<p>
Since Medscape has refused to allow me to disseminate any part of their article, I can't show the raw data, but here are some plots made from it.  "Kaiser specialties" refers to a list of specialties used in tables of total numbers of HCP by specialty from the Kaiser Family Foundation.

In [None]:
from IPython.display import display, Image
display(Image(filename='../input/usrawnumbyspecpng/USrawNumBySpec.png'))

In [None]:
from IPython.display import display, Image
display(Image(filename='../input/usfracbyspecpng/USFracBySpec.png'))

# Iran Data from the Medscape list
<p>
The country with the second-largest number of entries in the Medscape list is Iran.
<p>
Since I don't have nationwide totals of the number of medical personnel by specialty for Iran, I can only plot the raw counts, not the fractions.

In [None]:
from IPython.display import display, Image
display(Image(filename='../input/iranrawnumbyspecpng/IranRawNumBySpec.png'))