# AHRI Health and Demographic Surveillance Data
MIT student visit 18 January 2024

## Get the Data
Download the data from the SAPRIN Data Repository: https://saprindata.samrc.ac.za/index.php/catalog/85
The data are publicly accessible, but you need to register before you can download them.

1. ### Set up the Julia packages we will be using:
    - CSV - to read the downloaded data
    - DataFrames - to process the data once read from the CSV file
    - CairoMakie and AlgebraOfGraphics - to plot our results

In [None]:
using CSV
using DataFrames
using CairoMakie
using AlgebraOfGraphics

2. ### Read the downloaded CSV file

In [None]:
# Replace this path with the location of your downloaded copy of SAPRIN_YrAge_Episodes.csv
df = CSV.read("/Users/kobush/Library/CloudStorage/OneDrive-AHRI/ACDIS/Projects/SAPRIN/Data Extraction/SAPRIN_Data/SAPRIN_YrAge_Episodes.csv", DataFrame)

In [None]:
# List the column names
print(names(df))

### Columns
NodeId  
IndividualId  
**Sex**  
DoB  
DoD  
...  
**CalendarYear**  
**Age**  
StartDate  
EndDate  
Episodes  
Episode  
LocationId  
HouseholdId  
**Resident**  
Enumeration, Born, InMigration, LocationEntry, ExtResStart, Participation, ...,   
Died, OutMigration, LocationExit, ExtResEnd, LostToFollowUp, Refusal, ...,   
**Days**  
**Current**  

In [None]:
# List the first 5 rows
first(df, 5)

In [None]:
# List the last 5 rows
last(df, 5)
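Before filtering, it can help to check each column's type and how many values are missing. The sketch below uses DataFrames' `describe` with a selection of statistics. It runs on a small hypothetical toy table that borrows a few SAPRIN column names (the values are invented, not SAPRIN data); on the real data you would call `describe(df, :eltype, :nmissing, :min, :max)` directly.

```julia
using DataFrames

# Hypothetical toy table mimicking a few SAPRIN columns (values are invented)
toy = DataFrame(NodeId = [3, 3, 1],
                CalendarYear = [2016, 2017, 2017],
                Resident = [1, 1, missing],
                Days = [365, 120, 200])

# One row per column: element type, number of missing values, minimum and maximum
colinfo = describe(toy, :eltype, :nmissing, :min, :max)
```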

In [None]:
# Number of records for each NodeId
rd = combine(groupby(df, :NodeId), nrow => :freq)

### SAPRIN Nodes
1 = Agincourt<br>
2 = DIMAMO<br>
3 = AHRI<br>
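The node codes are easier to read in tables and plots with their names attached. A minimal sketch, assuming the code-to-name mapping listed above: a `Dict` applied row by row with `transform!`. The `counts` table here is a hypothetical stand-in for `rd` from the cell above, with invented frequencies.

```julia
using DataFrames

# Node codes as listed above
node_names = Dict(1 => "Agincourt", 2 => "DIMAMO", 3 => "AHRI")

# Hypothetical record counts per node, standing in for `rd` (frequencies are invented)
counts = DataFrame(NodeId = [1, 2, 3], freq = [10, 20, 30])

# Add a readable node name next to each code
transform!(counts, :NodeId => ByRow(id -> node_names[id]) => :Node)
```

On the notebook data the same `transform!` call can be applied to `rd`.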

#### We will continue working only with AHRI data for residents up to 2022

In [None]:
ahri = subset(df, :NodeId => ByRow(x -> x == 3), :CalendarYear => ByRow(x -> x <= 2022), :Resident=> ByRow(x -> x == 1))
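By default `subset` raises an error when a condition evaluates to `missing`. That would happen if `NodeId`, `CalendarYear` or `Resident` contained missing values; whether the SAPRIN file does is not checked here. The sketch below applies the same three conditions to a hypothetical toy table. It passes `skipmissing = true`, which drops such rows instead of failing, and uses the shorter `==(3)` and `<=(2022)` predicate forms.

```julia
using DataFrames

# Hypothetical toy episodes; the last row has a missing Resident flag
toy = DataFrame(NodeId = [3, 3, 1, 3],
                CalendarYear = [2021, 2023, 2021, 2022],
                Resident = [1, 1, 1, missing])

# Same conditions as above; rows where a condition is missing are dropped rather than raising an error
ahri_toy = subset(toy,
                  :NodeId => ByRow(==(3)),
                  :CalendarYear => ByRow(<=(2022)),
                  :Resident => ByRow(==(1));
                  skipmissing = true)
```

Only the first toy row satisfies all three conditions. The second row is after 2022, the third belongs to node 1, and the fourth is dropped because its `Resident` value is missing.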

In [None]:
# Number of births per year
births = combine(groupby(ahri, :CalendarYear), :Born => sum => :Births)

In [None]:
# Number of deaths per year
deaths = combine(groupby(ahri, :CalendarYear), :Died => sum => :Deaths)
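The two tables above can also be produced in a single grouped pass, which keeps births and deaths aligned by year in one table. This sketch assumes, as the `sum` calls above imply, that `Born` and `Died` hold 0/1 event indicators per episode. It runs on invented toy data, and `sort = true` is added so that the years come out in order.

```julia
using DataFrames

# Hypothetical toy episodes: Born/Died assumed to be 0/1 event flags
toy = DataFrame(CalendarYear = [2016, 2016, 2017, 2017, 2017],
                Born = [1, 0, 1, 1, 0],
                Died = [0, 1, 0, 0, 1])

# Births and deaths per calendar year in one grouped pass
events = combine(groupby(toy, :CalendarYear; sort = true),
                 :Born => sum => :Births,
                 :Died => sum => :Deaths)
```

On the notebook data, replacing `toy` with `ahri` gives one table with `Births` and `Deaths` columns per `CalendarYear`.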

In [None]:
# Let's plot the number of births and deaths per year
fig = Figure(size = (800, 600))
ax = Axis(fig[1, 1], xlabel = "Year", ylabel = "Number of births/deaths", xticks = 2000:2:2022)
lines!(ax, births.CalendarYear, births.Births, color = :blue, linewidth = 2, linestyle = :dash, label = "Births")
lines!(ax, deaths.CalendarYear, deaths.Deaths, color = :red, linewidth = 2, linestyle = :dash, label = "Deaths")
# The legend picks up the `label` given to each line; `loc` is not a Makie Legend attribute
Legend(fig[1, 2], ax)
fig
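AlgebraOfGraphics is loaded above but not used. One way to use it is to reshape the yearly counts to long format with `stack` and let AlgebraOfGraphics assign colours and build the legend. This is a sketch on a hypothetical `counts` table with invented values that stands in for the `births` and `deaths` tables. It is not the notebook's own plotting code.

```julia
using DataFrames, CairoMakie, AlgebraOfGraphics

# Hypothetical yearly counts, standing in for the `births` and `deaths` tables above (values are invented)
counts = DataFrame(CalendarYear = [2016, 2017, 2018],
                   Births = [100, 150, 140],
                   Deaths = [60, 65, 70])

# Long format: one row per (year, series), with columns :variable and :value
long = stack(counts, [:Births, :Deaths], :CalendarYear)

# Map year to x and count to y, colour by series; the legend is generated automatically
plt = data(long) *
      mapping(:CalendarYear => "Year", :value => "Number of births/deaths", color = :variable) *
      visual(Lines)
fg = draw(plt)
```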

These graphs look a bit strange: why the sudden increase in births in 2017?

Could it be that the denominator (the population under observation) has changed over the years?

In [None]:
# Let's check when new persons were enumerated
enums = combine(groupby(ahri, :CalendarYear), :Enumeration => sum => :Freq)
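Another check on whether the denominator changed is to count the distinct individuals who contribute resident episodes in each year. The sketch below runs on invented toy data in which one individual has two episodes in the same year. On the notebook data you would group `ahri` instead of `toy`.

```julia
using DataFrames

# Hypothetical toy episodes: individual 2 has two episodes in 2017
toy = DataFrame(CalendarYear = [2016, 2017, 2017, 2017],
                IndividualId = [1, 1, 2, 2])

# Number of distinct individuals observed in each calendar year
population = combine(groupby(toy, :CalendarYear; sort = true),
                     :IndividualId => (ids -> length(unique(ids))) => :Individuals)
```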

So we need to look at **rates** rather than counts.<br>
$$\text{Rate} = \frac{\text{numerator}}{\text{denominator}}$$
*numerator* = number of births/deaths<br>
*denominator* = person-years of exposure
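The formula below applies the rate definition to a single calendar year $y$ and matches the calculation in the cells below. Days are converted to years using 365 days per year, and the result is expressed per 1,000 person-years. Here $B_y$ is the number of births in year $y$ and $d_i$ is the `Days` value of episode $i$ in that year:

$$\text{CBR}_y = \frac{B_y}{\frac{1}{365}\sum_{i \in y} d_i} \times 1000$$

As a worked example with invented numbers: 300 births over 3,650,000 person-days gives $3{,}650{,}000 / 365 = 10{,}000$ person-years, so $\text{CBR} = 300 / 10{,}000 \times 1000 = 30$ births per 1,000 person-years. The crude death rate is computed the same way, with deaths $D_y$ in place of $B_y$.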



In [None]:
# Calculate birth rate per year
birthrate = combine(groupby(ahri, :CalendarYear), :Born => sum => :Births, :Days => sum => :Days)
transform!(birthrate, :Days => ByRow(x -> x / 365) => :PersonYears)
transform!(birthrate, [:Births,:PersonYears] => ByRow((x,y) -> (x / y)*1000) => :BirthRate)

In [None]:
# Calculate death rate per year
deathrate = combine(groupby(ahri, :CalendarYear), :Died => sum => :Deaths, :Days => sum => :Days)
transform!(deathrate, :Days => ByRow(x -> x / 365) => :PersonYears)
transform!(deathrate, [:Deaths,:PersonYears] => ByRow((x,y) -> (x / y)*1000) => :DeathRate)
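The birth and death rate cells differ only in the event column, so the calculation can be wrapped in one function. The sketch below uses `crude_rate`, a hypothetical helper name not taken from the original code, with the same 365-day person-year and per-1,000 scaling as above. It is exercised on invented toy data.

```julia
using DataFrames

# Hypothetical helper: crude rate per 1,000 person-years of an event column, by calendar year.
# Uses 365 days per person-year, as in the cells above.
function crude_rate(episodes::AbstractDataFrame, event::Symbol)
    out = combine(groupby(episodes, :CalendarYear; sort = true),
                  event => sum => :Events,
                  :Days => sum => :Days)
    out.PersonYears = out.Days ./ 365
    out.Rate = out.Events ./ out.PersonYears .* 1000
    return out
end

# Hypothetical toy episodes (values are invented)
toy = DataFrame(CalendarYear = [2016, 2016, 2017],
                Born = [1, 0, 2],
                Died = [0, 1, 0],
                Days = [365, 365, 730])

births_toy = crude_rate(toy, :Born)
deaths_toy = crude_rate(toy, :Died)
```

On the notebook data, `crude_rate(ahri, :Born)` and `crude_rate(ahri, :Died)` should give the same rates as `birthrate` and `deathrate`, with the columns named `Events` and `Rate` instead.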

In [None]:
# Plot the crude birth and death rates per year
fig = Figure(size = (800, 600))
ax = Axis(fig[1, 1], xlabel = "Year", ylabel = "Rate per 1,000 person-years", xticks = 2000:2:2022)
lines!(ax, birthrate.CalendarYear, birthrate.BirthRate, color = :blue, linewidth = 2, linestyle = :dash, label = "Crude Birth Rate")
lines!(ax, deathrate.CalendarYear, deathrate.DeathRate, color = :red, linewidth = 2, linestyle = :dash, label = "Crude Death Rate")
# The legend picks up the `label` given to each line
Legend(fig[1, 2], ax)
fig
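To keep the plot, CairoMakie's `save` writes a figure to a file and infers the format from the extension. The sketch below builds a small hypothetical stand-in figure, `demo_fig`, with invented values, so that it does not overwrite `fig`. It writes the file to a temporary directory, and the file name is an assumption. For the real plot you would call `save("ahri_crude_rates.png", fig)`.

```julia
using CairoMakie

# Hypothetical stand-in figure with invented values; in the notebook you would save `fig` instead
demo_fig = Figure(size = (800, 600))
demo_ax = Axis(demo_fig[1, 1], xlabel = "Year", ylabel = "Rate per 1,000 person-years")
lines!(demo_ax, [2016, 2017, 2018], [30.0, 32.5, 31.0], label = "Crude Birth Rate")
axislegend(demo_ax)

# Write the figure to a PNG file in a temporary directory (file name is an assumption)
outfile = joinpath(mktempdir(), "ahri_crude_rates.png")
save(outfile, demo_fig)
```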

### The Code
You can find that here: https://github.com/kobusherbst/AHRI_MIT.jl.git