# Calibration

This file describes the sources of the data in this repository and any additional processing applied to them.

Two zipped datasets are available online here [TODO]: `calibration_raw_data` and `RFFSPs_large_datafiles`. The latter is downloaded automatically through the `DataDeps` Julia package the first time this package is used. Accept the download prompt; nothing further is needed to obtain the data required to run the component.

To see how the raw files were post-processed into those in `RFFSPs_large_datafiles`, download `calibration_raw_data` into this `calibration` folder and run the scripts below. They reproduce our post-processing and write the outputs to their respective locations.

## Emissions

The processing of the emissions data is documented in the [RFF Socioeconomic Projections repository](https://github.com/rffscghg/rff-socioeconomic-projections). The resulting files can be downloaded from [TODO] into `calibration_raw_data`, which is the folder the script below reads from. The input files are:

- `CH4_Emissions_Trajectories.csv`
- `N2O_Emissions_Trajectories.csv`
- `CO2_Emissions_Trajectories.csv`

The script below post-processes these files. It (1) reshapes them to long format and (2) restricts them to the time span used for the socioeconomic data (2020 to 2300). It then saves them to `RFFSPs_large_datafiles/emissions`.

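```julia
using DataFrames, CSVFiles, Query

# Output location for the processed emissions trajectories.
outdir = joinpath(@__DIR__, "..", "data", "RFFSPs_large_datafiles", "emissions")
isdir(outdir) || mkpath(outdir)

files = ["CH4_Emissions_Trajectories.csv", "N2O_Emissions_Trajectories.csv", "CO2_Emissions_Trajectories.csv"]
for file in files
    # Load the wide table (one row per sample, one column per year) and
    # reshape it to long format with columns :sample, :year, :value.
    df = load(joinpath(@__DIR__, "calibration_raw_data", file)) |> DataFrame
    df = stack(df, Not(:sample))
    rename!(df, [:sample, :year, :value])

    # `stack` stores the former column names (the years) as text; convert to integers.
    df.year = parse.(Int64, string.(df.year))

    # Keep the 2020-2300 window shared with the socioeconomic data and save.
    df |> @filter(_.year in 2020:2300) |>
        DataFrame |>
        save(joinpath(outdir, file))
end
```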
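For readers unfamiliar with `stack`, the reshaping and year filter can be illustrated without any packages. The following is a minimal Base-Julia sketch on a made-up two-sample table. It is meant to mirror what `stack`, `rename!` and the year filter do above, but it is an illustration only, not the script used to produce the files.

```julia
# Made-up wide table: one row per sample, one column per year.
samples = [1, 2]
years = ["2019", "2020", "2021"]        # year headers arrive as text
wide = [10.0 11.0 12.0;                 # rows = samples, columns = years
        20.0 21.0 22.0]

# Reshape to long format: one (sample, year, value) record per cell.
long_rows = [(sample = samples[i], year = parse(Int, years[j]), value = wide[i, j])
             for i in eachindex(samples) for j in eachindex(years)]

# Keep only the years shared with the socioeconomic data.
filtered = filter(r -> r.year in 2020:2300, long_rows)
```

Here `filtered` holds four records. The 2019 column is dropped for both samples.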

## Socioeconomic

The processing of the socioeconomic data is documented in the [RFF Socioeconomic Projections repository](https://github.com/rffscghg/rff-socioeconomic-projections). The processed files can be downloaded from [TODO] and saved in `RFFSPs_large_datafiles/rffsps`.

## Baseline Mortality

Some damage functions require baseline mortality data harmonized to a given population trajectory. The baseline mortality data in `data/RFFSPs_large_datafiles/death_rates` were derived from `death_rates.csv`. Hana Sevcikova provided this file to the RFF team on October 7th, and it is available (privately) here: https://drive.google.com/open?id=1TCYgzRJyt-8wadRcCfxDdpqF84t6oZfj&authuser=hanas%40uw.edu&usp=drive_fs. To rerun the script below, download it into `calibration_raw_data`. In `death_rates.csv`:

- `DeathRate` is the average annual number of deaths per 1000 people;
- `PopAvg` is the denominator, the population averaged between two time periods;
- `PopStart` is the population at the start of the time interval.
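Because `DeathRate` is expressed per 1000 people, it must be divided by 1000 before it is applied to a population. Below is a minimal sketch with made-up numbers. The helper `expected_deaths` is hypothetical, and the sketch assumes the rate is applied to an average population such as `PopAvg`, which the file uses as the denominator.

```julia
# Hypothetical helper: convert a rate in deaths per 1000 people per year into
# an expected number of annual deaths for the given (average) population.
expected_deaths(death_rate_per_1000, population) = death_rate_per_1000 * population / 1000

# Made-up example values, not taken from death_rates.csv.
rate = 7.5           # deaths per 1000 people per year
pop = 2_000_000      # average population over the period

deaths = expected_deaths(rate, pop)
```

With these inputs, `deaths` is `15000.0`.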

There is one death rate trajectory for each of the 1000 population trajectories, and each RFF SP is matched to one of them. The key file for this matching is `data/keys/sampled_pop_trajectory_numbers.csv`.
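To find the mortality file for a particular RFF SP, look up its trajectory number in the key file and build the matching file name. The column layout of `sampled_pop_trajectory_numbers.csv` is not described here, so this sketch assumes the key has been read into a plain vector whose entry `i` is the trajectory matched to RFF SP `i`. The vector `trajectory_of`, its values, and the helper `death_rate_file` are hypothetical. The sketch also assumes that the key uses the same trajectory numbers as the `Trajectory` column of `death_rates.csv`.

```julia
# Hypothetical key: entry i is the population (and mortality) trajectory matched
# to RFF SP i. The real key has 10,000 entries with values among the 1000 trajectories.
trajectory_of = [17, 342, 17, 998]

# Build the post-processed death rate file name for a given RFF SP, following the
# death_rates_Trajectory$t.feather naming used by the post-processing script.
death_rate_file(sp::Integer) = "death_rates_Trajectory$(trajectory_of[sp]).feather"

file_for_sp2 = death_rate_file(2)
```

With 10,000 SPs mapped onto 1000 trajectories, several SPs share a file. SPs 1 and 3 in this toy key both resolve to `death_rates_Trajectory17.feather`.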

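The file `sampled_pop_trajectory_numbers.csv` maps each of the 10,000 RFF SP scenarios to the baseline mortality trajectory (one of 1000) matched to its population draw.

The script below post-processes `death_rates.csv` into one file per trajectory, `death_rates/death_rates_TrajectoryX.feather`, where `X` is the trajectory number. It treats each raw row as a five-year period, expands it to annual values for 2021 to 2300, and reuses the 2021 values for 2020.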

```julia
using DataFrames, Query, CSVFiles, Arrow, CategoricalArrays

outdir = joinpath(@__DIR__, "..", "data", "RFFSPs_large_datafiles", "death_rates")
isdir(outdir) || mkpath(outdir)

# load the ISO3 codes we want to use
countries = load(joinpath(@__DIR__, "..", "data", "keys", "MimiRFFSPs_ISO3.csv")) |> DataFrame

# keep only the countries in the key and the columns we need
df = load(joinpath(@__DIR__, "calibration_raw_data", "death_rates.csv")) |>
    DataFrame |>
    @filter(_.LocID in countries.NumericCode) |>
    @select(:LocID, :Year, :Trajectory, :DeathRate) |>
    DataFrame

# replace the numeric location codes with ISO3 codes
idxs = indexin(df.LocID, countries.NumericCode)
insertcols!(df, 1, :ISO3 => countries.ISO3[idxs])
select!(df, Not(:LocID))

for t in unique(df.Trajectory)

    # rows for this trajectory, sorted so each country's periods are contiguous
    # and in chronological order (the expansion below relies on this)
    filtered_df = df[df.Trajectory .== t, Not(:Trajectory)]
    sort!(filtered_df, [:ISO3, :Year])

    # expand to annual values: each row covers five years, so repeat it five
    # times and assign the years 2021-2300 to each country in turn
    # (DataFrame construction fails if the column lengths do not match)
    n_countries = length(unique(filtered_df.ISO3))
    trajectory_df = DataFrame(
        :ISO3 => repeat(filtered_df.ISO3, inner = 5),
        :Year => repeat(collect(2021:2300), n_countries),
        :DeathRate => repeat(filtered_df.DeathRate, inner = 5)
    )

    # back-fill 2020 with the 2021 values (row indexing returns a copy)
    start_rows = trajectory_df[trajectory_df.Year .== 2021, :]
    start_rows.Year .= 2020
    append!(trajectory_df, start_rows)
    sort!(trajectory_df, [:ISO3, :Year])

    # reduce file size: Int16 years, Float64 rates, and ISO3 stored as a
    # compressed categorical column
    trajectory_df.Year = convert.(Int16, trajectory_df.Year)
    trajectory_df.DeathRate = convert.(Float64, trajectory_df.DeathRate)
    trajectory_df.ISO3 = categorical(trajectory_df.ISO3, compress = true)

    trajectory_df |> Arrow.write(joinpath(outdir, "death_rates_Trajectory$t.feather"))
end
```
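The "interpolation" in the script is a step-function expansion rather than a smooth interpolation. Each period value is repeated for the five years it covers, and the first year is back-filled. The following Base-Julia sketch applies the same steps to a made-up series for a single country. As the script does, it assumes the rows are in chronological order and that the periods cover consecutive five-year blocks starting in 2021.

```julia
# Made-up five-year death rates for one country (per 1000 people), in period order.
period_rates = [8.0, 8.5, 9.0]

# Repeat each period value for its five years: 2021-2025, 2026-2030, 2031-2035.
annual_rates = repeat(period_rates, inner = 5)
annual_years = collect(2021:(2020 + 5 * length(period_rates)))

# Back-fill 2020 with the 2021 value, as the script does with its start rows.
pushfirst!(annual_years, 2020)
pushfirst!(annual_rates, annual_rates[1])

for (y, r) in zip(annual_years, annual_rates)
    println(y, " => ", r)
end
```

A step function keeps each annual value equal to the reported period average, so no rates are invented between periods. The cost is a jump at each period boundary.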