# Update functions in `vivarium_output_loader` for loading raw output and count space data

I separated loading data and merging data from different locations into two steps, to make it easier to work with the data in either format:
*  separate collections of count space dataframes for each country, or
*  a single collection of dataframes for all countries, with a `'location'` column added to each dataframe.

## Original Goal: Reproduce bugs Ali found in July and encapsulate them in `.py` files

I did not get very far with this original goal. I was trying to calculate mean differences in birthweight between the covered and uncovered groups so I could draw a histogram to compare with the effect size distribution, but I ran into a weird pandas error in my `.averted` function that I'm still trying to figure out. Basically, the dataframe of indices is sometimes missing the broadcast column in the index, and I don't know why. I moved the problem code out of this notebook and into a new one to keep things cleaner.

In [1]:
import pandas as pd, numpy as np
import vivarium_output_loader as vol
import lsff_output_processing as lop

!whoami
!date

ndbs
Wed Sep 30 16:58:00 PDT 2020


In [2]:
%load_ext autoreload
%autoreload 2

## 1. Define output directories and load count space data

Make `vivarium_output_loader` more modular by redefining `load_count_data_by_location` and `load_output_by_location`, and adding new `merge` functions to recreate the functionality of the old versions.

Woo hoo! I finally figured out how to merge the count space dataframes using `pd.concat()` with a list comprehension inside a dictionary comprehension.
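
For reference, here is a minimal sketch of that merging pattern. It is not the actual code in `vivarium_output_loader`: the function name `merge_count_data_sketch` is hypothetical, and it assumes the input is a dictionary mapping each location to a dictionary of tables with the same table names for every location. The dictionary comprehension loops over table names, and the list comprehension collects that table from each location with a `'location'` column added.

```python
import pandas as pd

def merge_count_data_sketch(locations_count_data):
    """Hypothetical sketch: turn {location: {table_name: DataFrame}} into
    {table_name: DataFrame}, adding a 'location' column to each table."""
    # Assumption: every location has the same tables, so read the names from the first one
    table_names = list(next(iter(locations_count_data.values())).keys())
    return {
        table_name: pd.concat(
            # assign() returns a copy, so the per-location dataframes are left unchanged
            [tables[table_name].assign(location=location)
             for location, tables in locations_count_data.items()],
            ignore_index=True,
        )
        for table_name in table_names
    }
```

Note that `assign` appends `'location'` as the last column, whereas the merged tables shown later in this notebook have it first, so the real function likely builds the column differently.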

In [3]:
base_directory = '/share/costeffectiveness/results/vivarium_conic_lsff'
locations_rundates = {
    'India': '2020_06_26_20_35_00',
    'Nigeria': '2020_06_26_20_28_27',
    'Ethiopia': '2020_06_28_12_40_56',
}
locations_paths = vol.locaction_paths_from_rundates(base_directory, locations_rundates)

data = vol.load_count_data_by_location(locations_paths)
data.keys()

dict_keys(['India', 'Nigeria', 'Ethiopia'])

In [4]:
# View the table names for one of the countries (they should be the same for all)
data['India'].keys()

dict_keys(['gestational_age', 'transition_count', 'hemoglobin_level', 'deaths', 'state_person_time', 'anemia_state_person_time', 'births_with_ntd', 'population', 'person_time', 'ylls', 'ylds', 'births', 'birth_weight'])

In [5]:
# Check the Ethiopia state_person_time table to see if years are included,
# since Ali said they were missing on 7/7/2020.
# Yes, they are - it looks like the count_data folder was updated on 7/8/2020,
# so maybe Kjell reran the count_space transformation to fix this.
data['Ethiopia']['state_person_time'].head()

Unnamed: 0,year,age_group,cause,folic_acid_fortification_group,vitamin_a_fortification_group,measure,input_draw,scenario,value
0,2020,1_to_4,diarrheal_diseases,covered,covered,person_time,21,baseline,0.0
1,2020,1_to_4,diarrheal_diseases,covered,covered,person_time,21,folic_acid_fortification_scale_up,0.0
2,2020,1_to_4,diarrheal_diseases,covered,covered,person_time,21,iron_folic_acid_fortification_scale_up,0.0
3,2020,1_to_4,diarrheal_diseases,covered,covered,person_time,21,vitamin_a_fortification_scale_up,0.0
4,2020,1_to_4,diarrheal_diseases,covered,covered,person_time,29,baseline,0.0


In [6]:
data['Ethiopia']['state_person_time'].year.unique()

array(['2020', '2021', '2022', '2023'], dtype=object)

In [7]:
# Merge the data for all locations
all_data = vol.merge_location_count_data(data)
all_data.keys()

dict_keys(['gestational_age', 'transition_count', 'hemoglobin_level', 'deaths', 'state_person_time', 'anemia_state_person_time', 'births_with_ntd', 'population', 'person_time', 'ylls', 'ylds', 'births', 'birth_weight'])

In [8]:
# The following should look like the table above, but with a 'location' column added
all_data['state_person_time'].query('location=="Ethiopia"').head()

Unnamed: 0,location,year,age_group,cause,folic_acid_fortification_group,vitamin_a_fortification_group,measure,input_draw,scenario,value
0,Ethiopia,2020,1_to_4,diarrheal_diseases,covered,covered,person_time,21,baseline,0.0
1,Ethiopia,2020,1_to_4,diarrheal_diseases,covered,covered,person_time,21,folic_acid_fortification_scale_up,0.0
2,Ethiopia,2020,1_to_4,diarrheal_diseases,covered,covered,person_time,21,iron_folic_acid_fortification_scale_up,0.0
3,Ethiopia,2020,1_to_4,diarrheal_diseases,covered,covered,person_time,21,vitamin_a_fortification_scale_up,0.0
4,Ethiopia,2020,1_to_4,diarrheal_diseases,covered,covered,person_time,29,baseline,0.0


In [9]:
# Test my new functions for loading and merging the raw output tables
all_output = vol.load_and_merge_location_outputs(locations_paths)
all_output.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,total_population_untracked,total_population_tracked,total_population,diarrheal_diseases_prevalent_cases_at_sim_end,susceptible_to_diarrheal_diseases_event_count,diarrheal_diseases_event_count,measles_prevalent_cases_at_sim_end,susceptible_to_measles_event_count,measles_event_count,recovered_from_measles_event_count,...,anemia_severe_person_time_in_2023_among_male_in_age_group_early_neonatal,anemia_severe_person_time_in_2023_among_female_in_age_group_early_neonatal,anemia_severe_person_time_in_2023_among_male_in_age_group_late_neonatal,anemia_severe_person_time_in_2023_among_female_in_age_group_late_neonatal,anemia_severe_person_time_in_2023_among_male_in_age_group_post_neonatal,anemia_severe_person_time_in_2023_among_female_in_age_group_post_neonatal,anemia_severe_person_time_in_2023_among_male_in_age_group_1_to_4,anemia_severe_person_time_in_2023_among_female_in_age_group_1_to_4,fortification_intervention.scenario,location
input_draw_number,random_seed,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
674.0,55.0,8125.0,9492.0,17617.0,120.0,20880.0,20997.0,2158.0,0.0,2166.0,2145.0,...,4.605065,3.753593,1.856263,1.738535,24.44627,18.524298,41.823409,35.583847,iron_folic_acid_fortification_scale_up,India
674.0,35.0,8078.0,9433.0,17511.0,119.0,18736.0,18848.0,1845.0,0.0,1847.0,1840.0,...,4.388775,4.057495,1.848049,1.741273,26.685832,17.223819,54.666667,45.7577,vitamin_a_fortification_scale_up,India
55.0,66.0,8114.0,9366.0,17480.0,122.0,18627.0,18740.0,1901.0,0.0,1906.0,1905.0,...,4.799452,3.786448,2.75154,2.034223,25.650924,10.888433,43.112936,42.633812,iron_folic_acid_fortification_scale_up,India
55.0,145.0,8097.0,9422.0,17519.0,133.0,18952.0,19075.0,1959.0,0.0,1978.0,1958.0,...,4.750171,3.011636,2.934976,1.924709,20.788501,16.424367,60.15332,38.28063,folic_acid_fortification_scale_up,India
524.0,389.0,8108.0,9364.0,17472.0,86.0,16231.0,16326.0,555.0,0.0,559.0,551.0,...,4.098563,4.106776,2.729637,2.447639,24.574949,22.387406,51.22245,46.75154,folic_acid_fortification_scale_up,India


## 2. Check iron effect on birthweight

In [10]:
all_data['births'].head()

Unnamed: 0,location,year,sex,folic_acid_fortification_group,measure,input_draw,scenario,value
0,India,2020,female,covered,live_births,21,baseline,1857.0
1,India,2020,female,covered,live_births,21,baseline,12.0
2,India,2020,female,covered,live_births,21,baseline,530.0
3,India,2020,female,covered,live_births,21,folic_acid_fortification_scale_up,1857.0
4,India,2020,female,covered,live_births,21,folic_acid_fortification_scale_up,12.0


In [11]:
all_data['births'].folic_acid_fortification_group.unique()

array(['covered', 'uncovered', 'unknown'], dtype=object)

In [12]:
all_data['birth_weight'].head()

Unnamed: 0,location,year,sex,measure,input_draw,scenario,value,iron_fortification_group
0,India,2020,female,birth_weight_mean,21,baseline,2912.186341,uncovered
1,India,2020,female,birth_weight_mean,21,baseline,2897.54748,covered
2,India,2020,female,birth_weight_mean,21,folic_acid_fortification_scale_up,2912.186341,uncovered
3,India,2020,female,birth_weight_mean,21,folic_acid_fortification_scale_up,2897.54748,covered
4,India,2020,female,birth_weight_mean,21,iron_folic_acid_fortification_scale_up,2912.186341,uncovered


In [13]:
all_data['birth_weight'].iron_fortification_group.unique()

array(['uncovered', 'covered'], dtype=object)

In [14]:
all_data['birth_weight'].scenario.unique()

array(['baseline', 'folic_acid_fortification_scale_up',
       'iron_folic_acid_fortification_scale_up',
       'vitamin_a_fortification_scale_up'], dtype=object)

In [15]:
all_data['birth_weight'].measure.unique()

array(['birth_weight_mean', 'birth_weight_sd'], dtype=object)

In [22]:
year = '2022'
scenario = 'iron_folic_acid_fortification_scale_up'
location = 'Nigeria'
draw = 21
query = ('measure=="birth_weight_mean" and scenario==@scenario and year==@year'
         ' and input_draw==@draw and location==@location'
        )
all_data['birth_weight'].query(query)

Unnamed: 0,location,year,sex,measure,input_draw,scenario,value,iron_fortification_group
1604,Nigeria,2022,female,birth_weight_mean,21,iron_folic_acid_fortification_scale_up,3193.635062,uncovered
1605,Nigeria,2022,female,birth_weight_mean,21,iron_folic_acid_fortification_scale_up,3201.502874,covered
2004,Nigeria,2022,male,birth_weight_mean,21,iron_folic_acid_fortification_scale_up,3235.496806,uncovered
2005,Nigeria,2022,male,birth_weight_mean,21,iron_folic_acid_fortification_scale_up,3241.154803,covered
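
The covered-minus-uncovered difference in mean birthweight can be read off a table like the one above by pivoting `iron_fortification_group` into columns. The following is only a sketch under the assumption that each combination of location, year, sex, draw, and scenario has exactly one `covered` and one `uncovered` row. The helper name `birth_weight_mean_difference` is hypothetical and is not part of `vivarium_output_loader` or `lsff_output_processing`.

```python
import pandas as pd

def birth_weight_mean_difference(birth_weight):
    """Hypothetical helper: covered minus uncovered mean birthweight
    for each location, year, sex, draw, and scenario."""
    id_columns = ['location', 'year', 'sex', 'input_draw', 'scenario']
    # Keep only the mean rows; the table also contains birth_weight_sd rows
    means = birth_weight.query('measure == "birth_weight_mean"')
    # unstack() raises an error if a group has duplicate rows, which surfaces bad input early
    wide = (
        means.set_index(id_columns + ['iron_fortification_group'])['value']
        .unstack('iron_fortification_group')
    )
    return (wide['covered'] - wide['uncovered']).rename('mean_difference')
```

Applied to the four Nigeria rows above, this would give a difference of about 7.87 for females and 5.66 for males, with the covered group heavier in both cases.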
