In [1]:
using CSV, DataFrames, Serialization

### Extract Training Data from CSV
This script extracts the simulation inputs from the experimental data reported in Kholodenko et al. For each measurement, it reads the timepoints at which the time-course simulation should save output, the measured response, and the ligand dose (in nM), then serializes the result to a file.

The script writes 8 dictionaries, one per experimental measurement. Each has the following entries, shown here for one output file:

000_processed_grb_egfr_20.dict: <br>

Dict{String, Any} with 4 entries: <br>
  "save_at"                => [0, 15, 30, 45, 60, 120] <br>
  "response"               => [0.0, 18.06, 15.79, 8.66, 6.44, 4.7] <br>
  "ligand_simulation (nM)" => 20.0 <br>
  "average_error"          => 5 <br>
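
A serialized dictionary can be read back with `deserialize` from the Serialization standard library. The following is a minimal sketch, not part of the original script: it builds a stand-in dictionary from the example values above and round-trips it through a temporary file, so it does not depend on the real outputs directory. The variable names `example`, `path`, and `loaded` are illustrative assumptions.

```julia
using Serialization

# Hypothetical stand-in for one output dictionary, using the example values above.
example = Dict{String, Any}(
    "save_at" => [0, 15, 30, 45, 60, 120],
    "response" => [0.0, 18.06, 15.79, 8.66, 6.44, 4.7],
    "ligand_simulation (nM)" => 20.0,
    "average_error" => 5,
)

# Round-trip through a temporary file instead of the real outputs directory.
path = joinpath(mktempdir(), "000_processed_grb_egfr_20.dict")
serialize(path, example)
loaded = deserialize(path)

# Pull out the fields a downstream simulation would need.
timepoints = loaded["save_at"]
response = loaded["response"]
dose_nM = loaded["ligand_simulation (nM)"]
```

Note that files written by `serialize` are generally only guaranteed to be readable by the same Julia version that wrote them, so the dictionaries are best treated as intermediate artifacts rather than a long-term storage format.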

In [4]:
# List the input files, skipping the .DS_Store and kholodenko1.xml entries, which are not CSV data
data_files = filter(f -> !(f in (".DS_Store", "kholodenko1.xml")), readdir("data"))
data_files = sort(data_files) # sort to ensure consistent order
# Read each CSV file into a DataFrame
data = [DataFrame(CSV.File(joinpath("data", f))) for f in data_files]
ligand_stimulation = [20, 20, 0.2, 2, 20, 2, 20, 20] # doses in nM, consistent with sorted order of files
my_keys = ["save_at", "response", "ligand_simulation (nM)", "average_error"]
average_error = [2.5, 5, 1, 1, 5, 1, 1, 1] # average error per species, same order as the files
# Note: the second column is read under the name " y", with a leading space
my_values = [[data[i][!, "x"], data[i][!, " y"], ligand_stimulation[i], average_error[i]] for i in eachindex(data_files)]
processed_dictionary = [Dict(my_keys .=> my_values[i]) for i in eachindex(data_files)]
# Name each output after its input file, e.g. 000_processed_grb_egfr_20.dict
output_names = ["000_processed_" * replace(f, ".csv" => "") * ".dict" for f in data_files]
# Serialize one dictionary per measurement into outputs/
for (name, dict) in zip(output_names, processed_dictionary)
    serialize(joinpath("outputs", name), dict)
end
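
Because the ligand doses and average errors are hard-coded in the sorted order of the files, adding or renaming a file in the data directory could silently misalign them. A small length check before building the dictionaries is one way to guard against that. The sketch below is an illustrative addition, not part of the original script: `check_alignment` is a hypothetical helper, and the file names and values are made up so that the block runs on its own.

```julia
# Hypothetical file names and values; the real ones come from readdir("data").
data_files = ["a.csv", "b.csv", "c.csv"]
ligand_stimulation = [20, 0.2, 2]
average_error = [2.5, 5, 1]

# Hypothetical helper: error out if any per-file vector has the wrong length.
function check_alignment(files, vectors...)
    for v in vectors
        length(v) == length(files) ||
            error("expected $(length(files)) entries, got $(length(v))")
    end
    return true
end

check_alignment(data_files, ligand_stimulation, average_error)
```

This only catches a change in the number of files. A renamed file that changes the sort order would still need to be checked by hand against the dose and error vectors.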
