Model Output File Management Class For gempyor
#257
Comments
Hi Timothy, could you read thru #253 on using xarray as the primary simulation output and see if it's complementary to this issue?
Hi @twallema! Funnily enough, reading that issue while working on another one is what spurred this thought. When discussing this issue with @jcblemai I even mentioned the possibility of other IO formats (hence the folder driver idea, or whatever we end up calling it). This setup would make it easier to just use CSV files for everything when working with sample/testing configs, or potentially to use netcdf for xarray objects in the future. The first pass will focus on centralizing the directory structure logic though.
I have run into what I think is a related problem: the outputs aren't nested with their corresponding configuration files. So currently I can: get config + know infrastructure => find output files (assuming a bunch of defaults). Alternatively, get output file + know infrastructure => find config file. For my particular use, it seems possible to infer configuration entries etc. from the data, but all of this is a bit painful / fragile long term. Basically, I want something like a single entry point object (for users / tools), which then knows how to inflate the concepts of interest independent of the underlying representation. The tools should be able to easily discover which representation is present (csv vs arrow vs database vs ...) and abstract that for the user.
@pearsonca I do not understand your comment, could you perhaps provide a concrete example of what you're describing? I don't think configuration files fall under this issue, might be better as a separate issue.
Sure: let's say I want to plot some outputs from a run. I'd like to be able to do a somewhat-useful version of that just given the enclosing folder for that run. Given the known folder structure, it's perfectly fine to descend and grab the relevant results file(s). But with the file(s) read in, I still have to introspect out all the features (e.g. compartments, populations, etc.). The alternative would be to grab those from the corresponding configuration file. So: I either have to also provide its location OR attempt to find it based on the output folder location (+ some other introspection). I think the same problem will arise for a hypothetical ModelOutput object: it's probably going to want to know about the configuration associated with the output to properly structure itself. One easy way to solve this might be to write a snapshot of the config file to the output directory?
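The snapshot idea could be sketched roughly as follows. This is only an illustration, not gempyor code: the snapshot file name `config_snapshot.yml` and the helper names `snapshot_config` and `find_config` are made up for the example.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import Optional

# Hypothetical file name; the real name (if any) is up to the maintainers.
SNAPSHOT_NAME = "config_snapshot.yml"


def snapshot_config(config_path: Path, output_dir: Path) -> Path:
    """Copy the config used for a run into that run's output directory."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / SNAPSHOT_NAME
    shutil.copy2(config_path, target)
    return target


def find_config(output_dir: Path) -> Optional[Path]:
    """Return the snapshot path if the output directory has one, else None."""
    candidate = Path(output_dir) / SNAPSHOT_NAME
    return candidate if candidate.is_file() else None
```

With something like this, a plotting tool given only the run folder could call `find_config(run_dir)` and fall back to introspecting the data when it returns `None`, which keeps older output folders usable.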
Ah, I see. That seems slightly larger in scope than what is described in this issue currently, and involves changing the output structure slightly to add either a copy of the config or a parsed version of it. I'll defer to @jcblemai, but I think changing the output structure is challenging for legacy reasons? I suppose adding a new directory shouldn't be as bad since it maintains backwards compatibility.
File IO related to model output is a bit scattered at the moment and difficult to test. There are also underlying assumptions throughout the package about the output directory structure. These are challenging to change because the change cannot be made in one place, let alone unit tested.
A helpful abstraction would be a `ModelOutput` class where each instance would correspond to a single model output folder. The class would have methods for reading/writing files of a particular type (e.g. `hosp`, `spar`, etc.), the ability to accept an arbitrary `FolderIODriver` that will handle the reading/writing of files to a folder (e.g. `ParquetFolderIODriver`, `CsvFolderIODriver`, etc.), and the ability to construct an instance from an existing folder for ease of use in post-processing/analysis.
We will also need to document the output structure as a part of this (relates to GH-229). There has also already been some prior related discussion in GH-198.
I'll leave it to @jcblemai to comment on priority and fill in other details I have missed here, but I think this covers the main points.
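To make the proposal a bit more concrete, here is a minimal sketch of how the pieces could fit together. Everything beyond the names `ModelOutput`, `FolderIODriver`, and `CsvFolderIODriver` is an assumption for illustration: the method names, the `<root>/<file_type>/<index>.<ext>` layout, and the row-of-dicts data shape are not gempyor's actual API or directory structure. Only a CSV driver is shown so the example runs on the standard library alone.

```python
from __future__ import annotations

import csv
import tempfile
from pathlib import Path
from typing import Protocol


class FolderIODriver(Protocol):
    """Hypothetical interface: how a table of rows is stored in one file."""

    extension: str

    def write(self, path: Path, rows: list[dict[str, str]]) -> None: ...

    def read(self, path: Path) -> list[dict[str, str]]: ...


class CsvFolderIODriver:
    """CSV-backed driver; a Parquet driver would expose the same methods."""

    extension = "csv"

    def write(self, path: Path, rows: list[dict[str, str]]) -> None:
        fieldnames = list(rows[0].keys()) if rows else []
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)

    def read(self, path: Path) -> list[dict[str, str]]:
        with path.open(newline="") as f:
            return list(csv.DictReader(f))


class ModelOutput:
    """One instance per output folder; layout logic lives only here."""

    def __init__(self, root: Path, driver: FolderIODriver) -> None:
        self.root = Path(root)
        self.driver = driver

    @classmethod
    def from_folder(cls, root: Path) -> "ModelOutput":
        # Infer the driver from files already on disk (CSV only in this sketch).
        root = Path(root)
        if any(root.glob("*/*.csv")):
            return cls(root, CsvFolderIODriver())
        raise ValueError(f"no recognised output files under {root}")

    def path_for(self, file_type: str, index: int) -> Path:
        # Assumed layout, NOT the real one: <root>/<file_type>/<index>.<ext>
        return self.root / file_type / f"{index:09d}.{self.driver.extension}"

    def write(self, file_type: str, index: int, rows: list[dict[str, str]]) -> None:
        path = self.path_for(file_type, index)
        path.parent.mkdir(parents=True, exist_ok=True)
        self.driver.write(path, rows)

    def read(self, file_type: str, index: int) -> list[dict[str, str]]:
        return self.driver.read(self.path_for(file_type, index))


def demo() -> list[dict[str, str]]:
    """Write a `hosp` file, reopen the folder, and read the file back."""
    with tempfile.TemporaryDirectory() as tmp:
        out = ModelOutput(Path(tmp), CsvFolderIODriver())
        out.write("hosp", 1, [{"date": "2024-01-01", "value": "3"}])
        return ModelOutput.from_folder(Path(tmp)).read("hosp", 1)
```

Because `path_for` is the only place that knows the layout, changing the directory structure (or unit testing it) would touch one method, and swapping the driver changes the file format without affecting callers.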