Abstractify methods for finding data files saved on disk #32

spencerahill · 2015-11-06T16:08:45Z

(Below is copied from @spencerkclark's comment on #31.)

While currently within Calc the two methods accomplish the same tasks, in an abstract sense the current ..._one_dir and ..._gfdl read-in methods are actually quite different:

..._one_dir requires the user to map every variable to a file name explicitly. If the variable does not appear in this map, aospy will not even attempt to look for it.
..._gfdl is an implicit system. The mapping is coded into the method, which looks for the files within the post-processing file structure. The user is not required to specify a priori which variables are in which files, and thus aospy is allowed to look for variables that may not exist.

I would argue that the explicit read-in method is the more general way of doing things. With enough information, one could automatically generate an explicit file map from an implicit generator. In addition, nothing prevents relaxing the current single-directory constraint and mapping each variable (within a particular time frequency) to a full file path.
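As a rough illustration of the relaxed explicit approach, here is a minimal sketch assuming a plain dict keyed by variable name and time frequency. The variable names, paths, and the `find_file` helper are all hypothetical and not part of aospy:

```python
# Hypothetical explicit file map: (variable name, time frequency) -> full path.
# Variable names and paths are invented for illustration only.
explicit_map = {
    ("temp", "monthly"): "/data/run_a/atmos.monthly.nc",
    ("precip", "monthly"): "/data/run_b/precip.monthly.nc",
}


def find_file(file_map, var_name, freq):
    """Return the mapped path, or None if the variable was never mapped.

    Mirrors the explicit behavior: an unmapped variable is simply not
    searched for on disk.
    """
    return file_map.get((var_name, freq))
```

Because each entry holds a full path, files for different variables no longer need to live in one directory.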

To continue to support implicit read-in methods (for very structured output data, like ..._gfdl) you could require that a user create some object that implements an interface (call the interface FileMapGenerator?) to include methods to generate a map to files for a particular variable when given the intvl_in, variable name, data_in_dur etc. as arguments. The source code for these objects could be stored in a user's aospy_user directory.

Within a Run object one could then have a single argument for the file read-in method. The user could pass either the explicit dictionary mapping or they could pass an object that implements the FileMapGenerator interface. Within Calc, when reading in the files, you could have some simple logic that would be along the lines of: "if an explicit map is provided use the map; if not, use information about which variable you are looking for, and the interval in etc. and pass those as arguments to the generator, which would return an explicit map for just that variable." Using an interface would ensure that the explicit file map generated would always have the same structure (so that it could be used seamlessly within Calc).
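One way the interface and the dispatch logic could look is sketched below. This is only an assumption about the shape of the design: the `generate` method name, its arguments, the `ExamplePostProcGenerator` class, its directory layout, and `resolve_files` are all hypothetical.

```python
from abc import ABC, abstractmethod


class FileMapGenerator(ABC):
    """Hypothetical interface for implicit read-in methods."""

    @abstractmethod
    def generate(self, var_name, intvl_in, data_in_dur):
        """Return an explicit {var_name: [file paths]} map for one variable."""


class ExamplePostProcGenerator(FileMapGenerator):
    """Made-up generator; the directory layout below is invented."""

    def __init__(self, root):
        self.root = root

    def generate(self, var_name, intvl_in, data_in_dur):
        path = "{}/{}/{}yr/{}.nc".format(
            self.root, intvl_in, data_in_dur, var_name
        )
        return {var_name: [path]}


def resolve_files(read_in_method, var_name, intvl_in, data_in_dur):
    """Return an explicit map for one variable from either input kind."""
    if isinstance(read_in_method, dict):
        # Explicit map: use it directly (raises KeyError if unmapped).
        return {var_name: read_in_method[var_name]}
    # Otherwise assume the object implements the generator interface.
    return read_in_method.generate(var_name, intvl_in, data_in_dur)
```

Either branch returns a map with the same structure, which is the property the interface is meant to guarantee.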

spencerahill · 2015-11-06T16:18:42Z

I really like this. I agree, the implicit mapping should be specified by the user, since it no doubt varies so much among people (even within the lab; cf. #28).

Going one step further, there could be a FileMap class, which for starters could just be a wrapper around the dict built-in (or maybe OrderedDict). So, Run (or Model or Proj; should be able to specify at any level), just calls e.g. FileMap(read_in_method), and FileMap.__init__ supports read_in_method being either a dict or a FileMapGenerator. In the latter case, FileMapGenerator then just executes whatever method(s) it needs to build the FileMap. Or maybe it should be FileMapGenerator that accepts the dict, so that FileMap only needs to handle the FileMapGenerator interface and nothing else?

That way Calc doesn't need any logic at all -- it will always receive a FileMap object that explicitly lists where the files it needs are.
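A minimal sketch of the first variant, where FileMap itself accepts either input, might look like the following. The `files_for` method, the `generate` signature, and the `YearlyDirGenerator` stand-in are assumptions made for illustration; none of them come from aospy.

```python
class FileMap(object):
    """Hypothetical wrapper giving Calc one way to look up files."""

    def __init__(self, read_in_method):
        if isinstance(read_in_method, dict):
            # Explicit case: keep a copy of the user's mapping.
            self._explicit = dict(read_in_method)
            self._generator = None
        else:
            # Implicit case: assume an object with a generate() method.
            self._explicit = None
            self._generator = read_in_method

    def files_for(self, var_name, intvl_in, data_in_dur):
        """Return the list of file paths for one variable."""
        if self._explicit is not None:
            return list(self._explicit[var_name])
        generated = self._generator.generate(var_name, intvl_in, data_in_dur)
        return list(generated[var_name])


class YearlyDirGenerator(object):
    """Made-up stand-in for a user-written generator; layout is invented."""

    def generate(self, var_name, intvl_in, data_in_dur):
        path = "/archive/{}/{}yr/{}.nc".format(intvl_in, data_in_dur, var_name)
        return {var_name: [path]}
```

With something like this, Calc would only ever call `files_for` and would not need to know which kind of read-in method the user supplied.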

spencerkclark · 2015-11-06T16:58:40Z

That's even better -- something along these lines would remove a lot of distracting code (~100 lines) from calc.py.

spencerahill · 2016-10-12T22:47:18Z

From @spencerkclark in #90:

Tracing all the way back to a main-like script, what is the minimum set of parameters needed to identify a given file set for any DataLoader? How should we specify those parameters when submitting a computation? In some ways this traces back to your comment on _generate_file_set above.

This is a key outstanding design question regarding DataLoader, which was introduced in #90 to address this Issue.

spencerahill · 2017-01-18T21:15:47Z

Closed by #90

spencerahill added the enhancement label Nov 6, 2015

spencerahill mentioned this issue Nov 6, 2015

Support user-specified map/dict of directory structure. #28

Closed

spencerahill mentioned this issue Nov 6, 2015

Refactor: break Calc class up into smaller classes #33

Open

spencerkclark mentioned this issue Jun 9, 2016

Misc. updates I've made over the past two months #74

Merged

spencerahill added this to the v0.1 milestone Jul 2, 2016

spencerahill added IO Calc labels Jul 2, 2016

spencerkclark mentioned this issue Sep 27, 2016

WIP: Refactor variable-loading portion of calc.py #90

Merged

9 tasks

spencerahill closed this as completed Jan 18, 2017
