Add 'one_dir_freq' file read-in option #31

Merged: 4 commits merged on Nov 12, 2015


spencerkclark
Collaborator

I recognize that we ultimately want to refactor this process out of calc.py, but I needed something along these lines for dealing with idealized model output. The option I've added is called 'one_dir_freq'. It is a very slight modification of the existing 'one_dir' option; in this case, however, I leverage the intvl_in attribute of Calc, much as the 'gfdl' option does, so that the user can specify a series of files with different output frequencies (e.g. monthly, daily, 3-hourly).

Here's how one uses it within a Run object:

control_T42 = Run(
    name='control_T42',
    description=(
        'Control case at T42 spectral resolution'
        ),
    data_in_direc='path/to/files',
    default_date_range=(start, end),
    data_in_dir_struc='one_dir_freq',
    data_in_files={'20-day': {v: '00000.1x20days.nc' for v in variables},
                   'daily': {v: '00000.1xday.nc' for v in variables},
                   '3-hourly': {v: '00000.8xday.nc' for v in variables}},
    idealized=True
)

# data_in_files may hold absolute or relative paths
paths = []
for nc in data_in_files:
    full = '/'.join([data_in_direc, nc]).replace('//', '/')
Owner

The best way to do this is os.path.join(data_in_direc, nc). (I only recently learned that; you were probably just keeping consistent with the existing code. Do as I say, not as I do ;))
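
For reference, here is a minimal sketch of what the os.path.join suggestion could look like for the loop under review. The function name build_paths is hypothetical and not part of aospy, and the expected values in the comments assume a POSIX system. One relevant detail: os.path.join discards the earlier components when a later one is an absolute path, so both relative and absolute entries are handled without string replacement.

```python
import os


def build_paths(data_in_direc, data_in_files):
    """Join each file name onto the data directory.

    Hypothetical helper for illustration; not part of aospy.
    """
    paths = []
    for nc in data_in_files:
        # os.path.join inserts exactly one separator, and returns `nc`
        # unchanged if `nc` is already an absolute path.
        paths.append(os.path.join(data_in_direc, nc))
    return paths


# On a POSIX system:
# build_paths('path/to/files/', ['00000.1xday.nc'])
#   gives ['path/to/files/00000.1xday.nc']
# build_paths('path/to/files', ['/abs/00000.8xday.nc'])
#   gives ['/abs/00000.8xday.nc']
```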

@spencerahill
Owner

I have no problem with adding this functionality. But does it need to be its own method? Can the existing ..._one_dir method be modified to incorporate this functionality?

Better yet, can the isinstance... if/else blocks in the two functions be combined and made into their own function, that _one_dir calls? That would get rid of ~30 lines of repeated code.
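
A rough sketch of the kind of shared helper meant here. The existing isinstance blocks are not shown in this thread, so the checks below are an assumption about what they do (normalizing a single file name versus a list of names), and the name resolve_file_names is hypothetical. The helper accepts either a flat {variable: files} map or a map nested by output frequency, so a single read-in method could serve both cases.

```python
def resolve_file_names(data_in_files, var_name, intvl_in=None):
    """Return the list of file names for one variable.

    Hypothetical helper; the structure of the real isinstance
    checks in calc.py is assumed, not copied.
    """
    # Nested map {intvl_in: {var_name: files}}: select the frequency first.
    nested = data_in_files.get(intvl_in)
    if isinstance(nested, dict):
        data_in_files = nested
    files = data_in_files[var_name]
    # Normalize a single file name to a one-element list.
    if isinstance(files, str):
        return [files]
    return list(files)
```

With a helper like this, both the flat 'one_dir' dictionary and the frequency-keyed dictionary from the example above would go through the same code path.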

@spencerkclark
Collaborator Author

Sure thing -- I considered this, but wasn't sure if it would be worth it, since we are likely going to change this process down the road. Since it's quick and easy I'll look into doing this for now.


Perhaps I should move this discussion to an issue (and maybe you've thought about something along these lines already), but here are some of my thoughts going forward:

While currently within Calc the two methods accomplish the same tasks, in an abstract sense the current ..._one_dir and ..._gfdl read-in methods are actually quite different:

  • ..._one_dir requires the user to map every variable to a file name explicitly. If the variable does not appear in this map, aospy will not even attempt to look for it.
  • ..._gfdl is an implicit system. The mapping is coded into the method which looks for the files within the post-processing file structure. The user is not required a priori to specify which variables are in which files, and thus aospy is allowed to look for variables that may not exist.

I would argue that the explicit read-in method is the most general way of doing things. With enough information, one could automatically generate an explicit file map from an implicit generator. In addition, there is nothing that says you couldn't relax the current single directory constraint and just map each variable (within a particular time frequency) to a full file path.

To continue to support implicit read-in methods (for very structured output data, like ..._gfdl) you could require that a user create some object that implements an interface (call the interface FileMapGenerator?) to include methods to generate a map to files for a particular variable when given the intvl_in, variable name, data_in_dur etc. as arguments. The source code for these objects could be stored in a user's aospy_user directory.

Within a Run object one could then have a single argument for the file read-in method. The user could pass either the explicit dictionary mapping or they could pass an object that implements the FileMapGenerator interface. Within Calc, when reading in the files, you could have some simple logic that would be along the lines of: "if an explicit map is provided use the map; if not, use information about which variable you are looking for, and the interval in etc. and pass those as arguments to the generator, which would return an explicit map for just that variable." Using an interface would ensure that the explicit file map generated would always have the same structure (so that it could be used seamlessly within Calc).
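
To make the idea concrete, here is a minimal sketch of how such an interface and the dispatch logic might look. Nothing here exists in aospy: FileMapGenerator is the name proposed above, while the generate method, the toy SingleFilePerFreq class, and the get_file_map function are all hypothetical, and the returned map structure is just one possible convention.

```python
import os
from abc import ABC, abstractmethod


class FileMapGenerator(ABC):
    """Hypothetical interface for implicit read-in schemes."""

    @abstractmethod
    def generate(self, var_name, intvl_in, data_in_direc):
        """Return an explicit map {intvl_in: {var_name: [paths]}}."""


class SingleFilePerFreq(FileMapGenerator):
    """Toy generator: every variable lives in one file per frequency."""

    def __init__(self, file_by_intvl):
        self.file_by_intvl = file_by_intvl

    def generate(self, var_name, intvl_in, data_in_direc):
        path = os.path.join(data_in_direc, self.file_by_intvl[intvl_in])
        return {intvl_in: {var_name: [path]}}


def get_file_map(data_in_files, var_name, intvl_in, data_in_direc):
    """Use an explicit map as given; otherwise ask the generator."""
    if isinstance(data_in_files, FileMapGenerator):
        return data_in_files.generate(var_name, intvl_in, data_in_direc)
    return data_in_files
```

A Run could then be handed either a plain dictionary or, for example, SingleFilePerFreq({'daily': '00000.1xday.nc'}), and the read-in code would only ever see an explicit map.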

@spencerkclark
Collaborator Author

This should be ready to go now. You can now specify my dictionary example above under the 'one_dir' read-in method and things should work as expected.

spencerahill pushed a commit that referenced this pull request Nov 12, 2015
Add 'one_dir_freq' file read-in option
@spencerahill spencerahill merged commit a55ee82 into spencerahill:develop Nov 12, 2015
@spencerahill
Owner

Thanks, Spencer! Looks great. And bonus points for # lines deleted > # lines added

@spencerkclark spencerkclark deleted the one_dir_freq branch November 13, 2015 00:03