
Revamp main script functionality and associated examples #155

Merged (27 commits) Mar 25, 2017

Conversation

@spencerahill (Owner) commented Mar 16, 2017

Closes #151 and #152.

@spencerkclark I would really appreciate a review on this within the next few days if at all possible. (Sorry for the time crunch, and that I accidentally did this on the same branch as my last PR #153, so the diffs from that are also showing up below.) Thanks in advance!

This is a major overhaul of the functionality relating to the main script, and also includes one minor change to the Calc API.

  • Calc.region had been a dict of {region.name: region} items. But there is no need for this; we were never using the key, and we can always just grab the name from the object itself. Doing so made the main script overhaul easier.
  • Enable multiple projs to be iterated over in the same multi-calc-generation call.
  • Rename main.py to aospy_main.py and the main() function to submit_mult_calcs()
  • Make the names used by submit_mult_calcs and thus the main script more descriptive
  • Delete find_obj.py and most of the logic from main.py; move what remained of main.py to a new module, automate.py.

I am not completely satisfied with my new approach in automate.py, which involves separating the specifications for Calc into "core" and "aux" categories. This whole business would be trivial (just call itertools.product on everything), were it not for our support of 'default' and 'all' for some of the specs.

Basically I think this is well suited for a more OOP approach: a CalcSpec abstract base class, with one implementation for each spec that gets passed to Calc. Each implementation would specify how it is to be handled, e.g. where to find its 'default' or 'all' values or what data type to expect (e.g. dtype_out_time is a tuple that gets permuted over within a Calc, not across Calcs). This would make it easier to, for example, support 'default' and 'all' values for every spec.
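Roughly what I'm picturing (just a sketch; the class and method names here are hypothetical, not anything implemented):

class CalcSpec(object):
    """Hypothetical base class for a single spec passed to Calc."""

    def get_default(self, parent):
        """Return this spec's 'default' value(s) from its parent object."""
        raise NotImplementedError

    def get_all(self, parent):
        """Return all of this spec's possible values from its parent."""
        raise NotImplementedError


class ModelSpec(CalcSpec):
    """E.g. for models, 'default' and 'all' both come from the parent Proj."""

    def get_default(self, proj):
        return proj.default_models.values()

    def get_all(self, proj):
        return proj.models.values()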

BUT we are a bit time crunched in terms of the pending blog post, and I've already ended up down the rabbit hole as it is. So, barring fundamental objections to the current approach, I'm looking for critiques there, and we can come back to this bigger picture stuff later. Sound good?

This bigger-picture stuff notwithstanding, there are still some outstanding items:

  • Tests
  • Better docstrings in automate.py
  • What's New entry

Once this is locked in I will address #150, which I've already started on.

@spencerkclark (Collaborator) left a comment

Calc.region had been a dict of {region.name: region} items. But there is no need for this; we were never using the key, and we can always just grab the name from the object itself. Doing so made the main script overhaul easier

Delete find_obj.py

+1 for both of these

we can come back to [...] bigger picture stuff later. Sound good?

Agreed; we could easily get caught in a lengthy discussion here, but for now let's hold off.

Still working through the PR; I need to head out now, but will return to this later today and tomorrow.

elif name == 'default':
    return getattr(parent, defaults_attr_name).values()
else:
    return getattr(parent, attr_name)[name]
@spencerkclark (Collaborator):
Not necessarily opposed, just want to confirm: does this mean that we need to switch from using lists to dictionaries when we specify the models attribute of a Proj object or the runs attribute of a Model object?

E.g. go from this:

example_model = Model(
    name='example_model',
    grid_file_paths=('/path/to/files'),
    runs=[runs.control, runs.conv_off],
    default_runs=[]
)

to this:

example_model = Model(
    name='example_model',
    grid_file_paths=('/path/to/files'),
    runs=dict(control=runs.control, conv_off=runs.conv_off),
    default_runs=dict()
)

I'm assuming this was a decision made to eliminate the need for the find_obj.py module? (I'm all for that decision, by the way)

@spencerahill (Owner, author):

This is actually already taken care of via the utils.io.dict_name_keys function. So the list syntax still works.
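Roughly, it does something like the following (a sketch, not the exact implementation):

def dict_name_keys(objs):
    """Convert a list of aospy objects to a dict keyed on their names."""
    if isinstance(objs, dict):
        return objs
    return {obj.name: obj for obj in objs}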

@spencerahill (Owner, author):

On that point, the main script and example object library from these commits should work; try running the main script (from inside the example directory so that you don't have to install the object library).

@spencerkclark (Collaborator):

Ah, I see this gets addressed automatically; I should have noticed that, sorry. Should we add some checking to utils.io.dict_name_keys to make sure that there are no duplicate names in the list of provided core objects (at least warn, maybe raise an exception)?

For example, right now if you had two Runs with the same name attribute in the list provided, the first one would get overwritten by the second without the user knowing.
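Something along these lines (an untested sketch) is what I have in mind:

def dict_name_keys(objs):
    """Like the current version, but raise if two objects share a name."""
    names = [obj.name for obj in objs]
    if len(set(names)) != len(names):
        raise KeyError("Duplicate name attributes among the provided "
                       "objects: {}".format(names))
    return dict(zip(names, objs))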

On that point, the main script and example object library from these commits should work; try running the main script (from inside the example directory so that you don't have to install the object library).

Indeed, I can confirm this works!

@spencerahill (Owner, author):

Still working through the PR; I need to head out now, but will return to this later today and tomorrow.

Thanks a lot. Sorry again for the large scope and putting us into a bit of a time crunch.

@spencerkclark (Collaborator) left a comment

@spencerahill I've finished making a first pass here -- overall I think it looks pretty good! Thanks for taking this on.

A few high-level questions came up (related to #3), but we might want to put those off until later.

projects = {example_proj.name: example_proj}
variables = {var.name: var for var in [precip_largescale, precip_convective,
@spencerkclark (Collaborator):

This is somewhat big picture, but I'll just note here that this requirement (creating a dictionary mapping Var name attributes to Var objects) breaks my current object library, because I have some Var objects that have the same name attribute, but are different Var objects (to accommodate cases where a variable is native in one model, but needs to be computed from other variables for output from another model, see #3 (comment)).

E.g. these two variables for OLR:

olr = Var(
    name='olr',
    alt_names=('rlut',),
    units=units.W_m2,
    domain='atmos',
    description='All-sky outgoing longwave radiation at TOA.',
    def_time=True,
    def_vert=False,
    def_lat=True,
    def_lon=True
)

olr_imr = Var(
    name='olr',
    domain='atmos',
    description=('Outgoing longwave radiation'),
    variables=(swdn_sfc, vert_int_tdtsw_rad_imr, netrad_toa_imr),
    def_time=True,
    def_vert=False,
    def_lat=True,
    def_lon=True,
    func=calcs.idealized_moist_rad.energy.olr,
    units=units.W_m2
)

This will take some effort/thinking to address and I don't think it's necessarily a blocker here. There's nothing that precludes me (or anyone else) from rolling their own solutions to a main script or object library (so until we have a better solution for #3 I may need to just continue using find_obj.py and my old main script to get things done).

@spencerkclark (Collaborator):

On a more general note, this practice of requiring the user to create module-level dictionaries to set up their object libraries feels a little non-ideal; again, though, without more thought I don't have a good alternative solution. For small object libraries it feels OK, but for larger ones (that get spread out across multiple submodules, like what we use in practice), these would need to be placed (somewhat hidden) in an aospy_user/__init__.py file.

For variables in particular, if one has a lot defined in a module (e.g. in aospy_user/variables/__init__.py) is there a straightforward pythonic way to create this dictionary without much typing?
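The best I can come up with is a comprehension over the module namespace at the bottom of, e.g., aospy_user/variables/__init__.py (an untested sketch):

from aospy import Var

# Collect every Var defined (or imported) in this module into one dict,
# keyed on each Var's name attribute.
variables = {obj.name: obj for obj in list(globals().values())
             if isinstance(obj, Var)}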

@spencerahill (Owner, author):

this requirement (creating a dictionary mapping Var name attributes to Var objects) breaks my current object library, because I have some Var objects that have the same name attribute, but are different Var objects

this practice of requiring the user to create module level dictionaries to set up their object libraries feels a little non-ideal

These are both good points and I definitely want to avoid breaking your workflow. Will have a separate comment and/or Issue on these...I have some ideas but need to clarify them.

specs = self._specs_in.copy()
[specs.pop(core) for core in self._CORE_SPEC_NAMES]
# TODO: don't just pull from first project.
regions = _get_mult_objs_by_names(specs['regions'], self._projects[0],
@spencerkclark (Collaborator):

Would this prevent a user from using a region that was not explicitly listed as a Region in a particular Proj object?

@spencerahill (Owner, author):

Yes, that's a good point...I need to think about this.

A related issue (although I don't see an immediate use-case) is if the same name was used by Region objects in different projects but with different region definitions. In that case this logic would incorrectly use the region definition for the first project in all cases.

@spencerahill (Owner, author) Mar 19, 2017:

same name was used by Region objects in different projects but with different region definitions

This would remain a problem even if e.g. we formed the union of regions defined across all projects. In that case the problem reduces to the same issue you raised above re: conflicting names of Var objects.

def create_calcs(self):
    return _create_calcs(self._combine_core_aux_specs())

def _print_specs(self):
@spencerkclark (Collaborator):

I think you'll need to update this function to use the self._specs_in dictionary, rather than the (now removed) projects, models, runs etc. attributes in order for it to run (though I'm guessing you're aware of this, since you've commented it out below).

@spencerahill (Owner, author):

Yes definitely

                      prompt_verify=False, verbose=True):
    """Generate and execute all specified computations."""
    calc_suite = CalcSuite(calc_suite_specs, obj_lib)
    # calc_suite._print_specs()
@spencerkclark (Collaborator):

Uncomment this once _print_specs() is updated

    return _permute_aux_specs(specs)

def _combine_core_aux_specs(self):
    return _combine_core_aux_specs(self._permute_core_specs(),
@spencerkclark (Collaborator):

It's kind of confusing to have two functions of the same name that are defined differently. Is there a good reason to define the module-level version of _combine_core_aux_specs? Could you simply just implement what it does within this class-level function?

@spencerkclark (Collaborator) Mar 18, 2017:

Maybe at the module level create a function called _merge_dicts and then here you can write a one-liner using itertools.product? Would that work?

E.g. at the module level:

def _merge_dicts(a, b):
    merged = a.copy()
    merged.update(b)
    return merged

and at the instance-level (here):

def _combine_core_aux_specs(self):
    return [_merge_dicts(a, b) for a, b in itertools.product(
            self._permute_core_specs(), self._permute_aux_specs())]

@spencerahill (Owner, author):

I agree, I got carried away here with the module-level functions. The original motivation was for ease of testing and potential re-use of functionality elsewhere, but this logic seems pretty specific to CalcSuite.

                                 defaults_attr_name) for name in names]


def _permute_aux_specs(specs):
@spencerkclark (Collaborator):

I think you could write this function pretty compactly as a list-comprehension:

def _permute_aux_specs(specs):
    return [dict(zip(specs.keys(), perm)) for perm in itertools.product(*specs.values())]

@spencerahill (Owner, author):

Indeed, I can confirm this works!

Great!

For example, right now if you had two Runs with the same name attribute in the list provided, the first one would get overwritten by the second without the user knowing.

Good catch; will address.

I think you could write this function pretty compactly as a list-comprehension

Yes, will do, along with the other list comprehensions you suggest.

    return all_specs


def _create_calcs(specs):
@spencerkclark (Collaborator):

I think this could also be written more compactly as a list-comprehension
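E.g., assuming each spec dict maps one-to-one onto the arguments Calc takes (an assumption on my part, not something I've verified):

def _create_calcs(specs):
    # Assumes each spec dict holds exactly the keyword arguments
    # that Calc expects.
    return [Calc(**spec) for spec in specs]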

    return pool.map(lambda calc: calc.compute(), calcs)
out = []
for calc in calcs:
    try:
@spencerkclark (Collaborator):

Could there be a way to encapsulate this try-except logic such that it could also be used above when we parallelize the calculations? That way it would ensure the same warning behavior through each pathway.
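E.g. a small wrapper like this (just a sketch; the helper name is made up), which both pathways could map over:

import logging


def _compute_or_skip_on_error(calc):
    """Run a single Calc, logging a warning rather than raising on failure."""
    try:
        return calc.compute()
    except Exception as e:
        logging.warning("Skipping aospy calculation {0} due to error: "
                        "{1}".format(calc, e))
        return None

Then the parallel path becomes pool.map(_compute_or_skip_on_error, calcs), and the serial loop calls the same function, so the warning behavior is identical in both.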

@spencerahill (Owner, author):

Good idea, fixing this makes that logic much cleaner

@spencerahill (Owner, author):

@spencerkclark thanks a lot for the review. Your concerns largely coincide with the parts I was most uncomfortable about, so at least we're on the same page about what needs improving.

In terms of long-term solutions, I opened #156 to discuss the aspects re: how to register objects with minimal input by the users.

The other big issue you note that I still need to think about is what the appropriate "parent" is of Vars and Regions. Will follow up with my thinking on that.

@spencerahill (Owner, author):

@spencerkclark I think I have managed to relax the prior onerous requirement of defining {name: obj} dicts for the projects and variables. CalcSuite now simply searches the object library.

However, now I'm realizing this still doesn't work with either of our existing setups, because we are importing a variables module into the object library's top-level namespace. In what I've just pushed, the Var objects are searched for at that top level.

Perhaps a good solution is a try/except case, wherein if 'variables' is defined it is used, and if not then this new method of finding all the Vars is used. Does that make sense?
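Concretely, something like this (a sketch):

def _get_variables(self):
    # Use the object library's 'variables' attribute if it exists;
    # otherwise fall back to searching the library's top-level namespace.
    try:
        return _get_all_objs_of_type(Var, self._obj_lib.variables)
    except AttributeError:
        return _get_all_objs_of_type(Var, self._obj_lib)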

I added some initial tests; will add more. Cf. our convo offline: you're welcome to add more, but more important for me is your feedback on the overall approach.

@spencerahill (Owner, author) commented Mar 22, 2017:

Perhaps a good solution is a try/except case, wherein if 'variables' is defined it is used, and if not then this new method of finding all the Vars is used

Edit: typo in the commit message; should be 'variables'

Just pushed this, let me know if it works for you.

@spencerkclark (Collaborator):

Perhaps a good solution is a try/except case, wherein if 'variables' is defined it is used, and if not then this new method of finding all the Vars is used. Does that make sense?

Thanks for the updates; yes, I think this is a very good compromise. I will have a closer look at your changes tomorrow afternoon.

@spencerahill (Owner, author):

Great, thanks! I'll try to do some more testing in the meantime

@spencerahill (Owner, author):

Just added some more tests. This is my first time using pytest. I immediately like it a lot, although I'm not certain I'm using it correctly. I feel like maybe I'm overdoing it with all of the @pytest.fixtures, and also I haven't figured out the syntax of the one parameterized case.

@spencerkclark since I know you've been using pytest recently, any comments on these aspects would be appreciated.

@spencerahill (Owner, author) commented Mar 22, 2017:

I have encountered another problem. When I execute aospy_main the first time, it works fine. But the second time, I'm getting an error related to the tar output:

~/Dropbox/py/aospy/examples$ ./aospy_main.py
INFO:root:Initializing Calc instance: Calc object: precip_total, example_proj, example_model, example_run (Wed Mar 22 10:11:03 2017)
INFO:root:Initializing Calc instance: Calc object: precip_convective, example_proj, example_model, example_run (Wed Mar 22 10:11:03 2017)
INFO:root:Initializing Calc instance: Calc object: precip_largescale, example_proj, example_model, example_run (Wed Mar 22 10:11:03 2017)
INFO:root:Initializing Calc instance: Calc object: precip_conv_frac, example_proj, example_model, example_run (Wed Mar 22 10:11:03 2017)
Calc object: precip_total, example_proj, example_model, example_run
Calc object: precip_convective, example_proj, example_model, example_run
Calc object: precip_largescale, example_proj, example_model, example_run
Calc object: precip_conv_frac, example_proj, example_model, example_run
Perform these computations? [y/n] y
INFO:root:Getting input data: Var instance "precip_largescale" (Wed Mar 22 10:11:06 2017)
/Users/shill/Dropbox/miniconda3/lib/python3.5/site-packages/xarray/conventions.py:389: RuntimeWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using dummy netCDF4.datetime objects instead, reason: dates out of range
  result = decode_cf_datetime(example_value, units, calendar)
/Users/shill/Dropbox/miniconda3/lib/python3.5/site-packages/xarray/conventions.py:408: RuntimeWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using dummy netCDF4.datetime objects instead, reason: dates out of range
  calendar=self.calendar)
INFO:root:Getting input data: Var instance "precip_convective" (Wed Mar 22 10:11:06 2017)
/Users/shill/Dropbox/miniconda3/lib/python3.5/site-packages/xarray/conventions.py:389: RuntimeWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using dummy netCDF4.datetime objects instead, reason: dates out of range
  result = decode_cf_datetime(example_value, units, calendar)
/Users/shill/Dropbox/miniconda3/lib/python3.5/site-packages/xarray/conventions.py:408: RuntimeWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using dummy netCDF4.datetime objects instead, reason: dates out of range
  calendar=self.calendar)
INFO:root:Computing timeseries for 0004-01-01 00:00:00 -- 0006-12-31 00:00:00.
INFO:root:Applying desired time-reduction methods. (Wed Mar 22 10:11:06 2017)
INFO:root:Writing desired gridded outputs to disk.
tar: Option --delete is not supported
Usage:
  List:    tar -tf <archive-filename>
  Extract: tar -xf <archive-filename>
  Create:  tar -cf <archive-filename> [filenames...]
  Help:    tar --help
INFO:root:	example-output/example_proj/example_model/example_run/precip_total/precip_total.ann.av.from_monthly_ts.example_model.example_run.0004-0006.nc
HDF5-DIAG: Error detected in HDF5 (1.8.17) thread 0:
  #000: H5F.c line 522 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1024 in H5F_open(): unable to truncate a file which is already open
    major: File accessibilty
    minor: Unable to open file
Traceback (most recent call last):
  File "./aospy_main.py", line 110, in <module>
    prompt_verify=prompt_verify)
  File "/Users/shill/Dropbox/py/aospy/aospy/automate.py", line 247, in submit_mult_calcs
    return exec_calcs(calcs, parallelize=parallelize)
  File "/Users/shill/Dropbox/py/aospy/aospy/automate.py", line 230, in exec_calcs
    o = calc.compute()
  File "/Users/shill/Dropbox/py/aospy/aospy/calc.py", line 642, in compute
    save_files=save_files, save_tar_files=save_tar_files)
  File "/Users/shill/Dropbox/py/aospy/aospy/calc.py", line 707, in save
    self._save_files(data, dtype_out_time)
  File "/Users/shill/Dropbox/py/aospy/aospy/calc.py", line 663, in _save_files
    data_out.to_netcdf(path)
  File "/Users/shill/Dropbox/miniconda3/lib/python3.5/site-packages/xarray/core/dataset.py", line 953, in to_netcdf
    unlimited_dims=unlimited_dims)
  File "/Users/shill/Dropbox/miniconda3/lib/python3.5/site-packages/xarray/backends/api.py", line 563, in to_netcdf
    store = store_cls(path, mode, format, group, writer)
  File "/Users/shill/Dropbox/miniconda3/lib/python3.5/site-packages/xarray/backends/netCDF4_.py", line 205, in __init__
    self.ds = opener()
  File "/Users/shill/Dropbox/miniconda3/lib/python3.5/site-packages/xarray/backends/netCDF4_.py", line 181, in _open_netcdf4_group
    ds = nc4.Dataset(filename, mode=mode, **kwargs)
  File "netCDF4/_netCDF4.pyx", line 1848, in netCDF4._netCDF4.Dataset.__init__ (netCDF4/_netCDF4.c:13992)
OSError: Permission denied

If I delete the tar output and re-submit, it works fine. @spencerkclark does this happen for you? It's possibly a macOS thing, given the OSError and that we've not had this problem in the past with the tar output (I'm running from my MacBook Pro).

@spencerkclark (Collaborator):

Just added some more tests. This is my first time using pytest. I immediately like it a lot, although I'm not certain I'm using it correctly. I feel like maybe I'm overdoing it with all of the @pytest.fixtures, and also I haven't figured out the syntax of the one parameterized case.

Thanks for adding some more tests! I'm still getting the hang of pytest as well, but I also think it's pretty awesome. I think your use of fixtures is actually what's causing your trouble with parametrize. As it turns out, the results of fixture functions cannot be used within parametrize decorators; they can only be passed as arguments to the test functions themselves. So basically you'll just need to explicitly write out what obj_lib, all_vars, and all_projects should be there:

@pytest.mark.parametrize(('type_', 'obj_lib', 'expected'),
                         [(Var, examples, {condensation_rain, convection_rain, precip, ps, sphum}),
                          (Proj, examples, {example_proj})])
def test_get_all_objs_of_type(type_, obj_lib, expected):
    actual = _get_all_objs_of_type(type_, obj_lib)
    assert expected == actual

and I think things should work.

I'm taking a look at the rest of your changes now, and will see if I can fill in some gaps in the test coverage after.

# Drop the "core" specifications, which are handled separately.
specs = self._specs_in.copy()
[specs.pop(core) for core in self._CORE_SPEC_NAMES]
specs['regions'] = self._get_regions()
@spencerkclark (Collaborator):

I think this will always use every region defined by the user in their object library.

@spencerkclark (Collaborator):

Also, does this work if there is no region specified? (In other words, if someone doesn't want to do a spatial reduction on the result?)

@spencerahill (Owner, author):

Wow, both very good points, and same with _get_variables below.

specs = self._specs_in.copy()
[specs.pop(core) for core in self._CORE_SPEC_NAMES]
specs['regions'] = self._get_regions()
specs['variables'] = self._get_variables()
@spencerkclark (Collaborator):

(Similar to above) I think this will always use every variable defined by the user in their object library. Is that correct?


def _get_variables(self):
    if hasattr(self._obj_lib, 'variables'):
        return self._obj_lib.variables
@spencerkclark (Collaborator):

I've gone ahead and tried this in my object library; this line does get called, but it causes a problem because this function needs to return an iterable to be used in line 53.

The obvious change (which I tried) would be to return the result of _get_all_objs_of_type(Var, self._obj_lib.variables), but this results in automate attempting to create calculations involving all the variables defined in my object library (ignoring the one I specified in the specifications), prompting my comments below. Is there a simple way to address this?

@spencerkclark (Collaborator) commented Mar 22, 2017:

I have encountered another problem. When I execute aospy_main the first time, it works fine. But the second time, I'm getting an error related to the tar output:

I've actually had something similar happen (prior to this PR), and not for tar output but for standard netCDF output. It only seems to occur for regional calculations. Here's a minimal working example:

In [1]: import xarray as xr

In [2]: ds = xr.Dataset()

In [3]: ds['a'] = 1

In [4]: ds['b'] = 2

In [5]: ds.to_netcdf('test.nc')

In [6]: ds.to_netcdf('test.nc')

For some reason I can call to_netcdf twice in the same session, but if I shut down ipython and restart the kernel, and reopen the file to try again, things fail:

In [1]: import xarray as xr

In [2]: ds = xr.open_dataset('test.nc')

In [3]: ds.to_netcdf('test.nc')
HDF5-DIAG: Error detected in HDF5 (1.8.17) thread 0:
  #000: H5F.c line 522 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1024 in H5F_open(): unable to truncate a file which is already open
    major: File accessibilty
    minor: Unable to open file
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-3-d7687fe1d488> in <module>()
----> 1 ds.to_netcdf('test.nc')

/home/skc/miniconda/envs/research/lib/python3.6/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims)
    951         return to_netcdf(self, path, mode, format=format, group=group,
    952                          engine=engine, encoding=encoding,
--> 953                          unlimited_dims=unlimited_dims)
    954
    955     def __unicode__(self):

/home/skc/miniconda/envs/research/lib/python3.6/site-packages/xarray/backends/api.py in to_netcdf(dataset, path, mode, format, group, engine, writer, encoding, unlimited_dims)
    561     sync = writer is None
    562
--> 563     store = store_cls(path, mode, format, group, writer)
    564
    565     if unlimited_dims is None:

/home/skc/miniconda/envs/research/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in __init__(self, filename, mode, format, group, writer, clobber, diskless, persist)
    203                                    diskless=diskless, persist=persist,
    204                                    format=format)
--> 205         self.ds = opener()
    206         self.format = format
    207         self.is_remote = is_remote_uri(filename)

/home/skc/miniconda/envs/research/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in _open_netcdf4_group(filename, mode, group, **kwargs)
    179     import netCDF4 as nc4
    180
--> 181     ds = nc4.Dataset(filename, mode=mode, **kwargs)
    182
    183     with close_on_error(ds):

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__ (netCDF4/_netCDF4.c:13992)()

OSError: Permission denied

(This is on a GFDL workstation by the way, so not Mac OS specific)

@spencerkclark (Collaborator):

I'm not sure if this issue in xarray is related: pydata/xarray#1215

def _get_regions(self):
    return [_get_all_objs_of_type(Region, self._obj_lib)]

def _get_variables(self):
@spencerkclark (Collaborator):

Would rewriting this function to look like this be acceptable? This would basically revert the behavior back to using the Python variable name for each Var (not its name attribute).

def _get_variables(self):
    objs = getattr(self._obj_lib, 'variables', self._obj_lib).__dict__
    variables = {
        name: var for name, var in objs.items() if isinstance(var, Var)}
    names = self._specs_in['variables']
    return set([var for name, var in variables.items() if name in names])

This is pretty confusing though, since the name attribute is used for every other core object type (though that's also the way things are handled in the main script prior to this PR).

To be honest I think for my particular situation / use-case, writing a similar, simpler, kind of main script to handle permutation that asked the user to specify lists of actual aospy core objects (rather than strings) would be the way to go. It's up to you what you want to do here; I'm not suggesting we switch to that in the current PR. I'm more considering in the medium-term writing that type of main script on my own, and potentially thinking about adding it to the aospy core later.

@spencerahill (Owner, author):

You're right, we should stop dinking around with this string silliness and use the actual objects. I have started in on this.

@spencerkclark (Collaborator):

@spencerahill cool! Please let me know if you need any help.

@spencerahill (Owner, author):

@spencerkclark big thanks for this batch of comments. Sorry I wasn't able to get to them tonight; will tomorrow for sure. Once again you've caught some important things.

@spencerahill (Owner, author):

(GitHub isn't letting me respond to this in-line for some reason.)

results of fixture functions cannot be used within parametrize decorators; they can only be passed as arguments to test functions themselves. So basically you'll just need to explicitly write out what obj_lib, all_vars, and all_projects should be there

Darn. I kind of assumed that was the case but was hoping there was some trickery to avoid it. Oh well. Thanks for the commit fixing it.

Among other things, need to (re-) implement converting from 'default'
date_ranges value to each Run's actual default range
@spencerahill (Owner, author):

@spencerkclark I've taken a stab at switching from names to objects and have managed to get back to roughly the same spot I was before otherwise.

I also added a "library" kwarg to the calc_suite_specs (essentially moving it there, rather than it being a separate kwarg of submit_mult_calcs).

Note that the date_ranges issue I note in the commit message goes away if you just make it ['default']. But with that fixed I'm noticing that I'm generating too many Calcs; it's probably iterating over the individual regions. Should be easy enough to fix these.

FYI I'll be away for the next hour or so but this is my sole task otherwise tonight. Will try my best to get all outstanding problems addressed before tomorrow.

@spencerahill (Owner, author):

Feel free to make the changes I suggested in the meantime (just be sure to push updates frequently, so I know what you've done), otherwise I'll try to get to them when I get back

Yes, will do and will ping you

@spencerahill (Owner, author):

@spencerkclark I'm still working through these, I'll ping you when I'm done, don't worry about it in the meantime (was going to push that same fix shortly)

Spencer Hill added 2 commits March 24, 2017 11:32
- pretty print summary of requested calcs
- getattr rather than try/except
- user doesn't have to add extra list around 'default' for
  date_ranges or around output_time_regional_reductions or regions

NOTE: I'm getting weird failures on test_permute_aux_specs; sometimes
it fails, sometimes it doesn't, depending on the context (i.e. whether
I call pytest on the whole package or just that test module).  When it
fails, basically the order of the two elements has been swapped somehow.
@spencerahill (Owner, author):

@spencerkclark ok I think I've addressed everything at this point, less some further tests of automate to bring the coverage up.

However see my note in the dd25efb commit re: the weird failing test. On Travis it's only failing for 2.7, but it's not simply a Python version issue; it happens for me on 3.5 sometimes, but not always.

@spencerkclark (Collaborator) commented Mar 24, 2017:

I want to say that in this test we are implicitly assuming that the order is consistent across the sets that go into producing the dictionaries (which is not guaranteed). Using sets is nice for automatically trimming duplicates, but I wonder if we should just switch to lists, which always have the same order, for test simplicity?

    raise AttributeError(e)
return objs


def to_dup_list(x, n, single_to_list=True):
@spencerkclark (Collaborator):

I just realized we no longer use this (untested) function, so I think we can remove it

@spencerahill (Owner, author):

Good call, will delete

@spencerahill (Owner, author):

FYI, I recently learned of the vulture package, which finds unused code in a package; could be useful to run it periodically on aospy.
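Usage is minimal, e.g. (assuming it's run from the repo root):

pip install vulture
vulture aospy/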

@spencerahill (Owner, author):

I wonder if we should just switch to lists for test simplicity, which always have the same order?

Yes, this is probably the right move. I was motivated by a vague desire to think more carefully about what data structures we are using in each place, but I guess in this case a list's order preservation is more useful than a set's prevention of duplicates (there is no built-in ordered set).

Will do this shortly

Spencer Hill added 4 commits March 24, 2017 14:33
We actually do need sets/unordered containers after all, because the
_get_all_objs_of_type list comprehension pulls items from __dict__,
which depending on the Python version may or may not be ordered.  So
since we can't rely on an order there, we can't rely on an order in
anything downstream of it or tests thereof.
    return True
assert len(actual) == len(expected)
for act in actual:
    assert act in expected
@spencerkclark (Collaborator):

But of course! Much cleaner solution here. Sorry for leading you down the wrong path.

@spencerahill (Owner, author):

Wow this ended up being a headache!

@spencerkclark automate is only at 78% coverage, but if it's alright with you I'd like to let it slide. Partly because I see a clear path forward for an improved long-term solution (#158, with more details to come), and partly because I'm burnt out on this module! Self-inflicted 😄

@spencerahill (Owner, author):

@spencerkclark good catch on the docstring. Just updated the what's new.

Since all CI passed on the previous non-doc commits, we're green in my view. Any final concerns?

@spencerkclark (Collaborator):

@spencerahill thanks again! Bear with me; let me see if I can write a simple test for submit_mult_calcs and exec_calcs (it would just submit a single Calc, under both the parallel and serial methods).

@spencerkclark (Collaborator):

@spencerahill let me know if you're good with the test I added; if that checks out, and the AppVeyor build succeeds, I'm good to merge!

@spencerahill spencerahill merged commit c5864c2 into develop Mar 25, 2017
@spencerahill spencerahill deleted the docs-update-install-instructions branch March 25, 2017 01:07
@spencerahill (Owner, author):

@spencerkclark done! Once again, huge thanks for catching my myriad mistakes on this and all your other effort on it.

Ironic that we went for a "quick fix", since it ended up being such an ordeal! Just how it goes sometimes I guess 😄

@spencerahill spencerahill changed the title WIP: revamp main script functionality and associated examples Revamp main script functionality and associated examples Mar 25, 2017