-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/nd arrays #67
Conversation
This rewrites SpaceTimeConvertor and its tests, though the tests should next be written to mock the RegionRegister/IntervalRegister which do the work of converting, and currently fail because they still expect SpaceTimeValues.
Will need to handle more than one dimension. May also revisit handling of coefficients (sparse matrix multiplication approach?) instead of current iteration through regions.
SpaceTimeConvertor assumes it gets passed a numpy.ndarray with shape num_regions x num_intervals. Room for more numpy-idiomatic implementation.
TODO - test assertions that all values must be known - or NaN - in a timestep x region x interval data array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @tomalrussell. I've made a bunch of comments there. A few niggles, some as placeholders for future work, some optimisations and some ideas for refactoring/harmonising the interval/area conversion. If you can fix the niggles, we can discuss the other bits next week.
smif/convert/__init__.py
Outdated
return converted_data | ||
num_regions = len(self.regions.get_regions_in_set(to_spatial)) | ||
num_intervals = data.shape[1] | ||
converted = np.empty((num_regions, num_intervals)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, this is a good one! Beware of np.empty
. I have been burnt many times before. np.empty
doesn't fill the array with zeros. Use np.zeros
for safety, as tracking down the bugs that this causes can be tricky!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds safer - see 7aec9ca shortly
smif/convert/__init__.py
Outdated
# transpose data and iterate through 2nd dimension | ||
for idx, region_slice in enumerate(data.transpose()): | ||
converted[:, idx] = self.regions.convert(region_slice, from_spatial, to_spatial) | ||
return converted | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One optimisation here using a numpy function:
def _convert_regions(self, data, from_spatial, to_spatial):
"""Slice, convert and compose regions
"""
converted = np.apply_along_axis(self.regions.convert, 0, data,
from_spatial, to_spatial)
return converted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! b20aeb2 deals with this and the below
smif/convert/__init__.py
Outdated
Returns | ||
------- | ||
bool | ||
def _convert_intervals(self, data, from_temporal, to_temporal): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the contents of this can be replaced with:
def _convert_intervals(self, data, from_temporal, to_temporal):
"""Slice, convert and compose intervals
"""
converted = np.apply_along_axis(self.intervals.convert, 1, data,
from_temporal, to_temporal)
return converted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above
for from_region_name, from_value in zip(from_set_names, data): | ||
for to_region_name, coef in coefficents[from_region_name]: | ||
to_region_idx = to_set_names.index(to_region_name) | ||
converted[to_region_idx] += coef*from_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a sum product operation over coefficients (which includes the mapping of the regions implicit in from_values
and converted
), and values. e.g. if values is length 10 (from_set_name
has 10 regions), and to_set_name
is length 5 (5 regions), coefficients is a sparse 10*5 matrix and you want to find the sum product of length 5.
e.g.
converted = np.dot(from_value, coef)
@@ -63,12 +64,12 @@ class RegionRegister(object): | |||
between data values relating to compatible sets of regions. | |||
""" | |||
def __init__(self): | |||
self._register = {} | |||
self._register = OrderedDict() | |||
self._conversions = defaultdict(dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be more efficient to store a dict of ndarrays (generated by _conversion_coefficients
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, would be more space-efficient not to hold onto the regions - though this is a bigger change.
Currently we hold on to all the region sets while building the conversion register so that adding a region triggers the generation of the conversions to/from the new region and each previously-registered region.
Could change the API so we have to register each conversion with knowledge of the pair of region sets? Or could introduce a builder-type step that holds onto all regions while setting up, and outputs the more parsimonious representation that the converter needs.
] | ||
|
||
if any([len(results) < 2 for results in model_set_results]): | ||
if any([len(results) < 2 for results in self.iterated_results.values()]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timestep
argument isn't used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorted in 0076375
smif/sos_model.py
Outdated
|
||
data[year][param] = SpaceTimeValue(region, interval, value, units) | ||
data[param] = np.empty((num_timesteps, num_intervals, num_regions)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Safer to use np.zeros
(really), particularly when filling of arrays is deferred...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, along with below, in e68d60b
smif/sos_model.py
Outdated
if len(timestep_names) == 0: | ||
self.logger.error("No timesteps found when loading %s", param) | ||
|
||
data = np.empty(( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.zeros
tests/convert/test_area.py
Outdated
expected = {'a': 0.5, 'b': 0.5} | ||
assert converted == expected | ||
expected = np.ones(2) / 2 | ||
assert all(converted == expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy.testing
contains helper methods assert_equal
and all_close` which handle array comparison
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's clearer - changed in b60e16c, and below
|
||
def test_convert_from_half(self, regions_rect, regions_half_squares): | ||
rreg = RegionRegister() | ||
rreg.register(regions_rect) | ||
rreg.register(regions_half_squares) | ||
|
||
data = {'a': 0.5, 'b': 0.5} | ||
data = np.ones(2) / 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.array([0.5, 0.5])
would be clearer!
@staticmethod | ||
def _regionalise_data(data): | ||
""" | ||
def convert(self, data, from_spatial, to_spatial, from_temporal, to_temporal): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(read after comments below) Thinking through optimisations/harmonisations in this module, the conversions in area and intervals could both be brought to this module and performed as a 2-dim matrix operation. This requires coefficient matrices for region and intervals to be generated either together or independently (probably independently and then pre-combined at run-time, as not every combination of regional and temporal conversion will ever be required). Probably a candidate for memoization.
For the avoidance of random-memory initialisation bugs that may come from np.empty if the entire array is not written over.
Absolute and relative tolerances for testing similarity of model outputs, and maximum iterations to run before raising an error
Tests current failing due to problem with v5.2 of pyomo and <= v4.6 of GLPK |
Instead of materialising the list with a comprehension - see https://www.quantifiedcode.com/app/issue_class/53lnAzfW for suggestion and justification.
[Delivers #143885653]