
Feature/nd arrays #67

Merged: 26 commits from tomalrussell:feature/nd-arrays into nismod:master, May 22, 2017
Conversation

tomalrussell (Member):

[Delivers #143885653]

This rewrites SpaceTimeConvertor and its tests, though the tests should next be rewritten to mock the RegionRegister/IntervalRegister, which do the work of converting; they currently fail because they still expect SpaceTimeValues.

Will need to handle more than one dimension. May also revisit the handling of coefficients (a sparse matrix multiplication approach?) instead of the current iteration through regions.

SpaceTimeConvertor assumes it gets passed a numpy.ndarray with shape num_regions x num_intervals. There is room for a more numpy-idiomatic implementation.

TODO: test assertions that all values must be known - or NaN - in a timestep x region x interval data array.
@willu47 (Member) left a comment:

Great work @tomalrussell. I've made a bunch of comments there. A few niggles, some as placeholders for future work, some optimisations and some ideas for refactoring/harmonising the interval/area conversion. If you can fix the niggles, we can discuss the other bits next week.

        return converted_data
    num_regions = len(self.regions.get_regions_in_set(to_spatial))
    num_intervals = data.shape[1]
    converted = np.empty((num_regions, num_intervals))
willu47 (Member):

Ooh, this is a good one! Beware of np.empty. I have been burnt many times before. np.empty doesn't fill the array with zeros. Use np.zeros for safety, as tracking down the bugs that this causes can be tricky!
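The difference can be seen in a minimal sketch: np.empty only allocates memory, so leftover values can masquerade as real data if any cell is missed during filling, while np.zeros gives a clean, predictable starting point.

```python
import numpy as np

# np.empty allocates without initialising: contents are whatever
# happened to be in memory, so a partially-filled array can silently
# carry garbage values.
uninitialised = np.empty((3, 2))

# np.zeros guarantees every cell starts at 0.0, so a cell that was
# never written fails loudly (as a zero) rather than subtly.
converted = np.zeros((3, 2))
converted[:, 0] = [1.0, 2.0, 3.0]  # fill only the first column

# The unfilled column is provably all zeros
assert (converted[:, 1] == 0.0).all()
```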

tomalrussell (Member, Author):

Sounds safer - see 7aec9ca shortly

    # transpose data and iterate through 2nd dimension
    for idx, region_slice in enumerate(data.transpose()):
        converted[:, idx] = self.regions.convert(region_slice, from_spatial, to_spatial)
    return converted

@willu47 (Member) commented May 17, 2017:

One optimisation here using a numpy function:

    def _convert_regions(self, data, from_spatial, to_spatial):
        """Slice, convert and compose regions
        """
        converted = np.apply_along_axis(self.regions.convert, 0, data,
                                        from_spatial, to_spatial)
        return converted
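As a quick illustration of how np.apply_along_axis behaves here, with a stand-in for regions.convert (the real method takes a data slice plus the from/to set names; this toy version just halves each value):

```python
import numpy as np

def convert(column, from_spatial, to_spatial):
    """Stand-in for regions.convert: halves each value in a 1-d slice."""
    return column / 2

data = np.arange(6, dtype=float).reshape(3, 2)  # 3 regions x 2 intervals

# Apply convert along axis 0, i.e. once per column (one column per
# interval); extra positional arguments are forwarded to the function.
converted = np.apply_along_axis(convert, 0, data, 'from_set', 'to_set')

print(converted)  # every value halved, shape preserved
```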

tomalrussell (Author):

Great! b20aeb2 deals with this and the below

    Returns
    -------
    bool

    def _convert_intervals(self, data, from_temporal, to_temporal):
@willu47 (Member) commented May 17, 2017:

And the contents of this can be replaced with:

    def _convert_intervals(self, data, from_temporal, to_temporal):
        """Slice, convert and compose intervals
        """
        converted = np.apply_along_axis(self.intervals.convert, 1, data,
                                        from_temporal, to_temporal)
        return converted

tomalrussell (Author):

See above

    for from_region_name, from_value in zip(from_set_names, data):
        for to_region_name, coef in coefficents[from_region_name]:
            to_region_idx = to_set_names.index(to_region_name)
            converted[to_region_idx] += coef*from_value
willu47 (Member):

This is a sum product operation over coefficients (which include the mapping of the regions implicit in from_values and converted) and values. E.g. if values has length 10 (from_set_name has 10 regions) and to_set_name has 5 regions, coefficients is a sparse 10x5 matrix and you want a sum product giving a result of length 5.

e.g.

    converted = np.dot(from_value, coef)
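A toy version of that sum product, assuming the coefficients are held as a dense from-by-to matrix (10 source regions mapping onto 5 target regions, two sources nesting in each target):

```python
import numpy as np

from_values = np.ones(10)  # data for 10 source regions

# coef[i, j] is the share of source region i falling inside target
# region j; each row sums to 1 (here: regions 2k and 2k+1 -> target k)
coef = np.zeros((10, 5))
for i in range(10):
    coef[i, i // 2] = 1.0

# One matrix operation replaces the nested Python loops
converted = np.dot(from_values, coef)  # length-5 result
print(converted)  # → [2. 2. 2. 2. 2.]
```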

    @@ -63,12 +64,12 @@ class RegionRegister(object):
         between data values relating to compatible sets of regions.
         """
         def __init__(self):
    -        self._register = {}
    +        self._register = OrderedDict()
             self._conversions = defaultdict(dict)
willu47 (Member):

It may be more efficient to store a dict of ndarrays (generated by _conversion_coefficients)

tomalrussell (Author):

Yes, would be more space-efficient not to hold onto the regions - though this is a bigger change.

Currently we hold on to all the region sets while building the conversion register so that adding a region triggers the generation of the conversions to/from the new region and each previously-registered region.

Could change the API so we have to register each conversion with knowledge of the pair of region sets? Or could introduce a builder-type step that holds onto all regions while setting up, and outputs the more parsimonious representation that the converter needs.
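One hypothetical shape for that builder-type step (class and method names invented for illustration; the real register's coefficient generation is not reproduced here, only the output structure: a dict of ndarrays keyed by set pairs):

```python
import numpy as np
from collections import OrderedDict

class ConversionRegisterBuilder:
    """Hypothetical builder: holds the full region sets while they are
    being registered, then emits only the coefficient matrices that
    the converter actually needs."""

    def __init__(self):
        self._sets = OrderedDict()  # set name -> list of regions

    def register(self, name, regions):
        self._sets[name] = regions

    def build(self):
        # The real register would compute conversion coefficients here;
        # this sketch only shows the parsimonious output: a dict keyed
        # by (from_set, to_set) holding a from-by-to ndarray.
        conversions = {}
        for from_name, from_regions in self._sets.items():
            for to_name, to_regions in self._sets.items():
                if from_name != to_name:
                    conversions[(from_name, to_name)] = np.zeros(
                        (len(from_regions), len(to_regions)))
        return conversions

builder = ConversionRegisterBuilder()
builder.register('rect', ['zero'])
builder.register('half_squares', ['a', 'b'])
coefficients = builder.build()
print(coefficients[('rect', 'half_squares')].shape)  # (1, 2)
```

Once build() has run, the region geometries themselves can be discarded.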

    ]

    -    if any([len(results) < 2 for results in model_set_results]):
    +    if any([len(results) < 2 for results in self.iterated_results.values()]):
willu47 (Member):

timestep argument isn't used

tomalrussell (Author):

Sorted in 0076375


    -    data[year][param] = SpaceTimeValue(region, interval, value, units)
    +    data[param] = np.empty((num_timesteps, num_intervals, num_regions))
willu47 (Member):

Safer to use np.zeros (really), particularly when filling of arrays is deferred...

tomalrussell (Author):

Fixed, along with below, in e68d60b

    if len(timestep_names) == 0:
        self.logger.error("No timesteps found when loading %s", param)

    data = np.empty((
willu47 (Member):

np.zeros

    -    expected = {'a': 0.5, 'b': 0.5}
    -    assert converted == expected
    +    expected = np.ones(2) / 2
    +    assert all(converted == expected)
willu47 (Member):

numpy.testing contains helper methods `assert_equal` and `assert_allclose` which handle array comparison
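A minimal sketch of those numpy.testing helpers applied to arrays like the ones in this test:

```python
import numpy as np
from numpy.testing import assert_equal, assert_allclose

converted = np.ones(2) / 2
expected = np.array([0.5, 0.5])

# Fails with a readable message showing the mismatching elements,
# instead of relying on the truthiness of an element-wise comparison
assert_equal(converted, expected)

# assert_allclose tolerates floating-point rounding differences,
# which exact equality would flag as failures
assert_allclose(converted, expected + 1e-12, rtol=1e-7)
```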

tomalrussell (Author):

That's clearer - changed in b60e16c, and below


    def test_convert_from_half(self, regions_rect, regions_half_squares):
        rreg = RegionRegister()
        rreg.register(regions_rect)
        rreg.register(regions_half_squares)

    -    data = {'a': 0.5, 'b': 0.5}
    +    data = np.ones(2) / 2
willu47 (Member):

np.array([0.5, 0.5]) would be clearer!

    @staticmethod
    def _regionalise_data(data):
        """
    def convert(self, data, from_spatial, to_spatial, from_temporal, to_temporal):
willu47 (Member):

(read after comments below) Thinking through optimisations/harmonisations in this module, the conversions in area and intervals could both be brought to this module and performed as a 2-dim matrix operation. This requires coefficient matrices for region and intervals to be generated either together or independently (probably independently and then pre-combined at run-time, as not every combination of regional and temporal conversion will ever be required). Probably a candidate for memoization.
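A sketch of that combined 2-dim matrix operation, assuming dense coefficient matrices R (source regions by target regions) and T (source intervals by target intervals) have already been generated independently:

```python
import numpy as np

# data: 4 source regions x 6 source intervals, all ones for clarity
data = np.ones((4, 6))

# Region coefficients: 4 source regions aggregate pairwise to 2 targets
R = np.repeat(np.eye(2), 2, axis=0)  # shape (4, 2)
# Interval coefficients: 6 source intervals aggregate pairwise to 3 targets
T = np.repeat(np.eye(3), 2, axis=0)  # shape (6, 3)

# Both conversions in one pass: project regions (R.T on the left)
# and intervals (T on the right) simultaneously
converted = R.T @ data @ T  # shape (2, 3)
print(converted)  # each cell sums 2 regions x 2 intervals -> 4.0
```

Since the same (R, T) pair recurs for every model run with the same set combination, the pre-combined product is indeed a natural candidate for memoization.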

@willu47 (Member) commented May 22, 2017:

Tests currently failing due to a problem with v5.2 of pyomo and <= v4.6 of GLPK

@coveralls (Coverage Status):

Coverage decreased (-0.5%) to 95.692% when pulling c343020 on tomalrussell:feature/nd-arrays into 1a88693 on nismod:master.

@coveralls (Coverage Status):

Coverage decreased (-0.1%) to 96.062% when pulling d2d57be on tomalrussell:feature/nd-arrays into 1a88693 on nismod:master.

Commit note: instead of materialising the list with a comprehension - see https://www.quantifiedcode.com/app/issue_class/53lnAzfW for suggestion and justification.
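That commit note refers to dropping the square brackets inside any(): a generator expression lets any() short-circuit at the first True without building an intermediate list.

```python
# Example data standing in for iterated results keyed by model set
iterated_results = {'water': [1], 'energy': [1, 2]}

# List comprehension: the whole list is built before any() inspects it
not_converged = any([len(r) < 2 for r in iterated_results.values()])

# Generator expression: any() stops at the first True and no
# intermediate list is materialised
not_converged_gen = any(len(r) < 2 for r in iterated_results.values())

assert not_converged == not_converged_gen  # same answer, less work
```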
@tomalrussell tomalrussell merged commit 7cfc7a8 into nismod:master May 22, 2017
@tomalrussell tomalrussell deleted the feature/nd-arrays branch May 23, 2017 09:18
3 participants