Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add aggregated read functionality #87

Merged
merged 18 commits into from Feb 13, 2019
Merged

Conversation

znicholls
Copy link
Collaborator

@znicholls znicholls commented Feb 1, 2019

Pull request

Please confirm that this pull request has done the following:

  • Tests added
  • Discussed tests added with co-authors
  • Documentation added (where applicable)
  • Example added (either to an existing notebook or as a new notebook, where applicable) (N/A as low level)
  • Description in CHANGELOG.rst added
  • make black

Adding to CHANGELOG.rst

Please add a single line in the changelog notes similar to one of the following:

- (`#XX <http://link-to-pr.com>`_) Added feature which does something
- (`#XX <http://link-to-pr.com>`_) Fixed bug identified in (`#XX <http://link-to-issue.com>`_)

@swillner swillner added the wip Work in progress (for PRs) label Feb 1, 2019
@swillner swillner changed the title WIP: Add failing test Add failing test Feb 1, 2019
@znicholls znicholls changed the title Add failing test Simplify parameterset reading Feb 1, 2019
@znicholls znicholls changed the title Simplify parameterset reading Add aggregated read functionality Feb 3, 2019
@swillner swillner self-assigned this Feb 4, 2019
@znicholls znicholls force-pushed the simplify-parameterset-reading branch from 6912a63 to 57350fa Compare February 4, 2019 23:45
@znicholls znicholls force-pushed the add-aggregated-read-functionality branch from 3a5e69b to 809a20f Compare February 5, 2019 04:24
@znicholls
Copy link
Collaborator Author

@swillner close #92 and #86 before looking at this (builds on both those PRs)

@znicholls znicholls mentioned this pull request Feb 5, 2019
4 tasks
@znicholls znicholls force-pushed the simplify-parameterset-reading branch from 57350fa to 7ba5b20 Compare February 5, 2019 11:21
openscm/parameter_views.py Outdated Show resolved Hide resolved
@znicholls znicholls force-pushed the simplify-parameterset-reading branch from 7ba5b20 to 390f48c Compare February 8, 2019 23:01
@znicholls znicholls changed the base branch from simplify-parameterset-reading to master February 9, 2019 00:03
@znicholls znicholls force-pushed the add-aggregated-read-functionality branch from 809a20f to 3685de1 Compare February 9, 2019 00:09
@znicholls
Copy link
Collaborator Author

@swillner are you happy with the tests that have been added?

@swillner
Copy link
Member

swillner commented Feb 9, 2019

@swillner are you happy with the tests that have been added?

Very! ;)

openscm/parameter_views.py Outdated Show resolved Hide resolved
@znicholls znicholls force-pushed the add-aggregated-read-functionality branch from aaf9569 to 4c389cc Compare February 9, 2019 00:37
@znicholls
Copy link
Collaborator Author

@swillner this is good to go from my end

Copy link
Member

@swillner swillner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid there are some issues with this approach:

  • Why do we need to catch NameError?
  • The unit converter converts between this parameter's unit and the requested one, so the one of the child parameter might be different (same for the timeframe)
  • Once we attempt an aggregated view, shall we allow added children?
  • Minor: Return type is missing
  • Minor: We should probably make this a protected ("_"-prefixed) method
  • Major: Rather than summing and then caching the sum in self._parameter._data I would collect views to the children's values and sum them on get (comes with the penality of always doing the sum for each get, even if there have been no changes, but the conversion is done and changes in the child parameters reflect on a new get as is the idea of the parameters views)
  • The recursion can be much simplyfied similar to:
    def _sum_child_data(self) -> float:
        """
        Sum child data.

        Returns
        -------
        float
            Sum of all child data
        """
        if self._parameter._children:
            data = 0
            for _, cp in self._parameter._children.items():
                data += self._unit_converter.convert_from(cp._sum_child_data()) # TODO fix unit conversion
        else:
            return self._parameter._data

(similar for timeseries).

@znicholls
Copy link
Collaborator Author

* Why do we need to catch NameError?

We don't, it's just another way to do these loops where you initialise in the first iteration then add later. These are all equivalent I think

from timeit import default_timer as timer

def sum_iterable(nums):
    data = 0  # you have to know the type in advance
    for j, a in enumerate(nums):
        data += a

    return data

start = timer()
numbers = [1, 3, 5]
for i in range(10000):
    res = sum_iterable(numbers)

print("result = {}".format(res))
end = timer()
print("{:.2f}ms".format((end - start) * 1000))

~7ms

from timeit import default_timer as timer

def sum_iterable(nums):
    for j, a in enumerate(nums):
        data = a if j == 0 else data + a

    return data

start = timer()
numbers = [1, 3, 5]
for i in range(10000):
    res = sum_iterable(numbers)

print("result = {}".format(res))
end = timer()
print("{:.2f}ms".format((end - start) * 1000))

~8ms

from timeit import default_timer as timer

def sum_iterable(nums):
    for a in nums:
        try:
            data += a
        except NameError:
            data = a

    return data

start = timer()
numbers = [1, 3, 5]
for i in range(10000):
    res = sum_iterable(numbers)

print("result = {}".format(res))
end = timer()
print("{:.2f}ms".format((end - start) * 1000))

~10.23ms

Looking at above I'll move to initialising as numpy arrays can add to scalars without issue so initialising with zero shouldn't be a problem.

@znicholls
Copy link
Collaborator Author

znicholls commented Feb 9, 2019

* The unit converter converts between this parameter's unit and the _requested one_, so the one of the child parameter might be different (same for the timeframe)

good pick up, will change

* Once we attempt an aggregated view, shall we allow added children?

Thinking about it more, not sure. Yes means more flexibility but you can get confused as the same view will give different answers depending on when it's called.

* Minor: Return type is missing

Yep

* Minor: We should probably make this a protected ("_"-prefixed) method

Yep

* Major: Rather than summing and then caching the sum in `self._parameter._data` I would collect views to the children's values and sum them on `get` (comes with the penality of always doing the sum for each get, even if there have been no changes, but the conversion is done and changes in the child parameters reflect on a new `get` as is the idea of the parameters views)

nice

@znicholls
Copy link
Collaborator Author

Alright let's have another go. I think the only thing I didn't implement from your suggestions is this.

Once we attempt an aggregated view, shall we allow added children?

Thinking about it more, not sure. Yes means more flexibility but you can get confused as the same view will give different answers depending on when it's called. Maybe that's the behaviour we want, how much do we want to 'lock the low level'/do views normally imply the data doesn't change?

@znicholls znicholls force-pushed the add-aggregated-read-functionality branch from b5710c8 to 403f7f2 Compare February 9, 2019 09:20
@swillner
Copy link
Member

swillner commented Feb 9, 2019

still having a lot of comments. mind if i make a pr to this one later?

@znicholls
Copy link
Collaborator Author

still having a lot of comments. mind if i make a pr to this one later?

yep, no rush

@swillner
Copy link
Member

Thinking about it more, not sure. Yes means more flexibility but you can get confused as the same view will give different answers depending on when it's called. Maybe that's the behaviour we want, how much do we want to 'lock the low level'/do views normally imply the data doesn't change?

Well, the point of the views is that they yield the most up-to-date data for each read via get so the values should be able to change especially when new data is written to them (e.g. for a new run). But adding a new child is something different....

@swillner
Copy link
Member

Just saw, that I disallow adding child parameters once a parameter has been read from anyway, which makes sense for non-aggregating reads, so we just apply the same rationale here.

@znicholls znicholls removed the wip Work in progress (for PRs) label Feb 12, 2019
@znicholls
Copy link
Collaborator Author

@swillner should be good to go now

@znicholls znicholls added the enhancement New feature or request label Feb 13, 2019
@swillner
Copy link
Member

nice!

@swillner swillner merged commit 3783280 into master Feb 13, 2019
@swillner swillner deleted the add-aggregated-read-functionality branch February 13, 2019 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants