Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datacollection Refactor #108

Merged
merged 52 commits into from
Mar 1, 2019
Merged

Datacollection Refactor #108

merged 52 commits into from
Mar 1, 2019

Conversation

chriswmackey
Copy link
Member

@mostaphaRoudsari , This is the complete set of changes to the datacollection module as we have discussed. The original DataCollection class has now been broken up into 5 classes and one hidden base class that they all inherit from.

Since the original datacollection.py file has changed dramatically, I think it's better to look at the complete new file rather than the diff. You will also see some useful documentation at the top of the file describing the inheritance structure.

I am happy to report that the improvement in speed and memory usage across the library is substantial. Creating EPW objects now takes < 20% of the original time and many of the DataCollection methods also run in similarly small fractions of their original time. This can be chalked up to the fact that continuous hourly data collections now only create DateTime objects when they are needed and are not needed upon creation of the object. This means that, most of the time, we just rely on the start an end time of the analysis_period to figure out the datetime associated with values.

Note that I didn't make a specific collection for annual hourly data but rather made one for all forms of continuous hourly data. I did this because:

  • The extra code and run time needed to make the methods work for all continuous data collections vs just annual ones is negligible.
  • The improvements in speed and memory usage now apply to all energy simulation results, design day values, and all types of Weas, as well as EPW data.

Note that, as of the time that I am making this PR, I have not yet updated the tests or written new ones for the new DataCollections and I will push these tests shortly. I just wanted to get this up so that you have more time to review it.

@chriswmackey chriswmackey added enhancement New feature or request new development For issues that require new code labels Jan 24, 2019
@mostaphaRoudsari mostaphaRoudsari self-assigned this Jan 24, 2019
@coveralls
Copy link

Pull Request Test Coverage Report for Build 400

  • 381 of 815 (46.75%) changed or added relevant lines in 10 files are covered.
  • 23 unchanged lines in 6 files lost coverage.
  • Overall coverage decreased (-3.4%) to 77.515%

Changes Missing Coverage Covered Lines Changed/Added Lines %
ladybug/datatype/base.py 1 2 50.0%
ladybug/designday.py 13 15 86.67%
ladybug/header.py 9 14 64.29%
ladybug/stat.py 16 21 76.19%
ladybug/wea.py 53 63 84.13%
ladybug/analysisperiod.py 65 76 85.53%
ladybug/_datacollectionbase.py 91 191 47.64%
ladybug/datacollection.py 111 411 27.01%
Files with Coverage Reduction New Missed Lines %
ladybug/wea.py 1 89.8%
ladybug/designday.py 2 85.78%
ladybug/header.py 2 80.7%
ladybug/stat.py 3 83.78%
ladybug/analysisperiod.py 4 85.03%
ladybug/dt.py 11 67.86%
Totals Coverage Status
Change from base Build 380: -3.4%
Covered Lines: 3861
Relevant Lines: 4981

💛 - Coveralls

@coveralls
Copy link

coveralls commented Jan 24, 2019

Pull Request Test Coverage Report for Build 488

  • 1529 of 1730 (88.38%) changed or added relevant lines in 14 files are covered.
  • 23 unchanged lines in 7 files lost coverage.
  • Overall coverage increased (+4.6%) to 85.454%

Changes Missing Coverage Covered Lines Changed/Added Lines %
ladybug/datatype/base.py 1 2 50.0%
ladybug/datatype/generic.py 24 25 96.0%
ladybug/location.py 7 8 87.5%
ladybug/header.py 13 15 86.67%
ladybug/_datacollectionbase.py 208 216 96.3%
ladybug/analysisperiod.py 66 76 86.84%
ladybug/wea.py 53 63 84.13%
ladybug/designday.py 113 129 87.6%
ladybug/epw.py 374 393 95.17%
ladybug/stat.py 109 129 84.5%
Files with Coverage Reduction New Missed Lines %
ladybug/wea.py 1 89.8%
ladybug/datacollection.py 1 83.06%
ladybug/designday.py 2 86.69%
ladybug/analysisperiod.py 2 89.3%
ladybug/header.py 2 85.96%
ladybug/dt.py 3 78.57%
ladybug/stat.py 12 84.8%
Totals Coverage Status
Change from base Build 380: 4.6%
Covered Lines: 4923
Relevant Lines: 5761

💛 - Coveralls

Chris Mackey added 11 commits January 24, 2019 18:26
... and some bug fixes that were made along the way.
Now that we have monthly collections, we have all of the objects that were needed to parse the header of the EPW into various other objects (including design days, analysis periods, and monthly collections).  Accordingly, this commit introduces parsing of the EPW header.  Editing any of these parsed objects will result in edits to the EPW header.
I think this should work for all types of EPWs now
Copy link
Member

@mostaphaRoudsari mostaphaRoudsari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get to go through all the changes but here are some comments to get the conversation started.

The structure of the new DataCollections look promising. Thanks.

ladybug/_datacollectionbase.py Outdated Show resolved Hide resolved
ladybug/_datacollectionbase.py Outdated Show resolved Hide resolved
ladybug/_datacollectionbase.py Show resolved Hide resolved
ladybug/_datacollectionbase.py Show resolved Hide resolved
ladybug/_datacollectionbase.py Outdated Show resolved Hide resolved
ladybug/datacollection.py Outdated Show resolved Hide resolved
ladybug/datacollection.py Outdated Show resolved Hide resolved
ladybug/datacollection.py Outdated Show resolved Hide resolved
ladybug/datacollection.py Outdated Show resolved Hide resolved
ladybug/datacollection.py Outdated Show resolved Hide resolved
Chris Mackey and others added 23 commits February 1, 2019 17:48
Co-Authored-By: chriswmackey <chris@mackeyarchitecture.com>
Co-Authored-By: chriswmackey <chris@mackeyarchitecture.com>
Co-Authored-By: chriswmackey <chris@mackeyarchitecture.com>
Co-Authored-By: chriswmackey <chris@mackeyarchitecture.com>
... this one allows for iterators and will just cast them to a list.
... now that I don't have to worry about datapoint objects coming through this function.
I added a number of methods that perform certain types of checks and generally make it possible for the discontinuous data collection to be used as a means to clean up messy data into cleaner, more usable formats.
I have come to realize that there is an important edge case that I originally overlooked, which I do not want to drive the development or increase the runtime of the current workflows but I think must nevertheless be addressed.  The edge case is if the user supplies datetimes that lie outside of the analysis_period in the header or supplies datetimes that are out of chronological order.  I say that this is an edge case because there is no need for these checks whenever the data is derived from a continuous data set, which is pretty much 99% of Ladybug workflows.  Furthermore, the vast majority of the methods on the DataCollection will still work without these checks and I only see it becoming an issue for certain types of visualizations or when one wants to build a continuous collection from a discontinuous one. Accordingly, to solve this issue, I have added an attribute to the DataCollections to track whether the header analysis_period aligns with the datetimes in the data, which is always True through all operations on a continuous collection.  I have also added methods that validate the datetimes against the analysis_period.
... and I fixed some bugs that I found in the methods that validate the analysis_periods.
@mostaphaRoudsari mostaphaRoudsari self-requested a review March 1, 2019 19:59
Copy link
Member

@mostaphaRoudsari mostaphaRoudsari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@chriswmackey chriswmackey merged commit 7e3bdc2 into development Mar 1, 2019
@chriswmackey chriswmackey deleted the datacollection-refactor branch March 2, 2019 00:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request new development For issues that require new code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants