Datacollection Refactor #108

chriswmackey · 2019-01-24T15:32:54Z

@mostaphaRoudsari , This is the complete set of changes to the datacollection module as we have discussed. The original DataCollection class has now been broken up into 5 classes and one hidden base class that they all inherit from.

Since the original datacollection.py file has changed dramatically, I think it's better to look at the complete new file rather than the diff. You will also see some useful documentation at the top of the file describing the inheritance structure.

I am happy to report that the improvement in speed and memory usage across the library is substantial. Creating EPW objects now takes < 20% of the original time and many of the DataCollection methods also run in similarly small fractions of their original time. This can be chalked up to the fact that continuous hourly data collections now only create DateTime objects when they are needed and are not needed upon creation of the object. This means that, most of the time, we just rely on the start an end time of the analysis_period to figure out the datetime associated with values.

Note that I didn't make a specific collection for annual hourly data but rather made one for all forms of continuous hourly data. I did this because:

The extra code and run time needed to make the methods work for all continuous data collections vs just annual ones is negligible.
The improvements in speed and memory usage now apply to all energy simulation results, design day values, and all types of Weas, as well as EPW data.

Note that, as of the time that I am making this PR, I have not yet updated the tests or written new ones for the new DataCollections and I will push these tests shortly. I just wanted to get this up so that you have more time to review it.

it finally has everything we need and the next commit, I will replace the DataCollection on the other ladybug classes with this one.

.... note that this commit does not yet include updated tests

coveralls · 2019-01-24T22:54:52Z

Pull Request Test Coverage Report for Build 400

381 of 815 (46.75%) changed or added relevant lines in 10 files are covered.
23 unchanged lines in 6 files lost coverage.
Overall coverage decreased (-3.4%) to 77.515%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
ladybug/datatype/base.py	1	2	50.0%
ladybug/designday.py	13	15	86.67%
ladybug/header.py	9	14	64.29%
ladybug/stat.py	16	21	76.19%
ladybug/wea.py	53	63	84.13%
ladybug/analysisperiod.py	65	76	85.53%
ladybug/_datacollectionbase.py	91	191	47.64%
ladybug/datacollection.py	111	411	27.01%

Files with Coverage Reduction	New Missed Lines	%
ladybug/wea.py	1	89.8%
ladybug/designday.py	2	85.78%
ladybug/header.py	2	80.7%
ladybug/stat.py	3	83.78%
ladybug/analysisperiod.py	4	85.03%
ladybug/dt.py	11	67.86%

Totals
Change from base Build 380:	-3.4%
Covered Lines:	3861
Relevant Lines:	4981

💛 - Coveralls

coveralls · 2019-01-24T22:54:52Z

Pull Request Test Coverage Report for Build 488

1529 of 1730 (88.38%) changed or added relevant lines in 14 files are covered.
23 unchanged lines in 7 files lost coverage.
Overall coverage increased (+4.6%) to 85.454%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
ladybug/datatype/base.py	1	2	50.0%
ladybug/datatype/generic.py	24	25	96.0%
ladybug/location.py	7	8	87.5%
ladybug/header.py	13	15	86.67%
ladybug/_datacollectionbase.py	208	216	96.3%
ladybug/analysisperiod.py	66	76	86.84%
ladybug/wea.py	53	63	84.13%
ladybug/designday.py	113	129	87.6%
ladybug/epw.py	374	393	95.17%
ladybug/stat.py	109	129	84.5%

Files with Coverage Reduction	New Missed Lines	%
ladybug/wea.py	1	89.8%
ladybug/datacollection.py	1	83.06%
ladybug/designday.py	2	86.69%
ladybug/analysisperiod.py	2	89.3%
ladybug/header.py	2	85.96%
ladybug/dt.py	3	78.57%
ladybug/stat.py	12	84.8%

Totals
Change from base Build 380:	4.6%
Covered Lines:	4923
Relevant Lines:	5761

💛 - Coveralls

... and some bug fixes that were made along the way.

Now that we have monthly collections, we have all of the objects that were needed to parse the header of the EPW into various other objects (including design days, analysis periods, and monthly collections). Accordingly, this commit introduces parsing of the EPW header. Editing any of these parsed objects will result in edits to the EPW header.

I think this should work for all types of EPWs now

mostaphaRoudsari

I didn't get to go through all the changes but here are some comments to get the conversation started.

The structure of the new DataCollections look promising. Thanks.

ladybug/_datacollectionbase.py

ladybug/datacollection.py

Co-Authored-By: chriswmackey <chris@mackeyarchitecture.com>

…tools/ladybug into datacollection-refactor

Co-Authored-By: chriswmackey <chris@mackeyarchitecture.com>

... this one allows for iterators and will just cast them to a list.

... now that I don't have to worry about datapoint objects coming through this function.

I added a number of methods that perform certain types of checks and generally make it possible for the discontinuous data collection to be used as a means to clean up messy data into cleaner, more usable formats.

I have come to realize that there is an important edge case that I originally overlooked, which I do not want to drive the development or increase the runtime of the current workflows but I think must nevertheless be addressed. The edge case is if the user supplies datetimes that lie outside of the analysis_period in the header or supplies datetimes that are out of chronological order. I say that this is an edge case because there is no need for these checks whenever the data is derived from a continuous data set, which is pretty much 99% of Ladybug workflows. Furthermore, the vast majority of the methods on the DataCollection will still work without these checks and I only see it becoming an issue for certain types of visualizations or when one wants to build a continuous collection from a discontinuous one. Accordingly, to solve this issue, I have added an attribute to the DataCollections to track whether the header analysis_period aligns with the datetimes in the data, which is always True through all operations on a continuous collection. I have also added methods that validate the datetimes against the analysis_period.

... and I fixed some bugs that I found in the methods that validate the analysis_periods.

mostaphaRoudsari

Looks good!

Chris Mackey added 11 commits January 18, 2019 23:14

Initial Start On New DataCollection Class

c537bc2

More Progress On DataCollections

6584df2

Update _datacollection.py

01587b0

Added Monthly and Daily Data Collections

28b1c2d

Update stat_test.py

a275973

New DataCollections Almost Complete!

8f5bcd2

More Updates to DataCollection

af1e9a1

Fixed Bug in Header Duplicate

9ff081e

New DataCollection is Finally Ready to be Implemented

e420748

it finally has everything we need and the next commit, I will replace the DataCollection on the other ladybug classes with this one.

One Last Addition: Method to Create Aligned Collection

94ae855

perf(datacollection): refactor datacollection into mutiple classes

87db759

.... note that this commit does not yet include updated tests

chriswmackey added enhancement New feature or request new development For issues that require new code labels Jan 24, 2019

chriswmackey requested a review from mostaphaRoudsari January 24, 2019 15:32

mostaphaRoudsari self-assigned this Jan 24, 2019

Chris Mackey added 2 commits January 24, 2019 13:59

Started to Update Tests

ccbf2aa

Updated existing tests for new DataCollections

63cb244

Chris Mackey added 11 commits January 24, 2019 18:26

More DataCollection Tests

7d17572

More Tests for DataCollections

1327959

... and some bug fixes that were made along the way.

Added Tests of all init methods

bb54280

Rounding Out Tests for Base Collection

6c8f28f

Finished DataCollection Tests

07a4c16

Better-Organized the Operator-Replacing Statements

7313790

Updateing Tests To Handle the New state Property on Location

52d8360

Cleaned Up EPW Header Parsing

eaf2acd

I think this should work for all types of EPWs now

Finished Adding Setters for the EPW Header Data

f3be551

Added classmethod for EPW from Missing Values

e747284

Chris Mackey added 2 commits January 29, 2019 20:12

Added Json Methods for DesignDay

7af5975

Update location_test.py

910e350

mostaphaRoudsari reviewed Jan 31, 2019

View reviewed changes

Chris Mackey and others added 23 commits February 1, 2019 17:48

Updated some docstrings

434b39f

Update ladybug/_datacollectionbase.py

db05c3d

Co-Authored-By: chriswmackey <chris@mackeyarchitecture.com>

Update ladybug/_datacollectionbase.py

4ee8722

Co-Authored-By: chriswmackey <chris@mackeyarchitecture.com>

Update ladybug/_datacollectionbase.py

bc99d24

Co-Authored-By: chriswmackey <chris@mackeyarchitecture.com>

Edits Based on Comments by @mostaphaRoudsari

7ee3834

Merge branch 'datacollection-refactor' of https://github.com/ladybug-…

7589735

…tools/ladybug into datacollection-refactor

Update ladybug/datacollection.py

b54f714

Co-Authored-By: chriswmackey <chris@mackeyarchitecture.com>

Enabling Import of Different Weeks on STAT Object

c1602c1

Implement better iterable check

6235dd1

... this one allows for iterators and will just cast them to a list.

Embellishing Docstring of highest and lowest values

e94f8da

Added Json Methods to STAT Object

5d07418

Corrected sky temperature function

c34f638

Putting Back Numerical Check

4071bf6

... now that I don't have to worry about datapoint objects coming through this function.

Adding Functions for Cleaning Up Data

1f7fe1b

I added a number of methods that perform certain types of checks and generally make it possible for the discontinuous data collection to be used as a means to clean up messy data into cleaner, more usable formats.

Finished Adding Methods to Validate analysis_period

c5c9d04

Added checks for duplicate datetimes

e343525

Update datacollection_test.py

dad0dad

Added test for validate analysis period

1519855

Finished Tests for Validate analysis_period

d584cbd

Updated checks for reversed analysis periods

218ecd2

Fixed a Couple of Bugs

31a895f

Finished Writing DataCollection Tests

30760a6

... and I fixed some bugs that I found in the methods that validate the analysis_periods.

mostaphaRoudsari self-requested a review March 1, 2019 19:59

mostaphaRoudsari approved these changes Mar 1, 2019

View reviewed changes

chriswmackey merged commit 7e3bdc2 into development Mar 1, 2019

chriswmackey deleted the datacollection-refactor branch March 2, 2019 00:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Datacollection Refactor #108

Datacollection Refactor #108

chriswmackey commented Jan 24, 2019

coveralls commented Jan 24, 2019

coveralls commented Jan 24, 2019 •

edited

Loading

mostaphaRoudsari left a comment

mostaphaRoudsari left a comment

Datacollection Refactor #108

Datacollection Refactor #108

Conversation

chriswmackey commented Jan 24, 2019

coveralls commented Jan 24, 2019

Pull Request Test Coverage Report for Build 400

💛 - Coveralls

coveralls commented Jan 24, 2019 • edited Loading

Pull Request Test Coverage Report for Build 488

💛 - Coveralls

mostaphaRoudsari left a comment

Choose a reason for hiding this comment

mostaphaRoudsari left a comment

Choose a reason for hiding this comment

coveralls commented Jan 24, 2019 •

edited

Loading