Datacollection Refactor #108
It finally has everything we need, and in the next commit I will replace the DataCollection on the other ladybug classes with this one.
... note that this commit does not yet include updated tests.
... and some bug fixes that were made along the way.
Now that we have monthly collections, we have all of the objects that were needed to parse the header of the EPW into various other objects (including design days, analysis periods, and monthly collections). Accordingly, this commit introduces parsing of the EPW header. Editing any of these parsed objects will result in edits to the EPW header.
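As an illustration of this kind of header parsing, here is a minimal sketch for one header line. It assumes the standard EPW `GROUND TEMPERATURES` layout: the number of depths, then for each depth the depth, soil conductivity, soil density, soil specific heat and 12 monthly temperatures. The function names are hypothetical, plain lists stand in for monthly collections, and the soil property fields are dropped on write-back for brevity.

```python
# Hypothetical sketch: parse the EPW GROUND TEMPERATURES header line into
# 12 monthly values per depth, then serialize edited values back to a line.
from typing import Dict, List


def parse_ground_temperatures(line: str) -> Dict[float, List[float]]:
    """Map each depth (m) to its 12 monthly ground temperatures (C)."""
    fields = line.strip().split(',')
    if fields[0] != 'GROUND TEMPERATURES':
        raise ValueError('not a GROUND TEMPERATURES header line')
    n_depths = int(fields[1])
    result = {}
    pos = 2
    for _ in range(n_depths):
        # Each block has 16 fields: depth, conductivity, density,
        # specific heat, then Jan..Dec temperatures.
        depth = float(fields[pos])
        result[depth] = [float(v) for v in fields[pos + 4:pos + 16]]
        pos += 16
    return result


def format_ground_temperatures(temps: Dict[float, List[float]]) -> str:
    """Write depths back into a header line (soil properties left blank)."""
    parts = ['GROUND TEMPERATURES', str(len(temps))]
    for depth, monthly in temps.items():
        parts += [str(depth), '', '', ''] + [str(v) for v in monthly]
    return ','.join(parts)
```

Editing a parsed list and re-serializing it is one simple way that changes to the parsed objects can flow back into the header text.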
I think this should work for all types of EPWs now.
I didn't get to go through all the changes, but here are some comments to get the conversation started.
The structure of the new DataCollections looks promising. Thanks.
Co-Authored-By: chriswmackey <chris@mackeyarchitecture.com>
…tools/ladybug into datacollection-refactor
... this one allows for iterators and will just cast them to a list.
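A minimal sketch of a setter that accepts any iterable and casts it to a list. The class name `ValuesHolder` is hypothetical and only stands in for a data collection. Rejecting strings is an added assumption: they are iterable but almost certainly a mistake here.

```python
# Hypothetical sketch: a values setter that accepts lists, tuples, generators
# or other iterators and always stores a list.
from typing import Iterable, List


class ValuesHolder:
    """Illustrative stand-in for a data collection."""

    def __init__(self, values: Iterable[float]):
        self.values = values  # routed through the setter below

    @property
    def values(self) -> List[float]:
        return self._values

    @values.setter
    def values(self, values: Iterable[float]) -> None:
        if isinstance(values, (str, bytes)):
            raise TypeError('values must be a non-string iterable')
        # Casting to a list consumes one-shot iterators (e.g. generators)
        # so the values can be indexed and iterated more than once.
        self._values = list(values)
```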
... now that I don't have to worry about datapoint objects coming through this function.
I added a number of methods that perform checks on the data and, more generally, make it possible to use the discontinuous data collection to clean up messy data into cleaner, more usable formats.
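To illustrate the kind of cleanup this enables, here is a minimal sketch of two helpers on parallel lists of datetimes and values. The function names and the "keep the first value for a repeated timestamp" rule are assumptions for illustration. They are not the methods added in this PR.

```python
# Hypothetical sketch of cleanup helpers for discontinuous data:
# sort values chronologically and drop repeated timestamps.
from datetime import datetime
from typing import List, Tuple


def sort_chronologically(datetimes: List[datetime],
                         values: List[float]) -> Tuple[List[datetime], List[float]]:
    # sorted() is stable, so values sharing a timestamp keep their input order
    pairs = sorted(zip(datetimes, values), key=lambda p: p[0])
    return [d for d, _ in pairs], [v for _, v in pairs]


def remove_duplicates(datetimes: List[datetime],
                      values: List[float]) -> Tuple[List[datetime], List[float]]:
    # Keep only the first value seen for each timestamp
    seen = set()
    out_d, out_v = [], []
    for d, v in zip(datetimes, values):
        if d not in seen:
            seen.add(d)
            out_d.append(d)
            out_v.append(v)
    return out_d, out_v
```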
I have come to realize that there is an important edge case that I originally overlooked. I do not want it to drive development or increase the runtime of current workflows, but I think it must nevertheless be addressed. The edge case is when the user supplies datetimes that lie outside of the analysis_period in the header, or supplies datetimes that are out of chronological order. I call this an edge case because these checks are unnecessary whenever the data is derived from a continuous data set, which covers pretty much 99% of Ladybug workflows. Furthermore, the vast majority of the methods on the DataCollection still work without these checks, and I only see this becoming an issue for certain types of visualizations or when one wants to build a continuous collection from a discontinuous one. Accordingly, I have added an attribute to the DataCollections that tracks whether the header analysis_period aligns with the datetimes in the data. This attribute is always True through all operations on a continuous collection. I have also added methods that validate the datetimes against the analysis_period.
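A minimal sketch of the "validated" flag idea, assuming an analysis period can be modeled as an inclusive datetime range. `SimplePeriod`, `DiscontinuousSketch`, `validated`, `is_valid` and `validate` are hypothetical names, not Ladybug's API. Collections derived from continuous data would be created with `validated=True` and skip the checks entirely.

```python
# Hypothetical sketch: track whether datetimes are known to lie inside the
# header analysis period and be in chronological order.
from datetime import datetime
from typing import Iterable


class SimplePeriod:
    """Illustrative stand-in for an analysis period (inclusive range)."""

    def __init__(self, start: datetime, end: datetime):
        self.start, self.end = start, end

    def contains(self, dt: datetime) -> bool:
        return self.start <= dt <= self.end


class DiscontinuousSketch:
    def __init__(self, period: SimplePeriod, datetimes: Iterable[datetime],
                 values: Iterable[float], validated: bool = False):
        self.period = period
        self.datetimes = list(datetimes)
        self.values = list(values)
        # True means the checks below are known to pass and can be skipped
        self.validated = validated

    def is_valid(self) -> bool:
        """Check that all datetimes are in the period and non-decreasing."""
        in_period = all(self.period.contains(d) for d in self.datetimes)
        in_order = all(a <= b for a, b in zip(self.datetimes, self.datetimes[1:]))
        return in_period and in_order

    def validate(self) -> 'DiscontinuousSketch':
        """Return a copy with out-of-period points dropped and the rest sorted."""
        if self.validated:
            return self  # nothing to do, e.g. data derived from a continuous set
        pairs = [(d, v) for d, v in zip(self.datetimes, self.values)
                 if self.period.contains(d)]
        pairs.sort(key=lambda p: p[0])
        return DiscontinuousSketch(self.period, [d for d, _ in pairs],
                                   [v for _, v in pairs], validated=True)
```

Keeping the flag on the object means the common continuous workflows pay nothing, while the cost of validation is paid only by users who build collections from arbitrary datetimes.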
... and I fixed some bugs that I found in the methods that validate the analysis_periods.
Looks good!
@mostaphaRoudsari, this is the complete set of changes to the datacollection module that we discussed. The original DataCollection class has now been broken up into 5 classes plus one hidden base class from which they all inherit.
Since the original datacollection.py file has changed dramatically, I think it's better to look at the complete new file rather than the diff. You will also see some useful documentation at the top of the file describing the inheritance structure.
I am happy to report that the improvement in speed and memory usage across the library is substantial. Creating EPW objects now takes < 20% of the original time, and many of the DataCollection methods also run in similarly small fractions of their original time. This is because continuous hourly data collections now create DateTime objects only when they are needed rather than upon creation of the object. This means that, most of the time, we just rely on the start and end time of the analysis_period to figure out the datetime associated with each value.
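A minimal sketch of the lazy-datetime idea: store only the values, a start datetime and a timestep, and compute any datetime from its index on demand. `LazyHourlySketch` and `datetime_at` are hypothetical names. The sketch uses Python's standard `datetime` rather than Ladybug's own DateTime class.

```python
# Hypothetical sketch: a continuous hourly collection that never builds a
# list of datetime objects; each datetime is derived from its index.
from datetime import datetime, timedelta
from typing import Iterator, List


class LazyHourlySketch:
    def __init__(self, start: datetime, values: List[float], timestep: int = 1):
        self.start = start
        self.values = list(values)
        # timestep = values per hour, e.g. 4 -> one value every 15 minutes
        self.step = timedelta(hours=1) / timestep

    def datetime_at(self, index: int) -> datetime:
        # Constant-time arithmetic instead of a stored datetime per value
        return self.start + index * self.step

    @property
    def datetimes(self) -> Iterator[datetime]:
        # A generator: datetimes are created only as they are consumed
        return (self.datetime_at(i) for i in range(len(self.values)))
```

For an annual hourly collection this avoids creating 8760 datetime objects up front, which is where most of the construction cost goes.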
Note that I didn't make a specific collection for annual hourly data but rather made one for all forms of continuous hourly data. I did this because:
Note that, as of this PR, I have not yet updated the tests or written new ones for the new DataCollections; I will push these tests shortly. I just wanted to get this up so that you have more time to review it.