[reliability] Consistent handling of direction_id when NaNs present #90
Thanks for the PR. Leaving one initial comment. Want to make sure this is "upstream" enough to handle all issues with the direction id column.
@@ -165,7 +175,7 @@ def generate_all_observed_edge_costs(trips_and_stop_times: pd.DataFrame
 dir_mask = (tst_sub.direction_id == direction)
 tst_sub_dir = tst_sub[dir_mask]
 else:
-tst_sub_dir = tst_sub
+tst_sub_dir = tst_sub.copy()
Why the removal of the .copy()? Without examining it deeply, shouldn't we defensively employ .copy() to avoid upstream impacts? Its removal does not seem relevant to handling the direction_id column being dropped.
Ignore this. I am so sorry.
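For context on the defensive .copy() point raised above, here is a minimal sketch of the difference between binding a second name to a DataFrame and copying it. The column names and values are made up for illustration and are not taken from peartree.

```python
import pandas as pd

tst_sub = pd.DataFrame({'trip_id': ['t1', 't2'], 'wait': [5.0, 7.0]})

# Plain assignment binds a second name to the same object,
# so adding a column through the alias also changes tst_sub.
alias = tst_sub
alias['flag'] = True
alias_leaked = 'flag' in tst_sub.columns

# .copy() returns an independent DataFrame; later edits stay local.
safe = tst_sub.copy()
safe['extra'] = 0
copy_leaked = 'extra' in tst_sub.columns
```

Here the column added through the alias shows up on tst_sub, while the column added to the copy does not, which is the upstream impact the .copy() guards against.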
peartree/parallel.py
Outdated
@@ -74,9 +74,19 @@ def generate_route_costs(self, route_id: str):
 'departure_time']
 trips_and_stop_times = trips_and_stop_times.sort_values(sort_list)

+# Check direction_id column value before
Per our discussion offline: the advantage of adding the handling here is that it operates within a single route. This means we do not end up discarding direction_id when a specific route happens to have all of its direction_id rows filled in. It is possible that an operator supplies direction_id for one route but not another.
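A minimal sketch of the per-route check described above, assuming a pandas DataFrame with `route_id` and `direction_id` columns. The helper name `split_by_direction` is hypothetical and is not peartree's actual implementation; it only illustrates why checking within a single route keeps direction information for the routes that have it.

```python
import pandas as pd


def split_by_direction(route_df: pd.DataFrame) -> list:
    """Hypothetical helper: split one route's rows by direction_id.

    If the column is missing or has any null value for this route,
    direction is ignored and the route is returned as a single group.
    """
    has_direction = (
        'direction_id' in route_df.columns
        and route_df['direction_id'].notnull().all()
    )
    if not has_direction:
        return [route_df.copy()]
    return [group.copy() for _, group in route_df.groupby('direction_id')]


# Illustrative data: route A is fully populated, route B is not.
feed = pd.DataFrame({
    'route_id': ['A', 'A', 'A', 'B', 'B'],
    'direction_id': [0, 1, 0, 1, None],
    'trip_id': ['a1', 'a2', 'a3', 'b1', 'b2'],
})

# The check runs per route, so route A keeps its two directions
# even though route B has an incomplete direction_id column.
groups_per_route = {
    route_id: len(split_by_direction(route_df))
    for route_id, route_df in feed.groupby('route_id')
}
```

Had the same null check been applied to the whole feed at once, route A would also have lost its direction split because of route B's missing value.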
comments added. thanks!
Codecov Report
@@            Coverage Diff             @@
##           master      #90      +/-   ##
==========================================
+ Coverage   91.64%   91.91%   +0.26%
==========================================
  Files          12       12
  Lines         862      866       +4
==========================================
+ Hits          790      796       +6
+ Misses         72       70       -2
@yiyange made some slight changes to the comments (see above commit).
Final request, @yiyange, please add a test in
This looks great, thank you.
# trips_and_stop_times to generate wait and edge costs
# Note: Advantage to adding handling at route level is that peartree
# avoids tossing direction id if a specific route has all direction
# id rows filled in (while another does not, which is possible).
Evidence for the aforementioned case: "this is happening" is logged where there are null (NaN) values in the direction_id column. It is clear that some routes have full coverage but some do not; therefore we can't move this check further upstream.
this is happening
Reduced selected trips on route 295-155 from 12 to 9.
this is happening
Reduced selected trips on route 296-155 from 45 to 29.
this is happening
Reduced selected trips on route 297-157 from 8 to 0.
Reduced selected trips on route 397-157 from 7 to 0.
Reduced selected trips on route 398-157 from 11 to 6.
this is happening
Reduced selected trips on route 399-157 from 9 to 0.
Reduced selected trips on route ECR-155 from 59 to 41.
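One way to confirm this kind of mixed coverage directly is to measure, per route, the fraction of rows with a non-null direction_id. This is a rough sketch assuming a pandas DataFrame with `route_id` and `direction_id` columns; the data below is illustrative and is not taken from the feed that produced the logs above.

```python
import pandas as pd

# Illustrative data only; not taken from the feed discussed above.
trips = pd.DataFrame({
    'route_id': ['r1', 'r1', 'r2', 'r2', 'r3'],
    'direction_id': [0, 1, 0, None, None],
})

# Fraction of rows per route with a non-null direction_id.
coverage = trips['direction_id'].notnull().groupby(trips['route_id']).mean()

# Routes where every row has a direction_id versus routes with gaps.
fully_covered = sorted(coverage[coverage == 1.0].index)
partially_or_not = sorted(coverage[coverage < 1.0].index)
```

A feed where both lists are non-empty is the situation described above, in which a feed-wide check would discard usable direction information.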
Fix issue #89