[reliability] Consistent handling of direction_id when NaNs present #90

yiyange · 2018-07-12T18:13:11Z

Fix issue #89

kuanb

Thanks for the PR. Leaving one initial comment. Want to make sure this is "upstream" enough to handle all issues with the direction id column.

kuanb · 2018-07-12T18:15:49Z

peartree/parallel.py

@@ -165,7 +175,7 @@ def generate_all_observed_edge_costs(trips_and_stop_times: pd.DataFrame
                dir_mask = (tst_sub.direction_id == direction)
                tst_sub_dir = tst_sub[dir_mask]
            else:
-                tst_sub_dir = tst_sub
+                tst_sub_dir = tst_sub.copy()


Why the removal of the .copy()? Without examining it deeply, shouldn't we defensively employ .copy() to avoid upstream impacts? It's removal does not seem to be relevant to handling the direction_id column being dropped.

Ignore this. I am so sorry.

kuanb · 2018-07-12T18:19:15Z

peartree/parallel.py

@@ -74,9 +74,19 @@ def generate_route_costs(self, route_id: str):
                     'departure_time']
        trips_and_stop_times = trips_and_stop_times.sort_values(sort_list)

+        # Check direction_id column value before 


Per our discussion offline - the advantage to adding the handling here is that it is within a single route. This means that we do not end up tossing direction id if a specific route happens to have all direction id rows filled in. It is possible that a route operator has direction id for one route but not another.

comments added. thanks!

codecov · 2018-07-12T18:25:19Z

Codecov Report

Merging #90 into master will increase coverage by 0.26%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #90      +/-   ##
==========================================
+ Coverage   91.64%   91.91%   +0.26%     
==========================================
  Files          12       12              
  Lines         862      866       +4     
==========================================
+ Hits          790      796       +6     
+ Misses         72       70       -2

Impacted Files	Coverage Δ
peartree/parallel.py	`98.56% <100%> (+1.52%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 488f060...e0b90c3. Read the comment docs.

kuanb · 2018-07-12T18:30:48Z

@yiyange made some slight changes to the comments (see above commit).

kuanb · 2018-07-12T18:49:47Z

Final request, @yiyange, please add a test in test_paths.py. Something along the lines of:

def test_loading_in_too_small_timeframes():
    path = fixture('samtrans-2017-11-28.zip')
    feed = get_representative_feed(path)

    # Overwrite the stop times so that direction id is null
    st = feed.stop_times
    st.direction_id = np.nan
    feed.stop_times = st

    # Make sure the behavior is as expected
    start = 7 * 60 * 60
    end = 10 * 60 * 60
    G = load_feed_as_graph(feed, start, end)

    # Add an assertion that stops times are not null

kuanb

This looks great, thank you.

yiyange · 2018-07-12T21:53:54Z

peartree/parallel.py

+        # trips_and_stop_times to generate wait and edge costs
+        # Note: Advantage to adding handling at route level is that peartree
+        #       avoids tossing direction id if a specific route has all direction
+        #       id rows filled in (while another does not, which is possible).


evidence for the aforementioned case. this is happening logs where there are non values in direction_id column. it is clear that some routes have full coverage but some do not. therefore we can't move this check further upstream.

Reduced selected trips on route 295-155 from 12 to 9. this is happening Reduced selected trips on route 296-155 from 45 to 29. this is happening Reduced selected trips on route 297-157 from 8 to 0. Reduced selected trips on route 397-157 from 7 to 0. Reduced selected trips on route 398-157 from 11 to 6. this is happening Reduced selected trips on route 399-157 from 9 to 0. Reduced selected trips on route ECR-155 from 59 to 41.

drop direction_id column if it contains nan

15ae636

kuanb changed the title ~~Drop direction_id column if it contains nan~~ [reliability] Consistent handling of direction_id when NaNs present Jul 12, 2018

kuanb reviewed Jul 12, 2018

View reviewed changes

add comments

2e992c4

Added more comment tweaks

a94cb4b

Yiyan Ge added 2 commits July 12, 2018 14:26

add test

2b10cfc

minor

e0b90c3

kuanb approved these changes Jul 12, 2018

View reviewed changes

yiyange commented Jul 12, 2018

View reviewed changes

kuanb merged commit eb4fdab into kuanb:master Jul 12, 2018

kuanb mentioned this pull request Jul 14, 2018

Why handle direction_id being NaN and no direction_id differently? #89

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[reliability] Consistent handling of direction_id when NaNs present #90

[reliability] Consistent handling of direction_id when NaNs present #90

yiyange commented Jul 12, 2018

kuanb left a comment

kuanb Jul 12, 2018

kuanb Jul 12, 2018

kuanb Jul 12, 2018

yiyange Jul 12, 2018

codecov bot commented Jul 12, 2018 •

edited

Loading

kuanb commented Jul 12, 2018

kuanb commented Jul 12, 2018 •

edited

Loading

kuanb left a comment

yiyange Jul 12, 2018

[reliability] Consistent handling of direction_id when NaNs present #90

[reliability] Consistent handling of direction_id when NaNs present #90

Conversation

yiyange commented Jul 12, 2018

kuanb left a comment

Choose a reason for hiding this comment

kuanb Jul 12, 2018

Choose a reason for hiding this comment

kuanb Jul 12, 2018

Choose a reason for hiding this comment

kuanb Jul 12, 2018

Choose a reason for hiding this comment

yiyange Jul 12, 2018

Choose a reason for hiding this comment

codecov bot commented Jul 12, 2018 • edited Loading

Codecov Report

kuanb commented Jul 12, 2018

kuanb commented Jul 12, 2018 • edited Loading

kuanb left a comment

Choose a reason for hiding this comment

yiyange Jul 12, 2018

Choose a reason for hiding this comment

codecov bot commented Jul 12, 2018 •

edited

Loading

kuanb commented Jul 12, 2018 •

edited

Loading