Refactor OsmHelper #31

Closed
pantierra opened this issue Oct 2, 2016 · 6 comments

Comments

@pantierra
Contributor

pantierra commented Oct 2, 2016

The OsmHelper is the part of the script that retrieves data from OpenStreetMap and puts it into our own class/object data structure (to be discussed in #30). It includes a simple caching mechanism that saves data obtained from OpenStreetMap to files and reads them back, if present, to avoid unnecessary hits on OSM's Overpass API.

I think the current structure of the OsmHelper can be improved, so here is a very first draft as a basis for discussion:

OsmHelper

- config
- routes {route_id, RouteMaster}
- stops {osm_id, Stop}
-------
+ get_routes(): Dic of RouteMaster
+ get_route(String route_id): RouteMaster
+ get_stops(): Dic of Stop
+ get_stop(String osm_id): Stop

- query_routes(): Dic of RouteMaster
- query_route(ref): RouteMaster
- query_stops(): List of Stop
- query_stop(osm_id): Stop

+ cache_refresh(name): void
- cache_write(name, content): void
- cache_read(name): content

Some thoughts:

  • Use generic cache read and write functions (called for route masters, route variants and stops).
  • Use one single cache_refresh function whose argument states whether all or only a certain part of the cache should be refreshed.
  • All RouteVariants are going to be added to a RouteMaster object, even if there is no master relation in OSM. A simple check creates a RouteMaster either based on the master relation or just on the fly, if non-existent.
  • The get-functions are wrappers that check for cached data and return it, or query OSM if the data is not cached.
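
To make the last point concrete, here is a minimal sketch of such a get-wrapper. The class name `CachedOsmHelper`, the pickle file format, the `.pkl` naming and the placeholder return value of `query_routes` are assumptions for illustration only, not the actual osm2gtfs code.

```python
import os
import pickle


class CachedOsmHelper:
    """Hypothetical sketch of a cache-backed getter (not the real class)."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.routes = {}
        self.query_count = 0  # only here to show when OSM would be hit

    def _cache_path(self, name):
        # Assumed layout: one pickle file per cache name
        return os.path.join(self.cache_dir, name + ".pkl")

    def cache_write(self, name, content):
        with open(self._cache_path(name), "wb") as handle:
            pickle.dump(content, handle)

    def cache_read(self, name):
        path = self._cache_path(name)
        if not os.path.isfile(path):
            return None
        with open(path, "rb") as handle:
            return pickle.load(handle)

    def query_routes(self):
        # Placeholder for the real Overpass query
        self.query_count += 1
        return {"255": "RouteMaster 255"}

    def get_routes(self):
        # In-memory data first, then the file cache, then OSM
        if not self.routes:
            cached = self.cache_read("routes")
            if cached is None:
                cached = self.query_routes()
                self.cache_write("routes", cached)
            self.routes = cached
        return self.routes
```

With this shape, a second helper instance pointed at the same cache directory returns the routes without calling `query_routes` at all.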
@pantierra
Contributor Author

I visualized it in a diagram:

[Image: osm2gtfs]

@grote
Owner

grote commented Oct 2, 2016

Looks nice!

Where are the methods to refresh a route (and maybe even a stop)? Is this cache_refresh? If so, maybe instead of passing a name, have dedicated methods for each type.

I was passing the routes into get_stops() and sometimes chaining these calls, because there's no point getting a stop if it is not part of a route relation, right? Also, when you refresh a route, this might add new stops that you didn't have before.

The same applies when you refresh a route master: you might get a new route variant and new stops.

@pantierra
Contributor Author

pantierra commented Oct 2, 2016

Refreshing all routes, all stops or a single route would happen through cache_refresh with an argument stating what to refresh:

  • cache_refresh('routes')
  • cache_refresh('route', '255')
  • cache_refresh('stops')

The function cache_refresh contains an if/elseif statement that selects which functions to run for the refresh. Personally I prefer one wrapper function over three single-line functions.
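
A rough sketch of that dispatch, assuming a second optional argument for the route reference. The class name `RefreshingOsmHelper`, the `refreshed` list and the error handling are made up for illustration; the query methods are placeholders that only record what they were asked to do.

```python
class RefreshingOsmHelper:
    """Hypothetical sketch of a single cache_refresh entry point."""

    def __init__(self):
        # Records which refreshes ran; illustration only
        self.refreshed = []

    def query_routes(self):
        self.refreshed.append(("routes", None))

    def query_route(self, ref):
        self.refreshed.append(("route", ref))

    def query_stops(self):
        self.refreshed.append(("stops", None))

    def cache_refresh(self, name, ref=None):
        # Select which query to run based on the requested cache part
        if name == "routes":
            self.query_routes()
        elif name == "route":
            if ref is None:
                raise ValueError("refreshing a single route needs a ref")
            self.query_route(ref)
        elif name == "stops":
            self.query_stops()
        else:
            raise ValueError("unknown cache part: %r" % (name,))
```

Calling `cache_refresh('route', '255')` then refreshes only that route, while an unknown name fails loudly instead of silently doing nothing.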


I think it's better to handle stops independently of routes. I can think of several reasons for this:

  • We can optimize Overpass queries to not query again and again for each route's stops.
  • We can do validation and e.g. print a warning, when stops are not part of any route, or likewise if a route has a stop that is not properly tagged.
  • transitfeed handles stops independently from routes and we can also run logic better in encapsulated StopsCreators and RoutesCreators.
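
The validation idea could look roughly like this. The function name `validate_stops`, the plain-dict data shapes and the warning texts are assumptions; the real objects would be Stop and route classes.

```python
def validate_stops(routes, stops):
    """Return warnings about mismatches between routes and stops.

    routes: {route_id: [osm_id of each stop on the route]}  (assumed shape)
    stops:  {osm_id: stop data}                             (assumed shape)
    """
    warnings = []
    used = set()
    for route_id, stop_ids in sorted(routes.items()):
        for osm_id in stop_ids:
            used.add(osm_id)
            if osm_id not in stops:
                # The stop query did not return it, e.g. because of tagging
                warnings.append(
                    "Route %s has stop %s that is missing or not properly tagged"
                    % (route_id, osm_id))
    for osm_id in sorted(set(stops) - used):
        warnings.append("Stop %s is not part of any route" % osm_id)
    return warnings
```

Because stops and routes are fetched independently, both directions of the check are possible with two dictionary lookups and no extra Overpass queries.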

For the case when a route refresh comes up with new stops that are not cached already, I introduced the query_stop(osm_id) function to lazy load these stops.
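
As a sketch of that lazy loading (the class name `LazyStops`, the `queried` list and the dict returned by the placeholder query are illustrative assumptions):

```python
class LazyStops:
    """Hypothetical sketch: fetch a stop only when it is not known yet."""

    def __init__(self, stops=None):
        self.stops = dict(stops or {})
        self.queried = []  # osm_ids that triggered a query; illustration only

    def query_stop(self, osm_id):
        # Placeholder for a single-stop Overpass query
        self.queried.append(osm_id)
        return {"osm_id": osm_id}

    def get_stop(self, osm_id):
        # Known stops are returned directly; unknown ones are loaded once
        if osm_id not in self.stops:
            self.stops[osm_id] = self.query_stop(osm_id)
        return self.stops[osm_id]
```

A refreshed route that references a new stop would then trigger exactly one extra query for that stop, and none for stops already loaded.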


In general I would like to handle all OSM-specific things inside OsmHelper. Applied to routes, this means that outside of OsmHelper there only exist routes (as in GTFS). All RouteMaster and RouteVariant handling stays inside the helper. When you refresh a route, OsmHelper checks which master relation and route variants need to be refreshed for it.
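
To illustrate how the helper could hide the master/variant distinction, here is a sketch that wraps every variant in a RouteMaster, creating one on the fly when OSM has no master relation. The constructors, the `build_route_masters` function and the input shapes are assumptions; the real classes are the subject of #30.

```python
class RouteVariant:
    def __init__(self, osm_id, ref):
        self.osm_id = osm_id
        self.ref = ref


class RouteMaster:
    def __init__(self, ref, variants, osm_id=None):
        self.ref = ref
        self.variants = variants
        self.osm_id = osm_id  # None when created on the fly


def build_route_masters(variants, master_relations):
    """Group variants under masters, keyed by route ref.

    variants: list of RouteVariant
    master_relations: {master osm_id: [member variant osm_ids]} (assumed shape)
    """
    by_id = {variant.osm_id: variant for variant in variants}
    masters = {}
    claimed = set()
    # First use the master relations that exist in OSM
    for master_id, member_ids in master_relations.items():
        members = [by_id[m] for m in member_ids if m in by_id]
        if not members:
            continue
        ref = members[0].ref
        masters[ref] = RouteMaster(ref, members, osm_id=master_id)
        claimed.update(member.osm_id for member in members)
    # Then wrap the remaining variants; same ref shares one master
    for variant in variants:
        if variant.osm_id in claimed:
            continue
        if variant.ref not in masters:
            masters[variant.ref] = RouteMaster(variant.ref, [])
        masters[variant.ref].variants.append(variant)
    return masters
```

Callers outside the helper would only ever see the resulting dictionary of routes keyed by ref, regardless of how the data is modelled in OSM.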


@grote
Owner

grote commented Oct 2, 2016

Alright, sounds good. The cache refresh method is a matter of taste. If we might want to refresh different things in the future, I would agree with the parameter approach, but there will never be more than stops and routes. With arbitrary parameters you need to do error handling and offload to independent methods anyway. So I'd say using dedicated refresh methods is cleaner.

@pantierra
Contributor Author

pantierra commented Oct 2, 2016

Yes, indeed, it's a matter of taste. I don't really care. Let's see how it looks when we implement it. The only addition to refresh I can think of right now is what you said above: refreshing a single stop.

@pantierra
Contributor Author

Refactored with the latest commit. I'm documenting the changes here and closing the issue.

  • Renamed to OsmConnector
  • Moved all caching to a Cache class
  • Added comments to program code

Implemented the following structure:

OsmConnector

- config
- routes {route_id, RouteMaster}
- stops {osm_id, Stop}
-------
+ get_routes(): Dic of RouteMaster
+ get_stops(): Dic of Stop

- _build_route_master(): RouteMaster
- _build_route_variant(): Route
- _build_route_stop(): Stop


- _query_routes(): Dic of RouteMaster
- _query_stops(): List of Stop

@pantierra pantierra self-assigned this Nov 19, 2016

2 participants