Data structure #30

Closed
xamanu opened this issue Oct 2, 2016 · 20 comments

@xamanu
Contributor

xamanu commented Oct 2, 2016

osm2gtfs downloads data from OpenStreetMap (Public Transport Schema version 2) and puts it into transitfeed's structure to generate GTFS.

Currently there is a parallel data structure (mainly living in the classes RouteMaster, Route, Stop) with some regional touch. We already started some discussion about this in #26, but I think it's worth discussing this separately.

Summarizing the past comments on this:

We were considering using transitfeed's objects (routes, stops and trips) directly instead of a parallel data structure. The downside would be that we would make ourselves totally dependent on transitfeed's classes, which might change without notice.

I further think it's not really possible, because in transitfeed's objects the stops associated with routes (and variants) are connected through the trips, which osm2gtfs generates only at the very end, based on the stops of each route. So this would just not work, as far as I understand.

Ergo, we need a parallel structure(!?). So, let's think about this and what it would look like.

@xamanu
Contributor Author

xamanu commented Oct 2, 2016

Here is a first proposal:

RouteMaster

+ route_id: String
+ route_short_name: String
+ route_long_name: String
+ route_type: String
+ route_desc: String
+ route_color: String
+ route_text_color: String
+ bikes_allowed: String
- routes: List of RouteVariant
-------
+ get_routes(): List of RouteVariant
+ add_route(RouteVariant): void

RouteVariant

+ from: String
+ to: String
+ route_url: String
+ bikes_allowed: String
+ shape: List of Dictionary
- stops: List of String (osm_id of Stop)
-------
+ get_stops(): List of String (osm_id of Stop)
+ add_stop(Stop): void
+ add_shape(List of Dictionary): void

Stop

+ osm_id: String
+ stop_id: String
+ name: String
+ lat: String
+ lon: String

Some thoughts:

  • When querying from OSM we try to obtain as much data as possible, but everything that is not there (e.g. colors, names of stops, bikes_allowed information, etc.) would just be left blank. It is the task of the individual (regional) creators to enhance this information intelligently.
  • All RouteVariants are going to be added to a RouteMaster object, even if there is no master relation in OSM. A simple check creates a RouteMaster either based on the master relation or just on the fly, if it does not exist.
  • Stop objects are not directly added to the routes, but connected through a list of ids based on osm_ids.
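
As a rough illustration of the proposal above, here is a minimal Python sketch. It is not the osm2gtfs implementation: it uses the standard library's dataclasses, assumes that get_stops() returns the stored osm_ids, and renames `from` to `from_` because `from` is a reserved word in Python. Only some of the RouteMaster fields are included to keep it short.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Stop:
    osm_id: str
    stop_id: str = ""
    name: str = ""  # left blank when OSM provides no name
    lat: str = ""
    lon: str = ""


@dataclass
class RouteVariant:
    from_: str = ""  # "from" in the proposal; renamed (assumption) to avoid the keyword
    to: str = ""
    route_url: str = ""
    bikes_allowed: str = ""
    shape: List[Dict[str, str]] = field(default_factory=list)
    stops: List[str] = field(default_factory=list)  # osm_ids, not Stop objects

    def add_stop(self, stop: Stop) -> None:
        # Only the osm_id is stored, as described in the third bullet.
        self.stops.append(stop.osm_id)

    def get_stops(self) -> List[str]:
        return list(self.stops)


@dataclass
class RouteMaster:
    route_id: str
    route_short_name: str = ""
    route_long_name: str = ""
    route_color: str = ""  # blank unless OSM or a regional creator fills it in
    routes: List[RouteVariant] = field(default_factory=list)

    def add_route(self, variant: RouteVariant) -> None:
        self.routes.append(variant)

    def get_routes(self) -> List[RouteVariant]:
        return list(self.routes)


# Usage with made-up values
stop = Stop(osm_id="n1", name="Central")
variant = RouteVariant(from_="A", to="B")
variant.add_stop(stop)
master = RouteMaster(route_id="38")
master.add_route(variant)
```

Keeping only osm_ids in the variant means the Stop objects can live in one shared collection and be enriched by regional creators without touching the routes.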

@xamanu
Contributor Author

xamanu commented Oct 2, 2016

I visualized it in a diagram:

[Image: osm2gtfs data structure diagram]

@grote
Owner

grote commented Oct 2, 2016

Looks generally good I'd say!

> All RouteVariants are going to be added to a RouteMaster object, even if there is no master relation in OSM.

If we do this, can we please print a warning to stderr about the missing master relation?
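
A minimal sketch of what is being discussed here, creating the master on the fly and warning on stderr, could look like this. The function name, the dictionary layout and the wording of the warning are all assumptions for illustration, not the project's code.

```python
import io
import sys


def get_or_create_master(ref, masters, stream=sys.stderr):
    """Return the master entry for a ref, creating one on the fly if missing.

    `masters` is a hypothetical dict mapping ref -> master data.
    """
    if ref not in masters:
        # No master relation was found in OSM: warn, then create a placeholder.
        print(
            "Warning: no route_master relation found for ref %s; "
            "creating one on the fly." % ref,
            file=stream,
        )
        masters[ref] = {"route_id": ref, "routes": []}
    return masters[ref]


# Usage: capture the warning in a buffer instead of the real stderr
log = io.StringIO()
masters = {}
first = get_or_create_master("38", masters, stream=log)
second = get_or_create_master("38", masters, stream=log)
```

The warning is printed only the first time a ref is seen, so the output stays readable even when many variants share a missing master.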

@xamanu
Contributor Author

xamanu commented Oct 2, 2016

Yes, of course (as it is already)! I think this would help people to know about this missing piece and maybe go to OSM and put it there.

@xamanu
Contributor Author

xamanu commented Oct 30, 2016

I would like to start a quick discussion/question about naming.

We have different concepts of routes in OpenStreetMap and GTFS. Sometimes they do not exist or are ambiguous. I'm considering introducing new names for our classes, so that they are widely understandable and avoid confusion:

| No | Description | OSM | GTFS | Transmodel | Proposed |
|----|-------------|-----|------|------------|----------|
| 1 | A general public transport service (e.g. No. 38) | route_master | route | LINE | Line |
| 2 | A theoretical tour a bus takes, without schedule information; there is one for each direction, and also a separate one if one variant is shorter than another | route | - | ROUTE/JOURNEY PATTERN | Itinerary |
| 3 | An actual tour a bus takes at a certain time | - | Trip | VEHICLE JOURNEY | Trip |

Problems:

  • route: is used for different concepts (probably because of British and American English)
  • route_master: is a very technical term. I think it's not understandable when looking at it naively (isn't this the bus driver?) 😄

That's why I'm suggesting that we use ServiceLine, RouteVariant and Trip, and I'd like to ask for your feedback. At some point we should probably ask native speakers (British and American).

@xamanu
Contributor Author

xamanu commented Oct 31, 2016

I asked on the OSM mailing list and Jo (Polyglot) suggested:

  1. Line; 2. Itinerary; 3. Trip

That also looks very appealing to me. (I updated the table above.)

@xamanu
Contributor Author

xamanu commented Oct 31, 2016

American friends told me that Line and Itinerary sound awkward to them. So, after some more conversations and back and forth I'm opting now for RouteContainer and RouteVariation (Updated the table).

BTW, the different use of the term route in OSM and GTFS seems not to be caused by British vs. American English, but rather by the historical development of OSM's data structure. First there was just a route, then it was realized that a container was needed for it, and somebody called it route_master...

@xamanu
Contributor Author

xamanu commented Oct 31, 2016

Interesting, the European Transmodel standard actually defines "LINE" as the top container with one or many JOURNEY PATTERNs/ROUTEs...

We are going around in circles... Maybe you have a good idea.

@grote
Owner

grote commented Nov 1, 2016

I personally prefer Line, Itinerary and Trip, but in the end it doesn't matter all that much and we should rather spend time on improving the code than discussing names forever ;)

@xamanu xamanu mentioned this issue Nov 6, 2016
@xamanu
Contributor Author

xamanu commented Nov 6, 2016

Yes, I agree. Let's use Line and Itinerary.

Thinking more about this, and I don't want to lose the idea: we probably want to move the add_shape() logic into an OsmHelper/OsmConnector function called _build_shape(), so that all OpenStreetMap-related preparation happens in the respective class and the data structure stays abstract data.
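
To illustrate the separation, here is a hedged sketch in which a standalone helper prepares the shape and the data class only stores it. The proposal only says a shape is a "List of Dictionary", so the point layout (`seq`, `lat`, `lon`), the duplicate filtering and the function body are assumptions; the name `build_shape` merely echoes the `_build_shape()` mentioned above.

```python
from typing import Dict, List, Tuple


def build_shape(coordinates: List[Tuple[float, float]]) -> List[Dict[str, float]]:
    """Turn ordered (lat, lon) pairs into a list of shape point dictionaries.

    Consecutive duplicate points, which can appear where two ways join,
    are skipped. The dictionary keys are hypothetical.
    """
    shape: List[Dict[str, float]] = []
    previous = None
    for lat, lon in coordinates:
        point = (lat, lon)
        if point == previous:
            continue
        shape.append({"seq": len(shape) + 1, "lat": lat, "lon": lon})
        previous = point
    return shape


class Itinerary:
    """Plain data holder: it stores a prepared shape but knows nothing about OSM."""

    def __init__(self, shape: List[Dict[str, float]]):
        self.shape = shape


# Usage with made-up coordinates
shape = build_shape([(12.0, -86.0), (12.0, -86.0), (12.1, -86.1)])
itinerary = Itinerary(shape)
```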

@xamanu
Contributor Author

xamanu commented Nov 6, 2016

In PR #36 we saw that the test for duplicated refs is very simple. We should make sure that two route variants with the same ref that are not assigned to proper route_masters are handled with understandable error messages. Likewise, if a route_master has an already existing ref, we should advise the user which relations (with their ids) need to be fixed in OpenStreetMap. A good set of tests and good error messages will be a lot easier to achieve while working on this task, so I'm mentioning it as part of these changes.
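
One way such a check could be sketched: group the variants that have no master by their ref and report every ref that occurs more than once, naming the relation ids. The tuple layout, the function name and the message wording are assumptions for illustration only.

```python
from collections import defaultdict


def find_duplicate_refs(variants):
    """Return error messages for refs shared by variants that lack a master.

    `variants` is an iterable of (relation_id, ref, master_id) tuples,
    where master_id is None if the variant belongs to no route_master.
    """
    by_ref = defaultdict(list)
    for relation_id, ref, master_id in variants:
        if master_id is None:
            by_ref[ref].append(relation_id)

    messages = []
    for ref, relation_ids in sorted(by_ref.items()):
        if len(relation_ids) > 1:
            ids = ", ".join(str(rid) for rid in relation_ids)
            messages.append(
                "Route ref %s is used by relations %s without a common "
                "route_master; please fix them in OpenStreetMap." % (ref, ids)
            )
    return messages


# Usage with made-up relation ids
problems = find_duplicate_refs([
    (101, "38", None),
    (102, "38", None),
    (103, "40", None),
    (104, "41", 900),
])
```

Returning messages instead of printing them keeps the check easy to cover with tests.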

@grote
Owner

grote commented Nov 6, 2016

When fixing the data structures, maybe this library can be useful: https://glyph.twistedmatrix.com/2016/08/attrs.html
I always wanted to try this out in a project.

@xamanu
Contributor Author

xamanu commented Nov 7, 2016

Looks like a nice library. I also want to try it.

Note to my/ourselves: We had issues with the old overpy which was not showing members of route_master relations after querying. But actually if there is no member the route_master relation wouldn't be queried at all. So we can use this behavior to actually throw an error that there is a problem with the overpy library which needs to be updated.
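
A sketch of that guard in plain Python (it does not call overpy; `members` stands in for whatever member list the query result exposes, and the exception name and message are hypothetical):

```python
class MissingMembersError(RuntimeError):
    """Raised when a route_master relation arrives without members."""


def require_members(relation_id, members):
    """Return the members, or fail loudly if the list is empty."""
    if not members:
        # A queried route_master should always have members, so an empty
        # list hints at the library problem described above.
        raise MissingMembersError(
            "route_master relation %s has no members in the query result. "
            "This may point to an outdated overpy library; please update it."
            % relation_id
        )
    return list(members)


# Usage with made-up ids
ok = require_members(1001, [2001, 2002])
try:
    require_members(1002, [])
    error_message = ""
except MissingMembersError as exc:
    error_message = str(exc)
```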

@xamanu
Contributor Author

xamanu commented Nov 12, 2016

I started implementing it (not working yet, and it also depends on current pull requests). Using the suggested attr library it looks quite nice! Maybe you want to take a sneak preview of stops.py and routes.py.

@xamanu xamanu self-assigned this Nov 19, 2016
@grote grote removed the help wanted label Nov 19, 2016
@grote
Owner

grote commented Nov 22, 2016

@xamanu Can I use the above diagram in my State of the Map presentation? I'll also give credit to you.

@xamanu
Contributor Author

xamanu commented Nov 23, 2016

@grote Yes, of course! Consider all documentation I write for osm2gtfs to be under the CC-BY license, unless stated otherwise.

I'm very happy to hear you are presenting osm2gtfs at SotM Latam. Please give my warm regards to the people who know me.

@Skippern

Skippern commented Mar 5, 2017

Looking at the source code, it seems like you need a config JSON and three .py scripts for each operator/consorcio. Wouldn't it improve data integrity to make the .py files generic, and rather add more options to the JSON? That way only one file needs to be edited for each operator.

@jamescr
Collaborator

jamescr commented Mar 7, 2017

> Looking at the source code, it seems like you need a config JSON and three .py scripts for each operator/consorcio. Wouldn't it improve data integrity to make the .py files generic, and rather add more options to the JSON? That way only one file needs to be edited for each operator.

Yes. We need generic creators (see #33 and #26) and we are (slowly) heading in that direction (#59).

@xamanu
Contributor Author

xamanu commented Dec 11, 2017

While working on the data structure, I noticed that a new data class, StopArea, has been included. In line with the naming of the other data pieces, and to move the data structure towards GTFS, I suggest renaming it to Station (the GTFS term here); see the proposed stops.py.

@xamanu
Contributor Author

xamanu commented Jan 24, 2018

Resolved with accepted PR #99

@xamanu xamanu closed this as completed Jan 24, 2018