Abstract schedule data input #9
Comments
I think this should be the last step of making this script truly general, also because it is the most difficult one and probably the one where we are not yet sure what we really need and how these yet unknown use cases can best be covered by an abstraction layer. Just feeding in static files might be too inflexible. I expect most people will still need to integrate some logic. There could maybe be something like an abstract But that's just a first idea. Let's see what exactly the various use cases look like and what architecture would accommodate them best. |
In a first, more or less easy and working step, I was thinking about an abstract |
If you like, I don't mind if you start factoring out things into different classes. You should just be prepared that this might need to be refactored again. Maybe it would also be a good idea to start specifying the classes (maybe in UML?) here in this issue before the implementation is started, so we have a chance to discuss the new architecture first. |
Of course, steady refactoring is a good thing. I don't expect everything to be carved in stone. As a first step I would change as little as possible, while making it possible to use additional conversions, for @jamescr and myself. I hope this will grow in the future and improve in quality over time. Yes, I agree it'd be helpful to draw the classes first and discuss them here. Thanks for your willingness to help with that! I will draw some sketches we can discuss. |
Here is a first sketch of the structure. There is probably still a lot to talk about, but it's hopefully a starting point for discussing the structure. It would probably make sense to wrap everything together in one class, which also does the OSM querying for us and then uses transitfeed to build the GTFS. Is this what you mean with the |
I was thinking of So maybe we start simple and just move all Fenix logic into a class that extends a Now that I think more about it, your approach is probably better. So the osm2gtfs script should continue to write and validate the GTFS feed. It should also add the agency and the feed info from the config file like it is now. So that part doesn't change (for now). The only thing that is really specific for each area is adding the routes, trips and shapes to the transitfeed schedule object. So what we could do as a start is just have an abstract
If we wanted to introduce more methods, there could for example be a When the route information comes completely from OSM, there could also be a Contrary to the sketch above, the @jamescr I'm pinging you, so you can also add your thoughts to the discussion if you want to. |
This is how I would understand the
This is a really interesting point and I'd like to elaborate more on it. When I started to wrap my head around it, I came to the assumption that routes, shapes and stops can actually be added to transitfeed without any local particularity or time information. I thought they could be created directly from the data obtained from OSM. The only very flexible part is how to generate the trips (and their schedules/time information) from different kinds of data sources and logic. This is why I introduced the extendable class I tried not to use names that are already used for something else in the script. E.g. route: a route in OSM is a RouteVariant (ida or vuelta), but a route in transitfeed/GTFS is a RouteMaster (ida and vuelta together). Just using route, I thought, could be misleading. I now see a similar, though less complicated, problem with schedule: in transitfeed it's the main class that holds everything together, while I (maybe it's just me) think more of the time information in relation to all this. So, I'd even suggest calling this class |
I agree with the idea of an osm2gtfs that handles arguments and config, and an OSMHelper that, using OSM nomenclature and its own objects, gives us route masters (and routes), stops and shapes. The ScheduleCreator (TripsCreator in the diagram) seems nice, but maybe we should use the GTFS nomenclature there and the transitfeed objects as much as possible. Although I need to look deeper into transitfeed, this could help avoid the naming issue @xamanu mentions. Then the GTFSWriter (a method inside osm2gtfs?) puts everything together (here we should be careful with nomenclature) and validates it. |
@jamescr If there is no more feedback, can I take the proposed structure to be suitable for everyone on the team? Thanks for your time looking into this. |
I think the only difference at the moment is that your proposal will only let the different creators create trips, while I think it would be more flexible to pass in the schedule object and let the creator also handle adding stops and routes (with a default implementation in the super creator). This allows the creator to change more than just trip creation. For example, I could imagine this being useful when the creator also produces an overview of which routes available in the schedule are still missing from OSM. |
Thanks for your input @grote! Allowing geo data from other sources to be merged with OSM data would be completely new, extended functionality, and even more than the name osm2gtfs promises. As we have discussed on other occasions, I'm really in favor of pushing people to get the data properly into OpenStreetMap (and transport schema 2) and then extract it from there. For now, and in this issue, I think we should stick to abstracting the schedule creation process. Merging different geo sources would probably be a completely new and complex step, which needs a lot more thinking. |
I think there's a misunderstanding. I don't want to introduce new complex steps. On the contrary, I want to keep things as simple and flexible as possible and don't want to merge different geo sources. To summarize my proposal: Do a |
There is something about the TripsCreator that isn't clear to me: what does the getTrips method return? The ScheduleCreator @grote proposes makes sense to me. (It reminds me a little of the GTFSWriter we talked about.) |
Yes, I also have in mind to create the optional frequencies.txt (in the future) and having a If we'd eliminate the |
@jamescr The getTrips() would return a list of trips you can then add to the transitfeed Schedule object. Routes with Stop_ids: Interesting question. I haven't really thought about that. But I guess they are created/defined in the |
I was just talking directly with @jamescr over Mumble and we made changes to the UML drawing. Basically we figured out to move directly I like it, as it makes sense. But I also feel we are moving around without getting to the point 😄 Maybe @grote's input or sleeping on it will bring more clarity. |
Great you guys got together to improve this. Looks nice! I'm just a bit worried that this is already getting too complex for the first step. Maybe we save this for the second step and start a bit simpler, with fewer classes and creators. Here's some pseudo code for how it could be simpler in the first step:
schedule = transitfeed.Schedule()
creator = FenixCreator(schedule)
creator.add_routes(osm_routes) # has default implementation in ScheduleCreator
creator.add_stops(osm_stops) # has default implementation in ScheduleCreator
creator.add_trips() # is abstract in ScheduleCreator
So this is really minimal: it has just two more classes than now, and these classes have only three public methods. Maybe it is better to start like that and then build things up from there as needed? |
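The pseudo code above could be fleshed out roughly as follows. This is only a sketch under stated assumptions: the `Schedule` class here is a dependency-free stand-in for `transitfeed.Schedule`, the routes and stops are plain strings rather than OSM objects, and the trip logic inside `FenixCreator` is a made-up placeholder.

```python
from abc import ABC, abstractmethod


class Schedule:
    """Stand-in for transitfeed.Schedule so the sketch runs without dependencies."""

    def __init__(self):
        self.routes, self.stops, self.trips = [], [], []


class ScheduleCreator(ABC):
    """Base creator: default route/stop handling, area-specific trips."""

    def __init__(self, schedule):
        self.schedule = schedule

    def add_routes(self, osm_routes):
        # Default implementation: take the routes from OSM unchanged.
        self.schedule.routes.extend(osm_routes)

    def add_stops(self, osm_stops):
        # Default implementation: take the stops from OSM unchanged.
        self.schedule.stops.extend(osm_stops)

    @abstractmethod
    def add_trips(self):
        """Each area has to supply its own timetable logic."""


class FenixCreator(ScheduleCreator):
    def add_trips(self):
        # Hypothetical placeholder logic: one trip per known route.
        for route in self.schedule.routes:
            self.schedule.trips.append((route, "06:00:00"))


schedule = Schedule()
creator = FenixCreator(schedule)
creator.add_routes(["route 101"])
creator.add_stops(["Terminal A", "Terminal B"])
creator.add_trips()
```

Because `add_trips` is abstract, `ScheduleCreator` itself cannot be instantiated, so every area is forced to provide the one part that really differs, while still being free to override the two defaults.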
Thanks for the pseudo code, @grote. This glues together what wasn't clear yet. But I still don't like the custom code ( Further I think it makes it a lot easier to code (already now) if we separate stop and routes logic into two separate classes. For now, I agree, we don't need abstract classes defining them, as this is going to be one standard implementation. So, this would look something like this: And the pseudo code for the main (control) script logic would look like:
This still seems understandable to me as one step and would allow us to abstract it easily in future steps. |
Awesome @xamanu! I think this looks really good for the first step and we are very close to a consensus! The only thing that can maybe be improved is that the |
I don't think we'd need three config elements for each type of creator. Let me explain: We have two (flexible) sources of information:
Therefore we need only two. For the geo information (1) we already have the If, at some point in the future, we really wanted to extend the flexibility on the geo information part (... support other sources than OSM or other schemas than the standard) we could either:
But anyway I think this is not really something that needs to be decided now. |
I just came up with another idea. No clue if you guys like it: Naming convention. We could just do the following in
Same for This would allow us to always have a fallback to the standard implementation, which can always be extended on demand, not by code or more configuration, but just by defining a class with a certain name that changes the implementation of the functions that differ from the standard implementation: e.g. |
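A naming-convention lookup with a fallback to the standard class might look like the sketch below. All names are assumptions made for illustration (`TripsCreatorFenix`, `find_creator`, the `CREATORS` mapping, and the rule "base name plus capitalized selector"); the thread does not spell out the actual convention.

```python
class TripsCreator:
    """Standard implementation, used when no specific class is defined."""

    def add_trips(self, schedule):
        schedule.append("standard trips")


class TripsCreatorFenix(TripsCreator):
    """Area-specific class; only the differing method is overridden."""

    def add_trips(self, schedule):
        schedule.append("Fenix trips")


# Hypothetical registry of the classes that can be looked up by name.
CREATORS = {cls.__name__: cls for cls in (TripsCreator, TripsCreatorFenix)}


def find_creator(base_name, selector, namespace=CREATORS):
    """Return the class named base_name + Selector, else the standard one."""
    candidate = namespace.get(base_name + selector.capitalize())
    if isinstance(candidate, type):
        return candidate
    return namespace[base_name]


fenix_cls = find_creator("TripsCreator", "fenix")
other_cls = find_creator("TripsCreator", "managua")
```

Here `fenix_cls` resolves to `TripsCreatorFenix`, while `other_cls` falls back to the standard `TripsCreator` because no class with the matching name exists.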
So did I understand correctly then that the I thought the But I see that your idea with the class finder might take care of the creator choosing problem. However, I am not sure we need that level of complexity for something one simple method (that you can override if you want) would also give us. Quoting your pseudo code:
# Add elements
schedule = routes_creator.add_routes(schedule)
schedule = stops_creator.add_stops(schedule)
schedule = trips_creator.add_trips(schedule)
Would we need to re-assign the schedule here each time or would the creator work on the same schedule object? |
I had a call today with @grote and we were able to talk in more depth about how to implement the changes. One major piece of information I hadn't explained properly in the proposal was to move all respective code from the current Selection of the respective creators happens based on a naming convention (just implemented in my repo) and only one new config element ( Regarding the last question by @grote: it's not really about re-assigning the schedule each time but rather about passing the schedule object into the functions by reference (my pseudo code is probably misleading, as it's not Python). |
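In Python terms, the point about re-assignment can be illustrated with a small sketch. The `Schedule` class is again a stand-in for `transitfeed.Schedule`, and the creator body is a placeholder: the creator receives a reference to the caller's object and mutates it in place, so nothing has to be returned or re-assigned.

```python
class Schedule:
    """Stand-in for transitfeed.Schedule."""

    def __init__(self):
        self.trips = []


class TripsCreator:
    def add_trips(self, schedule):
        # Mutates the object the caller passed in; returns nothing.
        schedule.trips.append("trip 1")


schedule = Schedule()
same_object = schedule            # keep a second name for the same object
TripsCreator().add_trips(schedule)
```

After the call, `schedule` and `same_object` still refer to one and the same object, and that object now contains the added trip.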
Instead of moving the data retrieval and caching logic from osmhelper into the creators, I left it in the osmhelper module but moved it into a class, so that an OsmHelper object containing the retrieved data can be injected into the creators. This allows us to keep the Creator classes (and the related pull request) smaller and easier to understand. Separately, we can talk about the data structure itself (#30) and a refactoring of the OsmHelper (#31). |
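The injection described here could be sketched as follows. This is not the project's actual `OsmHelper`; the method `get_stops`, the `fetch` callable and the cache layout are assumptions chosen to show the idea that retrieval and caching live in one object that is handed to each creator.

```python
class OsmHelper:
    """Hypothetical stand-in: retrieves OSM data once and caches it."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable that performs the real query
        self._cache = {}

    def get_stops(self):
        if "stops" not in self._cache:
            self._cache["stops"] = self._fetch("stops")
        return self._cache["stops"]


class StopsCreator:
    def __init__(self, osm_helper):
        # The helper is injected, so the creator holds no retrieval logic.
        self.osm_helper = osm_helper

    def add_stops(self, schedule):
        schedule.extend(self.osm_helper.get_stops())


calls = []


def fake_fetch(kind):
    # Test double standing in for a real OSM query.
    calls.append(kind)
    return ["Terminal A", "Terminal B"]


helper = OsmHelper(fake_fetch)
schedule_a, schedule_b = [], []
StopsCreator(helper).add_stops(schedule_a)
StopsCreator(helper).add_stops(schedule_b)
```

Both creators share the same helper, so the data is fetched only once, and a creator can be tested with a fake helper without touching the network.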
Wonderful! This went in and was merged to the main repository. Closing this issue here. |
Currently, the schedule information is pulled into the script in a very custom way.
Others would like to use the script. For now I see different ways for feeding the script:
I suggest moving the different ways of pulling schedule data into separate files, which could be handled similarly to small plugins. Basically, all of them then have to output the same information to the main script: stop times of the first and last stops (mostly terminals, I guess).
Probably we could specify in the config file (#3) the name of the schedule information plugin. Do you have any suggestions on how to implement this properly?
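One way to picture the plugin idea is a set of functions that all return the same data shape and are selected by a name from the config. Everything below is a hypothetical illustration: the `TerminalTimes` record, the plugin names, the `schedule_plugin` config key and the example times are not taken from the project.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, NamedTuple


class TerminalTimes(NamedTuple):
    """Common output of every plugin: times at the first and last stop."""
    route_ref: str
    first_stop_departure: str
    last_stop_arrival: str


def static_plugin() -> List[TerminalTimes]:
    # Would normally read a static file; hard-coded here for the sketch.
    return [TerminalTimes("101", "06:00:00", "06:45:00")]


def frequency_plugin() -> List[TerminalTimes]:
    # Generates three departures, 30 minutes apart, each running 45 minutes.
    start = datetime(2000, 1, 1, 6, 0)
    times = []
    for i in range(3):
        departure = start + timedelta(minutes=30 * i)
        arrival = departure + timedelta(minutes=45)
        times.append(TerminalTimes("101",
                                   departure.strftime("%H:%M:%S"),
                                   arrival.strftime("%H:%M:%S")))
    return times


PLUGINS: Dict[str, Callable[[], List[TerminalTimes]]] = {
    "static": static_plugin,
    "frequency": frequency_plugin,
}


def load_plugin(config: dict) -> Callable[[], List[TerminalTimes]]:
    """Pick the plugin named in the (hypothetical) 'schedule_plugin' key."""
    return PLUGINS[config["schedule_plugin"]]


trips = load_plugin({"schedule_plugin": "frequency"})()
```

The main script only ever sees a list of `TerminalTimes`, so it does not need to know whether the times came from a file, a frequency rule or some other logic.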
This is an important step in order to make this tool generically usable.