Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gtfsr package #55

Closed
6 tasks done
dantonnoriega opened this issue Jun 24, 2016 · 30 comments
Closed
6 tasks done

gtfsr package #55

dantonnoriega opened this issue Jun 24, 2016 · 30 comments

Comments

@dantonnoriega
Copy link

dantonnoriega commented Jun 24, 2016

  1. What does this package do? (explain in 50 words or less)

    Currently, the package facilitates the import of GTFS data (from url or local path) to create gtfs objects, validates file structure of gtfs objects, and provides functions to easily plot stops, routes, or networks.

    Future versions will build validation and modeling capabilities, and options to layer other data.

  2. Paste the full DESCRIPTION file inside a code block below.

Package: gtfsr
Type: Package
Title: Working with GTFS (General Transit Feed Specification) feeds in R
Version: 0.1.0
Authors@R: as.person(c(
    "Elaine McVey <elaine.mcvey@transloc.com> [aut, cre]",
    "Danton Noriega-Goodwin <danton@transloc.com> [aut]"
  ))
Description: Provides API wrappers for popular public GTFS feed sharing sites, reads feed data into a gtfs data object, validates data quality, provides convenience functions for common tasks.
License: GPL
LazyData: TRUE
Imports:
    dplyr,
    readr,
    httr,
    magrittr,
    stringr,
    assertthat,
    leaflet,
    sp,
    rgeos,
    scales
Suggests:
    testthat,
    knitr,
    rmarkdown
RoxygenNote: 5.0.1
VignetteBuilder: knitr
  1. URL for the package (the development repository, not a stylized html page)

    https://github.com/ropenscilabs/gtfsr

  2. What data source(s) does it work with (if applicable)?

    It is designed to work with any text files following the GTFS format. It does not come with any accompanying data.

  3. Who is the target audience?

    Transportation Economists, Transportation Consultants and Researchers, Urban Planners, Open Data Enthusiasts

  4. Are there other R packages that accomplish the same thing? If so, what is different about yours?

    None

  5. Check the box next to each policy below, confirming that you agree. These are mandatory.

    • This package does not violate the Terms of Service of any service it interacts with.
    • The repository has continuous integration with Travis CI and/or another service
    • The package contains a vignette
    • The package contains a reasonably complete README with instructions for installing the development version.
    • The package contains unit tests
    • The package only exports functions to the NAMESPACE that are intended for end users
  6. Do you agree to follow the rOpenSci packaging guidelines? These aren't mandatory, but we strongly suggest you follow them. If you disagree with anything, please explain.

    • [NO] Are there any package dependencies not on CRAN?
    • [YES] Do you intend for this package to go on CRAN?
    • [NO] Does the package have a CRAN accepted license?
    • [YES] Did R CMD check (or devtools::check()) produce any errors or warnings? If so paste them below.
R CMD check results
0 errors | 0 warnings | 1 note 
checking R code for possible problems ... NOTE
map_gtfs_agency_routes: warning in get_agency_stops(gtfs_obj, agency_id
  = agency): partial argument match of 'agency_id' to 'agency_ids'
map_gtfs_route_shape: warning in get_agency_stops(gtfs_obj, agency_id =
  agencies): partial argument match of 'agency_id' to 'agency_ids'
filter_feedlist: no visible binding for global variable ‘.’
get_agency_stops: no visible binding for global variable ‘agency_id’
get_agency_stops: no visible binding for global variable ‘route_id’
get_agency_stops: no visible binding for global variable ‘trip_id’
get_agency_stops: no visible binding for global variable ‘stop_id’
... 62 lines ...
  ‘file_provided_status’
validate_vars_provided: no visible binding for global variable
  ‘field_provided_status’
validate_vars_provided: no visible binding for global variable
  ‘validation_status’
Undefined global functions or variables:
  . agency_id color field field_provided_status field_spec
  file_provided_status file_spec n opacity popups prob_subset route_id
  route_short_name service_id shape_id shape_pt_lat shape_pt_lon
  shape_pt_sequence spec stop_id stop_lat stop_lon stop_name trip_id
  validation_details validation_status
  1. Please add explanations below for any exceptions to the above: NA
  2. If this is a resubmission following rejection, please explain the change in circumstances. NA
@sckott
Copy link
Contributor

sckott commented Jun 25, 2016

Passes all editors' checks, is a good fit

Seeking reviewers...

@sckott
Copy link
Contributor

sckott commented Jun 25, 2016

Reviewers: @Robinlovelace
Due date: 2016-07-18

@sckott
Copy link
Contributor

sckott commented Aug 1, 2016

@Robinlovelace
Due date: 2016-07-18 - hey there, it's been 37 days, please get your review in soon, thanks 😺 (ropensci-bot)

@Robinlovelace
Copy link
Member

Robinlovelace commented Aug 3, 2016

Thanks for the reminder and opportunity - here is my review of the package:

Introduction

GTFS is becoming the worldwide standard for sharing information about public transport systems. This makes it hugely important for the transport industry, public sector organisations trying to direct such systems in socially beneficial directions, and for researchers who have a range of motivations including the need to transition away from fossil fuels to save the world.

In any case, GTFS data is largely out of reach for the humble citizen like you and me. Googling "software gtfs data" reveals a cottage industry of GTFS editors and viewers, but little in the way of open source software for accessing this vital new file format. A GIS StackExchange question further shows the lack of cohesion around a single open source product (the World Bank provides a more up-to-date list of tools that can interact with GTFS data).

R is a powerful language that provides an emerging set of transport tools via packages such as stplanr, so rOpenSci is the ideal place for the development of a new package to make GTFS available to the world, in the spirit of citizen science.

Basic functionality

The package is installed and loaded easily with:

devtools::install_github('ropenscilabs/gtfsr')
library(gtfsr)

The most common thing that someone will want to use the package for, I imagine, is read in a GTFS file. This can be done easily and, based on a few tests of feeds from transit.land, reliably:

# this one failed for some reason
u = "https://www.bjcta.org/wp-content/uploads/2016/05/BJCTA_GTFS_0516.zip"
# the example in the README works
u = "http://www.co.fairbanks.ak.us/transportation/MACSDocuments/GTFS.zip"
# as does this giant one from Rio de Janeiro
u = "http://dadosabertos2.rio.rj.gov.br/dadoaberto/google-transit/google_transit.zip"
# as does this one from the USA
u = "http://www.bart.gov/dev/schedules/google_transit.zip"
gtfs_obj = import_gtfs(u)

The next logical thing you would want to do is plot the result. (Suggestion: make simply loading and plotting GTFS data appear earlier and more prominently in the README.)

This can be done easily with the sensible but clunkily-named map_gtfs_agency_routes(). Suggestion: rename this verbosely name function or create a new generic plotting function called simply map_gtfs():

map_gtfs_agency_routes(gtfs_obj)
map_gtfs_agency_routes(gtfs_obj, include_stops = TRUE)

I think in most cases users will want to see the stops. Suggestion: make include_stops = TRUE the default option in this function and the yet-to-be created generic map_gtfs().

Accessing GTFS feeds

A useful feature of the package, that goes above and beyond other GTFS software, is its ability to search for public GTFS feeds, using the transitfeeds.com API key. The guidance on how to get a key in the README is good. However, advice for storing the key could be better: the set_api_key() key works fine for one session but it seems the key must be entered afresh every for every new R session. Suggestion: recommend setting the key in .Renviron, as documented in the httr vignette, e.g. with the line GTFS_API_KEY=XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX. Then it could be automatically retrieved with each new session by adding something like the following lines to get_api_key():

if(grepl("[[:alnum:]]{8}\\-[[:alnum:]]{4}\\-[[:alnum:]]{4}\\-[[:alnum:]]{4}\\-[[:alnum:]]{12}", Sys.getenv("GTFS_API_KEY")))
  gtfs_api_key$set(Sys.getenv("GTFS_API_KEY"))

We use a similar technique in stplanr with cyclestreet_pat, which we should probably generalise to other API keys...

In any case, the most important thing is that users can quickly view valid gtfs feeds:

feedlist_df = filter_feedlist(get_feedlist()) # errors:
# Error in function_list[[k]](value) : could not find function "mutate"

Note that the above code errors, probably because the function mutate is missing a dplyr:: prefix in the definition of get_feedlist. (Suggestion: Fix this issue).

After dplyr has been loaded, the function works:

library(dplyr)
feedlist_df = filter_feedlist(get_feedlist())

As per the documentation, you can then find feeds from a specific country, e.g. Spain:

u_spain = filter(feedlist_df, grepl("Spain", loc_t)) %>% 
  select(url_d)
# get all gtfs feeds from spain
gtfs_spain = import_gtfs(u_spain)

Of the 3 imported GTFS feeds, only 1 can be plotted:

map_gtfs_agency_routes(gtfs_spain[[1]]) # fails - not a gtfs object
map_gtfs_agency_routes(gtfs_spain[[2]]) # works
map_gtfs_agency_routes(gtfs_spain[[3]]) # fails - no route_id

I think that the fact that only 1 of these Spanish GTFS datasets works says more about the variable quality of GTFS data than the package. However, such testing on a wide range of international examples could help further refine the package and make it more accepting of diverse feed formats. Suggestion: test gtfsr on a wide range of datasets, perhaps using continuous integration, although not sure how this would work with its reliance on external datasets - any ideas?

Data consistency issues aside (which are clearly not the fault of gtfsr), the package clearly does a great job of making GTFS data more accessible to the masses.

Compatibility with spatial classes

To increase the utility of GTFS objects, I think it would be fantastic if the package included functions to export them as spatial objects. That would expose them to the power of R's impressive GIS capabilities. Suggestion: explore options for converting gtfs data classes into spatial objects such as SpatialLinesDataFrame.

Note: stplanr has a basic function for this: gtfs2sldf().

Documentation

The package's vignette is currently the same as the README. This is useful for now but in the long-run the vignettes should be self-standing. Suggestion: make the README shorter and self-standing, make the vignette longer with more links to existing software and documentation for understanding and working with GTFS data.

Minor issues

The first thing I noticed when cloning this repo was its massive size (44 MB). This is predominently due to previously deleted files in the .git repo. It won't cause issues when installing it but will be a deterrent to potential developers. Suggestion: removing the culprits, e.g. using the BFG repo cleaner.

@Robinlovelace
Copy link
Member

An additional comment, more one for the longer term, is how can this interact with routing? Are there plans in future versions of gtfsr to allow journey planning, e.g. via an interface to something like OpenTripPlanner? This could help the GTFS data be used for analysis of spatio-temporal accessibility.

@sckott
Copy link
Contributor

sckott commented Aug 3, 2016

Thanks for the review @Robinlovelace !

@sckott
Copy link
Contributor

sckott commented Aug 17, 2016

@ultinomics review is in, let us know if you need any help

@dantonnoriega
Copy link
Author

Will do! I plan to go through the edits during the first week of September.

On Aug 17, 2016 3:02 PM, "Scott Chamberlain" notifications@github.com
wrote:

@ultinomics https://github.com/ultinomics review is in, let us know if
you need any help


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#55 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AFaR2d-SgPKf5R2alC5VJkhoPrh4KwSaks5qg1rBgaJpZM4I-MZq
.

@sckott
Copy link
Contributor

sckott commented Aug 17, 2016

okay

@dantonnoriega
Copy link
Author

Update: I will be working on this during the last week of September during TransLoc internal "dev days". Thanks everyone and sorry for dragging my feet!

@sckott
Copy link
Contributor

sckott commented Sep 21, 2016

thanks for the update

@dantonnoriega
Copy link
Author

I've gone through most of the review (thanks, @Robinlovelace!) but still need to update the README and vignette and learn how to save the API key to .Renviron. I've also gone through most all the issues that I considered bugs. You can see what I have to work through still on the issues page.

Otherwise, the most pressing suggestions have been implemented and are currently on a branch called map-update! Once I update the README and vignette, I will merge and push to master. I'll work on the API storing thing separately since it is not critical to getting the package to work.

Let me know what's next! I really pumped about the new, updated mapping feature but would be great to get folks testing it!

@Robinlovelace
Copy link
Member

Great work - not had a chance to test it but looking forward to that.

@sckott
Copy link
Contributor

sckott commented Oct 6, 2016

@ultinomics thanks for the changes. I'll try it out the new map stuff.

Let me know what's next!

In addition to readme and vignette changes ... The API key change thing is important to get done as it's a security issue. Users of this pkg will possibly unintentionally leave their API key in an R script and then put their code up on the web (e.g., on github) for all the world to see. If they store as an env var, that won't happen

@dantonnoriega
Copy link
Author

I'll get in it! Good point!

On Oct 6, 2016 3:32 PM, "Scott Chamberlain" notifications@github.com
wrote:

@ultinomics https://github.com/ultinomics thanks for the changes. I'll
try it out the new map stuff.

Let me know what's next!

In addition to readme and vignette changes ... The API key change thing is
important to get done as it's a security issue. Users of this pkg will
possibly unintentionally leave their API key in an R script and then put
their code up on the web (e.g., on github) for all the world to see. If
they store as an env var, that won't happen


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#55 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AFaR2UH8-kk4El-g3a_wiBjYHyikQuO8ks5qxUzggaJpZM4I-MZq
.

@sckott
Copy link
Contributor

sckott commented Oct 8, 2016

@ultinomics i was going to test out but the map-update branch isn't there, was it merged to master?

@dantonnoriega
Copy link
Author

dantonnoriega commented Oct 8, 2016

@sckott it was! did a few edits the day after but forgot to update ya.

@sckott
Copy link
Contributor

sckott commented Oct 10, 2016

Noticed a few more things:

  • Please add a package level manual file so that when users do ?gtfsr and ?gtfsr-package they get a high level manual file that explains the package, etc.
  • some functions from imported pkgs are called without being namespaced or imported. e.g., mutate() is called in get_feedlist() but mutate is not imported or namespaced, looks like must dplyr fxns you namespaced, and may just have missed a few
  • on r check, there are a lot of warnings about no visible binding for global variable, which seem to relate to lazy eval usage with dplyr/magrittr. You can do the globalVariables trick like https://github.com/ropensci/rnoaa/blob/master/R/globals.R or perhaps use dplyr's underscore fxn versions and quote variable names and such. Thoughts on what we recommend as best practice here @noamross

@sckott
Copy link
Contributor

sckott commented Nov 26, 2016

@ultinomics any update on your work on this package?

@dantonnoriega
Copy link
Author

dantonnoriega commented Nov 27, 2016 via email

@dantonnoriega
Copy link
Author

dantonnoriega commented Jan 4, 2017

@sckott My bad on close/comment.

Alright! I added the API .Renviron functionality and the ?gtfsr manual file and the warnings thing.

Anything else I can do? I know I need a lot more examples still but I hope it's good enough.

@sckott sckott reopened this Jan 4, 2017
@sckott
Copy link
Contributor

sckott commented Jan 4, 2017

no worries

@sckott
Copy link
Contributor

sckott commented Jan 4, 2017

@Robinlovelace any other thoughts at this point?

I'm having a last look now

@Robinlovelace
Copy link
Member

No more thoughts from me Scott (although I must confess that this is partly due to lack of time) - it looks like a well-documented and useful package! Any ideas on integration with stplanr - lots of talk there about routing, sf and all sorts at the moment.

Great work @ultinomics.

@sckott
Copy link
Contributor

sckott commented Jan 5, 2017

@ultinomics No more things, although do make sure to add more examples - perhaps open an issue https://github.com/ropenscilabs/gtfsr/issues to remind yourself to do that,

approved

@dantonnoriega
Copy link
Author

Hey @sckott, I was wondering what the next steps are. The on boarding process says "Once your package is approved, we will provide further instructions on transferring your repository to the rOpenSci repository" but no sure if I ever received those instructions. If I did, my apologies!

@sckott
Copy link
Contributor

sckott commented Feb 15, 2017

Hi @ultinomics -

  • you're already in ropenscilabs, let's move it to ropensci org account (which I can do), and then make sure you update all the links in your docs/readme/etc. (when users go to ropenscilabs/gtfsr, they'll get redirected automatically to ropensci/gtfsr) - Okay for me to transfer to ropensci now?
  • submit to CRAN when you're ready
  • Would you be interested in doing a guest blog post on our blog? about the package/can mention review process if you want

@dantonnoriega
Copy link
Author

Looping in @eamcvey.

Transfer away. I can make the edits pretty quickly to redirect. Also, happy to do a blog post but likely will do it in March, if thats ok.

Thanks, @sckott!

@sckott
Copy link
Contributor

sckott commented Feb 15, 2017

sure, no rush, i think we have a backlog anyway for guest blog posts

@sckott
Copy link
Contributor

sckott commented Feb 15, 2017

Okay, you're transferred https://github.com/ropensci/gtfsr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants