data to add #2

Open
mpadge opened this issue Mar 15, 2017 · 11 comments

Comments

@mpadge
Member

mpadge commented Mar 15, 2017

Great article on the state of American bike share systems here, with new systems including LA and Portland. Full list of systems (with direct links to data, and excluding London UK):

  1. NYC
  2. Washington DC
  3. Chicago
  4. Boston
  5. LA
  6. Philadelphia
  7. Minneapolis/St Paul
  8. San Francisco Bay Area (issue here)
  9. bixi montreal (issue here)
  10. mibici Guadalajara (issue here)

Systems not yet part of this package that we hope to add:

  1. Ciudad de Mexico (issue here) - awaiting open data on station locations
  2. [Vancouver Mobi](https://www.mobibikes.ca/en/system-data) (issue here) - awaiting open data on station locations

Additional systems that do not (yet?) provide data:

  1. Miami
  2. Portland
  3. Baltimore

Systems which have died an ungraceful death yet which still provide (historical) data:

  1. Seattle
@mpadge
Member Author

mpadge commented Mar 15, 2017

@richardellison just a heads up - the report linked above is generating a fair bit of attention. I reckon if we don't rush bikedata to ropensci/CRAN, somebody else is likely to do an equivalent pretty soon. (And i'd be intending ropensci because it really is a perfect fit there, methinks.) The timetable @Robinlovelace set for notional stplanr integration seems like a good motivation - that means end of March.

Now my question: You'll obviously already be a co-author of this package. How interested are you in further active involvement over the next few weeks?

@richardellison
Contributor

I agree that the end of March is a good time frame to aim for. I have some time constraints in the next few weeks, but I'm happy to be involved and to help where I can.

I haven't had a chance to look at the data yet but I imagine the most time consuming task to integrate the additional systems would be ensuring some standardisation of the data.

Any thoughts on whether we should add functionality to allow multiple systems in a single spatialite database/tables, or whether we should keep each system segregated?

@mpadge
Member Author

mpadge commented Mar 15, 2017

yeah, important question. my thoughts are that at least an initial draft of the package ought to delete the database and all files on ending the R session by default (with possible override), so i was thinking all trips in one database, stored in tables named according to the cities. Erasing all data at the end is likely to be more favourable in ropensci terms, at least for initial review, and we can potentially modify down the track.

I imagine most potential users would be predominantly interested in a particular city, and so this would make package interfacing and interaction more uniform and independent of details of usage. I freely admit, however, almost entire naivety regarding database programming, and interpret your C code to indicate that you're likely way better placed to make a more informed call on this one. So i'll bounce this one back to you: Thoughts?

(Other than that, yeah, you're right, it's just a fiddly standardisation issue. No real brain work actually involved in most of it, but i'll be happy to do it anyway - I had done most of it in my former C++ code, so can just adapt that.)

@richardellison
Contributor

Is storing a spatialite database any different from storing any other data files? There doesn't seem to be much point in deleting everything when that is what takes the most time. The user also chooses where to store the data, so it is a fairly simple matter of deleting the database if required.

My view is that it is generally better to store the data in a single table for all cities. That allows for efficient comparison of multiple cities if desired and should not (at least in any measurable way) slow down analysis for a single city, which, I agree with you, is what most users are likely to be interested in. This would require an additional table that contains at least two, possibly three fields:

  1. An ID for the city (probably just a serial (auto-incrementing) integer value).
  2. The name of the city.
  3. An optional short form for each city (NYC, etc.)

The trips and stations tables would then need an additional field that contains the ID of the city for each record. This supposes that we can somehow conform all the cities into a single standard format (that may in itself require making additional changes). We may also want to add a simple function that prints out the IDs of each city in the database.
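
A minimal sketch of the layout proposed above, assuming plain SQLite via Python's `sqlite3` module rather than SpatiaLite. Every table and column name here (`cities`, `city_id`, `short_name` and so on) is an assumption made for illustration, not the package's actual schema.

```python
import sqlite3


def create_schema(con):
    """Create a cities lookup table plus trips/stations tables keyed by city."""
    con.executescript("""
        CREATE TABLE cities (
            id INTEGER PRIMARY KEY AUTOINCREMENT,  -- serial city ID
            name TEXT NOT NULL UNIQUE,             -- full city name
            short_name TEXT                        -- optional short form, e.g. NYC
        );
        CREATE TABLE stations (
            id INTEGER PRIMARY KEY,
            city_id INTEGER NOT NULL REFERENCES cities(id),
            stn_id TEXT NOT NULL,
            longitude REAL,
            latitude REAL
        );
        CREATE TABLE trips (
            id INTEGER PRIMARY KEY,
            city_id INTEGER NOT NULL REFERENCES cities(id),
            start_station_id TEXT,
            end_station_id TEXT,
            start_time TEXT,
            stop_time TEXT
        );
    """)


def add_city(con, name, short_name=None):
    """Insert a city and return its automatically assigned ID."""
    cur = con.execute(
        "INSERT INTO cities (name, short_name) VALUES (?, ?)",
        (name, short_name),
    )
    return cur.lastrowid


def list_cities(con):
    """Return (id, name, short_name) for every city in the database."""
    return con.execute(
        "SELECT id, name, short_name FROM cities ORDER BY id"
    ).fetchall()
```

A single-city query then becomes a filter such as `SELECT * FROM trips WHERE city_id = ?`, while cross-city comparisons can group on `city_id` within the same table.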

@mpadge
Member Author

mpadge commented Mar 22, 2017

@richardellison quick question for ya: i imagine that the extra DB column of "city" should be automatically indexed independent of your create_index parameter? Queries are surely almost always going to be city-specific. Can you see any reason not to create this index at DB construction time?

@richardellison
Contributor

I agree that you would almost always want it indexed. The only reason I can think of for not automatically adding the index, irrespective of the create_index parameter, is consistency.
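
To make the option under discussion concrete, here is a rough sketch in which the city index is always created at construction time and `create_index` only controls the remaining indexes. It uses Python's `sqlite3` module. The function `build_trips_table`, the index names, and the assumption that `create_index` governs station-column indexes are all illustrative and not taken from the package.

```python
import sqlite3


def build_trips_table(con, create_index=False):
    """Create a trips table whose city column is indexed unconditionally."""
    con.execute("""
        CREATE TABLE trips (
            id INTEGER PRIMARY KEY,
            city_id INTEGER NOT NULL,
            start_station_id TEXT,
            end_station_id TEXT
        )
    """)
    # Always indexed, since queries are expected to be city-specific.
    con.execute("CREATE INDEX idx_trips_city ON trips (city_id)")
    if create_index:
        # Optional indexes (assumed here to be what create_index controls).
        con.execute("CREATE INDEX idx_trips_start ON trips (start_station_id)")
        con.execute("CREATE INDEX idx_trips_end ON trips (end_station_id)")


def index_names(con, table):
    """List the names of the explicitly created indexes on a table."""
    rows = con.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = ? ORDER BY name",
        (table,),
    ).fetchall()
    return [row[0] for row in rows]
```

The consistency concern raised above is visible in this sketch: with `create_index=False` the table still ends up with one index, so the parameter no longer means "no indexes at all".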

mpadge added a commit that referenced this issue May 8, 2017
mpadge added a commit that referenced this issue May 9, 2017
This was referenced May 31, 2017
@ghost

ghost commented Mar 14, 2018

FYI - Toronto's bikeshare program has released part of its data. I don't understand why they haven't released anything beyond 2016 though: https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/#343faeaa-c920-57d6-6a75-969181b6cbde

@mpadge
Member Author

mpadge commented Mar 14, 2018

Oh that's great to know @philstraforelli, thanks! Plan is only to incorporate data that are absolutely going to be available on an ongoing basis, so that doesn't qualify at present, but we can definitely keep a watch on it.

@mpadge
Member Author

mpadge commented Mar 15, 2018

See this great curated github list of bikeshare data thanks to Daniel Patterson's tweet - mine that for new sources! nah, forget that, it hasn't been updated for > 4 years

@pstraforelli

Vancouver, Canada data is available here: https://www.mobibikes.ca/en/system-data

@mpadge
Member Author

mpadge commented Feb 21, 2020

Great to hear, thanks @pstraforelli. I'll check it out and ping you via a separate issue once I confirm that the data are generally compatible
