Query construction in tripmat function #4
Q1: yeah, it's coz i got quite frustrated trying to achieve the …

Q2: Not exactly sure what you mean there - do you mean parameterise everything in the way the dates and times already are? If so, then yeah, I'd agree with that.
I was thinking of just constructing the query with COUNT(*) and GROUP BY directly, so you don't need to rely on dplyr to do the summarise and group by. In terms of parameterised queries, I was thinking more of using something like dbBind to bind values to a particular query rather than building the query string directly. As I said, perhaps it isn't really needed in this case, but protecting against SQL injection is fairly common practice and also has the added benefit of protecting against some likely errors.
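A minimal sketch of what that might look like with DBI/RSQLite, combining the two suggestions: aggregate with COUNT(*) and GROUP BY inside SQL, and bind the datetime values with dbBind rather than pasting them into the query string. The table and column names here (`trips`, `start_station_id`, `start_time`, etc.) are illustrative assumptions, not the package's actual schema:

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), "bikedb.sqlite")

# Aggregate in SQL rather than fetching all rows and summarising with dplyr;
# the '?' placeholders are filled by dbBind(), so date values are never
# interpolated into the query string.
sql <- paste(
    "SELECT start_station_id, end_station_id, COUNT(*) AS numtrips",
    "FROM trips",
    "WHERE start_time BETWEEN ? AND ?",
    "GROUP BY start_station_id, end_station_id"
)
res <- dbSendQuery(con, sql)
dbBind(res, list("2016-01-01 00:00", "2016-12-31 23:59"))
tripcounts <- dbFetch(res)
dbClearResult(res)
dbDisconnect(con)
```

Besides the injection-safety point, binding also sidesteps quoting errors when a value happens to contain a quote character.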
yeah, that's a good point. I hadn't even begun to consider the potential consequences of failing to adequately protect SQL queries - now I understand what you mean. And I guess using … In keeping with my professed ignorance of many matters database related, would you like to have a first cut at a …
That is one advantage of a spatialite database: when the spatialite extension is not available, it reverts to a standard sqlite database, and the only limitation is that you can't run spatial queries against it. There shouldn't be any additional performance cost from using a spatialite database, although the database would be slightly larger because of the additional tables. I'll make some changes to just use a standard COUNT(*) and GROUP BY.
From the list in #2, I just checked both Boston and Chicago, and both have the coordinates of the stations. For Boston there is a link to a separate file containing the stations; for Chicago there is a json file that you admittedly have to dig for. I haven't looked at the others yet, but I would be surprised if they weren't available anywhere.
all systems provide station coordinates, i just meant that not all include them explicitly with the actual ride data (foremost London, which also has to be incorporated, and requires an OAuth key). DC is another strange case, where some stations have been moved a little, so coordinates change while the stations stay notionally the same (plus a couple of problematic cases where coordinates remain almost identical while station names and purposes actually change - they use experimental stations for their own testing purposes, the clever folk). All fiddly parts of pre-processing that i'll deal with soon.

I suspect a productive division of labour at present would be for me to concentrate on getting the data in, while you concentrate more on getting the data out. I'll aim to have all data in (start_id, start_name, start_time, end_id, end_name, end_time, ...), where (...) is additional user data provided by systems like NYC, but not London - so pretty much identical to the current NYC tables. I should have time to get all other systems programmed in this week.
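A minimal sketch of what that common table layout might look like in SQLite, assuming text-typed columns throughout; the table name and the exact types are my assumptions, and the per-system extra columns (the "...") are deliberately left out:

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")

# Shared core columns for every system; systems like NYC would append
# additional user-data columns after end_time.
dbExecute(con, paste(
    "CREATE TABLE trips (",
    "    start_id   TEXT,",
    "    start_name TEXT,",
    "    start_time TEXT,",
    "    end_id     TEXT,",
    "    end_name   TEXT,",
    "    end_time   TEXT",
    ")"
))

dbDisconnect(con)
```

Keeping the core columns identical across systems means the same tripmat() queries should work everywhere, with system-specific columns only ever appended at the end.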
I see what you mean now - yes, changing coordinates is an issue. Sounds like a reasonable division of labour to me. I'm working on changes to tripmat() now.
I have switched to using RSQLite/DBI in #7.
Is there a reason that the filter_tripmat_by_datetime function returns all the trips that match the query rather than just using a combination of a COUNT(*) and GROUP BY?
I also wonder whether there is value in switching to parameterised queries where possible (it isn't possible in all cases in this package). SQL injection probably isn't an issue here, but it is standard practice and can make the code somewhat easier to read.
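To illustrate the difference the first question is getting at, here is a hedged sketch contrasting the two approaches; the table and column names (`trips`, `start_id`, `end_id`) are illustrative, not the actual schema used by filter_tripmat_by_datetime:

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "trips", data.frame(
    start_id = c("a", "a", "b"),
    end_id   = c("b", "b", "a"),
    stringsAsFactors = FALSE
))

# Approach 1: return every matching trip, then count pairs in R
all_trips <- dbGetQuery(con, "SELECT start_id, end_id FROM trips")

# Approach 2: let SQLite do the counting, returning one row per pair
counts <- dbGetQuery(con, paste(
    "SELECT start_id, end_id, COUNT(*) AS numtrips",
    "FROM trips GROUP BY start_id, end_id"
))

dbDisconnect(con)
```

With millions of trips, the second form transfers only one row per station pair instead of one row per trip, which is the practical motivation behind the question.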