Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ckan_fetch() to work with txt files #165

Merged
merged 7 commits into from
Dec 3, 2020
Merged

Update ckan_fetch() to work with txt files #165

merged 7 commits into from
Dec 3, 2020

Conversation

sharlagelfand
Copy link
Contributor

Updated ckan_fetch() to work with txt files (closes #160) - as we discussed in that issue, uses read.table() and errors if the txt file cannot be read via read.table(). Added an example into the docs too.

Just to look at when it works vs fails, using the examples from the linked issue:

library(ckanr)

head(ckan_fetch("https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/7b70189a-aede-42f1-b092-8708fa4f5fc3/resource/e4211f49-611f-438c-a444-aaa7f3f84117/download/pickup-schedule-2020.txt"))
#            V1           V2       V3      V4        V5        V6
# 1    Calendar WeekStarting GreenBin Garbage Recycling YardWaste
# 2 MondayNight     01/06/20        M       M         0         0
# 3 MondayNight     01/13/20        M       0         M         0
# 4 MondayNight     01/20/20        M       M         0         0
# 5 MondayNight     01/27/20        M       0         M         0
# 6 MondayNight     02/03/20        M       M         0         0
#              V7
# 1 ChristmasTree
# 2             M
# 3             0
# 4             M
# 5             0
# 6             0
library(ckanr)

ckan_fetch("https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/c238eb23-474a-445f-8037-50d440502219", format = "txt")
# Error: File cannot be read via `read.table()`. Please download and import into R manually. 

I'm using the default header = FALSE in read.table() - I figure it's less of a pain to make one of the rows into the column names than it is to put the column names back into a row - lmk if you have strong feelings on this

I also took the opportunity to fix one error (that I wrote, of course!) which was missing call. = FALSE

Also, thank you for the access to push to the repo! 🎉

@sckott
Copy link
Contributor

sckott commented Nov 24, 2020

welcome on push access - well deserved and thanks for making the package better

having a loook

@sckott sckott added this to the v0.6 milestone Nov 24, 2020
@sckott
Copy link
Contributor

sckott commented Nov 24, 2020

what does it look like when you try to use ckan_fetch on a txt file that is really a readme or similar?

@hannaboe
Copy link
Contributor

Thank you! I was thinking of adding an option for specifying the delimiter as well since we have a data portal with mainly ;-separated txt files. But maybe that's too specific?

@sharlagelfand
Copy link
Contributor Author

@sckott fails with that error message if it can't be read via read.table(). but one option could be to try read.table, and if it errors, fall back to readLines(). This is the current behaviour though on e.g. this resource

library(ckanr)

ckan_fetch("https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/1050c7c5-54a3-4be2-ab75-971d640973bd/resource/43fa5931-7cca-4f16-82e8-3152b2f5af07/download/traffic-counts-cycling-readme.txt")
# Error: File cannot be read via `read.table()`. Please download and import into R manually. 

@sharlagelfand
Copy link
Contributor Author

@hannaboe can you link an example txt file with ; separation, if you have a public example?

my initial thoughts are that it's too specific and a workaround could be to DIY the session fetch yourself, i.e. ckan_fetch with store = "disk" to a tempfile, then read that in via read.table() with delimiter specified. could be something we need to consider though (and if so we should update fetching CSV to allow different delimiters too) especially for instances/locales where comma delimited is less common - thoughts @sckott?

just re: implementation, would have to be a named arg since the ... argument is already in use for something else in this function (passing to crul::verb-GET).

@sckott
Copy link
Contributor

sckott commented Nov 30, 2020

that error behavior is good.

you're welcome to change the ellipsis to pass on args to read.table - users can still set curl options outside of the function call - just make sure to update the docs and make sure the ellipsis doesn't pass through the http request fxn anymore

@hannaboe
Copy link
Contributor

hannaboe commented Dec 1, 2020

@sckott
Copy link
Contributor

sckott commented Dec 1, 2020

@sharlagelfand looks like there's a conflict here now, can you fix that?

@sharlagelfand
Copy link
Contributor Author

@sckott fixed the conflict, thanks! getting to the other comments now

@sharlagelfand
Copy link
Contributor Author

thanks for linking that @hannaboe! I will use it for testing.

@sckott i'm realizing that i'm not well versed in using ... to pass on to named arguments to other functions - do you know of a function that does this that you can link for me to look at? or somewhere in advanced r etc

another option is outright having an explicit sep argument, since it's used in both read.csv and read.table. but i do like the idea of allowing ... to use in any of the functions used in read_session (though i can see the explanation of it getting verbose - but could be nice for people to have more control over the reading function)

ty!

@sckott
Copy link
Contributor

sckott commented Dec 3, 2020

hmm, you can't pass ellipsis to specific named arguments. i think the most common use case is when you are letting the user pass on one or more named arguments to an internal function:

bar <-  function(x, y, z) c(x, y, z)
foo <-  function(x, y, ...) {
   bar(...)
}
foo(x = 4, y = 6, z = 7)

I don't think it's a good idea to have matching args in ckan_fetch for each arg in a reader function as that can add a lot of parameters to ckan_fetch. Seems like an ideal use for the ellipsis to me at least

@sharlagelfand
Copy link
Contributor Author

ahhh thank you, didn't realize it was that easy!

yeah ellipsis sounds good to me. i've implemented it as well as some additional details in the docs as to what function the ... is passed to, - lmk what you think.

with this implemented, the file @hannaboe linked now reads in easily (i added this as an example in the docs too):

library(ckanr)
ckanr_setup("https://data.coat.no")
res <- resource_show(id = "384fe537-e0bd-4e57-8a0d-420b7a745196", as = "table")
x <- ckan_fetch(res$url, sep = ";")
head(x)
#          V1               V2            V3          V4      V5         V6
# 1 sn_region      sn_locality    sn_section     sn_site sn_plot     t_date
# 2  varanger vestre_jakobselv torvhaugdalen vj_to_sn_22       0 2011-03-20
# 3  varanger vestre_jakobselv torvhaugdalen vj_to_sn_22       5 2011-03-20
# 4  varanger vestre_jakobselv torvhaugdalen vj_to_sn_22      10 2011-03-20
# 5  varanger vestre_jakobselv torvhaugdalen vj_to_sn_22      -5 2011-03-20
# 6  varanger vestre_jakobselv torvhaugdalen vj_to_sn_22     -10 2011-03-20
#             V7      V8        V9
#   1 v_observer v_depth v_comment
#   2       <NA>     310      <NA>
#   3       <NA>     350      <NA>
#   4       <NA>     370      <NA>
#   5       <NA>     230      <NA>
#   6       <NA>     200      <NA>

@sckott
Copy link
Contributor

sckott commented Dec 3, 2020

Nice work @sharlagelfand , LGTM

@sckott sckott merged commit 638d07d into master Dec 3, 2020
@hannaboe
Copy link
Contributor

hannaboe commented Dec 4, 2020

Perfect, thank you!

@sckott sckott deleted the fetch-txt branch December 4, 2020 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Including option for txt-files in ckan_fetch()
3 participants