Downloading a CSV from Kaggle using {httr2} #180

Closed
emmansh opened this issue Nov 8, 2022 · 3 comments

emmansh commented Nov 8, 2022

Context

I want to publish a blog post that analyzes a dataset from Kaggle. I'd like the post to be as reproducible as possible, so it should show the data being loaded directly from the Internet. That way, anyone trying to reproduce my steps could just run my code and it would work for them. However, downloading Kaggle datasets requires authentication, which puts a spoke in the wheel of my ideal "copy & run" level of reproducibility.

The problem: translating {httr} code to {httr2}

I'm trying to download a csv file directly from the Kaggle website. I've come across this piece of code on Kaggle's website, which provides a way to download data from Kaggle via R.

However, this piece relies on {httr} rather than {httr2}, and I wonder what would be the {httr2} equivalent.

Somewhat reproducible example

  1. I randomly picked this dataset from Kaggle: Witch Trials Dataset

  2. I got my Kaggle token from the website's menu: Account -> API -> Create New API Token

    • this downloads a .json file to my computer with credentials for authentication.
  3. R code with {httr} to get a response object:

    library(httr)
    
    data_url_on_kaggle <- "https://www.kaggle.com/api/v1/datasets/download/michaelbryantds/witch-trials/trials.csv"
    
    my_username <- "kaggle_username_from_json"
    my_password <- "kaggle_key_from_json"
    
    response_obj <- httr::GET(data_url_on_kaggle, httr::authenticate(my_username, my_password, type = "basic")) 
  4. From here, I have two options to get the csv as a data.frame/tibble

    • using httr::content()
      df_via_content <- content(response_obj, type = "text/csv")
    • using readr::read_csv()
      library(readr)
      df_via_read_csv <- read_csv(response_obj$url)

Bottom line question

In the most common scenario, I open up RStudio when I already know:

  • The url to the data (e.g., https://www.kaggle.com/api/v1/datasets/download/michaelbryantds/witch-trials/trials.csv)
  • My credentials to access Kaggle (username & password)

When those are given, I'd like to know what the most modern way is, using {httr2} tools, to read the data as data.frame/tibble directly from the url while authenticating. Although I went over the documentation, I must admit I got lost in all the req_* functions.

@christopherkenny

I think you're looking for a workflow like the following.

Starting with what you already know, but substituting in {httr2}:

library(httr2)
data_url_on_kaggle <- "https://www.kaggle.com/api/v1/datasets/download/michaelbryantds/witch-trials/trials.csv"

my_username <- "kaggle_username_from_json"
my_password <- "kaggle_key_from_json"

Then you can create a request from the url, add the authentication info, and perform the request:

response_obj <-  data_url_on_kaggle |> 
  request() |> 
  req_auth_basic(username = my_username, password = my_password) |> 
  req_perform()
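Before parsing, it can help to check what actually came back. The following is only a sketch: `describe_response()` is a hypothetical helper name, not part of {httr2}, and whether Kaggle returns plain CSV for a given file is an assumption worth verifying this way.

```r
library(httr2)

# Hypothetical helper (not part of httr2): summarise a response
# so you can confirm it is what you expect before parsing the body.
describe_response <- function(resp) {
  list(
    status = resp_status(resp),             # HTTP status code, e.g. 200
    content_type = resp_content_type(resp)  # media type from the Content-Type header
  )
}
```

Calling `describe_response(response_obj)` on the response from the request above shows the status code and content type, which tells you whether reading the body as text makes sense.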

Once you have the response object, you can read it with read_csv as before: first extract the response body as a string (since it's text), then pass that string to read_csv.

library(readr)
df_via_read_csv <- response_obj  |> 
  resp_body_string() |> 
  read_csv()
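To avoid hard-coding the credentials, the same steps could be wrapped in small functions that read them from the downloaded JSON token. This is a sketch under a few assumptions: `kaggle_request()` and `read_kaggle_csv()` are hypothetical helper names, the `~/.kaggle/kaggle.json` default is only a guess at where the token file was saved, the file is assumed to contain `username` and `key` fields, and the endpoint is assumed to return plain CSV text.

```r
library(httr2)
library(jsonlite)
library(readr)

# Hypothetical helper: build an authenticated request from a Kaggle token file.
# Assumes the JSON file has "username" and "key" fields.
kaggle_request <- function(url, token_path = "~/.kaggle/kaggle.json") {
  creds <- fromJSON(token_path)
  request(url) |>
    req_auth_basic(username = creds$username, password = creds$key)
}

# Hypothetical helper: perform the request and parse the body as CSV.
# Assumes the response body is plain CSV text.
read_kaggle_csv <- function(url, token_path = "~/.kaggle/kaggle.json") {
  csv_text <- kaggle_request(url, token_path) |>
    req_perform() |>
    resp_body_string()
  # I() marks the string as literal data rather than a file path
  read_csv(I(csv_text))
}
```

With these in place, `read_kaggle_csv(data_url_on_kaggle)` would do the whole round trip in one call, and the blog post only needs to tell readers where to put their own token file.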

@hadley
Member

hadley commented May 9, 2023

Thanks @christopherkenny!

@hadley hadley closed this as completed May 9, 2023
@dmontecino

Hey, what about
data <- rvest::session_jump_to(logged.in.connect, api_address)

data <- httr::content(data$response, as = "parsed") |>
geojsonio::as.json() |>
geojsonio::geojson_sf()

Thanks!
