
read_json and write_json #161

Closed
hadley opened this issue Dec 19, 2016 · 16 comments

hadley commented Dec 19, 2016

Would you consider adding:

write_json <- function(x, path, ...) {
  json <- jsonlite::toJSON(x, ...)
  writeLines(json, path)
}

read_json <- function(path, ...) {
  jsonlite::fromJSON(file(path), simplifyDataFrame = FALSE, ...)
}

That would make it slightly more symmetrical with readr, readxl and haven.

(If you don't want to add this to jsonlite, I'll probably make a tiny wrapper package, perhaps called readjson.)


jeroen commented Dec 19, 2016

Sure. So you want this function to use fromJSON(x, simplifyVector = TRUE, simplifyDataFrame = FALSE), or is that a typo?


hadley commented Dec 19, 2016

Hmmmm, might be more robust to not simplify vectors either.


jeroen commented Dec 19, 2016

It's your call. I think the default behavior of simplifying data frames is great for working with tidy data pipelines:

library(magrittr)
curl::curl("https://api.github.com/repos/hadley/ggplot2/issues") %>%
  jsonlite::fromJSON(flatten = TRUE) %>%
  dplyr::mutate(date = as.Date(created_at)) %>%
  dplyr::filter(user.login == "hadley") %>%
  dplyr::select(title, state, date)

It will seamlessly roundtrip between tidy data and json:

lm(mpg ~ wt, mtcars) %>%
  broom::tidy() %>%
  jsonlite::toJSON()  %>%
  jsonlite::fromJSON()

This has always been the motivation behind these defaults, and it fits nicely into the tidyverse.


hadley commented Dec 19, 2016

My main worry is that it's a bit too magical - I'd prefer it if it worked more like col_types in readr, so you had some way to make it explicit.

@jennybc do you have any comments?


jeroen commented Dec 19, 2016

I don't understand... col_types is needed because CSV is not typed (everything is a string), but fields in JSON are already typed (numeric, boolean, string, ...). Why would you need col_types?

It's not that magical... it's quite well defined. Everyone stringifies dataframe-like structures (e.g. MySQL tables) as a list of records:

> toJSON(iris, pretty=TRUE)
[
  {
    "Sepal.Length": 5.1,
    "Sepal.Width": 3.5,
    "Petal.Length": 1.4,
    "Petal.Width": 0.2,
    "Species": "setosa"
  },
  {
    "Sepal.Length": 4.9,
    "Sepal.Width": 3,
    "Petal.Length": 1.4,
    "Petal.Width": 0.2,
    "Species": "setosa"
  }
 ...

Then fromJSON simply inverts this mapping.
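The inverse mapping can be sketched like this (using the built-in iris data; a minimal illustration, not taken from the thread):

```r
library(jsonlite)

# toJSON writes a data frame as an array of records; fromJSON,
# with its defaults, converts such an array back into a data frame.
json <- toJSON(head(iris, 2))
df <- fromJSON(json)

is.data.frame(df)  # TRUE: the records round-trip to a 2-row data frame
```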


hadley commented Dec 19, 2016

I guess I'm ok with simplifyVector — it's simplifyDataFrame that's more dangerous because you'll convert to a data frame if the elements have the same length, which can easily happen by coincidence.
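For illustration, a minimal sketch of the difference (the JSON literal here is made up):

```r
library(jsonlite)

json <- '[{"x": 1, "y": "a"}, {"x": 2, "y": "b"}]'

# Default: an array of same-shaped records silently becomes a data frame
fromJSON(json)                             # a 2-row data frame

# simplifyDataFrame = FALSE keeps a plain list of records, whether or
# not their shapes happen to line up
fromJSON(json, simplifyDataFrame = FALSE)  # a list of two named lists
```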


jennybc commented Dec 20, 2016

Searching my code ... I always use simplifyDataFrame = FALSE and frequently simplifyVector = FALSE as well.

I feel like I got here by getting surprised a few times: auto-simplifying code would "work" on a few records or on one day, but produce something quite different on the whole dataset or another day. But I don't have an example right now. Is this believable @jeroenooms?
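One way such surprises can arise (a made-up sketch, not necessarily the original case): the same call returns a different structure depending on the shape of the data, e.g. via simplifyMatrix.

```r
library(jsonlite)

# Equal-length rows simplify all the way to a matrix...
fromJSON('[[1, 2], [3, 4]]')     # a 2 x 2 matrix

# ...but add one element and you get a list of vectors instead
fromJSON('[[1, 2], [3, 4, 5]]')  # a list of two vectors
```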

If simplification is part of read_json(), I do like the readr system:

  • You can specify more types than just numeric, string, boolean. Namely integer vs double, date, time, datetime.
  • It feels like a good idea to state the types you're expecting. They get checked/enforced AND you're documenting the data for your future self.

This would be nice:

curl::curl("https://api.github.com/repos/hadley/ggplot2/issues") %>%
  read_json(col_types = cols_only(
    title = col_character(),
    state = col_character(),
    updated_at = col_datetime(),
    user.login = col_character()
  )) %>% 
  dplyr::filter(user.login == "hadley") %>%
  dplyr::select(-user.login)


jeroen commented Dec 20, 2016

It depends on the input data. If the json is tidy, then simplifyDataFrame = TRUE always gives a tidy data frame for any [{..}, {..}, ...] structure within the json. However, to read messy nested structures, simplification is not going to help (though it should not harm either), and you still get lists.

Internally, jsonlite already uses something like col_types. The simplifyDataFrame function has an argument columns that specifies the fields to extract from each record. The default for this argument is simply all names that appear in any of the records:

# find columns if not specified
if (missing(columns)) {
  columns <- unique(unlist(lapply(recordlist, names), recursive = FALSE, use.names = FALSE))
}

# Convert row lists to column lists.
columnlist <- lapply(columns, function(x) lapply(recordlist, "[[", x))

Currently this is not exported, but we could add something to support col_types.

I recommend either disabling simplification altogether (simplifyVector = FALSE), as is the default in e.g. httr::content, or sticking with the defaults from jsonlite::fromJSON. The combination simplifyVector = TRUE with simplifyDataFrame = FALSE would introduce a third set of default json parsing behaviors, which would perhaps mostly create confusion.
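A sketch of those two recommended modes side by side (the example JSON is made up):

```r
library(jsonlite)

json <- '{"count": 2, "items": [1, 2]}'

# With the fromJSON defaults, "items" collapses to a length-2 vector:
fromJSON(json)$items

# With simplifyVector = FALSE (the httr::content-style default),
# every JSON array stays a list:
fromJSON(json, simplifyVector = FALSE)$items
```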


hadley commented Dec 20, 2016

Ok, in that case I would prefer no simplification for read_json()


jennybc commented Dec 20, 2016

Just unearthed a real example of a typical GitHub API JSON --> data frame task for me. Parking it here in case readjson comes to pass and includes a readr-ish function for this. Recurring themes: limiting to specific fields, indexing >1 level down in the hierarchy with a character vector, giving the associated variable a different name in the tibble, type specification, simplification.

library(purrr)
library(tibble)

issue_df <- issue_list %>%
  {
    tibble(number     = map_int(., "number"),
           id         = map_int(., "id"),
           title      = map_chr(., "title"),
           state      = map_chr(., "state"),
           n_comments = map_int(., "comments"),
           opener     = map_chr(., c("user", "login")),
           created_at = map_chr(., "created_at") %>% as.Date())
  }


jeroen commented Dec 21, 2016

OK, so I guess jsonlite should only do the parsing, and then you can do the simplification, coercion, tidying, and transformations in another package.


jeroen commented Dec 21, 2016

@hadley would you like path in read_json to support only file paths, or also urls and literal json strings?


hadley commented Dec 21, 2016

I think urls and literal json strings are fine. In the longer term, I'd like to extract a small helper package that defines a consistent interface across paths, connections, urls, and literal input (along with some way to manually override incorrect guesses).

jeroen closed this as completed in ef112b6 on Dec 29, 2016

jeroen commented Dec 29, 2016

Added these wrappers for version 1.2: ef112b6. Please let me know if this is what you had in mind, or if it needs additional changes.


hadley commented Dec 29, 2016

Looks good - thanks!


jeroen commented Dec 31, 2016

On CRAN now.
