feat: add functions to convert content lists to data frames #470
```diff
@@ -1504,7 +1504,7 @@ get_content_packages <- function(content) {
 #' usually not what you want.
 #'
 #' @return
-#' A list of [Content] objects.
+#' A list of [Content] objects, of class "connect_content_list"
 #'
 #' @details
 #' Please see https://docs.posit.co/connect/api/#get-/v1/search/content for more
@@ -1548,9 +1548,12 @@ search_content <- function(
     limit = limit
   )
 
-  purrr::map(res, function(x) {
+  content_list <- purrr::map(res, function(x) {
     Content$new(client, x)
   })
+
+  class(content_list) <- c("connect_content_list", class(content_list))
+  content_list
```
**Comment on lines +1555 to +1556:**

> Do we need to preface this class with …? I hate …

> The S3 classes we've been playing with so far (just integrations and @karawoo's work on users right now) all use a … My thinking is that S3 classes are more common than R6 classes and are a lot less strict in their form, so having more verbosely and explicitly-named classes here is better. There wasn't much discussion of this approach at the time, so I didn't think it was controversial! (Integrations: #431, #437, #440; user: #462.) If we conclude that it should just be named …

> I've seen S3 classes commonly be prefixed by the package name, since they don't show up namespaced when you check them à la …

> *nods* OK, I'm fine with leaving it in if it's systematic (and not unique to connectapi).
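As background to the naming discussion above, the un-namespaced nature of S3 class strings is easy to see in isolation. A minimal sketch (only the class name comes from the PR; everything else is illustrative):

```r
# Hypothetical illustration: S3 class attributes are bare strings with no
# package qualifier, so a "connect_" prefix is the only thing tying the
# class back to connectapi.
x <- structure(list(), class = "connect_content_list")

class(x)                             # no namespace, just the string
inherits(x, "connect_content_list")  # TRUE; the usual way callers test it
```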
```diff
 }
 
 .search_content <- function(
@@ -1573,3 +1576,45 @@ search_content <- function(
 
   client$GET(path, query = query)
 }
```
```diff
+
+#' Convert content list to a data frame
+#'
+#' @description
+#' Converts a list returned by [search_content()] into a data frame.
+#'
+#' @param x A `connect_content_list` object (from [search_content()]).
+#' @param row.names Passed to [base::as.data.frame()].
+#' @param optional Passed to [base::as.data.frame()].
+#' @param ... Passed to [base::as.data.frame()].
+#'
+#' @return A `data.frame` with one row per content item.
+#' @export
+as.data.frame.connect_content_list <- function(
```
> How would this method differ for any other list-of-records from the API? This makes me think you want to have some inheritance: set the S3 class as something like …

> This is probably correct, but I haven't gone that route yet because I don't want to over-generalize before we have enough instances of the pattern to know what the caveats are. Stuff like the difference between S3 and R6 classes (see #470 (comment)), whether we want to rely on typed parsing, etc. I don't have a good sense for when the right moment to generalize is, but my gut is that it's a little in the future.
```diff
+  x,
+  row.names = NULL, # nolint
+  optional = FALSE,
+  ...
+) {
+  content_tbl <- as_tibble(x)
```
> Logically this seems backwards, since a tibble just has annotations on top of a data.frame, but I'm guessing it would be a bigger change to unwrap everything to sort that out.

> Yeah, it does seem backwards. This is because … Yep: #426 (comment) :)
```diff
+  as.data.frame(
+    content_tbl,
+    row.names = row.names,
+    optional = optional,
+    ...
+  )
+}
+
+#' Convert content list to a tibble
+#'
+#' @description
+#' Converts a list returned by [search_content()] to a tibble.
+#'
+#' @param x A `connect_content_list` object.
+#' @param ... Unused.
+#'
+#' @return A tibble with one row per content item.
+#' @export
+as_tibble.connect_content_list <- function(x, ...) {
+  content_data <- purrr::map(x, "content")
```
> I don't think I understand the data structure here. What does this do?

> The … (line 23 in b55a81b). The … With newer S3 class objects, e.g. … (For example, to get the guid of an S3 ….)
```diff
+  parse_connectapi_typed(content_data, connectapi_ptypes$content)
```
> Is now the time to 🚮 the typing?

> I'm not sure. 😣 I know you want me to remove …; I would keep it, personally! I'll try parsing it un-typed and see how that feels.

> Parsing without providing a prototype (using …): date times parse as character, which would shift the burden onto our customers to figure out the correct way to parse. I'm not sure if the logic we have about parsing date times is necessary complexity or not, but if it is, then it seems like it adds value and utility to handle that for the user. Perhaps I'm thinking about this wrong, though: perhaps we should just parse everything as a character vector. Is that what you're thinking?

> IMO we should not scope-creep the type conversion business into this PR. It's worth a discussion, but it can be separate.

> The JSON responses from the server are generally already correctly decoded (floats, ints, logicals, characters, etc. all just work). There are a handful of fields that don't automatically parse, but the vast majority do just fine. Thinking about how we actually want our core … (lines 15 to 24 in b55a81b) … the `parser` argument: I haven't dug into detail whether its intended use is something like this, but that might also be the right place for this. For the purposes of this PR: maybe it makes sense to keep the parsing here where it is, and we'll rip it out when we rip all of it out soon (I hope?). Or maybe it's not that big of a deal that the two (and if I'm looking at this correctly, it is only two?) columns that are strings-of-datetimes aren't correctly translated in this case?

> Thanks so much for this! I had realized I needed to ask about … what would replace it or perform that sort of function. I definitely agree that the better place for it to happen is either during JSON parsing or during object creation. Doing it during conversion to a data frame is the wrong place for a server-object-centric package! Happy to rip it out from here soon; I would prefer to do so after doing a bit of digging to see how much is actually needed. Like, if the only thing that gets parsed wrong at JSON-parse time is datetimes, it's not really a big deal. (At a cursory look, in the data frame conversion, an untyped conversion has a fair number of differences, but mostly in numeric types.) I'll create an issue to track this, though.

```diff
+}
```
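The S3 dispatch the PR sets up can be illustrated end-to-end with stand-in data. This is a sketch only: real `connect_content_list` elements are `Content` R6 objects, and the package's actual method goes through `as_tibble()` and typed parsing, none of which is reproduced here.

```r
# Stand-in records mimicking the shape the method relies on: each element
# exposes its raw record under a `content` field, as the R6 objects do.
content_list <- list(
  list(content = list(guid = "a1", name = "report-one")),
  list(content = list(guid = "b2", name = "report-two"))
)
class(content_list) <- c("connect_content_list", class(content_list))

# Simplified, untyped stand-in for the PR's method: pull out each record
# and row-bind into a data frame.
as.data.frame.connect_content_list <- function(x, row.names = NULL, optional = FALSE, ...) {
  records <- lapply(unclass(x), `[[`, "content")
  do.call(rbind, lapply(records, as.data.frame))
}

content_df <- as.data.frame(content_list)  # dispatches on the S3 class
```

Here `content_df` ends up with one row per content item and `guid` and `name` columns, which is the behavior the real method provides (with richer typing) for search results.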
Some generated files are not rendered by default.
> OK, this answers my question below about what is in the data structure. You might want to reconsider this eager conversion to R6 objects. It's not free, and this list of records might be huge. Suppose it is only 1ms to create an R6 object (I've seen it be a few ms); if you've got 1000 content items, that's a second of latency you've added. And in most cases you probably won't use the vast majority of those R6 objects (and certainly not when you call `as.data.frame` on it).
>
> Instead, you could keep this data structure as a bare list of content records (plus the client object in an attribute), and define a `[[` method that extracts and wraps in the R6 object.

> Experimentally, this does add some latency: with about 7000 items I saw about a second. It seems manageable, if not ideal.
>
> Would defining a `[[` method just shift that latency to other calls? Returning a list of R6 objects allows all of the existing R6 content code to just work on that list's content. Would defining `[[` make that contingent on using that function to access the items? Like, would `purrr::map(content_list, lock_content)` work?
>
> The change to returning R6 objects isn't a part of this PR, so I'd like to create an issue to track this separately.
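The lazy alternative under discussion might look roughly like the following. This is a purely hypothetical sketch: `wrap_content` stands in for `Content$new`, the class name `lazy_content_list` is invented, and nothing like this exists in the PR.

```r
# Keep bare records and attach the client as an attribute, instead of
# eagerly building one R6 object per record.
new_lazy_content_list <- function(records, client) {
  structure(records, client = client, class = "lazy_content_list")
}

# Stand-in for Content$new(client, record).
wrap_content <- function(client, record) {
  list(client = client, content = record)
}

# Wrap only on extraction, so untouched elements never pay the cost.
`[[.lazy_content_list` <- function(x, i) {
  record <- NextMethod()
  wrap_content(attr(x, "client"), record)
}

cl <- new_lazy_content_list(
  list(list(guid = "a1"), list(guid = "b2")),
  client = "stub-client"
)
item <- cl[[1]]  # only this element gets wrapped
```

Since `lapply`-style iteration extracts elements with `[[`, each accessed element would be wrapped on demand, which bears on the question above about whether `purrr::map(content_list, lock_content)` would still work.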
> IME the lapply functions work by `[[`ing elements, so yes, it would work, though that's an empirical question here.
>
> It defers it, yes, but the point is that in many cases you don't ever need to make the R6 object, so you never pay that penalty. To make a data.frame as you are doing here, for example, you are making 7000 R6 objects and then immediately iterating over them to extract the data back out of them.

> #474