feat: add functions to convert content lists to data frames #470

toph-allen · 2025-10-16T18:51:37Z

Intent

Allow the list of content returned by search_content() to be converted to a data frame.

Fixes #451

Approach

Add a class name to the list returned by search_content()
Provide as_tibble() and as.data.frame() methods for that function, exactly following the pattern used in integrations.R.

Checklist

Does this change update NEWS.md (referencing the connected issue if necessary)? [not needed]
Does this change need documentation? Have you run devtools::document()?

jonkeane · 2025-10-16T19:07:53Z

R/content.R

+#' @export
+as_tibble.connect_content_list <- function(x, ...) {
+  content_data <- purrr::map(x, "content")
+  parse_connectapi_typed(content_data, connectapi_ptypes$content)


Is now the time to 🚮 the typing?

I'm not sure. 😣 I know you want me to remove — I would keep it, personally!

I'll try parsing it un-typed and see how that feels.

Parsing without providing a prototype (using parse_connectapi) still attempts to parse columns based on their contents, so we still get columns with e.g. logical, numeric, character. But those could technically be inconsistent if the lists are short.

Date times parse as character, which would shift the burden onto our customers to figure out the correct way to parse. I'm not sure if the logic we have about parsing date times is necessary complexity or not, but if it is, then it seems like it adds value and utility to handle that for the user.

Perhaps I'm thinking about this wrong though — perhaps we should just parse everything as a character vector — is that what you're thinking?

IMO we should not scope creep the type conversion business into this PR. It's worth a discussion but it can be separate.

The JSON responses from the server are generally already correctly decoded (floats, ints, logicals, characters, etc all just work). There are a handful of fields that don't automatically parse, but the vast majority do just fine.

Thinking about how we actually want our core search_content function to work: the object it returns is useful, so if there are fields where the types are off, we should fix them before we even get to converting them to a dataframe. Somewhere like

connectapi/R/content.R

Lines 15 to 24 in b55a81b

#' @description Initialize this content.

#' @param connect The `Connect` instance.

#' @param content The content data.

initialize = function(connect, content) {

validate_R6_class(connect, "Connect")

self$connect <- connect

# TODO: need to check that content has

# at least guid, url, title to be functional

self$content <- content

},

would be a good place to put that type munging (but again, only for the fields that aren't already converted for us by httr / jsonlite parsing). Alternatively, our HTTP interface also has a parser argument, I haven't dug into detail if it's intended use is something like this, but that might also be the right place for this.

For the purposes of this PR: maybe it makes sense to keep the parsing here where it is and we'll rip it out when we rip all of it out soon (I hope?). Or maybe it's not that big of a deal that the two(and if I'm looking at this correctly it is only two?) columns that are strings-of-datetimes aren't correctly translated in this case?

Thanks so much for this! I had realized I needed to ask about… what would replace it or perform that sort of function.

I definitely agree that the better place for it to happen is either during JSON parsing or during object creation. Doing it during conversion to a data frame is the wrong place for a server-object-centric package!

Happy to rip it out from here soon — I would prefer to do so after doing a bit of digging to see how much is actually needed. Like, if the only thing that gets parsed wrong at JSON-parse time are datetimes, it's not really a big deal. (At a cursory look, in the data frame conversion, an untyped conversion has a fair number of differences, but mostly in numeric types.)

I'll create an issue to track this though.

DESCRIPTION

jonkeane · 2025-10-16T19:11:43Z

R/content.R

+  class(content_list) <- c("connect_content_list", class(content_list))
+  content_list


Do we need to preface this class with connect_? That seems like unnecessary typing for us, and if we want to communicate this is from Connect, we can do that in the print method(s) on this object. Our R6 classes are not for example all prefaced with Connect_

I hate content, as always, but I don't think that's solvable here, but so much do I wish we had a better label for this.

The S3 classes we've been playing with so far (just integrations and @karawoo's work on users right now) all use a connect_ prefix.

My thinking is that S3 classes are more common than R6 classes and are a lot less strict in their form, so having more verbosely & explicitly-named classes here is better.

There wasn't much discussion of this approach at the time, so I didn't think it was controversial! (integrations: #431, #437, #440, user: #462)

If we conclude that it should just be named content_list, I feel like I should also change the name of the integrations classes.

I've seen S3 classes commonly be prefixed by the package name since they don't show up namespaced when you check them a la inherits(x, class). I think this is fine and the verbosity doesn't really matter here, across the API at worst you'll have three words right? connect_<entity>(_list)

nods ok, I'm fine with leaving it in if it's systematic (and not unique to connectapi)

…com/posit-dev/connectapi into toph/as.data.frame-for-content-list

nealrichardson

I think the methods are generally reasonable, but I have some concerns about the performance that I think means we should take a small step back in thinking about this container class.

nealrichardson · 2025-10-17T12:25:43Z

R/content.R

+  optional = FALSE,
+  ...
+) {
+  content_tbl <- as_tibble(x)


Logically this seems backwards since a tibble just has annotations on top of a data.frame, but I'm guessing it would be a bigger change to unwrap everything to sort that out.

Yeah, it does seem backwards. This is because parse_connectapi_typed produces tibbles by default. I believe there was discussion of this in an integrations PR, lemme find it.

Yep: #426 (comment) :)

nealrichardson · 2025-10-17T12:27:21Z

R/content.R

+#'
+#' @return A `data.frame` with one row per content item.
+#' @export
+as.data.frame.connect_content_list <- function(


How would this method differ for any other list-of-records from the API? This makes me think you want to have some inheritance, set the S3 class as something like c("connect_content_list", "connect_list"), and then you define the method on connect_list so any list-type container would be supported.

This is probably correct, but I haven't gone that route yet because I don't want to over-generalize before we have enough instances of the pattern to know what the caveats are. Stuff like… the difference between S3 and R6 classes (see #470 (comment)), whether we want to rely on typed parsing, etc.

I don't have a good sense for when the right moment to generalize is, but my gut is that it's a little in the future.

nealrichardson · 2025-10-17T12:27:41Z

R/content.R

+#' @return A tibble with one row per content item.
+#' @export
+as_tibble.connect_content_list <- function(x, ...) {
+  content_data <- purrr::map(x, "content")


I don't think I understand the data structure here. What does this do?

The Content R6 class stores the API response under a content property here:

connectapi/R/content.R

Line 23 in b55a81b

self$content <- content

The parse_connectapi* functions expect a list of objects (i.e. shaped like the original JSON server response), so this step is reconstructing that. Probably worth a comment, eh?

With newer S3 class objects e.g. connect_integration_list and @karawoo's work in progress on connect_user, the objects are lists with the properties at the top level, so you don't need this extra step.

(For example, to get the guid of an S3 connect_integration you do object$guid, whereas to get the GUID of an R6 Content object you have to do object$content$guid.)

nealrichardson · 2025-10-17T12:32:41Z

R/content.R

  )

-  purrr::map(res, function(x) {
+  content_list <- purrr::map(res, function(x) {


Ok this answers my question below about what is in the data structure. You might want to reconsider this eager conversion to R6 objects. It's not free, and this list of records might be huge. Suppose it is only 1ms to create an R6 object (I've seen it be a few ms), but if you've got 1000 content items, that's a second of latency you've added. And most cases you probably won't use the vast majority of those R6 objects (and certainly not when you call as.data.frame on it).

Instead, you could keep this data structure as a bare list of content records (+ the client object in an attribute), and define a [[ method that extracts and wraps in the R6 object.

Experimentally, this does add some latency — with about 7000 items I saw about a second — it seems manageable, if not ideal.

Would defining a [[ method just shift that latency to other calls?

Returning a list of R6 objects allows all of the existing R6 content code to just work on that list's content — would defining [[ make that contingent on using that function to access the items? Like, would purrr::map(content_list, lock_content) work?

The change to returning R6 objects isn't a part of this PR, so I'd like to create an issue to track this separately.

IME the lapply functions work by [[ing elements so yes, it would work--though that's an empirical question here.

Would defining a [[ method just shift that latency to other calls?

It defers it, yes, but the point is that in many cases, you don't ever need to make the R6 object, so you never pay that penalty. To make a data.frame as you are doing here, for example, you are making 7000 R6 objects and then immediately iterating over them to extract the data back out of it.

nealrichardson · 2025-10-17T12:33:55Z

R/content.R

+#' @export
+as_tibble.connect_content_list <- function(x, ...) {
+  content_data <- purrr::map(x, "content")
+  parse_connectapi_typed(content_data, connectapi_ptypes$content)


IMO we should not scope creep the type conversion business into this PR. It's worth a discussion but it can be separate.

karawoo

This looks good to me. I do agree that #474 is worth looking into further, but fine to happen separately from this PR.

toph-allen · 2025-10-28T21:08:02Z

Sounds good! Thanks @karawoo — we'll use #474 to consider performance around this.

toph-allen added 2 commits October 16, 2025 14:45

add functions to convert content lists to data frames

57f5519

test data frame conversion.

a4db014

toph-allen requested review from jonkeane, karawoo, marcosnav and nealrichardson October 16, 2025 18:51

Update documentation

0cf14ea

jonkeane reviewed Oct 16, 2025

View reviewed changes

DESCRIPTION Outdated Show resolved Hide resolved

jonkeane reviewed Oct 16, 2025

View reviewed changes

toph-allen and others added 3 commits October 16, 2025 18:28

revert roxygennote to 7.3.2

a8f4294

Merge branch 'toph/as.data.frame-for-content-list' of https://github.…

67b3298

…com/posit-dev/connectapi into toph/as.data.frame-for-content-list

Update documentation

62ef6ba

nealrichardson reviewed Oct 17, 2025

View reviewed changes

This was referenced Oct 17, 2025

Parsing should happen correctly before object creation, not during data frame conversion #473

Open

Consider other options for search_content() return type (vs converting all to R6 objects) #474

Open

toph-allen requested review from jonkeane and nealrichardson October 17, 2025 16:03

karawoo approved these changes Oct 28, 2025

View reviewed changes

toph-allen merged commit 787eeed into main Oct 28, 2025

toph-allen deleted the toph/as.data.frame-for-content-list branch October 28, 2025 21:08

toph-allen mentioned this pull request Oct 28, 2025

fix: update get_content ptype and docs #468

Open

2 tasks

	#' @description Initialize this content.
	#' @param connect The `Connect` instance.
	#' @param content The content data.
	initialize = function(connect, content) {
	validate_R6_class(connect, "Connect")
	self$connect <- connect
	# TODO: need to check that content has
	# at least guid, url, title to be functional
	self$content <- content
	},

		class(content_list) <- c("connect_content_list", class(content_list))
		content_list

feat: add functions to convert content lists to data frames #470

feat: add functions to convert content lists to data frames #470

Conversation

toph-allen commented Oct 16, 2025

Intent

Approach

Checklist

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

toph-allen Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nealrichardson left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

toph-allen Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

toph-allen Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

karawoo left a comment

Choose a reason for hiding this comment

Uh oh!

toph-allen commented Oct 28, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

toph-allen Oct 16, 2025 •

edited

Loading

toph-allen Oct 17, 2025 •

edited

Loading

toph-allen Oct 17, 2025 •

edited

Loading