Remove or drastically overhaul type parsing even when producing data frames #483

@toph-allen

Description

Refactor connectapi's type parsing logic, specifically for functions like get_content() and get_usage(), as it's very slow with large objects.

My hunch is that parse_connectapi iterates over the JSON as a list (i.e. row-wise) and tries to preserve nested JSON as list-columns. Experimentally, just converting everything with jsonlite::fromJSON(flatten = TRUE) speeds up, e.g., get_content() from ~10.5s to ~3.5s on Dogfood.
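To illustrate what flatten = TRUE does to nested fields, here is a minimal sketch. The JSON payload and its field names are made up for illustration and are not the actual Connect API response shape.

```r
library(jsonlite)

# Hypothetical payload standing in for an API response; the field names
# are illustrative assumptions, not the real Connect schema.
json <- '[
  {"guid": "a1", "name": "report", "owner": {"username": "alice", "id": 1}},
  {"guid": "b2", "name": "app", "owner": {"username": "bob", "id": 2}}
]'

# Default parsing keeps the nested "owner" object as a single
# data-frame-valued column.
nested <- fromJSON(json)

# flatten = TRUE expands nested objects into top-level columns named
# "<parent>.<child>", so every column is an ordinary vector.
flat <- fromJSON(json, flatten = TRUE)

names(nested)
names(flat)
```

With this input the nested result has a single owner column, while the flattened result has separate owner.username and owner.id columns.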

It would probably be much faster to just flatten and convert everything to a character data frame. I think it's important to retain date parsing, and perhaps specific columns should be parsed as integers or character vectors, but I'm not sure what the best way to manage that is (we'd also like to remove the need to maintain a list of ptypes).
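One possible shape for this, as a rough sketch only: flatten, coerce every column to character, then parse timestamps by a naming rule rather than a maintained ptype list. The "_time" suffix rule, the payload, and the timestamp format are all assumptions for illustration. Real responses may use other formats (e.g. fractional seconds) and would need their own handling.

```r
library(jsonlite)

# Hypothetical payload; field names and timestamp format are assumptions.
json <- '[
  {"guid": "a1", "created_time": "2024-01-15T10:30:00Z", "owner": {"id": 1}},
  {"guid": "b2", "created_time": "2024-02-20T08:00:00Z", "owner": {"id": 2}}
]'

df <- fromJSON(json, flatten = TRUE)

# Coerce every column to character so no per-column ptype is needed.
df[] <- lapply(df, as.character)

# Assumed rule: columns whose names end in "_time" hold UTC timestamps.
time_cols <- grep("_time$", names(df), value = TRUE)
df[time_cols] <- lapply(
  df[time_cols],
  function(x) as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
)

str(df)
```

A name-based rule like this avoids a hand-maintained column list, at the cost of silently leaving any date column that does not match the pattern as character.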

Closely related to #473: that issue covers removing the parsing when creating a list of objects, whereas this one covers the existing data.frame-returning functions.
