Expected usage for Resource objects that need custom methods? #23

dbkegley · 2024-02-15T22:56:01Z

I see in the readme that we can operate on Resources using the methods defined by the base Resource classes, like find(), find_one(), get(), delete() etc. What should the behavior be for resources where we want to add custom methods that don't fit into one of the standard CRUD resource operations.

For example:

from .resources import Resource

class OAuthIntegration(Resource):
    name: str

    def get_credentials(self): # throws: TypedDict classes can contain only type annotations
        pass

edit:

I could see this also being relevant for an Rmd that I wanted to re-render. e.g.

con.content.get("1234-5678-90ab-cdef").render()

The text was updated successfully, but these errors were encountered:

nealrichardson · 2024-02-16T13:21:05Z

Right, of course you can create any methods that make sense for each object type. I put in the readme "Entities have methods that are appropriate to them" but if it would help clarify, please make a PR to expand on that and add examples.

dbkegley · 2024-02-16T14:01:07Z

Oh sorry, that part of my question wasn’t clear.
implementing additional methods on a Resource errors with:

“TypedDict classes can contain only type annotations”

nealrichardson · 2024-02-16T14:30:59Z

Ah ok. Then it sounds like Resource shouldn't inherit from TypedDict. I think that's right anyway--we want them to be more than just a typed wrapper around the JSON payload from the API, many properties will map directly through but not all.

tdstein · 2024-02-20T17:52:39Z

I've been going back and forth between TypedDict and dataclass for the Resource base.

nealrichardson · 2024-03-09T20:17:53Z

Poking at this some more, dataclass seems like it could work, though I think we'd have to think through the backwards/forwards compatibility implications (e.g. if the entities grow extra attributes in the API responses, or if you're querying an older Connect that doesn't have all fields). Other options that occur to me that may be more flexible, for better and worse:

Inherit from dict and add @propertys for accessing the fields (with type annotations).
Don't inherit from dict, put the raw API response in self._data, and use @propertys to access them.

tdstein · 2024-03-12T14:13:44Z

if the entities grow extra attributes in the API responses, or if you're querying an older Connect that doesn't have all fields

This is the caveat that led me down the TypedDict path.

I'm satisfied as long as we can support conversion from dict (.e.g., response.json()) to Resource.

nealrichardson · 2024-03-12T15:20:57Z

I'm satisfied as long as we can support conversion from dict (.e.g., response.json()) to Resource.

I think that's the easy part. So let's summarize the alternatives:

❌ subclass TypedDict: can't add methods
❌ use dataclass: too rigid to accommodate API changes
❌ subclass dict: I seemed to remember this being a bad idea usually, and this pointed out that we'll have trouble when defining a new method for update() since that's a base dict method too.
🔲 subclass UserDict: mentioned in that blog post, should investigate, sounds most promising of remaining alternatives
🔲 subclass MutableMapping: probably overkill to reimplement all of those methods
🔲 regular object, put data in _data: I'd like to preserve pd.DataFrame([list of objects]) and I'm not sure that would work, could investigate or pursue alternatives.

tdstein · 2024-03-12T20:38:55Z

I think dataclass can work using the following pattern.

from dataclasses import dataclass
from typing import List, Optional, Union
from unittest.mock import MagicMock

import requests


@dataclass
class FoobarV2024_02_0:
    field1: str
    field2: Optional[str]


@dataclass
class FoobarV2023_01_1:
    field1: str


class FoobarConverter:
    def __init__(self, version="2024.02.0") -> None:
        self.version = version

    def convert(self, instance: dict) -> Union[FoobarV2024_02_0, FoobarV2023_01_1]:
        if self.version >= "2024.02.0":
            return FoobarV2024_02_0(**instance)

        if self.version >= "2023.01.1":
            return FoobarV2023_01_1(**instance)


class Foobars:
    def __init__(self, version: str) -> None:
        self.converter = FoobarConverter(version)

    def find(self) -> List[Union[FoobarV2024_02_0, FoobarV2023_01_1]]:
        response: requests.Response = MagicMock()
        response.json.return_value = [
            {"field1": "asdf", "field2": None},
            {"field1": "asdf", "field2": None},
        ]
        return [self.converter.convert(instance) for instance in response.json()]


foobars = Foobars("2024.03.1")
items = foobars.find()
for item in items:
    print(item)


foobars = Foobars("2023.12.0")
items = foobars.find()
for item in items:
    print(item)

Running this script results in the following. The Traceback here is what we want since the mocked API is returning field2, which doesn't exist until version 2024.02.0.

We would want to provide a better user error, but hopefully you get the idea...

Running version 2024.03.1...
FoobarV2024_02_0(field1='asdf', field2=None)
FoobarV2024_02_0(field1='asdf', field2=None)
Running version 2023.12.0...
Traceback (most recent call last):
  File "/Users/me/Code/posit-sdk-py/src/posit/connect/foobars.py", line 53, in <module>
    items = foobars.find()
            ^^^^^^^^^^^^^^
  File "/Users/me/Code/posit-sdk-py/src/posit/connect/foobars.py", line 41, in find
    return [self.converter.convert(instance) for instance in response.json()]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/me/Code/posit-sdk-py/src/posit/connect/foobars.py", line 28, in convert
    return FoobarV2023_01_1(**instance)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: FoobarV2023_01_1.__init__() got an unexpected keyword argument 'field2'

nealrichardson · 2024-03-15T15:24:30Z

Let me try to summarize where we are right now. We're exploring two approaches right now: dataclass (#91, #92, #95) and dict (#94). Requirements we want to satisfy:

Ability to add methods (rules out TypedDict)
Backwards compatibility (can work with old version of Connect that has some response fields missing, as long as you don't try to access them)
Forwards compatibility (doesn't break if a newer Connect adds API response fields, important so that content on Connect that uses the SDK doesn't break if you upgrade Connect)
Attribute and type hinting for getters and other methods, for IDE autocompletion
Can easily get a data frame from API response data
Performance: can't add too much overhead since the __init__ may be called for thousands of items, but we don't need to worry about millions of items.
...

It looks like dataclass can meet these requirements, from what I see. I did a quick benchmark of ContentItem using dataclass vs. with dict, and the difference was on the order of 10ms when pulling the 5k items from our internal Connect server.

There's one other thing I can think of that I think should be a requirement, but I'm not 100% sure how to articulate it. I think we don't want to allow updating attributes of objects. As in,

user.first_name = "New name"

I think we don't want to have the local object getting mutated without it being reflected on the server. But we don't want every setattr to trigger a PATCH request--that may be surprising, and would be expensive if you're updating multiple attributes. I think it is safer to have the single .update() method as the way to modify objects.

nealrichardson · 2024-03-15T15:50:44Z

One other thought: perhaps we also keep all of the timestamp fields in objects as strings, and let users deal with parsing them if/when they need. Aside from saving the parsing cost, we wash our hands of local time zones and all of that, let users decide how they want to interpret the timestamps based on what they want to do with them.

nealrichardson · 2024-03-15T19:02:22Z

I've also poked at a third path: inheriting from abc.Mapping, with data in _data. Base class looks like this:

class Resource(Mapping):
    def __init__(
        self, data: dict, session: Optional[Session] = None, url: Optional[str] = None
    ):
        super().__init__()
        super().__setattr__("_data", data)
        super().__setattr__("session", session)
        super().__setattr__("url", url)

    def __getitem__(self, key):
        return self._data[key]

    def keys(self):
        return self._data.keys()

    def __contains__(self, key):
        return key in self._data

    def __dict__(self):
        return self._data

    def __iter__(self) -> Iterator:
        return self._data.__iter__()

    def __len__(self):
        return len(self._data)

    def __setattr__(self, name: str, value: Any) -> None:
        raise AttributeError("Cannot set attributes: use update() instead.")

    def update(self, **kwargs):
        self._data.update(kwargs)
        self.session.patch(self.url, json=kwargs)

Advantage of this over inheriting from dict: session and url never show up in your "data", whereas they're there in the dict version (e.g. when you do pd.DataFrame(list of objects).

Disadvantage: pandas alphabetically sorts the column names unless the objects inherit from dict or dataclass. I don't think this quirk should necessarily dictate our decisions, and we can write a custom to_pandas function if we cared. (That said, we could also prune session and url in the custom to_pandas and just use dicts.)

tdstein · 2024-03-15T19:21:20Z

Added my thoughts here: #96

It seemed a bit long to append to this discussion as ana additional comment.

nealrichardson · 2024-03-18T15:42:47Z

Solved by #94 among others

tdstein added this to the 0.1.0 milestone Feb 20, 2024

tdstein removed this from the 0.1.0 milestone Feb 20, 2024

nealrichardson mentioned this issue Mar 9, 2024

Add .update() method to Content #89

Closed

tdstein mentioned this issue Mar 12, 2024

feat: adds base class definition for resources #83

Merged

tdstein self-assigned this Mar 12, 2024

nealrichardson closed this as completed Mar 18, 2024

nealrichardson added this to the 0.1.0 milestone Mar 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expected usage for Resource objects that need custom methods? #23

Expected usage for Resource objects that need custom methods? #23

dbkegley commented Feb 15, 2024 •

edited

Loading

nealrichardson commented Feb 16, 2024

dbkegley commented Feb 16, 2024

nealrichardson commented Feb 16, 2024

tdstein commented Feb 20, 2024

nealrichardson commented Mar 9, 2024

tdstein commented Mar 12, 2024

nealrichardson commented Mar 12, 2024

tdstein commented Mar 12, 2024 •

edited

Loading

nealrichardson commented Mar 15, 2024

nealrichardson commented Mar 15, 2024

nealrichardson commented Mar 15, 2024

tdstein commented Mar 15, 2024

nealrichardson commented Mar 18, 2024

Expected usage for Resource objects that need custom methods? #23

Expected usage for Resource objects that need custom methods? #23

Comments

dbkegley commented Feb 15, 2024 • edited Loading

nealrichardson commented Feb 16, 2024

dbkegley commented Feb 16, 2024

nealrichardson commented Feb 16, 2024

tdstein commented Feb 20, 2024

nealrichardson commented Mar 9, 2024

tdstein commented Mar 12, 2024

nealrichardson commented Mar 12, 2024

tdstein commented Mar 12, 2024 • edited Loading

nealrichardson commented Mar 15, 2024

nealrichardson commented Mar 15, 2024

nealrichardson commented Mar 15, 2024

tdstein commented Mar 15, 2024

nealrichardson commented Mar 18, 2024

dbkegley commented Feb 15, 2024 •

edited

Loading

tdstein commented Mar 12, 2024 •

edited

Loading