Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expected usage for Resource objects that need custom methods? #23

Closed
dbkegley opened this issue Feb 15, 2024 · 13 comments
Closed

Expected usage for Resource objects that need custom methods? #23

dbkegley opened this issue Feb 15, 2024 · 13 comments
Assignees
Milestone

Comments

@dbkegley
Copy link
Collaborator

dbkegley commented Feb 15, 2024

I see in the readme that we can operate on Resources using the methods defined by the base Resource classes, like find(), find_one(), get(), delete() etc. What should the behavior be for resources where we want to add custom methods that don't fit into one of the standard CRUD resource operations.

For example:

from .resources import Resource

class OAuthIntegration(Resource):
    name: str

    def get_credentials(self): # throws: TypedDict classes can contain only type annotations
        pass

edit:

I could see this also being relevant for an Rmd that I wanted to re-render. e.g.

con.content.get("1234-5678-90ab-cdef").render()
@nealrichardson
Copy link
Collaborator

Right, of course you can create any methods that make sense for each object type. I put in the readme "Entities have methods that are appropriate to them" but if it would help clarify, please make a PR to expand on that and add examples.

@dbkegley
Copy link
Collaborator Author

Oh sorry, that part of my question wasn’t clear.
implementing additional methods on a Resource errors with:

“TypedDict classes can contain only type annotations”

@nealrichardson
Copy link
Collaborator

Ah ok. Then it sounds like Resource shouldn't inherit from TypedDict. I think that's right anyway--we want them to be more than just a typed wrapper around the JSON payload from the API, many properties will map directly through but not all.

@tdstein tdstein added this to the 0.1.0 milestone Feb 20, 2024
@tdstein
Copy link
Collaborator

tdstein commented Feb 20, 2024

I've been going back and forth between TypedDict and dataclass for the Resource base.

@tdstein tdstein removed this from the 0.1.0 milestone Feb 20, 2024
@nealrichardson
Copy link
Collaborator

Poking at this some more, dataclass seems like it could work, though I think we'd have to think through the backwards/forwards compatibility implications (e.g. if the entities grow extra attributes in the API responses, or if you're querying an older Connect that doesn't have all fields). Other options that occur to me that may be more flexible, for better and worse:

  • Inherit from dict and add @propertys for accessing the fields (with type annotations).
  • Don't inherit from dict, put the raw API response in self._data, and use @propertys to access them.

@tdstein
Copy link
Collaborator

tdstein commented Mar 12, 2024

if the entities grow extra attributes in the API responses, or if you're querying an older Connect that doesn't have all fields

This is the caveat that led me down the TypedDict path.

I'm satisfied as long as we can support conversion from dict (.e.g., response.json()) to Resource.

@nealrichardson
Copy link
Collaborator

I'm satisfied as long as we can support conversion from dict (.e.g., response.json()) to Resource.

I think that's the easy part. So let's summarize the alternatives:

❌ subclass TypedDict: can't add methods
❌ use dataclass: too rigid to accommodate API changes
❌ subclass dict: I seemed to remember this being a bad idea usually, and this pointed out that we'll have trouble when defining a new method for update() since that's a base dict method too.
🔲 subclass UserDict: mentioned in that blog post, should investigate, sounds most promising of remaining alternatives
🔲 subclass MutableMapping: probably overkill to reimplement all of those methods
🔲 regular object, put data in _data: I'd like to preserve pd.DataFrame([list of objects]) and I'm not sure that would work, could investigate or pursue alternatives.

@tdstein
Copy link
Collaborator

tdstein commented Mar 12, 2024

I think dataclass can work using the following pattern.

from dataclasses import dataclass
from typing import List, Optional, Union
from unittest.mock import MagicMock

import requests


@dataclass
class FoobarV2024_02_0:
    field1: str
    field2: Optional[str]


@dataclass
class FoobarV2023_01_1:
    field1: str


class FoobarConverter:
    def __init__(self, version="2024.02.0") -> None:
        self.version = version

    def convert(self, instance: dict) -> Union[FoobarV2024_02_0, FoobarV2023_01_1]:
        if self.version >= "2024.02.0":
            return FoobarV2024_02_0(**instance)

        if self.version >= "2023.01.1":
            return FoobarV2023_01_1(**instance)


class Foobars:
    def __init__(self, version: str) -> None:
        self.converter = FoobarConverter(version)

    def find(self) -> List[Union[FoobarV2024_02_0, FoobarV2023_01_1]]:
        response: requests.Response = MagicMock()
        response.json.return_value = [
            {"field1": "asdf", "field2": None},
            {"field1": "asdf", "field2": None},
        ]
        return [self.converter.convert(instance) for instance in response.json()]


foobars = Foobars("2024.03.1")
items = foobars.find()
for item in items:
    print(item)


foobars = Foobars("2023.12.0")
items = foobars.find()
for item in items:
    print(item)

Running this script results in the following. The Traceback here is what we want since the mocked API is returning field2, which doesn't exist until version 2024.02.0.

We would want to provide a better user error, but hopefully you get the idea...

Running version 2024.03.1...
FoobarV2024_02_0(field1='asdf', field2=None)
FoobarV2024_02_0(field1='asdf', field2=None)
Running version 2023.12.0...
Traceback (most recent call last):
  File "/Users/me/Code/posit-sdk-py/src/posit/connect/foobars.py", line 53, in <module>
    items = foobars.find()
            ^^^^^^^^^^^^^^
  File "/Users/me/Code/posit-sdk-py/src/posit/connect/foobars.py", line 41, in find
    return [self.converter.convert(instance) for instance in response.json()]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/me/Code/posit-sdk-py/src/posit/connect/foobars.py", line 28, in convert
    return FoobarV2023_01_1(**instance)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: FoobarV2023_01_1.__init__() got an unexpected keyword argument 'field2'

@nealrichardson
Copy link
Collaborator

Let me try to summarize where we are right now. We're exploring two approaches right now: dataclass (#91, #92, #95) and dict (#94). Requirements we want to satisfy:

  • Ability to add methods (rules out TypedDict)
  • Backwards compatibility (can work with old version of Connect that has some response fields missing, as long as you don't try to access them)
  • Forwards compatibility (doesn't break if a newer Connect adds API response fields, important so that content on Connect that uses the SDK doesn't break if you upgrade Connect)
  • Attribute and type hinting for getters and other methods, for IDE autocompletion
  • Can easily get a data frame from API response data
  • Performance: can't add too much overhead since the __init__ may be called for thousands of items, but we don't need to worry about millions of items.
  • ...

It looks like dataclass can meet these requirements, from what I see. I did a quick benchmark of ContentItem using dataclass vs. with dict, and the difference was on the order of 10ms when pulling the 5k items from our internal Connect server.

There's one other thing I can think of that I think should be a requirement, but I'm not 100% sure how to articulate it. I think we don't want to allow updating attributes of objects. As in,

user.first_name = "New name"

I think we don't want to have the local object getting mutated without it being reflected on the server. But we don't want every setattr to trigger a PATCH request--that may be surprising, and would be expensive if you're updating multiple attributes. I think it is safer to have the single .update() method as the way to modify objects.

@nealrichardson
Copy link
Collaborator

One other thought: perhaps we also keep all of the timestamp fields in objects as strings, and let users deal with parsing them if/when they need. Aside from saving the parsing cost, we wash our hands of local time zones and all of that, let users decide how they want to interpret the timestamps based on what they want to do with them.

@nealrichardson
Copy link
Collaborator

I've also poked at a third path: inheriting from abc.Mapping, with data in _data. Base class looks like this:

class Resource(Mapping):
    def __init__(
        self, data: dict, session: Optional[Session] = None, url: Optional[str] = None
    ):
        super().__init__()
        super().__setattr__("_data", data)
        super().__setattr__("session", session)
        super().__setattr__("url", url)

    def __getitem__(self, key):
        return self._data[key]

    def keys(self):
        return self._data.keys()

    def __contains__(self, key):
        return key in self._data

    def __dict__(self):
        return self._data

    def __iter__(self) -> Iterator:
        return self._data.__iter__()

    def __len__(self):
        return len(self._data)

    def __setattr__(self, name: str, value: Any) -> None:
        raise AttributeError("Cannot set attributes: use update() instead.")

    def update(self, **kwargs):
        self._data.update(kwargs)
        self.session.patch(self.url, json=kwargs)

Advantage of this over inheriting from dict: session and url never show up in your "data", whereas they're there in the dict version (e.g. when you do pd.DataFrame(list of objects).

Disadvantage: pandas alphabetically sorts the column names unless the objects inherit from dict or dataclass. I don't think this quirk should necessarily dictate our decisions, and we can write a custom to_pandas function if we cared. (That said, we could also prune session and url in the custom to_pandas and just use dicts.)

@tdstein
Copy link
Collaborator

tdstein commented Mar 15, 2024

Added my thoughts here: #96

It seemed a bit long to append to this discussion as ana additional comment.

@nealrichardson
Copy link
Collaborator

Solved by #94 among others

@nealrichardson nealrichardson added this to the 0.1.0 milestone Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants