This repository has been archived by the owner on Nov 5, 2019. It is now read-only.

Add DictionaryStorage class for #319. #344

Closed
wants to merge 1 commit into from


theacodes
Contributor

DictionaryStorage - implements an optionally-locked storage over a dictionary-like object.

Additionally:

  • Storage now includes optional locking logic. Previously this was implemented in subclasses.
  • Removed flask_util.FlaskSessionStorage and replaced it with DictionaryStorage.

@theacodes
Contributor Author

Assigned to @dhermes for initial review.

@theacodes
Contributor Author

/cc @waprin

@dhermes
Contributor

dhermes commented Nov 24, 2015

sphinx-apidoc generated changes that are not checked in to version control.

w00t, I made that! Run tox -e docs locally and add whatever newly generated docs appear (for your new modules)

What is #319 anyway?

@dhermes
Contributor

dhermes commented Nov 24, 2015

@jonparrott Why are you always sending PRs with so much stuff? Can't we break them into tiny digestible pieces? Reviewers are people too 😁

@theacodes
Contributor Author

Sorry! I tried to make this one pretty small. Is there a better way for me to break this up? I'm happy to do so.

The only thing I can think of is to split it into separate PRs for LockedStorage, DictionaryStorage, updates to File storage, and updates to flask_util. But they're all quite intertwined.

@theacodes
Contributor Author

(I feel like this one just looks big because of the tests. Really I'm just shuffling code around)

@dhermes
Contributor

dhermes commented Nov 24, 2015

I was trying to be complain-y and funny at once. I can handle it.

# limitations under the License.

"""Dictionary storage for OAuth2 Credentials.
"""


@theacodes
Contributor Author

I was trying to be complain-y and funny at once

You're not a person, you're a lobster.

@theacodes
Contributor Author

Docs fixed.

@dhermes
Contributor

dhermes commented Nov 24, 2015

👍 on the lobster

If self._key is a callable, it will return the result of calling
self._key.
"""
if callable(self._key):


@dhermes
Contributor

dhermes commented Nov 25, 2015

Finished review. This is what I would've liked to have seen

  • 1 PR just to factor LockedStorage out of oauth2client.file.Storage
  • 1 PR just to implement DictionaryStorage on top of LockedStorage
  • 1 PR just to replace FlaskSessionStorage with DictionaryStorage

Boiling the ocean in a commit makes it less likely to be reviewed quickly and increases the chances that issues/problems will slip through the review.

@theacodes
Contributor Author

I can still do that if it's easier for you.

@dhermes
Contributor

dhermes commented Nov 25, 2015

I can still do that if it's easier for you.

No worries, review wasn't too hard. Just file it away for future code changes.

@theacodes
Contributor Author

No worries, review wasn't too hard. Just file it away for future code changes.

You got it. I thought of adding just locked storage and dictionary storage then doing another PR to refactor file storage, but it was such a small change.


Args:
dictionary: A dictionary or dictionary-like object.
key: A hashable or a function returning a hashable. The credentials


# See the License for the specific language governing permissions and
# limitations under the License.

"""Oauth2client.file tests


@dhermes
Contributor

dhermes commented Nov 30, 2015

@jonparrott Not sure why our discussion got folded under by GitHub. As I recall, Storage was done very quickly with no real design discussion. (The current design of almost all of oauth2client is ad-hoc and can likely be re-thought.) We can release breaking changes with a 2.0 or a 1.6 (not sure what @nathanielmanistaatgoogle thinks about which is preferable)

@theacodes
Contributor Author

Yeah, let's keep the discussion above the fold. GitHub collapses comments on outdated commits.

If @nathanielmanistaatgoogle is willing to allow drastic, breaking changes, IMO storage could be more easily expressed with something like this:

import threading


# Could likely just be dropped altogether.
class Storage(object):
    def get(self):
        abstract()

    def set(self, credentials):
        abstract()

    def delete(self):
        abstract()


class DictionaryStorage(Storage):
    # Implements get, set, and delete
    pass


class LockedStorage(Storage):
    def __init__(self, storage, lock=None):
        self._storage = storage

        if lock is None:
            lock = threading.Lock()

        self._lock = lock

    def get(self):
        with self._lock:
            return self._storage.get()

    # And so on...


# Usage

unlocked_storage = DictionaryStorage({}, key='credentials')
locked_storage = LockedStorage(unlocked_storage)
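
A fuller version of this proposal might look like the following. This is only a sketch: the bodies of the DictionaryStorage methods, the `set` and `delete` wrappers on LockedStorage, and the use of `NotImplementedError` in place of `abstract()` are assumptions made for illustration, not code from this PR.

```python
import threading


class Storage(object):
    """Abstract interface: subclasses provide get, set, and delete."""

    def get(self):
        raise NotImplementedError

    def set(self, credentials):
        raise NotImplementedError

    def delete(self):
        raise NotImplementedError


class DictionaryStorage(Storage):
    """Hypothetical implementation: one key in a dict-like object."""

    def __init__(self, dictionary, key):
        self._dictionary = dictionary
        self._key = key

    def get(self):
        return self._dictionary.get(self._key)

    def set(self, credentials):
        self._dictionary[self._key] = credentials

    def delete(self):
        self._dictionary.pop(self._key, None)


class LockedStorage(Storage):
    """Wraps another Storage and holds a lock around each call."""

    def __init__(self, storage, lock=None):
        self._storage = storage
        self._lock = lock if lock is not None else threading.Lock()

    def get(self):
        with self._lock:
            return self._storage.get()

    def set(self, credentials):
        with self._lock:
            self._storage.set(credentials)

    def delete(self):
        with self._lock:
            self._storage.delete()


backing = {}
locked_storage = LockedStorage(DictionaryStorage(backing, key='credentials'))
locked_storage.set('fake-credentials')
```

Because LockedStorage only depends on the three-method interface, the same wrapper could sit in front of any other Storage without that class knowing about locks.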

def locked_delete(self):
    """Remove the credentials from the dictionary, if they exist."""
    key = self._get_key()
    try:


def locked_delete(self):
    """Remove the credentials from the dictionary, if they exist."""
    key = self._get_key()
    self._dictionary.pop(key, None)


@nathanielmanistaatgoogle
Contributor

Let's do the drastic breaking change now. I agree that the current Storage class isn't what we'd like to have.

Tell me more about the "contextual" key creation circumstance? I'm having a hard time wrapping my head around needing a Storage, and in particular a Storage that uses exactly one key for storing values in a dictionary, but not knowing what that key is at the time of creating the Storage.

@theacodes
Contributor Author

Let's do the drastic breaking change now. I agree that the current Storage class isn't what we'd like to have.

Are you comfortable with what I've proposed in terms of re-implementing Storage? At least as a starting point?

Tell me more about the "contextual" key creation circumstance?

Take a look at "Implementation / usage suggestions" in #319, though the relevant parts are included below.

Basically imagine that you have a central database (e.g., Redis) and you want to keep track of the credentials for all of the users in your application. You can't just use KeyValueStorage(redis, key='credentials') because users will clobber each other's credentials. Instead, you need to figure out the key from the context. For example:

def get_storage_for_user(request):
    key = request['user_id']
    return KeyValueStorage(redis_instance, key=key)

oauth2 = flask_util.UserOAuth2(app, storage=get_storage_for_user)

Again, I'm totally fine with not having that as part of the Storage-level interface and keeping it solely within the interface of the web app helpers.
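
To make the clobbering problem concrete, here is a self-contained sketch of the per-user factory idea. A plain dict stands in for Redis, and this `KeyValueStorage` is a hypothetical minimal class written for the example, not an existing oauth2client API.

```python
class KeyValueStorage(object):
    """Hypothetical stand-in for the storage class discussed above."""

    def __init__(self, store, key):
        self._store = store
        self._key = key

    def get(self):
        return self._store.get(self._key)

    def set(self, credentials):
        self._store[self._key] = credentials


# A plain dict stands in for a shared database such as Redis.
shared_store = {}


def get_storage_for_user(request):
    # The key comes from the request, so each user gets a separate slot.
    return KeyValueStorage(shared_store, key=request['user_id'])


get_storage_for_user({'user_id': 'alice'}).set('alice-credentials')
get_storage_for_user({'user_id': 'bob'}).set('bob-credentials')
```

With a fixed key such as `'credentials'`, the second `set` would have overwritten the first; deriving the key per request keeps the two users apart.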

@nathanielmanistaatgoogle
Contributor

Yes, I'm comfortable with what you've proposed - especially the way that Storage is an entirely abstract, pure-virtual interface class rather than half-a-specification-and-half-an-implementation. I have strong feelings on this particular design point; have you yet taken a look at https://www.youtube.com/watch?v=3MNVP9-hglc?

I will take you up on the offer of keeping the indirection entirely in the web app helpers. It's too weird to have in the generally-unaffiliated DictionaryStorage class.

@theacodes
Contributor Author

Yes, I'm comfortable with what you've proposed - especially the way that Storage is an entirely abstract, pure-virtual interface class rather than half-a-specification-and-half-an-implementation.

Great. :)

https://www.youtube.com/watch?v=3MNVP9-hglc

Not yet, but I'll watch it tonight.

I will take you up on the offer of keeping the indirection entirely in the web app helpers. It's too weird to have in the generally-unaffiliated DictionaryStorage class.

You got it.

Put this PR on ice. I'm going to rip Storage's heart out and fix everything that it breaks and submit a PR for that. Sorry, @dhermes, it'll be slightly big as it's boiling the ocean by definition.

@theacodes
Contributor Author

Interesting point of contention here between where we want to take storage and where it is. It seems the reason this locking logic is in storage is not only due to thread safety, but also due to this:

    def _refresh(self, http_request):
        """Refreshes the access_token.

        This method first checks by reading the Storage object if available.
        If a refresh is still needed, it holds the Storage lock until the
        refresh is completed.

        Args:
            http_request: callable, a callable that matches the method
                          signature of httplib2.Http.request, used to make the
                          refresh request.

        Raises:
            HttpAccessTokenRefreshError: When the refresh fails.
        """
        if not self.store:
            self._do_refresh_request(http_request)
        else:
            self.store.acquire_lock()
            try:
                new_cred = self.store.locked_get()

                if (new_cred and not new_cred.invalid and
                        new_cred.access_token != self.access_token and
                        not new_cred.access_token_expired):
                    logger.info('Updated access_token read from Storage')
                    self._updateFromCredential(new_cred)
                else:
                    self._do_refresh_request(http_request)
            finally:
                self.store.release_lock()

Credentials holds the storage lock while refreshing credentials to prevent duplicate refresh requests. Any ideas on this?

My two so far:

  • We could give credentials its own lock.
  • We could add an empty lock method to Storage that must return a lock/contextmanager and have LockedStorage implement that.
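
The second option could be sketched roughly as follows. Everything here is an assumption for illustration: the `lock()` method name, the `MemoryStorage` helper, and the simplified `refresh` function are hypothetical and only mirror the shape of `_refresh` above. Note the default lock is a `threading.RLock`, because `refresh` calls `get()` while already holding the lock and a plain `threading.Lock` would deadlock there.

```python
import contextlib
import threading


@contextlib.contextmanager
def _noop_lock():
    # A context manager that acquires nothing.
    yield


class Storage(object):
    def lock(self):
        # Unlocked storages hand back a do-nothing context manager.
        return _noop_lock()

    def get(self):
        raise NotImplementedError


class MemoryStorage(Storage):
    """Hypothetical trivial storage used only for this example."""

    def __init__(self, value=None):
        self.value = value

    def get(self):
        return self.value


class LockedStorage(Storage):
    def __init__(self, storage, lock=None):
        self._storage = storage
        # Re-entrant, so get() can be called while lock() is already held.
        self._lock = lock if lock is not None else threading.RLock()

    def lock(self):
        return self._lock

    def get(self):
        with self._lock:
            return self._storage.get()


def refresh(store, current_token, do_refresh_request):
    """Reuse a newer stored token if there is one, otherwise refresh."""
    with store.lock():
        stored = store.get()
        if stored is not None and stored != current_token:
            return stored
        return do_refresh_request()
```

The caller no longer needs `acquire_lock()` / `release_lock()` or a `try`/`finally`; whether the storage is locked or not, the same `with store.lock():` works.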

@dhermes
Contributor

dhermes commented Nov 30, 2015

FWIW That is for thread-safety, in order to make the refresh thread-safe.

@theacodes
Contributor Author

Putting the lock on the credential itself wouldn't work, as there can be multiple instances of a single credential.

Is storage the right place for this? It seems like a convenient place, but not necessarily the correct place.

@dhermes
Contributor

dhermes commented Nov 30, 2015

Not sure it's the right place, but you should try to limit the scope of your changes for everyone's sake (manage complexity).

The abstract concept of a store has traditionally been used for a file-system and for GAE. In these cases, the lock is for the resource (e.g. writing to the datastore and writing to the filesystem). So acquiring the lock happens very likely on different instances of Storage, but we want to lock to protect the resource.

In Py27 on GAE, with threading turned on, certain resources are shared between requests. In particular, ndb uses local caching. So if someone were to use ndb to store credentials, they'd need the instances to be threadsafe.

@theacodes
Contributor Author

Right, so there are two separate issues here:

  • Storage-level locking. This is what LockedStorage will decorate on top of Storage. Used to provide thread (or process) safety for storages such as FileStorage and NdbStorage. This can be used by more advanced classes to provide stuff like cluster-level locking if needed. That is to say, locking is a hard problem and we shouldn't make any assumptions for the user but allow them to do it and provide the most simple primitive (thread) by default.
  • Refresh locking, which is used by Credentials to prevent the credentials from being refreshed concurrently. This is even harder in the web app context as the refresh could happen on any process (even those on other machines). I could imagine a situation where the storage is one system (e.g. Redis) but the refresh lock is something else (e.g. Zookeeper).

Should we split refresh locking into its own concern?

theacodes pushed a commit to theacodes/oauth2client that referenced this pull request Nov 30, 2015
This is a rough-cut to sanity check the concept. Docstrings and full tests still missing.

New:
* `client.Storage` is an abstract base class.
* `locked_storage.LockedStorage` provides locking based on context managers and defaults to `threading.Lock`.
* `locked_storage.threadsafe` is a class-level decorator to provide thread safety. This is applied to `FileStorage` and `KeyringStorage`.

Changed:
* All `Storage` subclasses have been updated to use the new base class.
* `file.Storage` renamed to `file_storage.FileStorage`.
* `keyring_storage.Storage` renamed to `keyring_storage.KeyringStorage`.
* `django_orm.Storage` renamed to `django_orm.DjangoOrmStorage`.

**Questionable changes**

As mentioned in [a comment on googleapis#344], `Credentials` has the unfortunate behavior of using the Storage's lock to prevent concurrently refreshing the credentials. IMO, Refresh locking should be handled as a separate concern. For now, `Storage` implements a no-op `lock()`.
theacodes pushed a commit to theacodes/oauth2client that referenced this pull request Nov 30, 2015
This is a rough-cut to sanity check the concept. Docstrings and full tests still missing.

New:
* `client.Storage` is an abstract base class.
* `locked_storage.LockedStorage` provides locking based on context managers and defaults to `threading.Lock`.

Changed:
* All `Storage` subclasses have been updated to use the new base class.
* `file.Storage` renamed to `file_storage.FileStorage`.
* `keyring_storage.Storage` renamed to `keyring_storage.KeyringStorage`.
* `django_orm.Storage` renamed to `django_orm.DjangoOrmStorage`.
* `KeyringStorage` and `FileStorage` are no longer threadsafe by default.

**Questionable changes**

As mentioned in [a comment on googleapis#344], `Credentials` has the unfortunate behavior of using the Storage's lock to prevent concurrently refreshing the credentials. IMO, Refresh locking should be handled as a separate concern. For now, `Storage` implements a no-op `lock()`.
@waprin
Contributor

waprin commented Dec 1, 2015

Jon, I am not totally sure those are two separate issues. In my mind they are both about synchronizing concurrent access to the same object, in this case the credentials. In the first case you are using the lock just to read/write the value safely, while in the second case you are holding the lock while you refresh the credentials so you don't refresh the credentials more times than you need to. So in other words I think it's totally sane to use the same lock for both.

In your example with Redis and Zookeeper, if you're using Zookeeper to protect a Redis entry, then you should be using the Zookeeper lock any time you write the credentials, not just during credentials refresh. If your credentials store is in a distributed store then just adding thread/file locking isn't enough in any of the cases. I think part of the reason things are getting confusing is we keep saying "thread-safety" when referring to the Storage lock, but it isn't necessarily just a lock for threads.

To me, if you were going to introduce different locking, it would be to introduce separate read/write locks, so that readers can still read the old credentials during the refresh process. But I think that's overcomplicating things unless long refreshes holding onto locks for too long actually pop up as an issue for someone.

Hopefully that makes sense, apologies if I'm confused on this complicated topic.

@theacodes
Contributor Author

Fair point, we can continue the discussion in the PR for the storage rewrite (which currently doesn't separate these).


* LockedStorage - implements a generic storage base class that uses a `threading.Lock`-like object.
* DictionaryStorage - implements an optionally-locked storage over a dictionary-like object.

Additionally:
* Updated `file.Storage` to use `LockedStorage`.
* Removed `flask_util.FlaskSessionStorage` and replaced it with `DictionaryStorage`.
@theacodes changed the title from "Add two new storage classes for #319." to "Add DictionaryStorage class for #319." on Dec 3, 2015
@theacodes
Contributor Author

I'm going to start a fresh PR for this.
