Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set and respect storage quota per user collection. #746

Closed
tublitzed opened this issue Jul 27, 2020 · 12 comments · Fixed by #751
Closed

Set and respect storage quota per user collection. #746

tublitzed opened this issue Jul 27, 2020 · 12 comments · Fixed by #751
Assignees
Labels
8 Estimate - xl - Moderately complex, medium effort, some uncertainty.

Comments

@tublitzed
Copy link
Contributor

tublitzed commented Jul 27, 2020

We need to set a quota server side (2GB) and return a error to clients that will cause them to fail to sync (and log details) in this instance. This is to support spanner limitations.

Quota should be configurable server side to avoid need to do a train release to make changes there. Ideally it's some config var that will also prevent us from needing to do a server side release as well.

We also need to add tracking for this on the server side so we can see how many users this is affecting.

Related docs:

@tublitzed tublitzed added this to Backlog: Misc in Services Engineering via automation Jul 27, 2020
@tublitzed tublitzed moved this from Backlog: Misc to Prioritized in Services Engineering Jul 27, 2020
@jrconlin
Copy link
Member

jrconlin commented Jul 27, 2020

See internal doc

(Note: The internal doc is mostly a personal brain dump around tombstones and how they work, since I'm not part of the client dev team. In summary: Tombstones are short records to indicate that something was removed. They generally contain just the date and the collection type, and sometimes an ID. Each collection has it's own tombstone and how it deals with them. There's some discussion about how long a given tombstone should persist. They're used by older clients to note that something was deleted and that it should be deleted locally as well.)

@tublitzed tublitzed moved this from Prioritized to Scheduled in Services Engineering Jul 27, 2020
@tublitzed tublitzed added the 8 Estimate - xl - Moderately complex, medium effort, some uncertainty. label Jul 27, 2020
@tublitzed tublitzed changed the title Set and respect storage quota for spanner Set and respect storage quota per user collection. Jul 27, 2020
@rfk
Copy link
Contributor

rfk commented Jul 28, 2020

FWIW the server protocol docs already reserve 400 Bad Request with error code 14 for "user has exeeded their quota", and there is even a special failure status for this in the desktop client code. I don't think this special failure status provides an satisfying UX or anything like that, but if nothing else it should help the failure show up in an intelligible way in the sync logs.

@jrconlin
Copy link
Member

As noted in the bug: We will definitely use these codes, since we're not doing so currently. I'd like to push for better UX as well and suggest the client be mindful if storage reaches some threshold less than the absolute limit. I suppose the server should probably broadcast what that absolute limit may be since it may vary per user or service.

@tublitzed
Copy link
Contributor Author

tublitzed commented Jul 28, 2020

@jrconlin to confirm that the client does indeed behave as expected re: the existing "exceeded quota" error code, would it be possible to get us to a point where we could confirm that against staging next week via some forced response mechanism?

My thinking is that if for some reason the client (let's assume release versions of all) does not behave as expected, we want to know that ASAP given the desktop release train schedule. Even if we hack together something that forces this response on a normal sync for say, a specific account or something like that, I think that might be enough to confirm this. Probably also if we could set it to some low size too (vs 2GB) to make testing easier.

Then we do the actual implementation work in future weeks/etc. What do you think?

@jrconlin
Copy link
Member

I'll see what I can come up with. It might be possible to have the server return something based on a passed header.

jrconlin added a commit that referenced this issue Jul 29, 2020
`Client-Debug` is a JSON blob that contains what to return.
```json
{
   "status": HTTP_Status_code,
   "code": "Optional Sync Error Code as body",
   "message": "Optional message content",
   "headers": ["Header:Value",...],
}

TODO:
        * test
        * verify body result

Closes: #746
@tublitzed
Copy link
Contributor Author

Thanks JR. I'll track testing results here whenever you're ready.

jrconlin added a commit that referenced this issue Jul 30, 2020
`Client-Debug` is a JSON blob that contains what to return.
```json
{
   "status": HTTP_Status_code,
   "code": Optional Sync Error Code as body,
   "message": "Optional message content",
   "headers": {"Header": "Value",...},
}
```

Please note that `code` and `message` are exclusive. Code is a numeric,
message is a string. If code is present, message will not be dumped into
the body.

*Example*
```
> curl -v --header 'Client-Debug: {"status":418, "headers":{"foo":"bar"}, "code": 16}' http://localhost:8000/__version__
*   Trying 127.0.0.1:8000...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /__version__ HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.68.0
> Accept: */*
> Client-Debug: {"status":418, "headers":{"foo":"bar"}, "code": 16}
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 418 I'm a teapot
< content-length: 2
< content-type: application/json
< foo: bar
< debug: client
< date: Thu, 30 Jul 2020 18:18:46 GMT
<
* Connection #0 to host localhost left intact
16
```

Closes: #746
@jrconlin
Copy link
Member

Per Rachel's requirements:

No random versions. Latest of all. But when you get back, what about:

    Hardcode a few test FxA account ids into the sync response. (ie, my accounts, or QA/etc)
    When those tests ids come through, hardcode a forced response of what we want.
    Also send this error to sentry.
    Deploy to staging only (then we revert changes doesn’t need to go to prod)
    I test with those fxa accounts against staging.
    We confirm we see errors in Sentry.

@tublitzed
Copy link
Contributor Author

@jrconlin to be clear, that's one possible approach here, not requirements :)

What we need to do here is:

  1. Mimic exact client behaviour that we know will trigger this. Ie, try to sync > some threshold, see a returned quota error (or that it's triggered in some way). I'd suggest we go with less than 2GB of data to make testing easier, so if you want to hardcode it to something small, that'd be great.
  2. For desktop - looking at the sync logs should be enough to verify. Latest version of desktop is fine.
  3. For mobile (iOS, Fenix, Fennec) - since we don't have sync logs easily accessible there without running locally I would suggest we send a server side error to sentry so we can confirm this that way. If you have an easier way to confirm, that's fine too.

jrconlin added a commit that referenced this issue Jul 30, 2020
This PR defines a new config option `limits.debug_client`. this is a
comma separated list of UIDs which will receive an 'over_quota` error
whenever they attempt to put a BSO.

This is not meant for production.

e.g.
```
limits.debug_client=1,123
```

Closes #746
@jrconlin jrconlin mentioned this issue Jul 30, 2020
Services Engineering automation moved this from Scheduled to Done Jul 31, 2020
@tublitzed tublitzed moved this from Done to Scheduled in Services Engineering Jul 31, 2020
@tublitzed
Copy link
Contributor Author

Re-opening as what was implemented was the ability to validate client behaviour, not the actual feature. @jrconlin did you want to track that as a separate ticket, or are we good to just keep this one around for it?

@jrconlin
Copy link
Member

jrconlin commented Aug 1, 2020

Let's keep this ticket open for now.

@jrconlin jrconlin reopened this Aug 1, 2020
Services Engineering automation moved this from Scheduled to Prioritized Aug 1, 2020
@tublitzed tublitzed moved this from Prioritized to Scheduled in Services Engineering Aug 3, 2020
jrconlin added a commit that referenced this issue Aug 11, 2020
This PR defines a new config option `limits.debug_client`. this is a
comma separated list of UIDs which will receive an 'over_quota` error
whenever they attempt to put a BSO.

This is not meant for production.

e.g.
```
limits.debug_client=1,123
```

Closes #746
@jrconlin jrconlin moved this from Scheduled to In Progress in Services Engineering Aug 17, 2020
@jrconlin
Copy link
Member

Related issues:
#789
#790
#791

@jrconlin jrconlin moved this from In Progress to In Review in Services Engineering Sep 17, 2020
@jrconlin jrconlin moved this from In Review to Done in Services Engineering Sep 24, 2020
@jrconlin
Copy link
Member

closed as part of #806

@tublitzed tublitzed moved this from Done to Archived in Services Engineering Sep 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8 Estimate - xl - Moderately complex, medium effort, some uncertainty.
Projects
3 participants