This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

We need to be able to restrict the whisper importer utility to a single org #1334

Closed
replay opened this issue May 28, 2019 · 13 comments · Fixed by #1335

@replay (Contributor) commented May 28, 2019

Currently, when we spin up the importer-writer utility, it writes every chunk write request it receives into the configured keyspace. We need to be able to limit which orgs we write to, so that we don't accidentally write into the wrong one.

@woodsaj (Member) commented May 29, 2019

I think this problem needs a bit more thought.
I don't think that we should put restrictions on mt-whisper-importer-writer. The better way to handle this is the same way we handle metric ingestion, i.e. have tsdb-gw authenticate requests and do org-id enforcement.

@replay (Contributor, Author) commented May 29, 2019

That's a good idea. If we support the same authentication mechanism that normal metric ingestion uses, we may not need to deploy the importer-writer every time a customer on the shared instance needs it. The shared instance could always have an importer-writer pod running, and a customer who wants to import data would simply run the importer-reader with the same authentication key that they use for ingest.

@replay (Contributor, Author) commented May 29, 2019

How about this: 0e0a5cd

@Dieterbe (Contributor) commented May 30, 2019

I don't think we should move auth into any of the MT services.
We already have a decent separation of concerns: tsdb-gw does auth and, after auth, forwards/proxies requests to MT. Why can't we do the same for the importer? Just put a tsdb-gw in front which takes care of the auth; that's what it's for. It should only require updating tsdb-gw to support this HTTP path.

@woodsaj (Member) commented May 30, 2019

I agree with @Dieterbe. Authentication should remain out of scope for metrictank.
The importer-writer should handle auth the same way as metrictank does, i.e. just expect the 'x-org-id' header.

In addition to gating access, we also need to ensure that the orgId used in archives matches the orgId of the authenticated user. For metrics, we currently handle this in tsdb-gw. We could also handle it there for the importer, but that would mean the gateway has to decode the archives and validate that the correct orgId is being used (we could also just modify the orgId in archives before forwarding to the importer, as we do for metrics). But that seems like unnecessary overhead for archives. Instead I think we could design it like so:

importer-writer should have a CLI flag for orgId enforcement. If enabled, importer-writer will expect requests to have an 'x-org-id' header. The value of the header is the orgId that should be used to validate archives (I am undecided as to whether we should reject mismatches or just rewrite them).

For GrafanaCloud we then just need to add a proxy route to tsdb-gw for the importer, with the "x-org-id" header being added before proxying the request. This is exactly what we do for graphite/* requests.

@Dieterbe (Contributor) commented May 30, 2019

I see 3 main options:

1. Enforce the proper orgid in tsdb-gw auth (rewriting the orgid based on auth, like ingest).

  • con: extra decoding overhead (and possibly re-encoding, in the rare case of an auth mismatch).

2. Enforce the proper orgid in importer-writer via a CLI flag.

  • con: we can't just keep one importer-writer running to satisfy imports for different / arbitrary customers.

3. Enforce the proper orgid in importer-writer by having tsdb-gw specify it (e.g. via the HTTP x-org-id header).

  • pro: doesn't have the cons of the above approaches.
  • con: this doesn't seem very clean: data on the wire may be inconsistent, and do we want this concern in the mt-importer-writer? It's also inconsistent with our ingestion path.

Note also that the ArchiveRequest not only contains an orgid in its MetricData property; each contained ChunkWriteRequest also has an orgid in its AMKey.
So regardless of which service takes care of auth/validation, either:

  • we should also validate that the orgid in the AMKey matches the orgid of the MetricData (in addition to checking the orgid against the auth), or
  • the ArchiveRequest should use a slightly different type of CWR, one that uses a new key type AKey (archive key, not containing an orgid). It should be trivial to convert this CWR type into the one we already have. This lets us validate much faster (in 1 place per ArchiveRequest instead of N), and in theory it means we only need to decode the beginning of the ArchiveRequest instead of the whole thing (though I'm not sure how practical that is).
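The AKey idea can be sketched as follows. Note that MKey, AMKey, and AKey here are simplified stand-ins for the raintank/schema types under discussion; the real definitions differ in detail.

```go
package main

import "fmt"

// MKey identifies a metric, including its org (simplified stand-in).
type MKey struct {
	Org uint32
	Key [16]byte
}

// AMKey is what CWRs use today: metric key plus rollup archive index.
type AMKey struct {
	MKey    MKey
	Archive uint8
}

// AKey is the proposed org-less key for CWRs inside an ArchiveRequest:
// same shape as AMKey, minus the org.
type AKey struct {
	Key     [16]byte
	Archive uint8
}

// ToAMKey re-attaches the org derived from the authenticated x-org-id,
// so the org is validated once per ArchiveRequest rather than once per CWR.
func (a AKey) ToAMKey(org uint32) AMKey {
	return AMKey{MKey: MKey{Org: org, Key: a.Key}, Archive: a.Archive}
}

func main() {
	a := AKey{Key: [16]byte{0xab}, Archive: 2}
	fmt.Printf("%+v\n", a.ToAMKey(10))
}
```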

@woodsaj (Member) commented May 30, 2019

  • Option 1 is not viable due to the overhead of decoding and re-encoding before forwarding from tsdb-gw to importer-writer.

  • Option 2 is out, as it prevents us from sharing the same importer-writers across multiple users/customers.

  • Option 3 is the clear winner for me. I don't share @Dieterbe's opinions on the cons.

    • How is the solution not clean? Using an x-org-id header to a) confirm a request has been authenticated and b) identify who authenticated is exactly the approach used in metrictank. I feel that is a very clean solution.
    • 'data on the wire may be inconsistent': of course, that is always true regardless of how requests are authenticated, and it is precisely why we need to validate payloads before blindly pushing them to the backend store.
    • 'do we want this concern in the mt-importer-writer?' Definitely yes; it wouldn't make sense to validate payloads anywhere else.
    • 'it's also inconsistent with our ingestion path.' In what way is it inconsistent? The only reason we do validation in tsdb-gw is that metrics are processed asynchronously by metrictank; if metrictank did the validation, we would not be able to notify users that validation was failing. Because processing archives is a synchronous operation, you can't compare it with metric ingestion. You can compare it with graphite render queries, though, which are also HTTP POSTs that simply get proxied to the backend with the x-org-id header.

"the ArchiveRequest should use a slightly different type of CWR"

I do agree with this. The only change we need is to use Archive schema.Archive instead of Key schema.AMKey. When consuming the ArchiveRequest we would then:

  • validate the MetricData (i.e. set orgId to the x-org-id value and re-run SetId() to regenerate the id)
  • for each writeRequest in the archive, create a new ChunkWriteRequest, setting Key to schema.AMKey{MKey: mkey, Archive: writeRequest.Archive}
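The MetricData validation step can be sketched in Go as below. The MetricData type and SetId() here are minimal stand-ins: the real schema.MetricData.SetId() in raintank/schema hashes more fields (unit, mtype, interval, tags), but the point is the same: because the orgId is part of the id, rewriting the org requires regenerating the id.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"strconv"
)

// MetricData is a minimal stand-in for schema.MetricData.
type MetricData struct {
	Id    string
	OrgId int
	Name  string
}

// SetId recomputes the id as "<orgId>.<hash>". The real implementation
// hashes more fields; name alone is enough to illustrate the mechanism.
func (m *MetricData) SetId() {
	sum := md5.Sum([]byte(m.Name))
	m.Id = strconv.Itoa(m.OrgId) + "." + fmt.Sprintf("%x", sum)
}

// applyOrgID enforces the authenticated org on an incoming archive's
// MetricData: overwrite OrgId, then reset the id so it stays consistent.
func applyOrgID(md *MetricData, orgFromHeader int) {
	md.OrgId = orgFromHeader
	md.SetId()
}

func main() {
	md := MetricData{Name: "some.metric", OrgId: 0}
	applyOrgID(&md, 10)
	fmt.Println(md.Id)
}
```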

@replay (Contributor, Author) commented May 30, 2019

I like option 3 as well. If we do that, does the user even need to specify the org id when starting the reader? Since the user has to specify their authentication credentials anyway, we might as well just set the org id in the MetricData to 0; the writer can then take the id from the X-Org-Id header and add it to the MetricData before writing to the store. That way the user has one less option to specify when using the reader, which improves usability: https://github.com/grafana/metrictank/blob/master/cmd/mt-whisper-importer-reader/main.go#L51

@Dieterbe (Contributor)

No point arguing about minor nuances. I'm fine with option 3.
@replay regarding your question: you're probably right, but I'd like @woodsaj to confirm.

@replay (Contributor, Author) commented May 31, 2019

This now involves 3 PRs (and I'll need to create one for raintank/schema too):
#1335
https://github.com/raintank/tsdb-gw/pull/164
https://github.com/raintank/hosted-metrics-api/pull/241
I did a test deploy in my QA instance and so far it looks good: I can authenticate, and once authenticated I can reach the importer. I haven't done a test import yet; I'll do that tomorrow.

@replay (Contributor, Author) commented Jun 3, 2019

I did more testing of this on Friday (and added a bunch of fixes). If you get a chance, these two are still waiting for review:

#1335
https://github.com/raintank/hosted-metrics-api/pull/241

@Dieterbe (Contributor) commented Jun 7, 2019

With all this discussion having happened, I think a synopsis is worthwhile. It all boils down to:

  1. tsdb-gw provides auth and proxies to importer-writer, setting the x-org-id header and passing along the client's ArchiveRequests unaltered. <- merged
  2. Currently ArchiveRequest has an orgid and AMKey in each CWR, which may not match x-org-id. Thus we will use Archive schema.Archive instead of Key schema.AMKey for the CWR. After validating the MetricData orgid, we can then generate the proper key for the CWR.
  3. importer-writer needs no CLI flag for orgId enforcement and can accept whatever is sent to it (once 1 and 2 are implemented).
  4. importer-reader needs no orgid option, as that will be set automatically via auth.

@replay (Contributor, Author) commented Jun 7, 2019

That's correct as you described. The only PR left to merge is the MT one (#1335); the HM-API & tsdb-gw PRs are already merged.
