bug 1567320: add support for funnel_experiment and funnel_variation a… #95
|
r+wc |
| if len(unEscapedCode) > 200 { | ||
| logEntry.WithField("code_len", len(code)).Error("code longer than 200 characters") | ||
| return "", errors.New("code longer than 200 characters") | ||
| if len(unEscapedCode) > 400 { |
hoosteeno
Dec 17, 2019
Contributor
We've talked about increasing this to e.g. 600 or 800 to make sure we can accommodate a long UA string if needed.
Mardak
Dec 17, 2019
Making a length increase should be fine long term as long as we keep in mind there are existing Firefox releases that reject attribution over 200 characters: https://hg.mozilla.org/releases/mozilla-release/file/2c8be14a89c92e56dbfc10ad32284e628a517a93/browser/components/attribution/AttributionCode.jsm#l24
mixedpuppy
Dec 17, 2019
I thought the size limitation had something to do with the stub installer.
Mardak
Dec 19, 2019
I have a patch that increases the length firefox accepts: https://phabricator.services.mozilla.com/D57906
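For reference, a minimal sketch of the two limits discussed in this thread: the service-side check on the unescaped code and the 200-character limit in existing Firefox releases. The names `serviceMaxLen`, `legacyFirefoxMaxLen`, `checkCode`, and `fitsLegacyFirefox` are hypothetical, and 600 is only one of the values floated above; this is not the code in the PR.

```go
package main

import (
	"fmt"
	"net/url"
)

const (
	// serviceMaxLen is a hypothetical service-side limit; the thread
	// mentions 400 and a request for 600-800.
	serviceMaxLen = 600
	// legacyFirefoxMaxLen reflects the 200-character limit that existing
	// Firefox releases are said to enforce.
	legacyFirefoxMaxLen = 200
)

// checkCode unescapes the code and rejects it when the unescaped form is
// longer than serviceMaxLen.
func checkCode(code string) (string, error) {
	unescaped, err := url.QueryUnescape(code)
	if err != nil {
		return "", fmt.Errorf("could not unescape code: %w", err)
	}
	if len(unescaped) > serviceMaxLen {
		return "", fmt.Errorf("code longer than %d characters", serviceMaxLen)
	}
	return unescaped, nil
}

// fitsLegacyFirefox reports whether the code, as delivered, is short enough
// for releases that reject anything over 200 characters.
func fitsLegacyFirefox(code string) bool {
	return len(code) <= legacyFirefoxMaxLen
}

func main() {
	code := "source%3Dgoogle%26medium%3Dorganic"
	unescaped, err := checkCode(code)
	fmt.Println(unescaped, err, fitsLegacyFirefox(code))
}
```

Keeping the two limits as separate constants makes it possible to raise the service limit while still flagging codes that older clients would drop.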
| "funnel_experiment": true, | ||
| "funnel_variation": true, |
Mardak
Dec 17, 2019
@mixedpuppy just making sure, on windows, the expected keys have no funnel_? https://hg.mozilla.org/releases/mozilla-release/file/2c8be14a89c92e56dbfc10ad32284e628a517a93/browser/components/attribution/AttributionCode.jsm#l66
hoosteeno
Dec 17, 2019
Contributor
We should aim for consistency here. funnel_ is not adding anything to these parameters. If Firefox expects experiment and variation, then we should plan on those being sent all the way from the start. Bedrock work hasn't begun, so we can get this in.
mixedpuppy
Dec 17, 2019
IIRC funnel_ was requested. On windows, portions are stripped somewhere along the way to reduce data size, I think, but they are not stripped for osx (which doesn't work anyway; I noticed Mardak's comment on some bug about using the quarantine data, and that's what we tried).
Mardak
Dec 17, 2019
Looks like this was the last comment in the firefox bug re: prefix or not:
https://bugzilla.mozilla.org/show_bug.cgi?id=1515172#c37
This seems to say the url may contain "utm_" and "funnel_" that then end up getting stripped before stored as attribution.
Looking at the AttributionCode.jsm logic, it does seem to support "utm_" and "funnel_" and no-prefix for url-based attribution for mac while on windows, it only wants the stripped versions, e.g., "source" "medium" as before and the new "experiment"
mixedpuppy
Dec 17, 2019
prefixed is supported on osx. non-prefixed is for windows. The stub installer is limited to 200 characters.
hoosteeno
Dec 17, 2019
Contributor
> prefixed is supported on osx. non-prefixed is for windows. The stub installer is limited to 200 characters.
Let's walk backwards:
- does telemetry require any particular param format?
- does the stub binary require any particular param format?
- can we simplify to avoid different behavior per platform?
Since we're working on the stubattribution service here, presumably we can plan to support those requirements.
And since we're working on Bedrock in mozilla/bedrock#7474, we can plan to support them there, too.
Where, specifically, is the constraint? In https://bugzilla.mozilla.org/show_bug.cgi?id=1406005#c49, mhowell says the stub binary is limited to 1010. In this PR, we can see that it was limited to 200, and is now set to 400, and I'm requesting 600-800. Are there any other hard constraints to be aware of?
mixedpuppy
Dec 17, 2019
I've no opinions here, just attempting to relay what I can remember (hopefully relatively accurate) from when I looked at this last...
I didn't run into anything with specific limits/formats in the telemetry data.
To my knowledge the windows stub binary wouldn't require any specific format. The download service that builds the stub might, and IIRC that service a) is where the prefix is stripped and b) has a hard-coded list of which keys are allowed into the stub.
I'd verify in current code what the current limit is. Seems like 1000 as mentioned should be enough.
osx attributions (if a workaround is ever figured out) happen via the quarantine database, so the param keys are unchanged from whatever the referrer url was. That is why the service tries to support both with/without prefix.
Mardak
Dec 17, 2019
It looks like by the time this logic (stubattribution validator.go) runs, there's a combined attribution_code that is parsed into keys like "source", and I'm pretty sure there's no knowledge of "utm" anywhere in this repository.
From my searching around, stripping of "utm_" happens on bedrock backend as part of the hmac signature generation process which currently accepts {utm_source, utm_medium, …} json and returns {attribution_code, attribution_sig} json where the "attribution_code" has the concatenated prefix-stripped keys.
So yes, it looks like validator.go should use "experiment" and "variation" without "funnel_", just like it doesn't currently expect "utm_*". And these prefix-less attributions will work with existing windows firefox 71.
(And attribution doesn't work quite right on osx anyway, and doesn't get called from bedrock anyway, so we don't really need to worry about compatibility yet. https://bugzilla.mozilla.org/show_bug.cgi?id=1525076 )
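To illustrate the prefix stripping described in this comment, here is a rough sketch in Go (Bedrock itself is not written in Go, and its actual logic may differ). The helpers `stripPrefix` and `buildAttributionCode` are hypothetical, and the HMAC signing step that produces attribution_sig is deliberately left out.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// stripPrefix removes a leading "utm_" or "funnel_" from a parameter name
// and returns other names unchanged.
func stripPrefix(key string) string {
	for _, prefix := range []string{"utm_", "funnel_"} {
		if strings.HasPrefix(key, prefix) {
			return strings.TrimPrefix(key, prefix)
		}
	}
	return key
}

// buildAttributionCode concatenates the prefix-stripped parameters into a
// single query-style string. url.Values.Encode sorts by key, so the output
// is deterministic.
func buildAttributionCode(params map[string]string) string {
	q := url.Values{}
	for key, value := range params {
		q.Set(stripPrefix(key), value)
	}
	return q.Encode()
}

func main() {
	code := buildAttributionCode(map[string]string{
		"utm_source":        "google",
		"utm_medium":        "organic",
		"funnel_experiment": "exp1",
	})
	fmt.Println(code) // experiment=exp1&medium=organic&source=google
}
```

With stripping done upstream like this, the validator only ever needs to know the prefix-less names.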
|
I can't review for code quality but the implementation changes in the most recent 2 commits look good to me. r+ |
| @@ -148,6 +148,8 @@ func pingdomHandler(w http.ResponseWriter, req *http.Request) { | |||
| attrQuery.Set("medium", "pingdom") | |||
| attrQuery.Set("campaign", "pingdom") | |||
| attrQuery.Set("content", "pingdom") | |||
| attrQuery.Set("funnel_experiment", "pingdom") | |||
| attrQuery.Set("funnel_variation", "pingdom") | |||
Mardak
Jan 6, 2020
I'm not sure how this pingdom stuff works (or if it's used?), but I would guess the query params should be the funnel_-less versions.
| "campaign": true, | ||
| "content": true, | ||
| "experiment": true, | ||
| "variation": true, |
Mardak
Jan 6, 2020
Just noting, and we probably don't need to change anything here -- this allows experiment or variation to be set without the other, which could totally be valid, e.g., some experiment with no variation compared against a baseline no-experiment.
mixedpuppy
Jan 7, 2020
Oh, does this need "ua"? https://searchfox.org/mozilla-central/rev/a92ed79b0bc746159fc31af1586adbfa9e45e264/browser/components/attribution/AttributionCode.jsm#35
Probably could use a comment in the code for validAttributionKeys that it should match what is in firefox also.
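A sketch of how an allow-list like validAttributionKeys might be applied, including the kind of comment suggested above. The "source" and "medium" entries and the `checkKeys` helper are assumptions for illustration; whether "ua" belongs in the list is the open question in this thread, so it is left out here.

```go
package main

import (
	"fmt"
	"net/url"
)

// validAttributionKeys should be kept in sync with the keys Firefox accepts
// in AttributionCode.jsm. The exact contents here are an assumption.
var validAttributionKeys = map[string]bool{
	"source":     true,
	"medium":     true,
	"campaign":   true,
	"content":    true,
	"experiment": true,
	"variation":  true,
}

// checkKeys parses an unescaped attribution code and returns an error for
// the first key that is not in validAttributionKeys.
func checkKeys(code string) error {
	vals, err := url.ParseQuery(code)
	if err != nil {
		return fmt.Errorf("could not parse code: %w", err)
	}
	for key := range vals {
		if !validAttributionKeys[key] {
			return fmt.Errorf("%q is not a valid attribution key", key)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkKeys("source=google&experiment=exp1"))
	fmt.Println(checkKeys("source=google&funnel_experiment=exp1"))
}
```

Note that this accepts experiment without variation (and vice versa), matching the behavior Mardak describes.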
| "net/url" | ||
| "time" | ||
|
|
||
| "github.com/pkg/errors" | ||
| "github.com/sirupsen/logrus" | ||
| ) | ||
|
|
||
| const maxUnescapedCodeLen = 600 |
Mardak
Jan 6, 2020
Also just noting and probably don't need to change anything here either -- Firefox does a length check against the escaped code length while the check here is against the unescaped length.
mixedpuppy
Jan 7, 2020
At least, I think it should match what fx has, with a comment pointing to it.
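To make the escaped vs. unescaped difference concrete, a small sketch using Go's net/url; the `lengths` helper is hypothetical. Each escaped character such as "=" or "&" takes three bytes ("%3D", "%26"), so the escaped form that Firefox measures is always at least as long as the unescaped form checked here.

```go
package main

import (
	"fmt"
	"net/url"
)

// lengths returns the byte length of an attribution code in its escaped
// form and after unescaping.
func lengths(escaped string) (escapedLen, unescapedLen int, err error) {
	unescaped, err := url.QueryUnescape(escaped)
	if err != nil {
		return 0, 0, err
	}
	return len(escaped), len(unescaped), nil
}

func main() {
	code := url.QueryEscape("source=google&medium=organic")
	e, u, err := lengths(code)
	fmt.Println(code, e, u, err)
}
```

A code can therefore pass an unescaped limit of a given size and still exceed the same number when measured escaped.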
|
Main potential concern is the |
|
This all looks ok functionally, just my question about the set of required attributes should be answered prior to landing. |
| "variation": true, | ||
| } | ||
|
|
||
| var requiredAttributionKeys = []string{ |
mixedpuppy
Jan 7, 2020
I'm not familiar with the code here, so just going on what I see, the name indicates to me that all of these must be present.
Do we really require all these to be present? At this point I don't remember what AMO passes in (or if that functionality still exists on AMO, or if this change would affect that). Perhaps @jvillalobos can answer the AMO part.
jvillalobos
Jan 7, 2020
For links going to AMO we require source and medium and, depending on the case, campaign or content.
mixedpuppy
Jan 7, 2020
Ok, if we do not always include campaign or content, then we probably shouldn't have a set of required attributes here.
hoosteeno
Jan 7, 2020
Contributor
I don't have all the details about the decision to make attributes required, but I do suspect it relates in part to concerns about passing web history into client telemetry (which is a data classification concern). That is primarily an issue with the "source" param, hinted at in an origin bug for this feature. Since we only allow attribution for a subset of possible sources, we have to require source.
It's also true that without a source and some other value, attribution isn't particularly useful. Source is required; medium and campaign are very valuable; content is not always helpful, and often omitted.
On www, we fill up some empty params with default values to ensure they can pass this validation, which does not provide any information value over simply making them optional.
mixedpuppy
Jan 7, 2020
> On www, we fill up some empty params with default values to ensure they can pass this validation, which does not provide any information value over simply making them optional.
Ahh, that might be reasonable then, but a code comment here explaining that would be helpful.
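A sketch of the required-keys check together with the default-filling that www is described as doing. The contents of `requiredAttributionKeys`, the "(not set)" default, and the helpers `missingRequired` and `withDefaults` are all assumptions for illustration, not what the PR or Bedrock actually contains.

```go
package main

import (
	"fmt"
	"net/url"
)

// requiredAttributionKeys must all be present and non-empty for a code to
// validate. Callers that lack a value are expected to send a default
// rather than omit the key. The list here is an assumption.
var requiredAttributionKeys = []string{"source", "medium", "campaign", "content"}

// missingRequired returns the required keys that are absent or empty in an
// unescaped attribution code.
func missingRequired(code string) ([]string, error) {
	vals, err := url.ParseQuery(code)
	if err != nil {
		return nil, err
	}
	var missing []string
	for _, key := range requiredAttributionKeys {
		if vals.Get(key) == "" {
			missing = append(missing, key)
		}
	}
	return missing, nil
}

// withDefaults fills absent or empty required keys with a placeholder, the
// way a sending site might before requesting a signature.
func withDefaults(code string) (string, error) {
	vals, err := url.ParseQuery(code)
	if err != nil {
		return "", err
	}
	for _, key := range requiredAttributionKeys {
		if vals.Get(key) == "" {
			vals.Set(key, "(not set)") // hypothetical default value
		}
	}
	return vals.Encode(), nil
}

func main() {
	missing, _ := missingRequired("source=google&medium=organic")
	fmt.Println(missing)
	filled, _ := withDefaults("source=google&medium=organic")
	fmt.Println(filled)
}
```

As noted above, the filled-in defaults carry no more information than optional keys would; the sketch only shows why validation passes once they are present.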
|
Sorry for the late change, with Firefox 73 supporting Hopefully this should be relatively straightforward with the existing refactoring done in this PR. |
|
@Mardak I think I've addressed all your concerns, will you please re-review? |
|
|
||
| if code == "" { | ||
| logEntry.Error("code is empty") | ||
| return "", errors.New("code is empty") |
Mardak
Jan 9, 2020
When does this trigger? Or is the validCodes test with "" wrong? Should empty code result in the 3 "not set" and "other" source?
Mardak
Jan 9, 2020
Oh I see fixed in 5cfc62f
oremj
Jan 9, 2020
Author
Collaborator
This only triggers if attributioncode="" or no attribution code parameter is in the query string. In that case, it just redirects to a standard non-attributed installer.
oremj
Jan 9, 2020
Author
Collaborator
I've also updated the tests.
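The empty-code behavior described above could be sketched roughly as follows. The query parameter name `attribution_code`, the `validate` and `chooseInstaller` helpers, and the "standard"/"attributed" return values are assumptions for illustration; the real handler issues an HTTP redirect rather than returning a string.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

var errEmptyCode = errors.New("code is empty")

// validate rejects an empty attribution code. Other checks (length, keys)
// are omitted from this sketch.
func validate(code string) (string, error) {
	if code == "" {
		return "", errEmptyCode
	}
	return code, nil
}

// chooseInstaller reports which installer a request would receive: the
// standard non-attributed one when the code is missing, empty, or
// unparseable, and the attributed one otherwise.
func chooseInstaller(rawQuery string) string {
	vals, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "standard"
	}
	if _, err := validate(vals.Get("attribution_code")); err != nil {
		return "standard"
	}
	return "attributed"
}

func main() {
	fmt.Println(chooseInstaller(""))
	fmt.Println(chooseInstaller("attribution_code="))
	fmt.Println(chooseInstaller("attribution_code=source%3Dgoogle"))
}
```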
…ttribution keys