fixes bug 858245 - Removal of PII from public web API #364
@rhelmer looking at you for the final verdict. @lauraxt @AdrianGaudebert @brandonsavage Would appreciate feedback from you folks too.
@AdrianGaudebert note that I changed the scrubber quite substantially. It would be inefficient to rewrite the dicts as new copies, so by default it now modifies them in place instead.
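To illustrate the in-place versus copy trade-off mentioned above, here is a minimal sketch. It is not the scrubber from this PR: the function name `scrub_keys`, its `in_place` flag and the sample data are all hypothetical, chosen only to show the difference between mutating the dicts and copying them first.

```python
import copy


def scrub_keys(rows, allowed, in_place=True):
    """Drop every key not in `allowed` from each dict in `rows`.

    Hypothetical helper, not the PR's actual scrubber.
    """
    if not in_place:
        # Copying every dict costs time and memory on large responses.
        rows = copy.deepcopy(rows)
    for row in rows:
        # list(row) takes a snapshot of the keys so we can delete while looping.
        for key in list(row):
            if key not in allowed:
                del row[key]
    return rows


hits = [{"uuid": "abc", "email": "someone@example.com"}]
same = scrub_keys(hits, allowed={"uuid"})  # mutates `hits` and returns it
```

With the default, the caller's own list is what gets scrubbed, so nothing needs to be reassigned. Passing `in_place=False` leaves the input untouched at the cost of a full copy.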
'os_name',
'uuid',
'hangid',
'url',
This looks a lot like a URL field that should not actually be exposed. Am I getting that wrong?
The code looks excellent, as usual. I think you forgot to add a whitelist to a few services though. From the code, that means those services won't be accessible (they raise an APIWhitelistError), and I don't think that's what we want.
Last but not least, I must say I really don't like this whitelist thing. I think it's going to be a pain to maintain. I would be in favor of having a whitelist only when we want to filter something out, so that most services simply don't have one. I don't expect that we will often add a new PII field to an existing service, and when we do, it is our responsibility to make sure that field is secured everywhere.
Yeah, that's exactly the opposite of what we previously agreed on. I wanted to write down my feelings about that, but I am not going to fight over it. If you guys think this way is fine, let's do it.
If there are models that lack Can you see which models in particular they are? Regarding the sheer verbosity of writing down all these mundane fields, I'll invite Benjamin (who is more experienced than all of us combined) to participate.
Merged cddb33b
The way this basically works is that each model now lists its keys. But because the structure of the response (from the middleware) is often very different, the way you list keys needs to be very flexible. The simplest example is:
A more complex one can be this:
How to scrub data is best explained with an example:
"Cleaning" is the only form of scrubbing (apart from removing fields entirely) that is implemented. The cleaning is done by doing replacements with an empty string, `''`.
Some models have NO whitelisting at all. That's because my gut tells me they'll never ever contain PII. These are:
There is only one model that is now entirely blacklisted:
There are a lot of models now that list ALL available fields. This means I might as well have switched off whitelisting for them (e.g. by setting `API_WHITELIST = None`), but I felt listing the fields is easier to read and makes it explicit what's going on. Basically, where a model does NOT list everything, I write down what I deliberately exclude. E.g.:
NOTE: this PR manually puts the API back into action after it was disabled with this commit
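The rules discussed in this PR can be sketched roughly as follows. This is an assumption-laden illustration, not the merged code. Only the names `API_WHITELIST` and `APIWhitelistError` come from the conversation. The function `apply_whitelist`, the `clean` argument, the example models and the e-mail pattern are all hypothetical.

```python
import re


class APIWhitelistError(Exception):
    """Raised when a model declares no whitelist at all."""


_MISSING = object()  # sentinel: tells "not declared" apart from None

# Hypothetical pattern for e-mail addresses; not taken from the PR.
EMAIL = r"[\w.+-]+@[\w.-]+\.\w+"


def apply_whitelist(model, data, clean=None):
    """Scrub `data` in place according to `model.API_WHITELIST`."""
    whitelist = getattr(model, "API_WHITELIST", _MISSING)
    if whitelist is _MISSING:
        # No whitelist declared: refuse to expose the model.
        raise APIWhitelistError(model.__name__)
    if whitelist is None:
        # Whitelisting explicitly switched off: return everything.
        return data
    for key in list(data):
        if key not in whitelist:
            del data[key]  # remove the field entirely
    for key, pattern in (clean or {}).items():
        if isinstance(data.get(key), str):
            # "Cleaning": replace each match with an empty string.
            data[key] = re.sub(pattern, "", data[key])
    return data


class Report:  # hypothetical model that lists its public fields
    API_WHITELIST = ("uuid", "user_comments")


class Platforms:  # hypothetical model with whitelisting switched off
    API_WHITELIST = None


class Forgotten:  # hypothetical model that declares nothing
    pass


report = {
    "uuid": "abc",
    "email": "x@y.com",
    "user_comments": "mail me at x@y.com please",
}
apply_whitelist(Report, report, clean={"user_comments": EMAIL})
```

Treating a missing attribute as an error while treating `None` as an explicit opt-out is what makes every model state its intent. The maintenance cost raised in the review is that each new field then has to be added to the list by hand.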