Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Add a sanitize-db.py script. #7172
Conversation
jameinel
added some commits
Mar 29, 2017
mjs
approved these changes
Mar 29, 2017
Thanks for doing this. I know it's been asked for before.
Maybe call the file santize-db.py or something?
Lots of nits but good overall.
| @@ -0,0 +1,78 @@ | ||
| +#!/usr/bin/env python3 |
jameinel
Mar 30, 2017
Owner
its intended to be both python2 and 3 since it may get run on trusty, though I'll admit to not testing as much on Trusty.
| @@ -0,0 +1,78 @@ | ||
| +#!/usr/bin/env python3 | ||
| + | ||
| +# Copyright 2017 Canonical Ltd. |
| +# Copyright 2017 Canonical Ltd. | ||
| +# Generate a Javascript file that can be used to sanitize a Mongo database so | ||
| +# it can be shared. | ||
| +from __future__ import print_function |
| + | ||
| +# This lists the collections and fields in those collections that need to be sanitized | ||
| +to_sanitize = [ | ||
| + ('users', ['passwordhash']), |
| + for attribute in attributes: | ||
| + lines.append(' "{}": "REDACTED",'.format(attribute)) | ||
| + # replace the last ',' with a '}' | ||
| + lines[-1] = lines[-1][:-1] + '}' |
mjs
Mar 29, 2017
Contributor
Ick. What about something like:
inner = [' "%s": "REDACTED"' % attribute for attribute in attributes]
lines.append(",\n".join(inner) + "}")
lines.append('}, {"multi": 1}))')
...Then there's no need to fix up lines that were just generated and the string formatting technique is consistent with the surrounding code.
jameinel
Mar 30, 2017
Owner
yeah, I didn't mean to use .format(). I started trying to use it, but it turned out I had to put { and {} in lots of the rest of the code.
I do wish Javascript supported the trailing comma:
[
foo,
bar,
]
(Maybe javascript does but JSON doesn't? regardless changing syntax for the last entry sucks)
(done)
| + # Our TXN entries have a '$set' with a literal $ in them. | ||
| + # Apparently Mongo is perfectly fine for you to create documents | ||
| + # and query documents with $, but won't let you "update" documents | ||
| + # with a $ in them. so we just unset those fields instead of setting them to 'REDACTED'. |
| + lines.append('') | ||
| + lines.append('db.txns.deleteIndex({"o.c": 1})') | ||
| + lines.append('print(new Date().toLocaleString())') | ||
| + return lines |
mjs
Mar 29, 2017
Contributor
With the change I suggested above, there's actually no need to build a list. Just yield the lines instead.
| + | ||
| +def main(args): | ||
| + import argparse | ||
| + p = argparse.ArgumentParser(description="""Generate a JavaScript file for Mongo that can sanitize a Juju database. |
mjs
Mar 29, 2017
Contributor
This is would be more readable like this:
p = argparse.ArgumentParser(description="""\
Generate a JavaScript...| +This updates both the actual collections as well as the transactions that | ||
| +updated or created them. Fields that are sensitive are either converted to | ||
| +REDACTED or removed. (we are unable to REDACT some of the update transactions, | ||
| +because of limitations with $ characters.) |
mjs
Mar 29, 2017
Contributor
Perhaps make it clearer that this is destructive and therefore dangerous
| +REDACTED or removed. (we are unable to REDACT some of the update transactions, | ||
| +because of limitations with $ characters.) | ||
| +""", epilog=""" | ||
| +Example: ./sanitize.py > sanitize.js && mongo CONNECT_ARGS ./sanitize.js |
| +""", epilog=""" | ||
| +Example: ./sanitize.py > sanitize.js && mongo CONNECT_ARGS ./sanitize.js | ||
| +""") | ||
| + opts = p.parse_args(args) |
jameinel
Mar 30, 2017
Owner
it isn't, but if we ever want to add options (like what to set the values to rather than REDACTED), and this gives you "./sanitize-db.py --help".
I could just drop the 'opts' variable since it isn't used, but I'm not sure if that helps.
| + | ||
| + lines = generateScript() | ||
| + content = '\n'.join(lines) + '\n' | ||
| + print(content) |
mjs
Mar 29, 2017
Contributor
Why build the string in memory before printing it? Just do:
for line in generateScript():
print(line)
jameinel
Mar 30, 2017
Owner
At one point I was using \uFF04 to try to handle the $ issues and was looking to .encode('utf8') but yeah, we could switch this.
|
$$merge$$ |
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
|
Build failed: Tests failed |
|
$$merge$$ the 'grant' test failed, which this definitely doesn't exercise. The failure is a bit obscure, it looks like it was looking for a "password:" prompt, but never got it. |
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
|
Build failed: Tests failed |
|
$$merge$$ |
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
|
Build failed: Tests failed |
|
$$merge$$ |
jameinel
changed the title from
Add a sanitize.py script.
to
Add a sanitize-db.py script.
Mar 30, 2017
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
|
Build failed: Tests failed |
|
$$merge$$ failed with grant again |
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
jameinel commentedMar 29, 2017
•
Edited 1 time
-
jameinel
Mar 30, 2017
Description of change
This generates a Javascript file that allows us to easily sanitize Juju
databases. Both their collection and their txn records are sanitized.
QA steps
Inspect the database afterward and see that anything that could contain sensitive information like certificates or private keys has been REDACTED or removed completely. You can also
vimdiff before.txt after.txtto see what we changed.Documentation changes
It should probably be referenced in documentation as something that people can use on a mongodump to allow us to introspect their database safely.
Bug reference
Not directly. Was created to help debugging, though.