Add a sanitize-db.py script. #7172

Merged
merged 4 commits into from Mar 30, 2017

Conversation

Projects
None yet
4 participants
Owner

jameinel commented Mar 29, 2017

Description of change

This generates a Javascript file that allows us to easily sanitize Juju
databases. Both their collection and their txn records are sanitized.

QA steps

  $ juju bootstrap lxd
  $ juju ssh -m controller 0
  $ sudo su -
  $ service jujud-machine-0 stop
  $ ./sanitize-db.py > sanitize.js
  $ mongo CONNECT_ARGS --eval "db.txns.find().forEach(printjson)" > before.txt
  $ mongo CONNECT_ARGS ./sanitize.js
  $ mongo CONNECT_ARGS --eval "db.txns.find().forEach(printjson)" > after.txt

Inspect the database afterward and see that anything that could contain sensitive information like certificates or private keys has been REDACTED or removed completely. You can also vimdiff before.txt after.txt to see what we changed.

Documentation changes

It should probably be referenced in documentation as something that people can use on a mongodump to allow us to introspect their database safely.

Bug reference

Not directly. Was created to help debugging, though.

jameinel added some commits Mar 29, 2017

Add a sanitize.py script.
This generates a Javascript file that allows us to easily sanitize Juju
databases. Both their collection and their txn records are sanitized.

mjs approved these changes Mar 29, 2017

Thanks for doing this. I know it's been asked for before.

Maybe call the file santize-db.py or something?

Lots of nits but good overall.

scripts/sanitize.py
@@ -0,0 +1,78 @@
+#!/usr/bin/env python3
@mjs

mjs Mar 29, 2017

Contributor

Good :)

@jameinel

jameinel Mar 30, 2017

Owner

its intended to be both python2 and 3 since it may get run on trusty, though I'll admit to not testing as much on Trusty.

scripts/sanitize.py
@@ -0,0 +1,78 @@
+#!/usr/bin/env python3
+
+# Copyright 2017 Canonical Ltd.
@mjs

mjs Mar 29, 2017

Contributor

License header?

scripts/sanitize.py
+# Copyright 2017 Canonical Ltd.
+# Generate a Javascript file that can be used to sanitize a Mongo database so
+# it can be shared.
+from __future__ import print_function
@mjs

mjs Mar 29, 2017

Contributor

Given that you're already using Python 3, this is unnecessary

scripts/sanitize.py
+
+# This lists the collections and fields in those collections that need to be sanitized
+to_sanitize = [
+ ('users', ['passwordhash']),
@mjs

mjs Mar 29, 2017

Contributor

Shouldn't these be indented by 4?

scripts/sanitize.py
+ for attribute in attributes:
+ lines.append(' "{}": "REDACTED",'.format(attribute))
+ # replace the last ',' with a '}'
+ lines[-1] = lines[-1][:-1] + '}'
@mjs

mjs Mar 29, 2017

Contributor

Ick. What about something like:

inner = ['    "%s": "REDACTED"' % attribute for attribute in attributes]
lines.append(",\n".join(inner) + "}")
lines.append('}, {"multi": 1}))')
...

Then there's no need to fix up lines that were just generated and the string formatting technique is consistent with the surrounding code.

@jameinel

jameinel Mar 30, 2017

Owner

yeah, I didn't mean to use .format(). I started trying to use it, but it turned out I had to put { and {} in lots of the rest of the code.

I do wish Javascript supported the trailing comma:
[
foo,
bar,
]

(Maybe javascript does but JSON doesn't? regardless changing syntax for the last entry sucks)
(done)

scripts/sanitize.py
+ # Our TXN entries have a '$set' with a literal $ in them.
+ # Apparently Mongo is perfectly fine for you to create documents
+ # and query documents with $, but won't let you "update" documents
+ # with a $ in them. so we just unset those fields instead of setting them to 'REDACTED'.
@mjs

mjs Mar 29, 2017

Contributor

Yes, I've seen this before too. Frustratingly inconsistent.

scripts/sanitize.py
+ lines.append('')
+ lines.append('db.txns.deleteIndex({"o.c": 1})')
+ lines.append('print(new Date().toLocaleString())')
+ return lines
@mjs

mjs Mar 29, 2017

Contributor

With the change I suggested above, there's actually no need to build a list. Just yield the lines instead.

scripts/sanitize.py
+
+def main(args):
+ import argparse
+ p = argparse.ArgumentParser(description="""Generate a JavaScript file for Mongo that can sanitize a Juju database.
@mjs

mjs Mar 29, 2017

Contributor

This is would be more readable like this:

p = argparse.ArgumentParser(description="""\
Generate a JavaScript...
scripts/sanitize.py
+This updates both the actual collections as well as the transactions that
+updated or created them. Fields that are sensitive are either converted to
+REDACTED or removed. (we are unable to REDACT some of the update transactions,
+because of limitations with $ characters.)
@mjs

mjs Mar 29, 2017

Contributor

Perhaps make it clearer that this is destructive and therefore dangerous

scripts/sanitize.py
+REDACTED or removed. (we are unable to REDACT some of the update transactions,
+because of limitations with $ characters.)
+""", epilog="""
+Example: ./sanitize.py > sanitize.js && mongo CONNECT_ARGS ./sanitize.js
@mjs

mjs Mar 29, 2017

Contributor

This is more straightforward: ./santize.py | mongo CONNECT_ARGS

scripts/sanitize.py
+""", epilog="""
+Example: ./sanitize.py > sanitize.js && mongo CONNECT_ARGS ./sanitize.js
+""")
+ opts = p.parse_args(args)
@mjs

mjs Mar 29, 2017

Contributor

opts isn't used

@jameinel

jameinel Mar 30, 2017

Owner

it isn't, but if we ever want to add options (like what to set the values to rather than REDACTED), and this gives you "./sanitize-db.py --help".
I could just drop the 'opts' variable since it isn't used, but I'm not sure if that helps.

scripts/sanitize.py
+
+ lines = generateScript()
+ content = '\n'.join(lines) + '\n'
+ print(content)
@mjs

mjs Mar 29, 2017

Contributor

Why build the string in memory before printing it? Just do:

for line in generateScript():
    print(line)
@jameinel

jameinel Mar 30, 2017

Owner

At one point I was using \uFF04 to try to handle the $ issues and was looking to .encode('utf8') but yeah, we could switch this.

Review comment feedback.
Lots of little tweaks to the script as suggested by Menno.
Owner

jameinel commented Mar 30, 2017

$$merge$$

Contributor

jujubot commented Mar 30, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

Contributor

jujubot commented Mar 30, 2017

Build failed: Tests failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/10580

Owner

jameinel commented Mar 30, 2017

$$merge$$ the 'grant' test failed, which this definitely doesn't exercise. The failure is a bit obscure, it looks like it was looking for a "password:" prompt, but never got it.

Contributor

jujubot commented Mar 30, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

Contributor

jujubot commented Mar 30, 2017

Build failed: Tests failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/10582

Member

anastasiamac commented Mar 30, 2017

$$merge$$

Contributor

jujubot commented Mar 30, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

Contributor

jujubot commented Mar 30, 2017

Build failed: Tests failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/10584

Owner

jameinel commented Mar 30, 2017

$$merge$$

@jameinel jameinel changed the title from Add a sanitize.py script. to Add a sanitize-db.py script. Mar 30, 2017

Contributor

jujubot commented Mar 30, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

Contributor

jujubot commented Mar 30, 2017

Build failed: Tests failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/10588

Owner

jameinel commented Mar 30, 2017

$$merge$$ failed with grant again

Contributor

jujubot commented Mar 30, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot jujubot merged commit 347f1a9 into juju:develop Mar 30, 2017

@jameinel jameinel deleted the jameinel:2.1-mongo-sanitize-1676427 branch Apr 22, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment