anonymise data for easier sharing [$30 awarded] #265

Closed
rodlogic opened this Issue May 28, 2015 · 3 comments

@rodlogic
rodlogic commented May 28, 2015 edited

This is mostly a thought that occurred to me; its value is quite debatable.

It seems to me that one of the reasons we always talk about how we organize our ledger files, but never show them in their entirety, is that the data they contain is obviously quite private. It would be nice if at any moment I could create a complete copy of my files with account numbers and descriptions changed, and amounts normalized (e.g. to the range 0-1000 instead of the real ones). I.e. a fully functional ledger database but without any real data behind it.

Maybe that could trigger the generation of some pre-defined databases for different kinds of use cases (business, consultant, trading, etc).


The $30 bounty on this issue has been claimed at Bountysource.

@simonmichael
Owner

Exactly right. Ledger has the --anon flag for this reason. We could implement a similar feature, and hopefully improve it, with potentially very good effects for docs, support, etc. Some details to consider:

  • what to anonymise - dates, descriptions, account names, amounts, comments, tags, file paths, journal stats...
  • can we generate human-friendly names rather than non-mnemonic codes
  • CLI
  • interactions with commands, like add
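
To make the "human-friendly names" point concrete, here is a minimal sketch, written in Python for brevity rather than in hledger's own Haskell. The word list and the `Pseudonymizer` class are hypothetical and not part of hledger or Ledger. It assumes names are replaced in order of first appearance, so the same real name always maps to the same pseudonym within one run and the account hierarchy is kept:

```python
# Hypothetical word list; a real one would need to be much longer.
WORDS = ["apple", "birch", "cedar", "delta", "ember", "fjord"]


class Pseudonymizer:
    """Hand out a stable, readable pseudonym for each distinct name."""

    def __init__(self, words=WORDS):
        self._words = list(words)
        self._seen = {}  # real name -> pseudonym, kept in memory only

    def name(self, real):
        """Return the pseudonym for one name, assigning a new one if needed."""
        if real not in self._seen:
            n = len(self._seen)
            word = self._words[n % len(self._words)]
            lap = n // len(self._words)
            # Once the list is exhausted, reuse words with a numeric suffix.
            self._seen[real] = word if lap == 0 else f"{word}{lap + 1}"
        return self._seen[real]

    def account(self, real_account):
        """Pseudonymize each colon-separated part of an account name."""
        return ":".join(self.name(part) for part in real_account.split(":"))


p = Pseudonymizer()
print(p.account("assets:bank:checking"))  # apple:birch:cedar
print(p.account("assets:bank:savings"))   # apple:birch:delta
```

Mapping every component, including top-level ones like `assets`, is just one choice; keeping the top-level names would make the output easier to read at the cost of revealing a little more structure.
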
@simonmichael simonmichael added the WISH label May 28, 2015
@simonmichael simonmichael changed the title from Generate a private mode copy of hledger files to anonymised hledger output May 29, 2015
@simonmichael simonmichael changed the title from anonymised hledger output to anonymised output May 29, 2015
@simonmichael simonmichael changed the title from anonymised output to anonymise journal data May 29, 2015
@simonmichael simonmichael changed the title from anonymise journal data to anonymise data for easier sharing May 29, 2015
@simonmichael simonmichael added the bounty label May 29, 2015
@simonmichael simonmichael added this to the 1.0 milestone Sep 2, 2015
@simonmichael
Owner

Bounty increased.

@bascott bascott referenced this issue Oct 7, 2016
Merged

Anon feature #412

@simonmichael simonmichael closed this in #412 Oct 26, 2016
@simonmichael
Owner

Leaving this closed so bounty can be claimed. Cc'ing the below here for context and next steps:

For me the vision for this is making it easy to share non-trivial examples drawn from our real-world data and usage. Creating realistic synthetic examples is a lot of work. Generally we don't like to reveal our actual data for privacy and security reasons. If we could remove those concerns just by adding a flag, it would open up a lot more troubleshooting, support and sharing of tips and examples.

The threat model is that we share some "anonymized" data thinking it gives away no private information, when actually it's quite easy for a motivated person or bot to "decrypt" it and learn a lot about your finances and life.

After thinking about it slightly, it's a rather interesting challenge, since crypto is hard and also because of the goal of generating useful shareable examples. For that, we'd like to preserve constraints such as balanced transactions, balance assertions passing, balance zero points, prices etc. Also we'd like to generate easy human-readable names instead of random-looking data. I don't know how far we will go, and it may need a lot of experimentation. What we have so far is good, and when we are able to obfuscate amounts in some non-reversible way it should help a lot.
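
As an illustration of the tension described above, here is a minimal Python sketch (not hledger code; the function name, the example postings and the factor are all hypothetical). It multiplies every amount by one factor, which keeps transactions balanced and zero balances at zero, but it is not the non-reversible obfuscation wished for: anyone who knows a single real amount can recover the factor.

```python
from fractions import Fraction


def scale_transaction(postings, factor):
    """Multiply every posting amount by the same factor.

    Scaling is linear, so a transaction that summed to zero still does.
    """
    return [(account, amount * factor) for account, amount in postings]


# Hypothetical example transaction; Fractions avoid rounding errors
# that could otherwise unbalance the scaled postings.
txn = [
    ("expenses:food", Fraction("12.50")),
    ("assets:cash", Fraction("-12.50")),
]
factor = Fraction(37, 100)  # hypothetical factor, the same for the whole journal

scaled = scale_transaction(txn, factor)
assert sum(amount for _, amount in scaled) == 0  # still balanced
```

Rounding the scaled amounts for display would have to be done carefully, since independent rounding of each posting can break the balance again.
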

@simonmichael simonmichael modified the milestone: 1.0, post 1.0 Oct 31, 2016
@simonmichael simonmichael changed the title from anonymise data for easier sharing to anonymise data for easier sharing [$30 awarded] Nov 20, 2016