Make Answers Anonymous in the Statistics Tab #6007

Open
brianrodri opened this issue Dec 19, 2018 · 10 comments
Labels
bug (label to indicate an issue is a regression), full-stack, important

Comments

@brianrodri
Contributor

brianrodri commented Dec 19, 2018

We'll accomplish this using k-anonymity. From Wikipedia:

A release of data is said to have the k-anonymity property if the information for each person contained in the release cannot be distinguished from at least k - 1 individuals whose information also appear in the release.

We'd like the statistics tab to have this property to protect learners from being identified. Anonymity can give our learners confidence in playing our explorations, and would improve the overall privacy of Oppia.

As far as implementation goes, we should only show a particular answer in the stats tab when at least k unique learners have submitted that same answer. Additionally, we should only show such answers if there are at least j of them in total, so something like:

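def anonymize_answers(answers):
    # K and J are assumed to be read from the admin config panel.
    # Keep only answers that at least K unique learners have given.
    anon_answers = [a for a in answers if a.unique_learners >= K]
    # Show nothing unless at least J such answers remain.
    return anon_answers if len(anon_answers) >= J else []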

The values of k and j should be stored in the admin config panel so that they can be customized (and tested) from there.
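
A slightly fuller, self-contained sketch of the same idea, with k and j passed in as parameters instead of being read from the admin config panel. The `AnswerStats` class and its `unique_learners` field are assumptions made for illustration; they are not existing Oppia types, and the thresholds used in the example call are arbitrary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnswerStats:
    """Hypothetical record: one distinct answer and its unique-learner count."""
    answer: str
    unique_learners: int


def anonymize_answers(answers: List[AnswerStats], k: int, j: int) -> List[AnswerStats]:
    """Return the answers that are safe to display, or [] if too few remain."""
    if k < 1 or j < 0:
        raise ValueError('k must be >= 1 and j must be >= 0')
    # Keep only answers given by at least k unique learners.
    anon_answers = [a for a in answers if a.unique_learners >= k]
    # Show nothing unless at least j distinct answers survive the filter.
    return anon_answers if len(anon_answers) >= j else []


# Illustrative data: the free-text answer was given by a single learner.
stats = [AnswerStats('4', 7), AnswerStats('5', 6), AnswerStats('my name is Sam', 1)]
shown = anonymize_answers(stats, k=5, j=2)
```

With these inputs, the answer given by a single learner is dropped, and the two remaining answers are shown because two is enough to meet j. Raising j to 3 would hide everything.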

@brianrodri brianrodri added this to Data Management in Lesson analytics [INACTIVE] Dec 19, 2018
@brianrodri brianrodri changed the title Add k-anonymity to Statistics Add k-anonymity to Learner Answers in Statistics Dec 19, 2018
@brianrodri brianrodri pinned this issue Dec 19, 2018
@brianrodri brianrodri unpinned this issue Dec 19, 2018
@brianrodri brianrodri changed the title Add k-anonymity to Learner Answers in Statistics Improve anonymity of Learner Answers in Statistics Dec 21, 2018
@brianrodri brianrodri changed the title Improve anonymity of Learner Answers in Statistics Give learner answers anonymity in statistics Dec 21, 2018
@brianrodri brianrodri changed the title Give learner answers anonymity in statistics Give learners anonymity when displaying statistics Dec 21, 2018
@ctao5660
Contributor

Hey @brianrodri, I might have some free time to take this up. Would this be a change to stats_services or a different file?

@brianrodri
Contributor Author

@ctao5660 that'd be great, thanks!

Yes, there are backend APIs there where we should be able to filter out answers.

@ctao5660 ctao5660 self-assigned this Dec 27, 2018
@ctao5660
Contributor

@brianrodri I'm creating a ConfigProperty in StatsServices for the k and j values so they can be changed through the admin config panel. What do you think are good k and j values to start with?

@brianrodri
Contributor Author

brianrodri commented Dec 27, 2018

Let's start with k = 5 and j = 3. Thanks!

@ctao5660
Contributor

ctao5660 commented Jan 2, 2019

@brianrodri There already seems to be a constant regulating the k value, called feconf.STATE_ANSWER_STATS_MIN_FREQUENCY. Should the k value override this? Or should we replace the constant in feconf with the correct k value?

@brianrodri
Contributor Author

brianrodri commented Jan 2, 2019

@ctao5660 Hmm, that value isn't sufficient for k, because it counts every submission of an answer in aggregate (a single learner could submit the same answer k times). We need to base k on the number of unique learners who have input an answer (maybe through unique exploration attempts? I'm not sure if we have a way to identify unique learners yet).
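
To illustrate the difference, here is a minimal sketch that assumes each submission carries some learner or session identifier (which, as noted, may not exist yet). The `(learner_id, answer)` record format and both function names are hypothetical and not part of the Oppia codebase.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Submission = Tuple[str, str]  # (learner_id, answer); hypothetical record format


def count_submissions(submissions: Iterable[Submission]) -> Dict[str, int]:
    """Raw frequency: every submission counts, even repeats by one learner."""
    return dict(Counter(answer for _, answer in submissions))


def count_unique_learners(submissions: Iterable[Submission]) -> Dict[str, int]:
    """Number of distinct learners who submitted each answer."""
    learners_by_answer = defaultdict(set)
    for learner_id, answer in submissions:
        learners_by_answer[answer].add(learner_id)
    return {answer: len(ids) for answer, ids in learners_by_answer.items()}


# One learner submits 'cat' five times; two different learners submit 'dog'.
log = [('u1', 'cat')] * 5 + [('u2', 'dog'), ('u3', 'dog')]
frequencies = count_submissions(log)
unique_counts = count_unique_learners(log)
```

Here 'cat' has a raw frequency of 5 but comes from only one learner, so a threshold on frequency alone would let it through, while a threshold on unique learners would not.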

@ctao5660
Contributor

ctao5660 commented Jan 6, 2019

@brianrodri Hello, I am going to be away for the next week. I can unassign myself from the issue if this is something that needs to be worked on urgently.

@brianrodri
Contributor Author

No problem. It is high priority, so I'll unassign you just in case someone else wants to make an attempt while you're away. Feel free to reassign yourself once you're available again!

@brianrodri brianrodri pinned this issue Jan 23, 2019
@brianrodri brianrodri changed the title Give learners anonymity when displaying statistics Anonymize Learner Answers in the Statistics Tab Jan 23, 2019
@varun-tandon varun-tandon unpinned this issue Mar 3, 2019
@brianrodri brianrodri changed the title Anonymize Learner Answers in the Statistics Tab Make Answers Anonymous in the Statistics Tab Jun 9, 2019
@anamsarfraz

I am interested in working on this issue for GHC OSD.

@iamprayush
Contributor

Hi @brianrodri! Is this free to work on?

@kevintab95 kevintab95 added the bug label (label to indicate an issue is a regression) Sep 1, 2022
@seanlip seanlip removed this from Data management in Lesson analytics [INACTIVE] Oct 11, 2022
Projects
None yet
Development

No branches or pull requests

6 participants