Code repository for the IWSPA '24 paper: "1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy"

sjmeis/Diffractor

1-Diffractor

Code repository for 1-Diffractor, a highly efficient word-level Metric Differential Privacy mechanism.

Quick Start

1-Diffractor is made up of two parts: (1) the Lists, and (2) the Diffractor. To use the Diffractor mechanism, you must instantiate a Lists object and pass it to a Diffractor object.

Creating Lists

To create a Lists object, you must initialize it with the following parameters:

  • num_lists: number of lists per embedding model (default: 1)
  • model_names: which specific models to use (default: use all)
  • home: location of the embedding files (default: current directory)

To create the Lists, simply call L = Diffractor.Lists(**args). This process will take a short while, depending on the above parameters.

NOTE: for the default set of lists you must download the models from the following directory, then set the home argument to the location where you saved them.

Initializing the Diffractor

With the Lists object ready, you can now set up a Diffractor. It accepts the following parameters:

  • epsilon: the privacy parameter (default: 5)
  • method: the exact underlying method, either "TEM" or "geometric" (default: geometric)
  • rep_stop: whether to replace stop words or not (default: False == do not replace)

With these, simply call D = Diffractor.Diffractor(L=L, **args).
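Putting the two steps together, a minimal setup sketch might look like the following. It assumes the repository's Diffractor module is importable from the working directory and that the embedding files sit in a hypothetical ./embeddings folder. The helper names lists_kwargs, diffractor_kwargs and build are illustrative and not part of the repository; the positive-epsilon check is likewise an assumption. model_names is left out so that the documented default (use all) applies.

```python
VALID_METHODS = {"TEM", "geometric"}  # the two methods named in this README


def lists_kwargs(home, num_lists=1):
    """Keyword arguments for Diffractor.Lists (helper name is illustrative)."""
    if num_lists < 1:
        raise ValueError("num_lists must be at least 1")
    # model_names is omitted so the documented default (all models) is used.
    return {"num_lists": num_lists, "home": home}


def diffractor_kwargs(epsilon=5, method="geometric", rep_stop=False):
    """Keyword arguments for Diffractor.Diffractor (helper name is illustrative)."""
    if epsilon <= 0:
        # Assumption: the privacy parameter should be strictly positive.
        raise ValueError("epsilon must be positive")
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {sorted(VALID_METHODS)}")
    return {"epsilon": epsilon, "method": method, "rep_stop": rep_stop}


def build(home, **diffractor_args):
    """Create the Lists, then the Diffractor.

    Requires the repository code on the import path, so the import is done
    lazily and only when this function is actually called.
    """
    import Diffractor

    L = Diffractor.Lists(**lists_kwargs(home))
    return Diffractor.Diffractor(L=L, **diffractor_kwargs(**diffractor_args))
```

With the repository and embeddings in place, D = build("./embeddings", epsilon=1) would then yield a Diffractor that uses the geometric method and leaves stop words untouched.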

Text Privatization

1-Diffractor is optimized to run on multiple cores, and to process input texts in parallel. Therefore, the optimal usage is:

private_texts = D.rewrite(input_texts)

where input_texts is a list of input texts, i.e., a list of sentences / documents.
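Because rewrite expects a list, a small guard in front of it can help avoid accidentally passing a single string or calling it once per text. The sketch below is only an illustration: privatize is a hypothetical helper, and D can be any object that exposes rewrite(list_of_texts), such as a Diffractor.

```python
def privatize(D, input_texts):
    """Send all texts to D.rewrite in one batched call.

    D is any object exposing rewrite(list_of_texts), such as a Diffractor.
    """
    if isinstance(input_texts, str):
        # A bare string is not a list of texts; wrap it as a single document.
        input_texts = [input_texts]
    texts = list(input_texts)
    if not all(isinstance(t, str) for t in texts):
        raise TypeError("every input text must be a string")
    # One call for the whole batch leaves the parallelisation to the mechanism.
    return D.rewrite(texts)
```

Calling privatize(D, documents) once for the whole corpus, rather than looping over documents, keeps the usage in line with the batched call shown above.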

The epsilon parameter is optional for rewrite. If it is omitted, the default epsilon set when the Diffractor was instantiated is used. If you do pass epsilon to rewrite, it must be a list of lists of epsilon values: for each text in input_texts there must be a corresponding list of epsilons whose length matches the number of tokens in that text (as determined by nltk.word_tokenize). Note that this feature is optional and was not used for the testing of 1-Diffractor!
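The list-of-lists shape described above can be built with a short helper. This is a sketch under stated assumptions: per_token_epsilons and its overrides argument are hypothetical conveniences, not part of the repository, and the tokenizer is passed in as a callable so that the same function used by the mechanism (nltk.word_tokenize) can be supplied.

```python
def per_token_epsilons(input_texts, tokenize, default_epsilon=5, overrides=None):
    """Build one list of epsilons per text, with one value per token.

    tokenize:  callable mapping a text to its tokens; pass nltk.word_tokenize
               so the token counts match those used by the mechanism.
    overrides: optional dict mapping a lower-cased token to a custom epsilon
               (a hypothetical convenience for illustration).
    """
    overrides = overrides or {}
    epsilons = []
    for text in input_texts:
        tokens = tokenize(text)
        # Every token gets the default unless an override is given for it.
        epsilons.append([overrides.get(tok.lower(), default_epsilon) for tok in tokens])
    return epsilons
```

The result could then be handed over as D.rewrite(input_texts, epsilon=eps), assuming the keyword is indeed named epsilon; check the signature of rewrite in the repository before relying on this.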
