Making Translators Privacy-aware on the User’s Side (TMLR 2024)

We propose PRISM to enable users of machine translation systems to preserve the privacy of data on their own initiative.

Paper: https://arxiv.org/abs/2312.04068

✨ Summary

▲ Overview of PRISM: PRISM converts the input sentence into a sentence that no longer contains private information and sends it to the machine translation system. PRISM then converts the returned translation back into a translation of the original sentence.
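
The flow in the overview can be pictured with a few lines of Python. This is only an illustration under assumptions, not the code in prism.py: the substitution table, the word-for-word `toy_translate` function, and every helper name below are hypothetical, and the actual method chooses and restores words more carefully (see the paper).

```python
# Hypothetical sketch of the replace -> translate -> restore flow.
# None of these names or tables come from the repository.

# Source-side substitutions: sensitive word -> harmless stand-in.
SUBSTITUTES = {"Alice": "Bob", "Paris": "London"}

# Toy English -> French dictionary, used both by the fake translator
# and to map stand-ins back to the words they replaced.
EN_FR = {"lives": "habite", "in": "à", "London": "Londres"}


def encode(sentence):
    """Swap sensitive words for stand-ins and record what was swapped."""
    used = {}  # stand-in -> original word
    out = []
    for word in sentence.split():
        if word in SUBSTITUTES:
            used[SUBSTITUTES[word]] = word
            out.append(SUBSTITUTES[word])
        else:
            out.append(word)
    return " ".join(out), used


def toy_translate(sentence):
    """Stand-in for the remote translator: word-for-word lookup."""
    return " ".join(EN_FR.get(w, w) for w in sentence.split())


def decode(translated, used):
    """Replace each translated stand-in with the translated original."""
    back = {EN_FR.get(sub, sub): EN_FR.get(orig, orig) for sub, orig in used.items()}
    return " ".join(back.get(w, w) for w in translated.split())


def private_translate(sentence, translate=toy_translate):
    safe, used = encode(sentence)   # only `safe` leaves the user's machine
    return decode(translate(safe), used)


print(private_translate("Alice lives in Paris"))
```

In this sketch the translator only ever sees "Bob lives in London"; the restoration step runs entirely on the user's side.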

💿 Preparation

Install Poetry and run the following commands:

$ poetry install
$ poetry run bash prepare.sh

Set an OpenAI API key in .env.
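
As a hedged illustration of what this step amounts to: a .env file is a list of KEY=value lines. The variable name OPENAI_API_KEY below is an assumption (it is the name the official OpenAI Python client reads by default); check chatgpt_util.py for the name this repository actually expects. The stdlib-only parser is a sketch, not the loader the repository uses.

```python
def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop optional surrounding quotes around the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


# Example .env content with a placeholder value, not a real key.
example = """
# .env
OPENAI_API_KEY=sk-your-key-here
"""

config = parse_env(example)
print("OPENAI_API_KEY" in config)
```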

🧪 Evaluation

$ poetry run python eval.py --method prismstar --translator chatgpt
$ poetry run python eval.py --method prismr --translator chatgpt
$ poetry run python eval.py --method nodecode --translator chatgpt
$ poetry run python eval.py --method pup --translator chatgpt
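
To run all four methods in one go, the commands above can be generated programmatically. The sketch below only builds and prints the argument lists; `build_commands` is a hypothetical helper, and the method names are the ones used in the commands above.

```python
import shlex

# Method names as they appear in the evaluation commands.
METHODS = ["prismstar", "prismr", "nodecode", "pup"]


def build_commands(translator="chatgpt", methods=METHODS):
    """Return one argument list per method for eval.py."""
    return [
        ["poetry", "run", "python", "eval.py",
         "--method", method, "--translator", translator]
        for method in methods
    ]


for cmd in build_commands():
    print(shlex.join(cmd))
    # To actually execute a command: subprocess.run(cmd, check=True)
```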

Please refer to the help command for further options.

$ poetry run python eval.py -h
usage: eval.py [-h] [--lang LANG] [--basedir BASEDIR] [--rates RATES] [--method {pup,prismr,prismstar,nodecode}] [--translator {chatgpt,t5,t5-gpu}]

optional arguments:
  -h, --help            show this help message and exit
  --lang LANG
  --basedir BASEDIR
  --rates RATES
  --method {pup,prismr,prismstar,nodecode}
  --translator {chatgpt,t5,t5-gpu}

Results

▲ Results. PRISM* strikes an excellent balance between privacy and translation quality.

Please refer to the paper for more details.

⛏️ How to Build a Dictionary by Yourself

Run the following command to extract candidate words from the corpus. The script loads the corpus with load_mctest(); you can replace it with your own corpus. In general, it is recommended to use a corpus that is the same as, or similar to, the one used in the evaluation.

$ poetry run python extract_all_words.py
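
What "extracting candidate words" involves can be pictured with a small sketch. It reflects an assumption about the general idea, not the logic of extract_all_words.py, which may tokenize and filter differently: it collects the distinct lowercase words of a corpus, most frequent first. The function name and the sample corpus are made up.

```python
import re
from collections import Counter


def extract_candidate_words(documents):
    """Return distinct lowercase words, most frequent first."""
    counts = Counter()
    for doc in documents:
        # Simple regex tokenizer: letters with an optional apostrophe part.
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", doc.lower()))
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked]


corpus = ["The dog chased the cat.", "The cat slept."]
print(extract_candidate_words(corpus))
```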

Then, run the following commands to build a dictionary. The dictionary is built from the wmt14 dataset (i.e., a public news corpus).

$ poetry run python build_dict.py 1 -1 --target French
$ poetry run python merge_cand_words.py cand_words_French_1000

Building the entire dictionary may take a long time. You can build each part separately (on separate machines) and merge them.

$ poetry run python build_dict.py 1 100 --target French
$ poetry run python build_dict.py 100 200 --target French
$ poetry run python build_dict.py 200 300 --target French
...
$ poetry run python merge_cand_words.py cand_words_French_1000
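
The per-chunk commands above follow a regular pattern, so they can be generated rather than typed. The sketch below assumes that the two positional arguments of build_dict.py are a start and an end index and that consecutive chunks share a boundary, as in the commands shown. `chunk_bounds`, `chunk_commands`, the total of 300, and the chunk size are illustrative choices, not values from the repository.

```python
import shlex


def chunk_bounds(start, end, step):
    """Split [start, end] at multiples of step; neighbours share a boundary."""
    cuts = [start] + [b for b in range(step, end, step) if b > start] + [end]
    return list(zip(cuts, cuts[1:]))


def chunk_commands(start, end, step, target="French"):
    """Return one build_dict.py argument list per chunk."""
    return [
        ["poetry", "run", "python", "build_dict.py",
         str(lo), str(hi), "--target", target]
        for lo, hi in chunk_bounds(start, end, step)
    ]


# Hypothetical total of 300 entries, split into chunks of 100.
for cmd in chunk_commands(1, 300, 100):
    print(shlex.join(cmd))
```

Each printed line can then be run on a different machine before the merge step.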

🖋️ Citation

@article{sato2024making,
  author    = {Ryoma Sato},
  title     = {Making Translators Privacy-aware on the User’s Side},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
}
