A watermark that preserves generated-text quality and watermark effectiveness by watermarking only a small portion of tokens distributed across the generated text.
To create the environment for the experiment, run this command:

```bash
conda env create -f environment.yml
```
- Run the following command to perform watermarked generation:

```bash
CUDA_VISIBLE_DEVICES=0 python pred.py \
    --mode sparkp \
    --gamma 0.05 \
    --delta 10 \
    --bl_type hard \
    --dataset alpacafarm \
    --model llama2-7b-chat-4k \
    --pos_tag NN NP
```

Select the model you want to evaluate via `--model`, and set the mode and hyper-parameters of the watermark via `--mode`, `--bl_type`, `--gamma`, and `--delta`. The `--mode` parameter specifies which of the watermarks used in the experiments is applied: `sparkp` (SpARK-P) or `sparkr` (SpARK-R). The `--bl_type` parameter specifies whether the watermark is hard or soft. You can also select the dataset to evaluate via `--dataset`, and add `--pos_tag` to configure which POS tags can be watermarked. The command above is an example of using SpARK-P on nouns with llama2 to generate answers for the alpacafarm dataset.
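To make the `--gamma`, `--delta`, and `--bl_type` knobs concrete, here is a minimal conceptual sketch of green-list watermarking, assuming the common scheme where `gamma` is the green-list fraction of the vocabulary and `delta` is a logit boost; the seeding below is hypothetical, and the repo's actual implementation (including its POS-tag restriction) may differ:

```python
import hashlib
import random

def green_list(prev_token_id: int, vocab_size: int, gamma: float) -> set:
    # Pseudo-randomly pick a gamma-fraction "green list" of the vocabulary,
    # seeded by the previous token. (Hypothetical seeding for illustration.)
    seed = hashlib.sha256(str(prev_token_id).encode()).hexdigest()
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def bias_logits(logits, green, delta, bl_type="hard"):
    # A "hard" watermark forbids non-green tokens outright;
    # a "soft" one instead adds +delta to green-token logits.
    out = []
    for tok, logit in enumerate(logits):
        if tok in green:
            out.append(logit + (delta if bl_type == "soft" else 0.0))
        elif bl_type == "hard":
            out.append(float("-inf"))  # red tokens can never be sampled
        else:
            out.append(logit)
    return out
```

With `--gamma 0.05 --bl_type hard`, this corresponds to restricting each watermarked sampling step to roughly 5% of the vocabulary.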
- Then, run the detection code in `detect.py` to obtain z-scores:

```bash
CUDA_VISIBLE_DEVICES=0 python detect.py \
    --input_dir ./pred/llama2-7b-chat-4k_sparkp_g0.05_d10.0_hard
```

- After that, you can run the code in `eval.py` to obtain the evaluation results on all datasets in `result.json`:

```bash
CUDA_VISIBLE_DEVICES=0 python eval.py \
    --input_dir ./pred/llama2-7b-chat-4k_sparkp_g0.05_d10.0_hard
```

- To get the detection results of the model with watermarks on standard answers, you can run `detect_human.py`:

```bash
CUDA_VISIBLE_DEVICES=0 python detect_human.py \
    --reference_dir llama2-7b-chat-4k_sparkp_g0.05_d10.0_hard \
    --detect_dir human_generation
```
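For reference, the z-score reported by detection is, in standard green-list watermarking, a one-proportion z-test on the count of green tokens. A minimal sketch, assuming the conventional formula (the repo's detector may additionally restrict counting to the selected POS tags):

```python
import math

def z_score(green_count: int, total: int, gamma: float) -> float:
    # z = (|s|_G - gamma * T) / sqrt(T * gamma * (1 - gamma)),
    # where |s|_G is the number of green tokens among T scored tokens.
    expected = gamma * total
    std = math.sqrt(total * gamma * (1 - gamma))
    return (green_count - expected) / std
```

Unwatermarked (human) text hits the green list at a rate of about `gamma`, so its z-score stays near 0; watermarked text raises that rate, and the z-score grows with text length.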