Source code for the article, "Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models!"

shashankkotyan/EvoSeed

EvoSeed


Publication

Source for the article: Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models!

(New!) Added a tutorial for generating adversarial images for ResNet-50 using Stable Diffusion.

Key Contributions:

  • We propose EvoSeed, a black-box algorithmic framework based on an evolutionary strategy, to generate natural adversarial samples in an unrestricted setting.
  • Our results show that adversarial samples created using EvoSeed are photo-realistic and do not change the human perception of the generated image, yet they can be misclassified by various robust and non-robust classifiers.

Figure: Adversarial images created with EvoSeed are prime examples of how to deceive a range of classifiers tailored for various tasks. Note that the generated natural adversarial images differ from the non-adversarial ones, suggesting the adversarial images' unrestricted nature.
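
The query-only, evolutionary search named in the contributions above can be illustrated with a small toy sketch. This is not the repository's implementation. It assumes a plain (1+λ) evolution strategy, a hypothetical `toy_confidence` function standing in for "generate an image from a seed and query a classifier", and illustrative parameters (`sigma`, `epsilon`, `population`, `generations`) that do not come from the article. The `epsilon` bound on how far a candidate may drift from the original seed is likewise an assumption of this sketch.

```python
import random

# Hypothetical stand-in for a black-box pipeline: in a real setting this would
# generate an image from the seed and return the classifier's confidence in
# the original label. Here it is just a fixed linear function.
WEIGHTS = [0.5, -1.0, 0.25, 2.0, -0.75, 1.5, -0.1, 0.9]


def toy_confidence(seed_vector):
    return sum(w * v for w, v in zip(WEIGHTS, seed_vector))


def evolve_seed(score, seed, sigma=0.1, epsilon=0.5,
                population=16, generations=50, rng=None):
    """(1+lambda) evolution strategy that minimises `score` using queries only."""
    rng = rng or random.Random(0)
    best, best_score = list(seed), score(seed)
    for _ in range(generations):
        for _ in range(population):
            # Gaussian mutation of the current best, clipped so every
            # coordinate stays within `epsilon` of the original seed.
            child = [
                min(max(b + rng.gauss(0.0, sigma), s - epsilon), s + epsilon)
                for b, s in zip(best, seed)
            ]
            child_score = score(child)  # black-box query, no gradients
            if child_score < best_score:
                best, best_score = child, child_score
    return best, best_score


seed = [0.0] * len(WEIGHTS)
best, best_score = evolve_seed(toy_confidence, seed)
print("initial score:", toy_confidence(seed))
print("best score found:", best_score)
```

The search only ever calls `score`, so it needs no access to model weights or gradients, which is what makes a strategy of this kind applicable in a black-box setting.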

Tutorial:

A tutorial for creating adversarial images for ResNet-50 using Stable Diffusion can be found in the notebook.

Citation:

If you find this project useful, please cite:

@article{kotyan2024EvoSeed,
  title = {Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models!},
  author = {Kotyan, Shashank and Mao, Po-Yuan and Chen, Pin-Yu and Vargas, Danilo Vasconcellos},
  year = {2024},
  month = may,
  number = {arXiv:2402.04699},
  eprint = {2402.04699},
  publisher = {{arXiv}},
  doi = {10.48550/arXiv.2402.04699},
}
