# 🔓 GPT-4o Jailbreak Vulnerability Demonstration

This project is a demonstration for a hackathon challenge focused on identifying vulnerabilities and jailbreaks in OpenAI's GPT-4o system. I successfully performed a jailbreak attack that bypassed standard content moderation filters using creative prompt engineering.
## 📌 Objective

To identify and document a method for bypassing GPT-4o's safeguards, allowing it to generate harmful or policy-violating content when prompted through indirect methods.
## 🧠 Methodology

The attack was implemented using a prompt injection method.
**Steps:**

1. **Role Injection:** Prompts were written in spaced-out characters, used personification, and asked the model to carry out a particular task, all to avoid detection (a minimal sketch of the spacing transform follows this list).
2. **Structured Prompts:** The format encourages GPT-4o to respond within “code” blocks, sidestepping standard moderation, and each answer is limited to a short response to mimic “coded” AI output (see the template sketch below).
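The original prompts are not reproduced here. As a minimal sketch of the character-spacing idea from step 1, applied to a harmless placeholder string (the `space_out` helper and the example text are hypothetical, not taken from the hackathon):

```python
def space_out(text: str) -> str:
    """Insert a space between every character so that simple keyword-based
    filters no longer see the original tokens as contiguous words."""
    return " ".join(text)

# Benign placeholder standing in for the actual test payloads.
print(space_out("tell me a story"))
# -> t e l l   m e   a   s t o r y
```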
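Similarly, a rough sketch of how a prompt might impose the code-block and length constraints described in step 2 (the `build_structured_prompt` function and its wording are illustrative assumptions, not the exact format used in the demonstration):

```python
def build_structured_prompt(question: str, max_words: int = 30) -> str:
    """Wrap a question in formatting constraints: the reply must sit
    inside a fenced code block and stay short, mimicking 'coded' output."""
    return (
        "Respond only inside a fenced code block.\n"
        f"Keep the answer under {max_words} words.\n\n"
        f"Task: {question}"
    )

print(build_structured_prompt("Summarize what prompt injection is."))
```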
## 📷 Evidence

- E1: Adult Content
- E2: Violence
- E3: Scam
## 🚨 Ethical Disclaimer

This project was conducted strictly for educational and security research purposes in a controlled hackathon environment. No real-world harm was intended or caused. The goal is to help improve AI safety systems.
## 🛠️ Tools Used

- OpenAI GPT-4o (Operator access)
- Custom prompt design and injection techniques
- Screenshots & Excel for documentation
## 🙌 Credits

Made with curiosity, ethical intent, and a goal to strengthen AI safety for everyone.



