A timestamp on the quality and performance of AI-driven code in game development. It was 2025 when we started experimenting with, and embracing, multiple AI-driven tools in our formal work on several game-related projects.
On-device LLMs, privacy, speech-to-speech, restricted content... these were some of the requirements and assumptions on the table for an educational mobile app.
A few of the models and tools we tested:
- Llama 3.2
- Gemma 2
- Whisper
- GPT-4o
- Microsoft Azure AI Speech
- Unity Sentis
- Vosk
In 2026 I started experimenting with Claude 3.5 Sonnet (free tier) on real challenges: building complete clones of classic games. Upfront I want to be clear: we have a fully functional game prototype, but it is still far from a complete, enjoyable product.
The clones range in increasing difficulty from Pong to Ms. Pac-Man to Mario Bros.
Check out Ms. Claudia (a Ms. Pac-Man clone): browse the code here, or play the v1.0 release.
I'm 100% sure you have already read about the worst part of using AI-generated code: polishing. On almost every social media platform, professionals and industry veterans mention this problem.
I spent a considerable number of hours trying to improve the in-game AI, with no success. The model:
- Kept repeating the same mistakes over and over
- Reverted existing code with new "solutions"
- Redid previous work
- Broke unrelated parts of the game
- Did not understand instructions
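For context, the kind of "in-game AI" at stake in a Ms. Pac-Man clone is ghost behavior. Below is a minimal sketch of the arcade original's well-documented chase targeting (Blinky aims at Pac-Man's tile, Pinky aims four tiles ahead) plus greedy tile-by-tile movement. All names here (`Ghost`, `chase_target`, `step`) are my own for illustration; this is not code from the project.

```python
from dataclasses import dataclass

@dataclass
class Ghost:
    pos: tuple    # current (x, y) tile
    prev: tuple   # previous tile; ghosts may not reverse direction

def chase_target(ghost_name, pacman_pos, pacman_dir):
    """Per-ghost chase target, following the arcade original's rules."""
    px, py = pacman_pos
    dx, dy = pacman_dir
    if ghost_name == "blinky":   # targets Pac-Man's tile directly
        return (px, py)
    if ghost_name == "pinky":    # targets 4 tiles ahead of Pac-Man
        return (px + 4 * dx, py + 4 * dy)
    raise ValueError(f"unknown ghost: {ghost_name}")

def step(ghost, target, walls):
    """Advance one tile: pick the legal neighbor closest to the target."""
    x, y = ghost.pos
    options = [
        (nx, ny)
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
        if (nx, ny) not in walls and (nx, ny) != ghost.prev
    ]
    best = min(options,
               key=lambda p: (p[0] - target[0])**2 + (p[1] - target[1])**2)
    ghost.prev, ghost.pos = ghost.pos, best
    return ghost.pos
```

Even this toy version shows why iterating on ghost AI with an LLM is fragile: the no-reversal rule, per-ghost targets, and mode switching all interact, so a "fix" to one ghost easily breaks another.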
We want to continue with the more complex game clones, including OS portability, and with a premium LLM subscription (if that makes a difference at all).