This project explores an unusual idea: transforming code features into art. It’s an ongoing experiment in using neural networks to visualize the structure and “feel” of code as images — not for aesthetics alone, but to find patterns and relationships hidden inside code representations.
I’m building and refining this project gradually, learning as I go. It’s not perfect, and that’s exactly why it’s here — to grow.
The model takes a numerical feature representation of code and generates images that reflect underlying structure and semantics. The generator is trained to preserve variance, similarity, and diversity across features, and then evaluated using several clustering and correlation metrics.
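To make the objective concrete, here is a minimal sketch of what a three-term loss of this kind can look like in PyTorch. The term formulations and weights are illustrative assumptions, not the project's actual loss:

```python
import torch
import torch.nn.functional as F

def code2image_loss(images, features, w_var=1.0, w_sim=1.0, w_div=0.1):
    """Illustrative multi-term objective (assumed form, not this repo's loss).

    images:   (B, C, H, W) generated images
    features: (B, D) code feature vectors conditioning the generator
    """
    flat = images.flatten(1)  # (B, C*H*W)

    # Variance preservation: keep per-pixel variance from collapsing.
    var_loss = F.relu(1.0 - flat.var(dim=0)).mean()

    # Similarity preservation: pairwise distances in image space should
    # track pairwise distances in code-feature space.
    d_img = torch.cdist(flat, flat)
    d_feat = torch.cdist(features, features)
    sim_loss = F.mse_loss(
        d_img / (d_img.mean() + 1e-8),
        d_feat / (d_feat.mean() + 1e-8),
    )

    # Diversity: penalize generated images that sit too close together.
    off_diag = d_img + torch.eye(len(flat), device=flat.device) * 1e6
    div_loss = F.relu(1.0 - off_diag.min(dim=1).values).mean()

    return w_var * var_loss + w_sim * sim_loss + w_div * div_loss
```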
The included evaluator (CodeImageEvaluator) measures:
- Clustering Quality: how well the generated images group by code type
- Similarity Preservation: how code similarity translates into image similarity
- Diversity: visual and statistical diversity of outputs
- t-SNE Visualization: 2D projection of generated images for visual inspection
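For reference, here is a minimal sketch of how metrics like these are typically computed with scikit-learn and SciPy. The function below is an illustrative assumption, not the actual `CodeImageEvaluator` implementation (the t-SNE projection, via `sklearn.manifold.TSNE`, is omitted for brevity):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import (
    silhouette_score, calinski_harabasz_score, davies_bouldin_score,
)

def evaluate_images(images, features, labels):
    """Sketch of the metric computations (assumed, not the repo's code).

    images:   (N, C, H, W) generated images as a NumPy array
    features: (N, D) code feature vectors
    labels:   (N,) code-type labels (e.g. language or construct)
    """
    flat = images.reshape(len(images), -1)

    # Clustering quality: do images group by code type?
    clustering = {
        "silhouette": silhouette_score(flat, labels),
        "calinski_harabasz": calinski_harabasz_score(flat, labels),
        "davies_bouldin": davies_bouldin_score(flat, labels),
    }

    # Similarity preservation: correlate pairwise distances in
    # feature space with pairwise distances in image space.
    d_feat, d_img = pdist(features), pdist(flat)
    similarity = {
        "pearson": pearsonr(d_feat, d_img)[0],
        "spearman": spearmanr(d_feat, d_img)[0],
    }

    # Diversity: variance within each image, mean distance between images.
    diversity = {
        "within_image_variance": flat.var(axis=1).mean(),
        "between_image_diversity": d_img.mean(),
    }
    return clustering, similarity, diversity
```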
You can try the model interactively on Hugging Face Spaces.
A simple demo lets you:
- Input your own code
- Watch it generate an image instantly
- Explore how feature changes affect the final output
Live Demo: https://huggingface.co/spaces/munjed/code2art
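The Space can also be called programmatically with the `gradio_client` package. This is a hedged sketch: the endpoint name `/predict` is an assumption, so check the Space's "Use via API" tab for the real signature.

```python
# pip install gradio_client
from gradio_client import Client

client = Client("munjed/code2art")  # connect to the Space

snippet = "def add(a, b):\n    return a + b\n"

# api_name is an assumption; the Space's "Use via API" tab
# shows the actual endpoint name and argument order.
result = client.predict(snippet, api_name="/predict")
print(result)  # typically a path to the generated image
```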
Current evaluation metrics:

| Metric | Value |
|---|---|
| Silhouette Score | 0.2800 |
| Calinski-Harabasz | 1526.63 |
| Davies-Bouldin | 1.52 |
| Pearson Correlation | 0.92 |
| Spearman Correlation | 0.80 |
| Within-image Variance | 0.40 |
| Between-image Diversity | 361.50 |
The model currently shows strong similarity preservation (Pearson 0.92, Spearman 0.80), meaning it captures relationships between code samples well, but clustering quality and visual clarity still need work.
- Generated images lack strong structural coherence — they sometimes look abstract or noisy.
- Loss stability issues — balancing variance and reconstruction terms is tricky.
- Feature variance preservation can explode or vanish depending on learning rate and scaling.
- Higher resolutions (e.g. 256x256) increase complexity quickly, sometimes degrading output quality.
- Code feature extraction needs further improvement to build the feature datasets we need across different languages.
- Stabilize training using better loss normalization and dynamic weighting (see the sketch after this list).
- Experiment with pretrained visual priors or diffusion-based conditioning.
- Possibly integrate contrastive learning or VAE-like embeddings.
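On the loss-normalization point, one generic recipe is to scale each loss term by a running average of its own magnitude, so no single term dominates as scales drift during training. A minimal sketch in PyTorch; this is a standard stabilization trick, not code from this repository:

```python
import torch

class RunningLossNormalizer:
    """Scale each loss term by a running average of its magnitude so the
    combined objective stays balanced as individual terms drift in scale.
    (Generic stabilization sketch, not this project's implementation.)"""

    def __init__(self, names, momentum=0.99):
        self.momentum = momentum
        self.avg = {name: None for name in names}

    def combine(self, losses, weights):
        total = 0.0
        for name, loss in losses.items():
            value = loss.detach().abs()
            if self.avg[name] is None:
                self.avg[name] = value
            else:
                self.avg[name] = (
                    self.momentum * self.avg[name]
                    + (1 - self.momentum) * value
                )
            # Divide by the running magnitude so each term stays ~O(1).
            total = total + weights[name] * loss / (self.avg[name] + 1e-8)
        return total

# Usage inside a training step (term names are illustrative):
# normalizer = RunningLossNormalizer(["variance", "similarity", "diversity"])
# total = normalizer.combine(
#     {"variance": var_loss, "similarity": sim_loss, "diversity": div_loss},
#     {"variance": 1.0, "similarity": 1.0, "diversity": 0.1},
# )
# total.backward()
```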
MIT — feel free to use, modify, and build upon it.

