We initiated a machine learning experiment aimed at generating card images and text inspired by the iconic card game created by Kazuki Takahashi, characterized by themes of grace, power, and peace.
Our project builds on the diverse visual landscape of Yu-Gi-Oh! cards, with their vibrant color schemes and distinct archetypes. In contrast to the more uniform designs of Pokémon cards, this diversity makes an excellent dataset for refining image generation techniques.
We aim to push the boundaries of Generative Adversarial Network (GAN) technology, exploring its potential to produce intricate and varied visual elements. While GANs have proven effective at generating detailed images, there remains room for improvement, particularly in handling complex visual data like Yu-Gi-Oh! cards. Through this exploration, we seek to contribute to advances in both the AI sector and the gaming industry.
Our objective is to enhance text-to-image and image captioning models to generate images closely resembling those in our training dataset, with a focus on realism and naturalness.
1. Refining Yu-Gi-Oh! Card Image Generation via GAN Model Optimization
How can we adjust GAN model parameters and strategically curate the dataset to improve the quality of generated Yu-Gi-Oh! card images?
2. Influence of Archetypes on Generation Quality
To what extent does integrating archetype control mechanisms into the training dataset improve the fidelity of generated images and accompanying textual content?
3. Precision Enhancement through Targeted Data Categorization
Can precise categorization of card attributes, particularly by archetype, in both image and textual datasets empower existing GAN and Natural Language Processing (NLP) models to produce more accurate and comprehensive outputs?
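To make the third question concrete, categorizing cards by archetype amounts to bucketing each record under its archetype label before training. The sketch below is a hypothetical illustration; the field names mirror the public card API schema, and the project's actual pipeline is not shown in this document.

```python
from collections import defaultdict

def group_by_archetype(cards):
    """Bucket card records by their archetype label for targeted training."""
    groups = defaultdict(list)
    for card in cards:
        # Cards without an archetype fall into a catch-all bucket.
        groups[card.get("archetype", "none")].append(card)
    return groups

# Placeholder records; real records also carry id, name, desc, and image URLs.
cards = [
    {"name": "Blue-Eyes White Dragon", "archetype": "Blue-Eyes"},
    {"name": "Blue-Eyes Alternative White Dragon", "archetype": "Blue-Eyes"},
    {"name": "Kuriboh"},  # no archetype field
]
groups = group_by_archetype(cards)
print(sorted(groups))  # ['Blue-Eyes', 'none']
```

Grouping this way lets both the image model and the text model see archetype membership as an explicit label rather than an implicit visual pattern.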
We initially sourced 10,763 English Yu-Gi-Oh! cards through the game's official API, keeping our use within fair-use bounds. This dataset included the original card images along with associated details such as card ID, name, and description. We also extracted the cropped artwork from within each card to train our GAN and stable diffusion models. Focusing on archetype testing, we finalized the dataset at 4,252 training images and 1,064 testing images. While we refrain from redistributing the raw card data due to potential ownership concerns, we have published the processed images as a Hugging Face dataset to support research on image generation with GANs, particularly in the context of Yu-Gi-Oh! cards.
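The train/test split described above can be sketched as a simple shuffled partition. This is an illustrative sketch, not the project's actual code; the exact filtering and ratio used to arrive at 4,252/1,064 are assumptions (the counts below assume a roughly 80/20 split of the 5,316 finalized images).

```python
import random

def split_cards(cards, test_ratio=0.2, seed=42):
    """Shuffle card records deterministically and split them into train/test sets."""
    rng = random.Random(seed)
    shuffled = cards[:]  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder records; real records carry id, name, desc, archetype, and image paths.
cards = [{"id": i} for i in range(5316)]
train, test = split_cards(cards)
print(len(train), len(test))  # 4253 1063 — close to the 4,252 / 1,064 split above
```

Seeding the shuffle keeps the split reproducible across runs, which matters when comparing fine-tuned model variants against the same held-out images.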
Leveraging insights gained from exploratory data analysis (EDA) on both image and text data, as well as our understanding of GAN and stable diffusion models, we developed a Streamlit app to evaluate the performance of our fine-tuned models.
$ conda init
$ pip install -r requirements.txt
$ streamlit run app.py