This model is fine-tuned from the LLaMA2 7B Chat model. Details are available on its Hugging Face page. Many text-to-image generation models fail to generate relevant images when the input prompt is in a non-English language. To solve this, we fine-tuned the LLaMA2 model on Indian languages (Telugu and Hindi). Our target was to handle four things:
- Multilingual prompts
- Misspelled prompts
- Insufficient prompts that need enhancement
- Lengthy prompts that need summarization and keyword extraction
MLTE (Multilingual Text Enhancer) is a text enhancement model developed primarily to enrich input text for text-to-image generation models. Existing text encoders in image generation models often have limited reach. The quality of a generated image depends on the prompt: if the prompt includes misspelled words, the model is likely to produce an irrelevant image, and most encoders are primarily English-based. MLTE addresses multilingual prompts, misspelled words, and overly verbose prompts, and it creatively enhances the prompt to get improved results.

MLTE is based on LLaMA2, and its ability to handle multiple languages makes it simple to incorporate content from diverse linguistic origins. Its spell checking and correction functions help keep the prompt correct and coherent. Its scene and text augmentation features strengthen the visual richness and coherence of generated images, improving their overall quality and realism. Its summarizing capability condenses long paragraphs into concise yet useful summaries, giving the image generation process more focused input. MLTE can be used with any text-to-image generation model.
- Hugging Face Repository: Hugging Face
- Paper: under review
- Demo:
Do not add role descriptions such as "Act as this ..." to the input. Give only the prompt.
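The usage rule above can be sketched as follows. This is a minimal illustration, not the authors' reference code. It assumes the model can be loaded with the Hugging Face `transformers` text-generation pipeline and that the raw prompt is passed without any chat template. `MODEL_ID` is a placeholder because the repository ID is not given here, and the helper names `load_generator` and `enhance_prompt` are hypothetical.

```python
from typing import Callable

# Placeholder: the actual Hugging Face repository ID is not given in this README.
MODEL_ID = "<mlte-model-repo-id>"


def load_generator(model_id: str = MODEL_ID, max_new_tokens: int = 128) -> Callable[[str], str]:
    """Return a function that maps a raw prompt to the text generated by the model."""
    # Imported lazily so the rest of this sketch works without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model_id)

    def generate(prompt: str) -> str:
        # return_full_text=False keeps only the newly generated text, not the input.
        outputs = pipe(prompt, max_new_tokens=max_new_tokens, return_full_text=False)
        return outputs[0]["generated_text"]

    return generate


def enhance_prompt(raw_prompt: str, generate: Callable[[str], str]) -> str:
    """Send the prompt as-is (no "Act as ..." preamble) and return the enhanced text."""
    prompt = raw_prompt.strip()
    if not prompt:
        raise ValueError("prompt must not be empty")
    return generate(prompt).strip()
```

With a real repository ID in place of the placeholder, `enhance_prompt(user_prompt, load_generator())` would return the enhanced text, which can then be passed to any text-to-image model as its prompt.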
As of 31/3/2024, the model gives better results for Hindi and underperforms for Telugu. We are working to improve it.