### **Packages:**

In [1]:
!pip install transformers
!pip install gradio
!pip install torch
!pip install datasets
!pip install diffusers
!pip install uvicorn

Collecting gradio
  Downloading gradio-4.44.0-py3-none-any.whl.metadata (15 kB)
Collecting aiofiles<24.0,>=22.0 (from gradio)
  Downloading aiofiles-23.2.1-py3-none-any.whl.metadata (9.7 kB)
Collecting fastapi<1.0 (from gradio)
  Downloading fastapi-0.115.0-py3-none-any.whl.metadata (27 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.4.0-py3-none-any.whl.metadata (2.9 kB)
Collecting gradio-client==1.3.0 (from gradio)
  Downloading gradio_client-1.3.0-py3-none-any.whl.metadata (7.1 kB)
Collecting httpx>=0.24.1 (from gradio)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting orjson~=3.0 (from gradio)
  Downloading orjson-3.10.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (50 kB)
Collecting pydub (from gradio)
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting python-multipart>=0.0.9 (from g

In [2]:
from transformers import pipeline
from transformers.utils import logging
from datasets import load_dataset
import gradio as gr
import torch
import uvicorn
from diffusers import DiffusionPipeline

### **Suppress warning messages and check GPU availability:**

In [None]:
# logging.set_verbosity_error()
# device = 0 if torch.cuda.is_available() else -1
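
The commented-out cell above hints at GPU selection, but none of the pipelines in this notebook receive a `device` argument, which is why the loader reports that each model will run on the CPU. A minimal sketch of how the choice could be wired through, assuming the integer convention accepted by `transformers.pipeline` (`0` for the first GPU, `-1` for the CPU). `pick_device` and `pipeline_kwargs` are hypothetical helper names, not part of the notebook:

```python
def pick_device(cuda_available: bool) -> int:
    """Map GPU availability to a pipeline device index (0 = first GPU, -1 = CPU)."""
    return 0 if cuda_available else -1


def pipeline_kwargs(cuda_available: bool) -> dict:
    """Build the extra keyword arguments to pass to transformers.pipeline(...)."""
    return {"device": pick_device(cuda_available)}


# In the notebook this would be called with torch.cuda.is_available(), e.g.
#   pipe_ar = pipeline("text-generation", model="akhooli/ap2023",
#                      **pipeline_kwargs(torch.cuda.is_available()))
print(pipeline_kwargs(True))   # GPU present
print(pipeline_kwargs(False))  # CPU only
```

Keeping the decision in one helper means every pipeline in the notebook can share the same setting.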

### **Arabic: Text-Generation:**
The GPT2-Arabic-Poetry-2023 model (`akhooli/ap2023`) is used to generate poetry in Arabic.

In [3]:
pipe_ar = pipeline('text-generation', framework='pt', model='akhooli/ap2023', tokenizer='akhooli/ap2023')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


### **English: Text-Generation:**
The GPT2-Poet model (`ashiqabdulkhader/GPT2-Poet`) is used to generate poetry in English.

In [4]:
pipe_en = pipeline("text-generation", model="ashiqabdulkhader/GPT2-Poet")

All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at ashiqabdulkhader/GPT2-Poet.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


### **Arabic and English: Text-To-Speech:**
*   Massively Multilingual Speech: The mms-tts-ara model is used to convert Arabic poetry into speech.
*   SpeechT5 (TTS task): The SpeechT5 model is used to convert English poetry into speech.




In [5]:
# Initialize text-to-speech models for Arabic and English
# Arabic: text-to-speech
synthesiser_arabic = pipeline("text-to-speech", model="facebook/mms-tts-ara")

# English: text-to-speech
synthesiser_english = pipeline("text-to-speech", model="microsoft/speecht5_tts")
embeddings_dataset_english = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embedding_english = torch.tensor(embeddings_dataset_english[7306]["xvector"]).unsqueeze(0)

Some weights of the model checkpoint at facebook/mms-tts-ara were not used when initializing VitsModel: ['flow.flows.0.wavenet.in_layers.0.weight_g', 'flow.flows.0.wavenet.in_layers.0.weight_v', 'flow.flows.0.wavenet.in_layers.1.weight_g', 'flow.flows.0.wavenet.in_layers.1.weight_v', 'flow.flows.0.wavenet.in_layers.2.weight_g', 'flow.flows.0.wavenet.in_layers.2.weight_v', 'flow.flows.0.wavenet.in_layers.3.weight_g', 'flow.flows.0.wavenet.in_layers.3.weight_v', 'flow.flows.0.wavenet.res_skip_layers.0.weight_g', 'flow.flows.0.wavenet.res_skip_layers.0.weight_v', 'flow.flows.0.wavenet.res_skip_layers.1.weight_g', 'flow.flows.0.wavenet.res_skip_layers.1.weight_v', 'flow.flows.0.wavenet.res_skip_layers.2.weight_g', 'flow.flows.0.wavenet.res_skip_layers.2.weight_v', 'flow.flows.0.wavenet.res_skip_layers.3.weight_g', 'flow.flows.0.wavenet.res_skip_layers.3.weight_v', 'flow.flows.1.wavenet.in_layers.0.weight_g', 'flow.flows.1.wavenet.in_layers.0.weight_v', 'flow.flows.1.wavenet.in_layers.1.wei

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


### **English Text-To-Image:**
The runwayml/stable-diffusion-v1-5 model is used to generate an image for the poem from its starter sentence.

In [6]:
pipe_image = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

### **Translator from Arabic to English:**
Since the text-to-image model doesn't support Arabic, the Arabic starter sentence is translated into English with the opus-mt-ar-en model before the image is generated.

In [7]:
pipe_translator = pipeline("translation", model="Helsinki-NLP/opus-mt-ar-en")

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


### **Primary Function:**
This function receives two inputs from the Gradio interface, the selected language and the starter sentence, and returns three outputs:

1. The generated poem.
2. The audio of the poem read aloud.
3. The image generated from the starter sentence.

In [8]:
# Generate a poem in the selected language, then convert it to audio and an image
def generate_poem(selected_language, text):
    if selected_language == "English":
        poem = generate_poem_english(text)  # Generate the English poem
        sampling_rate, audio_data = text_to_speech_english(poem)  # Read the poem aloud
        image = generate_image_from_poem(text)  # Generate the image from the starter sentence
    elif selected_language == "Arabic":
        poem = generate_poem_arabic(text)  # Generate the Arabic poem
        sampling_rate, audio_data = text_to_speech_arabic(poem)  # Read the poem aloud
        translated_text = translate_arabic_to_english(text)  # Translate the Arabic starter sentence into English
        image = generate_image_from_poem(translated_text)  # Generate the image from the translated starter
    else:
        # Without this branch, an unselected dropdown would raise an UnboundLocalError at the return
        raise gr.Error("Please select a language.")

    return poem, (sampling_rate, audio_data), image
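
The control flow of `generate_poem` can be tried out without loading any model. The sketch below replaces the five helper functions with stand-in stubs (hypothetical, returning labelled strings and a placeholder sampling rate) to show which input each stage receives: the audio is synthesized from the generated poem, while the image is generated from the starter sentence, translated first for Arabic. Raising a `ValueError` for an unrecognized language is an assumption of this sketch.

```python
# Stand-in stubs (hypothetical): each returns a label instead of running a model.
def generate_poem_english(text):
    return f"poem_en({text})"

def generate_poem_arabic(text):
    return f"poem_ar({text})"

def text_to_speech_english(text):
    return 16000, f"audio_en({text})"  # placeholder sampling rate

def text_to_speech_arabic(text):
    return 16000, f"audio_ar({text})"  # placeholder sampling rate

def translate_arabic_to_english(text):
    return f"translated({text})"

def generate_image_from_poem(text):
    return f"image({text})"

def generate_poem(selected_language, text):
    if selected_language == "English":
        poem = generate_poem_english(text)
        sampling_rate, audio_data = text_to_speech_english(poem)
        image = generate_image_from_poem(text)
    elif selected_language == "Arabic":
        poem = generate_poem_arabic(text)
        sampling_rate, audio_data = text_to_speech_arabic(poem)
        image = generate_image_from_poem(translate_arabic_to_english(text))
    else:
        raise ValueError(f"Unsupported language: {selected_language!r}")
    return poem, (sampling_rate, audio_data), image

poem, (rate, audio), image = generate_poem("Arabic", "start")
print(poem)   # poem_ar(start)
print(audio)  # audio_ar(poem_ar(start))
print(image)  # image(translated(start))
```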

### **Poem Generation Function:**
These functions generate a poem (text) in either Arabic or English, continuing the starter sentence provided as input.

In [9]:
# Poem generation for Arabic
def generate_poem_arabic(text):
    generated_text = pipe_ar(text, do_sample=True, max_length=96, top_k=50, top_p=1.0, temperature=1.0,
                             num_return_sequences=1, no_repeat_ngram_size=3,
                             return_full_text=True)[0]["generated_text"]
    clean_text = generated_text.replace("-", "")  # Remove the dashes generated by the model
    return clean_text

# Poem generation for English
def generate_poem_english(text):
    # Only the first sequence is used, so a single sequence is requested
    generated_text = pipe_en(text, do_sample=True, max_length=100, top_k=0, top_p=0.9, temperature=1.0,
                             num_return_sequences=1)[0]["generated_text"]
    clean_text = generated_text.replace("</s>", "")  # Remove the </s> markers generated by the model
    return clean_text
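
Both functions end with a small clean-up step: the Arabic model emits stray dashes and the English model emits `</s>` markers. A minimal sketch of that step as one reusable helper, assuming it is also acceptable to collapse leftover runs of spaces (the notebook code does not do this). `clean_generated_text` is a hypothetical name:

```python
def clean_generated_text(text: str, unwanted: tuple) -> str:
    """Remove unwanted marker strings, then collapse runs of spaces on each line."""
    for marker in unwanted:
        text = text.replace(marker, "")
    # Keep line breaks (verses), but tidy the spacing inside each line.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(lines).strip()


# English output: strip only the end-of-sequence marker so real hyphens survive.
print(clean_generated_text("Roses are red</s>\nViolets  are blue</s>", ("</s>",)))
# Arabic output would use ("-",) instead.
```

Passing the markers explicitly per language avoids removing hyphens from English words.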

### **Audio Function:**
These functions generate audio in either Arabic or English from the generated poem.

In [10]:
# Text-to-speech conversion for Arabic
def text_to_speech_arabic(text):
    speech = synthesiser_arabic(text)
    audio_data = speech["audio"][0]  # Flatten to 1D
    sampling_rate = speech["sampling_rate"]
    return (sampling_rate, audio_data)

# Text-to-speech conversion for English
def text_to_speech_english(text):
    speech = synthesiser_english(text, forward_params={"speaker_embeddings": speaker_embedding_english})
    audio_data = speech["audio"]
    sampling_rate = speech["sampling_rate"]
    return (sampling_rate, audio_data)
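
The two synthesizers return arrays of different shapes, which is why the Arabic function indexes `[0]` and the English one does not. A sketch of a shape-agnostic alternative using NumPy, assuming the audio is a mono floating-point waveform in the range [-1, 1] and that 16-bit PCM output is wanted. `to_gradio_audio` is a hypothetical helper:

```python
import numpy as np


def to_gradio_audio(sampling_rate, audio):
    """Return (sampling_rate, samples) with samples as a 1-D int16 array."""
    # Flatten (1, N) or (N,) input to (N,); assumes a single (mono) channel.
    samples = np.asarray(audio, dtype=np.float32).reshape(-1)
    # Guard against values slightly outside the expected range.
    samples = np.clip(samples, -1.0, 1.0)
    return int(sampling_rate), (samples * 32767).astype(np.int16)


rate, samples = to_gradio_audio(16000, [[0.0, 0.5, -1.0, 2.0]])
print(rate, samples.shape, samples.dtype)
```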

### **Image Function:**
This function generates an image from the English text it is given, which is the poem's starter sentence.


In [11]:
#Image Function
def generate_image_from_poem(poem_text):
    image = pipe_image(poem_text).images[0]
    return image
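
Stable Diffusion v1.5 encodes prompts with a CLIP text encoder that has a fixed context of 77 tokens, so long prompts are cut off. A rough sketch of shortening a prompt before passing it to `generate_image_from_poem`, assuming a word count is an acceptable stand-in for the real token count (it is only an approximation; the tokenizer gives the exact figure). `shorten_prompt` and the 50-word budget are hypothetical:

```python
def shorten_prompt(text: str, max_words: int = 50) -> str:
    """Keep the first max_words whitespace-separated words of the prompt."""
    words = text.split()
    return " ".join(words[:max_words])


print(shorten_prompt("The shining sun rises over the calm ocean", max_words=4))
# The shining sun rises
```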

### **Translation Function:**
This function translates Arabic text into English so that it can be passed to the image function, which only accepts English input.

In [12]:
#Translation Function from Arabic to English
def translate_arabic_to_english(text):
    translated_text = pipe_translator(text)[0]['translation_text']
    return translated_text
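
Translation is only needed when the starter sentence is in Arabic. The notebook decides this from the dropdown value. A sketch of an alternative check based on the text itself, assuming the basic Arabic Unicode block (U+0600 to U+06FF) is sufficient for the inputs in question. `contains_arabic` is a hypothetical helper:

```python
def contains_arabic(text: str) -> bool:
    """Return True if any character falls in the basic Arabic Unicode block."""
    return any("\u0600" <= ch <= "\u06FF" for ch in text)


print(contains_arabic("الورود تتفتح في الربيع"))  # True
print(contains_arabic("The night sky"))           # False
```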

### **CSS Styling:**

In [13]:
custom_css = """
body {
    background-color: #f4f4f9;
    color: #333;
}
.gradio-container {
    border-radius: 10px;
    box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
    background-color: #fff;
}
label {
    color: #4A90E2;
    font-weight: bold;
}

input[type="text"],
textarea {
    border: 1px solid #4A90E2;
}
textarea {
    height: 150px;
}

button {
    background-color: #4A90E2;
    color: #fff;
    border-radius: 5px;
    cursor: pointer;
}
button:hover {
    background-color: #357ABD;
}

.dropdown {
    border: 1px solid #4A90E2;
    border-radius: 4px;
}

"""

### **Examples for Gradio:**
Four predefined inputs demonstrate how the interface works:


In [14]:
examples = [
    # The first value is for the dropdown menu; the second is the starter of the poem
    ["English", "The shining sun rises over the calm ocean"],
    ["Arabic", "الورود تتفتح في الربيع"],
    ["English", "The night sky is filled with stars and dreams"],
    ["Arabic", "اشعة الشمس المشرقة"]
]
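
Each example row must line up with the interface inputs: a language that the dropdown offers, followed by a non-empty starter sentence. A small sanity check along those lines, with `SUPPORTED_LANGUAGES` and `validate_examples` as hypothetical names:

```python
SUPPORTED_LANGUAGES = ("English", "Arabic")


def validate_examples(rows):
    """Raise ValueError if a row is not [supported language, non-empty text]."""
    for index, row in enumerate(rows):
        if len(row) != 2:
            raise ValueError(f"Row {index} must have exactly 2 values, got {len(row)}")
        language, starter = row
        if language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"Row {index}: unsupported language {language!r}")
        if not starter.strip():
            raise ValueError(f"Row {index}: starter sentence is empty")
    return True


examples = [
    ["English", "The shining sun rises over the calm ocean"],
    ["Arabic", "الورود تتفتح في الربيع"],
]
print(validate_examples(examples))
```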

### **Gradio Interface:**
Create a Gradio interface that generates a poem, reads it aloud, and generates an image from the poem's starter sentence.

In [15]:
my_model = gr.Interface(
    fn=generate_poem,  # Primary function that receives the inputs (language and poem starter)
    inputs=[
        gr.Dropdown(["English", "Arabic"], label="Select Language"),  # Dropdown to choose the poem's language
        gr.Textbox(label="Enter a sentence"),  # Textbox for the sentence or phrase that starts the poem
    ],
    outputs=[
        gr.Textbox(label="Generated Poem", lines=10),  # Displays the generated poem
        gr.Audio(label="Generated Audio", type="numpy"),  # Plays the poem read aloud
        gr.Image(label="Generated Image"),  # Displays the image generated from the poem starter
    ],
    examples=examples,  # Predefined examples that show how to use the interface
    css=custom_css,  # Apply the custom CSS
)
my_model.launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://f185e20a632c76d51c.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


