<a href="https://colab.research.google.com/github/leaBroe/Deep_Learning_in_Python/blob/main/deep_learning_python_winter_school_24.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Project Overview:

**Abstract:**


### Multi-modal AI Application


### Fulfilling the Criteria


### Gradio App


### Model Card

For your sentiment analysis model, your model card should include:

- **Model Details**: This app incorporates three different models in order to perform sentiment analysis from text images. We first use the python-tesseract OCR tool from the `pytesseract` package to extract text from images. We then perform sentiment analysis and topic classification using `distilbert-base-uncased-finetuned-sst-2-english`, which is available on Hugging Face. Because this model only works with english text, we also use the `m2m100_418M` model from Meta (also available on HuggingFace) to translate the input text when it is not in English.
- **Data**: The distilbert SST2 model is based on the BERT transformer from Google, which was trained on large amounts of english text (from Wikipedia amongst others) in a self-supervised fashion. It was then fine-tuned on the Stanford Sentiment Treebank to enhance performance on sentiment analysis tasks.
- **Performance**: The fine-tuned distilbert model achieves a good score of 91.3 on the GLUE benchmark, but as we will discuss later, might struggle with some specific subtasks/topics. The overall pipeline's performance also depends on the language of the input text. The Many-to-Many translation model used here doesn't perform as well when translating Wolof than when translating German, for example.
- **Ethical Considerations**: Developers highlight the model producing biased predictions which affect underrepresented populations. As the SST 2 dataset was sourced from movie reviews on Rotten Tomatoes, many of the statements on which the fine-tuning is based on contain judgement on the way (American) movies in particular portray one topic or another. Our focus on analyzing sentiment around environmental issues makes it quite likely for geographical information to be of relevance, which may skew results significantly.

### Outlook

Given an extra month, you might consider:

- **Expanding Data Sources**: Including more diverse sources of images and text, such as news articles or blogs, to enrich the analysis.
- **Model Fine-Tuning**: Fine-tuning the sentiment analysis model on a dataset specifically related to environmental discourse to improve accuracy and relevance.
- **Feature Expansion**: Adding functionality to track sentiment trends over time, enabling longitudinal studies on public sentiment toward environmental issues.

In [1]:
!sudo apt install tesseract-ocr

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  tesseract-ocr-eng tesseract-ocr-osd
The following NEW packages will be installed:
  tesseract-ocr tesseract-ocr-eng tesseract-ocr-osd
0 upgraded, 3 newly installed, 0 to remove and 35 not upgraded.
Need to get 4,816 kB of archives.
After this operation, 15.6 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 tesseract-ocr-eng all 1:4.00~git30-7274cfa-1.1 [1,591 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/universe amd64 tesseract-ocr-osd all 1:4.00~git30-7274cfa-1.1 [2,990 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy/universe amd64 tesseract-ocr amd64 4.1.1-2.1build1 [236 kB]
Fetched 4,816 kB in 1s (9,244 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debc

In [2]:
!pip install pytesseract
#pytesseract.pytesseract.tesseract_cmd = r'/usr/local/bin/pytesseract'

Collecting pytesseract
  Downloading pytesseract-0.3.10-py3-none-any.whl (14 kB)
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.10


In [3]:
# After installing Tesseract OCR, you need to inform pytesseract of the executable path.
# In Google Colab, it's usually not necessary to manually set the Tesseract path as the apt installation puts
# it in a standard location that pytesseract can automatically find. However, if you do run into issues where
# pytesseract can't find the Tesseract executable, you can set the path manually like this:

import pytesseract
pytesseract.pytesseract.tesseract_cmd = (r'/usr/bin/tesseract')


In [4]:
!pip install pytesseract transformers pillow



In [5]:
!pip install opencv-python-headless pillow



In [6]:
!pip install langdetect


Collecting langdetect
  Downloading langdetect-1.0.9.tar.gz (981 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m981.5/981.5 kB[0m [31m6.4 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: langdetect
  Building wheel for langdetect (setup.py) ... [?25l[?25hdone
  Created wheel for langdetect: filename=langdetect-1.0.9-py3-none-any.whl size=993225 sha256=0a6edf4e14b341ffc84539861772efe244436e04eb38e9844f12a0515e647e4f
  Stored in directory: /root/.cache/pip/wheels/95/03/7d/59ea870c70ce4e5a370638b5462a7711ab78fba2f655d05106
Successfully built langdetect
Installing collected packages: langdetect
Successfully installed langdetect-1.0.9


In [7]:
from google.colab import files
uploaded = files.upload()

# Assuming you now have the file, let's say "image.jpg", uploaded
image_path = next(iter(uploaded))  # This gets the name of the uploaded file


Saving deutsch.png to deutsch.png


In [8]:
import cv2
import numpy as np

def binarize_image(image_path):
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    return binary



In [9]:
from PIL import Image
import pytesseract

# Assuming you have an image named 'example.jpg' in your Colab workspace
image = Image.open(image_path)
binarized_image = binarize_image(image_path)
extracted_text_binarized = pytesseract.image_to_string(binarized_image)
extracted_text = pytesseract.image_to_string(image)
print(extracted_text, extracted_text_binarized)


Dieser Text ist deutsch
 Dieser Text ist deutsch]



In [10]:
import pytesseract
from PIL import Image
from transformers import pipeline
from langdetect import detect

# Load the sentiment analysis pipeline
sentiment_pipeline = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

# Function to extract text from image using pytesseract
def extract_text_from_image(image_path):
    image = Image.open(image_path)
    extracted_text = pytesseract.image_to_string(image)
    return extracted_text.strip()

# Function to detect the language of the text
def detect_language(text):
    try:
        return detect(text)
    except Exception as e:
        print(f"Error detecting language: {e}")
        return None

# Function to get sentiment from text
def get_sentiment(text):
    results = sentiment_pipeline(text)
    return results

# Example usage
#image_path = "path/to/your/image.jpg"  # Make sure to update this path

# Extract text from the image
extracted_text = extract_text_from_image(image_path)
print(f"Extracted Text: {extracted_text}")

# Detect the language of the extracted text
text_language = detect_language(extracted_text)

# Perform sentiment analysis if the text is in English
if text_language == 'en':
    sentiment_result = get_sentiment(extracted_text)
    print(f"Sentiment Analysis Result: {sentiment_result}")
else:
    print("Sentiment analysis not supported for this language (text has to be in English).")



The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Extracted Text: Dieser Text ist deutsch
Sentiment analysis not supported for this language (text has to be in English).


In [11]:
import pytesseract
from PIL import Image
from transformers import pipeline
from langdetect import detect

# Load the sentiment analysis pipeline
sentiment_pipeline = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

# Load the emotion detection pipeline
emotion_pipeline = pipeline("text-classification", model="bhadresh-savani/distilbert-base-uncased-emotion")

# Function to extract text from image using pytesseract
def extract_text_from_image(image_path):
    image = Image.open(image_path)
    extracted_text = pytesseract.image_to_string(image)
    return extracted_text.strip()


# Function to detect the language of the text
def detect_language(text):
    try:
        return detect(text)
    except Exception as e:
        print(f"Error detecting language: {e}")
        return None

# Function to get sentiment from text
def get_sentiment(text):
    results = sentiment_pipeline(text)
    return results

# Function to get emotion from text
def get_emotion(text):
    results = emotion_pipeline(text)
    return results


# Extract text from the image
extracted_text = extract_text_from_image(image_path)
print(f"Extracted Text: {extracted_text}")

# Detect the language of the extracted text
text_language = detect_language(extracted_text)

# Perform sentiment analysis if the text is in English
if text_language == 'en':
    sentiment_result = get_sentiment(extracted_text)
    print(f"Sentiment Analysis Result: {sentiment_result}")

    # Perform emotion detection on the text
    emotion_result = get_emotion(extracted_text)
    print(f"Emotion Analysis Result: {emotion_result}")
else:
    print("Sentiment and emotion analysis not supported for this language or text is not in English.")


config.json:   0%|          | 0.00/768 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/268M [00:00<?, ?B/s]

  return self.fget.__get__(instance, owner)()


tokenizer_config.json:   0%|          | 0.00/291 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

Extracted Text: Dieser Text ist deutsch
Sentiment and emotion analysis not supported for this language or text is not in English.


In [12]:
!pip install transformers --upgrade




In [13]:
from transformers import pipeline

# Initialize the translation pipeline
# This example uses a multi-language model. You can replace it with a specific model if preferred.

# old language model, not as good
translation_pipeline = pipeline("translation", model="facebook/m2m100_418M")

# potentially better model
#from transformers import NllbTokenizer
#translation_pipeline = pipeline("translation", model="facebook/nllb-200-distilled-600M")


config.json:   0%|          | 0.00/908 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/1.94G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/233 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/298 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/3.71M [00:00<?, ?B/s]

sentencepiece.bpe.model:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/1.14k [00:00<?, ?B/s]

In [14]:
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
#from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/m2m100_418M"

# Initialize tokenizer and model

#tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")

tokenizer = M2M100Tokenizer.from_pretrained(model_name)

model = M2M100ForConditionalGeneration.from_pretrained(model_name)
#model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")

# Bookmark

In [15]:
import pytesseract
from PIL import Image
from transformers import pipeline
from langdetect import detect

# Load the pipelines
sentiment_pipeline = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
emotion_pipeline = pipeline("text-classification", model="bhadresh-savani/distilbert-base-uncased-emotion")
translation_pipeline = pipeline("translation", model="facebook/m2m100_418M")
#translation_pipeline = pipeline("translation", model="Helsinki-NLP/opus-mt-multilingual-en")




# Function to extract text from image using pytesseract
def extract_text_from_image(image_path):
    image = Image.open(image_path)
    extracted_text = pytesseract.image_to_string(image)
    return extracted_text.strip()

def detect_language(extracted_text):
    detected_lang = detect(extracted_text)
    print(f"Detected language: {detected_lang}")  # For debugging
    return detected_lang

#def translate_text_to_english(extracted_text):
    # The facebook/m2m100_418M model automatically detects the source language
    #translated_text = translation_pipeline(extracted_text, forced_bos_token_id=translation_pipeline.tokenizer.get_lang_id("en"))[0]['translation_text']
    #return translated_text

def translate_text_to_english(text, detected_language):
    # The M2M100 model requires setting the forced_bos_token_id to specify the target language
    tokenizer.src_lang = detected_language  # Example: Assume the source language is French; this should be dynamically determined if possible
    encoded = tokenizer(text, return_tensors="pt")
    # Set the target language ID for English
    #generated_tokens = model.generate(**encoded, forced_bos_token_id=tokenizer.lang_code_to_id["deu_Latn"], max_length=30)
    #translated_text = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
    generated_tokens = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("en"))
    translated_text = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
    return translated_text


# Function to get sentiment from text
def get_sentiment(extracted_text):
    results = sentiment_pipeline(translated_text)
    return results

# Function to get emotion from text
def get_emotion(extracted_text):
    results = emotion_pipeline(translated_text)
    return results


# Extract and translate text from the image
extracted_text = extract_text_from_image(image_path)
detected_language = detect_language(extracted_text)
translated_text = translate_text_to_english(extracted_text, detected_language)
print(f"Translated Text: {translated_text}")

# Perform sentiment and emotion analysis on the translated text
sentiment_result = get_sentiment(translated_text)
print(f"Sentiment Analysis Result: {sentiment_result}")

emotion_result = get_emotion(translated_text)
print(f"Emotion Analysis Result: {emotion_result}")


Detected language: de
Translated Text: This text is German.
Sentiment Analysis Result: [{'label': 'POSITIVE', 'score': 0.5755502581596375}]
Emotion Analysis Result: [{'label': 'joy', 'score': 0.4265340268611908}]


In [16]:
!pip install gradio

Collecting gradio
  Downloading gradio-4.20.1-py3-none-any.whl (17.0 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m17.0/17.0 MB[0m [31m46.6 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting aiofiles<24.0,>=22.0 (from gradio)
  Downloading aiofiles-23.2.1-py3-none-any.whl (15 kB)
Collecting fastapi (from gradio)
  Downloading fastapi-0.110.0-py3-none-any.whl (92 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m92.1/92.1 kB[0m [31m9.6 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting ffmpy (from gradio)
  Downloading ffmpy-0.3.2.tar.gz (5.5 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting gradio-client==0.11.0 (from gradio)
  Downloading gradio_client-0.11.0-py3-none-any.whl (308 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m308.2/308.2 kB[0m [31m32.4 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting httpx>=0.24.1 (from gradio)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
[2K     [90m━━━━━━━━━━━━━━━━━

In [17]:
from google.colab import files
uploaded = files.upload()

# Assuming you now have the file, let's say "image.jpg", uploaded
image_path = next(iter(uploaded))  # This gets the name of the uploaded file

Saving happy_note.png to happy_note.png


In [19]:

def process_image(image):
    extracted_text = extract_text_from_image(image)
    print(f"Extracted Text: {extracted_text}")  # Debug print

    detected_language = detect_language(extracted_text)

    translated_text = translate_text_to_english(extracted_text, detected_language)
    print(f"Translated Text: {translated_text}")  # Debug print

    sentiment_result = get_sentiment(translated_text)
    print(f"Sentiment: {sentiment_result}")  # Debug print

    emotion_result = get_emotion(translated_text)
    print(f"Emotion: {emotion_result}")  # Debug print

    return extracted_text, translated_text, sentiment_result, emotion_result

process_image(image_path)

Extracted Text: This is a happy note!
Detected language: en
Translated Text: This is a happy note!
Sentiment: [{'label': 'POSITIVE', 'score': 0.5755502581596375}]
Emotion: [{'label': 'joy', 'score': 0.4265340268611908}]


('This is a happy note!',
 'This is a happy note!',
 [{'label': 'POSITIVE', 'score': 0.5755502581596375}],
 [{'label': 'joy', 'score': 0.4265340268611908}])

In [20]:
import gradio as gr

# Your existing function for extracting text from an image file
def extract_text_from_image(image_path):
    from PIL import Image
    import pytesseract
    image = Image.open(image_path)
    extracted_text = pytesseract.image_to_string(image)
    return extracted_text.strip()

def translate_text_to_english(text):
    # The M2M100 model requires setting the forced_bos_token_id to specify the target language
    tokenizer.src_lang = "fr"  # Example: Assume the source language is French; this should be dynamically determined if possible
    encoded = tokenizer(text, return_tensors="pt")
    # Set the target language ID for English
    generated_tokens = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("en"))
    translated_text = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
    return translated_text

# Function to get sentiment from text
def get_sentiment(text):
    results = sentiment_pipeline(text)
    return results

# Function to get emotion from text
def get_emotion(text):
    results = emotion_pipeline(text)
    return results

def process_image(image_path):
    extracted_text = extract_text_from_image(image_path)
    translated_text = translate_text_to_english(extracted_text)
    sentiment_result = get_sentiment(translated_text)
    emotion_result = get_emotion(translated_text)
    return extracted_text, translated_text, sentiment_result, emotion_result

iface = gr.Interface(fn=process_image,
                     inputs=gr.Image(label="Upload Image", type="filepath"),
                     outputs=[gr.Textbox(label="Extracted Text"),
                              gr.Textbox(label="Translated Text"),
                              gr.Textbox(label="Sentiment Analysis Result"),
                              gr.Textbox(label="Emotion Analysis Result")],
                     title="Image to Sentiment and Emotion Analysis",
                     description="Upload an image containing text, and the app will translate the text to English, then perform sentiment and emotion analysis.")

iface.launch(share=True)



Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://09981d63775d10a39b.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In [21]:
!pip install spacy



In [22]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m12.8/12.8 MB[0m [31m29.1 MB/s[0m eta [36m0:00:00[0m
[38;5;2m✔ Download and installation successful[0m
You can now load the package via spacy.load('en_core_web_sm')
[38;5;3m⚠ Restart to reload dependencies[0m
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [23]:
import gradio as gr
from PIL import Image
import pytesseract
import spacy

# Load the spaCy English model
nlp = spacy.load("en_core_web_sm")

# Your existing function for extracting text from an image file
def extract_text_from_image(image_path):
    from PIL import Image
    import pytesseract
    image = Image.open(image_path)
    extracted_text = pytesseract.image_to_string(image)
    return extracted_text.strip()


def translate_text_to_english(text):
    # The M2M100 model requires setting the forced_bos_token_id to specify the target language
    tokenizer.src_lang = "fr"  # Example: Assume the source language is French; this should be dynamically determined if possible
    encoded = tokenizer(text, return_tensors="pt")
    # Set the target language ID for English
    generated_tokens = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("en"))
    translated_text = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
    return translated_text

# Function to get sentiment from text
def get_sentiment(text):
    results = sentiment_pipeline(text)
    return results

# Function to get emotion from text
def get_emotion(text):
    results = emotion_pipeline(text)
    return results

# Function to extract named entities using spaCy
def extract_entities(text):
    doc = nlp(text)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return entities

# Main processing function to integrate OCR, translation, sentiment and emotion analysis, and NER
def process_image(image):
    extracted_text = extract_text_from_image(image)
    translated_text = translate_text_to_english(extracted_text)
    sentiment_result = get_sentiment(translated_text)
    emotion_result = get_emotion(translated_text)
    entities = extract_entities(translated_text)  # Use spaCy to extract entities
    entities_str = ', '.join([f"{text} ({label})" for text, label in entities])  # Format entities for display
    return extracted_text, translated_text, sentiment_result, emotion_result, entities_str

# Define Gradio interface
iface = gr.Interface(fn=process_image,
                     inputs=gr.Image(label="Upload Image", type="filepath"),
                     outputs=[gr.Textbox(label="Extracted Text"),
                              gr.Textbox(label="Translated Text"),
                              gr.Textbox(label="Sentiment Analysis Result"),
                              gr.Textbox(label="Emotion Analysis Result"),
                              gr.Textbox(label="Extracted Entities")],
                     title="Image to Sentiment and Emotion Analysis",
                     description="Upload an image containing text, and the app will translate the text to English, then perform sentiment and emotion analysis and extract named entities.")

# Launch the app
# debug=True
iface.launch(share=True)


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://8861c0993fc4a268a9.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In [24]:
import gradio as gr
from PIL import Image
import pytesseract
import spacy

# Load the spaCy English model
nlp = spacy.load("en_core_web_sm")

# Your existing function for extracting text from an image file
def extract_text_from_image(image_path):
    from PIL import Image
    import pytesseract
    image = Image.open(image_path)
    extracted_text = pytesseract.image_to_string(image)
    return extracted_text.strip()


from langdetect import detect
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Load your translation model and tokenizer
model_name = "facebook/m2m100_418M"
tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)

def translate_text_to_english(text):
    # Detect the language of the input text
    detected_lang = detect(text)
    print(f"Detected language: {detected_lang}")  # For debugging

    # Check if the detected language is English
    if detected_lang == 'en':
        return text  # Return the original text if it's already in English

    # Specify the source language for the tokenizer; m2m100 uses language codes
    tokenizer.src_lang = detected_lang

    # Encode the text for the model
    encoded = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)

    # Generate translation tokens and decode them to text
    # Note: forced_bos_token_id forces the model to translate to English
    generated_tokens = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("en"))
    translated_text = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]

    return translated_text, detected_lang


# Function to get sentiment from text
def get_sentiment(text):
    results = sentiment_pipeline(text)
    return results

# Function to get emotion from text
def get_emotion(text):
    results = emotion_pipeline(text)
    return results

# Function to extract named entities using spaCy
def extract_entities(text):
    doc = nlp(text)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return entities

# Main processing function to integrate OCR, translation, sentiment and emotion analysis, and NER
def process_image(image):
    extracted_text = extract_text_from_image(image)
    translated_text, detected_lang = translate_text_to_english(extracted_text)
    sentiment_result = get_sentiment(translated_text)
    emotion_result = get_emotion(translated_text)
    entities = extract_entities(translated_text)  # Use spaCy to extract entities
    entities_str = ', '.join([f"{text} ({label})" for text, label in entities])  # Format entities for display
    return extracted_text, detected_lang, translated_text, sentiment_result, emotion_result, entities_str

# Define Gradio interface
iface = gr.Interface(fn=process_image,
                     inputs=gr.Image(label="Upload Image", type="filepath"),
                     outputs=[gr.Textbox(label="Extracted Text"),
                              gr.Textbox(label="Detected Language"),
                              gr.Textbox(label="Translated Text"),
                              gr.Textbox(label="Sentiment Analysis Result"),
                              gr.Textbox(label="Emotion Analysis Result"),
                              gr.Textbox(label="Extracted Entities")],
                     title="Image to Sentiment and Emotion Analysis",
                     description="Upload an image containing text, and the app will translate the text to English, then perform sentiment and emotion analysis and extract named entities.")

# Launch the app
# debug=True
iface.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://4d9421ce0d15c0482e.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


