<a href="https://colab.research.google.com/github/leaBroe/Deep_Learning_in_Python/blob/main/deep_learning_python_winter_school_24.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Project Overview: EcoTrendAnalyzer

**Abstract:**
EcoTrendAnalyzer is an AI-powered application that tracks and analyzes public sentiment on environmental issues across social media platforms. It uses OCR to extract text from images, such as screenshots of posts, and then runs a pre-trained sentiment analysis model on the extracted text. The resulting sentiment scores give insight into public awareness and how it changes over time, which can help policymakers, researchers, and activists gauge the effectiveness of environmental campaigns.

### Multi-modal AI Application

Your application will use a **multi-modal AI approach**, combining vision (OCR) and text (sentiment analysis) models. This fulfills the requirement for covering a non-tabular modality, as you'll be extracting text from images (a non-tabular data source) and then performing text analysis.

### Fulfilling the Criteria

1. **Piping 2 or More Off-the-Shelf AIs Together**: Your project will use an OCR model (Tesseract, via `pytesseract`) to extract text from images and a pre-trained sentiment analysis model (`distilbert-base-uncased-finetuned-sst-2-english`) to analyze the sentiment of the extracted text. Chaining these two off-the-shelf models meets this criterion and shows how multiple AI components can be integrated into one workflow.

### Gradio App

You'll develop a **Gradio app** that allows users to upload images (such as screenshots of social media posts) and then displays the extracted text and its sentiment analysis. This interactive component not only makes your project user-friendly but also directly demonstrates the capabilities of your AI solution.
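
A minimal sketch of how such an app could be wired together is shown here. It is one possible layout, not the project's final implementation: the names `analyze_image` and `build_app` are hypothetical, it assumes `gradio` has been installed with `pip install gradio`, and it assumes the OCR and sentiment models are the ones named in this notebook. The OCR and classifier are passed in as plain functions so the core logic can be checked without loading any model.

```python
from typing import Callable, Dict, Optional, Tuple


def analyze_image(
    image,
    ocr: Callable[[object], str],
    classify: Callable[[str], list],
) -> Tuple[str, Optional[Dict[str, float]]]:
    """Run OCR, then sentiment analysis, and shape the result for display."""
    if image is None:
        return "", None
    text = ocr(image).strip()
    if not text:
        # Nothing readable in the image, so there is nothing to classify
        return "", None
    top = classify(text)[0]  # a dict with 'label' and 'score' keys
    return text, {top["label"]: float(top["score"])}


def build_app():
    """Wire the real models into a Gradio interface (imports kept local)."""
    import gradio as gr
    import pytesseract
    from transformers import pipeline

    sentiment = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )

    def predict(image):
        return analyze_image(
            image,
            ocr=pytesseract.image_to_string,
            classify=lambda text: sentiment(text, truncation=True),
        )

    return gr.Interface(
        fn=predict,
        inputs=gr.Image(type="pil", label="Screenshot"),
        outputs=[gr.Textbox(label="Extracted text"), gr.Label(label="Sentiment")],
        title="EcoTrendAnalyzer",
    )

# In a notebook cell, the app would be started with: build_app().launch()
```

Keeping `analyze_image` independent of Gradio means the same function can be reused in a batch script or tested with stand-in functions.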

### Model Card

For your sentiment analysis model, your model card should include:

- **Model Details**: Mention the model used (`distilbert-base-uncased-finetuned-sst-2-english`), its source (Hugging Face), and its primary function (sentiment analysis).
- **Data**: Describe the type of data the model was trained on, emphasizing its use for sentiment analysis of English text.
- **Performance**: Highlight the model's accuracy, strengths, and limitations, particularly in the context of analyzing environmental sentiment.
- **Ethical Considerations**: Discuss any biases the model may have, its intended use cases, and any misuse potential.
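
The four sections above could be assembled into a Markdown file with a small helper like the one below. This is only an illustrative sketch: `render_model_card` is a hypothetical helper, and any section you have not written yet is left as `TODO` rather than filled with invented details.

```python
SECTIONS = ("Model Details", "Data", "Performance", "Ethical Considerations")


def render_model_card(fields: dict) -> str:
    """Render a minimal Markdown model card from a section-name -> text mapping."""
    lines = [f"# Model Card: {fields['name']}", ""]
    for section in SECTIONS:
        lines.append(f"## {section}")
        # Missing sections stay visible as TODO so they are not forgotten
        lines.append(fields.get(section, "TODO"))
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


card = render_model_card({
    "name": "distilbert-base-uncased-finetuned-sst-2-english",
    "Model Details": "Pre-trained model from Hugging Face, used for sentiment analysis.",
})
print(card)
```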

### Outlook

Given an extra month, you might consider:

- **Expanding Data Sources**: Including more diverse sources of images and text, such as news articles or blogs, to enrich the analysis.
- **Model Fine-Tuning**: Fine-tuning the sentiment analysis model on a dataset specifically related to environmental discourse to improve accuracy and relevance.
- **Feature Expansion**: Adding functionality to track sentiment trends over time, enabling longitudinal studies on public sentiment toward environmental issues.
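
As a rough illustration of the trend-tracking idea, the sketch below averages sentiment per day. It assumes each analyzed post is stored as a `(date, result)` pair, where `result` is a dict with `label` and `score` keys as returned by the sentiment pipeline. The function names and the sample records are made up for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def signed_score(result: dict) -> float:
    """Map a result to [-1, 1]: POSITIVE keeps its score, anything else is negated."""
    sign = 1.0 if result["label"].upper() == "POSITIVE" else -1.0
    return sign * float(result["score"])


def daily_trend(records):
    """Average signed sentiment per day, returned in chronological order."""
    by_day = defaultdict(list)
    for day, result in records:
        by_day[day].append(signed_score(result))
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


# Illustrative sample data, not real measurements
records = [
    (date(2024, 1, 1), {"label": "POSITIVE", "score": 0.9}),
    (date(2024, 1, 1), {"label": "NEGATIVE", "score": 0.5}),
    (date(2024, 1, 2), {"label": "NEGATIVE", "score": 0.8}),
]
trend = daily_trend(records)
for day, value in trend.items():
    print(day.isoformat(), round(value, 3))
```

Averaging a signed score is only one possible aggregation; counting positive versus negative posts per day would be an equally simple alternative.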

In [5]:
!sudo apt install tesseract-ocr

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  tesseract-ocr-eng tesseract-ocr-osd
The following NEW packages will be installed:
  tesseract-ocr tesseract-ocr-eng tesseract-ocr-osd
0 upgraded, 3 newly installed, 0 to remove and 35 not upgraded.
Need to get 4,816 kB of archives.
After this operation, 15.6 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 tesseract-ocr-eng all 1:4.00~git30-7274cfa-1.1 [1,591 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/universe amd64 tesseract-ocr-osd all 1:4.00~git30-7274cfa-1.1 [2,990 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy/universe amd64 tesseract-ocr amd64 4.1.1-2.1build1 [236 kB]
Fetched 4,816 kB in 0s (12.5 MB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debco

In [6]:
!pip install pytesseract
# Note: pytesseract is only a Python wrapper; the Tesseract binary itself is installed via apt above.

Collecting pytesseract
  Downloading pytesseract-0.3.10-py3-none-any.whl (14 kB)
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.10


In [7]:
# pytesseract needs to locate the Tesseract executable. In Google Colab the apt
# installation puts it in a standard location that pytesseract finds automatically,
# so setting the path manually is usually unnecessary. If pytesseract cannot find
# the executable, set the path explicitly:

import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'


In [8]:
!pip install pytesseract transformers pillow



In [9]:
!pip install langdetect




In [23]:
from google.colab import files
uploaded = files.upload()

# files.upload() returns a dict keyed by filename; take the name of the first uploaded file
image_path = next(iter(uploaded))


Saving deutsch.png to deutsch (2).png


In [21]:
from PIL import Image
import pytesseract

# image_path is set by the upload cell; to use a file already in the Colab workspace,
# assign its path instead, e.g. image_path = 'example.jpg'
image = Image.open(image_path)
extracted_text = pytesseract.image_to_string(image)
print(extracted_text)


Dieser Text ist deutsch



In [24]:
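import pytesseract
from PIL import Image
from transformers import pipeline
from langdetect import DetectorFactory, detect
from langdetect.lang_detect_exception import LangDetectException

# langdetect is non-deterministic by default; fix the seed for reproducible results
DetectorFactory.seed = 0

# Load the sentiment analysis pipeline (the model is English-only)
sentiment_pipeline = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

# Extract text from an image file using pytesseract
def extract_text_from_image(image_path):
    image = Image.open(image_path)
    extracted_text = pytesseract.image_to_string(image)
    return extracted_text.strip()

# Detect the language of the text; returns a language code such as 'en', or None on failure
def detect_language(text):
    try:
        return detect(text)
    except LangDetectException as e:
        # Raised, for example, when the text is empty or has no detectable features
        print(f"Error detecting language: {e}")
        return None

# Run sentiment analysis; returns a list of dicts with 'label' and 'score' keys
def get_sentiment(text):
    # truncation=True keeps long inputs within the model's maximum sequence length
    return sentiment_pipeline(text, truncation=True)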

# Example usage (image_path is set by the upload cell; otherwise assign a path here)
#image_path = "path/to/your/image.jpg"

# Extract text from the image
extracted_text = extract_text_from_image(image_path)
print(f"Extracted Text: {extracted_text}")

# Detect the language of the extracted text
text_language = detect_language(extracted_text)

# Perform sentiment analysis if the text is in English
if text_language == 'en':
    sentiment_result = get_sentiment(extracted_text)
    print(f"Sentiment Analysis Result: {sentiment_result}")
else:
    print("Sentiment analysis not supported for this language or text is not in English.")



Extracted Text: Dieser Text ist deutsch
Sentiment analysis not supported for this language or text is not in English.
