

---

# **Project Presentation**: _[Smart Summarizer]_

---

## **Team Members**

- **Mähönen Janne**: Worked on text-to-speech (TTS), translation, and summary types integration.
- **Ocampo Heidi**: Worked on different file formats and text-to-speech. Also handled translation testing and the general overview. Only the TTS UI made it into the final project.
- **Sillanaukee Joonas**: Worked on file reading and summary generation.
- **Silvola Izabel**: Worked on summary generation with chunking, application testing and debugging, and adding the Clear button. Also worked on using MeloTTS for text-to-speech (not in the final version).
- **Vihanto Jami**: Worked on URL extraction and deployment.

---

## **Introduction**

- **Objective**: Provide users with faster analysis of large text sets by generating summaries for articles, documents, and educational materials.
- **Key Tools**:  
   - **BART model** for summarization.
   - **deep_translator** for translation between languages.
   - **gTTS** for text-to-speech (TTS).
   - **nltk (Natural Language Toolkit)** for extracting key terms and definitions.

---

## **Work process / Way of Work**
- **Objective**: Create a suitable specification and way of working.
- **Meetings:**
   - Every 2-3 days. At first the meetings were held together with other project work, but they were soon separated so that we could focus on one topic at a time.
- **Development environment:**
   - GitHub, and an IDE of each member's own preference
- **Phases**:
   - **Gathering requirements**
      - The project instructions listed must-haves and nice-to-haves. Based on these, we held an ideation session about what would be worthwhile to build and which model could be used for it.
      - Selecting the idea from the pool was fairly easy: only two of the ideas were interesting enough to be raised for discussion.
   - **Creating development plan**
      - The preplan was created according to project instructions with the template provided.
      - During the actual development, the original plan changed constantly.
   - **Creating Specification**
      - After receiving permission to proceed with planning, we started writing the specification, using the defined requirements as its basis. During development the specification also changed from the original, as we noticed that the model we had originally chosen did not perform as well as the model used in the end product.
   - **UI development and testing**
      - This was done in two phases. First, each of us built a UI individually and presented it to the others; then we chose the best one as the unified basis for the UI.
   - **Testing different models and development**
      - Development work was divided among the team members, each focusing on and testing particular features. In this phase we also decided to create a common development repository on GitHub. Everyone developed with the specification in mind, so that the end goal stayed in sight while each member could try out different solutions. During this phase we noticed that our originally chosen model, Llama, was not as good as BART at the actual summarization work, and we collectively decided to switch.
   - **Packaging and project delivery**
      - The final package included the documentation and the requirements files. After the final version was finished, the group recorded a video introduction.

---


## **Design**

- **Text Input**: Users can input text directly, upload files (PDF, DOCX), or provide URLs for summarization.
- **Summary Types**: Multiple summarization formats are available:
   - Summary
   - Main Points
   - Concepts List (with definitions for key terms)

- **Language Detection**: The app uses `langdetect` to identify the language of the text so that, if text-to-speech is selected, the audio is generated in the correct language.

---

## **Challenges**

### **Development Process**

- #### **Dependency Conflicts and Tool Integration**:
We faced library conflicts using Google Translate with Gradio's dependencies, and resolved them by switching to Deep Translator. Integrating multiple tools like Gradio, NLP models, and TTS systems while avoiding compatibility issues was crucial.


- #### **Concept List Extraction**:
Generating concept lists by extracting key nouns and adjectives was complicated. We relied on NLTK’s WordNet for definitions, which only works reliably with English input, so precise language detection was required to work around this shortcoming.

- #### **Maintaining Workflow Between Summarization and Translation**:
Ensuring smooth transitions between summarization, translation, and TTS required careful data handling, especially when toggling between original and translated content to maintain output quality.

### **Group Work**

- #### **Remote communication and lack of face-to-face interaction:**
One of the challenges we faced was working remotely, which made communication through Teams more difficult at times. Without face-to-face interaction, it was harder to have quick, spontaneous conversations to solve problems or brainstorm ideas.
Scheduling meetings was also a challenge, as part of the group had other responsibilities to manage, making it tough to find a time that worked for the whole team. Plus, without being able to pick up on non-verbal cues, it was sometimes tricky to fully understand tone or intent during discussions.

- #### **Varied programming skill levels and educational backgrounds:**
Another challenge we encountered was the difference in programming skills and educational backgrounds within the team. Some members had more experience with certain tools and technologies, while others were still learning or came from different academic programs with varying focuses. This created a bit of a learning curve for some team members and sometimes slowed down progress.

- #### **Integrating features developed separately:**
We developed each new feature individually and then tried to combine them later, which turned out to be a challenge. While programming features one at a time helped us focus on each aspect, integrating them into a single system was more complex than expected. There were unexpected compatibility issues between the different components, and combining everything required more troubleshooting and coordination than we initially anticipated.

- #### **Process to push directly to main without reviews and dev-branches:**
One of the most challenging things was writing new code or improving existing code, only to notice that someone else had already done the same work. This was usually noticed during the final check against the base version, and it often led either to a large number of errors to resolve or to the realization that the other person had done a better job.

- #### **Limited Time:**
We ran into the challenge of not having enough time to get all the features we wanted into the project. While we managed to develop and combine the key parts, we had to prioritize and leave out some features due to tight deadlines. Unfortunately, a few things we originally planned didn’t make it into the final version, and we had to scale back a bit to finish on time.

---

## **Code Logic**

This section explains the code logic behind each feature, from input handling and summarization to translation, text-to-speech (TTS), and the concept list, with code snippets.

### 1. **Dynamic input**:
The *dynamic_input* function controls the visibility of input components in a Gradio interface based on the user's selection of the input source type. This allows the application to adapt its input fields dynamically, ensuring that users only see relevant input options.

**Snippet**:
```python
def dynamic_input(source):
    if source == "Text Input":
        return gr.update(visible=True), gr.update(visible=False), gr.update(visible=False)
    elif source == "Web Page URL":
        return gr.update(visible=False), gr.update(visible=False), gr.update(visible=True)
    elif source == "File Upload":
        return gr.update(visible=False), gr.update(visible=True), gr.update(visible=False)
    return gr.update(visible=False), gr.update(visible=False), gr.update(visible=False)
```
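
The same mapping can also be expressed as a lookup table, which keeps the three outputs in one place. The sketch below is a hypothetical variant, not the project's code: `VISIBILITY` and `visibility_flags` are names introduced here for illustration, and the component order (text box, file upload, URL box) is inferred from the snippet above. In the app, each flag would be passed on to `gr.update(visible=flag)`.

```python
# Hypothetical lookup table: input source -> (text box, file upload, URL box) visibility.
# The component order is inferred from the dynamic_input snippet above.
VISIBILITY = {
    "Text Input": (True, False, False),
    "File Upload": (False, True, False),
    "Web Page URL": (False, False, True),
}


def visibility_flags(source):
    """Return one visibility flag per input component; hide everything for unknown sources."""
    return VISIBILITY.get(source, (False, False, False))


print(visibility_flags("File Upload"))
```

Adding a new input source then only requires one more table entry instead of another `elif` branch.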



The *generate_summary* function first checks the input source (text, URL, or file). For file uploads, it determines the file type and calls `read_pdf` or `read_docx` accordingly. If content is available, it summarizes the content according to the specified `format_type` using the matching summarization function, and optionally translates the summary into the chosen language. It returns the (translated) summary together with the original content, or an error message if something goes wrong.

**Snippet**:
```python
# Updated generate_summary function to handle file reading
def generate_summary(input_text, url, file, format_type, source, target_lang):
    content = ""  # Default so every early return below has a value to hand back
    if source == "Text Input":
        content = input_text
    elif source == "Web Page URL":
        content = url_extract.main(url)  # Pass the URL entered in the interface
    elif source == "File Upload" and file is not None:
        # Check file extension to determine the file type
        if file.name.endswith('.pdf'):
            content = read_pdf(file)
        elif file.name.endswith('.docx'):
            content = read_docx(file)
        else:
            return "Unsupported file type.", content

    if not content:
        return "No content to summarize.", content

    # The summarization and translation steps follow here (not shown in this excerpt)
```
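
One detail of the extension check is that `endswith('.pdf')` is case-sensitive, so a file named `Report.PDF` would be reported as unsupported. A minimal sketch of a case-insensitive check is shown below; `SUPPORTED_SUFFIXES` and `detect_file_kind` are hypothetical names introduced for this example and are not part of the project.

```python
from pathlib import Path

# Hypothetical set of extensions the app can read (PDF and DOCX, per the Design section).
SUPPORTED_SUFFIXES = {".pdf", ".docx"}


def detect_file_kind(filename):
    """Return the lower-cased extension if it is supported, otherwise None."""
    suffix = Path(filename).suffix.lower()
    return suffix if suffix in SUPPORTED_SUFFIXES else None


print(detect_file_kind("Report.PDF"))  # accepted despite the upper-case extension
print(detect_file_kind("notes.txt"))   # not a supported type
```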

### 2. **Summarization Logic**:
The app uses the **BART model** from Hugging Face to summarize input text. Because the model has a limited input sequence length, the code divides the tokenized text into chunks of 1024 tokens with a 10-token overlap between consecutive chunks. It then generates a summary for each chunk and concatenates the results into the final summary.
The minimum and maximum summary lengths were tuned through testing to give the best results.

**Snippet**:
```python
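import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load BART model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn").to(device)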

# Function to divide the input text into chunks (if needed) and generate the summary
def summarize_bart(input_text, max_length, min_length):
    chunk_size = 1024
    overlap = 10
    # Tokenize the input text
    tokens = tokenizer.encode(input_text, return_tensors='pt', truncation=False)
    
    # Split the tokens into chunks with overlap
    chunks = []
    stride = chunk_size - overlap
    for i in range(0, tokens.shape[1], stride):
        chunk = tokens[:, i:i + chunk_size]
        chunks.append(chunk)
    
    # Generate summaries for each chunk
    summaries = []
    for chunk in chunks:
        chunk = chunk.to(device)
        summary_ids = model.generate(chunk, max_length=max_length, min_length=min_length, do_sample=False)
        summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
        summaries.append(summary)
    
    # Concatenate the summaries
    full_summary = ' '.join(summaries)

    return full_summary
```
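
To make the windowing easier to follow, the splitting step can be looked at in isolation on a plain list of token ids. This is only an illustrative sketch: `split_with_overlap` is a hypothetical helper that mirrors the stride arithmetic of `summarize_bart` without loading the model.

```python
def split_with_overlap(token_ids, chunk_size=1024, overlap=10):
    """Split a list of token ids into windows of at most chunk_size that share `overlap` ids."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - overlap
    return [token_ids[i:i + chunk_size] for i in range(0, len(token_ids), stride)]


# Toy example: 10 "tokens", windows of 4 with an overlap of 1 (stride 3)
chunks = split_with_overlap(list(range(10)), chunk_size=4, overlap=1)
print(chunks)
```

In the toy example the last window contains only the token shared with the previous window. The loop in `summarize_bart` behaves the same way, so a very short trailing chunk can occasionally be summarized on its own.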

### 3. **Translation Logic**:
The app translates the summarized text using the **Deep Translator** library. If the user selects a language other than "Original," the app translates the summary into the chosen language.

**Snippet**:
```python
from deep_translator import GoogleTranslator

# Function to translate the summary
def translate_summary(text, target_lang):
    if target_lang != "Original":
        translator = GoogleTranslator(source="en", target=target_lang)
        translated_text = translator.translate(text)
        return translated_text
    return text

# Example: Translate summary to Spanish
translated_summary = translate_summary(short_summary, target_lang="es")
```
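
Long summaries may need to be split before translation. To our understanding, Deep Translator's `GoogleTranslator` rejects inputs longer than roughly 5,000 characters; this limit is an assumption here and should be checked against the installed version. The sketch below uses a hypothetical helper, `split_for_translation`, to pack whole words into pieces below a chosen limit. Each piece could then be passed to `translate_summary` and the results joined with spaces.

```python
def split_for_translation(text, max_chars=4500):
    """Greedily pack whole words into pieces no longer than max_chars.

    A single word longer than max_chars is kept whole in its own piece.
    """
    pieces, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


print(split_for_translation("aa bb cc dd", max_chars=5))
```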

### 4. **Text-to-Speech (TTS) Logic**:
Using **gTTS (Google Text-to-Speech)**, the app converts the summary into audio. The **langdetect** library determines the language of the text, ensuring TTS is generated in the correct language.

**Snippet**:
```python
from langdetect import detect
from gtts import gTTS
from io import BytesIO

# Function to convert text to speech
def text_to_speech(input_text, summary_text, summary_generated):
    text_to_read = summary_text if summary_generated else input_text
    detected_lang = detect(text_to_read)  # Detect the language
    # tts_language_map is defined elsewhere in the app; it maps langdetect codes to gTTS codes
    tts_lang = tts_language_map.get(detected_lang, 'en')  # Fall back to English

    # Generate speech using gTTS
    tts = gTTS(text=text_to_read, lang=tts_lang)
    audio_file = BytesIO()
    tts.write_to_fp(audio_file)
    audio_file.seek(0)
    return audio_file

# Example: Convert summary to speech
audio_output = text_to_speech(input_text, translated_summary, summary_generated=True)
```
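
The `tts_language_map` used above is not shown in this document. As a rough illustration of what such a mapping could look like, the sketch below defines a small hypothetical table and a lookup helper. The entries are assumptions: `langdetect` reports Chinese as lower-case `zh-cn` / `zh-tw`, and the gTTS codes on the right should be verified against the languages supported by the installed gTTS version.

```python
# Hypothetical mapping from langdetect codes to gTTS codes; the app's real
# tts_language_map is not shown in this document.
TTS_LANGUAGE_MAP = {
    "en": "en",
    "fi": "fi",
    "es": "es",
    "zh-cn": "zh-CN",
    "zh-tw": "zh-TW",
}


def resolve_tts_lang(detected_lang, default="en"):
    """Return the gTTS code for a detected language, falling back to `default`."""
    return TTS_LANGUAGE_MAP.get(detected_lang.lower(), default)


print(resolve_tts_lang("zh-cn"))
```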

### 5. **Concept List Logic**:
The **Concept List** extracts key **nouns** and **adjectives** from the text and generates a list of terms with their definitions (English only). It uses **WordNet** to retrieve the meanings of the extracted terms.

**Snippet**:
```python
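import nltk
from collections import Counter
from langdetect import detect
from nltk.corpus import wordnet

# Requires the NLTK tokenizer, POS tagger, and WordNet data (installed via nltk.download)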

# Extract key nouns and adjectives
def extract_key_terms(text, num_concepts=10):
    words = nltk.word_tokenize(text)
    pos_tags = nltk.pos_tag(words)
    relevant_words = [word for word, pos in pos_tags if pos in ["NN", "JJ"] and len(word) > 3]
    most_common_words = [word for word, _ in Counter(relevant_words).most_common(num_concepts)]
    return most_common_words

# Generate list of concepts with definitions (English only)
def extract_concepts_with_definitions(text, num_concepts=10):
    language = detect(text)
    if language != 'en':
        return "The 'Concepts List' feature only works with English input."

    key_terms = extract_key_terms(text, num_concepts)
    definitions = []
    for term in set(key_terms):
        synsets = wordnet.synsets(term)
        if synsets:
            definitions.append(f"• {term} = {synsets[0].definition()}")
    return "\n".join(definitions) if definitions else "No definitions found."
```
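
The filter in `extract_key_terms` matches only the exact tags `NN` and `JJ`, so plural nouns (`NNS`) and comparative adjectives (`JJR`) are skipped, and `Models` and `models` are counted separately. Whether that is intended is not stated. The sketch below shows a hypothetical variant, `rank_terms`, that matches tag prefixes and counts case-insensitively. It works on hand-written `(word, tag)` pairs standing in for `nltk.pos_tag` output, so it runs without NLTK data. Note that the `NN` prefix also admits proper nouns (`NNP`).

```python
from collections import Counter


def rank_terms(tagged_tokens, num_concepts=10, min_length=4):
    """Count nouns and adjectives (any NN*/JJ* tag) case-insensitively; return the most frequent."""
    counts = Counter(
        word.lower()
        for word, tag in tagged_tokens
        if tag.startswith(("NN", "JJ")) and len(word) >= min_length
    )
    return [word for word, _ in counts.most_common(num_concepts)]


# Hand-written (word, tag) pairs standing in for nltk.pos_tag output
tagged = [("Models", "NNS"), ("summarize", "VB"), ("long", "JJ"),
          ("documents", "NNS"), ("models", "NNS"), ("quickly", "RB")]
print(rank_terms(tagged, num_concepts=2))
```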


### 6. **File reading Logic**:
The *read_docx* function uses the python-docx library to create a Document object from the uploaded file. It iterates through each paragraph in the document, appends the text to a list, then joins the list into a single newline-separated string and returns it.

```python
import docx  # provided by the python-docx package

def read_docx(file):
    doc = docx.Document(file)  # Create a Document object from the uploaded file
    text = []
    for paragraph in doc.paragraphs:
        text.append(paragraph.text)
    return '\n'.join(text)
```

The *read_pdf* function uses the PyPDF2 library to open the PDF file. It starts with an empty string, loops through each page, extracts the page's text, and appends it to the string. Finally, it returns the complete text extracted from the PDF.
```python
import PyPDF2

def read_pdf(file):
    pdf_reader = PyPDF2.PdfReader(file)
    text = ""
    for page in pdf_reader.pages:
        # extract_text() can return an empty result for image-only pages
        text += page.extract_text() or ""
    return text
```
### 7. **URL extract Logic**:
This module provides functionality to extract the main text content from an article located at a specified URL. It utilizes the newspaper library to handle the downloading and parsing of the article, along with additional utility functions to clean the extracted text.

The `main` function serves as the entry point for extracting text from a URL supplied through the Gradio interface. It orchestrates the extraction and text-cleaning steps.

```python
def main(url):
    main_text = extract_main_text(url)
    main_text = remove_empty_lines(main_text) # removes empty spaces between paragraphs, comment line if not needed

    return main_text
```
The *extract_main_text* function is responsible for downloading and parsing the article content from the specified URL.
```python
from newspaper import Article
from requests.exceptions import RequestException

# Function to extract main text from a URL
def extract_main_text(url):
    try:
        # Create an Article object and download the content
        article = Article(url)
        article.download()
        
        # Parse the downloaded content
        article.parse()
        
        # Perform natural language processing (optional, to extract keywords, summary, etc.)
        article.nlp()
        
        return article.text

    except RequestException as e:
        return f"An error occurred: {e}"
    except Exception as e:
        return f"Failed to extract article: {e}"

```
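
Because `extract_main_text` returns its error message as a string, a malformed URL ends up producing text that looks like article content to the caller. One possible safeguard, sketched here as an assumption rather than as part of the project, is to validate the URL before attempting the download. `is_valid_http_url` is a hypothetical helper built on the standard library's `urllib.parse`.

```python
from urllib.parse import urlparse


def is_valid_http_url(url):
    """Return True only for absolute http(s) URLs that include a host name."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_valid_http_url("https://example.com/article"))
print(is_valid_http_url("example.com/article"))
```

The check only looks at the shape of the URL; it does not confirm that the page exists or contains an article.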
The *remove_empty_lines* function cleans the extracted text by collapsing every run of whitespace, including line breaks and empty lines, into a single space.
```python
    
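import re
import sys


def remove_empty_lines(main_text):
    # Collapse every run of whitespace (including newlines) into a single space
    cleaned_text = re.sub(r'\s+', ' ', main_text)

    return cleaned_text


if __name__ == "__main__":
    # main() requires a URL; read it from the command line when run as a script
    print(main(sys.argv[1]))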

```
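
Since `\s+` also matches newlines, the function above merges all paragraphs into a single line. If paragraph breaks should survive the cleanup, a variant along the following lines could be used. `collapse_whitespace_keep_paragraphs` is a hypothetical name introduced for this sketch and is not part of the project.

```python
import re


def collapse_whitespace_keep_paragraphs(text):
    """Collapse whitespace inside paragraphs but keep one blank line between them."""
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)


sample = "First  line\ncontinues.\n\n\n\nSecond   paragraph.\n"
print(collapse_whitespace_keep_paragraphs(sample))
```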
### 8. **Main Points Extraction Logic**:
This module provides functionality to extract key points from a given text by summarizing it using the **BART model**. It formats the most important sentences as bullet points for clarity.

```python
# Function to summarize key points
def extract_main_points(text, num_points=5):
    summary = summarize_bart(text, max_length=150, min_length=80)
    sentences = summary.split('. ')
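    # Strip any trailing period before re-adding one, so the last sentence does not end in ".."
    points = [f"• {sentence.strip().rstrip('.')}." for sentence in sentences[:num_points] if sentence.strip()]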
    return "\n".join(points)
```

The `extract_main_points` function calls the `summarize_bart` function to generate a summary from the input text. It splits the summarized text into individual sentences and selects the top `num_points` sentences, formatting each as a bullet point.
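
Splitting on `'. '` does not separate sentences that end in `!` or `?`. As a hedged alternative, the bullet formatting could split after any sentence-ending punctuation mark using a regular expression. `to_bullets` below is a hypothetical helper that takes an already generated summary string, so it can be tried without loading the BART model.

```python
import re


def to_bullets(summary, num_points=5):
    """Split a summary into sentences and format the first num_points as bullets."""
    # Split on whitespace that follows ., ! or ? so the punctuation stays with its sentence
    sentences = re.split(r"(?<=[.!?])\s+", summary.strip())
    return "\n".join(f"• {s}" for s in sentences[:num_points] if s)


print(to_bullets("BART is a seq2seq model. It summarizes text well! Does it translate? No."))
```

This is still a heuristic: abbreviations such as "e.g." would be treated as sentence ends.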

---

### Quick Recap:
1. **Dynamic Input**: Changes the Gradio interface input based on user selection.
2. **Summarization**: Generates summaries of different lengths using the **BART model**.
3. **Translation**: Translates summaries into multiple languages using **Deep Translator**.
4. **Text-to-Speech**: Converts text or summaries into speech using **gTTS** and automatic language detection.
5. **Concept List**: Extracts key terms (nouns, adjectives) and provides definitions using **NLTK** for English text.
6. **File reading**: Extracts text from PDF and DOCX files using **PyPDF2** and **python-docx**.
7. **URL extract**: Extracts text from a given URL using **newspaper3k**.
8. **Main Points Extraction**: Extracts key points from a text and formats them into a list of bullet points using the **BART model**.
---

## **Possible Future Improvements**

A major possible improvement would be creating custom datasets and fine-tuning the **Llama** model to improve summarization accuracy, especially for more specific types of content. This would allow the model to better adapt to different domains and deliver more precise summaries. Another exciting enhancement would be adding support for summarizing text written in **different languages**, making the tool more versatile and accessible to a broader audience. Additionally, we could integrate **MeloTTS** for text-to-speech to improve audio quality, add support for **.txt** file uploads, and offer more customization options. Improving the integration between different components and refining the user experience based on feedback are also key areas for future work.

---

## **In Conclusion**

- We built a tool that makes summarizing and translating text easier than ever. Whether users input text, upload files, or share a URL, the **Smart Summarizer** quickly generates summaries, translates them into different languages, and even reads them out loud. It's a convenient, all-in-one solution for anyone who needs quick access to key points in any content.

- We used some pretty cool tech along the way, like the **BART model** for summarizing, **deep_translator** for switching between languages, and **gTTS** for turning text into speech. We also pulled in tools like **NLTK** for picking out important terms and **PyPDF2** and **python-docx** for reading PDF and DOCX files, making the tool flexible for lots of use cases.

- **What we learned**: Working on this project gave us hands-on experience with NLP models and taught us how to bring different APIs together into a single tool. It was also a good lesson in being adaptable, as our plans shifted during the development process.

- **Final thoughts**: Overall, the Smart Summarizer project gave us a great chance to learn and apply modern AI tools. It's something that can really help people save time and effort when it comes to digesting large amounts of information, making it a tool that's not just powerful but super practical too.
