In [None]:
import base64
import os
import google.generativeai as genai

model = genai.GenerativeModel("gemini-2.0-flash-exp")
# Load all .pdf files in subfolder Paper1
pdf_files = []
for root, dirs, files in os.walk("Paper1"):
    for file in files:
        if file.endswith(".pdf"):
            pdf_files.append(os.path.join(root, file))

for doc_path in pdf_files:
    # Read and encode the local file
    with open(doc_path, "rb") as doc_file:
        doc_data = base64.standard_b64encode(doc_file.read()).decode("utf-8")

    prompt = """Attached you will find the PDF of a scientific paper.
    I would like to compare the paper with my research project. My research project is described by the following abstract:

    Title:
    Bridging Models and Language: An Encoder-Decoder Approach for Automated Architectural Documentation with LLMs

    Abstract:
    This research explores how large language models (LLMs) can be integrated into an encoder-decoder framework to enable real-time architectural documentation and seamless conversion between formal models (e.g., UML) and natural language descriptions. Building on previous work in LLM-based requirements verification, we develop a system that automatically converts models into text and vice versa. By allowing LLMs to assess textual descriptions for compliance with requirements, this approach reduces the need for extensive model-side validation through rule-based queries. Additionally, the decoder helps transform text-based process descriptions into structured architectural models, supporting digitalization in organizations. The system’s effectiveness will be evaluated by comparing reconstructed models with their originals, assessing how well information is preserved and how accurately the transformations are performed.

    Please summarize the attached PDF file with regard to my paper.
    Subsequently, please analyse the following aspects of the paper:
    - the main topic of the paper
    - the purpose of the paper
    - the main findings of the paper
    - how NLP or LLMs are used in the paper to process formal models or natural language descriptions
    - whether rigorous evaluation metrics are applied to quantify the performance of the presented approach, and if so, which evaluation metrics are applied.
    """
    response = model.generate_content([{'mime_type': 'application/pdf', 'data': doc_data}, prompt])
    print(response.text)
    # Save the response text as .txt next to the source PDF, so each group's summaries stay in their own subfolder
    output_path = os.path.join(os.path.dirname(doc_path), f"{os.path.splitext(os.path.basename(doc_path))[0]}_summary.txt")
    with open(output_path, "w", encoding="utf-8") as output_file:
        output_file.write(response.text)


Okay, I've read the PDF and will provide a summary and analysis in relation to your research project.

**Summary of the PDF in Relation to Your Project:**

The paper you provided addresses the problem of automating software documentation using Large Language Models (LLMs). Specifically, it focuses on extracting use cases from UML Use Case Diagrams and employing a Generative AI Model to generate descriptive text for each extracted use case. The goal is to reduce the time spent on documentation while maintaining clarity and consistency. The authors extract use case information from XML representations of UML diagrams and feed this data, along with carefully engineered prompts, into a Generative AI model (they mention the Gemini API). They present a case study using smart tourism applications.

**Analysis of the Paper in Relation to Your Project:**

*   **Main Topic:** Automating software documentation generation using LLMs, with a focus on use case descriptions derived from UML diagrams.
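
The file handling around the API call in the first cell can be sketched separately from the model request. The following is a minimal sketch, not the notebook's actual code: the helper names `find_pdfs` and `summary_path_for` are hypothetical, and it assumes that each summary should be stored next to its source PDF and that the `.pdf` extension should be matched case-insensitively (the original cell matches it case-sensitively).

```python
from pathlib import Path


def find_pdfs(folder: str) -> list[Path]:
    """Return all PDF files below `folder`, sorted for a reproducible order.

    Assumption: the extension is compared case-insensitively, so 'X.PDF' is included.
    """
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def summary_path_for(pdf_path: Path) -> Path:
    """Hypothetical naming helper: '<name>.pdf' -> '<name>_summary.txt' in the same folder."""
    return pdf_path.with_name(f"{pdf_path.stem}_summary.txt")
```

With these helpers, the loop in the cell could iterate over `find_pdfs("Paper1")` and write each response to `summary_path_for(doc_path)`, which would keep the summaries of the two paper groups in separate folders.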

In [13]:
import os
import google.generativeai as genai
# Load all .txt files in subfolder Paper as group 1
txt_files_group1 = []
for root, dirs, files in os.walk("Paper"):
    for file in files:
        if file.endswith(".txt"):
            with open(os.path.join(root, file), "r") as f:
                txt_files_group1.append({file: f.read()})
        
# Load all .txt files in subfolder Paper1 as group 2
txt_files_group2 = []
for root, dirs, files in os.walk("Paper1"):
    for file in files:
        if file.endswith(".txt"):
            with open(os.path.join(root, file), "r") as f:
                txt_files_group2.append({file: f.read()})

group1_papers = "\n".join([f"'{file}': '{content}'" for paper in txt_files_group1 for file, content in paper.items()])
group2_papers = "\n".join([f"'{file}': '{content}'" for paper in txt_files_group2 for file, content in paper.items()])

model = genai.GenerativeModel("gemini-2.0-flash-exp")

prompt = f"""You are an experienced researcher. Let us write the related works section of my next paper together. 
In the following, I will show you 9 summaries of individual related papers. Each summary states the goal of the paper, some content, the main findings, and the similarities and differences compared to my paper.

The content of my paper is described in the following abstract and title: 

 Title:
    Bridging Models and Language: An Encoder-Decoder Approach for Automated Architectural Documentation with LLMs

    Abstract:
    This research explores how large language models (LLMs) can be integrated into an encoder-decoder framework to enable real-time architectural documentation and seamless conversion between formal models (e.g., UML) and natural language descriptions. Building on previous work in LLM-based requirements verification, we develop a system that automatically converts models into text and vice versa. By allowing LLMs to assess textual descriptions for compliance with requirements, this approach reduces the need for extensive model-side validation through rule-based queries. Additionally, the decoder helps transform text-based process descriptions into structured architectural models, supporting digitalization in organizations. The system’s effectiveness will be evaluated by comparing reconstructed models with their originals, assessing how well information is preserved and how accurately the transformations are performed.

The related papers are separated into two groups. The first group is similar in that NLP technologies are applied to develop a two-way transformation between formal models and natural language descriptions. However, none of the papers leverages LLMs.
The second group is similar in that LLMs are leveraged to perform transformations between formal models and natural language descriptions, but it is different in that there is no two-way conversion of models from formal to text and back to formal.
Also, some of the referenced papers do not apply rigorous evaluation metrics to quantify the performance of the presented approach.

Please make a first suggestion for the related works section of my paper. For each of the related works papers, state very briefly the goal, the applied technology, the main findings and how it differs from my paper.
The differences between the related papers and my paper should be described in the following way: For each of the two groups of related papers there should be an introductory description for the group that states the main differences.
Afterwards all of the papers from that group should be described. The individual description of the papers should not repeat the differences with regard to my paper that are already described in the introductory description of the entire group.
In referencing the papers, all authors should be mentioned. Make sure to keep the descriptions of each individual paper brief (2 - 3 sentences).

Finish the related works section with a conclusion that summarizes the main differences between the related papers and my paper. Also state the important challenges mentioned in the papers that can be applied to my paper.
The following are the papers of the first group:
{group1_papers}

The papers of the second group are the following: 
{group2_papers}

Once again, make sure to be very brief. The last response you generated was too long.
"""
print(prompt)
# response = model.generate_content(prompt)
# print(response.text)



You are an experienced researcher. Let us write the related works section of my next paper together. 
In the following, I will show you 9 summaries of individual related papers. Each summary states the goal of the paper, some content, the main findings, and the similarities and differences compared to my paper.

The content of my paper is described in the following abstract and title: 

 Title:
    Bridging Models and Language: An Encoder-Decoder Approach for Automated Architectural Documentation with LLMs

    Abstract:
    This research explores how large language models (LLMs) can be integrated into an encoder-decoder framework to enable real-time architectural documentation and seamless conversion between formal models (e.g., UML) and natural language descriptions. Building on previous work in LLM-based requirements verification, we develop a system that automatically converts models into text and vice versa. By allowing LLMs to assess textual descriptions for compliance with requ