## Full Code
The full working code is collected below for easy copying and pasting. It provides a pipeline that converts a research paper in PDF format into a visual abstract, using NLP models for text extraction and grouping.



**Notes:**
- Show components, not results (talk about pros and cons and how we improve them in future iterations); show them the cleanest version of the product
- Get feedback over presentation before next cohort
- Get back to Joe Allen - for complicated graphs, get NLP model to detect any numbers in the paper and extract data from that
- **Agentic workflows** - Gabriel, could we query it/use chain of thought to break it down: "here's my paper, how would we break it down into a visual abstract?"
- "Don't start too big"
- Start with the smallest problems (ex: generate a graph for abstract?)

- Implementing web scraping - from where the papers are sourced (arXiv, etc.)
- Could we pool other visuals that exist on the web/that exist on the abstract on the lab's webpage? --> accessing raw data
- Find some papers about why these models are so bad at text generation
- Pipeline for seeing if the papers fed into the pipeline are matching the output semantically

TO-DO:
- Add soofie script to cron job and see if that automates data collection


Importing libraries:

pdfminer.six: A PDF parsing library for extracting text and metadata from PDF files.

transformers: Hugging Face's library for working with pretrained NLP and vision-language models (like BERT, GPT, CLIP, etc.).

diffusers: A library for running and fine-tuning diffusion models (e.g., Stable Diffusion).

accelerate: A tool from Hugging Face that simplifies training models across CPUs, GPUs, and TPUs.

torch: The PyTorch deep learning framework.

torchvision: Utilities for computer vision tasks (datasets, model architectures, transforms).

ftfy: Fixes unicode text + sanity checking (e.g., replacing odd characters from PDFs or web scraping with proper ones).

In [None]:
!pip install -q pdfminer.six transformers diffusers accelerate torch torchvision ftfy

from google.colab import drive
drive.mount('/content/drive')
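Before running the later cells, it can help to confirm that the install succeeded. The following is a minimal sketch, not part of the original notebook; the helper name `missing_modules` is hypothetical. It relies only on the standard library and assumes the import names listed in `REQUIRED_MODULES` (note that `pdfminer.six` is imported as `pdfminer`).

```python
import importlib.util

# Import names, which can differ from pip names: pdfminer.six installs `pdfminer`.
REQUIRED_MODULES = ["pdfminer", "transformers", "diffusers", "accelerate",
                    "torch", "torchvision", "ftfy"]

def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Report anything that still needs installing in this environment.
print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```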



In [None]:
import os
import torch  # PyTorch
from PIL import Image, ImageDraw, ImageFont
from pdfminer.high_level import extract_text
from transformers import pipeline
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# Replace these paths with your own.
# The example used here is paper 96.

pdf_path = "/content/drive/MyDrive/datasets/papers/96.pdf"
template_path = "/content/drive/MyDrive/datasets/abstract_template.png"
output_path = "/content/drive/MyDrive/datasets/generations/final_model/visual_abstracts/96_output.png"


Define a function that extracts the text from the selected PDF and returns it as a string (an empty string if extraction fails):

In [None]:
def extract_pdf_text(path):
    try:
        return extract_text(path)
    except Exception as e:
        print("PDF extraction error:", e)
        return ""

paper_text = extract_pdf_text(pdf_path)
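Text extracted from PDFs often contains ligature characters, words hyphenated across line breaks, and hard line wraps, all of which can confuse keyword search and summarization. The following is a minimal cleanup sketch using only the standard library; it is a stand-in for illustration, not the `ftfy`-based cleanup the install list hints at, and the function name `clean_pdf_text` is hypothetical.

```python
import re
import unicodedata

def clean_pdf_text(raw):
    """Normalize text extracted from a PDF before searching or summarizing it."""
    # NFKC folds compatibility characters such as the "ﬁ" ligature into plain letters.
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words hyphenated across a line break: "summar-\nization" -> "summarization".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single line breaks inside a paragraph into spaces; keep blank lines.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Squeeze runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

One way to use it would be `paper_text = clean_pdf_text(extract_pdf_text(pdf_path))`. Note that the de-hyphenation rule also joins genuine compounds that happen to break at a line end, and NFKC rewrites characters such as superscript digits, so this is a rough heuristic.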


### Summarizer Implementation

We set up a summarization tool using Hugging Face’s transformers library to extract and summarize key sections of the research paper (abstract, methodology, and results).

The code first loads a summarization pipeline with the distilbart-cnn-12-6 model, running it on the GPU if one is available and on the CPU otherwise. It then defines a function called extract_section that searches the input text, case-insensitively, for the first occurrence of a keyword such as "abstract" or "method."

When it finds the keyword, it takes a chunk of up to 1,500 characters starting at that point and sends it to the **summarizer**, which returns a concise summary; if the keyword is absent, it returns an empty string. Finally, the code calls this function three times on the paper stored in the variable paper_text, using the keywords "abstract," "method," and "results" to summarize the abstract, methodology, and results sections.

In [None]:
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6", device=0 if torch.cuda.is_available() else -1)

def extract_section(text, keyword, maxlen=300):
    idx = text.lower().find(keyword.lower())
    if idx == -1:
        return ""
    snippet = text[idx:idx + 1500]
    return summarizer(snippet, max_length=maxlen, min_length=30, do_sample=False)[0]['summary_text']

abstract = extract_section(paper_text, "abstract")
methodology = extract_section(paper_text, "method", maxlen=200)
results = extract_section(paper_text, "results", maxlen=300)



Device set to use cuda:0
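Because extract_section starts at the first occurrence of the keyword anywhere in the text, a word like "method" inside the abstract can be picked up before the actual Methods heading. The sketch below shows one possible refinement under the assumption that section headings begin a line, optionally after a section number such as "2." or "3.1". The function name `find_section_snippet` and the heading pattern are illustrative assumptions, not part of the original pipeline.

```python
import re

def find_section_snippet(text, keyword, window=1500):
    """Return up to `window` characters starting at the section named by `keyword`."""
    # Prefer a heading-like match: the keyword at the start of a line,
    # optionally preceded by a section number such as "3." or "2.1".
    heading = re.compile(
        r"^[ \t]*(?:\d+(?:\.\d+)*\.?[ \t]+)?" + re.escape(keyword),
        re.IGNORECASE | re.MULTILINE,
    )
    match = heading.search(text)
    if match:
        start = match.start()
    else:
        # Fall back to the notebook's behavior: first occurrence anywhere.
        start = text.lower().find(keyword.lower())
        if start == -1:
            return ""
    return text[start:start + window]
```

The returned snippet could then be passed to the summarizer in place of `text[idx:idx + 1500]`. Papers with unusual heading styles would still fall through to the plain first-occurrence search.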


### Stable Diffusion Implementation

This step uses a Stable Diffusion model to generate images from text prompts built from the paper's summaries. It first loads the runwayml/stable-diffusion-v1-5 model using Hugging Face’s diffusers library, setting the precision to float16 if a GPU is available (for faster performance) or float32 otherwise. The pipeline is then moved to the appropriate computing device (CPU or GPU).

After setup, the code generates three images by passing descriptive prompts to the model, one for each summarized part of the paper: the abstract, the methodology, and the results. Accessing .images[0] retrieves the first generated image for each prompt, and the images are stored in the variables visual1, visual2, and visual3, respectively. They are later pasted into the template to create the visual abstract.

In [None]:
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32)
pipe = pipe.to(device)

visual1 = pipe(f"Visual abstract for: {abstract}").images[0]
visual2 = pipe(f"Experiment setup: {methodology}").images[0]
visual3 = pipe(f"Study results: {results}").images[0]
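The text encoder in Stable Diffusion v1.5 reads at most 77 tokens per prompt, so a summary of a few hundred tokens is cut off before most of it reaches the model. The sketch below trims a summary to whole leading sentences within a word budget before building the prompt. The helper name `shorten_prompt` and the 50-word default are assumptions for illustration; words are only a rough proxy for tokens, and an exact check would need the pipeline's own tokenizer.

```python
import re

def shorten_prompt(prefix, summary, max_words=50):
    """Build a prompt from whole leading sentences of `summary`, capped at `max_words` words."""
    budget = max_words - len(prefix.split())
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", summary.strip())
    kept = []
    for sentence in sentences:
        words = sentence.split()
        if len(words) > budget:
            # If even the first sentence is too long, hard-truncate it
            # rather than returning a prompt with no content.
            if not kept:
                kept.extend(words[:max(budget, 0)])
            break
        kept.extend(words)
        budget -= len(words)
    return (prefix + " " + " ".join(kept)).strip()
```

A call such as `pipe(shorten_prompt("Study results:", results)).images[0]` would then send a prompt that is more likely to fit within the encoder's limit.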



In [None]:
template = Image.open(template_path).convert("RGBA")
template = template.resize((960, 540))

def paste_text_block(draw, text, box, font, line_spacing=22):
    x1, y1, x2, y2 = box
    words = text.split()
    lines = []
    current = ""
    for word in words:
        if draw.textlength(current + " " + word, font=font) < (x2 - x1):
            current += " " + word
        else:
            lines.append(current.strip())
            current = word
    lines.append(current.strip())

    y_text = y1
    for line in lines:
        if y_text > y2: break
        draw.text((x1, y_text), line, font=font, fill="black")
        y_text += line_spacing

font = ImageFont.load_default()
draw = ImageDraw.Draw(template)

# Define blocks (x1, y1, x2, y2) — these are the exact coordinates measured for the visual abstracts.
boxes = {
    "abstract": (35, 70, 310, 210),
    "method": (35, 230, 310, 490),
    "results": (650, 60, 935, 490),
    "vis1": (340, 70),
    "vis2": (340, 240),
    "vis3": (340, 330),
}

# Paste text
paste_text_block(draw, abstract, boxes["abstract"], font)
paste_text_block(draw, methodology, boxes["method"], font)
paste_text_block(draw, results, boxes["results"], font)

# Resize visuals and paste
template.paste(visual1.resize((290, 150)), boxes["vis1"])
template.paste(visual2.resize((290, 80)), boxes["vis2"])
template.paste(visual3.resize((290, 130)), boxes["vis3"])

template.save(output_path)
print("saved to:", output_path)


saved to: /content/drive/MyDrive/datasets/generations/final_model/visual_abstracts/96_output.png
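The wrapping logic inside paste_text_block can be isolated from Pillow to make it easier to test. The sketch below is an illustrative variant rather than the notebook's exact code: it takes the width-measuring function as a parameter (defaulting to character count), and it always accepts the first word of a line so that an over-long word does not produce an empty line. The names `wrap_words` and `lines_that_fit` are hypothetical.

```python
def wrap_words(text, max_width, measure=len):
    """Greedily pack words into lines no wider than `max_width` according to `measure`."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if not current or measure(candidate) <= max_width:
            # Always accept the first word of a line, even if it alone is too wide.
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines

def lines_that_fit(lines, box_height, line_spacing=22):
    """Return how many lines can start inside a box of the given height."""
    return min(len(lines), box_height // line_spacing + 1)
```

With Pillow, the measure could be supplied as `lambda s: draw.textlength(s, font=font)`, and comparing `lines_that_fit(lines, y2 - y1)` with `len(lines)` shows whether a summary will be cut off by its box.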


## Experimental pipeline (not used in presentation)
You can also cut out Stable Diffusion altogether by generating visuals with Matplotlib and Graphviz. This may be preferable for now, as the version of Stable Diffusion used in the code above is not fine-tuned for visual abstracts. The notebook cell in this section uses Matplotlib only.

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Image generation and display libraries: matplotlib and Pillow.
# io provides the in-memory buffer used to hand plots to Pillow.
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw, ImageFont
import io

# Load template image and sample section texts.
template_path = "/content/drive/MyDrive/datasets/abstract_template.png"  # Update path if needed
template = Image.open(template_path).convert("RGBA")
template = template.resize((960, 540))

# SAMPLE DATA - this will be pulled from the paper using a summarization pipeline.
abstract = "This is a summary of the research abstract."
methodology = "We ran experiments using a convolutional neural network with dropout."
results = "Accuracy improved from 72% to 89%. Loss dropped by 30%. Inference time was reduced by 40%."

# Fonts: fall back to Pillow's built-in font if Arial is not installed
try:
    font = ImageFont.truetype("arial.ttf", size=16)
except OSError:
    font = ImageFont.load_default()

draw = ImageDraw.Draw(template)

# BOUNDING BOXES
# This is crucial when using a template: the bounding boxes (x1, y1, x2, y2) tell the
# program where each text block goes, and the (x, y) pairs give the top-left corner of each visual.
boxes = {
    "abstract": (35, 70, 310, 210),
    "method": (35, 230, 310, 490),
    "results": (650, 60, 935, 490),
    "vis1": (340, 70),
    "vis2": (340, 240),
    "vis3": (340, 330),
}

# Text-drawing function: wraps text to the box width and draws it line by line
def paste_text_block(draw, text, box, font, line_spacing=22):
    x1, y1, x2, y2 = box
    words = text.split()
    lines, current = [], ""
    for word in words:
        if draw.textlength(current + " " + word, font=font) < (x2 - x1):
            current += " " + word
        else:
            lines.append(current.strip())
            current = word
    lines.append(current.strip())

    y_text = y1
    for line in lines:
        if y_text > y2: break
        draw.text((x1, y_text), line, font=font, fill="black")
        y_text += line_spacing

# Pasting text into blocks from the PDF
paste_text_block(draw, abstract, boxes["abstract"], font)
paste_text_block(draw, methodology, boxes["method"], font)
paste_text_block(draw, results, boxes["results"], font)

# Plot-creation function using matplotlib
def create_plot(title, x, y, color='blue'):
    fig, ax = plt.subplots(figsize=(3.2, 1.8))  # 320x180 px at dpi=100 before tight cropping; resized on paste
    ax.plot(x, y, marker='o', color=color)
    ax.set_title(title, fontsize=10)
    ax.tick_params(labelsize=8)
    ax.grid(True)
    fig.tight_layout()

    # Save to io image buffer so we don't have to store every generation locally to use it
    buf = io.BytesIO()
    fig.savefig(buf, format='png', dpi=100, bbox_inches='tight')
    plt.close(fig)
    buf.seek(0)
    return Image.open(buf).convert("RGBA")

# EXAMPLE DATA PLOTS - data will be pulled from real PDFs in your code
visual1 = create_plot("Accuracy Over Epochs", list(range(1, 6)), [70, 75, 80, 85, 89])
visual2 = create_plot("Loss Reduction", list(range(1, 6)), [0.9, 0.6, 0.5, 0.4, 0.3], color='red')
visual3 = create_plot("Inference Time Drop", [1, 2, 3], [100, 70, 60], color='green')

# Paste visuals, using each resized image as its own alpha mask
for key, visual, size in [("vis1", visual1, (290, 150)),
                          ("vis2", visual2, (290, 80)),
                          ("vis3", visual3, (290, 130))]:
    resized = visual.resize(size)
    template.paste(resized, boxes[key], resized)

# Save to Google Drive. The path needs a file extension so Pillow can infer the image format.
output_path = "/content/drive/MyDrive/datasets/generated.png"
template.save(output_path)
print("✅ Visual abstract saved to Google Drive:", output_path)


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
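The plots in this section use hard-coded example data. As a first step toward pulling numbers from the paper itself, the sketch below extracts percentages from a results summary with regular expressions. This is a deliberately simple heuristic, not an NLP model, and the function names `extract_percentages` and `extract_from_to_pairs` are hypothetical.

```python
import re

def extract_percentages(text):
    """Return every percentage in `text` as a float, in order of appearance."""
    return [float(value) for value in re.findall(r"(\d+(?:\.\d+)?)\s*%", text)]

def extract_from_to_pairs(text):
    """Return (start, end) pairs for phrases shaped like 'from 72% to 89%'."""
    pattern = r"from\s+(\d+(?:\.\d+)?)\s*%\s+to\s+(\d+(?:\.\d+)?)\s*%"
    return [(float(a), float(b))
            for a, b in re.findall(pattern, text, flags=re.IGNORECASE)]

# The sample results string used in the notebook cell.
sample = ("Accuracy improved from 72% to 89%. Loss dropped by 30%. "
          "Inference time was reduced by 40%.")
print(extract_percentages(sample))    # [72.0, 89.0, 30.0, 40.0]
print(extract_from_to_pairs(sample))  # [(72.0, 89.0)]
```

A pair such as `(72.0, 89.0)` could be passed to create_plot as a two-point "before and after" series. Deciding which metric each number belongs to would still need more context than these patterns capture.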