In [2]:
import os
import json

# Set paper ID
paper_id = "ICLR_2018_12"

# Define paths
paper_path = f"../data/dataset/ICLR_2018/ICLR_2018_paper/{paper_id}_paper.json"
review_path = f"../data/dataset/ICLR_2018/ICLR_2018_review/{paper_id}_review.json"

# Load files
with open(paper_path, "r", encoding="utf-8") as f:
    paper = json.load(f)

with open(review_path, "r", encoding="utf-8") as f:
    review = json.load(f)

# Extract content
title = paper.get("title", "")
abstract = paper.get("abstract", "")
reviews = [r.get("review", "") for r in review.get("reviews", [])[:3]]

# Prepare model inputs
baseline_input = f"Title: {title}\nAbstract: {abstract}"
rag_input = f"{baseline_input}\n\nRetrieved Reviews:\n" + "\n\n".join(reviews)

# Print preview
print("Paper ID:", paper.get("id"))
print("Baseline Input:\n", baseline_input[:300])
print("RAG Input Preview:\n", rag_input[:500])

Paper ID: ICLR_2018_12
Baseline Input:
 Title: Spectral Normalization for Generative Adversarial Networks
Abstract: 
RAG Input Preview:
 Title: Spectral Normalization for Generative Adversarial Networks
Abstract: 

Retrieved Reviews:
This paper borrows the classic idea of spectral regularization, recently applied to deep learning by Yoshida and Miyato (2017) and use it to normalize GAN objectives. The ensuing GAN, coined SN-GAN, essentially ensures the Lipschitz property of the discriminator. This Lipschitz property has already been proposed by recent methods and has showed some success. However,  the authors here argue that spec


In [3]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer.encode(baseline_input, return_tensors='pt')
outputs = model.generate(inputs, max_length=100, do_sample=True)

generated = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated)


  from .autonotebook import tqdm as notebook_tqdm
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Title: Spectral Normalization for Generative Adversarial Networks
Abstract:  Some of the most popular kinds of adaptive networks are typically characterized by strong adaptive aspects, particularly about how to recognize one another and for each other, how to manipulate the state of the network through various strategies and processes (see my blog for more). In this article we describe an interesting approach that is highly selective, in that we are able to learn a model that can learn very fast, and where to look for


In [None]:
from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration

# Load tokenizer, model, and retriever
tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
retriever = RagRetriever.from_pretrained("facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True)
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)

# Your input text (title only)
input_text = title

# Tokenize
inputs = tokenizer(input_text, return_tensors="pt")

# Generate summary
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))


The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'RagTokenizer'. 
The class this function is called from is 'DPRQuestionEncoderTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'RagTokenizer'. 
The class this function is called from is 'DPRQuestionEncoderTokenizerFast'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'RagTokenizer'. 
The class this function is called from is 'BartTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called fr

ImportError: 
RagRetriever requires the 🤗 Datasets library but it was not found in your environment. You can install it with:
```
pip install datasets
```
In a notebook or a colab, you can install it by executing a cell with
```
!pip install datasets
```
then restarting your kernel.

Note that if you have a local folder named `datasets` or a local python file named `datasets.py` in your current
working directory, python may try to import this instead of the 🤗 Datasets library. You should rename this folder or
that python file if that's the case. Please note that you may need to restart your runtime after installation.

RagRetriever requires the faiss library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/facebookresearch/faiss/blob/master/INSTALL.md and follow the ones
that match your environment. Please note that you may need to restart your runtime after installation.
