In [13]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from datasets import load_dataset
import pandas as pd
import torch
import textwrap
import nltk

test_df = pd.read_json('data/test.json', lines=True)
test_df.head()

Unnamed: 0,id,pr-title,pr-article,pr-summary,sc-abstract,sc-section_names,sc-sections,sc-article,sc-authors,sc-title
0,6,'Origami' Testing App Could Tackle Spread of M...,A new approach to tackling the spread of malar...,Researchers at the U.K.'s University of Glasgo...,"In infectious disease diagnosis, results need ...","[Abstract, , Diagnostic system, Integration an...",[There remains a substantial burden from infec...,"In infectious disease diagnosis, results need ...","[Xin Guo | Division of Biomedical Engineering,...",Smartphone-based DNA diagnostics for malaria d...
1,9,Researchers Say They've Found a Wildly Success...,In addition to helping police arrest the wrong...,Computer scientists at Israel's Tel Aviv Unive...,A master face is a face image that passes face...,"[Abstract, I. INTRODUCTION, II. RELATED WORK, ...","[In dictionary attacks, one attempts to pass a...",A master face is a face image that passes face...,[Ron Shmelkin | The Blavatnik School of Comput...,Generating Master Faces for Dictionary Attacks...
2,14,Running Quantum Software on a Classical Computer,In a paper published in Nature Quantum Informa...,Researchers at the Swiss Federal Institute of ...,A key open question in quantum computing is wh...,"[Abstract, INTRODUCTION, RESULTS, Classical va...",[The past decade has seen a fast development o...,A key open question in quantum computing is wh...,[Matija Medvidović | Center for Computational ...,Classical variational simulation of the Quantu...
3,28,Scientists Reducing Computational Power Requir...,"July 29, 2021 - An approach that reduces the c...",The KAUST Metagenomic Analysis Platform (KMAP)...,Exponential rise of metagenomics sequencing is...,"[Abstract, , Results and analyses, A global no...",[make functional validation of interesting gen...,Exponential rise of metagenomics sequencing is...,[Intikhab Alam | Computational Bioscience Rese...,"KAUST Metagenomic Analysis Platform (KMAP), en..."
4,32,Honeypot Security Technique Can Stop Attacks i...,"UNIVERSITY PARK, Pa. - As online fake news det...",A machine learning framework can proactively c...,The Universal Trigger (UniTrigger) is a recent...,"[Abstract, Introduction, Original:, The Univer...",[Adversarial examples in NLP refer to carefull...,The Universal Trigger (UniTrigger) is a recent...,"[Thai Le | Penn State University, Noseong Park...",A Sweet Rabbit Hole by DARCY: Using Honeypots ...
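The `lines=True` argument above tells pandas to treat the file as JSON Lines, that is, one JSON object per line rather than a single JSON array. A minimal stdlib sketch of that parsing, assuming a made-up two-record sample (the `sample` contents and the `read_json_lines` helper are hypothetical and not part of the notebook):

```python
import io
import json

# Hypothetical two-record sample in JSON Lines form: one JSON object per line.
sample = io.StringIO(
    '{"id": 1, "pr-title": "First title", "pr-article": "First article text."}\n'
    '{"id": 2, "pr-title": "Second title", "pr-article": "Second article text."}\n'
)


def read_json_lines(handle):
    """Parse each non-empty line of `handle` as its own JSON object."""
    return [json.loads(line) for line in handle if line.strip()]


records = read_json_lines(sample)
print(len(records))
print(records[0]["pr-title"])
```

Each parsed dict corresponds to one DataFrame row, and its keys become the column names.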


In [14]:
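# The tokenizer comes from the base 't5-small' model; the weights are loaded from a local checkpoint directory.
tokenizer = AutoTokenizer.from_pretrained('t5-small')
model = AutoModelForSeq2SeqLM.from_pretrained('T5-title-generator/checkpoint-500')
# Inputs longer than this many tokens are truncated before generation.
max_input_length = 512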

In [19]:
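def process_article(idx, test_df, tokenizer, model, max_input_length=512):
    """Generate a title for the article at row `idx` and print it next to the original title."""
    row = test_df.iloc[idx]
    article = row['pr-article']
    title = row['pr-title']
    # Prepend the T5-style task prefix to the article text.
    inputs = 'summarize: ' + article

    # Tokenize, truncating anything beyond max_input_length tokens.
    tokenized_inputs = tokenizer(inputs,
                                 max_length=max_input_length,
                                 truncation=True,
                                 return_tensors="pt")
    # With num_beams > 1 and do_sample=True, generate() runs beam-search
    # multinomial sampling, so the title can differ between runs.
    output = model.generate(**tokenized_inputs,
                            num_beams=8,
                            do_sample=True,
                            min_length=5,
                            max_length=64)
    decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
    # Keep only the first sentence of the generated text as the title.
    predicted_title = nltk.sent_tokenize(decoded_output.strip())[0]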

    print('Article:\n')
    print(textwrap.fill(article, width=120, initial_indent='    ', subsequent_indent='    '))
    print('\nOriginal Title:\n')
    print(textwrap.fill(title, width=120, initial_indent='    ', subsequent_indent='    '))
    print("\nModel's Title:\n")
    print(textwrap.fill(predicted_title, width=120, initial_indent='    ', subsequent_indent='    '))

process_article(5, test_df, tokenizer, model, max_input_length)

Article:

    Northwestern Engineering researchers have developed a new framework using machine learning that improves the
    accuracy of interatomic potentials - the guiding rules describing how atoms interact - in new materials design. The
    findings could lead to more accurate predictions of how new materials transfer heat, deform, and fail at the atomic
    scale. Designing new nanomaterials is an important aspect of developing next-generation devices used in electronics,
    sensors, energy harvesting and storage, optical detectors, and structural materials. To design these materials,
    researchers create interatomic potentials through atomistic modeling, a computational approach that predicts how
    these materials behave by accounting for their properties at the smallest level. The process to establish materials'
    interatomic potential - called parameterization - has required significant chemical and physical intuition, leading
    to less accurate prediction of new mat
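The post-processing in `process_article` (keep the first sentence, then wrap it under a heading) can be tried without a model. The sketch below is a rough approximation, not the notebook's method: `first_sentence` and `format_block` are hypothetical helpers, and the regex split is a naive stand-in for `nltk.sent_tokenize` that will also split after abbreviations such as "S." in a name.

```python
import re
import textwrap


def first_sentence(text):
    """Return text up to the first '.', '!' or '?' that is followed by whitespace.

    Naive stand-in for nltk.sent_tokenize(text)[0]; it does not handle abbreviations.
    """
    parts = re.split(r'(?<=[.!?])\s+', text.strip(), maxsplit=1)
    return parts[0]


def format_block(heading, text, width=120, indent='    '):
    """Return the heading, a blank line, then the text wrapped and indented."""
    wrapped = textwrap.fill(text, width=width,
                            initial_indent=indent, subsequent_indent=indent)
    return f"{heading}\n\n{wrapped}"


# Made-up decoded output, used only to exercise the helpers.
decoded = "  New framework improves interatomic potentials. Extra generated text.  "
print(format_block("Model's Title:", first_sentence(decoded)))
```

Collecting the wrapping in one helper would also replace the three near-identical `textwrap.fill` calls in the function.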

In [20]:
process_article(100, test_df, tokenizer, model, max_input_length)

Article:

    Researchers from the University of Toronto and LG AI Research have developed an "explainable" artificial
    intelligence (XAI) algorithm that can help identify and eliminate defects in display screens. The new algorithm,
    which outperformed comparable approaches on industry benchmarks, was developed through an ongoing AI research
    collaboration between LG and U of T that was expanded in 2019 with a focus on AI applications for businesses .
    Researchers say the XAI algorithm could potentially be applied in other fields that require a window into how
    machine learning makes its decisions, including the interpretation of data from medical scans. "Explainability and
    interpretability are about meeting the quality standards we set for ourselves as engineers and are demanded by the
    end user," says Kostas Plataniotis , a professor in the Edward S. Rogers Sr. department of electrical and computer
    engineering in the Faculty of Applied Science & Engineering.

In [21]:
process_article(200, test_df, tokenizer, model, max_input_length)

Article:

    The largest collection of public internet censorship data ever compiled shows that even citizens of the world's
    freest countries are not safe from internet censorship. A University of Michigan team used Censored Planet, an
    automated censorship tracking system launched in 2018 by assistant professor of electrical engineering and computer
    science Roya Ensafi, to collect more than 21 billion measurements over 20 months in 221 countries. They will present
    the findings Nov. 10 at the 2020 ACM Conference on Computer and Communications Security. "We hope that the continued
    publication of Censored Planet data will enable researchers to continuously monitor the deployment of network
    interference technologies, track policy changes in censoring nations, and better understand the targets of
    interference," Ensafi said. "While Censored Planet does not attribute censorship to a particular entity, we hope
    that the massive data we've collected can help poli

In [23]:
process_article(420, test_df, tokenizer, model, max_input_length)

Article:

    Two mysterious components of quantum technology came together in a lab at Rice University in Houston recently.
    Quantum entanglement - the key to quantum computing - and quantum criticality - an essential ingredient for high-
    temperature superconductors - have now been linked in a single experiment. The preliminary results suggest something
    approaching the same physics is behind these two essential but previously distinct quantum technologies. The
    temptation, then, is to imagine a future in which a sort of grand unified theory of entanglement and
    superconductivity might be developed, where breakthroughs in one field could be translated into the other. The
    research centers around a thin film of a metal (composed of the elements ytterbium , rhodium , and silicon)
    fabricated by researchers at the Vienna University of Technology . A team at Rice, then, analyzed its peculiar
    properties. They observed the film in a state that both exhibited so-cal