In [None]:
# default_exp data.summarization

In [None]:
#hide
%reload_ext autoreload
%autoreload 2
%matplotlib inline

# data.summarization

> This module contains the bits required to use the fastai DataBlock API and/or mid-level data processing pipelines to organize your data for summarization tasks using architectures like BART and T5.

In [None]:
#export
import ast
from functools import reduce

import torch
from transformers import *
from fastai.text.all import *

from blurr.utils import *
from blurr.data.core import *

In [None]:
#hide
import pdb

from nbdev.showdoc import *
from fastcore.test import *

from fastai import __version__ as fa_version
from torch import __version__ as pt_version
from transformers import __version__ as hft_version

print(f'Using pytorch {pt_version}')
print(f'Using fastai {fa_version}')
print(f'Using transformers {hft_version}')

Using pytorch 1.6.0
Using fastai 2.0.13
Using transformers 3.2.0


In [None]:
#cuda
torch.cuda.set_device(1)
print(f'Using GPU #{torch.cuda.current_device()}: {torch.cuda.get_device_name()}')

Using GPU #1: GeForce GTX 1080 Ti


## Summarization tokenization, batch transform, and DataBlock methods

Summarization tasks attempt to generate a human-understandable and sensible representation of a larger body of text (e.g., capture the meaning of a larger document in 1-3 sentences).

In [None]:
path = Path('./')
cnndm_df = pd.read_csv(path/'cnndm_sample.csv'); len(cnndm_df)

1000

In [None]:
cnndm_df.head(2)

Unnamed: 0,article,highlights,ds_type
0,"(CNN) -- Globalization washes like a flood over the world's cultures and economies. Floods can be destructive; however, they can also bring blessings, as the annual floods of the Nile did for ancient Egypt. The world's great universities can be crucial instruments in shaping, in a positive way, humankind's reaction to globalization and the development of humankind itself. Traditionally, universities have been defined and limited by location, creating an academic community and drawing students and scholars to that place. Eventually, some universities began to encourage students to study el...","John Sexton: Traditionally, universities have been defined and limited by location .\nGlobal campuses form a network of thought, innovation, he writes .\nFaculty can teach, Sexton says, students can team up in many cities at once .\nSexton: Research, scholarship can be shared and cultural ties made in ""century of knowledge""",train
1,"(CNN) -- Armenian President Robert Kocharian declared a state of emergency Saturday night after a day of clashes between police and protesters, a spokeswoman for the Armenian Foreign Ministry said. Opposition supporters wave an Armenian flag during a protest rally in Yerevan, Armenia, on Saturday. The protesters claim last month's presidential election was rigged. The state of emergency will ""hopefully bring some order"" to the capital, Yerevan, said Salpi Ghazarian, assistant to the Armenian foreign minister, who spoke to CNN early Sunday. The state of emergency could last until March 20, ...","NEW: Protest moves after crackdown at Freedom Square .\nOrder sought after protests over last month's election turn violent .\nDemonstrators say the election was fraudulent .\nState of emergency could last until March 20, official says .",train


In [None]:
pretrained_model_name = "facebook/bart-large-cnn"

hf_arch, hf_config, hf_tokenizer, hf_model = BLURR_MODEL_HELPER.get_hf_objects(pretrained_model_name, 
                                                                               model_cls=BartForConditionalGeneration)

hf_arch, type(hf_tokenizer), type(hf_config), type(hf_model)

('bart',
 transformers.tokenization_bart.BartTokenizer,
 transformers.configuration_bart.BartConfig,
 transformers.modeling_bart.BartForConditionalGeneration)

In [None]:
#export
class HF_SummarizationInput(HF_BaseInput): pass

We create a subclass of `HF_BatchTransform` for summarization tasks to add `decoder_input_ids` and `labels` to our inputs during training, which in turn allows the Hugging Face model to calculate the loss for us. See [here](https://huggingface.co/transformers/model_doc/bart.html#transformers.BartModel.forward) for more information on how these additional inputs are used in summarization and conversational training tasks.

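Note also that `labels` is simply the target `input_ids` offset by one position relative to `decoder_input_ids` (the decoder inputs drop the last token, the labels drop the first), since the task is to predict the next token based on the current (and all previous) `decoder_input_ids`. Padding positions in `labels` are set to `-100` so that they are ignored when the loss is calculated.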

And lastly, we also update our targets to just be the `input_ids` of our target sequence so that fastai's `Learner.show_results` works (again, almost all the fastai bits require returning a single tensor to work).
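The offset described above can be sketched in isolation. This is only an illustration, not the transform itself: plain Python lists stand in for tensors so it runs without dependencies, and the token ids, `PAD_ID`, and `IGNORE_INDEX` names are assumptions made up for the example.

```python
# Illustrative sketch only: plain lists stand in for the tensors used in the transform.
PAD_ID = 1           # assumed pad token id for this example
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

# A made-up, padded target sequence: <s> tok tok tok </s> <pad> <pad>
target_ids = [0, 713, 16, 10, 2, 1, 1]

# The decoder is fed every token except the last one ...
decoder_input_ids = target_ids[:-1]

# ... and at each position is asked to predict the token that comes next.
# Padding positions are masked so they do not contribute to the loss.
labels = [IGNORE_INDEX if t == PAD_ID else t for t in target_ids[1:]]

for dec_in, label in zip(decoder_input_ids, labels):
    print(f"{dec_in:>4} -> {label}")
```

Each printed pair shows a decoder input on the left and the token it should predict on the right; both sequences end up one token shorter than the original target.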

In [None]:
#export
class HF_SummarizationBatchTransform(HF_BatchTransform):
    def __init__(self, hf_arch, hf_tokenizer, max_length=None, padding=True, truncation=True, 
                 is_split_into_words=False, n_tok_inps=2, hf_input_return_type=HF_SummarizationInput, 
                 tok_kwargs={}, **kwargs):
                 
        super().__init__(hf_arch, hf_tokenizer, max_length=max_length, padding=padding, truncation=truncation, 
                         is_split_into_words=is_split_into_words, n_tok_inps=n_tok_inps, 
                         hf_input_return_type=hf_input_return_type, 
                         tok_kwargs=tok_kwargs.copy(), **kwargs)
        
    def encodes(self, samples):  
        samples = super().encodes(samples)
        if (len(samples[0]) == 1): return samples
        
        updated_samples = []
        for s in samples:
            s[0]['decoder_input_ids'] = s[1]['input_ids'][:-1].clone()
            s[0]['labels'] = s[1]['input_ids'][1:].clone()
            s[0]['labels'][s[0]['labels'] == self.hf_tokenizer.pad_token_id] = -100
            
            targ_ids = s[1]['input_ids']
            
            updated_samples.append((s[0], targ_ids))
        
        return updated_samples

    def decodes(self, encoded_samples):
        input_ids = encoded_samples['input_ids'] if (isinstance(encoded_samples, dict)) else encoded_samples
        return self.hf_input_return_type(input_ids, hf_tokenizer=self.hf_tokenizer)

We had to override the `decodes` method above because, while both our inputs and targets start out as the same kind of object, we update the latter to consist of *only* the target `input_ids` so that methods like `Learner.show_results` work.
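As a rough sketch of the branching that `decodes` relies on, the hypothetical helper below (not part of the library) returns the token ids whether it is handed a dictionary of model inputs or the bare target ids. Plain lists again stand in for tensors, and the values are made up.

```python
def extract_input_ids(encoded):
    """Hypothetical helper: return token ids from a dict of inputs or from the ids themselves."""
    return encoded["input_ids"] if isinstance(encoded, dict) else encoded

# Inputs arrive as a dictionary of everything the model needs ...
inputs = {"input_ids": [0, 713, 16, 2], "attention_mask": [1, 1, 1, 1]}
# ... while targets have been reduced to just the target ids.
targets = [0, 10, 2]

print(extract_input_ids(inputs))
print(extract_input_ids(targets))
```

Handling both shapes in one place means the same decoding path can serve the inputs and the targets.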

In [None]:
blocks = (HF_TextBlock(hf_batch_tfm=HF_SummarizationBatchTransform(hf_arch, hf_tokenizer)), noop)
dblock = DataBlock(blocks=blocks, get_x=ColReader('article'), get_y=ColReader('highlights'), splitter=RandomSplitter())

Two lines! Notice we pass in `noop` for our targets (i.e., our summaries) because the batch transform will take care of both our inputs and targets.

In [None]:
# dblock.summary(cnndm_df)

In [None]:
dls = dblock.dataloaders(cnndm_df, bs=4)

In [None]:
b = dls.one_batch()

In [None]:
len(b), b[0]['input_ids'].shape, b[1].shape

(2, torch.Size([4, 1024]), torch.Size([4, 68]))

In [None]:
#export
@typedispatch
def show_batch(x:HF_SummarizationInput, y, samples, dataloaders, ctxs=None, max_n=6, **kwargs):  
    hf_tokenizer = dataloaders.before_batch[0].hf_tokenizer
    
    res = L([ (hf_tokenizer.decode(s[0], skip_special_tokens=True), hf_tokenizer.decode(s[1], skip_special_tokens=True))
             for s in samples ])      
    
    display_df(pd.DataFrame(res, columns=['text', 'target'])[:max_n])
    return ctxs

In [None]:
dls.show_batch(dataloaders=dls, max_n=2)

Unnamed: 0,text,target
0,"Dan Condon believes in recycling. Just not when it comes to his hotel towels. Condon composts when he's at home in Boulder, Colorado. He eats local, organic and fair-trade food and drives a Honda CR-Z hybrid sports car. You might call him green. Except he's not so green when he travels for his work at an education nonprofit and stays in a hotel, which happens about 10 weeks per year. There, he uses a new towel every day. And don't try to bribe him with a drink or dessert coupon to get him to reuse the same one. ""I could care less about rewards for environmentally conscious behavior unless it's miles,"" Condon wrote in an e-mail. If hotels can't convince a hybrid-driving recycling enthusiast like Condon to go green while traveling, how can they possibly convince everyone else? 9 glamorous movie-star hotels. That's the problem of hotels trying to ""green"" your hotel stay. After guests have paid a pretty penny for a night at the inn, even the most environmental guests may want to treat themselves to fresh towels every day and those little bottles of sweet-smelling shampoo. Despite the fact that most people describe themselves in surveys as environmentally conscious and as preferring green products, there's a big gap between consumer attitudes and consumer behaviors when it comes to going green, said Michael Giebelhausen, a marketing professor at the Cornell University School of Hotel Administration. ""It can be nice to have fresh towels, and not doing so is a sacrifice,"" said Giebelhausen, whose current research focuses on the impact of hotel sustainability programs on guest satisfaction. ""Participating requires some effort, and there's some cost to be incurred on the part of the consumer."" Guests who go green are happy. Nearly 90% of hotel guests are offered the chance to do something sustainable during their stays, and about two-thirds will participate, according to Giebelhausen's analysis of 2011 data from the J.D. 
Power and Associates North America Hotel Guest Satisfaction Study. Those guests who participate in a hotel's green programs report that they are more satisfied with their stays than guests who do not participate. Participating in a hotel's sustainability program provides ""a feeling that it was good to be green, it made them feel good about themselves, and that translated to the service provider,"" Giebelhausen said. 8 getaways we wish we could afford. ""These guests, who are ostensibly receiving a lower level of service, report being more satisfied overall with their stay."" There's just one catch: Guests who don't participate in voluntary sustainability programs reported the lowest levels of satisfaction with their hotel stays. ""One explanation for these findings is that when people don't live up to their ideals, and vice versa, this affects how satisfied they are with the entity that presented them this'moral dilemma,'"" Giebelhausen said. Sustainability is becoming the norm. It makes business sense for hotels to go green: Increasing sewage rates, stricter water use requirements and more recycling options are all convincing hotels to reduce their water and energy costs, said hotel industry veteran Pat Maher, an environmental consultant and ""green guru"" for the American Hotel & Lodging Association. More than 75% of U.S. hotels have linen and towel reuse programs, 59% have guest or internal recycling programs, and 46% have a water-saving program, according to a 2012 American Hotel & Lodging Association survey of its members. They also have ""back of the house"" programs that include low-flow shower heads, faucets and toilets; energy-efficient light bulbs, high-efficiency appliances and other efforts. Some are required by local governments; others just make business sense. That translates into real dollars: The U.S. 
Environmental Protection Agency has found that hotels and other lodging facilities use more than 510 trillion BTU of energy annually at a cost of more than $7.4 billion. That energy use generates 54 million metric tons of greenhouse gas emissions, equal to the emissions from more than 11 million passenger vehicles, according to the agency. Beyond Mickey Mouse: Best cruises for 2013. The EPA reports that the lodging industry could save $745 million annually by reducing energy use by 10%. That translates to 60 cents more revenue per room night at limited-service hotels and $2 at full-service hotels. Annoyed that the hotel's bottom line benefits from your sacrifice? Some hotels are trying to make water-saving behavior pay for their guests. Participating Sheraton Hotels & Resorts gives guests a $5 food and drink voucher or 500 Starwood points for every day they decline housekeeping's services (except departure day). Part of the Kimpton culture. Some hotels are making green cool. It seems to be an easier sell for hip, higher-end chains like Kimpton Hotel & Restaurant Group's properties, which cultivate an edgier base of customers. About 85% of hotel guests participate in the chain's towel and sheet reuse program, said Mike DeFrino","Hotel guests who ""go green"" are happier with their stay.\nIncreasing water and energy costs are pushing hotels to cut costs wherever they can.\nMany hotels find that guests don't mind using the same towels and sheets every night.\nTripAdvisor will be adding a green label for hotels listed on its site."
1,"Some U.S. officials this year are expected to get smartphones capable of handling classified government documents over cellular networks, according to people involved in the project. The phones will run a modified version of Google's Android software, which is being developed as part of an initiative that spans multiple federal agencies and government contractors, these people said. The smartphones are first being deployed to U.S. soldiers, people familiar with the project said. Later, federal agencies are expected to get phones for sending and receiving government cables while away from their offices, sources said. Eventually, local governments and corporations could give workers phones with similar software. The Army has been testing touchscreen devices at U.S. bases for nearly two years, said Michael McCarthy, a director for the Army's Brigade Modernization Command, in a phone interview. About 40 phones were sent to fighters overseas a year ago, and the Army plans to ship 50 more phones and 75 tablets to soldiers abroad in March, he said. ""We've had kind of an accelerated approval process,"" McCarthy said. ""This is a hugely significant event."" Currently, the United States doesn't allow government workers or soldiers to use smartphones for sending classified messages because the devices have not met security certifications. Officials have said they worry that hackers or rogue apps could tap into the commercial version of Android and spill state secrets to foreign governments or to the Web through a publisher such as WikiLeaks. As many as 5 million Android users may have had their phones compromised by a recent virus outbreak rooted in apps found on Google's market, said security software maker Symantec. But with a secure smartphone, a soldier could see fellow infantry on a digital map, or an official could send an important dispatch from Washington's Metro subway without fear of security breaches. 
Developers in the government program have completed a version that has been authorized for storing classified documents but not transmitting them over a cell network, said two people contributing to the initiative. Smartphones cleared for top-secret dispatches -- high-level classified information that would compromise national security if intercepted -- are expected to be ready in the next few months, they said. Rather than building special handsets hardwired with secure components, the government plans to install its software on commercially available phones, the people familiar with the project said. This approach is far less expensive and allows the government to stay up to date with the latest phones on the market, they said. Android vs. Apple. There are hundreds of different Android models available, and more than half of all smartphones sold globally in a recent quarter use Android, according to industry research firm Gartner. Verizon Wireless has sold more Android phones than any other U.S. cell carrier, thanks in part to its marketing emphasis interest on the Droid brand. About a year ago, Verizon also got the iPhone, ending AT&T's U.S. exclusivity with that device. ""There's a lot of interest in Android,"" Bryan Schromsky, a Verizon director for its wireless data services, said in a phone interview. ""We are seeing Android sales across all branches of government."" Still, Apple's iPhone and iPad are also highly desired among U.S. officials, and people involved in the U.S. smartphone program said their goal is to support any type of smartphone. As CNN has reported, the Chairman of the Joint Chiefs of Staff, Gen. Martin Dempsey, uses an iPad to read his classified intelligence by downloading cables and disconnecting from the network. However, the government chose to work on Android first because Google already allows people to tinker freely with its code, said those working on the project. 
Federal officials have met with Apple, but they were told they could not have access to the core of the company's mobile operating system, said Angelos Stavrou, an information-security director at George Mason University who is working on the government project as a contractor, in a phone interview. ""Android was more cooperative in supporting some of the capabilities that we wanted to support in the operating system, whereas Apple was more averse,"" Stavrou told CNN. ""They're shifting the strategy now."" An Apple spokeswoman declined to comment on the meeting or any changes to its strategy. Google publishes the source code for Android on its website for anyone to download and modify, and some partners are given access to the code before others. A Google spokesman declined to comment on the government project. When Google releases a new version of Android or when a new version of its phones comes out, a compatible software update to the government's secure Android can be ready within two weeks, Stavrou said. Emphasis on security. Government programmers are making security modifications to Android's kernel, which is the operating system's central component, the people involved said. The version will allow users to choose which data from Android and its applications can be sent over the Internet, they said. ""When you download an application on your phone, you don't really know what it does,"" Stavrou said. ""We test the application in labs before the user consumes that application."" After testing more than 200,","Government, military officials to get Android phones capable of sharing secret documents.\nThe phones will run a modified version of Google's Android software, sources say.\nContractor: Google ""more cooperative"" than Apple working with government on phones."


## Tests

The tests below ensure the core DataBlock code above works for **all** pretrained summarization models available in Hugging Face. These tests are excluded from the CI workflow because of how long they would take to run and the amount of data that would need to be downloaded.

**Note**: Feel free to modify the code below to test whatever pretrained summarization models you are working with. If any of your pretrained summarization models fail, please submit a GitHub issue *(or a PR if you'd like to fix it yourself)*.

In [None]:
BLURR_MODEL_HELPER.get_models(task='ConditionalGeneration')

[transformers.modeling_bart.BartForConditionalGeneration,
 transformers.modeling_fsmt.FSMTForConditionalGeneration,
 transformers.modeling_mbart.MBartForConditionalGeneration,
 transformers.modeling_pegasus.PegasusForConditionalGeneration,
 transformers.modeling_t5.T5ForConditionalGeneration]

In [None]:
pretrained_model_names = [
    ('facebook/bart-base',BartForConditionalGeneration),
    ('t5-small', T5ForConditionalGeneration),
    ('google/pegasus-cnn_dailymail', PegasusForConditionalGeneration)
]

In [None]:
path = Path('./')
cnndm_df = pd.read_csv(path/'cnndm_sample.csv')

In [None]:
#slow
#hide_output
task = HF_TASKS_ALL.ConditionalGeneration
bsz = 2
seq_sz = 256
trg_seq_sz = 40

test_results = []
for model_name, model_cls in pretrained_model_names:
    error=None
    
    print(f'=== {model_name} ===\n')
    
    hf_arch, hf_config, hf_tokenizer, hf_model = BLURR_MODEL_HELPER.get_hf_objects(model_name, 
                                                                                   task=task, 
                                                                                   model_cls=model_cls)
    print(f'architecture:\t{hf_arch}\ntokenizer:\t{type(hf_tokenizer).__name__}\n')
    
    hf_batch_tfm = HF_SummarizationBatchTransform(hf_arch, hf_tokenizer, 
                                                  padding='max_length', max_length=[seq_sz, trg_seq_sz])

    blocks = ( 
        HF_TextBlock(hf_arch, hf_tokenizer, hf_batch_tfm=hf_batch_tfm), 
        noop
    )

    def add_t5_prefix(inp): return f'summarize: {inp}' if (hf_arch == 't5') else inp

    dblock = DataBlock(blocks=blocks, 
                   get_x=Pipeline([ColReader('article'), add_t5_prefix]), 
                   get_y=ColReader('highlights'), 
                   splitter=RandomSplitter())

    dls = dblock.dataloaders(cnndm_df, bs=bsz) 
    b = dls.one_batch()
    
    try:
        print('*** TESTING DataLoaders ***\n')
        test_eq(len(b), 2)
        test_eq(len(b[0]['input_ids']), bsz)
        test_eq(b[0]['input_ids'].shape, torch.Size([bsz, seq_sz]))
        test_eq(len(b[1]), bsz)
        test_eq(b[1].shape, torch.Size([bsz, trg_seq_sz]))

        if (hasattr(hf_tokenizer, 'add_prefix_space')):
            test_eq(dls.before_batch[0].tok_kwargs['add_prefix_space'], True)
            
        test_results.append((hf_arch, type(hf_tokenizer).__name__, model_name, 'PASSED', ''))
        dls.show_batch(dataloaders=dls, max_n=2)
        
    except Exception as err:
        test_results.append((hf_arch, type(hf_tokenizer).__name__, model_name, 'FAILED', err))

=== facebook/bart-base ===

architecture:	bart
tokenizer:	BartTokenizer

*** TESTING DataLoaders ***



Unnamed: 0,text,target
0,"Dan Condon believes in recycling. Just not when it comes to his hotel towels. Condon composts when he's at home in Boulder, Colorado. He eats local, organic and fair-trade food and drives a Honda CR-Z hybrid sports car. You might call him green. Except he's not so green when he travels for his work at an education nonprofit and stays in a hotel, which happens about 10 weeks per year. There, he uses a new towel every day. And don't try to bribe him with a drink or dessert coupon to get him to reuse the same one. ""I could care less about rewards for environmentally conscious behavior unless it's miles,"" Condon wrote in an e-mail. If hotels can't convince a hybrid-driving recycling enthusiast like Condon to go green while traveling, how can they possibly convince everyone else? 9 glamorous movie-star hotels. That's the problem of hotels trying to ""green"" your hotel stay. After guests have paid a pretty penny for a night at the inn, even the most environmental guests may want to treat themselves to fresh towels every day and those little bottles of sweet-smelling shampoo. Despite the fact that most people describe themselves in surveys as environmentally conscious and as preferring green products,","Hotel guests who ""go green"" are happier with their stay.\nIncreasing water and energy costs are pushing hotels to cut costs wherever they can.\nMany hotels find that guests don't"
1,"Washington (CNN) -- Few answers have emerged to the myriad questions about the Boston Marathon bombing and its aftermath, but that didn't stop political leaders from clashing about what happened and why it did on Sunday talk shows. Republican members of Congress played up a possible connection to global terrorists and said the lone surviving suspect should be designated an enemy combatant to allow unfettered questioning and unlimited detention. Democratic legislators called for handling the 19-year-old suspect as a crime suspect rather than a war enemy, allowing the U.S. citizen the right to legal representation under federal law that could impose the death penalty. A closer look at their statements and arguments showed how politicians blend facts, conjecture and spin to push their side's agenda while countering arguments from across the aisle. The facts so far tell a still-convoluted story. Tamerlan and Dzhokhar Tsarnaev, brothers of northern Caucasus origin who had lived in the United States for years, allegedly set off two bombs near the finish line of Monday's Boston Marathon, killing three people and injuring more than 170. Graham: 'Ball was dropped' in probe of Tamerlan. After the FBI released video footage and photos of the pair on Thursday, they allegedly shot to death a university police officer and","Partisan posturing emerges over Boston bombings on Sunday talk shows.\nDespite little evidence, Republicans hint of possible international terror ties.\nDemocrats argue against designating the suspect an enemy combatant"


=== t5-small ===

architecture:	t5
tokenizer:	T5Tokenizer

*** TESTING DataLoaders ***



Unnamed: 0,text,target
0,"summarize: (CNN) -- Home to up to 10 percent of all known species, Mexico is recognized as one of the most biodiverse regions on the planet. The twin threats of climate change and human encroachment on natural environments are, however, threatening the existence of the country's rich wildlife. And there is a great deal to lose. In the United Nations Environment Program (UNEP) World Conservation Monitoring Centre's list of megadiverse countries Mexico ranks 11th. The list represents a group of 17 countries that harbor the majority of the Earth's species and are therefore considered extremely biodiverse. From its coral reefs in the Caribbean Sea to its tropical jungles in Chiapas and the Yucatan peninsula and its deserts and prairies in the north, Mexico boasts an incredibly rich variety of flora and fauna. Some 574 out of 717 reptile species found in Mexico -- the most in any country -- can only be encountered within its borders. It is home to 502 types of mammals, 290 species of birds, 1,150 varieties of birds and 26,000 classifications of plants. Pronatura, a non-profit organization that works to","Mexico hosts to up to 10 percent of all known species on Earth. It is home to 502 types of mammals, 290 bird species and 26,000 types of plants. Human development"
1,"summarize: Among the targets of U.S. strikes across Syria early Tuesday was the Khorasan Group -- a collection of senior al Qaeda members who have moved into Syria. President Obama called them ""seasoned al Qaeda operatives."" ""Once again, it must be clear to anyone who would plot against America and try to do Americans harm that we will not tolerate safe havens for terrorists who threaten our people,"" Obama said. The strikes targeted ""training camps, an explosives and munitions production facility, a communication building and command and control facilities,"" the military said in a statement. The group was actively plotting against a U.S. homeland target and Western targets, a senior U.S. official told CNN on Tuesday. The United States hoped to surprise the group by mixing strikes against it with strikes against ISIS targets. The official said the group posed an ""imminent"" threat. Another U.S. official later said the threat was not imminent in the sense that there were no known targets or attacks expected in the next few weeks. The plots were believed to be in an advanced stage, the second U.S. official said. There were indications that the","Official: Khorasan plot involving concealed bombs on airplanes ""was just one option"" The threat from the Khorasan Group was not imminent, a U.S"


=== google/pegasus-cnn_dailymail ===

architecture:	pegasus
tokenizer:	PegasusTokenizer

*** TESTING DataLoaders ***



Unnamed: 0,text,target
0,"(CNN) -- Home to up to 10 percent of all known species, Mexico is recognized as one of the most biodiverse regions on the planet. The twin threats of climate change and human encroachment on natural environments are, however, threatening the existence of the country's rich wildlife. And there is a great deal to lose. In the United Nations Environment Program (UNEP) World Conservation Monitoring Centre's list of megadiverse countries Mexico ranks 11th. The list represents a group of 17 countries that harbor the majority of the Earth's species and are therefore considered extremely biodiverse. From its coral reefs in the Caribbean Sea to its tropical jungles in Chiapas and the Yucatan peninsula and its deserts and prairies in the north, Mexico boasts an incredibly rich variety of flora and fauna. Some 574 out of 717 reptile species found in Mexico -- the most in any country -- can only be encountered within its borders. It is home to 502 types of mammals, 290 species of birds, 1,150 varieties of birds and 26,000 classifications of plants. Pronatura, a non-profit organization that works to promote conservation and sustainable development in Mexico, has selected six species which it says symbolize the problems faced by the destruction of nature. ""These are only some of the species which have","Mexico hosts to up to 10 percent of all known species on Earth. It is home to 502 types of mammals, 290 bird species and 26,000 types of plants. Human development and climate change"
1,"It's an international air disaster in a war zone -- a commercial flight with almost 300 people on board shot down in eastern Ukraine. As new details emerge, here is a look at basic questions about the tragedy:. Was the plane shot down? All evidence so far says yes. President Barack Obama declared Friday that a surface-to-air missile blasted the Malaysia Airlines Boeing 777 on Thursday over the Donetsk region of Ukraine near the Russian border. According to a senior American official, a U.S. radar system saw a surface-to-air missile system turn on and track an aircraft right before plane went down. A second system saw a heat signature, which would indicate a missile rising from the ground into the air at the time the airliner was hit, the official explained. Does anyone dispute that? Not at this point. While the Ukrainian government trades accusations of blame with pro-Russian rebels it is fighting in eastern Ukraine and Russia itself, no one has offered evidence of an alternative theory. The plane's debris field indicates a mid-air explosion. Who did it? A preliminary classified U.S. intelligence analysis concludes the rebels most likely fired the missile, according to an American defense official with direct access to the latest information. ""Evidence indicates that the plane","Donetsk rebel official: Plane shot down, but not by us. Malaysian official says the crash site's integrity has been compromised. President Obama says evidence points to a missile strike by pro"


In [None]:
#slow
#hide_input
test_results_df = pd.DataFrame(test_results, columns=['arch', 'tokenizer', 'model_name', 'result', 'error'])
display_df(test_results_df)

Unnamed: 0,arch,tokenizer,model_name,result,error
0,bart,BartTokenizer,facebook/bart-base,PASSED,
1,t5,T5Tokenizer,t5-small,PASSED,
2,pegasus,PegasusTokenizer,google/pegasus-cnn_dailymail,PASSED,


## Cleanup

In [None]:
#hide
from nbdev.export import notebook2script
notebook2script()

Converted 00_utils.ipynb.
Converted 01_data-core.ipynb.
Converted 01a_data-token-classification.ipynb.
Converted 01b_data-question-answering.ipynb.
Converted 01e_data-summarization.ipynb.
Converted 01z_data-language-modeling.ipynb.
Converted 02_modeling-core.ipynb.
Converted 02a_modeling-token-classification.ipynb.
Converted 02b_modeling-question-answering.ipynb.
Converted 02e_modeling-summarization.ipynb.
Converted 02z_modeling-language-modeling.ipynb.
Converted index.ipynb.
