# Intro

In this notebook we create extractive summaries with BERT, using the `bert-extractive-summarizer` package. We start by running
a few examples from the package documentation to get a better understanding of how it works.


**Resources:**
* https://pypi.org/project/bert-extractive-summarizer/ 
* https://github.com/huggingface/neuralcoref#install-neuralcoref-from-source

# Imports & Installs

In [2]:
#!pip install bert-extractive-summarizer

#!pip install spacy==2.1.3
#!pip install transformers==2.2.2
#!pip install neuralcoref

#!python -m spacy download en_core_web_md

In [3]:
#!python -m spacy download en_core_web_sm

In [None]:
import neuralcoref



In [None]:
from summarizer import Summarizer

model = Summarizer()



# Summarization

In [None]:
text = '''
For mountain lions living in Los Angeles—and yes, mountain lions do manage to survive in the second-largest city in the U.S.—the 101 freeway is a major barrier to their daily routines. The same is true for other wildlife. But plans to build a massive wildlife crossing over a 10-lane stretch of the freeway just north of the city are now in the final phase of design and engineering. The project will be the largest bridge of its kind in the world.

Reconnecting the open space on either side of the freeway is crucial for wildlife. “We know from science what’s going on there, and it’s a little deeper than just that the animals are getting hit by cars,” says Beth Pratt of the National Wildlife Federation, one of several partner organizations working on the project. “They are becoming genetically isolated, because animals cannot move into the small islands of habitat that are created by our freeways.” The situation is most acute for mountain lions, who risk extinction in the area within decades, but other wildlife, from lizards to birds, are also showing a decline in genetic diversity.

Fires fueled by climate change are making the challenges worse, as animals often can’t relocate when their habitat is destroyed, or they can’t directly flee the flames. A mountain lion named P-64, who died because of the Woolsey Fire, is one example. “That cat knew how to move in an urban environment,” Pratt says. “He had actually crossed the 101 using a culvert. But fire came and he could not get out of the burn zone.”
The project has been in planning for around eight years but is moving relatively quickly given the scale, complexity, and cost. “It’s over the busiest freeway, probably, in America, with multiple public agencies,” she says. “These things usually take decades. But I think everybody recognizes that mountain lions are running out of time.” The cost, at $87 million, will be largely funded by private money. If fundraising continues on schedule, the groundbreaking will happen in 2021.

Around the world, other wildlife crossings exist and have been proven to work, though the project will be the first in a dense urban area. “We have something no other crossing has, which is millions of people around it,” says Pratt. “The Kardashians are down the street. We’re building this in the most densely populated metropolitan area in the country, and these crossings, for the most part, have been built in very rural areas. So we have some things we have to mitigate for that they don’t, and two of those are sound and light.” Three hundred thousand to 400,000 cars pass through the area each day; the bridge, 165 feet wide, is designed to keep the crossing as quiet and dark as possible, with vegetation planted to extend to the wild spaces on either side of the freeway.

“We’re saving mountain lions, and we’re reconnecting an ecosystem for all wildlife,” she says. “But we’re also going to have some great model for what others can do in urban areas to get animals across the road.”'''

In [None]:
model(text)

'For mountain lions living in Los Angeles—and yes, mountain lions do manage to survive in the second-largest city in the U.S.—the 101 freeway is a major barrier to their daily routines. But plans to build a massive wildlife crossing over a 10-lane stretch of the freeway just north of the city are now in the final phase of design and engineering. The cost, at $87 million, will be largely funded by private money. Three hundred thousand to 400,000 cars pass through the area each day; the bridge, 165 feet wide, is designed to keep the crossing as quiet and dark as possible, with vegetation planted to extend to the wild spaces on either side of the freeway.'

In [None]:
result = model.run_embeddings(text, ratio=0.2)  # Specified with ratio. 
#result = model.run_embeddings(text, num_sentences=3)  # Will return (3, N) embedding numpy matrix.
#result = model.run_embeddings(text, num_sentences=3, aggregate='mean')  # Will return Mean aggregate over embeddings. 

In [None]:
result = model(text, ratio=0.1)  # Specified with ratio

In [None]:
result

'For mountain lions living in Los Angeles—and yes, mountain lions do manage to survive in the second-largest city in the U.S.—the 101 freeway is a major barrier to their daily routines. Three hundred thousand to 400,000 cars pass through the area each day; the bridge, 165 feet wide, is designed to keep the crossing as quiet and dark as possible, with vegetation planted to extend to the wild spaces on either side of the freeway.'
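The two calls above differ only in `ratio`, which controls how many sentences end up in the summary. As far as we understand the package description, sentences are embedded, the embeddings are clustered, and the sentence closest to each cluster centroid is kept. The following is a minimal pure-Python sketch of that selection step, not the package's actual code: the helper names, the rounding rule for `ratio`, and the toy 2-D "embeddings" and centroids are all assumptions made for illustration.

```python
import math


def num_sentences_for_ratio(n_sentences, ratio):
    """Assumed rule: keep roughly ratio * n sentences, but never fewer than one."""
    return max(1, int(n_sentences * ratio))


def closest_to_centroids(embeddings, centroids):
    """Return, in document order, the index of the sentence nearest each centroid."""
    chosen = []
    for centroid in centroids:
        # Rank all sentences by Euclidean distance to this centroid.
        ranked = sorted(
            range(len(embeddings)),
            key=lambda i: math.dist(embeddings[i], centroid),
        )
        # Take the nearest sentence that has not been picked already.
        for i in ranked:
            if i not in chosen:
                chosen.append(i)
                break
    return sorted(chosen)


# Hypothetical 2-D embeddings for five sentences and two hand-picked centroids.
toy_embeddings = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0), (0.0, 3.0)]
toy_centroids = [(0.8, 0.2), (10.2, 10.1)]

picked = closest_to_centroids(toy_embeddings, toy_centroids)
print(picked)
```

In a real run the centroids would come from a clustering algorithm such as k-means fitted on the BERT sentence embeddings, with the number of clusters derived from `ratio` or `num_sentences`.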

---


In [None]:
from summarizer import Summarizer
from summarizer.coreference_handler import CoreferenceHandler

handler = CoreferenceHandler(greedyness=0.4)  # Float parameter that determines how greedy neuralcoref should be
# How coreference works:
# >>>handler.process('''My sister has a dog. She loves him.''', min_length=2)
# ['My sister has a dog.', 'My sister loves a dog.']

model = Summarizer(sentence_handler=handler)
model(text)

'For mountain lions living in Los Angeles—and yes, mountain lions do manage to survive in the second-largest city in the U.S.—the 101 freeway is a major barrier to their daily routines. Reconnecting the open space on either side of the freeway is crucial for wildlife. We know from science what’s going on there, and it’s a little deeper than just that the animals are getting hit by cars,” says Beth Pratt of the National Wildlife Federation, one of several partner organizations working on The project. The cost, at $87 million, will be largely funded by private money. Around the world, other wildlife crossings exist and have been proven to work, though the project will be the first in a dense urban area.'
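The `My sister has a dog. She loves him.` comment in the cell above shows the effect of coreference resolution: later mentions are replaced by the first mention of the same entity before sentences are scored. Here is a small sketch of that rewriting step in plain Python. It does not use neuralcoref; the `span` and `resolve_coreferences` helpers and the cluster format (lists of character spans, first span being the main mention) are hypothetical and exist only for this illustration.

```python
def span(text, mention, start_at=0):
    """Return the (start, end) character span of the first match of mention."""
    start = text.index(mention, start_at)
    return (start, start + len(mention))


def resolve_coreferences(text, clusters):
    """Replace every later mention in a cluster with the cluster's first mention."""
    replacements = []
    for spans in clusters:
        main_start, main_end = spans[0]
        main_text = text[main_start:main_end]
        for start, end in spans[1:]:
            replacements.append((start, end, main_text))
    # Apply replacements from right to left so earlier offsets stay valid.
    for start, end, main_text in sorted(replacements, reverse=True):
        text = text[:start] + main_text + text[end:]
    return text


sentence = "My sister has a dog. She loves him."
# Hand-written clusters; a coreference model would normally predict these.
toy_clusters = [
    [span(sentence, "My sister"), span(sentence, "She")],
    [span(sentence, "a dog"), span(sentence, "him")],
]

resolved = resolve_coreferences(sentence, toy_clusters)
print(resolved)
```

Resolving pronouns first means a selected sentence can stand on its own in the summary, which is why the summary with the handler can differ from the one produced by the plain `Summarizer()`.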

In [None]:
from summarizer import Summarizer
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Load model, model config and tokenizer via Transformers
custom_config = AutoConfig.from_pretrained('allenai/scibert_scivocab_uncased')
custom_config.output_hidden_states = True  # output_hidden_states=True must always be set in the model config
custom_tokenizer = AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased')
custom_model = AutoModel.from_pretrained('allenai/scibert_scivocab_uncased', config=custom_config)

# Use the custom SciBERT model and tokenizer instead of the default BERT model
model = Summarizer(custom_model=custom_model, custom_tokenizer=custom_tokenizer)
model(text, ratio=0.1)



'For mountain lions living in Los Angeles—and yes, mountain lions do manage to survive in the second-largest city in the U.S.—the 101 freeway is a major barrier to their daily routines. We know from science what’s going on there, and it’s a little deeper than just that the animals are getting hit by cars,” says Beth Pratt of the National Wildlife Federation, one of several partner organizations working on the project. “ We’re building this in the most densely populated metropolitan area in the country, and these crossings, for the most part, have been built in very rural areas.'

---


In [None]:
# Load one of the English spaCy models
import spacy
nlp = spacy.load('en_core_web_sm')

# Add NeuralCoref to spaCy's pipeline
import neuralcoref
neuralcoref.add_to_pipe(nlp)

# Done. The NeuralCoref annotations are now available on the spaCy document via `doc._`.
#doc = nlp(u'My sister has a dog. She loves him.')
doc = nlp(text)

doc._.has_coref
doc._.coref_clusters

[mountain lions: [mountain lions, their],
 the second-largest city in the U.S.—the 101 freeway: [the second-largest city in the U.S.—the 101 freeway, the city],
 the U.S.—the 101 freeway: [the U.S.—the 101 freeway, the freeway, the freeway],
 The project: [The project, its, the project, The project, It, the project],
 science: [science, it],
 Beth Pratt of the National Wildlife Federation, one of several partner organizations working on the project: [Beth Pratt of the National Wildlife Federation, one of several partner organizations working on the project, They],
 animals: [animals, their, they],
 That cat: [That cat, He, he],
 Pratt: [Pratt, Pratt],
 America: [America, the country],
 other wildlife crossings: [other wildlife crossings, these crossings],
 no other crossing: [no other crossing, it, the crossing],
 We: [We, we, we],
 the most densely populated metropolitan area in the country: [the most densely populated metropolitan area in the country, the area],
 We: [We, we, we]]

In [None]:
doc


For mountain lions living in Los Angeles—and yes, mountain lions do manage to survive in the second-largest city in the U.S.—the 101 freeway is a major barrier to their daily routines. The same is true for other wildlife. But plans to build a massive wildlife crossing over a 10-lane stretch of the freeway just north of the city are now in the final phase of design and engineering. The project will be the largest bridge of its kind in the world.

Reconnecting the open space on either side of the freeway is crucial for wildlife. “We know from science what’s going on there, and it’s a little deeper than just that the animals are getting hit by cars,” says Beth Pratt of the National Wildlife Federation, one of several partner organizations working on the project. “They are becoming genetically isolated, because animals cannot move into the small islands of habitat that are created by our freeways.” The situation is most acute for mountain lions, who risk extinction in the area within deca