**Text summarization falls into two main categories**


1.   Extractive: selects the most important sentences from the given content, retaining as much of the key information as possible.
2.   Abstractive: generates new text, summarizing the content in its own words.
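
To make the extractive idea concrete, here is a minimal sketch that scores each sentence by the average frequency of its words and keeps the top ones in their original order. It is not how the BERT-based summarizer used later works; `split_sentences`, `extractive_summary`, and the sample text are illustrative names and data, and the regex sentence splitter is a deliberate simplification.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, num_sentences=2):
    sentences = split_sentences(text)
    # Word frequencies over the whole text act as a crude importance signal.
    freqs = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freqs[w] for w in words) / max(len(words), 1)

    # Pick the highest-scoring sentences, then restore document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


sample = ("Orangutans live in trees. Orangutans eat fruit in trees. "
          "The weather was mild.")
summary = extractive_summary(sample, num_sentences=2)
print(summary)
```

Every sentence in the result is copied unchanged from the input, which is the defining property of an extractive summary.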



In [9]:
text = '''
       Scientists say they have discovered a new species of orangutans on Indonesia’s island of Sumatra.
The population differs in several ways from the two existing orangutan species found in Sumatra and the neighboring island of Borneo.
The orangutans were found inside North Sumatra’s Batang Toru forest, the science publication Current Biology reported.
Researchers named the new species the Tapanuli orangutan. They say the animals are considered a new species because of genetic, skeletal and tooth differences.
Michael Kruetzen is a geneticist with the University of Zurich who has studied the orangutans for several years. He said he was excited to be part of the unusual discovery of a new great ape in the present day. He noted that most great apes are currently considered endangered or severely endangered.
Gorillas, chimpanzees and bonobos also belong to the great ape species.
Orangutan – which means person of the forest in the Indonesian and Malay languages - is the world’s biggest tree-living mammal. The orange-haired animals can move easily among the trees because their arms are longer than their legs. They live more lonely lives than other great apes, spending a lot of time sleeping and eating fruit in the forest.
The new study said fewer than 800 of the newly-described orangutans exist. Their low numbers make the group the most endangered of all the great ape species.
They live within an area covering about 1,000 square kilometers. The population is considered highly vulnerable. That is because the environment which they depend on is greatly threatened by development.
Researchers say if steps are not taken quickly to reduce the current and future threats, the new species could become extinct “within our lifetime.”
Research into the new species began in 2013, when an orangutan protection group in Sumatra found an injured orangutan in an area far away from the other species. The adult male orangutan had been beaten by local villagers and died of his injuries. The complete skull was examined by researchers.
Among the physical differences of the new species are a notably smaller head and frizzier hair. The Tapanuli orangutans also have a different diet and are found only in higher forest areas.
There is no unified international system for recognizing new species. But to be considered, discovery claims at least require publication in a major scientific publication.
Russell Mittermeier is head of the primate specialist group at the International Union for the Conservation of Nature. He called the finding a “remarkable discovery.” He said it puts responsibility on the Indonesian government to help the species survive.
Matthew Nowak is one of the writers of the study. He told the Associated Press that there are three groups of the Tapanuli orangutans that are separated by non-protected land.He said forest land needs to connect the separated groups.
In addition, the writers of the study are recommending that plans for a hydropower center in the area be stopped by the government.
It also recommended that remaining forest in the Sumatran area where the orangutans live be protected.
I’m Bryan Lynn.

        '''

### *Extractive Text Summarization with BERT*

In [1]:
# Installing required libraries
#!pip install transformers==2.2.0
!pip install bert-extractive-summarizer


Collecting bert-extractive-summarizer
  Downloading bert_extractive_summarizer-0.10.1-py3-none-any.whl (25 kB)
Installing collected packages: bert-extractive-summarizer
Successfully installed bert-extractive-summarizer-0.10.1


In [2]:
from summarizer import Summarizer

In [None]:
# simple summarizer
model = Summarizer()

In [6]:
summarized = model(text)



In [8]:
summarized

'Scientists say they have discovered a new species of orangutans on Indonesia’s island of Sumatra. Researchers named the new species the Tapanuli orangutan. They say the animals are considered a new species because of genetic, skeletal and tooth differences. He noted that most great apes are currently considered endangered or severely endangered. Orangutan – which means person of the forest in the Indonesian and Malay languages - is the world’s biggest tree-living mammal. The population is considered highly vulnerable. Russell Mittermeier is head of the primate specialist group at the International Union for the Conservation of Nature.'

In [14]:
# Embedding retrieval
result = model.run_embeddings(text, num_sentences=3)  # Returns a (3, N) NumPy embedding matrix.



In [15]:
print(result.shape)  # N is the hidden size of the BERT model used for summarization (1024 here).

(3, 1024)
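
The library's documentation describes the approach as embedding each sentence, clustering the embeddings, and picking the sentences closest to the cluster centroids. The sketch below illustrates that selection step with a tiny pure-Python k-means on made-up 2-D vectors. `nearest_to_centroids`, the deterministic initialization, and the toy data are assumptions for illustration, not the library's implementation.

```python
import math


def nearest_to_centroids(embeddings, k, iterations=10):
    """Cluster vectors with a small k-means and return, for each cluster,
    the index of the vector closest to its centroid."""
    # Assumption: initialize from the first k rows so the result is reproducible.
    centroids = [list(vec) for vec in embeddings[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for vec in embeddings:
            best = min(range(k), key=lambda c: math.dist(vec, centroids[c]))
            clusters[best].append(vec)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    chosen = []
    for centroid in centroids:
        idx = min(range(len(embeddings)),
                  key=lambda i: math.dist(embeddings[i], centroid))
        chosen.append(idx)
    # Sorted indices keep the selected sentences in document order.
    return sorted(set(chosen))


# Toy "sentence embeddings": two obvious groups of three vectors each.
toy_embeddings = [
    [0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
    [5.0, 5.0], [5.2, 5.0], [5.1, 5.1],
]
selected = nearest_to_centroids(toy_embeddings, k=2)
print(selected)
```

With real sentence embeddings, the returned indices would identify which sentences to include in the summary.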


1. We can pass parameters such as `num_sentences` (the number of sentences to return) or `ratio` (the fraction of the total sentences to keep).
2. We can set `min_length` and `max_length`; only sentences whose length in characters falls between the two are considered.
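
The two points above can be sketched in plain Python to show what the parameters mean. This is an approximation of the behavior, not the library's code: `filter_by_length` and `sentences_to_keep` are hypothetical helper names, and the default values shown (`ratio=0.2`, `min_length=40`, `max_length=600`) are assumed to match the library's defaults.

```python
def filter_by_length(sentences, min_length=40, max_length=600):
    # Keep only sentences whose character count lies between the two bounds.
    return [s for s in sentences if min_length < len(s) < max_length]


def sentences_to_keep(total, ratio=0.2, num_sentences=None):
    # Assumption: an explicit num_sentences takes priority over ratio.
    if num_sentences is not None:
        return min(num_sentences, total)
    # Otherwise keep a fraction of the sentences, but never fewer than one.
    return max(1, int(total * ratio))


candidates = ["Too short.", "x" * 50, "y" * 700]
print(filter_by_length(candidates))            # only the 50-character entry survives
print(sentences_to_keep(30))                   # ratio-based count
print(sentences_to_keep(10, num_sentences=3))  # explicit count
```

With the `model` defined earlier, the corresponding calls would look like `model(text, num_sentences=3)` or `model(text, ratio=0.2, min_length=40, max_length=600)`.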

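### *Extractive Summarization with GPT2 Embeddings*

`TransformerSummarizer` is still extractive: it replaces BERT with another transformer's embeddings and selects existing sentences, which is why the output below consists of sentences copied verbatim from the text.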

In [16]:
!pip install transformers



In [18]:
from summarizer import TransformerSummarizer

In [None]:
GPT2_model = TransformerSummarizer(transformer_type="GPT2", transformer_model_key="gpt2-medium")

In [32]:
summarized = GPT2_model(text, min_length=100)



In [33]:
summarized

'The population differs in several ways from the two existing orangutan species found in Sumatra and the neighboring island of Borneo. The orangutans were found inside North Sumatra’s Batang Toru forest, the science publication Current Biology reported. Researchers say if steps are not taken quickly to reduce the current and future threats, the new species could become extinct “within our lifetime.”'

### *Extractive Summarization with XLNet Embeddings*

In [None]:
!pip install sentencepiece

In [3]:
from summarizer import TransformerSummarizer

In [4]:
model = TransformerSummarizer(transformer_type="XLNet", transformer_model_key="xlnet-base-cased")

In [7]:
summarized = model(text, min_length=60)  # the call already returns a single string



In [8]:
summarized

'Scientists say they have discovered a new species of orangutans on Indonesia’s island of Sumatra. They say the animals are considered a new species because of genetic, skeletal and tooth differences. Research into the new species began in 2013, when an orangutan protection group in Sumatra found an injured orangutan in an area far away from the other species. The Tapanuli orangutans also have a different diet and are found only in higher forest areas. In addition, the writers of the study are recommending that plans for a hydropower center in the area be stopped by the government. It also recommended that remaining forest in the Sumatran area where the orangutans live be protected.'
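
One way to see how much of a summary is copied sentence for sentence from its source is to measure the fraction of summary sentences that appear verbatim in the original. The sketch below does this with a naive regex splitter; `verbatim_fraction` and the sample strings are illustrative, and the same function could be applied to `summarized` and `text` from the cells above.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def verbatim_fraction(summary, source):
    """Return the share of summary sentences found unchanged in the source."""
    sentences = split_sentences(summary)
    if not sentences:
        return 0.0
    # Collapse whitespace so line breaks in the source do not hide matches.
    normalized_source = " ".join(source.split())
    copied = sum(1 for s in sentences if " ".join(s.split()) in normalized_source)
    return copied / len(sentences)


source = "Apes live in forests. They eat fruit. Rain fell all day."
copied_summary = "Apes live in forests. Rain fell all day."
mixed_summary = "Apes live in forests. The animals enjoy fruit."

copied_score = verbatim_fraction(copied_summary, source)
mixed_score = verbatim_fraction(mixed_summary, source)
print(copied_score, mixed_score)
```

A score of 1.0 means every sentence was taken directly from the source, which is what an extractive method produces; an abstractive summary would normally score lower.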