In [1]:
# Installing spaCy and the small English pipeline that is loaded later in this notebook
!pip install spacy
!python -m spacy download en_core_web_sm






In [2]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation
from heapq import nlargest

In [3]:
# Text to summarize: the announcement that Netflix plans to acquire Animal Logic
Text_Data = """Netflix and Animal Logic are excited to announce today that Netflix plans to acquire the Australian animation studio.* This acquisition will support Netflix’s ambitious animated film slate, building on films like Academy Award-nominated Over the Moon, Academy Award-nominated Klaus and the recently released The Sea Beast.Animal Logic has been producing award-winning design, visual effects and animation for over 30 years. Headquartered in Sydney, Animal Logic set up a second studio in Vancouver, Canada in 2015 and has worked on Hollywood blockbusters including Happy Feet, Legend of the Guardians: The Owls of Ga’Hoole, The LEGO Movies and Peter Rabbit 1 & 2, alongside a catalogue of amazing visual effects work including The Matrix, Moulin Rouge!, 300, and The Great Gatsby.The announcement builds on an already strong partnership between the two companies, with a full slate of films across Animal Logic's Sydney and Vancouver studios including  The Magician’s Elephant, directed by Wendy Rogers, and the recently announced The Shrinking of the Treehorns, directed by Ron Howard, for Netflix.“Netflix has been investing in animation over the past few years and this furthers our commitment to building a world-class animation studio,” said Amy Reinhard, Netflix Vice President of Studio Operations. “Animal Logic is a leading animation studio with innovative technology that will strengthen our existing business and increase our long-term capacity in the animation space, so that we can better entertain our members around the world.”Working with Animal Logic will accelerate the buildout of Netflix’s end-to-end animation production capabilities. The Animal Logic and Netflix Animation teams together will create a global creative production team and an animation studio that will produce some of Netflix’s largest animated film titles. Netflix will continue to work with many other studios around the world for animated series and film needs. 
Led by CEO and co-founder Zareh Nalbandian, the Animal Logic teams and leadership will remain operating under the Animal Logic brand and will fulfil production of existing and ongoing commitments and continue to collaborate and work with longstanding studio partners. "After 30 years of producing great work with great people, this is the perfect next chapter for Animal Logic," said Nalbandian. "Our values and aspirations could not be more aligned with Netflix, in working with diverse content makers, producing innovative and engaging stories for audiences around the world. Our collective experience and talent will open new doors for all our teams and will empower a new level of creativity in animation."“The strength of our partnership across a number of projects is testament to our shared creative vision," said Animal Logic’s COO Sharon Taylor. "Solidifying our future together felt like a mutually beneficial, natural progression and I am so excited to continue to build on our success together."Animal Logic Entertainment, the producers behind Legend of the Guardians: The Owls of Ga’hoole, Peter Rabbit 1 & 2 and The Shrinking of the Treehorns, will remain independent and continue to collaborate with iconic filmmakers and major studio partners to develop and produce elevated family entertainment with universal appeal. Netflix’s fast-growing original slate of animated features and shorts includes Academy Award-nominated Robin, Robin,Academy Award-nominated Klaus,Kris Pearn’s The Willoughbys,Academy Award-nominated Over the Moon, Back to the Outback directed by Clare Knight and Harry Cripps, Apollo 10 ½: A Space Age Childhood from Richard Linklater, and the recently released The Sea Beast; as well as upcoming releases including Henry Selick’s Wendell & Wild, Nora Twomey’s My Father’s Dragon, Guillermo del Toro’s Pinocchio, Wendy Rogers’ The Magician’s Elephant, and an Aardman sequel to Chicken Run.  
Recent animated film acquisitions by Netflix include the Academy Award-nominated The Mitchells vs. The Machines, Vivo!, Spongebob: Sponge on the Run (ex North America), and Wish Dragon. """

In [4]:
# Having a look at the stop words that spaCy provides
stop_words = list(STOP_WORDS)
stop_words

['off',
 'whenever',
 'from',
 'twelve',
 'this',
 'its',
 'another',
 'third',
 '‘d',
 'herself',
 'due',
 '‘s',
 'anywhere',
 'now',
 'would',
 'an',
 'whither',
 'full',
 'will',
 'wherever',
 'herein',
 'thereupon',
 'whereas',
 'please',
 'mostly',
 'each',
 'nevertheless',
 'about',
 'latterly',
 'doing',
 'various',
 'my',
 'while',
 'nowhere',
 '‘m',
 'behind',
 'before',
 'get',
 'bottom',
 'nobody',
 'some',
 'further',
 'therefore',
 'on',
 'without',
 'ca',
 'same',
 'a',
 'perhaps',
 'be',
 'whatever',
 'could',
 'since',
 'elsewhere',
 '‘ll',
 'do',
 'down',
 'several',
 '’m',
 'yourself',
 'any',
 'together',
 'them',
 'fifty',
 'anyone',
 'or',
 'alone',
 'whence',
 'noone',
 'quite',
 'been',
 'put',
 'whether',
 'last',
 'so',
 'least',
 "'s",
 'next',
 'make',
 '’ve',
 'although',
 'thru',
 'beside',
 'made',
 'is',
 'where',
 'beforehand',
 'amongst',
 'over',
 'me',
 'himself',
 'between',
 'someone',
 'toward',
 'hereafter',
 'many',
 'ever',
 'something',
 'throu

In [5]:
model = spacy.load("en_core_web_sm")
# en_core_web_sm is a small English pipeline trained on written text data

In [6]:
data = model(Text_Data)

In [7]:
# Collect the text of every token in the parsed document
tokens = [token.text for token in data]

In [8]:
tokens

['Netflix',
 'and',
 'Animal',
 'Logic',
 'are',
 'excited',
 'to',
 'announce',
 'today',
 'that',
 'Netflix',
 'plans',
 'to',
 'acquire',
 'the',
 'Australian',
 'animation',
 'studio',
 '.',
 '*',
 'This',
 'acquisition',
 'will',
 'support',
 'Netflix',
 '’s',
 'ambitious',
 'animated',
 'film',
 'slate',
 ',',
 'building',
 'on',
 'films',
 'like',
 'Academy',
 'Award',
 '-',
 'nominated',
 'Over',
 'the',
 'Moon',
 ',',
 'Academy',
 'Award',
 '-',
 'nominated',
 'Klaus',
 'and',
 'the',
 'recently',
 'released',
 'The',
 'Sea',
 'Beast',
 '.',
 'Animal',
 'Logic',
 'has',
 'been',
 'producing',
 'award',
 '-',
 'winning',
 'design',
 ',',
 'visual',
 'effects',
 'and',
 'animation',
 'for',
 'over',
 '30',
 'years',
 '.',
 'Headquartered',
 'in',
 'Sydney',
 ',',
 'Animal',
 'Logic',
 'set',
 'up',
 'a',
 'second',
 'studio',
 'in',
 'Vancouver',
 ',',
 'Canada',
 'in',
 '2015',
 'and',
 'has',
 'worked',
 'on',
 'Hollywood',
 'blockbusters',
 'including',
 'Happy',
 'Feet',
 ',

In [9]:
# Having a look at what the punctuation string from the string module contains
punctuation 

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [10]:
# The text data also contains newline characters ("\n"),
# so we append "\n" to the punctuation string to filter them out during cleaning.
punctuation = punctuation + '\n'
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n'

In [11]:
frequency = {}
# Iterate through the tokens in our data
for word in data:
    # Skip stop words (compared in lowercase)
    if word.text.lower() not in stop_words:
        # Skip punctuation characters and newlines
        if word.text.lower() not in punctuation:
            if word.text not in frequency.keys():
                # First occurrence: start the count at 1
                # (keys keep the token's original casing)
                frequency[word.text] = 1
            else:
                # Later occurrences: increment the existing count
                frequency[word.text] += 1
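
The counting loop above could also be expressed with `collections.Counter`. The sketch below is not the notebook's code: `example_stop_words` and `example_tokens` are hypothetical stand-ins so that it runs without spaCy, and, unlike the loop above, it lowercases the keys so that "Netflix" and "netflix" would share one count.

```python
from collections import Counter
from string import punctuation

# Hypothetical stand-ins for the notebook's objects: a tiny stop-word set
# and a list of token strings, so the sketch runs without spaCy.
example_stop_words = {"and", "the", "to", "is"}
example_tokens = ["Netflix", "and", "netflix", "animation", ",", "Animation", "studio", "."]

# A set of single characters makes the punctuation test an exact match
# rather than a substring search.
punctuation_chars = set(punctuation) | {"\n"}

# Count lowercased tokens that are neither stop words nor punctuation.
word_counts = Counter(
    token.lower()
    for token in example_tokens
    if token.lower() not in example_stop_words
    and token not in punctuation_chars
)

print(word_counts)
```

Whether to merge casings is a design choice; merging keeps the counting and the later lookup consistent as long as both use the lowercased text.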

In [12]:
# Now we can see the frequency of each word in our data, excluding stop words and punctuation
frequency

{'Netflix': 12,
 'Animal': 11,
 'Logic': 12,
 'excited': 2,
 'announce': 1,
 'today': 1,
 'plans': 1,
 'acquire': 1,
 'Australian': 1,
 'animation': 9,
 'studio': 7,
 'acquisition': 1,
 'support': 1,
 'ambitious': 1,
 'animated': 5,
 'film': 4,
 'slate': 3,
 'building': 2,
 'films': 2,
 'like': 2,
 'Academy': 6,
 'Award': 6,
 'nominated': 6,
 'Moon': 2,
 'Klaus': 2,
 'recently': 3,
 'released': 2,
 'Sea': 2,
 'Beast': 2,
 'producing': 3,
 'award': 1,
 'winning': 1,
 'design': 1,
 'visual': 2,
 'effects': 2,
 '30': 2,
 'years': 3,
 'Headquartered': 1,
 'Sydney': 2,
 'set': 1,
 'second': 1,
 'Vancouver': 2,
 'Canada': 1,
 '2015': 1,
 'worked': 1,
 'Hollywood': 1,
 'blockbusters': 1,
 'including': 4,
 'Happy': 1,
 'Feet': 1,
 'Legend': 2,
 'Guardians': 2,
 'Owls': 2,
 'Ga’Hoole': 1,
 'LEGO': 1,
 'Movies': 1,
 'Peter': 2,
 'Rabbit': 2,
 '1': 2,
 '2': 2,
 'alongside': 1,
 'catalogue': 1,
 'amazing': 1,
 'work': 4,
 'Matrix': 1,
 'Moulin': 1,
 'Rouge': 1,
 '300': 1,
 'Great': 1,
 'Gatsby': 1

In [13]:
# Checking out the highest frequency
max_frequency = max(frequency.values())
print(max_frequency)

12


In [14]:
# Normalizing the frequencies so that they all fall between 0 and 1,
# by dividing each frequency by the highest frequency
for word in frequency.keys():
    frequency[word] = frequency[word]/max_frequency
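
As a small worked example of this normalization, the sketch below divides a few raw counts by the largest one. The `raw_counts` dict is a hypothetical subset that reuses four values from the frequency output shown earlier; building a new dict with a comprehension is one alternative to updating the original in place.

```python
# Hypothetical subset of raw counts, reusing values from the notebook's output.
raw_counts = {"Netflix": 12, "animation": 9, "studio": 7, "today": 1}

highest = max(raw_counts.values())

# Build a new dict instead of modifying the one being iterated.
normalized = {word: count / highest for word, count in raw_counts.items()}

for word, value in normalized.items():
    print(f"{word}: {value:.4f}")
```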

In [15]:
frequency

{'Netflix': 1.0,
 'Animal': 0.9166666666666666,
 'Logic': 1.0,
 'excited': 0.16666666666666666,
 'announce': 0.08333333333333333,
 'today': 0.08333333333333333,
 'plans': 0.08333333333333333,
 'acquire': 0.08333333333333333,
 'Australian': 0.08333333333333333,
 'animation': 0.75,
 'studio': 0.5833333333333334,
 'acquisition': 0.08333333333333333,
 'support': 0.08333333333333333,
 'ambitious': 0.08333333333333333,
 'animated': 0.4166666666666667,
 'film': 0.3333333333333333,
 'slate': 0.25,
 'building': 0.16666666666666666,
 'films': 0.16666666666666666,
 'like': 0.16666666666666666,
 'Academy': 0.5,
 'Award': 0.5,
 'nominated': 0.5,
 'Moon': 0.16666666666666666,
 'Klaus': 0.16666666666666666,
 'recently': 0.25,
 'released': 0.16666666666666666,
 'Sea': 0.16666666666666666,
 'Beast': 0.16666666666666666,
 'producing': 0.25,
 'award': 0.08333333333333333,
 'winning': 0.08333333333333333,
 'design': 0.08333333333333333,
 'visual': 0.16666666666666666,
 'effects': 0.16666666666666666,
 '30

In [16]:
sentence_tokens = []
#iterating through sentences in our text data and making a list of them
for sent in data.sents:
    sentence_tokens.append(sent)

In [17]:
sentence_score = {}
# Iterate through the sentences in the sentence_tokens list
for sent in sentence_tokens:
    # Iterate through the words in a sentence
    for word in sent:
        # Look the word up in the frequency dict by its lowercased text.
        # The dict keys keep their original casing, so capitalized words
        # such as 'Netflix' do not match here and add nothing to the score.
        if word.text.lower() in frequency.keys():
            if sent not in sentence_score.keys():
                # First matching word: start the score at that word's frequency
                sentence_score[sent] = frequency[word.text.lower()]
            else:
                # Later matching words: add their frequency to the score
                sentence_score[sent] += frequency[word.text.lower()]
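
The scoring cell looks words up by their lowercased text, while the frequency dict was filled using the original casing. The sketch below illustrates the effect of that difference on a single sentence. Everything in it is hypothetical (`example_frequency`, `example_sentence`, and the `score_sentence` helper are not part of the notebook), and it uses plain strings instead of spaCy tokens.

```python
# Hypothetical normalized frequencies, keyed by original casing
# in the same way as the notebook's frequency dict.
example_frequency = {"Netflix": 1.0, "animation": 0.75, "studio": 0.5833}

# A hypothetical sentence, already split into token strings.
example_sentence = ["Netflix", "plans", "animation", "studio"]


def score_sentence(tokens, frequency, lowercase_lookup):
    """Sum the frequency of every token that has an entry in `frequency`."""
    total = 0.0
    for token in tokens:
        key = token.lower() if lowercase_lookup else token
        total += frequency.get(key, 0.0)
    return total


# Lowercased lookup misses the capitalized key "Netflix".
score_lowercased = score_sentence(example_sentence, example_frequency, True)
# Lookup by the original text finds it.
score_exact = score_sentence(example_sentence, example_frequency, False)

print(score_lowercased, score_exact)
```

Using the same casing rule when counting and when scoring would let proper nouns contribute to the sentence scores; doing so would change the scores and the summary shown in this notebook.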

In [18]:
# Having a look at the sentence scores
# A sentence's score is the sum of the normalized frequencies of its words that have an entry in frequency
# (stop words and punctuation have no entry, so they add nothing)
sentence_score

{Netflix and Animal Logic are excited to announce today that Netflix plans to acquire the Australian animation studio.: 1.8333333333333335,
 This acquisition will support Netflix’s ambitious animated film slate, building on films like Academy Award-nominated Over the Moon, Academy Award-nominated Klaus and the recently released The Sea Beast.: 3.3333333333333335,
 Animal Logic has been producing award-winning design, visual effects and animation for over 30 years.: 2.0,
 Headquartered in Sydney, Animal Logic set up a second studio in Vancouver, Canada in 2015 and has worked on Hollywood blockbusters including Happy Feet, Legend of the Guardians: The Owls of Ga’Hoole, The LEGO Movies and Peter Rabbit 1 & 2, alongside a catalogue of amazing visual effects work including The Matrix, Moulin Rouge!, 300, and The Great Gatsby.: 3.1666666666666665,
 The announcement builds on an already strong partnership between the two companies, with a full slate of films across Animal Logic's Sydney and V

In [19]:
sentence_score.values()

dict_values([1.8333333333333335, 3.3333333333333335, 2.0, 3.1666666666666665, 2.4166666666666665, 4.166666666666667, 3.833333333333335, 1.5833333333333333, 4.083333333333333, 2.083333333333333, 3.166666666666667, 1.833333333333333, 1.583333333333333, 2.0, 1.0833333333333333, 1.333333333333333, 0.16666666666666666, 2.7500000000000004, 4.333333333333333, 0.25, 1.833333333333333])

In [20]:
# Assigning the summary length (30% of the number of scored sentences) to a variable named summary_length
summary_length = int(len(sentence_score)*0.3)
summary_length

6
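
For very short inputs, `int(len(sentence_score)*0.3)` can evaluate to 0, which would produce an empty summary. A minimal sketch of one possible guard follows; the helper name `pick_summary_length` and its `ratio` parameter are hypothetical and not part of the notebook.

```python
def pick_summary_length(sentence_count, ratio=0.3):
    """Return how many sentences to keep, never fewer than one."""
    return max(1, int(sentence_count * ratio))


# 21 scored sentences, as in this notebook, still give 6.
print(pick_summary_length(21))
# A two-sentence text would otherwise give 0.
print(pick_summary_length(2))
```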

In [21]:
# As our summary_length is 6, the line below picks the 6 sentences with the
# highest sentence scores and returns them as a list, ordered from highest to lowest score.
summary = nlargest(summary_length,sentence_score,key=sentence_score.get)
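
`nlargest` returns its results ordered by score, not by position in the text, so the selected sentences may appear out of reading order. The sketch below shows this on hypothetical data in which the dict keys stand for sentence positions, and then sorts the selection to restore the original order.

```python
from heapq import nlargest

# Hypothetical sentence scores; the keys are sentence positions in the text.
example_scores = {0: 1.8, 1: 3.3, 2: 2.0, 3: 4.2, 4: 0.3}

# nlargest returns the keys ordered from highest to lowest score.
top_by_score = nlargest(3, example_scores, key=example_scores.get)

# Sorting the selected positions restores the original reading order.
top_in_document_order = sorted(top_by_score)

print(top_by_score)
print(top_in_document_order)
```

With spaCy sentences, sorting the selected spans by `sent.start` (the index of the span's first token) could play the same role, assuming a summary in reading order is preferred.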

In [22]:
# Now we need to convert the sentence spans in the list into plain text
summary = [sent.text for sent in summary]
summary

['Netflix’s fast-growing original slate of animated features and shorts includes Academy Award-nominated Robin, Robin,Academy Award-nominated Klaus,Kris Pearn’s The Willoughbys,Academy Award-nominated Over the Moon, Back to the Outback directed by Clare Knight and Harry Cripps, Apollo 10 ½: A Space Age Childhood from Richard Linklater, and the recently released The Sea Beast; as well as upcoming releases including Henry Selick’s Wendell & Wild, Nora Twomey’s',
 '“Netflix has been investing in animation over the past few years and this furthers our commitment to building a world-class animation studio,” said Amy Reinhard, Netflix Vice President of Studio Operations.',
 'The Animal Logic and Netflix Animation teams together will create a global creative production team and an animation studio that will produce some of Netflix’s largest animated film titles.',
 '“Animal Logic is a leading animation studio with innovative technology that will strengthen our existing business and increase o

In [23]:
# Join the selected sentences into one string; this gives our final summary
summary = ' '.join(summary)
summary

'Netflix’s fast-growing original slate of animated features and shorts includes Academy Award-nominated Robin, Robin,Academy Award-nominated Klaus,Kris Pearn’s The Willoughbys,Academy Award-nominated Over the Moon, Back to the Outback directed by Clare Knight and Harry Cripps, Apollo 10 ½: A Space Age Childhood from Richard Linklater, and the recently released The Sea Beast; as well as upcoming releases including Henry Selick’s Wendell & Wild, Nora Twomey’s “Netflix has been investing in animation over the past few years and this furthers our commitment to building a world-class animation studio,” said Amy Reinhard, Netflix Vice President of Studio Operations. The Animal Logic and Netflix Animation teams together will create a global creative production team and an animation studio that will produce some of Netflix’s largest animated film titles. “Animal Logic is a leading animation studio with innovative technology that will strengthen our existing business and increase our long-term 

**Note that this is extractive text summarization: the summary sentences are taken directly from our data without any modification, which is why the result may read as a bit lengthy.**
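
The steps in this notebook can be gathered into a single function. The following is a dependency-free sketch under stated assumptions, not the notebook's implementation: it replaces spaCy with a naive regex sentence splitter, uses a hypothetical and deliberately tiny stop-word set (`SMALL_STOP_WORDS`), lowercases words consistently, and returns the chosen sentences in their original order.

```python
import re
from collections import Counter
from heapq import nlargest

# Hypothetical, deliberately tiny stop-word set; spaCy's STOP_WORDS is far larger.
SMALL_STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "it"}

WORD_PATTERN = re.compile(r"[a-z0-9']+")


def summarize(text, ratio=0.3):
    """Return the highest-scoring sentences of `text`, in their original order."""
    # Naive splitter: break after '.', '!' or '?' when whitespace follows.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Word frequencies over lowercased words, ignoring stop words.
    words = [w for w in WORD_PATTERN.findall(text.lower()) if w not in SMALL_STOP_WORDS]
    counts = Counter(words)
    if not counts:
        return ""
    highest = max(counts.values())

    # Score each sentence by the sum of its normalized word frequencies.
    scores = {}
    for index, sentence in enumerate(sentences):
        tokens = WORD_PATTERN.findall(sentence.lower())
        scores[index] = sum(counts[t] / highest for t in tokens if t in counts)

    # Keep a share of the sentences (at least one), then restore reading order.
    keep = max(1, int(len(sentences) * ratio))
    chosen = sorted(nlargest(keep, scores, key=scores.get))
    return " ".join(sentences[i] for i in chosen)


sample = "Cats sleep a lot. Cats chase mice and cats climb trees. Dogs bark."
result = summarize(sample, ratio=0.4)
print(result)
```

A regex splitter is much cruder than spaCy's sentence segmentation (for example, it would split on abbreviations), so this sketch is meant only to show how the pieces fit together.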