# What is spaCy?
### spaCy is a free, open-source library for advanced natural language processing (NLP), written in Python and Cython. It is mainly used to build production software, and it also supports deep learning workflows through integration with frameworks such as PyTorch and TensorFlow.



# Why spaCy?
### spaCy provides fast and accurate syntactic analysis, named entity recognition, and ready access to word vectors. You can use the default word vectors or replace them with your own. spaCy also offers tokenization, sentence boundary detection, POS tagging, syntactic parsing, integrated word vectors, and alignment back to the original string, all with high accuracy.



# Text summarization can broadly be divided into two categories: extractive summarization and abstractive summarization.

   ### 1. Extractive Summarization: These methods extract several parts of a text, such as phrases and sentences, and stack them together to create a summary. Identifying the right sentences is therefore the most important step in an extractive method.
   ### 2. Abstractive Summarization: These methods use advanced NLP techniques to generate an entirely new summary. Some parts of this summary may not appear in the original text at all.
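
To make the extractive idea concrete before bringing in spaCy, here is a minimal standard-library sketch. It is not the notebook's implementation: the regex-based sentence split, the tiny `TOY_STOP_WORDS` set, and the function name `toy_extractive_summary` are all assumptions made for illustration.

```python
import re
from collections import Counter
from heapq import nlargest

# Hypothetical, deliberately tiny stop-word list for illustration only.
TOY_STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "it"}


def toy_extractive_summary(text, n_sentences=1):
    """Return the n sentences with the highest summed word frequency."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Lower-cased word counts over the whole text, ignoring stop words.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in TOY_STOP_WORDS]
    freq = Counter(words)
    # A sentence's score is the sum of the counts of its words
    # (stop words were never counted, so they contribute 0).
    scores = {
        s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower()))
        for s in sentences
    }
    return nlargest(n_sentences, sentences, key=scores.get)


sample = "Cats sleep. Cats and dogs are pets. Birds sing."
print(toy_extractive_summary(sample, n_sentences=1))
```

The selected sentence is copied verbatim from the input, which is what distinguishes an extractive method from an abstractive one.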


In [1]:
!pip install spacy

Collecting spacy
  Downloading spacy-3.3.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.3 MB)
Collecting murmurhash<1.1.0,>=0.28.0
  Downloading murmurhash-1.0.7-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (21 kB)
Collecting pathy>=0.3.5
  Downloading pathy-0.6.1-py3-none-any.whl (42 kB)
Collecting spacy-loggers<2.0.0,>=1.0.0
  Downloading spacy_loggers-1.0.2-py3-none-any.whl (7.2 kB)
Collecting spacy-legacy<3.1.0,>=3.0.9
  Downloading spacy_legacy-3.0.9-py2.py3-none-any.whl (20 kB)
Collecting typer<0.5.0,>=0.3.0
  Downloading typer-0.4.1-py3-none-any.whl (27 kB)
Collecting catalogue<2.1.0,>=2.0.6
  Downloading catalogue-2.0.7-py3-none-any.whl (17 kB)
Collecting wasabi<1.1.0,>=0.9.1
  Downloading wasabi-0.9.1-py3-none-any.whl (26 kB)
Collecting langcodes<4.0.0,>=3.2.0
  Dow

In [2]:
# Importing the required packages

import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation
import numpy as np

In [3]:
# To visualize the stop words

stopWords = list(STOP_WORDS)
print(stopWords)

['whereupon', 'fifty', 'anywhere', 'most', 'their', 'ca', "'ve", 'him', 'once', 'any', 'over', 'some', 'may', 'only', 'latter', 'almost', 'always', 'however', 'is', 'using', 'in', 'i', 'eleven', 'whose', 'next', 'therein', 'last', 'afterwards', 'he', 'somewhere', 'also', 'formerly', 'cannot', 'give', 'myself', 'call', 'further', 'fifteen', 'anyway', 'or', 'nevertheless', 'for', 'during', 'hereafter', 'to', 'be', 'thereby', 'often', 'three', 'hereby', 'empty', 'she', 'that', 're', 'with', 'have', 'hence', '’ll', 'upon', 'get', 'sometimes', 'here', 'less', 'yourselves', 'one', 'my', 'go', 'thereafter', 'alone', 'latterly', 'moreover', 'along', "'s", '‘re', '’re', 'as', 'other', 'not', 'various', 'were', 'might', 'twelve', 'when', 'others', 'noone', 'name', 'it', 'about', 'more', 'even', 'least', 'well', 'somehow', 'thence', 'besides', 'his', 'at', 'could', 'whenever', 'elsewhere', 'again', 'against', 'used', 'full', 'becomes', 'five', 'must', 'around', 'those', '‘ve', 'because', 'us', 'w

In [8]:
!spacy download en_core_web_sm

Collecting en-core-web-sm==3.3.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0-py3-none-any.whl (12.8 MB)
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.3.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [9]:
# Load the English pipeline. Several package sizes are available (small, medium, large); we load the small one.

nlp = spacy.load('en_core_web_sm')

In [10]:
# Passing the input data

doc = """Machine learning (ML) and artificial intelligence (AI) are becoming dominant problem-solving techniques in many areas of research and industry, not least because of the recent successes of deep learning (DL). However, the equation AI=ML=DL, as recently suggested in the news, blogs, and media, falls too short. These fields share the same fundamental hypotheses: computation is a useful way to model intelligent behavior in machines. What kind of computation and how to program it? This is not the right question. Computation neither rules out search, logical, and probabilistic techniques, nor (deep) (un)supervised and reinforcement learning methods, among others, as computational models do include all of them. They complement each other, and the next breakthrough lies not only in pushing each of them but also in combining them.

Big Data is no fad. The world is growing at an exponential rate and so is the size of the data collected across the globe. Data is becoming more meaningful and contextually relevant, breaking new grounds for machine learning (ML), in particular for deep learning (DL) and artificial intelligence (AI), moving them out of research labs into production (Jordan and Mitchell, 2015). The problem has shifted from collecting massive amounts of data to understanding it—turning it into knowledge, conclusions, and actions. Multiple research disciplines, from cognitive sciences to biology, finance, physics, and social sciences, as well as many companies believe that data-driven and “intelligent” solutions are necessary to solve many of their key problems. High-throughput genomic and proteomic experiments can be used to enable personalized medicine. Large data sets of search queries can be used to improve information retrieval. Historical climate data can be used to understand global warming and to better predict weather. Large amounts of sensor readings and hyperspectral images of plants can be used to identify drought conditions and to gain insights into when and how stress impacts plant growth and development and in turn how to counterattack the problem of world hunger. Game data can turn pixels into actions within video games, while observational data can help enable robots to understand complex and unstructured environments and to learn manipulation skills.

However, is AI, ML, and DL really synonymous, as recently suggested in the news, blogs, and media? For example, when AlphaGo (Silver et al., 2016) defeated South Korean Master Lee Se-dol in the board game Go in 2016, the terms AI, ML, and DL were used by the media to describe how AlphaGo won. In addition to this, even Gartner's list (Panetta, 2017) of top 10 Strategic Trends for 2018 places (narrow) AI at the very top, specifying it as “consisting of highly scoped machine-learning solutions that target a specific task."""

print(doc)

Machine learning (ML) and artificial intelligence (AI) are becoming dominant problem-solving techniques in many areas of research and industry, not least because of the recent successes of deep learning (DL). However, the equation AI=ML=DL, as recently suggested in the news, blogs, and media, falls too short. These fields share the same fundamental hypotheses: computation is a useful way to model intelligent behavior in machines. What kind of computation and how to program it? This is not the right question. Computation neither rules out search, logical, and probabilistic techniques, nor (deep) (un)supervised and reinforcement learning methods, among others, as computational models do include all of them. They complement each other, and the next breakthrough lies not only in pushing each of them but also in combining them.

Big Data is no fad. The world is growing at an exponential rate and so is the size of the data collected across the globe. Data is becoming more meaningful and cont

In [11]:
# Pass the input text to spaCy; `docs` is the processed Doc object

docs = nlp(doc)
print(docs)

Machine learning (ML) and artificial intelligence (AI) are becoming dominant problem-solving techniques in many areas of research and industry, not least because of the recent successes of deep learning (DL). However, the equation AI=ML=DL, as recently suggested in the news, blogs, and media, falls too short. These fields share the same fundamental hypotheses: computation is a useful way to model intelligent behavior in machines. What kind of computation and how to program it? This is not the right question. Computation neither rules out search, logical, and probabilistic techniques, nor (deep) (un)supervised and reinforcement learning methods, among others, as computational models do include all of them. They complement each other, and the next breakthrough lies not only in pushing each of them but also in combining them.

Big Data is no fad. The world is growing at an exponential rate and so is the size of the data collected across the globe. Data is becoming more meaningful and cont

In [12]:
# Word tokenization is performed

tokens = [i.text for i in docs]
print(tokens)

['Machine', 'learning', '(', 'ML', ')', 'and', 'artificial', 'intelligence', '(', 'AI', ')', 'are', 'becoming', 'dominant', 'problem', '-', 'solving', 'techniques', 'in', 'many', 'areas', 'of', 'research', 'and', 'industry', ',', 'not', 'least', 'because', 'of', 'the', 'recent', 'successes', 'of', 'deep', 'learning', '(', 'DL', ')', '.', 'However', ',', 'the', 'equation', 'AI', '=', 'ML', '=', 'DL', ',', 'as', 'recently', 'suggested', 'in', 'the', 'news', ',', 'blogs', ',', 'and', 'media', ',', 'falls', 'too', 'short', '.', 'These', 'fields', 'share', 'the', 'same', 'fundamental', 'hypotheses', ':', 'computation', 'is', 'a', 'useful', 'way', 'to', 'model', 'intelligent', 'behavior', 'in', 'machines', '.', 'What', 'kind', 'of', 'computation', 'and', 'how', 'to', 'program', 'it', '?', 'This', 'is', 'not', 'the', 'right', 'question', '.', 'Computation', 'neither', 'rules', 'out', 'search', ',', 'logical', ',', 'and', 'probabilistic', 'techniques', ',', 'nor', '(', 'deep', ')', '(', 'un)su

In [13]:
# Treat the newline character as punctuation as well

punctuation = punctuation + '\n'
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n'

In [14]:
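# Count how often each token occurs, skipping stop words and punctuation.
# Keys keep the token's original casing (word.text), e.g. 'ML' or 'Data'.
word_frequencies = {}

for word in docs:
    if word.text.lower() not in stopWords:
        if word.text.lower() not in punctuation:
            if word.text not in word_frequencies:
                word_frequencies[word.text] = 1
            else:
                word_frequencies[word.text] += 1

print(word_frequencies)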

{'Machine': 1, 'learning': 6, 'ML': 5, 'artificial': 2, 'intelligence': 2, 'AI': 6, 'dominant': 1, 'problem': 3, 'solving': 1, 'techniques': 2, 'areas': 1, 'research': 3, 'industry': 1, 'recent': 1, 'successes': 1, 'deep': 3, 'DL': 5, 'equation': 1, 'recently': 2, 'suggested': 2, 'news': 2, 'blogs': 2, 'media': 3, 'falls': 1, 'short': 1, 'fields': 1, 'share': 1, 'fundamental': 1, 'hypotheses': 1, 'computation': 2, 'useful': 1, 'way': 1, 'model': 1, 'intelligent': 2, 'behavior': 1, 'machines': 1, 'kind': 1, 'program': 1, 'right': 1, 'question': 1, 'Computation': 1, 'rules': 1, 'search': 2, 'logical': 1, 'probabilistic': 1, 'un)supervised': 1, 'reinforcement': 1, 'methods': 1, 'computational': 1, 'models': 1, 'include': 1, 'complement': 1, 'breakthrough': 1, 'lies': 1, 'pushing': 1, 'combining': 1, '\n\n': 2, 'Big': 1, 'Data': 2, 'fad': 1, 'world': 2, 'growing': 1, 'exponential': 1, 'rate': 1, 'size': 1, 'data': 7, 'collected': 1, 'globe': 1, 'meaningful': 1, 'contextually': 1, 'relevant

In [15]:
# Taking max frequency for normalization

maxFrequency = max(word_frequencies.values())
maxFrequency

7

In [16]:
# Normalizing the data

for i in word_frequencies.keys():
    word_frequencies[i] = word_frequencies[i]/maxFrequency
       

In [17]:
print(word_frequencies)

{'Machine': 0.14285714285714285, 'learning': 0.8571428571428571, 'ML': 0.7142857142857143, 'artificial': 0.2857142857142857, 'intelligence': 0.2857142857142857, 'AI': 0.8571428571428571, 'dominant': 0.14285714285714285, 'problem': 0.42857142857142855, 'solving': 0.14285714285714285, 'techniques': 0.2857142857142857, 'areas': 0.14285714285714285, 'research': 0.42857142857142855, 'industry': 0.14285714285714285, 'recent': 0.14285714285714285, 'successes': 0.14285714285714285, 'deep': 0.42857142857142855, 'DL': 0.7142857142857143, 'equation': 0.14285714285714285, 'recently': 0.2857142857142857, 'suggested': 0.2857142857142857, 'news': 0.2857142857142857, 'blogs': 0.2857142857142857, 'media': 0.42857142857142855, 'falls': 0.14285714285714285, 'short': 0.14285714285714285, 'fields': 0.14285714285714285, 'share': 0.14285714285714285, 'fundamental': 0.14285714285714285, 'hypotheses': 0.14285714285714285, 'computation': 0.2857142857142857, 'useful': 0.14285714285714285, 'way': 0.14285714285714
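
As a small worked example of the normalization step, the sketch below divides a few of the counts shown earlier by the maximum count (7). Only four entries are copied over, and the names `counts`, `max_count`, and `normalized` are illustrative assumptions rather than the notebook's variables.

```python
# A few counts copied from the word-frequency output above.
counts = {"data": 7, "learning": 6, "ML": 5, "Machine": 1}

# The most frequent word becomes 1.0; every other word is scaled relative to it.
max_count = max(counts.values())
normalized = {word: count / max_count for word, count in counts.items()}

print(normalized)
```

Building a new dictionary instead of overwriting the counts in place means that re-running the cell does not divide the values a second time.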

In [18]:
# Sentence tokenization: collect the sentence spans detected by spaCy
sent_tokenz = list(docs.sents)
print(sent_tokenz) 

[Machine learning (ML) and artificial intelligence (AI) are becoming dominant problem-solving techniques in many areas of research and industry, not least because of the recent successes of deep learning (DL)., However, the equation AI=ML=DL, as recently suggested in the news, blogs, and media, falls too short., These fields share the same fundamental hypotheses: computation is a useful way to model intelligent behavior in machines., What kind of computation and how to program it?, This is not the right question., Computation neither rules out search, logical, and probabilistic techniques, nor (deep) (un)supervised and reinforcement learning methods, among others, as computational models do include all of them., They complement each other, and the next breakthrough lies not only in pushing each of them but also in combining them., 

Big Data is no fad., The world is growing at an exponential rate and so is the size of the data collected across the globe., Data is becoming more meaningf

In [19]:
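# Score each sentence as the sum of the normalized frequencies of its words.
sentence_score = {}

for sent in sent_tokenz:
    for word in sent:
        # The lookup uses the lower-cased token, so frequency keys stored
        # with capital letters (e.g. 'ML', 'AI') are not matched here.
        if word.text.lower() in word_frequencies:
            if sent not in sentence_score:
                sentence_score[sent] = word_frequencies[word.text.lower()]
            else:
                sentence_score[sent] += word_frequencies[word.text.lower()]

sentence_score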
    

{Machine learning (ML) and artificial intelligence (AI) are becoming dominant problem-solving techniques in many areas of research and industry, not least because of the recent successes of deep learning (DL).: 4.999999999999998,
 However, the equation AI=ML=DL, as recently suggested in the news, blogs, and media, falls too short.: 1.9999999999999998,
 These fields share the same fundamental hypotheses: computation is a useful way to model intelligent behavior in machines.: 1.8571428571428568,
 What kind of computation and how to program it?: 0.5714285714285714,
 This is not the right question.: 0.2857142857142857,
 Computation neither rules out search, logical, and probabilistic techniques, nor (deep) (un)supervised and reinforcement learning methods, among others, as computational models do include all of them.: 3.428571428571428,
 They complement each other, and the next breakthrough lies not only in pushing each of them but also in combining them.: 0.7142857142857142,
 
 
 Big Data
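
In the cells above, the frequency dictionary is keyed by `word.text`, while the scoring loop looks up `word.text.lower()`. Capitalized keys such as 'ML', 'AI', and 'DL' therefore never contribute to a sentence score. A possible variant, sketched here on hypothetical pre-tokenized sentences rather than spaCy tokens, lower-cases on both sides so that counting and scoring agree. The toy sentences and stop-word set are assumptions.

```python
from collections import Counter

# Hypothetical pre-tokenized sentences standing in for spaCy tokens.
sentences = [
    ["AI", "and", "ML", "overlap"],
    ["ML", "needs", "data"],
    ["Data", "is", "growing"],
]
stop_words = {"and", "is"}  # toy stop-word set (assumption)

# Count on the lower-cased form so "Data" and "data" share one key.
counts = Counter(
    tok.lower()
    for sent in sentences
    for tok in sent
    if tok.lower() not in stop_words
)
max_count = max(counts.values())

# Score each sentence with the same lower-cased lookup used for counting.
# Stop words were never counted, so they add 0 to the score.
scores = [
    sum(counts[tok.lower()] / max_count for tok in sent)
    for sent in sentences
]
print(scores)
```

With this change, acronyms count toward the score of every sentence that mentions them, so the ranking can differ from the one shown in the notebook output.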

In [20]:
from heapq import nlargest

In [21]:
# Keep roughly 30% of the sentences in the summary
select_len = int(len(sent_tokenz)*0.3)
select_len

6

In [22]:
# Pick the highest-scoring sentences (returned in descending score order)
summary = nlargest(select_len,sentence_score,sentence_score.get)
summary

[Data is becoming more meaningful and contextually relevant, breaking new grounds for machine learning (ML), in particular for deep learning (DL) and artificial intelligence (AI), moving them out of research labs into production (Jordan and Mitchell, 2015).,
 Game data can turn pixels into actions within video games, while observational data can help enable robots to understand complex and unstructured environments and to learn manipulation skills.,
 Machine learning (ML) and artificial intelligence (AI) are becoming dominant problem-solving techniques in many areas of research and industry, not least because of the recent successes of deep learning (DL).,
 Multiple research disciplines, from cognitive sciences to biology, finance, physics, and social sciences, as well as many companies believe that data-driven and “intelligent” solutions are necessary to solve many of their key problems.,
 In addition to this, even Gartner's list (Panetta, 2017) of top 10 Strategic Trends for 2018 pla
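
`nlargest` returns its results from highest to lowest score, so the selected sentences appear in score order rather than in the order they occur in the text. If reading order matters, one option is to select sentence indices and sort them before joining. The sketch below uses made-up sentences and scores to show the idea.

```python
from heapq import nlargest

# Hypothetical sentences and scores, for illustration only.
sentences = ["First point.", "Minor aside.", "Key finding.", "Closing note."]
scores = [2.0, 0.5, 3.0, 1.0]

# Choose the indices of the two best-scoring sentences...
top_indices = nlargest(2, range(len(sentences)), key=scores.__getitem__)

# ...then restore document order before joining them into a summary.
ordered_summary = " ".join(sentences[i] for i in sorted(top_indices))
print(ordered_summary)
```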

In [23]:
# Convert each selected sentence span to plain text
summary = [sent.text for sent in summary]
summary

['Data is becoming more meaningful and contextually relevant, breaking new grounds for machine learning (ML), in particular for deep learning (DL) and artificial intelligence (AI), moving them out of research labs into production (Jordan and Mitchell, 2015).',
 'Game data can turn pixels into actions within video games, while observational data can help enable robots to understand complex and unstructured environments and to learn manipulation skills.',
 'Machine learning (ML) and artificial intelligence (AI) are becoming dominant problem-solving techniques in many areas of research and industry, not least because of the recent successes of deep learning (DL).',
 'Multiple research disciplines, from cognitive sciences to biology, finance, physics, and social sciences, as well as many companies believe that data-driven and “intelligent” solutions are necessary to solve many of their key problems.',
 "In addition to this, even Gartner's list (Panetta, 2017) of top 10 Strategic Trends for

In [24]:
summary = " ".join(summary)
summary

"Data is becoming more meaningful and contextually relevant, breaking new grounds for machine learning (ML), in particular for deep learning (DL) and artificial intelligence (AI), moving them out of research labs into production (Jordan and Mitchell, 2015). Game data can turn pixels into actions within video games, while observational data can help enable robots to understand complex and unstructured environments and to learn manipulation skills. Machine learning (ML) and artificial intelligence (AI) are becoming dominant problem-solving techniques in many areas of research and industry, not least because of the recent successes of deep learning (DL). Multiple research disciplines, from cognitive sciences to biology, finance, physics, and social sciences, as well as many companies believe that data-driven and “intelligent” solutions are necessary to solve many of their key problems. In addition to this, even Gartner's list (Panetta, 2017) of top 10 Strategic Trends for 2018 places (nar

In [26]:
# Character counts of the original text and of the summary
len(doc),len(summary)

(2834, 1381)
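
The two lengths can be turned into a compression ratio. The snippet below uses the character counts reported above; the variable names are illustrative.

```python
# Character counts reported in the output above.
doc_chars, summary_chars = 2834, 1381

# Share of the original text (in characters) that the summary keeps.
ratio = summary_chars / doc_chars
print(f"{ratio:.1%}")
```

Although only about 30% of the sentences were selected, the summary keeps close to half of the characters. A likely reason is that a score built by summing word frequencies tends to favor longer sentences.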