# **NLP Project**
(Submitted by: Harshita Bhambhani)

## **Text Summarization using NLP**

In [10]:
# Import the spaCy library for natural language processing
import spacy

# Load the input text file (UTF-8 encoding assumed)
with open("text for nlp.txt", "r", encoding="utf-8") as file:
    # Read the contents of the file and store it in a variable
    text = file.read()

print(text)


"""Natural Language Processing (NLP) stands at the forefront of artificial intelligence, revolutionizing the interaction between machines and human language. Rooted in the convergence of computer science, linguistics, and cognitive psychology, NLP equips computers with the ability to comprehend, interpret, and respond to natural language in a contextually aware and semantically accurate manner. Its core components, including tokenization, part-of-speech tagging, named entity recognition, parsing, and sentiment analysis, form the building blocks for various applications. From the conversational capabilities of chatbots and virtual assistants to language translation, information extraction, text summarization, and speech recognition, NLP permeates diverse sectors, enhancing user experiences and extracting valuable insights from unstructured data. Recent strides, exemplified by models like OpenAI GPT-3, showcase the prowess of deep learning and pre-training techniques, albeit accompanied 

## **Input text for summarization**

In [11]:
text



'"""Natural Language Processing (NLP) stands at the forefront of artificial intelligence, revolutionizing the interaction between machines and human language. Rooted in the convergence of computer science, linguistics, and cognitive psychology, NLP equips computers with the ability to comprehend, interpret, and respond to natural language in a contextually aware and semantically accurate manner. Its core components, including tokenization, part-of-speech tagging, named entity recognition, parsing, and sentiment analysis, form the building blocks for various applications. From the conversational capabilities of chatbots and virtual assistants to language translation, information extraction, text summarization, and speech recognition, NLP permeates diverse sectors, enhancing user experiences and extracting valuable insights from unstructured data. Recent strides, exemplified by models like OpenAI GPT-3, showcase the prowess of deep learning and pre-training techniques, albeit accompanied
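The output above shows that the input file begins with a literal `"""` delimiter, which spaCy later splits into three `"` tokens attached to the first sentence. Below is a minimal sketch of removing such stray quotes before calling `nlp(text)`. It assumes the delimiters only appear at the very start or end of the file. `clean_input` is a hypothetical helper, not part of spaCy.

```python
def clean_input(raw: str) -> str:
    """Remove surrounding whitespace and stray double-quote delimiters (hypothetical helper)."""
    # strip() drops leading/trailing whitespace; strip('"') then drops any
    # run of double quotes at either end, such as a leftover triple-quote delimiter
    return raw.strip().strip('"').strip()

sample = '"""Natural Language Processing (NLP) stands at the forefront."""\n'
print(clean_input(sample))  # Natural Language Processing (NLP) stands at the forefront.
```

In the notebook this would become `text = clean_input(file.read())`.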

In [12]:
# Load stop words from spaCy and common punctuation symbols
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

# Create a list of stop words
stopwords = list(STOP_WORDS)
stopwords


['below',
 'whereupon',
 'any',
 'does',
 'i',
 'side',
 'seeming',
 'regarding',
 'becoming',
 'anywhere',
 'without',
 'off',
 'might',
 "'s",
 're',
 'indeed',
 'since',
 'thru',
 'were',
 'wherever',
 'ourselves',
 '‘ve',
 'that',
 'but',
 'always',
 'beforehand',
 'between',
 'beyond',
 'a',
 'show',
 'top',
 'sometime',
 'becomes',
 'whether',
 'well',
 'itself',
 'then',
 'yourself',
 'really',
 "'re",
 'against',
 'fifteen',
 'the',
 'are',
 '’m',
 'therefore',
 'n’t',
 'used',
 'can',
 'seems',
 'please',
 'through',
 'and',
 'an',
 'sometimes',
 'he',
 'as',
 'herself',
 'else',
 'do',
 'part',
 'enough',
 '’ve',
 'yourselves',
 '‘re',
 'some',
 'will',
 'six',
 'nothing',
 'eight',
 'three',
 'how',
 'last',
 'various',
 'whenever',
 'either',
 'together',
 'call',
 'which',
 'nowhere',
 'your',
 'meanwhile',
 'never',
 'though',
 'is',
 'doing',
 "'ll",
 'front',
 'anyhow',
 'onto',
 'up',
 'everything',
 '‘m',
 'thereupon',
 'to',
 'otherwise',
 'whole',
 'no',
 'least',
 

In [13]:
# Count the total number of stopwords
len(stopwords)

326

In [14]:
# Load spaCy's English language model
nlp = spacy.load('en_core_web_sm')


In [15]:
# Print the punctuation symbols
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
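`string.punctuation` is a plain string. As a result, the later test `word.text.lower() not in punctuation` is a substring check rather than a per-character lookup, and it only covers ASCII symbols. The sketch below contrasts the two behaviours. Converting the string to a `set` is one way to get an exact single-character lookup. spaCy's `Token.is_punct` attribute is another option, not shown here.

```python
from string import punctuation

# A set holds the individual punctuation characters, so membership is exact
punct_set = set(punctuation)

for tok in ["(", "()", "--", "’", "NLP"]:
    # Substring test against the string vs. exact lookup in the set
    print(repr(tok), tok in punctuation, tok in punct_set)
```

Curly quotes such as `’` are not in `string.punctuation`, so they would pass the notebook's punctuation filter either way.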

In [16]:
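# Run spaCy's pipeline on the input text: tokenization, tagging, parsing (which also sets sentence boundaries), and named entity recognition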
doc = nlp(text)


In [17]:
# Extract individual tokens from the document
tokens = [token.text for token in doc]
print(tokens)


['"', '"', '"', 'Natural', 'Language', 'Processing', '(', 'NLP', ')', 'stands', 'at', 'the', 'forefront', 'of', 'artificial', 'intelligence', ',', 'revolutionizing', 'the', 'interaction', 'between', 'machines', 'and', 'human', 'language', '.', 'Rooted', 'in', 'the', 'convergence', 'of', 'computer', 'science', ',', 'linguistics', ',', 'and', 'cognitive', 'psychology', ',', 'NLP', 'equips', 'computers', 'with', 'the', 'ability', 'to', 'comprehend', ',', 'interpret', ',', 'and', 'respond', 'to', 'natural', 'language', 'in', 'a', 'contextually', 'aware', 'and', 'semantically', 'accurate', 'manner', '.', 'Its', 'core', 'components', ',', 'including', 'tokenization', ',', 'part', '-', 'of', '-', 'speech', 'tagging', ',', 'named', 'entity', 'recognition', ',', 'parsing', ',', 'and', 'sentiment', 'analysis', ',', 'form', 'the', 'building', 'blocks', 'for', 'various', 'applications', '.', 'From', 'the', 'conversational', 'capabilities', 'of', 'chatbots', 'and', 'virtual', 'assistants', 'to', 'l

In [18]:
tokens

['"',
 '"',
 '"',
 'Natural',
 'Language',
 'Processing',
 '(',
 'NLP',
 ')',
 'stands',
 'at',
 'the',
 'forefront',
 'of',
 'artificial',
 'intelligence',
 ',',
 'revolutionizing',
 'the',
 'interaction',
 'between',
 'machines',
 'and',
 'human',
 'language',
 '.',
 'Rooted',
 'in',
 'the',
 'convergence',
 'of',
 'computer',
 'science',
 ',',
 'linguistics',
 ',',
 'and',
 'cognitive',
 'psychology',
 ',',
 'NLP',
 'equips',
 'computers',
 'with',
 'the',
 'ability',
 'to',
 'comprehend',
 ',',
 'interpret',
 ',',
 'and',
 'respond',
 'to',
 'natural',
 'language',
 'in',
 'a',
 'contextually',
 'aware',
 'and',
 'semantically',
 'accurate',
 'manner',
 '.',
 'Its',
 'core',
 'components',
 ',',
 'including',
 'tokenization',
 ',',
 'part',
 '-',
 'of',
 '-',
 'speech',
 'tagging',
 ',',
 'named',
 'entity',
 'recognition',
 ',',
 'parsing',
 ',',
 'and',
 'sentiment',
 'analysis',
 ',',
 'form',
 'the',
 'building',
 'blocks',
 'for',
 'various',
 'applications',
 '.',
 'From',
 '

In [19]:
# Count the total number of tokens in the document
len(tokens)

458

In [20]:
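# Calculate word frequencies in the document, excluding stop words and punctuation.
# The filters compare the lowercased token, but the keys keep the original casing,
# so 'Language' and 'language' are counted as separate entries.
word_frequencies = {}
for word in doc:
    if word.text.lower() not in stopwords:
        # Note: `in punctuation` is a substring test on the string of ASCII symbols
        if word.text.lower() not in punctuation:
            # Start unseen words at 0, then increment
            word_frequencies[word.text] = word_frequencies.get(word.text, 0) + 1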



In [21]:
# Display word frequencies
word_frequencies

{'Natural': 2,
 'Language': 2,
 'Processing': 2,
 'NLP': 7,
 'stands': 2,
 'forefront': 1,
 'artificial': 1,
 'intelligence': 1,
 'revolutionizing': 1,
 'interaction': 2,
 'machines': 2,
 'human': 2,
 'language': 4,
 'Rooted': 1,
 'convergence': 1,
 'computer': 2,
 'science': 1,
 'linguistics': 1,
 'cognitive': 1,
 'psychology': 1,
 'equips': 1,
 'computers': 1,
 'ability': 1,
 'comprehend': 1,
 'interpret': 1,
 'respond': 1,
 'natural': 1,
 'contextually': 1,
 'aware': 1,
 'semantically': 1,
 'accurate': 2,
 'manner': 1,
 'core': 1,
 'components': 1,
 'including': 2,
 'tokenization': 2,
 'speech': 3,
 'tagging': 2,
 'named': 2,
 'entity': 2,
 'recognition': 3,
 'parsing': 3,
 'sentiment': 1,
 'analysis': 1,
 'form': 1,
 'building': 1,
 'blocks': 1,
 'applications': 3,
 'conversational': 1,
 'capabilities': 3,
 'chatbots': 1,
 'virtual': 1,
 'assistants': 1,
 'translation': 1,
 'information': 1,
 'extraction': 1,
 'text': 7,
 'summarization': 9,
 'permeates': 1,
 'diverse': 1,
 'sector

In [22]:
# Count the total number of unique words in the document
len(word_frequencies)

188
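The 188 unique entries include case variants such as `'Natural'`/`'natural'` and `'Language'`/`'language'`. This happens because the counting cell stores each token with its original casing. Below is a minimal sketch of case-insensitive counting with `collections.Counter`, assuming the tokens are already available as plain strings. The short `STOP` set is a hypothetical stand-in for spaCy's `STOP_WORDS`. Skipping whitespace-only tokens is an addition that the notebook does not make.

```python
from collections import Counter
from string import punctuation

# Hypothetical stand-in for spacy.lang.en.stop_words.STOP_WORDS
STOP = {"the", "of", "and", "to", "in", "at"}

def word_frequencies(tokens: list[str]) -> Counter:
    """Count lowercased tokens, skipping stop words, punctuation and whitespace."""
    punct = set(punctuation)
    return Counter(
        t.lower()
        for t in tokens
        if t.lower() not in STOP and t not in punct and not t.isspace()
    )

tokens = ["Natural", "Language", "Processing", "(", "NLP", ")", "stands",
          "at", "the", "forefront", "of", "human", "language", "."]
freqs = word_frequencies(tokens)
print(freqs["language"])  # 2: 'Language' and 'language' are merged
```

With spaCy you would pass `[token.text for token in doc]`. Because every key is lowercase, a later lookup with `word.text.lower()` then finds every counted word.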

In [23]:
# Find the maximum word frequency; the next cell divides every count by it to normalize
max_frequency = max(word_frequencies.values())
max_frequency

9

In [24]:
# Update word frequencies to represent the normalized values
for word in word_frequencies.keys():
    word_frequencies[word] = word_frequencies[word]/max_frequency

# Display the normalized word frequencies
word_frequencies


{'Natural': 0.2222222222222222,
 'Language': 0.2222222222222222,
 'Processing': 0.2222222222222222,
 'NLP': 0.7777777777777778,
 'stands': 0.2222222222222222,
 'forefront': 0.1111111111111111,
 'artificial': 0.1111111111111111,
 'intelligence': 0.1111111111111111,
 'revolutionizing': 0.1111111111111111,
 'interaction': 0.2222222222222222,
 'machines': 0.2222222222222222,
 'human': 0.2222222222222222,
 'language': 0.4444444444444444,
 'Rooted': 0.1111111111111111,
 'convergence': 0.1111111111111111,
 'computer': 0.2222222222222222,
 'science': 0.1111111111111111,
 'linguistics': 0.1111111111111111,
 'cognitive': 0.1111111111111111,
 'psychology': 0.1111111111111111,
 'equips': 0.1111111111111111,
 'computers': 0.1111111111111111,
 'ability': 0.1111111111111111,
 'comprehend': 0.1111111111111111,
 'interpret': 0.1111111111111111,
 'respond': 0.1111111111111111,
 'natural': 0.1111111111111111,
 'contextually': 0.1111111111111111,
 'aware': 0.1111111111111111,
 'semantically': 0.1111111111
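Normalization divides each count by the largest count. The most frequent word (`'summarization'`, 9 occurrences) therefore maps to 1.0, and every other word falls in the range (0, 1]. The arithmetic, using counts from the outputs above:

```python
# Counts taken from the word_frequencies output above
counts = {"summarization": 9, "NLP": 7, "language": 4, "Natural": 2}

max_frequency = max(counts.values())
# Divide each count by the maximum so the top word scores exactly 1.0
normalized = {word: c / max_frequency for word, c in counts.items()}

print(normalized["NLP"])       # 0.7777777777777778
print(normalized["language"])  # 0.4444444444444444
```

A dict comprehension builds a new dictionary instead of overwriting the raw counts in place, so the original counts stay available for inspection.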

In [25]:
# Split the document into individual sentences
sentence_tokens = [sent for sent in doc.sents]
sentence_tokens



["""Natural Language Processing (NLP) stands at the forefront of artificial intelligence, revolutionizing the interaction between machines and human language.,
 Rooted in the convergence of computer science, linguistics, and cognitive psychology, NLP equips computers with the ability to comprehend, interpret, and respond to natural language in a contextually aware and semantically accurate manner.,
 Its core components, including tokenization, part-of-speech tagging, named entity recognition, parsing, and sentiment analysis, form the building blocks for various applications.,
 From the conversational capabilities of chatbots and virtual assistants to language translation, information extraction, text summarization, and speech recognition, NLP permeates diverse sectors, enhancing user experiences and extracting valuable insights from unstructured data.,
 Recent strides, exemplified by models like OpenAI GPT-3, showcase the prowess of deep learning and pre-training techniques, albeit acc

In [26]:
# Count the total number of sentences in the document
len(sentence_tokens)

17

In [27]:
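# Calculate a score for each sentence by summing the normalized frequencies of its words.
# Lookups lowercase each token, while word_frequencies keys keep their original casing:
# 'Language' is scored with the value of 'language', and tokens whose lowercase form
# is not a key (words that only ever appear capitalized) add nothing to the score.
sentence_scores = {}
for sent in sentence_tokens:
    for word in sent:
        key = word.text.lower()
        if key in word_frequencies:
            # Start each sentence at 0, then add the word's normalized frequency
            sentence_scores[sent] = sentence_scores.get(sent, 0) + word_frequencies[key]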

# Display the sentence scores
sentence_scores


{"""Natural Language Processing (NLP) stands at the forefront of artificial intelligence, revolutionizing the interaction between machines and human language.: 2.4444444444444446,
 Rooted in the convergence of computer science, linguistics, and cognitive psychology, NLP equips computers with the ability to comprehend, interpret, and respond to natural language in a contextually aware and semantically accurate manner.: 2.6666666666666674,
 Its core components, including tokenization, part-of-speech tagging, named entity recognition, parsing, and sentiment analysis, form the building blocks for various applications.: 3.222222222222223,
 From the conversational capabilities of chatbots and virtual assistants to language translation, information extraction, text summarization, and speech recognition, NLP permeates diverse sectors, enhancing user experiences and extracting valuable insights from unstructured data.: 5.444444444444441,
 Recent strides, exemplified by models like OpenAI GPT-3,
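Each sentence's score is the sum of the normalized frequencies of its words, so long sentences packed with frequent terms score highest. Below is a minimal, spaCy-free sketch of the same sum. It assumes sentences arrive as lists of token strings and that frequencies are keyed in lowercase. The frequency values are hypothetical, not taken from the notebook.

```python
def score_sentences(sentences: list[list[str]], freqs: dict[str, float]) -> list[float]:
    """Sum the normalized frequency of every token in each sentence."""
    # Tokens are lowercased before lookup, so freqs must use lowercase keys;
    # unknown tokens (stop words, punctuation) contribute 0.0
    return [sum(freqs.get(tok.lower(), 0.0) for tok in sent) for sent in sentences]

# Illustrative lowercase frequencies (hypothetical values)
freqs = {"nlp": 1.0, "language": 0.5, "human": 0.25}
sentences = [
    ["NLP", "models", "human", "language", "."],
    ["Language", "is", "hard", "."],
]
print(score_sentences(sentences, freqs))  # [1.75, 0.5]
```

Keeping the frequency keys and the lookups in the same (lowercase) form avoids the case mismatch in the notebook's scoring cell. A common variant divides each score by the sentence's token count so that long sentences are not favoured. That is a different weighting from the one used here.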

In [28]:
# Import nlargest function from heapq module
from heapq import nlargest

# Determine the length of the summary (30% of the total number of sentences)
select_length = int(len(sentence_tokens) * 0.3)

# Display the selected length for the summary
select_length

5
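`int()` truncates toward zero, so 17 * 0.3 = 5.1 gives 5. The same rule would produce an empty summary for any text with three or fewer sentences. The sketch below keeps the same 30% ratio while always selecting at least one sentence. The `ratio` parameter and the lower bound of 1 are assumptions, not part of the notebook.

```python
def summary_length(n_sentences: int, ratio: float = 0.3) -> int:
    """Number of sentences to keep: ratio of the total, truncated, but at least 1."""
    if n_sentences == 0:
        return 0  # nothing to summarize
    return max(1, int(n_sentences * ratio))

print(summary_length(17))  # 5, same as the notebook
print(summary_length(3))   # 1 instead of int(0.9) == 0
```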

In [29]:
# Select the select_length highest-scoring sentences (nlargest returns them in descending score order, not document order)
summary = nlargest(select_length, sentence_scores, key=sentence_scores.get)

# Display the summary
summary

[From the conversational capabilities of chatbots and virtual assistants to language translation, information extraction, text summarization, and speech recognition, NLP permeates diverse sectors, enhancing user experiences and extracting valuable insights from unstructured data.,
 In the realm of Natural Language Processing (NLP), the spaCy library stands as a powerful tool, offering robust support for various text processing tasks, including text summarization.,
 Leveraging pre-trained models, spaCy facilitates tokenization, part-of-speech tagging, named entity recognition, and syntactic parsing, forming a solid foundation for summarization tasks.,
 However, the library remains a valuable asset in the preprocessing stages of text summarization pipelines, contributing to the overall effectiveness of NLP applications.,
 While spaCy is renowned for its efficiency and ease of use, it essential to note that abstractive summarization, which involves generating new sentences, often requires
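`nlargest` returns the selected sentences in descending score order. This is why the summary above opens with the fourth sentence of the text rather than the first. If the summary should read in the original order, one option is to pick the top indices and then sort them. Below is a minimal sketch with plain strings and hypothetical scores:

```python
from heapq import nlargest

def top_sentences_in_order(sentences: list[str], scores: list[float], k: int) -> list[str]:
    """Select the k highest-scoring sentences, then restore their original order."""
    # Pick indices (not sentences) so each sentence's position is preserved
    top_idx = nlargest(k, range(len(sentences)), key=lambda i: scores[i])
    # Sorting the indices puts the chosen sentences back in document order
    return [sentences[i] for i in sorted(top_idx)]

sentences = ["First.", "Second.", "Third.", "Fourth."]
scores = [2.4, 1.0, 3.2, 5.4]  # hypothetical scores
print(top_sentences_in_order(sentences, scores, 2))  # ['Third.', 'Fourth.']
```

With the notebook's spaCy spans, the same idea can be applied directly as `sorted(summary, key=lambda span: span.start)`, because each `Span` exposes the index of its first token as `.start`.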

In [30]:
# Extract the plain text of each selected sentence (each item in summary is a spaCy Span)
final_summary = [sent.text for sent in summary]

# Display the final summary
final_summary

['From the conversational capabilities of chatbots and virtual assistants to language translation, information extraction, text summarization, and speech recognition, NLP permeates diverse sectors, enhancing user experiences and extracting valuable insights from unstructured data.',
 'In the realm of Natural Language Processing (NLP), the spaCy library stands as a powerful tool, offering robust support for various text processing tasks, including text summarization.',
 'Leveraging pre-trained models, spaCy facilitates tokenization, part-of-speech tagging, named entity recognition, and syntactic parsing, forming a solid foundation for summarization tasks.',
 'However, the library remains a valuable asset in the preprocessing stages of text summarization pipelines, contributing to the overall effectiveness of NLP applications.',
 'While spaCy is renowned for its efficiency and ease of use, it essential to note that abstractive summarization, which involves generating new sentences, often

In [31]:
file_name = "summary.txt"

# Open the file in write mode (UTF-8 so non-ASCII characters are written safely)
with open(file_name, "w", encoding="utf-8") as file:
    # Write each summary sentence on its own line
    for item in final_summary:
        file.write(f"{item}\n")

print(f"Summary saved to {file_name}")


Summary saved to summary.txt

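`pathlib.Path.write_text` is an alternative to an explicit `open`/`write` loop: it writes the joined text in a single call. Below is a minimal sketch with an explicit UTF-8 encoding. It avoids errors on systems whose default encoding cannot represent every character in the text. `save_summary` is a hypothetical helper.

```python
from pathlib import Path

def save_summary(sentences: list[str], path: str = "summary.txt") -> Path:
    """Write one sentence per line to path using UTF-8 and return the Path."""
    out = Path(path)
    # Join with newlines and end the file with a trailing newline
    out.write_text("\n".join(sentences) + "\n", encoding="utf-8")
    return out

saved = save_summary(["First sentence.", "Second sentence."])
print(f"Summary saved to {saved}")
```

In the notebook this would be called as `save_summary(final_summary)`.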