## ASSIGNMENT 5
## TEXT SUMMARIZATION

Text summarization condenses a long text into a shorter version that keeps its key points. This assignment uses an extractive approach: each sentence is scored by running PageRank over a sentence-similarity graph, and the top-scoring sentences are returned as the summary.

### IMPORTING LIBRARIES

In [109]:
# Essential libraries for text processing and graph-based algorithms.

from nltk.corpus import stopwords                     # You can remove stop words for speed
from nltk.cluster.util import cosine_distance
import numpy as np
import networkx as nx
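
The comment on the `stopwords` import mentions removing stop words, but none of the cells below actually use it. As a rough sketch of what that filtering could look like, the example below uses a small hand-written stop list (an assumption made here so the snippet runs without downloading a corpus); `stopwords.words("english")` from NLTK would be the natural replacement.

```python
# Hypothetical, hand-picked stop list for illustration only.
# NLTK's stopwords.words("english") could be used instead.
STOP_WORDS = {"it", "was", "the", "of", "is", "what", "this"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens whose lowercase form is not in stop_words."""
    return [t for t in tokens if t.lower() not in stop_words]

filtered = remove_stop_words(["It", "was", "the", "best", "of", "times"])
print(filtered)  # ['best', 'times']
```

Filtering like this would shrink the word vectors and change the similarity values, so the outputs recorded in this notebook would no longer match.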

### READING AND PROCESSING TEXT

The code snippet reads text from a file, divides it into sentences, and tokenizes them for subsequent analysis, preparing the data for text processing tasks.

In [110]:
# Reads text from file, splits into sentences, and tokenizes them.

# Text1.txt contains one paragraph of multiple sentences.
with open("Text1.txt", "r") as file:                  # the context manager closes the file
    filedata = file.readlines()
article = filedata[0].split(". ")                     # Just do the first paragraph

sentences = []
for sentence in article:
    print(sentence)
    # str.replace() matches "[^a-zA-Z]" literally, not as a regex, so punctuation is left in place.
    sentences.append(sentence.replace("[^a-zA-Z]", " ").split(" "))

It was the best of times
It was the worst of times
It was the age of wisdom
It was the age of foolishness
What is the importance of age
This is the best example.
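
The call `sentence.replace("[^a-zA-Z]", " ")` in the cell above uses `str.replace`, which searches for that exact string rather than treating it as a pattern. That is why tokens such as `'example.'` keep their punctuation in later output. If the intent was to strip non-letter characters, a regular-expression substitution is one option. The helper name `clean_and_tokenize` below is made up for this sketch.

```python
import re

def clean_and_tokenize(sentence):
    """Replace every character that is not an ASCII letter with a space, then split on whitespace."""
    return re.sub(r"[^a-zA-Z]", " ", sentence).split()

tokens = clean_and_tokenize("This is the best example.")
print(tokens)  # ['This', 'is', 'the', 'best', 'example']
```

Note that this also splits hyphenated words and contractions (`SAM-I-AM!` becomes `SAM`, `I`, `AM`), so it would change the similarity scores recorded in this notebook.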


### PRINTING SENTENCES

In [111]:
# Displays the list of tokenized sentences for further processing.

print("Sentences are ", sentences)

Sentences are  [['It', 'was', 'the', 'best', 'of', 'times'], ['It', 'was', 'the', 'worst', 'of', 'times'], ['It', 'was', 'the', 'age', 'of', 'wisdom'], ['It', 'was', 'the', 'age', 'of', 'foolishness'], ['What', 'is', 'the', 'importance', 'of', 'age'], ['This', 'is', 'the', 'best', 'example.']]


### SENTENCE SIMILARITY

This function computes the cosine similarity of two sentences. Each sentence becomes a word-count vector over the combined, lowercased vocabulary of both sentences, and the result is 1 minus NLTK's `cosine_distance` of those vectors: 1 means identical word counts, 0 means no shared words.

In [112]:
# Calculates similarity between two sentences using cosine distance.

def sentence_similarity(sent1, sent2):
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
    all_words = list(set(sent1 + sent2))
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
    # build the vector for the first sentence
    for w in sent1:
        vector1[all_words.index(w)] += 1
    # build the vector for the second sentence
    for w in sent2:
        vector2[all_words.index(w)] += 1
    return 1 - cosine_distance(vector1, vector2)
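
To make the arithmetic concrete, here is a minimal standard-library sketch of the same idea, written with `collections.Counter` instead of NLTK. The function name `cosine_similarity` is an assumption of this example, not something the notebook defines.

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of two token lists, using lowercase word counts as vectors."""
    counts_a = Counter(w.lower() for w in tokens_a)
    counts_b = Counter(w.lower() for w in tokens_b)
    # Only words present in both sentences contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b)

best = ["It", "was", "the", "best", "of", "times"]
worst = ["It", "was", "the", "worst", "of", "times"]
# Five shared words, six distinct words per sentence: 5 / (sqrt(6) * sqrt(6)) = 5/6
print(round(cosine_similarity(best, worst), 8))  # 0.83333333
```

The two sentences differ in one word out of six, which gives the 5/6 value. The sketch assumes both token lists are non-empty; an empty list would make a norm zero.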

### CALCULATING SIMILARITY MATRIX

In [113]:
# Computes the similarity matrix for all pairs of sentences.

similarity_matrix = np.zeros((len(sentences), len(sentences)))

for idx1 in range(len(sentences)):
    for idx2 in range(len(sentences)):
        if idx1 == idx2:                            # skip comparing a sentence with itself
            continue
        similarity_matrix[idx1][idx2] = sentence_similarity(sentences[idx1], sentences[idx2])

print("Similarity matrix \n", similarity_matrix)

Similarity matrix 
 [[0.         0.83333333 0.66666667 0.66666667 0.33333333 0.36514837]
 [0.83333333 0.         0.66666667 0.66666667 0.33333333 0.18257419]
 [0.66666667 0.66666667 0.         0.83333333 0.5        0.18257419]
 [0.66666667 0.66666667 0.83333333 0.         0.5        0.18257419]
 [0.33333333 0.33333333 0.5        0.5        0.         0.36514837]
 [0.36514837 0.18257419 0.18257419 0.18257419 0.36514837 0.        ]]


### BUILDING SIMILARITY GRAPH

In [114]:
# Constructs a graph from the similarity matrix and calculates PageRank scores.

sentence_similarity_graph = nx.from_numpy_array(similarity_matrix)
scores = nx.pagerank(sentence_similarity_graph)
print("scores", scores)

scores {0: 0.19306449754902083, 1: 0.18095893645850156, 2: 0.1911855225552033, 3: 0.1911855225552033, 4: 0.14434636291527997, 5: 0.09925915796679063}
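
`nx.pagerank` hides the actual ranking computation. The sketch below is a plain-Python power iteration over the similarity matrix printed earlier, assuming the damping factor 0.85 that NetworkX uses by default. It is an illustration of the idea rather than NetworkX's implementation, and the function and variable names are chosen for this example.

```python
def pagerank(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a weighted adjacency matrix (list of lists)."""
    n = len(weights)
    out_weight = [sum(row) for row in weights]
    ranks = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by nodes without any outgoing weight is spread uniformly.
        dangling = sum(ranks[j] for j in range(n) if out_weight[j] == 0)
        new_ranks = []
        for i in range(n):
            incoming = sum(
                ranks[j] * weights[j][i] / out_weight[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            new_ranks.append(damping * (incoming + dangling / n) + (1 - damping) / n)
        converged = sum(abs(a - b) for a, b in zip(new_ranks, ranks)) < tol
        ranks = new_ranks
        if converged:
            break
    return ranks

# Similarity matrix printed above for Text1.txt (values rounded to 8 decimals).
W = [
    [0.0, 0.83333333, 0.66666667, 0.66666667, 0.33333333, 0.36514837],
    [0.83333333, 0.0, 0.66666667, 0.66666667, 0.33333333, 0.18257419],
    [0.66666667, 0.66666667, 0.0, 0.83333333, 0.5, 0.18257419],
    [0.66666667, 0.66666667, 0.83333333, 0.0, 0.5, 0.18257419],
    [0.33333333, 0.33333333, 0.5, 0.5, 0.0, 0.36514837],
    [0.36514837, 0.18257419, 0.18257419, 0.18257419, 0.36514837, 0.0],
]
ranks = pagerank(W)
for index, score in enumerate(ranks):
    print(index, round(score, 4))
```

The printed values should land close to the scores reported by `nx.pagerank` above. Sentences that are similar to many other sentences collect more rank, which is why the last sentence, sharing few words with the rest, scores lowest.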


### RANKING SENTENCES

This code snippet ranks sentences by their PageRank scores in descending order, aiding in prioritizing important content for text summarization or analysis tasks.

In [115]:
# Sorts sentences based on their PageRank scores in descending order.

ranked_sentence = sorted(((scores[i],s) for i,s in enumerate(sentences)), reverse=True)    
print("Indexes of top ranked_sentence order are \n\n", ranked_sentence)

Indexes of top ranked_sentence order are 

 [(0.19306449754902083, ['It', 'was', 'the', 'best', 'of', 'times']), (0.1911855225552033, ['It', 'was', 'the', 'age', 'of', 'wisdom']), (0.1911855225552033, ['It', 'was', 'the', 'age', 'of', 'foolishness']), (0.18095893645850156, ['It', 'was', 'the', 'worst', 'of', 'times']), (0.14434636291527997, ['What', 'is', 'the', 'importance', 'of', 'age']), (0.09925915796679063, ['This', 'is', 'the', 'best', 'example.'])]
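
Sorting `(score, tokens)` tuples falls back to comparing the token lists whenever two scores tie, as sentences 2 and 3 do here. One alternative, sketched below with the scores copied from the output above, is to sort sentence indexes by score alone; the name `ranked_indexes` is introduced only for this example.

```python
scores = {0: 0.19306449754902083, 1: 0.18095893645850156, 2: 0.1911855225552033,
          3: 0.1911855225552033, 4: 0.14434636291527997, 5: 0.09925915796679063}

# Sort sentence indexes by score only. Python's sort is stable, also with
# reverse=True, so tied sentences keep their original document order.
ranked_indexes = sorted(scores, key=lambda i: scores[i], reverse=True)
print(ranked_indexes)  # [0, 2, 3, 1, 4, 5]
```

Keeping indexes instead of token lists also makes it easy to look up where each selected sentence sits in the original text.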


### SUMMARIZING TEXT

In [116]:
# Generates a summary by selecting the top-ranked sentences based on user input.

n = int(input("How many sentences do you want in the summary? "))    # e.g. 5
summarize_text = []
for i in range(n):
    summarize_text.append(" ".join(ranked_sentence[i][1]))

How many sentences do you want in the summary? 5


### PRINTING SUMMARY

In [117]:
# Displays the summarized text composed of selected sentences.

print("Summarize Text: \n", ". ".join(summarize_text))

Summarize Text: 
 It was the best of times. It was the age of wisdom. It was the age of foolishness. It was the worst of times. What is the importance of age
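
The summary above lists sentences in score order, and the selection loop would raise an `IndexError` if the requested count exceeded the number of sentences. A possible variation, shown here as a sketch with a hypothetical `build_summary` helper, clamps the count and restores the original sentence order before joining.

```python
def build_summary(sentences, scores, n):
    """Join the n best-scoring sentences in the order they appear in the text."""
    n = max(0, min(n, len(sentences)))  # clamp so large n cannot raise IndexError
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return ". ".join(" ".join(sentences[i]) for i in sorted(top))

# Tokenized sentences and PageRank scores as printed earlier for Text1.txt.
text1_sentences = [
    ["It", "was", "the", "best", "of", "times"],
    ["It", "was", "the", "worst", "of", "times"],
    ["It", "was", "the", "age", "of", "wisdom"],
    ["It", "was", "the", "age", "of", "foolishness"],
    ["What", "is", "the", "importance", "of", "age"],
    ["This", "is", "the", "best", "example."],
]
text1_scores = {0: 0.19306449754902083, 1: 0.18095893645850156, 2: 0.1911855225552033,
                3: 0.1911855225552033, 4: 0.14434636291527997, 5: 0.09925915796679063}

print(build_summary(text1_sentences, text1_scores, 2))
# It was the best of times. It was the age of wisdom
```

Whether score order or document order reads better depends on the text; document order tends to keep narrative passages coherent.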


## TEXT SUMMARIZATION FOR SECOND FILE

### READING AND PROCESSING TEXT

The code snippet reads text from a file, divides it into sentences, and tokenizes them for subsequent analysis, preparing the data for text processing tasks.

In [118]:
# Reads text from file, splits into sentences, and tokenizes them.

# Text2.txt contains one paragraph of multiple sentences.
with open("Text2.txt", "r") as file2:                 # the context manager closes the file
    filedata2 = file2.readlines()
article2 = filedata2[0].split(". ")                   # Just do the first paragraph

sentences2 = []
for sentence in article2:
    print(sentence)
    # str.replace() matches "[^a-zA-Z]" literally, not as a regex, so punctuation is left in place.
    sentences2.append(sentence.replace("[^a-zA-Z]", " ").split(" "))

I AM SAM
I AM SAM
SAM I AM
THAT SAM-I-AM! THAT SAM-I-AM!
I DO NOT LIKE THAT SAM-I-AM!
DO WOULD YOU LIKE GREEN EGGS AND HAM?
I DO NOT LIKE THEM, SAM-I-AM
I DO NOT LIKE GREEN EGGS AND HAM
WOULD YOU LIKE THEM HERE OR THERE?
I WOULD NOT LIKE THEM HERE OR THERE
I WOULD NOT LIKE THEM ANYWHERE
I DO NOT LIKE GREEN EGGS AND HAM
I DO NOT LIKE THEM, SAM-I-AM
WOULD YOU LIKE THEM IN A HOUSE?
WOULD YOU LIKE THEN WITH A MOUSE?
I DO NOT LIKE THEM IN A HOUSE
I DO NOT LIKE THEM WITH A MOUSE
I DO NOT LIKE THEM HERE OR THERE
I DO NOT LIKE THEM ANYWHERE
I DO NOT LIKE GREEN EGGS AND HAM
I DO NOT LIKE THEM, SAM-I-AM.
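
Because the text is split only on `". "`, sentences that end in `!` or `?` are not separated, which is why `THAT SAM-I-AM! THAT SAM-I-AM!` appears as a single entry above. One way to handle the other terminators is a regular-expression split, sketched below. The helper name and the sample string (assembled from lines shown above) are assumptions of this example, and the approach would still mis-split abbreviations such as "Dr. Seuss".

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace; terminators stay attached."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

sample = "THAT SAM-I-AM! THAT SAM-I-AM! I DO NOT LIKE THAT SAM-I-AM!"
pieces = split_sentences(sample)
print(pieces)
# ['THAT SAM-I-AM!', 'THAT SAM-I-AM!', 'I DO NOT LIKE THAT SAM-I-AM!']
```

Unlike `split(". ")`, this keeps the punctuation on each sentence, so the later `". ".join(...)` step would need adjusting to avoid doubled terminators.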



### PRINTING SENTENCES

In [119]:
# Displays the list of tokenized sentences for further processing.

print("Sentences are ", sentences2)

Sentences are  [['I', 'AM', 'SAM'], ['I', 'AM', 'SAM'], ['SAM', 'I', 'AM'], ['THAT', 'SAM-I-AM!', 'THAT', 'SAM-I-AM!'], ['I', 'DO', 'NOT', 'LIKE', 'THAT', 'SAM-I-AM!'], ['DO', 'WOULD', 'YOU', 'LIKE', 'GREEN', 'EGGS', 'AND', 'HAM?'], ['I', 'DO', 'NOT', 'LIKE', 'THEM,', 'SAM-I-AM'], ['I', 'DO', 'NOT', 'LIKE', 'GREEN', 'EGGS', 'AND', 'HAM'], ['WOULD', 'YOU', 'LIKE', 'THEM', 'HERE', 'OR', 'THERE?'], ['I', 'WOULD', 'NOT', 'LIKE', 'THEM', 'HERE', 'OR', 'THERE'], ['I', 'WOULD', 'NOT', 'LIKE', 'THEM', 'ANYWHERE'], ['I', 'DO', 'NOT', 'LIKE', 'GREEN', 'EGGS', 'AND', 'HAM'], ['I', 'DO', 'NOT', 'LIKE', 'THEM,', 'SAM-I-AM'], ['WOULD', 'YOU', 'LIKE', 'THEM', 'IN', 'A', 'HOUSE?'], ['WOULD', 'YOU', 'LIKE', 'THEN', 'WITH', 'A', 'MOUSE?'], ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'IN', 'A', 'HOUSE'], ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'WITH', 'A', 'MOUSE'], ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'HERE', 'OR', 'THERE'], ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'ANYWHERE'], ['I', 'DO', 'NOT', 'LIKE', 'GREEN', 'EGGS', 'AN

### SENTENCE SIMILARITY

This function computes the similarity between two sentences using cosine distance, a common metric in natural language processing, by constructing vectors based on word occurrences and comparing their orientations.

In [120]:
# Calculates similarity between two sentences using cosine distance.

def sentence_similarity2(sent3, sent4):
    # Lowercase both sentences so the comparison is case-insensitive.
    sent3 = [w.lower() for w in sent3]
    sent4 = [w.lower() for w in sent4]
    all_words = list(set(sent3 + sent4))
    vector3 = [0] * len(all_words)
    vector4 = [0] * len(all_words)
    # build the vector for the first sentence
    for w in sent3:
        vector3[all_words.index(w)] += 1
    # build the vector for the second sentence
    for w in sent4:
        vector4[all_words.index(w)] += 1
    return 1 - cosine_distance(vector3, vector4)

### CALCULATING SIMILARITY MATRIX

In [121]:
# Computes the similarity matrix for all pairs of sentences.

similarity_matrix2 = np.zeros((len(sentences2), len(sentences2)))

for idx1 in range(len(sentences2)):
    for idx2 in range(len(sentences2)):
        if idx1 == idx2:                            # skip comparing a sentence with itself
            continue
        similarity_matrix2[idx1][idx2] = sentence_similarity2(sentences2[idx1], sentences2[idx2])

print("Similarity matrix \n", similarity_matrix2)

Similarity matrix 
 [[0.         1.         1.         0.         0.23570226 0.
  0.23570226 0.20412415 0.         0.20412415 0.23570226 0.20412415
  0.23570226 0.         0.         0.20412415 0.20412415 0.20412415
  0.23570226 0.20412415 0.23570226]
 [1.         0.         1.         0.         0.23570226 0.
  0.23570226 0.20412415 0.         0.20412415 0.23570226 0.20412415
  0.23570226 0.         0.         0.20412415 0.20412415 0.20412415
  0.23570226 0.20412415 0.23570226]
 [1.         1.         0.         0.         0.23570226 0.
  0.23570226 0.20412415 0.         0.20412415 0.23570226 0.20412415
  0.23570226 0.         0.         0.20412415 0.20412415 0.20412415
  0.23570226 0.20412415 0.23570226]
 [0.         0.         0.         0.         0.57735027 0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.23570226 0.23570226 0.23570226 0.57735027 0.         0.28867513
 

### BUILDING SIMILARITY GRAPH

In [122]:
# Constructs a graph from the similarity matrix and calculates PageRank scores.

sentence_similarity_graph = nx.from_numpy_array(similarity_matrix2)
scores2 = nx.pagerank(sentence_similarity_graph)
print("scores", scores2)

scores {0: 0.03492638917817515, 1: 0.03492638917817515, 2: 0.03492638917817515, 3: 0.010552425887368076, 4: 0.06321847556853293, 5: 0.03960609463088077, 6: 0.057289451630209653, 7: 0.056226314678287335, 8: 0.03486412173792891, 9: 0.053338934277549654, 10: 0.0566062415473087, 11: 0.056226314678287335, 12: 0.05728945163020967, 13: 0.0350182718613405, 14: 0.02940805086606966, 15: 0.056983599690836606, 16: 0.05702071446687991, 17: 0.057531865612924, 18: 0.06143601920162538, 19: 0.056226314678287335, 20: 0.05637816982094812}


### RANKING SENTENCES

This code snippet ranks sentences by their PageRank scores in descending order, aiding in prioritizing important content for text summarization or analysis tasks.

In [123]:
# Sorts sentences based on their PageRank scores in descending order.

ranked_sentence2 = sorted(((scores2[i],s) for i,s in enumerate(sentences2)), reverse=True)    
print("Indexes of top ranked_sentence order are \n\n", ranked_sentence2)

Indexes of top ranked_sentence order are 

 [(0.06321847556853293, ['I', 'DO', 'NOT', 'LIKE', 'THAT', 'SAM-I-AM!']), (0.06143601920162538, ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'ANYWHERE']), (0.057531865612924, ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'HERE', 'OR', 'THERE']), (0.05728945163020967, ['I', 'DO', 'NOT', 'LIKE', 'THEM,', 'SAM-I-AM']), (0.057289451630209653, ['I', 'DO', 'NOT', 'LIKE', 'THEM,', 'SAM-I-AM']), (0.05702071446687991, ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'WITH', 'A', 'MOUSE']), (0.056983599690836606, ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'IN', 'A', 'HOUSE']), (0.0566062415473087, ['I', 'WOULD', 'NOT', 'LIKE', 'THEM', 'ANYWHERE']), (0.05637816982094812, ['I', 'DO', 'NOT', 'LIKE', 'THEM,', 'SAM-I-AM.\n']), (0.056226314678287335, ['I', 'DO', 'NOT', 'LIKE', 'GREEN', 'EGGS', 'AND', 'HAM']), (0.056226314678287335, ['I', 'DO', 'NOT', 'LIKE', 'GREEN', 'EGGS', 'AND', 'HAM']), (0.056226314678287335, ['I', 'DO', 'NOT', 'LIKE', 'GREEN', 'EGGS', 'AND', 'HAM']), (0.053338934277549654, ['I', 

### SUMMARIZING TEXT

In [124]:
# Generates a summary by selecting the top-ranked sentences based on user input.

n2 = int(input("How many sentences do you want in the summary? "))    # e.g. 5
summarize_text2 = []
for i in range(n2):
    summarize_text2.append(" ".join(ranked_sentence2[i][1]))

How many sentences do you want in the summary? 5


### PRINTING SUMMARY

In [125]:
# Displays the summarized text composed of selected sentences.

print("Summarize Text: \n", ". ".join(summarize_text2))

Summarize Text: 
 I DO NOT LIKE THAT SAM-I-AM!. I DO NOT LIKE THEM ANYWHERE. I DO NOT LIKE THEM HERE OR THERE. I DO NOT LIKE THEM, SAM-I-AM. I DO NOT LIKE THEM, SAM-I-AM
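
The summary above contains `I DO NOT LIKE THEM, SAM-I-AM` twice, because Text2.txt repeats that sentence and every copy is ranked separately. A simple way to avoid this, sketched below with a hypothetical `top_unique` helper, is to skip sentences whose text has already been selected. The `ranked` list reuses the first six entries of the ranking printed earlier.

```python
def top_unique(ranked, n):
    """Take the first n distinct sentences from (score, tokens) pairs sorted best-first."""
    seen = set()
    picked = []
    for _score, tokens in ranked:
        if len(picked) >= n:
            break
        text = " ".join(tokens).strip()
        if text in seen:
            continue  # an identical sentence was already selected
        seen.add(text)
        picked.append(text)
    return picked

ranked = [
    (0.06321847556853293, ['I', 'DO', 'NOT', 'LIKE', 'THAT', 'SAM-I-AM!']),
    (0.06143601920162538, ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'ANYWHERE']),
    (0.057531865612924, ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'HERE', 'OR', 'THERE']),
    (0.05728945163020967, ['I', 'DO', 'NOT', 'LIKE', 'THEM,', 'SAM-I-AM']),
    (0.057289451630209653, ['I', 'DO', 'NOT', 'LIKE', 'THEM,', 'SAM-I-AM']),
    (0.05702071446687991, ['I', 'DO', 'NOT', 'LIKE', 'THEM', 'WITH', 'A', 'MOUSE']),
]
unique_top = top_unique(ranked, 5)
print(". ".join(unique_top))
```

With the duplicate skipped, the fifth slot goes to the next distinct sentence in the ranking. Exact string matching is a deliberate simplification; near-duplicates would still get through.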


## TEXT SUMMARIZATION FOR THIRD FILE 

### READING AND PROCESSING TEXT

The code snippet reads text from a file, divides it into sentences, and tokenizes them for subsequent analysis, preparing the data for text processing tasks.

In [126]:
# Reads text from file, splits into sentences, and tokenizes them.

# Text3.txt contains one paragraph of multiple sentences.
with open("Text3.txt", "r") as file:                  # the context manager closes the file
    filedata = file.readlines()
article = filedata[0].split(". ")                     # Just do the first paragraph

sentences = []
for sentence in article:
    print(sentence)
    # str.replace() matches "[^a-zA-Z]" literally, not as a regex, so punctuation is left in place.
    sentences.append(sentence.replace("[^a-zA-Z]", " ").split(" "))

As an institution of higher learning, Sacred Heart University places special emphasis on academic integrity, which is a commitment to the fundamental values of honesty, trust, fairness, respect, and responsibility
Only when these values are widely respected and practiced by all members of the University students, faculty, administrators, and staff can the University maintain a culture that promotes free exploration of knowledge, constructive debate, genuine learning, effective research, fair assessment of student progress, and development of members characters
These aims of the University require that its members exercise mutual responsibilities
At its core, academic integrity is secured by a principled commitment to carry out these responsibilities, not by rules and penalties
Students and faculty should strive to create an academic environment that is honest, fair, and respectful of all
They do this by evaluating others work fairly, by responding to others ideas critically yet courteo

### PRINTING SENTENCES

In [127]:
# Displays the list of tokenized sentences for further processing.

print("Sentences are ", sentences)

Sentences are  [['As', 'an', 'institution', 'of', 'higher', 'learning,', 'Sacred', 'Heart', 'University', 'places', 'special', 'emphasis', 'on', 'academic', 'integrity,', 'which', 'is', 'a', 'commitment', 'to', 'the', 'fundamental', 'values', 'of', 'honesty,', 'trust,', 'fairness,', 'respect,', 'and', 'responsibility'], ['Only', 'when', 'these', 'values', 'are', 'widely', 'respected', 'and', 'practiced', 'by', 'all', 'members', 'of', 'the', 'University', 'students,', 'faculty,', 'administrators,', 'and', 'staff', 'can', 'the', 'University', 'maintain', 'a', 'culture', 'that', 'promotes', 'free', 'exploration', 'of', 'knowledge,', 'constructive', 'debate,', 'genuine', 'learning,', 'effective', 'research,', 'fair', 'assessment', 'of', 'student', 'progress,', 'and', 'development', 'of', 'members', 'characters'], ['These', 'aims', 'of', 'the', 'University', 'require', 'that', 'its', 'members', 'exercise', 'mutual', 'responsibilities'], ['At', 'its', 'core,', 'academic', 'integrity', 'is', 

### SENTENCE SIMILARITY

This function computes the similarity between two sentences using cosine distance, a common metric in natural language processing, by constructing vectors based on word occurrences and comparing their orientations.

In [128]:
# Calculates similarity between two sentences using cosine distance.

def sentence_similarity(sent1, sent2):
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
    all_words = list(set(sent1 + sent2))
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
    # build the vector for the first sentence
    for w in sent1:
        vector1[all_words.index(w)] += 1
    # build the vector for the second sentence
    for w in sent2:
        vector2[all_words.index(w)] += 1
    return 1 - cosine_distance(vector1, vector2)

### CALCULATING SIMILARITY MATRIX

In [129]:
# Computes the similarity matrix for all pairs of sentences.

similarity_matrix = np.zeros((len(sentences), len(sentences)))

for idx1 in range(len(sentences)):
    for idx2 in range(len(sentences)):
        if idx1 == idx2:                            # skip comparing a sentence with itself
            continue
        similarity_matrix[idx1][idx2] = sentence_similarity(sentences[idx1], sentences[idx2])

print("Similarity matrix \n", similarity_matrix)

Similarity matrix 
 [[0.         0.375      0.20412415 0.22116293 0.31622777 0.25315802
  0.30439039 0.33348648 0.26382243 0.25259074 0.3006689 ]
 [0.375      0.         0.40824829 0.17201562 0.31622777 0.36822985
  0.32780503 0.26274693 0.22613351 0.3127314  0.33407655]
 [0.20412415 0.40824829 0.         0.12038585 0.12909944 0.15032921
  0.22941573 0.14852213 0.18463724 0.29462783 0.21821789]
 [0.22116293 0.17201562 0.12038585 0.         0.2331262  0.35290144
  0.16571045 0.17879963 0.13336627 0.         0.11821656]
 [0.31622777 0.31622777 0.12909944 0.2331262  0.         0.26200013
  0.26655699 0.26843775 0.14301939 0.09128709 0.25354628]
 [0.25315802 0.36822985 0.15032921 0.35290144 0.26200013 0.
  0.27590308 0.22327214 0.19429458 0.1860229  0.22143052]
 [0.30439039 0.32780503 0.22941573 0.16571045 0.26655699 0.27590308
  0.         0.3180176  0.25415212 0.32444284 0.37546963]
 [0.33348648 0.26274693 0.14852213 0.17879963 0.26843775 0.22327214
  0.3180176  0.         0.32907259 0.31

### BUILDING SIMILARITY GRAPH

In [130]:
# Constructs a graph from the similarity matrix and calculates PageRank scores.

sentence_similarity_graph = nx.from_numpy_array(similarity_matrix)
scores = nx.pagerank(sentence_similarity_graph)
print("scores", scores)

scores {0: 0.10052808533762261, 1: 0.1092617813187325, 2: 0.07743541734389628, 3: 0.06555131921078501, 4: 0.08353031285994028, 5: 0.09032582020361654, 6: 0.10094508912146115, 7: 0.10059598487085707, 8: 0.08058955768995534, 9: 0.08737172856802625, 10: 0.10386490347510732}


### RANKING SENTENCES

This code snippet ranks sentences by their PageRank scores in descending order, aiding in prioritizing important content for text summarization or analysis tasks.

In [131]:
# Sorts sentences based on their PageRank scores in descending order.

ranked_sentence = sorted(((scores[i],s) for i,s in enumerate(sentences)), reverse=True)    
print("Indexes of top ranked_sentence order are \n\n", ranked_sentence)

Indexes of top ranked_sentence order are 

 [(0.1092617813187325, ['Only', 'when', 'these', 'values', 'are', 'widely', 'respected', 'and', 'practiced', 'by', 'all', 'members', 'of', 'the', 'University', 'students,', 'faculty,', 'administrators,', 'and', 'staff', 'can', 'the', 'University', 'maintain', 'a', 'culture', 'that', 'promotes', 'free', 'exploration', 'of', 'knowledge,', 'constructive', 'debate,', 'genuine', 'learning,', 'effective', 'research,', 'fair', 'assessment', 'of', 'student', 'progress,', 'and', 'development', 'of', 'members', 'characters']), (0.10386490347510732, ['All', 'matriculated', 'students', 'will', 'be', 'provided', 'with', 'a', 'full', 'description', 'of', 'the', 'University', 'standards', 'for', 'academic', 'integrity,', 'consequences', 'for', 'violations,', 'and', 'the', 'appeals', 'procedure.']), (0.10094508912146115, ['Appropriate', 'disciplinary', 'action', 'will', 'be', 'taken', 'for', 'violations', 'of', 'academic', 'integrity,', 'including', 'plagiari

### SUMMARIZING TEXT

In [132]:
# Generates a summary by selecting the top-ranked sentences based on user input.

n = int(input("How many sentences do you want in the summary? "))    # e.g. 5
summarize_text = []
for i in range(n):
    # Use this file's ranking (ranked_sentence), not ranked_sentence2 from Text2.txt.
    summarize_text.append(" ".join(ranked_sentence[i][1]))

How many sentences do you want in the summary? 5


### PRINTING SUMMARY

In [133]:
# Displays the summarized text composed of selected sentences.

print("Summarize Text: \n", ". ".join(summarize_text))


## TEXT SUMMARIZATION FOR FOURTH FILE 

### READING AND PROCESSING TEXT

The code snippet reads text from a file, divides it into sentences, and tokenizes them for subsequent analysis, preparing the data for text processing tasks.

In [134]:
# Reads text from file, splits into sentences, and tokenizes them.

# Text4.txt contains one paragraph of multiple sentences.
with open("Text4.txt", "r") as file:                  # the context manager closes the file
    filedata = file.readlines()
article = filedata[0].split(". ")                     # Just do the first paragraph

sentences = []
for sentence in article:
    print(sentence)
    # str.replace() matches "[^a-zA-Z]" literally, not as a regex, so punctuation is left in place.
    sentences.append(sentence.replace("[^a-zA-Z]", " ").split(" "))

Imagine there's no heaven
It's easy if you try
No hell below us
Above us, only sky
Imagine all the people livin' for today
Imagine there's no countries
It isn't hard to do
Nothing to kill or die for and no religion, too
Imagine all the people livin' life in peace
You may say I'm a dreamer but I'm not the only one
I hope someday you'll join us and the world will be as one
Imagine no possessions
I wonder if you can
No need for greed or hunger
A brotherhood of man
Imagine all the people sharing all the world




### PRINTING SENTENCES

In [135]:
# Displays the list of tokenized sentences for further processing.

print("Sentences are ", sentences)

Sentences are  [['Imagine', "there's", 'no', 'heaven'], ["It's", 'easy', 'if', 'you', 'try'], ['No', 'hell', 'below', 'us'], ['Above', 'us,', 'only', 'sky'], ['Imagine', 'all', 'the', 'people', "livin'", 'for', 'today'], ['Imagine', "there's", 'no', 'countries'], ['It', "isn't", 'hard', 'to', 'do'], ['Nothing', 'to', 'kill', 'or', 'die', 'for', 'and', 'no', 'religion,', 'too'], ['Imagine', 'all', 'the', 'people', "livin'", 'life', 'in', 'peace'], ['You', 'may', 'say', "I'm", 'a', 'dreamer', 'but', "I'm", 'not', 'the', 'only', 'one'], ['I', 'hope', 'someday', "you'll", 'join', 'us', 'and', 'the', 'world', 'will', 'be', 'as', 'one'], ['Imagine', 'no', 'possessions'], ['I', 'wonder', 'if', 'you', 'can'], ['No', 'need', 'for', 'greed', 'or', 'hunger'], ['A', 'brotherhood', 'of', 'man'], ['Imagine', 'all', 'the', 'people', 'sharing', 'all', 'the', 'world'], ['\n']]
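
The final entry `['\n']` is the newline left over after the last split. It is not a real sentence, yet it still becomes a node (index 16) and receives a score in later output. A sketch of one way to drop such leftovers is shown below; the helper name and the sample paragraph ending are assumptions made for illustration.

```python
def tokenize_sentences(paragraph):
    """Split on '. ', strip surrounding whitespace, and drop sentences that end up empty."""
    sentences = []
    for raw in paragraph.split(". "):
        cleaned = raw.strip()
        if cleaned:
            sentences.append(cleaned.split())
    return sentences

# Assumed file ending: the last sentence is followed by ". " and a newline.
tail = "A brotherhood of man. Imagine all the people sharing all the world. \n"
tokenized = tokenize_sentences(tail)
print(tokenized)
# [['A', 'brotherhood', 'of', 'man'], ['Imagine', 'all', 'the', 'people', 'sharing', 'all', 'the', 'world']]
```

Stripping also removes trailing newlines glued to real tokens, such as the `'SAM-I-AM.\n'` token seen in the second file's ranking.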


### SENTENCE SIMILARITY

This function computes the similarity between two sentences using cosine distance, a common metric in natural language processing, by constructing vectors based on word occurrences and comparing their orientations.

In [136]:
# Calculates similarity between two sentences using cosine distance.

def sentence_similarity(sent1, sent2):
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
    all_words = list(set(sent1 + sent2))
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
    # build the vector for the first sentence
    for w in sent1:
        vector1[all_words.index(w)] += 1
    # build the vector for the second sentence
    for w in sent2:
        vector2[all_words.index(w)] += 1
    return 1 - cosine_distance(vector1, vector2)

### CALCULATING SIMILARITY MATRIX

In [137]:
# Computes the similarity matrix for all pairs of sentences.

similarity_matrix = np.zeros((len(sentences), len(sentences)))

for idx1 in range(len(sentences)):
    for idx2 in range(len(sentences)):
        if idx1 == idx2:                            # skip comparing a sentence with itself
            continue
        similarity_matrix[idx1][idx2] = sentence_similarity(sentences[idx1], sentences[idx2])

print("Similarity matrix \n", similarity_matrix)

Similarity matrix 
 [[0.         0.         0.25       0.         0.18898224 0.75
  0.         0.15811388 0.1767767  0.         0.         0.57735027
  0.         0.20412415 0.         0.14433757 0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.11952286 0.         0.
  0.4        0.         0.         0.         0.        ]
 [0.25       0.         0.         0.         0.         0.25
  0.         0.15811388 0.         0.         0.13867505 0.28867513
  0.         0.20412415 0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.13363062 0.         0.
  0.         0.         0.         0.         0.        ]
 [0.18898224 0.         0.         0.         0.         0.18898224
  0.         0.11952286 0.6681531  0.10101525 0.10482848 0.21821789
  0.         0.15430335 0.         0.65465367 0.        ]
 [0.75       0.         0.25       0.         0.1889822

### BUILDING SIMILARITY GRAPH

In [138]:
# Constructs a graph from the similarity matrix and calculates PageRank scores.

sentence_similarity_graph = nx.from_numpy_array(similarity_matrix)
scores = nx.pagerank(sentence_similarity_graph)
print("scores", scores)

scores {0: 0.09163334044231103, 1: 0.04408562199586315, 2: 0.05432094163511732, 3: 0.01779516045726271, 4: 0.09575693097837697, 5: 0.09163334044231103, 6: 0.014924507861294583, 7: 0.0653124759476629, 8: 0.08186947693779716, 9: 0.07522194125679456, 10: 0.05101030185038315, 11: 0.09222556696979164, 12: 0.05146059071771966, 13: 0.05778838761945695, 14: 0.01779516045726271, 15: 0.08787832873399998, 16: 0.009287925696594429}


### RANKING SENTENCES

This code snippet ranks sentences by their PageRank scores in descending order, aiding in prioritizing important content for text summarization or analysis tasks.

In [139]:
# Sorts sentences based on their PageRank scores in descending order.

ranked_sentence = sorted(((scores[i],s) for i,s in enumerate(sentences)), reverse=True)    
print("Indexes of top ranked_sentence order are \n\n", ranked_sentence)

Indexes of top ranked_sentence order are 

 [(0.09575693097837697, ['Imagine', 'all', 'the', 'people', "livin'", 'for', 'today']), (0.09222556696979164, ['Imagine', 'no', 'possessions']), (0.09163334044231103, ['Imagine', "there's", 'no', 'heaven']), (0.09163334044231103, ['Imagine', "there's", 'no', 'countries']), (0.08787832873399998, ['Imagine', 'all', 'the', 'people', 'sharing', 'all', 'the', 'world']), (0.08186947693779716, ['Imagine', 'all', 'the', 'people', "livin'", 'life', 'in', 'peace']), (0.07522194125679456, ['You', 'may', 'say', "I'm", 'a', 'dreamer', 'but', "I'm", 'not', 'the', 'only', 'one']), (0.0653124759476629, ['Nothing', 'to', 'kill', 'or', 'die', 'for', 'and', 'no', 'religion,', 'too']), (0.05778838761945695, ['No', 'need', 'for', 'greed', 'or', 'hunger']), (0.05432094163511732, ['No', 'hell', 'below', 'us']), (0.05146059071771966, ['I', 'wonder', 'if', 'you', 'can']), (0.05101030185038315, ['I', 'hope', 'someday', "you'll", 'join', 'us', 'and', 'the', 'world', 'wi

### SUMMARIZING TEXT

In [140]:
# Generates a summary by selecting the top-ranked sentences based on user input.

n = int(input("How many sentences do you want in the summary? "))    # e.g. 5
summarize_text = []
for i in range(n):
    summarize_text.append(" ".join(ranked_sentence[i][1]))

How many sentences do you want in the summary? 5


### PRINTING SUMMARY

In [141]:
# Displays the summarized text composed of selected sentences.

print("Summarize Text: \n", ". ".join(summarize_text))

Summarize Text: 
 Imagine all the people livin' for today. Imagine no possessions. Imagine there's no heaven. Imagine there's no countries. Imagine all the people sharing all the world


## TEXT SUMMARIZATION FOR FIFTH FILE 

### READING AND PROCESSING TEXT

The code snippet reads text from a file, divides it into sentences, and tokenizes them for subsequent analysis, preparing the data for text processing tasks.

In [142]:
# Reads text from file, splits into sentences, and tokenizes them.

# Text5.txt contains one paragraph of multiple sentences.
with open("Text5.txt", "r") as file:                  # the context manager closes the file
    filedata = file.readlines()
article = filedata[0].split(". ")                     # Just do the first paragraph

sentences = []
for sentence in article:
    print(sentence)
    # str.replace() matches "[^a-zA-Z]" literally, not as a regex, so punctuation is left in place.
    sentences.append(sentence.replace("[^a-zA-Z]", " ").split(" "))

In an attempt to build an AI-ready workforce, Microsoft announced Intelligent Cloud Hub which has been launched to empower the next generation of students with AI-ready skills
Envisioned as a three-year collaborative program, Intelligent Cloud Hub will support around 100 institutions with AI infrastructure, course content and curriculum, developer support, development tools and give students access to cloud and AI services
As part of the program, the Redmond giant which wants to expand its reach and is planning to build a strong developer ecosystem in India with the program will set up the core AI infrastructure and IoT Hub for the selected campuses
The company will provide AI development tools and Azure AI services such as Microsoft Cognitive Services, Bot Services and Azure Machine Learning.According to Manish Prakash, Country General Manager-PS, Health and Education, Microsoft India, said, "With AI being the defining technology of our time, it is transIn an attempt to build an AI-re

### PRINTING SENTENCES

In [143]:
# Displays the list of tokenized sentences for further processing.

print("Sentences are ", sentences)

Sentences are  [['In', 'an', 'attempt', 'to', 'build', 'an', 'AI-ready', 'workforce,', 'Microsoft', 'announced', 'Intelligent', 'Cloud', 'Hub', 'which', 'has', 'been', 'launched', 'to', 'empower', 'the', 'next', 'generation', 'of', 'students', 'with', 'AI-ready', 'skills'], ['Envisioned', 'as', 'a', 'three-year', 'collaborative', 'program,', 'Intelligent', 'Cloud', 'Hub', 'will', 'support', 'around', '100', 'institutions', 'with', 'AI', 'infrastructure,', 'course', 'content', 'and', 'curriculum,', 'developer', 'support,', 'development', 'tools', 'and', 'give', 'students', 'access', 'to', 'cloud', 'and', 'AI', 'services'], ['As', 'part', 'of', 'the', 'program,', 'the', 'Redmond', 'giant', 'which', 'wants', 'to', 'expand', 'its', 'reach', 'and', 'is', 'planning', 'to', 'build', 'a', 'strong', 'developer', 'ecosystem', 'in', 'India', 'with', 'the', 'program', 'will', 'set', 'up', 'the', 'core', 'AI', 'infrastructure', 'and', 'IoT', 'Hub', 'for', 'the', 'selected', 'campuses'], ['The', 'co

### SENTENCE SIMILARITY

This function computes the similarity between two sentences using cosine distance, a common metric in natural language processing, by constructing vectors based on word occurrences and comparing their orientations.

In [144]:
# Calculates the cosine similarity between two tokenized sentences.

def sentence_similarity(sent1, sent2):
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
    all_words = list(set(sent1 + sent2))
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)

    # build the word-count vector for the first sentence
    for w in sent1:
        vector1[all_words.index(w)] += 1

    # build the word-count vector for the second sentence
    for w in sent2:
        vector2[all_words.index(w)] += 1

    # cosine_distance is 1 - cosine similarity, so invert it
    return 1 - cosine_distance(vector1, vector2)
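
As a worked example of the calculation, the following sketch recomputes the similarity for two sentences from the first text without NLTK. It is a minimal illustration, not the notebook's implementation: `cosine_similarity` and its optional `stop_words` argument are hypothetical names introduced here, and the stop-word set is a hand-picked assumption rather than NLTK's list.

```python
import math
from collections import Counter


def cosine_similarity(sent1, sent2, stop_words=None):
    """Cosine similarity of two token lists using word-count vectors."""
    stop_words = stop_words or set()
    # Count lower-cased words, skipping any stop words.
    counts1 = Counter(w.lower() for w in sent1 if w.lower() not in stop_words)
    counts2 = Counter(w.lower() for w in sent2 if w.lower() not in stop_words)

    # Dot product over the shared vocabulary (missing words count as 0).
    dot = sum(counts1[w] * counts2[w] for w in counts1)
    norm1 = math.sqrt(sum(c * c for c in counts1.values()))
    norm2 = math.sqrt(sum(c * c for c in counts2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm1 * norm2)


s1 = ["It", "was", "the", "best", "of", "times"]
s2 = ["It", "was", "the", "worst", "of", "times"]

# Hypothetical stop-word set, chosen by hand for this example only.
sim_all = cosine_similarity(s1, s2)
sim_filtered = cosine_similarity(s1, s2, stop_words={"it", "was", "the", "of"})
print(sim_all, sim_filtered)
```

The two sentences share five of their six words, so the unfiltered similarity is 5/6. With the four stop words removed, only "times" is shared and the similarity drops to about 1/2, which shows how much common function words can inflate the score.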

### CALCULATING SIMILARITY MATRIX

In [145]:
# Computes the similarity matrix for all pairs of sentences.

similarity_matrix = np.zeros((len(sentences), len(sentences)))

for idx1 in range(len(sentences)):
    for idx2 in range(len(sentences)):
        if idx1 == idx2:                       # skip comparing a sentence with itself
            continue
        similarity_matrix[idx1][idx2] = sentence_similarity(sentences[idx1], sentences[idx2])

print("Similarity matrix \n", similarity_matrix)

Similarity matrix 
 [[0.         0.20994555 0.32141217 0.6415029  0.20994555 0.32141217
  0.15589237 0.04828045 0.15974461 0.40146253 0.27852425 0.33009387
  0.15569979]
 [0.20994555 0.         0.31546459 0.42735216 1.         0.31546459
  0.4500225  0.41812101 0.31127151 0.18964186 0.15075567 0.30785965
  0.20225996]
 [0.32141217 0.31546459 0.         0.45361105 0.31546459 1.
  0.45317826 0.23897606 0.16943475 0.64517472 0.44312937 0.412959
  0.22019275]
 [0.6415029  0.42735216 0.45361105 0.         0.42735216 0.45361105
  0.78978629 0.28827833 0.26013299 0.46555195 0.34016803 0.39970544
  0.25354628]
 [0.20994555 1.         0.31546459 0.42735216 0.         0.31546459
  0.4500225  0.41812101 0.31127151 0.18964186 0.15075567 0.30785965
  0.20225996]
 [0.32141217 0.31546459 1.         0.45361105 0.31546459 0.
  0.45317826 0.23897606 0.16943475 0.64517472 0.44312937 0.412959
  0.22019275]
 [0.15589237 0.4500225  0.45317826 0.78978629 0.4500225  0.45317826
  0.         0.44155786 0.2282771

### BUILDING SIMILARITY GRAPH

In [146]:
# Constructs a graph from the similarity matrix and calculates PageRank scores.

sentence_similarity_graph = nx.from_numpy_array(similarity_matrix)
scores = nx.pagerank(sentence_similarity_graph)
print("scores", scores)

scores {0: 0.06505377280671527, 1: 0.08331029356108051, 2: 0.0944374740565271, 3: 0.0984332545355071, 4: 0.08331029356108051, 5: 0.0944374740565271, 6: 0.08956736183704041, 7: 0.06144739693431779, 8: 0.05275205184695904, 9: 0.08067100597330357, 10: 0.0654765113515673, 11: 0.07702790911312701, 12: 0.054075200366247446}
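
To make the PageRank step less of a black box, here is a rough sketch of the underlying idea as a plain power iteration on a small hand-made similarity matrix. The function name `pagerank_sketch`, the toy matrix, and the tolerance are assumptions for illustration. The damping factor 0.85 matches the `alpha` default of `nx.pagerank`, but networkx's actual implementation differs in its details, so the numbers need not match it exactly.

```python
def pagerank_sketch(matrix, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank over a weighted adjacency matrix (list of lists)."""
    n = len(matrix)
    # Turn each row into transition probabilities; a row of zeros
    # (a sentence similar to nothing) jumps uniformly to every node.
    transition = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            transition.append([1.0 / n] * n)
        else:
            transition.append([w / total for w in row])

    ranks = [1.0 / n] * n
    for _ in range(max_iter):
        # Each node keeps a small uniform share and receives the rest
        # from its neighbours in proportion to the edge weights.
        new_ranks = [
            (1 - damping) / n
            + damping * sum(ranks[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        converged = sum(abs(a - b) for a, b in zip(new_ranks, ranks)) < tol
        ranks = new_ranks
        if converged:
            break
    return ranks


# Toy symmetric similarity matrix for three sentences (made-up values).
toy_matrix = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.3],
    [0.1, 0.3, 0.0],
]
toy_scores = pagerank_sketch(toy_matrix)
print(toy_scores)
```

In the toy matrix, sentence 1 has the strongest links to the others and ends up with the highest score, while sentence 2, which is only weakly similar to the rest, ranks last. The scores sum to 1, as they do in the `nx.pagerank` output above.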


### RANKING SENTENCES

This cell pairs each sentence with its PageRank score and sorts the pairs in descending order, so the sentences most central to the text come first.

In [147]:
# Sorts sentences based on their PageRank scores in descending order.

ranked_sentence = sorted(((scores[i], s) for i, s in enumerate(sentences)), reverse=True)
print("Sentences ranked by score (highest first): \n\n", ranked_sentence)

Sentences ranked by score (highest first): 

 [(0.0984332545355071, ['The', 'company', 'will', 'provide', 'AI', 'development', 'tools', 'and', 'Azure', 'AI', 'services', 'such', 'as', 'Microsoft', 'Cognitive', 'Services,', 'Bot', 'Services', 'and', 'Azure', 'Machine', 'Learning.According', 'to', 'Manish', 'Prakash,', 'Country', 'General', 'Manager-PS,', 'Health', 'and', 'Education,', 'Microsoft', 'India,', 'said,', '"With', 'AI', 'being', 'the', 'defining', 'technology', 'of', 'our', 'time,', 'it', 'is', 'transIn', 'an', 'attempt', 'to', 'build', 'an', 'AI-ready', 'workforce,', 'Microsoft', 'announced', 'Intelligent', 'Cloud', 'Hub', 'which', 'has', 'been', 'launched', 'to', 'empower', 'the', 'next', 'generation', 'of', 'students', 'with', 'AI-ready', 'skills']), (0.0944374740565271, ['As', 'part', 'of', 'the', 'program,', 'the', 'Redmond', 'giant', 'which', 'wants', 'to', 'expand', 'its', 'reach', 'and', 'is', 'planning', 'to', 'build', 'a', 'strong', 'developer', 'ecosystem', 'in', 'In

### SUMMARIZING TEXT

In [148]:
# Generates a summary by selecting the top-ranked sentences based on user input.

n = int(input("How many sentences do you want in the summary? "))    # e.g. 2
summarize_text = []
for _, sentence in ranked_sentence[:n]:       # slicing avoids an IndexError if n exceeds the sentence count
    summarize_text.append(" ".join(sentence))

How many sentences do you want in the summary? 5


### PRINTING SUMMARY

In [149]:
# Displays the summarized text composed of selected sentences.

print("Summarized text: \n", ". ".join(summarize_text))

Summarized text: 
 The company will provide AI development tools and Azure AI services such as Microsoft Cognitive Services, Bot Services and Azure Machine Learning.According to Manish Prakash, Country General Manager-PS, Health and Education, Microsoft India, said, "With AI being the defining technology of our time, it is transIn an attempt to build an AI-ready workforce, Microsoft announced Intelligent Cloud Hub which has been launched to empower the next generation of students with AI-ready skills. As part of the program, the Redmond giant which wants to expand its reach and is planning to build a strong developer ecosystem in India with the program will set up the core AI infrastructure and IoT Hub for the selected campuses. As part of the program, the Redmond giant which wants to expand its reach and is planning to build a strong developer ecosystem in India with the program will set up the core AI infrastructure and IoT Hub for the selected campuses. The company will provide AI de
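
The summary above repeats a sentence because the input contains the same sentence more than once (the similarity matrix shows pairs with similarity 1). One possible way to handle this, sketched below under assumptions, is to skip exact repeats when picking the top `n` sentences and then print the picks in their original order. `top_n_unique` is a hypothetical helper, not part of the notebook, and the toy tokens and scores are made up for the example.

```python
def top_n_unique(scores, sentences, n):
    """Pick up to n highest-scoring sentences, skipping exact repeats,
    and return them as strings in their original document order."""
    # Sentence indexes from highest to lowest score.
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

    seen = set()
    chosen = []
    for i in order:
        key = " ".join(sentences[i]).lower()
        if key in seen:
            continue  # same text already selected
        seen.add(key)
        chosen.append(i)
        if len(chosen) == n:
            break

    # Restore document order so the summary reads naturally.
    return [" ".join(sentences[i]) for i in sorted(chosen)]


# Toy data: sentence 2 is an exact repeat of sentence 0.
toy_sentences = [["a", "b"], ["c", "d"], ["a", "b"], ["e"]]
toy_scores = {0: 0.3, 1: 0.2, 2: 0.3, 3: 0.1}

summary = top_n_unique(toy_scores, toy_sentences, 2)
print(". ".join(summary))
```

Here the repeated sentence is selected only once, so the second slot goes to the next distinct sentence. This only catches exact repeats; near-duplicates would need a similarity threshold instead.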