<a href="https://colab.research.google.com/github/id-shiv/knowledge_base/blob/master/%5BProject_103%5D_Text_Summarization_with_SpaCy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Import Libraries

In [0]:
import spacy  # Library to perform natural language processing
from spacy.lang.en.stop_words import STOP_WORDS

# Document

In [0]:
# Document : Text content to be feb for summarization

document = "Enhance visibility into the health of your Dell EMC hardware environment Dell EMC Server and Network switch, Storage and Printer Management \
Packs for Microsoft System Center Operations Manager (SCOM) deliver greater visibility into hardware inventory details, components and configurations \
essential to optimizing the availability of your infrastructure hardware.Dell EMC Server Management Pack Suite offers both in-band and agent-free \
options that enable System Center Operations Manager to discover, monitor and accurately depict the status of Dell PowerEdge servers and modular platforms, \
integrated Dell Remote Access Controllers (iDRACs) and OpenManage Enterprise–Modular (OME-Modular), Chassis Management Controllers (CMCs) on a \
defined network segment.Agent-free out-of-band monitoring is available for 12th generation through current platforms Dell EMC PowerEdge servers and agent-based monitoring with \
OpenManage Server Administrator (OMSA) is available for 11th generation through current platforms PowerEdge servers and PowerVault NX storage appliances.\
OpenManage Integration for Microsoft System Center (OMIMSSC) for System Center Operations Manager (SCOM) is available to deploy as a virtual appliance for \
monitoring in SCOM environments for agent-free management."

# Processing for Summarization

In [0]:
# Load the document for the processing
nlp = spacy.load('en')
docx = nlp(document)

In [0]:
# Tokenize
mytokens = [token.text for token in docx]

# Build a List of Stopwords
stopwords = list(STOP_WORDS)

# Build a dictionary with {'TOKEN' : 'TOKEN_COUNT'}
token_frequency = {}
for token in mytokens:
  if token not in stopwords:
    if token not in token_frequency.keys():
        token_frequency[token.lower()] = 1
    else:
        token_frequency[token.lower()] += 1

# token_frequency

In [0]:
# Normalize the frequency between 0 and 1
token_frequency_max = max(token_frequency.values())
for word in token_frequency.keys():  
  token_frequency[word] = (token_frequency[word]/token_frequency_max)

In [65]:
# Sentence Scores
# Sentence Score (Sentence) = Sum (Frequency of Eeach Word in the Sentence)

sentence_scores = {}  
for sentence in docx.sents: 
  for word in sentence:
    if word.text.lower() in token_frequency.keys():
      if len(sentence.text.split(' ')) < 100:
        if sentence not in sentence_scores.keys():
            sentence_scores[sentence] = token_frequency[word.text.lower()]
        else:
            sentence_scores[sentence] += token_frequency[word.text.lower()]

sentence_scores

{Enhance visibility into the health of your Dell EMC hardware environment Dell EMC Server and Network switch, Storage and Printer Management Packs for Microsoft System Center Operations Manager (SCOM) deliver greater visibility into hardware inventory details, components and configurations essential to optimizing the availability of your infrastructure hardware.: 9.25,
 Dell EMC Server Management Pack Suite offers both in-band and agent-free options that enable System Center Operations Manager to discover, monitor and accurately depict the status of Dell PowerEdge servers and modular platforms, integrated Dell Remote Access Controllers (iDRACs) and OpenManage Enterprise–Modular (OME-Modular), Chassis Management Controllers (CMCs) on a defined network segment.: 17.75,
 Agent-free out-of-band monitoring is available for 12th generation through current platforms Dell EMC PowerEdge servers and agent-based monitoring with OpenManage Server Administrator (OMSA) is available for 11th generati

# Summarization

In [66]:
number_of_sentences = 2
top_n_sentence_scores = sorted(sentence_scores.values(), reverse=True)[:number_of_sentences]

summary = ""
for sentence, sentence_score in sentence_scores.items():
  for score in top_n_sentence_scores:
    if sentence_score == score:
      summary = summary + str(sentence)

for summarized_sentences in summary.split('.'):
  print(summarized_sentences)

Dell EMC Server Management Pack Suite offers both in-band and agent-free options that enable System Center Operations Manager to discover, monitor and accurately depict the status of Dell PowerEdge servers and modular platforms, integrated Dell Remote Access Controllers (iDRACs) and OpenManage Enterprise–Modular (OME-Modular), Chassis Management Controllers (CMCs) on a defined network segment
Agent-free out-of-band monitoring is available for 12th generation through current platforms Dell EMC PowerEdge servers and agent-based monitoring with OpenManage Server Administrator (OMSA) is available for 11th generation through current platforms PowerEdge servers and PowerVault NX storage appliances



# Function

In [0]:
import spacy  # Library to perform natural language processing
from spacy.lang.en.stop_words import STOP_WORDS

def text_summarizer(document, summary_percent=20):
  # Load the document for the processing
  nlp = spacy.load('en')
  docx = nlp(document)

  # Tokenize
  mytokens = [token.text for token in docx]

  # Build a List of Stopwords
  stopwords = list(STOP_WORDS)

  # Build a dictionary with {'TOKEN' : 'TOKEN_COUNT'}
  token_frequency = {}
  for token in mytokens:
    if token not in stopwords:
      if token not in token_frequency.keys():
          token_frequency[token.lower()] = 1
      else:
          token_frequency[token.lower()] += 1

  # Normalize the frequency between 0 and 1
  token_frequency_max = max(token_frequency.values())

  for word in token_frequency.keys():  
    token_frequency[word] = (token_frequency[word]/token_frequency_max)

  # Sentence Scores
  # Sentence Score (Sentence) = Sum (Frequency of Eeach Word in the Sentence)
  sentence_scores = {}  

  for sentence in docx.sents: 
    for word in sentence:
      if word.text.lower() in token_frequency.keys():
        if len(sentence.text.split(' ')) < 100:
          if sentence not in sentence_scores.keys():
              sentence_scores[sentence] = token_frequency[word.text.lower()]
          else:
              sentence_scores[sentence] += token_frequency[word.text.lower()]

  # Summarize using top n sentence scores
  number_of_sentences = int((summary_percent / 100 ) * len(list(docx.sents)))
  # print(number_of_sentences)
  top_n_sentence_scores = sorted(sentence_scores.values(), reverse=True)[:number_of_sentences]

  summary = ""
  for sentence, sentence_score in sentence_scores.items():
    for score in top_n_sentence_scores:
      if sentence_score == score:
        summary = summary + ' ' + str(sentence).strip()

  return summary

In [68]:
import bs4
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen

url = 'https://www.dell.com/community/Systems-Management-General/DellEMC-OpenManage-Integration-into-Windows-Admin-Center-not/td-p/7376175'
web_page = urlopen(url=url)
html_page = web_page.read()
page = soup(html_page, 'html.parser')
document = ' '.join([sentence.text.strip().replace('\n', ' ') for sentence in page.find_all('div', {'class': 'lia-message-body-content'})])

summary = text_summarizer(document.strip(), summary_percent=10)

print('\nOriginal:')
print(document)
print('\nSummarized:')
print(summary)

with open('Original.txt', 'w') as f:
  f.write("".join([sentence + '\n' for sentence in document.split('.')]))
with open('Summary.txt', 'w') as f:
  f.write("".join([sentence + '\n' for sentence in summary.split('.')]))


Original:
I am using Windows Admin Center, and I noticed recently there is a DellEMC OpenManage extension for it. I have been waiting for this for a while now, so I installed it but it doesnt' seem be to working. When I went to a server I have managed in WAC (A PowerEdge R720), it showed a page for accepting the usage agreement and what it was going to do (install a USB NIC). I agreed to this and it was gathering data for a bit, then it failed, saying the server was possibly off, rebooting, or did not have 445 open in the firewall (the latter was true). I opened the port but now I don't know how to install the components it needs now. The server is running Microsoft Hyper-V Server 2019 and another one is Running Hyper-V Server 2016. I also have another server on Windows Server 2016 that also fails, but it has port 445 open. I can't find a way to try and install the components again. I uninstalled and reinstalled the extension but it didn't show me the acceptance page again. I'm not su