<a href="https://colab.research.google.com/github/samueljsluo/TextSummerize/blob/main/clusteringTextSummarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [4]:
from summarizer import Summarizer

In [5]:
text = """
SINCE PRESIDENT JOE Biden on March 11 directed that states make every adult eligible for a coronavirus vaccine by May 1, many states have ramped up their vaccine rollouts; 
moving up dates and announcing new eligibility to meet the president's timeline. But vaccine rollouts vary by state.
Alaska was the first state to announce and implement eligibility for all adults on March 9. 
Mississippi has since followed suit, with all individuals 16 and older becoming eligible on March 16, while West Virginia opened up eligibility to all adults on March 22. 
All Arizonans 16 and older are eligible for a vaccine on March 24, and adults in Texas are eligible as of March 29.
Still, most states are weeks away from opening up eligibility entirely. 
For the majority of states, elderly populations and health care workers have been prioritized, with eligibility being opened up to those with certain high-risk medical conditions and other essential workers more recently.
But just because states make certain populations eligible does not mean those individuals will secure a vaccine anytime soon, and some populations will continue to be prioritized above others, depending on the state's approach. 
While some states have taken on an age-based vaccine rollout, others have instituted an equity-based rollout, while others have gone for a hybrid approach. 
Rhode Island, for example, is accelerating distribution of the vaccines to those living in ZIP codes disproportionately impacted by the coronavirus, and to those with certain health conditions that make them more vulnerable.
Regardless of approach, some individuals across the country are getting vaccinated without necessarily having priority at the state level, as vaccine rollouts operate differently at the federal, state and county levels. In Delaware, for example, those 50 and older are eligible for a vaccine at local pharmacies, but not with medical providers, or at hospitals. 
And in various parts of the country others, still, are able to get a dose of a vaccine by being in the right place at the right time, such as at a grocery store as the day comes to an end, and unused vaccines run the risk of going to waste.
"""

In [8]:
model = Summarizer()
result = model(text, min_length=60)
print(result)

SINCE PRESIDENT JOE Biden on March 11 directed that states make every adult eligible for a coronavirus vaccine by May 1, many states have ramped up their vaccine rollouts; 
moving up dates and announcing new eligibility to meet the president's timeline. Still, most states are weeks away from opening up eligibility entirely. While some states have taken on an age-based vaccine rollout, others have instituted an equity-based rollout, while others have gone for a hybrid approach.


In [13]:
from nltk.tokenize import sent_tokenize

import tensorflow_hub as hub

from sklearn.metrics.pairwise import cosine_similarity
import networkx as nx

import re

In [14]:
sentences = sent_tokenize(text)
# Collapse line breaks inside a sentence into a single space and trim the ends.
sentences = [re.sub(r'\s*\n\s*', ' ', s).strip() for s in sentences]

In [16]:
module_url = "https://tfhub.dev/google/universal-sentence-encoder/4"
embed = hub.load(module_url)

INFO:absl:Using /tmp/tfhub_modules to cache modules.
INFO:absl:Downloading TF-Hub Module 'https://tfhub.dev/google/universal-sentence-encoder/4'.
INFO:absl:Downloaded https://tfhub.dev/google/universal-sentence-encoder/4, Total size: 987.47MB
INFO:absl:Downloaded TF-Hub Module 'https://tfhub.dev/google/universal-sentence-encoder/4'.


In [43]:
sentences_embeddings = embed(sentences)

In [9]:
from sklearn.metrics import pairwise_distances_argmin_min
import numpy as np
from sklearn.cluster import KMeans

In [53]:
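def sent_closest_centroids(embeddings):
  # Use the embeddings passed in, not a global, so any encoder can be plugged in.
  embeddings = np.asarray(embeddings)

  # The cluster count grows sublinearly with the number of sentences,
  # and KMeans must be fit with that same count.
  n_clusters = int(np.ceil(len(embeddings)**0.6))
  kmeans = KMeans(n_clusters=n_clusters)
  kmeans = kmeans.fit(embeddings)

  # Mean sentence position per cluster, used to order the summary.
  avg = []
  for j in range(n_clusters):
      idx = np.where(kmeans.labels_ == j)[0]
      avg.append(np.mean(idx))
  # Index of the sentence closest to each cluster centroid.
  closest, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, embeddings)
  ordering = sorted(range(n_clusters), key=lambda k: avg[k])
  summary = [sentences[closest[idx]] for idx in ordering]
  return summary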
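
The number of summary sentences above comes from the expression `int(np.ceil(len(...)**0.6))`. The sketch below isolates that heuristic with the standard library only, so its growth can be inspected without loading any model. The function name `heuristic_cluster_count` and the `exponent` parameter are illustrative assumptions, not part of the notebook.

```python
import math


def heuristic_cluster_count(n_sentences, exponent=0.6):
    """Hypothetical helper: number of clusters for n_sentences sentences.

    Mirrors the notebook's int(np.ceil(n ** 0.6)) using math.ceil.
    """
    return int(math.ceil(n_sentences ** exponent))


# Print how the summary length scales with document length.
for n in (1, 5, 13, 50):
    print(n, heuristic_cluster_count(n))
```

If the article splits into 13 sentences, the heuristic gives 5 clusters, which is consistent with the five-sentence summaries printed in this notebook.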

In [54]:
# Universal Sentence Encoder embeddings computed above.
summary = sent_closest_centroids(sentences_embeddings)
summary

['Alaska was the first state to announce and implement eligibility for all adults on March 9.',
 'All Arizonans 16 and older are eligible for a vaccine on March 24, and adults in Texas are eligible as of March 29.',
 'For the majority of states, elderly populations and health care workers have been prioritized, with eligibility being opened up to those with certain high-risk medical conditions and other essential workers more recently.',
 'While some states have taken on an age-based vaccine rollout, others have instituted an equity-based rollout, while others have gone for a hybrid approach.',
 "But just because states make certain populations eligible does not mean those individuals will secure a vaccine anytime soon, and some populations will continue to be prioritized above others, depending on the state's approach."]
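
In the output above, the "While some states..." sentence appears before "But just because...", although the article has them in the opposite order. A minimal NumPy-only sketch of the selection step, assuming cluster labels are already known, shows why this can happen: clusters are ordered by the mean position of their members, while the sentence picked from each cluster is the one nearest its centroid. The helper `pick_representatives` and the toy data are hypothetical and only mirror the logic of `sent_closest_centroids`; they are not the notebook's code.

```python
import numpy as np


def pick_representatives(embeddings, labels):
    """Hypothetical helper: one row index per cluster, ordered by mean cluster position."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    picks = []
    for label in np.unique(labels):
        members = np.where(labels == label)[0]
        centroid = embeddings[members].mean(axis=0)
        # Distance from every row to this centroid; the nearest row represents the cluster.
        dists = np.linalg.norm(embeddings - centroid, axis=1)
        picks.append((members.mean(), int(np.argmin(dists))))
    # Sort clusters by the average position of their members in the document.
    picks.sort()
    return [idx for _, idx in picks]


# Toy 2-D "embeddings" with hand-assigned labels (no KMeans involved).
toy_embeddings = [[0, 2], [10, 10], [0, 0], [10, 12], [0, 1], [10, 14]]
toy_labels = [1, 0, 1, 0, 1, 0]
chosen = pick_representatives(toy_embeddings, toy_labels)
print(chosen)
```

Here the cluster holding rows 0, 2, and 4 has the lower mean position, so its representative (row 4) is listed before the other cluster's representative (row 3). If strict document order is wanted, sorting the chosen indices directly would be one alternative.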

In [39]:
from sentence_transformers import SentenceTransformer

In [74]:
model = SentenceTransformer('paraphrase-distilroberta-base-v1')

In [70]:
sentence_embeddings = model.encode(sentences)

In [73]:
summary = sent_closest_centroids(sentence_embeddings)
summary

["SINCE PRESIDENT JOE Biden on March 11 directed that states make every adult eligible for a coronavirus vaccine by May 1, many states have ramped up their vaccine rollouts; moving up dates and announcing new eligibility to meet the president's timeline.",
 'Mississippi has since followed suit, with all individuals 16 and older becoming eligible on March 16, while West Virginia opened up eligibility to all adults on March 22.',
 'Still, most states are weeks away from opening up eligibility entirely.',
 'While some states have taken on an age-based vaccine rollout, others have instituted an equity-based rollout, while others have gone for a hybrid approach.',
 'Rhode Island, for example, is accelerating distribution of the vaccines to those living in ZIP codes disproportionately impacted by the coronavirus, and to those with certain health conditions that make them more vulnerable.']