<a href="https://colab.research.google.com/github/s-miramontes/News_Filter/blob/master/notebooks/summarize_training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Summarize Training Clusters with BERT 

In [0]:
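# %%capture
# # Install the latest TensorFlow version.
# !pip3 install --upgrade tensorflow-gpu
# # Install TF-Hub.
# !pip3 install tensorflow-hub
# # Install the package that provides `summarizer.Summarizer`.
# !pip3 install bert-extractive-summarizer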

In [0]:
# import statements 

import pandas as pd
import numpy as np

from summarizer import Summarizer

from sklearn.metrics.pairwise import cosine_similarity

from joblib import Parallel, delayed

import heapq
import operator

from absl import logging

import tensorflow as tf
import tensorflow_hub as hub

## Create Summaries for Clusters from Training Data

In [0]:
# import cluster data

# clusters = pd.read_csv("news_filter/data/clusters.csv") 
clusters = pd.read_csv("news_filter/data/filter_clusters.csv") # filtered data

In [12]:
clusters.head()

Unnamed: 0,id,title,publication,author,date,year,month,url,content,pub_bias,polarity,subjectivity,text,cluster_labels
0,207361,Here’s how the West should respond to the Macr...,Washington Post,Editorial Board,2017-05-08,2017.0,5.0,https://web.archive.org/web/20170509003603/htt...,THE MASSIVE leak of documents from the campai...,left-center,0.097736,0.34538,Here’s how the West should respond to the Macr...,1
1,59653,Graham: Russia’s ’trying to undermine democrac...,CNN,Eugene Scott,2016-12-10,2016.0,12.0,,(CNN) Sen. Lindsey Graham said Saturday that ...,left-center,0.058119,0.383503,Graham: Russia’s ’trying to undermine democrac...,1
2,57968,Russian intelligence agencies behind US electi...,CNN,Tal Kopan,2016-09-22,2016.0,9.0,,Washington (CNN) The top Democrats on Congress...,left-center,0.018389,0.290897,Russian intelligence agencies behind US electi...,1
3,64937,’They are totally embarrassed!’: Trump goes on...,Business Insider,Jeremy Berke,2017-01-08,2017.0,1.0,,’ ’ ’ Donald Trump said Saturday that the...,center,0.095214,0.618889,’They are totally embarrassed!’: Trump goes on...,1
4,17439,Trump Calls for Closer Relationship Between U....,New York Times,Nicholas Fandos,2017-01-08,2017.0,1.0,,WASHINGTON — A day after the release of a d...,left-center,0.036063,0.344684,Trump Calls for Closer Relationship Between U....,1


In [0]:
# instantiate summarizer
model = Summarizer()

# function to return summary of each article in cluster
def make_summaries(cluster):
  # maps article position within the cluster -> summary text
  result = {}
  for i, text in enumerate(cluster.content):
    # model() returns a single string by default, so no join is needed
    result[i] = model(text, min_length=50, ratio=0.20)
  return result
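
The `min_length` and `ratio` arguments decide how much of each article survives. The sketch below illustrates the arithmetic, assuming `min_length` filters out sentences at or below that many characters and `ratio` is the fraction of the remaining sentences that is kept (with at least one kept). This is an approximation of the behavior, so check it against the installed `summarizer` version. `expected_sentence_count` and its regex sentence splitter are hypothetical helpers, not part of the library.

```python
import re

def expected_sentence_count(text, ratio=0.20, min_length=50):
    """Estimate how many sentences an extractive summary would keep.

    Assumptions (not taken from the library source):
    - sentences of `min_length` characters or fewer are discarded first;
    - about `ratio` of the remaining sentences are kept, never fewer than one.
    The regex splitter is a crude stand-in for a real sentence tokenizer.
    """
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    candidates = [s for s in sentences if len(s) > min_length]
    if not candidates:
        return 0
    return max(1, int(len(candidates) * ratio))

# Ten made-up sentences, each longer than 50 characters.
article = " ".join(
    f"This is sentence number {i} of a made-up article used only for illustration."
    for i in range(10)
)
print(expected_sentence_count(article))
```

Under these assumptions, a ten-sentence article with `ratio=0.20` yields a two-sentence summary, which is why the per-article summaries in this notebook stay short.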

In [0]:
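# summarize every article in clusters
cluster_summaries = []
for i in range(1, 6):  # cluster labels 1 through 5, one per topic
  # drop=True discards the old row index instead of keeping it as a column
  summaries = make_summaries(clusters[clusters.cluster_labels == i].reset_index(drop=True))
  cluster_summaries.append(summaries)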

In [15]:
cluster_summaries

[{0: 'THE MASSIVE leak of documents from the campaign of Emmanuel Macron failed to prevent his landslide victory Sunday in the French presidential election. What was most striking about the operation was not its impact on the election, but its sheer audacity. Russia must be deterred from waging cyberwar against core democratic institutions of the United States and its NATO allies. An appropriate response must begin with full investigation and disclosure. Marine Le Pen, Mr. Macron’s   opponent, traveled to Moscow to meet Mr. Putin during the campaign and spoke publicly about what Mr. Macron says were fabricated reports that he holds an offshore bank account.',
  1: '(CNN) Sen. Lindsey Graham said Saturday that Russia is trying to ”is trying to break the backs of democracies.” ” Donald Trump’s presidential transition team issued Friday a stunning rebuke of Intelligence communities who have said, in recent reports from CNN and the Washington Post, that Russian hackers not only wanted to d

## Create Summary of Summaries for each Cluster

In [0]:
# summarize summaries of each cluster
summary_of_summaries = []
for summaries in cluster_summaries:
  # join the per-article summaries into one text, then summarize that text
  summary = ' '.join(summaries.values())
  summary_of_summaries.append(model(summary))
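
The two-stage flow above (summarize each article, then summarize the joined summaries) can be checked without loading BERT. Here is a minimal sketch, assuming a stand-in `lead_summarizer` that simply keeps the first sentences of its input. Both `lead_summarizer` and `summarize_cluster` are hypothetical names that only mimic the call shape of `model(text)`, and the toy articles are invented.

```python
import re

def lead_summarizer(text, num_sentences=1):
    """Stand-in for the BERT model: keep the first `num_sentences` sentences."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return ' '.join(sentences[:num_sentences])

def summarize_cluster(articles, summarizer):
    """Summarize each article, then summarize the joined per-article summaries."""
    per_article = [summarizer(article) for article in articles]
    joined = ' '.join(per_article)
    # Second pass: condense the joined summaries into a cluster-level summary.
    return summarizer(joined, num_sentences=2)

# Invented articles, used only to trace the data flow.
toy_cluster = [
    "Officials opened an inquiry. Details remain scarce.",
    "Lawmakers asked for a briefing. A hearing is planned.",
    "Analysts reviewed the report. They found gaps.",
]
print(summarize_cluster(toy_cluster, lead_summarizer))
```

Because both passes are extractive, every sentence in the cluster-level summary is copied verbatim from one of the original articles, which matches the output shown below.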

In [18]:
summary_of_summaries

['THE MASSIVE leak of documents from the campaign of Emmanuel Macron failed to prevent his landslide victory Sunday in the French presidential election. An appropriate response must begin with full investigation and disclosure. We call on President Putin to immediately order a halt to this activity. Intelligence Committee members receive classified briefings they can’t speak about in public. WASHINGTON  —   A day after the release of a damning intelligence report on Russia’s   efforts to influence the American election,   Donald J. Trump called on Saturday for a closer relationship between the two nations, saying only “stupid” people or “fools” would think this was unwise. Ukrainian computer specialists mobilized to restore operations in time for the elections. Says electors potentially undermining democracy pic. Toward the beginning of the hearing, Comey said that he has no doubt that Russia attempted to interfere in the 2016 election and that Russian government officials were aware o

## Output Final Summary

In [19]:
# topics listed in the same order as cluster labels 1 through 5
input_topics = [
    "Russian interference with election",
    "Immigration and customs enforcement",
    "Ariana Grande Manchester bombing",
    "UC Berkeley student protests",
    "Suicide Squad movie",
]

summaries_df = pd.DataFrame({'topic': input_topics, 'summary': summary_of_summaries})
summaries_df

Unnamed: 0,topic,summary
0,Russian interference with election,THE MASSIVE leak of documents from the campaig...
1,Immigration and customs enforcement,Reports of immigration sweeps this week sparke...
2,Ariana Grande Manchester bombing,"Manchester, England (CNN) Monday’s attack outs..."
3,UC Berkeley student protests,"They gathered near the White House, dishearten..."
4,Suicide Squad movie,Los Angeles (CNN)”Suicide Squad” once seemed t...


In [0]:
summaries_df.to_csv('news_filter/data/filter_summaries.csv', index=False)