### Summarizing a text using a simple idea
I summarize a document using a simple idea. First, I count how often each word occurs in the given document. Then I score each sentence by summing the normalized frequencies of the words it contains. Finally, I keep the sentences with the top scores, in their order of appearance in the original text.
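
To make the three steps concrete, here is a minimal, self-contained sketch on a toy document. It is an illustration only: the toy text and the variable names are assumptions, and sentences are split on periods alone to keep it short.

```python
from collections import Counter

# Toy document (an assumption for illustration)
document = "the cat sat. the dog sat on the mat. birds fly"

sentences = [s.strip() for s in document.split(".")]
words = document.replace(".", " ").split()

# Step 1: normalized frequency of each word
counts = Counter(words)
freqs = {word: count / len(words) for word, count in counts.items()}

# Step 2: score of a sentence = sum of the frequencies of its words
scores = [sum(freqs[w] for w in s.split()) for s in sentences]

# Step 3: keep the top-scoring sentence
best = max(range(len(sentences)), key=lambda i: scores[i])
print(sentences[best])  # the dog sat on the mat
```

Because the score is a sum, longer sentences that repeat frequent words tend to rank higher.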

In [1]:
# Import the needed libraries and helper files

import utils
from collections import Counter
import random
import numpy as np

In [2]:
def get_frequencies(words):
    '''
    Calculates the normalized frequency of each word in the document.

    Args:
        words: list of preprocessed words from the original text

    Returns:
        freqs: dictionary with each word's int value as a key and its normalized frequency as a value
        vocab_to_int: dictionary with words as keys and their int values as values
    '''
    # create_lookup_tables comes from the helper file utils
    vocab_to_int, int_to_vocab = utils.create_lookup_tables(words)
    int_words = [vocab_to_int[word] for word in words]

    word_counts = Counter(int_words)

    # Normalize each count by the total number of words
    total_count = len(int_words)
    freqs = {word: count / total_count for word, count in word_counts.items()}
    return freqs, vocab_to_int
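
The helper `utils.create_lookup_tables` is not shown in this notebook. The following is a hypothetical stand-in, assuming the helper maps each distinct word to an integer id and back. Ordering the ids by frequency is also an assumption, and the real helper may differ.

```python
from collections import Counter

def create_lookup_tables(words):
    """Hypothetical stand-in for utils.create_lookup_tables (an assumption, not the original helper)."""
    word_counts = Counter(words)
    # Most frequent words get the smallest ids; ties are broken alphabetically
    sorted_vocab = sorted(word_counts, key=lambda w: (-word_counts[w], w))
    int_to_vocab = {idx: word for idx, word in enumerate(sorted_vocab)}
    vocab_to_int = {word: idx for idx, word in int_to_vocab.items()}
    return vocab_to_int, int_to_vocab

vocab_to_int, int_to_vocab = create_lookup_tables(["waste", "recycling", "waste"])
print(vocab_to_int)  # {'waste': 0, 'recycling': 1}
```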

In [3]:
def preprocess_text(text):
    '''
    Preprocesses text by lowercasing it and removing unneeded punctuation.

    Args:
        text: string containing the document text

    Returns:
        text: lowercased string with punctuation replaced by spaces, except for commas ',' and periods '.'
    '''
    text = text.lower()
    # Replace each unneeded mark (and newlines) with a space
    for mark in ['"', ';', '!', '?', '(', ')', '--', ':', '\n', "'"]:
        text = text.replace(mark, ' ')
    # Collapse runs of whitespace into single spaces
    text = ' '.join(text.split())
    return text

    
def preprocess_to_sentences(text):
    '''
    Converts text to a list of sentences.

    Args:
        text: string containing the document text

    Returns:
        sentences: list of sentences (a sentence is separated from another by a comma ',' or a period '.')
    '''
    text = preprocess_text(text)
    # Treat commas like periods so that both act as sentence boundaries
    sentences = text.replace(',', '.').split('.')
    return sentences


def preprocess_to_words(text):
    '''
    Converts text to a list of words.

    Args:
        text: string containing the document text

    Returns:
        words: list of words in the text
    '''
    text = preprocess_text(text)
    text = text.replace('.', ' ')
    text = text.replace(',', ' ')
    # split() with no argument drops the empty strings that split(' ') would produce
    words = text.split()
    return words


def score(sentence, freqs, vocab_to_int):
    '''
    Assigns a score to a sentence.

    Args:
        sentence: string to score
        freqs: dictionary with each word's int value as a key and its normalized frequency as a value
        vocab_to_int: dictionary with words as keys and their int values as values

    Returns:
        total: the sentence score (the sum of the normalized frequencies of its words)
    '''
    total = 0
    # Split on whitespace so that empty strings are never counted as words
    for word in sentence.split():
        total += freqs[vocab_to_int[word]]
    return total

def summarize(text, reduction_ratio=0.4):
    '''
    Summarizes a text by keeping only a fraction reduction_ratio of its sentences.

    Args:
        text: the original text to summarize
        reduction_ratio: fraction of sentences to keep; if the original text is 100 sentences, keep reduction_ratio*100 sentences in the summary

    Returns:
        summarized: the summarized text
    '''
    sentences = preprocess_to_sentences(text)
    words_arr = preprocess_to_words(text)
    freqs, vocab_to_int = get_frequencies(words_arr)

    # Calculate the score of each sentence
    sentence_scores = np.array([score(sentence, freqs, vocab_to_int) for sentence in sentences])

    # Find the original indices of the sentences with the best scores
    num_sentences_target = int(len(sentences) * reduction_ratio)
    ranked_indices = np.argsort(-sentence_scores, kind='stable')
    best_indices = ranked_indices[:num_sentences_target]

    # Reorder the best sentences according to their order of appearance in the original text
    answer_arr = [sentences[idx] for idx in sorted(best_indices)]

    # Join the selected sentences using '.'
    summarized = ".".join(answer_arr)

    # Return answer
    return summarized
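
The selection and reordering steps in `summarize` can be sketched in isolation. The point to watch is that the stored indices must refer to positions in the original sentence list, so that sorting them restores the order of appearance. The sentences and scores below are made-up values for illustration.

```python
import numpy as np

# Made-up sentences and scores (assumptions for illustration)
sentences = ["s0", "s1", "s2", "s3", "s4", "s5"]
scores = np.array([0.5, 0.1, 0.2, 0.9, 0.4, 0.3])
k = 3

# Indices of the k highest scores, best first
top_indices = np.argsort(-scores, kind="stable")[:k]

# Sorting the indices restores the order of appearance
selected = [sentences[i] for i in sorted(top_indices)]
print(selected)  # ['s0', 's3', 's4']
```

Ranking with `np.argsort` leaves the sentence list untouched, so no index ever shifts during selection.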

### Input block
Input a text file and the ratio of sentences you want to keep.
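
As a rough guide to choosing the ratio, the number of sentences kept is the product of the sentence count and the ratio, truncated to an integer. A small sketch, where `sentences_kept` is a hypothetical helper that mirrors the expression `int(len(sentences)*reduction_ratio)` used in `summarize`:

```python
def sentences_kept(num_sentences, reduction_ratio):
    # Hypothetical helper mirroring int(len(sentences) * reduction_ratio)
    return int(num_sentences * reduction_ratio)

print(sentences_kept(100, 0.4))  # 40
print(sentences_kept(37, 0.1))   # 3
print(sentences_kept(9, 0.1))    # 0
```

Because of the truncation, a short text combined with a small ratio can produce an empty summary.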

In [4]:
with open('text_test') as f:
    text = f.read()

print(text)

Recycling is the process of converting waste materials into new materials and objects. It is an alternative to "conventional" waste disposal that can save material and help lower greenhouse gas emissions. Recycling can prevent the waste of potentially useful materials and reduce the consumption of fresh raw materials, thereby reducing: energy usage, air pollution (from incineration), and water pollution (from landfilling).

Recycling is a key component of modern waste reduction and is the third component of the "Reduce, Reuse, and Recycle" waste hierarchy.[1][2] Thus, recycling aims at environmental sustainability by substituting raw material inputs into and redirecting waste outputs out of the economic system.[3]

There are some ISO standards related to recycling such as ISO 15270:2008 for plastics waste and ISO 14001:2015 for environmental management control of recycling practice.

Recyclable materials include many kinds of glass, paper, cardboard, metal, plastic, tires, textiles, an

In [5]:
reduction_ratio = 0.1
summarized = summarize(text,reduction_ratio)
print(summarized)

 recycling can prevent the waste of potentially useful materials and reduce the consumption of fresh raw materials. recycling is a key component of modern waste reduction and is the third component of the reduce. this is often difficult or too expensive compared with producing the same product from raw materials or other sources . removal and reuse of mercury from thermometers and thermostats 
