In [1]:
from cube.api import Cube

import sys
# Make the local text_rank module (one directory up) importable
sys.path.insert(0, '../')

import text_rank

In [2]:
def extract_sentences(sentences, length, length_pct, clean):
    """Rebuild each tokenized sentence as a string and pass the list to text_rank."""
    # Each sentence is a sequence of token entries; join their surface forms with spaces
    sentence_tokens = [' '.join(entry.word for entry in sentence)
                       for sentence in sentences]

    return text_rank.extract_sentences(sentence_tokens,
                                       summary_length=length,
                                       summary_length_pct=length_pct,
                                       clean_sentences=clean)
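
The token-joining step in `extract_sentences` can be tried in isolation, without loading a Cube model. The sketch below is only an illustration: `Token` is a hypothetical stand-in that models nothing but the `word` attribute used above, and `join_tokens` is a helper name introduced here, not part of `text_rank` or Cube.

```python
from collections import namedtuple

# Hypothetical stand-in for a Cube token entry; only `word` is modelled.
Token = namedtuple("Token", ["word"])


def join_tokens(sentences):
    """Return one space-separated string per tokenized sentence."""
    return [" ".join(entry.word for entry in sentence) for sentence in sentences]


sample = [
    [Token("Moldasig"), Token("is"), Token("an"), Token("insurer"), Token(".")],
    [Token("It"), Token("operates"), Token("in"), Token("Moldova"), Token(".")],
]

sentence_strings = join_tokens(sample)
print(sentence_strings)
```

Because tokens are joined with a single space, punctuation ends up separated from the preceding word (for example `insurer .`), so the strings are not a byte-for-byte copy of the original text.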

In [3]:
def extract_key_phrases(sentences, keywords_length, keywords_length_pct):
    # Flatten all sentences into (lemma, UPOS tag, surface form) triples;
    # the tags themselves come from the Cube pipeline
    tagged = []
    for sentence in sentences:
        for entry in sentence:
            tagged.append((entry.lemma, entry.upos, entry.word))

    return text_rank.extract_key_phrases(tagged,
                                         keywords_length=keywords_length,
                                         keywords_length_pct=keywords_length_pct)
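
The flattening step in `extract_key_phrases` produces one `(lemma, upos, word)` triple per token, across all sentences. A minimal sketch of that shape, assuming a hypothetical `Token` stand-in with just those three attributes (real Cube entries carry more fields) and an illustrative helper name `to_triples`:

```python
from collections import namedtuple

# Hypothetical stand-in for a Cube token entry; only the three attributes
# read by extract_key_phrases are modelled.
Token = namedtuple("Token", ["word", "lemma", "upos"])


def to_triples(sentences):
    """Flatten tokenized sentences into (lemma, upos, word) triples."""
    return [(entry.lemma, entry.upos, entry.word)
            for sentence in sentences
            for entry in sentence]


sample = [
    [Token("companies", "company", "NOUN"), Token("invest", "invest", "VERB")],
    [Token("markets", "market", "NOUN")],
]

triples = to_triples(sample)
print(triples)
```

Sentence boundaries are lost in the flattened list, so whatever consumes the triples sees the document as a single token stream.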

In [4]:
def summarize_tokenize(language, 
                       article='1.txt', 
                       summary_length=100, 
                       summary_length_pct=0.2, 
                       clean=True, 
                       keywords_length=10, 
                       keywords_length_pct=0.1):
    # Read the article as plain UTF-8 text
    with open(f'data/{language}/articles/{article}', encoding='utf-8') as f:
        text = f.read()

    print('Original text')
    print('=====================================')
    print(text)
    print('=====================================')
    
    cube = Cube(verbose=True)
    cube.load(language)
    
    sentences = cube(text)
    
    summary = extract_sentences(sentences, length=summary_length, length_pct=summary_length_pct, clean=clean)
    
    print(f'Text summary in {language}')
    print('=====================================')
    print(summary)
    print('=====================================')
    
    phrases = extract_key_phrases(sentences, keywords_length=keywords_length, keywords_length_pct=keywords_length_pct)
    print(f'Key topics in {language}')
    print('=====================================')
    print(phrases)
    print('=====================================')
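
`summarize_tokenize` expects articles under `data/<language>/articles/<article>`. The sketch below shows one possible way to build and check that path before reading. `read_article` and its `root` parameter are assumptions made for illustration and are not part of the notebook; the demo writes a throwaway file to a temporary directory rather than touching the real `data` folder.

```python
from pathlib import Path
import tempfile


def read_article(language, article="1.txt", root="data"):
    """Read an article stored under <root>/<language>/articles/<article>."""
    path = Path(root) / language / "articles" / article
    if not path.is_file():
        # Fail early with the full path so a wrong language code is easy to spot
        raise FileNotFoundError(f"No article at {path}")
    return path.read_text(encoding="utf-8")


# Demo against a temporary directory that mirrors the expected layout
with tempfile.TemporaryDirectory() as tmp:
    article_dir = Path(tmp) / "ro" / "articles"
    article_dir.mkdir(parents=True)
    (article_dir / "1.txt").write_text("Exemplu de text.", encoding="utf-8")
    demo_text = read_article("ro", root=tmp)

print(demo_text)
```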

In [5]:
summarize_tokenize('ro')

Original text
Cea mai mare companiei de asigurări din Republica Moldova, Moldasig, intenţionează să procure 99,932% din acţiunile societăţii de asigurări ASITO Kapital din România şi să-şi extindă afacerile pe piaţa românească.

"În Moldova nu mai sunt

oportunităţi de investiţii"

Vitalie Bodea, director general Moldasig, a declarat pentru publicaţia Economist că intenţia de a investi pe piaţa românească este determinată şi de faptul, că în Republica Moldova nu prea există la moment oportunităţi de investiţii. "În afară de depozitele bancare, la care au scăzut considerabil ratele dobânzilor, nu avem unde să investim. De aceea am ales să ne diversificăm portofoliul investiţional şi am ales compania ASITO-Capital", a explicat el, fără a da însă detalii vizavi de suma investiţiei. "Deocamdată nu putem vorbi despre investiţie ca despre una finalizată, or trebuie să primim şi avizul Comisiei de Supraveghere a Asigurărilor din România,", a subliniat Bodea. După toate probabilităţile, acesta

In [6]:
summarize_tokenize('en')

Original text
Last week, the supervision authority of the Moldavian insurance market authorized the decrease from 80% to 0% of the share owned by the Russian company ROSGOSSTRAKH in the capital of MOLDASIG, the leader of the Moldavian insurance market.According to the Chisinau press, ROSGOSSTRAKH intends to sell its 80% of MOLDASIG's shares because the Russian group wants to focus on the development in the Russian Federation. According to certain market sources, they are currently negotiating with a number of potential buyers and the decision will be announced shortly.At the end of October 2011, the leader of the Moldavian insurance market extended on the Romanian market, MOLDASIG being authorized by ISC to buy the company ASITO KAPITAL.MOLDASIG was founded in 2002 by three companies with shareholdings mostly owned by the State: The Economy Bank - 51%, The Moldavian Railway - 25% and the Moldavian Postoffice - 24%. In 2008, ROSGOSSTRAKH became the owner of 80% of shareholdings, after a