In [1]:
import sys
# Make the parent directory importable so the local tesufr package is found.
sys.path.append('..')

In [2]:
from tesufr import Processor, TextProcessParams, SummarySize
from tesufr.cores import SummaCore, FallbackCore
from tesufr.cores.em_core import EmCoresWrapper
from tesufr.corpora.providers import BbcNewsProvider, Krapivin2009Provider, LimitedProvider
from tesufr.keysum_evaluator import evaluate_processor_on_corpus
from tesufr.corpora import SetType, CorpusDocument, CorpusPurpose



In [3]:
# One processor per core, so the same text can be run through each and compared.
processor_baseline = Processor([FallbackCore()])
processor_summa = Processor([SummaCore()])
processor_em = Processor([EmCoresWrapper()])

In [4]:
def process_and_report(text, process_params, processor):
    """Process `text` with `processor` and print keywords, named entities and the summary."""
    doc = processor.process_text(text, process_params)
    print('====================================')
    print("Keywords: " + ' | '.join(str(kw) for kw in doc.keywords))
    print()
    print("Named entities:")
    for ne in doc.entities:
        print(f"{ne.lemma} ({ne.subkind})")
    print()
    # The header shows the number of summary sentences.
    print(f"Summary ({len(doc.summary)}):")
    for s in doc.summary:
        print("* " + s.lemma)

In [5]:
# https://www.theguardian.com/us-news/2019/may/02/why-we-are-addicted-to-conspiracy-theories
# Read the article with a context manager so the file handle is closed.
with open('theguardian.txt', 'rt', encoding='utf-8') as f:
    text_en = f.read()
print(text_en[:200])

Why we are addicted to conspiracy theories
Outsiders and the disenfranchised have always embraced the existence of wild plots and cover-ups. But now the biggest conspiracy-mongers are in charge.

By A


In [6]:
# Relative summary size of 0.1; the second argument (10) appears to cap the keyword count.
tpp = TextProcessParams(SummarySize.new_relative(0.1), 10)
process_and_report(text_en, tpp, processor_baseline)

Keywords: 0:'that'(81) | 0:'conspiracy'(35) | 0:'they'(31) | 0:'Jones'(24) | 0:'people'(20) | 0:'have'(19) | 0:'about'(18) | 0:'Trump'(18) | 0:'with'(17) | 0:'group'(16)

Named entities:

Summary (15):
* Why we are addicted to conspiracy theories
* Outsiders and the disenfranchised have always embraced the existence of wild plots and cover-ups.
* But now the biggest conspiracy-mongers are in charge.
* By Anna Merlan
* In January 2015, I spent the longest, queasiest week of my life on a cruise ship filled with conspiracy theorists.
* As our boat rattled toward Mexico and back, I heard about every wild plot, secret plan and dark cover-up imaginable.
* It was mostly fascinating, occasionally exasperating and the cause of a headache that took months to fade.
* To my pleasant surprise, given that I was a reporter travelling among a group of deeply suspicious people, I was accused of working for the CIA only once.
* The unshakeable certainty possessed by many of the conspiracy theorists some
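
The baseline keyword list above is dominated by function words ('that', 'they', 'have', 'about', 'with'), which is what a plain frequency count tends to produce. The following is a minimal sketch of such a count with an optional stopword filter. It is not FallbackCore's implementation; the function name `frequent_words`, the minimum word length, the stopword list and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption, not taken from tesufr).
STOPWORDS = {"that", "they", "have", "about", "with", "this", "from", "were"}

def frequent_words(text, limit=10, min_len=4, stopwords=frozenset()):
    """Return the `limit` most frequent words of at least `min_len` letters."""
    # Runs of letters only; digits, punctuation and underscores split words.
    words = re.findall(r"[^\W\d_]+", text)
    counts = Counter(
        w for w in words
        if len(w) >= min_len and w.lower() not in stopwords
    )
    return counts.most_common(limit)

# Hypothetical sample text, not the Guardian article.
sample = (
    "They say that people have theories about that conspiracy. "
    "People with a conspiracy theory say that they have proof."
)
print(frequent_words(sample, limit=3))
print(frequent_words(sample, limit=3, stopwords=STOPWORDS))
```

Without the filter the top entry of the sample is 'that'; with it, 'conspiracy' moves to the top. Applied to `text_en`, the same filter would likely push content words such as 'conspiracy', 'Jones' and 'Trump' ahead of the function words.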

In [7]:
process_and_report(text_en, tpp, processor_summa)

Keywords: 0:'conspiracy'(0) | 0:'conspiracies'(0) | 0:'health'(0) | 0:'trump'(0) | 0:'group'(0) | 0:'groups'(0) | 0:'jones'(0) | 0:'medical'(0) | 0:'people'(0) | 0:'products'(0) | 0:'product'(0) | 0:'vaccines'(0) | 0:'vaccinate'(0) | 0:'vaccine'(0) | 0:'politics'(0) | 0:'political'(0)

Named entities:

Summary (14):
* Jones also made less adorably kooky claims: that a number of mass shootings and acts of terrorism, such as the 1995 Oklahoma City bombing, were faked by the government; that the CEO of Chobani, the yogurt company, was busy importing “migrant rapists” to work at its Idaho plant; that Hillary Clinton is an actual demon who smells of sulphur, hails from Hell itself and has “personally murdered and chopped up and raped” little children.
* Soon after, the US narrowly elected a conspiracy enthusiast as its president, a man who wrongly believes that vaccines cause autism, that global warming is a hoax perpetuated by the Chinese “in order to make US manufacturing non-competitive,
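
The summa keyword list above contains 16 entries, several of which are inflected forms of the same word ('conspiracy' / 'conspiracies', 'vaccines' / 'vaccinate' / 'vaccine'). The sketch below groups the forms by a shared prefix to make that visible. The prefix heuristic is a crude stand-in for stemming and is an assumption for illustration only; it is not how summa or SummaCore group words, and `group_by_prefix` is a hypothetical helper.

```python
import os

# Keyword forms copied from the summa output above.
forms = ["conspiracy", "conspiracies", "health", "trump", "group", "groups",
         "jones", "medical", "people", "products", "product", "vaccines",
         "vaccinate", "vaccine", "politics", "political"]

def group_by_prefix(words, min_prefix=5):
    """Greedily group words that share at least `min_prefix` leading
    characters with the first member of an existing group."""
    groups = []
    for word in words:
        for group in groups:
            # commonprefix compares the strings character by character.
            shared = os.path.commonprefix([group[0], word])
            if len(shared) >= min_prefix:
                group.append(word)
                break
        else:
            # No existing group matched, so start a new one.
            groups.append([word])
    return groups

groups = group_by_prefix(forms)
for g in groups:
    print(" / ".join(g))
print(len(forms), "forms in", len(groups), "groups")
```

Under this heuristic the 16 forms fall into 10 groups, the same number passed as the second argument of `TextProcessParams`, which suggests (but does not prove) that the core returns 10 keywords expanded into their surface forms.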

In [8]:
process_and_report(text_en, tpp, processor_em)



Keywords: 0:'conspiracy theorist'(8) | 0:'suspicious people'(1) | 0:'human health'(1) | 0:'medical conspiracy thinking'(1) | 0:'reliable persistence'(1)

Named entities:
Anna Merlan (PERSON)
January 2015 (DATE)
Mexico (GPE)
month (DATE)
CIA (ORG)
american (NORP)
Sean David Morton (PERSON)
Twenty sixteen (DATE)
Morton (ORG)
IRS (ORG)
2017 (DATE)
six year (DATE)
about 2016 (CARDINAL)
Morton (PERSON)
US (GPE)
Sandy Hook (PERSON)
Matthew (PERSON)
the Conspira - Sea Cruise (FAC)
Jezebel (PERSON)
July 2016 (DATE)
Cleveland (GPE)
Ohio (GPE)
thousand (CARDINAL)
Donald Trump (PERSON)
republican (NORP)
anti - Hillary Clinton (PERSON)
TRUMP (ORG)
HILLARY (ORG)
SUCKS (PRODUCT)
InfoWars (ORG)
Austin (GPE)
Texas (GPE)
Alex Jones (PERSON)
many year (DATE)
Jones (PERSON)
Barack Obama (PERSON)
Kenya (GPE)
Jim Watson (PERSON)
AFP / Getty (ORG)
1995 (DATE)
Oklahoma City (GPE)
Chobani (NORP)
Idaho (GPE)
Hillary Clinton (PERSON)
Trump (ORG)
one (CARDINAL)
first (ORDINAL)
Jones ’s (PRODUCT)
Skype (ORG)
Trum
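
The entity list above contains the same name under different labels, for example 'Morton' as both ORG and PERSON. A minimal sketch for spotting such conflicts follows. It works on plain `(lemma, subkind)` pairs, here a small subset copied by hand from the output; `conflicting_labels` is a hypothetical helper and not part of tesufr.

```python
from collections import defaultdict

# A few (lemma, subkind) pairs copied from the entity list above.
entities = [
    ("Morton", "ORG"), ("Morton", "PERSON"),
    ("Trump", "ORG"), ("Donald Trump", "PERSON"),
    ("Jones", "PERSON"), ("Alex Jones", "PERSON"),
    ("Chobani", "NORP"), ("Sandy Hook", "PERSON"),
]

def conflicting_labels(pairs):
    """Map each case-folded lemma to its labels, keeping only lemmas
    that were tagged with more than one label."""
    labels = defaultdict(set)
    for lemma, subkind in pairs:
        labels[lemma.casefold()].add(subkind)
    return {lemma: sorted(kinds) for lemma, kinds in labels.items() if len(kinds) > 1}

print(conflicting_labels(entities))
```

In the notebook, the pairs could be built from a processed document as `[(ne.lemma, ne.subkind) for ne in doc.entities]`, using the same attributes that `process_and_report` prints.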

In [9]:
# https://www.cicero.de/innenpolitik/grundgesetz-freiheit-demokratie-meinungsfreiheit-debattenkultur
# Read the German article with a context manager so the file handle is closed.
with open('cicero1.txt', encoding='utf-8') as f:
    text_de = f.read()
print(text_de[:200])

Warum Freiheit, warum Demokratie?
EIN GASTBEITRAG VON OTTO DEPENHEUER am 24. Mai 2019

Zu seinem 70. Geburtstag wollen die Lobeshymnen auf das Grundgesetz nicht enden. Doch damit einher geht auch die 


In [10]:
# Same parameters as for the English text.
tpp = TextProcessParams(SummarySize.new_relative(0.1), 10)
process_and_report(text_de, tpp, processor_em)



Keywords: 0:'grundgesetzlichen Demokratie'(1) | 0:'vorgegeben Wahrheit'(1) | 0:'verfassungsrechtlich Problem'(1) | 0:'Selbstbestimmung'(1) | 0:'Frage'(2) | 0:'Selbstverständnis'(1) | 0:'politisch Grundfragen'(1)

Named entities:
EIN GASTBEITRAG (MISC)
OTTO DEPENHEUER (ORG)
Chemnitz (LOC)
AUTORENINFO (ORG)
Otto Depenheuer (PER)
Universität von Köln (ORG)
kein Verfassung Deutschland (MISC)
Bundesrepublik Deutschland (LOC)
Kraft (LOC)
Zeit (MISC)
Aufstiegs (MISC)
Wohlstand (LOC)
Deutschland (LOC)
GG-Jubiläen (LOC)
“ (LOC)
Folgende (MISC)
verfassungsrechtlich (MISC)
Bibel (MISC)
Problem (LOC)
146 GG (MISC)
erden (LOC)
Rahmendaten (LOC)
deutsch (MISC)
Staat (LOC)
Lobhudelei (LOC)
Glaubt (PER)
Deutsche Bundestag (ORG)
EURO-Rettung (MISC)
CO2-Ausstoss (MISC)
demokratische (MISC)
Befund (LOC)
demokratisch (MISC)
Frankfurter Allgemeine Zeitung (ORG)
“ (PER)
Meinungsspektrum (LOC)
Mitreden (LOC)
Abstimmen (LOC)
Reformation (MISC)
Hegel (LOC)
modern Staat (LOC)
Mehrheitsprinzips (MISC)
Rechtsgese