# TextRank for summarization

In this notebook we implement TextRank to build a summary made of the **key sentences** of a whole text.

## Dependencies

In [None]:
!pip3 install wikipedia

In [None]:
!pip3 install 'txtai[pipeline]'

In [None]:
!pip3 install huggingface_hub==0.24.1

In [None]:
import re

import numpy as np
import scipy.linalg as splinalg

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

import wikipedia

# (Optional) from txtai.pipeline import Translation

Punkt will be our sentence tokenizer (https://www.nltk.org/api/nltk.tokenize.punkt.html).

In [None]:
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("punkt_tab")

In [6]:
# Stemmer
stemmer = PorterStemmer()

# Stopwords
cached_stopwords = stopwords.words('english')
print(cached_stopwords[:10])

['a', 'about', 'above', 'after', 'again', 'against', 'ain', 'all', 'am', 'an']


In [7]:
# List comprehension example
lista = []
for i in range(9):
    lista.append('Hola')

print(lista)

# Another way to build the same list
otra_lista = ['Hola' for i in range(9)]

print(otra_lista)

['Hola', 'Hola', 'Hola', 'Hola', 'Hola', 'Hola', 'Hola', 'Hola', 'Hola']
['Hola', 'Hola', 'Hola', 'Hola', 'Hola', 'Hola', 'Hola', 'Hola', 'Hola']
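The preprocessing cell later in this notebook uses a *nested* list comprehension with a filter (`if ...`). As a sketch of how that shape maps onto plain loops, both versions below build the same list: one inner list per sentence, keeping the lowercased words longer than two characters (the toy sentences are made up for illustration):

```python
# Toy input: a list of already-tokenized sentences (illustrative data)
tokenized = [["The", "oil", "was", "nationalized"], ["It", "is", "a", "holiday"]]

# Loop version: outer loop over sentences, inner loop over words, with a filter
nested_loops = []
for sentence in tokenized:
    kept = []
    for word in sentence:
        if len(word) > 2:
            kept.append(word.lower())
    nested_loops.append(kept)

# Comprehension version: the outer comprehension yields one inner list per sentence
nested_comp = [[word.lower() for word in sentence if len(word) > 2] for sentence in tokenized]

print(nested_comp)  # [['the', 'oil', 'was', 'nationalized'], ['holiday']]
```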


## Loading the data

The data we will use is the text of Wikipedia pages.

We download the text with the [```wikipedia```](https://pypi.org/project/wikipedia/) module, a wrapper around the Wikipedia API. The module queries the English Wikipedia by default (`wikipedia.set_lang` changes this), so the article below comes back in English, which is what the English stopwords and stemmer expect.

We then split the text into sentences, process each sentence, stem every word, and apply TextRank to find the most important sentences of the whole document.

In [8]:
wiki = wikipedia.page('Expropiación del petróleo en México')
book = wiki.content
print(book)

The Mexican oil expropriation (Spanish: expropiación petrolera) was the nationalization of all petroleum reserves, facilities, and foreign oil companies in Mexico on March 18, 1938.  In accordance with Article 27 of the Constitution of 1917, President Lázaro Cárdenas declared that all mineral and oil reserves found within Mexico belong to the nation. The Mexican government established a state-owned petroleum company, Petróleos Mexicanos, or PEMEX.  For a short period, this measure caused an international boycott of Mexican products in the following years, especially by the United States, the United Kingdom, and the Netherlands, but with the outbreak of World War II and the alliance between Mexico and the Allies, the disputes with private companies over compensation were resolved. The anniversary, March 18, is now a Mexican civic holiday.


== Background ==

On August 16, 1935, the Petroleum Workers Union of Mexico (Sindicato de Trabajadores Petroleros de la República Mexicana) was form

## Preprocessing

We split the text into sentences.

In [9]:
sentences = sent_tokenize(book)
print(f"# oraciones: {len(sentences)}")
for sentence in sentences[:3]:
    print(sentence)
    print()
    print("...Fin de la oración...")
    print()


# oraciones: 98
The Mexican oil expropriation (Spanish: expropiación petrolera) was the nationalization of all petroleum reserves, facilities, and foreign oil companies in Mexico on March 18, 1938.

...Fin de la oración...

In accordance with Article 27 of the Constitution of 1917, President Lázaro Cárdenas declared that all mineral and oil reserves found within Mexico belong to the nation.

...Fin de la oración...

The Mexican government established a state-owned petroleum company, Petróleos Mexicanos, or PEMEX.

...Fin de la oración...



We lowercase each word, drop stopwords and words of two characters or fewer, strip non-letter characters, and stem what remains.

In [10]:
# Note: the stopword and length filters run on the raw token, before lowercasing
sent_low = [[stemmer.stem(re.sub('[^a-z]', "", word.lower())) for word in word_tokenize(sentence)
             if word not in cached_stopwords and len(word) > 2] for sentence in sentences]
sent_low[0]

['the',
 'mexican',
 'oil',
 'expropri',
 'spanish',
 'expropiacin',
 'petrolera',
 'nation',
 'petroleum',
 'reserv',
 'facil',
 'foreign',
 'oil',
 'compani',
 'mexico',
 'march',
 '']
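The output above shows a few quirks of this one-liner: `'the'` survives because the stopword check runs on the original token (`'The'`) before lowercasing; the empty string `''` comes from `1938`, which passes the length check and then loses all its digits to `re.sub('[^a-z]', ...)`; and accented letters are dropped too (`'expropiacin'`). A minimal sketch of a variant that lowercases and cleans first, then filters. The function name `preprocess` and its parameters are hypothetical, and `stopwords` is expected to be a set, since `word in some_list` scans the whole list on every check:

```python
import re

def preprocess(tokens, stopwords, stem=lambda w: w, min_len=3):
    """Clean one tokenized sentence: lowercase first, strip non-letters,
    drop empty/short tokens and stopwords, then stem what is left.
    `stopwords` should be a set for fast membership checks."""
    cleaned = []
    for word in tokens:
        word = re.sub(r"[^a-z]", "", word.lower())  # lowercase BEFORE filtering
        if len(word) < min_len or word in stopwords:  # also drops '' left by numbers/punctuation
            continue
        cleaned.append(stem(word))
    return cleaned

print(preprocess(["The", "Mexican", "oil", "1938", "."], {"the", "a"}))  # ['mexican', 'oil']
```

In this notebook it could be used as `stop_set = set(cached_stopwords)` followed by `sent_low = [preprocess(word_tokenize(s), stop_set, stemmer.stem) for s in sentences]`. The similarity matrix, and therefore the rankings shown further down, would then differ from the ones printed here.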

## TextRank

We build the adjacency/similarity matrix A between sentences, using the number of words two sentences share as their similarity.

In [11]:
A = np.zeros((len(sent_low), len(sent_low)))

for i in range(len(sentences)):
    # Compare each pair of sentences once, never a sentence with itself
    for j in range(i+1, len(sentences)):
        # Similarity = number of words of sentence i that also appear in sentence j
        A[i][j] = A[j][i] = len([x for x in sent_low[i] if x in sent_low[j]])
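This count has two side effects: it counts repeated tokens of sentence `i` (in the shared-word list printed below for sentences 0 and 1, `'oil'` contributes twice), so it is not symmetric in general, and it favors long sentences, which simply have more words to share. The original TextRank paper (Mihalcea and Tarau, 2004) instead divides the number of *distinct* shared words by $\log |S_i| + \log |S_j|$. A sketch of that variant as a drop-in replacement; the function names are hypothetical, and returning 0 when the denominator vanishes is an assumption of this sketch:

```python
import math
import numpy as np

def textrank_similarity(tokens_a, tokens_b):
    """Overlap similarity from the TextRank paper: distinct shared words
    divided by log(len a) + log(len b). Returns 0.0 when undefined."""
    if not tokens_a or not tokens_b:
        return 0.0
    denom = math.log(len(tokens_a)) + math.log(len(tokens_b))
    if denom <= 0:  # both sentences have a single token: log(1) + log(1) == 0
        return 0.0
    return len(set(tokens_a) & set(tokens_b)) / denom

def similarity_matrix(token_lists, sim=textrank_similarity):
    """Symmetric similarity matrix with a zero diagonal."""
    n = len(token_lists)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            A[i, j] = A[j, i] = sim(token_lists[i], token_lists[j])
    return A
```

With the notebook's data this would be `A = similarity_matrix(sent_low)`; the normalization and iteration steps stay unchanged.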

This is what a fragment of matrix A looks like.

In [12]:
A[:5, :5]

array([[0., 6., 4., 3., 3.],
       [6., 0., 0., 1., 0.],
       [4., 0., 0., 2., 2.],
       [3., 1., 2., 0., 1.],
       [3., 0., 2., 1., 0.]])

In [13]:
[x for x in sent_low[0] if x in sent_low[1]]

['oil', 'nation', 'reserv', 'oil', 'mexico', '']

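We normalize the rows of A so that each one sums to 1. Since A is symmetric, its column sums equal its row sums, which is why the code can use `np.sum(A, axis=0)`.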

In [14]:
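suma = np.sum(A, axis=0) # Column sums (equal to the row sums, since A is symmetric)
print(f"Así se ve la suma:\n{suma}\n")
A_norm = A.copy()

# Divide each row by its sum; rows of isolated sentences (sum 0) stay all zeros
for i in range(len(sentences)):
    if suma[i] != 0:
        A_norm[i,:] = A[i,:]/suma[i]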

print(f"Así se ve (una parte de) la matriz normalizada:\n{A_norm[:5, :5]}")

Así se ve la suma:
[246. 134. 109. 114.  46. 189. 143. 130.  15.  61.  66.  86.  96. 148.
  87. 126. 126. 143. 166.  35.  33.  67. 137. 163. 109. 167.  64.  73.
  56.  84. 118. 124.  89.  28.  96.  64. 118. 107. 138.  41. 144.  52.
 178. 110. 124.   3.  36. 150. 260.  10.  62.  94.   1.   1. 142.  13.
 200. 109. 148.  52. 133. 121. 175. 138.  73.  24.  84.  19.  68.  70.
 152.  86.  88.  49.  68.  32.  19.  69. 118. 114.  43.   2.  62.  46.
   0.  92. 118.  46. 145. 138.  43.   0. 133.  71.  38.   0.  97.  47.]

Así se ve (una parte de) la matriz normalizada:
[[0.         0.02439024 0.01626016 0.01219512 0.01219512]
 [0.04477612 0.         0.         0.00746269 0.        ]
 [0.03669725 0.         0.         0.01834862 0.01834862]
 [0.02631579 0.00877193 0.01754386 0.         0.00877193]
 [0.06521739 0.         0.04347826 0.02173913 0.        ]]


In [15]:
print(A_norm[0,:].sum())

0.9999999999999999
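The normalization loop can also be written as a single vectorized division. A sketch using `np.divide` with `where=`, which leaves the all-zero rows of isolated sentences (the `0.` entries in the sums above) at zero instead of dividing by zero; `row_normalize` and the toy matrix are illustrative:

```python
import numpy as np

def row_normalize(A):
    """Divide each row of A by its sum; rows that sum to 0 stay all zeros."""
    row_sums = A.sum(axis=1, keepdims=True)
    return np.divide(A, row_sums, out=np.zeros_like(A, dtype=float), where=row_sums != 0)

# Toy symmetric similarity matrix; the last sentence shares no words with any other
A_toy = np.array([[0., 2., 1., 0.],
                  [2., 0., 3., 0.],
                  [1., 3., 0., 0.],
                  [0., 0., 0., 0.]])
print(row_normalize(A_toy).sum(axis=1))  # [1. 1. 1. 0.]
```

Because `A` is symmetric, `row_normalize(A)` gives the same matrix as the loop above.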


We start the TextRank vector as all ones and iterate until it converges, that is, until we obtain $\Pi$ such that $$\Pi = A_{\text{norm}}^\top \Pi$$

In [16]:
# Prettier printing: use fixed-point notation instead of scientific notation
np.set_printoptions(suppress=True)

In [17]:
# Relative tolerance used to decide that two successive vectors are equal
tol = 1e-7

PI_ = np.ones(A_norm.shape[1])
A_norm_a = A_norm.T.copy()

i = 0
while True:
    pi_ = A_norm_a @ PI_
    print(f"[{i}]: Diferencia entre vectores: {abs(PI_- pi_).sum()}")
    if np.allclose(PI_, pi_, rtol=tol):
        break
    i += 1
    PI_ = pi_

[0]: Diferencia entre vectores: 47.21349371433569
[1]: Diferencia entre vectores: 10.539870447183898
[2]: Diferencia entre vectores: 1.613475191468893
[3]: Diferencia entre vectores: 0.4671999514884008
[4]: Diferencia entre vectores: 0.15898675563833087
[5]: Diferencia entre vectores: 0.06690911623521784
[6]: Diferencia entre vectores: 0.02985662112275554
[7]: Diferencia entre vectores: 0.013918377700827758
[8]: Diferencia entre vectores: 0.0065056004515607925
[9]: Diferencia entre vectores: 0.0030588092371227373
[10]: Diferencia entre vectores: 0.0014412999863831922
[11]: Diferencia entre vectores: 0.0006788868476072397
[12]: Diferencia entre vectores: 0.00031952635971416003
[13]: Diferencia entre vectores: 0.00015029971830221242
[14]: Diferencia entre vectores: 7.06622384748657e-05
[15]: Diferencia entre vectores: 3.320783597375587e-05
[16]: Diferencia entre vectores: 1.5600962630995044e-05
[17]: Diferencia entre vectores: 7.327399060242865e-06
[18]: Diferencia entre vectores: 3.4407
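This is plain power iteration with no damping factor and no iteration cap. On a bipartite graph (for example a star, where one sentence shares words with the others but those share nothing among themselves) the undamped update oscillates and `while True` never ends. The TextRank paper, following PageRank, uses a damping factor $d = 0.85$ in $WS(V_i) = (1 - d) + d \sum_j \frac{w_{ji}}{\sum_k w_{jk}} WS(V_j)$, which in matrix form reads $\Pi = (1 - d)\,\mathbf{1} + d\,A_{\text{norm}}^\top \Pi$. A minimal sketch, assuming `A_norm` is row-normalized as above; `damped_textrank` and its `max_iter` safeguard are hypothetical additions, not part of the original code:

```python
import numpy as np

def damped_textrank(A_norm, d=0.85, tol=1e-7, max_iter=1000):
    """Iterate Pi = (1 - d) + d * A_norm.T @ Pi until successive vectors agree."""
    pi = np.ones(A_norm.shape[0])
    for _ in range(max_iter):
        new_pi = (1 - d) + d * (A_norm.T @ pi)
        if np.allclose(pi, new_pi, rtol=tol):
            return new_pi
        pi = new_pi
    return pi  # did not converge within max_iter; returns the last iterate

# Star graph: sentence 0 overlaps with 1 and 2, which share nothing with each other
A_star = np.array([[0.0, 0.5, 0.5],
                   [1.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0]])  # already row-normalized
print(damped_textrank(A_star))  # roughly [1.4595, 0.7703, 0.7703]
```

On the notebook's data this would be `damped_textrank(A_norm)`; the damped scores are no longer a pure function of each sentence's total similarity, so the ranking may shift slightly.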

In [18]:
pi_

array([2.64008135, 1.43809311, 1.16979213, 1.22345233, 0.49367374,
       2.02835515, 1.5346814 , 1.39516491, 0.16098056, 0.65465431,
       0.70831448, 0.92295524, 1.03027562, 1.58834158, 0.93368728,
       1.35223677, 1.35223677, 1.53468143, 1.78151831, 0.37562134,
       0.35415725, 0.71904652, 1.47028917, 1.74932217, 1.16979211,
       1.79225035, 0.68685041, 0.78343875, 0.60099414, 0.90149119,
       1.26638045, 1.33077269, 0.95515135, 0.30049706, 1.03027562,
       0.68685041, 1.26638049, 1.14832806, 1.48102119, 0.44001354,
       1.54541343, 0.55806596, 1.91030273, 1.18052416, 1.3307727 ,
       0.03219611, 0.38635337, 1.6098057 , 2.7903299 , 0.10732037,
       0.66538637, 1.00881157, 0.01073204, 0.01073204, 1.52394942,
       0.1395165 , 2.14640762, 1.16979215, 1.58834162, 0.55806597,
       1.42736105, 1.29857661, 1.87810667, 1.48102124, 0.7834388 ,
       0.25756892, 0.9014912 , 0.20390872, 0.72977858, 0.75124266,
       1.63126976, 0.92295527, 0.94441936, 0.52586987, 0.72977
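Because `A` is symmetric, the random walk defined by `A_norm` is reversible, and its stationary vector is proportional to the row sums of `A`, that is, each sentence's weighted degree $d_i = \sum_k A_{ik}$: with $S = \sum_i d_i$, we get $\sum_i \frac{d_i}{S} \cdot \frac{A_{ij}}{d_i} = \frac{d_j}{S}$. The values above agree with the sums printed earlier: `pi_[0] / suma[0]` = 2.64008135 / 246 ≈ 0.010732, and `pi_[52] / suma[52]` = 0.01073204 / 1. So, with this symmetric similarity and no damping, the TextRank ranking coincides with sorting the sentences by `suma`. A quick check of that identity on a small random symmetric matrix (illustrative data, not the notebook's), assuming the graph is connected and not bipartite so that the power iteration converges:

```python
import numpy as np

rng = np.random.default_rng(0)
n_demo = 6
upper = np.triu(rng.random((n_demo, n_demo)), 1)
A_demo = upper + upper.T               # symmetric, zero diagonal, positive off-diagonal weights
d_demo = A_demo.sum(axis=1)            # weighted degree of each node
P_demo = A_demo / d_demo[:, None]      # row-normalized (no zero rows here)

pi_demo = np.ones(n_demo)
for _ in range(1000):                  # undamped power iteration, as in the notebook
    new_pi = P_demo.T @ pi_demo
    if np.allclose(pi_demo, new_pi, rtol=1e-12, atol=1e-12):
        break
    pi_demo = new_pi

print(np.allclose(pi_demo / pi_demo.sum(), d_demo / d_demo.sum()))  # True
```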

Alternatively, we can compute the left eigenvectors of `A_norm`.

Up to a scaling factor, the TextRank values are the stationary probability vector of the Markov chain whose transition matrix is `A_norm`, which is the left eigenvector associated with eigenvalue 1 (here $\Pi$ is a row vector; transposing gives the equation used in the power iteration):

$$\Pi = \Pi A_{\text{norm}}$$

In [38]:
eigvals, vecs = splinalg.eig(A_norm, left=True, right=False)

In [39]:
eigvals

array([ 1.        +0.j,  0.46923797+0.j,  0.36349296+0.j,  0.28758612+0.j,
        0.23006772+0.j,  0.19959846+0.j,  0.19227596+0.j,  0.17261886+0.j,
       -0.18571196+0.j,  0.15143536+0.j,  0.13074242+0.j,  0.12514176+0.j,
        0.11515043+0.j,  0.09087163+0.j,  0.09162308+0.j,  0.0778439 +0.j,
       -0.1545883 +0.j,  0.07360009+0.j,  0.06765788+0.j,  0.06224883+0.j,
        0.0543754 +0.j,  0.05205869+0.j, -0.13346946+0.j, -0.12833883+0.j,
        0.04994791+0.j,  0.04274986+0.j, -0.12382442+0.j, -0.12079234+0.j,
       -0.1188686 +0.j,  0.03233559+0.j,  0.02581788+0.j,  0.02114334+0.j,
       -0.1147782 +0.j, -0.1105213 +0.j, -0.11081125+0.j, -0.105142  +0.j,
       -0.10199103+0.j,  0.01703464+0.j, -0.09836112+0.j, -0.10869565+0.j,
        0.01331782+0.j,  0.01128001+0.j,  0.00798068+0.j, -0.09457616+0.j,
       -0.09194162+0.j, -0.09129427+0.j,  0.00285769+0.j, -0.08641183+0.j,
       -0.00292286+0.j, -0.00520691+0.j, -0.08466546+0.j, -0.08333362+0.j,
       -0.08242183+0.j, -

In [40]:
vecs.shape

(98, 98)

In [41]:
vecs

array([[ 0.23471851,  0.06378921, -0.12953925, ...,  0.        ,
         0.        ,  0.        ],
       [ 0.1278548 ,  0.12897131, -0.02955997, ...,  0.        ,
         0.        ,  0.        ],
       [ 0.10400129, -0.0603099 , -0.06779721, ...,  0.        ,
         0.        ,  0.        ],
       ...,
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  1.        ],
       [ 0.09255161, -0.02638722, -0.09339139, ...,  0.        ,
         0.        ,  0.        ],
       [ 0.04484459,  0.14697302,  0.19671471, ...,  0.        ,
         0.        ,  0.        ]])

In [42]:
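# scipy does not sort eigenvalues: take the column whose eigenvalue is closest to 1
idx = np.argmin(np.abs(eigvals - 1))
# An eigenvector is only defined up to sign and scale: keep the real part, non-negative
pi2_ = np.abs(np.real(vecs[:, idx]))
pi2_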

array([0.23471851, 0.1278548 , 0.10400129, 0.10877199, 0.04389045,
       0.18033251, 0.13644206, 0.12403824, 0.0143121 , 0.05820256,
       0.06297326, 0.08205606, 0.09159747, 0.14121276, 0.0830102 ,
       0.12022167, 0.12022167, 0.13644206, 0.15838728, 0.03339491,
       0.03148663, 0.0639274 , 0.13071722, 0.15552486, 0.10400129,
       0.15934143, 0.06106498, 0.06965224, 0.05343186, 0.08014778,
       0.11258855, 0.11831339, 0.08491848, 0.02671593, 0.09159747,
       0.06106498, 0.11258855, 0.10209301, 0.13167136, 0.03911975,
       0.1373962 , 0.04961529, 0.16983697, 0.10495543, 0.11831339,
       0.00286242, 0.03434905, 0.14312104, 0.24807647, 0.0095414 ,
       0.0591567 , 0.08968919, 0.00095414, 0.00095414, 0.13548792,
       0.01240382, 0.19082805, 0.10400129, 0.14121276, 0.04961529,
       0.12690066, 0.11545097, 0.16697455, 0.13167136, 0.06965224,
       0.02289937, 0.08014778, 0.01812867, 0.06488154, 0.06678982,
       0.14502932, 0.08205606, 0.08396434, 0.04675287, 0.06488

We take the indices of the k largest values in $\Pi$ and use them to retrieve the most relevant sentences.

In [None]:
k = 4
pi_.argsort()[-k:][::-1]
# Step by step:
# argsort() returns the indices that would sort the array in ascending order,
# [-k:] keeps the last k (the k largest values),
# [::-1] reverses them so the highest score comes first

array([48,  0, 56,  5])
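A toy run of the same indexing chain, plus one optional extra step: sorting the selected indices restores the sentences' original order, which often makes an extractive summary read more naturally. That reordering is a presentation choice, not part of TextRank, and the scores below are made up:

```python
import numpy as np

scores = np.array([0.1, 0.5, 0.3, 0.9])
k_demo = 2
ascending = scores.argsort()            # [0, 2, 1, 3]: indices that sort scores ascending
top_k = ascending[-k_demo:][::-1]       # [3, 1]: last k indices, highest score first
in_doc_order = np.sort(top_k)           # [1, 3]: same sentences, original document order
print(top_k, in_doc_order)
```

Applied to this notebook, that would be `[sentences[idx] for idx in np.sort(pi_.argsort()[-k:])]`.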

In [48]:
k = 4
pi2_.argsort()[-k:][::-1]

array([48,  0, 56,  5])

In [49]:
summary = [sentences[idx] for idx in pi_.argsort()[-k:][::-1]]

In [50]:
for bullet in summary:
    print('___________')
    print(bullet)

___________
== Oil Expropriation Day, March 18, 1938 ==
On March 18, 1938, President Cárdenas embarked on the expropriation of all oil resources and facilities by the state, nationalizing the U.S. and Anglo-Dutch (Mexican Eagle Petroleum Company) operating companies.
___________
The Mexican oil expropriation (Spanish: expropiación petrolera) was the nationalization of all petroleum reserves, facilities, and foreign oil companies in Mexico on March 18, 1938.
___________
== Opposition ==


=== International ===
In retaliation, the oil companies initiated a public relations campaign against Mexico, urging people to stop buying Mexican goods and lobbying to embargo U.S. technology to Mexico.
___________
== Background ==

On August 16, 1935, the Petroleum Workers Union of Mexico (Sindicato de Trabajadores Petroleros de la República Mexicana) was formed and one of the first actions was the writing of a lengthy draft contract transmitted to the petroleum companies demanding a 40-hour working 
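Three of the four selected "sentences" start with Wikipedia section headings such as `== Background ==`: `wiki.content` keeps the headings as plain lines, and `sent_tokenize` glues them onto the following sentence. A minimal sketch that drops them before tokenizing, assuming every heading sits on its own line wrapped in matching runs of `=`; `strip_wiki_headings` is a hypothetical helper:

```python
import re

# A line that is only a heading, e.g. "== Background ==" or "=== International ==="
HEADING_RE = re.compile(r"^(=+)[^=\n]*\1[ \t]*$", re.MULTILINE)

def strip_wiki_headings(text):
    """Remove heading-only lines and collapse the blank lines they leave behind."""
    without_headings = HEADING_RE.sub("", text)
    return re.sub(r"\n{3,}", "\n\n", without_headings)

sample = ("Intro sentence.\n\n\n== Background ==\n\nOn August 16 it formed.\n\n"
          "=== International ===\nIn retaliation, companies reacted.")
print(strip_wiki_headings(sample))
```

In the notebook it would go right after the download, e.g. `book = strip_wiki_headings(wiki.content)`; the selected sentences would then differ from the ones shown above.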

## A summarization function

We can wrap all of the above in a function that takes a text and returns its most relevant sentences according to TextRank.

In [51]:
def summary(text, k, tol = 1e-5, eig = False):
    print("Paso 1. Obteniendo oraciones")
    sentences = sent_tokenize(text)

    print(f"# oraciones: {len(sentences)}")

    print("Paso 2. Procesando texto")
    sent_low = [[stemmer.stem(re.sub('[^a-z]', "", word.lower())) for word in word_tokenize(sentence) if word not in cached_stopwords and len(word) > 2] for sentence in sentences]

    print("Paso 3. Creando matriz de similitud")
    A = np.zeros((len(sent_low), len(sent_low)))

    for i in range(len(sentences)):
        for j in range(i+1, len(sentences)):
            # Similarity = number of words of sentence i that also appear in sentence j
            A[i][j] = A[j][i] = len([x for x in sent_low[i] if x in sent_low[j]])

    print("Paso 4. Normalizando matriz de similitud")
    # Row-normalize A (symmetric, so column sums equal row sums); all-zero rows stay zero
    suma = np.sum(A, axis=0)
    A_norm = A.copy()
    for i in range(len(sentences)):
        if suma[i] != 0:
            A_norm[i,:] = A[i,:]/suma[i]

    print("Paso 5. Ejecutando TextRank")
    if eig:
        vals, vecs = splinalg.eig(A_norm, left=True, right=False)
        # Left eigenvector for the eigenvalue closest to 1 (eig does not sort eigenvalues)
        pi_ = np.abs(np.real(vecs[:, np.argmin(np.abs(vals - 1))]))
    else:
        A_norm_a = A_norm.T.copy()
        PI_ = np.ones(A_norm.shape[1])

        while True:
            pi_ = A_norm_a @ PI_
            if np.allclose(PI_, pi_, rtol=tol):
                break
            PI_ = pi_

    print("\tPaso 5. Terminado")

    # Sentences with the k highest scores, highest first
    return [sentences[idx] for idx in pi_.argsort()[-k:][::-1]]

def print_bullet_points(bullet_points):
    for point in bullet_points:
        print(f"- {point}\n")


In [52]:
wiki = wikipedia.page('Automatic Summarization')
text = wiki.content
bullet_points = summary(text, 5, eig = False)

Paso 1. Obteniendo oraciones
# oraciones: 329
Paso 2. Procesando texto
Paso 3. Creando matriz de similitud
Paso 4. Normalizando matriz de similitud
Paso 5. Ejecutando TextRank
	Paso 5. Terminado


In [53]:
print_bullet_points(bullet_points)

- ==== Maximum entropy-based summarization ====
During the DUC 2001 and 2002 evaluation workshops, TNO developed a sentence extraction system for multi-document summarization in the news domain.

- === Document summarization ===
Like keyphrase extraction, document summarization aims to identify the essence of a text.

- ==== TextRank and LexRank ====
The unsupervised approach to summarization is also quite similar in spirit to unsupervised keyphrase extraction and gets around the issue of costly training data.

- The main difficulty in supervised extractive summarization is that the known summaries must be manually created by extracting sentences so the sentences in an original training document can be labeled as "in summary" or "not in summary".

- === Submodular functions as generic tools for summarization ===
The idea of a submodular set function has recently emerged as a powerful modeling tool for various summarization problems.



In [54]:
# Frankenstein!
!wget https://www.gutenberg.org/files/84/84-0.txt -O book.txt

--2025-05-01 18:46:37--  https://www.gutenberg.org/files/84/84-0.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 428995 (419K) [text/plain]
Saving to: ‘book.txt’


2025-05-01 18:46:37 (1.30 MB/s) - ‘book.txt’ saved [428995/428995]



In [55]:
with open("book.txt", encoding="utf-8") as f:
    book_raw = f.read()

print(book_raw[0:1000])

*** START OF THE PROJECT GUTENBERG EBOOK 84 ***

Frankenstein;

or, the Modern Prometheus

by Mary Wollstonecraft (Godwin) Shelley


 CONTENTS

 Letter 1
 Letter 2
 Letter 3
 Letter 4
 Chapter 1
 Chapter 2
 Chapter 3
 Chapter 4
 Chapter 5
 Chapter 6
 Chapter 7
 Chapter 8
 Chapter 9
 Chapter 10
 Chapter 11
 Chapter 12
 Chapter 13
 Chapter 14
 Chapter 15
 Chapter 16
 Chapter 17
 Chapter 18
 Chapter 19
 Chapter 20
 Chapter 21
 Chapter 22
 Chapter 23
 Chapter 24




Letter 1

_To Mrs. Saville, England._


St. Petersburgh, Dec. 11th, 17—.


You will rejoice to hear that no disaster has accompanied the
commencement of an enterprise which you have regarded with such evil
forebodings. I arrived here yesterday, and my first task is to assure
my dear sister of my welfare and increasing confidence in the success
of my undertaking.

I am already far north of London, and as I walk in the streets of
Petersburgh, I feel a cold northern breeze play upon my cheeks, which
braces my nerves and fills me w

In [66]:
# rfind() returns the index of the LAST occurrence; the earlier occurrences of
# "Chapter 1\n" and "Chapter 2\n" belong to the table of contents
start = book_raw.rfind("Chapter 1\n")
end = book_raw.rfind('Chapter 2\n')
start, end

(31585, 41744)

In [67]:
# Skip the heading itself; its length must match the marker used to find `start`
chapter_n = book_raw[start + len("Chapter 1\n"): end]
chapter_n

'\n\nI am by birth a Genevese, and my family is one of the most\ndistinguished of that republic. My ancestors had been for many years\ncounsellors and syndics, and my father had filled several public\nsituations with honour and reputation. He was respected by all who\nknew him for his integrity and indefatigable attention to public\nbusiness. He passed his younger days perpetually occupied by the\naffairs of his country; a variety of circumstances had prevented his\nmarrying early, nor was it until the decline of life that he became a\nhusband and the father of a family.\n\nAs the circumstances of his marriage illustrate his character, I cannot\nrefrain from relating them. One of his most intimate friends was a\nmerchant who, from a flourishing state, fell, through numerous\nmischances, into poverty. This man, whose name was Beaufort, was of a\nproud and unbending disposition and could not bear to live in poverty\nand oblivion in the same country where he had formerly been\ndistinguish
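The cells above hardcode the markers for Chapter 1 and Chapter 2. A hypothetical helper `get_chapter` generalizes the same idea to any chapter number: `rfind` skips the table of contents, and `find` from that point locates the next heading. Because the markers include the trailing newline, `"Chapter 1\n"` does not match `"Chapter 10\n"`. For the last chapter there is no next marker, so it returns everything to the end of the file, which for this Gutenberg text would likely include the license trailer:

```python
def get_chapter(book_text, n):
    """Return the text of chapter n, assuming headings look like 'Chapter n' on their own line."""
    start_marker = f"Chapter {n}\n"
    start = book_text.rfind(start_marker)  # last occurrence: the real heading, not the TOC entry
    if start == -1:
        raise ValueError(f"Chapter {n} not found")
    start += len(start_marker)
    end = book_text.find(f"Chapter {n + 1}\n", start)  # next heading after this chapter
    return book_text[start:] if end == -1 else book_text[start:end]

toy_book = "Chapter 1\nChapter 2\n\nChapter 1\nAAA\nChapter 2\nBBB\n"
print(repr(get_chapter(toy_book, 1)), repr(get_chapter(toy_book, 2)))
```

With the downloaded book, `get_chapter(book_raw, 1)` should match `chapter_n` above.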

In [68]:
bullet_points = summary(chapter_n, 5, eig = False)

Paso 1. Obteniendo oraciones
# oraciones: 74
Paso 2. Procesando texto
Paso 3. Creando matriz de similitud
Paso 4. Normalizando matriz de similitud
Paso 5. Ejecutando TextRank
	Paso 5. Terminado


In [69]:
print_bullet_points(bullet_points)

- Her father grew worse; her time
was more entirely occupied in attending him; her means of subsistence
decreased; and in the tenth month her father died in her arms, leaving
her an orphan and a beggar.

- One day, when my father had gone by himself to Milan, my mother,
accompanied by me, visited this abode.

- When my father returned from Milan, he found playing with me in the hall of
our villa a child fairer than pictured cherub—a creature who seemed
to shed radiance from her looks and whose form and motions were lighter
than the chamois of the hills.

- The father of their
charge was one of those Italians nursed in the memory of the antique glory
of Italy—one among the _schiavi ognor frementi,_ who exerted
himself to obtain the liberty of his country.

- He passed his younger days perpetually occupied by the
affairs of his country; a variety of circumstances had prevented his
marrying early, nor was it until the decline of life that he became a
husband and the father of a family.



## Exercises

### Change the sentence similarity

For the similarity between sentences we used the number of words that appear in both. **Replace it with cosine similarity** and compare the results.

A very good first approach would be to use Latent Semantic Analysis, treating each sentence as a document, and compute the cosine similarity between all of them.

If you have a DataFrame with the columns ```[id_documento_1, id_documento_2, similitud]```, the [```pandas.DataFrame.pivot```](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html) function can help build the similarity matrix; it takes the arguments `index`, `columns`, and `values`.
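As one possible starting point (plain bag-of-words counts, not LSA), cosine similarity can be computed directly from a sentence-by-word count matrix, without building a DataFrame. A sketch that takes token lists such as `sent_low` and returns a matrix with the same shape and zero diagonal as `A`; `cosine_similarity_matrix` is a hypothetical name:

```python
import numpy as np

def cosine_similarity_matrix(token_lists):
    """Cosine similarity between bag-of-words count vectors, with a zero diagonal."""
    vocab = {w: idx for idx, w in enumerate(sorted({w for toks in token_lists for w in toks}))}
    counts = np.zeros((len(token_lists), len(vocab)))
    for row, toks in enumerate(token_lists):
        for w in toks:
            counts[row, vocab[w]] += 1
    norms = np.linalg.norm(counts, axis=1)
    norms[norms == 0] = 1.0            # empty sentences keep a zero vector instead of dividing by 0
    unit = counts / norms[:, None]
    S = unit @ unit.T
    np.fill_diagonal(S, 0.0)           # no self-loops, like A in this notebook
    return S
```

For an LSA version, the count rows would be replaced by their projection onto the top singular vectors before taking cosines.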




### Sentences vs. words

In this notebook we used sentences to build the summary; had we used words instead, we would obtain the keywords of the text.

Try implementing TextRank over words. For the similarity (or adjacency) matrix, you can link words that are consecutive, or define a window of k consecutive words within each sentence (similar to skip-gram) and link all the words inside it. In this case A has the size of the vocabulary (the list of unique words) and holds a 1 wherever two words are linked.

Another alternative is to use a word embedding (e.g. word2vec) and fill A with the cosine similarity between the vectors of each pair of words.

After that, everything else stays the same.
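A minimal sketch of the co-occurrence variant, assuming each sentence is already a list of cleaned tokens (for example the lists in `sent_low`): every pair of distinct words that appear within `window` positions of each other in the same sentence gets linked with a 1. The function name and the `window` convention are assumptions of this sketch; the resulting matrix can then go through the same normalization and iteration steps:

```python
import numpy as np

def cooccurrence_matrix(token_lists, window=2):
    """Binary word-adjacency matrix: words are linked if they co-occur within `window` tokens."""
    vocab = sorted({w for toks in token_lists for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(vocab)))
    for toks in token_lists:
        for pos, w in enumerate(toks):
            # look ahead at most window - 1 tokens; window=2 links only consecutive words
            for other in toks[pos + 1 : pos + window]:
                if other != w:
                    A[index[w], index[other]] = A[index[other], index[w]] = 1.0
    return vocab, A
```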

### Language

This example is built for English text because of the stopwords and the stemmer (PorterStemmer) it uses. Make the changes needed so that it accepts Spanish text.

That is, change the stopwords (nltk includes Spanish stopwords) and the stemmer (hint: ```nltk.stem``` provides more stemmers, and one of them implements an algorithm for Spanish).

### Summarizing a topic

Here we applied TextRank to a single document. We could instead build a corpus of documents on the same topic (e.g. news about the blackout in Europe) and apply it to extract the key points of the whole corpus.

The current implementation needs no changes: just concatenate the whole corpus into a single string.

Exercise: build a corpus of 4 articles on a topic of your interest, concatenate them, and pass the result to the ```summary``` function.

## On computing the PageRank values

https://nlp.stanford.edu/IR-book/html/htmledition/the-pagerank-computation-1.html

https://nlp.stanford.edu/IR-book/html/htmledition/markov-chains-1.html