## Text Summarization

Text summarization matters in machine learning and natural language processing (NLP) for several reasons:

1. **Information Retrieval:** Text summarization helps users quickly grasp the main points of a large document, making it easier to decide whether to read it in full. This is particularly valuable when readers are inundated with textual data such as news articles, research papers, or social media posts.

2. **Time Efficiency:** Summarization algorithms can process and generate summaries much faster than humans can read and summarize large texts. This saves time and allows users to focus their attention on the most relevant content.

3. **Content Extraction:** Text summarization can automatically extract essential information from a document, enabling applications like content recommendation, keyword extraction, and topic modeling.

4. **Content Generation:** Summarization models can be used to generate concise, coherent, and informative summaries for various purposes, such as creating abstracts for research papers, news article headlines, or social media post previews.

5. **Multilingual Support:** Text summarization can be applied to texts in multiple languages, making it a valuable tool for global communication and information retrieval.

6. **Personalization:** Summarization can be personalized to individual preferences. Machine learning models can learn from user feedback to generate summaries that align more closely with a user's interests and priorities.

7. **Scalability:** As the volume of digital content continues to grow, automated summarization becomes crucial for scaling information processing and retrieval. Machine learning-based summarization models can adapt and handle large volumes of text efficiently.

8. **Legal and Compliance:** In legal and regulatory contexts, automated summarization can help organizations review contracts, policies, and legal documents to ensure compliance and identify critical clauses or information.

9. **Search Engine Optimization (SEO):** Summarized content can be used to create concise and engaging snippets for search engine results, improving the discoverability of web content.

10. **Content Creation:** Summarization can be integrated into content creation tools, helping authors and content creators generate concise and informative content more efficiently.

Overall, text summarization is an essential component of machine learning and natural language processing, enabling efficient information retrieval, content extraction, and content generation across a wide range of applications and industries. It plays a critical role in handling the ever-increasing amount of textual data available in the digital age.

---
Exercise:

As a data scientist specializing in NLP, you are asked to build a model that produces summaries in Spanish. Your stakeholders will give you an article, and your model should summarize it.

In [None]:
!pip install requests beautifulsoup4



In [None]:
import requests
from bs4 import BeautifulSoup

# URL of the article
url = "https://time.com/collection/time100-ai/6309026/geoffrey-hinton/"

# Send an HTTP request to fetch the page content
response = requests.get(url, timeout=30)

# Initialize here so article_text exists even if the request fails
article_text = ""

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the page HTML with BeautifulSoup
    soup = BeautifulSoup(response.text, "html.parser")

    # Find the article body (inspect the page HTML to find the right structure)
    article_content = soup.find("div", {"class": "article-content"})

    # Fall back to the whole page if the expected container is missing,
    # since soup.find returns None when nothing matches
    if article_content is None:
        article_content = soup

    # Extract the article text, one paragraph per line
    for paragraph in article_content.find_all("p"):
        article_text += paragraph.get_text() + "\n"

    # Print the article text
    print(article_text)
else:
    print("Error fetching the page:", response.status_code)

Over the course of February, Geoffrey Hinton, one of the most influential AI
researchers of the past 50 years, had a “slow eureka moment.”
Hinton, 76, has spent his career trying to build AI systems that model the human
brain, mostly in academia before joining Google in 2013. He had always believed
that the brain was better than the machines that he and others were building,
and that by making them more like the brain, they would improve. But in
February, he realized “the digital intelligence we’ve got now may be better than
the brain already. It’s just not scaled up quite as big.”
Developers around the world are currently racing to build the biggest AI systems
that they can. Given the current rate at which AI companies are increasing the
size of models, it could be less than five years until AI systems have 100
trillion connections—roughly as many as there are between neurons in the human
brain.
Alarmed, Hinton left his post as VP and engineering fellow in May and gave a
flurry of int

In [None]:
# Uncomment and run this cell if you're on Colab or Kaggle
!git clone https://github.com/nlp-with-transformers/notebooks.git
%cd notebooks
from install import *
install_requirements()

Cloning into 'notebooks'...
remote: Enumerating objects: 530, done.
remote: Counting objects: 100% (212/212), done.
remote: Compressing objects: 100% (50/50), done.
remote: Total 530 (delta 184), reused 164 (delta 162), pack-reused 318 (from 1)
Receiving objects: 100% (530/530), 28.52 MiB | 12.24 MiB/s, done.
Resolving deltas: 100% (253/253), done.
Updating files: 100% (127/127), done.
/content/notebooks/notebooks
⏳ Installing base requirements ...
✅ Base requirements installed!
⏳ Installing Git LFS ...
✅ Git LFS installed!


In [None]:
#hide
from utils import *
setup_chapter()

No GPU was detected! This notebook can be *very* slow without a GPU 🐢
Go to Runtime > Change runtime type and select a GPU hardware accelerator.
Using transformers v4.16.2
Using datasets v1.16.1


In [None]:
#hide_output
from transformers import pipeline

# Models Covered in Class
The text is split into chunks because it is too long for the model to process in one pass.

In [None]:
summarizer = pipeline("summarization")

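# Split the article into fixed-size chunks.
# Note: this limit counts characters, not tokens, and a slice can cut a sentence in half.
max_chunk_length = 1024
chunks = [article_text[i:i+max_chunk_length] for i in range(0, len(article_text), max_chunk_length)]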


summaries = []
for chunk in chunks:
    summary = summarizer(chunk, max_length=150, min_length=40, do_sample=False)
    summaries.append(summary[0]['summary_text'])

combined_summary = " ".join(summaries)

translator = pipeline("translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es")
translated_summary = translator(combined_summary, clean_up_tokenization_spaces=True)[0]['translation_text']

print(translated_summary)


Geoffrey Hinton, de 76 años, ha pasado su carrera tratando de construir sistemas
de IA que modelen el cerebro humano. Siempre había creído que el cerebro era
mejor que las máquinas que él y otros estaban construyendo. Pero en febrero, se
dio cuenta de que “la inteligencia digital que tenemos ahora puede ser mejor
que. el cerebro ya. No es tan grande” Hinton se preocupa por lo que podría
suceder una vez que los sistemas de IA se amplían al tamaño de los cerebros
humanos. “Esto se volverá más inteligente que nosotros y se hará cargo”, dice.
Hinton proviene de una larga línea de luminarias, con parientes como la
matemática Mary Everest Boole y el lógico George Boole. En la década de 1970, la
inteligencia artificial estaba pasando por un período de entusiasmo más moderado
que nosotros y ahora se conoce como el “invierno de AI”. En este campo
infasionable, Hinton persiguió una idea impopular: las redes neurales y las
empresas de IA se hacen cargo de la investigación.
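
Fixed-size character slices can cut a sentence, or even a word, in half, which may contribute to artifacts like the ones in the output above. One possible alternative is to group whole sentences into chunks. The sketch below is a minimal illustration, not part of the original notebook: the helper name `chunk_by_sentences` is hypothetical, the regex is a rough sentence splitter rather than a real tokenizer, and the limit is still measured in characters, not model tokens.

```python
import re

def chunk_by_sentences(text, max_chars=1024):
    """Group whole sentences into chunks of at most max_chars characters.

    Hypothetical helper for illustration; sentence splitting is a rough
    heuristic (split after '.', '!' or '?' followed by whitespace).
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            # The sentence still fits in the chunk being built
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than max_chars becomes its own oversized chunk
            current = sentence
    if current:
        chunks.append(current)
    return chunks

# Small demonstration with a short limit so the grouping is visible
sample = "First sentence here. Second one follows! Is this the third? Yes it is."
demo_chunks = chunk_by_sentences(sample, max_chars=45)
print(demo_chunks)
```

In the notebook, `chunks = chunk_by_sentences(article_text)` could stand in for the list comprehension, with the rest of the summarization loop left unchanged.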


# Models That Support Long Inputs

In [None]:
summarizer = pipeline("summarization", model="allenai/led-base-16384")

summary = summarizer(article_text, max_length=500, min_length=128, do_sample=False)[0]['summary_text']

translator = pipeline("translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es")
translated_summary = translator(summary, clean_up_tokenization_spaces=True)[0]['translation_text']

print(translated_summary)

A lo largo de febrero, Geoffrey Hinton, uno de los investigadores de IA más
influyentes de los últimos 50 años, tuvo un “momento de eureka más lento”.
“Hinton, 76, ha pasado su carrera tratando de construir sistemas de IA que
modelen el cerebro humano, principalmente en la academia antes de unirse a
Google en 2013. Siempre había creído que el cerebro era mejor que las máquinas
que él y otros estaban construyendo, y que al hacerlos más parecidos al cerebro,
mejorarían. Pero en febrero, se dio cuenta de que “la inteligencia digital que
tenemos ahora puede ser mejor que el cerebro ya en los últimos dos años.
Simplemente no se ha ampliado tanto como la alarma. ”Hinton’s ged the management
on AI is not to the future.


In [None]:
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

summary = summarizer(
    article_text,
    max_length=600,
    min_length=400,
    do_sample=False,
    truncation=True
)[0]['summary_text']

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")
translated_summary = translator(summary, clean_up_tokenization_spaces=True)[0]['translation_text']

print(translated_summary)

“Geoffrey Hinton es uno de los investigadores de inteligencia artificial más
influyentes de los últimos 50 años. Hinton, de 76 años, ha pasado su carrera
tratando de construir sistemas de inteligencia artificial que modelen el cerebro
humano. En febrero, se dio cuenta de que “la inteligencia digital que tenemos
ahora puede ser mejor que el cerebro ya. Simplemente no se ha ampliado tan
grande” Hinton se preocupa por lo que podría suceder una vez que los sistemas de
inteligencia artificial se amplían hasta el tamaño de los cerebros humanos y la
perspectiva de que la humanidad sea aniquilada por la tecnología que ayudó a
crear. Ha dejado su puesto como VP y compañero de ingeniería en mayo y ha dado
una gran cantidad de datos para ser ingerido y procesado. Su trabajo ha
potencialmente apresurado el futuro en el que la inteligencia artificial se
convierte en superhumano con resultados desastrosos para los seres humanos,
dice. En una entrevista con el New York Times, dijo, “Si yo lo he hecho