In [1]:
import requests
from bs4 import BeautifulSoup
from transformers import pipeline
import os
from urllib.parse import urlparse, unquote
from transformers import LEDForConditionalGeneration, LEDTokenizer

def get_article_text(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    paragraphs = soup.find_all('p')
    article_text = '\n'.join([p.get_text() for p in paragraphs])
    return article_text

def summarize_text(text, max_length=500):
    summarizer = pipeline("summarization")
    summary = summarizer(text, max_length=max_length, do_sample=False, clean_up_tokenization_spaces=True)
    return summary[0]['summary_text']

def summarize_article(url):
    article_text = get_article_text(url)
    summarized_text = summarize_text(article_text)
    return summarized_text

def write_summarized_to_file(summarized_text, url):
    # Parse the URL and extract the path
    parsed_url = urlparse(url)
    path = parsed_url.path

    # Remove the leading/trailing slashes and split the path by slashes
    path_parts = path.strip('/').split('/')

    # Get the last part of the path and decode any URL-encoded characters
    last_part = unquote(path_parts[-1])

    # Create the filename with a .txt extension
    filename = f"{last_part}.md"

    # Write the summarized text to the file
    with open(filename, "w") as file:
        file.write(summarized_text)

    print(f"Summarized text saved to {filename}")



In [14]:
def summarize_text(text, max_length=5000):
    model_name = "allenai/led-base-16384"
    tokenizer = LEDTokenizer.from_pretrained(model_name)
    model = LEDForConditionalGeneration.from_pretrained(model_name)

    input_ids = tokenizer(text, return_tensors="pt", padding="max_length", truncation=True, max_length=16384).input_ids
    summary_ids = model.generate(input_ids, num_beams=4, length_penalty=2.0, max_length=max_length, min_length=50)
    summary_text = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    return summary_text

In [15]:
url = "https://www.engadget.com/chatgpts-new-plugins-will-deliver-real-time-stats-182900388.html"
summarized_text = summarize_article(url)
print(summarized_text)
write_summarized_to_file(summarized_text=summarized_text, url=url)

IndexError: index out of range in self

In [None]:
url = "https://newsroom.churchofjesuschrist.org/article/temple-news-from-north-and-central-america"
summarized_text = summarize_article(url)
print(summarized_text)
write_summarized_to_file(summarized_text=summarized_text, url=url)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


 The First Presidency of The Church of Jesus Christ of Latter-day Saints has released the open house and dedication dates for the Feather River California Temple. In addition, the location of the Huehuetenango Guatemala Temple and a rendering of the Tampa Florida Temple have also been announced. There are 10 temples in California, the most of any state besides Utah.


In [5]:
url = "https://newsroom.churchofjesuschrist.org/article/2022-annual-report-caring-for-those-in-need"
summarized_text = summarize_article(url)
print(summarized_text)
write_summarized_to_file(summarized_text=summarized_text, url=url)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Token indices sequence length is longer than the specified maximum sequence length for this model (1318 > 1024). Running this sequence through the model will result in indexing errors


IndexError: index out of range in self

In [2]:
from IPython.display import Markdown as md
url = "https://newsroom.churchofjesuschrist.org/article/c-shane-reese-byu-14th-president"
display(md(get_article_text(url)))


Available Editions
The Board of Trustees of Brigham Young University (BYU) — chaired by President Russell M. Nelson of The Church of Jesus Christ of Latter-day Saints — has appointed C. Shane Reese as the university’s 14th president.
Reese will succeed Kevin J Worthen, who has served as president since 2014, on May 1, 2023.
Watch: Elder Holland Announces New BYU President
 
Elder Jeffrey R. Holland of the Quorum of the Twelve made the announcement on Tuesday, March 21, 2023, during the weekly BYU devotional in Provo, Utah.
After reading from the prophetic succession story of Elijah and Elisha in the book of 2 Kings, Elder Holland praised BYU’s outgoing and incoming presidents.
“In the spirit of this succession in the ministry,” said Elder Holland, who represented the prophet at Tuesday’s devotional, “President Russell M. Nelson is announcing the conclusion of President Kevin Worthen’s remarkable service as president of Brigham Young University.”
Elder Holland praised President Worthen and his wife, Peggy, for a their “truly remarkable, outstanding tenure.”
 
“As the Elijah in my Biblical reference earlier, President Worthen, and of course Peggy, may well be translated following this devotional. I did see a chariot warming up in his parking stall [this morning],” Elder Holland said. “We hope … Kevin might choose to bring his matchless influence back to the J. Reuben Clark Law School, from whence he came 15 years ago. He is truly a man of God and a remarkable university president and a dear friend. He has been recognized as such across the nation. His skill and accomplishments have greatly enhanced the stature of the university. We love him, dearly.”
After Elder Holland’s 14-minute remarks, President and Sister Worthen, along with Reese and his wife, Wendy, each spoke briefly.
“If you know nothing else about me and learn nothing else from my tenure, let me share with you that I believe with all my heart, mind and soul in the truth that there is a God in heaven,” President Worthen said. “He is our Heavenly Father, and He loves us in ways we cannot completely comprehend or understand. A manifestation of that love is that He gave His Son, His Only Begotten Son, whose life death and Resurrection make possible all good endings for each and every one of us.”
Sister Worthen shared her “deep and abiding love for this university and all that it stands for.”
“I will always cherish the memories and the friendships we have made here,” she added. “You have treated us with such generosity, graciousness and love.”
 
Reese has served as BYU’s academic vice president since 2019. Before that he was dean of the College of Physical and Mathematical Sciences from 2017 to 2019 and joined the BYU statistics faculty in 2001. He and Wendy are the parents of three children.
Elder Holland said Reese, who has a doctoral degree in statistics from Texas A&M and has focused on sports analytics, turned down a front office job with the Philadelphia Eagles to stay at BYU.
“It is one thing to turn down the Los Alamos Laboratory or even the U.S. Government, which he’s done, but my goodness — turn down the Philadelphia Eagles! Talk about loyalty,” Elder Holland said.
“I am told that a statistician can have his head in an oven and feet in an ice cube and say that on average he feels just about right,” Elder Holland continued. “Over the next several years, president, you will have plenty of fires to put out and cold-blooded decisions to make, so you should be ecstatic all the time.”
“At this important occasion, Wendy and I pledge our best efforts to build on the progress of President and Sister Worthen,” Reese said. “We feel a deep and abiding spiritual connection to [former Church] President [Spencer W.] Kimball’s prophetic vision for BYU as we approach the beginning of the second half of the second century [of the university].”
“We are humbled by this opportunity and excited about the many experiences we will have while in this office,” Wendy added. “The motto of BYU is, ‘Enter to learn, go forth to serve.’ We look forward to serving and learning alongside each of you.”


Style Guide Note:When reporting about The Church of Jesus Christ of Latter-day Saints, please use the complete name of the Church in the first reference. For more information on the use of the name of the Church, go to our online Style Guide.


Global Newsroom


You are leaving the Newsroom
You are about to access Constant Contacts (http://visitor.constantcontact.com).
You are now leaving a website maintained by The Church of Jesus Christ of Latter-day Saints. We provide the link to this third party's website solely as a convenience to you. The linked site has its own terms of use, privacy policies, and security practices that differ from those on our website. By referring or linking you to this website, we do not endorse or guarantee this content, products, or services offered. If you would like to stay on the page you are viewing please click Cancel.
To download media files, please first review and agree to the Terms of Use. Download a photo or video by clicking or tapping on it. To download all photos or videos related to this article, select the links at the bottom of each section.
PHOTOS