<br>
<font color='cyan'><font size="5">1. Importing Necessary Libraries</font></font>
<br><br>

In [1]:
from transformers import pipeline
from bs4 import BeautifulSoup
import requests

<br>
<font color='cyan'><font size="5">2. Loading the Summarization Pipeline</font></font>
<br><br>

In [2]:
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
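
The warning above suggests naming the model and revision explicitly. A minimal sketch of one way to do that follows; `load_pinned_summarizer` is a hypothetical helper name, and the model name and revision are simply copied from the warning text rather than chosen for quality.

```python
MODEL_NAME = "sshleifer/distilbart-cnn-12-6"  # copied from the warning above
MODEL_REVISION = "a4f8f3e"                    # copied from the warning above


def load_pinned_summarizer(model=MODEL_NAME, revision=MODEL_REVISION):
    """Hypothetical helper: build the summarization pipeline with a fixed model and revision."""
    # Imported inside the function so that defining the helper does not load the library.
    from transformers import pipeline

    return pipeline("summarization", model=model, revision=revision)
```

Calling `summarizer = load_pinned_summarizer()` should then give the same kind of pipeline object as the cell above, without relying on the library default.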


<br>
<font color='cyan'><font size="5">3. Web Scraping using Beautiful Soup</font></font>
<br><br>

In [3]:
# Define the URL of the article to summarize
URL = "https://www.slashfilm.com/783792/wait-is-the-batman-the-best-batman-movie/"

# Send the request and stop early if the server returns an error status
r = requests.get(URL, timeout=30)
r.raise_for_status()

# Parse the returned HTML with Beautiful Soup
soup = BeautifulSoup(r.text, 'html.parser')

# Extract the headline (h1) and every paragraph (p)
results = soup.find_all(['h1', 'p'])

# Keep only the text of each element, dropping the HTML tags
text = [result.text for result in results]

# Take the first element as the article title (assumes the h1 comes before the paragraphs)
title = text[0]

# Combine all the text into a single string
article = ' '.join(text)

<br>
<font color='cyan'><font size="5">4. Chunking</font></font>
<br><br>

In [4]:
# Append an end-of-sentence tag (<eos>) after each '.', '?' and '!' so that the
# punctuation stays attached to its sentence when we split on the tag.
article = article.replace('.', '.<eos>')
article = article.replace('?', '?<eos>')
article = article.replace('!', '!<eos>')
sentences = article.split('<eos>')
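
The tagging step above can be tried on a short string to see what it produces. The sketch below is an assumption-laden variant, not the notebook's own cell: `split_sentences` is a hypothetical name, it uses one regular expression in place of the three `replace` calls, and, unlike the cell above, it trims whitespace and drops empty pieces.

```python
import re


def split_sentences(text):
    """Split text after '.', '?' or '!' while keeping the punctuation mark."""
    # Append the <eos> tag after each sentence-ending mark, as the cell above does.
    tagged = re.sub(r"([.?!])", r"\1<eos>", text)
    # Split on the tag, trim whitespace, and drop empty pieces.
    return [piece.strip() for piece in tagged.split("<eos>") if piece.strip()]


sample = "Is it good? Yes! It costs 3.50 dollars."
print(split_sentences(sample))
# ['Is it good?', 'Yes!', 'It costs 3.', '50 dollars.']
```

The last two items show a limitation of splitting on every period: decimals and abbreviations are cut as well. For summarization this is usually harmless, because the pieces are joined back together inside a chunk.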

In [5]:
# Maximum chunk size in words, index of the chunk being filled, and the list of chunks
chunk_limit = 500
current_chunk = 0
chunks = []

# Loop through each of our sentences
for sentence in sentences:
    # First, check whether a chunk already exists
    if chunks:
        # Next, check whether appending the current sentence would exceed the chunk limit
        if len(chunks[current_chunk]) + len(sentence.split(' ')) <= chunk_limit:
            # If it doesn't, we append it
            chunks[current_chunk].extend(sentence.split(' '))
        else:
            # If it does, we create a new chunk
            current_chunk += 1
            chunks.append(sentence.split(' '))
    else:
        chunks.append(sentence.split(' '))

# Join the words of each chunk back into a single string
for chunk_index in range(len(chunks)):
    chunks[chunk_index] = ' '.join(chunks[chunk_index])
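
The same greedy grouping can be written as a reusable function and checked on a toy input. This is a sketch under stated assumptions: `chunk_sentences` is a hypothetical name, it splits on any whitespace with `split()` instead of `split(' ')`, and it skips empty sentences, so its word counts can differ slightly from the cell above.

```python
def chunk_sentences(sentences, chunk_limit=500):
    """Group sentences into chunks of at most chunk_limit words each."""
    chunks = []
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue  # skip empty strings left over from splitting
        # Start a new chunk when there is none yet or the current one would overflow.
        if not chunks or len(chunks[-1]) + len(words) > chunk_limit:
            chunks.append(words)
        else:
            chunks[-1].extend(words)
    # Turn each list of words back into a single string.
    return [" ".join(chunk) for chunk in chunks]


demo = ["One two three.", "Four five.", "Six seven eight nine."]
print(chunk_sentences(demo, chunk_limit=5))
# ['One two three. Four five.', 'Six seven eight nine.']
```

Note that a single sentence longer than `chunk_limit` still becomes its own oversized chunk, both here and in the loop above. The limit also counts words, not model tokens.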

<br>
<font color='cyan'><font size="5">5. Summarization using Hugging Face</font></font>
<br><br>

In [6]:
# Summarize each chunk with the Hugging Face pipeline (it returns one summary per chunk)
res = summarizer(chunks, max_length=150, min_length=50, do_sample=False)

<br>
<font color='cyan'><font size="5">6. The Result</font></font>
<br><br>

In [7]:
# Combine the per-chunk summaries into a single block of text
summary = ''.join([summ['summary_text'] for summ in res])
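
The pipeline returns a list of dictionaries, each with a `summary_text` key. The join above relies on every summary starting with a space. A small sketch of a join that does not depend on that is shown here; `join_summaries` is a hypothetical name and `fake_results` is made-up data shaped like the pipeline output, not real model output.

```python
def join_summaries(results):
    """Join the 'summary_text' fields into one string separated by single spaces."""
    parts = [item["summary_text"].strip() for item in results]
    return " ".join(part for part in parts if part)


# Made-up data with the same shape as the summarizer output.
fake_results = [
    {"summary_text": " First chunk summary ."},
    {"summary_text": " Second chunk summary ."},
]
print(join_summaries(fake_results))
# First chunk summary . Second chunk summary .
```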

In [8]:
# Finally, we will print the title of the article along with its summary
title += ('\n\n' + summary)
print(title)

Wait, Is The Batman The Best Batman Movie?

 Michael Keaton has always been my Batman, because, despite my enjoyment of the Adam West TV series, by the time that film came out, I really got Batman . I may not love the films (and I have no issue with anyone who did), but I got the world-weary Batfleck . From the first images and trailers for "The Batman," it seemed like we were getting "emo Batman," and while that sounded fun, that isn't at all what we have here . A dark avenger (sorry Marvel) is an archetype that we can find in any culture and it speaks to us on a very deep level . Robert Pattinson isn't playing emo.  He's dealing with intense PTSD, as so many of us do.  It's something most adults can relate to.  I think the decision to take this out of the Snyder continuity was an important one . Matt Reeves' "The Batman" gives us a hero who needs to take down criminals . He can trap a gang in a subway and pummel the hell out of them with no remorse . Film and storytelling at its best

In [9]:
# Save the title and summary to a text file
with open('article_summary.txt', 'w', encoding='utf-8') as f:
    f.write(title)