# Web scraping and crawling
Uses the *BeautifulSoup* and *Selenium* external packages.
[BeautifulSoup documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/).

# Setup / Data

In [23]:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

from util import utility
from settings import *

# Getting started

The website to scrape and crawl is Ultimate Classic Rock.
The starting URL is a search for articles about Paul McCartney.

In [24]:
start_url = 'https://ultimateclassicrock.com/search/?s=paul%20mccartney'

Create a `Response` object from a GET request, using `requests.get(<url>)`. Passing `allow_redirects=False` as well tells `requests` not to follow redirects.

In [25]:
response = requests.get(start_url)
print(response)

<Response [200]>


Get response text from `Response` object, using `<response>.text`.

In [26]:
response_text = response.text
print(response_text)

<!doctype html>
      <html class="search desktop layout-clean fixed-full-width" lang="en">
        <head>
          <title>Search For paul mccartney on Ultimate Classic Rock</title>
          <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/><meta charset="UTF-8"/><meta name="instagram_profile" value="ultimateclassicrock"/><meta name="description" content="Search For paul mccartney on Ultimate Classic Rock"/><meta name="keywords" content="Search, paul mccartney"/><meta property="fb:app_id" content="2831251863795172"/><meta property="fb:admins" content="581803834"/><meta property="fb:admins" content="583052867"/><meta property="fb:admins" content="100002025987268"/><meta property="fb:admins" content="732998853"/><meta property="fb:admins" content="8802808"/><meta property="fb:use_automatic_ad_placement" content="false"/><meta property="og:title" content="Search For paul mccartney on Ultimate Classic Rock"/><meta property="og:description" content="Search For paul mccart

Get BeautifulSoup object from response text, using `BeautifulSoup(<response text>, 'html.parser')`.

In [27]:
soup = BeautifulSoup(response_text, 'html.parser')

The `get_soup(url)` function for getting a BeautifulSoup object.

In [28]:
def get_soup(url: str) -> BeautifulSoup:
    """Returns BeautifulSoup object from the corresponding URL, passed as a string.
    Creates Response object from HTTP GET request, using requests.get(<url string>, allow_redirects=False),
    and then uses the text field of the Response object and the 'html.parser' to create the BeautifulSoup object.
    """

    # Create Response object from HTTP GET request; assume that no redirection is allowed (allow_redirects=False)
    response = requests.get(url, allow_redirects=False)
    # Get text from the Response object, using <response>.text
    response_text = response.text
    # Create and return the corresponding BeautifulSoup object from the response text; use features='html.parser'
    return BeautifulSoup(response_text, 'html.parser')

Test `get_soup(url)`.

In [29]:
soup = get_soup(start_url)
print(soup)

<!DOCTYPE html>

<html class="search desktop layout-clean fixed-full-width" lang="en">
<head>
<title>Search For paul mccartney on Ultimate Classic Rock</title>
<meta content="text/html;charset=utf-8" http-equiv="Content-Type"/><meta charset="utf-8"/><meta name="instagram_profile" value="ultimateclassicrock"/><meta content="Search For paul mccartney on Ultimate Classic Rock" name="description"/><meta content="Search, paul mccartney" name="keywords"/><meta content="2831251863795172" property="fb:app_id"/><meta content="581803834" property="fb:admins"/><meta content="583052867" property="fb:admins"/><meta content="100002025987268" property="fb:admins"/><meta content="732998853" property="fb:admins"/><meta content="8802808" property="fb:admins"/><meta content="false" property="fb:use_automatic_ad_placement"/><meta content="Search For paul mccartney on Ultimate Classic Rock" property="og:title"/><meta content="Search For paul mccartney on Ultimate Classic Rock" property="og:description"/><m

Save BeautifulSoup object to an HTML file, using `<Path-file-object>.write_text(str(<BeautifulSoup object>), encoding='utf-8', errors='replace')`.

In [30]:
soup_file = DATA_DIR / 'soup.html'
soup_file.write_text(str(soup), encoding='utf-8', errors='replace')

167175

Demonstrate `<BeautifulSoup object>.find('<tag>')`; e.g., find the first `<span>` tag.

In [31]:
print(soup.find('span'))
print(soup.span)

<span>Grohl &amp; Pink Hanukkah Cover</span>
<span>Grohl &amp; Pink Hanukkah Cover</span>


Demonstrate `<BeautifulSoup object>.find('<tag>').find('<nested tag>')`; e.g., find the `<a>` tag in an `<article>` tag.

In [32]:
print(soup.article, '\n')
print(soup.article.a, '\n')
print(soup.article.a.figcaption, '\n')

<article class="teaser"><div class="article-image-wrapper"><figure class="frameme"><a class="theframe" data-image="https://townsquare.media/site/295/files/2022/12/attachment-mccartney.jpg" rel=""><figcaption class="visually-hidden">Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?</figcaption></a></figure></div><div class="content"><a class="title" href="https://ultimateclassicrock.com/paul-mccartney-live-and-let-die-producers/" target="_self" title="Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?">Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?</a><div class="auth-date"><time></time>,<em><span> by </span> <!-- -->Martin Kielty</em></div><p class="excerpt">One of the things we discovered is that, if it’s a good story, <em>Paul</em> will go with it....Listen to <em>Paul</em> <em>McCartney</em> Perform ‘Live and Let Die’ Odd Couples: <em>Paul</em> <em>McCartney</em> and Kanye West</p><div class="content-footer"><span class="view-more">View more 

Demonstrate getting a tag with specific attribute values, using `<BeautifulSoup object>.find('<tag>', {'<attribute>': '<value>'})`; e.g.:
- find a `<span>` tag whose `class` attribute is `visually-hidden`
- find the `<a>` tags inside an `<article>` tag whose `class` attribute is `title` or `theframe`

In [33]:
print(soup.find('span', {'class': 'visually-hidden'}))
print(soup.find('article').find('a', {'class': 'title'}))
print(soup.find('article').find('a', {'class': 'theframe'}))

<span class="visually-hidden">Visit us on Youtube</span>
<a class="title" href="https://ultimateclassicrock.com/paul-mccartney-live-and-let-die-producers/" target="_self" title="Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?">Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?</a>
<a class="theframe" data-image="https://townsquare.media/site/295/files/2022/12/attachment-mccartney.jpg" rel=""><figcaption class="visually-hidden">Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?</figcaption></a>
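The attribute filter can be written in more than one way. The sketch below uses a small, made-up HTML snippet (not the real page) to show the dictionary form used above next to the `class_` keyword and a CSS selector; all three should select the same tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on one search result from the page above
html = """
<article class="teaser">
  <a class="theframe" data-image="https://example.com/cover.jpg"></a>
  <a class="title" href="https://example.com/story/">A story title</a>
</article>
"""
demo_soup = BeautifulSoup(html, 'html.parser')

# Three ways to ask for the <a> tag whose class is 'title'
by_dict = demo_soup.find('a', {'class': 'title'})   # attribute dictionary
by_keyword = demo_soup.find('a', class_='title')    # class_ keyword ('class' is reserved in Python)
by_css = demo_soup.select_one('a.title')            # CSS selector

print(by_dict == by_keyword == by_css)
print(by_dict['href'])
```

The dictionary form generalizes to any attribute name, while the CSS selector is often shorter when the filter involves nesting.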


Demonstrate getting the text of a tag, using `<BeautifulSoup object>.find('<tag>').text`, e.g. for an `<a>` tag and for a `<span>` tag with the class `visually-hidden`.


In [34]:
print(soup.article, '\n')
print(soup.article.a, '\n')
print(soup.article.a.text, '\n')
print(soup.find('span', {'class': 'visually-hidden'}), '\n')
print(soup.find('span', {'class': 'visually-hidden'}).text, '\n')

<article class="teaser"><div class="article-image-wrapper"><figure class="frameme"><a class="theframe" data-image="https://townsquare.media/site/295/files/2022/12/attachment-mccartney.jpg" rel=""><figcaption class="visually-hidden">Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?</figcaption></a></figure></div><div class="content"><a class="title" href="https://ultimateclassicrock.com/paul-mccartney-live-and-let-die-producers/" target="_self" title="Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?">Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?</a><div class="auth-date"><time></time>,<em><span> by </span> <!-- -->Martin Kielty</em></div><p class="excerpt">One of the things we discovered is that, if it’s a good story, <em>Paul</em> will go with it....Listen to <em>Paul</em> <em>McCartney</em> Perform ‘Live and Let Die’ Odd Couples: <em>Paul</em> <em>McCartney</em> and Kanye West</p><div class="content-footer"><span class="view-more">View more 

Demonstrate `<BeautifulSoup object>.find_all(<tag>)`, e.g. for the `<article>` tag; it returns a `ResultSet` object. Loop through the result set to show the individual articles, then show its type.

In [35]:
articles = soup.find_all('article')
for article in articles:
    print(article, '\n')
print(type(articles))

<article class="teaser"><div class="article-image-wrapper"><figure class="frameme"><a class="theframe" data-image="https://townsquare.media/site/295/files/2022/12/attachment-mccartney.jpg" rel=""><figcaption class="visually-hidden">Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?</figcaption></a></figure></div><div class="content"><a class="title" href="https://ultimateclassicrock.com/paul-mccartney-live-and-let-die-producers/" target="_self" title="Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?">Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?</a><div class="auth-date"><time></time>,<em><span> by </span> <!-- -->Martin Kielty</em></div><p class="excerpt">One of the things we discovered is that, if it’s a good story, <em>Paul</em> will go with it....Listen to <em>Paul</em> <em>McCartney</em> Perform ‘Live and Let Die’ Odd Couples: <em>Paul</em> <em>McCartney</em> and Kanye West</p><div class="content-footer"><span class="view-more">View more 

The following lines demonstrate that the soup obtained with `requests.get()` does not capture the content of all tags: tags that are filled in by JavaScript, such as `<time>`, come back empty. In this example, the `<time>` tags sit inside `div` tags with `{'class': 'auth-date'}`. Try to find one using `find()` or `find_next()`.

This is where `selenium.webdriver` is the better choice.

In [36]:
print(soup.find('div', {'class': 'auth-date'}))
print(soup.find('div', {'class': 'auth-date'}).time)

<div class="auth-date"><time></time>,<em><span> by </span> <!-- -->Martin Kielty</em></div>
<time></time>
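One way to decide whether a page needs a browser-based fetch is to check whether a tag of interest has any text. The sketch below is only an illustration on two made-up snippets; `time_is_filled` is a hypothetical helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical snippets: the first mimics the static HTML returned by requests,
# the second mimics the same element after a browser has run the page scripts.
static_html = '<div class="auth-date"><time></time>,<em> by Author</em></div>'
rendered_html = ('<div class="auth-date"><time datetime="2022/12/19">'
                 'December 19, 2022</time>,<em> by Author</em></div>')


def time_is_filled(html: str) -> bool:
    """Return True if the first <time> tag exists and has non-blank text."""
    time_tag = BeautifulSoup(html, 'html.parser').find('time')
    return time_tag is not None and bool(time_tag.get_text(strip=True))


print(time_is_filled(static_html))
print(time_is_filled(rendered_html))
```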


Selenium version, needed for extracting the `<time>` tag info.

In [37]:
from selenium import webdriver

# Before running the following line, make sure to download and unzip THE LATEST VERSION of chromedriver
# (or the version COMPATIBLE with your version of Chrome)
# and put chromedriver.exe in the Scripts subfolder of your Python installation folder,
# e.g. C:\Users\Vladan\AppData\Local\Programs\Python\Python310\Scripts.
# The driver should be downloaded from https://chromedriver.chromium.org/downloads.
# Then you need not provide the path of the driver, just run: driver = webdriver.Chrome().
# (Adapted from https://stackoverflow.com/a/60062969/1899061.)
driver = webdriver.Chrome()

driver.get(start_url)
soup = BeautifulSoup(driver.page_source, 'html.parser')

In [None]:
driver = webdriver.Chrome()
driver.get(start_url)
soup = BeautifulSoup(str(driver.page_source), 'html.parser')

Save the BeautifulSoup object to an HTML file, using `<Path-file-object>.write_text(str(<BeautifulSoup object>), encoding='utf-8', errors='replace')`.

In [None]:
soup_file.write_text(str(soup), errors='replace', encoding='utf-8')

The `get_soup_selenium(url)` function for getting a BeautifulSoup object using the `selenium` package instead of `requests`.

In [38]:
def get_soup_selenium(url: str) -> BeautifulSoup:
    """Returns BeautifulSoup object from the corresponding URL, passed as a string.
    Makes an HTTP GET request, using driver = webdriver.Chrome() from the selenium package and its driver.get(url).
    Then uses the page_source field of the driver object and the 'html.parser' to create and return the BeautifulSoup object.
    """

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return BeautifulSoup(driver.page_source, 'html.parser')
    finally:
        # Close the browser and stop the driver process, even if loading the page fails;
        # otherwise every call leaves a Chrome window open
        driver.quit()

Test `get_soup_selenium(url)`.

In [39]:
soup = get_soup_selenium(start_url)
print(soup)

<html class="search desktop layout-clean fixed-full-width" lang="en"><head>
<title>Search For paul mccartney on Ultimate Classic Rock</title>
<meta content="text/html;charset=utf-8" http-equiv="Content-Type"/><meta charset="utf-8"/><meta name="instagram_profile" value="ultimateclassicrock"/><meta content="Search For paul mccartney on Ultimate Classic Rock" name="description"/><meta content="Search, paul mccartney" name="keywords"/><meta content="2831251863795172" property="fb:app_id"/><meta content="581803834" property="fb:admins"/><meta content="583052867" property="fb:admins"/><meta content="100002025987268" property="fb:admins"/><meta content="732998853" property="fb:admins"/><meta content="8802808" property="fb:admins"/><meta content="false" property="fb:use_automatic_ad_placement"/><meta content="Search For paul mccartney on Ultimate Classic Rock" property="og:title"/><meta content="Search For paul mccartney on Ultimate Classic Rock" property="og:description"/><meta content="https

In [None]:
print(soup.find('div', {'class': 'auth-date'}))

Demonstrate occasional anomalies in the `ResultSet` returned by `<BeautifulSoup object>.find_all(<tag>)`; note that they may appear only in the `selenium` version, not in the `requests` version.

In [40]:
# The following lines show that there are 11 articles on the page, not 10.
# The 11th one is something else, not visible on the page at first glance, and it should be excluded from
# further processing.

# print(len(soup.find_all('article')))
articles = soup.find_all('article')
for article in articles:
    print(article, '\n')

<article class="teaser"><div class="article-image-wrapper"><figure class="frameme frame-loaded"><a class="theframe" data-image="https://townsquare.media/site/295/files/2022/12/attachment-mccartney.jpg" rel="" style='background-image: url("https://townsquare.media/site/295/files/2022/12/attachment-mccartney.jpg?w=300&amp;q=75");'><figcaption class="visually-hidden">Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?</figcaption></a></figure></div><div class="content"><a class="title" href="https://ultimateclassicrock.com/paul-mccartney-live-and-let-die-producers/" target="_self" title="Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?">Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?</a><div class="auth-date"><time datetime="2022/12/19 14:10:03 +0000">December 19, 2022</time>,<em><span> by </span> <!-- -->Martin Kielty</em></div><p class="excerpt">One of the things we discovered is that, if it’s a good story, <em>Paul</em> will go with it....Listen 

The following line shows an anomaly in the articles `ResultSet`.

In [None]:
print(articles[-1])

Compare it to any of the other results from the `ResultSet` returned by `<BeautifulSoup object>.find_all(<tag>)`.

In [None]:
print(articles[0])
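Instead of relying on the position of the odd item, it may be more robust to keep only the articles that have the expected structure. The sketch below assumes, purely for illustration, that a regular article contains an `<a class="title">` link and the anomalous one does not; the markup and the `regular_articles` helper are hypothetical and should be adapted after inspecting the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two regular teasers and one <article> of a different kind
html = """
<article class="teaser"><a class="title" href="/one/">One</a></article>
<article class="teaser"><a class="title" href="/two/">Two</a></article>
<article class="promo"><div>Not a search result</div></article>
"""
demo_soup = BeautifulSoup(html, 'html.parser')


def regular_articles(soup):
    """Keep only <article> tags that contain an <a class="title"> link."""
    return [a for a in soup.find_all('article')
            if a.find('a', {'class': 'title'}) is not None]


titles = [a.find('a', {'class': 'title'}).text for a in regular_articles(demo_soup)]
print(titles)
```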

Demonstrate different ways of getting an attribute value for a tag (a `bs4.element.Tag` object):
- `<tag>.find('<subtag>')`, filtered with `<{'class': "<class name>"}>`
- `<tag>.find('<subtag>')['<attr>']`
- `<tag>.find('<subtag>').get('<attribute>')`
- `<tag>.find('<subtag>').attrs['<attribute>']`
- `<tag>.find('<subtag>').<attribute>` (`<attribute>`: e.g. `text`)

In [None]:
# Get any 'regular' article from the articles collected above and read the attributes of its first <a> tag in several ways
# (HTML attributes such as 'data-image' are not available with the <tag>.<attribute> notation, but 'text' is)
article = articles[0]
print(type(article))
print(article.find('a', {'class': 'theframe'}))
print(article.find('a')['class'])
print(article.find('a')['data-image'])
print(article.find('a').get('data-image'))
print(article.find('a').attrs['data-image'])
print(article.find('a').attrs)
print(article.find('a').text)
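The access styles differ mainly in how they treat a missing attribute. The following sketch, on a made-up `<a>` tag, shows that indexing raises `KeyError`, `.get()` returns `None`, and a multi-valued attribute such as `class` comes back as a list.

```python
from bs4 import BeautifulSoup

# Hypothetical tag with two classes and no href attribute
markup = '<a class="theframe big" data-image="https://example.com/x.jpg">caption</a>'
tag = BeautifulSoup(markup, 'html.parser').a

classes = tag['class']          # multi-valued attribute: a list of class names
image = tag.get('data-image')   # same value as tag['data-image']
missing = tag.get('href')       # missing attribute: None instead of an exception

try:
    tag['href']                 # indexing a missing attribute raises KeyError
    raised = False
except KeyError:
    raised = True

print(classes, image, missing, raised)
```

`.get()` is usually the safer choice inside a crawler loop, where a single item without the attribute should not stop the whole run.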

Demonstrate `<tag>.find_next_siblings()` (returns all siblings that follow `<tag>`) and `<tag>.find_next_sibling()` (returns just the first of them).

In [None]:
# Find the 'div' tag whose 'class' attribute has the value 'rowline clearfix' (there is only one such 'div' tag in the soup object so far)
rowline_clearfix = soup.find('div', {'class': 'rowline clearfix'})

In [None]:
# Use find_next_sibling() and find_next_siblings() to find 'span' tags in the 'div' tag found in the cell above
print(rowline_clearfix.find('span'), '\n')
print(rowline_clearfix.find('span').find_next_sibling(), '\n')
siblings = rowline_clearfix.find('span').find_next_siblings()
for sibling in siblings:
    print(sibling, '\n')
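Since the real page is large, the behavior is easier to see on a tiny, made-up fragment. The sketch below also passes a tag name to `find_next_siblings()` to restrict the result to `<span>` siblings.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: four children of one <div>, written without whitespace between them
html = '<div><span>one</span><span>two</span><p>para</p><span>three</span></div>'
first_span = BeautifulSoup(html, 'html.parser').find('span')

next_one = first_span.find_next_sibling()            # the first sibling after <span>one</span>
all_next = first_span.find_next_siblings()           # every following sibling, in document order
span_only = first_span.find_next_siblings('span')    # following siblings filtered by tag name

print(next_one.text)
print([s.name for s in all_next])
print([s.text for s in span_only])
```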

Each `bs4.element.ResultSet`, `bs4.element.Tag`,... can be used to create another BeautifulSoup object, using `BeautifulSoup(str(<bs4.element object>), features='html.parser')`.

In [None]:
articles = BeautifulSoup(str(rowline_clearfix.find_all('article')), 'html.parser')
print(type(articles))
print(articles.article)
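One detail worth checking when converting a `ResultSet`: it is a list, so `str()` of it produces the list representation, and the brackets and commas end up as text in the new soup. The sketch below, on made-up markup, compares that with joining the string form of each tag; whether the difference matters depends on how the new soup is used.

```python
from bs4 import BeautifulSoup

demo = BeautifulSoup('<div><p>a</p><p>b</p></div>', 'html.parser')
paragraphs = demo.find_all('p')

# str() of the whole ResultSet keeps the list punctuation around the tags
wrapped = BeautifulSoup(str(paragraphs), 'html.parser')

# Joining the individual tags yields markup without the extra characters
joined = BeautifulSoup(''.join(str(p) for p in paragraphs), 'html.parser')

print(repr(wrapped.get_text()))
print(repr(joined.get_text()))
```

In both cases `find_all('p')` on the new soup still returns the two paragraphs; only the surrounding text differs.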

Get/Return all text from a `bs4.element.Tag` object, using `<bs4.element.Tag object>.text`, e.g. for an `<article>` tag.

In [None]:
articles = articles.find_all('article')
print(articles[0].text)

Get/Return and remove a specific item from a `bs4.element.ResultSet` using `<result set>.pop(<index>)` (default: last).

In [None]:
articles.pop(0)
print(articles[0])

# Example

Getting a specific page from a website where long lists of items are split across multiple pages.
The URL of a specific page on such a website consists of the main part and a suffix. Typical suffixes are `'&page=<n>'`, `'&searchpage=<n>'`, etc.; Ultimate Classic Rock uses `'&searchpage=<n>'`.

In [41]:
def get_specific_page(start_url: str, page=1):
    """Returns a specific page from a Website where long lists of items are split in multiple pages.
    """

    if page > 1:
        return start_url.split('&searchpage=')[0] + '&searchpage=' + str(page)
    return start_url.split('&searchpage=')[0]

Test `get_specific_page(start_url, page)`.

In [42]:
print(get_specific_page(start_url))
print(get_specific_page(start_url, 2))

https://ultimateclassicrock.com/search/?s=paul%20mccartney
https://ultimateclassicrock.com/search/?s=paul%20mccartney&searchpage=2
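String splitting works as long as the paging parameter is the last one in the URL. As an alternative sketch, the standard `urllib.parse` module can rebuild the query string so the parameter is replaced wherever it appears. The function name `set_page` and its `param` argument are made up for this example.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def set_page(url: str, page: int = 1, param: str = 'searchpage') -> str:
    """Return url with the paging parameter set to page (dropped for page 1)."""
    parts = urlsplit(url)
    # Keep every query parameter except an existing paging parameter
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    if page > 1:
        query.append((param, str(page)))
    # quote_via=quote encodes spaces as %20 rather than '+'
    return urlunsplit(parts._replace(query=urlencode(query, quote_via=quote)))


demo_url = 'https://ultimateclassicrock.com/search/?s=paul%20mccartney&searchpage=5'
print(set_page(demo_url))
print(set_page(demo_url, 2))
```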


Getting the BeautifulSoup object corresponding to a specific page of a website where long lists of items are split across multiple pages.

In [19]:
def get_next_soup(start_url: str, page=1):
    """Returns the BeautifulSoup object corresponding to a specific page
    in case there are multiple pages that list objects of interest.
    Parameters:
    - start_url: the starting page/url of a multi-page list of objects
    - page: the page number of a specific page of a multi-page list of objects
    Essentially, get_next_soup() just returns get_soup(get_specific_page(start_url, page)),
    i.e. converts the result of the call to get_specific_page(start_url, page), which is a string,
    into a BeautifulSoup object.
    """

    return get_soup(get_specific_page(start_url, page))

Test `get_next_soup(start_url: str, page=1)`.

In [20]:
print(get_next_soup(start_url).find('article').text)
print(get_next_soup(start_url, 3).find('article').text)

Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?, by  Martin KieltyOne of the things we discovered is that, if it’s a good story, Paul will go with it....Listen to Paul McCartney Perform ‘Live and Let Die’ Odd Couples: Paul McCartney and Kanye WestView more articles like this
Motley Crue Would Have Fired Paul McCartney, Says John CorabiMotley Crue Would Have Fired Paul McCartney, Says John Corabi, by  Martin KieltyMcCartney....“His words were, ‘I don't give a fuck if Paul McCartney was fronting this band – It's not what they paid for.View more articles like this


Getting the BeautifulSoup object corresponding to a specific page of a website where long lists of items are split across multiple pages - the `selenium` version.

In [43]:
def get_next_soup_selenium(start_url: str, page=1):
    """Returns the BeautifulSoup object corresponding to a specific page
    in case there are multiple pages that list objects of interest, using selenium instead of requests.
    Parameters:
    - start_url: the starting page/url of a multi-page list of objects
    - page: the page number of a specific page of a multi-page list of objects
    Essentially, get_next_soup_selenium() just returns get_soup_selenium(get_specific_page(start_url, page)),
    i.e. converts the result of the call to get_specific_page(start_url, page), which is a string,
    into a BeautifulSoup object.
    """

    return get_soup_selenium(get_specific_page(start_url, page))

Test `get_next_soup_selenium(start_url: str, page=1)`.

In [44]:
print(get_next_soup_selenium(start_url).find('article').text)
print(get_next_soup_selenium(start_url, 3).find('article').text)

Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?December 19, 2022, by  Martin KieltyOne of the things we discovered is that, if it’s a good story, Paul will go with it....Listen to Paul McCartney Perform ‘Live and Let Die’ Odd Couples: Paul McCartney and Kanye WestView more articles like this
Motley Crue Would Have Fired Paul McCartney, Says John CorabiMotley Crue Would Have Fired Paul McCartney, Says John CorabiJune 11, 2022, by  Martin KieltyMcCartney....“His words were, ‘I don't give a fuck if Paul McCartney was fronting this band – It's not what they paid for.View more articles like this


Web crawler that collects info about specific articles from Ultimate Classic Rock, implemented as a Python generator.

In [46]:
def crawl(url: str, max_pages=1):
    """Web crawler that collects info about specific articles from Ultimate Classic Rock,
    implemented as a Python generator that yields BeautifulSoup objects (from get_next_soup_selenium())
    for the pages of a multi-page article list.
    Parameters: the url of the starting page and the max number of pages to crawl in case of multi-page lists.
    """

    for page in range(max_pages):
        yield get_next_soup_selenium(url, page + 1)

Test `crawl(url: str, max_pages=1)`.

In [None]:
next_soup = crawl(start_url, 3)
while True:
    try:
        s = next(next_soup)
        print(s.article.text)
    except StopIteration:
        break
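Because the crawler is a generator, pages are fetched only when the consumer asks for them. The sketch below illustrates this with a stand-in fetch function, so no network or browser is involved; `crawl_pages` and `fake_fetch` are hypothetical names used only here. It also shows `itertools.islice` as a way to take just the first few pages.

```python
from itertools import islice
from typing import Callable, Iterator

calls = []  # records which pages were actually fetched


def fake_fetch(page: int) -> str:
    """Stand-in for a real page download; only remembers that it was called."""
    calls.append(page)
    return f'page {page}'


def crawl_pages(fetch: Callable[[int], str], max_pages: int = 1) -> Iterator[str]:
    """Yield the result of fetch(page) for pages 1..max_pages, one at a time."""
    for page in range(1, max_pages + 1):
        yield fetch(page)


pages = crawl_pages(fake_fetch, max_pages=3)
fetched_before = list(calls)          # creating the generator fetches nothing
first_two = list(islice(pages, 2))    # pulls pages 1 and 2 only

print(fetched_before, first_two, calls)
```

A plain `for page in crawl_pages(...)` loop consumes the generator in the same way as the explicit `next()` / `StopIteration` pattern.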

A crawler that returns structured information about articles related to Paul McCartney from Ultimate Classic Rock:
- article title
- article author
- article date
- featured image (URL)

In [47]:
def get_article_info_list(start_url: str, max_pages=1):
    """
    Returns structured information about articles related to Paul McCartney from a multi-page article list.
    :param start_url: the url of the starting page of a multi-page article list
    :param max_pages: the max number of pages to crawl
    :return: a list of tuples of info-items about the articles from a multi-page article list
    Creates and uses the following data:
    - article title
    - article author
    - article date
    - featured image (URL)
    """

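    article_info_list = []

    # crawl() is a generator, so a for loop consumes it and stops automatically when it is exhausted
    for s in crawl(start_url, max_pages):
        # The last <article> on each page is not a regular article (see the anomaly above), so skip it
        for article in s.find_all('article')[:-1]:
            title = article.a.text
            auth_date = article.find('div', {'class': 'auth-date'})
            # The <em> text looks like ' by  Martin Kielty'; remove the literal 'by' prefix (Python 3.9+).
            # lstrip(' by  ') would instead strip any leading ' ', 'b' or 'y' characters, including those of a name.
            author = auth_date.find('em').text.strip().removeprefix('by').strip()
            date = auth_date.time.text
            image = article.a.attrs['data-image']
            article_info_list.append((title, author, date, image))

    return article_info_list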

In [None]:
print(article, '\n')
print(article.a.text)
print(article.find('div', {'class': 'auth-date'}).find('em').text.strip().removeprefix('by').strip())
print(article.find('div', {'class': 'auth-date'}).time.text)
print(article.a.attrs['data-image'])

Test `get_article_info_list(start_url: str, max_pages=1)`.

In [48]:
a_info = get_article_info_list(start_url, 3)

In [49]:
for a in a_info:
    print(a)

('Did Paul McCartney Stretch the Truth About ‘Live and Let Die’?', 'Martin Kielty', 'December 19, 2022', 'https://townsquare.media/site/295/files/2022/12/attachment-mccartney.jpg')
('Paul McCartney Guitarist Brian Ray Doesn’t Mind All the Waiting', 'Martin Kielty', 'December 3, 2022', 'https://townsquare.media/site/295/files/2022/12/attachment-maccaray.jpg')
('When Paul McCartney Discovered Lennon-McCartney ‘Scam’', 'Martin Kielty', 'November 11, 2022', 'https://townsquare.media/site/295/files/2020/09/lennonmac.jpg')
('Paul McCartney and Elton John Star in Abbey Road Studios Movie', 'Martin Kielty', 'November 17, 2022', 'https://townsquare.media/site/295/files/2022/11/attachment-mccartney1.jpg')
('Slash, Lifeson, McCartney Sign On for Ukraine Benefit Auction', 'Martin Kielty', 'November 12, 2022', 'https://townsquare.media/site/295/files/2022/11/attachment-gibsons.jpg')
('Paul Simon Honored With CBS Tribute Concert', 'Allison Rapp', 'December 16, 2022', 'https://townsquare.media/site/2
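The scraped author and date fields are plain strings, and two small helpers can normalize them. This is a sketch under a few assumptions: the author text has the form `' by  <name>'`, the date has the form `'December 19, 2022'`, month names are parsed in an English locale, and Python 3.9+ is available for `str.removeprefix`. The helper names are made up. Note that `str.lstrip(' by  ')` treats its argument as a set of characters, so it can also remove the first letters of a name.

```python
from datetime import datetime


def clean_author(raw: str) -> str:
    """Remove surrounding whitespace and a literal leading 'by' from the author text."""
    return raw.strip().removeprefix('by').strip()


def to_iso_date(raw: str) -> str:
    """Convert a date like 'December 19, 2022' to '2022-12-19'."""
    return datetime.strptime(raw.strip(), '%B %d, %Y').date().isoformat()


print(clean_author(' by  Martin Kielty'))
print(to_iso_date('December 19, 2022'))

# A made-up lowercase name shows the difference between the two approaches
print(' by  bobby byrd'.lstrip(' by  '))
print(clean_author(' by  bobby byrd'))
```

ISO-formatted dates sort correctly as strings, which can be convenient once the data is in a CSV file or a table.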

Save the info returned by the crawler in a `csv` file.

In [50]:
import csv

articles_csv = DATA_DIR / 'articles.csv'
header = ['Title', 'Author', 'Date', 'Featured image']
with open(articles_csv, 'w', newline='', encoding='utf-8') as f:
    out = csv.writer(f)
    out.writerow(header)
    out.writerows(a_info)
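Titles and dates contain commas, so it is worth confirming that the file reads back into the same fields. The sketch below writes one made-up row to a temporary file and reads it back with `csv.DictReader`; the row content and file location are illustrative only.

```python
import csv
import tempfile
from pathlib import Path

header = ['Title', 'Author', 'Date', 'Featured image']
# Hypothetical row; the title and the date both contain commas
rows = [('Slash, Lifeson, McCartney Sign On', 'Some Author',
         'November 12, 2022', 'https://example.com/image.jpg')]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / 'articles.csv'
    # newline='' lets the csv module control line endings (avoids blank rows on Windows)
    with open(csv_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    with open(csv_path, newline='', encoding='utf-8') as f:
        read_back = list(csv.DictReader(f))

print(read_back[0]['Title'])
print(read_back[0]['Date'])
```

The writer quotes any field that contains a comma, which is why the values survive the round trip unchanged.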

And show the `csv` file in Pandas.

In [51]:
import pandas as pd
articles = pd.read_csv(articles_csv)
articles

Unnamed: 0,Title,Author,Date,Featured image
0,Did Paul McCartney Stretch the Truth About ‘Li...,Martin Kielty,"December 19, 2022",https://townsquare.media/site/295/files/2022/1...
1,Paul McCartney Guitarist Brian Ray Doesn’t Min...,Martin Kielty,"December 3, 2022",https://townsquare.media/site/295/files/2022/1...
2,When Paul McCartney Discovered Lennon-McCartne...,Martin Kielty,"November 11, 2022",https://townsquare.media/site/295/files/2020/0...
3,Paul McCartney and Elton John Star in Abbey Ro...,Martin Kielty,"November 17, 2022",https://townsquare.media/site/295/files/2022/1...
4,"Slash, Lifeson, McCartney Sign On for Ukraine ...",Martin Kielty,"November 12, 2022",https://townsquare.media/site/295/files/2022/1...
5,Paul Simon Honored With CBS Tribute Concert,Allison Rapp,"December 16, 2022",https://townsquare.media/site/295/files/2019/0...
6,Paul Stanley Has a Message for His Critics,Matt Wardlaw,"December 16, 2022",https://townsquare.media/site/295/files/2022/1...
7,Paul McCartney and Chrissie Hynde Join Foo Fig...,Matt Wardlaw,"September 3, 2022",https://townsquare.media/site/295/files/2022/0...
8,Paul Stanley Says Kiss Is ‘Far From Done’ as F...,Bryan Rolli,"December 14, 2022",https://townsquare.media/site/295/files/2022/1...
9,Paul Cook Shares His Big Sex Pistols Regret,Martin Kielty,"November 14, 2022",https://townsquare.media/site/295/files/2021/0...



