# Purpose

[Workout Wednesday](https://workout-wednesday.com/) hosts weekly challenges to improve data visualization skills in one of the following BI tools: Tableau, Power BI, CRM Analytics (Salesforce), and Sigma. Please see the README.md file for more information.

This notebook explores methods to scrape the weekly challenges and store the data in a structured table format.

# Connect to the web page and explore its HTML elements

Given a URL, we can use requests to fetch the web page and BeautifulSoup to parse it. Since the returned data are in HTML format, it's helpful to gain some familiarity with the basics of HTML. [W3Schools](https://www.w3schools.com/) is a good resource for this purpose.

In [None]:
# Install packages needed
%pip install requests
%pip install beautifulsoup4
# lxml is the HTML parser that BeautifulSoup will use
%pip install lxml
%pip install pandas

Note: you may need to restart the kernel to use updated packages.



In [11]:
# Connect to the web page

import requests
from bs4 import BeautifulSoup

# Target URL with all challenge data in Tableau
url = "https://workout-wednesday.com/latest"

# Send GET request
response = requests.get(url)
print("Status Code:", response.status_code)

# Parse HTML with BeautifulSoup
soup = BeautifulSoup(response.text, "lxml")

# Print title of the page
print(soup.title.text)


Status Code: 200
Latest Challenges – Workout Wednesday


Here, `soup` contains all the HTML elements of the target URL.

If you open the target URL in a browser, right-click anywhere on the page, and select Inspect, you'll see them all.

![Inspect HTML elements of a web page](../images/inspect_webpage.png)

Here, I'm looking at a block of elements that make up a challenge card that we see on the web page.

![HTML elements of a card](../images/first_tableau_card.png)

In [None]:
# Here, soup contains all HTML elements of the target url
print(soup)

<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<link href="https://gmpg.org/xfn/11" rel="profile"/>
<title>Latest Challenges – Workout Wednesday</title>
<meta content="max-image-preview:large" name="robots"/>
<style>img:is([sizes="auto" i], [sizes^="auto," i]) { contain-intrinsic-size: 3000px 1500px }</style>
<link href="//fonts.googleapis.com" rel="dns-prefetch"/>
<link href="https://workout-wednesday.com/feed/" rel="alternate" title="Workout Wednesday » Feed" type="application/rss+xml"/>
<link href="https://workout-wednesday.com/comments/feed/" rel="alternate" title="Workout Wednesday » Comments Feed" type="application/rss+xml"/>
<!-- This site uses the Google Analytics by MonsterInsights plugin v8.20.1 - Using Analytics tracking - https://www.monsterinsights.com/ -->
<script async="" data-cfasync="false" data-wpfc-render="false" src="//www.googletagmanager.com/gtag/js?id=G-RGQ7VZ34MC"></script>

In [13]:
# Print title of the web page, including its HTML tag
print(soup.title)

<title>Latest Challenges – Workout Wednesday</title>


In [119]:
# The target url probably has no heading (h1)
print(soup.find('h1').text)

AttributeError: 'NoneType' object has no attribute 'text'
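
The `AttributeError` above arises because `find` returns `None` when no tag matches, and `None` has no `.text`. Below is a minimal sketch of a guarded lookup. It assumes a small made-up HTML snippet (not the real page) and uses the built-in `html.parser` so that it runs without `lxml`; the names `sample_html` and `sample_soup` are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration: it has a <title> but no <h1>
sample_html = "<html><head><title>Latest Challenges</title></head><body><p>Hi</p></body></html>"
sample_soup = BeautifulSoup(sample_html, "html.parser")

h1_tag = sample_soup.find("h1")  # find() returns None when nothing matches
heading = h1_tag.text.strip() if h1_tag else None

print(heading)                 # None
print(sample_soup.title.text)  # Latest Challenges
```

The same `... if tag else None` pattern is used later in this notebook when a card is missing a badge or a date.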

In [None]:
# Get all article links on the target page
posts = soup.find_all("article")

print(f"Found {len(posts)} posts")

for post in posts:
    title_tag = post.find("h2", class_="entry-title")
    link = title_tag.find("a")["href"] if title_tag else "No link"
    title = title_tag.text.strip() if title_tag else "No title"
    print(f"{title} -> {link}")


Found 201 posts
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No title -> No link
No t

In [9]:
# Print a collection of all articles
print(posts)


[<article class="post-1599 page type-page status-publish has-post-thumbnail ast-article-single" id="post-1599" itemscope="itemscope" itemtype="https://schema.org/CreativeWork">
<header class="entry-header ast-header-without-markup">
</header><!-- .entry-header -->
<div class="entry-content clear" itemprop="text">
<div class="elementor elementor-1599" data-elementor-id="1599" data-elementor-post-type="page" data-elementor-type="wp-page">
<section class="elementor-section elementor-top-section elementor-element elementor-element-1248366 elementor-section-boxed elementor-section-height-default elementor-section-height-default" data-element_type="section" data-id="1248366" data-settings='{"shape_divider_bottom":"mountains","background_background":"gradient"}'>
<div class="elementor-background-overlay"></div>
<div class="elementor-shape elementor-shape-bottom" data-negative="false">
<svg preserveaspectratio="none" viewbox="0 0 1000 100" xmlns="http://www.w3.org/2000/svg">
<path class="eleme

# Extract data for the first Tableau challenge card

Here, we extract data for the first challenge card. Since everything on a web page is just HTML elements underneath, the key is to grab the HTML elements that represent the first card. The picture below suggests that the tag we're looking for is *article*.

![HTML elements of a card](../images/first_tableau_card.png)

Furthermore, we'll focus on extracting the following data from a card:

![Info to be extracted](../images/data_to_be_extracted.png)

In [None]:
# Fetch data from the target url
import requests
from bs4 import BeautifulSoup

url = "https://workout-wednesday.com/latest/"
response = requests.get(url)
soup = BeautifulSoup(response.text, "lxml")


In [None]:
# Get info from all cards by grabbing the "article" tag
challenge_cards = soup.find_all("article", class_="elementor-post")
print(f"Found {len(challenge_cards)} challenges")


Found 200 challenges


In [None]:
# Pick the first card
card = challenge_cards[0]

In [20]:
card

<article class="elementor-post elementor-grid-item post-19177 post type-post status-publish format-standard has-post-thumbnail hentry category-tableau category-ww tag-containers tag-dynamic-zone-visibility">
<div class="elementor-post__card">
<a class="elementor-post__thumbnail__link" href="https://workout-wednesday.com/2025w16tab/" tabindex="-1"><div class="elementor-post__thumbnail"><img alt="" class="attachment-full size-full wp-image-19179" decoding="async" height="912" loading="lazy" src="https://workout-wednesday.com/wp-content/uploads/2025/04/WOW2025-Wk16.gif" width="1208"/></div></a>
<div class="elementor-post__badge">containers</div>
<div class="elementor-post__avatar">
<img alt="Kyle Yetter" class="avatar avatar-128 photo" height="128" src="https://secure.gravatar.com/avatar/c9562860300ed35d8f59195de500e49743ee87a839cc76c6de3c2577dc440c37?s=128&amp;d=mm&amp;r=g" srcset="https://secure.gravatar.com/avatar/c9562860300ed35d8f59195de500e49743ee87a839cc76c6de3c2577dc440c37?s=256&a

In [21]:
type(card)

bs4.element.Tag

In [22]:
card.get("class")

['elementor-post',
 'elementor-grid-item',
 'post-19177',
 'post',
 'type-post',
 'status-publish',
 'format-standard',
 'has-post-thumbnail',
 'hentry',
 'category-tableau',
 'category-ww',
 'tag-containers',
 'tag-dynamic-zone-visibility']

In [26]:
card.get("a")
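
The cell above prints nothing because `get` looks up an *attribute* of the tag, and `<article>` has no attribute named `a`, so the result is `None`. To reach a nested `<a>` tag, `find` is the call to use. A small sketch of the difference, assuming trimmed-down, made-up card markup (`sample_card` is not the real card):

```python
from bs4 import BeautifulSoup

# Hypothetical card markup, trimmed down for illustration
sample_html = '<article class="elementor-post"><a href="https://example.com/post/">Post</a></article>'
sample_card = BeautifulSoup(sample_html, "html.parser").find("article")

print(sample_card.get("a"))               # None: <article> has no attribute named "a"
print(sample_card.get("class"))           # ['elementor-post']
print(sample_card.find("a").get("href"))  # https://example.com/post/
```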

In [None]:
# Extract challenge type
classes = card.get("class")  # the "class" attribute holds the card's list of CSS classes

challenge_type = next(
    # Take the first class that starts with "category-" and drop that prefix
    (c.replace("category-", "") for c in classes if c.startswith("category-")),
    None
)

print("Challenge type:", challenge_type)


Challenge type: tableau
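
Note that `next(...)` returns only the first match, while this card carries two `category-` classes and two `tag-` classes. As a rough sketch, the same prefix idea can collect all of them. The class list is copied from the `card.get("class")` output above; `strip_prefix` is a hypothetical helper, not part of BeautifulSoup.

```python
# Class list copied from the card.get("class") output above
classes = [
    "elementor-post", "elementor-grid-item", "post-19177", "post", "type-post",
    "status-publish", "format-standard", "has-post-thumbnail", "hentry",
    "category-tableau", "category-ww", "tag-containers", "tag-dynamic-zone-visibility",
]

def strip_prefix(values, prefix):
    """Hypothetical helper: keep values starting with prefix and drop the prefix."""
    return [v[len(prefix):] for v in values if v.startswith(prefix)]

categories = strip_prefix(classes, "category-")
tags = strip_prefix(classes, "tag-")

print(categories)  # ['tableau', 'ww']
print(tags)        # ['containers', 'dynamic-zone-visibility']
```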


In [None]:
# Find the title of the card
title_tag = card.find("h3", class_="elementor-post__title")  # the <h3> with this class wraps the title link


In [32]:
title_tag

<h3 class="elementor-post__title">
<a href="https://workout-wednesday.com/2025w16tab/">
				#WOW2025 Week 16 | Can you use Containers and Dynamic Zone Visibility?			</a>
</h3>

In [42]:
title_tag.text

'\n\n\t\t\t\t#WOW2025 Week 16 | Can you use Containers and Dynamic Zone Visibility?\t\t\t\n'
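
The raw text is padded with newlines and tabs from the page's indentation. A short sketch of two ways to clean it with plain string methods, using the string copied from the output above:

```python
# Raw text copied from the title_tag.text output above
raw_title = "\n\n\t\t\t\t#WOW2025 Week 16 | Can you use Containers and Dynamic Zone Visibility?\t\t\t\n"

# strip() removes leading and trailing whitespace only
clean_title = raw_title.strip()

# split() + join() also collapses any runs of whitespace inside the text
normalized_title = " ".join(raw_title.split())

print(clean_title)
```

For this title both approaches give the same result; the second one is only worth the extra step if a title may contain stray line breaks in the middle.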

In [40]:
# The href attribute of the nested <a> tag comes back as a plain string
type(title_tag.find("a")["href"])

str

In [None]:
# Find the posted date of the card
date_tag = card.find("span", class_="elementor-post-date")

In [44]:
date_tag

<span class="elementor-post-date">
			April 17, 2025		</span>

In [46]:
date_tag.text.strip()

'April 17, 2025'
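
The date is still just text at this point. If a real date is needed later (for sorting or filtering), one possible approach is `datetime.strptime`. This is a sketch, not something the notebook does: `parse_post_date` is a hypothetical helper, and `%B` assumes English month names, which is the default behaviour unless the Python locale has been changed.

```python
from datetime import date, datetime
from typing import Optional

def parse_post_date(raw: Optional[str]) -> Optional[date]:
    """Hypothetical helper: parse text like 'April 17, 2025' into a date."""
    if not raw:
        return None
    # %B = full month name, %d = day of month, %Y = four-digit year
    return datetime.strptime(raw.strip(), "%B %d, %Y").date()

print(parse_post_date("April 17, 2025"))  # 2025-04-17
print(parse_post_date(None))              # None
```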

In [None]:
# Find the contributor name and their profile picture
avatar_tag = card.find("img", class_="avatar")

In [50]:
avatar_tag

<img alt="Kyle Yetter" class="avatar avatar-128 photo" height="128" src="https://secure.gravatar.com/avatar/c9562860300ed35d8f59195de500e49743ee87a839cc76c6de3c2577dc440c37?s=128&amp;d=mm&amp;r=g" srcset="https://secure.gravatar.com/avatar/c9562860300ed35d8f59195de500e49743ee87a839cc76c6de3c2577dc440c37?s=256&amp;d=mm&amp;r=g 2x" width="128"/>

In [53]:
avatar_tag.get("alt")

'Kyle Yetter'

In [54]:
avatar_tag.get("src")

'https://secure.gravatar.com/avatar/c9562860300ed35d8f59195de500e49743ee87a839cc76c6de3c2577dc440c37?s=128&d=mm&r=g'

**As practice, let's find all the information again. But this time, assign the extracted information to variables so that, in the end, we can wrap them up inside a dictionary.**

In [None]:
# Find the skills that the challenge focuses on
challenge_tag = card.find("div", class_="elementor-post__badge")
challenge_skills = challenge_tag.text.strip()

In [74]:
challenge_skills

'containers'

In [71]:
avatar_tag = card.find("img", class_="avatar")
contributor_name = avatar_tag.get("alt")
contributor_avatar = avatar_tag.get("src")

In [72]:
print(contributor_name)
print(contributor_avatar)

Kyle Yetter
https://secure.gravatar.com/avatar/c9562860300ed35d8f59195de500e49743ee87a839cc76c6de3c2577dc440c37?s=128&d=mm&r=g


In [77]:
title_tag = card.find("h3", class_="elementor-post__title")
challenge_name = title_tag.find("a").text.strip()
challenge_link = title_tag.find("a").get("href")

In [78]:
print(challenge_name)
print(challenge_link)

#WOW2025 Week 16 | Can you use Containers and Dynamic Zone Visibility?
https://workout-wednesday.com/2025w16tab/


In [83]:
date_tag = card.find("span", class_="elementor-post-date")
challenge_date = date_tag.text.strip() if date_tag else None

In [84]:
print(challenge_date)

April 17, 2025


In [None]:
# Dictionary containing extracted information
challenge_data = {
    "challenge_type": "tableau", # hard code this value since we know that we're extracting information about Tableau challenges
    "challenge_skills": challenge_skills,
    "contributor_name": contributor_name,
    "contributor_avatar": contributor_avatar,
    "challenge_name": challenge_name,
    "challenge_link": challenge_link,
    "challenge_date": challenge_date,
}

print(challenge_data)


{'challenge_type': 'tableau', 'challenge_skills': 'containers', 'contributor_name': 'Kyle Yetter', 'contributor_avatar': 'https://secure.gravatar.com/avatar/c9562860300ed35d8f59195de500e49743ee87a839cc76c6de3c2577dc440c37?s=128&d=mm&r=g', 'challenge_name': '#WOW2025 Week 16 | Can you use Containers and Dynamic Zone Visibility?', 'challenge_link': 'https://workout-wednesday.com/2025w16tab/', 'challenge_date': 'April 17, 2025'}


**As practice, let's extract information from the second card.**

In [87]:
# Get the second card
card = challenge_cards[1]

In [88]:
challenge_type = "tableau"

In [None]:
# Notice that this time, the challenge didn't mention any skills so we set the variable to None
challenge_tag = card.find("div", class_="elementor-post__badge")
challenge_skills = challenge_tag.text.strip() if challenge_tag else None

print(challenge_skills)

None


In [90]:
avatar_tag = card.find("img", class_="avatar")
contributor_name = avatar_tag.get("alt")
contributor_avatar = avatar_tag.get("src")

print(contributor_name)
print(contributor_avatar)

Erica Hughes
https://secure.gravatar.com/avatar/be165cc10d8a1d1b33dee8d16abe513469d84ad7b417826b77f0d4360781695e?s=128&d=mm&r=g


In [91]:
title_tag = card.find("h3", class_="elementor-post__title")
challenge_name = title_tag.find("a").text.strip()
challenge_link = title_tag.find("a").get("href")

print(challenge_name)
print(challenge_link)

#WOW2025 | Week 15 | Community Month: Anna Clara Gatti
https://workout-wednesday.com/2025w15tab/


In [93]:
date_tag = card.find("span", class_="elementor-post-date")
challenge_date = date_tag.text.strip() if date_tag else None

print(challenge_date)

April 8, 2025


In [96]:
challenge_data = {
    "challenge_type": challenge_type,
    "challenge_skills": challenge_skills,
    "contributor_name": contributor_name,
    "contributor_avatar": contributor_avatar,
    "challenge_name": challenge_name,
    "challenge_link": challenge_link,
    "challenge_date": challenge_date,
}

print(challenge_data)

{'challenge_type': 'tableau', 'challenge_skills': None, 'contributor_name': 'Erica Hughes', 'contributor_avatar': 'https://secure.gravatar.com/avatar/be165cc10d8a1d1b33dee8d16abe513469d84ad7b417826b77f0d4360781695e?s=128&d=mm&r=g', 'challenge_name': '#WOW2025 | Week 15 | Community Month: Anna Clara Gatti', 'challenge_link': 'https://workout-wednesday.com/2025w15tab/', 'challenge_date': 'April 8, 2025'}


# Extract information for all cards at the target url

Now, we wrap up all the code inside a for loop to extract information from all the cards at the target url.

In [97]:
challenge_cards = soup.find_all("article", class_="elementor-post")
print(f"Found {len(challenge_cards)} challenges")

Found 200 challenges


In [103]:
challenge_data = {
    "challenge_type": [],
    "challenge_skills": [],
    "contributor_name": [],
    "contributor_avatar": [],
    "challenge_name": [],
    "challenge_link": [],
    "challenge_date": [],
}

for card in challenge_cards:
    try:
        challenge_type = "tableau"
        challenge_data["challenge_type"].append(challenge_type)

        challenge_tag = card.find("div", class_="elementor-post__badge")
        challenge_skills = challenge_tag.text.strip().lower() if challenge_tag else None
        challenge_data["challenge_skills"].append(challenge_skills)

        avatar_tag = card.find("img", class_="avatar")
        contributor_name = avatar_tag.get("alt") if avatar_tag else None
        contributor_avatar = avatar_tag.get("src") if avatar_tag else None
        challenge_data["contributor_name"].append(contributor_name)
        challenge_data["contributor_avatar"].append(contributor_avatar)

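        title_tag = card.find("h3", class_="elementor-post__title")
        # Guard against a missing title so every list grows by exactly one entry per card
        link_tag = title_tag.find("a") if title_tag else None
        challenge_name = link_tag.text.strip() if link_tag else None
        challenge_link = link_tag.get("href") if link_tag else None
        challenge_data["challenge_name"].append(challenge_name)
        challenge_data["challenge_link"].append(challenge_link)

        date_tag = card.find("span", class_="elementor-post-date")
        challenge_date = date_tag.text.strip() if date_tag else None
        challenge_data["challenge_date"].append(challenge_date)
    except Exception as e:
        # Catch Exception rather than using a bare except, and show which card failed
        print("Error while parsing a card:", e)
        print(card)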

In [None]:
# Convert into a dataframe
import pandas as pd

df = pd.DataFrame(challenge_data)
df.head()
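
The dictionary-of-lists layout only converts to a DataFrame if every list has the same length. An alternative that sidesteps this, sketched below under the assumption that one dictionary is built per card, is to collect a list of row dictionaries. The two rows are copied from the cards extracted earlier; `rows` and `sample_df` are illustrative names.

```python
import pandas as pd

# Two rows copied from the cards extracted earlier in this notebook
rows = [
    {"challenge_name": "#WOW2025 Week 16 | Can you use Containers and Dynamic Zone Visibility?",
     "challenge_skills": "containers", "challenge_date": "April 17, 2025"},
    {"challenge_name": "#WOW2025 | Week 15 | Community Month: Anna Clara Gatti",
     "challenge_skills": None, "challenge_date": "April 8, 2025"},
]

# A list of dicts becomes one row per dict, so a card is either fully recorded or not at all
sample_df = pd.DataFrame(rows)

print(sample_df.shape)                             # (2, 3)
print(sample_df["challenge_skills"].isna().sum())  # 1
```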

# Wrap up everything inside a function

We're reaching the end of this notebook. At this point, it's worth wrapping everything inside a function so we can reuse the code. The *scrape_and_save_challenges* function lets users specify a challenge track (e.g., Tableau, Power BI, etc.), scrapes all challenges from that track, and saves the data as a CSV file.

Our function also needs to handle **pagination**, that is, a track whose challenges span multiple web pages. To this end, the *while* loop follows the "Next" link from page to page, and on each page the *for* loop extracts information from every card.
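
To make the pagination loop concrete, here is a minimal offline sketch. Everything in it is an assumption for illustration: `PAGES` is a made-up two-page site held in memory, `fetch` stands in for `requests.get(url).text`, and the "Next" link is located with the CSS selector `a.page-numbers.next` via `select_one`, which is a swapped-in alternative to the `find` call used in the function.

```python
from bs4 import BeautifulSoup

# Hypothetical two-page site kept in memory so the sketch runs offline
PAGES = {
    "/latest/": '<article class="elementor-post">A</article>'
                '<a class="page-numbers next" href="/latest/2/">Next</a>',
    "/latest/2/": '<article class="elementor-post">B</article>',
}

def fetch(page_url):
    """Stand-in for requests.get(page_url).text."""
    return PAGES[page_url]

page_url = "/latest/"
visited = []
cards_found = 0
while page_url:
    page_soup = BeautifulSoup(fetch(page_url), "html.parser")
    visited.append(page_url)
    cards_found += len(page_soup.find_all("article", class_="elementor-post"))
    # select_one matches on both classes regardless of their order in the attribute
    next_link = page_soup.select_one("a.page-numbers.next")
    page_url = next_link.get("href") if next_link else None

print(visited, cards_found)  # ['/latest/', '/latest/2/'] 2
```

The loop ends naturally on the last page, where no "Next" link exists and `page_url` becomes `None`.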

Finally, the scraped data are stored in a folder named after the date on which the function is run, so runs on the same day overwrite each other while runs on different days are kept side by side. This ensures we always have a good copy of the scraped data. For instance, if the site admin breaks page 2 of the Tableau challenge track today, we can still roll back to the good data we scraped earlier. In a way, this is similar to data versioning.
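
The dated-folder idea can be tried in isolation. The sketch below assumes a hypothetical helper `dated_folder` and uses a temporary directory in place of `../data/raw`, so nothing is written to the project; the two dates are arbitrary examples.

```python
from datetime import date
from pathlib import Path
import tempfile

def dated_folder(root: Path, run_date: date) -> Path:
    """Hypothetical helper: create and return root/YYYY-MM-DD."""
    folder = root / run_date.isoformat()
    folder.mkdir(parents=True, exist_ok=True)  # safe to call again on the same day
    return folder

# A temporary directory stands in for ../data/raw
with tempfile.TemporaryDirectory() as tmp:
    first = dated_folder(Path(tmp), date(2025, 4, 22))
    second = dated_folder(Path(tmp), date(2025, 4, 23))
    (first / "tableau_challenges.csv").write_text("challenge_name\n")
    snapshots = sorted(p.name for p in Path(tmp).iterdir())

print(snapshots)  # ['2025-04-22', '2025-04-23']
```

Because ISO dates sort alphabetically in chronological order, the most recent snapshot is simply the last folder name in the sorted list.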

In [116]:
from pathlib import Path
from datetime import date
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time

def scrape_and_save_challenges(challenge_track):
    base_url = "https://workout-wednesday.com"
    if challenge_track == "tableau":
        url = base_url + "/latest/"
    elif challenge_track == "power-bi":
        url = base_url + "/power-bi-challenges/"
    elif challenge_track == "crm-analytics":
        url = base_url + "/crm-analytics-challenges/"
    elif challenge_track == "sigma":
        url = base_url + "/sigma-challenges/"
    else:
        raise ValueError(f"Unsupported challenge track: {challenge_track}")

    # Prepare to collect all data
    challenge_data = {
        "challenge_track": [],
        "challenge_name": [],
        "challenge_link": [],
        "post_date": [],
        "contributor_name": [],
        "contributor_avatar": [],
        "challenge_skills": [],
    }

    while url:
        print(f"Scraping page: {url}")
        time.sleep(1)
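        response = requests.get(url, timeout=30)  # avoid hanging forever on a stalled connection
        response.raise_for_status()  # stop early on a 4xx/5xx response instead of parsing an error page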
        soup = BeautifulSoup(response.text, "lxml")

        # Get all cards on the page
        cards = soup.find_all("article", class_="elementor-post")

        for card in cards:
            try:
                challenge_data["challenge_track"].append(challenge_track)

                title_tag = card.find("h3", class_="elementor-post__title")
                link_tag = title_tag.find("a") if title_tag else None
                challenge_name = link_tag.text.strip() if link_tag else None
                challenge_link = link_tag.get("href") if link_tag else None
                challenge_data["challenge_name"].append(challenge_name)
                challenge_data["challenge_link"].append(challenge_link)

                date_tag = card.find("span", class_="elementor-post-date")
                post_date = date_tag.text.strip() if date_tag else None
                challenge_data["post_date"].append(post_date)

                avatar_tag = card.find("img", class_="avatar")
                contributor_name = avatar_tag.get("alt") if avatar_tag else None
                contributor_avatar = avatar_tag.get("src") if avatar_tag else None
                challenge_data["contributor_avatar"].append(contributor_avatar)
                challenge_data["contributor_name"].append(contributor_name)

                badge = card.find("div", class_="elementor-post__badge")
                challenge_skill = badge.text.strip().lower() if badge else None
                challenge_data["challenge_skills"].append(challenge_skill)
        
            except Exception as e:
                print("Error while parsing a card:", e)
                continue
        print(f"🧾 Found {len(cards)} cards on this page.") 
        
        # Find the "Next" page link
        next_button = soup.find("a", class_="page-numbers next")
        url = next_button.get("href") if next_button else None

    # Save the results to a dated folder
    today = date.today().isoformat()
    folder_path = Path("..") / "data" / "raw" / today
    folder_path.mkdir(parents=True, exist_ok=True)

    df = pd.DataFrame(challenge_data)
    file_path = folder_path / f"{challenge_track}_challenges.csv"
    df.to_csv(file_path, index=False)

    print(f"✅ Scraped {len(df)} challenges total.")
    print(f"📁 Saved to: {file_path}")

    return df

df = scrape_and_save_challenges(challenge_track="tableau")
df = scrape_and_save_challenges(challenge_track="power-bi")
df = scrape_and_save_challenges(challenge_track="crm-analytics")
df = scrape_and_save_challenges(challenge_track="sigma")
df.head()

Scraping page: https://workout-wednesday.com/latest/
🧾 Found 200 cards on this page.
Scraping page: https://workout-wednesday.com/latest/2/
🧾 Found 200 cards on this page.
Scraping page: https://workout-wednesday.com/latest/3/
🧾 Found 34 cards on this page.
✅ Scraped 434 challenges total.
📁 Saved to: ..\data\raw\2025-04-22\tableau_challenges.csv
Scraping page: https://workout-wednesday.com/power-bi-challenges/
🧾 Found 200 cards on this page.
Scraping page: https://workout-wednesday.com/power-bi-challenges/2/
🧾 Found 19 cards on this page.
✅ Scraped 219 challenges total.
📁 Saved to: ..\data\raw\2025-04-22\power-bi_challenges.csv
Scraping page: https://workout-wednesday.com/crm-analytics-challenges/
🧾 Found 58 cards on this page.
✅ Scraped 58 challenges total.
📁 Saved to: ..\data\raw\2025-04-22\crm-analytics_challenges.csv
Scraping page: https://workout-wednesday.com/sigma-challenges/
🧾 Found 68 cards on this page.
✅ Scraped 68 challenges total.
📁 Saved to: ..\data\raw\2025-04-22\sigma_c

Unnamed: 0,challenge_track,challenge_name,challenge_link,post_date,contributor_name,contributor_avatar,challenge_skills
0,sigma,2025 Week 16 | Sigma: Can you update and delete?,https://workout-wednesday.com/2025-week-16-sig...,"April 16, 2025",Ashley Bennett,https://secure.gravatar.com/avatar/f706b14658f...,input tables
1,sigma,2025 Week 15 | Sigma: Can You Override Elegantly?,https://workout-wednesday.com/2025-week-15-sig...,"April 9, 2025",Eric Heidbreder,https://secure.gravatar.com/avatar/f6ae8fdb3d4...,input tables
2,sigma,2025 Week 14 | Sigma: Can you Create a Small M...,https://workout-wednesday.com/2025-week-14-sig...,"April 3, 2025",Carter Voekel,https://secure.gravatar.com/avatar/31c23237708...,design
3,sigma,2025 Week 12 | Sigma: Can you save space?,https://workout-wednesday.com/2025-week-12-sig...,"March 19, 2025",Ashley Bennett,https://secure.gravatar.com/avatar/f706b14658f...,modals
4,sigma,2025 Week 11 | Sigma: Can you use a legend?,https://workout-wednesday.com/2025-week-11-sig...,"March 12, 2025",Eric Heidbreder,https://secure.gravatar.com/avatar/f6ae8fdb3d4...,legend


# Conclusion

In this notebook, we explored methods to scrape information about data visualization challenges on Workout Wednesday. In subsequent notebooks, we'll discuss how to prep this data for various uses. Even though we haven't scraped the details of each challenge (e.g., description, solutions, comments, etc.), this dataset is already very useful. For example, one can figure out who's the most prolific contributor for the Tableau challenge track, or who has dropped out of the race. A nice follow-up application is to send out an automatic email inviting them to contribute to the community again 🚀🚀🚀
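
As a rough sketch of that kind of analysis, the snippet below counts posts per contributor and finds each contributor's most recent post. It uses only the five Sigma rows shown in the output above rather than the full CSV, so the numbers are illustrative; `sample`, `posts_per_contributor`, and `last_post` are made-up names.

```python
from datetime import datetime
import pandas as pd

# Five rows copied from the Sigma output above (not the full dataset)
sample = pd.DataFrame({
    "contributor_name": ["Ashley Bennett", "Eric Heidbreder", "Carter Voekel",
                         "Ashley Bennett", "Eric Heidbreder"],
    "post_date": ["April 16, 2025", "April 9, 2025", "April 3, 2025",
                  "March 19, 2025", "March 12, 2025"],
})

# Convert the text dates so they can be compared chronologically
sample["post_date"] = sample["post_date"].map(lambda s: datetime.strptime(s, "%B %d, %Y"))

posts_per_contributor = sample["contributor_name"].value_counts()
last_post = sample.groupby("contributor_name")["post_date"].max()

print(posts_per_contributor["Ashley Bennett"])  # 2
print(last_post["Carter Voekel"].date())        # 2025-04-03
```

Contributors whose `last_post` is far in the past would be the natural candidates for the follow-up email.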