# Purpose

[Workout Wednesday](https://workout-wednesday.com/) is a platform that hosts weekly data visualization challenges across various BI tools including Tableau, Power BI, CRM Analytics (Salesforce), and Sigma.

This notebook demonstrates methods to scrape these weekly challenges and structure the extracted data into a tabular format for further analysis. For more context about the project setup and goals, please refer to the `README.md` file.


## Connect to the Web Page

Given a URL, we can use `requests` to fetch the page and BeautifulSoup to parse its content. Since this content is returned as HTML, it's helpful to be familiar with HTML basics. [W3Schools](https://www.w3schools.com/) is a great resource for learning HTML structure and elements.

In [1]:
# Install packages needed
# %pip install requests 
# %pip install beautifulsoup4 
# %pip install lxml # parser for HTML elements
# %pip install pandas

In [2]:
# Connect to the web page

import requests
from bs4 import BeautifulSoup

# Target URL listing the Tableau challenges
url = "https://workout-wednesday.com/latest"

# Send GET request
response = requests.get(url)
print("Status Code:", response.status_code)

# Parse HTML with BeautifulSoup
soup = BeautifulSoup(response.text, "lxml")

# Print title of the page
print(soup.title.text)


Status Code: 200
Latest Challenges – Workout Wednesday
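
The request above assumes the server answers quickly and successfully. A slightly more defensive variant is sketched here. `fetch_soup` and its `session` parameter are hypothetical names introduced for illustration, and the built-in `html.parser` stands in for `lxml` only to keep the example free of an extra dependency.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, session=None, timeout=10):
    """Fetch a page and return it parsed (hypothetical helper, not part of the notebook)."""
    # Accept a requests.Session (or a test double); fall back to the requests module
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return BeautifulSoup(response.text, "html.parser")


# Usage (requires network access):
# soup = fetch_soup("https://workout-wednesday.com/latest/")
# print(soup.title.text)
```

Passing the HTTP client in as a parameter also makes the helper easy to test without a network connection.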


Here, the `soup` object contains all the HTML elements from the target URL.

To inspect these elements manually:
- Visit the URL
- Right-click on the page
- Choose **Inspect**

![Inspect HTML elements of a web page](../images/inspect_webpage.png)

Below is an example of one full challenge "card" shown in HTML:

![HTML elements of a card](../images/first_tableau_card.png)

**Note:** New challenges are published every Wednesday, so your output may differ from what's shown here.


In [3]:
# soup contains every HTML element of the target URL
# It's too long to show in full, so we print only the first 1000 characters
print(soup.prettify()[:1000])

<!DOCTYPE html>
<html lang="en-US">
 <head>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <link href="https://gmpg.org/xfn/11" rel="profile"/>
  <title>
   Latest Challenges – Workout Wednesday
  </title>
  <meta content="max-image-preview:large" name="robots"/>
  <style>
   img:is([sizes="auto" i], [sizes^="auto," i]) { contain-intrinsic-size: 3000px 1500px }
  </style>
  <link href="//fonts.googleapis.com" rel="dns-prefetch"/>
  <link href="https://workout-wednesday.com/feed/" rel="alternate" title="Workout Wednesday » Feed" type="application/rss+xml"/>
  <link href="https://workout-wednesday.com/comments/feed/" rel="alternate" title="Workout Wednesday » Comments Feed" type="application/rss+xml"/>
  <!-- This site uses the Google Analytics by MonsterInsights plugin v8.20.1 - Using Analytics tracking - https://www.monsterinsights.com/ -->
  <script async="" data-cfasync="false" data-wpfc-render="false" src="//www.googletagmanager.c

In [4]:
# Print title of the web page, including its HTML tag
print(soup.title)

<title>Latest Challenges – Workout Wednesday</title>


In [5]:
# The target URL has no <h1>, so soup.find("h1") returns None.
# Guard against None before reading .text to avoid an AttributeError.
h1 = soup.find("h1")
print(h1.text if h1 else None)

None

In [6]:
# Get all article links on the target page
posts = soup.find_all("article")

print(f"Found {len(posts)} posts")

for post in posts:
    title_tag = post.find("h2", class_="entry-title")
    link = title_tag.find("a")["href"] if title_tag else "No link"
    title = title_tag.text.strip() if title_tag else "No title"
    # print(f"{title} -> {link}")
    # Every post prints "No title -> No link": none of them has an <h2 class="entry-title">


Found 201 posts


In [7]:
# Print the first 1000 characters of the list of articles
print(str(posts)[:1000])


[<article class="post-3169 page type-page status-publish ast-article-single" id="post-3169" itemscope="itemscope" itemtype="https://schema.org/CreativeWork">
<header class="entry-header ast-header-without-markup">
</header><!-- .entry-header -->
<div class="entry-content clear" itemprop="text">
<div class="elementor elementor-3169" data-elementor-id="3169" data-elementor-post-type="page" data-elementor-type="wp-page">
<section class="elementor-section elementor-top-section elementor-element elementor-element-a681edc elementor-section-boxed elementor-section-height-default elementor-section-height-default" data-element_type="section" data-id="a681edc">
<div class="elementor-container elementor-column-gap-default">
<div class="elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-e250497" data-element_type="column" data-id="e250497">
<div class="elementor-widget-wrap elementor-element-populated">
<div class="elementor-element elementor-element-5ac3c8

# Extract Info from One Card

Every visible component on the web page is represented by underlying HTML tags. To extract data from a challenge card, we locate its surrounding tag — in this case, it's the `<article>` element.

![HTML elements of a card](../images/first_tableau_card.png)

We'll focus on extracting the following fields from each challenge card:

![Info to be extracted](../images/data_to_be_extracted.png)


In [8]:
# Fetch data from the target url
import requests
from bs4 import BeautifulSoup

url = "https://workout-wednesday.com/latest/"
response = requests.get(url)
soup = BeautifulSoup(response.text, "lxml")


In [9]:
# Get info from all cards by grabbing the "article" tag
challenge_cards = soup.find_all("article", class_="elementor-post")
print(f"Found {len(challenge_cards)} challenges")


Found 200 challenges
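
The earlier unfiltered search found 201 `<article>` elements, one more than here. Judging from the first element printed above (an article of type `page`), the extra match is most likely a page-level wrapper rather than a challenge card. The sketch below reproduces the effect on a tiny, made-up HTML snippet; the markup is an assumption for illustration, not copied from the site.

```python
from bs4 import BeautifulSoup

# Made-up markup: one page-level <article> wrapping two challenge cards
html = """
<article class="post-1 page type-page">
  <article class="elementor-post elementor-grid-item category-tableau">A</article>
  <article class="elementor-post elementor-grid-item category-tableau">B</article>
</article>
"""
demo = BeautifulSoup(html, "html.parser")

all_articles = demo.find_all("article")                          # wrapper + cards
cards_only = demo.find_all("article", class_="elementor-post")   # cards only

print(len(all_articles), len(cards_only))
```

`class_` matches an element when any one of its classes equals the given value, which is why a single class name is enough to isolate the cards.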


In [10]:
# Pick the first card
card = challenge_cards[0]

In [11]:
print(card.prettify()[:1000])

<article class="elementor-post elementor-grid-item post-19209 post type-post status-publish format-standard has-post-thumbnail hentry category-tableau category-ww tag-parameters">
 <div class="elementor-post__card">
  <a class="elementor-post__thumbnail__link" href="https://workout-wednesday.com/2025w17tab/" tabindex="-1">
   <div class="elementor-post__thumbnail">
    <img alt="" class="attachment-full size-full wp-image-19215" decoding="async" height="832" loading="lazy" sizes="(max-width: 893px) 100vw, 893px" src="https://workout-wednesday.com/wp-content/uploads/2025/04/WOW2025_17.png" srcset="https://workout-wednesday.com/wp-content/uploads/2025/04/WOW2025_17.png 893w, https://workout-wednesday.com/wp-content/uploads/2025/04/WOW2025_17-300x280.png 300w, https://workout-wednesday.com/wp-content/uploads/2025/04/WOW2025_17-768x716.png 768w" width="893"/>
   </div>
  </a>
  <div class="elementor-post__badge">
   Parameters
  </div>
  <div class="elementor-post__avatar">
   <img alt="Do

In [12]:
card.get("class")

['elementor-post',
 'elementor-grid-item',
 'post-19209',
 'post',
 'type-post',
 'status-publish',
 'format-standard',
 'has-post-thumbnail',
 'hentry',
 'category-tableau',
 'category-ww',
 'tag-parameters']

In [13]:
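# .get() reads an attribute of the <article> tag itself, not a child element,
# so this returns None. Use card.find("a") to locate a child <a> tag.
card.get("a")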

In [14]:
# Extract the challenge type from the card's class list
classes = card.get("class")

# Take the first class that starts with "category-" and strip the prefix
challenge_type = next(
    (c.replace("category-", "") for c in classes if c.startswith("category-")),
    None
)

print("Challenge type:", challenge_type)


Challenge type: tableau
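
`next(...)` returns only the first matching class. The class list printed earlier contains two `category-` entries (`category-tableau` and `category-ww`), so a variant that keeps all of them may be useful. This is a sketch: `card_classes` is copied from the output above, and `categories` is a name introduced here.

```python
# Class list as printed for the first card
card_classes = [
    "elementor-post", "elementor-grid-item", "post-19209", "post",
    "type-post", "status-publish", "format-standard", "has-post-thumbnail",
    "hentry", "category-tableau", "category-ww", "tag-parameters",
]

prefix = "category-"
# Keep every class that starts with the prefix, with the prefix removed
categories = [c[len(prefix):] for c in card_classes if c.startswith(prefix)]

print(categories)  # ['tableau', 'ww']
```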


In [15]:
# Find the title of the card
title_tag = card.find("h3", class_="elementor-post__title")


In [16]:
title_tag

<h3 class="elementor-post__title">
<a href="https://workout-wednesday.com/2025w17tab/">
				#WOW2025 | WEEK 17 | Can you switch measures?			</a>
</h3>

In [17]:
title_tag.text

'\n\n\t\t\t\t#WOW2025 | WEEK 17 | Can you switch measures?\t\t\t\n'

In [18]:
title_tag.find("a")['href']

'https://workout-wednesday.com/2025w17tab/'

In [19]:
# Find the posted date of the card
date_tag = card.find("span", class_="elementor-post-date")

In [20]:
date_tag

<span class="elementor-post-date">
			April 22, 2025		</span>

In [21]:
date_tag.text.strip()

'April 22, 2025'
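
The date is still a plain string at this point. For sorting or filtering by date later on, it can be converted with the standard library. A minimal sketch, assuming every date follows the `Month D, YYYY` pattern seen here and that month names are in English (`%B` depends on the active locale); `parse_post_date` is a hypothetical helper.

```python
from datetime import datetime


def parse_post_date(text):
    """Turn 'April 22, 2025' into a date; return None for missing or unexpected values."""
    if not text:
        return None
    try:
        return datetime.strptime(text.strip(), "%B %d, %Y").date()
    except ValueError:
        return None


print(parse_post_date("April 22, 2025"))  # 2025-04-22
print(parse_post_date("April 8, 2025"))   # 2025-04-08
```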

In [22]:
# Find the contributor name and their profile picture
avatar_tag = card.find("img", class_="avatar")

In [23]:
avatar_tag

<img alt="Donna Coles" class="avatar avatar-128 photo" height="128" src="https://secure.gravatar.com/avatar/1904e3dc24c53453cdaba442aa3dbbaf8957dc2cb9ae3d815b0e01e0403f9224?s=128&amp;d=mm&amp;r=g" srcset="https://secure.gravatar.com/avatar/1904e3dc24c53453cdaba442aa3dbbaf8957dc2cb9ae3d815b0e01e0403f9224?s=256&amp;d=mm&amp;r=g 2x" width="128"/>

In [24]:
avatar_tag.get("alt")

'Donna Coles'

In [25]:
avatar_tag.get("src")

'https://secure.gravatar.com/avatar/1904e3dc24c53453cdaba442aa3dbbaf8957dc2cb9ae3d815b0e01e0403f9224?s=128&d=mm&r=g'

**Practice:**  
Let’s extract all the required information from the first card again and assign the results to variables. At the end, we’ll store the extracted values in a dictionary for clarity and reusability.

In [26]:
# Find the skills that the challenge focuses on
challenge_tag = card.find("div", class_="elementor-post__badge")
challenge_skills = challenge_tag.text.strip()

In [27]:
challenge_skills

'Parameters'

In [28]:
avatar_tag = card.find("img", class_="avatar")
contributor_name = avatar_tag.get("alt")
contributor_avatar = avatar_tag.get("src")

In [29]:
print(contributor_name)
print(contributor_avatar)

Donna Coles
https://secure.gravatar.com/avatar/1904e3dc24c53453cdaba442aa3dbbaf8957dc2cb9ae3d815b0e01e0403f9224?s=128&d=mm&r=g


In [30]:
title_tag = card.find("h3", class_="elementor-post__title")
challenge_name = title_tag.find("a").text.strip()
challenge_link = title_tag.find("a").get("href")

In [31]:
print(challenge_name)
print(challenge_link)

#WOW2025 | WEEK 17 | Can you switch measures?
https://workout-wednesday.com/2025w17tab/


In [32]:
date_tag = card.find("span", class_="elementor-post-date")
challenge_date = date_tag.text.strip() if date_tag else None

In [33]:
print(challenge_date)

April 22, 2025


In [34]:
# Dictionary containing extracted information
challenge_data = {
    "challenge_type": "tableau",  # hard-coded because we know this page lists Tableau challenges
    "challenge_skills": challenge_skills,
    "contributor_name": contributor_name,
    "contributor_avatar": contributor_avatar,
    "challenge_name": challenge_name,
    "challenge_link": challenge_link,
    "challenge_date": challenge_date,
}

print(challenge_data)

{'challenge_type': 'tableau', 'challenge_skills': 'Parameters', 'contributor_name': 'Donna Coles', 'contributor_avatar': 'https://secure.gravatar.com/avatar/1904e3dc24c53453cdaba442aa3dbbaf8957dc2cb9ae3d815b0e01e0403f9224?s=128&d=mm&r=g', 'challenge_name': '#WOW2025 | WEEK 17 | Can you switch measures?', 'challenge_link': 'https://workout-wednesday.com/2025w17tab/', 'challenge_date': 'April 22, 2025'}
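
The individual steps above can be collected into one function that returns a dictionary per card. This is a sketch rather than the notebook's own code: `parse_card` and `text_or_none` are names introduced here, the sample markup is a trimmed-down imitation of the card shown earlier, and `html.parser` is used instead of `lxml` to avoid an extra dependency.

```python
from bs4 import BeautifulSoup


def text_or_none(tag):
    """Return the tag's stripped text, or None when the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_card(card, challenge_type="tableau"):
    """Extract the fields of interest from one <article> card."""
    title_tag = card.find("h3", class_="elementor-post__title")
    link_tag = title_tag.find("a") if title_tag is not None else None
    avatar_tag = card.find("img", class_="avatar")
    return {
        "challenge_type": challenge_type,
        "challenge_skills": text_or_none(card.find("div", class_="elementor-post__badge")),
        "contributor_name": avatar_tag.get("alt") if avatar_tag is not None else None,
        "contributor_avatar": avatar_tag.get("src") if avatar_tag is not None else None,
        "challenge_name": text_or_none(link_tag),
        "challenge_link": link_tag.get("href") if link_tag is not None else None,
        "challenge_date": text_or_none(card.find("span", class_="elementor-post-date")),
    }


# Simplified stand-in for a real card (assumed markup)
sample_html = """
<article class="elementor-post category-tableau">
  <div class="elementor-post__badge">Parameters</div>
  <img class="avatar avatar-128 photo" alt="Donna Coles" src="https://example.com/a.png"/>
  <h3 class="elementor-post__title">
    <a href="https://workout-wednesday.com/2025w17tab/">
      #WOW2025 | WEEK 17 | Can you switch measures?
    </a>
  </h3>
  <span class="elementor-post-date">April 22, 2025</span>
</article>
"""
sample_card = BeautifulSoup(sample_html, "html.parser").find("article")
print(parse_card(sample_card))
```

Because every lookup is guarded, a card with a missing badge, avatar, title, or date yields `None` for that field instead of raising an error.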


**Practice:**  
Now, try repeating the extraction process for the second challenge card.

In [35]:
# Get the second card
card = challenge_cards[1]

In [36]:
challenge_type = "tableau"

In [37]:
# Return None if the challenge doesn't mention any skills
challenge_tag = card.find("div", class_="elementor-post__badge")
challenge_skills = challenge_tag.text.strip() if challenge_tag else None

print(challenge_skills)

containers


In [38]:
avatar_tag = card.find("img", class_="avatar")
contributor_name = avatar_tag.get("alt")
contributor_avatar = avatar_tag.get("src")

print(contributor_name)
print(contributor_avatar)

Kyle Yetter
https://secure.gravatar.com/avatar/c9562860300ed35d8f59195de500e49743ee87a839cc76c6de3c2577dc440c37?s=128&d=mm&r=g


In [39]:
title_tag = card.find("h3", class_="elementor-post__title")
challenge_name = title_tag.find("a").text.strip()
challenge_link = title_tag.find("a").get("href")

print(challenge_name)
print(challenge_link)

#WOW2025 Week 16 | Can you use Containers and Dynamic Zone Visibility?
https://workout-wednesday.com/2025w16tab/


In [40]:
date_tag = card.find("span", class_="elementor-post-date")
challenge_date = date_tag.text.strip() if date_tag else None

print(challenge_date)

April 17, 2025


In [41]:
challenge_data = {
    "challenge_type": challenge_type,
    "challenge_skills": challenge_skills,
    "contributor_name": contributor_name,
    "contributor_avatar": contributor_avatar,
    "challenge_name": challenge_name,
    "challenge_link": challenge_link,
    "challenge_date": challenge_date,
}

print(challenge_data)

{'challenge_type': 'tableau', 'challenge_skills': 'containers', 'contributor_name': 'Kyle Yetter', 'contributor_avatar': 'https://secure.gravatar.com/avatar/c9562860300ed35d8f59195de500e49743ee87a839cc76c6de3c2577dc440c37?s=128&d=mm&r=g', 'challenge_name': '#WOW2025 Week 16 | Can you use Containers and Dynamic Zone Visibility?', 'challenge_link': 'https://workout-wednesday.com/2025w16tab/', 'challenge_date': 'April 17, 2025'}
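
The two titles seen so far are formatted slightly differently (`| WEEK 17 |` versus `Week 16 |`), so pulling out the year and week number needs a tolerant pattern. A minimal sketch using a regular expression; `parse_title` is a hypothetical helper, and the pattern has only been checked against the two titles extracted above.

```python
import re

# Four-digit year, any non-digit separators, then "week" and its number
TITLE_PATTERN = re.compile(r"(\d{4})\D*?week\s*(\d+)", re.IGNORECASE)


def parse_title(title):
    """Return (year, week) as integers, or None if the title does not match."""
    match = TITLE_PATTERN.search(title)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_title("#WOW2025 | WEEK 17 | Can you switch measures?"))  # (2025, 17)
print(parse_title("#WOW2025 Week 16 | Can you use Containers and Dynamic Zone Visibility?"))  # (2025, 16)
```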


# Scraping All Cards from a Page

Now that we understand how to extract a single card’s data, we'll wrap this logic inside a loop to collect information from **all cards** on the page.

In [42]:
challenge_cards = soup.find_all("article", class_="elementor-post")
print(f"Found {len(challenge_cards)} challenges")

Found 200 challenges


In [43]:
challenge_data = {
    "challenge_type": [],
    "challenge_skills": [],
    "contributor_name": [],
    "contributor_avatar": [],
    "challenge_name": [],
    "challenge_link": [],
    "challenge_date": [],
}

for card in challenge_cards:
    try:
        challenge_type = "tableau"

        challenge_tag = card.find("div", class_="elementor-post__badge")
        challenge_skills = challenge_tag.text.strip().lower() if challenge_tag else None

        avatar_tag = card.find("img", class_="avatar")
        contributor_name = avatar_tag.get("alt") if avatar_tag else None
        contributor_avatar = avatar_tag.get("src") if avatar_tag else None

        title_tag = card.find("h3", class_="elementor-post__title")
        challenge_name = title_tag.find("a").text.strip()
        challenge_link = title_tag.find("a").get("href")

        date_tag = card.find("span", class_="elementor-post-date")
        challenge_date = date_tag.text.strip() if date_tag else None
    except AttributeError as e:
        # A card without a title link raises AttributeError; skip it entirely
        print("Skipping a card:", e)
        continue

    # Append only after every field was extracted, so all lists stay the same length
    challenge_data["challenge_type"].append(challenge_type)
    challenge_data["challenge_skills"].append(challenge_skills)
    challenge_data["contributor_name"].append(contributor_name)
    challenge_data["contributor_avatar"].append(contributor_avatar)
    challenge_data["challenge_name"].append(challenge_name)
    challenge_data["challenge_link"].append(challenge_link)
    challenge_data["challenge_date"].append(challenge_date)

In [44]:
# Convert into a dataframe
import pandas as pd

df = pd.DataFrame(challenge_data)
df.head()

| | challenge_type | challenge_skills | contributor_name | contributor_avatar | challenge_name | challenge_link | challenge_date |
|---|---|---|---|---|---|---|---|
| 0 | tableau | parameters | Donna Coles | https://secure.gravatar.com/avatar/1904e3dc24c... | #WOW2025 \| WEEK 17 \| Can you switch measures? | https://workout-wednesday.com/2025w17tab/ | April 22, 2025 |
| 1 | tableau | containers | Kyle Yetter | https://secure.gravatar.com/avatar/c9562860300... | #WOW2025 Week 16 \| Can you use Containers and ... | https://workout-wednesday.com/2025w16tab/ | April 17, 2025 |
| 2 | tableau | | Erica Hughes | https://secure.gravatar.com/avatar/be165cc10d8... | #WOW2025 \| Week 15 \| Community Month: Anna Cla... | https://workout-wednesday.com/2025w15tab/ | April 8, 2025 |
| 3 | tableau | packed bubble | Yusuke Nakanishi | https://secure.gravatar.com/avatar/64dbbade799... | #WOW2025 \| Week 14 \| Which products have low p... | https://workout-wednesday.com/2025%ef%bd%9714tab/ | April 1, 2025 |
| 4 | tableau | dynamic zone visibility | Yoshitaka Arakawa | https://secure.gravatar.com/avatar/ad4357a314b... | #WOW2025 \| Week 13 \| Data Storytelling with Dy... | https://workout-wednesday.com/2025w13tab/ | March 25, 2025 |


# Finalize as a Function

We’re nearing the end of this notebook. To make our scraping logic reusable, we’ll encapsulate it into a function: `scrape_and_save_challenges`.

This function allows you to specify a challenge track (e.g., Tableau, Power BI), and it will scrape **all challenge cards** from that track, across **multiple pages** (pagination included), and save the data as a `.csv` file.

Each time the function runs, it saves the result in a date-stamped folder. This works as a simple versioning system: if a later scrape fails or returns incomplete data (for example, because page 2 of the Tableau track is broken), we can fall back to an earlier, working snapshot.
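
Because the folder names are ISO dates (`YYYY-MM-DD`), sorting them as strings also sorts them chronologically. The sketch below relies on that to locate the most recent snapshot for a track. `latest_snapshot` is a hypothetical helper; it assumes the `<date>/<track>_challenges.csv` layout that the function in the next cell writes, and that the raw folder contains only date-named subfolders.

```python
from pathlib import Path


def latest_snapshot(raw_dir, challenge_track):
    """Return the newest '<date>/<track>_challenges.csv' under raw_dir, or None."""
    candidates = sorted(Path(raw_dir).glob(f"*/{challenge_track}_challenges.csv"))
    # ISO-dated folder names sort chronologically, so the last entry is the newest
    return candidates[-1] if candidates else None


# Usage, assuming the notebook's folder layout:
# path = latest_snapshot(Path("..") / "data" / "raw", "tableau")
# df = pd.read_csv(path) if path else None
```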

In [45]:
from pathlib import Path
from datetime import date
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time

def scrape_and_save_challenges(challenge_track):
    base_url = "https://workout-wednesday.com"
    # Map each supported track to the path of its listing page
    track_paths = {
        "tableau": "/latest/",
        "power-bi": "/power-bi-challenges/",
        "crm-analytics": "/crm-analytics-challenges/",
        "sigma": "/sigma-challenges/",
    }
    if challenge_track not in track_paths:
        raise ValueError(f"Unsupported challenge track: {challenge_track}")
    url = base_url + track_paths[challenge_track]

    # Prepare to collect all data
    challenge_data = {
        "challenge_track": [],
        "challenge_name": [],
        "challenge_link": [],
        "post_date": [],
        "contributor_name": [],
        "contributor_avatar": [],
        "challenge_skills": [],
    }

    while url:
        print(f"Scraping page: {url}")
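        time.sleep(1)  # pause between requests to go easy on the server
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
        soup = BeautifulSoup(response.text, "lxml")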

        # Get all cards on the page
        cards = soup.find_all("article", class_="elementor-post")

        for card in cards:
            try:
                title_tag = card.find("h3", class_="elementor-post__title")
                link_tag = title_tag.find("a") if title_tag else None
                challenge_name = link_tag.text.strip() if link_tag else None
                challenge_link = link_tag.get("href") if link_tag else None

                date_tag = card.find("span", class_="elementor-post-date")
                post_date = date_tag.text.strip() if date_tag else None

                avatar_tag = card.find("img", class_="avatar")
                contributor_name = avatar_tag.get("alt") if avatar_tag else None
                contributor_avatar = avatar_tag.get("src") if avatar_tag else None

                badge = card.find("div", class_="elementor-post__badge")
                challenge_skill = badge.text.strip().lower() if badge else None
            except Exception as e:
                print("Error while parsing a card:", e)
                continue

            # Append only after every field parsed, so all columns stay the same length
            challenge_data["challenge_track"].append(challenge_track)
            challenge_data["challenge_name"].append(challenge_name)
            challenge_data["challenge_link"].append(challenge_link)
            challenge_data["post_date"].append(post_date)
            challenge_data["contributor_name"].append(contributor_name)
            challenge_data["contributor_avatar"].append(contributor_avatar)
            challenge_data["challenge_skills"].append(challenge_skill)
        print(f"🧾 Found {len(cards)} cards on this page.") 
        
        # Find the "Next" page link; the CSS selector matches regardless of class order
        next_button = soup.select_one("a.page-numbers.next")
        url = next_button.get("href") if next_button else None

    # Save the results to a dated folder
    today = date.today().isoformat()
    folder_path = Path("..") / "data" / "raw" / today
    folder_path.mkdir(parents=True, exist_ok=True)

    df = pd.DataFrame(challenge_data)
    file_path = folder_path / f"{challenge_track}_challenges.csv"
    df.to_csv(file_path, index=False)

    print(f"✅ Scraped {len(df)} challenges total.")
    print(f"📁 Saved to: {file_path}")

    return df

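# Scrape each track; every call writes its own CSV file.
# df is overwritten on each call, so it ends up holding the last track (sigma).
df = scrape_and_save_challenges(challenge_track="tableau")
df = scrape_and_save_challenges(challenge_track="power-bi")
df = scrape_and_save_challenges(challenge_track="crm-analytics")
df = scrape_and_save_challenges(challenge_track="sigma")
df.head()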

Scraping page: https://workout-wednesday.com/latest/
🧾 Found 200 cards on this page.
Scraping page: https://workout-wednesday.com/latest/2/
🧾 Found 200 cards on this page.
Scraping page: https://workout-wednesday.com/latest/3/
🧾 Found 34 cards on this page.
✅ Scraped 434 challenges total.
📁 Saved to: ..\data\raw\2025-04-23\tableau_challenges.csv
Scraping page: https://workout-wednesday.com/power-bi-challenges/
🧾 Found 200 cards on this page.
Scraping page: https://workout-wednesday.com/power-bi-challenges/2/
🧾 Found 19 cards on this page.
✅ Scraped 219 challenges total.
📁 Saved to: ..\data\raw\2025-04-23\power-bi_challenges.csv
Scraping page: https://workout-wednesday.com/crm-analytics-challenges/
🧾 Found 58 cards on this page.
✅ Scraped 58 challenges total.
📁 Saved to: ..\data\raw\2025-04-23\crm-analytics_challenges.csv
Scraping page: https://workout-wednesday.com/sigma-challenges/
🧾 Found 70 cards on this page.
✅ Scraped 70 challenges total.
📁 Saved to: ..\data\raw\2025-04-23\sigma_c

| | challenge_track | challenge_name | challenge_link | post_date | contributor_name | contributor_avatar | challenge_skills |
|---|---|---|---|---|---|---|---|
| 0 | sigma | 2025 Week 17 \| Sigma : Can you create these bars? | https://workout-wednesday.com/2025-week-17-sig... | April 23, 2025 | Katrina Menne | https://secure.gravatar.com/avatar/f480d17c94e... | design |
| 1 | sigma | 2025 Week 16 \| Sigma: Can you update and delete? | https://workout-wednesday.com/2025-week-16-sig... | April 16, 2025 | Ashley Bennett | https://secure.gravatar.com/avatar/f706b14658f... | input tables |
| 2 | sigma | 2025 Week 15 \| Sigma: Can You Override Elegantly? | https://workout-wednesday.com/2025-week-15-sig... | April 9, 2025 | Eric Heidbreder | https://secure.gravatar.com/avatar/f6ae8fdb3d4... | input tables |
| 3 | sigma | 2025 Week 14 \| Sigma: Can you Create a Small M... | https://workout-wednesday.com/2025-week-14-sig... | April 3, 2025 | Carter Voekel | https://secure.gravatar.com/avatar/31c23237708... | design |
| 4 | sigma | 2025 Week 13 \| Sigma : Can you track the winni... | https://workout-wednesday.com/2025-week-13-sig... | March 27, 2025 | Katrina Menne | https://secure.gravatar.com/avatar/f480d17c94e... | groupings |
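
The pagination loop in the function can be exercised without touching the network by swapping the HTTP call for a lookup in a dictionary of canned pages. The sketch below does that. `iter_pages`, the `fetch` parameter, and the two fake pages are assumptions made for illustration; the selector targets the same `page-numbers` and `next` classes the function looks for, and `html.parser` is used to avoid the `lxml` dependency.

```python
from bs4 import BeautifulSoup


def iter_pages(start_url, fetch):
    """Yield (url, soup) for each page, following the 'next' link until it disappears."""
    url, seen = start_url, set()
    while url and url not in seen:  # the 'seen' set guards against link loops
        seen.add(url)
        soup = BeautifulSoup(fetch(url), "html.parser")
        yield url, soup
        next_link = soup.select_one("a.page-numbers.next")
        url = next_link.get("href") if next_link is not None else None


# Two canned pages standing in for real responses (made-up markup)
fake_site = {
    "/latest/": '<article class="elementor-post"></article>'
                '<a class="page-numbers next" href="/latest/2/">Next</a>',
    "/latest/2/": '<article class="elementor-post"></article>',
}

for url, soup in iter_pages("/latest/", fake_site.__getitem__):
    print(url, len(soup.find_all("article", class_="elementor-post")))
```

In real use, `fetch` could be a small function that calls `requests.get(url).text`, which keeps the traversal logic separate from the HTTP layer.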


# Conclusion

In this notebook, we explored how to scrape metadata about data visualization challenges from the Workout Wednesday website. Although we haven’t scraped deeper challenge details (such as descriptions, solutions, or comments), the dataset we’ve built is already valuable.

For example, we could identify the most prolific contributors in each challenge track, or detect which contributors have recently stopped participating. A potential future application would be to automatically email top contributors to re-engage them in the community 🚀🚀🚀
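
As a sketch of the first idea, the snippet below counts challenges per contributor within each track and records each contributor's most recent post. The rows are a tiny made-up sample that reuses the column names from `scrape_and_save_challenges`; real input would be the saved CSV files, and the date format is assumed to match the `Month D, YYYY` strings seen above.

```python
import pandas as pd

# Made-up sample rows; column names follow the scraper's output
sample = pd.DataFrame({
    "challenge_track": ["tableau", "tableau", "tableau", "sigma"],
    "contributor_name": ["Ann", "Ann", "Bob", "Cy"],
    "post_date": ["April 22, 2025", "April 8, 2025", "March 25, 2025", "April 23, 2025"],
})
sample["post_date"] = pd.to_datetime(sample["post_date"], format="%B %d, %Y")

summary = (
    sample.groupby(["challenge_track", "contributor_name"])
    .agg(challenges=("post_date", "count"), last_post=("post_date", "max"))
    .reset_index()
    .sort_values(["challenge_track", "challenges"], ascending=[True, False])
)

print(summary)
```

Contributors whose `last_post` is far in the past would be the candidates for the re-engagement idea mentioned above.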

This notebook lays a strong foundation for downstream data preparation and analysis.