# Workout Webscraper &#x1F3CB;

<br>

**Mission:** The goal of this project was to collect workout information to populate the exercise database for my [Fittbook Web Application](https://github.com/kpperez/Fittbook-Web-App) portfolio project. The database records the equipment each exercise requires and the muscle group it targets. [BodyBuilding.com](https://www.bodybuilding.com/) offers an extensive repository of exercises and workout routines, freely accessible online.
<br>
<br>
I successfully gathered information on over 1,000 different exercises! &#x1F389;
<br>
<br>
One of the main challenges in this project was getting past a delayed pop-up advertisement that blocked the webdriver from interacting with the page. Overcoming it was a valuable lesson in using Selenium's explicit "wait" conditions and in switching into and out of iframes.
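
Conceptually, an explicit wait polls a condition until it returns a truthy value or a timeout expires. The stand-alone sketch below illustrates that idea in plain Python. `wait_until` and `popup_ready` are hypothetical names used for illustration only; this is not Selenium's API or its implementation.

```python
import time


def wait_until(condition, timeout=2.0, poll_interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper that mimics the idea behind an explicit wait;
    it is not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulate a pop-up that only becomes available on the third check
checks = {"count": 0}


def popup_ready():
    checks["count"] += 1
    return "popup" if checks["count"] >= 3 else None


found = wait_until(popup_ready, timeout=2.0, poll_interval=0.05)
print(found)
```

In the notebook itself, `WebDriverWait(...).until(...)` plays this role, with the `expected_conditions` helpers supplying the condition.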

In [3]:
# Import the libraries needed
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
import time

# Targeted URL
url = 'https://www.bodybuilding.com/exercises/finder'

In [4]:
# Set user-agent
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

# Options for the Chrome browser
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument(f"user-agent={user_agent}")
chrome_options.add_argument("--start-maximized")

# Initialize a Selenium webdriver
driver = webdriver.Chrome(options=chrome_options)  # You need to have ChromeDriver installed and in your PATH

# Open the page with Selenium
driver.get(url)

# Wait for the iframe to be present
iframe = WebDriverWait(driver, 2).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "body > div.ab-iam-root.v3.ab-animate-in.ab-animate-out.ab-effect-html.ab-show > iframe")))

# Switch to the iframe to close pop-up
driver.switch_to.frame(iframe)

# Wait for the button to be visible
close_button = WebDriverWait(driver, 2).until(
    EC.visibility_of_element_located((By.XPATH, "/html/body/div/div/div[2]/button"))
)

# Click the button
close_button.click()
    
# Switch back to the default content after interacting with the iframe
driver.switch_to.default_content()

ad_cl = driver.find_element(By.CSS_SELECTOR, "#fs-slot-footer-wrapper > button")
ad_cl.click()

time.sleep(1)

cookie_reject = driver.find_element(By.ID, "onetrust-reject-all-handler")
cookie_reject.click()

driver.switch_to.default_content()

def load_bot():
    """Click the load button repeatedly so that all results end up on the page."""
    # The button's div index advances by 15 with each batch of results
    for i in range(18, 1000, 15):
        button_xpath = f'//*[@id="js-ex-category-body"]/div[2]/div[{i}]/button'
        load_button = driver.find_element(By.XPATH, button_xpath)

        # Scroll the load button into view
        ActionChains(driver).move_to_element(load_button).perform()

        time.sleep(3)

        # Wait for the button to be clickable, then click it
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, button_xpath)))
        load_button.click()

        # Give the next batch of results time to load
        time.sleep(3)

load_bot()

# Get the page source code after scrolling
page_source = driver.page_source

# Close webdriver
driver.quit()
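
The loop bounds in `load_bot` can be sanity-checked with a little arithmetic. The sketch below assumes the page initially shows 15 exercises and that each click loads 15 more. Both figures are inferred from the step size of the `range`, not confirmed by the site.

```python
# Assumptions (inferred from the loop's step size, not confirmed by the site):
# the page starts with 15 cards and each click on the load button adds 15 more.
CARDS_PER_BATCH = 15

# Same indexes that load_bot() iterates over
button_indexes = range(18, 1000, 15)

clicks = len(button_indexes)
expected_cards = CARDS_PER_BATCH * (clicks + 1)

print("Clicks:", clicks)
print("Last button index:", button_indexes[-1])
print("Expected cards:", expected_cards)
```

Under these assumptions the loop performs 66 clicks and the expected total is 1005 cards, which agrees with the number of data cards reported in the parsing step.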

In [None]:
# Let's make soup!
soup = BeautifulSoup(page_source, "html.parser")

# Use prettify() to get a nicely formatted version of the HTML
pretty_html = soup.prettify()

#print(pretty_html)

In [6]:
# Find data cards 
workout_cards = soup.find_all('div', class_="ExResult-cell ExResult-cell--nameEtc")

print('Number of listing datacards: ', len(workout_cards))
print('Sample Data Card: ')
print(workout_cards[0])

Number of listing datacards:  1005
Sample Data Card: 
<div class="ExResult-cell ExResult-cell--nameEtc">
<h3 class="ExHeading ExResult-resultsHeading">
<a href="/exercises/rickshaw-carry" itemprop="name">
                Rickshaw Carry
              </a>
</h3>
<div class="ExResult-details ExResult-muscleTargeted">
              Muscle Targeted:
              <a href="/exercises/muscle/forearms">
                Forearms
              </a>
</div>
<div class="ExResult-details ExResult-equipmentType">
              Equipment Type:
              <a href="/exercises/equipment/other">
                Other
              </a>
</div>
</div>


In [7]:
workout_list = []

for card in workout_cards:
    # Extract workout name
    name = card.find('a', itemprop='name').text.strip()

    # Extract muscle targeted
    muscle_targeted = card.find('div', class_='ExResult-muscleTargeted').text.replace('Muscle Targeted:', '').strip()

    # Extract equipment type
    equipment_type = card.find('div', class_='ExResult-equipmentType').text.replace('Equipment Type:', '').strip()

    # Create a dictionary to store the extracted information
    workout_data = {
        'name': name,
        'muscle_targeted': muscle_targeted,
        'equipment_type': equipment_type
    }

    # Append the dictionary to the workout list
    workout_list.append(workout_data)

df = pd.DataFrame(workout_list)

df = df.apply(lambda x: x.str.replace('\n', '') if x.dtype == 'O' else x)
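
Text pulled from nested tags often carries line breaks and runs of indentation. One common way to normalize it is to drop the label and collapse all whitespace with `split` and `join`. The stand-alone sketch below applies that to a string shaped like the sample card above; `clean_field` is a hypothetical helper, not part of the notebook.

```python
def clean_field(raw_text, label):
    """Remove `label` from `raw_text` and collapse all whitespace runs.

    Hypothetical helper for illustration only.
    """
    without_label = raw_text.replace(label, "")
    # str.split() with no arguments splits on any whitespace and drops empties
    return " ".join(without_label.split())


# Text shaped like the "Muscle Targeted" div in the sample card
raw = "\n              Muscle Targeted:\n              \n                Middle Back\n              \n"

print(repr(clean_field(raw, "Muscle Targeted:")))
```

Because the whitespace is collapsed rather than only trimmed at the ends, multi-word values such as "Middle Back" keep a single space between words.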

In [8]:
df

| | name | muscle_targeted | equipment_type |
|---|---|---|---|
| 0 | Rickshaw Carry | Forearms | Other |
| 1 | Single-Leg Press | Quadriceps | Machine |
| 2 | Landmine twist | Abdominals | Other |
| 3 | Weighted pull-up | Lats | Other |
| 4 | T-Bar Row with Handle | Middle Back | Other |
| ... | ... | ... | ... |
| 1000 | Hip Stretch With Twist | Hamstrings | Body Only |
| 1001 | Linear 3-Part Start Technique | Hamstrings | Body Only |
| 1002 | Chest Push (single response) | Chest | Medicine Ball |
| 1003 | Jump lunge heel kick | Quadriceps | Body Only |


In [9]:
# Save DataFrame to a CSV file
df.to_csv('workouts.csv', index=False)
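
Once the file is written, a quick read-back can confirm the columns and give a feel for the data. The sketch below uses Python's built-in `csv` module on a small in-memory sample taken from the rows displayed above, rather than the real `workouts.csv`, so that it runs on its own.

```python
import csv
import io
from collections import Counter

# In-memory stand-in for workouts.csv, using rows shown in the DataFrame above
sample_csv = """name,muscle_targeted,equipment_type
Rickshaw Carry,Forearms,Other
Single-Leg Press,Quadriceps,Machine
Hip Stretch With Twist,Hamstrings,Body Only
Linear 3-Part Start Technique,Hamstrings,Body Only
Jump lunge heel kick,Quadriceps,Body Only
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Count exercises per targeted muscle group
muscle_counts = Counter(row["muscle_targeted"] for row in rows)

print(list(rows[0].keys()))
print(muscle_counts.most_common())
```

To run the same check against the real file, replace `io.StringIO(sample_csv)` with `open('workouts.csv', newline='', encoding='utf-8')`.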