# Particl Case Study: Web Scraping Tool

by Boston McClary

## Introduction

In this case study, I developed a web scraping tool in Python to scrape an e-commerce product page: the [Gymshark Crest Sweatshirt](https://us.shop.gymshark.com/products/gymshark-crest-sweatshirt-persimmon-red-ss23).

## Objective

The objective of this case study is to develop a Python script that can extract relevant information from the product page, such as the product name, price, description, and available sizes.

## Tools and Libraries

To accomplish this task, I used the following tools and libraries:

- Python
- Beautiful Soup
- Requests
- Selenium
- ChatGPT (GPT-3.5 Turbo)

## Steps

1. Inspect the product page to locate the HTML element for each field (name, description, price, sizes)
2. Give those elements to ChatGPT (GPT-3.5 Turbo) and ask it to structure the scraping code
3. Send a GET request to the product page URL
4. Parse the HTML content with Beautiful Soup
5. Extract the relevant information from the parsed HTML
6. Store the extracted information in a structured format (JSON or CSV)
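Step 6 can be kept independent of the scraping code. The sketch below is one possible approach, not the script used in this case study: a hypothetical `Product` dataclass holds the four fields from the objective, and two hypothetical helpers, `to_json` and `to_csv`, serialize it using only the standard library. The CSV header matches the one in the Beautiful Soup script; the JSON keys are the dataclass field names, which is an assumption rather than the original format.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Product:
    """One scraped product page (hypothetical record type)."""
    title: str
    description: str
    price: str
    sizes: list[str] = field(default_factory=list)


def to_json(product: Product) -> str:
    # ensure_ascii=False keeps characters such as curly apostrophes readable
    return json.dumps(asdict(product), ensure_ascii=False, indent=2)


def to_csv(products: list[Product]) -> str:
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Product Title", "Product Description", "Price", "Available Sizes"])
    for p in products:
        # CSV cells are flat strings, so the size list is joined into one cell
        writer.writerow([p.title, p.description, p.price, ", ".join(p.sizes)])
    return buffer.getvalue()


# Usage with placeholder values
item = Product("Crest Sweatshirt Persimmon Red", "Slim fit", "$36", ["xs", "s", "m"])
print(to_json(item))
print(to_csv([item]))
```

Returning strings instead of writing files directly makes the output easy to inspect or test. To save the CSV, write the returned string to a file opened with `newline=''` and `encoding='utf-8'`, since the `csv` module already emits `\r\n` line endings.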

## Additional Notes

While digging into the website, I also found the inventory quantity for each size. However, it appears to be stored in a JavaScript object (`window.dataLayer`) rather than in the HTML markup, so the Beautiful Soup script below cannot scrape it. I used ChatGPT to write a separate Selenium script that extracts this data; it is listed after the Beautiful Soup script.
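One way to confirm that a value is out of Beautiful Soup's reach is to search the raw HTML returned by `requests` for it. Beautiful Soup treats inline `<script>` contents as plain text and never runs them, so anything JavaScript adds to `window.dataLayer` after the page loads will not be there. The sketch below is an illustrative check, not part of the original scripts: it uses Python's standard-library `html.parser` (so it has no dependencies) to collect inline script text, and `ScriptCollector` and `scripts_mentioning` are hypothetical names.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the raw text of every <script> element in a document."""

    def __init__(self) -> None:
        super().__init__()
        self._in_script = False
        self.scripts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies are passed through unparsed, possibly in several chunks
        if self._in_script:
            self.scripts[-1] += data


def scripts_mentioning(html: str, marker: str) -> list[str]:
    """Return the inline scripts in the static HTML whose text contains marker."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return [script for script in collector.scripts if marker in script]
```

Against the live page you would pass `response.text` from the Beautiful Soup script and a marker such as `inventoryQuantity` (the key the Selenium script reads). An empty result is consistent with the values being added by JavaScript at runtime, or loaded from an external script file, which this check does not fetch.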


## BeautifulSoup Script

```python
import csv
import json

import requests
from bs4 import BeautifulSoup

URL = "https://us.shop.gymshark.com/products/gymshark-crest-sweatshirt-persimmon-red-ss23"

# Fetch the page; the timeout stops the script from hanging, and raise_for_status()
# stops it from parsing an error page as if it were the product page
response = requests.get(URL, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, "html.parser")


def require(element, label):
    """Return element, or raise a readable error if a selector no longer matches.

    The class names below appear to end in build-generated suffixes (e.g. "__Wx52B"),
    so they can change whenever the site is redeployed.
    """
    if element is None:
        raise ValueError(f"Could not find the {label}; the selector may be out of date.")
    return element


# Get the product title
title = require(soup.find('h1', class_='product-information_title__Wx52B'), "product title").get_text()

# Get the product description
description = require(
    soup.find('div', class_='accordion_accordion-content__qZ83F',
              attrs={"data-locator-id": "pdp-accordionContent-DESCRIPTION-read"}),
    "product description",
).get_text().replace('\n', ' ')

# Get the product price
price = require(
    soup.find('span', class_='product-information_price__6g6xM',
              attrs={"data-locator-id": "pdp-totalValue-read"}),
    "price",
).get_text()

# Get the available sizes
sizes_div = require(soup.find('div', class_='add-to-cart_sizes__qtfGR'), "size selector")
sizes = [button.get_text() for button in sizes_div.find_all('button', class_='size_size__zRXlq')]

# Uncomment this section to save the scraped data to a CSV file
# with open('product_data.csv', 'w', newline='', encoding='utf-8') as csvfile:
#     writer = csv.writer(csvfile)
#     writer.writerow(['Product Title', 'Product Description', 'Price', 'Available Sizes'])
#     writer.writerow([title, description, price, ', '.join(sizes)])
# print("Data saved to product_data.csv")

# Save the scraped data to a JSON file
data = {
    'Product Title': title,
    'Product Description': description,
    'Price': price,
    'Available Sizes': sizes
}

with open('product_data.json', 'w', encoding='utf-8') as jsonfile:
    json.dump(data, jsonfile, ensure_ascii=False, indent=2)

print(data)
```

Output:

```
{'Product Title': 'Crest Sweatshirt Persimmon Red', 'Product Description': 'REST DAY THE CREST WAY Consistently comfortable and casually stylish, you can wear Crest anywhere and pair it with anything.  • Durable embroidered logo that’ll last through every wear• Soft, brushed back fabric inside for full comfort SIZE & FIT• Slim fit• Model is 6\'0" and wears size M MATERIALS & CARE• 80% Cotton, 20% PolyesterSKU:\xa0A2A1V-RBMB', 'Price': '$36', 'Available Sizes': ['xs', 's', 'm', 'l', 'xl', 'xxl', '3xl']}
```
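The scraped values are raw display strings: the price is the text `'$36'`, and the SKU is fused onto the end of the description (`...PolyesterSKU:\xa0A2A1V-RBMB`). If the data feeds into analysis, a small post-processing step helps. The sketch below assumes US-style prices (a `$` sign, commas as thousands separators) and that the SKU always follows the literal label `SKU:`; both assumptions are based on this one page, and `parse_price` and `extract_sku` are hypothetical helper names.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_price(text: str) -> Decimal:
    """Turn a display price such as '$36' or '$1,299.99' into a Decimal."""
    # Keep only digits and the decimal point (drops '$' and thousands commas)
    return Decimal(re.sub(r"[^\d.]", "", text))


def extract_sku(description: str) -> Optional[str]:
    """Return the code after 'SKU:' in the description, or None if absent."""
    # In str patterns \s matches Unicode whitespace, including the '\xa0' in the scraped text
    match = re.search(r"SKU:\s*([A-Z0-9-]+)", description)
    return match.group(1) if match else None


print(parse_price("$36"))  # 36
print(extract_sku("80% Cotton, 20% PolyesterSKU:\xa0A2A1V-RBMB"))  # A2A1V-RBMB
```

`Decimal` avoids floating-point rounding in prices. `parse_price` raises `decimal.InvalidOperation` if the text contains no digits, which is a useful signal that the price selector matched the wrong element.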


## Selenium Script

```python
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

URL = "https://us.shop.gymshark.com/products/gymshark-crest-sweatshirt-persimmon-red-ss23"

# Create a new instance of the browser driver (e.g., Chrome)
driver = webdriver.Chrome()

try:
    # Navigate to the page
    driver.get(URL)

    # implicitly_wait() only applies to element lookups, so wait explicitly (up to 10 s)
    # until the first dataLayer entry with ecommerce data exists
    WebDriverWait(driver, 10).until(
        lambda d: d.execute_script(
            "return !!(window.dataLayer && window.dataLayer[0] && window.dataLayer[0].ecommerce);"
        )
    )

    # Execute JavaScript to get the dataLayer object
    inventory_data = driver.execute_script("return window.dataLayer;")

    # Extract the relevant inventory information
    # (this path reflects the dataLayer structure observed on this page and may change)
    products = inventory_data[0]['ecommerce']['detail']['products']
    for product in products:
        print("Product Name:", product['name'])
        for variant in product['variant']:
            print(f"Size: {variant['size']}, Inventory Quantity: {variant['inventoryQuantity']}")
finally:
    # Close the browser window even if a step above fails
    driver.quit()
```
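Because the Selenium script prints as it walks the `dataLayer`, its parsing logic can only be exercised with a live browser. A sketch of an alternative, assuming the same structure the script reads (an entry with `ecommerce.detail.products`, each product carrying a `variant` list of `size`/`inventoryQuantity` objects), moves that logic into a plain function that works on the list returned by `execute_script`. The function name `inventory_by_size` and the sample data are hypothetical, and unlike the original script it scans every `dataLayer` entry instead of assuming index 0.

```python
from typing import Any


def inventory_by_size(data_layer: list[Any]) -> dict[str, dict[str, int]]:
    """Map each product name to {size: inventory quantity}.

    Assumes the structure read by the Selenium script:
    entry["ecommerce"]["detail"]["products"][i]["variant"] is a list of
    {"size": ..., "inventoryQuantity": ...} objects.
    """
    result: dict[str, dict[str, int]] = {}
    for entry in data_layer:
        # dataLayer can also hold non-object entries; skip anything that is not a dict
        if not isinstance(entry, dict):
            continue
        ecommerce = entry.get("ecommerce") or {}
        detail = ecommerce.get("detail") or {}
        for product in detail.get("products") or []:
            sizes = result.setdefault(product["name"], {})
            for variant in product.get("variant") or []:
                # int() also accepts numeric strings, in case quantities arrive as text
                sizes[variant["size"]] = int(variant["inventoryQuantity"])
    return result


# Hypothetical sample shaped like the structure above (not real inventory data)
sample_data_layer = [
    {"event": "gtm.js"},
    {"ecommerce": {"detail": {"products": [
        {"name": "Crest Sweatshirt", "variant": [
            {"size": "s", "inventoryQuantity": 4},
            {"size": "m", "inventoryQuantity": "0"},
        ]},
    ]}}},
]
print(inventory_by_size(sample_data_layer))  # {'Crest Sweatshirt': {'s': 4, 'm': 0}}
```

In the Selenium script, `inventory_by_size(inventory_data)` would replace the nested loop, and the raw `inventory_data` could also be saved with `json.dump` so the parsing can be rerun offline without opening a browser.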

## Conclusion

In under **15 minutes**, using ChatGPT (GPT-3.5 Turbo) and Beautiful Soup, I built a functional web scraping tool that extracts the product name, description, price, and available sizes from the Gymshark Crest Sweatshirt product page. Some data, such as per-size inventory quantities, does not appear to be in the static HTML, so extracting it requires a browser-automation tool such as Selenium.