## Product Web Scraper - Hamish and Andy

During the Christmas holidays, my sister was frustrated that she could never buy the "Lost touch with the common man" t-shirt from Australian radio duo Hamish and Andy, because it was always out of stock. To help her catch the moment new stock arrived, I wrote the following Python script to monitor its availability.

```python
# Load libraries
import requests
import random
from bs4 import BeautifulSoup
import pandas as pd
```
I first created a list of browser User-Agent strings so that each request would identify itself as a regular browser. By default, `requests` sends a `python-requests/<version>` User-Agent, and some sites answer such requests with a 403 Forbidden error because they suspect the visitor is an automated script rather than a real user.

```python
# create a list of browser User-Agent strings (Safari on macOS, Chrome on Windows)
userAgents = ['Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Safari/605.1.15',
              'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2.1 Safari/605.1.15',
              'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.3 Safari/605.1.15',
              'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
              'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36']

# pick a random User-Agent and request the shop page
website = "https://hamishandandy.com/shop/"
html_data = requests.get(website, headers={'User-Agent': random.choice(userAgents)}, timeout=10)
html_data.raise_for_status()  # stop on a 403 (or any other HTTP error) instead of parsing an error page
```
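
Rotating User-Agents does not guarantee a successful response. As a minimal sketch, assuming a 403 is worth retrying after a short pause with a newly chosen User-Agent (which `random.choice` may repeat), the request could be wrapped in a hypothetical helper `fetch_with_rotation`. Its `max_attempts` and `delay` parameters are illustrative, and `get` defaults to `requests.get` but accepts any callable with the same calling convention, so the retry logic can be exercised offline.

```python
import random
import time


def fetch_with_rotation(url, user_agents, get=None, max_attempts=3, delay=2.0):
    """Request url, choosing a random User-Agent on every attempt and
    retrying while the server answers 403 Forbidden.

    Hypothetical helper: returns the last response, which may still be a 403.
    """
    if get is None:
        import requests  # imported lazily so a stand-in `get` can be injected
        get = requests.get
    response = None
    for attempt in range(max_attempts):
        headers = {"User-Agent": random.choice(user_agents)}
        response = get(url, headers=headers, timeout=10)
        if response.status_code != 403:
            break  # success or a non-403 error: stop retrying
        if attempt < max_attempts - 1:
            time.sleep(delay)  # pause so retries are not fired in a burst
    return response
```

In the notebook, `html_data = fetch_with_rotation(website, userAgents)` would take the place of the single `requests.get` call. If the site blocks by IP address or other signals, changing the User-Agent alone will not help.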

Using my browser's developer tools, I found that each product sits in a `div` with the class `c-product-tile__wrap`, with its name in an `h5` heading, and that sold-out products also carry a `span` with the class `soldout-tag`. I then parsed the HTML with BeautifulSoup, located these elements, and collected every product and its availability into a table.

```python
# parse the HTML returned by the request
soup = BeautifulSoup(html_data.content, "html.parser")

# each product is rendered inside its own tile
shop_elements = soup.find_all("div", class_="c-product-tile__wrap")

# lists to store product names and availability
product_list = []
stock_list = []

# extract each product's name and check whether a sold-out tag exists
for element in shop_elements:
    product_element = element.find("h5")
    if product_element is None:  # skip tiles without a product name
        continue
    stock_element = element.find("span", class_="soldout-tag")
    product_list.append(product_element.get_text(strip=True))
    stock_list.append(stock_element.get_text(strip=True) if stock_element else "available")

# build a DataFrame from the two lists
product_data = pd.DataFrame({"Product": product_list, "In Stock": stock_list})

# left-align cell and header text, then hide the index for display
styled_df = product_data.style.set_properties(**{'text-align': 'left'}).set_table_styles([{
    'selector': 'th',
    'props': [('text-align', 'left')]
}])
styled_df.hide(axis='index')
```


| Product | In Stock |
| --- | --- |
| ‘I will never mention Mr Ralph’ Grey Tee | available |
| ‘Must Be Very Nice’ Baseball Style Hat | available |
| ‘Must Be Very Nice’ Dad Style Hat | available |
| ‘Must Be Very Nice’ Bucket Hat Yellow with Blue Embroidery | available |
| ‘Must Be Very Nice’ Tee Glacier Blue with Yellow Print | available |
| Power Moves book Vol 2 (exciting low-cost edition) | available |
| ‘Must Be Nice’ Baseball Style Hat | sold out |
| ‘Must Be Nice’ Dad Style Hat | available |
| Lost Touch Tee Navy with Gold Embroidery | available |
| In Touch Tee Navy with Tarnished Bronze Print | available |

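The same extraction can also be written with BeautifulSoup's CSS-selector methods `select` and `select_one`, returning a name-to-status dict that is easy to query for a single product. This is a sketch that assumes the page markup resembles the hypothetical `SAMPLE_HTML` snippet, which is inferred from the selectors used in the notebook rather than copied from the live site; `parse_stock` is an illustrative name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the selectors used in the notebook;
# the live page's structure may differ or change over time.
SAMPLE_HTML = """
<div class="c-product-tile__wrap"><h5>Lost Touch Tee Navy</h5></div>
<div class="c-product-tile__wrap">
  <h5>'Must Be Nice' Baseball Style Hat</h5>
  <span class="soldout-tag">sold out</span>
</div>
<div class="c-product-tile__wrap"><span class="soldout-tag">sold out</span></div>
"""


def parse_stock(html):
    """Map each product name to its sold-out label, or "available" if it has none."""
    soup = BeautifulSoup(html, "html.parser")
    stock = {}
    for tile in soup.select("div.c-product-tile__wrap"):
        name_tag = tile.select_one("h5")
        if name_tag is None:  # skip tiles without a product name
            continue
        soldout_tag = tile.select_one("span.soldout-tag")
        status = soldout_tag.get_text(strip=True) if soldout_tag else "available"
        stock[name_tag.get_text(strip=True)] = status
    return stock


print(parse_stock(SAMPLE_HTML))
# {'Lost Touch Tee Navy': 'available', "'Must Be Nice' Baseball Style Hat": 'sold out'}
```

Keying by name means two tiles with identical names would collapse into one entry, which is acceptable for a quick availability check but worth knowing.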

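To check for new stock without running the notebook by hand, the code can be exported as a Python script (for example with `jupyter nbconvert --to script`) and scheduled to run automatically each day with cron, by adding an entry through `crontab -e` on macOS or Linux.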
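
When the script runs unattended, a full table is less useful than a yes/no answer for the one product being watched. The following is a minimal sketch that assumes the scraped lists are combined into a name-to-status dict and that the hypothetical keyword `"Lost Touch"` picks out the tee; `target_status`, `main` and `TARGET_KEYWORD` are illustrative names, not part of the original script.

```python
# Hypothetical: a case-insensitive substring assumed to identify the target tee.
TARGET_KEYWORD = "Lost Touch"


def target_status(stock_by_product, keyword=TARGET_KEYWORD):
    """Return (name, status) pairs for products whose name contains keyword."""
    return [
        (name, status)
        for name, status in stock_by_product.items()
        if keyword.lower() in name.lower()
    ]


def main(stock_by_product):
    """Print any in-stock match and return a shell-style exit code."""
    matches = target_status(stock_by_product)
    available = [name for name, status in matches if status == "available"]
    for name in available:
        print(f"IN STOCK: {name}")
    # 0 means the target is available; 1 means it is sold out or was not found
    return 0 if available else 1
```

At the end of the exported script this could be called as `sys.exit(main(dict(zip(product_list, stock_list))))`, and a crontab line such as `0 9 * * * /usr/bin/python3 /path/to/scraper.py` (placeholder paths) would run it every day at 09:00. Returning an exit code keeps the check composable: a small shell wrapper can send a notification only when the exit status is 0.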