I created an application that crawls the online store of the fashion brand Reformation and extracts product information.

The project was a coding challenge from Changing Room, a Honey-style browser extension for sustainable fashion that helps users track and reduce their environmental impact. The challenge asks for an application that crawls a fashion brand's website with a crawler framework (in this case Selenium), extracts information in a structured manner, and stores the data in a hosted database (AWS RDS).
Extracted information:
- display_name (str)
- product_material (str)
- color (str)
- size (list)
- price (str)
- product_url (str)
- image_links (list)
- brand_name (str)
- description (str)
- scraped_date (date)
- category (str)
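
As a sketch, each scraped product can be gathered into a dictionary keyed by these fields before it is written to the database (all sample values below are hypothetical):

```python
from datetime import date

# Hypothetical example of one scraped product record,
# matching the fields listed above.
product = {
    "display_name": "Juliette Dress",  # hypothetical value
    "product_material": "100% Viscose",
    "color": "Black",
    "size": ["XS", "S", "M", "L"],
    "price": "$218",
    "product_url": "https://www.thereformation.com/products/juliette-dress",
    "image_links": ["https://media.thereformation.com/image1.jpg"],
    "brand_name": "Reformation",
    "description": "A mini dress with a sweetheart neckline.",
    "scraped_date": date.today(),
    "category": "dresses",
}
```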
Steps taken:
- Understanding the key objectives and deliverables
- Researching and studying the structure of the target website: Reformation
- Reading the Selenium documentation and downloading the WebDriver matching my Chrome version
- Brainstorming, strategizing, and pseudocoding how to collect the extracted information and insert it into the DB
- Programming (using By.XPATH to find all clothing links to loop through; see the crawler sketch after this list)
- Continuing to extract information from the site, testing functions, and appending the results to a dictionary
- Creating an AWS account, setting up an AWS IAM user, and creating a database with AWS RDS
- Planning a potential primary key for the database
- Establishing the database connection and setting up an environment variable for DB_PASS (see the database sketch after this list)
- CREATE TABLE reformation_db
- Inserting into reformation_db
- SELECT * FROM reformation_db
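
The core of the crawler is the link-collection step mentioned above. A minimal sketch, assuming Selenium 4 with Chrome and using the XPath quoted in the notes below (the listing-page URL is hypothetical):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Assumes Selenium 4+ and a ChromeDriver matching the installed Chrome version.
driver = webdriver.Chrome()
driver.get("https://www.thereformation.com/categories/dresses")  # hypothetical listing URL

# Grab every product link on the listing page (XPath from the notes below).
link_elements = driver.find_elements(By.XPATH, '//div[@class="product-tile__quickadd"]/div/a')
product_urls = [el.get_attribute("href") for el in link_elements]

# Visit each product page and extract the fields listed earlier.
for url in product_urls:
    driver.get(url)
    # ... extract display_name, product_material, color, size, price, image_links, ...

driver.quit()
```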
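
The database steps could look like the following sketch. It assumes a PostgreSQL RDS instance and the psycopg2 driver (the challenge only specifies AWS RDS); the endpoint, user, and trimmed-down schema are illustrative, and `product` is the dictionary from the earlier sketch:

```python
import os
import psycopg2

# Hypothetical RDS endpoint and user; the password comes from the
# DB_PASS environment variable, as described above.
conn = psycopg2.connect(
    host="reformation-db.xxxx.us-east-1.rds.amazonaws.com",
    user="admin",
    password=os.environ["DB_PASS"],
    dbname="postgres",
)
cur = conn.cursor()

# Illustrative subset of the full schema; product_url doubles as the primary key.
cur.execute("""
    CREATE TABLE IF NOT EXISTS reformation_db (
        product_url  TEXT PRIMARY KEY,
        display_name TEXT,
        price        TEXT,
        scraped_date DATE
    )
""")

# Insert one scraped record (the `product` dict from the sketch above).
cur.execute(
    "INSERT INTO reformation_db (product_url, display_name, price, scraped_date) "
    "VALUES (%s, %s, %s, %s)",
    (product["product_url"], product["display_name"], product["price"], product["scraped_date"]),
)
conn.commit()

cur.execute("SELECT * FROM reformation_db")
print(cur.fetchall())
```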
"By finding the appropriate xpath, '//div[@class="product-tile__quickadd"]/div/a', I was able to grab all the product links which enabled me to go through each product page and extract the necessary information. I think this strategy can be expanded to include all the critical pages of the site using Beautiful Soup.
"I understand that the site structure can change which can be tricky, so finding a unique ID/primary would be necessary to keep track of new, old, or not available products. This is something I would look into more. I know for Amazon products, there is a unique identifier (ASINS) that can be used to track the stage of the product.
"Lastly, with all the projected requests on the site, it is likely that an IP block will occur, so a proxy server would need to be purchased.
The material extraction on each product page followed the same element-finding pattern:

```python
# Grab the material/composition blocks on a product page
# (assumes `driver` is already on the product page and By is imported).
product_material_find = driver.find_elements(By.XPATH, '//div[@class="margin-b--15"]')
product_materials_list = [el.get_attribute('innerHTML') for el in product_material_find]
product_material = " ".join(fragment.strip() for fragment in product_materials_list)
```