<a href="https://colab.research.google.com/github/owenlee20/msdia-portfolio/blob/main/GB745_WebScraping_Lee_O.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

###GB745 - Assignment 6 - Owen Lee
#####My tech company has recently received requests from newer, less experienced customers for a one-stop shop for information on the latest technology, especially given the constant stream of headlines in the tech world. To help them keep up, I started a tech brochure that compiles the best recommendations and information on what's new.
#####I curate this brochure, which covers the most exciting tech gear for people who are interested in upgrading their devices. I focus specifically on laptops and include information on the latest, highest-rated models.
#####To help curate the brochure and stay up to date, I wrote a Python program that automates scraping a website containing the information I need.
#####Whenever new laptops come out, I can run my program and print an updated edition of the brochure.

In [None]:
# import appropriate libraries for scraping web data.
import requests
from bs4 import BeautifulSoup
import csv

In [None]:
# fetch the page to be scraped; stop early on a network or HTTP error.
webpage = requests.get('https://webscraper.io/test-sites/e-commerce/allinone/computers/laptops', timeout=30)
webpage.raise_for_status()

In [None]:
# create a 'mysoup' object to scrape website information.
mysoup = BeautifulSoup(webpage.content, 'html.parser')

#####Specific pieces of HTML can be found by navigating to the website, hovering over the section of interest, right-clicking, and choosing 'Inspect'. The tag names and class values shown there are what the scraping code searches for.
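#####As a minimal sketch of how an inspected element maps to code, the example below parses a small, made-up HTML snippet (not copied from the real site) and looks up elements by the tag names and class strings an inspector would show. The names sample_html, sample_soup, and card are hypothetical and used only for illustration.

```python
from bs4 import BeautifulSoup

# hypothetical, simplified markup resembling one product card as it might
# appear in the browser's inspector; the real page contains more elements.
sample_html = """
<div class="card thumbnail">
  <h4 class="price float-end card-title pull-right">$295.99</h4>
  <a href="/product/1" class="title" title="Example Laptop 15">Example Laptop...</a>
  <p class="description card-text">Example CPU, 4GB, 500GB</p>
</div>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')

# the tag name and class string seen in the inspector become the arguments to find().
card = sample_soup.find("div", class_="card thumbnail")
sample_title = card.find("a", class_="title")['title']
sample_price = card.find("h4", class_="price float-end card-title pull-right").text

print(sample_title)
print(sample_price)
```

#####Passing the full class string, as here, matches only elements whose class attribute is exactly that string, so the value should be copied from the inspector as-is.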

In [None]:
# find all instances of card thumbnails on the website, indicating different laptop selections.
laptops = mysoup.find_all("div", class_="card thumbnail")

In [None]:
# test a loop to ensure desired information is pulled and looped through correctly.
for laptop in laptops:

  price = laptop.find("h4", class_="price float-end card-title pull-right")

  print(price.text)

#####Looping over the cards and defining several variables lets me gather multiple details about each product. Because the process is automated in Python, additional fields can be added to the same loop later.
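#####One possible enhancement, sketched here under assumptions, is converting the scraped text into numbers so it can be sorted or filtered. The helpers parse_price and parse_review_count are hypothetical, and the input formats ('$295.99' and '14 reviews') are assumed rather than confirmed for every product on the site.

```python
import re

def parse_price(price_text):
    """Convert a price string such as '$1,295.99' to a float (hypothetical helper)."""
    cleaned = price_text.replace('$', '').replace(',', '').strip()
    return float(cleaned)

def parse_review_count(review_text):
    """Return the first integer found in text such as '14 reviews', or 0 if none (hypothetical helper)."""
    match = re.search(r'\d+', review_text)
    return int(match.group()) if match else 0

# assumed example inputs, for illustration only.
print(parse_price('$295.99'))
print(parse_review_count('14 reviews'))
```

#####Inside the loop, these could be applied to the stripped text of the price and review-count elements before printing or saving.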

In [None]:
# extend the loop to extract and print additional fields for each laptop.
for laptop in laptops:

  product_name = laptop.find("a", class_="title")['title'].strip() # ['title'] reads the title attribute of the <a> tag whose class is "title".
  price = laptop.find("h4", class_="price float-end card-title pull-right").text.strip()
  description = laptop.find("p", class_="description card-text").text.strip()
  num_reviews = laptop.find("p", class_="review-count float-end").text.strip()

  # find the stars within the div with class "ratings"
  rating_div = laptop.find("div", class_="ratings")
  stars = rating_div.find_all("span", class_="ws-icon ws-icon-star")

  # the rating is represented by the number of stars
  rating = len(stars)  # length is the 'number' of stars

  print(product_name)
  print(price)
  print(description)
  print(num_reviews)
  print(rating)
  print()

#####Writing the results to a .csv file is a convenient way to get the information into Excel, where I can organize and manipulate the data for my brochure. My customers appreciate it when I feature the highest-quality products with the most positive reviews.
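#####The ranking can also be done in Python rather than Excel. The sketch below sorts rows by rating and then by review count, using made-up rows in the same column layout as the .csv file; the names sample_csv, review_total, and ranked are hypothetical, and it assumes num_reviews begins with the count (for example '14 reviews').

```python
import csv
import io

# made-up rows in the same column layout as laptopsforsale.csv, for illustration only.
sample_csv = io.StringIO(
    "product_name,price,description,num_reviews,rating\n"
    "Example Laptop A,$295.99,Sample description,14 reviews,3\n"
    "Example Laptop B,$399.00,Sample description,7 reviews,4\n"
    "Example Laptop C,$1099.00,Sample description,2 reviews,4\n"
)

rows = list(csv.DictReader(sample_csv))

def review_total(row):
    # assumes num_reviews begins with the count, e.g. '14 reviews'.
    return int(row['num_reviews'].split()[0])

# highest rating first; ties are broken by the larger number of reviews.
ranked = sorted(rows, key=lambda row: (int(row['rating']), review_total(row)), reverse=True)

for row in ranked:
    print(row['product_name'], row['rating'], row['num_reviews'])
```

#####To rank real data, sample_csv could be replaced with the file object returned by open('laptopsforsale.csv', newline='') once that file has been written.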

In [None]:
# create a csv file to write scraped information to.
with open('laptopsforsale.csv', mode = 'w', newline = '') as laptop_data_csv_file:

  writer = csv.writer(laptop_data_csv_file)
  writer.writerow(("product_name", "price", "description", "num_reviews", "rating")) # write a header row for csv file.

  for laptop in laptops:

      product_name = laptop.find("a", class_="title")['title'].strip()
      price = laptop.find("h4", class_="price float-end card-title pull-right").text.strip()
      description = laptop.find("p", class_="description card-text").text.strip()
      num_reviews = laptop.find("p", class_="review-count float-end").text.strip()

      rating_div = laptop.find("div", class_="ratings")
      stars = rating_div.find_all("span", class_="ws-icon ws-icon-star")
      rating = len(stars)

      writer.writerow((product_name, price, description, num_reviews, rating)) # write data for each laptop on the website.

#####The generated .csv file can be downloaded at any time, for example whenever the website is updated with new laptops, so the information can be arranged and displayed in the way that best serves customers.

In [None]:
from google.colab import files
files.download('laptopsforsale.csv')
