# Price Audits - Fastenal 

Project scope -  Create an app that scrapes product information from the Fastenal website. That information will then be used to compare wholesale price, Materion's discounted price, and invoice price to identify variabilities and potential discrepancies. 

## Import all dependencies and start selenium 

We are using selenium and BeautifulSoup for the scraping process

In [1]:
#Finding the location of chromedriver.exe
#https://splinter.readthedocs.io/en/latest/drivers/chrome.html
!which chromedriver

/usr/local/bin/chromedriver


In [2]:
# Import your newly installed selenium package
from selenium import webdriver
from splinter import Browser
from selenium.webdriver.common.keys import Keys
import time
time.sleep(3)
from splinter.exceptions import ElementDoesNotExist
from bs4 import BeautifulSoup
import pandas as pd


# Now create an 'instance' of your driver
WebDriver = {'executable_path': 'chromedriver'}

# A new Chrome (or other browser) window should open up
browser = Browser('chrome', **WebDriver, headless=False)

# Defining width and height of the browser
browser.driver.set_window_size(1750, 1250)

#For Mac users - Opening the targer url
#driver.get("https://www.fastenal.com/product/abrasives/coated-and-non-woven-abrasives/fiber-and-sanding-discs/609478?categoryId=609478&level=3&isExpanded=true&productFamilyId=26373")

#For Windows users - Opening the target url
url ='https://www.fastenal.com/product/abrasives/coated-and-non-woven-abrasives/fiber-and-sanding-discs/609478?categoryId=609478&level=3&isExpanded=true&productFamilyId=26373&view=1'
browser.visit(url)

## Start the browser and making the soup 

In [3]:
#activating soup
html = browser.html
soup = BeautifulSoup(html, 'html.parser')

## Initialize searches 

### First we need to create the searches for each piece of information we would like to extract: SKU number, item description, wholesale price, and manufacturer  

#### SKU Number 

Inspecting the website, we found that the SKU number information is in a div inside the class 'media-item-row', and it is consistent across all the pages, so we can create a function that gives us access to that div initially 

In [4]:
#access the sku information in the HTML to create the sku list
sku_list = soup.findAll('div',class_='media-item-row')

#Display the search results
#sku_list

Now that we were able to access the first record, we can use that code in a nested for loop to extract all of the sku numbers from all pages

In [5]:
#Creating a loop to extract the description data within the div for page 1 only. 
#This code will be used later in the nested loop

#SKU information
#sku=[]
for view in sku_list:
        #Accessing the div that contains the sku information
        sku3=view('a')
        #Creating a nested loop to extract the specific sku number
        for k in sku3:
            #Accessing the sku information in the anchor tag and stripping spaces
            sku4=sku3[0].text.strip()
            #Appending the sku information to the empty list
            #sku.append(sku4)
            #Print the results
            print(sku4)
#sku

0826359


0826362


0812730



0812777



0826357


0812732



0802678


0812762



0200175





0812767



0812733



0812738



0826359


0826362


0812730



0812777



0826357


0812732



0802678


0812762



0200175



0812767



0812733



0812738





#### Item description 

The item description is not in the same div, so inspecting the website, we found that the the description is inside a div under the class div 'gridview__prd--desc', and it is consistent across all the pages, so we can create a function that gives us access to that div initially

In [6]:
#access the information in the HTML to create the description list
grid_view = soup.findAll('div',class_='gridview__prd--desc')
#Display the search results and accessing the first record
#grid_view[0]

After accessing the first record, we can now try the code in a for loop to extract all of the descriptions numbers from page 1

In [7]:
#Creating a loop to extract the description data within the div for page 1 only. 
#This code will be used later in the nested loop

#Product description
description= []
for view in grid_view:
        #Accessing the description information in the div
        title = view.text
        #Appending the description information to the empty list
        description.append(title)
        #Print the results
        #print(title)
#description
df_desc = pd.DataFrame(description,columns = ['sku'])
df_desc.head()

Unnamed: 0,sku
0,"5"" x 7/8"" 36+ 982C 3M CUBITRON II Fiber Disc"
1,"7"" x 7/8"" 36+ 982C 3M CUBITRON II Fiber Disc"
2,"4-1/2"" x 7/8"" Brown 36 Grit Aluminum Oxide Bla..."
3,"4-1/2"" x 7/8"" Red/Pink 36 Grit Ceramic Blackst..."
4,"4-1/2"" x 7/8"" 36+ 982C 3M CUBITRON II Fiber Disc"


#### Price 

Price information is not in any of the previous divs, so inspecting the website, we found that the the wholesale price is inside a div under the class div 'color--blue margin-bottom--5', and it is consistent across all the pages, so we can create a function that gives us access to that div initially

In [8]:
#access the information in the HTML to create the wholesale price list
wholesale_cal = soup.findAll('div',class_='color--blue margin-bottom--5')

#Display the search results and accessing the first record
#wholesale_cal

After accessing the first record, we can now try the code in a for loop to extract all of the descriptions numbers from page 1

In [9]:
#Creating a loop to extract the wholesale price data within the div for page 1 only. 
#This code will be used later in the nested loop

#Price
wholesale_price=[]
for sale in wholesale_cal:
        #Accessing the price information in the div
        price = sale('span')
        #Creating a nested loop to extract the specific wholesale price
        for p in price:
            #Accessing the sku information in the anchor tag and stripping spaces
            price2=price[1].text.strip().split('\n')[0].strip("/")
            #Appending the price information to the empty list
            wholesale_price.append(price2)
            #Print the results
            #print(price2)

df_whole_price = pd.DataFrame(wholesale_price,columns = ['wholesale_price'])
df_whole_price.head()

Unnamed: 0,wholesale_price
0,$1.58
1,$1.58
2,$4.92
3,$4.92
4,$1.19


Online price is not in the previous divs, so inspecting the website, we found that the the wholesale price is inside a div under the class div 'color-highlight margin-bottom--5', and it is consistent across all the pages, so we can create a function that gives us access to that div initially

In [10]:
#access the information in the HTML to create the online price list
online_cal = soup.findAll('div',class_='color-highlight margin-bottom--5')
#wholesale_cal = soup.findAll('span', class_="")

#Display the search results and accessing the first record
#online_cal

In [11]:
#Creating a loop to extract the online price data within the div for page 1 only. 
#This code will be used later in the nested loop

on_price=[]
for online in online_cal:
        #Accessing the price information in the div
        online_price = online('span')
        #Creating a nested loop to extract the specific online price
        for o in online_price:
            #Accessing the sku information in the anchor tag and stripping spaces
            online_price2=online_price[1].text.strip().split('\n')[0].strip("/")
            #Appending the price information to the empty list
            on_price.append(price2)
            #Print the results
            #print(online_price2)
df_on_price = pd.DataFrame(on_price,columns = ['online_price'])
df_on_price.head()

Unnamed: 0,online_price
0,$1.82
1,$1.82
2,$1.82
3,$1.82
4,$1.82


#### Manufacturer 

Manufacturer information is in the same div and under the same class as the sku information, so we will use the same code to access the first record to extract the information 

In [12]:
#access the information in the HTML to create the manufacturers list
manufacture_cal = soup.findAll('div',class_='media-item-row')

#Display the search results and accessing the first record
manufacture_cal
#manu_2=manufacture_cal[2].text
#manu_2

[<div class="media-item-row">
 <span class="txtweight-medium">
                                             
                                                 
                                                     Fastenal Part No. (SKU)
                                                 
                                                 
                                             
                                         </span>
 <a href="/products/details/0826359;jsessionid=MKZdi2lYYpF1bNbRYBYJOCdf.713728dc-d653-355a-a89c-3e45dac6e990">
                                                 
                                                     
                                                         0826359
                                                     
                                                     
                                                 
                                             </a>
 </div>,
 <div class="media-item-row">5" x 7/8" 36+ 982C 3M CUBITRON II Fiber Dis

Following the same process, after accessing the first record, we can now try the code in a for loop to extract all of the descriptions numbers from page 1

In [13]:
#Creating a loop to extract the manufacturers data within the div for page 1 only. 
#This code will be used later in the nested loop
for manu in manufacture_cal:
        #Accessing the manufacturers information in the div
        detail = manu.text.strip().split('\n')
        #Creating a nested loop to extract the manufacturer
        #for d in detail:
            #Accessing the sku information in the anchor tag and stripping spaces
            #detail2=detail.text
            #Print the results
        print(detail)
        
#sku2 = sku_list[0].text.strip().split('\n')[-1].strip()

['Fastenal Part No. (SKU)', '                                                ', '                                                ', '                                            ', '                                        ', '', '                                                ', '                                                    ', '                                                        0826359']
['5" x 7/8" 36+ 982C 3M CUBITRON II Fiber Disc']
['3M PRODUCTS']
['×', '', 'Compliance', '', '', '', '', '', '', '', '', 'Locker', '', '', 'This Product has been approved for Locker Pickup service, but final compliance will be subject to weight and size limits for the quantity ordered', '                                    ', '                                ', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'Vendible Item', '', '', 'The item is tested and deemed vendible in the following vending machines as the EA and/or Package Quantity.']
['Fastenal Part No. (SKU)', '        

### Master loop 

Once all the searches have been created, we consolidate everything into a master loop that will ultimately scrape and generate all the combined results at once

In [None]:
#Creating all the empty lists that will hold all the search results extracted from the for loop
#SKU information
sku=[]
#Product description
description= []
#Price
wholesale_price=[]
#Manufacturer
manufacturer=[]

#Initiating the for loop and setting the page range
for i in range(2,5):
    
    #Opening the browser
    html = browser.html
    #Making the soup
    soup = BeautifulSoup(html, 'html.parser')
    #Accessing the divs that contain the data we are looking for
    grid_view = soup.findAll('div',class_='gridview__prd--desc')
    
    #Creating a nested loop to extract the data within the divs
    for view in grid_view:
        #Accessing the sku information in the div
        sku_list=sku_list[0].text.strip().split('\n')[-1].strip()
        #Accessing the description information in the div
        description_list = view.text
        #Accessing the price information in the div
        wholesale_price_list=
        #Accessing the manufacturer information in the div
        manufacturer_list=
        
        #Appending the sku information to the empty list
        sku.append(sku_list)
        #Appending the description information to the empty list
        description.append(description_list)
        #Appending the price information to the empty list
        wholesale_price.append(wholesale_price_list)
        #Appending the manufacturer information to the empty list
        manufacturer.append(manufacturer_list)
        
        #Setting a try function to iterate over all the pages
        try:
            #Initial url
            link = f"/product/abrasives/coated-and-non-woven-abrasives/fiber-and-sanding-discs/609478?categoryId=609478&level=3&isExpanded=true&productFamilyId=26373&page={i}&pageSize=12&exactSkuMatchLevel=useData&view=1"
            #Printing the url found
            print(link)
            #Set a click by href action to go to the next page
            browser.click_link_by_href(link)
            #Print the page number
            print(i)

        #Setting an except function to stop when there is not other url to click on
        except ElementDoesNotExist:
            #Print an exit message
            print("Scraping Complete")
            #Breaking the loop when there is no other url to click on
            break
   