# Web Scraping Amazon Product Details

This project scrapes the product title and price from an Amazon product page and formats them for display. It also lays the groundwork for sending an email notification when the product price drops.


## Import libraries

Begin by importing the necessary libraries for web scraping and sending emails.

In [None]:
# import libraries

from bs4 import BeautifulSoup
import requests
import smtplib # for sending emails to yourself
import time
import datetime

### Setup

- We request the Amazon product page with the `requests` library, sending browser-like headers.
- We parse the returned HTML with `BeautifulSoup`.
- We extract the product title and price from the parsed document.
- Finally, we print the title and the formatted price.
- These steps later form the core of the `check_price` function.

In [None]:
# Connect to Website 

url = 'https://www.amazon.com/Funny-Data-Systems-Business-Analyst/dp/B07FNW9FGJ/ref=sr_1_3?dchild=1&keywords=data%2Banalyst%2Btshirt&qid=1626655184&sr=8-3&customId=B0752XJYNL&th=1'

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}

page = requests.get(url, headers = headers)

soup1 = BeautifulSoup(page.content, 'html.parser')

soup2 = BeautifulSoup(soup1.prettify(), 'html.parser')

title = soup2.find(id = 'productTitle').get_text(strip = True)

price = soup2.find(class_ = 'a-price-whole').get_text(strip=True)

dollar_sign = soup2.find(class_ = 'a-price-symbol').get_text(strip=True)

a_price = soup2.find(class_ = 'a-price-fraction').get_text(strip=True)  # strip=True removes leading and trailing whitespace


print(title) 
formated_wind = (dollar_sign + price + a_price).center(45)  # center the price text in a 45-character-wide field
print(formated_wind)
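
`soup.find` returns `None` when an element is missing, which can happen if the site serves a different page layout, and a chained `.get_text()` call then raises `AttributeError`. The following is a minimal sketch of a more defensive lookup. The `extract_text` helper and the inline `SAMPLE_HTML` are hypothetical stand-ins for illustration, not the real Amazon markup.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup used only for illustration.
SAMPLE_HTML = """
<span id="productTitle">  Sample Data Analyst Shirt  </span>
<span class="a-price-symbol">$</span>
<span class="a-price-whole">16<span class="a-price-decimal">.</span></span>
<span class="a-price-fraction">99</span>
"""


def extract_text(soup, **criteria):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = soup.find(**criteria)
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
title = extract_text(soup, id="productTitle")
whole = extract_text(soup, class_="a-price-whole")
fraction = extract_text(soup, class_="a-price-fraction")
missing = extract_text(soup, id="doesNotExist")

if None in (title, whole, fraction):
    print("Could not find the title or price on the page")
else:
    print(title, whole + fraction)
```

Checking for `None` before using the values lets the program report a missing element instead of crashing.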



In [None]:
price = formated_wind.strip()[1:] # remove the padding, then drop the leading $ so only the number remains
title = title.strip()
print(price)
print(title)
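
At this point the price is still a string, so it cannot be compared with a numeric threshold directly. As a rough sketch, a small helper can convert the scraped text to a number. The name `parse_price` is hypothetical, and the helper assumes US-style formatting with a `$` symbol and comma thousands separators.

```python
def parse_price(text):
    """Convert scraped price text such as '$1,299.00' to a float."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    return float(cleaned)


print(parse_price("   $16.99   "))
print(parse_price("1,299.00"))
```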

## Create the CSV file

We create a CSV file to store the product details (title, price, and date) collected from the Amazon page. Each run adds a row, so the file becomes a log of the data for later reference and analysis.

In [None]:
# creating the csv file
# run this once just for creating the csv file
import csv

# add one extra column for datetime
import datetime

today = datetime.date.today()

print(today)

header = ['Title','Price','Date']
data = [title,price,today]

# 'w' (write mode) creates the file, or overwrites it if it already exists.
# newline='' lets the csv module control line endings, which avoids blank rows on Windows.
with open('AmazonWebScraperDataset.csv','w', newline = '', encoding = 'UTF8') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerow(data)

In [None]:
# last step: append a new row to the file each time this cell runs

with open('AmazonWebScraperDataset.csv','a+', newline = '', encoding = 'UTF8') as f:

    writer = csv.writer(f)
    writer.writerow(data)
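
The two cells above split file creation and appending into separate steps. One possible way to merge them, sketched here under the assumption that the header is only needed when the file is new or empty, is a single helper. The name `append_row` is hypothetical, and the demo writes to a temporary directory rather than the real dataset.

```python
import csv
import datetime
import os
import tempfile


def append_row(path, row, header=("Title", "Price", "Date")):
    """Append one row, writing the header first if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="UTF8") as f:
        writer = csv.writer(f)
        if needs_header:
            writer.writerow(header)
        writer.writerow(row)


# Illustrative values; in the notebook these come from the scraping step.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "example_prices.csv")
    append_row(demo_path, ["Sample Shirt", "16.99", datetime.date.today()])
    append_row(demo_path, ["Sample Shirt", "15.49", datetime.date.today()])
    with open(demo_path, newline="", encoding="UTF8") as f:
        rows = list(csv.reader(f))

print(len(rows))  # header plus two data rows
```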

## Importing Data with Pandas

We use the Pandas library to import the data from the CSV file into a DataFrame and display its contents.


In [None]:
import pandas as pd

# Adjust this path to wherever AmazonWebScraperDataset.csv was written on your machine
df = pd.read_csv(r"C:\Users\User\AmazonWebScraperDataset.csv")

print(df)
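
Once the log has several rows, the DataFrame can answer simple questions such as the lowest price seen so far. The sketch below uses made-up rows in the same `Title,Price,Date` layout, read from an in-memory string instead of the real file. The `parse_dates` argument asks Pandas to convert the `Date` column to datetimes.

```python
import io

import pandas as pd

# Illustrative CSV text in the same layout as AmazonWebScraperDataset.csv.
sample_csv = io.StringIO(
    "Title,Price,Date\n"
    "Sample Shirt,16.99,2021-07-19\n"
    "Sample Shirt,15.49,2021-07-20\n"
)

df = pd.read_csv(sample_csv, parse_dates=["Date"])
lowest = df["Price"].min()                  # lowest price logged so far
latest = df.sort_values("Date").iloc[-1]    # most recent row

print(df.dtypes)
print(lowest, latest["Price"])
```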

## Defining the `check_price` Function

This function consolidates all the previous steps into a single process to check the product price, update the CSV file, and notify if the price falls below a specified threshold.



In [None]:
# function that checks the price
# combines all the previous steps

def check_price():
    url = 'https://www.amazon.com/Funny-Data-Systems-Business-Analyst/dp/B07FNW9FGJ/ref=sr_1_3?dchild=1&keywords=data%2Banalyst%2Btshirt&qid=1626655184&sr=8-3&customId=B0752XJYNL&th=1'

    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}

    page = requests.get(url, headers = headers)

    soup1 = BeautifulSoup(page.content, 'html.parser')

    soup2 = BeautifulSoup(soup1.prettify(), 'html.parser')

    title = soup2.find(id = 'productTitle').get_text()

    price = soup2.find(class_ = 'a-price-whole').get_text(strip=True)

    dollar_sign = soup2.find(class_ = 'a-price-symbol').get_text(strip=True)

    a_price = soup2.find(class_ = 'a-price-fraction').get_text(strip=True)  # strip=True removes leading and trailing whitespace

    price = price + a_price # join the freshly scraped whole and fraction parts, e.g. '16.' + '99'
    title = title.strip()

    import datetime

    today = datetime.date.today()

    import csv

    header = ['Title','Price','Date']
    data = [title,price,today]

    with open('AmazonWebScraperDataset.csv','a+', newline = '', encoding = 'UTF8') as f:
        writer = csv.writer(f)
        writer.writerow(data)

    # notify us when the price falls below this threshold (in dollars)
    # the scraped price is text, so convert it to a number before comparing
    if float(price.replace(',', '')) < 14:
        send_mail()
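
Because `check_price()` fetches the page, writes the CSV, and sends the email in one place, it is hard to try out without a network connection. One possible restructuring, shown only as a sketch, passes the fetching and notifying steps in as functions. The name `check_price_once` and its parameters are hypothetical.

```python
def check_price_once(fetch_price, notify, threshold):
    """Fetch the current price and notify if it is below the threshold.

    Returns True when a notification was sent.
    """
    price = fetch_price()
    if price is None:  # the page could not be parsed
        return False
    if price < threshold:
        notify(price)
        return True
    return False


# Stand-ins for the real scraping and email steps.
sent = []
alerted = check_price_once(lambda: 12.5, sent.append, threshold=14)
print(alerted, sent)
```

In the notebook, `fetch_price` would wrap the scraping code and `notify` would wrap `send_mail()`.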
    

## Setting Up the Program Execution Timer

We run `check_price()` in an infinite loop, pausing 5 seconds between checks (`time.sleep(5)`). This lets the program monitor the product price continuously and act when the price falls below the specified threshold. The loop keeps running until the kernel is interrupted.

In [None]:
# re-run check_price every 5 seconds until interrupted
while True:
    check_price()
    time.sleep(5)
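
An unbounded loop stops at the first exception and never ends on its own. As a hedged alternative, the sketch below runs a task a fixed number of times and keeps going when a single check fails. The name `run_periodically` and its parameters are hypothetical, and the `sleep` argument exists only so the demo can skip the real waiting.

```python
import time


def run_periodically(task, interval_seconds, max_runs, sleep=time.sleep):
    """Call task() max_runs times, pausing between calls; return the failure count."""
    failures = 0
    for run in range(max_runs):
        try:
            task()
        except Exception as exc:  # keep polling even if one check fails
            failures += 1
            print(f"check {run + 1} failed: {exc}")
        if run < max_runs - 1:
            sleep(interval_seconds)
    return failures


# Demo with a fake task and a no-op sleep so it finishes immediately.
calls = []
failures = run_periodically(
    lambda: calls.append("checked"), 5, max_runs=3, sleep=lambda seconds: None
)
print(len(calls), failures)
```

With the real function, the call would look like `run_periodically(check_price, 5, max_runs=100)`.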

In [None]:
import pandas as pd

# Adjust this path to wherever AmazonWebScraperDataset.csv was written on your machine
df = pd.read_csv(r"C:\Users\User\AmazonWebScraperDataset.csv")

print(df)

## Sending Email Notifications

In this section, we define the `send_mail()` function, which uses the `smtplib` library to send an email notification when the product price falls below the specified threshold. The alert is a prompt to act, for example by purchasing the product. Because `check_price()` calls `send_mail()`, run this cell before starting the timer loop.

In [None]:
# Optional: if you want to email yourself (just for fun) when the price drops below a certain level,
# you can try it out with this script

def send_mail():
    server = smtplib.SMTP_SSL('smtp.gmail.com',465) # SMTP connection to Gmail's SMTP server using SSL encryption on port 465
    server.ehlo() # identifies the client to the SMTP server
    #server.starttls()
    #server.ehlo()
    server.login('kyris1---@gmail.com','xxxxxxxxxxxxxx')
    
    subject = "The Shirt you want is below $15! Now is your chance to buy!"
    body = "Kyriakos, This is the moment we have been waiting for. Now is your chance to pick up the shirt of your dreams. Don't mess it up! Link here: https://www.amazon.com/Funny-Data-Systems-Business-Analyst/dp/B07FNW9FGJ/ref=sr_1_3?dchild=1&keywords=data+analyst+tshirt&qid=1626655184&sr=8-3"
   
    msg = f"Subject: {subject}\n\n{body}"  # an f-string embeds the expressions inside curly braces {} into the string literal
    
    server.sendmail(
        'kyris---@gmail.com', # sender 
        'kyris---@gmail.com', # recipient
        msg
    )
    server.quit() # close the SMTP connection once the message is sent
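
As an alternative sketch, the standard library's `email.message.EmailMessage` can build the headers and body, and the credentials can be read from environment variables instead of being typed into the notebook. The function names, the example addresses, and the variable names `ALERT_EMAIL_USER` and `ALERT_EMAIL_PASSWORD` are all hypothetical.

```python
import os
import smtplib
from email.message import EmailMessage


def build_alert(sender, recipient, product_url, threshold):
    """Build the alert email without sending it."""
    msg = EmailMessage()
    msg["Subject"] = f"Price alert: the item is below ${threshold}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"The price dropped below ${threshold}. Link: {product_url}")
    return msg


def send_alert(msg):
    """Send msg through Gmail over SSL, reading credentials from the environment."""
    # ALERT_EMAIL_USER and ALERT_EMAIL_PASSWORD are hypothetical variable names.
    user = os.environ["ALERT_EMAIL_USER"]
    password = os.environ["ALERT_EMAIL_PASSWORD"]
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(user, password)
        server.send_message(msg)


# Building the message needs no network access, so it is safe to try out.
message = build_alert("me@example.com", "me@example.com", "https://example.com/product", 14)
print(message["Subject"])
```

Using `SMTP_SSL` in a `with` block closes the connection automatically, and `send_alert(message)` is only called once real credentials are set.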