# eBay Product Scraper with Automated Email Notifications

This script scrapes product listings from eBay search results and sends an email notification when a listing's total price (item plus delivery) is at or below a chosen threshold. Emails are sent through the Gmail API, authorized with OAuth 2.0.

## Key Features

*   Web scraping using BeautifulSoup and requests.
*   Price cleaning using regular expressions.
*   Email notifications using Gmail API and OAuth 2.0.
*   Data storage in CSV format (`product.csv`).
*   Daily scheduling using the `schedule` library.
*   Basic error handling: failures while sending an email are caught and printed.

## Dependencies

*   `beautifulsoup4`
*   `requests`
*   `google-api-python-client`
*   `google-auth-httplib2`
*   `google-auth-oauthlib`
*   `schedule`
*   `flask` (optional, only if a web interface is added; it is not imported by the code shown here)

## Configuration

The following settings are best kept in a separate configuration file or in environment variables rather than in the code, for security and maintainability:

*   `client_secret.json`: OAuth 2.0 client secrets file (downloaded from Google Cloud Console).
*   `token.pickle`: Stores OAuth 2.0 tokens (generated during authorization).
*   `TARGET_URL`: URL of the product to scrape.
*   `PRICE_THRESHOLD`: The price below which an email notification is sent.
*   `SENDER_EMAIL`: Your Gmail address.
*   `RECIPIENT_EMAIL`: Recipient's email address.
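
As a minimal sketch of the environment-variable approach, the settings above could be loaded as follows. The `ScraperConfig` class and `load_config` function are hypothetical names introduced for illustration; they are not part of the script, and the example values are placeholders.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ScraperConfig:
    """Hypothetical container for the settings listed above."""
    target_url: str
    price_threshold_cents: int
    sender_email: str
    recipient_email: str
    client_secret_file: str = "client_secret.json"
    token_file: str = "token.pickle"


def load_config(env=None):
    """Read settings from environment variables, failing early if one is missing."""
    env = os.environ if env is None else env
    required = ["TARGET_URL", "PRICE_THRESHOLD", "SENDER_EMAIL", "RECIPIENT_EMAIL"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing configuration values: {', '.join(missing)}")
    return ScraperConfig(
        target_url=env["TARGET_URL"],
        # Keep the threshold as integer cents so later comparisons avoid floats.
        price_threshold_cents=round(float(env["PRICE_THRESHOLD"]) * 100),
        sender_email=env["SENDER_EMAIL"],
        recipient_email=env["RECIPIENT_EMAIL"],
    )


# Placeholder values; in practice these would come from os.environ.
example_env = {
    "TARGET_URL": "https://www.ebay.com/sch/i.html?_nkw=Nike+Shoes",
    "PRICE_THRESHOLD": "100",
    "SENDER_EMAIL": "sender@example.com",
    "RECIPIENT_EMAIL": "recipient@example.com",
}
config = load_config(example_env)
print(config.price_threshold_cents)  # prints 10000
```

Calling `load_config()` with no argument reads from the real process environment, so secrets never need to appear in the notebook.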


## Import Libraries

The notebook imports the following libraries:

- **`bs4` (BeautifulSoup)**: For parsing and extracting data from HTML and XML documents.
- **`requests`**: To make HTTP requests for web scraping and API calls.
- **`time`**: For handling delays and measuring time intervals.
- **`datetime`**: To work with date and time objects.
- **`csv`**: For reading from and writing to CSV files.
- **`re`**: Regular expressions for pattern matching and text processing.
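- **`smtplib`**: For sending emails using the Simple Mail Transfer Protocol (SMTP). It is imported but not used by the code shown here, which sends email through the Gmail API instead.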
- **`googleapiclient.discovery`**: To access Google APIs such as Gmail.
- **`google_auth_oauthlib.flow`**: For managing the OAuth 2.0 authentication flow.
- **`google.auth.transport.requests`**: For making authenticated HTTP requests.
- **`email.mime.text`**: To create email messages in MIME format.
- **`base64`**: To encode and decode data in Base64 format.
- **`pickle`**: For serializing and deserializing Python objects.
- **`os`**: To interact with the operating system (e.g., file paths).
- **`schedule`**: To run the scraping job at a fixed time every day.

These libraries collectively enable the project to perform tasks such as:
- Web scraping,
- Data manipulation,
- Email automation, and
- API integration.


In [3]:
# import libraries
from bs4 import BeautifulSoup
import requests
import time
import datetime
import csv
import re
import smtplib
from datetime import date
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from email.mime.text import MIMEText
import base64
import pickle
import os
import schedule



# Gmail API Integration for Notifications

This block integrates the Gmail API to send email notifications. Key highlights:

- **OAuth 2.0 Authentication**: Ensures secure access to Gmail using credentials (`client_secret.json`).
- **Key Functions**:
  - `create_message(sender, to, subject, message_text)`: Creates an email message.
  - `send_message(service, user_id, message)`: Sends the email message using Gmail API.
  - `send_mail(product_name, price, product_link)`: Automates email notifications when a product matches the desired price.

### Required Steps:
1. Download and set up the `client_secret.json` file from Google Cloud Console.
2. Authenticate with Google API to generate a token.
3. Specify the sender and receiver email addresses.


In [5]:
# Replace with the path to your downloaded client_secret.json file
CREDENTIALS_FILE = r'client_secret.json'

# If modifying these scopes, delete the token.pickle file.
SCOPES = ['https://www.googleapis.com/auth/gmail.send']



def create_message(sender, to, subject, message_text):
    """Creates a MIMEText email message."""
    message = MIMEText(message_text)
    message['to'] = to
    message['from'] = sender
    message['subject'] = subject
    raw_message = {'raw': base64.urlsafe_b64encode(message.as_bytes()).decode()}
    return raw_message


def send_message(service, user_id, message):
    """Sends an email message.

    Args:
      service: The authorized Gmail API service instance.
      user_id: The user's email address.
      message: The email message to be sent.
    """
    try:
        message = (service.users().messages().send(userId=user_id, body=message)
                   .execute())
        print(f'Message Id: {message["id"]}')
        return message
    except Exception as error:
        print(f'An error occurred: {error}')
        return None


def send_mail(product_name, price, product_link):
    """Sends an email using OAuth 2.0 authentication."""

    # Get credentials from token or authorization flow
    creds = None
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                CREDENTIALS_FILE, SCOPES)
            creds = flow.run_local_server(port=0) # specify port if using localhost
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    # Build the Gmail API service object
    service = build('gmail', 'v1', credentials=creds)

    # Define email content
    sender = "mubashirahmed421@gmail.com"  # Replace with your Gmail address
    receiver = "mubashirshakeel312@gmail.com" # Replace with the receiver email
    subject = f"The {product_name} you want is below {price}! Now is your chance to buy!"
    body = f"This is the moment we have been waiting for. Now is your chance to pick up the {product_name} of your dreams. Don't mess it up! Link here: {product_link}"

    # Create message object
    message = create_message(sender, receiver, subject, body)

    # Send the email
    send_message(service, "me", message)
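
To see what `create_message` actually hands to the Gmail API, the following sketch rebuilds the same payload and decodes it again without contacting Gmail. The addresses and link are placeholders chosen for illustration.

```python
import base64
from email import message_from_bytes
from email.mime.text import MIMEText


def create_message(sender, to, subject, message_text):
    """Same construction as above: a MIME message, then URL-safe Base64."""
    message = MIMEText(message_text)
    message["to"] = to
    message["from"] = sender
    message["subject"] = subject
    return {"raw": base64.urlsafe_b64encode(message.as_bytes()).decode()}


payload = create_message(
    "sender@example.com",
    "recipient@example.com",
    "Price alert",
    "Link here: https://example.com/item",
)

# Decode the payload back into a message to confirm nothing was lost.
decoded = message_from_bytes(base64.urlsafe_b64decode(payload["raw"]))
print(decoded["subject"])
print(decoded.get_payload(decode=True).decode())
```

A round trip like this is a cheap way to check headers and body before spending an API call on a real send.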





# Cleaning and Processing Price Data

Web-scraped prices often contain currency symbols, commas, or other non-numeric characters. 
This block provides a function to clean such data and extract valid price information.

### Function Details:
- **`clean_price(price_str)`**:
  - Strips out every character except digits and decimal points.
  - Converts the remaining text to a number.
  - Returns the price as an integer number of cents.
  - Handles `None` or invalid inputs (for example `Free`) gracefully by returning `0`.

### Examples:
- **Input**: `$12.99` → **Output**: `1299`
- **Input**: `1,234.56` → **Output**: `123456`


In [7]:
# Function to clean price
def clean_price(price_str):
    if price_str is None:
        return 0
    cleaned_price = re.sub(r'[^\d.]', '', price_str)
    try:
        # Convert to integer cents; round() avoids truncating values such as 1998.9999999999998
        return round(float(cleaned_price) * 100)
    except ValueError:
        return 0

# Example usage:
prices = ["$12.99", "£10", "€25.50", "Free", "1,234.56", "123 abc", None, ""]
cleaned_prices = [clean_price(price) for price in prices]
print(cleaned_prices)

[1299, 1000, 2550, 0, 123456, 12300, 0, 0]
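
One case the function above does not cover is text containing more than one number. For example, `"$12.99 to $15.99"` is reduced to `12.9915.99`, which cannot be converted, so the result is `0`. The sketch below is one possible alternative, assuming that the first number in the string is the one of interest; `first_price_in_cents` is a hypothetical helper, not part of the script.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Matches numbers such as 12, 12.99 or 1,234.56.
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def first_price_in_cents(price_str):
    """Return the first number in price_str as integer cents, or 0 if there is none."""
    if not price_str:
        return 0
    match = _NUMBER.search(price_str)
    if match is None:
        return 0
    # Decimal keeps the arithmetic exact, so no float rounding issues arise.
    amount = Decimal(match.group().replace(",", ""))
    cents = (amount * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


samples = ["$12.99", "$12.99 to $15.99", "1,234.56", "Free", None]
print([first_price_in_cents(s) for s in samples])
# prints [1299, 1299, 123456, 0, 0]
```

Taking the lower bound of a range is a design choice: it may trigger a notification for a variant that is not the wanted one, so a stricter script could skip ranged listings instead.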


## Function: `scrap_product_list`

### Purpose:
Scrapes product data from eBay based on the provided product name, shoe size, and desired price. Saves the data to a CSV file and sends an email if the total price meets the desired criteria.

### Parameters:
- **`product_name`**: The name of the product to search for.
- **`shoe_size`**: The desired shoe size for filtering the results.
- **`desired_price`**: The maximum acceptable price (including delivery charges).
- **`relevency_filter`**: *(Optional)* The value passed as the `_sop` query parameter of the eBay search URL, which controls how the results are sorted. Default is `15`.

### Key Steps:
1. Formats the product name to create a valid search query.
2. Creates a CSV file (`product.csv`) if it doesn’t already exist, with the following columns:
   - Product Name
   - Price
   - Delivery Charges
   - Product Link
   - Image
   - Date
3. Sends an HTTP request to eBay and parses the search results using BeautifulSoup.
4. Extracts details for each product, such as:
   - Name
   - Price
   - Delivery cost
   - Product link
   - Image URL
5. Cleans the price and delivery charges.
6. Sends an email notification if the total price (price + delivery) is at or below the desired price and the listing title contains the product name.
7. Appends the data to the `product.csv` file.
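
Step 1 can also be done with the standard library's `urlencode`, which escapes characters such as `&` that a plain whitespace substitution would leave untouched. The sketch below is an alternative under that assumption, with `build_search_url` as a hypothetical helper; the query parameters are copied from the URL used by the function in this notebook.

```python
from urllib.parse import urlencode

EBAY_SEARCH_URL = "https://www.ebay.com/sch/i.html"


def build_search_url(product_name, shoe_size, sort_order=15, page=1):
    """Build the search URL with the same parameters the scraper uses."""
    params = {
        "_from": "R50",
        "_nkw": product_name,
        "_sacat": 0,
        "_dcat": 15709,
        "_sop": sort_order,
        "_ipg": 120,
        # The scraper's URL spells this key as US%2520Shoe%2520Size, i.e. the
        # already-encoded name "US%20Shoe%20Size" encoded a second time.
        "US%20Shoe%20Size": shoe_size,
        "_pgn": page,
    }
    return f"{EBAY_SEARCH_URL}?{urlencode(params)}"


print(build_search_url("Air Jordan XX3", "9"))
```

For the inputs shown, this yields the same URL that appears in the run output at the end of the notebook.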

### Output:
- Scraped product details saved to `product.csv`.
- Email notifications sent when conditions are met.

### Example:
```python
scrap_product_list("Nike Shoes", "10", 100)
```

In [9]:
# Function to scrap data
def scrap_product_list(product_name, shoe_size, desired_price, relevency_filter=15):
    product = re.sub(r"\s+", "+", product_name)
    csv_file = 'product.csv'
    
    # Ensure CSV has header row
    if not os.path.exists(csv_file):
        with open(csv_file, 'w', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)
            writer.writerow(["Product Name", "Price", "Delivery Charges", "Product Link", "Image", "Date"])
    
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
    }
    main_url = f'https://www.ebay.com/sch/i.html?_from=R50&_nkw={product}&_sacat=0&_dcat=15709&_sop={relevency_filter}&_ipg=120&US%2520Shoe%2520Size={shoe_size}&_pgn=1'
    print(f"Scraping URL: {main_url}")
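    # A timeout keeps the daily job from hanging on a stalled connection
    page = requests.get(main_url, headers=headers, timeout=30)
    if page.status_code != 200:
        print(f"Request failed with status {page.status_code}.")
        return
    # Parse once; re-parsing the prettified markup is unnecessary
    soup = BeautifulSoup(page.content, 'html.parser')
    container = soup.find('ul', class_='srp-results')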
    if not container:
        print("No results found.")
        return

    cards = container.find_all('li', class_='s-item__pl-on-bottom')

    for card in cards:
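        link_tag = card.find('a')
        image_tag = card.find('img')
        name_tag = card.find('div', class_='s-item__title')
        price_tag = card.find('span', class_='s-item__price')
        # Skip cards that lack any required element instead of raising AttributeError
        if any(tag is None for tag in (link_tag, image_tag, name_tag, price_tag)):
            continue
        product_link = link_tag.get('href')
        image = image_tag.get('src')
        name = name_tag.text.strip()
        price = price_tag.text.strip()
        delivery_tag = card.find('span', class_='s-item__logisticsCost')
        delivery = delivery_tag.text.strip() if delivery_tag is not None else "0"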
        today = date.today()

        cleaned_price = clean_price(price)
        cleaned_delivery = clean_price(delivery)
        total_price = cleaned_delivery + cleaned_price

        # Send email if total price is within desired range
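        # Compare in cents; float() accepts thresholds such as "99.50", which int() rejects
        threshold_cents = round(float(desired_price) * 100)
        # A price of 0 means clean_price could not parse it, so do not notify
        if cleaned_price > 0 and total_price <= threshold_cents and product_name.lower() in name.lower():
            send_mail(name, f"${cleaned_price / 100:.2f}", product_link)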

        # Write data to CSV
        with open(csv_file, 'a', newline='', encoding='utf-8') as csvfile:
            writer = csv.writer(csvfile)
            writer.writerow([name, f"${cleaned_price / 100:.2f}", f"${cleaned_delivery / 100:.2f}", product_link, image, today])
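
The function above reopens `product.csv` once per listing. As a minimal sketch of an alternative, the rows can be collected first and written in a single pass. `append_rows` and `format_cents` are hypothetical helpers, and the listings are made-up placeholder data written to a temporary directory.

```python
import csv
import os
import tempfile
from datetime import date

FIELDNAMES = ["Product Name", "Price", "Delivery Charges", "Product Link", "Image", "Date"]


def format_cents(cents):
    """Format integer cents as a dollar string, e.g. 1299 becomes '$12.99'."""
    return f"${cents / 100:.2f}"


def append_rows(csv_path, rows):
    """Append rows, writing the header only when the file is new or empty."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)


# Placeholder listings, used only to exercise the helper.
listings = [
    {"Product Name": "Example Shoe A", "Price": format_cents(8999),
     "Delivery Charges": format_cents(500), "Product Link": "https://example.com/a",
     "Image": "https://example.com/a.jpg", "Date": date.today().isoformat()},
    {"Product Name": "Example Shoe B", "Price": format_cents(12000),
     "Delivery Charges": format_cents(0), "Product Link": "https://example.com/b",
     "Image": "https://example.com/b.jpg", "Date": date.today().isoformat()},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "product.csv")
    append_rows(path, listings[:1])  # first call creates the file and header
    append_rows(path, listings[1:])  # second call only appends
    with open(path, newline="", encoding="utf-8") as f:
        saved = list(csv.DictReader(f))

print(len(saved), saved[0]["Price"])  # prints 2 $89.99
```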







## Function: `daily_scraping_job`

### Purpose:

Runs the `scrap_product_list` function daily to scrape product data and save it.

### Parameters:

*   `product_name` (string): The name of the product to scrape daily.
*   `shoe_size` (string): The desired shoe size for filtering the results.
*   `desired_price` (number): The maximum acceptable price (including delivery charges).

### Usage:

This function is used in combination with the `schedule_scraping` function to run the scraping process at a scheduled time.

### Example:

```python
daily_scraping_job("Nike Shoes", "10", 100)
```

In [11]:
# Function to run daily
def daily_scraping_job(product_name, shoe_size, desired_price):
    scrap_product_list(product_name, shoe_size, desired_price)

## Function: `schedule_scraping`

### Purpose:

Schedules the `daily_scraping_job` function to run at a specified time every day.

### Parameters:

*   `product_name` (string): The name of the product to scrape.
*   `shoe_size` (string): The desired shoe size for filtering the results.
*   `desired_price` (number): The maximum acceptable price (including delivery charges).
*   `time_of_day` (string): The time of day when the scraping job should run (in `HH:MM` format, 24-hour clock).

### How It Works:

*   Uses the `schedule` library to set up a recurring daily task.
*   Automatically executes the `daily_scraping_job` function at the specified time.
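
Because the time comes from user input, it can help to check it before scheduling. The sketch below uses only the standard library to confirm that a string is a valid 24-hour time and to normalize it to zero-padded `HH:MM`; `normalize_time_of_day` is a hypothetical helper, not part of the script.

```python
from datetime import datetime


def normalize_time_of_day(value):
    """Return value as zero-padded HH:MM, or raise ValueError if it is not a valid time."""
    parsed = datetime.strptime(value.strip(), "%H:%M")
    return parsed.strftime("%H:%M")


print(normalize_time_of_day("8:00"))   # prints 08:00
print(normalize_time_of_day("23:59"))  # prints 23:59

try:
    normalize_time_of_day("25:00")
except ValueError as error:
    print(f"Invalid time: {error}")
```

The normalized value could then be passed to `schedule_scraping` in place of the raw input.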

### Example:

```python
schedule_scraping("Nike Shoes", "10", 100, "08:00")
```


In [13]:
# Schedule the job dynamically
def schedule_scraping(product_name, shoe_size, desired_price, time_of_day):
    schedule.every().day.at(time_of_day).do(daily_scraping_job, product_name, shoe_size, desired_price)

## Main Execution Block

### Purpose:
This is the entry point of the program, where user inputs are taken, and the scraping job is scheduled to run daily at a specified time.

### Key Steps:
1. **User Inputs**:
   - Prompts the user for:
     - **Product Name**: The item to search for.
     - **Shoe Size**: Desired shoe size for filtering results.
     - **Desired Price**: The maximum price the user is willing to pay (in USD).
     - **Time of Day**: The daily schedule time in `HH:MM` format (24-hour clock).

2. **Job Scheduling**:
   - Calls the `schedule_scraping` function to schedule the daily scraping task with the user's inputs.

3. **Scheduler Execution**:
   - Keeps the scheduler running continuously to ensure the scraping task is executed at the specified time.

4. **Notifications**:
   - Displays messages to confirm successful scheduling and informs the user that the scheduler is running.

### Example Workflow:
1. The user enters:
   - Product Name: `Nike Shoes`
   - Shoe Size: `10`
   - Desired Price: `100`
   - Time of Day: `08:00`
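2. The program schedules the job and prints:
   - `Scheduled scraping for 'Nike Shoes' at 08:00 daily.`
   - `Scheduler is running. Press Ctrl+C to stop.`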

### Notes:
- To stop the scheduler, press **Ctrl+C**.
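
Pressing Ctrl+C raises `KeyboardInterrupt` inside the polling loop, which ends the program with a traceback unless it is caught. A minimal sketch of a loop that exits cleanly is shown below. `run_scheduler` is a hypothetical wrapper; in the script, `schedule.run_pending` would be passed as `run_pending`, and the `max_ticks` and `sleep` parameters exist only so the loop can be demonstrated without waiting.

```python
import time


def run_scheduler(run_pending, poll_seconds=1.0, max_ticks=None, sleep=time.sleep):
    """Call run_pending repeatedly until Ctrl+C or max_ticks; return completed ticks."""
    ticks = 0
    try:
        while max_ticks is None or ticks < max_ticks:
            run_pending()
            sleep(poll_seconds)
            ticks += 1
    except KeyboardInterrupt:
        # Ctrl+C lands here, so the program ends without a traceback.
        print("Scheduler stopped.")
    return ticks


# Demonstration with a stand-in job and no real sleeping.
calls = []
completed = run_scheduler(lambda: calls.append("ran"), max_ticks=3, sleep=lambda _: None)
print(completed, len(calls))  # prints 3 3
```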



In [None]:
if __name__ == "__main__":
    # Read the search parameters and the daily run time from the user
    product_name = input("Enter the product name: ")
    shoe_size = input("Enter the shoe size: ")
    desired_price = input("Enter your desired price (USD): ")
    time_of_day = input("Enter the time to run the job daily (HH:MM, 24-hour format): ")

    # Schedule the scraping job
    schedule_scraping(product_name, shoe_size, desired_price, time_of_day)
    print(f"Scheduled scraping for '{product_name}' at {time_of_day} daily.")

    # Keep the scheduler running
    print("Scheduler is running. Press Ctrl+C to stop.")
    while True:
        schedule.run_pending()
        time.sleep(1)

Enter the product name:  Air Jordan XX3
Enter the shoe size:  9
Enter your desired price (USD):  150
Enter the time to run the job daily (HH:MM, 24-hour format):  05:59


Scheduled scraping for 'Air Jordan XX3' at 05:59 daily.
Scheduler is running. Press Ctrl+C to stop.
Scraping URL: https://www.ebay.com/sch/i.html?_from=R50&_nkw=Air+Jordan+XX3&_sacat=0&_dcat=15709&_sop=15&_ipg=120&US%2520Shoe%2520Size=9&_pgn=1
Message Id: 1941a395d7524a75


### Script Output

The script generates two main outputs:

1. **CSV File**:  
   The scraped data is saved in a CSV file (`product.csv`). The CSV file contains the following columns:  
   - **Product Name**  
   - **Price**  
   - **Delivery Charges**  
   - **Product Link**  
   - **Image**  
   - **Date**  

   Below is a screenshot of the generated CSV file:

   ![CSV Output](csv_output.jpg)

---

2. **Email Notification**:  
   When a product's total price (price + delivery charges) is within the desired range, the script sends an email notification. The email contains:  
   - **Product Name**  
   - **Price**  
   - **Product Link**  

   Below is a screenshot of an email sent by the script:

   ![Email Output](email_output.jpg)


## Notes

The script is designed for demonstration purposes and may require adjustments for production use. Ensure compliance with eBay’s and Gmail’s terms of service when using this script.

## Future Enhancements

*   Support for multiple product searches in a single run.
*   Integration with other email services (e.g., Outlook, Yahoo).
*   Improved error handling and logging mechanisms.
*   Better-structured email.
*   Support for scraping multiple e-commerce websites.

## Acknowledgments

This project utilizes open-source libraries and the Gmail API to demonstrate web scraping and automated notifications. Special thanks to the Python and developer community for their resources and support.