# Automated Web Scraper: Live Bitcoin Price Tracker (Beginner Project)

This project was built as part of my Python learning process.  
I used Selenium, BeautifulSoup, and regular expressions to scrape the live Bitcoin price from CoinMarketCap.  

The goal of this project was to:
- Scrape the live Bitcoin price 5 times, with a 7-second pause between scrapes
- Store the results in a CSV file for later analysis
- Track and display the total price change across the scraping period

Basic error handling is included to skip failed scrapes and ensure the script completes cleanly even if the page fails to load or the price is unavailable.
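The script skips a failed scrape outright. One possible refinement is to retry a few times and log each failure before giving up. The sketch below rests on my own assumptions: `scrape_with_retry`, its `attempts` and `delay` parameters, and the `crypto_scraper` logger name are hypothetical and are not part of the original script. It accepts any zero-argument function that returns a price or `None`, such as `automated_crypto_pull`.

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("crypto_scraper")  # hypothetical logger name


def scrape_with_retry(scrape_fn, attempts=3, delay=2.0):
    """Call scrape_fn up to `attempts` times and return the first non-None result.

    scrape_fn is any zero-argument callable returning a price (float) or None.
    Returns None if every attempt raises an exception or returns None.
    """
    for attempt in range(1, attempts + 1):
        try:
            price = scrape_fn()
        except Exception:
            # logger.exception also records the traceback for later debugging
            logger.exception("Attempt %d of %d raised an error", attempt, attempts)
        else:
            if price is not None:
                return price
            logger.warning("Attempt %d of %d returned no price", attempt, attempts)
        if attempt < attempts:
            time.sleep(delay)  # Wait briefly before trying again
    return None


# Usage: wrap the scraper defined later in this notebook
# price = scrape_with_retry(automated_crypto_pull, attempts=3, delay=5)
```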

---

## Project Overview

The purpose of this project was to:
- Load the CoinMarketCap Bitcoin page using Selenium
- Wait a fixed 5 seconds for the page's JavaScript to render the dynamic content
- Parse the page HTML using BeautifulSoup
- Extract the price using a regular expression to handle number formatting variations
- Save each valid scrape (coin name, price, timestamp) to a CSV file
- Track valid prices across all scrapes to report total price change

---

## Libraries and Imports


```python
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from datetime import datetime
import time
import pandas as pd
import os
```

## Methodology

Approach:

- Use Selenium to load the CoinMarketCap Bitcoin page
- Wait a fixed 5 seconds for the dynamic content to render
- Parse the HTML content using BeautifulSoup
- Locate and extract:
    - The coin name (`<span>` with string 'Bitcoin')
    - The price (`<span>` with `data-test="text-cdp-price-display"` attribute)
- Use a regular expression to safely extract the numeric portion of the price
- Store each valid scrape in a CSV file
- Append valid prices to an in-memory list for tracking
- Repeat the scrape process 5 times, pausing 7 seconds between scrapes
- After all scrapes:
    - Calculate and display the total price difference between the first and last valid scrapes
    - Gracefully handle missing prices to avoid crashes
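The regex step can be checked on its own, without launching a browser. One caveat with `[\d,.]+` is that it also matches runs of punctuation that contain no digits. For example, it matches the "..." in a placeholder such as "Loading...", and `float("...")` then raises `ValueError`. The sketch below swaps in a stricter pattern that requires a leading digit. `parse_price` and `PRICE_PATTERN` are hypothetical names, not part of the original script.

```python
import re

# Hypothetical stricter pattern: a leading digit, then digits/commas, then an optional decimal part
PRICE_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text):
    """Return the first number in `text` as a float (commas removed), or None if there is none."""
    match = PRICE_PATTERN.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


print(parse_price("$110,020.88"))  # 110020.88
print(parse_price("Loading..."))   # None
```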

## Final Output

```python
import re  # Regular expressions, used to pull the number out of the price text

# Wrap the whole scrape in a function so it can be called once per loop iteration
def automated_crypto_pull():
    # Path to your chromedriver
    service = Service(r"C:\Users\jrwie\OneDrive\Desktop\Data Stuffs\Python_Things\chromedriver-win64\chromedriver-win64\chromedriver.exe")
    driver = webdriver.Chrome(service=service)

    try:
        # Step 1: Go to the live Bitcoin price page and grab the rendered HTML
        driver.get("https://coinmarketcap.com/currencies/bitcoin/")
        time.sleep(5)  # Fixed wait for JavaScript to render the price
        page_html = driver.page_source
    except Exception as exc:
        print(f"Page failed to load: {exc}")
        return None
    finally:
        # Step 2: Clean up; the browser is closed even if loading failed
        driver.quit()

    # Step 3: Parse the rendered HTML
    soup = BeautifulSoup(page_html, 'html.parser')

    # Step 4: Extract the coin name and price (find() returns None if a tag is missing)
    name_tag = soup.find('span', string='Bitcoin')
    price_tag = soup.find('span', attrs={'data-test': 'text-cdp-price-display'})
    if name_tag is None or price_tag is None:
        print("Name or price element not found; the page layout may have changed.")
        return None
    crypto_name = name_tag.text

    # Use regex to pull out the numeric part, e.g. "$110,020.88" -> "110020.88"
    price_match = re.search(r'[\d,.]+', price_tag.text)
    final_price_str = price_match.group().replace(',', '') if price_match else ''
    try:
        returned_price = float(final_price_str)
    except ValueError:  # No match, or a match without digits such as "..."
        print(f"Could not parse a price from {price_tag.text!r}")
        return None
    formatted_price = f"${returned_price:,.2f}"

    # Step 5: Store the valid scrape in a one-row DataFrame
    formatted_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    df = pd.DataFrame({
        'Crypto Name': [crypto_name],
        'Price': [returned_price],
        'TimeStamp': [formatted_time],
    })

    # Step 6: Append to the CSV, writing the header only when the file is first created
    output_path = r"C:\Users\jrwie\OneDrive\Desktop\Data Stuffs\Python_Things\Crypto_Web_Puller\Crypto_Automated_Pull.csv"
    df.to_csv(output_path, mode='a', header=not os.path.exists(output_path), index=False)

    print(f"[{formatted_time}] Scraped {crypto_name} | Price: {formatted_price}")

    # Return the scraped price as a float (failures return None above)
    return returned_price


# Run the scraper 5 times and keep every valid price
scraped_prices = []
num_scrapes = 5

for i in range(num_scrapes):
    print(f"Scrape {i+1} of {num_scrapes}")
    latest_price = automated_crypto_pull()  # Float, or None if the scrape failed
    if latest_price is not None:
        scraped_prices.append(latest_price)
    else:
        print("Skipping this scrape due to missing price.")
    if i < num_scrapes - 1:
        time.sleep(7)  # Pause between scrapes; no need to wait after the last one

# After the loop, report the change between the first and last valid prices
print("\n====== FINAL REPORT ======")
if len(scraped_prices) >= 2:
    price_difference = scraped_prices[-1] - scraped_prices[0]
    print(f"Total Scraped Bitcoin Difference: ${price_difference:,.2f}")
else:
    print("Not enough valid scrapes to compute difference.")
```

```
Scrape 1 of 5
[2025-06-09 19:19:01] Scraped Bitcoin | Price: $110,020.88
Scrape 2 of 5
[2025-06-09 19:19:18] Scraped Bitcoin | Price: $110,055.11
Scrape 3 of 5
[2025-06-09 19:19:34] Scraped Bitcoin | Price: $110,049.44
Scrape 4 of 5
[2025-06-09 19:19:52] Scraped Bitcoin | Price: $110,043.73
Scrape 5 of 5
[2025-06-09 19:20:09] Scraped Bitcoin | Price: $110,026.61

Total Scraped Bitcoin Difference: $5.73
```
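The timestamps above are roughly 17 seconds apart rather than 7. The 7-second pause only begins after the browser has started, waited 5 seconds, and the page has been parsed. A fixed-cadence loop can correct for this by measuring each interval from the *start* of a scrape with `time.monotonic()` and then sleeping only for whatever time remains. The sketch below assumes a hypothetical helper, `run_on_schedule`, which is not part of the original script.

```python
import time


def run_on_schedule(scrape_fn, runs=5, interval=7.0):
    """Call scrape_fn `runs` times, aiming to start each call `interval` seconds after the previous start.

    Time spent inside scrape_fn counts toward the interval. If a call takes longer
    than `interval`, the following calls start immediately until the schedule catches up.
    """
    results = []
    next_start = time.monotonic()
    for i in range(runs):
        results.append(scrape_fn())
        next_start += interval
        if i < runs - 1:
            # Sleep only for the part of the interval that is left (never a negative amount)
            time.sleep(max(0.0, next_start - time.monotonic()))
    return results
```

For example, `run_on_schedule(automated_crypto_pull, runs=5, interval=20)` would aim to start a scrape every 20 seconds. The returned list can contain `None` for failed scrapes, so filter those out before computing a difference. Judging by the timestamps above, a single scrape here takes about 10 seconds. The interval therefore needs to be longer than that, or the loop will run scrapes back to back.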


## Next Steps

Potential improvements:
- Add more robust error handling and logging
- Handle dynamic page structure changes more gracefully (e.g., changes to the `data-test` attribute or the page layout)
- Automatically adjust scrape frequency based on actual page load times
- Extend the scraper to run over longer periods and analyze longer-term trends
- Visualize the price trend within the notebook
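For the longer-term analysis ideas, the CSV can be reloaded with pandas. The sketch below assumes the column names the script writes (`Crypto Name`, `Price`, `TimeStamp`) and assumes that rows are appended in time order. `summarize_prices` is a hypothetical helper. `pd.to_numeric(..., errors="coerce")` turns any `N/A` price into `NaN`, so those rows can be dropped before comparing prices.

```python
import pandas as pd


def summarize_prices(csv_path):
    """Summarize a scrape log: first and last valid prices and the change between them.

    Returns None when fewer than two valid prices are available.
    """
    df = pd.read_csv(csv_path)
    # Non-numeric entries (e.g. 'N/A') become NaN and are then dropped
    df["Price"] = pd.to_numeric(df["Price"], errors="coerce")
    valid = df.dropna(subset=["Price"])  # Rows are appended in time order, so file order is time order
    if len(valid) < 2:
        return None
    first = float(valid["Price"].iloc[0])
    last = float(valid["Price"].iloc[-1])
    return {
        "first": first,
        "last": last,
        "change": last - first,
        "pct_change": (last - first) / first * 100,
        "n_valid": len(valid),
    }
```

To use it, pass in the same CSV path that the scraper writes to.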