# Scraping Data from eBay for Business Intelligence

### Background:
ABC Analytics, is a business intelligence and data analytics firm specializing in providing valuable insights to e-commerce businesses. One of our clients is interested in gaining a competitive edge by analyzing pricing trends, product popularity, and customer behavior on eBay. To achieve this, we'll need to scrape data from eBay to feed into their business intelligence tools.

### Objective:
Our goal is to scrape data from eBay to gather valuable information for our client's business intelligence analysis, including pricing trends, product listings, and seller performance metrics.

#### Import all libraries required

In [1]:
import requests
import pandas as pd
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
import seaborn as sns
from prettytable import PrettyTable

#### Extract the HTML content

In [2]:
url_pattern = 'https://www.ebay.com/sch/267/i.html?_from=R40&_nkw=Business+Intelligence+&rt=nc'

In [3]:
# create an empty list to store the scraped data
product_data = []

# iterate over the page numbers
for page_num in range(1, 11):   # scrape data from page 1 to 10
    # create the url for the current page
    url = url_pattern.format(page_num=page_num)

In [4]:
#send a GET request to the URL and extract the HTML content

response = requests.get(url)
content = response.content

In [5]:
#Use Beautiful Soup to parse the HTML content

soup = BeautifulSoup(content, 'html.parser')

#### Extract the product information needed

In [6]:
items = soup.find_all('div', {'class': 's-item__wrapper clearfix'})
    

for item in items:
    title = item.find('div', {'class': 's-item__title'}).text.strip()
    
    price_sold = float(item.find('span', {'class': 's-item__price'}).text.replace('$','').replace(',','').strip())
    shipping_cost = item.find('span', {'class': 's-item__shipping s-item__logisticsCost'})
    if shipping_cost:
        shipping_cost = shipping_cost.text.replace('+','').replace('$','').replace(',','').strip()
    else:
        shipping_cost = 0.0
    item_location = item.find('span', {'class': 's-item__location s-item__itemLocation'})
    if item_location:
        item_location = item_location.text.replace('from','').strip()
    else:
        item_location = ''
    item_seller = item.find('span', {'class':'s-item__seller-info'})
    if item_seller:
        item_seller = item_seller.text.strip()
    else:
        item_seller = ''
    link = item.find('a', {'class': 's-item__link'})['href']
    product_data.append([title, price_sold, shipping_cost, item_location, item_seller, link])
    
book_data = pd.DataFrame(product_data, columns=['Title', 'Price_sold', 'Shipping_cost', 'Item_location','Item_seller', 'Link'])

#### View data

In [7]:
book_data.head()

Unnamed: 0,Title,Price_sold,Shipping_cost,Item_location,Item_seller,Link
0,Shop on eBay,20.0,0.0,,,https://ebay.com/itm/123456?hash=item28caef0a3...
1,Influence and Persuasion (HBR Emotional Intell...,6.06,9.11 shipping,United Kingdom,"worldofbooks08 (9,072,810) 99.1%",https://www.ebay.com/itm/143585166181?epid=239...
2,Power BI - Business Intelligence Clinic: Creat...,14.99,30.57 shipping,United States,chev1967_138 (563) 100%,https://www.ebay.com/itm/186043394348?epid=603...
3,Intelligent Investor: The Definitive Book on V...,12.13,9.11 shipping,United Kingdom,"worldofbooks08 (9,072,810) 99.1%",https://www.ebay.com/itm/304442680952?epid=872...
4,Isaack Onyango Business Intelligence (Hardback),208.8,3.63 shipping,United Kingdom,"rarewaves-outlet (992,371) 98.8%",https://www.ebay.com/itm/305029496829?epid=230...


In [8]:
book_data.tail()

Unnamed: 0,Title,Price_sold,Shipping_cost,Item_location,Item_seller,Link
71,Data Mining for Business Intelligence : Concep...,5.1,85.96 shipping,United States,anflo_413 (0) 0%,https://www.ebay.com/itm/404349057818?epid=844...
72,Pentaho Solutions: Business Intelligence and D...,16.74,5.12 shipping,United Kingdom,"book_fountain (174,453) 99.2%",https://www.ebay.com/itm/175834848693?epid=955...
73,Stars and Spies: The story of Intelligence Ope...,10.88,23.32 shipping,United Kingdom,"webuybooks (1,867,582) 99.3%",https://www.ebay.com/itm/155835665853?epid=110...
74,Microsoft Business Intelligence Tools for Exce...,75.0,23.49 shipping,United States,"awesomebooksusa (397,115) 98%",https://www.ebay.com/itm/314753512807?epid=170...
75,Introduction to R for Business Intelligence - ...,28.86,26.24 shipping,United States,"second.sale (2,860,466) 98.2%",https://www.ebay.com/itm/275629511052?epid=223...


In [9]:
book_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76 entries, 0 to 75
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Title          76 non-null     object 
 1   Price_sold     76 non-null     float64
 2   Shipping_cost  76 non-null     object 
 3   Item_location  76 non-null     object 
 4   Item_seller    76 non-null     object 
 5   Link           76 non-null     object 
dtypes: float64(1), object(5)
memory usage: 3.7+ KB


#### Save the data

In [10]:
file_path = r"C:\Users\USER\Documents\Web_Scraping\book_data.csv"
book_data.to_csv(file_path, index=True)