# Web Scraping Demo - Product List (v1) Solution for Reference

#### Author: Yu-Chang Ho (Andy), UC Davis
#### Latest Update: 2019 10/13


This notebook demonstrates a basic way to scrape data with the BeautifulSoup4 library. Please practice web scraping on the same webpage used here: [https://hipposerver.ddns.net/webscraping/v1/](https://hipposerver.ddns.net/webscraping/v1/).

- Target website: [https://hipposerver.ddns.net/webscraping/v1/](https://hipposerver.ddns.net/webscraping/v1/)
- Objective: Get a list of product data in a clean CSV format

In [1]:
### import the required libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'https://hipposerver.ddns.net/webscraping/v1/'

# get the source code of the webpage
r = requests.get( url )
# check it out!
print( r.text )

<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">

<title>Web-scraping Demo - Product List v1</title>

<!-- CSS -->
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css">

<style>
.container {
    padding-top: 20px;
    padding-bottom: 20px;
}

.card-img-top
{
    width: 100%;
    height: 160px;
    object-fit: none;
}
</style>

<!-- JavaScrip, JQuery -->
<script src="https://code.jquery.com/jquery-3.3.1.slim.min.js" integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.7/umd/popper.min.js" integrity="sha384-UO2eT0CpHqdSJQ

In [2]:
### create a Beautiful Soup parser
soup = BeautifulSoup( r.text, 'html.parser' )
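`BeautifulSoup` accepts any HTML string, not only `r.text`, so you can practice parsing even if the site is unreachable. The following is a minimal sketch that assumes a trimmed, hand-written copy of the page (not the real source). It shows that `find()` returns the first match, or `None` when nothing matches. The name `demo_soup` is hypothetical and is chosen so that it does not overwrite `soup`.

```python
from bs4 import BeautifulSoup

# Hand-written, trimmed HTML (assumption: it mirrors the structure of the live page)
html = """
<html><head><title>Web-scraping Demo - Product List v1</title></head>
<body>
  <div class="card">
    <div class="card-body">
      <h4 class="item-title">iPhone 11</h4>
      <h5 class="item-cate">Smartphone</h5>
    </div>
  </div>
</body></html>
"""

# Same parser as the notebook; a separate name keeps the real `soup` intact
demo_soup = BeautifulSoup(html, "html.parser")

# The <title> tag is reachable directly as an attribute of the soup
print(demo_soup.title.text)
# find() returns the first matching tag...
print(demo_soup.find("h4", class_="item-title").text)
# ...or None when no tag matches, so chaining .text would then raise AttributeError
print(demo_soup.find("h4", class_="no-such-class"))
```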

# Task 1. Get the Data of the First Item in the List

There are 8 products in total on the sample website. In this part, retrieve the information for the first item in the list, the "iPhone 11", with the following fields:

```
[ "name", "category", "year", "price", "rating", "sold" ]
```

In [3]:
card = soup.find( "div", class_="card" )
print( card )

<div class="card" style="width: 18rem; text-align: center;">
<img alt="iPhone 11" class="card-img-top" src="../assets/img/iphone_11.png" width="100px"/>
<div class="card-body">
<h4 class="item-title">iPhone 11</h4>
<h5 class="item-cate">Smartphone</h5>
<h6 class="item-year">2019</h6>
<p class="item-price"><span>$765.00</span></p>
<div class="star-rating">
<p class="item-rating"><span>4.3</span></p>
<ul class="list-inline">
<li class="list-inline-item"><i class="fa fa-star"></i></li>
<li class="list-inline-item"><i class="fa fa-star"></i></li>
<li class="list-inline-item"><i class="fa fa-star"></i></li>
<li class="list-inline-item"><i class="fa fa-star"></i></li>
<li class="list-inline-item"><i class="fa fa-star-o"></i></li>
</ul>
</div>
<span class="item-sold">1000</span>
</div>
</div>
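In the card above, the rating appears twice: once as text (`4.3`) and once as star icons (`fa-star` for a full star, `fa-star-o` for an empty one). The following is a small sketch, built on a copy of that markup, that counts the icons. It relies on a documented bs4 behavior: `class_` matches any single class in a multi-valued `class` attribute, so `"fa-star"` does not match `fa-star-o`. In this card the icons are only a rounded picture of the rating, so the text value is the more precise field to keep.

```python
from bs4 import BeautifulSoup

# Star-rating markup copied from the first card printed above
star_html = """
<div class="star-rating">
<p class="item-rating"><span>4.3</span></p>
<ul class="list-inline">
<li class="list-inline-item"><i class="fa fa-star"></i></li>
<li class="list-inline-item"><i class="fa fa-star"></i></li>
<li class="list-inline-item"><i class="fa fa-star"></i></li>
<li class="list-inline-item"><i class="fa fa-star"></i></li>
<li class="list-inline-item"><i class="fa fa-star-o"></i></li>
</ul>
</div>
"""
block = BeautifulSoup(star_html, "html.parser")

# "fa-star" matches <i class="fa fa-star"> but not <i class="fa fa-star-o">
full_stars = len(block.find_all("i", class_="fa-star"))
empty_stars = len(block.find_all("i", class_="fa-star-o"))
# .span returns the first <span> inside the <p>
rating_text = block.find("p", class_="item-rating").span.text

print(full_stars, empty_stars, rating_text)  # 4 1 4.3
```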


In [4]:
name = card.find( 'h4', class_="item-title" ).text
cate = card.find( 'h5', class_="item-cate" ).text
year = card.find( 'h6', class_="item-year" ).text
price = card.find( 'p', class_="item-price" ).find( 'span' ).text
rating = card.find( 'p', class_="item-rating" ).find( 'span' ).text
sold = card.find( 'span', class_="item-sold" ).text

print( name, cate, year, price, rating, sold )

iPhone 11 Smartphone 2019 $765.00 4.3 1000
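Chaining `find()` calls works, but bs4 also accepts CSS selectors through `select_one()`. This can shorten nested lookups such as the price `<span>` inside `<p class="item-price">`. The following is a sketch on a trimmed copy of the first card, which is an assumption and not the live page. `card_fields` and `demo_card` are hypothetical names, so the notebook's `card` variable is left untouched. `get_text(strip=True)` also trims surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first card (assumption: same markup as printed above)
card_html = """
<div class="card">
<div class="card-body">
<h4 class="item-title">iPhone 11</h4>
<h5 class="item-cate">Smartphone</h5>
<h6 class="item-year">2019</h6>
<p class="item-price"><span>$765.00</span></p>
<div class="star-rating"><p class="item-rating"><span>4.3</span></p></div>
<span class="item-sold">1000</span>
</div>
</div>
"""
demo_card = BeautifulSoup(card_html, "html.parser")

def card_fields(card):
    """Hypothetical helper: extract the six fields with CSS selectors."""
    # "p.item-price span" = a <span> anywhere inside <p class="item-price">
    return (
        card.select_one("h4.item-title").get_text(strip=True),
        card.select_one("h5.item-cate").get_text(strip=True),
        card.select_one("h6.item-year").get_text(strip=True),
        card.select_one("p.item-price span").get_text(strip=True),
        card.select_one("p.item-rating span").get_text(strip=True),
        card.select_one("span.item-sold").get_text(strip=True),
    )

print(*card_fields(demo_card))  # iPhone 11 Smartphone 2019 $765.00 4.3 1000
```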


# Task 2. Get all the Data

Now that you can retrieve one item, can you do the same for all the available products and put the data into a DataFrame with concise code?
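One way to keep the loop short is to describe the fields as data instead of repeating a `find()` line per column. The following is a minimal sketch under the assumption that every product sits in a `div.card`, as on the sample page. `FIELDS`, `parse_card` and `cards_to_dataframe` are hypothetical names. The helper also returns `None` for a missing element instead of raising `AttributeError`, which is a design choice you may or may not want.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical field spec: column name -> CSS selector inside one card
FIELDS = {
    "name": "h4.item-title",
    "category": "h5.item-cate",
    "year": "h6.item-year",
    "price": "p.item-price span",
    "rating": "p.item-rating span",
    "sold": "span.item-sold",
}

def parse_card(card):
    """Return one row as a dict; a missing element becomes None instead of raising."""
    row = {}
    for column, selector in FIELDS.items():
        tag = card.select_one(selector)
        row[column] = tag.get_text(strip=True) if tag else None
    return row

def cards_to_dataframe(soup):
    """Parse every div.card into a DataFrame whose columns follow FIELDS."""
    rows = [parse_card(card) for card in soup.select("div.card")]
    return pd.DataFrame(rows, columns=list(FIELDS))

# Usage on two hand-written cards; the second is deliberately incomplete
sample = BeautifulSoup("""
<div class="card"><h4 class="item-title">iPhone 11</h4><h5 class="item-cate">Smartphone</h5>
<h6 class="item-year">2019</h6><p class="item-price"><span>$765.00</span></p>
<p class="item-rating"><span>4.3</span></p><span class="item-sold">1000</span></div>
<div class="card"><h4 class="item-title">Incomplete card</h4></div>
""", "html.parser")

demo_df = cards_to_dataframe(sample)
print(demo_df)
```

On the real page, `cards_to_dataframe(soup)` would replace the whole loop. Note that it keeps every value as text, exactly as the loop does.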

In [5]:
# create a list to hold the data
data = []

# find every card and append one row of data per card
for card in soup.find_all( "div", class_="card" ):
    name = card.find( 'h4', class_="item-title" ).text
    cate = card.find( 'h5', class_="item-cate" ).text
    year = card.find( 'h6', class_="item-year" ).text
    price = card.find( 'p', class_="item-price" ).find( 'span' ).text
    rating = card.find( 'p', class_="item-rating" ).find( 'span' ).text
    sold = card.find( 'span', class_="item-sold" ).text
    
    row = [ name, cate, year, price, rating, sold ]
    data.append( row )

# create a dataframe
header = [ "name", "category", "year", "price", "rating", "sold" ]
df = pd.DataFrame( data, columns=header )
# the result
print( df )

                 name    category  year    price rating  sold
0           iPhone 11  Smartphone  2019  $765.00    4.3  1000
1           iPhone XR  Smartphone  2018  $599.00    4.6  1500
2            iPhone 8  Smartphone  2017  $449.00    4.1  2900
3            iPhone 7  Smartphone  2019  $259.00    4.8  2080
4           iPhone 6S  Smartphone  2014  $114.00    4.4  1000
5           iPhone 3G  Smartphone  2008   $39.00    4.7  1000
6  Samsung Galaxy S10  Smartphone  2018  $899.00    4.6  2500
7   Samsung Galaxy S9  Smartphone  2019  $599.00    4.7  2405
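Every column in the table above is still text. For example, `price` holds `"$765.00"` and not a number. For the "clean CSV" objective, you may want numeric columns first. The following is a sketch on two rows copied from the table. It assumes prices are always a `$` followed by a number. Stripping a `,` thousands separator is an extra precaution that this page does not actually need.

```python
import pandas as pd

# Two rows copied from the scraped table above; every value is a string
raw = pd.DataFrame(
    [["iPhone 11", "Smartphone", "2019", "$765.00", "4.3", "1000"],
     ["iPhone XR", "Smartphone", "2018", "$599.00", "4.6", "1500"]],
    columns=["name", "category", "year", "price", "rating", "sold"],
)

clean = raw.copy()
# Remove the currency symbol (and any thousands separator) before converting
clean["price"] = pd.to_numeric(
    clean["price"].str.replace("$", "", regex=False).str.replace(",", "", regex=False)
)
# The other numeric columns are plain digits and convert directly
for column in ["year", "rating", "sold"]:
    clean[column] = pd.to_numeric(clean[column])

print(clean.dtypes)
print(clean["price"].sum())  # 1364.0
```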


In [None]:
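# maybe you would like to output it into a CSV file
# ('products.csv' is an example filename; index=False drops the 0..7 row labels)
df.to_csv( 'products.csv', index=False )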
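Called with no path, `to_csv()` writes nothing to disk and returns the CSV text as a string. With a path, it writes a file. The following sketch shows both on a small hand-made DataFrame and reads the file back with `pd.read_csv` to check the round trip. The temporary directory and the filename `products.csv` are illustrative assumptions.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the scraped DataFrame (values copied from the table above)
df_demo = pd.DataFrame(
    [["iPhone 11", 765.0, 1000], ["iPhone XR", 599.0, 1500]],
    columns=["name", "price", "sold"],
)

# No path: the CSV text is returned as a string
csv_text = df_demo.to_csv(index=False)
print(csv_text)

# With a path: the file is written; read it back to check the round trip
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "products.csv")
    df_demo.to_csv(path, index=False)
    round_trip = pd.read_csv(path)

print(round_trip)
```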