# Problem Statement: Web Scraping Product Information from WhatMobile

**Background:**

WhatMobile is an e-commerce platform where users can search for and purchase various products. It contains a vast catalog of products with details such as product names, prices, sellers, and additional specifications. To gather data for analysis or other purposes, it can be valuable to extract specific product information from WhatMobile's website programmatically.

**Objective:**

The objective of this project is to create a Python script that scrapes WhatMobile's website to extract essential information about Samsung mobile phones and store it for further analysis or use.

### Import necessary libraries

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

### Define the URL to scrape data

In [2]:
URL = "https://www.whatmobile.com.pk/Samsung_Mobiles_Prices"

### Setting user-agent headers

In [3]:
HEADERS = {"User-Agent":'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36'}

### Send an HTTP GET request to the URL

In [4]:
response = requests.get(URL,headers=HEADERS)

### Print the HTTP response object

In [5]:
response

<Response [200]>
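A `<Response [200]>` means the server accepted the request. In a longer scraping run it may help to fail fast on other status codes and to bound how long a request can hang. The sketch below shows one way to do that; the function name `fetch_html` and the 10-second timeout are illustrative choices, not part of the original notebook.

```python
import requests

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
    )
}


def fetch_html(url, headers=HEADERS, timeout=10):
    """Return the response body as bytes, raising on HTTP errors.

    `fetch_html` and its default timeout are hypothetical, chosen for this sketch.
    """
    response = requests.get(url, headers=headers, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.content
```

With this helper, `fetch_html(URL)` could stand in for `requests.get(URL,headers=HEADERS).content` in the cells that follow.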

### Get the content of the HTTP response

In [6]:
response.content

b'<html lang="en-US" prefix="og: http://ogp.me/ns#">\n<head>\n       <link rel="canonical" href="https://www.whatmobile.com.pk/Samsung_Mobiles_Prices">\n       <link rel="alternate" media="only screen and (max-width: 640px)" href="https://m.whatmobile.com.pk/Samsung_Mobiles_Prices">\n<!-- Google Tag Manager -->\n<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({\'gtm.start\':\nnew Date().getTime(),event:\'gtm.js\'});var f=d.getElementsByTagName(s)[0],\nj=d.createElement(s),dl=l!=\'dataLayer\'?\'&l=\'+l:\'\';j.async=true;j.src=\n\'https://www.googletagmanager.com/gtm.js?id=\'+i+dl;f.parentNode.insertBefore(j,f);\n})(window,document,\'script\',\'dataLayer\',\'GTM-NVDVQQ\');</script>\n<!-- End Google Tag Manager -->\n    <link rel="manifest" href="/manifest.json">\n    <title>Samsung mobiles - Samsung mobile prices in Pakistan 2024 - WhatMobile</title>\n    <meta name="description"\n          content="Latest Samsung Mobile Phones Prices in Pakistan 2024 (Islamabad, Lahore &amp; Karach

### Parse the HTML content of the response using BeautifulSoup

In [7]:
soup = BeautifulSoup(response.content,"html.parser")

In [8]:
soup

<html lang="en-US" prefix="og: http://ogp.me/ns#">
<head>
<link href="https://www.whatmobile.com.pk/Samsung_Mobiles_Prices" rel="canonical"/>
<link href="https://m.whatmobile.com.pk/Samsung_Mobiles_Prices" media="only screen and (max-width: 640px)" rel="alternate"/>
<!-- Google Tag Manager -->
<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
})(window,document,'script','dataLayer','GTM-NVDVQQ');</script>
<!-- End Google Tag Manager -->
<link href="/manifest.json" rel="manifest"/>
<title>Samsung mobiles - Samsung mobile prices in Pakistan 2024 - WhatMobile</title>
<meta content="Latest Samsung Mobile Phones Prices in Pakistan 2024 (Islamabad, Lahore &amp; Karachi) - Price and Specifications of new smartphones. Compare Price list &amp; features. B

### Find all links with class "BiggerText" on the listing page

In [12]:
links = soup.find_all("a",class_="BiggerText")

### Print the list of links

In [13]:
links

[<a class="BiggerText" href="/Samsung_Galaxy-Z-Fold-5" style="text-decoration:none" title="Samsung Galaxy Z Fold 5 price">
                                             Samsung<br/>Galaxy Z Fold 5<br/></a>,
 <a class="BiggerText" href="/Samsung_Galaxy-Z-Fold-6" style="text-decoration:none" title="Samsung Galaxy Z Fold 6 price">
                                             Samsung<br/>Galaxy Z Fold 6<br/></a>,
 <a class="BiggerText" href="/Samsung_Galaxy-Z-Fold-4-12GB" style="text-decoration:none" title="Samsung Galaxy Z Fold 4 12GB price">
                                             Samsung<br/>Galaxy Z Fold 4 12GB<br/></a>,
 <a class="BiggerText" href="/Samsung_Galaxy-Z-Fold-4" style="text-decoration:none" title="Samsung Galaxy Z Fold 4 price">
                                             Samsung<br/>Galaxy Z Fold 4<br/></a>,
 <a class="BiggerText" href="/Samsung_Galaxy-S24-Ultra-512GB" style="text-decoration:none" title="Samsung Galaxy S24 Ultra 512GB price">
                        

### Get the first link from the list

In [14]:
links[0]

<a class="BiggerText" href="/Samsung_Galaxy-Z-Fold-5" style="text-decoration:none" title="Samsung Galaxy Z Fold 5 price">
                                            Samsung<br/>Galaxy Z Fold 5<br/></a>

### Extract the href attribute from the first link to get the product URL

In [15]:
link = links[0].get("href")

### Print the extracted product URL

In [16]:
link

'/Samsung_Galaxy-Z-Fold-5'
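The same `get("href")` call can be applied to every element of `links` rather than only the first. The sketch below runs offline on two anchors copied from the output above; the names `SAMPLE_HTML`, `sample_links`, `hrefs`, and `names` are introduced here for illustration.

```python
from bs4 import BeautifulSoup

# Two anchors copied from the listing output above, used as an offline sample
SAMPLE_HTML = """
<a class="BiggerText" href="/Samsung_Galaxy-Z-Fold-5" style="text-decoration:none" title="Samsung Galaxy Z Fold 5 price">
    Samsung<br/>Galaxy Z Fold 5<br/></a>
<a class="BiggerText" href="/Samsung_Galaxy-Z-Fold-6" style="text-decoration:none" title="Samsung Galaxy Z Fold 6 price">
    Samsung<br/>Galaxy Z Fold 6<br/></a>
"""

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
sample_links = sample_soup.find_all("a", class_="BiggerText")

# Keep only anchors that actually carry an href attribute
hrefs = [a.get("href") for a in sample_links if a.get("href")]

# A separator keeps "Samsung" and the model name apart, because the two
# parts are divided only by a <br/> tag in the markup
names = [a.get_text(" ", strip=True) for a in sample_links]

print(hrefs)  # ['/Samsung_Galaxy-Z-Fold-5', '/Samsung_Galaxy-Z-Fold-6']
print(names)  # ['Samsung Galaxy Z Fold 5', 'Samsung Galaxy Z Fold 6']
```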

### Construct the full product URL by appending it to the base URL

In [17]:
product_url = "https://www.whatmobile.com.pk" + link

### Print the full product URL

In [18]:
product_url

'https://www.whatmobile.com.pk/Samsung_Galaxy-Z-Fold-5'
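When a URL is built by string concatenation, the result depends on whether the base ends with a slash and the path starts with one. As an alternative, the standard library's `urllib.parse.urljoin` resolves the path against the base in either case. The constant name `BASE_URL` is an assumption made for this sketch.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.whatmobile.com.pk/"  # hypothetical constant name
link = "/Samsung_Galaxy-Z-Fold-5"            # value extracted from the href above

# urljoin resolves the root-relative path against the base, so the result
# has exactly one slash between host and path
product_url = urljoin(BASE_URL, link)
print(product_url)  # https://www.whatmobile.com.pk/Samsung_Galaxy-Z-Fold-5
```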

### Send a new HTTP GET request to the product URL

In [68]:
new_response = requests.get(product_url,headers=HEADERS)

### Print the new HTTP response object

In [69]:
new_response

<Response [200]>

### Parse the HTML content of the product page using BeautifulSoup

In [70]:
new_soup = BeautifulSoup(new_response.content,"html.parser")

In [71]:
new_soup

<!DOCTYPE html>

<html lang="en-US" prefix="og: http://ogp.me/ns#">
<head>
<!-- Google Tag Manager -->
<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
})(window,document,'script','dataLayer','GTM-NVDVQQ');</script>
<!-- End Google Tag Manager -->
<link href="/manifest.json" rel="manifest"/>
<meta content="text/html;charset=utf-8" http-equiv="Content-Type"/>
<meta content="width=1060, initial-scale=1.0" name="viewport"/>
<title> Samsung Galaxy Z Fold 5 Price in Pakistan &amp; Specifications 2023</title>
<meta content="Samsung Galaxy Z Fold 5 price in Pakistan, daily updated Samsung phones including specs &amp; information : WhatMobile.com.pk : Samsung Galaxy Z Fold 5 price Pakistan :" name="Description">
<meta content="Samsung Galaxy Z Fold 5, S

### Find and print the product title

In [85]:
new_soup.find("h2",class_="Heading1")

<h2 class="Heading1" style="padding:0px; margin:0px; display:inline;">Samsung Galaxy Z Fold 5                                detailed specifications</h2>

### Extract and print the product title without unwanted characters

In [86]:
new_soup.find("h2",class_="Heading1").text

'Samsung Galaxy Z Fold 5                                detailed specifications'

In [87]:
new_soup.find("h2",class_="Heading1").text.replace("detailed specifications","").strip()

'Samsung Galaxy Z Fold 5'
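The heading text contains a long run of padding spaces before its suffix, so a `replace` that depends on the exact number of spaces is fragile. A minimal sketch of a more tolerant cleanup collapses whitespace first and then drops the suffix; `clean_title` and `SUFFIX` are hypothetical names, and the suffix is assumed to be the same on every product page.

```python
import re

SUFFIX = "detailed specifications"  # trailing text seen in the heading above


def clean_title(raw_title, suffix=SUFFIX):
    """Collapse whitespace runs, then drop the trailing suffix if present."""
    text = re.sub(r"\s+", " ", raw_title).strip()
    if text.endswith(suffix):
        text = text[: -len(suffix)].strip()
    return text


raw = "Samsung Galaxy Z Fold 5                                detailed specifications"
print(clean_title(raw))  # Samsung Galaxy Z Fold 5
```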

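### Find and print the product price

The remaining cells use class names (`_30jeq3 _16Jk6d`, `_3LWZlK`, `_2_R_DZ`, `_1s_Smc row`, `_21lJbe`) and show outputs (a price in ₹, a rating, a review count) that appear to come from an earlier run against a Flipkart product page; their execution counts (In [23] to In [28]) also predate the WhatMobile cells above. These selectors are unlikely to match WhatMobile's markup, so inspect the product page and substitute its own class names before reusing them.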

In [23]:
new_soup.find("div",class_="_30jeq3 _16Jk6d")

<div class="_30jeq3 _16Jk6d">₹45,999</div>

In [24]:
new_soup.find("div",class_="_30jeq3 _16Jk6d").text

'₹45,999'

### Extract and print the product rating


In [25]:
new_soup.find("div",class_="_3LWZlK").text

'4.3'

### Extract and print the number of reviews

In [26]:
new_soup.find("span",class_="_2_R_DZ").find_all("span")[3].text.replace("\xa0","")

'7,441 Reviews'
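The price, rating, and review count above are all strings. For analysis they usually need to be numbers, and one possible conversion is sketched below. `parse_int` is a hypothetical helper that assumes whole-number values; it would misread a decimal such as `45,999.50`.

```python
import re


def parse_int(text):
    """Keep only the digits in `text` and return them as an int."""
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)


price = parse_int("₹45,999")          # 45999
reviews = parse_int("7,441 Reviews")  # 7441
rating = float("4.3")                 # 4.3
print(price, reviews, rating)
```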

### Extract and print product color

In [27]:
new_soup.find_all("tr",class_="_1s_Smc row")[3].find("li",class_="_21lJbe").text

'Olive'
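Positional indexing such as `find_all(...)[3]` returns whatever row happens to sit at that position, so it can silently pick a different specification when a page lists its rows in another order. A sketch of a label-based lookup follows. The sample markup is invented for illustration: it assumes each row holds a label in its first `<td>` and a value in an `li` with class `_21lJbe`, which the notebook does not confirm.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real row structure is not shown in the notebook,
# so the label cells (<td>) below are an assumption for illustration only
SAMPLE_SPECS_HTML = """
<table>
  <tr class="_1s_Smc row"><td>Model Name</td><td><ul><li class="_21lJbe">Example Phone</li></ul></td></tr>
  <tr class="_1s_Smc row"><td>Color</td><td><ul><li class="_21lJbe">Olive</li></ul></td></tr>
</table>
"""


def specs_by_label(soup):
    """Map each row's first-cell label to the text of its value <li>."""
    specs = {}
    for row in soup.find_all("tr", class_="_1s_Smc row"):
        label_cell = row.find("td")
        value_item = row.find("li", class_="_21lJbe")
        if label_cell is None or value_item is None:
            continue  # skip rows that do not match the assumed layout
        specs[label_cell.get_text(strip=True)] = value_item.get_text(strip=True)
    return specs


specs = specs_by_label(BeautifulSoup(SAMPLE_SPECS_HTML, "html.parser"))
print(specs.get("Color"))         # Olive
print(specs.get("Display Size"))  # None, because the sample has no such row
```

Looking a value up with `specs.get(label)` yields `None` for a missing row instead of returning an unrelated value.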

### Extract and print product display size

In [28]:
new_soup.find_all("tr",class_="_1s_Smc row")[9].find("li",class_="_21lJbe").text

'Yes'
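The objective mentions storing the extracted data, and `pandas` is imported at the top but not used in the cells shown. One possible way to store the results is to collect one dictionary per product and build a `DataFrame` from the list. The column names are illustrative, and the single record simply reuses the string outputs printed above.

```python
import pandas as pd

# One dictionary per scraped product; values reuse the outputs shown above
records = [
    {
        "price": "₹45,999",
        "rating": "4.3",
        "reviews": "7,441 Reviews",
        "color": "Olive",
    },
]

df = pd.DataFrame(records)

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file path to `to_csv` (for example a hypothetical `samsung_phones.csv`) would write the table to disk instead.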