# Problem Statement: Web Scraping Product Information from Flipkart

**Background:**

Flipkart is an e-commerce platform where users can search for and purchase various products. It contains a vast catalog of products with details such as product names, prices, sellers, and additional specifications. To gather data for analysis or other purposes, it can be valuable to extract specific product information from Flipkart's website programmatically.

**Objective:**

The objective of this project is to create a Python script that scrapes Flipkart's website to extract essential information about Samsung mobile phones and store it for further analysis or use.

### Import necessary libraries

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

### Define the URL to scrape data

In [2]:
URL = "https://www.flipkart.com/search?q=samsung&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"
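
The query string can also be assembled from a dictionary, which makes the search term easy to change. This is a minimal sketch using only the standard library; `build_search_url` and `BASE_SEARCH_URL` are hypothetical names, and the parameters are copied from the URL above.

```python
from urllib.parse import urlencode

BASE_SEARCH_URL = "https://www.flipkart.com/search"

def build_search_url(query):
    """Return a Flipkart search URL for `query` (hypothetical helper)."""
    # Parameters copied from the notebook's hard-coded URL.
    params = {
        "q": query,
        "otracker": "search",
        "otracker1": "search",
        "marketplace": "FLIPKART",
        "as-show": "on",
        "as": "off",
    }
    # urlencode escapes special characters, e.g. spaces in the query.
    return f"{BASE_SEARCH_URL}?{urlencode(params)}"

URL = build_search_url("samsung")
```

Calling `build_search_url("samsung")` reproduces the URL used in this notebook.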

### Setting user-agent headers

In [3]:
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"}

### Send an HTTP GET request to the URL

In [4]:
response = requests.get(URL,headers=HEADERS)
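
A scraper may receive an error status or hang on a slow connection, so it can help to wrap the request in a small function that sets a timeout and raises on HTTP errors. The sketch below is one possible shape; `fetch_html` and its `get` parameter are illustrative and not part of the original notebook.

```python
import requests

def fetch_html(url, headers, get=requests.get, timeout=10):
    """Fetch `url` and return the response body as bytes.

    `get` defaults to requests.get; it is a parameter only so the
    function can be exercised without network access.
    """
    response = get(url, headers=headers, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx status codes.
    response.raise_for_status()
    return response.content
```

In this notebook the call would look like `fetch_html(URL, HEADERS)`, and the returned bytes can be passed straight to `BeautifulSoup`.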

### Print the HTTP response object

In [5]:
response

<Response [200]>

### Get the content of the HTTP response

In [6]:
response.content

b'<!doctype html><html lang="en"><head><link href="https://rukminim2.flixcart.com" rel="preconnect"/><link rel="stylesheet" href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app_modules.chunk.905c37.css"/><link rel="stylesheet" href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app.chunk.ccbde3.css"/><meta http-equiv="Content-type" content="text/html; charset=utf-8"/><meta http-equiv="X-UA-Compatible" content="IE=Edge"/><meta property="fb:page_id" content="102988293558"/><meta property="fb:admins" content="658873552,624500995,100000233612389"/><meta name="robots" content="noodp"/><link rel="shortcut icon" href="https://static-assets-web.flixcart.com/www/promos/new/20150528-140547-favicon-retina.ico"/><link type="application/opensearchdescription+xml" rel="search" href="/osdd.xml?v=2"/><meta property="og:type" content="website"/><meta name="og_site_name" property="og:site_name" content="Flipkart.com"/><link rel="apple-touch-icon" sizes="57x57" 

### Parse the HTML content of the response using BeautifulSoup

In [7]:
soup = BeautifulSoup(response.content,"html.parser")

In [8]:
soup

<!DOCTYPE html>
<html lang="en"><head><link href="https://rukminim2.flixcart.com" rel="preconnect"/><link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app_modules.chunk.905c37.css" rel="stylesheet"/><link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app.chunk.ccbde3.css" rel="stylesheet"/><meta content="text/html; charset=utf-8" http-equiv="Content-type"/><meta content="IE=Edge" http-equiv="X-UA-Compatible"/><meta content="102988293558" property="fb:page_id"/><meta content="658873552,624500995,100000233612389" property="fb:admins"/><meta content="noodp" name="robots"/><link href="https://static-assets-web.flixcart.com/www/promos/new/20150528-140547-favicon-retina.ico" rel="shortcut icon"/><link href="/osdd.xml?v=2" rel="search" type="application/opensearchdescription+xml"/><meta content="website" property="og:type"/><meta content="Flipkart.com" name="og_site_name" property="og:site_name"/><link href="/apple-touch-icon-57x57.png" rel

### Find all links with class "s1Q9rs" on the search results page

In [9]:
links = soup.find_all("a",class_="s1Q9rs")

### Print the list of links

In [10]:
links

[<a class="s1Q9rs" href="/samsung-galaxy-s21-fe-5g-snapdragon-888-olive-128-gb/p/itm628856d2794e5?pid=MOBGTKQGTQW4PZUF&amp;lid=LSTMOBGTKQGTQW4PZUFFPR4CC&amp;marketplace=FLIPKART&amp;q=samsung&amp;store=search.flipkart.com&amp;srno=s_1_1&amp;otracker=search&amp;otracker1=search&amp;fm=organic&amp;iid=en_GW7tThZ4u9aunEdF_i3RgKpq4KGY888FMaKpt6xy6gDcbGg9PcW8Y0seJQ5n7pEnCuZ1lerEETU6kUVC_IWoxA%3D%3D&amp;ppt=None&amp;ppn=None&amp;ssid=xoyv116rq80000001696534648639&amp;qH=fe546279a62683de" rel="noopener noreferrer" target="_blank" title="Samsung Galaxy S21 FE 5G with Snapdragon 888 (Olive, 128 GB)">Samsung Galaxy S21 FE 5G with Snapdragon 888 (Olive, 12...</a>,
 <a class="s1Q9rs" href="/samsung-10000-mah-power-bank-25-w-fast-charging/p/itm92007bd657e24?pid=PWBGZKDAPVWBGQRA&amp;lid=LSTPWBGZKDAPVWBGQRAUPNEQU&amp;marketplace=FLIPKART&amp;q=samsung&amp;store=search.flipkart.com&amp;srno=s_1_2&amp;otracker=search&amp;otracker1=search&amp;fm=organic&amp;iid=en_GW7tThZ4u9aunEdF_i3RgKpq4KGY888FMaKpt6x

### Get the first link from the list

In [11]:
links[0]

<a class="s1Q9rs" href="/samsung-galaxy-s21-fe-5g-snapdragon-888-olive-128-gb/p/itm628856d2794e5?pid=MOBGTKQGTQW4PZUF&amp;lid=LSTMOBGTKQGTQW4PZUFFPR4CC&amp;marketplace=FLIPKART&amp;q=samsung&amp;store=search.flipkart.com&amp;srno=s_1_1&amp;otracker=search&amp;otracker1=search&amp;fm=organic&amp;iid=en_GW7tThZ4u9aunEdF_i3RgKpq4KGY888FMaKpt6xy6gDcbGg9PcW8Y0seJQ5n7pEnCuZ1lerEETU6kUVC_IWoxA%3D%3D&amp;ppt=None&amp;ppn=None&amp;ssid=xoyv116rq80000001696534648639&amp;qH=fe546279a62683de" rel="noopener noreferrer" target="_blank" title="Samsung Galaxy S21 FE 5G with Snapdragon 888 (Olive, 128 GB)">Samsung Galaxy S21 FE 5G with Snapdragon 888 (Olive, 12...</a>

### Extract the href attribute from the first link to get the product URL

In [12]:
link = links[0].get("href")

### Print the extracted product URL

In [13]:
link

'/samsung-galaxy-s21-fe-5g-snapdragon-888-olive-128-gb/p/itm628856d2794e5?pid=MOBGTKQGTQW4PZUF&lid=LSTMOBGTKQGTQW4PZUFFPR4CC&marketplace=FLIPKART&q=samsung&store=search.flipkart.com&srno=s_1_1&otracker=search&otracker1=search&fm=organic&iid=en_GW7tThZ4u9aunEdF_i3RgKpq4KGY888FMaKpt6xy6gDcbGg9PcW8Y0seJQ5n7pEnCuZ1lerEETU6kUVC_IWoxA%3D%3D&ppt=None&ppn=None&ssid=xoyv116rq80000001696534648639&qH=fe546279a62683de'

### Construct the full product URL by appending it to the base URL

In [14]:
product_url = "https://www.flipkart.com" + link

### Print the full product URL

In [15]:
product_url

'https://www.flipkart.com/samsung-galaxy-s21-fe-5g-snapdragon-888-olive-128-gb/p/itm628856d2794e5?pid=MOBGTKQGTQW4PZUF&lid=LSTMOBGTKQGTQW4PZUFFPR4CC&marketplace=FLIPKART&q=samsung&store=search.flipkart.com&srno=s_1_1&otracker=search&otracker1=search&fm=organic&iid=en_GW7tThZ4u9aunEdF_i3RgKpq4KGY888FMaKpt6xy6gDcbGg9PcW8Y0seJQ5n7pEnCuZ1lerEETU6kUVC_IWoxA%3D%3D&ppt=None&ppn=None&ssid=xoyv116rq80000001696534648639&qH=fe546279a62683de'
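
String concatenation works here because the href starts with `/`. As a hedged alternative, `urllib.parse.urljoin` resolves relative links against a base and leaves absolute links untouched. `to_absolute` is a hypothetical helper, and the sample href is shortened from the output above.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.flipkart.com"

def to_absolute(hrefs, base=BASE_URL):
    """Resolve each href against `base` (hypothetical helper)."""
    return [urljoin(base, href) for href in hrefs]

# Shortened version of the first href found on the search page.
sample_hrefs = [
    "/samsung-galaxy-s21-fe-5g-snapdragon-888-olive-128-gb/p/itm628856d2794e5?pid=MOBGTKQGTQW4PZUF"
]
product_urls = to_absolute(sample_hrefs)
```

To cover every result on the page, the same helper could be called as `to_absolute(a.get("href") for a in links)`.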

### Send a new HTTP GET request to the product URL

In [16]:
new_response = requests.get(product_url,headers=HEADERS)

### Print the new HTTP response object

In [17]:
new_response

<Response [200]>

### Parse the HTML content of the product page using BeautifulSoup

In [18]:
new_soup = BeautifulSoup(new_response.content,"html.parser")

In [19]:
new_soup

<!DOCTYPE html>
<html lang="en"><head><link href="https://rukminim2.flixcart.com" rel="preconnect"/><link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app_modules.chunk.905c37.css" rel="stylesheet"/><link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app.chunk.ccbde3.css" rel="stylesheet"/><meta content="text/html; charset=utf-8" http-equiv="Content-type"/><meta content="IE=Edge" http-equiv="X-UA-Compatible"/><meta content="102988293558" property="fb:page_id"/><meta content="658873552,624500995,100000233612389" property="fb:admins"/><meta content="noodp" name="robots"/><link href="https://static-assets-web.flixcart.com/www/promos/new/20150528-140547-favicon-retina.ico" rel="shortcut icon"/><link href="/osdd.xml?v=2" rel="search" type="application/opensearchdescription+xml"/><meta content="website" property="og:type"/><meta content="Flipkart.com" name="og_site_name" property="og:site_name"/><link href="/apple-touch-icon-57x57.png" rel

### Find and print the product title

In [20]:
new_soup.find("span",class_="B_NuCI")

<span class="B_NuCI">Samsung Galaxy S21 FE 5G with Snapdragon 888 (Olive, 128 GB)<!-- -->  (8 GB RAM)</span>

### Extract and print the product title without unwanted characters

In [21]:
new_soup.find("span",class_="B_NuCI").text

'Samsung Galaxy S21 FE 5G with Snapdragon 888 (Olive, 128 GB)\xa0\xa0(8 GB RAM)'

In [22]:
new_soup.find("span",class_="B_NuCI").text.replace("\xa0","")

'Samsung Galaxy S21 FE 5G with Snapdragon 888 (Olive, 128 GB)(8 GB RAM)'
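
Removing `\xa0` outright joins the two parenthesised groups with no space between them. If a readable separator is preferred, one option is to turn non-breaking spaces into ordinary spaces and collapse repeated whitespace. `clean_title` is a hypothetical helper name.

```python
import re

def clean_title(raw):
    """Replace non-breaking spaces and collapse runs of whitespace."""
    # Convert non-breaking spaces to ordinary spaces first.
    text = raw.replace("\xa0", " ")
    # Collapse any run of whitespace to a single space.
    return re.sub(r"\s+", " ", text).strip()

title = clean_title(
    "Samsung Galaxy S21 FE 5G with Snapdragon 888 (Olive, 128 GB)\xa0\xa0(8 GB RAM)"
)
```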

### Find and print the product price

In [23]:
new_soup.find("div",class_="_30jeq3 _16Jk6d")

<div class="_30jeq3 _16Jk6d">₹45,999</div>

In [24]:
new_soup.find("div",class_="_30jeq3 _16Jk6d").text

'₹45,999'
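
For numerical analysis the price string usually needs converting to a number. The sketch below assumes whole-rupee prices formatted like the output above (currency symbol plus comma-separated digits); a price with a decimal part would need different handling. `parse_price` is a hypothetical helper.

```python
import re

def parse_price(text):
    """Convert a price string such as '₹45,999' to an integer."""
    # Drop everything that is not a digit (currency symbol, commas).
    digits = re.sub(r"[^\d]", "", text)
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)

price = parse_price("₹45,999")
```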

### Extract and print the product rating


In [25]:
new_soup.find("div",class_="_3LWZlK").text

'4.3'
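
`find` returns `None` when no tag matches, so chaining `.text` raises `AttributeError` if the class name is absent from the page. A small guard avoids that. This is a sketch; `first_text` is a hypothetical helper, and the HTML snippet is a one-line stand-in for the product page.

```python
from bs4 import BeautifulSoup

def first_text(soup, name, class_):
    """Return the stripped text of the first matching tag, or None."""
    tag = soup.find(name, class_=class_)
    return tag.get_text(strip=True) if tag is not None else None

# Stand-in markup; the real page is far larger.
sample = BeautifulSoup('<div class="_3LWZlK">4.3</div>', "html.parser")
rating_text = first_text(sample, "div", "_3LWZlK")
rating = float(rating_text) if rating_text is not None else None
```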

### Extract and print the number of reviews

In [26]:
new_soup.find("span",class_="_2_R_DZ").find_all("span")[3].text.replace("\xa0","")

'7,441 Reviews'
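
The review count can be reduced to an integer in a similar way. The sketch assumes the text starts with a comma-separated number, as in the output above; `parse_count` is a hypothetical helper.

```python
import re

def parse_count(text):
    """Return the first integer in strings such as '7,441 Reviews'."""
    # Match a digit followed by any mix of digits and commas.
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return int(match.group().replace(",", ""))

review_count = parse_count("7,441 Reviews")
```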

### Extract and print the product color

In [27]:
new_soup.find_all("tr",class_="_1s_Smc row")[3].find("li",class_="_21lJbe").text

'Olive'
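
Picking a specification row by position only works while the table layout stays the same. A hedged alternative is to look the row up by its label. The sketch assumes that the first `td` in each row holds the label, which the original notebook does not confirm. `spec_value` is a hypothetical helper, and the sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup

def spec_value(soup, label):
    """Return the value of the specification row whose label matches."""
    for row in soup.find_all("tr", class_="_1s_Smc row"):
        label_cell = row.find("td")  # assumption: first cell is the label
        value_cell = row.find("li", class_="_21lJbe")
        if label_cell is None or value_cell is None:
            continue
        if label_cell.get_text(strip=True).lower() == label.lower():
            return value_cell.get_text(strip=True)
    return None  # label not present on this page

# Made-up markup that mirrors the class names used in this notebook.
sample_html = """
<table>
  <tr class="_1s_Smc row"><td>Example Label</td><td><ul><li class="_21lJbe">Example Value</li></ul></td></tr>
  <tr class="_1s_Smc row"><td>Color</td><td><ul><li class="_21lJbe">Olive</li></ul></td></tr>
</table>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")
color = spec_value(sample_soup, "Color")
```

Returning `None` for a missing label lets the caller decide how to handle products that lack a given specification.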

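### Extract and print the product display size

The row at index 9 returns 'Yes' on this page, which is not a display size, so the index has to be checked against the page's specification table before relying on it.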

In [28]:
new_soup.find_all("tr",class_="_1s_Smc row")[9].find("li",class_="_21lJbe").text

'Yes'
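
pandas is imported at the top of this notebook, but the extracted values are never stored. One possible way to collect them is to build one dictionary per product and load the list into a `DataFrame`. This is a sketch: the column names are assumptions, and the values are copied from the outputs above.

```python
import pandas as pd

# One dictionary per product; values copied from the cell outputs above.
record = {
    "title": "Samsung Galaxy S21 FE 5G with Snapdragon 888 (Olive, 128 GB)(8 GB RAM)",
    "price": "₹45,999",
    "rating": "4.3",
    "reviews": "7,441 Reviews",
    "color": "Olive",
}

df = pd.DataFrame([record])

# With no path argument, to_csv returns the CSV text as a string.
csv_text = df.to_csv(index=False)
```

Passing a file path to `to_csv` would write the table to disk instead of returning a string.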