# Problem Statement: Web Scraping Product Information from Flipkart

**Background:**

Flipkart is an e-commerce platform where users can search for and purchase various products. It contains a vast catalog of products with details such as product names, prices, sellers, and additional specifications. To gather data for analysis or other purposes, it can be valuable to extract specific product information from Flipkart's website programmatically.

**Objective:**

The objective of this project is to create a Python script that scrapes Flipkart's search results for Samsung mobile phones, extracts the key details of each product (title, price, rating, number of reviews, color, and display size), and stores them for further analysis or use.

In [1]:
# Import necessary libraries
import pandas as pd
import requests
import numpy as np
from bs4 import BeautifulSoup

In [2]:
# Define a function to extract the product title
def get_title(soup):

    try:
        # Extract the title and strip non-breaking spaces
        title_value = soup.find("span", class_="B_NuCI").text.replace("\xa0", "")
    except AttributeError:
        # find() returned None: the title is missing, so use an empty string
        title_value = ""

    return title_value

In [3]:
# Define a function to extract the product price
def get_price(soup):

    try:
        # Extract the product price
        price_value = soup.find("div", class_="_30jeq3 _16Jk6d").text
    except AttributeError:
        # find() returned None: the price is missing, so use an empty string
        price_value = ""

    return price_value
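
The price comes back as display text such as `₹45,999`, which pandas will treat as a string rather than a number. Below is a minimal sketch of one way to convert such text before analysis. The helper name `to_int` is hypothetical (it is not part of the original notebook), and it assumes whole-number values with no decimal part.

```python
import re

def to_int(text):
    """Return the first run of digits (commas allowed) in text as an int, or None."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))

print(to_int("₹45,999"))        # price string as scraped
print(to_int("7,441 Reviews"))  # the same helper also handles review counts
print(to_int(""))               # missing values stay missing (None)
```

Returning `None` for empty input keeps missing values distinguishable from a real price of zero.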

In [4]:
# Define a function to extract the product rating
def get_rating(soup):

    try:
        # Extract the product rating
        rating_value = soup.find("div", class_="_3LWZlK").text
    except AttributeError:
        # find() returned None: the rating is missing, so use an empty string
        rating_value = ""

    return rating_value

In [5]:
# Define a function to extract the number of reviews
def get_num_review(soup):

    try:
        # The review count is the fourth nested <span>; strip non-breaking spaces
        reviews_value = soup.find("span", class_="_2_R_DZ").find_all("span")[3].text.replace("\xa0", "")
    except (AttributeError, IndexError):
        # If the element or the nested span is missing, use an empty string
        reviews_value = ""

    return reviews_value

In [6]:
# Define a function to extract the product color
def get_color(soup):

    try:
        # The color is read from the fourth row of the specification table
        color_value = soup.find_all("tr", class_="_1s_Smc row")[3].find("li", class_="_21lJbe").text
    except (AttributeError, IndexError):
        # If the row or its value is missing, use an empty string
        color_value = ""

    return color_value

In [7]:
# Define a function to extract the product display size
def get_display_size(soup):

    try:
        # The display size is read from the tenth row of the specification table
        display_value = soup.find_all("tr", class_="_1s_Smc row")[9].find("li", class_="_21lJbe").text
    except (AttributeError, IndexError):
        # If the row or its value is missing, use an empty string
        display_value = ""

    return display_value
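
The fixed positions `[3]` and `[9]` assume every product page lists its specifications in the same order, which may not hold for accessories or for phones with a different set of rows. One alternative is to look a row up by its label text. This is only a sketch: it assumes each specification row keeps its label in the first `<td>` and its value in an `<li>`, and both the `get_spec` helper and the sample markup are hypothetical, so the real page structure should be checked first.

```python
from bs4 import BeautifulSoup

def get_spec(soup, label):
    """Return the value of the spec row whose label cell matches `label`, else ""."""
    for row in soup.find_all("tr"):
        label_cell = row.find("td")
        value_cell = row.find("li")
        if label_cell is None or value_cell is None:
            continue  # not a label/value row
        if label_cell.get_text(strip=True).lower() == label.lower():
            return value_cell.get_text(strip=True)
    return ""

# Hypothetical markup standing in for a product page's specification table.
sample_html = """
<table>
  <tr><td>Model Name</td><td><ul><li>Galaxy F13</li></ul></td></tr>
  <tr><td>Color</td><td><ul><li>Sunrise Copper</li></ul></td></tr>
  <tr><td>Display Size</td><td><ul><li>16.76 cm (6.6 inch)</li></ul></td></tr>
</table>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")
print(get_spec(sample_soup, "Color"))
print(get_spec(sample_soup, "Display Size"))
print(get_spec(sample_soup, "Weight"))  # absent label gives an empty string
```

Matching on the label means a page with extra or missing rows yields an empty string instead of a value from an unrelated row.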

In [8]:
# Set the user-agent headers for HTTP requests
HEADERS = {"User-Agent":'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36'}

In [9]:
# Define the URL for Flipkart's Samsung product search
URL = "https://www.flipkart.com/search?q=samsung&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"

In [10]:
# Send an HTTP GET request to the search page
response = requests.get(URL, headers=HEADERS, timeout=10)

In [11]:
# Display the HTTP response object (status 200 means the request succeeded)
response

<Response [200]>

In [12]:
# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.content,"html.parser")

In [13]:
# Find all product links on the search results page
links = soup.find_all("a",class_="s1Q9rs")

In [14]:
# Create an empty list to store product URLs
links_list = []

In [15]:
# Extract product URLs and store them in the list
for link in links:
    links_list.append(link.get("href"))
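
The `href` values collected here are typically relative paths, and the same product can appear more than once on a results page. As a hedged sketch, the standard library's `urljoin` can resolve each link against the site root while empty and duplicate entries are skipped. The `build_product_urls` helper, the `BASE_URL` constant, and the sample paths are hypothetical and only illustrate the idea.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.flipkart.com"

def build_product_urls(hrefs, base=BASE_URL):
    """Resolve hrefs against base, dropping empty values and duplicates (order kept)."""
    seen = set()
    urls = []
    for href in hrefs:
        if not href:
            continue  # link.get("href") returns None when the attribute is missing
        url = urljoin(base, href)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls

# Hypothetical hrefs for illustration only.
sample = ["/item-a/p/abc", None, "/item-a/p/abc", "https://www.flipkart.com/item-b/p/def"]
print(build_product_urls(sample))
```

Unlike plain string concatenation, `urljoin` leaves an already absolute URL unchanged.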

In [16]:
# Create a dictionary to store scraped data
d = {"title":[],"price":[],"rating":[],"num_review":[],"color":[],"display":[]}

In [17]:
# Iterate through product URLs and scrape data
for link in links_list:
    
    # Create the full URL for the product page
    new_webpage_URL = "https://www.flipkart.com" + link
    
    # Send an HTTP GET request to the product page and parse the HTML content
    new_response = requests.get(new_webpage_URL, headers=HEADERS, timeout=10)
    new_soup = BeautifulSoup(new_response.content,"html.parser")
    
    # Call the defined functions to extract and append data to the dictionary
    d["title"].append(get_title(new_soup))
    d["price"].append(get_price(new_soup))
    d["rating"].append(get_rating(new_soup))
    d["num_review"].append(get_num_review(new_soup))
    d["color"].append(get_color(new_soup))
    d["display"].append(get_display_size(new_soup))
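
The loop above sends one request per product with no pause, and a single network error would stop the whole run. Below is a minimal sketch of a more forgiving pattern: wait between requests and record a failure instead of raising. The `fetch_pages` helper and its `get`, `delay`, and `sleep` parameters are hypothetical; the fetching function is passed in so the pattern can be tried without network access.

```python
import time

def fetch_pages(urls, get, delay=1.0, sleep=time.sleep):
    """Call get(url) for each URL, pausing `delay` seconds between requests.

    Returns a list of (url, result) pairs; result is None when get() raised.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # space requests out instead of sending them back to back
        try:
            results.append((url, get(url)))
        except Exception as exc:  # e.g. a timeout or connection error
            print(f"skipping {url}: {exc}")
            results.append((url, None))
    return results
```

With `requests`, `get` could be something like `lambda u: requests.get(u, headers=HEADERS, timeout=10).content`, and each non-`None` result would then be passed to `BeautifulSoup` as in the loop above.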

In [18]:
# Create a DataFrame from the scraped data
df = pd.DataFrame(d)

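# Replace empty titles with NaN, then drop the rows that have no title.
# Assigning the result back avoids calling replace(inplace=True) on a column selection,
# which recent pandas versions warn about and which may not update df.
df["title"] = df["title"].replace("", np.nan)
df = df.dropna(subset=["title"])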

In [19]:
# Display the DataFrame
df

|    | title | price | rating | num_review | color | display |
|---:|---|---|---|---|---|---|
| 0 | Samsung Galaxy S21 FE 5G with Snapdragon 888 (... | ₹45,999 | 4.3 | 7,441 Reviews | Olive | Yes |
| 1 | SAMSUNG 10000 mAh Power Bank (25 W, Fast Charg... | ₹1,999 | 4.3 | 79 Reviews | 2 | Surge Damages and Battery Damages will Not Cov... |
| 2 | SAMSUNG Galaxy F04 (Opal Green, 64 GB)(4 GB RAM) | ₹6,499 | 4.2 | 2,690 Reviews | Opal Green | 16.51 cm (6.5 inch) |
| 3 | SAMSUNG Galaxy F13 (Sunrise Copper, 64 GB)(4 G... | ₹9,199 | 4.3 | 9,973 Reviews | Sunrise Copper | 16.76 cm (6.6 inch) |
| 4 | SAMSUNG Galaxy F04 (Jade Purple, 64 GB)(4 GB RAM) | ₹6,499 | 4.2 | 2,690 Reviews | Jade Purple | 16.51 cm (6.5 inch) |
| 6 | SAMSUNG Galaxy F13 (Nightsky Green, 64 GB)(4 G... | ₹9,199 | 4.3 | 9,973 Reviews | Nightsky Green | 16.76 cm (6.6 inch) |
| 7 | SAMSUNG Original 25W, Type C Power Adaptor com... | ₹1,299 | 4.4 | 6,606 Reviews | Travel Adapter (EP-TA800) | Samsung phones |
| 8 | SAMSUNG Galaxy F14 5G (B.A.E. Purple, 128 GB)(... | ₹12,490 | 4.2 | 2,986 Reviews | B.A.E. Purple | 16.76 cm (6.6 inch) |
| 9 | SAMSUNG Guru Music 2(Black) | ₹2,832 | 4.3 | 19,158 Reviews | Black | Yes |
| 11 | SAMSUNG Galaxy M14 5G (Smoky Teal, 128 GB)(6 G... | ₹13,484 | 4.2 | 294 Reviews | Smoky Teal | Data Cable (USB TypeC to C), Ejection Pin, Use... |

In [20]:
# Save the DataFrame to a CSV file
df.to_csv("Flipkart_samsung_phone_data.csv", index=False)