## Smartphone Market Analysis using Web Scraping - Flipkart Case Study

### Project Overview

This project focuses on extracting **current smartphone listing data from Flipkart** using Python web scraping libraries (Requests, BeautifulSoup). The smartphone industry is highly competitive, with frequent product launches, fluctuating prices, and diverse customer preferences.

The dataset includes product names, prices, discounts, ratings, and technical specifications, which are further cleaned and prepared for analysis.

The ultimate goal is to visualize market trends in Power BI, providing insights into:
- Price distribution across smartphone models
- Discount patterns for different brands
- Consumer rating behavior
- Key feature comparisons (RAM, Battery, Display, Processor, Camera)

This project simulates an end-to-end data analysis workflow — from raw web scraping to structured data analysis — showcasing practical skills in data collection, cleaning, transformation, and visualization.

---

### Phase 1 - Web Scraping

Before beginning the data cleaning and analysis, we first need to collect the raw dataset.  
In this phase, we will scrape smartphone listings from **Flipkart** to capture essential details such as:  
- Brand & Model  
- Price & Discounts  
- Ratings  
- Key Specifications (RAM, ROM, Display, Battery, Processor, Camera, etc.)  

The scraped dataset will be stored in a structured format (CSV) and used in the next phase for transformation and analysis.

Importing Required Libraries

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time, random

The next cell defines the scraping function, iterates through all pages, extracts product details, and stores the results in a CSV file.

In [2]:
# Base setup
BASE_URL = "https://www.flipkart.com/search?q=smartphones&page={}"
HEADERS = {"User-Agent": "Mozilla/5.0"}
TOTAL_PAGES = 388
OUTPUT_FILE = "flipkart_smartphones.csv"

# Store all products
all_products = []

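def scrape_page(page_num):
    """Fetch one search-results page and return a list of product dicts."""
    url = BASE_URL.format(page_num)
    try:
        # A timeout keeps one stalled request from hanging the whole run
        response = requests.get(url, headers=HEADERS, timeout=10)
        if response.status_code != 200:
            print(f"⚠️ Failed to fetch page {page_num} (status {response.status_code})")
            return []
    except requests.RequestException as e:
        print(f"Error fetching {url}: {e}")
        return []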
    
    soup = BeautifulSoup(response.text, "html.parser")
    data = []

    # Each product card
    product_cards = soup.find_all("div", {"class": "tUxRFH"})
    for card in product_cards:
        # Name
        name_tag = card.find("div", {"class": "KzDlHZ"})
        name = name_tag.text.strip() if name_tag else None

        # Price
        price_tag = card.find("div", {"class": "Nx9bqj _4b5DiR"})
        price = price_tag.text.strip().replace("₹", "").replace(",", "") if price_tag else None

        # Discount
        dis_tag = card.find("div", {"class": "UkUFwK"})
        discount = dis_tag.text.strip() if dis_tag else None

        # Rating
        rating_tag = card.find("div", {"class": "XQDdHH"})
        rating = rating_tag.text.strip() if rating_tag else None

        # Specs
        specs_container = card.find("div", {"class": "_6NESgJ"})
        if specs_container:
            specs_list = specs_container.find_all("li", {"class": "J+igdf"})
            specs = "; ".join([s.text.strip() for s in specs_list])
        else:
            specs = None

        if name:
            data.append({
                "Name": name,
                "Price": price,
                "Discount": discount,
                "Rating": rating,
                "Specs": specs
            })

    return data


# Main loop
for page in range(1, TOTAL_PAGES + 1):
    print(f"Scraping page {page}/{TOTAL_PAGES}...")
    products = scrape_page(page)
    all_products.extend(products)

    # Save progress every 20 pages
    if page % 20 == 0:
        pd.DataFrame(all_products).to_csv(OUTPUT_FILE, index=False)
        print(f"✅ Progress saved at page {page}")

    # Sleep to avoid blocking
    time.sleep(random.uniform(1, 3))

# Final save
df = pd.DataFrame(all_products)
df.to_csv(OUTPUT_FILE, index=False)
print(f"\n Done! Collected {len(df)} products into {OUTPUT_FILE}")

Scraping page 1/388...
Scraping page 2/388...
Scraping page 3/388...
Scraping page 4/388...
Scraping page 5/388...
Scraping page 6/388...
Scraping page 7/388...
Scraping page 8/388...
Scraping page 9/388...
Scraping page 10/388...
Scraping page 11/388...
Scraping page 12/388...
Scraping page 13/388...
Scraping page 14/388...
Scraping page 15/388...
Scraping page 16/388...
Scraping page 17/388...
Scraping page 18/388...
Scraping page 19/388...
Scraping page 20/388...
✅ Progress saved at page 20
Scraping page 21/388...
Scraping page 22/388...
Scraping page 23/388...
Scraping page 24/388...
Scraping page 25/388...
Scraping page 26/388...
Scraping page 27/388...
Scraping page 28/388...
Scraping page 29/388...
Scraping page 30/388...
Scraping page 31/388...
Scraping page 32/388...
Scraping page 33/388...
Scraping page 34/388...
Scraping page 35/388...
Scraping page 36/388...
Scraping page 37/388...
Scraping page 38/388...
Scraping page 39/388...
Scraping page 40/388...
✅ Progress saved at page 40
...
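
The scraper above skips any page whose request fails and moves on after a random pause. One possible refinement, shown here only as a minimal sketch, is to retry a failed page a few times with an increasing delay before giving up. The names `fetch_with_retry`, `fetch`, `max_attempts`, and `base_delay` are hypothetical and not part of the original notebook; `fetch` stands in for any callable that returns the page HTML or `None`, so the example runs without touching the network.

```python
import random
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[str], Optional[str]],
    url: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(url) up to max_attempts times, backing off between tries.

    `fetch` is a hypothetical callable: it returns the page HTML on
    success and returns None (or raises) on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            html = fetch(url)
            if html is not None:
                return html
        except Exception as exc:  # broad on purpose: this is only a sketch
            print(f"Attempt {attempt} for {url} raised: {exc}")
        if attempt < max_attempts:
            # Exponential backoff plus a little jitter: ~1s, ~2s, ~4s, ...
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            sleep(delay)
    return None


# Usage with a stand-in fetcher that fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch(url: str) -> Optional[str]:
    calls["n"] += 1
    return "<html>ok</html>" if calls["n"] >= 3 else None


# sleep is replaced with a no-op so the demo finishes instantly
result = fetch_with_retry(flaky_fetch, "https://example.com", sleep=lambda s: None)
```

In the notebook, `fetch` could wrap the existing `requests.get` call; whether retries are worthwhile depends on how often pages actually fail during a run.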

---


### Phase 2 - Data Preparation & Transformation

The web scraping phase is now complete, and we have successfully collected raw smartphone data from Flipkart.  
The dataset has been saved to **`flipkart_smartphones.csv`**, which will serve as the foundation for further analysis.  

In the upcoming **Data Preparation & Transformation** notebook, we will:  
- Assess and improve data quality  
- Handle missing values and duplicates  
- Standardize specifications (RAM, ROM, Display, Battery, etc.)  
- Prepare the dataset for Exploratory Data Analysis (EDA)  

*Proceeding to the transformation stage ensures that the data is structured, reliable, and ready to generate actionable business insights.*
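
As a rough preview of the steps listed above, the sketch below converts the scraped text fields to numbers, extracts a few specifications with regular expressions, and drops exact duplicates. It is written in plain Python so it runs without the CSV; the follow-up notebook may well do this differently, for example with pandas. The function names, the output column names, and the assumed text formats (such as `8 GB RAM | 128 GB ROM` and `18% off`) are illustrative assumptions, and the sample rows are made up.

```python
import re
from typing import Optional


def to_number(text: Optional[str]) -> Optional[float]:
    """Return the first number found in text, or None if there is none."""
    if not text:
        return None
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


def parse_specs(specs: Optional[str]) -> dict:
    """Pull RAM, ROM, and battery capacity out of a raw Specs string."""
    specs = specs or ""
    # Assumed formats, e.g. "8 GB RAM", "128 GB ROM", "5000 mAh"
    patterns = {
        "RAM_GB": r"(\d+(?:\.\d+)?)\s*GB\s*RAM",
        "ROM_GB": r"(\d+(?:\.\d+)?)\s*GB\s*ROM",
        "Battery_mAh": r"(\d+)\s*mAh",
    }
    result = {}
    for column, pattern in patterns.items():
        match = re.search(pattern, specs, flags=re.IGNORECASE)
        result[column] = float(match.group(1)) if match else None
    return result


def clean_rows(rows: list) -> list:
    """Convert text fields to numbers, expand Specs, drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row.get("Name"), row.get("Price"), row.get("Specs"))
        if key in seen:
            continue  # same listing already kept
        seen.add(key)
        cleaned.append({
            "Name": row.get("Name"),
            "Price": to_number(row.get("Price")),
            "Discount_pct": to_number(row.get("Discount")),
            "Rating": to_number(row.get("Rating")),
            **parse_specs(row.get("Specs")),
        })
    return cleaned


# Made-up rows in the shape produced by scrape_page(); not real Flipkart data
sample = [
    {"Name": "Phone A (Black, 128 GB)", "Price": "12999", "Discount": "18% off",
     "Rating": "4.3", "Specs": "8 GB RAM | 128 GB ROM; 5000 mAh Battery"},
    {"Name": "Phone A (Black, 128 GB)", "Price": "12999", "Discount": "18% off",
     "Rating": "4.3", "Specs": "8 GB RAM | 128 GB ROM; 5000 mAh Battery"},
    {"Name": "Phone B", "Price": None, "Discount": None, "Rating": None,
     "Specs": None},
]
cleaned = clean_rows(sample)
```

Missing values are kept as `None` rather than filled in, so the decision about how to handle them stays with the transformation stage.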

---