# FLIPKART WEB SCRAPING USING PYTHON

#### Import the essential libraries: Pandas for data manipulation, BeautifulSoup for parsing HTML, and Requests for making HTTP requests. Together they retrieve the Flipkart search pages and organize the extracted data for analysis.

In [24]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

#### Create empty lists to store the scraped attributes: product name, price, description, ratings & reviews, MRP, star rating, and discount. Each list is filled as the Flipkart pages are parsed.

In [31]:
name = []
price = []
describ = []
rating = []
MRP = []
stars = []
discount = []

#### Use the Requests library to send an HTTP GET request for each of the first 15 search-result pages, passing a User-Agent header to mimic browser behavior. Parse each page's HTML with BeautifulSoup, extract the individual product details, and append them (product name, price, description, etc.) to the lists created earlier.

In [32]:
# Fetch each Flipkart search-results page (this scrapes HTML; it is not an API call)
# The headers are the same for every request, so define them once outside the loop.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36',
    'Accept-Language': 'en-US, en;q=0.5'
}
for i in range(1, 16):
    url = 'https://www.flipkart.com/search?q=laptops&as=on&as-show=on&otracker=AS_Query_TrendingAutoSuggest_2_0_na_na_na&otracker1=AS_Query_TrendingAutoSuggest_2_0_na_na_na&as-pos=2&as-type=HISTORY&suggestionId=laptops&requestId=820e67e9-0691-4a92-845a-dc6180bc98d8&page=' + str(i)
    r = requests.get(url, headers=HEADERS)
    soup = BeautifulSoup(r.text, "lxml")
    
    # Gathering Product Name
    products = soup.find_all("div", class_="KzDlHZ")
    for product in products:
        name.append(product.get_text(strip=True))
        
    # Gathering Product Price
    prices = soup.find_all("div", class_="Nx9bqj _4b5DiR")
    for price_item in prices:
        price.append(price_item.get_text(strip=True))
        
    # Gathering Product Description
    descriptions = soup.find_all("div", class_="_6NESgJ")
    for description in descriptions:
        describ.append(description.get_text(strip=True))
        
    # Gathering Product Rating & Reviews
    ratings = soup.find_all("span", class_="Wphh3N")
    for rate in ratings:
        rating.append(rate.get_text(strip=True))
        
    # Gathering Product MRP (Maximum Retail Price)
    MRP_elements = soup.find_all("div", class_='yRaY8j ZYYwLA')
    for mrp in MRP_elements:
        MRP.append(mrp.get_text(strip=True))
        
    # Gathering Product Star Rating (this container's text also includes the ratings & reviews count)
    stars_elements = soup.find_all("div", class_="_5OesEi")
    for element in stars_elements:
        stars.append(element.get_text(strip=True))
        
    # Gathering Product Discount
    discount_elements = soup.find_all("div", class_="UkUFwK")
    for element in discount_elements:
        discount_text = element.get_text(strip=True)
        discount_percentage = ''.join(filter(str.isdigit, discount_text))
        discount.append(discount_percentage)
        
print(r)  # status of the last page requested only

<Response [200]>
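
One caveat with the loop above: each attribute is collected with its own page-wide `find_all`, so a product that lacks a field (a discount, for instance) makes that list shorter and shifts every later value. A minimal sketch of an alternative is to walk one product card at a time and record `None` for anything missing. The HTML sample and the class names (`card`, `title`, `price`, `discount`) are hypothetical stand-ins, not Flipkart's real markup; the actual container class would have to be found by inspecting the page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two product cards, the second has no discount.
SAMPLE_HTML = """
<div class="card"><div class="title">Laptop A</div><div class="price">₹73,990</div><div class="discount">25% off</div></div>
<div class="card"><div class="title">Laptop B</div><div class="price">₹33,350</div></div>
"""

def text_or_none(card, selector):
    """Return the stripped text of the first match inside this card, or None."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node is not None else None

def parse_cards(html):
    """Build one dict per product card so fields stay aligned per product."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.card"):
        rows.append({
            "Product Name": text_or_none(card, ".title"),
            "Product Price": text_or_none(card, ".price"),
            "Discount": text_or_none(card, ".discount"),
        })
    return rows

rows = parse_cards(SAMPLE_HTML)
print(rows)
```

A list of dicts like `rows` can be passed directly to `pd.DataFrame(rows)`, with missing fields appearing as empty cells in the correct row.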


#### Print the populated lists to verify the extracted data, and use the len() function to count the entries in each list. Equal lengths indicate that every attribute was captured for every product; unequal lengths mean some products are missing a field.

In [33]:
print(name, price, describ, rating, discount, MRP, stars)
print(len(name), len(price), len(describ), len(rating), len(discount), len(MRP), len(stars))

['Apple MacBook AIR Apple M2 - (8 GB/256 GB SSD/Mac OS Monterey) MLY33HN/A', 'Apple MacBook AIR Apple M2 - (8 GB/256 GB SSD/Mac OS Monterey) MLY13HN/A', 'DELL Inspiron 3520 Intel Core i3 12th Gen 1215U - (8 GB/512 GB SSD/Windows 11 Home) New Inspiron 15 La...', 'DELL Intel Core i3 12th Gen 1215U - (8 GB/512 GB SSD/Windows 11 Home) Inspiron 3520 Thin and Light Lap...', 'Apple MacBook AIR Apple M2 - (8 GB/256 GB SSD/Mac OS Monterey) MLXW3HN/A', 'Acer Aspire 3 Backlit Intel Core i5 12th Gen 1235U - (16 GB/512 GB SSD/Windows 11 Home) A324-51 Thin a...', 'Acer Chromebook Plus Google AI Intel Core i3 N305 - (8 GB/256 GB SSD/Chrome OS) CB514-4H-39T7 Chromebo...', 'Apple MacBook AIR Apple M2 - (8 GB/256 GB SSD/Mac OS Monterey) MLXY3HN/A', 'Primebook Wifi MediaTek MT8183 - (4 GB/64 GB EMMC Storage/Prime OS) PB Wifi Thin and Light Laptop', 'Acer Aspire 7 Intel Core i5 12th Gen 12450H - (16 GB/512 GB SSD/Windows 11 Home/4 GB Graphics/NVIDIA G...', 'Acer Chromebook Intel Celeron Dual Core N4500 - 
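
Reading seven lengths off a printed line is easy to get wrong, so the check can be made explicit. The helper below is a small sketch, with `length_report` being a hypothetical name and the data a toy example rather than the scraped lists.

```python
def length_report(columns):
    """Map each column name to its list length and flag whether all match."""
    lengths = {key: len(values) for key, values in columns.items()}
    consistent = len(set(lengths.values())) <= 1
    return lengths, consistent

# Illustrative toy data, not the scraped results.
example = {
    "name": ["A", "B", "C"],
    "price": ["1", "2", "3"],
    "discount": ["10", "20"],
}
lengths, consistent = length_report(example)
print(lengths)
print(consistent)
```

With the notebook's lists, the call would look like `length_report({"name": name, "price": price, "discount": discount})`.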

#### Combine the populated lists into a dictionary, using attribute names as keys (e.g., product name, price, description). The dictionary does not align the data by itself: if the lists have different lengths, passing it directly to pd.DataFrame raises a length-mismatch error, so the conversion step has to account for that.

In [34]:
# A dict literal is enough; wrapping it in dict() is redundant.
laptop_dict = {
    "Product Name": name,
    "Product Price": price,
    "Product Description": describ,
    "Rating": rating,
    "Discount": discount,
    "MRP": MRP,
    "Stars": stars,
}

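To see why list lengths matter when a dictionary like this becomes a DataFrame, the sketch below uses toy data in which one list is an entry short. Direct construction fails, while building with `orient="index"` and transposing pads the short column with missing values. The names `toy`, `direct_ok`, and `padded` are illustrative only.

```python
import pandas as pd

# Toy data: "Discount" is one entry short.
toy = {"Product Name": ["A", "B", "C"], "Discount": ["25", "33"]}

try:
    pd.DataFrame(toy)
    direct_ok = True
except ValueError as err:
    direct_ok = False
    print("Direct construction failed:", err)

# Each key becomes a row first, short rows are padded, then rows become columns.
padded = pd.DataFrame.from_dict(toy, orient="index").transpose()
print(padded)
```

Padding only fills the end of a short list. It cannot tell which product the missing value belonged to, so the remaining values may sit in the wrong rows.
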
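#### Convert the dictionary into a Pandas DataFrame. Building it with orient='index' and transposing pads shorter lists with missing values instead of raising an error, and dropna() then removes the incomplete rows. Because each list was filled independently, rows are only guaranteed to line up when all lists have the same length. The result is a structured table suitable for analysis, visualization, and export.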

In [35]:
df = pd.DataFrame.from_dict(laptop_dict, orient='index').transpose()
df.dropna(inplace=True)
df

| | Product Name | Product Price | Product Description | Rating | Discount | MRP | Stars |
|---|---|---|---|---|---|---|---|
| 0 | Apple MacBook AIR Apple M2 - (8 GB/256 GB SSD/... | ₹73,990 | Apple M2 Processor8 GB Unified Memory RAMMac O... | 14,664 Ratings&869 Reviews | 25 | ₹99,900 | 4.714,664 Ratings&869 Reviews |
| 1 | Apple MacBook AIR Apple M2 - (8 GB/256 GB SSD/... | ₹73,990 | Apple M2 Processor8 GB Unified Memory RAMMac O... | 14,664 Ratings&869 Reviews | 25 | ₹99,900 | 4.714,664 Ratings&869 Reviews |
| 2 | DELL Inspiron 3520 Intel Core i3 12th Gen 1215... | ₹33,350 | Intel Core i3 Processor (12th Gen)8 GB DDR4 RA... | 3,740 Ratings&271 Reviews | 33 | ₹50,520 | 4.23,740 Ratings&271 Reviews |
| 3 | DELL Intel Core i3 12th Gen 1215U - (8 GB/512 ... | ₹33,200 | Intel Core i3 Processor (12th Gen)8 GB DDR4 RA... | 3,740 Ratings&271 Reviews | 30 | ₹47,869 | 4.23,740 Ratings&271 Reviews |
| 4 | Apple MacBook AIR Apple M2 - (8 GB/256 GB SSD/... | ₹73,990 | Apple M2 Processor8 GB Unified Memory RAMMac O... | 14,664 Ratings&869 Reviews | 25 | ₹99,900 | 4.714,664 Ratings&869 Reviews |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 344 | MSI Thin 15 Intel Core i5 12th Gen 12450H - (1... | ₹57,990 | Intel Core i5 Processor (12th Gen)16 GB DDR4 R... | 74 Ratings&9 Reviews | 17 | ₹1,39,999 | 4.374 Ratings&9 Reviews |
| 345 | HP AMD Ryzen 5 Hexa Core 5500U - (16 GB/512 GB... | ₹42,590 | AMD Ryzen 5 Hexa Core Processor16 GB DDR4 RAM6... | 32 Ratings&2 Reviews | 16 | ₹60,990 | 4.332 Ratings&2 Reviews |
| 346 | Infinix X Air Pro+ Intel Core i5 13th Gen 1334... | ₹49,990 | Intel Core i5 Processor (13th Gen)16 GB LPDDR4... | 464 Ratings&23 Reviews | 28 | ₹72,990 | 4.2464 Ratings&23 Reviews |
| 347 | ASUS (2024) Intel Pentium Quad Core N5030 - (4... | ₹27,500 | Intel Pentium Quad Core Processor4 GB DDR4 RAM... | 177 Ratings&18 Reviews | 31 | ₹84,838 | 4.1177 Ratings&18 Reviews |
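
The values in this table are still raw strings: prices carry the rupee sign and separators, the Rating column packs two counts into one string, and the Stars column has the star value fused with the ratings text (for example `4.714,664 Ratings&869 Reviews`). The helpers below are one possible cleaning sketch, not part of the original notebook. The function names are hypothetical, and `parse_stars` assumes the Stars text ends with exactly the same string as the Rating column of that row.

```python
import re

def parse_rupees(text):
    """'₹1,39,999' -> 139999 (drops the currency sign and separators)."""
    return int(re.sub(r"\D", "", text))

def _to_int(match):
    """Convert a regex match on a comma-grouped number to int, or None."""
    return int(match.group(1).replace(",", "")) if match else None

def parse_counts(text):
    """'14,664 Ratings&869 Reviews' -> (14664, 869)."""
    ratings = re.search(r"([\d,]+)\s*Ratings", text)
    reviews = re.search(r"([\d,]+)\s*Reviews", text)
    return _to_int(ratings), _to_int(reviews)

def parse_stars(stars_text, rating_text):
    """Strip the trailing ratings text to leave the star value as a float."""
    if rating_text and stars_text.endswith(rating_text):
        stars_text = stars_text[: -len(rating_text)]
    return float(stars_text)

print(parse_rupees("₹1,39,999"))
print(parse_counts("14,664 Ratings&869 Reviews"))
print(parse_stars("4.714,664 Ratings&869 Reviews", "14,664 Ratings&869 Reviews"))
```

On the DataFrame, a helper like `parse_rupees` could be applied column-wise, for example `df["Product Price"].map(parse_rupees)`.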


#### Export the DataFrame to a file format such as CSV, Excel, or JSON. This step allows for saving the structured data for further analysis or sharing, ensuring compatibility with other tools and software.
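
A minimal sketch of the CSV export follows, using a toy frame in place of the scraped `df`. The file name `flipkart_laptops.csv` and the temporary directory are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd

# Toy frame standing in for the scraped `df`.
sample_df = pd.DataFrame(
    {"Product Name": ["Laptop A"], "Product Price": ["₹73,990"]}
)

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "flipkart_laptops.csv")

# index=False leaves out the row index; utf-8-sig writes a BOM so that
# spreadsheet tools such as Excel detect the encoding and show the rupee sign.
sample_df.to_csv(out_path, index=False, encoding="utf-8-sig")

# Read the file back to confirm the round trip.
restored = pd.read_csv(out_path, encoding="utf-8-sig")
print(restored)
```

Writing with `index=False` also avoids the extra `Unnamed: 0` column that appears when a CSV saved with its index is read back. `DataFrame.to_json` and `DataFrame.to_excel` cover the other formats mentioned, the latter needing an Excel writer package such as openpyxl.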