# Smartphone Price Prediction

Vishakha Joshi (22070126132)  
Yash Chandak (22070126134)  
Girish Mahale (23070126504)

GitHub Link - https://github.com/girishmahale786/smartphone-price-prediction  
Deployment Link - https://smartphone-price-prediction.streamlit.app

# Data Acquisition

**Importing Libraries**  
In this cell, we import the libraries the project needs: BeautifulSoup for parsing HTML, Pandas for data manipulation, Requests for making HTTP requests, and the standard-library modules `time` (to pause between retries) and `os` (to work with the saved CSV files).

In [1]:
from bs4 import BeautifulSoup
import pandas as pd
import requests
import time
import os


**Function Definition - get_soup**  
Next, we define a helper function, `get_soup(url)`, which downloads a webpage and returns a BeautifulSoup object for parsing its content. If the request fails (a network error or an HTTP error status), the function waits two seconds and tries again.

In [2]:
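def get_soup(url, max_retries=5, delay=2):
    # Fetch and parse a page; retry on errors instead of looping forever
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=10)  # avoid hanging on a stalled connection
            response.raise_for_status()  # treat 4xx/5xx responses as failures
            return BeautifulSoup(response.content, 'html.parser')
        except requests.exceptions.RequestException:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            time.sleep(delay)  # pause before retrying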
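The retry loop in `get_soup` waits a fixed two seconds between attempts. A common refinement is exponential backoff: double the wait after each failure and give up after a set number of attempts. The sketch below shows that pattern on its own; `with_retries` is a hypothetical helper, `fetch` stands in for a call such as `requests.get(url, timeout=10)`, and `sleep` is injectable so the example runs without real delays.

```python
import time


def with_retries(fetch, max_retries=5, delay=2.0, backoff=2.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each failure.

    Hypothetical helper: `fetch` is any zero-argument callable, e.g.
    lambda: requests.get(url, timeout=10).
    """
    wait = delay
    for attempt in range(1, max_retries + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == max_retries:
                raise  # out of attempts: let the caller see the last error
            sleep(wait)
            wait *= backoff  # 2 s, 4 s, 8 s, ... with the defaults


# Usage: a fake fetch that fails twice, then succeeds
calls = {'n': 0}


def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('temporary failure')
    return 'page html'


waits = []
result = with_retries(flaky, sleep=waits.append)
print(result, waits)  # page html [2.0, 4.0]
```

Capping the number of attempts means a permanently broken URL eventually raises an error instead of stalling the whole scrape.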


**Function Definition - scrape_phone_details**  
Here, we define `scrape_phone_details(phone_url)`, which extracts the details of a single phone from its product page: the name, brand, rating, original price, and discounted (selling) price, plus every row of the specification tables. When no original price is shown, the discounted price is used for both fields. Everything is stored in one dictionary, with each specification label used as a key, and returned.

In [3]:
def scrape_phone_details(phone_url):
    phone_soup = get_soup(phone_url)

    # Flipkart's class names are auto-generated and may change when the site is updated
    name = phone_soup.find('div', class_='aMaAEs').find('span', class_='B_NuCI').text.strip()
    disc_price = phone_soup.find('div', class_='_30jeq3').text.strip()  # selling price

    # Original (struck-through) price; equals the selling price when no discount is shown
    price = disc_price
    mrp_div = phone_soup.find('div', class_='_3I9_wc')
    if mrp_div:
        price = mrp_div.text.strip()

    # The brand is the fourth link in the breadcrumb trail, when the trail is long enough
    brand = None
    breadcrumbs = phone_soup.find('div', class_='_1MR4o5')
    if breadcrumbs:
        links = breadcrumbs.find_all('a')
        if len(links) > 3:
            brand = links[3].text.strip()

    rating = None
    rating_div = phone_soup.find('div', class_='_3LWZlK')
    if rating_div:
        rating = rating_div.text.strip()

    phone = {
        'Name': name,
        'Brand': brand,
        'Price': price,
        'Discounted Price': disc_price,
        'Rating': rating
    }

    # Each specification row is a label cell followed by a value cell
    for spec in phone_soup.find_all('table', class_='_14cfVK'):
        for tr in spec.find_all('tr'):
            cells = tr.find_all(['th', 'td'], recursive=False)
            if len(cells) > 1:
                phone[cells[0].text.strip()] = cells[1].text.strip()

    return phone
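`scrape_phone_details` stores `Price` and `Discounted Price` exactly as displayed, so they are text rather than numbers and need converting before they can be used for modelling. A minimal sketch, assuming Flipkart's display format of a rupee sign followed by comma-grouped digits (such as `₹1,29,900`); `parse_price` and `discount_percent` are hypothetical helpers, not part of the project code.

```python
import re


def parse_price(text):
    """Convert a scraped price string such as '₹1,29,900' to an int (129900).

    Assumes the display format is a rupee sign plus comma-grouped digits.
    Returns None when the value is missing or contains no digits.
    """
    if not text:
        return None
    digits = re.sub(r'\D', '', text)  # drop '₹', commas and whitespace
    return int(digits) if digits else None


def discount_percent(price, disc_price):
    # Percentage saved relative to the original (struck-through) price
    original, selling = parse_price(price), parse_price(disc_price)
    if not original or selling is None:
        return None
    return round(100 * (original - selling) / original, 1)


print(parse_price('₹1,29,900'))                # 129900
print(discount_percent('₹24,999', '₹19,999'))  # 20.0
```

Because the scraper copies the selling price into `Price` when no discount is shown, those phones come out with a discount of `0.0`.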


**Function Definition - scrape_flipkart_data**  
In this cell, we define `scrape_flipkart_data(base_url, brand_urls)`, the core of the scraper. It takes the site's base URL and a dictionary that maps brand names to search-result URLs. For each brand, it reads the number of result pages from the pagination bar, visits each page, and calls `scrape_phone_details` on every listed phone. It returns a dictionary that maps each brand name to a list of phone dictionaries.

In [4]:
def scrape_flipkart_data(base_url, brand_urls):
    phones = {}
    for brand, url in brand_urls.items():
        phones[brand] = []

        # Read the total page count from the pagination text (its last word, e.g. "Page 1 of 5")
        brand_soup = get_soup(url)
        page_count = 1  # a listing without a pagination bar has a single page
        pagination = brand_soup.find('div', class_='_2MImiq')
        if pagination:
            page_count = int(pagination.span.text.split()[-1])

        # Pages are numbered from 1: visit pages 1..page_count exactly once
        for page in range(1, page_count + 1):
            page_url = f'{url}&page={page}'
            page_soup = get_soup(page_url)
            phones_list = page_soup.find_all('div', class_='_13oc-S')

            for phone in phones_list:
                # Product links are relative, so prefix them with the site root
                phone_url = f"{base_url}{phone.find('a')['href']}"
                phones[brand].append(scrape_phone_details(phone_url))

    return phones
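The pagination logic can be checked in isolation, without requesting any pages. In this sketch, `page_urls` is a hypothetical helper, and the pagination text is assumed to end with the total page count (for example `Page 1 of 3`), which is what `split()[-1]` in the scraper relies on. Since pages are numbered from 1, a listing with N pages should yield exactly the URLs for pages 1 through N.

```python
def page_urls(url, pagination_text=None):
    """Build the URL of every result page for one brand listing.

    Hypothetical helper: `pagination_text` is the pagination label, assumed
    to end with the total page count; without it there is a single page.
    """
    page_count = int(pagination_text.split()[-1]) if pagination_text else 1
    # N pages means page numbers 1..N inclusive
    return [f'{url}&page={page}' for page in range(1, page_count + 1)]


for u in page_urls('https://example.com/search?q=x', 'Page 1 of 3'):
    print(u)
```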


**Define Base URL and Brand URLs**  
Here, we set the base URL to `https://www.flipkart.com` and build one search URL per brand. Every brand URL extends the same search query, which filters for phones priced from ₹10,000 upwards, and appends a brand filter, so each URL lists only that brand's phones.

In [5]:
base_url = 'https://www.flipkart.com'
# Shared search query: phones priced from 10000 up to Max
search = f'{base_url}/search?sid=tyy%2C4io&otracker=CLP_Filters&p%5B%5D=facets.price_range.from%3D10000&p%5B%5D=facets.price_range.to%3DMax'

# Brand names as they appear in Flipkart's brand filter
brand_facets = {
    'apple': 'APPLE',
    'samsung': 'SAMSUNG',
    'google': 'Google',
    'nothing': 'Nothing',
    'asus': 'ASUS',
    'oneplus': 'OnePlus',
    'oppo': 'OPPO',
    'vivo': 'vivo',
    'mi': 'Mi',
    'redmi': 'REDMI',
    'realme': 'realme',
    'poco': 'POCO',
    'iqoo': 'IQOO',
    'motorola': 'MOTOROLA',
}

# Append the URL-encoded brand filter to the shared search query
brand_urls = {
    brand: f'{search}&p%5B%5D=facets.brand%255B%255D%3D{facet}'
    for brand, facet in brand_facets.items()
}
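The brand URLs are hard to read because the filters are percent-encoded, and the brand filter is encoded twice (`%255B` decodes to `%5B`, which decodes to `[`). This sketch decodes them with `urllib.parse` to show what each URL actually requests; `describe_filters` is a hypothetical helper, and the Apple URL is rebuilt from the same strings used above.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def describe_filters(url):
    """Return the decoded 'p[]' filter values of a Flipkart search URL."""
    query = parse_qs(urlsplit(url).query)  # decodes keys and values once
    # The brand filter is encoded twice, so decode each value once more
    return [unquote(value) for value in query.get('p[]', [])]


search = ('https://www.flipkart.com/search?sid=tyy%2C4io&otracker=CLP_Filters'
          '&p%5B%5D=facets.price_range.from%3D10000&p%5B%5D=facets.price_range.to%3DMax')
apple = f'{search}&p%5B%5D=facets.brand%255B%255D%3DAPPLE'

for f in describe_filters(apple):
    print(f)
# facets.price_range.from=10000
# facets.price_range.to=Max
# facets.brand[]=APPLE
```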


**Scraping Data**  
This cell runs the scraper. We call `scrape_flipkart_data` with the base URL and the dictionary of brand URLs; once every brand has been scraped, we write each brand's phones to its own CSV file in the `data/` folder (for example, `data/apple.csv`).

In [6]:
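os.makedirs('data', exist_ok=True)  # to_csv fails if the output folder is missing
phones = scrape_flipkart_data(base_url, brand_urls)
for brand in brand_urls.keys():
    # One CSV per brand, e.g. data/apple.csv
    df = pd.DataFrame(phones[brand])
    df.to_csv(f'data/{brand}.csv', index=False)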
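`pd.DataFrame` builds its columns from the union of the keys in the phone dictionaries, so each brand's CSV gets one column per specification label seen on any of its phones, and a phone lacking that label gets an empty cell. The sketch below makes that behaviour explicit with the standard-library `csv` module instead of pandas; `union_columns` and `rows_to_csv` are hypothetical helpers, and the sample rows are invented for illustration.

```python
import csv
import io


def union_columns(rows):
    # Every key seen across the phone dicts, in first-seen order
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return columns


def rows_to_csv(rows):
    # Serialize phone dicts to CSV text; a key a phone lacks becomes an empty cell
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=union_columns(rows), lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample rows: each phone exposes a different specification label
rows = [
    {'Name': 'Phone A', 'Price': '₹10,000', 'RAM': '8 GB'},
    {'Name': 'Phone B', 'Price': '₹12,000', 'Battery Capacity': '5000 mAh'},
]
print(rows_to_csv(rows))
```

Prices containing commas are quoted automatically, so the rupee values survive the round trip through CSV intact.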


**Combining Data from CSV Files**  
In this final cell, we bring all the data together. We read each brand's CSV file into a Pandas DataFrame and concatenate them into a single DataFrame, `phones_df`. Because specification labels differ between phones, the combined table contains the union of all columns, with missing values left empty. This combined dataset is saved as `data/phones.csv` for further analysis, research, and visualization.

In [7]:
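all_df = []
for file in sorted(os.listdir('data/')):
    # Read only the per-brand CSVs; skip the combined file from a previous run
    if not file.endswith('.csv') or file == 'phones.csv':
        continue
    df = pd.read_csv(f'data/{file}')
    all_df.append(df)

# Stack the brand tables row-wise; columns a brand lacks are filled with NaN
phones_df = pd.concat(all_df, ignore_index=True)
phones_df.to_csv('data/phones.csv', index=False)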
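`pd.concat` stacks the brand DataFrames row-wise and, by default, keeps the union of their columns, filling gaps with NaN. Below is a standard-library sketch of the same alignment that also skips the combined `phones.csv`, so re-running the cell does not read the previous output back in. `combine_brand_csvs` is a hypothetical helper, and the in-memory `files` mapping is an invented stand-in for the contents of the `data/` folder.

```python
import csv
import io


def combine_brand_csvs(files, combined_name='phones.csv'):
    """Stack per-brand CSV texts row-wise, aligning columns by header name.

    `files` maps file names to CSV text (a stand-in for reading data/).
    Returns the union of column names and rows padded with None.
    """
    columns, rows = [], []
    for name in sorted(files):
        # Skip non-CSV files and the combined output of an earlier run
        if not name.endswith('.csv') or name == combined_name:
            continue
        reader = csv.DictReader(io.StringIO(files[name]))
        for column in reader.fieldnames or []:
            if column not in columns:
                columns.append(column)
        rows.extend(reader)
    # Give every row every column; None plays the role of pandas' NaN
    return columns, [{col: row.get(col) for col in columns} for row in rows]


files = {
    'apple.csv': 'Name,RAM\nPhone A,8 GB\n',
    'google.csv': 'Name,Battery Capacity\nPhone B,5000 mAh\n',
    'phones.csv': 'Name,RAM\nPhone A,8 GB\n',  # stale combined file, ignored
}
columns, rows = combine_brand_csvs(files)
print(columns)
print(rows)
```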
