# Top 3 Solar Panels Recommendation Scraping Project

## Introduction
This project scrapes data about solar panels from a website using BeautifulSoup. For each panel, the data includes the product name, selling price, discount, list price (MRP), rating, and product page link, which will be used to build a recommendation system.

## Objectives
- To collect detailed information about solar panels from the listing page 'https://www.moglix.com/solar/solar-panels/213110000'.
- To clean and structure the scraped data for analysis.
- To store the data in a format suitable for further processing and analysis.

---


### Libraries and Modules

The following Python libraries are used in this project:

- **pandas**: To handle and manipulate the data in tabular form.
- **requests**: To make HTTP requests to the web pages.
- **BeautifulSoup**: To parse the HTML content and extract data.
- **numpy**: To represent missing values (`np.nan`) and for numerical operations.
- **lxml**: The parser backend passed to BeautifulSoup (`'lxml'`).


In [1]:
!pip3 install pandas



In [2]:
!pip3 install beautifulsoup4 requests lxml numpy



In [3]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
import numpy as np

In [4]:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36'
}

response = requests.get('https://www.moglix.com/solar/solar-panels/213110000', headers=headers)
response.raise_for_status()  # fail early on a 4xx/5xx response
print(response.text)



      <!DOCTYPE html><html lang="en"><head>
    <!--- PWA Related configs & metas -->
    <meta charset="utf-8">
    <meta http-equiv="x-ua-compatible" content="ie=edge">
    <!-- <meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no"> -->
    <meta name="copyright" content="© 2017 Moglix.com.">
    <meta name="Author" content="Moglix.com">
    <meta name="Created By" content="Moglix.com">
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <!-- <meta http-equiv="content-language" content="en" /> -->
    <meta name="msvalidate.01" content="87CB1389FBF8BFACC48814311296A163">
    <meta name="viewport" content="user-scalable=no, width=device-width">
    <meta http-equiv="ScreenOrientation" content="autoRotate:disabled">
    <meta name="theme-color" content="#d9232d">
    <meta name="apple-mobile-web-app-status-bar-style" content="#ffffff">
    <meta name="theme-color" content="#ffffff">
    <meta name="theme-color" content="#1976d2">

In [5]:
# Fetch the listing page, reusing the browser-like headers defined in the previous cell
webpage=requests.get('https://www.moglix.com/solar/solar-panels/213110000', headers=headers).text

In [6]:
soup=BeautifulSoup(webpage,'lxml')

In [7]:
# Each product card keeps its details in a div with class 'product-card-lower-inner'
company=soup.find_all('div',class_='product-card-lower-inner')

In [8]:
len(company)

40

## To find out the product names

In [9]:
name = []
for i in company:
    name.append(i.find('div',class_='name').text.strip())

In [10]:
name

['Luminous 550W 24V Mono PERC Halfcut Solar Panel',
 'Waaree Bi-55-540 540W 144 Cells Framed Dual Glass Mono PERC Bifacial Solar Module ...',
 'Solar Universe 335W 24V Polycrystalline Solar Panel, SUI-335',
 'Waaree 540W 144 Cells Monocrystalline PERC Solar Panel, WSMD-540 ...',
 'Waaree 535Wp 144 Cells Framed Dual Glass Mono PERC Bifacial Solar Module, Bi-55-535 ...',
 'Luminous LUM 12170 170W Polycrystalline Solar Panel',
 'Luminous LUM 12170 170W Polycrystalline Solar Panel (Pack of 2) ...',
 'Solar Universe India 180W 12V Monocrystalline Solar Panel',
 'UTL 165W 12V Polycrystalline Solar PV Panel',
 'SUI Solar Panel 10w 12v',
 'Solar Universe India 425W Bifacial Monocrystalline Solar Panel (Pack of 2) ...',
 'Solar Universe India 20W Solar Panel',
 'SUI Solar Silicon & Aluminium 40 Watt Panel (Blue)',
 'Solar Universe 400W 24V Monocrystalline Solar Panel, SUI-400',
 'Solar Universe India 425W Bifacial Monocrystalline Solar Panel ...',
 'SUI 200W Solar Panel 12V (2 Units)',
 'Waaree
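
The product names above embed each panel's rated power (for example `550W`, `535Wp`, `40 Watt`). As a rough sketch, assuming the first number followed by a watt unit is the rating, a hypothetical helper `extract_watts` (not part of the original notebook) could pull that figure out so panels can later be compared by power:

```python
import re

# A number followed by a power unit: W, Wp, Watt or kW (case-insensitive).
# The pattern is an assumption based on the names listed above.
WATT_PATTERN = re.compile(r'(\d+(?:\.\d+)?)\s*(kw|wp|watt|w)\b', re.IGNORECASE)


def extract_watts(product_name):
    """Return the rated power in watts, or None if no rating is found."""
    match = WATT_PATTERN.search(product_name)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2).lower()
    # Convert kilowatts to watts; every other unit is already in watts.
    return value * 1000 if unit == 'kw' else value


sample_names = [
    'Luminous 550W 24V Mono PERC Halfcut Solar Panel',
    'Luminous LUM 12170 170W Polycrystalline Solar Panel',
    'SUI Solar Silicon & Aluminium 40 Watt Panel (Blue)',
    'SUI Solar Panel 10w 12v',
]
for product in sample_names:
    print(product, '->', extract_watts(product))
```

Model numbers such as `12170` are skipped because they are not directly followed by a unit. Names without any rating return `None`.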

## To find out the prices

In [11]:
purchase_price = []
for i in company:
    purchase_price.append(i.find('div',class_='prod-selling-price').text.strip())
    

In [12]:
purchase_price

['₹16,409',
 '₹12,799',
 '₹10,899',
 '₹12,699',
 '₹12,899',
 '₹5,799',
 '₹12,639',
 '₹8,699',
 '₹6,099',
 '₹699',
 '₹36,850',
 '₹1,330',
 '₹1,909',
 '₹15,099',
 '₹16,500',
 '₹16,610',
 '₹12,499',
 '₹1,390',
 '₹12,399',
 '₹3,370',
 '₹18,400',
 '₹102,359',
 '₹41,610',
 '₹16,919',
 '₹8,199',
 '₹8,000',
 '₹8,000',
 '₹5,980',
 '₹39,420',
 '₹24,399',
 '₹560',
 '₹24,000',
 '₹6,129',
 '₹25,379',
 '₹16,950',
 '₹3,200',
 '₹6,330',
 '₹3,090',
 '₹16,300',
 '₹33,999']
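
The prices are scraped as text with a currency symbol and thousands separators, so they cannot be sorted or compared numerically yet. A minimal sketch of the cleaning step, using a hypothetical helper `parse_rupees` and assuming every price follows the `₹12,345` pattern seen above:

```python
def parse_rupees(price_text):
    """Convert a string such as '₹16,409' to the integer 16409."""
    digits = price_text.replace('₹', '').replace(',', '').strip()
    return int(digits)


# A few values copied from the output above.
sample_prices = ['₹16,409', '₹699', '₹102,359']
print([parse_rupees(p) for p in sample_prices])
```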

## To find out the discounts

In [13]:
discount = []
for i in company:
    price_div = i.find('div', class_='price')
    if price_div:
        discount_span = price_div.find('span', class_='prod-discount')
        if discount_span:
            discount.append(discount_span.text.strip())
        else:
            discount.append(None)
    else:
        discount.append(None)

In [14]:
discount

['60% OFF',
 '55% OFF',
 '31% OFF',
 '53% OFF',
 '42% OFF',
 '61% OFF',
 '57% OFF',
 '10% OFF',
 '36% OFF',
 '39% OFF',
 '38% OFF',
 '34% OFF',
 '30% OFF',
 '31% OFF',
 '45% OFF',
 '22% OFF',
 '32% OFF',
 '35% OFF',
 '28% OFF',
 '20% OFF',
 '8% OFF',
 '1% OFF',
 '18% OFF',
 '32% OFF',
 '23% OFF',
 '21% OFF',
 '20% OFF',
 '60% OFF',
 '6% OFF',
 '57% OFF',
 '18% OFF',
 '56% OFF',
 '22% OFF',
 '44% OFF',
 '10% OFF',
 '31% OFF',
 '43% OFF',
 '18% OFF',
 '19% OFF',
 '29% OFF']
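
The discounts are also text (`'60% OFF'`), and the loop above stores `None` when a card has no discount. A small sketch of converting them to numbers, with a hypothetical helper `parse_discount` that assumes the `NN% OFF` format shown in the output:

```python
def parse_discount(discount_text):
    """Convert '60% OFF' to 60; return None when no discount was scraped."""
    if discount_text is None:
        return None
    # Keep only the part before the percent sign.
    return int(discount_text.split('%')[0])


# Two values from the output above, plus None for a card without a discount.
sample_discounts = ['60% OFF', '8% OFF', None]
print([parse_discount(d) for d in sample_discounts])
```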

## To find out the ratings

In [15]:
review = []
for i in company:
    rating_span = i.find('span', class_='avgrating')
    if rating_span:  # Check if the span element was found
        review.append(rating_span.text.strip())
    else:
        review.append('N/A')  # Or handle the absence of the span element as needed

print(review)

['4.6', '4.0', '4.5', '4.5', '4.6', '4.6', '4.6', '4.4', '4.2', '4.3', '4.3', '3.8', '4.5', '4.4', '4.0', '4.8', '4.6', '4.2', '4.6', '4.2', '4.4', '4.2', '4.0', '4.6', '4.2', '4.3', '4.0', '4.3', '4.0', '4.2', '4.4', '4.6', '4.4', '4.6', '4.3', '4.7', '4.7', '4.3', '4.8', '4.8']
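
The ratings are strings, with `'N/A'` as the placeholder for a missing rating. A sketch of turning them into floats while leaving out the placeholder; `parse_rating` is a hypothetical helper, and the sample list includes an `'N/A'` entry purely for illustration (the scraped output above contains none):

```python
from statistics import mean


def parse_rating(rating_text):
    """Convert '4.6' to 4.6; return None for the 'N/A' placeholder."""
    if rating_text == 'N/A':
        return None
    return float(rating_text)


sample_ratings = ['4.6', '4.0', 'N/A', '4.5']
values = [parse_rating(r) for r in sample_ratings]
available = [v for v in values if v is not None]

print(values)
print(round(mean(available), 2))  # average over the rated products only
```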


## To find out the original prices (MRP)

In [16]:
real_price = []
for i in company:   
    real_price_span = i.find('span', class_='prod-mrp')
    real_price.append(real_price_span.text.strip() if real_price_span else np.nan)
    

In [17]:
real_price

['₹42,000',
 '₹28,869',
 '₹16,000',
 '₹27,279',
 '₹22,399',
 '₹15,000',
 '₹30,000',
 '₹9,750',
 '₹9,570',
 '₹1,155',
 '₹60,000',
 '₹2,018',
 '₹2,730',
 '₹22,000',
 '₹30,200',
 '₹21,525',
 '₹18,480',
 '₹2,152',
 '₹17,360',
 '₹4,253',
 '₹20,100',
 '₹103,425',
 '₹50,925',
 '₹25,000',
 '₹10,750',
 '₹10,250',
 '₹10,000',
 '₹14,990',
 '₹42,195',
 '₹57,719',
 '₹683',
 '₹54,549',
 '₹7,875',
 '₹46,000',
 '₹19,000',
 '₹4,650',
 '₹11,290',
 '₹3,779',
 '₹20,250',
 '₹48,000']
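
With both the selling price and the MRP available, the listed discount can be cross-checked. The sketch below assumes the site rounds the percentage down to a whole number; that assumption matches the first three products above but is not confirmed for the rest. `discount_percent` is a hypothetical helper and the prices are already converted to integers:

```python
def discount_percent(selling_price, mrp):
    """Whole-number percentage saved relative to the MRP, rounded down."""
    return (mrp - selling_price) * 100 // mrp


# (selling price, MRP, listed discount) for the first three products above.
rows = [
    (16409, 42000, 60),
    (12799, 28869, 55),
    (10899, 16000, 31),
]
for selling, mrp, listed in rows:
    computed = discount_percent(selling, mrp)
    print(selling, mrp, 'listed:', listed, 'computed:', computed)
```

A mismatch between the listed and computed values would be a hint that a field was scraped from the wrong element.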

## Now we will create solar_panel.csv

In [18]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re

final = []

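for j in range(1, 1000):
    # NOTE: the '213110000={}' suffix is kept as written, but it does not look like a
    # valid page selector: the output below has 360 rows (9 x 40) and its last rows
    # repeat the first page's prices. Confirm the site's pagination parameter.
    webpage = requests.get('https://www.moglix.com/solar/solar-panels/213110000={}'.format(j), timeout=30).text
    soup = BeautifulSoup(webpage, 'lxml')
    company = soup.find_all('div', class_='product-card-lower-inner')
    if not company:
        break  # stop at the first page without product cards instead of requesting all 999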
    
    name = []
    price = []
    discount = []
    real_price = []
    review = []
    all_links = []

    for i in company:
        # Name and selling price: find() returns None when the tag is missing,
        # so .text raises AttributeError and the value falls back to NaN
        try:
            name.append(i.find('div', class_='name').text.strip())
        except AttributeError:
            name.append(np.nan)

        try:
            price.append(i.find('div', class_='prod-selling-price').text.strip())
        except AttributeError:
            price.append(np.nan)

        # Discount, MRP and rating are optional on a card
        discount_span = i.find('span', class_='prod-discount')
        discount.append(discount_span.text.strip() if discount_span else np.nan)

        real_price_span = i.find('span', class_='prod-mrp')
        real_price.append(real_price_span.text.strip() if real_price_span else np.nan)

        rating_span = i.find('span', class_='avgrating')
        review.append(rating_span.text.strip() if rating_span else np.nan)

        # Product link: the first href found inside the name div
        description = i.find('div', class_='name')
        link = description.find(href=True) if description else None
        all_links.append(link['href'] if link else np.nan)

    # Ensure all lists are of the same length
    max_length = max(len(name), len(price), len(discount), len(real_price), len(review), len(all_links))

    name.extend([np.nan] * (max_length - len(name)))
    price.extend([np.nan] * (max_length - len(price)))
    discount.extend([np.nan] * (max_length - len(discount)))
    real_price.extend([np.nan] * (max_length - len(real_price)))
    review.extend([np.nan] * (max_length - len(review)))
    all_links.extend([np.nan] * (max_length - len(all_links)))


    df = pd.DataFrame({
        'name': name,
        'purchase_price': price,
        'discount': discount,
        'real_price': real_price,
        'rating': review,
        'Page URL': all_links
    })
    
    final.append(df)

final_df = pd.concat(final, ignore_index=True)
print(final_df)


                                                  name purchase_price  \
0      Luminous 550W 24V Mono PERC Halfcut Solar Panel        ₹16,409   
1    Waaree Bi-55-540 540W 144 Cells Framed Dual Gl...        ₹12,799   
2    Solar Universe 335W 24V Polycrystalline Solar ...        ₹10,899   
3    Waaree 540W 144 Cells Monocrystalline PERC Sol...        ₹12,699   
4    Waaree 535Wp 144 Cells Framed Dual Glass Mono ...        ₹12,899   
..                                                 ...            ...   
355  Solar Universe India 75W White Polycrystalline...         ₹3,200   
356              Solar Universe India 150W Solar Panel         ₹6,330   
357               Solar Universe India 60W Solar Panel         ₹3,090   
358  Solar Universe India 375W Monocrystalline Sola...        ₹16,300   
359  Solar Universe 1kW 24V Polycrystalline Solar P...        ₹33,999   

    discount real_price rating  \
0    60% OFF    ₹42,000    4.6   
1    55% OFF    ₹28,869    4.0   
2    31% OFF    ₹16,0
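
The multi-page loop needs a stopping rule, because a site may return an empty page or serve the first page again once the page number runs past the last one. The sketch below shows one way to stop in either case. `scrape_all_pages`, `fetch_page` and `fake_fetch` are hypothetical names; `fetch_page` stands in for the request-and-parse step and returns one tuple per product, and the fake data exists only to make the example runnable offline:

```python
def scrape_all_pages(fetch_page, max_pages=1000):
    """Collect rows page by page until a page adds nothing new.

    fetch_page is a callable: page number -> list of row tuples.
    """
    seen = set()
    rows = []
    for page in range(1, max_pages + 1):
        added = 0
        for row in fetch_page(page):
            if row not in seen:
                seen.add(row)
                rows.append(row)
                added += 1
        if added == 0:
            break  # empty page, or only rows that were already collected
    return rows


# Stand-in for the real site: two pages of products, after which
# any further page number returns page 1 again.
fake_site = {
    1: [('Panel A', '₹699'), ('Panel B', '₹1,330')],
    2: [('Panel C', '₹1,909')],
}


def fake_fetch(page):
    return fake_site.get(page, fake_site[1])


print(scrape_all_pages(fake_fetch))
```

Treating identical rows as repeats assumes that no two distinct products share every scraped field; including the product URL in each tuple makes that assumption safer.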

In [None]:
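# Drop repeated rows (the output above appears to contain the same products several times)
# and leave out the DataFrame index so it does not become an extra CSV column
final_df.drop_duplicates().to_csv('solar_panel.csv', index=False)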