# What is Web Scraping?
**Web scraping means automatically extracting data from websites using code. 
Instead of manually copying information from webpages,
you write a Python script (or use tools) to collect data programmatically.**


# How Scraping Usually Works:
**1. Send a Request ➔ Your code sends a request to a website URL.**

**2. Get the Response ➔ You receive HTML (page content) as text.**

**3. Parse the HTML ➔ Use libraries like BeautifulSoup, lxml, or selectolax to find and extract specific data (like headings, tables, prices, etc.).**

**4. Store the Data ➔ Save it into CSV, JSON, a database, or wherever you want.**
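The four steps above can be sketched in a few lines. This is a minimal example, not the notebook's Flipkart code: the HTML string, the `product`/`price` class names and the `parse_products` helper are all hypothetical, and step 1 (the network request) is replaced by a hardcoded string so the sketch runs offline.

```python
import json

from bs4 import BeautifulSoup

# Steps 1-2 stand-in: hypothetical HTML, so the example runs without a network call.
# In practice this string would come from requests.get(url, headers=...).text
SAMPLE_HTML = """
<div class="product"><h2>Phone A</h2><span class="price">49999</span></div>
<div class="product"><h2>Phone B</h2><span class="price">59999</span></div>
"""


def parse_products(html):
    """Step 3: parse the HTML and extract one dict per product box."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for box in soup.find_all("div", class_="product"):
        products.append({
            "name": box.h2.get_text(strip=True),
            "price": int(box.find("span", class_="price").get_text(strip=True)),
        })
    return products


products = parse_products(SAMPLE_HTML)
# Step 4: store the data, here as a JSON string (write it to a file in real use).
print(json.dumps(products, indent=2))
```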


In [49]:
import requests
from bs4 import BeautifulSoup 
from urllib.request import urlopen 


In [2]:
!pip install requests beautifulsoup4





In [4]:
flipkart_url = "https://www.flipkart.com/search?q=" + "iphone12pro"
# urlClient=urlopen(flipkart_url)
# flipcart_page=urlClient.read()
# flipkart_html=BeautifulSoup(flipcart_page,'html.parser')
# flipkart_html


# import requests
# from bs4 import BeautifulSoup

# # Corrected URL
# flipkart_url = "https://www.flipkart.com/search?q=" + "iphone12pro"

# # Sending the GET request
# headers = {
#     "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
# }
# response = requests.get(flipkart_url, headers=headers)

# # Parsing the page with BeautifulSoup
# flipkart_html = BeautifulSoup(response.text, 'html.parser')

# # Printing the parsed HTML
# print(flipkart_html)


In [5]:
flipkart_url

'https://www.flipkart.com/search?q=iphone12pro'

In [7]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

# ***Sending the GET request***

In [9]:
# from urllib.request import Request
# request = Request(flipkart_url, headers=headers)
# # Opening the URL with urlopen
# urlClient = urlopen(request)

#or

response = requests.get(flipkart_url, headers=headers)

# Read the response body as text

In [15]:
response.text   #urlClient.read()

'<!doctype html><html lang="en"><head><link href="https://rukminim2.flixcart.com" rel="preconnect"/><link rel="stylesheet" href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/atlas.chunk.8dd48d.css"/><link rel="stylesheet" href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app_modules.chunk.c48a12.css"/><link rel="stylesheet" href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app.chunk.e4e719.css"/><meta http-equiv="Content-type" content="text/html; charset=utf-8"/><meta http-equiv="X-UA-Compatible" content="IE=Edge"/><meta property="fb:page_id" content="102988293558"/><meta property="fb:admins" content="658873552,624500995,100000233612389"/><link rel="shortcut icon" href="https://static-assets-web.flixcart.com/www/promos/new/20150528-140547-favicon-retina.ico"/><link type="application/opensearchdescription+xml" rel="search" href="/osdd.xml?v=2"/><meta property="og:type" content="website"/><meta name="og_site_name" property=

# **Handle It: 'Site is overloaded'**

If the site answers with an "overloaded" message instead of the page, a retry loop lets your code wait a few seconds and try again instead of crashing.

In [98]:
# import time
# import requests

# url = flipkart_url  # replace with your own URL
# max_retries = 5

# for attempt in range(max_retries):
#     response = requests.get(url, headers=headers)

#     if "overloaded" not in response.text.lower():
#         print("Success:", response.text)
#         break
#     else:
#         print(f"Attempt {attempt + 1}: Site overloaded, retrying...")
#         time.sleep(2)  # Wait 2 seconds before retrying
# else:
#     print("Failed after multiple retries.")
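The fixed 2-second wait above retries at a constant rate. A common refinement is exponential backoff with random jitter, so repeated retries spread out instead of hammering the server. The sketch below is an assumption-laden illustration, not the notebook's method: `fetch_with_retries` is a hypothetical helper, treating HTTP 429/503 as "overloaded" is an added assumption, and `fetch` is any function you supply that returns `(status_code, text)`.

```python
import random
import time


def fetch_with_retries(fetch, max_retries=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fetch() until the site stops reporting overload; return the page text.

    fetch is any zero-argument callable that returns (status_code, text).
    """
    for attempt in range(max_retries):
        status, text = fetch()
        # Treat HTTP 429/503 or an "overloaded" message in the body as "try again later".
        if status not in (429, 503) and "overloaded" not in text.lower():
            return text
        if attempt < max_retries - 1:
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay, with full jitter:
            # wait a random time between 0 and the capped delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise RuntimeError(f"Site still overloaded after {max_retries} attempts")


# Possible adapter for the notebook (hypothetical, not executed here):
# def fetch():
#     r = requests.get(flipkart_url, headers=headers, timeout=10)
#     return r.status_code, r.text
# html = fetch_with_retries(fetch)

# Offline demo: the first response is "overloaded", the second succeeds.
fake_responses = iter([(503, "Site is overloaded"), (200, "<html>ok</html>")])
print(fetch_with_retries(lambda: next(fake_responses), sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the demo instant and makes the wait schedule easy to inspect; in real use leave it as `time.sleep`.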


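# Best Professional Way: Use the tenacity Library to ***Handle 'Site is overloaded'***
The tenacity library handles retries automatically. `wait_exponential` gives exponential backoff; for backoff with random jitter, use `wait_random_exponential` instead. Either is a better fit for production code than a hand-written loop.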

In [100]:
# from tenacity import retry, wait_exponential, stop_after_attempt
# import requests

# @retry(wait=wait_exponential(multiplier=1, min=2, max=60), stop=stop_after_attempt(10))
# def fetch_data():
#     response = requests.get(flipkart_url, headers=headers)
#     if "overloaded" in response.text.lower():
#         raise Exception("Site overloaded")
#     return response

# try:
#     data = fetch_data()
#     print(data.text)
# except Exception as e:
#     print("Failed after retries:", str(e))

# Parsing the page with BeautifulSoup

In [17]:
#flipcart_page=urlClient.read()
#flipkart_html=BeautifulSoup(flipcart_page,'html.parser')
#flipkart_html

#or


flipkart_html = BeautifulSoup(response.text, 'html.parser')
#print(flipkart_html)
print(flipkart_html.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <link href="https://rukminim2.flixcart.com" rel="preconnect"/>
  <link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/atlas.chunk.8dd48d.css" rel="stylesheet"/>
  <link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app_modules.chunk.c48a12.css" rel="stylesheet"/>
  <link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app.chunk.e4e719.css" rel="stylesheet"/>
  <meta content="text/html; charset=utf-8" http-equiv="Content-type"/>
  <meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
  <meta content="102988293558" property="fb:page_id"/>
  <meta content="658873552,624500995,100000233612389" property="fb:admins"/>
  <link href="https://static-assets-web.flixcart.com/www/promos/new/20150528-140547-favicon-retina.ico" rel="shortcut icon"/>
  <link href="/osdd.xml?v=2" rel="search" type="application/opensearchdescription+xml"/>
  <meta content="website" property="og:type"/>

In [19]:
big_box=flipkart_html.find_all("div",{"class":"cPHDOP col-12-12"})

In [21]:
len(big_box)

29
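Passing a string with a space, such as `"cPHDOP col-12-12"`, to `find_all` matches the class attribute as one exact string, so the class order must match the page. A CSS selector via `select()` instead requires both classes in any order. The tiny HTML below is made up purely to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same two classes in different orders, plus a single-class div.
html = """
<div class="cPHDOP col-12-12">A</div>
<div class="col-12-12 cPHDOP">B</div>
<div class="cPHDOP">C</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact string match on the whole class attribute: order matters.
exact = soup.find_all("div", {"class": "cPHDOP col-12-12"})
# CSS selector: the element must carry both classes, in any order.
both = soup.select("div.cPHDOP.col-12-12")

print([d.text for d in exact])  # ['A']
print([d.text for d in both])   # ['A', 'B']
```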

In [23]:
del big_box[0:2]
del big_box[-3:]

In [213]:
len(big_box)

23

In [25]:
big_box[1]

<div class="cPHDOP col-12-12"><div class="_75nlfW"><div data-id="MOBFWBYZVRPH2UCD" style="width:100%"><div class="tUxRFH"><a class="CGtC98" href="/apple-iphone-12-pro-pacific-blue-256-gb/p/itmea897274baa30?pid=MOBFWBYZVRPH2UCD&amp;lid=LSTMOBFWBYZVRPH2UCDKBB2PK&amp;marketplace=FLIPKART&amp;q=iphone12pro&amp;store=tyy%2F4io&amp;srno=s_1_2&amp;otracker=search&amp;fm=organic&amp;iid=e6be7059-7006-4eb4-8194-ab8b6f878a72.MOBFWBYZVRPH2UCD.SEARCH&amp;ppt=None&amp;ppn=None&amp;ssid=k9srvsi48w0000001745926151801&amp;qH=712933e6bd68e7b9" rel="noopener noreferrer" target="_blank"><div class="Otbq5D"><div class="yPq5Io"><div><div class="_4WELSP" style="height:200px;width:200px"><img alt="Apple iPhone 12 Pro (Pacific Blue, 256 GB)" class="DByuf4" loading="eager" src="https://rukminim2.flixcart.com/image/312/312/kg8avm80/mobile/u/c/d/apple-iphone-12-pro-dummyapplefsn-original-imafwgbrzxg3nggd.jpeg?q=70"/></div></div><div class="DShtpz"><span class="vfSpSs">Currently unavailable</span></div></div><div

In [27]:
big_box[3].div.div.div.a['href']

'/apple-iphone-12-pro-silver-512-gb/p/itm0ccf9fc219a71?pid=MOBFWBYZ5UY6ZBVA&lid=LSTMOBFWBYZ5UY6ZBVAWNVLCR&marketplace=FLIPKART&q=iphone12pro&store=tyy%2F4io&srno=s_1_4&otracker=search&fm=organic&iid=e6be7059-7006-4eb4-8194-ab8b6f878a72.MOBFWBYZ5UY6ZBVA.SEARCH&ppt=None&ppn=None&ssid=k9srvsi48w0000001745926151801&qH=712933e6bd68e7b9'

In [29]:
"https://www.flipkart.com"+big_box[3].div.div.div.a['href'] 

'https://www.flipkart.com/apple-iphone-12-pro-silver-512-gb/p/itm0ccf9fc219a71?pid=MOBFWBYZ5UY6ZBVA&lid=LSTMOBFWBYZ5UY6ZBVAWNVLCR&marketplace=FLIPKART&q=iphone12pro&store=tyy%2F4io&srno=s_1_4&otracker=search&fm=organic&iid=e6be7059-7006-4eb4-8194-ab8b6f878a72.MOBFWBYZ5UY6ZBVA.SEARCH&ppt=None&ppn=None&ssid=k9srvsi48w0000001745926151801&qH=712933e6bd68e7b9'
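String concatenation works here because every `href` starts with `/`. The standard library's `urllib.parse.urljoin` handles the other cases too: it leaves absolute URLs alone and resolves protocol-relative links like the `//static-assets-web.flixcart.com/...` stylesheets in the page head. The relative href below is a shortened version of the one printed above.

```python
from urllib.parse import urljoin

BASE = "https://www.flipkart.com"

# Shortened version of the relative href printed above (query string trimmed).
relative_href = "/apple-iphone-12-pro-silver-512-gb/p/itm0ccf9fc219a71?pid=MOBFWBYZ5UY6ZBVA"

# A path-relative href is resolved against the base URL.
print(urljoin(BASE, relative_href))
# https://www.flipkart.com/apple-iphone-12-pro-silver-512-gb/p/itm0ccf9fc219a71?pid=MOBFWBYZ5UY6ZBVA

# A protocol-relative href ("//host/path") takes its scheme from the base URL;
# plain concatenation would produce a broken "https://www.flipkart.com//static-..." URL.
print(urljoin(BASE, "//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app.chunk.e4e719.css"))
# https://static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app.chunk.e4e719.css
```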

In [31]:
for i in big_box:
    print("https://www.flipkart.com"+i.div.div.div.a['href'])


#or
# for i in big_box:
#     if i.div and i.div.div and i.div.div.div and i.div.div.div.a:
#         print(i.div.div.div.a['href'])
#     else:
#         print("No anchor tag found")


https://www.flipkart.com/apple-iphone-12-pro-gold-128-gb/p/itma14a108237af5?pid=MOBFWBYZMDJZMHA9&lid=LSTMOBFWBYZMDJZMHA9SVNUXV&marketplace=FLIPKART&q=iphone12pro&store=tyy%2F4io&srno=s_1_1&otracker=search&fm=organic&iid=e6be7059-7006-4eb4-8194-ab8b6f878a72.MOBFWBYZMDJZMHA9.SEARCH&ppt=None&ppn=None&ssid=k9srvsi48w0000001745926151801&qH=712933e6bd68e7b9
https://www.flipkart.com/apple-iphone-12-pro-pacific-blue-256-gb/p/itmea897274baa30?pid=MOBFWBYZVRPH2UCD&lid=LSTMOBFWBYZVRPH2UCDKBB2PK&marketplace=FLIPKART&q=iphone12pro&store=tyy%2F4io&srno=s_1_2&otracker=search&fm=organic&iid=e6be7059-7006-4eb4-8194-ab8b6f878a72.MOBFWBYZVRPH2UCD.SEARCH&ppt=None&ppn=None&ssid=k9srvsi48w0000001745926151801&qH=712933e6bd68e7b9
https://www.flipkart.com/apple-iphone-12-pro-graphite-256-gb/p/itm4fa4da575698c?pid=MOBFWBYZBA36UB7G&lid=LSTMOBFWBYZBA36UB7GZYS7EA&marketplace=FLIPKART&q=iphone12pro&store=tyy%2F4io&srno=s_1_3&otracker=search&fm=organic&iid=e6be7059-7006-4eb4-8194-ab8b6f878a72.MOBFWBYZBA36UB7G.SEARCH
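The chain `i.div.div.div.a` assumes every box has exactly that nesting; if one level is missing, `.div` returns `None` and the next access raises `AttributeError`. A sketch of a more tolerant version searches for the first `<a>` with an `href` at any depth and skips boxes without one. The markup and the `product_links` helper are hypothetical.

```python
from bs4 import BeautifulSoup

BASE_URL = "https://www.flipkart.com"

# Hypothetical markup: a deeply nested link, a box with no link, and a shallow link.
html = """
<div class="cPHDOP col-12-12"><div><div><div><a href="/item-one/p/1">One</a></div></div></div></div>
<div class="cPHDOP col-12-12"><span>Sponsored banner</span></div>
<div class="cPHDOP col-12-12"><div><a href="/item-two/p/2">Two</a></div></div>
"""


def product_links(boxes, base=BASE_URL):
    """Return full URLs for boxes that contain a link, skipping the rest."""
    links = []
    for box in boxes:
        anchor = box.find("a", href=True)  # first <a> with an href, at any depth
        if anchor is None:
            continue  # box.div.div.div.a would raise AttributeError for this box
        links.append(base + anchor["href"])
    return links


soup = BeautifulSoup(html, "html.parser")
boxes = soup.find_all("div", {"class": "cPHDOP col-12-12"})
print(product_links(boxes))
# ['https://www.flipkart.com/item-one/p/1', 'https://www.flipkart.com/item-two/p/2']
```

Skipping link-less boxes this way also reduces the need to delete header and footer boxes by fixed positions, which breaks if the page layout shifts.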

In [211]:
## Build the link for a single product
"https://www.flipkart.com"+big_box[4].div.div.div.a['href']

'https://www.flipkart.com/apple-iphone-12-pro-graphite-128-gb/p/itm03e5f2595d843?pid=MOBFWBYZBZ7Y56WD&lid=LSTMOBFWBYZBZ7Y56WDLRWIKS&marketplace=FLIPKART&q=iphone12pro&store=tyy%2F4io&srno=s_1_5&otracker=search&fm=organic&iid=e6be7059-7006-4eb4-8194-ab8b6f878a72.MOBFWBYZBZ7Y56WD.SEARCH&ppt=None&ppn=None&ssid=k9srvsi48w0000001745926151801&qH=712933e6bd68e7b9'

# We can open each product URL from big_box[i] and scrape the pages one by one

**Or we can loop over all the product pages and scrape them at once; the code for that is at the end.**


In [217]:
product_link="https://www.flipkart.com"+big_box[22].div.div.div.a['href']

In [219]:
product_req=requests.get(product_link, headers=headers)

In [220]:
product_html=BeautifulSoup(product_req.text,'html.parser')
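When many product pages are fetched in a row, a `requests.Session` sends the same headers on every request and reuses connections. Adding a timeout and `raise_for_status()` makes failures explicit instead of silently parsing an error page. This is a sketch: `make_session` and `get_html` are hypothetical helpers, and the 10-second timeout is an arbitrary assumption.

```python
import requests

# User-Agent string copied from the notebook's `headers` cell.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}


def make_session(headers=HEADERS):
    """Create a Session that sends these headers on every request and reuses connections."""
    session = requests.Session()
    session.headers.update(headers)
    return session


def get_html(session, url, timeout=10):
    """Fetch url and return its HTML; raise for 4xx/5xx responses instead of parsing them."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


session = make_session()
# Usage in the notebook (network call, not executed here):
# product_html = BeautifulSoup(get_html(session, product_link), "html.parser")
```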

In [221]:
comment_box=product_html.find_all('div',{'class':'RcXBOT'})

In [222]:
len(comment_box)

11

# Name of customer

In [228]:
comment_box[0].div.div.find_all('p',{'class':'_2NsDsF AwS1CA'})[0].text

'Flipkart Customer'

In [230]:
comment_box[0].find_all('p',{'class':'_2NsDsF AwS1CA'})[0].text

'Flipkart Customer'

In [232]:
for i in comment_box:
    print(i.find_all('p',{'class':'_2NsDsF AwS1CA'})[0].text)

Flipkart Customer
Puneeth Kumar P M
Flipkart Customer
Abinash Mohanty
Mukesh Thakor
Aman  Kamboj
Flipkart Customer
Chirag  bansal
Abhishek Tyagi
Flipkart Customer


IndexError: list index out of range

To handle the **list index out of range** error, check that the expected tags exist before indexing:

In [235]:
for i in comment_box:
    if i.div and i.div.div and i.div.div.p:
        print(i.find_all('p',{'class':'_2NsDsF AwS1CA'})[0].text)
    else:
        print("No anchor tag found")

Flipkart Customer
Puneeth Kumar P M
Flipkart Customer
Abinash Mohanty
Mukesh Thakor
Aman  Kamboj
Flipkart Customer
Chirag  bansal
Abhishek Tyagi
Flipkart Customer
No anchor tag found
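The guard above only checks that some `<p>` exists at `i.div.div.p`, not that the `_2NsDsF AwS1CA` paragraph itself exists. A sketch of a more direct approach uses `find()`, which returns `None` when nothing matches, so there is no list to index. The `text_or_default` helper and the second (review-less) box in the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def text_or_default(parent, tag, css_class, default=""):
    """Return the stripped text of the first matching tag, or default if there is none."""
    found = parent.find(tag, {"class": css_class})
    return found.get_text(strip=True) if found is not None else default


# Hypothetical markup: one real review box and one trailing box without a name.
html = """
<div class="RcXBOT"><p class="_2NsDsF AwS1CA">Flipkart Customer</p></div>
<div class="RcXBOT"><span>Load more reviews</span></div>
"""
soup = BeautifulSoup(html, "html.parser")
for box in soup.find_all("div", {"class": "RcXBOT"}):
    print(text_or_default(box, "p", "_2NsDsF AwS1CA", default="(no name)"))
# Flipkart Customer
# (no name)
```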


In [237]:
del comment_box[-1:]  # drop the last element (it holds no review)


In [239]:
for i in comment_box:
    if i.div and i.div.div and i.div.div.p:
        print(i.find_all('p',{'class':'_2NsDsF AwS1CA'})[0].text)
    else:
        print("No anchor tag found")

Flipkart Customer
Puneeth Kumar P M
Flipkart Customer
Abinash Mohanty
Mukesh Thakor
Aman  Kamboj
Flipkart Customer
Chirag  bansal
Abhishek Tyagi
Flipkart Customer


# Customer review title

In [242]:
for i in comment_box:
    print(i.find_all('p',{'class':'z9E0IG'})[0].text)

Worth every penny
Nice product
Perfect product!
Awesome
Highly recommended
Fabulous!
Terrific purchase
Brilliant
Wonderful
Best in the market!


# Rating 

In [245]:
for i in comment_box:
    print(i.div.div.div.div.text)

5
4
5
5
5
5
5
5
4
5


In [247]:
# or
for i in comment_box:
    print(i.find_all('div',{'class':'XQDdHH Ga3i8K'})[0].text)

5
4
5
5
5
5
5
5
4
5
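The ratings come back as text. Converting them to integers makes simple summaries possible. The list below is copied from the output above; the digit check that skips non-numeric text is an added safeguard, not part of the original notebook.

```python
from collections import Counter
from statistics import mean

# Rating strings as printed by the cell above (each is the text of a rating <div>).
rating_texts = ["5", "4", "5", "5", "5", "5", "5", "5", "4", "5"]

# Convert to integers, skipping anything that is not a plain digit string.
ratings = [int(t) for t in rating_texts if t.strip().isdigit()]

print(mean(ratings))     # 4.8
print(Counter(ratings))  # Counter({5: 8, 4: 2})
```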


# Comment text

In [250]:
for i in comment_box:
    print(i.find_all('div',{'class':''})[0].text)

Using it since last week.The display, photo and performance are best. Touch sensitivity is best. Photos are so natural, night time photos are too good. Photo quality after zooming is also good. Videos are also good. Better to buy Pro Max for longer battery backup. After heavy use, watching movies, battery consumption is 70% per day in average (no idea about backup for game users). Regarding weight, I am used to it. Dint find much difference from Samsung M31S and Flagship models. You can hold ...READ MORE
First iPhone, Battery drain faster, camera quality is awesome.READ MORE
Lovely phone ❤️❤️ i love this 🥰🥰READ MORE
The product is just Awesome  love it 6gbRAM 256gb its enough worth of money.Love the product,good build in camera.Magnificient phn,no other phones can replace it.READ MORE
I love iPhoneREAD MORE
love this phoneREAD MORE
Highly appreciable after sell service from apple team. I faced issue due to which I rated it only 1 star but now I am changing to 5 star as I 
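Each comment above ends with the "READ MORE" link label glued to the text. A small cleanup function can strip it before storing. Judging from the output, a comment ending in "..." appears to be truncated on this page; treating that as a truncation marker is an assumption. Both helpers are hypothetical.

```python
def clean_comment(text, suffix="READ MORE"):
    """Strip whitespace and a trailing 'READ MORE' link label from a scraped comment."""
    text = text.strip()
    if text.endswith(suffix):
        text = text[: -len(suffix)]
    return text.strip()


def looks_truncated(cleaned):
    """Assumption: a cleaned comment ending in '...' was cut short by the page."""
    return cleaned.endswith("...")


print(clean_comment("I love iPhoneREAD MORE"))             # I love iPhone
print(looks_truncated(clean_comment("You can hold ...READ MORE")))  # True
```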

In [268]:
for box in big_box:
    product_link = "https://www.flipkart.com" + box.div.div.div.a['href']
    product_req = requests.get(product_link, headers=headers)
    product_html = BeautifulSoup(product_req.text, 'html.parser')
    comment_box = product_html.find_all('div', {'class': 'RcXBOT'})

    for comment in comment_box:
        if comment.div and comment.div.div and comment.div.div.p:
            print(comment.find_all('p', {'class': '_2NsDsF AwS1CA'})[0].text)  # customer name
            print(comment.find_all('p', {'class': 'z9E0IG'})[0].text)          # review title
            print(comment.div.div.div.div.text)                                # rating
            print(comment.find_all('div', {'class': ''})[0].text)              # comment text
            print("-" * 55)
        else:
            print("No anchor tag found")

Rajkumar tiwari
Simply awesome
5
indeed a great phone feels premium in hand but battery life is lowREAD MORE
-------------------------------------------------------
Tushar Saini
Fabulous!
5
The camera is mind-blowing I love it 😘READ MORE
-------------------------------------------------------
Aasheesh Vats
Fabulous!
5
A masterpiece to cherish.READ MORE
-------------------------------------------------------
Pankaj Mahor
Awesome
5
Best designed iPhoneREAD MORE
-------------------------------------------------------
Vinit Kar
Best in the market!
5
I have been using iPhones for years, but the fast charging and battery on this one is brilliant. Like i always say, iPhone is more of a jewellery than a phone. But this one has some power-packed features and an awesome camera. So if you can afford it, buy it!READ MORE
-------------------------------------------------------
Rahul  Meena
Highly recommended
5
I like the most in this phone:-1. I like the size of this phone because it is easy to 
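To store the results (step 4) instead of only printing them, the loop above could append one dict per review to a list and write that list to CSV with the standard `csv` module. In this sketch the column names, the `write_reviews` helper and the output file name are assumptions; the single record is built from one of the reviews printed above, with the emoji and "READ MORE" label removed.

```python
import csv
import io

FIELDNAMES = ["name", "title", "rating", "comment"]


def write_reviews(reviews, f):
    """Write review dicts to an open text file (or any file-like object) as CSV."""
    writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(reviews)


# Inside the scraping loop you would do reviews.append({...}) instead of print(...).
reviews = [
    {"name": "Tushar Saini", "title": "Fabulous!", "rating": 5,
     "comment": "The camera is mind-blowing I love it"},
]

# Preview the CSV in memory.
preview = io.StringIO()
write_reviews(reviews, preview)
print(preview.getvalue())

# In the notebook, write to a real file instead (hypothetical file name):
# with open("iphone12pro_reviews.csv", "w", newline="", encoding="utf-8") as f:
#     write_reviews(reviews, f)
```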