**Create a web scraping tool to extract data from websites, such as product prices, stock prices, and news articles**

**GET Requests**

In [10]:
import requests

# Making a GET request
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')

# print the URL that was requested
print(r.url)

# print the HTTP status code of the response
# 200 means the request succeeded
print(r.status_code)

# r.content holds the raw response body
# it is huge, so the print is commented out
# print(r.content)


https://www.geeksforgeeks.org/python-programming-language/
200
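
The request above assumes the server answers quickly and with a success code. A minimal sketch of a more defensive fetch follows. The helper name `fetch_html` and the 10-second timeout are assumptions made for illustration; they are not part of `requests`.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page body as text, or None if the request fails.

    fetch_html is a hypothetical helper for illustration, not part of requests.
    """
    try:
        # timeout stops the call from hanging forever on a slow server
        r = requests.get(url, timeout=timeout)
        # raise_for_status() raises HTTPError for 4xx and 5xx responses
        r.raise_for_status()
    except requests.exceptions.RequestException as exc:
        print(f"Request failed: {exc}")
        return None
    return r.text
```

Calling `fetch_html('https://www.geeksforgeeks.org/python-programming-language/')` should return the HTML as a string when the request succeeds, and `None` otherwise.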


**Now, to parse the content of the response above, let's use BeautifulSoup**

In [11]:
import requests
from bs4 import BeautifulSoup
 
 
# Making a GET request
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
 
# print the response object
# <Response [200]> indicates success
print(r)
 
# Parsing the HTML
soup = BeautifulSoup(r.content, 'html.parser')
# uncomment to print the parsed HTML document, nicely indented
# print(soup.prettify())

<Response [200]>


**Use tags to get the needed data**

In [16]:
import requests
from bs4 import BeautifulSoup
 
 
# Making a GET request
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
 
# Parsing the HTML
soup = BeautifulSoup(r.content, 'html.parser')
 
# Getting the title tag
print(soup.title)
 
# Getting the name of the tag
print(soup.title.name)
 
# Getting the name of parent tag
print(soup.title.parent.name)
 
# child nodes can be reached the same way,
# e.g. through the .children attribute

<title>Python Programming Language - GeeksforGeeks</title>
title
html
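
The same navigation works on any HTML, so it can be tried without a network call. Here is a small sketch on an inline HTML string (the markup is made up for illustration), showing `find_all` and attribute access:

```python
from bs4 import BeautifulSoup

# made-up HTML used only for this illustration
html = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <a href="/python">Python</a>
    <a href="/java">Java</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# .string gives the text inside the title tag
title_text = soup.title.string

# find_all returns every matching tag; .get reads an attribute
links = [(a.text, a.get("href")) for a in soup.find_all("a")]

print(title_text)
print(links)
```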


**Now, let's extract product categories from Urbanic**
- Urbanic is a dynamic site and bs4 can't render JavaScript, so we
    - use selenium
    - with a webdriver
- category container div class = "cate_box"
- category name div class = "cate_item"
- ![title](urbanic-categories.png)

In [9]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import time

# initiating the webdriver.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

#url of the page we want to scrape
url = "https://in.urbanic.com/category"

driver.get(url) 

# crude wait to give the page's JavaScript time to load the content
time.sleep(5) 
  
# page_source returns the HTML after the browser has run the JS,
# so the rendered content is available as static HTML
html = driver.page_source
  
# Now we can simply apply bs4 to the html variable
soup = BeautifulSoup(html, "html.parser")
all_divs = soup.find('div', {'class' : 'cate_box'})
categories = all_divs.find_all('div', {'class' : 'cate_item'})
  
# printing all categories
for category in categories :
    print(category.text)
  
driver.quit() # close the browser and end the webdriver session

 Dresses & Skirts 
 Tops 
 Bottoms 
 Sweaters & Sweatshirts 
 Co-ords 
 Outerwears 
 Sports 
 Jewelry 
 Bags 
 Lingerie 
 Pyjamas 
 Swimwear 
 Curve 
 Phone Accessories 
 Other Accessories 


**Let's extract product names and prices from Myntra**
- DOM STRUCTURE
- ('ul', {'class' : 'results-base'})
    - ('li', {'class' : 'product-base'})
        - ('a')
            - ('div', {'class' : 'product-productMetaInfo'})
                - ('h4', {'class' : 'product-product'}) --->product name here
                - ('div', {'class' : 'product-price'})
                    - ('span', {'class' : 'product-discountedPrice'}) --->product price here
- ![title](namesandprices.png)
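
The DOM structure listed above can be tried offline first. The sketch below runs against a made-up HTML snippet that mirrors that structure, and uses bs4's CSS selectors (`select`, `select_one`) as one possible alternative to chained `find` calls. The product name and price in the snippet are invented.

```python
from bs4 import BeautifulSoup

# made-up HTML that mirrors the DOM structure listed above
html = """
<ul class="results-base">
  <li class="product-base">
    <a href="#">
      <div class="product-productMetaInfo">
        <h4 class="product-product">Sample Kurta</h4>
        <div class="product-price">
          <span class="product-discountedPrice">Rs. 499</span>
        </div>
      </div>
    </a>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
# each li.product-base under ul.results-base is one product
for product in soup.select("ul.results-base li.product-base"):
    name = product.select_one("h4.product-product")
    price = product.select_one("span.product-discountedPrice")
    # skip products where either field is missing
    if name is None or price is None:
        continue
    rows.append((name.text.strip(), price.text.strip()))

print(rows)
```

Checking for `None` before reading `.text` keeps the loop from crashing on a product that lacks one of the two tags.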

In [31]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import time

# initiating the webdriver.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

#url of the page we want to scrape
url = "https://www.myntra.com/women-kurtas-kurtis-suits"

driver.get(url) 

# crude wait to give the page's JavaScript time to load the content
time.sleep(5) 
  
# page_source returns the HTML after the browser has run the JS,
# so the rendered content is available as static HTML
html = driver.page_source
  
# Now we can simply apply bs4 to the html variable
soup = BeautifulSoup(html, "html.parser")
all_divs = soup.find('ul', {'class' : 'results-base'})
products = all_divs.find_all('li', {'class' : 'product-base'})
  
# printing names and prices of products
for product in products :
    atag= product.find('a').find('div', {'class' : 'product-productMetaInfo'})
    name= atag.find('h4', {'class' : 'product-product'})    
    price = atag.find('div', {'class' : 'product-price'}).find('span', {'class' : 'product-discountedPrice'})
    
    print(name.text, price.text)

# print(products)

Printed A-Line Kurta Rs. 398
Women Yoke Design Kurta Set Rs. 887
Women Floral Embroidered Georgette Kurta Rs. 659
Women Kurta Set With Dupatta Rs. 1529
Women Solid Kurta with Palazzos & Dupatta Rs. 539
Women Anarkali Kurta Rs. 3442
Women Printed Kurta with Trousers With Dupatta Rs. 874
Cotton Floral Print Kurta Sets Rs. 1475
Kurta with Palazzos & Dupatta Rs. 1899
Women Solid Kurta with Trousers & Dupatta Rs. 815
Women Kurta With Trouser Rs. 1368
Women Kurta With Dupatta Rs. 1199
Women Ethnic Motifs Printed Kurta Rs. 644
Printed A-Line Kurta Rs. 406
Printed Straight Kurta Rs. 809
Women Embroidered Kurta Set Rs. 887
Women Kurta with Trousers With Dupatta Rs. 1434
Women Paisley Checked Anarkali Kurta Rs. 1199
Women Embroidered Kurta with Trousers With Dupatta Rs. 1665
Printed Anarkali Kurta Rs. 749
Pompom Lace Hem Rayon Kurta Rs. 809
Cotton Yoke Design Kurta  Set Rs. 1434
Women Kurta with Palazzos With Dupatta Rs. 799
Women Ethnic Motifs Printed Anarkali Kurta Rs. 1399
Embroidered Kurta w

In [32]:
# printing names and prices of those products whose price <=500
for product in products :
    atag= product.find('a').find('div', {'class' : 'product-productMetaInfo'})
    name= atag.find('h4', {'class' : 'product-product'})    
    price = atag.find('div', {'class' : 'product-price'}).find('span', {'class' : 'product-discountedPrice'})
    
    # price.text looks like 'Rs. 535'; drop the 'Rs.' prefix and convert to int
    if int(price.text.replace('Rs.', '').strip()) <= 500:
        print(name.text, price.text)

driver.quit() # close the browser and end the webdriver session

Printed A-Line Kurta Rs. 398
Printed A-Line Kurta Rs. 406
Embroidered A-Line Kurta Rs. 412
Self Designed Straight Kurta Rs. 341
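
The filter above relies on the price text always starting with a fixed prefix. A slightly more tolerant sketch uses a regular expression and also accepts comma separators. The helper `parse_price` is hypothetical, and the sample pairs are copied from the output above.

```python
import re


def parse_price(text):
    """Return the first integer amount in a string like 'Rs. 1,529', or None.

    parse_price is a hypothetical helper for illustration.
    """
    # a digit followed by any mix of digits and comma separators
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


# sample (name, price text) pairs taken from the output above
samples = [
    ("Printed A-Line Kurta", "Rs. 398"),
    ("Women Yoke Design Kurta Set", "Rs. 887"),
    ("Printed A-Line Kurta", "Rs. 406"),
]

cheap = []
for name, price_text in samples:
    amount = parse_price(price_text)
    # keep only products with a readable price of 500 or less
    if amount is not None and amount <= 500:
        cheap.append((name, price_text))

print(cheap)
```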
