## Data Scraping

**Data Scraping**, also known as **Web Scraping**, is the process of extracting information from a website and saving it to a spreadsheet or a local file on your computer. It is one of the most efficient ways to collect data from the web and, in some cases, to pass that data on to another website. Popular uses of data scraping include:

- Research for web content/business intelligence
- Pricing for travel booker sites/price comparison sites
- Finding sales leads/conducting market research by crawling public data sources (e.g. Yell and Twitter)
- Sending product data from an e-commerce site to another online vendor (e.g. Google Shopping)

```python
import requests
from bs4 import BeautifulSoup as soup
```

```python
# Browser-like request headers; some sites reject requests that lack a User-Agent
header = {
    'Origin': 'https://www.1mg.com',
    'Referer': 'https://www.1mg.com/categories/exclusive/immunity-boosters/vitamin-c-734',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
}
```

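```python
url = 'https://www.1mg.com/categories/exclusive/immunity-boosters/vitamin-c-734'
# timeout (in seconds) keeps the request from hanging indefinitely
html = requests.get(url=url, headers=header, timeout=10)
html.status_code  # 200 means the page was fetched successfully
```

```
200
```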
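A single `requests.get` call is enough for one page. If you plan to request several pages from the same site, a reusable fetch step may be more convenient. The minimal sketch below works as follows. A `requests.Session` sends the `header` dict with every request. `timeout` stops a stalled connection from hanging the notebook. `raise_for_status()` turns a 4xx/5xx response into an exception, so you don't silently parse an error page. The helper names `make_session` and `fetch_html` are hypothetical.

```python
import requests


def make_session(headers: dict) -> requests.Session:
    """Create a session that sends the given headers with every request."""
    session = requests.Session()
    session.headers.update(headers)
    return session


def fetch_html(session: requests.Session, url: str, timeout: float = 10.0) -> bytes:
    """Download a page and return its raw bytes, raising on HTTP errors."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.content


# Usage (this performs a network request):
# session = make_session(header)
# page_bytes = fetch_html(session, url)
```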

```python
# Parse the raw response bytes; 'lxml' needs the lxml package ('html.parser' is built in)
bsobj = soup(html.content, 'lxml')
bsobj
```

```
<!DOCTYPE html>
<!-- Chrome, Firefox OS, Opera and Vivaldi --><html><head><meta content="#FFF3E3" name="theme-color"/>
<!-- Windows Phone -->
<meta content="#FFF3E3" name="msapplication-navbutton-color"/>
<!-- iOS Safari -->
<meta content="#FFF3E3" name="apple-mobile-web-app-status-bar-style"/>
<meta content="yes" name="apple-mobile-web-app-capable"/><!--Newrelic Header--><!-- Application styles--><link href="/faviconRebrand.ico" rel="shortcut icon" type="image/x-icon"/><meta content="width=device-width, height=device-height, initial-scale=1.0, maximum-scale=1.0, user-scalable=0" name="viewport"/><meta content="099F7701BE7D0B79C7C51734FDC2A2D9" name="msvalidate.01"/><link href="/notifyvisitors_push/chrome/manifest.json" rel="manifest"/><!-- Apple Touch Icons--><link href="/apple-touch-icon-57x57.png" rel="apple-touch-icon" sizes="57x57"/><link href="/apple-touch-icon-60x60.png" rel="apple-touch-icon" sizes="60x60"/><link href="/apple-touch-icon-72x72.png" rel="apple-touch-icon" sizes="
```

```python
# Each product card's description block holds a title div and a pack-size div
bsobj.find_all('div', class_='style__product-description___2XaG0')
```

```
[<div class="style__product-description___2XaG0"><div class="style__pro-title___2QwJy">New Celin 500 Tablet</div><div class="style__pack-size___2JQG7">strip of 25 tablets</div></div>,
 <div class="style__product-description___2XaG0"><div class="style__pro-title___2QwJy">Carbamide Forte Chelated Iron + Vitamin C + Folic Acid + Vit B12 + Zinc Vegetarian Tablet</div><div class="style__pack-size___2JQG7">bottle of 60 tablets</div></div>,
 <div class="style__product-description___2XaG0"><div class="style__pro-title___2QwJy">Purayati Vitamin C Vegetarian Tablet</div><div class="style__pack-size___2JQG7">bottle of 90 tablets</div></div>,
 <div class="style__product-description___2XaG0"><div class="style__pro-title___2QwJy">1mg Vitamin C+ Chewable Tablets Supports Immunity with Vitamin D3, Zinc and Amla Extract Vegetarian</div><div class="style__pack-size___2JQG7">box of 30 Chewable Tablets</div></div>,
 <div class="style__product-description___2XaG0"><div class="style__pro-title___2QwJy">Limc
```
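The class names on this page end in what look like build-generated hashes (`___2XaG0`, `___2QwJy`). Such hashes typically change when a site redeploys its front end. The hedged sketch below assumes that only the readable prefix (for example `style__pro-title`) stays stable. BeautifulSoup's `select()` accepts CSS selectors, and the `[class*=...]` attribute selector matches a substring rather than the exact class name. The small HTML string is copied from the output above so that the example runs offline.

```python
from bs4 import BeautifulSoup

sample_html = """
<div class="style__product-description___2XaG0">
  <div class="style__pro-title___2QwJy">New Celin 500 Tablet</div>
  <div class="style__pack-size___2JQG7">strip of 25 tablets</div>
</div>
<div class="style__product-description___2XaG0">
  <div class="style__pro-title___2QwJy">Purayati Vitamin C Vegetarian Tablet</div>
  <div class="style__pack-size___2JQG7">bottle of 90 tablets</div>
</div>
"""

page = BeautifulSoup(sample_html, "html.parser")

# Substring match on the class attribute, ignoring the hashed suffix
titles = [div.get_text(strip=True) for div in page.select('div[class*="style__pro-title"]')]
print(titles)  # ['New Celin 500 Tablet', 'Purayati Vitamin C Vegetarian Tablet']
```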

```python
product_name = []
# Note: .text joins the title and pack-size child divs into a single string
for name in bsobj.find_all('div', class_='style__product-description___2XaG0'):
    product_name.append(name.text.strip())
```

```python
product_name
```

```
['New Celin 500 Tabletstrip of 25 tablets',
 'Carbamide Forte Chelated Iron + Vitamin C + Folic Acid + Vit B12 + Zinc Vegetarian Tabletbottle of 60 tablets',
 'Purayati Vitamin C Vegetarian Tabletbottle of 90 tablets',
 '1mg Vitamin C+ Chewable Tablets Supports Immunity with Vitamin D3, Zinc and Amla Extract Vegetarianbox of 30 Chewable Tablets',
 'Limcee Chewable Tablet Orangestrip of 15 Chewable Tablets',
 'Chicnutrix Super C Amla Extract & Zinc Orange Effervescent Tabletbottle of 20 Effervescent Tablet',
 "Nature's Velvet Natural Vitamin C 500mg Capsulebottle of 60 capsules",
 'MaxCee Vitamin C Chewable Tabletstrip of 20 Chewable Tablets',
 'Alron-Z Capsule Buy 1 Get 1 Freestrip of 15 capsules',
 'Cimune SF Chewable Tablet Orangestrip of 20 Chewable Tablets',
 'Fast&Up Charge Natural Vitamin C & Zinc Orange Effervescent Tabletcombo pack of 60 Effervescent Tablet',
 'Chicnutrix Super C Amla Extract & Zinc Fizzy Orange Effervescent Tabletbox of 60 Effervescent Tablet',
 'Celin + Chewa
```
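Each `style__product-description___2XaG0` div contains both the title div and the pack-size div, so `name.text` runs them together (for example `'New Celin 500 Tabletstrip of 25 tablets'`). The minimal sketch below reads the two child divs separately inside each card, so a product's name and pack size always stay on the same record. It reuses the class names from the cells above and an offline HTML sample copied from the earlier output. The function name `extract_products` is hypothetical.

```python
from bs4 import BeautifulSoup

sample_html = """
<div class="style__product-description___2XaG0"><div class="style__pro-title___2QwJy">New Celin 500 Tablet</div><div class="style__pack-size___2JQG7">strip of 25 tablets</div></div>
<div class="style__product-description___2XaG0"><div class="style__pro-title___2QwJy">Purayati Vitamin C Vegetarian Tablet</div><div class="style__pack-size___2JQG7">bottle of 90 tablets</div></div>
"""


def extract_products(page: BeautifulSoup) -> list[dict]:
    """Return one {'pname', 'psize'} dict per product card."""
    products = []
    for card in page.find_all("div", class_="style__product-description___2XaG0"):
        title = card.find("div", class_="style__pro-title___2QwJy")
        size = card.find("div", class_="style__pack-size___2JQG7")
        products.append({
            # Use None when a card is missing one of its child divs
            "pname": title.get_text(strip=True) if title else None,
            "psize": size.get_text(strip=True) if size else None,
        })
    return products


products = extract_products(BeautifulSoup(sample_html, "html.parser"))
print(products[0])  # {'pname': 'New Celin 500 Tablet', 'psize': 'strip of 25 tablets'}
```

On the live page, `extract_products(bsobj)` could stand in for the `product_name` loop. Passing the resulting list of dicts to `pd.DataFrame` then gives one row per card.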

```python
len(product_name)
```

```
40
```

```python
pack_size = []

for size in bsobj.find_all('div', class_='style__pack-size___2JQG7'):
    pack_size.append(size.text.strip())

pack_size
```

```
['strip of 25 tablets',
 'bottle of 60 tablets',
 'bottle of 90 tablets',
 'box of 30 Chewable Tablets',
 'strip of 15 Chewable Tablets',
 'bottle of 20 Effervescent Tablet',
 'bottle of 60 capsules',
 'strip of 20 Chewable Tablets',
 'strip of 15 capsules',
 'strip of 20 Chewable Tablets',
 'combo pack of 60 Effervescent Tablet',
 'box of 60 Effervescent Tablet',
 'strip of 20 Chewable Tablets',
 'bottle of 60 tablets',
 'bottle of 30 gummies',
 'bottle of 60 tablets',
 'bottle of 60 capsules',
 'bottle of 60 tablets',
 'bottle of 20 Effervescent Tablet',
 'bottle of 120 tablets',
 'bottle of 60 Chewable Tablets',
 'bottle of 150 tablets',
 'bottle of 20 Effervescent Tablet',
 'jar of 500 gm Powder',
 'strip of 10 capsules',
 'strip of 15 Chewable Tablets',
 'packet of 200 ml Liquid',
 'strip of 15 Chewable Tablets',
 'box of 30 capsules',
 'strip of 10 tablets',
 'bottle of 30 gummies',
 'bottle of 100 tablets',
 'bottle of 20 Effervescent Tablet',
 'strip of 10 tablets',
 'strip of 
```
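The `psize` strings above follow a fairly regular `<container> of <quantity> <unit>` pattern (`strip of 25 tablets`, `jar of 500 gm Powder`, `combo pack of 60 Effervescent Tablet`). The sketch below assumes every entry follows that pattern and splits each one with a regular expression, so that quantities can be compared as numbers. Entries that don't match return `None` instead of raising. The name `parse_pack_size` is hypothetical.

```python
import re

# The lazy container group stops at the first " of ", so "combo pack" stays intact
PACK_RE = re.compile(r"^(?P<container>.+?) of (?P<quantity>\d+(?:\.\d+)?) (?P<unit>.+)$")


def parse_pack_size(text: str):
    """Split 'strip of 25 tablets' into ('strip', 25.0, 'tablets'), or return None."""
    match = PACK_RE.match(text.strip())
    if match is None:
        return None
    return match["container"], float(match["quantity"]), match["unit"]


for s in ["strip of 25 tablets", "jar of 500 gm Powder", "combo pack of 60 Effervescent Tablet"]:
    print(parse_pack_size(s))
# ('strip', 25.0, 'tablets')
# ('jar', 500.0, 'gm Powder')
# ('combo pack', 60.0, 'Effervescent Tablet')
```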

```python
mrp = []

for price in bsobj.find_all('div', class_='style__price-tag___cOxYc'):
    # Drop the rupee sign and the "MRP" label, keeping only the number (as a string)
    mrp.append(price.text.replace('₹', '').replace('MRP', '').strip())

mrp
```

```
['34',
 '349',
 '429',
 '99',
 '20',
 '312',
 '480',
 '102',
 '71',
 '100',
 '702',
 '878',
 '26',
 '270',
 '253',
 '599',
 '1474',
 '764',
 '269',
 '555',
 '209',
 '499',
 '300',
 '1696',
 '185',
 '56',
 '40',
 '109',
 '198',
 '1125',
 '223',
 '499',
 '315',
 '290',
 '101.2',
 '76',
 '223',
 '47',
 '241',
 '359']
```
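The scraped prices are still strings (`'34'`, `'101.2'`), so sorting or averaging them won't behave numerically. Below is a sketch of a slightly more defensive version of the cleanup in the loop above. It strips the same `₹` and `MRP` markers and returns a `float`. It also removes commas, which is an assumption in case some listings use thousands separators. `parse_mrp` is a hypothetical helper name.

```python
def parse_mrp(text: str) -> float:
    """Convert a scraped price label such as 'MRP₹101.2' to a float."""
    # Commas are stripped on the assumption that some prices may use thousands separators
    cleaned = text.replace("₹", "").replace("MRP", "").replace(",", "").strip()
    return float(cleaned)


prices = [parse_mrp(p) for p in ["34", "MRP₹1474", "101.2"]]
print(prices)  # [34.0, 1474.0, 101.2]
```

Applied to the notebook's list, `[parse_mrp(p) for p in mrp]` yields floats. For a column that has already been cleaned, `pd.to_numeric(df['mrp'])` performs the same string-to-number conversion.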

```python
# Map each column name to its list of scraped values
d1 = {'pname': product_name, 'psize': pack_size, 'mrp': mrp}
```

```python
import pandas as pd

df = pd.DataFrame(d1, columns=['pname', 'psize', 'mrp'])
```
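`pd.DataFrame` built from a dict of lists requires all the lists to have the same length and raises `ValueError` otherwise. Here the three lists come from three independent `find_all` calls. A single product card without a price tag would therefore break the frame with a fairly generic error. The minimal sketch below adds an explicit guard that reports each list's length before the frame is built. `build_frame` is a hypothetical helper.

```python
import pandas as pd


def build_frame(pname: list, psize: list, mrp: list) -> pd.DataFrame:
    """Build the product table, failing early with a clear message on length mismatch."""
    lengths = {"pname": len(pname), "psize": len(psize), "mrp": len(mrp)}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"scraped lists have different lengths: {lengths}")
    return pd.DataFrame({"pname": pname, "psize": psize, "mrp": mrp})


frame = build_frame(["New Celin 500 Tablet"], ["strip of 25 tablets"], ["34"])
print(frame.shape)  # (1, 3)
```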

```python
df.head(5)
```

|   | pname | psize | mrp |
|---|---|---|---|
| 0 | New Celin 500 Tabletstrip of 25 tablets | strip of 25 tablets | 34 |
| 1 | Carbamide Forte Chelated Iron + Vitamin C + Fo... | bottle of 60 tablets | 349 |
| 2 | Purayati Vitamin C Vegetarian Tabletbottle of ... | bottle of 90 tablets | 429 |
| 3 | 1mg Vitamin C+ Chewable Tablets Supports Immun... | box of 30 Chewable Tablets | 99 |
| 4 | Limcee Chewable Tablet Orangestrip of 15 Chewa... | strip of 15 Chewable Tablets | 20 |
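Data scraping was defined at the start as saving website data to a spreadsheet or a local file. With the DataFrame built, pandas can write it out directly. The sketch below uses `DataFrame.to_csv` with `index=False`, so the row numbers are not stored as an extra column. That extra column is what appears as `Unnamed: 0` when such a file is read back with `read_csv`. The filename `vitamin_c_products.csv` is an assumption. `sample_df` is a two-row stand-in built from the table above, named so that it doesn't overwrite the notebook's `df`.

```python
import pandas as pd

# Two rows copied from the output above, standing in for the full scraped frame
sample_df = pd.DataFrame({
    "pname": ["New Celin 500 Tabletstrip of 25 tablets",
              "Purayati Vitamin C Vegetarian Tabletbottle of 90 tablets"],
    "psize": ["strip of 25 tablets", "bottle of 90 tablets"],
    "mrp": ["34", "429"],
})

path = "vitamin_c_products.csv"  # hypothetical output file name
sample_df.to_csv(path, index=False)

# Read it back as strings to check that nothing was added or lost
round_trip = pd.read_csv(path, dtype=str)
print(round_trip.columns.tolist())  # ['pname', 'psize', 'mrp']
```

For the real data, `df.to_csv(path, index=False)` writes all scraped rows.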
