
# Scraping Selected Categories on netmeds.com using Python



![banner-image](https://i.imgur.com/MrX3wdW.png)



# Introduction:
netmeds.com is a one-stop healthcare shop that sells products across many categories, with exclusive discounts all year round. It calls itself `India Ki Pharmacy`: customers can buy medicines and wellness products through its website or app.

The page https://www.netmeds.com/non-prescriptions/diabetes-support lists the product categories available on netmeds.com. In this project, we will select a category and retrieve information from this page using _web scraping_: the process of extracting information from a website in an automated fashion using code.

We'll use the Python libraries `requests` and `beautifulsoup4` to scrape data from this page.



# Project Outline:
Here's an outline of the steps we'll follow:
1. Define the webpage URL.
2. Download the webpage using `requests`.
3. Parse the HTML source code using `BeautifulSoup`.
4. Extract the list of category names.
5. Extract the list of sub-categories (name and href).
6. Extract all categories and their sub-category data.
7. Extract the product name, buy price, MRP and discount from a sub-category.
8. Extract the product details (product name, buy price, MRP and discount) from the current page and every following page of a sub-category, continuing until there is no Next button.
9. Scrape the data by compiling Python lists and dictionaries.
10. Save the data to a CSV file and read the CSV file back.


By the end of the project, we will have created a CSV file, named after the selected category, that lists the product details of every sub-category in the following format:

`Category Name,Sub Category Name,Product Name,Buy Price,MRP,product_discount`

`Diabetes Support,Diabetes Care - Ayurveda,Kapiva Karela Jamun Juice 1 ltr, 296.65, 349.00,15%`
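To make the target format concrete, here is a minimal sketch that writes the sample row above with Python's standard `csv` module and reads it back. It is an illustration only: it uses an in-memory buffer instead of a real file and drops the leading spaces that appear in the sample prices, whereas the notebook itself builds a pandas DataFrame for this step.

```python
import csv
import io

# Column order taken from the header line of the target format.
FIELDNAMES = ["Category Name", "Sub Category Name", "Product Name",
              "Buy Price", "MRP", "product_discount"]

# One product row, copied from the sample line (leading spaces removed).
row = {
    "Category Name": "Diabetes Support",
    "Sub Category Name": "Diabetes Care - Ayurveda",
    "Product Name": "Kapiva Karela Jamun Juice 1 ltr",
    "Buy Price": "296.65",
    "MRP": "349.00",
    "product_discount": "15%",
}

# Write the header and the row to an in-memory text buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
writer.writeheader()
writer.writerow(row)

# Rewind and read the rows back as dictionaries keyed by the header.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0]["Product Name"])  # Kapiva Karela Jamun Juice 1 ltr
```

Each dictionary key becomes one CSV column, which is why every product record in this project uses exactly these six keys.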

## How to Run the Code

You can execute the code using the "Run" button at the top of this page and selecting "Run on Binder". You can make changes and save your own version of the notebook to [Jovian](https://www.jovian.ai) by executing the following cells.

## Installation and set-up.

In [1]:
!pip install jovian --upgrade --quiet

In [2]:
import jovian

- The `jovian` library is now installed and imported.

In [3]:
!pip install requests --upgrade --quiet
!pip install beautifulsoup4 --upgrade --quiet
import os
import requests
import pandas as pd
from bs4 import BeautifulSoup

# Define the webpage.

In [4]:
# ````````````````````````````MAIN URL TO DEFINE THE WEBSITE ````````````````````````````
## BASIC URL 
main_url = "https://www.netmeds.com/non-prescriptions/diabetes-support"    

# Download the webpage using `requests` and parse the HTML source code using `BeautifulSoup`.

- Let's define a function `get_BSoupdoc_fromUrl()` that downloads a page and parses it with BeautifulSoup.
1. To download the webpage, we use the `requests.get` function, which returns a response object containing the data from the webpage.
2. The `.status_code` property can be used to check whether the request was successful. A successful response has an [HTTP status code](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) between 200 and 299.
3. We parse the downloaded HTML with `BeautifulSoup`.
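The "between 200 and 299" rule from step 2 can be written as a small range check. The helpers `is_success()` and `check_status()` below are hypothetical names used for illustration; they take a plain integer, so the sketch runs without a network connection.

```python
def is_success(status_code):
    """Return True for any 2xx HTTP status code (200 to 299 inclusive)."""
    return 200 <= status_code < 300


def check_status(status_code, url):
    """Raise a descriptive error when a response was not successful."""
    if not is_success(status_code):
        raise RuntimeError(f"Failed to fetch {url} (HTTP {status_code})")


print(is_success(200), is_success(204), is_success(301), is_success(404))
# True True False False
```

In the notebook you would call `check_status(response.status_code, url)` right after `requests.get(url)`. Note that `requests` also offers `response.ok`, but it is True for any status code below 400, so it accepts redirects as well as 2xx codes; `response.raise_for_status()` raises only for 4xx and 5xx codes.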

In [5]:
#``````````````````````FUNCTION DEFINED FOR PARSING USING BEAUTIFULSOUP``````````````````````````````
def get_BSoupdoc_fromUrl(url):
    response = requests.get(url, timeout=30)      # Download the webpage (timeout avoids hanging forever)
    if not 200 <= response.status_code < 300:     # Any 2xx status code means success
        raise Exception('Failed to fetch the data from {} (status {})'.format(url, response.status_code))
    return BeautifulSoup(response.text, 'html.parser')  # Parsed BeautifulSoup document



Download and parse the main page. The result is a `BeautifulSoup` object, which you can confirm with `type(main_doc)`.

In [6]:
main_doc = get_BSoupdoc_fromUrl(main_url)

# Extract category Name and sub-category name.

## Extract the list of category Names.

![](https://i.imgur.com/3nIIFDd.png)

- Let's define a function `get_main_categoryList()` to get the list of categories.

1. Create an empty list, `catname_list = []`.
2. Loop over every `<ul class="cat-menu">` menu, over each of its direct `<li>` children (`recursive=False`), and over the `<a class="cat-submenu">` links inside each `<li>`.
3. `catname = a.text` gives the category name as text.
4. `catname_list.append(catname)` adds the category name to the list.
5. `return catname_list` returns the list from the function.
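The same three nested loops can be tried offline on a tiny snippet of HTML. The markup below is hypothetical and heavily trimmed; it only mimics the structure described in the steps above (a `cat-menu` list whose direct `<li>` children hold a `cat-submenu` link and a nested sub-category list), not the real netmeds.com page.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed markup shaped like the category menu described above.
html = """
<ul class="cat-menu">
  <li>
    <a class="cat-submenu">Diabetes Support</a>
    <ul>
      <li><a class="cat-submenu-level1" href="/glucometers">Glucometers</a></li>
    </ul>
  </li>
  <li>
    <a class="cat-submenu">Fitness</a>
  </li>
</ul>
"""

doc = BeautifulSoup(html, "html.parser")

catname_list = []
for menu in doc.find_all("ul", class_="cat-menu"):
    # recursive=False keeps only the direct <li> children of the menu,
    # so the nested sub-category <li> elements are not visited here.
    for category in menu.find_all("li", recursive=False):
        for a in category.find_all("a", class_="cat-submenu"):
            catname_list.append(a.text)

print(catname_list)  # ['Diabetes Support', 'Fitness']
```

Without `recursive=False`, `find_all("li")` would also return the nested sub-category `<li>` (three elements instead of two here), which is why the notebook restricts the search to direct children.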

In [7]:
#``````````````````````FUNCTION DEFINED FOR GETTING CATEGORY LIST``````````````````````````````````
def get_main_categoryList(doc):
    catname_list = []
    categoryMenu = doc.find_all('ul', class_='cat-menu')  ## Every category menu: <ul class="cat-menu">
    for menuList in categoryMenu:
        for category in menuList.find_all('li', recursive=False):  ## Only direct <li> children (skips nested sub-category items)
            for a in category.find_all("a", class_='cat-submenu'):  ## Category name link: <a class="cat-submenu">
                catname = a.text
                catname_list.append(catname)
    return catname_list  ## List of category names, e.g. ['Veterinary', 'Ayush', ...]



In [8]:
category_list = get_main_categoryList(main_doc)

- The variable `category_list` now holds the list of category names:

OUTPUT: 
['Veterinary',
 'Ayush',
 'Fitness',
 'Mom & Baby',
 'Sexual Wellness',
 'Treatments',
 'Devices',
 'Health Conditions',
 'Otc Deals',
 'Eyewear',
 'Covid Essentials',
 'Surgical',
 'Diabetes Support',
 'Fragrances',
 'Make-Up',
 'Hair',
 "Men's Grooming",
 'Skin Care',
 'Tools & Appliances',
 'Wellness',
 'Personal Care']

## Extract the list of sub-categories (name and href).

![](https://i.imgur.com/g8YOZbl.png)

- Let's define a function `get_subcategory_of_category()` to get the list of sub-categories (name and href) of one category.

1. Create an empty list, `subcat_list = []`.
2. Loop over the `<ul class="cat-menu">` menus, their direct `<li>` children (`recursive=False`) and the `<a class="cat-submenu">` links to read the category names.
3. `if a.text == category_name:` selects only the requested category.
4. The `<a class="cat-submenu-level1">` links inside that `<li>` hold the sub-category details.
5. For each sub-category, a dictionary `bv = dict()` stores its `name` and `href`.
6. `subcat_list.append(bv)` adds the dictionary to the list.

OUTPUT format:

                {'name': 'SubcategoryName',
                 'href': 'SubcategoryLink'}
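Here is an offline sketch of the same idea on hypothetical markup. The function `subcategories_of()` is an illustrative variant, not the notebook's code: instead of an inner loop over the category links, it uses `find()` to take the first `cat-submenu` link of each `<li>`, which is the category title, and then collects the `cat-submenu-level1` links beneath it.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed markup; the real page has many more categories.
html = """
<ul class="cat-menu">
  <li>
    <a class="cat-submenu">Diabetes Support</a>
    <ul>
      <li><a class="cat-submenu-level1" href="https://www.netmeds.com/non-prescriptions/diabetes-support/glucometers">Glucometers</a></li>
      <li><a class="cat-submenu-level1" href="https://www.netmeds.com/non-prescriptions/diabetes-support/sugar-substitutes">Sugar Substitutes</a></li>
    </ul>
  </li>
  <li>
    <a class="cat-submenu">Fitness</a>
    <ul>
      <li><a class="cat-submenu-level1" href="https://www.netmeds.com/non-prescriptions/fitness/weight-management">Weight Management</a></li>
    </ul>
  </li>
</ul>
"""


def subcategories_of(doc, category_name):
    """Return [{'name': ..., 'href': ...}] for the sub-categories of one category."""
    subcat_list = []
    for menu in doc.find_all("ul", class_="cat-menu"):
        for category in menu.find_all("li", recursive=False):
            # find() returns the first matching <a>, i.e. the category title link.
            title = category.find("a", class_="cat-submenu")
            if title is not None and title.text == category_name:
                for link in category.find_all("a", class_="cat-submenu-level1"):
                    subcat_list.append({"name": link.text, "href": link["href"]})
    return subcat_list


doc = BeautifulSoup(html, "html.parser")
print([d["name"] for d in subcategories_of(doc, "Diabetes Support")])
# ['Glucometers', 'Sugar Substitutes']
```

An unknown category name simply yields an empty list, matching the behaviour of the notebook's function.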

In [9]:
#`````````````````````````FUNCTION DEFINED FOR GETTING SUB-CATEGORY NAME & HREF`````````````````````
def get_subcategory_of_category(doc, category_name):
    subcat_list = []
    categoryMenu = doc.find_all('ul', class_='cat-menu')            ## Every category menu: <ul class="cat-menu">
    for menuList in categoryMenu:
        for category in menuList.find_all('li', recursive=False):  ## Only direct <li> children of the menu
            for a in category.find_all("a", class_='cat-submenu'):  ## Category name link: <a class="cat-submenu">
                if a.text == category_name:
                    for subList in category.find_all("a", class_='cat-submenu-level1'):  ## Sub-category links
                        bv = dict()                    ## One dictionary per sub-category
                        bv['name'] = subList.text      ## Sub-category name
                        bv['href'] = subList["href"]   ## Sub-category URL
                        subcat_list.append(bv)         ## Collect the dictionaries in a list
    return subcat_list  # [{'name': ..., 'href': ...}, ...]





In [10]:
subcategory_list = get_subcategory_of_category(main_doc, 'Diabetes Support')


The result is a list of dictionaries stored in `subcategory_list`:

OUTPUT: 


[{'name': 'Diabetes Care - Ayurveda',
  'href': 'https://www.netmeds.com/non-prescriptions/diabetes-support/diabetes-care-ayurveda'},
 {'name': 'Glucometers',
  'href': 'https://www.netmeds.com/non-prescriptions/diabetes-support/glucometers'},
 {'name': 'Lancets & Test Strips',
  'href': 'https://www.netmeds.com/non-prescriptions/diabetes-support/lancets-test-strips'},
 {'name': 'Sugar Substitutes',
  'href': 'https://www.netmeds.com/non-prescriptions/diabetes-support/sugar-substitutes'},
 {'name': 'Diabetes Management Supplements',
  'href': 'https://www.netmeds.com/non-prescriptions/diabetes-support/diabetes-management-supplements'}]

## Extract all categories and their sub-category data.


- Let's define a function `get_all_catandsubcat_data()` to get every category together with its sub-categories.

1. `subcategory_list = dict()` creates an empty dictionary.
2. For each category, `data = get_subcategory_of_category(doc, category)` returns the list of sub-category dictionaries (name and href).
3. `subcategory_list[category] = data` stores that list with the category name as the key.
4. `return subcategory_list` returns a dictionary whose keys are category names and whose values are lists of sub-category dictionaries (containing `name` and `href`).

OUTPUT format:

{CategoryName_1: [{'name': 'SubcategoryName_1', 'href': 'SubcategoryLink_1'},
                  {'name': 'SubcategoryName_2', 'href': 'SubcategoryLink_2'},
                  ...],
 ...
 CategoryName_N: [{'name': 'SubcategoryName_1', 'href': 'SubcategoryLink_1'},
                  ...]}
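The loop in `get_all_catandsubcat_data()` amounts to a dictionary comprehension. The sketch below uses a hypothetical `build_category_map()` and replaces the real `get_subcategory_of_category(doc, category)` call with a lookup into a small hand-written dictionary (entries trimmed from this notebook's scraped output), so it runs without any network access.

```python
def build_category_map(category_list, lookup):
    """Map each category name to the list returned by lookup(category)."""
    return {category: lookup(category) for category in category_list}


# Hypothetical stand-in for get_subcategory_of_category(doc, category).
fake_menu = {
    "Men's Grooming": [
        {"name": "Shaving", "href": "https://www.netmeds.com/non-prescriptions/men-s-grooming/shaving"},
        {"name": "Beard Care", "href": "https://www.netmeds.com/non-prescriptions/men-s-grooming/beard-care"},
    ],
    "Wellness": [
        {"name": "Health - Supplements", "href": "https://www.netmeds.com/non-prescriptions/wellness/health-supplements"},
    ],
}

category_map = build_category_map(list(fake_menu), lambda c: fake_menu.get(c, []))
print(category_map["Men's Grooming"][1]["name"])  # Beard Care
```

With the real helper, the call would be `build_category_map(category_list, lambda c: get_subcategory_of_category(main_doc, c))`. Each call re-scans the whole menu, which is fine for a couple of dozen categories.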

In [11]:
#```````````````````````````` FUNCTION DEFINED FOR GETTING ALL SUB-CATEGORY & CATEGORY DATA``````````````````````
def get_all_catandsubcat_data(doc, category_list):
    subcategory_list = dict()                               ## Category name -> list of sub-category dictionaries
    for category in category_list:
        data = get_subcategory_of_category(doc, category)  ## Sub-categories (name, href) of this category
        subcategory_list[category] = data                  ## Category name is the key, its sub-category list is the value
    return subcategory_list

In [12]:
get_all_catandsubcat = get_all_catandsubcat_data(main_doc, category_list)


OUTPUT: a dictionary with each category name as a key and, as its value, a list of dictionaries holding the `name` and `href` of each sub-category.


{'Veterinary': [{'name': 'Petcare',
   'href': 'https://www.netmeds.com/non-prescriptions/veterinary/petcare'},
  {'name': 'Farm Animals',
   'href': 'https://www.netmeds.com/non-prescriptions/veterinary/farm-animals'},
  {'name': 'Poultry',
   'href': 'https://www.netmeds.com/non-prescriptions/veterinary/poultry'},
  {'name': 'Aquaculture',
   'href': 'https://www.netmeds.com/non-prescriptions/veterinary/aquaculture'}],
 'Ayush': [{'name': 'Homeopathy',
   'href': 'https://www.netmeds.com/non-prescriptions/ayush/homeopathy'},
  {'name': 'Ayurvedic',
   'href': 'https://www.netmeds.com/non-prescriptions/ayush/ayurvedic'},
  {'name': 'Unani',
   'href': 'https://www.netmeds.com/non-prescriptions/ayush/unani'},
  {'name': 'Siddha',
   'href': 'https://www.netmeds.com/non-prescriptions/ayush/siddha'}],
 'Fitness': [{'name': 'Vitamins And Supplements',
   'href': 'https://www.netmeds.com/non-prescriptions/fitness/vitamins-and-supplements'},
  {'name': 'Family Nutrition',
   'href': 'https://www.netmeds.com/non-prescriptions/fitness/family-nutrition'},
  {'name': 'Health Food And Drinks',
   'href': 'https://www.netmeds.com/non-prescriptions/fitness/health-food-and-drinks'},
  {'name': 'Ayurvedic Supplements',
   'href': 'https://www.netmeds.com/non-prescriptions/fitness/ayurvedic-supplements'},
  {'name': 'Sports Supplements',
   'href': 'https://www.netmeds.com/non-prescriptions/fitness/sports-supplements'},
  {'name': 'Smoking Cessation Support',
   'href': 'https://www.netmeds.com/non-prescriptions/fitness/smoking-cessation-support'},
  {'name': 'Weight Management',
   'href': 'https://www.netmeds.com/non-prescriptions/fitness/weight-management'}],
 'Mom & Baby': [{'name': 'Baby Care',
   'href': 'https://www.netmeds.com/non-prescriptions/mom-baby/baby-care'},
  {'name': 'Feminine Hygiene',
   'href': 'https://www.netmeds.com/non-prescriptions/mom-baby/feminine-hygiene'},
  {'name': 'Maternity Care',
   'href': 'https://www.netmeds.com/non-prescriptions/mom-baby/maternity-care'},
  {'name': 'Toys & Games',
   'href': 'https://www.netmeds.com/non-prescriptions/mom-baby/toys-games'},
  {'name': 'Baby Bath Time',
   'href': 'https://www.netmeds.com/non-prescriptions/mom-baby/baby-bath-time'},
  {'name': 'Maternity Accessories',
   'href': 'https://www.netmeds.com/non-prescriptions/mom-baby/maternity-accessories'}],
 'Sexual Wellness': [{'name': 'Lubricants',
   'href': 'https://www.netmeds.com/non-prescriptions/sexual-wellness/lubricants'},
  {'name': 'Massagers/Vibrators',
   'href': 'https://www.netmeds.com/non-prescriptions/sexual-wellness/massagers-vibrators'},
  {'name': 'Sprays/Gels',
   'href': 'https://www.netmeds.com/non-prescriptions/sexual-wellness/sprays-gels'},
  {'name': 'Condoms',
   'href': 'https://www.netmeds.com/non-prescriptions/sexual-wellness/condoms'},
  {'name': 'Sexual Health Supplements',
   'href': 'https://www.netmeds.com/non-prescriptions/sexual-wellness/sexual-health-supplements'}],
 'Treatments': [{'name': 'Diabetes Care',
   'href': 'https://www.netmeds.com/non-prescriptions/treatments/diabetes-care'},
  {'name': 'First Aid',
   'href': 'https://www.netmeds.com/non-prescriptions/treatments/first-aid'},
  {'name': 'Pain Relief Application',
   'href': 'https://www.netmeds.com/non-prescriptions/treatments/pain-relief-application'},
  {'name': 'Usual Symptoms',
   'href': 'https://www.netmeds.com/non-prescriptions/treatments/usual-symptoms'},
  {'name': 'General Discomfort',
   'href': 'https://www.netmeds.com/non-prescriptions/treatments/general-discomfort'},
  {'name': 'Cough & Cold',
   'href': 'https://www.netmeds.com/non-prescriptions/treatments/cough-cold'},
  {'name': 'General Health Supplements',
   'href': 'https://www.netmeds.com/non-prescriptions/treatments/general-health-supplements'},
  {'name': 'Smoking Cessation (T)',
   'href': 'https://www.netmeds.com/non-prescriptions/treatments/smoking-cessation-t'},
  {'name': 'Skin Treatment',
   'href': 'https://www.netmeds.com/non-prescriptions/treatments/skin-treatment'}],
 'Devices': [{'name': 'Orthopaedics',
   'href': 'https://www.netmeds.com/non-prescriptions/devices/orthopaedics'},
  {'name': 'Breathe Easy',
   'href': 'https://www.netmeds.com/non-prescriptions/devices/breathe-easy'},
  {'name': 'Measurements',
   'href': 'https://www.netmeds.com/non-prescriptions/devices/measurements'},
  {'name': 'Surgical Accessories',
   'href': 'https://www.netmeds.com/non-prescriptions/devices/surgical-accessories'}],
 'Health Conditions': [{'name': 'Bone And Joint Pain',
   'href': 'https://www.netmeds.com/non-prescriptions/health-conditions/bone-and-joint-pain'},
  {'name': 'Liver Care',
   'href': 'https://www.netmeds.com/non-prescriptions/health-conditions/liver-care'},
  {'name': 'Stomach Care',
   'href': 'https://www.netmeds.com/non-prescriptions/health-conditions/stomach-care'},
  {'name': 'Diabetic Care',
   'href': 'https://www.netmeds.com/non-prescriptions/health-conditions/diabetic-care'},
  {'name': 'Lung Care',
   'href': 'https://www.netmeds.com/non-prescriptions/health-conditions/lung-care'},
  {'name': 'Piles Care',
   'href': 'https://www.netmeds.com/non-prescriptions/health-conditions/piles-care'},
  {'name': 'Mental Care',
   'href': 'https://www.netmeds.com/non-prescriptions/health-conditions/mental-care'},
  {'name': 'Cold And Fever',
   'href': 'https://www.netmeds.com/non-prescriptions/health-conditions/cold-and-fever'},
  {'name': "Women's Care",
   'href': 'https://www.netmeds.com/non-prescriptions/health-conditions/women-s-care'},
  {'name': 'Weight Care',
   'href': 'https://www.netmeds.com/non-prescriptions/health-conditions/weight-care'},
  {'name': 'De-Addiction',
   'href': 'https://www.netmeds.com/non-prescriptions/health-conditions/de-addiction'},
  {'name': 'Cardiac Care',
   'href': 'https://www.netmeds.com/non-prescriptions/health-conditions/cardiac-care'},
  {'name': 'Immunity Care',
   'href': 'https://www.netmeds.com/non-prescriptions/health-conditions/immunity-care'}],
 'Otc Deals': [{'name': "Valentine's day offers",
   'href': 'https://www.netmeds.com/non-prescriptions/otc-deals/valentines-day-offers'},
  {'name': 'Healthvit cenvitan deals',
   'href': 'https://www.netmeds.com/non-prescriptions/healthvit-cenvitan-deals'},
  {'name': "Women's Day Special Offers",
   'href': 'https://www.netmeds.com/non-prescriptions/otc-deals/womens-day-special-offers'},
  {'name': 'Healthvit deals',
   'href': 'https://www.netmeds.com/non-prescriptions/otc-deals/healthvit-deals'},
  {'name': 'Physiogel',
   'href': 'https://www.netmeds.com/non-prescriptions/otc-deals/physiogel'}],
 'Eyewear': [{'name': 'Contact Lenses (EW)',
   'href': 'https://www.netmeds.com/non-prescriptions/eyewear/contact-lenses-ew'},
  {'name': 'Eye Glasses',
   'href': 'https://www.netmeds.com/non-prescriptions/eyewear/eye-glasses'},
  {'name': 'Reading Glasses',
   'href': 'https://www.netmeds.com/non-prescriptions/eyewear/reading-glasses'},
  {'name': 'Sunglasses',
   'href': 'https://www.netmeds.com/non-prescriptions/eyewear/sunglasses'},
  {'name': 'Computer Glasses',
   'href': 'https://www.netmeds.com/non-prescriptions/eyewear/computer-glasses'}],
 'Covid Essentials': [{'name': 'Personal & Home Essentials',
   'href': 'https://www.netmeds.com/non-prescriptions/covid-essentials/personal-home-essentials'},
  {'name': 'Mask, Gloves & Protective Equipment',
   'href': 'https://www.netmeds.com/non-prescriptions/covid-essentials/mask-gloves-protective-equipment'},
  {'name': 'Immunity Booster',
   'href': 'https://www.netmeds.com/non-prescriptions/covid-essentials/immunity-booster'},
  {'name': 'Business Essentials',
   'href': 'https://www.netmeds.com/non-prescriptions/covid-essentials/business-essentials'},
  {'name': 'Travel Essentials',
   'href': 'https://www.netmeds.com/non-prescriptions/covid-essentials/travel-essentials'},
  {'name': 'Oxygen Can',
   'href': 'https://www.netmeds.com/non-prescriptions/covid-essentials/oxygen-can'}],
 'Surgical': [{'name': 'Dressing',
   'href': 'https://www.netmeds.com/non-prescriptions/surgical/dressing'},
  {'name': 'Gi Care',
   'href': 'https://www.netmeds.com/non-prescriptions/surgical/gi-care'},
  {'name': 'Iv Infusion',
   'href': 'https://www.netmeds.com/non-prescriptions/surgical/iv-infusion'},
  {'name': 'Respiratory Supplies',
   'href': 'https://www.netmeds.com/non-prescriptions/surgical/respiratory-supplies'},
  {'name': 'Surgical Consumables',
   'href': 'https://www.netmeds.com/non-prescriptions/surgical/surgical-consumables'},
  {'name': 'Surgical Instrument',
   'href': 'https://www.netmeds.com/non-prescriptions/surgical/surgical-instrument'},
  {'name': 'Urinary Care',
   'href': 'https://www.netmeds.com/non-prescriptions/surgical/urinary-care'},
  {'name': 'Wound Treatment',
   'href': 'https://www.netmeds.com/non-prescriptions/surgical/wound-treatment'}],
 'Diabetes Support': [{'name': 'Diabetes Care - Ayurveda',
   'href': 'https://www.netmeds.com/non-prescriptions/diabetes-support/diabetes-care-ayurveda'},
  {'name': 'Glucometers',
   'href': 'https://www.netmeds.com/non-prescriptions/diabetes-support/glucometers'},
  {'name': 'Lancets & Test Strips',
   'href': 'https://www.netmeds.com/non-prescriptions/diabetes-support/lancets-test-strips'},
  {'name': 'Sugar Substitutes',
   'href': 'https://www.netmeds.com/non-prescriptions/diabetes-support/sugar-substitutes'},
  {'name': 'Diabetes Management Supplements',
   'href': 'https://www.netmeds.com/non-prescriptions/diabetes-support/diabetes-management-supplements'}],
 'Fragrances': [{'name': 'Men',
   'href': 'https://www.netmeds.com/non-prescriptions/fragrances/men'},
  {'name': 'Unisex',
   'href': 'https://www.netmeds.com/non-prescriptions/fragrances/unisex'},
  {'name': 'Women',
   'href': 'https://www.netmeds.com/non-prescriptions/fragrances/women'}],
 'Make-Up': [{'name': 'Lips',
   'href': 'https://www.netmeds.com/non-prescriptions/make-up/lips'},
  {'name': 'Eyes',
   'href': 'https://www.netmeds.com/non-prescriptions/make-up/eyes'},
  {'name': 'Nails',
   'href': 'https://www.netmeds.com/non-prescriptions/make-up/nails'},
  {'name': 'Face Makeup',
   'href': 'https://www.netmeds.com/non-prescriptions/make-up/face-makeup'},
  {'name': 'Make-Up Tools & Brushes',
   'href': 'https://www.netmeds.com/non-prescriptions/make-up/make-up-tools-brushes'}],
 'Hair': [{'name': 'Hair Styling',
   'href': 'https://www.netmeds.com/non-prescriptions/hair/hair-styling'},
  {'name': 'Hair Color',
   'href': 'https://www.netmeds.com/non-prescriptions/hair/hair-color'},
  {'name': 'Scalp Treatments',
   'href': 'https://www.netmeds.com/non-prescriptions/hair/scalp-treatments'},
  {'name': 'Shop By Hair Type',
   'href': 'https://www.netmeds.com/non-prescriptions/hair/shop-by-hair-type'},
  {'name': 'Hair Care',
   'href': 'https://www.netmeds.com/non-prescriptions/hair/hair-care'},
  {'name': 'Hair Tools & Accessories',
   'href': 'https://www.netmeds.com/non-prescriptions/hair/hair-tools-accessories'}],
 "Men's Grooming": [{'name': 'Shaving',
   'href': 'https://www.netmeds.com/non-prescriptions/men-s-grooming/shaving'},
  {'name': 'Beard Care',
   'href': 'https://www.netmeds.com/non-prescriptions/men-s-grooming/beard-care'}],
 'Skin Care': [{'name': 'Cleansers',
   'href': 'https://www.netmeds.com/non-prescriptions/skin-care/cleansers'},
  {'name': 'Masks',
   'href': 'https://www.netmeds.com/non-prescriptions/skin-care/masks'},
  {'name': 'Moisturizers',
   'href': 'https://www.netmeds.com/non-prescriptions/skin-care/moisturizers'},
  {'name': 'Sunscreen',
   'href': 'https://www.netmeds.com/non-prescriptions/skin-care/sunscreen'},
  {'name': 'Eye Care',
   'href': 'https://www.netmeds.com/non-prescriptions/skin-care/eye-care'},
  {'name': 'Toners & Serums',
   'href': 'https://www.netmeds.com/non-prescriptions/skin-care/toners-serums'},
  {'name': 'Aromatherapy',
   'href': 'https://www.netmeds.com/non-prescriptions/skin-care/aromatherapy'},
  {'name': 'Face Skin Care',
   'href': 'https://www.netmeds.com/non-prescriptions/skin-care/face-skin-care'}],
 'Tools & Appliances': [{'name': 'Hair Styling Tools',
   'href': 'https://www.netmeds.com/non-prescriptions/tools-appliances/hair-styling-tools'},
  {'name': 'Face/Skin Tools',
   'href': 'https://www.netmeds.com/non-prescriptions/tools-appliances/face-skin-tools'},
  {'name': 'Massage Tools',
   'href': 'https://www.netmeds.com/non-prescriptions/tools-appliances/massage-tools'}],
 'Wellness': [{'name': 'Health - Supplements',
   'href': 'https://www.netmeds.com/non-prescriptions/wellness/health-supplements'},
  {'name': 'Weight - Management',
   'href': 'https://www.netmeds.com/non-prescriptions/wellness/weight-management'}],
 'Personal Care': [{'name': 'Bathing Accessories',
   'href': 'https://www.netmeds.com/non-prescriptions/personal-care/bathing-accessories'},
  {'name': 'Face Personal Care',
   'href': 'https://www.netmeds.com/non-prescriptions/personal-care/face-personal-care'},
  {'name': 'Body Care',
   'href': 'https://www.netmeds.com/non-prescriptions/personal-care/body-care'},
  {'name': 'Senior Care',
   'href': 'https://www.netmeds.com/non-prescriptions/personal-care/senior-care'},
  {'name': 'Lip Care',
   'href': 'https://www.netmeds.com/non-prescriptions/personal-care/lip-care'},
  {'name': 'Oral Care',
   'href': 'https://www.netmeds.com/non-prescriptions/personal-care/oral-care'},
  {'name': 'Bath & Shower',
   'href': 'https://www.netmeds.com/non-prescriptions/personal-care/bath-shower'},
  {'name': 'Hands & Feet',
   'href': 'https://www.netmeds.com/non-prescriptions/personal-care/hands-feet'},
  {'name': 'Home & Health',
   'href': 'https://www.netmeds.com/non-prescriptions/personal-care/home-health'},
  {'name': 'Personal Care Tools & Accessories',
   'href': 'https://www.netmeds.com/non-prescriptions/personal-care/personal-care-tools-accessories'},
  {'name': 'Eye Care Lens',
   'href': 'https://www.netmeds.com/non-prescriptions/personal-care/eye-care-lens'}]}

# Extract product details page by page until there is no Next button.

## Extract Product Name, BuyPrice, MRP and Discount from Sub-category

- Let's define a function `get_productlist_of_subcategory()` to get the product name, buy price, MRP and discount of every product on a sub-category page.

1. `product_details` holds every product card on the page, found with `.find_all('div', class_='cat-item')`.
2. `productlist_details` starts as an empty list.
3. Finding the product name using `.find('span', class_ = "clsgetname")`


![](https://i.imgur.com/Qzlt4sL.png)



4. Finding BuyPrice using `.find('span', id ="final_price")`


![](https://i.imgur.com/kWX70Y9.png)



5. Finding MRP using `.find('strike', id = "price")`


![](https://i.imgur.com/FvOwTwy.png)



6. Finding Discount using `.find('span', {'class':"save-badge"})`


![](https://i.imgur.com/XzdLgcp.png)
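The four lookups above can be tried on a couple of hypothetical product cards. The markup and the price strings in it are assumptions built from the tags listed in steps 3 to 6, not copies of the real page; the second card deliberately has no MRP or discount to show that `find()` returns `None` for a missing tag. `text_or_none()` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Hypothetical product cards using the tags from steps 3-6.
html = """
<div class="cat-item">
  <span class="clsgetname">Kapiva Karela Jamun Juice 1 ltr</span>
  <span id="final_price">Rs. 296.65</span>
  <strike id="price">Rs. 349.00</strike>
  <span class="save-badge">15% OFF</span>
</div>
<div class="cat-item">
  <span class="clsgetname">Hypothetical Product</span>
  <span id="final_price">Rs. 120.00</span>
</div>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None when find() found nothing."""
    return tag.get_text(strip=True) if tag is not None else None


doc = BeautifulSoup(html, "html.parser")
products = []
for card in doc.find_all("div", class_="cat-item"):
    products.append({
        "Product Name": text_or_none(card.find("span", class_="clsgetname")),
        "Buy Price": text_or_none(card.find("span", id="final_price")),
        "MRP": text_or_none(card.find("strike", id="price")),
        "product_discount": text_or_none(card.find("span", class_="save-badge")),
    })

print(products[1]["MRP"])  # None
```

Searching inside each card, rather than the whole page, keeps every price paired with the right product name.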



 

### Let's define a function `create_dict` to store one product's details in a dictionary.
1. Create an empty dictionary, `cd = dict()`.
2. `cd["Category Name"] = category_name` and `cd["Sub Category Name"] = subcategory_name` record which category and sub-category the product belongs to.
3. `cd["Product Name"]`, `cd["Buy Price"]`, `cd["MRP"]` and `cd["product_discount"]` store the product details; a missing price or discount is stored as `"00.00"`.
4. Back in `get_productlist_of_subcategory()`, `productlist_details.append(bv)` adds each returned dictionary to the list.
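The slices `[4:]` and `[3:]` in `create_dict` assume the price text always starts with a currency prefix of exactly that length. As a more tolerant alternative, here is a sketch of a hypothetical `parse_price()` helper that pulls the first number out of the text with a regular expression. The sample strings are assumptions about what price text could look like, not copies of netmeds.com markup.

```python
import re


def parse_price(text):
    """Extract the first number from a price string such as 'Rs. 296.65'.

    Returns a float, or None when the text is missing or contains no digits.
    """
    if text is None:
        return None
    # Remove thousands separators, then match an integer or decimal number.
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


print(parse_price("Rs. 296.65"))   # 296.65
print(parse_price("₹ 1,349.00"))   # 1349.0
print(parse_price("15% OFF"))      # 15.0
print(parse_price(None))           # None
```

Returning `None` instead of the placeholder `"00.00"` keeps "not shown on the page" distinguishable from a real price of zero once the data is in a DataFrame.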

In [13]:
#`````````````````````````` FUNCTION DEFINED FOR CREATING A DICTIONARY  ``````````````````````````
def create_dict(category_name, subcategory_name, product_name, product_buyprice, product_mrp, product_discount):
    cd = dict()                                       ## Creating Dictionary
    cd["Category Name"] = category_name               ## Category Name
    cd["Sub Category Name"] = subcategory_name        ## Sub-category Name
    cd["Product Name"] = product_name.text            ## Product Name
    if product_buyprice is None:
        cd["Buy Price"] = "00.00"                     ## Buy price not shown: store 00.00
    else:
        cd["Buy Price"] = product_buyprice.text[4:]   ## Drop the first 4 characters (currency prefix), keep the number
    if product_mrp is None:                           ## MRP not shown: store 00.00
        cd["MRP"] = "00.00"
    else:
        cd["MRP"] = product_mrp.text[3:]              ## Drop the first 3 characters (currency prefix), keep the number
    if product_discount is None:                      ## Discount not shown: store 00.00
        cd["product_discount"] = "00.00"
    else:
        cd["product_discount"] = product_discount.text[:3]  ## Keep the first 3 characters, e.g. "15%"
    return cd                                         ## Return the product dictionary

In [14]:
#``````````````````````` FUNCTION DEFINED FOR GETTING PRODUCT DETAILS (ProductName, BuyPrice, MRP, Discount) FROM SUB-CATEGORY``````````````````````
def get_productlist_of_subcategory(subcategory_doc, category_name, subcategory_name):
    # SEARCHING TAGS TO GET PRODUCT DETAILS (ProductName, BuyPrice, MRP, Discount)
    product_details = subcategory_doc.find_all('div', class_='cat-item')  ## Every product card on the page: <div class="cat-item">
    productlist_details = []                                               ## Empty list for the product dictionaries
    for product_detail in product_details:
        product_name = product_detail.find('span', class_="clsgetname")      ## Product name (point 3)
        product_buyprice = product_detail.find('span', id="final_price")     ## Buy price (point 4)
        product_mrp = product_detail.find('strike', id="price")              ## MRP (point 5)
        product_discount = product_detail.find('span', class_="save-badge")  ## Discount (point 6)
        # Build the product dictionary with create_dict()
        bv = create_dict(category_name, subcategory_name, product_name, product_buyprice, product_mrp, product_discount)
        productlist_details.append(bv)                                     ## Add the dictionary to the list
    return productlist_details  # [{'Category Name': ..., 'Product Name': ..., ...}, ...]



## Extract product details (Product Name, BuyPrice, MRP and Discount) from the current page and every following page of a sub-category, until there is no Next button.

- Let's define a function `get_pagination_wise_subcategory_data()` to get the product details (Product Name, BuyPrice, MRP and Discount) from the current page and all following pages of a sub-category.

![](https://i.imgur.com/xt19xlW.png)

1. Add the products on the current page to `rawdata` with `get_productlist_of_subcategory()`.
2. Look for the Next button with `nextButton = subdoc.find("li", class_="next")`. If there is none, `return rawdata`.
3. Find the `<a>` tag inside it with `nextATag = nextButton.find("a")`.
4. Read the next page URL with `nextPageUrl = nextATag['href']`.
5. Download and parse the next page with `subdoc1 = get_BSoupdoc_fromUrl(nextPageUrl)`.
6. `return get_pagination_wise_subcategory_data(subdoc1, category_name, subcategory_name, rawdata)` calls the function recursively, repeating steps 1 to 5 until there is no Next button.
7. Once the last page is reached, `rawdata` holds the products of all pages.
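The recursion above adds one stack frame per page, and Python's default recursion limit is usually 1000. A loop does the same job without that limit. In this sketch, `collect_all_pages()` is a hypothetical helper and `fetch_page(url)` is an assumed callback returning `(rows, next_href)`, standing in for downloading a page, running the product extraction and reading the `<li class="next">` link. It also resolves relative hrefs with `urljoin`, an assumption in case a Next link is not an absolute URL.

```python
from urllib.parse import urljoin


def collect_all_pages(start_url, fetch_page):
    """Follow 'next' links in a loop and gather the rows of every page.

    fetch_page(url) must return (rows, next_href); next_href is None on the last page.
    """
    rows, url, seen = [], start_url, set()
    while url is not None and url not in seen:
        seen.add(url)                        # stop if the links ever form a cycle
        page_rows, next_href = fetch_page(url)
        rows.extend(page_rows)
        # Resolve relative hrefs (e.g. "?pg=2") against the current page URL;
        # absolute hrefs are returned unchanged.
        url = urljoin(url, next_href) if next_href else None
    return rows


# Hypothetical three-page listing, used to run the loop offline.
fake_site = {
    "https://example.com/list": (["a", "b"], "?pg=2"),
    "https://example.com/list?pg=2": (["c"], "https://example.com/list?pg=3"),
    "https://example.com/list?pg=3": (["d"], None),
}
print(collect_all_pages("https://example.com/list", fake_site.__getitem__))
# ['a', 'b', 'c', 'd']
```

The `seen` set is a small safety net: if a site ever linked the last page back to an earlier one, the loop would stop instead of running forever.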

In [15]:
#``````````````````````` FUNCTION DEFINED FOR GETTING NEXT PAGE Product Details(ProductName, BuyPrice, MRP, Discount) FROM SUB-CATEGORY``````````````````````

# rawdata is an empty list [] on the first call
def get_pagination_wise_subcategory_data(subdoc, category_name, subcategory_name, rawdata):
    rawdata = rawdata + get_productlist_of_subcategory(subdoc, category_name, subcategory_name)  # Add this page's products
    nextButton = subdoc.find("li", class_="next")  # Next button: <li class="next">
    if nextButton is not None:                     # Only follow it when a Next button exists
        nextATag = nextButton.find("a")            # The link inside the Next button
        if nextATag is not None:
            nextPageUrl = nextATag['href']                  # URL of the next page
            subdoc1 = get_BSoupdoc_fromUrl(nextPageUrl)     # Download and parse the next page
            return get_pagination_wise_subcategory_data(subdoc1, category_name, subcategory_name, rawdata)
    return rawdata                                 # No Next button: all pages collected
    

# Scrape the data by compiling Python lists and dictionaries.

- Let's define a function `scrape_single_category()` that compiles the product details of one category into a Python list of dictionaries.

1. `diabetic_support = subcatgory_dictlist[category_name]` selects the chosen category's list of sub-categories.
2. `category_data = []` creates an empty list.
3. `subdoc = get_BSoupdoc_fromUrl(subcategory['href'])` downloads and parses each sub-category page.
4. `data = get_pagination_wise_subcategory_data(...)` returns the product details (product name, buy price, MRP, discount) from every page of that sub-category, and non-empty results are added to `category_data`.
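Step 1 is a plain dictionary lookup, so a typo in the category name (for example "Diabetic Support" instead of "Diabetes Support") stops the scrape with a bare `KeyError`. A hypothetical `select_subcategories()` helper could fail with a clearer message that lists the valid names; the mapping below is a trimmed, hand-written example in the shape returned by `get_all_catandsubcat_data()`.

```python
def select_subcategories(subcategory_dictlist, category_name):
    """Return the sub-category list of one category, or fail with a helpful message."""
    try:
        return subcategory_dictlist[category_name]
    except KeyError:
        valid = ", ".join(sorted(subcategory_dictlist))
        raise ValueError(
            f"Unknown category {category_name!r}. Choose one of: {valid}"
        ) from None


# Hypothetical, trimmed mapping: category name -> list of sub-category dicts.
menu = {
    "Diabetes Support": [
        {"name": "Glucometers",
         "href": "https://www.netmeds.com/non-prescriptions/diabetes-support/glucometers"},
    ],
    "Fitness": [],
}

print(len(select_subcategories(menu, "Diabetes Support")))  # 1
```

Calling `select_subcategories(menu, "Diabetic Support")` raises a `ValueError` whose message names "Diabetes Support" and "Fitness" as the valid choices.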


In [16]:
#````````````````````````` Defining Scraping function for single category ```````````````````````
def scrape_single_category(subcatgory_dictlist, category_name):
    diabetic_support = subcatgory_dictlist[category_name]     ## Sub-category list of the selected category
    category_data = []
    for subcategory in diabetic_support:
        subdoc = get_BSoupdoc_fromUrl(subcategory['href'])    ## Download and parse the sub-category page
        data = get_pagination_wise_subcategory_data(subdoc, category_name, subcategory["name"], [])
        ## 'data' holds the product details of every page, following the Next buttons
        ## subcategory["name"] is the sub-category name recorded in each row
        if len(data) > 0:                                     ## Skip sub-categories with no products
            category_data = category_data + data              ## Append this sub-category's products

    return category_data

# Defining a `main` function to put it all together.

- We have a function `get_BSoupdoc_fromUrl` that downloads a page and parses its HTML using Beautiful Soup.
- We have a function `get_main_categoryList` that returns the list of category names.
- We have a function `get_all_catandsubcat_data` that returns the sub-categories (name and href) of every category.
- We have a function `scrape_single_category` that scrapes the selected category and returns the product details (Product Name, Buy Price, MRP, Discount) of all its sub-categories.
- `main` turns these rows into a pandas DataFrame and saves it as `<Category Name>.csv`. This is a single file (not a folder) containing the product details of every sub-category of the selected category.

In [17]:
def main(category_name):
    main_doc = get_BSoupdoc_fromUrl(main_url)                                  # download and parse the start page
    category_list = get_main_categoryList(main_doc)                            # list of category names
    subcategory_dictlist = get_all_catandsubcat_data(main_doc, category_list)  # sub-categories of every category
    data = scrape_single_category(subcategory_dictlist, category_name)        # product rows of the selected category
    subcat_data_df = pd.DataFrame(data)
    subcat_data_df.to_csv(category_name + '.csv', index=False)                 # one CSV file named after the category
    return subcat_data_df                                                      # so that `data = main(...)` holds the table
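`main` relies on pandas to write the file. As a dependency-free illustration of the resulting layout, the sketch below writes the same rows with the standard library's `csv.DictWriter`: one header line followed by one line per product, with no index column. It assumes the row dictionaries use the column names of the CSV shown in the introduction; `rows_to_csv_text` and `FIELDS` are hypothetical names introduced here.

```python
import csv
import io

# Column order of the category CSV, as shown in the introduction
FIELDS = ["Category Name", "Sub Category Name", "Product Name",
          "Buy Price", "MRP", "product_discount"]


def rows_to_csv_text(rows):
    """Serialise product dicts to CSV text with a fixed column order and no index column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


row = {"Category Name": "Diabetes Support",
       "Sub Category Name": "Diabetes Care - Ayurveda",
       "Product Name": "Kapiva Karela Jamun Juice 1 ltr",
       "Buy Price": "296.65", "MRP": "349.00", "product_discount": "15%"}
print(rows_to_csv_text([row]))
```

To write a real file instead, open it with `open(category_name + ".csv", "w", newline="", encoding="utf-8")` and pass that file object to `csv.DictWriter`.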

# Let's run it to scrape the sub-category product details of one category, passing the category name.

Category Names: [  'Veterinary', 'Ayush', 'Fitness', 'Mom & Baby', 'Sexual Wellness', 'Treatments', 'Devices', 'Health Conditions', 'Otc Deals', 'Eyewear', 'Covid Essentials', 'Surgical', 'Diabetes Support', 'Fragrances', 'Make-Up', 'Hair', "Men's Grooming", 'Skin Care', 'Tools & Appliances', 'Wellness', 'Personal Care'  ]

Input:
data = main("<Category Name>")   # replace <Category Name> with exactly one of the names listed above
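Because the category name is used both as a dictionary key and as the CSV file name, a typo such as "Diabetic Support" only fails after the start page has been downloaded. A small check before calling `main` catches this early. `resolve_category` is a hypothetical helper, and `CATEGORY_NAMES` is copied from the list above, so it may go out of date if the site changes its categories.

```python
# Category names as listed above (may change on the live site)
CATEGORY_NAMES = [
    'Veterinary', 'Ayush', 'Fitness', 'Mom & Baby', 'Sexual Wellness',
    'Treatments', 'Devices', 'Health Conditions', 'Otc Deals', 'Eyewear',
    'Covid Essentials', 'Surgical', 'Diabetes Support', 'Fragrances',
    'Make-Up', 'Hair', "Men's Grooming", 'Skin Care', 'Tools & Appliances',
    'Wellness', 'Personal Care',
]


def resolve_category(user_input, names=CATEGORY_NAMES):
    """Hypothetical helper: map free-form input to the exact category name.

    Matching ignores case and surrounding whitespace; anything else raises
    ValueError before any page is downloaded.
    """
    wanted = user_input.strip().casefold()
    for name in names:
        if name.casefold() == wanted:
            return name
    raise ValueError(
        f"Unknown category {user_input!r}; choose one of: {', '.join(names)}"
    )


print(resolve_category("  diabetes support "))
```

In the notebook this would be used as `data = main(resolve_category("diabetes support"))`.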

In [18]:
data = main("Diabetes Support")


#### Output

By running `data = main("Diabetes Support")`

##### Running it produces the following results:

1. A CSV file named `<Category Name>.csv` is created.

For example, `Diabetes Support.csv` is created.
![](https://i.imgur.com/gtAvvDB.png)


2. The CSV file contains the product details.

We selected 'Diabetes Support' as the category name.
For every sub-category of the selected category, the scraper follows the Next button until the last page and collects the product details from every page. The CSV file has the following columns:

Category Name, Sub Category Name, Product Name, Buy Price, MRP, product_discount


![](https://i.imgur.com/c0BxrL7.png)

## Check the CSV file

- The Diabetes Support category has the following sub-categories:

1. Diabetes Care - Ayurveda has no Next button on the webpage, so only page 1 was scraped.
2. Glucometers has no Next button on the webpage, so only page 1 was scraped.
3. Lancets & Test Strips has a Next button on the webpage, so pages 1 to 2 were scraped.
4. Sugar Substitutes has a Next button on the webpage, so pages 1 to 3 were scraped.
5. Diabetes Management Supplements has a Next button on the webpage, so pages 1 to 2 were scraped.
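A quick way to confirm that every sub-category listed above made it into the file is to count the rows per sub-category. With pandas this is `pd.read_csv('Diabetes Support.csv')['Sub Category Name'].value_counts()`; the sketch below does the same with the standard library. `rows_per_subcategory` is a hypothetical name, and the sample text is made up for illustration, not taken from the scraped file.

```python
import csv
import io
from collections import Counter


def rows_per_subcategory(csv_file):
    """Count how many product rows each sub-category contributed to a category CSV."""
    reader = csv.DictReader(csv_file)
    return Counter(row["Sub Category Name"] for row in reader)


# Illustrative sample; for the real file use:
#   with open("Diabetes Support.csv", newline="", encoding="utf-8") as f:
#       print(rows_per_subcategory(f))
sample = (
    "Category Name,Sub Category Name,Product Name,Buy Price,MRP,product_discount\n"
    "Diabetes Support,Glucometers,Product A,100.00,200.00,50%\n"
    "Diabetes Support,Glucometers,Product B,100.00,200.00,50%\n"
    "Diabetes Support,Sugar Substitutes,Product C,100.00,200.00,50%\n"
)
print(rows_per_subcategory(io.StringIO(sample)))
```

A sub-category that is missing from the counts, or one whose count matches a single page of results even though it has a Next button, would point to a pagination problem.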

In [19]:
pd.read_csv('Diabetes Support.csv')

| Unnamed: 0 | Category Name | Sub Category Name | Product Name | Buy Price | MRP | product_discount |
|---|---|---|---|---|---|---|
| 0 | Diabetes Support | Diabetes Care - Ayurveda | Kapiva Karela Jamun Juice 1 ltr | 296.65 | 349.00 | 15% |
| 1 | Diabetes Support | Diabetes Care - Ayurveda | Kapiva Dia Free Juice 1 ltr | 521.55 | 549.00 | 5% |
| 2 | Diabetes Support | Diabetes Care - Ayurveda | Kapiva Moringa Capsule 60's | 175.00 | 250.00 | 30% |
| 3 | Diabetes Support | Diabetes Care - Ayurveda | Kapiva Madhu Tula Green Tea Bag 20's | 359.20 | 449.00 | 20% |
| 4 | Diabetes Support | Diabetes Care - Ayurveda | Kapiva Dia Free Capsules 60's | 299.25 | 399.00 | 25% |
| ... | ... | ... | ... | ... | ... | ... |
| 115 | Diabetes Support | Diabetes Management Supplements | Ensure Diabetes Care Powder - Chocolate Flavou... | 710.00 | 00.00 | 00.00 |
| 116 | Diabetes Support | Diabetes Management Supplements | Evexia D Manage Vanilla Flavour Powder 200 gm | 226.20 | 290.00 | 22% |
| 117 | Diabetes Support | Diabetes Management Supplements | Nestle Resource Diabetic Powder - Vanilla Flav... | 356.25 | 375.00 | 5% |
| 118 | Diabetes Support | Diabetes Management Supplements | Nestle Resource Diabetic Powder Chocolate Flav... | 799.00 | 00.00 | 00.00 |
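The discount column can be cross-checked against the two prices: discount = (MRP − Buy Price) / MRP × 100. For the first row, (349.00 − 296.65) / 349.00 × 100 = 52.35 / 349.00 × 100 = 15, matching the scraped 15%. Rows such as 115 and 118 have MRP 00.00, which appears to mean no separate MRP was shown, so no discount can be computed for them. The helper names below are hypothetical, and stripping thousands separators in `parse_price` is an assumption about prices above 999, not something observed in this output.

```python
def parse_price(text):
    """Convert a scraped price string such as ' 296.65' to a float.

    Removing ',' assumes prices above 999 may be shown with thousands separators.
    """
    return float(text.strip().replace(",", ""))


def parse_discount(text):
    """Convert '15%' to 15.0; rows without a discount were scraped as '00.00'."""
    return float(text.strip().rstrip("%"))


def discount_percent(buy_price, mrp):
    """Whole-number percentage saved relative to MRP, or None when MRP is 0 (missing)."""
    if mrp <= 0:
        return None
    return round((mrp - buy_price) / mrp * 100)


# First row of the table: 296.65 against an MRP of 349.00
print(discount_percent(parse_price(" 296.65"), parse_price(" 349.00")))
```

Comparing `discount_percent(...)` with `parse_discount(...)` row by row flags any product where the scraped discount and the scraped prices disagree.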


# Saving the project on Jovian

In [20]:
# Execute this to save new versions of the notebook
jovian.commit(project="finalscrape-netmeds-com-project")


[jovian] Updating notebook "rahulhande2780/finalscrape-netmeds-com-project" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/rahulhande2780/finalscrape-netmeds-com-project


'https://jovian.ai/rahulhande2780/finalscrape-netmeds-com-project'

## Saving the project and uploading the additional CSV file to https://jovian.ai/rahulhande2780/finalscrape-netmeds-com-project

In [None]:
jovian.commit(files=['Diabetes Support.csv'])


# Summary

1. We defined the webpage as `main_url = "https://www.netmeds.com/non-prescriptions/diabetes-support"`.
2. We defined a function `get_BSoupdoc_fromUrl` that downloads the webpage using `requests` and parses it using `Beautiful Soup`.
3. We defined a function `get_main_categoryList` to get the list of category names.
4. We defined a function `getSubCategoryOfCategory` to get the list of sub-categories (name and href).
5. We defined a function `get_all_catandsubcat_data` to get all categories and their sub-categories.
6. We defined a function `get_productlist_of_subcategory` to get the Product Name, Buy Price, MRP and Discount, using `create_dict` to store the product details in a dictionary.
7. We defined a function `get_pagination_wise_subcategory_data` to get the product details of the current page and of every following page, as long as there is a Next button.
8. We defined a function `scrape_single_category` to compile the product details of every sub-category of one category into a list of dictionaries.
9. We defined a function `main`. Passing a category name to `main` saves `<Category Name>.csv`, which contains the product details of every sub-category of the selected category.



# Future Work
- Ideas for future work:
1. We can fetch the product image, product rating and reviews, Mkt, and country of origin.
2. We can fetch medicine data from https://www.netmeds.com/prescriptions.
3. We can fetch the official healthcare blog posts from https://www.netmeds.com/health-library.
4. We can fetch the products listed in the Wellness category.