# 1. Importing Required Libraries
* Selenium automates browser interaction (e.g., selecting dropdown options, clicking buttons).
* BeautifulSoup parses the page's HTML to extract specific data.
* Pandas stores the extracted data in tabular form (a DataFrame).
* time.sleep pauses between actions so the website can finish loading before the next step.

In [1]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
import time
from bs4 import BeautifulSoup
import pandas as pd


# 2. Setting up the Chrome Driver
* driver_path: path to your downloaded ChromeDriver executable.
* Service and Options configure and initialize the Chrome browser for automation.

In [2]:
driver_path = "C:/Users/Asus/Data Science/Web Scrapping/chromedriver.exe"
service = Service(driver_path)
options = Options()

# 3. Launching the Browser and Navigating to FCI Website
* Opens the Chrome browser and navigates to the Food Corporation of India (FCI) homepage.

In [3]:
driver = webdriver.Chrome(service=service, options=options)
driver.get("https://fci.gov.in/")

# 4. Clicking the “Translate” Button to Enable English UI
* Finds the “Translate” button using an XPath and clicks it.
* time.sleep(2) pauses for 2 seconds to give the page time to reload in English.

In [4]:
translate_button = driver.find_element(By.XPATH,'//*[@id="offcanvasSetting"]/div[2]/ul/li[3]/div/a')
translate_button.click()
time.sleep(2)
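
A fixed pause can be too short on a slow connection and wastefully long on a fast one. As a rough sketch of the alternative, the hypothetical helper `wait_until` below (not part of the original notebook) polls a condition until it becomes true or a timeout expires. Selenium also ships its own explicit-wait class, `WebDriverWait` in `selenium.webdriver.support.ui`, built on the same idea.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    The name and defaults are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical use with the notebook's driver (left as a comment because it
# needs a live browser):
# wait_until(lambda: len(Select(driver.find_element(By.ID, "region")).options) > 1)
```

Whether the condition in the comment suits the FCI page is an assumption. Any check that only becomes true once the page has updated would serve.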

# 5. Defining All Zone and Region Pairs
* This list contains all combinations of Zones and Regions you want to scrape.
* Each tuple (zone, region) represents one selection to make and scrape.

In [5]:
# Defining list of (zone, region) pairs
zone_region_pairs = [                       # Add more (zone, region) pairs as needed
    ("East Zone", "Bihar Region"),("East Zone", "Jharkhand Region"),("East Zone", "Odisha Region"), ("East Zone", "WB Region"), 
    ("North East Zone", "Arunachal Region"),("North East Zone", "Assam Region"),("North East Zone", "Manipur Region"),
    ("North East Zone", "Nagaland Region"),("North East Zone", "NEF Region"),
    ("North Zone","Delhi Region" ), ("North Zone", "Haryana Region"), ("North Zone","HP Region"), ("North Zone","J&K Region"),
    ("North Zone", "Punjab Region"),("North Zone","Rajasthan Region"),("North Zone","UP Region"),("North Zone","Uttarakhand Region"),
    ("South Zone","A P Region" ),("South Zone","Karnataka Region" ),("South Zone","Kerala Region" ),("South Zone","Tamil Nadu Region" ),
    ("South Zone","Telangana Region" ),
    ("West Zone","Chhattisgarh Region" ),("West Zone","Gujarat Region" ),("West Zone","Maharastra Region" ),("West Zone", "MP Region")
]
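
If the flat list becomes hard to maintain, one possible alternative is to keep a mapping from each zone to its regions and flatten it into tuples. The sketch below uses only two zones, and the names `regions_by_zone` and `pairs` are illustrative rather than taken from the notebook.

```python
# Hypothetical alternative layout: one entry per zone, listing its regions.
regions_by_zone = {
    "East Zone": ["Bihar Region", "Jharkhand Region", "Odisha Region", "WB Region"],
    "North East Zone": [
        "Arunachal Region", "Assam Region", "Manipur Region",
        "Nagaland Region", "NEF Region",
    ],
}

# Flatten the mapping into the same (zone, region) tuples the loop expects.
pairs = [
    (zone, region)
    for zone, regions in regions_by_zone.items()
    for region in regions
]

print(len(pairs))   # number of (zone, region) combinations
print(pairs[0])     # first pair in insertion order
```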

# 6. Creating a List to Store Extracted Data
* An empty list where each region's scraped data (as a dictionary) will be stored.

In [6]:
all_data = []

# 7. Looping Through Each Zone-Region Combination
* For each pair, the code will:
1. Select the zone
2. Select the region
3. Scrape the data
# 8. Selecting the Zone from Dropdown
* Finds the dropdown with id="zone", wraps it in a Select object, and selects the appropriate zone using its visible text.
* Waits 3 seconds for the dependent Region dropdown to update.
# 9. Selecting the Region from Dropdown
* Same as above, but for the Region dropdown that updates based on the selected zone.
# 10. Getting the Currently Selected Zone and Region
* Reads back the selected dropdown text to include in the final output.
# 11. Getting the HTML Content of the Page
* Gets the current HTML source from the browser and parses it using BeautifulSoup.
# 12. Extracting the Required Data
1. titles: These are labels like "Total No. Of Depots", "Total Stock", etc.
2. values: These are corresponding numbers like 48, "4,69,494 MT", etc.
3. .strip() removes unnecessary whitespace.
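
The extraction step can be tried offline on a small HTML fragment. The markup below is made up to imitate the class names used in this notebook and is not copied from the FCI site. The sketch also matches on a single class (`result-text`, `result-no`) rather than the full class string, which may be less brittle if the site reorders or adds classes.

```python
from bs4 import BeautifulSoup

# Made-up HTML imitating the structure implied by the notebook's selectors.
sample_html = """
<div>
  <p class="result-no mt-0 mb-1 screen-reader font-adjust"> 48 </p>
  <p class="result-text mb-0 screen-reader font-adjust"> Total No. Of Depots </p>
  <p class="result-no mt-0 mb-1 screen-reader font-adjust"> 4,69,494 MT </p>
  <p class="result-text mb-0 screen-reader font-adjust"> Total Capacity </p>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Passing one class name matches any <p> that carries it among its classes.
titles = [p.get_text(strip=True) for p in soup.find_all("p", class_="result-text")]
values = [p.get_text(strip=True) for p in soup.find_all("p", class_="result-no")]

print(titles)
print(values)
```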
# 13. Combining Titles and Values into a Row
* If the number of titles and values match:
1. Create a row dictionary.
2. Assign the values to their titles as keys.
3. Add it to all_data.
# 14. Handling Mismatch (Just in Case)
* Prints a warning if the number of titles and values doesn't match (i.e., something went wrong in loading or parsing).
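
The combine-and-check logic from the two steps above can be isolated in a small function, which makes it easy to test without a browser. This is a minimal sketch; `build_row` is a hypothetical name, and returning `None` on a mismatch is one possible convention, not the notebook's own.

```python
def build_row(zone, region, titles, values):
    """Return a row dict, or None when titles and values do not line up."""
    if len(titles) != len(values):
        return None
    row = {"Zone": zone, "Region": region}
    # Pair each title with the value at the same position.
    row.update(zip(titles, values))
    return row


row = build_row(
    "East Zone",
    "Jharkhand Region",
    ["Total No. Of Depots", "Total Capacity"],
    ["48", "4,69,494 MT"],
)
print(row)
```

The caller can then append the row when it is not `None` and print the warning otherwise.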
# 15. Creating a DataFrame and Exporting to CSV
* Converts the list of dictionaries all_data into a pandas DataFrame.
* Saves the DataFrame to FCI_zone_region_data.csv in the current working directory, without the index column.

In [7]:
# List to collect all data
all_data = []

# Loop through zone-region combinations
for zone_name, region_name in zone_region_pairs:
    # Select zone
    zone_dropdown = Select(driver.find_element(By.ID, 'zone'))
    zone_dropdown.select_by_visible_text(zone_name)
    time.sleep(3)

    # Select region
    region_dropdown = Select(driver.find_element(By.ID, 'region'))
    region_dropdown.select_by_visible_text(region_name)
    time.sleep(3)

    # Extract selected zone and region
    selected_zone = zone_dropdown.first_selected_option.text
    selected_region = region_dropdown.first_selected_option.text

    # Parse page
    soup = BeautifulSoup(driver.page_source, 'html.parser')

    # Extract data
    titles = [d.text.strip() for d in soup.find_all("p", class_="result-text mb-0 screen-reader font-adjust")]
    values = [d.text.strip() for d in soup.find_all("p", class_="result-no mt-0 mb-1 screen-reader font-adjust")]

    # Combine into row
    if len(titles) == len(values):
        row = {"Zone": selected_zone, "Region": selected_region}
        for title, value in zip(titles, values):
            row[title] = value
        all_data.append(row)
    else:
        print(f"⚠️ Mismatch in titles and data for {selected_zone} - {selected_region}")

# Convert to DataFrame
df = pd.DataFrame(all_data)

# Display the DataFrame (it is saved to CSV in the next cell)
df

| | Zone | Region | Total No. Of Depots | Total Capacity | Rice Stock | Wheat Stock | Total Stock |
|---|---|---|---|---|---|---|---|
| 0 | East Zone | Bihar Region | 80 | 11,54,779 MT | 2,80,023 MT | 7,09,644 MT | 9,89,667 MT |
| 1 | East Zone | Jharkhand Region | 48 | 4,69,494 MT | 2,45,937 MT | 95,124 MT | 3,41,061 MT |
| 2 | East Zone | Odisha Region | 41 | 5,84,467 MT | 4,31,371 MT | 42,864 MT | 4,74,235 MT |
| 3 | East Zone | WB Region | 30 | 8,40,319 MT | 1,37,826 MT | 6,16,519 MT | 7,54,344 MT |
| 4 | North East Zone | Arunachal Region | 28 | 63,176 MT | 37,071 MT | 0 MT | 37,071 MT |
| 5 | North East Zone | Assam Region | 35 | 5,30,559 MT | 3,59,576 MT | 67,621 MT | 4,27,197 MT |
| 6 | North East Zone | Manipur Region | 9 | 64,728 MT | 51,106 MT | 0 MT | 51,106 MT |
| 7 | North East Zone | Nagaland Region | 6 | 57,083 MT | 44,726 MT | 0 MT | 44,726 MT |
| 8 | North East Zone | NEF Region | 23 | 1,31,822 MT | 76,951 MT | 11,386 MT | 88,337 MT |
| 9 | North Zone | Delhi Region | 6 | 3,27,629 MT | 1,00,174 MT | 1,43,414 MT | 2,43,588 MT |


In [8]:
df.to_csv("FCI_zone_region_data.csv", index=False)
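
The scraped quantities are stored as text such as "4,69,494 MT", so they cannot be summed or sorted numerically as they stand. A minimal sketch of one way to convert them follows. `parse_quantity` is a hypothetical helper, and it assumes every value is a whole number with optional commas and an optional "MT" suffix.

```python
def parse_quantity(text):
    """Convert a string such as '4,69,494 MT' to the integer 469494."""
    # Drop the unit suffix and the digit-group separators, then trim spaces.
    number = text.replace("MT", "").replace(",", "").strip()
    return int(number)


for raw in ["4,69,494 MT", "0 MT", "80"]:
    print(raw, "->", parse_quantity(raw))
```

With pandas, the helper could be applied to a column through `Series.map`, for example `df["Total Stock"].map(parse_quantity)`, provided every cell in that column follows the assumed format.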