### Problem Statement: Automated Web Scraping of Financial Data from NSE India Website

## Description: 

This project develops an automated web scraper, built with Selenium, that extracts financial data from the NSE India website (https://www.nseindia.com/). The program fetches the list of companies letter by letter (A to Z), navigates to each company's page, and scrapes Total Income, Profit Before Tax, and Net Profit/Loss for the quarter ended 31 March 2023. The scraped data is stored in a CSV file for further analysis. To avoid duplicating data when the program is run more than once, the scraper skips companies that are already present in the CSV file, so a rerun resumes where the previous run left off.
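
As a rough illustration of the CSV output described above, the sketch below writes one record per company using the same column order as the scraper's output row. The header names, the `append_record` helper, and the sample values are assumptions made for illustration; the scraper itself writes no header row, and the figures shown are placeholders rather than real NSE data.

```python
import csv
import io

# Column order mirrors the row written by the scraper.
# The header names themselves are illustrative assumptions.
FIELDNAMES = [
    "company_name",
    "first_letter",
    "total_income",
    "profit_before_tax",
    "net_profit_loss",
]


def append_record(csvfile, record, write_header=False):
    """Append one company's figures to an open CSV file object (hypothetical helper)."""
    writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
    if write_header:
        writer.writeheader()
    writer.writerow(record)


# Demonstration with an in-memory buffer and placeholder values.
buffer = io.StringIO()
append_record(
    buffer,
    {
        "company_name": "EXAMPLE LTD",
        "first_letter": "E",
        "total_income": "100.00",
        "profit_before_tax": "20.00",
        "net_profit_loss": "15.00",
    },
    write_header=True,
)
print(buffer.getvalue())
```

The values are kept as strings because that is how Selenium returns cell text; converting them to numbers is left to the analysis step.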
                


In [None]:
import csv
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

def get_companies():
    """
    Fetches the list of companies alphabetically from the NSE India website.

    Returns:
        companies (list): List of companies fetched from the website.
                          Each company is represented as a tuple (company_name, first_letter).
    """
    # Initialize ChromeDriver with options
    options = Options()
    options.add_argument("--headless")  # Run in headless mode (no visible browser window)
    service = Service('chromedriver_win32')  # Path to the ChromeDriver executable
    driver = webdriver.Chrome(service=service, options=options)
    driver.implicitly_wait(10)  # Retry element lookups for up to 10 seconds while the page loads

    # Open the NSE India website
    driver.get("https://www.nseindia.com/")

    # Find and interact with the search bar
    search_bar = driver.find_element(By.XPATH, '//*[@id="header-search-input"]')
    search_bar.click()

    # Fetch the list of companies alphabetically
    companies = []
    for letter in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
        search_bar.clear()
        search_bar.send_keys(letter)
        company_dropdown = driver.find_element(By.XPATH, '/html/body/header/nav/div[1]/div[1]/div/div[1]/button/img')
        company_dropdown.click()
        company_list = driver.find_elements(By.XPATH, '//*[@id="headerSearchData"]/div[1]/ul/li')
        for company in company_list:
            company_name = company.text
            companies.append((company_name, letter))

    # Close the browser
    driver.quit()

    return companies

def scrape_company_data(company_name):
    """
    Scrapes financial data for a given company from the NSE India website and stores it in a CSV file.

    Args:
        company_name (tuple): (company_name, first_letter) tuple, as returned by get_companies().
    """
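    # Check if the data already exists in the CSV file.
    # On the first run the file does not exist yet, so there is nothing to skip.
    try:
        with open('company_data.csv', 'r', newline='') as csvfile:
            reader = csv.reader(csvfile)
            for row in reader:
                # Ignore blank rows so that row[0] cannot raise IndexError
                if row and row[0] == company_name[0]:
                    print(f"Skipping {company_name[0]} as data already exists")
                    return
    except FileNotFoundError:
        pass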

    # Initialize ChromeDriver with options
    options = Options()
    options.add_argument("--headless")  # Run in headless mode (no visible browser window)
    service = Service('chromedriver_win32')  # Path to the ChromeDriver executable
    driver = webdriver.Chrome(service=service, options=options)
    driver.implicitly_wait(10)  # Retry element lookups for up to 10 seconds while the page loads

    try:
        # Open the NSE India website
        driver.get("https://www.nseindia.com/")

        # Find and interact with the search bar
        search_bar = driver.find_element(By.XPATH, '//*[@id="header-search-input"]')
        search_bar.click()
        search_bar.send_keys(company_name[0])

        # Select the first company from the dropdown (the implicit wait covers the delay)
        company_dropdown = driver.find_element(By.XPATH, '/html/body/header/nav/div[1]/div[1]/div/div[1]/button/img')
        company_dropdown.click()

        # Find and click on the "Corporate Information" tab
        corporate_info_tab = driver.find_element(By.XPATH, '//*[@id="corporate_info"]')
        corporate_info_tab.click()

        # Find and click on the "Financial Results" navigation bar
        financial_results_tab = driver.find_element(By.XPATH, '//*[@id="financialResults"]')
        financial_results_tab.click()

        # The first table row holds the quarter to check and its figures
        row_xpath = '//*[@id="financialResultsTable"]/table/tbody/tr[1]'
        quarter_ended = driver.find_element(By.XPATH, f'{row_xpath}/td[1]').text
        if quarter_ended == "31 Mar 2023":
            # Scrape the required financial data
            total_income = driver.find_element(By.XPATH, f'{row_xpath}/td[3]').text
            profit_before_tax = driver.find_element(By.XPATH, f'{row_xpath}/td[4]').text
            net_profit_loss = driver.find_element(By.XPATH, f'{row_xpath}/td[5]').text

            # Store the scraped data in a CSV file
            with open('company_data.csv', 'a', newline='') as csvfile:
                writer = csv.writer(csvfile)
                writer.writerow([company_name[0], company_name[1], total_income, profit_before_tax, net_profit_loss])
        else:
            print(f"No data available for {company_name[0]}")
    finally:
        # Close the browser even if a lookup above raised an exception
        driver.quit()

def main():
    """
    Main function to scrape financial data for multiple companies and store it in a CSV file.
    """
    # Fetch the list of companies
    companies = get_companies()

    # Scrape data for the companies and store it in a CSV file
    for company in companies:
        scrape_company_data(company)

# Run the main function
if __name__ == "__main__":
    main()


The code above implements the scraper with Selenium. It first fetches the company list by typing each letter from A to Z into the search bar and collecting the suggestions as (company_name, first_letter) tuples. For each company, it then checks whether company_data.csv already contains a row for that name and skips the company if so; this check is what lets a rerun continue where the previous run stopped without collecting duplicates. Otherwise it searches for the company, opens the "Corporate Information" tab, and clicks the "Financial Results" navigation bar. If the first row of the results table is for the quarter ended 31 Mar 2023, it extracts Total Income, Profit Before Tax, and Net Profit/Loss and appends them to the CSV file. If not, it prints a message and writes nothing, so that company is checked again on the next run.
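
The resume check described above re-reads the CSV file once per company. One possible alternative, sketched below, is to load the already-scraped names into a set once and filter the company list before any browser is started. The helper names `load_scraped_names` and `pending_companies` are hypothetical and not part of the original code, and the company names are placeholders.

```python
import csv
import os
import tempfile


def load_scraped_names(csv_path):
    """Return the set of company names already in the CSV (empty if the file is missing)."""
    try:
        with open(csv_path, newline="") as csvfile:
            # The company name is the first column; blank rows are ignored.
            return {row[0] for row in csv.reader(csvfile) if row}
    except FileNotFoundError:
        return set()


def pending_companies(companies, scraped_names):
    """Keep only the (company_name, first_letter) tuples that still need scraping."""
    return [company for company in companies if company[0] not in scraped_names]


# Demonstration with a temporary file and placeholder company names.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "company_data.csv")
    first_run = load_scraped_names(path)  # file does not exist yet
    with open(path, "w", newline="") as csvfile:
        csv.writer(csvfile).writerow(["ALPHA LTD", "A", "100.00", "20.00", "15.00"])
    second_run = load_scraped_names(path)  # now contains one scraped company

remaining = pending_companies([("ALPHA LTD", "A"), ("BETA LTD", "B")], second_run)
print(remaining)
```

With this approach, `main()` could call `scrape_company_data` only for the companies returned by `pending_companies`, so the set lookup replaces a full file scan per company.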