# Web Scraping the Official AGMARKNET Website

In [1]:
# 🛠️ Install the libraries needed for web scraping and data handling.
# BeautifulSoup parses the HTML content; its PyPI package is `beautifulsoup4`, imported as `bs4`. 🌐
# %pip install -U beautifulsoup4

### 👀 Manual Data Verification Instructions 📋🔍

To manually verify whether onion price data for Uttar Pradesh exists on AGMARKNET: 🧅

1. Visit the official AGMARKNET "Search Reports" page: 🔗
   https://agmarknet.gov.in/SearchCmmMkt.aspx

2. In the form that opens: 📝
   - For 'Commodity', select "Onion".
   - For 'State', choose "Uttar Pradesh".
   - (Optional: select a District or Market, or leave these as 'All' for a broader search.) 🗺️
   - Choose the desired Date Range (for best results, use a recent week or month). 🗓️
   - Click the 'Submit' button to fetch results. ✅

3. If data is available, the page displays a results table (Date, Market, Variety, Min Price, Max Price, Modal Price, Arrival Qty). 📊

4. Interpret the result:
   - ✅ If rows appear with valid prices and arrival quantities, the data exists and can be scraped or downloaded. 🎉
   - ❌ If you see "No records found" or empty fields, no data has been posted for that period or region. 😔

5. You can also use the 'Download CSV' button on the results page to save a copy for inspection. 💾

⬆️ These steps let you confirm that real onion price data for UP exists BEFORE you run the code or automate the scraping. 🚀

#### 💡 Tip: Repeat this for different years or date ranges if you need historical data. 🕰️
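
For historical data, the date ranges to try can be generated rather than typed by hand. The sketch below is one possible way to split a longer period into calendar-month windows; the helper names `month_ranges` and `format_portal_date` are hypothetical, and the `DD-Mon-YYYY` output format is assumed from the date strings used in the configuration cell of this notebook.

```python
from datetime import date, timedelta

def month_ranges(start: date, end: date):
    """Yield (first_day, last_day) pairs covering start..end, one per calendar month."""
    current = start
    while current <= end:
        # The first day of the following month, minus one day, ends the current month.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        last_day = min(next_month - timedelta(days=1), end)
        yield current, last_day
        current = next_month

def format_portal_date(d: date) -> str:
    """Format as DD-Mon-YYYY, e.g. 01-Jul-2025 (assumes an English locale for %b)."""
    return d.strftime("%d-%b-%Y")

ranges = [
    (format_portal_date(first), format_portal_date(last))
    for first, last in month_ranges(date(2025, 5, 15), date(2025, 7, 31))
]
for date_from, date_to in ranges:
    print(date_from, "to", date_to)
```

Each pair can be checked manually on the portal, or passed as the start and end dates of an automated fetch, ideally with a pause between requests.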

In [3]:
# --- Section 1: Setup & Imports 📦 ---

# Let's get our essential tools ready! 🛠️
# We'll need these Python libraries to make web requests, parse HTML,
# and handle data like a pro. 📈

import requests                              # For making HTTP requests to websites 🌐
from bs4 import BeautifulSoup                # For parsing HTML content and navigating the DOM tree 🌳
import pandas as pd                          # For powerful data manipulation and analysis with DataFrames 📊
import os                                    # For interacting with the operating system (e.g., creating folders) 📁
import time                                  # For adding delays (important for polite scraping to avoid overwhelming servers! ⏳)
import urllib.parse                          # For URL encoding/decoding, used in AJAX response parsing and URL construction 🔗
from datetime import datetime, timedelta     # For handling dates and times precisely 📅

# Set pandas display options for clearer output in our notebook. ✨
pd.set_option('display.max_columns', 100)    # Show up to 100 columns 🔢
pd.set_option('display.width', 180)          # Widen the display for better readability of wide tables 📏

print("🚀 Essential libraries imported successfully!")
print("Pandas display options set for a cleaner view of our data. ✨")


🚀 Essential libraries imported successfully!
Pandas display options set for a cleaner view of our data. ✨


In [7]:
# --- ⚙️ Configuration & Constants ---
# Define the parameters for our data fetching operation. 🎯

BASE_URL_MAIN = "https://agmarknet.gov.in/SearchCmmMkt.aspx" # The main URL for AGMARKNET price search 🌐

# Commodity & state codes specific to the AGMARKNET portal 🧅🗺️
COMMODITY_CODE = "23"    # 23 = Onion
STATE_CODE = "UP"        # UP = Uttar Pradesh

# Set the date range for which we want to fetch the data 📅
# Uncomment the lines below to set both dates dynamically to "the day before yesterday".
# latest_date = (datetime.today() - timedelta(days=2)).strftime("%d-%b-%Y")
# DATE_FROM = latest_date
# DATE_TO = latest_date

# For now, we'll use a fixed date range for consistency. 🗓️
DATE_FROM = '01-Jul-2025'
DATE_TO = '31-Jul-2025'

# Define the data directory and the raw CSV file path 📁💾
DATA_DIR = "data"
# Format dates for the filename, e.g. '01Jul25_31Jul25'.
# Parsing the date (instead of removing every "20") keeps days such as '20-Jul-2025' intact.
formatted_date_from = datetime.strptime(DATE_FROM, "%d-%b-%Y").strftime("%d%b%y")
formatted_date_to = datetime.strptime(DATE_TO, "%d-%b-%Y").strftime("%d%b%y")
RAW_CSV_PATH = os.path.join(DATA_DIR, f"COMMODITY[{COMMODITY_CODE}]_{STATE_CODE}_{formatted_date_from}_{formatted_date_to}.csv")
AJAX_RESPONSE_PATH = os.path.join(DATA_DIR, f"debug_ajax_response_{formatted_date_from}_{formatted_date_to}.txt")

# Ensure the data directory exists; create it if it doesn't. ➕
os.makedirs(DATA_DIR, exist_ok=True)

print(f"""
⚙️ Configuration:
  Commodity: Onion (Code: {COMMODITY_CODE})
  State: Uttar Pradesh (Code: {STATE_CODE})
  Date Range: {DATE_FROM} to {DATE_TO}
  Data will be saved in: {RAW_CSV_PATH}
  AJAX response will be saved in: {AJAX_RESPONSE_PATH}
""")



⚙️ Configuration:
  Commodity: Onion (Code: 23)
  State: Uttar Pradesh (Code: UP)
  Date Range: 01-Jul-2025 to 31-Jul-2025
  Data will be saved in: data/COMMODITY[23]_UP_01Jul25_31Jul25.csv
  AJAX response will be saved in: data/debug_ajax_response_01Jul25_31Jul25.txt



In [41]:
# # --- 🧩 Parse AJAX response function ---
# # This function is designed to extract the relevant HTML fragment from the complex ASP.NET AJAX response. 📦

# def parse_ajax_response(ajax_text: str) -> str:
#     """
#     Parses ASP.NET pipe-delimited AJAX response to extract and decode the HTML fragment.
#     """
#     parts = ajax_text.split('|')
#     i = 0
#     while i < len(parts):
#         try:
#             length = int(parts[i])   # first is length of next block
#             update_type = parts[i+1] # usually '#' or 'updatePanel'
#             control_id = parts[i+2]
#             content = parts[i+3]
#             decoded = urllib.parse.unquote(content)
#             if "cphBody_GridPriceData" in decoded:
#                 return decoded
#             i += 4
#         except Exception:
#             i += 1
#     print("⚠ Could not parse AJAX properly. Returning raw text.")
#     return ajax_text
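
Splitting the whole response on `|` breaks as soon as an HTML fragment itself contains a pipe character. A minimal alternative sketch follows, assuming the response really uses the length-prefixed `length|type|id|content|` layout that the comments above describe; the function name `parse_delta_response` and the panel id `cphBody_UpdatePanel1` in the sample are hypothetical, and the sample string is synthetic rather than captured from the site.

```python
def parse_delta_response(text: str):
    """Split a 'length|type|id|content|' stream into (type, id, content) tuples."""
    blocks = []
    pos = 0
    while pos < len(text):
        # Read the three pipe-terminated header fields: length, type, id.
        fields = []
        for _ in range(3):
            end = text.index("|", pos)  # raises ValueError on malformed input
            fields.append(text[pos:end])
            pos = end + 1
        length = int(fields[0])
        # Take exactly `length` characters, so pipes inside the content are safe.
        content = text[pos:pos + length]
        pos += length
        if text[pos:pos + 1] != "|":
            raise ValueError("block is not terminated by '|'")
        pos += 1
        blocks.append((fields[1], fields[2], content))
    return blocks

# Synthetic example: the HTML fragment deliberately contains a '|'.
html = '<table id="cphBody_GridPriceData"><tr><td>a|b</td></tr></table>'
sample = f"{len(html)}|updatePanel|cphBody_UpdatePanel1|{html}|" + "0|hiddenField|__EVENTTARGET||"

blocks = parse_delta_response(sample)
fragment = next(c for _, _, c in blocks if "cphBody_GridPriceData" in c)
print(fragment)
```

Reading by declared length rather than by delimiter is the design choice here: the content is never scanned for separators, so its contents cannot confuse the parser.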


In [10]:
# --- 🤖 Fetch AGMARKNET data function ---
# This is the core function for scraping onion price data from AGMARKNET. 🧅💰

def fetch_agmarknet_data(commodity_code, state_code, date_from, date_to, verbose=True):
    """Fetch the AGMARKNET price table for one commodity, state, and date range.

    Returns a DataFrame, which is empty if no table was found or an error occurred.
    """
    session = requests.Session() # Create a session to persist cookies across requests 🤝
    df = pd.DataFrame() # Initialize an empty DataFrame to store results 📝

    if verbose:
        print(f"\n🚀 Starting data fetch for: {date_from} to {date_to}") # Inform the user about the process start 🚀

    try:
        # Step 1: GET the initial page to extract the ASP.NET ViewState and EventValidation tokens. 🔑
        r = session.get(BASE_URL_MAIN, timeout=15) # Make a GET request with a timeout ⏰
        r.raise_for_status() # Raise an HTTPError for bad responses (4xx or 5xx) 🚨
        soup = BeautifulSoup(r.text, "html.parser") # Parse the HTML content 🧐

        def get_val(name): # Helper function to get input field values 💡
            tag = soup.find("input", {"name": name}) # Find the input tag by its name attribute 🔎
            return tag.get("value", "") if tag else "" # Return its value, or an empty string if not found 📝

        viewstate = get_val("__VIEWSTATE") # Extract __VIEWSTATE 🔑
        viewstategenerator = get_val("__VIEWSTATEGENERATOR") # Extract __VIEWSTATEGENERATOR 🔑
        eventvalidation = get_val("__EVENTVALIDATION") # Extract __EVENTVALIDATION 🔑

        if verbose:
            print(f"📍 ViewState found: {bool(viewstate)}, EventValidation found: {bool(eventvalidation)}") # Report token discovery 🎯

        # Step 2: Prepare the payload for the POST request. 📤
        # These parameters mimic a form submission on the AGMARKNET website.
        payload = {
            "__EVENTTARGET": "btnSubmit",  # Crucial for triggering the form submission 🎯
            "__EVENTARGUMENT": "",         # Usually empty for simple button clicks
            "__VIEWSTATE": viewstate,      # Required ASP.NET token
            "__VIEWSTATEGENERATOR": viewstategenerator, # Required ASP.NET token
            "__EVENTVALIDATION": eventvalidation, # Required ASP.NET token
            "__LASTFOCUS": "",             # Often empty
            "ddlCommodity": commodity_code, # Our selected commodity (Onion) 🧅
            "ddlState": state_code,        # Our selected state (Uttar Pradesh) 🗺️
            "txtDate": date_from,          # Start date for the search 📅
            "txtToDate": date_to,          # End date for the search 📅
            "btnSubmit": "Submit"          # The submit button action ✅
        }

        # Query parameters for the URL, though the main data fetch is via POST. 🌐
        # Note: the two "...Head" labels are hard-coded for Onion / Uttar Pradesh;
        # update them if you pass different commodity or state codes.
        query_params = {
            "Tx_Commodity": commodity_code,
            "Tx_State": state_code,
            "Tx_District": "0", # "0" typically means 'All Districts'
            "Tx_Market": "0",   # "0" typically means 'All Markets'
            "DateFrom": date_from,
            "DateTo": date_to,
            "Fr_Date": date_from, # Redundant but included for robustness
            "To_Date": date_to,
            "Tx_Trend": "0",     # Unused in this context
            "Tx_CommodityHead": "Onion",
            "Tx_StateHead": "Uttar Pradesh",
            "Tx_DistrictHead": "--Select--",
            "Tx_MarketHead": "--Select--"
        }

        # Construct the full URL for the POST request. 🔗
        full_post_url = BASE_URL_MAIN + "?" + urllib.parse.urlencode(query_params)

        # Define HTTP headers to mimic a real browser request. 🛡️
        headers = {
            'User-Agent': 'Mozilla/5.0',
            'Referer': BASE_URL_MAIN,
            'Origin': 'https://agmarknet.gov.in',
            'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'
        }

        if verbose:
            print(f"📤 POST URL: {full_post_url}") # Display the URL for debugging 🖥️
            print(f"🔑 Payload: {payload}")      # Display the payload sent 📦

        # Step 3: Send the POST request to get the data. 🚀
        resp = session.post(full_post_url, data=payload, headers=headers, timeout=30) # Send the POST request with data and headers 📨
        resp.raise_for_status() # Check for HTTP errors again 🚨

        if verbose:
            print(f"✅ AJAX POST status: {resp.status_code}") # Show response status code 👍
            print(f"📝 Response length: {len(resp.text)}")     # Show the length of the response text 📏

        # Save raw response for inspection during debugging. 💾
        with open(AJAX_RESPONSE_PATH, "w", encoding="utf-8") as f:
            f.write(resp.text)

        # Step 4: Parse the response HTML directly and look for the results table
        # (id "cphBody_GridPriceData"). If the table is missing, no data can be extracted.
        soup_result = BeautifulSoup(resp.text, "html.parser")
        table = soup_result.find("table", {"id": "cphBody_GridPriceData"})

        if table: # If the table is found 🎉
            # Extract table headers (named column_names to avoid shadowing the HTTP headers dict).
            column_names = [th.get_text(strip=True) for th in table.find("tr").find_all("th")] # Get column headers 🏷️
            # Extract table rows (excluding the header row).
            rows = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in table.find_all("tr")[1:]] # Get all data rows 📝
            df = pd.DataFrame(rows, columns=column_names) # Create a Pandas DataFrame from the extracted data 📊
            print(f"✅ Found table. Rows: {len(df)}") # Report success and row count 👍
        else:
            print("❌ Table not found. Possible reasons: wrong payload, no data, or structure change.") # Log failure reasons 😔
            # Check the raw response for the portal's "No records found" message. ℹ️
            if "No records found" in resp.text:
                print("ℹ️ Detected: No records found message.")

    except Exception as e:
        print(f"⚠️ Error during data fetching: {e}") # Catch and report any exceptions during the process ❗
    finally:
        session.close() # Always close the requests session to release resources. 🧹

    return df # Return the DataFrame (might be empty if no data was found or an error occurred) 🔄
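
Step 1 of the function depends on copying the hidden form tokens from the first page into the POST payload. The sketch below illustrates that idea in isolation. It swaps in the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages, collects every hidden input rather than three named ones, and uses a made-up HTML form with placeholder token values; `HiddenFieldCollector` is a hypothetical name.

```python
from html.parser import HTMLParser

class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Keep only named hidden inputs; a missing value becomes an empty string.
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""

# Made-up form for illustration; real token values are long opaque strings.
sample_html = """
<form method="post" action="./SearchCmmMkt.aspx">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" value="DEADBEEF" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="txtDate" value="01-Jul-2025" />
</form>
"""

collector = HiddenFieldCollector()
collector.feed(sample_html)
tokens = collector.fields
print(sorted(tokens))
```

Collecting all hidden fields at once means the payload can be seeded with `dict(tokens)` and then extended with the visible form fields, which may be more tolerant of the page adding further hidden inputs.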

In [11]:
# --- 📝 Run scraper & save to CSV ---
# Execute the data fetching process and handle the results. 🚀💾

df_data = fetch_agmarknet_data(
    COMMODITY_CODE,        # Pass the onion commodity code 🧅
    STATE_CODE,            # Pass the Uttar Pradesh state code 🗺️
    DATE_FROM,             # Start date for the data fetch 📅
    DATE_TO,               # End date for the data fetch 📅
    verbose=False          # False for cleaner output during execution, True for detailed logs 🤫
)

if not df_data.empty: # Check if the DataFrame contains any data 📥
    df_data.to_csv(RAW_CSV_PATH, index=False, encoding='utf-8') # Save the DataFrame to a CSV file 📄
    print(f"✅ Data saved to: {RAW_CSV_PATH}") # Confirm successful save 🎉
    display(df_data.head()) # Display the first few rows of the fetched data for a quick look 👀
else:
    print("⚠️ No data fetched. Please check configuration or website availability.") # Inform if no data was retrieved 😔

✅ Found table. Rows: 4965
✅ Data saved to: data/COMMODITY[23]_UP_01Jul25_31Jul25.csv


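|   | Sl no. | District Name | Market Name | Commodity | Variety | Grade | Min Price (Rs./Quintal) | Max Price (Rs./Quintal) | Modal Price (Rs./Quintal) | Price Date |
|---|--------|---------------|-------------|-----------|---------|-------|-------------------------|-------------------------|---------------------------|-------------|
| 0 | 1 | Auraiya | Achalda | Onion | Red | FAQ | 1200 | 1350 | 1300 | 01 Jul 2025 |
| 1 | 2 | Auraiya | Achalda | Onion | Red | FAQ | 1200 | 1350 | 1300 | 02 Jul 2025 |
| 2 | 3 | Auraiya | Achalda | Onion | Red | FAQ | 1200 | 1350 | 1300 | 15 Jul 2025 |
| 3 | 4 | Auraiya | Achalda | Onion | Red | FAQ | 1250 | 1450 | 1350 | 22 Jul 2025 |
| 4 | 5 | Auraiya | Achalda | Onion | Red | FAQ | 1250 | 1450 | 1350 | 30 Jul 2025 |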
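
Because every cell is read with `get_text`, the scraped prices and dates arrive as strings. The sketch below shows one way to convert them to integers and dates, using two of the rows displayed above and only the standard library. The `clean_row` helper is hypothetical, and it assumes every price cell holds a plain integer and every date follows the `DD Mon YYYY` pattern seen in this output.

```python
import csv
import io
from datetime import datetime

# Two rows copied from the output above (index column omitted).
sample_csv = """Sl no.,District Name,Market Name,Commodity,Variety,Grade,Min Price (Rs./Quintal),Max Price (Rs./Quintal),Modal Price (Rs./Quintal),Price Date
1,Auraiya,Achalda,Onion,Red,FAQ,1200,1350,1300,01 Jul 2025
4,Auraiya,Achalda,Onion,Red,FAQ,1250,1450,1350,22 Jul 2025
"""

PRICE_COLUMNS = [
    "Min Price (Rs./Quintal)",
    "Max Price (Rs./Quintal)",
    "Modal Price (Rs./Quintal)",
]

def clean_row(row: dict) -> dict:
    """Return a copy of the row with prices as int and the date as datetime.date."""
    cleaned = dict(row)
    for col in PRICE_COLUMNS:
        cleaned[col] = int(row[col])  # raises ValueError if a cell is not numeric
    cleaned["Price Date"] = datetime.strptime(row["Price Date"], "%d %b %Y").date()
    return cleaned

records = [clean_row(r) for r in csv.DictReader(io.StringIO(sample_csv))]
for rec in records:
    print(rec["Price Date"].isoformat(), rec["Modal Price (Rs./Quintal)"])
```

On the DataFrame itself, the same conversion could be done with `pd.to_numeric` on the three price columns and `pd.to_datetime(df_data["Price Date"], format="%d %b %Y")`.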
