# JOSAA-scrape
A Python script to scrape the JOSAA opening and closing rank (OR/CR) page at the [JOSAA Result Archive](https://josaa.nic.in/Result/Result/OpeningClosingRankArchieve.aspx).

Uses `requests`, `BeautifulSoup`, and `pandas`.

I do not own this data, nor am I liable for any consequences of its usage.

In [22]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from io import StringIO

`params` is a dictionary of the form fields and their values, copied from the browser's DevTools > Network > Params.

The page is an ASP.NET form, so a `__VIEWSTATE` and an `__EVENTVALIDATION` value are assigned to every session. Every dropdown selection adds some encoded information to these values, so the selections cannot all be sent in a single request; each one has to be posted in turn.

We can read the updated values from the hidden `__VIEWSTATE` and similar inputs in the page returned by each POST request.
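As a rough illustration of that step, the sketch below pulls the hidden inputs out of a page fragment. The `hidden_fields` helper and the `sample_html` fragment are hypothetical and only stand in for a real response body; the parser is switched to the built-in `html.parser` so the snippet runs without `lxml`.

```python
from bs4 import BeautifulSoup


def hidden_fields(html):
    """Return {name: value} for every <input> whose name starts with '__'."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        tag["name"]: tag.get("value", "")
        for tag in soup.select('input[name^="__"]')
    }


# Hypothetical page fragment, not copied from the real site.
sample_html = """
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="ctl00$ContentPlaceHolder1$ddlYear" value="2024" />
</form>
"""

state = hidden_fields(sample_html)
print(state)  # only the two inputs whose names start with '__'
```

Merging this dictionary into the form data before each POST is what keeps the server-side state in step with the selections made so far.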

The `josaa_scrape()` function returns a pandas DataFrame containing the OR/CR for a specific year and round.

`pandas` converts the OR/CR to `float`s, so I typecast them back as `int`s.
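One plausible cause, assuming the scraped table contains completely empty rows, is that a column holding `NaN` cannot be stored as a plain integer column, so pandas falls back to `float64`. The small sketch below uses made-up ranks to show the effect and the cast back to integers once the empty rows are dropped.

```python
import numpy as np
import pandas as pd

# Made-up ranks; the all-NaN row mimics an empty row in the scraped table.
raw = pd.DataFrame({
    "Opening Rank": [1.0, np.nan, 250.0],
    "Closing Rank": [67.0, np.nan, 1024.0],
})
print(raw.dtypes)  # both columns are float64 because of the NaN row

# Drop rows that are entirely empty, then cast the rank columns to integers.
clean = raw.dropna(how="all").astype({"Opening Rank": int, "Closing Rank": int})
print(clean.dtypes)
```

The cast only succeeds after the empty rows are gone, since `astype(int)` raises an error on a column that still contains `NaN`.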

2016 only had 6 rounds for some reason, so the ugly adjustments were required in `years` and `rounds`.

In [23]:
url = 'https://josaa.admissions.nic.in/applicant/seatmatrix/openingclosingrankarchieve.aspx'

params = {
    "ctl00$ContentPlaceHolder1$ddlInstype": "ALL",
    "ctl00$ContentPlaceHolder1$ddlInstitute": "ALL",
    "ctl00$ContentPlaceHolder1$ddlBranch": "ALL",
    "ctl00$ContentPlaceHolder1$ddlSeatType": "OPNO",
    "ctl00$ContentPlaceHolder1$btnSubmit": "Submit"
}

In [24]:
def josaa_scrape(year, Round):
    """
    Return the OR/CR table for one year and round as a DataFrame.

    Sample usage: df = josaa_scrape("2018", "1")
    df.info()
    """
    def hidden_fields(response):
        # ASP.NET state inputs (__VIEWSTATE, __EVENTVALIDATION, ...) from the latest page
        soup = BeautifulSoup(response.content, 'lxml')
        return {tag['name']: tag['value'] for tag in soup.select('input[name^=__]')}

    with requests.Session() as s:
        R = s.get(url)
        data = {}

        # Select the year, then the round, then each remaining field: one POST per selection
        selections = [
            ("ctl00$ContentPlaceHolder1$ddlYear", year),
            ("ctl00$ContentPlaceHolder1$ddlroundno", Round),
            *params.items(),
        ]
        for key, value in selections:
            # Refresh the state fields from the previous response before posting
            data.update(hidden_fields(R))
            data[key] = value
            R = s.post(url, data=data)

    table = BeautifulSoup(R.text, 'lxml').find(id='ctl00_ContentPlaceHolder1_GridView1')
    if table is None:
        raise ValueError(f"No result table found for year {year}, round {Round}")
    df = pd.read_html(StringIO(table.prettify()))[0]
    df.dropna(inplace=True, how="all")  # drop completely empty rows

    df["Year"] = year
    df["Round"] = Round
    # pandas parses the ranks as floats, so cast them back to ints
    df['Opening Rank'] = df['Opening Rank'].astype(int)
    df['Closing Rank'] = df['Closing Rank'].astype(int)
    return df


In [25]:
# List of years and number of rounds available for each
year_rounds = [
    {"year": 2024, "rounds": 5},
    {"year": 2023, "rounds": 6},
    {"year": 2022, "rounds": 6},
    {"year": 2021, "rounds": 6}
    # Add more years/rounds if needed
]

In [26]:
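import os

# to_csv does not create missing directories, so make sure the output folder exists
os.makedirs("data", exist_ok=True)

for yr in year_rounds:
    year = yr["year"]
    num_rounds = yr["rounds"]

    for rnd in range(1, num_rounds + 1):
        print(f"📥 Scraping Year: {year}, Round: {rnd}")
        df = josaa_scrape(year, rnd)
        # Save the DataFrame for this year and round
        df.to_csv(f"data/{year}-{rnd}.csv", index=False)
print("All Scrapped")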


📥 Scraping Year: 2024, Round: 1
📥 Scraping Year: 2024, Round: 2
📥 Scraping Year: 2024, Round: 3
📥 Scraping Year: 2024, Round: 4
📥 Scraping Year: 2024, Round: 5
📥 Scraping Year: 2023, Round: 1
📥 Scraping Year: 2023, Round: 2
📥 Scraping Year: 2023, Round: 3
📥 Scraping Year: 2023, Round: 4
📥 Scraping Year: 2023, Round: 5
📥 Scraping Year: 2023, Round: 6
📥 Scraping Year: 2022, Round: 1
📥 Scraping Year: 2022, Round: 2
📥 Scraping Year: 2022, Round: 3
📥 Scraping Year: 2022, Round: 4
📥 Scraping Year: 2022, Round: 5
📥 Scraping Year: 2022, Round: 6
📥 Scraping Year: 2021, Round: 1
📥 Scraping Year: 2021, Round: 2
📥 Scraping Year: 2021, Round: 3
📥 Scraping Year: 2021, Round: 4
📥 Scraping Year: 2021, Round: 5
📥 Scraping Year: 2021, Round: 6
All Scrapped


In [None]:
# for year in years:
#     for Round in rounds:
#         josaa_scrape(year, Round).to_csv(path_or_buf= year + "-" + Round + ".csv", index=False)