# JOSAA-scrape
A Python script to scrape the JOSAA opening/closing rank (OR/CR) page at the [JOSAA Result Archive](https://josaa.nic.in/Result/Result/OpeningClosingRankArchieve.aspx).

Uses `requests`, `BeautifulSoup` (with the `lxml` parser) and `pandas`.

I do not own this data, nor am I liable for any consequences of its usage.

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

`params` is a dictionary of form fields, copied from the browser's DevTools > Network > Params panel.

The page is an ASP.NET form, so a `__VIEWSTATE` and an `__EVENTVALIDATION` value are assigned to every session. Every dropdown selection adds some encoded information to these values, so the form fields cannot all be sent in one request; they have to be posted one at a time.

We can get these values from the hidden `__VIEWSTATE` and similar inputs on the page after every POST request.
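As a rough illustration of that step, the sketch below pulls every input whose name starts with `__` out of a page. It uses the standard library's `html.parser` instead of `BeautifulSoup` so that it runs without extra packages. The `HiddenFieldParser` class, the `hidden_fields()` helper and the sample HTML are all made up for this example; the real tokens are much longer.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect <input> elements whose name starts with '__' (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("__"):
            # An input without a value attribute is stored as an empty string.
            self.fields[name] = attributes.get("value") or ""


def hidden_fields(html):
    """Return a {name: value} dict of the ASP.NET-style state inputs in `html`."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Made-up page fragment standing in for a response body.
sample_html = """
<form>
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTIz" />
  <input type="hidden" name="__EVENTVALIDATION" value="wEWAgK" />
  <input type="text" name="ctl00$ContentPlaceHolder1$ddlYear" value="2018" />
</form>
"""

state = hidden_fields(sample_html)
print(state)
```

In the scraper itself, a dictionary like `state` would be merged into the POST data before each request, so that the server always receives the tokens from the most recent response.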

The `josaa_scrape()` function returns a pandas dataframe containing the year- and round-specific OR/CR.

`pandas` converts the OR/CR to `float`s, so I typecast them back as `int`s.
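A small sketch of that cast, using made-up ranks. It assumes the floats appear because the parsed table contains an empty row: a column holding missing values is stored as `float64` by default, and the cast to `int` only works once those rows are dropped.

```python
import pandas as pd

# Made-up rows; the all-empty row stands in for a blank table row on the page.
df = pd.DataFrame(
    {
        "Opening Rank": [1.0, None, 250.0],
        "Closing Rank": [61.0, None, 1024.0],
    }
)
print(df.dtypes)  # the missing values force a floating-point dtype

# Drop rows where every column is missing, then cast the ranks back to integers.
df = df.dropna(how="all")
df["Opening Rank"] = df["Opening Rank"].astype(int)
df["Closing Rank"] = df["Closing Rank"].astype(int)
print(df.dtypes)
```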

2016 had only 6 rounds for some reason, so some ugly manual adjustments to `years` and `rounds` are required when scraping that year.
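One possible way to avoid editing the two lists by hand is to record the number of rounds per year and generate the valid pairs from that. This is only a sketch: `rounds_per_year` and `year_round_pairs()` are hypothetical names, the 6 for 2016 comes from the note above, and the 7 for 2017 and 2018 is assumed from the `rounds` list in this notebook.

```python
# Rounds held in each year (2016 from the note above; 7 is an assumption
# based on the `rounds` list used for 2017 and 2018).
rounds_per_year = {"2018": 7, "2017": 7, "2016": 6}


def year_round_pairs(rounds_by_year):
    """Return every valid (year, round) pair as strings, ready for josaa_scrape()."""
    return [
        (year, str(round_no))
        for year, count in rounds_by_year.items()
        for round_no in range(1, count + 1)
    ]


pairs = year_round_pairs(rounds_per_year)
print(len(pairs))
print(pairs[:3])
```

Each pair could then be passed straight to `josaa_scrape(year, Round)`, and a seventh round for 2016 is never requested.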

In [None]:
url = 'https://josaa.nic.in/Result/Result/OpeningClosingRankArchieve.aspx'

params = {
    "ctl00$ContentPlaceHolder1$ddlInstype": "ALL",
    "ctl00$ContentPlaceHolder1$ddlInstitute": "ALL",
    "ctl00$ContentPlaceHolder1$ddlBranch": "ALL",
    "ctl00$ContentPlaceHolder1$btnSubmit": "Submit"
}

In [None]:
years = [
    "2018",
    "2017"
#     "2017",
#     "2016"
]

rounds = [
    "1",
    "2",
    "3",
    "4",
    "5",
#     "6"
    "6",
    "7"
]

In [None]:
def josaa_scrape(year, Round):
    """
    Sample usage: df = josaa_scrape("2018", "1")
    df.info()
    """
    with requests.Session() as s:
        # Initial GET to obtain the session's hidden ASP.NET fields.
        R = s.get(url)
        data = {}
        data.update({tag['name']: tag['value'] for tag in BeautifulSoup(R.content, 'lxml').select('input[name^=__]')})
        # Select the year; the response carries refreshed hidden fields.
        data["ctl00$ContentPlaceHolder1$ddlYear"] = year
        R = s.post(url, data=data)

        # Select the round, using the hidden fields from the previous response.
        data.update({tag['name']: tag['value'] for tag in BeautifulSoup(R.content, 'lxml').select('input[name^=__]')})
        data["ctl00$ContentPlaceHolder1$ddlroundno"] = Round
        R = s.post(url, data=data)

        # Post the remaining form fields one at a time, refreshing the hidden fields before each request.
        for key, value in params.items():
            data.update({tag['name']: tag['value'] for tag in BeautifulSoup(R.content, 'lxml').select('input[name^=__]')})
            data[key] = value
            R = s.post(url, data=data)

    table = BeautifulSoup(R.text, 'lxml').find(id = 'ctl00_ContentPlaceHolder1_GridView1')
    df = pd.read_html(table.prettify())[0]
    df.dropna(inplace = True, how="all")

    df["Year"] = year
    df["Round"] = Round
    df['Opening Rank'] = df['Opening Rank'].astype(int)
    df['Closing Rank'] = df['Closing Rank'].astype(int)

    return df

In [None]:
df = josaa_scrape("2018", "1")
df.info()

In [None]:
# for year in years:
#     for Round in rounds:
#         josaa_scrape(year, Round).to_csv(path_or_buf= year + "-" + Round + ".csv", index=False)