In [1]:
import requests
import bs4
import pandas as pd

Let's try opening a page for which we usually have to log in first.

In [2]:
response = requests.get("https://opensource-demo.orangehrmlive.com/index.php/recruitment/viewCandidates")
response.status_code

200

In [3]:
soup = bs4.BeautifulSoup(response.content, "html.parser")  # name the parser explicitly to avoid bs4's warning
soup.find("table", id="resultTable")

In [4]:
with open("without_login.html", "wb") as file:
    file.write(response.content)

No success. The table is not found because we are redirected to the login page, even though the status code is 200.
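
A redirect like this can also be detected without opening the saved HTML file. The sketch below relies on two attributes of a `requests` response: `history` (the intermediate redirect responses) and `url` (the final address). The helper name `redirect_chain` is made up for this illustration, and the stand-in object only mimics a response so that the snippet runs offline. The `auth/login` path and the 302 status in the stand-in are assumptions, not values observed above.

```python
from types import SimpleNamespace


def redirect_chain(response):
    """Return every URL visited for this response, ending with the final one."""
    # `history` lists the redirect responses that requests followed, oldest first.
    visited = [step.url for step in response.history]
    visited.append(response.url)
    return visited


# Stand-in for a real response (assumed values, for offline illustration only).
fake_response = SimpleNamespace(
    url="https://opensource-demo.orangehrmlive.com/index.php/auth/login",
    history=[
        SimpleNamespace(
            url="https://opensource-demo.orangehrmlive.com/index.php/recruitment/viewCandidates",
            status_code=302,
        )
    ],
)

chain = redirect_chain(fake_response)
was_redirected = len(chain) > 1
print(chain, was_redirected)
```

With the real `response` from above, `redirect_chain(response)` would show whether we ended up somewhere other than the URL we asked for.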

By inspecting the POST request sent when logging in, we find the URL and the necessary payload (`txtUsername` and `txtPassword`). Let's try logging in with it!

In [5]:
data = {
    "txtUsername": "Admin",
    "txtPassword": "admin123"
}
login_response = requests.post(
    "https://opensource-demo.orangehrmlive.com/index.php/auth/validateCredentials",
    data=data
)
login_response.status_code

400

In [6]:
with open("after_login.html", "wb") as file:
    file.write(login_response.content)

Again, no success. The reason is that we did not send the session-specific [CSRF token](https://en.wikipedia.org/wiki/Cross-site_request_forgery). It is essentially a string that the server sends us when we GET the login page and expects us to send back when we POST our login information.
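
To make the idea concrete, here is a toy model of the general pattern, using only the standard library. It is not how this particular site implements its check; the function names and the in-memory dictionary are invented for the illustration.

```python
import hmac
import secrets

# Toy in-memory "server" state: one CSRF token per session id.
issued_tokens = {}


def serve_login_page(session_id):
    """Simulate the GET: create a fresh token and remember it for this session."""
    token = secrets.token_hex(16)
    issued_tokens[session_id] = token
    return token  # a real server would embed this in a hidden form field


def accept_login_post(session_id, submitted_token):
    """Simulate the POST: accept only the token issued to this session."""
    expected = issued_tokens.get(session_id)
    if expected is None or submitted_token is None:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, submitted_token)


token = serve_login_page("session-A")
print(accept_login_post("session-A", token))  # token sent back
print(accept_login_post("session-A", None))   # token missing, like our first attempt
print(accept_login_post("session-B", token))  # token that belongs to another session
```

Only the first call is accepted, which mirrors why a POST without the token (or outside the session that received it) is rejected.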

To handle this properly, we need to connect our successive requests. For this we use a `requests.Session`. It stores the cookies the server sends and returns them with every later request, so the server knows that the requests belong together and we remain logged in.
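
The cookie handling can be observed without sending anything over the network. In this sketch the cookie name `demo_session_id` and its value are made up; the request is only prepared through the session, not sent, so we can look at the headers it would carry.

```python
import requests

demo_session = requests.Session()

# Pretend the server set a session cookie in an earlier response.
# The cookie name and value are hypothetical.
demo_session.cookies.set(
    "demo_session_id",
    "abc123",
    domain="opensource-demo.orangehrmlive.com",
    path="/",
)

# Build (but do not send) the next request through the session.
request = requests.Request(
    "GET",
    "https://opensource-demo.orangehrmlive.com/index.php/recruitment/viewCandidates",
)
prepared = demo_session.prepare_request(request)

# The session attached the stored cookie automatically.
print(prepared.headers.get("Cookie"))
```

A plain `requests.get` call starts with an empty cookie jar every time, which is why the earlier attempts could not stay associated with one server-side session.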

In [7]:
session = requests.Session()

login_page = session.get("https://opensource-demo.orangehrmlive.com")
login_soup = bs4.BeautifulSoup(login_page.content, "html.parser")
csrf_token = login_soup.find("input", type="hidden", id="csrf_token")["value"]

login_data = {
    "txtUsername": "Admin",
    "txtPassword": "admin123",
    "_csrf_token": csrf_token
}
new_login_response = session.post(
    "https://opensource-demo.orangehrmlive.com/index.php/auth/validateCredentials",
    data=login_data,
)

In [8]:
with open("after_login_new.html", "wb") as file:
    file.write(new_login_response.content)

Logging in works now. Using the logged-in session, we can navigate to the page we originally wanted to open.
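
Instead of inspecting the saved HTML file by hand, the outcome could also be checked in code. The following is a minimal sketch, assuming that the login form can be recognized by an input named `txtPassword` (the field name from the payload above); the helper name `shows_login_form` and the two hand-written pages are invented for the example.

```python
import bs4


def shows_login_form(html):
    """Return True if the page still contains the password field of the login form."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    # `name` is already the tag-name argument of find(), so the HTML attribute
    # called "name" has to be passed through `attrs`.
    return soup.find("input", attrs={"name": "txtPassword"}) is not None


# Two tiny hand-written pages standing in for real responses.
login_page_html = (
    '<form><input name="txtUsername">'
    '<input name="txtPassword" type="password"></form>'
)
dashboard_html = "<h1>Dashboard</h1>"

print(shows_login_form(login_page_html))
print(shows_login_form(dashboard_html))
```

Applied to this notebook, `shows_login_form(new_login_response.content)` would be expected to return `False` after a successful login.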

In [9]:
candidates_response = session.get("https://opensource-demo.orangehrmlive.com/index.php/recruitment/viewCandidates")
candidates_soup = bs4.BeautifulSoup(candidates_response.content, "html.parser")

Then we extract the data as usual.

In [10]:
candidates_soup.find("table", id="resultTable")

<table class="table hover" id="resultTable">
<thead>
<tr><th class="checkbox-col" rowspan="1"><input id="ohrmList_chkSelectAll" name="chkSelectAll" type="checkbox" value=""/></th>
<th class="header" rowspan="1" style="width:"><a class="null" href="https://opensource-demo.orangehrmlive.com/index.php/recruitment/viewCandidates?sortField=jv.name&amp;sortOrder=ASC">Vacancy</a></th>
<th class="header" rowspan="1" style="width:"><a class="null" href="https://opensource-demo.orangehrmlive.com/index.php/recruitment/viewCandidates?sortField=jc.first_name&amp;sortOrder=ASC">Candidate</a></th>
<th class="header" rowspan="1" style="width:"><a class="null" href="https://opensource-demo.orangehrmlive.com/index.php/recruitment/viewCandidates?sortField=e.emp_firstname&amp;sortOrder=ASC">Hiring Manager</a></th>
<th class="header" rowspan="1" style="width:"><a class="null" href="https://opensource-demo.orangehrmlive.com/index.php/recruitment/viewCandidates?sortField=jc.date_of_application&amp;sortOrder=

In [11]:
result_table = candidates_soup.find("table", id="resultTable")
table_head = result_table.find("thead")
table_body = result_table.find("tbody")

In [12]:
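# The header cells become the column names.
colnames = [col.text for col in table_head.find_all("th")]

# Collect the cell texts of every body row.
# (`rows` instead of `data`, so the login payload from above is not overwritten.)
rows = []
for row in table_body.find_all("tr"):
    rows.append([col.text for col in row.find_all("td")])

pd.DataFrame(rows, columns=colnames)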


| | | Vacancy | Candidate | Hiring Manager | Date of Application | Status | Resume |
|---|---|---|---|---|---|---|---|
| 0 | | Associate IT Manager | Banda Pavithra R | Odis Adalwin | 2021-09-14 | Application Initiated | Download |
| 1 | | Associate IT Manager | Banda Pavithra R | Odis Adalwin | 2021-09-14 | Application Initiated | Download |
| 2 | | Associate IT Manager | Banda Pavithra R | Odis Adalwin | 2021-09-14 | Application Initiated | Download |
| 3 | | Associate IT Manager | pavitra B R | Odis Adalwin | 2021-09-14 | Application Initiated | Download |
| 4 | | Junior Account Assistant | maren ibis salamo | Kevin Mathews | 2021-09-14 | Application Initiated | Download |
| 5 | | Software Engineer | Jennifer Clinton | Odis Adalwin | 2020-10-08 | Rejected | Download |
| 6 | | Sales Representative | Jo Denton | Linda Jane Anderson | 2020-10-08 | Application Initiated | Download |
| 7 | | Sales Representative | Charles Haywire | Linda Jane Anderson | 2020-10-08 | Shortlisted | Download |
| 8 | | Sales Representative | Richard Holmes | Linda Jane Anderson | 2020-10-08 | Application Initiated | Download |
| 9 | | Software Engineer | Phil Hughes | Odis Adalwin | 2020-10-08 | Shortlisted | Download |
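
The scraped table still contains the empty checkbox column, and the dates are plain strings. A possible clean-up step is sketched below on two rows copied from the output above. It assumes that the checkbox column has an empty string as its name; the variable names are chosen for this example only.

```python
import pandas as pd

# Two rows copied from the output above; the first column is the empty
# checkbox column (assumed here to have an empty string as its name).
columns = ["", "Vacancy", "Candidate", "Hiring Manager",
           "Date of Application", "Status", "Resume"]
sample_rows = [
    ["", "Software Engineer", "Jennifer Clinton", "Odis Adalwin",
     "2020-10-08", "Rejected", "Download"],
    ["", "Sales Representative", "Jo Denton", "Linda Jane Anderson",
     "2020-10-08", "Application Initiated", "Download"],
]
candidates = pd.DataFrame(sample_rows, columns=columns)

# Keep only the columns whose header contains visible text.
has_name = candidates.columns.str.strip() != ""
cleaned = candidates.loc[:, has_name].copy()

# Turn the date strings into real timestamps so they can be sorted and filtered.
cleaned["Date of Application"] = pd.to_datetime(cleaned["Date of Application"])

print(cleaned.dtypes)
```

The same two steps could be applied to the full DataFrame built from `colnames` and the scraped rows.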
