In [1]:
import mechanicalsoup
import lxml

In [2]:
browser = mechanicalsoup.Browser()

A Browser object represents a headless web browser. You can use it to request a page from the Internet by passing a URL to its .get() method:

In [3]:
url = "http://olympus.realpython.org/login"
page = browser.get(url)

In [4]:
page.status_code

200

In the page's HTML, the important section is the login form, that is, everything inside the <form> tags. The <form> on this page has its name attribute set to login. It contains three <input> elements: one named user, one named pwd, and a third that is the Submit button.
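The page's HTML isn't reproduced in this notebook, so the snippet below is a hypothetical reconstruction from the description above (a form named login with inputs user and pwd plus a submit button), not the live markup. As a minimal sketch using only the standard library's html.parser, it shows how you could confirm that structure by recording the form's name and the attributes of each <input>:

```python
from html.parser import HTMLParser

# Hypothetical markup reconstructed from the prose description,
# not copied from http://olympus.realpython.org/login.
LOGIN_HTML = """
<form name="login">
  Username: <input type="text" name="user"><br>
  Password: <input type="password" name="pwd"><br><br>
  <input type="submit" value="Submit">
</form>
"""


class FormInspector(HTMLParser):
    """Record each <form>'s name and the attributes of every <input>."""

    def __init__(self):
        super().__init__()
        self.form_names = []
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "form":
            self.form_names.append(attrs.get("name"))
        elif tag == "input":
            self.inputs.append(attrs)


inspector = FormInspector()
inspector.feed(LOGIN_HTML)
print(inspector.form_names)                         # ['login']
print([i.get("name") for i in inspector.inputs])    # ['user', 'pwd', None]
```

The Submit button shows up as None because, in this assumed markup, it has no name attribute.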

Now that you know the underlying structure of the login form, as well as the credentials needed to log in (username zeus, password ThunderDude), let’s take a look at code that fills out the form and submits it.

In [5]:
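html = page.soup  # parsed BeautifulSoup object for the login page
form = html.select("form")[0]  # the page's only <form>
form.select("input")[0]["value"] = "zeus"  # username input (name="user")
form.select("input")[1]["value"] = "ThunderDude"  # password input (name="pwd")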

In the cells above, you create a Browser instance and use it to request the URL http://olympus.realpython.org/login. You then assign the parsed HTML of the page to the html variable using the .soup attribute, which MechanicalSoup attaches to the response.

html.select("form") returns a list of all <form> elements on the page. Since the page has only one <form> element, you can access the form by retrieving the element at index 0 of the list. The next two lines select the username and password inputs and set their value attributes to "zeus" and "ThunderDude", respectively.
    
    
You submit the form with browser.submit(). Notice that you pass two arguments to this method: the form object and the URL of the login page, which you access via page.url.
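Under the hood, submitting this form sends the name/value pairs of its named inputs to the server. The sketch below assumes the form is posted with the default application/x-www-form-urlencoded encoding and models the filled-in inputs as plain dictionaries (the inputs list and the encode_form helper are hypothetical, not MechanicalSoup objects). It uses urllib.parse.urlencode to show roughly what the request body looks like. Following the HTML rule that unnamed controls aren't submitted, the Submit button, assumed here to have no name attribute, contributes nothing:

```python
from urllib.parse import urlencode

# Hypothetical snapshot of the form's inputs after the two assignments above;
# each dict mirrors one <input> element's attributes.
inputs = [
    {"type": "text", "name": "user", "value": "zeus"},
    {"type": "password", "name": "pwd", "value": "ThunderDude"},
    {"type": "submit", "value": "Submit"},  # assumed: no name attribute
]


def encode_form(inputs):
    """Encode named inputs as an application/x-www-form-urlencoded body.

    Inputs without a name attribute are skipped, because unnamed
    controls are not part of a form submission.
    """
    pairs = [(i["name"], i.get("value", "")) for i in inputs if i.get("name")]
    return urlencode(pairs)


body = encode_form(inputs)
print(body)  # user=zeus&pwd=ThunderDude
```

urlencode also escapes special characters, so a value such as "a b" is sent as a+b.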

In [6]:
profiles_page = browser.submit(form, page.url)
profiles_page.url

'http://olympus.realpython.org/profiles'

The cell output confirms that the submission successfully redirected to the /profiles page. If something had gone wrong, then the value of profiles_page.url would still be "http://olympus.realpython.org/login".
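Checking the URL by eye works in a notebook, but an unattended script needs to make the same check in code. The helper below is a minimal sketch; the function name login_succeeded and the expected /profiles path are assumptions specific to this example site, since other sites may redirect elsewhere or not redirect at all:

```python
from urllib.parse import urlparse


def login_succeeded(final_url, expected_path="/profiles"):
    """Return True if the final response URL points at the expected page.

    expected_path is an assumption for this example site. Trailing
    slashes are ignored so that /profiles and /profiles/ both match.
    """
    return urlparse(final_url).path.rstrip("/") == expected_path.rstrip("/")


print(login_succeeded("http://olympus.realpython.org/profiles"))  # True
print(login_succeeded("http://olympus.realpython.org/login"))     # False
```

In the notebook you would call login_succeeded(profiles_page.url) right after submitting the form.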

Now that you have the profiles_page variable set, let’s see how to programmatically obtain the URL for each link on the /profiles page.

In [8]:
links = profiles_page.soup.select('a')

In [13]:
# Iterate over each link and print its text and full URL
base_url = "http://olympus.realpython.org"

for link in links:
    text = link.text
    address = link['href']  # root-relative path, e.g. /profiles/aphrodite
    print(f'{text}: {base_url}{address}')

Aphrodite: http://olympus.realpython.org/profiles/aphrodite
Poseidon: http://olympus.realpython.org/profiles/poseidon
Dionysus: http://olympus.realpython.org/profiles/dionysus
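Concatenating base_url and the href works here only because every href on this page appears to be a root-relative path beginning with "/". A more robust approach, sketched below, resolves each href against the URL of the page it came from with urllib.parse.urljoin, which also handles document-relative and absolute links. The helper name absolute_links and the last two hrefs are hypothetical, added to show cases that plain concatenation would get wrong:

```python
from urllib.parse import urljoin


def absolute_links(page_url, hrefs):
    """Resolve each href against the URL of the page it appeared on."""
    return [urljoin(page_url, href) for href in hrefs]


page_url = "http://olympus.realpython.org/profiles"
hrefs = [
    "/profiles/aphrodite",        # root-relative, as inferred from the output above
    "poseidon",                   # hypothetical document-relative href
    "https://example.com/about",  # hypothetical absolute href
]

for url in absolute_links(page_url, hrefs):
    print(url)
```

The document-relative href resolves to http://olympus.realpython.org/poseidon, whereas concatenation would produce the broken http://olympus.realpython.orgposeidon. Applied to the notebook's variables, the call would be absolute_links(profiles_page.url, [link['href'] for link in links]).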
