This repository has been archived by the owner on May 17, 2022. It is now read-only.

Switch to browser explicit wait #10

Merged
merged 2 commits into from Aug 15, 2021

Conversation

flyingcakes85
Contributor

EXPERIMENTAL! Needs Testing.

time.sleep() forces a fixed wait, even if the page has already loaded.

By using expected_conditions, we can proceed as soon as the element loads.
Using the Python time library, I measured the time taken by the search and course pages to load at about 2 seconds.

In theory, after the change, execution time should have dropped by 5 seconds (3 + 4 - 2, i.e. the sleep durations minus the actual load time).
However, the gain was only 3 seconds instead of the expected 5.

This behavior is unexpected for the moment, unless we can find where the missing 2 seconds are.
For reference, the original version using time.sleep() took 17 seconds to execute.
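One way to hunt for the missing seconds is to time each stage separately rather than the whole run. The following is a minimal stdlib-only sketch; the stage names and the `time.sleep()` calls are hypothetical stand-ins for the real scraper steps, not code from this repository.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> accumulated elapsed seconds


@contextmanager
def stage(name):
    """Record how long the wrapped block takes under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


def report():
    """Return stage names sorted from slowest to fastest."""
    return sorted(timings, key=timings.get, reverse=True)


# Hypothetical stand-ins: replace the sleeps with the real scraper calls.
with stage("search page"):
    time.sleep(0.1)
with stage("course page"):
    time.sleep(0.03)
with stage("parsing"):
    time.sleep(0.01)

for name in report():
    print(f"{name}: {timings[name]:.3f}s")
```

Wrapping browser start-up, each page load, and the parsing step this way should show which stage still accounts for the unexplained time.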

(All times were measured on my internet connection by running the example given in the README.)

This may need further digging. I haven't yet had time to read the full code.

@sortedcord sortedcord linked an issue Aug 14, 2021 that may be closed by this pull request
@sortedcord sortedcord added the bug Something isn't working label Aug 14, 2021
@sortedcord
Owner

Okay, so the exact same issue has been addressed in issue #5. The thing is that when using an implicit wait with Selenium (specifying the time manually), the Selenium actions halt, but the interpreter continues to execute the code. If this doesn't make sense, here is a simple example:

driver.implicitly_wait(100)  #tells to wait for 100 seconds
print("it works?")

So, ideally, I would want it to halt for 100 seconds before executing the print line (which could instead be some code for scraping info after the components load). However, as I mentioned, the browser waits for the next action, but the Python program itself does not. I suppose the same applies to the explicit wait you are trying to use.

There were a number of other small issues that I'll review.

@sortedcord sortedcord added the optimization Found a better way to do the same thing? label Aug 14, 2021
@sortedcord sortedcord added this to In progress in UdemyScraper via automation Aug 14, 2021
@sortedcord sortedcord changed the title Reduce wait time by removing time.sleep() Switch to browser explicit wait Aug 14, 2021
@flyingcakes85
Contributor Author

Implicit wait is not meant to add a delay at the point where it is called. Rather, it comes into play when you try to access a DOM element later on. When looking up a DOM element, Selenium will keep retrying for up to the implicit wait value before it raises an element-not-found error (NoSuchElementException).
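To illustrate the point without a browser, here is a toy model of that behavior. `ToyDriver` is a hypothetical stand-in written for this comment, not Selenium's implementation: `implicitly_wait()` only stores a timeout, and the waiting happens inside the later element lookup.

```python
import time


class ToyDriver:
    """Toy stand-in for a WebDriver; NOT Selenium's implementation."""

    def __init__(self, element_ready_after):
        # Pretend the element appears this many seconds from now.
        self._ready_at = time.monotonic() + element_ready_after
        self._implicit_timeout = 0.0

    def implicitly_wait(self, seconds):
        # Only records the timeout; returns immediately.
        self._implicit_timeout = seconds

    def find_element(self):
        # Poll until the element "appears" or the implicit timeout expires.
        deadline = time.monotonic() + self._implicit_timeout
        while True:
            if time.monotonic() >= self._ready_at:
                return "element"
            if time.monotonic() >= deadline:
                raise LookupError("no such element")
            time.sleep(0.01)


driver = ToyDriver(element_ready_after=0.2)

t0 = time.monotonic()
driver.implicitly_wait(100)  # does not pause the program
set_cost = time.monotonic() - t0

t1 = time.monotonic()
found = driver.find_element()  # this is where the waiting happens
lookup_cost = time.monotonic() - t1

print(f"implicitly_wait took {set_cost:.3f}s, lookup took {lookup_cost:.3f}s")
```

In this model the call to `implicitly_wait(100)` returns almost instantly, which matches the "it works?" line printing right away in the example above.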

Expected conditions, on the other hand, are used with WebDriverWait, which blocks the thread until the specified element is located or the timeout expires. Try this code:

import time

# Selenium Libraries
from selenium import webdriver  # for webdriver

# for suppressing the browser
from selenium.webdriver.chrome.options import Options

# to wait until page loads
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By

option = Options()
option.add_argument('headless')
option.add_experimental_option('excludeSwitches', ['enable-logging'])
browser = webdriver.Chrome(options=option)
browser.get("https://www.udemy.com/courses/search/?src=ukw&q=learn+javascript")

print("Starting wait")
try:
    start = time.time()
    element_present = EC.presence_of_element_located(
        (By.CLASS_NAME, 'course-directory--container--5ZPhr'))
    WebDriverWait(browser, 5).until(element_present)
    end = time.time()
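except TimeoutException:
    print("Timed out waiting for page to load")
    browser.quit()  # close the headless browser before exiting
    raise SystemExit(1)
print(end - start)  # seconds spent blocked in WebDriverWait

print("it works?")
browser.quit()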

The time printed is how long the explicit wait blocked, and the final "it works?" isn't printed immediately after "Starting wait".

If possible, throttle your network to make the delay easier to notice. I used wondershaper on my Linux system to throttle the network.
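For comparison with a fixed sleep, the polling idea behind an explicit wait can be sketched with the standard library alone. `wait_until` and `page_loaded` are hypothetical names used for illustration; this imitates the behavior and is not WebDriverWait's actual code.

```python
import time


def wait_until(condition, timeout, poll=0.05):
    """Block until condition() is truthy, or raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met in time")
        time.sleep(poll)


# Assumption for the demo: the "page" becomes ready after 0.2 seconds.
ready_at = time.monotonic() + 0.2


def page_loaded():
    return time.monotonic() >= ready_at


start = time.monotonic()
wait_until(page_loaded, timeout=3)  # returns as soon as the page is ready
polled = time.monotonic() - start

start = time.monotonic()
time.sleep(0.5)  # fixed wait: always pays the full duration
fixed = time.monotonic() - start

print(f"polling wait: {polled:.2f}s, fixed sleep: {fixed:.2f}s")
```

The polling wait costs roughly the time the condition takes to become true, while the fixed sleep always costs its full duration, which is the saving this pull request is after.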

@sortedcord sortedcord merged commit 5005364 into sortedcord:master Aug 15, 2021
UdemyScraper automation moved this from In progress to Done Aug 15, 2021
@flyingcakes85
Contributor Author

This behavior seems unexpected for the moment, unless we can find where the missing 2 seconds are.

Still a mystery 🕵️‍♀️

Labels
bug Something isn't working optimization Found a better way to do the same thing?
Projects
Development

Successfully merging this pull request may close these issues.

Use explicit wait for search query
2 participants