You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Bug Description:
I encountered a strange bug while running two almost similar benchmarks for the "enter-time" task (but it might consider many other tasks). The only difference between the two runs is the value of the "headless" parameter. In the first case, I set it to False (headless = False), while in the second case, I left it as True, which was the default value.
Steps to Reproduce:
Git clone the SNow_benchmark branch from my fork and follow the installation in the README.md
.
Set the headless parameter to False and run the benchmark for the "enter-time" : python main.py --env enter-time --llm chatgpt --num-episodes 1 --irci 1 --sgrounding
Set the headless parameter to True and run the benchmark for the "enter-time" : python main.py --env enter-time --llm chatgpt --num-episodes 1 --irci 1 --sgrounding --headless Expected Behavior:
The results should be identical, regardless of the value of the headless parameter.
Actual Behavior:
When the headless parameter is disabled (set to False), certain actions are not allowed or counted, resulting in a failed task. (I could benchmark the task several time, I will still get the same results)
(RCI-agent-WSL) thirdcore@DESKTOP-5I4C9HH:~/rci-agent$ python main.py --env enter-time --llm chatgpt --num-episodes 1 --irci 1 --sgrounding
False
INFO:root:Starting WebDriver Instance 0
INFO:selenium.webdriver.common.selenium_manager:Applicable driver not found; attempting to install with Selenium Manager (Beta)
INFO:root:Send a request to the language model from initialize_plan
INFO:root:The number of generated action steps: 4
INFO:root:Send a request to the language model from generate_action
INFO:root:The executed instruction: clickxpath //*[@id="tt"]
INFO:root:Send a request to the language model from generate_action
INFO:root:The executed instruction: type 02:07PM
INFO:root:Send a request to the language model from generate_action
INFO:root:The executed instruction: clickxpath //*[@id="subbtn"]
success rate: 1.0
(RCI-agent-WSL) thirdcore@DESKTOP-5I4C9HH:~/rci-agent$ python main.py --env enter-time --llm chatgpt --num-episodes 1 --irci 1 --sgrounding --headless
True
INFO:root:Starting WebDriver Instance 0
INFO:selenium.webdriver.common.selenium_manager:Applicable driver not found; attempting to install with Selenium Manager (Beta)
INFO:root:Send a request to the language model from initialize_plan
INFO:root:The number of generated action steps: 4
INFO:root:Send a request to the language model from generate_action
INFO:root:The executed instruction: clickxpath //*[@id="tt"]
INFO:root:Send a request to the language model from generate_action
INFO:root:The executed instruction: type 1017AM
INFO:root:Send a request to the language model from generate_action
INFO:root:The executed instruction: clickxpath //*[@id="subbtn"]
success rate: 0.0
Additional Information:
I'm still investigating the root cause of this issue. It seems that when the browser is not displayed, some actions are restricted or not properly accounted for, leading to the task failure. Did you have the same behavior, is there something that I'm missing ?
The text was updated successfully, but these errors were encountered:
Bug Description:
I encountered a strange bug while running two almost similar benchmarks for the "enter-time" task (but it might consider many other tasks). The only difference between the two runs is the value of the "headless" parameter. In the first case, I set it to False (
headless = False
), while in the second case, I left it as True, which was the default value.Steps to Reproduce:
.
headless
parameter to False and run the benchmark for the "enter-time" :python main.py --env enter-time --llm chatgpt --num-episodes 1 --irci 1 --sgrounding
headless
parameter to True and run the benchmark for the "enter-time" :python main.py --env enter-time --llm chatgpt --num-episodes 1 --irci 1 --sgrounding --headless
Expected Behavior:
The results should be identical, regardless of the value of the
headless
parameter.Actual Behavior:
When the
headless
parameter is disabled (set to False), certain actions are not allowed or counted, resulting in a failed task. (I could benchmark the task several time, I will still get the same results)Additional Information:
I'm still investigating the root cause of this issue. It seems that when the browser is not displayed, some actions are restricted or not properly accounted for, leading to the task failure. Did you have the same behavior, is there something that I'm missing ?
The text was updated successfully, but these errors were encountered: