Exception with headless=False under WSL #78

Closed
raisulrana opened this issue Apr 8, 2022 · 6 comments
Labels
documentation (Improvements or additions to documentation), enhancement (New feature or request)

Comments

@raisulrana

raisulrana commented Apr 8, 2022

I am suddenly getting AttributeError: 'ScrapyPlaywrightDownloadHandler' object has no attribute 'browser'. I noticed that it has trouble launching the browser. I have set:

```python
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": False
}
```

2022-04-08 23:40:35 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method ScrapyPlaywrightDownloadHandler._engine_started of <scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler object at 0x7f8c31878eb0>>
Traceback (most recent call last):
  File "/home/raisulrana/anaconda3/envs/scrapy/lib/python3.10/site-packages/twisted/internet/defer.py", line 1030, in adapt
    extracted = result.result()
  File "/home/raisulrana/anaconda3/envs/scrapy/lib/python3.10/site-packages/scrapy_playwright/handler.py", line 130, in _launch_browser
    self.browser = await browser_launcher(**self.launch_options)
  File "/home/raisulrana/anaconda3/envs/scrapy/lib/python3.10/site-packages/playwright/async_api/_generated.py", line 11633, in launch
    await self._async(
  File "/home/raisulrana/anaconda3/envs/scrapy/lib/python3.10/site-packages/playwright/_impl/_browser_type.py", line 90, in launch
    Browser, from_channel(await self._channel.send("launch", params))
  File "/home/raisulrana/anaconda3/envs/scrapy/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 39, in send
    return await self.inner_send(method, params, False)
  File "/home/raisulrana/anaconda3/envs/scrapy/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 63, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Protocol error (Browser.enable): Browser closed.
==================== Browser output: ====================
<launching> /home/raisulrana/.cache/ms-playwright/firefox-1313/firefox/firefox -no-remote -wait-for-browser -foreground -profile /tmp/playwright_firefoxdev_profile-RtZ4kg -juggler-pipe -silent
<launched> pid=16467
[pid=16467][err] Error: no DISPLAY environment variable specified
[pid=16467] <process did exit: exitCode=1, signal=null>
[pid=16467] starting temporary directories cleanup
=========================== logs ===========================
<launching> /home/raisulrana/.cache/ms-playwright/firefox-1313/firefox/firefox -no-remote -wait-for-browser -foreground -profile /tmp/playwright_firefoxdev_profile-RtZ4kg -juggler-pipe -silent
<launched> pid=16467
[pid=16467][err] Error: no DISPLAY environment variable specified
[pid=16467] <process did exit: exitCode=1, signal=null>
[pid=16467] starting temporary directories cleanup
============================================================
2022-04-08 23:40:36 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.proxy-daily.com:80
2022-04-08 23:40:37 [urllib3.connectionpool] DEBUG: http://www.proxy-daily.com:80 "GET / HTTP/1.1" 301 None
2022-04-08 23:40:37 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): www.proxy-daily.com:443
2022-04-08 23:40:37 [urllib3.connectionpool] DEBUG: https://www.proxy-daily.com:443 "GET / HTTP/1.1" 301 None
2022-04-08 23:40:37 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): proxy-daily.com:443
2022-04-08 23:40:38 [urllib3.connectionpool] DEBUG: https://proxy-daily.com:443 "GET / HTTP/1.1" 200 None
2022-04-08 23:40:38 [charset_normalizer] DEBUG: Encoding detection: utf_8 is most likely the one.
2022-04-08 23:40:38 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): www.us-proxy.org:443
2022-04-08 23:40:39 [urllib3.connectionpool] DEBUG: https://www.us-proxy.org:443 "GET / HTTP/1.1" 200 None
2022-04-08 23:40:39 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): www.sslproxies.org:443
2022-04-08 23:40:39 [urllib3.connectionpool] DEBUG: https://www.sslproxies.org:443 "GET / HTTP/1.1" 200 None
2022-04-08 23:40:39 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): free-proxy-list.net:443
2022-04-08 23:40:39 [urllib3.connectionpool] DEBUG: https://free-proxy-list.net:443 "GET /anonymous-proxy.html HTTP/1.1" 200 None
2022-04-08 23:40:39 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): free-proxy-list.net:443
2022-04-08 23:40:40 [urllib3.connectionpool] DEBUG: https://free-proxy-list.net:443 "GET /uk-proxy.html HTTP/1.1" 200 None
2022-04-08 23:40:40 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): www.free-proxy-list.net:80
2022-04-08 23:40:40 [urllib3.connectionpool] DEBUG: http://www.free-proxy-list.net:80 "GET / HTTP/1.1" 301 None
2022-04-08 23:40:40 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): www.free-proxy-list.net:443
2022-04-08 23:40:40 [urllib3.connectionpool] DEBUG: https://www.free-proxy-list.net:443 "GET / HTTP/1.1" 301 None
2022-04-08 23:40:40 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): free-proxy-list.net:80
2022-04-08 23:40:40 [urllib3.connectionpool] DEBUG: http://free-proxy-list.net:80 "GET / HTTP/1.1" 301 None
2022-04-08 23:40:40 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): free-proxy-list.net:443
2022-04-08 23:40:41 [urllib3.connectionpool] DEBUG: https://free-proxy-list.net:443 "GET / HTTP/1.1" 200 None
2022-04-08 23:40:41 [scrapy_proxy_pool.middlewares] WARNING: No proxies available.
2022-04-08 23:40:41 [scrapy_proxy_pool.middlewares] INFO: Try to download with host ip.
2022-04-08 23:40:41 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.bbc.com>
Traceback (most recent call last):
  File "/home/raisulrana/anaconda3/envs/scrapy/lib/python3.10/site-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
    result = current_context.run(
  File "/home/raisulrana/anaconda3/envs/scrapy/lib/python3.10/site-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/raisulrana/anaconda3/envs/scrapy/lib/python3.10/site-packages/scrapy/core/downloader/middleware.py", line 49, in process_request
    return (yield download_func(request=request, spider=spider))
  File "/home/raisulrana/anaconda3/envs/scrapy/lib/python3.10/site-packages/twisted/internet/defer.py", line 1030, in adapt
    extracted = result.result()
  File "/home/raisulrana/anaconda3/envs/scrapy/lib/python3.10/site-packages/scrapy_playwright/handler.py", line 213, in _download_request
    page = await self._create_page(request)
  File "/home/raisulrana/anaconda3/envs/scrapy/lib/python3.10/site-packages/scrapy_playwright/handler.py", line 158, in _create_page
    context = await self._create_browser_context(context_name, context_kwargs)
  File "/home/raisulrana/anaconda3/envs/scrapy/lib/python3.10/site-packages/scrapy_playwright/handler.py", line 144, in _create_browser_context
    context = await self.browser.new_context(**context_kwargs)
AttributeError: 'ScrapyPlaywrightDownloadHandler' object has no attribute 'browser'
2022-04-08 23:40:41 [scrapy.core.engine] INFO: Closing spider (finished)
2022-04-08 23:40:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'bans/error/builtins.AttributeError': 1,
 'downloader/exception_count': 1,
 'downloader/exception_type_count/builtins.AttributeError': 1,
 'downloader/request_bytes': 295,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'elapsed_time_seconds': 9.730816,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2022, 4, 8, 17, 40, 41, 489424),
 'log_count/DEBUG': 26,
 'log_count/ERROR': 3,
 'log_count/INFO': 13,
 'log_count/WARNING': 1,
 'memusage/max': 63303680,
 'memusage/startup': 63303680,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2022, 4, 8, 17, 40, 31, 758608)}
2022-04-08 23:40:41 [scrapy.core.engine] INFO: Spider closed (finished)
@elacuesta
Member

The relevant error happens earlier, when trying to start the browser: Error: no DISPLAY environment variable specified. Because the launch fails, the handler's browser attribute is never assigned, which is what the later AttributeError reports. This means you're probably running the spider in an environment without properly configured access to a graphics system, and this is a problem for the headless=False option that you're using.
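
As an illustration of the point above, a settings.py could fall back to headless mode when no display is configured. This is only a sketch: the helper name resolve_headless and its parameters are hypothetical and not part of scrapy-playwright, and it checks only the DISPLAY variable mentioned in the browser output.

```python
import os


def resolve_headless(want_headed, env=None):
    """Return the value to use for the "headless" launch option.

    Hypothetical helper: a headed browser is only requested when a
    DISPLAY variable is present; otherwise we fall back to headless.
    """
    env = os.environ if env is None else env
    has_display = bool(env.get("DISPLAY"))
    return not (want_headed and has_display)


# In settings.py: ask for a visible browser, but only if a display exists.
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": resolve_headless(want_headed=True),
}
```

With this fallback the spider would still run on a machine without a graphics system, at the cost of silently not showing the browser window.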

@raisulrana
Author

@elacuesta, the default settings for scrapy-playwright work fine, but I need to set headless=False. I am running this under WSL on Windows. I tried reinstalling the conda venv, but got the same result. I recently reinstalled Windows, and I have been facing this issue since then.

bugging-pipe --no-startup-window
<launched> pid=17420
[pid=17420][err] [17420:17420:0411/001159.402230:ERROR:ozone_platform_x11.cc(247)] Missing X server or $DISPLAY
[pid=17420][err] [17420:17420:0411/001159.402285:ERROR:env.cc(225)] The platform failed to initialize.  Exiting.

@elacuesta
Member

elacuesta commented Apr 10, 2022

Makes sense, I don't think WSL provides an X server out of the box. I have no experience with WSL (nor access to a Windows system to try things), but a quick search for "wsl x server" yields results like this one; I'd suggest looking into that. Let me know if that works for you, perhaps I can add a note to the Readme about this specific scenario.

(Edit) see also #7.
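
Once an X server is installed, it can help to confirm it is reachable from inside WSL before launching a headed browser. The sketch below assumes the server accepts TCP connections, where display N conventionally listens on port 6000 + N. The function names are hypothetical, and local displays such as ":0" (which use a Unix socket) are not probed.

```python
import socket


def x11_tcp_endpoint(display):
    """Translate a DISPLAY value like "host:0.0" into (host, port).

    Returns None for local displays such as ":0", which do not use TCP.
    """
    host, _, rest = display.rpartition(":")
    if not host or not rest:
        return None
    number = int(rest.split(".")[0])  # "0.0" -> display number 0
    return host, 6000 + number


def x_server_reachable(display, timeout=2.0):
    """Return True if a TCP connection to the X server can be opened."""
    endpoint = x11_tcp_endpoint(display)
    if endpoint is None:
        return False  # not a TCP display; this sketch cannot probe it
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False
```

Calling x_server_reachable(os.environ.get("DISPLAY", "")) before starting the crawl would give an early, readable failure instead of the browser exiting at launch.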

@raisulrana
Author

@elacuesta Thanks a lot, it worked. I followed the process you mentioned; I just needed to install it into my venv (I am using Anaconda). After installing it, all I need to do is run these two commands before running the spider:
export DISPLAY=172.31.0.1:0.0
export LIBGL_ALWAYS_INDIRECT=1 (for WSL 2)
That's it. Thanks a lot.
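
The same two variables could also be supplied from settings.py rather than exported in the shell each time. This is a sketch under assumptions: it relies on Playwright's launch() accepting an env option and on scrapy-playwright passing PLAYWRIGHT_LAUNCH_OPTIONS straight through to it (as the traceback above suggests). The helper name headed_browser_env is hypothetical, and the address 172.31.0.1 is the one from this thread; it will differ on other machines.

```python
import os


def headed_browser_env(x_server_host, base=None):
    """Build the environment for a headed browser under WSL 2.

    Hypothetical helper: copies the base environment and fills in the
    two variables from this thread without overriding existing values.
    """
    env = dict(os.environ if base is None else base)
    env.setdefault("DISPLAY", f"{x_server_host}:0.0")
    env.setdefault("LIBGL_ALWAYS_INDIRECT", "1")
    return env


# In settings.py; the host address below is machine-specific.
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": False,
    "env": headed_browser_env("172.31.0.1"),
}
```

Using setdefault means a DISPLAY already exported in the shell still takes precedence over the value in the settings file.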

@elacuesta
Member

Thanks, glad you got it working.

elacuesta added the documentation and enhancement labels on Apr 12, 2022
elacuesta changed the title from "AttributeError: 'ScrapyPlaywrightDownloadHandler' object has no attribute 'browser'" to "Exception with headless=False under WSL" on May 15, 2022
@elacuesta
Member

Note added in 7785789.
