
ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory #7455

Closed
phongtnit opened this issue Jul 4, 2021 · 3 comments

Comments

@phongtnit
phongtnit commented Jul 4, 2021

Hello,

I crawled the first 2k domains from the Majestic list with Scrapy to fetch the HTML content and take a screenshot of each domain, and my script worked fine. However, when I increased the crawl to about 4-10k domains with otherwise identical settings, the error Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory occurred. How can I fix this out-of-memory issue? Many thanks,

Detail of the error

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xa18150 node::Abort() [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
 2: 0xa1855c node::OnFatalError(char const*, char const*) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
 3: 0xb9715e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
 4: 0xb974d9 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
 5: 0xd54755  [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
 6: 0xd54de6 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
 7: 0xd616a5 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
 8: 0xd62555 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
 9: 0xd6500c v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
10: 0xd32eac v8::internal::Factory::NewRawOneByteString(int, v8::internal::AllocationType) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
11: 0xd32f71 v8::internal::Factory::NewStringFromOneByte(v8::internal::Vector<unsigned char const> const&, v8::internal::AllocationType) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
12: 0xbaf62f v8::String::NewFromOneByte(v8::Isolate*, unsigned char const*, v8::NewStringType, int) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
13: 0xaf60aa node::StringBytes::Encode(v8::Isolate*, char const*, unsigned long, node::encoding, v8::Local<v8::Value>*) [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
14: 0x9f33a6  [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
15: 0x1390d8d  [/home/ubuntu/.local/lib/python3.8/site-packages/playwright/driver/node]
Aborted (core dumped)

My env

Ubuntu 20.04 on a server with 8 CPU cores and 64GB RAM
Python 3.8.5
playwright 1.12.1
Scrapy 2.5.0
scrapy-playwright 0.0.3
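
(Editor's note: a common first-aid measure for any Node.js out-of-memory abort, not specific to this issue's root cause, is raising the V8 heap limit via `NODE_OPTIONS`. The sketch below assumes Playwright's bundled Node driver inherits the environment of the Python process that spawns it; the 8192 MB value is an arbitrary example, not a recommendation.)

```python
import os

# Hypothetical workaround sketch: raise the V8 old-space heap limit for
# Playwright's bundled Node driver. This must be set BEFORE Playwright is
# imported/started, because the driver process inherits the environment
# at spawn time. 8192 MB is an example value for a 64GB machine.
os.environ["NODE_OPTIONS"] = "--max-old-space-size=8192"

# ... only after this point:
# from playwright.sync_api import sync_playwright
```

This only postpones the crash if memory genuinely leaks per navigation; it does not replace fixing the context-reuse pattern discussed below in the thread.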
@mxschmitt
Member

From a short look at the scrapy-playwright code, it seems they always reuse the same browser context. That eventually ends in an out-of-memory error; see #6319.

I would recommend opening an issue on their side, since it is best practice to use a new context when navigating to many different sites. That should fix the memory allocation issue.
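
(Editor's note: the "new context per site" pattern can be sketched as below. The method names mirror Playwright's sync API (`new_context`, `new_page`, `close`), but `screenshot_each` and its parameters are hypothetical illustration, not scrapy-playwright code.)

```python
def screenshot_each(browser, urls, out_dir="shots"):
    """Visit each URL in a fresh, short-lived BrowserContext.

    `browser` is any object exposing Playwright's Browser API
    (new_context() -> context with new_page() and close()).
    Closing the context after each domain lets V8 reclaim the
    pages, caches, and listeners tied to it, instead of letting
    them accumulate in one long-lived context.
    """
    done = []
    for i, url in enumerate(urls):
        context = browser.new_context()   # fresh context per domain
        try:
            page = context.new_page()
            page.goto(url)
            page.screenshot(path=f"{out_dir}/{i}.png")
            done.append(url)
        finally:
            context.close()               # frees the context's resources
    return done
```

The key point is the `try`/`finally`: the context is closed even when a navigation or screenshot raises, so a handful of bad domains in a 10k-domain crawl cannot leave contexts alive and grow the heap.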

@phongtnit
Author

Thanks @mxschmitt

@mxschmitt
Member

Closing, since we currently track this in #6319 and it can be fixed by creating a new context each time.
