
Why use new URL() may lead to memory leak? #17448

Closed
liuqipeng417 opened this issue Dec 4, 2017 · 10 comments
Labels
memory Issues and PRs related to the memory management or memory footprint. whatwg-url Issues and PRs related to the WHATWG URL implementation.

Comments

@liuqipeng417

liuqipeng417 commented Dec 4, 2017

  • Version: 8.4.0
  • Platform: Linux version 3.10.0-514.10.2.el7.x86_64
  • Subsystem: CentOS Linux release 7.3.1611 (Core)

My project is using express.js, and having a code fragment like this:

const { URL } = require('url');
const express = require('express');

const router = express.Router();

router.put('', (req, res) => {
  let data = req.body.flow ? JSON.parse(req.body.flow) : {};
  if (data.url && !data.domain) {
    data.domain = new URL(data.url).hostname;
  }

  res.send();
})

When many requests come in, the project's memory rises slowly; sometimes it drops a little, but the overall trend is upward. When I change to url.parse(), the memory looks normal. I use PM2 to monitor memory.

What's the difference between the two (new URL and url.parse) in terms of memory or GC?

@mscdex mscdex added memory Issues and PRs related to the memory management or memory footprint. whatwg-url Issues and PRs related to the WHATWG URL implementation. v8.x labels Dec 4, 2017
@apapirovski
Member

I would figure that there's a higher memory requirement given the internals of the WHATWG URL interface, but it sounds like you're saying there's an actual memory leak? Could you provide some more precise numbers/stats? Are we talking a steady increase (say, 10% higher, but not constantly rising) or a never-ending increase over the run of the process?

/cc @jasnell & @TimothyGu

@TimothyGu
Member

TimothyGu commented Dec 4, 2017

The WHATWG URL implementation uses C++, so there's a possibility that the C++ code has some memory leaks in it. In fact, Valgrind seems to agree: there's an awful lot of output like the following coming from memcheck:

==28049== 649,419 bytes in 20,949 blocks are definitely lost in loss record 701 of 703
==28049==    at 0x4C2C21F: operator new(unsigned long) (vg_replace_malloc.c:334)
==28049==    by 0x5665A46: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==28049==    by 0x10E4CAE: node::url::ParseHost(node::url::url_host*, char const*, unsigned long, bool, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049==    by 0x10E16C5: node::url::URL::Parse(char const*, unsigned long, node::url::url_parse_state, node::url::url_data*, bool, node::url::url_data const*, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049==    by 0x10E7776: node::url::Parse(v8::FunctionCallbackInfo<v8::Value> const&) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049== 984,467 bytes in 31,757 blocks are definitely lost in loss record 702 of 703
==28049==    at 0x4C2C21F: operator new(unsigned long) (vg_replace_malloc.c:334)
==28049==    by 0x5665A46: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==28049==    by 0x10E4CAE: node::url::ParseHost(node::url::url_host*, char const*, unsigned long, bool, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049==    by 0x10E16C5: node::url::URL::Parse(char const*, unsigned long, node::url::url_parse_state, node::url::url_data*, bool, node::url::url_data const*, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049== 3,721,922 bytes in 120,062 blocks are definitely lost in loss record 703 of 703
==28049==    at 0x4C2C21F: operator new(unsigned long) (vg_replace_malloc.c:334)
==28049==    by 0x5665A46: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==28049==    by 0x10E4CAE: node::url::ParseHost(node::url::url_host*, char const*, unsigned long, bool, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049==    by 0x10E16C5: node::url::URL::Parse(char const*, unsigned long, node::url::url_parse_state, node::url::url_data*, bool, node::url::url_data const*, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049==    by 0x10E7776: node::url::Parse(v8::FunctionCallbackInfo<v8::Value> const&) (in /home/timothy-gu/dev/node/node/out/Release/node)

all pointing towards ParseHost() as the culprit.

Test script:

'use strict';

const { URL } = require('url');

// Run with `node --expose-gc` so that gc() is available.
// Adjust the iteration count to taste.
for (let i = 0; i < 100001; i++) {
  new URL('https://domain1.domain.test.com/789012/3456/789012345678-2?section=10');
  if (i % 1000 === 0) {
    gc();
  }
  if (i % 100000 === 0) {
    console.log(process.memoryUsage());
  }
}

@TimothyGu
Member

Tracked it down to

==29610== 1,546,342 bytes in 49,882 blocks are definitely lost in loss record 85 of 85
==29610==    at 0x4C2C21F: operator new(unsigned long) (vg_replace_malloc.c:334)
==29610==    by 0x5665FD4: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==29610==    by 0x10EA1B0: node::url::PercentDecode(char const*, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x10EB0AF: node::url::ParseHost(node::url::url_host*, char const*, unsigned long, bool, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x10E9839: node::url::ParseHost(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x10E7D0D: node::url::URL::Parse(char const*, unsigned long, node::url::url_parse_state, node::url::url_data*, bool, node::url::url_data const*, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x10F55D2: node::url::Parse(node::Environment*, v8::Local<v8::Value>, char const*, unsigned long, node::url::url_parse_state, v8::Local<v8::Value>, v8::Local<v8::Value>, v8::Local<v8::Function>, v8::Local<v8::Value>) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x10F29C0: node::url::Parse(v8::FunctionCallbackInfo<v8::Value> const&) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x9F4A8E: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) (in /home/timothy-gu/dev/node/node/out/Release/node)

But I don't understand why this is the case, as everything should be allocated on the stack (and therefore freed automatically)...

@addaleax
Member

addaleax commented Dec 4, 2017

@TimothyGu How did you run Node + valgrind to get that output?

@liuqipeng417
Author

@apapirovski
The phenomenon I observed is that memory continues to rise; a little memory is released now and then, but the overall trend is upward. My online project (one process, QPS of 50 to 100) ran for 7 days and memory climbed to 1 GB. When using url.parse, memory stays stable around 100 MB.

@TimothyGu
Member

TimothyGu commented Dec 5, 2017

@addaleax

valgrind --leak-check=full --undef-value-errors=no ./node --expose-gc a.js, where a.js contains the script in #17448 (comment). Valgrind is the default installation in Debian stable (1:3.12.0~svn20160714-1). node is built normally with Clang 3.9 (release config) except node_url.o, which is manually built with -O0.

@joyeecheung
Member

joyeecheung commented Dec 5, 2017

@TimothyGu That might be a false positive from memory pooling, because it is not run with a debug build of the STL; see http://valgrind.org/docs/manual/faq.html#faq.reports

@TimothyGu
Member

@joyeecheung That FAQ talks about "still reachable" leaks. The leak here is "definitely lost".

@trygve-lie

trygve-lie commented Dec 5, 2017

We have been struggling for a while with a very small memory leak without being able to find it. We have almost identical code to what was posted in the initial message here, and I'm able to get that part of our code to leak when stress-testing it.

In my testing I noticed that the leak only happens if the host has more than 3 parts.

In other words, this does leak:

new URL('https://domain1.domain.test.com/789012/3456/789012345678-2?section=10');

This does not leak:

new URL('https://domain.test.com/789012/3456/789012345678-2?section=10');

@addaleax
Member

addaleax commented Dec 5, 2017

This is happening because union url_host_value has std::string members but an empty destructor.

I’m working on this, but it’s probably going to be a bit of a refactor to fix it in a safe way.

addaleax added a commit to addaleax/node that referenced this issue Dec 5, 2017
- Gives `URLHost` a proper destructor that clears memory
  depending on the type of the host (This fixes a memory leak)
- Hide the host type enums and class layout as implementation details
- Make the `Parse` methods members of `URLHost`
- Turn `WriteHost` into a `ToString()` method on the `URLHost` class
- Verify that at the beginning of a parse attempt, the type is set
  to “failed”
- Remove a lot of `goto`s from the source code 🐢🚀

Fixes: nodejs#17448
MylesBorins pushed a commit that referenced this issue Jan 8, 2018 (same commit message as above; PR-URL: #17470, Fixes: #17448, Reviewed-By: Timothy Gu <timothygu99@gmail.com>)
MylesBorins pushed a commit that referenced this issue Jan 22, 2018 (same commit message; PR-URL: #17470)
addaleax added a commit to addaleax/node that referenced this issue Jan 23, 2018 (same commit message; PR-URL: nodejs#17470)
MylesBorins pushed a commit that referenced this issue Feb 5, 2018 (same commit message; Backport-PR-URL: #18324, PR-URL: #17470)
MylesBorins pushed a commit that referenced this issue Feb 11, 2018 (same commit message; Backport-PR-URL: #18324, PR-URL: #17470)
MylesBorins pushed a commit that referenced this issue Feb 12, 2018 (same commit message; Backport-PR-URL: #18324, PR-URL: #17470)
MylesBorins pushed a commit that referenced this issue Feb 13, 2018 (same commit message; Backport-PR-URL: #18324, PR-URL: #17470)