Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ClassManifest / TokenisedRegularExpression appears to have memory leaks #5549

Closed
sminnee opened this issue May 17, 2016 · 14 comments
Closed

Comments

@sminnee
Copy link
Member

sminnee commented May 17, 2016

A frequent source of frustration when setting up new dev environments is that visiting the sites will run out of memory while compiling the initial manifest. This is exacerbated by the fact that the manifest is rebuild for www-data and CLI users, but that's a separate problem.

I believe that the issue is called when the file-by-file parsing needs to happen for all files in the project. This suggests that a memory leak in this subsystem is responsible for the problem.

@dhensby
Copy link
Contributor

dhensby commented May 18, 2016

pull TokenisedRegularExpression out for https://github.com/nikic/PHP-Parser :)

@dhensby
Copy link
Contributor

dhensby commented May 18, 2016

@kinglozzer took a quick look at doing it here: master...kinglozzer:classmanifest

but it takes too long and uses too much memory, I think but the feature of breaking out of the node traverser hasn't been implemented, which I think we can use aggressively to save time (and maybe memory?)

@sminnee
Copy link
Member Author

sminnee commented May 19, 2016

It will be interesting to see if an approach using nikic's parser can be made to work, but until the speed and memory usage issues are solved, it would make this problem worse, not better. It wouldn't surprise me if there were some quick wins with the current codebase and it's probably worth addressing those.

Bringing in nikic/php-parser feels a bit like using a sledgehammer to crack a walnut, but it would be a more robust project to rely on.

@dhensby
Copy link
Contributor

dhensby commented May 19, 2016

It reduces the coffee we have to maintain, but yes it's got issues we'd like to address.

Copying composer's approach to class finding may be the way to speed this up. We don't need to tokenise more than a few lines of a file to get the info we need so we can look at trashing a ton of the coffee before tokenisation

@sminnee
Copy link
Member Author

sminnee commented May 19, 2016

Composer's approach is inferior and not suited to our use case. It interprets neither the implements not extends keywords. It's a non-starter.

Trying to reliably tokenise a partial file is hard to do reliably, and the code basically starts by scanning for a T_CLASS or T_INTERFACE item.

We could consider moving to a better parser but we're not moving to a worse one.

@dhensby
Copy link
Contributor

dhensby commented May 19, 2016

Composer's approach is inferior and not suited to our use case. It interprets neither the implements not extends keywords. It's a non-starter.

Sorry, by copying composers approach, I mean aggressively discarding code that we don't need to parse. @kinglozzer has had a go at it and had some success.

We could consider moving to a better parser but we're not moving to a worse one.

Agreed, but can we get worse than what we've got? ;)

@sminnee
Copy link
Member Author

sminnee commented May 19, 2016

Agreed, but can we get worse than what we've got? ;)

Composer's. Last I checked it used regexes.

@dhensby
Copy link
Contributor

dhensby commented May 20, 2016

Touché

@kinglozzer
Copy link
Member

Composer's approach is inferior and not suited to our use case. It interprets neither the implements not extends keywords. It's a non-starter.

Sorry, by copying composers approach, I mean aggressively discarding code that we don't need to parse. @kinglozzer has had a go at it and had some success.

Yup, Composer strips out comments, docblocks etc before parsing and it does save a small amount of CPU time (I don’t have the profiler runs to hand, but it didn’t significantly impact memory usage either way in the runs I did). Speed-ups are nice, but unfortunately miss the point of this ticket :(

The other thing I tried was simply removing everything inside the “top-level” braces that we don’t need, that is:

class Foo extends Bar {
    // everything here will be removed
    protected $iWillBeRemoved;
}

trait Baz {
    // also here
    public $baz;
}

// would become
class Foo extends Bar { }
trait Baz { }

It was extremely crude and relied on copying Composer’s approach to stripping comments out first (so comments including things like {@link Foo} didn’t break it entirely). Not sure if we could make something like that work, and it needs profiling properly to see if it would be of any benefit anyway.

@dhensby
Copy link
Contributor

dhensby commented May 20, 2016

We need to test on "real world" sites to see the real impact

@sminnee
Copy link
Member Author

sminnee commented May 24, 2016

Running it on the CWP recipe, which has a bunch of modules, would be a reasonable real-world test.

@sminnee
Copy link
Member Author

sminnee commented May 24, 2016

OK, I raised this issue based the reports of out-of-memory errors here (https://groups.google.com/forum/#!topic/silverstripe-dev/YQ9ru7UuUV0) but I'm having trouble replicating this...

@dhensby
Copy link
Contributor

dhensby commented May 24, 2016

I'm able to exhaust 256mb of memory on a CWP install when including every test suite from the modules that have them.

I think each test does add more and more memory consumption but in not sure it's a manifest issue

@sminnee
Copy link
Member Author

sminnee commented Sep 21, 2016

Closing this issue as a false-report.

@sminnee sminnee closed this as completed Sep 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants