(Sort of) memory leak with entry objects #2398
Ah, and BTW, if I don't do anything with the entry object, just create it, like so:

```php
foreach ($data as $index => $fields) {
    $entry = EntryManager::create();
    $entry->set('section_id', $section_id);
    $entry->set('author_id', $author_id);
    if (!empty($fields['id'])) {
        $entry->set('id', $fields['id']);
    }
    echo 'Success: entry ' . ($index + 1) . "\n";
    echo number_format(memory_get_peak_usage(true), 0, '.', ',') . " bytes\n";
}
```

the memory consumption does not increase at all.
This will be largely dependent on what fields, and how many, are in the section. Are you able to narrow this down at all?
I have only one custom field (one that is similar to the Reflection Field), and I already removed that to no avail. The rest of the fields are: Input, Selectbox, SBL and Checkbox. The section has 32 fields. It's hard for me to narrow this down. Honestly, at the moment I don't understand why the memory consumption grows at all. What exactly remains in memory? I will try.
The following functions do not increase memory consumption:
The following functions increase memory consumption:
I also verified that the size of the entry arrays (i.e. the number of elements) has an influence. If I remove unneeded keys from the arrays, things are a bit better. To prevent the phenomenon altogether, I will probably have to spawn child processes for saving the entries, right?

Or wouldn't that help either?
Keep in mind that anything that provides Reflection-like capabilities is inherently expensive. This is because the single field must build the Entry object before it can function. This would be my top suspect at the moment. Is it possible to duplicate the section, remove this field, and see if the trend is the same?
At the moment it's a process of elimination, I would think, so is it possible to narrow down which of these three causes the memory use to grow? Tracking the delegate call should be simpler because we just have to look at the extensions that are subscribing to that delegate.
Unsure, it's very hard to say without knowing exactly where the memory leak is occurring, and this in itself could be tricky to nut out. I've not had to search for a memory leak in PHP before, so I'm unsure where to start. A tool like MacGDBP may be able to shed some light. Alternatively, perhaps
I have already done that, and it didn't change anything. (My field is also a lot simpler than the Reflection Field. No XSLT processing, just concatenating some strings. It's only similar in how it "registers".)
Any of them.
I also tried that, and it doesn't help at all. I am just coding a child process and will watch memory consumption then. If it doesn't help, I will have to investigate more.
Spawning child processes in the CLI, using something like this:

```php
shell_exec('php ' . EXTENSIONS . '/some/path/cli.backgroundprocess.php ' . escapeshellarg(serialize($fields)));
```

solves the memory issue, but it is terribly slow (around ten times slower). This may be caused by the fact that every child process has to load Symphony. :-) I will have to dig deeper.
So regardless of which is commented out, the memory growth is consistent, and at the end of 5000 entries the value is the same? I don't suppose you're able to package any of this up into an ensemble so I can aid in debugging? As I mentioned before, the leak really could be anywhere. It could be in the EntryManager, the SectionManager, FieldManager, ExtensionManager, inside the actual Field itself, or any one of the Entry classes!
Are you able to pull this out of your loop? If this was defined as
No, as mentioned in an earlier comment, every single function has an impact, but the problem gets worse if I use them all.
Difficult. I would have to think about it. There are legal issues as well (with the data). Here are some new figures. Using spawned processes is a lot faster, of course, when the processes are pushed to the background (i.e. you stop waiting for them) by appending

Spawned, background:

All in one process (initial setup):
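The backgrounding described above is commonly done by redirecting the child's output and appending `&` to the command. A minimal sketch of that idea, assuming the same worker script as before (the redirection details are my assumption, not quoted from the thread):

```php
<?php
// Hypothetical sketch: fire off a worker without waiting for it.
// Redirecting stdout/stderr and appending '&' makes shell_exec()
// return immediately instead of blocking until the child exits.
$payload = escapeshellarg(serialize($fields));
$cmd = 'php ' . EXTENSIONS . '/some/path/cli.backgroundprocess.php ' . $payload
     . ' > /dev/null 2>&1 &';
shell_exec($cmd);
```

The trade-off, as noted above, is that each child still has to bootstrap Symphony from scratch.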
I will try this now!
Including

Removing

Hmmm, no significant difference. :-(

Removing a single function,

Removing a single function,

Removing both,

Removing three functions,

Also removing

As you see, there are three functions that "leak" memory.
Let me know what you come up with. Because of the scope of where the leak could be originating from, the best way for me to help is to be able to replicate the issue on my machine and then start debugging as well. If the problem is not data specific, we should be able to recreate this using 'fake data', but I'm keen to replicate your exact setup as closely as possible. Otherwise, I can spin up a dummy script that creates fake data for a section that contains all of the core fields and share that ensemble with you, essentially a test case. We can use this as a base to attempt to first replicate and then resolve the problem. If we can't replicate with this test case, then we'll have to start adding fields to bridge the gap between the simple test and the actual test until we find the breaking point. I assume the environment is running PHP 5.6.7?
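A fake-data generator of the kind mentioned could look roughly like this. This is entirely hypothetical — the field handles and row shape are invented for illustration; a real test case would use the handles of the section under test:

```php
<?php
// Hypothetical sketch of a fake-data generator for a test section.
// The handles 'title', 'code' and 'published' are made up here.
$data = array();
for ($i = 1; $i <= 5000; $i++) {
    $data[] = array(
        'title'     => 'Test entry ' . $i,
        'code'      => str_pad((string)$i, 8, '0', STR_PAD_LEFT),
        'published' => ($i % 2 === 0) ? 'yes' : 'no',
    );
}
```

Feeding 5000 such rows through the import loop would let both sides compare memory curves on identical input.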
The whole thingie is running on a virtual server, I don't even have a local installation. On the server I am running PHP 5.4.39. Debugging would probably be easier on localhost, right? I will come back to this a bit later, since I have a deadline for the initial data import. Since all these Excel files have 5000 rows, I could do it with the current script. (It might become a serious issue later, when people try bigger files.) One more interesting test: using

Without dry-run mode:

(BTW, dry-run seems to be the only way to successfully set a
My gut feeling says it's the database (MySQL) class. Symphony has a static

(Excuse me if I am completely wrong with the assumption above. As you know, my PHP skills are pretty limited.)
Michael, I think your assumption is right. The MySQL instance is destroyed only when the PHP script exits.
OK, I think I can live with the memory consumption. For 10,000 entries the memory footprint is less than 800 MB, which is still OK for me. The good thing is that the script is pretty fast (more than 30 entries per second, with 33 fields in the section). I will tell the client that 10,000 entries is the importer's limit. For anybody who is interested, this is the final code:

```php
// …
// first step: build an array of entries… (keys must match the field handles)
// …
// second step: plug 'em into Symphony
foreach ($data as $index => $fields) {
    $entry = EntryManager::create();
    $entry->set('author_id', $author_id);
    $entry->set('section_id', $section_id);
    $entry->set('creation_date', DateTimeObj::get('c', $fields['created']));
    $entry->set('creation_date_gmt', DateTimeObj::getGMT('c', $fields['created']));
    if (!empty($fields['id'])) {
        // We must check if the provided entry ID exists
        $existing = EntryManager::fetch($fields['id']);
        $existing = $existing[0];
        if (is_object($existing)) {
            $entry->set('id', $fields['id']);
            $entry->set('creation_date', $existing->get('creation_date'));
        }
    }
    if (Entry::__ENTRY_FIELD_ERROR__ == $entry->checkPostData($fields, $errors)) {
        foreach ($errors as $field_id => $message) {
            echo 'ERROR : entry ' . ($index + 1) . ' field ' . $field_id . ': ' . $message . "\n";
        }
        $count_errors++;
    } else if (Entry::__ENTRY_OK__ != $entry->setDataFromPost($fields, $errors, true)) {
        foreach ($errors as $field_id => $message) {
            echo 'ERROR : entry ' . ($index + 1) . ' field ' . $field_id . ': ' . $message . "\n";
        }
        $count_errors++;
    } else if (!$entry->commit()) {
        echo 'ERROR : entry ' . ($index + 1) . "\n";
        $count_errors++;
    } else {
        Symphony::ExtensionManager()->notifyMembers('EntryPostEdit', '/publish/edit/', array(
            'section' => SectionManager::fetch($section_id),
            'entry'   => $entry,
            'fields'  => $fields
        ));
        echo 'Success: entry ' . ($index + 1) . "\n";
    }
    unset($errors);
}
```

@brendo: If you also tend to think that the memory thingie is "normal behaviour", feel free to close this issue.
Whoa, be careful with this. It will cause some fields to skip the actual saving work (for example, the Upload field).
Sounds like a bug.
While I'd expect the memory to grow when importing, it is a bit curious that it's growing at a consistent rate instead of being absorbed back into the garbage collection. The MySQL class does keep a static log of all the queries that are run, so there will be some memory growth as that log gets larger, but I wouldn't expect it to be as large as it is!
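The growth pattern described is exactly what an unbounded static log produces. A minimal illustration of the suspected mechanism — the class and method names here are invented, not Symphony's actual MySQL class:

```php
<?php
// Minimal illustration (hypothetical names): a static, append-only
// query log lives for the whole process, so memory grows with every
// query and is never reclaimed by garbage collection, because static
// properties keep the data reachable until the script exits.
class DbLogSketch
{
    private static $_log = array();

    public static function query($sql)
    {
        // ... the query would be executed here ...
        self::$_log[] = array('query' => $sql, 'time' => microtime(true));
    }

    public static function logSize()
    {
        return count(self::$_log);
    }
}

for ($i = 0; $i < 5000; $i++) {
    DbLogSketch::query('SELECT ' . $i);
}
// The log now holds 5000 entries that will only be freed on exit.
echo DbLogSketch::logSize() . "\n";
```

This would explain why `sleep()` and the cycle collector had no effect: the memory is still referenced, not leaked in the strict sense.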
Regarding the bug with |
OK, I will send a very simple test scenario (built on a vanilla installation of Symphony) to @brendo. Anybody else who is interested? @nitriques? |
I think I noticed something similar to Michael, didn't bother much as I am

My scenario is slightly different as I needed two DB Connections (to get
In the test scenario that I have sent to @brendo there are Input fields exclusively. Nevertheless the memory phenomenon can be seen. So it has nothing to do with any special fields, that's for sure.
Yep, just for transparency's sake, I was able to run @michael-e's test case and my environment produced a slightly better result, but I still consider it high:

I'll be looking at using XDebug and the aforementioned MacGDBP to try and find the pain points.

Well, that was fast. On @michael-e's hunch, I immediately commented out the

Importing the 5000 entries resulted in:

Memory peak usage: 4M
So the question now is, how should we approach solving this?
The purpose of the logged queries at the moment is for the Profiler, and also so a backtrace can be displayed if an error occurs. I'm leaning towards a rolling window of queries; although that approach has its potential downsides, I'm not sure we'll actually realise any of them.
Wow, that was fast. I wouldn't like to remove the logging. Any chance to turn it off when running a background script (like mine)? Do you have an idea?
Something like
I tested a little "hack", and it seems to work. I added another private static variable to the MySQL class:

```php
private static $_logging = true;
```

Then added a method to disable logging:

```php
public static function disableLogging()
{
    self::$_logging = false;
}
```

and wrapped the code mentioned by @brendo in a condition:

```php
if (self::$_logging === true) {
    //…
}
```

In my importer script, I added:

```php
Symphony::Database()->disableLogging();
```

Tested on PHP 5.3.28, the maximum memory consumption is 19 MB (compared to nearly 400 MB with logging enabled). At the same time, profiling database queries works normally for frontend pages (which must be one of the wonders of object-oriented programming—have I already mentioned that I am rather bad at programming?).
Bingo, with the above change to the MySQL class I performed a successful test import with 50,000 (test) datasets!
(Memory usage should be even lower with newer PHP versions.)
Wow, that's great! The disableLogging thing is nice! I would love to have that on a production server... Maybe consider adding a config setting?
Do you think a config setting would have a significant impact on a production site? For standard Symphony pages, there shouldn't be more than (a few) 1000 queries or so, so IMHO the memory consumption for logging would be rather small, wouldn't it?
Seeing how much memory logs can consume, I would love to be able to not generate them in production. I do have sites where the number of SQL queries is >> 1000.
I am not against it. Just wanted to know… :-) |
:) |
Yep that'll work :) |
Yes to all of the above. Happy for you to submit the PR if you have time.
I will try to do at least number 1. I am not sure about the config option, because it would have to be added to the updater as well. Shall I submit against integration, and you will then cherry-pick the commit into 2.6.x?

FYI: Success! I did a "real-life" import (i.e. with production data) using the fix in the MySQL class:

I am very happy that you figured it out, @brendo!
Is this a bugfix or a new feature? PR should be submitted against
Well, it is both. It fixes an issue with memory consumption by adding a new feature (to switch off the query logger). So what shall I do? |
I am a bit lost with the PDO rewrite of the database stuff, which is in the integration branch. Actually it already has a parameter called

(Anyway, I could send my basic fix to the 2.6 branch if @brendo agrees. We'll need different implementations for 2.6 and integration/3.0.)
Sorry for missing this. Yes, please submit a basic fix for
I don't know, it looks like a mistake, perhaps from debugging the feature. Those lines should simply call |
@brendo: Feel free to do it differently—you know much better than me. |
This is good, I'll add two additional methods, |
Ah, I understand, thanks! |
Add possibility to disable query logging. RE: #2398
Done, the Database class has been updated with the |
Thanks @brendo really cool feature! |
Works like a charm. It's nice that you can disable the query cache as well, so you can check how important this cache is to your site (and if you should check/optimize the MySQL settings…). Thanks, @brendo! |
@michael-e Indeed!! |
I am importing 5000 entries using the PHP CLI. The script works with Symphony and is supposed to build the core of a web-based (frontend) importer later, so I care about memory consumption.
The initial memory consumption of the script is around 47 MB. (This is caused by parsing a rather big Excel XML worksheet, so that is ok.)
Then I start saving entries to Symphony basically using the following code:
Memory consumption stays at the initial level of 47 MB for around 150 entries, then starts growing slowly. It reaches 437 MB (!) for entry number 5000.
I did some more tests:

- Using `$entry->checkPostData` and `$entry->setDataFromPost` only (i.e. without `$entry->commit()`), memory still grows, but much slower. It reaches 55 MB for entry number 5000.
- Using `$entry->commit()` and the extension manager notification (i.e. without `$entry->checkPostData` and `$entry->setDataFromPost`), it reaches 239 MB.

(Isn't this strange? If I add these figures, this is still much lower than using everything at the same time. I really double-checked these figures!)
I even tried to `sleep(1)` the script after bunches of 100 entries, giving the garbage collector a lot of time to do its job — to no avail.

Do you have any ideas what is happening here?