Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

php 8 - exit() takes a very long time to send response #10620

Open
fred-gregorio opened this issue Feb 19, 2023 · 14 comments
Open

php 8 - exit() takes a very long time to send response #10620

fred-gregorio opened this issue Feb 19, 2023 · 14 comments

Comments

@fred-gregorio
Copy link

fred-gregorio commented Feb 19, 2023

Code

<?php
   $stid= [];
    $cnt= 0;
    set_time_limit(600);

    $epg= simplexml_load_file("epg.xml", "SimpleXMLElement", LIBXML_NOCDATA);
    if ($epg === FALSE) {
        exit ( "Malformed XML" );
    }

    foreach  ($epg->programme as $p) {
        /* just do something.... */
        $ev= json_decode(json_encode($p), true);
        $pieces= explode(" ", $ev['@attributes']['start']);
        $st= $pieces[0];
        $start= strtotime(substr($st, 0, 4)."-".substr($st, 4, 2)."-".substr($st, 6, 2)." ".substr($st, 8, 2).":".substr($st, 10, 2).":".substr($st, 12, 2));
        $cnt++;
    }
    exit ( "Done, $cnt programs loaded" );
?>

Description

For some reason the php opening line is correct, it breaks the report, so I inserted a space after <

This snipet of code takes around 15 seconds to execute and get to the exit() line. Existing, however, takes 10minutes, until the response is sent. The input file (epg.xml) has around 330,000 nodes. The actual code inside the foreach() loop doesn't realy matters much.

Tested with 8.1.14 and 8.2.3 - Same result on both.

Executing the same code with php 7.4.28 works fine, taking around 15 seconds to execute and spit out the response.

Relevant php configs : max_memory= 256M

Test environment: laptop HP, i7 8th Gen, 32GB RAM, 2TB nVme, Windows 11 PRO - 64 bits

The input file (epg.xml) is provided gz compressed.

epg.xml.gz

PHP Version

8.2.3

Operating System

Windos 11 64bits

@nielsdos
Copy link
Member

I'll take a look today. Could be coincidence, but 10 minutes is equal to 600 seconds, which is your time limit. I wonder if it exits because of the time limit and doesn't actually finish...

@fred-gregorio
Copy link
Author

Hi Niels, thanks for the promptness!

I ran with lower time limit, but it didn't end any sooner.

Let me know if I can help with anything else.

Best,
Fred

@nielsdos
Copy link
Member

Preliminary testing shows that it's the cleaning up of the XML objects taking a lot of time.
I'll keep this updated with my findings.

@nielsdos
Copy link
Member

nielsdos commented Feb 19, 2023

I profiled it with Callgrind.
xmlFreeDoc(), (and transitively free()) take up the bulk of the time, not PHP itself... I wonder why it was better in 7.4 (some cleanup not done?).

@fred-gregorio
Copy link
Author

It makes sense. One additional info, though: Just loading the XML file doesn't cause the problem. It only happens when the nodes are fetched (json_encode($p)). My guess is that the XML module leaves all the allocated elements to be freed in the end. Question: since the script is terminating is it necessary to free them, won't the whole process' memory be deallocated by the OS?

@nielsdos
Copy link
Member

nielsdos commented Feb 19, 2023

Yeah, I'm afraid the freeing problem is more a libxml2 problem than a PHP problem...

Question: since the script is terminating is it necessary to free them, won't the whole process' memory be deallocated by the OS?

If you're using a single process PHP instance (for example on CLI), then yes you are right. You don't strictly need to free the data in that case, but that is only true for your specific workload here. If it's important to just exit quickly I can give you a hack for posix systems to just forcibly kill the process. On posix systems you could change the exit line to:

echo( "Done, $cnt programs loaded" );
posix_kill(posix_getpid(), 9);

However, I just noticed you're on Windows, so I don't know what we can do in this case...

@fred-gregorio
Copy link
Author

Got it, thanks for the explanation.
Regarding the hack, all my customers are set with PHP 7 on Linux. Windows is just the plataform where I develop, and I can easily switch to PHP 7, since I got WAMP installed.

Once again, thanks for your time.

@nielsdos
Copy link
Member

I dug into this more, although the performance of 7.4 & 8.1 is now the same, there is still something strange for 8.2 and above.
I tested with the reproducing script with the foreach loop removed for simplicity.
For 8.2 I get a time of 0m2.975s, but for 8.1 I get 0m2.152s.
I bisected it and it pointed me to 625f164, which seemed unrelated, but indeed upon reverting that code I got the performance back.
After digging into it more, I found that percentage-wise more time is spent in malloc() and free() even though the number of calls to those functions is equal before and after that commit. Interestingly for free() a whopping 6% extra time was spent consolidating chunks.

I noticed that commit allocates more memory because every function now gets a mapped pointer, so there are more calls to zend_map_ptr_new.
Looking at that function:

php-src/Zend/zend.c

Lines 1939 to 1953 in f5f44bb

ZEND_API void *zend_map_ptr_new(void)
{
void **ptr;
if (CG(map_ptr_last) >= CG(map_ptr_size)) {
/* Grow map_ptr table */
CG(map_ptr_size) = ZEND_MM_ALIGNED_SIZE_EX(CG(map_ptr_last) + 1, 4096);
CG(map_ptr_real_base) = perealloc(CG(map_ptr_real_base), CG(map_ptr_size) * sizeof(void*), 1);
CG(map_ptr_base) = ZEND_MAP_PTR_BIASED_BASE(CG(map_ptr_real_base));
}
ptr = (void**)CG(map_ptr_real_base) + CG(map_ptr_last);
*ptr = NULL;
CG(map_ptr_last)++;
return ZEND_MAP_PTR_PTR2OFFSET(ptr);
}

I noticed a large allocation that did not happen prior to that commit: CG(map_ptr_size) = ZEND_MM_ALIGNED_SIZE_EX(CG(map_ptr_last) + 1, 4096);, so this is 4096 for my execution. Later we multiply by sizeof(void*) which is 8 so that's 32KiB that gets allocated.
When I dropped 4096 to 256 such that 8*256==4096 aka 1 page, then the performance was back to normal. I can't really explain why, only that I know that for >1 page the allocator has some different code paths. Even more bizarre was that even changing that number to be even lower had the same performance improvement as using 4096 despite the fact that more reallocations happened.

It seems we hit some sort of "unoptimal heap layout" ? I'm not sure. Normally this isn't a problem but for CLI scripts this is noticeable...
cc @iluuu1994 have you seen anything like this?

@iluuu1994
Copy link
Member

No, I wasn't aware of such an issue. Does using 4096 alignment lead to any new chunks being allocated? Or does it increase time spent searching for free pages (i.e. zend_mm_alloc_pages)? Can the increase in time be accounted to the ptr map allocation, or to the allocations following it? If you like I can have a closer look after the holidays.

@nielsdos
Copy link
Member

Does using 4096 alignment lead to any new chunks being allocated? Or does it increase time spent searching for free pages (i.e. zend_mm_alloc_pages)?

Note that this uses the system allocator (1 is passed to persistent), so this is glibc malloc behaviour.
I see a lot more time spent in malloc_consolidate and its descendants.

Can the increase in time be accounted to the ptr map allocation, or to the allocations following it?

Avoiding the ptr map allocations altogether fixes the issue.

@iluuu1994
Copy link
Member

Ah, I didn't realize this was an issue with the system allocator. TBH malloc/free are currently black boxes to me, I've never looked at the glibc implementation. It's also surprising to me that a singular malloc/free pair would be so expensive. Does using mmap directly dodge the issue?

@nielsdos
Copy link
Member

I also haven't really looked too deep into glibc's malloc, or its optimizations. From what I understand a different code path is taken when the requested size is bigger than a page, which is why I saw the problem when allocating 32KiB but not 4KiB.

Does using mmap directly dodge the issue?

Yes.

@iluuu1994
Copy link
Member

Does this allocation happen at a particular time? E.g. after a lot of fragmentation from libxml? I'm not sure whether libxml uses zend_alloc or the system allocator. I suppose we could use mmap if the map ptr becomes sufficiently big but it's likely that any other large malloc/free call could cause the issue. I'm afraid I won't be of much help for now. :(

@nielsdos
Copy link
Member

Does this allocation happen at a particular time?

Startup, when functions are registered for assigning the runtime cache pointer.

after a lot of fragmentation from libxml?

libxml only comes into action afterwards. I checked it, and as the document in the reproducer is only parsed and no manipulations happen there is almost no chance for fragmentation.

I'm not sure whether libxml uses zend_alloc or the system allocator

System allocator (unfortunately)

I suppose we could use mmap if the map ptr becomes sufficiently big but it's likely that any other large malloc/free call could cause the issue.

Yeah, I agree. I also think that even if we magically fix this, that other allocations can just screw us over again. I'm just very surprised that it is actually quite a noticeable slowdown.

I'm afraid I won't be of much help for now. :(

No worries!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants