-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Interned strings unification for TS/NTS #2390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* php/master: (149 commits) Fixed bug #61471 Add #ifndef restrict Fix detection of isnan and isinf ReflectionGenerator now sends ReflectionException as expected use some dynamically generated NAN as well rework fd521a2 to simplify for master, see github php#2356 switch to smart str conversion routine to hide exact NAN type Revert "Fix detection of isnan and isinf" Fix detection of isnan and isinf Fix bug #73954 Implement Parameter Type Widening RFC Add UPGRADING notes for deprecations Deprecate each() Deprecate assert() with string argument Deprecate mbstring.func_overload Deprecate track_errors / $php_errormsg Deprecate mb_parse_str() without second argument Deprecate parse_str() without second argument Deprecate (unset) Deprecate __autoload() ...
- implemented separate hash tables for permanent and per request interned strings - strings interned during the startup are permanent - afterwards, any interned string has life time per request only In the thread safe build this means, every thread maintains its own hash table for the non permanent interned strings. The permanent strings are common to all the threads and won't be free'd until the process shutdown.
* php/master: (44 commits) [ci skip] update NEWS [ci skip] update NEWS fix typo skip test Fixed bug #74090 stream_get_contents maxlength>-1 returns empty string Fixed test in travis Make it slower Fixed test Add a test for hard_timeout(bug #74093) Workaround to fix bug #74093 (Maximum execution time of n+2 seconds exceed not written in error_log) Update to SQLite 3.17.0 Update NEWS Fixed bug #73989 (PHP 7.1 Segfaults within Symfony test suite) update NEWS update UPGRADING.INTERNALS The d_name member of struct dirent should be a pointer Revert "Fixed bug #74035" Upgrade timelib to 2017.01 remove loop fix loop ...
ensure clean alloc/dealloc
@@ -1706,7 +1706,7 @@ PHPAPI int php_register_url_stream_wrapper(const char *protocol, php_stream_wrap | |||
return FAILURE; | |||
} | |||
|
|||
return zend_hash_str_add_ptr(&url_stream_wrappers_hash, protocol, protocol_len, wrapper) ? SUCCESS : FAILURE; | |||
return zend_hash_add_ptr(&url_stream_wrappers_hash, zend_string_init_interned(protocol, protocol_len, 1), wrapper) ? SUCCESS : FAILURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This place may be reached at run-time through stream_wrapper_register(), but we don't create interned strings at run-time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was basing it on the fact url_stream_wrappers_hash is used, which is global. Otheriwse, stream wrappers created on runtime seem to be not thread safe in general. Gonna investigate and rework this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this part is ok. User space wrappers runs through php_register_url_stream_wrapper_volatile() and save into FG(stream_wrappers). So far i couldn't find a code path where php_register_url_stream_wrapper() is called outside the startup phase. Thus names of those saved into url_stream_wrappers_hash need to be interned, as they can interact with the runtime code.
Zend/zend_string.c
Outdated
#endif | ||
empty_string = zend_new_interned_string_permanent_int(str); | ||
#ifndef ZTS | ||
CG(empty_string) = empty_string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since empty_string is a true global variable, seems we don't need to store it in CG .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it would then even replace the current CG reference in ZSTR_EMPTY_ALLOC(), just need to extern it. I'll do this in the scope of this patch. Would have to check with Opcache yet, missed it in the initial patch.
/* ensure singleton */ | ||
return; | ||
} | ||
|
||
if (tsrm_tls_table) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as we already have a in_main_thread guarder, this check should also not that necessary, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check this, thanks for the hint. Thought also to export this as a function, so it's usable in any places with startup/shutdown, maybe improving to explicitly checking thread ids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems ok, i've reverted for now as it's not related to interned strings directly. Will re-apply once the actual interned strings stuff is finished.
this part is shm protected
This avoids shutdown crashes with Opcache, basically because function/class/etc. table is already cleaned up by the Opcache, so the global pointers need to match when zend_shutdown tries to free stuff.
* php/master: fix test for libzip 1.2.0 initialize valid_symbol_table, important for the main thread ftp_mlsd(): Parse the MLSD response Fixed #74099 - Memory leak with openssl_encrypt() Fixed bug #74105 Unused var Disable RTLD_DEEPBIND when compiling with AddressSanitizer (-fsanitize=address). Fix memory errors in url rewriter Fix autoload_func_info destruction Make the ftp and stream tests more reliable. Add json dep to test
TSRM/TSRM.c
Outdated
@@ -10,7 +10,7 @@ | |||
+----------------------------------------------------------------------+ | |||
*/ | |||
|
|||
#include "TSRM.h" | |||
#include "zend.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TSRM didn't depend on Zend by design. Can we avoid this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could introduce more callbacks in TSRM, currently there are only a couple of callbacks at thread creation. Some callbacks could be introduced for the tsrm startup/shutdown and thread shutdown. I'll do the callbacks for now, for cleaner TSRM API.
TSRM/TSRM.c
Outdated
if (!in_main_thread) { | ||
/* ensure singleton */ | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes/fix related to "tsrm_tls_table" and "in_main_thread" look unrelated to interned strings. Lets commit this part separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, will revert for now.
Zend/zend.c
Outdated
@@ -623,7 +619,7 @@ static void executor_globals_ctor(zend_executor_globals *executor_globals) /* {{ | |||
ZEND_TSRMLS_CACHE_UPDATE(); | |||
|
|||
zend_startup_constants(); | |||
zend_copy_constants(EG(zend_constants), GLOBAL_CONSTANTS_TABLE); | |||
zend_copy_constants(executor_globals->zend_constants, GLOBAL_CONSTANTS_TABLE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is unrelated. Commit it separately.
Zend/zend.c
Outdated
compiler_globals->auto_globals = GLOBAL_AUTO_GLOBALS_TABLE; | ||
|
||
zend_hash_destroy(executor_globals->zend_constants); | ||
*executor_globals->zend_constants = *GLOBAL_CONSTANTS_TABLE; | ||
executor_globals->zend_constants = GLOBAL_CONSTANTS_TABLE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about all these zend_startup() changes above.
This reverts commit 32d72b3.
This reverts commit 5f03243.
…ld logic"" This reverts commit b2b7a56.
* php/master: (22 commits) do not try to handle signals, when globals are inconsistent Zip: add support for encrypted archive PHP bug #74004 Fixed tests after tzdb removed abbreviations Make sure we anchor this test to February Add test-case from bug #55157 Fix bug73858.phpt to work in months without 31 days Update NEWS Update NEWs Fixed bug #54379 (PDO_OCI: UTF-8 output gets truncated) Remove dead code related to error constants Remove PHP5-specific code Updated to version 2017.1 (2017a) Updated to version 2017.1 (2017a) Updated to version 2017.1 (2017a) make type consistent with glob_t.gl_pathc Fix potential crash when setting invalid declare value make test slower again prepare 7.1.4 prepare next ...
Zend/zend_API.c
Outdated
@@ -2019,7 +2019,7 @@ ZEND_API zend_module_entry* zend_register_module_ex(zend_module_entry *module) / | |||
lcname = zend_string_alloc(name_len, 1); | |||
zend_str_tolower_copy(ZSTR_VAL(lcname), module->name, name_len); | |||
|
|||
if ((module_ptr = zend_hash_add_mem(&module_registry, lcname, module, sizeof(zend_module_entry))) == NULL) { | |||
if ((module_ptr = zend_hash_add_mem(&module_registry, zend_new_interned_string(lcname), module, sizeof(zend_module_entry))) == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be lcname = zend_new_interned_string(lcname)
on a separate line. The semantics of zend_new_interned_string are similar to realloc. The argument is released, you just sometimes don't see it (if the string was not interned yet).
@@ -763,7 +763,7 @@ ZEND_FUNCTION(each) | |||
if (Z_REFCOUNTED_P(entry)) Z_ADDREF_P(entry); | |||
} | |||
zend_hash_index_add_new(Z_ARRVAL_P(return_value), 1, entry); | |||
zend_hash_add_new(Z_ARRVAL_P(return_value), CG(known_strings)[ZEND_STR_VALUE], entry); | |||
zend_hash_add_new(Z_ARRVAL_P(return_value), ZSTR_KNOWN(ZEND_STR_VALUE), entry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to apply these changes changes independently? (Only if it's not too entangled.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CG(known_strings) is now removed, instead it's a real global var zend_known_strings. Same is with zend_empty_string and zend_one_char_string. So we would reference a global var directly, which is worse than having an acessor macro. But in general, it is a a change that belongs to the patch, so either way it were not possible to keep the exact current code.
interned_strings->nTableMask = -interned_strings->nTableSize; | ||
HT_SET_DATA_ADDR(interned_strings, pemalloc(HT_SIZE(interned_strings), permanent)); | ||
HT_HASH_RESET(interned_strings); | ||
interned_strings->u.flags |= HASH_FLAG_INITIALIZED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this code just zend_hash_real_init(interned_strings, 0);
?
Zend/zend_string.c
Outdated
memcpy(interned_strings->arData, old_buckets, sizeof(Bucket) * interned_strings->nNumUsed); | ||
pefree(old_data, interned_strings->u.flags & HASH_FLAG_PERSISTENT); | ||
zend_hash_rehash(interned_strings); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With pemalloc
this branch can never be taken.
|
||
static zend_never_inline void zend_string_table_grow(HashTable *interned_strings) | ||
{ | ||
if (EXPECTED(interned_strings->nTableSize < HT_MAX_SIZE)) { /* Let's double the table size */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is existing code, but the implementation as-is cannot actually handle this condition gracefully. Either it should error as the HT would usually do, or it should not intern in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably it should fail hard in this situation, as not interning might cause more hard weight issues, at least with TS. Alternatively, we could increase the value, if possible.
Zend/zend_string.c
Outdated
CG(interned_strings).nTableSize = CG(interned_strings).nTableSize >> 1; | ||
CG(interned_strings).nTableMask = -CG(interned_strings).nTableSize; | ||
} | ||
return NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the interned string is stored as both key and value of the HT (https://github.com/weltling/php-src/blob/97882476c602a20411c33d0d5c555818ba5e5834/Zend/zend_string.c#L176) this function should be a thin wrapper around zend_hash_find()
, instead of reimplementing HT logic.
HT_HASH(&CG(interned_strings), nIndex) = HT_IDX_TO_HASH(idx); | ||
nIndex = h | interned_strings->nTableMask; | ||
Z_NEXT(p->val) = HT_HASH(interned_strings, nIndex); | ||
HT_HASH(interned_strings, nIndex) = HT_IDX_TO_HASH(idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use zend_hash_append()
. Or going one step further, this could just use zend_hash_add_new()
, in which case we can discard the duplicated table growing logic as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the only not addressed points regard to the interned strings ht management. I would address them after the patch was merged. It would be indeed make sense to reuse the existing API, for now we're fine reusing the existing implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Patch LGTM, very nice work!
* php/master: sync NEWS for bug #74159 improve signal globals consistency check for TS
/* Copy PHP interned strings from PHP process memory into the shared memory */ | ||
static void accel_use_shm_interned_strings(void) | ||
static void accel_copy_permanent_strings(zend_new_interned_string_func_t new_interned_string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify: Copying to SHM here is just an optimization right? For cross-process purposes it's only important that strings referenced from SHM are interned, not that permanent strings are all in SHM. If something is missed here, it should not cause issues. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On ASLR enabled systems, like Windows by default or hardened Linux, pointers to interned strings are likely differ. So in first place, by copying we ensure the pointers are same in all the processes. If something is missed here, the interning will happen on demand in zend_persist.c and zend_persist_calc.c parts on demand. See ADD_INTERNED_STRING, zend_accel_store_interned_string and so on there.
It might be, that on some systems the pointers to the local interned strings could stay same, But as we can't guarantee it, it's played safe. While it were possible to add a configure option and modify the code for not saving to SHM, i'd not go for that, as it is really not a reliable behavior.
p->key = new_interned_string(p->key); | ||
} | ||
c = (zend_constant*)Z_PTR(p->val); | ||
if (c->name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there constants without names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not a usual case. Some tricks i recall were like giving a name with a \0-byte in the front, to save some thing unusual into the constant. Such code is for sure still somewhere in APCu and earlier in APC, probably in some other exts as well.
} | ||
} | ||
|
||
for (idx = 0; idx < module_registry.nNumUsed; idx++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this code should be using ZEND_HASH_FOREACH_BUCKET... but not related to this change.
We have to copy permanent interned strings into SHM, because on Windows different processes may put the same strings at different addresses.
________________________________
From: Nikita Popov <notifications@github.com>
Sent: Friday, March 3, 2017 12:45:54 AM
To: php/php-src
Cc: Dmitry Stogov; Comment
Subject: Re: [php/php-src] Interned strings unification for TS/NTS (#2390)
@nikic commented on this pull request.
________________________________
In ext/opcache/ZendAccelerator.c<#2390 (comment)>:
/* Copy PHP interned strings from PHP process memory into the shared memory */
-static void accel_use_shm_interned_strings(void)
+static void accel_copy_permanent_strings(zend_new_interned_string_func_t new_interned_string)
Just to clarify: Copying to SHM here is just an optimization right? For cross-process purposes it's only important that strings referenced from SHM are interned, not that permanent strings are all in SHM. If something is missed here, it should not cause issues. Is that correct?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub<#2390 (review)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/ACZM0nEuHBvRQBpb_VfqTT7WZ_k2Su5Oks5rhziSgaJpZM4MEM_p>.
|
Merged per c698299. The fololwups are going to be handled either directly in dev or per a separate PR. Thanks! |
The primary goal is to support real interned strings also in Ts builds. The unified approach implements string interning with two tables for permanent and per-request interned strings for both TS/NTS. More explanations to the patch here http://news.php.net/php.internals/98324
Thanks.