Skip to content

Conversation

weltling
Copy link
Contributor

@weltling weltling commented Feb 17, 2017

The primary goal is to support real interned strings also in Ts builds. The unified approach implements string interning with two tables for permanent and per-request interned strings for both TS/NTS. More explanations to the patch here http://news.php.net/php.internals/98324

Thanks.

* php/master: (149 commits)
  Fixed bug #61471
  Add #ifndef restrict
  Fix detection of isnan and isinf
  ReflectionGenerator now sends ReflectionException as expected
  use some dynamically generated NAN as well
  rework fd521a2 to simplify for master, see github php#2356
  switch to smart str conversion routine to hide exact NAN type
  Revert "Fix detection of isnan and isinf"
  Fix detection of isnan and isinf
  Fix bug #73954
  Implement Parameter Type Widening RFC
  Add UPGRADING notes for deprecations
  Deprecate each()
  Deprecate assert() with string argument
  Deprecate mbstring.func_overload
  Deprecate track_errors / $php_errormsg
  Deprecate mb_parse_str() without second argument
  Deprecate parse_str() without second argument
  Deprecate (unset)
  Deprecate __autoload()
  ...
- implemented separate hash tables for permanent and per request
  interned strings
- strings interned during the startup are permanent
- afterwards, any interned string has life time per request only

In the thread safe build this means, every thread maintains its own hash
table for the non permanent interned strings. The permanent strings are
common to all the threads and won't be free'd until the process
shutdown.
* php/master: (44 commits)
  [ci skip] update NEWS
  [ci skip] update NEWS
  fix typo
  skip test
  Fixed bug #74090 stream_get_contents maxlength>-1 returns empty string
  Fixed test in travis
  Make it slower
  Fixed test
  Add a test for hard_timeout(bug #74093)
  Workaround to fix bug #74093 (Maximum execution time of n+2 seconds exceed not written in error_log)
  Update to SQLite 3.17.0
  Update NEWS
  Fixed bug  #73989 (PHP 7.1 Segfaults within Symfony test suite)
  update NEWS
  update UPGRADING.INTERNALS
  The d_name member of struct dirent should be a pointer
  Revert "Fixed bug #74035"
  Upgrade timelib to 2017.01
  remove loop
  fix loop
  ...
@@ -1706,7 +1706,7 @@ PHPAPI int php_register_url_stream_wrapper(const char *protocol, php_stream_wrap
return FAILURE;
}

return zend_hash_str_add_ptr(&url_stream_wrappers_hash, protocol, protocol_len, wrapper) ? SUCCESS : FAILURE;
return zend_hash_add_ptr(&url_stream_wrappers_hash, zend_string_init_interned(protocol, protocol_len, 1), wrapper) ? SUCCESS : FAILURE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This place may be reached at run-time through stream_wrapper_register(), but we don't create interned strings at run-time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was basing it on the fact url_stream_wrappers_hash is used, which is global. Otheriwse, stream wrappers created on runtime seem to be not thread safe in general. Gonna investigate and rework this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this part is ok. User space wrappers runs through php_register_url_stream_wrapper_volatile() and save into FG(stream_wrappers). So far i couldn't find a code path where php_register_url_stream_wrapper() is called outside the startup phase. Thus names of those saved into url_stream_wrappers_hash need to be interned, as they can interact with the runtime code.

#endif
empty_string = zend_new_interned_string_permanent_int(str);
#ifndef ZTS
CG(empty_string) = empty_string;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since empty_string is a true global variable, seems we don't need to store it in CG .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it would then even replace the current CG reference in ZSTR_EMPTY_ALLOC(), just need to extern it. I'll do this in the scope of this patch. Would have to check with Opcache yet, missed it in the initial patch.

/* ensure singleton */
return;
}

if (tsrm_tls_table) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we already have a in_main_thread guarder, this check should also not that necessary, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check this, thanks for the hint. Thought also to export this as a function, so it's usable in any places with startup/shutdown, maybe improving to explicitly checking thread ids.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems ok, i've reverted for now as it's not related to interned strings directly. Will re-apply once the actual interned strings stuff is finished.

this part is shm protected
This avoids shutdown crashes with Opcache, basically because
function/class/etc. table is already cleaned up by the Opcache,
so the global pointers need to match when zend_shutdown tries
to free stuff.
* php/master:
  fix test for libzip 1.2.0
  initialize valid_symbol_table, important for the main thread
  ftp_mlsd(): Parse the MLSD response
  Fixed #74099 - Memory leak with openssl_encrypt()
  Fixed bug #74105
  Unused var
  Disable RTLD_DEEPBIND when compiling with AddressSanitizer (-fsanitize=address).
  Fix memory errors in url rewriter
  Fix autoload_func_info destruction
  Make the ftp and stream tests more reliable.
  Add json dep to test
TSRM/TSRM.c Outdated
@@ -10,7 +10,7 @@
+----------------------------------------------------------------------+
*/

#include "TSRM.h"
#include "zend.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TSRM didn't depend on Zend by design. Can we avoid this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could introduce more callbacks in TSRM, currently there are only a couple of callbacks at thread creation. Some callbacks could be introduced for the tsrm startup/shutdown and thread shutdown. I'll do the callbacks for now, for cleaner TSRM API.

TSRM/TSRM.c Outdated
if (!in_main_thread) {
/* ensure singleton */
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes/fix related to "tsrm_tls_table" and "in_main_thread" look unrelated to interned strings. Lets commit this part separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, will revert for now.

Zend/zend.c Outdated
@@ -623,7 +619,7 @@ static void executor_globals_ctor(zend_executor_globals *executor_globals) /* {{
ZEND_TSRMLS_CACHE_UPDATE();

zend_startup_constants();
zend_copy_constants(EG(zend_constants), GLOBAL_CONSTANTS_TABLE);
zend_copy_constants(executor_globals->zend_constants, GLOBAL_CONSTANTS_TABLE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is unrelated. Commit it separately.

Zend/zend.c Outdated
compiler_globals->auto_globals = GLOBAL_AUTO_GLOBALS_TABLE;

zend_hash_destroy(executor_globals->zend_constants);
*executor_globals->zend_constants = *GLOBAL_CONSTANTS_TABLE;
executor_globals->zend_constants = GLOBAL_CONSTANTS_TABLE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about all these zend_startup() changes above.

weltling and others added 14 commits February 28, 2017 16:30
* php/master: (22 commits)
  do not try to handle signals, when globals are inconsistent
  Zip: add support for encrypted archive
  PHP bug #74004
  Fixed tests after tzdb removed abbreviations
  Make sure we anchor this test to February
  Add test-case from bug #55157
  Fix bug73858.phpt to work in months without 31 days
  Update NEWS
  Update NEWs
  Fixed bug #54379 (PDO_OCI: UTF-8 output gets truncated)
  Remove dead code related to error constants
  Remove PHP5-specific code
  Updated to version 2017.1 (2017a)
  Updated to version 2017.1 (2017a)
  Updated to version 2017.1 (2017a)
  make type consistent with glob_t.gl_pathc
  Fix potential crash when setting invalid declare value
  make test slower again
  prepare 7.1.4
  prepare next
  ...
Zend/zend_API.c Outdated
@@ -2019,7 +2019,7 @@ ZEND_API zend_module_entry* zend_register_module_ex(zend_module_entry *module) /
lcname = zend_string_alloc(name_len, 1);
zend_str_tolower_copy(ZSTR_VAL(lcname), module->name, name_len);

if ((module_ptr = zend_hash_add_mem(&module_registry, lcname, module, sizeof(zend_module_entry))) == NULL) {
if ((module_ptr = zend_hash_add_mem(&module_registry, zend_new_interned_string(lcname), module, sizeof(zend_module_entry))) == NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be lcname = zend_new_interned_string(lcname) on a separate line. The semantics of zend_new_interned_string are similar to realloc. The argument is released, you just sometimes don't see it (if the string was not interned yet).

@@ -763,7 +763,7 @@ ZEND_FUNCTION(each)
if (Z_REFCOUNTED_P(entry)) Z_ADDREF_P(entry);
}
zend_hash_index_add_new(Z_ARRVAL_P(return_value), 1, entry);
zend_hash_add_new(Z_ARRVAL_P(return_value), CG(known_strings)[ZEND_STR_VALUE], entry);
zend_hash_add_new(Z_ARRVAL_P(return_value), ZSTR_KNOWN(ZEND_STR_VALUE), entry);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to apply these changes changes independently? (Only if it's not too entangled.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CG(known_strings) is now removed, instead it's a real global var zend_known_strings. Same is with zend_empty_string and zend_one_char_string. So we would reference a global var directly, which is worse than having an acessor macro. But in general, it is a a change that belongs to the patch, so either way it were not possible to keep the exact current code.

interned_strings->nTableMask = -interned_strings->nTableSize;
HT_SET_DATA_ADDR(interned_strings, pemalloc(HT_SIZE(interned_strings), permanent));
HT_HASH_RESET(interned_strings);
interned_strings->u.flags |= HASH_FLAG_INITIALIZED;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this code just zend_hash_real_init(interned_strings, 0);?

memcpy(interned_strings->arData, old_buckets, sizeof(Bucket) * interned_strings->nNumUsed);
pefree(old_data, interned_strings->u.flags & HASH_FLAG_PERSISTENT);
zend_hash_rehash(interned_strings);
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With pemalloc this branch can never be taken.


static zend_never_inline void zend_string_table_grow(HashTable *interned_strings)
{
if (EXPECTED(interned_strings->nTableSize < HT_MAX_SIZE)) { /* Let's double the table size */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is existing code, but the implementation as-is cannot actually handle this condition gracefully. Either it should error as the HT would usually do, or it should not intern in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it should fail hard in this situation, as not interning might cause more hard weight issues, at least with TS. Alternatively, we could increase the value, if possible.

CG(interned_strings).nTableSize = CG(interned_strings).nTableSize >> 1;
CG(interned_strings).nTableMask = -CG(interned_strings).nTableSize;
}
return NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the interned string is stored as both key and value of the HT (https://github.com/weltling/php-src/blob/97882476c602a20411c33d0d5c555818ba5e5834/Zend/zend_string.c#L176) this function should be a thin wrapper around zend_hash_find(), instead of reimplementing HT logic.

HT_HASH(&CG(interned_strings), nIndex) = HT_IDX_TO_HASH(idx);
nIndex = h | interned_strings->nTableMask;
Z_NEXT(p->val) = HT_HASH(interned_strings, nIndex);
HT_HASH(interned_strings, nIndex) = HT_IDX_TO_HASH(idx);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use zend_hash_append(). Or going one step further, this could just use zend_hash_add_new(), in which case we can discard the duplicated table growing logic as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the only not addressed points regard to the interned strings ht management. I would address them after the patch was merged. It would be indeed make sense to reuse the existing API, for now we're fine reusing the existing implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Patch LGTM, very nice work!

* php/master:
  sync NEWS for bug #74159
  improve signal globals consistency check for TS
/* Copy PHP interned strings from PHP process memory into the shared memory */
static void accel_use_shm_interned_strings(void)
static void accel_copy_permanent_strings(zend_new_interned_string_func_t new_interned_string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify: Copying to SHM here is just an optimization right? For cross-process purposes it's only important that strings referenced from SHM are interned, not that permanent strings are all in SHM. If something is missed here, it should not cause issues. Is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On ASLR enabled systems, like Windows by default or hardened Linux, pointers to interned strings are likely differ. So in first place, by copying we ensure the pointers are same in all the processes. If something is missed here, the interning will happen on demand in zend_persist.c and zend_persist_calc.c parts on demand. See ADD_INTERNED_STRING, zend_accel_store_interned_string and so on there.

It might be, that on some systems the pointers to the local interned strings could stay same, But as we can't guarantee it, it's played safe. While it were possible to add a configure option and modify the code for not saving to SHM, i'd not go for that, as it is really not a reliable behavior.

p->key = new_interned_string(p->key);
}
c = (zend_constant*)Z_PTR(p->val);
if (c->name) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there constants without names?

Copy link
Contributor Author

@weltling weltling Mar 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not a usual case. Some tricks i recall were like giving a name with a \0-byte in the front, to save some thing unusual into the constant. Such code is for sure still somewhere in APCu and earlier in APC, probably in some other exts as well.

}
}

for (idx = 0; idx < module_registry.nNumUsed; idx++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this code should be using ZEND_HASH_FOREACH_BUCKET... but not related to this change.

@dstogov
Copy link
Member

dstogov commented Mar 3, 2017 via email

@weltling
Copy link
Contributor Author

weltling commented Mar 4, 2017

Merged per c698299. The fololwups are going to be handled either directly in dev or per a separate PR.

Thanks!

@weltling weltling closed this Mar 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants