-
Notifications
You must be signed in to change notification settings - Fork 7.9k
zend_string changes and additions #1353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f026bac
to
1230765
Compare
IMO changing the existing API is already a breakage for 7.0, quite some extensions are already ported and could be broken again. Probably not what we want at the current stage. But adding some useful APIs/macros with consistent naming should be no issue. Thanks. |
6bbf0d4
to
c8e1c20
Compare
Also fix a bug: when trucating a zend_string, the resulting string is not null-terminated. Add ZVAL_STR_RESIZE() macro
c8e1c20
to
82c6dc9
Compare
Right. The PR was modified to propose only additions, without any BC break. |
I'm running some tests but how it currently looks there's no regression. I wouldn't push this into the next release as it's tagged tomorrow. But after alpha2 there'll be the last chance to add this, also we'll have the whole two weeks to ensure it's worky in master. Btw, and what about Z_STR_VAL() and co mentioned in the thread? @dstogov how does the code look for you? Thanks. |
About ZSTR_xxx() macros, I am currently restructuring the ‘zstrict’ PR (#1348) which will include this proposal for a clean zend_string API with separate getters/setters. I hope I can propose something clean tomorrow (UTC). I’ll send a message to you as soon as there’s something to review. |
I see no problem with renaming and additional "more consistent" API, but I don't like behavior changes, like setting s->val[s->len] = "\0"; Note, that the whole PHP sources work fine without this. It's just useless in most frequent use cases, because we use memcopy(). It also cause non-linear data access locality, and most probably will lead to CPU miss. |
Please also implement usages of these new APIs, to show that these are actually common. Personally I have some doubts about the usefulness of mutable append APIs, as this is normally handled by smart_str. We do have code for (non-mutable) concats lying around, e.g. zend_concat3. |
@dstogov - I am not against the decision of considering that the zend_string API does not ensure that the strings it returns are null-terminated (even if it would probably cause many issues) but we must be consistent : The whole zend_string API takes special care to maintain the terminating null-byte except zend_string_truncate() and zend_string_realloc() (when trucating) which will always return a non-null-terminated string. So, do we consider that the API maintains this null byte, or is it up to the user to set it every time he receives a strings from a zend_string_xxx() function ? Actually, the question may be expressed as : how important is it that zval strings are null-terminated ? The message of saying : "Some functions maintain the terminating null and others don't" is not the right one, IMHO. |
@nikic - smart_str have a set of append functions but, IMO, zend_string will gradually replace smart_str. smart_str is slightly slower and doesn't provide much more, except a set of append functions. So, once we implement a pair of missing features in zend_string, I don't see why both systems would coexist much longer than the time for extension maintainers to migrate their code. That's something that can disappear in 8.0. About the fact that append is common or not : it was considered common enough for smart_str, why would it be different ? I think I'll even extend it to provide exactly the same options as smart_str. |
@flaupretre zend_string and smart_str are for two different purposes. zend_string is essentially a finished string that will not be changed anymore. smart_str is what you use to get the zend_string based on multiple append operations. smart_str is specially designed around that usage, by using allocation policies that provide good performance if you need to do many appends to generate the final string. |
@nikic I don’t have enough time now but I am pretty sure we can combine the benefits of both systems without loosing performance (integrate smart_str mem allocation policy in zend_string). Anyway, we already have zend_string realloc/extend/truncate. If I understand your point, these shouldn’t have been defined. The main point is that there is no easy tranformation between zval and smart_str, when zend_string is the underlying mechanism of zval. So, do you mean that a string zval is “essentially a finished string that will not be changed anymore” ? If this is the case, we should probably deprecate the zend_string realloc/extend/truncate functions. If a string zval can be modified, the zend_string layer can provide ways to perform these modifications. |
This will go to another PR as it's unrelated to this branch's purpose
@dstogov Just removed the additional null byte set in realloc/truncate. But I'll revive this in another PR :) because I still see it as a bug. |
@flaupretre I'll try to clarify what I mean here. First off, Normally
You suggest that we might be able to combine the benefits of both approaches. That would be nice (by way of unifying the API) but I don't see any obvious way to do this, as the allocation policies (smallest possible vs overly large) are pretty contradictory. You have correctly noted that the existence of realloc etc. functions goes somewhat against my point here. These are necessary for multiple reasons. E.g. smart_str itself uses zend_strings so requires a reallocation mechanism for it. But there are also other places that resize strings and where using I'm not saying that mutable append APIs are useless -- I don't know how useful they'd be -- which is why I asked for them to be used in php-src where applicable (it should be possible to find all possible application points by checking the usages of the functions you listed). This would both test your implementation and allow us to gauge how often these are useful. I would not want to add them without knowing this, especially if it might cause extension authors to use these in cases where using smart_str would have been more appropriate. |
@nikic Thanks for your reply. The more I think about it, the more I think the solution is to combine both. The challenge is to add the flexibility of smart_str to zend_string without loosing performance nor wasting too much memory. Today, the main problem I see with maintaining both APIs is that it is quite impractical and slow to use the smart_str API on a zval (must convert back and forth). Just thinking out loud : The first step is to add a zend_string field containing the allocated size. Then, a zend_string can be marked as 'over-allocating' (using the smart_str allocation policy) at creation time or even after it is created. This will affect the way it is created and extended. Another function can over-allocate by an amount passed as argument (when we have an idea in advance of the memory we'll need). When the string is considered as 'finished', an opposite function can mark it as 'strictly-allocated'. This function would also optionally reclaim the over-allocated memory. This would allow switching from one allocation scheme to the other as often as we want, providing a lot of flexibility.. This would cost a pair of additional checks but it would avoid realloc() calls and the compiler should be able to suppress most of them (these are all inlined functions and they are often called from the same function). Anyway, that should be performance-tested. |
Add zend_string_resize() function
This function is similar to zend_string_realloc() but doesn't take the 'persistent' arg, as a zend_string is always re-allocated in the memory space it was created in.
Add a collection of zend_string_append() functions
These function allow to append to a zend_string from :
The 'append' functions are all based on zend_string_resize().