Skip to content

Zstrict - Enforcing encapsulation/isolation #1348

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

flaupretre
Copy link
Contributor

This branch is under discussion. It is not ready for merging yet. So, please read below and comment.

Introduction - Why ?

Two months ago, Anthony Ferrara expressed concerns about PHP 7 and zend_hash growing complexity, combined with a lack of strict encapsulation (tight coupling).

Some replied that the solution is just a mix of comments and documentation, but growing complexity quickly becomes unmanageable if any code can bypass official/published APIs. zend_hash is a critical examples, but not the only one.

While part of the solution lies in documentation, 'quick-and-dirty' is not just a question of knowledge. We also need tools to detect and fix API violations.
#1. Protecting structure elements from direct access

The first tool focuses on the question of direct access to structure elements. Even if the API provides macros and functions to access structure elements, it is still possible to access them directly, bypassing the API. This tool provides a way to 'protect' the elements you want, so that directly accessing them ouside of the official API causes a compile failure.

As we code in C, it will always be possible to access any address in memory. So, we can only provide alert and detection tools to help developers improve their code. Those who willingly bypass everything and respect no rule will have to be stopped by other means.

From a user's point it view, it takes the form of an additional configure option. When PHP is configured with '--enable-api-checks', every direct access to protected structure elements causes a compilation error.

When the 'enable-api-checks' configure option is not set, the code is not modified and direct access to structure element is still possible. That's why it is implemented as an option.

How does it work ? When the 'api-checks' mode is active, the name of protected structure elements is modified and the official API is modified accordingly. Then, every attempt to access the element using its 'usual' name generates a compile error.

This is a compile-time tool. As it generates long and complex names for protected elements, the generated code shouldn't be used for debugging. So, among others, this option must never be set when generating binary releases.

Protecting structure elements is simple : everywhere the protected element is to be used, it is replaced with '_ZEND_PROTECTED(<prefix>,<name>)', where <prefix> is the name of the structure containing the element, and <name> is the element name.

In general, only the first-level elements need to be protected, as they control access to the sub-elements.
#2. Separating getters and setters

Encapsulation is composed of different levels. Providing an abstraction layer, like a macro, is the first level. The second level requires a strict separation of getter and setter methods. This level cannot be achieved with a macro because a macro can be used as an lvalue in an assignment, and can be prefixed with '&' (giving the address of the struct element). So, using a macro, it is impossible to control get and set operations. And controlling get/set operations is the key to loose coupling.

PHP APIs generally provide setter methods, but they also often use macros as getters. In order to allow for a gradual improvement, another configure option, '--enable-strict-api' activates some API restrictions. These restrictions are typically changing some getters from macro to function, making them de-facto read-only. Other restrictions may also be activated by this flag. The idea is that this 'strict' mode will become the default mode in a future PHP version. So, extension developers can use it to check in advance that their code complies with an API stricter than the current one.

For an example of how this mode is used, please look at the declaration of ZSTR_VAL/ZSTR_LEN/ZSTR_HASH in zend_string.h. As you may see, in strict mode, these macros are aliased to functions, making them pure getters. An important side effect is that Z_STRLEN() becomes a pure getter too. So, the code using assignments to a Z_STRLEN() lvalue must be modified to use the new ZVAL_STR_LEN() macro. That's just an example but it illustrates how a developer can make his code compliant with a future version of the API.

Of course, the core also needs to be strict-compliant. It will take time but the most important is that we now have the tools to work on it. That's why I would like these tools included in the first 7.0 release, so that we can start working with them and prepare the next minor versions.

What's in this branch ?

  1. Implementation of the 'api-checks' and 'strict-api' configuration flags.
  2. Implement a 'clean' API for zend_string. zend_string is critical, in this regard, because no API existed to access the len and val elements. So, the first step is to create this API, then protect in strict mode, to help people fix their code.
  3. As an example, I also protected the 'zend_refcounted' structure. Look how the structure was modified, along with the GC_xxx() macros. Remember that, once we decide that this behavior switches from 'strict' to 'general', we just need to rename the _ZEND_PROTECTED_STRICT macros to _ZEND_PROTECTED.

I am stopping now because doing more would be useless and confusing as long as the mechanisms are not officially approved and merged. The next steps, when it is done, would be probably to provide a complete encapsulated API for most zval elements. We'll also work on HashTables. HashTables are critical as there are a lot of places in the core which access directly the HashTable elements, and this must absolutely be cleaned before it becomes unmanageable.

@@ -184,7 +184,7 @@ static zend_object *zend_default_exception_new_ex(zend_class_entry *class_type,
zend_class_entry *base_ce;
zend_string *filename;

Z_OBJ(obj) = object = zend_objects_new(class_type);
ZVAL_OBJ(&obj, object = zend_objects_new(class_type));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new operation assumes two assignments instead of one. The assignment Z_TYPE_INFO(obj) = IS_OBJECT_EX is going to be useless. There are a lot of similar issues in this patch. Each of them will increase code size and reduce performance.

@weltling
Copy link
Contributor

Ok, actually i fear this one to target 7.0. Comparing to #1353 it's a quite huge change. @dstogov @laruence @nikic what's your sight? IMHO it's too short cut.

Thanks.

@laruence
Copy link
Member

I think it's too late for 7.0, as a many extension maintainer, I already worked a lots on my extensions to adapt APIs changing, I really don't like change them again, especially in alpha phase.. and we already short of porting documents to explain the current APIs changes.

@laruence laruence added the RFC label Jun 23, 2015
@flaupretre
Copy link
Contributor Author

I understand that it is too late and I know I went too far for a first step, but your comments helped me a lot. I am currently restructuring the patch to be more permissive, focusing on the creation of new mechanisms to check API conformance, and the definition of 'clean' APIs where they are missing (IMO, a 'clean' API is a set of separate getters/setters), starting with zend_string.

Add the '--enable-api-checks' options to travis configure command (if supported, depends on the branch)
New getters/setters for 'val' and 'len'.
Strict mode enforces getter/setter separation
API restrictions are all strict-only.
@weltling
Copy link
Contributor

@flaupretre thanks for your work! As for me, i don't think i'm the one who can approve/decline your patch in general. But i've spent a couple of hours today analyzing it and also reading the previous thread. Thus i would like to express my insights.

  1. The idea. It is feasible from the maintainability POV. However - it is an obvious debug headache. And not a small one. For example, imagine you get a backtrace or dump from some user where PHP is built with type protection. Many other situations can arise. And, it is a huge intrusion into the code base, no one will recognize PHP anymore when it's complete :)
  2. 7.0 targeting. It's a no go. Firstly - because of above, such a big change needs to be approved by the community. Secondly - the implementation is incomplete. That means, merging it now will turn the current master in quite a moving target. For instance, places accessing zend_string's ->val are countless, just to call one

IMHO, the patch can be done so it doesn't break existing API. All the protection is turned on only by an explicit configure option. This means, you could target 7.1. But first, bringing it to the usual way of discussion and RFC, as the basic patch illustrating the idea is already there. It will be for sure a hard job, I know this myself, however right now we have no choice.

Thanks.

@weltling
Copy link
Contributor

I also think #1353 were probably a more realistic target for 7.0 still (but not for long), if you could address the issues pointed out by @dstogov and @nikic there.

Thanks.

@flaupretre
Copy link
Contributor Author

@weltling

  1. Nobody's supposed to produce code to debug with 'api-checks' turned on. If needed, I'll state it again in the documentation. This option is here for compile-time checks only. Yes, that(s an intrusion in the code but we generally need to protect 1st-level elements only. Protecting zval, for example, requires modifying 3 fields only.
  2. I'd like @dstogov to integrate the non-strict zend_string API as part of phpng, targeting 7.0. I am OK to write an RFC for the two configure flags but, if we don't provide an API to access the val and len fields, the number of places already access them directly can only grow. Once we have the API, we can start 'cleaning' the code, even during 7.0. We can also hope that people use it and clean their extensions. If we don't have it, we just can look things going worse and worse.
  3. About zend_string changes and additions #1353, I am waiting for @dstogov return about zend_string_truncate() returning a non-null-terminated string. I really don't understand why the whole API would take care of setting this byte except a single function. Actually, I think I'll remove the fix because it's unrelated to the rest of the PR.
  4. @nikic's opinion about zend_string changes and additions #1353 is that zend_strings are 'finished' strings which have generally no reason to be modified. So, zend_string append and resize functions would be useless. I cannot address such issue as it makes the whole PR pointless. I wrote a reply to explain why I disagreed but I cannot do much more, nor make the decision.

@weltling
Copy link
Contributor

@flaupretre I see your points, I hope. For the reference - I was in a similar situation targeting 5.6 with the 64-bit patch.

But, when an incomplete patch comes in, it means - tweaking. One can argue, but so is it. Fe I don't see any handling of HashTable, or other possible places, don't even count the zend_string ocurrences. Tweaking means mistakes. One can argue again, but it's a fact. The timeline for 7.0 is really tight. so there's the choice.

With #1353, if there's no agreement there, I'd see it as a blocker. Especially ATM as mentioned, we should really start the feature freeze soon. You can still go for an RFC to solve this. Both PRs can be done in a way they don't break API (additions anyway), so 7.1 could fit. But it's a hard work, really.

Thanks.

@flaupretre
Copy link
Contributor Author

@weltling About the question of incomplete patch, I'm afraid you didn't exactly understand my intention with the 'strict-api' mode. This mode is here to allow simulating the way APIs will evolve in the future, helping everyone to fix ('cleanup') code in advance. So, it is not my intention to provide fixes for every locations where zend_strings are accessed in a 'non-strict' way. It does not mean I won't fix some, but this is a detection tool to be used by everybody willing to help make the code cleaner. So, defining a 'strict' API without making the rest of the code compliant is the normal behavior. Once again, it just provides a tool for anybody to work towards 'strict' compliance in the rest of the core.

Actually, I'm not sure I will persist in this way because it seems people are not familiar with such two-step methods (provide a detection tool so that everybody can then fix code).

As I mentioned earlier, the only thing I would have liked to see in 7.0 is an improved zend_string API, providing essentially an alternate way to access zend_string->val/->len. I propose it as #1367. Hope it can please everyone, especially @dstogov.

About moving #1353 to a RFC, I'll think about it. Publishing an RFC is a tough process and I'm not sure I have enough energy at the moment to argue endlessly with rude people.

#1353 is not important. The 'api-checks' mechanism is a much more important tool for the future. Unfortunately, if it requires an RFC, it will never exist because most list members are totally unable to understand how urgent it is to fight against tight coupling, especially after PHP7 additions. So, I had a little hope to push it using the phpng umbrella. But I went too far with the 'strict-api' mode. I should have kept it hidden for the next time 😄. But there are so few occasions to push something useful to PHP...

Anyway, thanks for your time and your comments. I really appreciate. Nice to see that some people like you and Dmitry still have an open mind. That's not so common.

@weltling
Copy link
Contributor

Hi @flaupretre,

I guess my understanding before matches with your current explanation. Where our insights are different is that you see the implementation as finished, but I do not. Look at it from my POV - i've spent like 3 hours already on your patch. Today i checked it out and compiled using --enable-strict-api with configure. Here you are

/home/anatol/dws/src/flaupretre/ext/spl/spl_directory.c: In function ‘zim_spl_SplFileObject_fread’: /home/anatol/dws/src/flaupretre/ext/spl/spl_directory.c:2920:27: error: lvalue required as left operand of assignment Z_STRLEN_P(return_value) = php_stream_read(intern->u.file.stream, Z_STRVAL_P(return_value), length);

We could dispute this, sure. Maybe a better situation were with phpng or pure master, as you mention. As in that case, a moving target would justify it. But again, for the code base which is about to enter beta, it is IMHO a move to instability. While I part the vision in general, pushing an incomplete patch now will break a release process. ATM, stability comes befor maintainability or anything else, so far about priorities.

About the RFC on this- yes, that's why i said it's a hard work. You have to convince people, you have to catch up with the mainstream, there are many tasks that don't have a direct relation to the actual development. It is up to you to take the stress, but it is a good thing to manage. The idea is feasible anyway. But I can read your mood, anyway. But please, don't give up!

Thanks.

@flaupretre
Copy link
Contributor Author

Hi @weltling and thanks for the time you took to think about it. You're right, distributing the 'strict-api' with the core not being strict-compliant is wrong. The problem I saw is these countless access to zend_string->len/->val. Actually, I think we should first introduce the 'strict' switch without anything being marked as 'strict' in the core. Then, I now agree we cannot release a core distrib that is not 'strict-compliant'. We can create a temporary branch for this, even ask pople to contribute on this if it requires too much work for one, but not on a release.

About RFCs, I've been searching for ways to improve the overall process for years, as many of us, I imagine. The good news, IMO, is that the recent phpng case shows that the informal rule saying you must have a PR ready to propose something, is not so strict. Of course, phpng people also got a whitecard thanks to their names and Zend's, not only on their proposal's content. But that's a usable precedent.

I still need to refine the idea but I'd like to propose a new form of RFC for heavy, long-term changes such as phpng. For such changes, you cannot write a separate RFC each time you create a new macro or even restructure an API. The idea would be to define a scope, or 'whitecard limits'. phpng was a full whitecard but I prefer defining clear limits before voting. Changing the major version was an artificial way to allow for almost any change at C level, but we won't set a new major version everytime !

So, I would take the subject of encapsulation and C APIs as a first example and propose a strategy, instead of an implementation. This strategy would contain the following conditions and limits :

  • It would be limited in time. 6 months, 1 year, I don't know, but this must end someday. Renewing it would require a new vote. Maybe, we can fix a hard limit of one year to such RFCs.
  • It should include an extensive description of the authorized BC break that could be introduced, at the PHP and C level, whether these BC breaks would require a separate vote (mostly for PHP BC breaks, if any). Any BC break that would go beyong the initial scope should require a new vote.
  • It would include an extensive description of the expected delivery. For example, something like '3 months: a mechanism to detect unauthorized direct access to explicitely-'protected' structure elements, 6 months : 1st-level protection of zval, zend_string, zend_refcounted,...', in more detailed form, of course as we would need to define what is '1st-level protection', and other terms.
  • Most important: It would include a restricted list of people ready to supervise the changes realized under this RFC's umbrella. We may need rules in case of conflict but the idea is that changes should be approved by all, or at least not opposed, by any of these people. Anybody else could discuss, or even, oppose some changes but these people would be the favorite interface for this. A vote can always take place to restrict or cancel the RFC but we should demand a clear majority (2/3 or, even, 3/4). IMHO, this point was missing in the phpng process and I think that working without supervision and mixing different point of views is not optimal.
  • A list of implemented changes or/and a status of the study should be published at pre-defined intervals. The proposal can even include points in time where past and future changes will be presented at a PHP conference, provided it is accepted, which is quite hard when your name is not well-known.

So, you see the idea. In political terms, it is an evolution from overly-participative democracy to limited, supervised delegation. I think it would be much more efficient for several subjects requiring long-running, extensive studies and thinking. Another subject that would fit is the overall question of types in PHP. This is a huge subject including making type conversions consistent, probably extending object to scalar conversions, extending STH to union types and, probably, user-defined type, and a lot of other aspects. Such a work is long, complex, and typically incompatible with the current RFC process (as the recent STH discussion proves).

So, if you have thoughts on such a process or ideas to improve it, I'm very interested. @dstogov, I'd love getting your comments too. This would take the form of an RFC presenting a new RFC type. I don't know how it would be received, people can fear loosing control when authorizing such a long work. Anyway, we'll need to find a way to address large and complex subjects by a different way from mailing list flame wars.

@weltling
Copy link
Contributor

Hi Francois,

the change to the RFC process is probably something that needs to be discussed much wider than just an irrelevant github PR page :) Though I think this topic can be solved without strong changes to the RFC RFC :)

The actual topic - still a wide discussion would do only good. Firstly, you would see how far the community support is. Secondly, you'd get potential issues pointed out. I would have also several questions to one or other implementation aspect. There's also an implementation of the idea, so the overall information would give you the crucial info about whether the further efforts are worth while.

Thanks.

@flaupretre
Copy link
Contributor Author

Follow-up : #1508

@flaupretre flaupretre closed this Jan 2, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants