-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Zstrict - Enforcing encapsulation/isolation #1348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -184,7 +184,7 @@ static zend_object *zend_default_exception_new_ex(zend_class_entry *class_type, | |||
zend_class_entry *base_ce; | |||
zend_string *filename; | |||
|
|||
Z_OBJ(obj) = object = zend_objects_new(class_type); | |||
ZVAL_OBJ(&obj, object = zend_objects_new(class_type)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new operation assumes two assignments instead of one. The assignment Z_TYPE_INFO(obj) = IS_OBJECT_EX is going to be useless. There are a lot of similar issues in this patch. Each of them will increase code size and reduce performance.
I think it's too late for 7.0, as a many extension maintainer, I already worked a lots on my extensions to adapt APIs changing, I really don't like change them again, especially in alpha phase.. and we already short of porting documents to explain the current APIs changes. |
I understand that it is too late and I know I went too far for a first step, but your comments helped me a lot. I am currently restructuring the patch to be more permissive, focusing on the creation of new mechanisms to check API conformance, and the definition of 'clean' APIs where they are missing (IMO, a 'clean' API is a set of separate getters/setters), starting with zend_string. |
Add the '--enable-api-checks' options to travis configure command (if supported, depends on the branch)
New getters/setters for 'val' and 'len'. Strict mode enforces getter/setter separation API restrictions are all strict-only.
@flaupretre thanks for your work! As for me, i don't think i'm the one who can approve/decline your patch in general. But i've spent a couple of hours today analyzing it and also reading the previous thread. Thus i would like to express my insights.
IMHO, the patch can be done so it doesn't break existing API. All the protection is turned on only by an explicit configure option. This means, you could target 7.1. But first, bringing it to the usual way of discussion and RFC, as the basic patch illustrating the idea is already there. It will be for sure a hard job, I know this myself, however right now we have no choice. Thanks. |
|
@flaupretre I see your points, I hope. For the reference - I was in a similar situation targeting 5.6 with the 64-bit patch. But, when an incomplete patch comes in, it means - tweaking. One can argue, but so is it. Fe I don't see any handling of HashTable, or other possible places, don't even count the zend_string ocurrences. Tweaking means mistakes. One can argue again, but it's a fact. The timeline for 7.0 is really tight. so there's the choice. With #1353, if there's no agreement there, I'd see it as a blocker. Especially ATM as mentioned, we should really start the feature freeze soon. You can still go for an RFC to solve this. Both PRs can be done in a way they don't break API (additions anyway), so 7.1 could fit. But it's a hard work, really. Thanks. |
@weltling About the question of incomplete patch, I'm afraid you didn't exactly understand my intention with the 'strict-api' mode. This mode is here to allow simulating the way APIs will evolve in the future, helping everyone to fix ('cleanup') code in advance. So, it is not my intention to provide fixes for every locations where zend_strings are accessed in a 'non-strict' way. It does not mean I won't fix some, but this is a detection tool to be used by everybody willing to help make the code cleaner. So, defining a 'strict' API without making the rest of the code compliant is the normal behavior. Once again, it just provides a tool for anybody to work towards 'strict' compliance in the rest of the core. Actually, I'm not sure I will persist in this way because it seems people are not familiar with such two-step methods (provide a detection tool so that everybody can then fix code). As I mentioned earlier, the only thing I would have liked to see in 7.0 is an improved zend_string API, providing essentially an alternate way to access zend_string->val/->len. I propose it as #1367. Hope it can please everyone, especially @dstogov. About moving #1353 to a RFC, I'll think about it. Publishing an RFC is a tough process and I'm not sure I have enough energy at the moment to argue endlessly with rude people. #1353 is not important. The 'api-checks' mechanism is a much more important tool for the future. Unfortunately, if it requires an RFC, it will never exist because most list members are totally unable to understand how urgent it is to fight against tight coupling, especially after PHP7 additions. So, I had a little hope to push it using the phpng umbrella. But I went too far with the 'strict-api' mode. I should have kept it hidden for the next time 😄. But there are so few occasions to push something useful to PHP... Anyway, thanks for your time and your comments. I really appreciate. Nice to see that some people like you and Dmitry still have an open mind. That's not so common. |
Hi @flaupretre, I guess my understanding before matches with your current explanation. Where our insights are different is that you see the implementation as finished, but I do not. Look at it from my POV - i've spent like 3 hours already on your patch. Today i checked it out and compiled using --enable-strict-api with configure. Here you are
We could dispute this, sure. Maybe a better situation were with phpng or pure master, as you mention. As in that case, a moving target would justify it. But again, for the code base which is about to enter beta, it is IMHO a move to instability. While I part the vision in general, pushing an incomplete patch now will break a release process. ATM, stability comes befor maintainability or anything else, so far about priorities. About the RFC on this- yes, that's why i said it's a hard work. You have to convince people, you have to catch up with the mainstream, there are many tasks that don't have a direct relation to the actual development. It is up to you to take the stress, but it is a good thing to manage. The idea is feasible anyway. But I can read your mood, anyway. But please, don't give up! Thanks. |
Hi @weltling and thanks for the time you took to think about it. You're right, distributing the 'strict-api' with the core not being strict-compliant is wrong. The problem I saw is these countless access to zend_string->len/->val. Actually, I think we should first introduce the 'strict' switch without anything being marked as 'strict' in the core. Then, I now agree we cannot release a core distrib that is not 'strict-compliant'. We can create a temporary branch for this, even ask pople to contribute on this if it requires too much work for one, but not on a release. About RFCs, I've been searching for ways to improve the overall process for years, as many of us, I imagine. The good news, IMO, is that the recent phpng case shows that the informal rule saying you must have a PR ready to propose something, is not so strict. Of course, phpng people also got a whitecard thanks to their names and Zend's, not only on their proposal's content. But that's a usable precedent. I still need to refine the idea but I'd like to propose a new form of RFC for heavy, long-term changes such as phpng. For such changes, you cannot write a separate RFC each time you create a new macro or even restructure an API. The idea would be to define a scope, or 'whitecard limits'. phpng was a full whitecard but I prefer defining clear limits before voting. Changing the major version was an artificial way to allow for almost any change at C level, but we won't set a new major version everytime ! So, I would take the subject of encapsulation and C APIs as a first example and propose a strategy, instead of an implementation. This strategy would contain the following conditions and limits :
So, you see the idea. In political terms, it is an evolution from overly-participative democracy to limited, supervised delegation. I think it would be much more efficient for several subjects requiring long-running, extensive studies and thinking. Another subject that would fit is the overall question of types in PHP. This is a huge subject including making type conversions consistent, probably extending object to scalar conversions, extending STH to union types and, probably, user-defined type, and a lot of other aspects. Such a work is long, complex, and typically incompatible with the current RFC process (as the recent STH discussion proves). So, if you have thoughts on such a process or ideas to improve it, I'm very interested. @dstogov, I'd love getting your comments too. This would take the form of an RFC presenting a new RFC type. I don't know how it would be received, people can fear loosing control when authorizing such a long work. Anyway, we'll need to find a way to address large and complex subjects by a different way from mailing list flame wars. |
Hi Francois, the change to the RFC process is probably something that needs to be discussed much wider than just an irrelevant github PR page :) Though I think this topic can be solved without strong changes to the RFC RFC :) The actual topic - still a wide discussion would do only good. Firstly, you would see how far the community support is. Secondly, you'd get potential issues pointed out. I would have also several questions to one or other implementation aspect. There's also an implementation of the idea, so the overall information would give you the crucial info about whether the further efforts are worth while. Thanks. |
Follow-up : #1508 |
Introduction - Why ?
Two months ago, Anthony Ferrara expressed concerns about PHP 7 and zend_hash growing complexity, combined with a lack of strict encapsulation (tight coupling).
Some replied that the solution is just a mix of comments and documentation, but growing complexity quickly becomes unmanageable if any code can bypass official/published APIs. zend_hash is a critical examples, but not the only one.
While part of the solution lies in documentation, 'quick-and-dirty' is not just a question of knowledge. We also need tools to detect and fix API violations.
#1. Protecting structure elements from direct access
The first tool focuses on the question of direct access to structure elements. Even if the API provides macros and functions to access structure elements, it is still possible to access them directly, bypassing the API. This tool provides a way to 'protect' the elements you want, so that directly accessing them ouside of the official API causes a compile failure.
From a user's point it view, it takes the form of an additional configure option. When PHP is configured with '--enable-api-checks', every direct access to protected structure elements causes a compilation error.
How does it work ? When the 'api-checks' mode is active, the name of protected structure elements is modified and the official API is modified accordingly. Then, every attempt to access the element using its 'usual' name generates a compile error.
This is a compile-time tool. As it generates long and complex names for protected elements, the generated code shouldn't be used for debugging. So, among others, this option must never be set when generating binary releases.
Protecting structure elements is simple : everywhere the protected element is to be used, it is replaced with '_ZEND_PROTECTED(<prefix>,<name>)', where <prefix> is the name of the structure containing the element, and <name> is the element name.
In general, only the first-level elements need to be protected, as they control access to the sub-elements.
#2. Separating getters and setters
Encapsulation is composed of different levels. Providing an abstraction layer, like a macro, is the first level. The second level requires a strict separation of getter and setter methods. This level cannot be achieved with a macro because a macro can be used as an lvalue in an assignment, and can be prefixed with '&' (giving the address of the struct element). So, using a macro, it is impossible to control get and set operations. And controlling get/set operations is the key to loose coupling.
PHP APIs generally provide setter methods, but they also often use macros as getters. In order to allow for a gradual improvement, another configure option, '--enable-strict-api' activates some API restrictions. These restrictions are typically changing some getters from macro to function, making them de-facto read-only. Other restrictions may also be activated by this flag. The idea is that this 'strict' mode will become the default mode in a future PHP version. So, extension developers can use it to check in advance that their code complies with an API stricter than the current one.
For an example of how this mode is used, please look at the declaration of ZSTR_VAL/ZSTR_LEN/ZSTR_HASH in zend_string.h. As you may see, in strict mode, these macros are aliased to functions, making them pure getters. An important side effect is that Z_STRLEN() becomes a pure getter too. So, the code using assignments to a Z_STRLEN() lvalue must be modified to use the new ZVAL_STR_LEN() macro. That's just an example but it illustrates how a developer can make his code compliant with a future version of the API.
Of course, the core also needs to be strict-compliant. It will take time but the most important is that we now have the tools to work on it. That's why I would like these tools included in the first 7.0 release, so that we can start working with them and prepare the next minor versions.
What's in this branch ?
I am stopping now because doing more would be useless and confusing as long as the mechanisms are not officially approved and merged. The next steps, when it is done, would be probably to provide a complete encapsulated API for most zval elements. We'll also work on HashTables. HashTables are critical as there are a lot of places in the core which access directly the HashTable elements, and this must absolutely be cleaned before it becomes unmanageable.