Regex (or wildcards) for @properties and @methods #689

Open
pocesar opened this Issue Nov 27, 2012 · 37 comments

Projects

None yet

9 participants

@pocesar
pocesar commented Nov 27, 2012

Sometimes it's really useful to have a way to place wildcards in properties, because since __get and __call are "magic", it's common to use modifiers to existing properties, like _as_something like _as_json, _as_array, _as_int. field_as_array for example will return an object as an associative array, so in the Docblock, it would be written as

/**
 * @property ORM $field_as_(array|json|int)
 */

or

/**
 * @property ORM $*_as_(array|json|int)
 */

that works for multiple fields

@boenrobot
Contributor

IMHO, these sorts of "wildcard" magic methods and properties are best documented at __get and __call's long descriptions.

If you actually have a specific set of allowed methods/properties, you can already simply add multiple @property and @method tags for each one. Sure, having them on one line saves typing, but this means you must attach the same description to all of them. If that's the case, they are essentially a kind of "wildcard", hence best at the magic method's long description.

Besides, a wildcard would be useless for IDEs, since they can't offer a list of hints for those. At best, they could evaluate the regex/wildcard for a method/property after you type it to give you a warning about it being undefined/undocumented.

@pocesar
pocesar commented Nov 27, 2012

indeed, I was thinking mainly in IDEs, because most of them, when using modifiers, they show as an error. Dynamic classes, like ORM, that get the properties from the database directly can't guess the name of the properties for obvious reasons, so I thought using a wildcard for the field would suffice. For example, an ORM class, that has 8 columns in the database, and will have a property for each of them, times the number of modifiers, that would be _as_date, _as_datetime, as_int, as_json, _as_array, _as_object isn't a fun thing to do, so I guess it would be nice to be able to get a shortcut.

Since the phpDoc is the starting point for IDEs to follow, I thought implementing it in the official supported syntax would be easier to implement in IDEs.

Where would the long desc of the magic method be? I just know the shortdesc

@boenrobot
Contributor

Since the phpDoc is the starting point for IDEs to follow, I thought implementing it in the official supported syntax would be easier to implement in IDEs.

That's true, but when a feature is ONLY useful to IDEs, we better make it count (as in "define it very carefully, very exhaustively, and make sure we don't change it overnight"), or IDE vendors will simply not implement it, thus making this feature redundant, or they'll implement it differently, thus few people use the feature, thus this feature becomes redundant.

There's also another problem with this, in addition to the aforementioned one - Which regex dialect to use?

  • We could explicitly say in the docs "PCRE", but that means Java based editors like NetBeans will have a hard time implementing this, since they use Java's native regex-es, not PCRE.
  • We could say this implementation defined, leaving "PCRE" as merely recommended. This will create a lot of problems when using exotic regex features, available in only one or another dialect.

Also, the very scenario you outline above is one where only a specific subset from the regex is allowed, yet this can't be known from the regex alone. Thus, the kind of "help" this is supposed to give to IDEs is only reduced to the larger subset. Combined with the aforementioned ability to specify multiple @property and @method tags, the use case for this becomes way too narrow.

I can see the kind of scenario you're talking about, but at least in my opinion, that's one scenario where one has to live with the dynamic nature of PHP.

Where would the long desc of the magic method be? I just know the shortdesc

I mean at the magic method itself, e.g.

class ORM {
/**
* Fetches a field.
*
* This is a magic method that lets you fetch a field as if it's a property. The property name takes the form
* of the field name, followed by "_as_", followed by the type you want to take this field as.
* Supported types are "datetime", "int", "json", "array" and "object".
*
* @param string $property
* @return mixed Whatever you've specified you want as the property name.
*/
public function __get($property) {
//...
}
}
@pocesar
pocesar commented Nov 27, 2012

I see, very well thought out, now it really seem redundant. I didn't know I could create a doc above the __get and still work :)
thanks for your time

@mvriel
Member
mvriel commented Dec 8, 2012

I think there is not much to add, closing the issue.
Let me know if I am wrong, then I'll reopen the issue

@mvriel mvriel closed this Dec 8, 2012
@lividgreen

example of useful wildcards:

/**
 * @method Person findOneBy*()
 */
class PersonTable {
  //...
}

$person = PersonTable::getInstance()->findOneByNameAndPosition('Jon', 'developer');
@boenrobot
Contributor

My critisisms earlier apply to this example as well.

How do you address those?

@mvriel
Member
mvriel commented Feb 25, 2013

I will reopen the issue for further inspection; I cannot re-read the discussion at this time and will try to give a proper response at another date

@mvriel mvriel reopened this Feb 25, 2013
@DavertMik

+1 for wildcard methods

@robforrest

+1 from me too. I think this would be an incredibly useful feature.

What's the latest on this?

@bkuhl
Contributor
bkuhl commented Mar 26, 2013

+1

@boenrobot
Contributor

Ughhh... fine...

I guess a wildcard (as in the "*" and "?" meta characters, with their Windows wildcard semantics) wouldn't be difficult to implement consistently across IDEs.

This means the use case outlined by @pocesar would be invalid, and only the kind of example @lividgreen showed would be valid. Is everyone who "+1"ed this feature OK with such a limitation?

Also, as stated previously, IDEs can't offer a list of hints for those (since the set of possibilities is very arbitrary). At best, they could evaluate the wildcard for a method/property after you type it to give you a warning about it being undocumented... doesn't that strike anyone as severely limiting the usefulness of such a feature?

@bkuhl
Contributor
bkuhl commented Mar 26, 2013

boen brings up a good point... I revoke my +1. It's also rare that you would literally want a * and would most likely want a set of fields instead. Being a set and not a * would mean we should still document each item individually to ensure accuracy.

@lividgreen

I think it is not a big deal for IDE developers to implement proper resolution of "" wildcard. But it must be in standard first.
I propose to implement "
" only for the end of method/property name. That is the only realistic case.

Another variant: use different tag (something like @method* or @method-template) for that and wait until IDEs are going to implement the support

"Wildcard at the end" case is not so rare, for example it is widely used in Doctrine ORM.

@bkuhl
Contributor
bkuhl commented Mar 26, 2013

I'm unfamiliar with Doctrine ORM, can you reference an example in the documentation where such a scenario would be useful?

If it's referencing a set of methods and it's just easier to use * than define the whole set - I would suggest the team hold off on this since there is an existing work around (document each instance of the method manually) and focus on functionality that isn't there yet.

@lividgreen

http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/working-with-objects.html#by-simple-conditions
look at the last example in this section:

// A single user by its nickname (__call magic)
$user = $em->getRepository('MyProject\Domain\User')->findOneByNickname('romanb');
@boenrobot
Contributor

@lividgreen Regardless, how do you expect IDEs to behave when a class has such a "template" defined? Offer to complete the method/property name up to the first "*"?

@bkuhl
Contributor
bkuhl commented Mar 26, 2013

@lividgreen This is an example where I don't believe * is the solution. Technically, findByNickname may exist, but findByFavoriteColor doesn't. So using findBy* is less accurate than having a @method for each method that's available.

I understand the desire to have such functionality, I'm just not sure we've found the best solution yet.

@pocesar
pocesar commented Mar 26, 2013

* would "signal" to the IDE to "parse" the switch/ifelse comparisions inside __call or __get for example, that's the only thing I can think of. but using a Regex group would be far more interesting:

/**
 * @property string $(field|yield|row)_as_(json|array|date|datetime)
 */
@boenrobot
Contributor

To add to @bkuhl's point about accuracy... does Doctrine generate its ORM methods, or are they determined at runtime by a common class?

If they're supposed to be generated, IMHO, it should be up to the ORM generator to also generate appropriate @method tags (which eliminates the need for any wildcards). If they're determined at runtime... I guess one has to bite the bullet, and either accept that "*" may falsely match an invalid method as valid OR bite the bullet, and not use such wildcards to begin with, and assume everything is potentially invalid.

@pocesar We've already talked above why Regex is a bad idea. And parsing the insides of a function is always extremely tricky, and in this case, it may be futile - If you have a switch/if-else inside a __call/__get method that happens to list all possible method/property names, you may as well be using separate methods/properties. Other scenarios on the other hand vary too much for any program to reliably determine the method/property names.

@jasonlotito

On the flip side, in most cases I've seen, you can determine from the code what the @methods should be. Even in Doctrine's case, the findOneByNickname isn't magical. nickname is a defined field somewhere.

Point is, it leads me to believe that considering all the ways you can create magic methods, it's better to write up a tool that will automatically add the @method documentation to your classes when there are two many. Either that, or write them up manually.

After all, IDEs already have the capabilities of not warning you that you are calling an undefined method of magic methods exist. Doctrine already knows the fields it can work with, and can generate the @method comments for you.

In any case where * would be useful, I can't help but think that's an inappropriate use for methods in the first place.

@method select*()
selectStarFromUsersWhereUsernameEqualsJason()

Where does the madness end?

Oh, and to add: Documentation should be easy. Easy to read, easy to write, easy to consume. The last thing I want is a syntax error in my documentation because I screwed up some awkward new standard or failed at my regex.

@mvriel
Member
mvriel commented Mar 28, 2013

I have read the discussion and I am not at all convinced that wildcards should be allowed in an @method or @property call for the following reasons:

  1. What is the added value or goal? The only thing I can think of is to communicate that you may use findBySomething for example but how should an IDE or documentation tool work this?

  2. Even if findBy* is a goal; it is only marginally more accurate than omitting the @method at all; the only thing you say is that methods starting with findBy MAY be legal but there is no way to say for sure. This does not being accurate documentation; which is our goal.

    parsing code is not an option; there is no way for phpDocumentor or an IDE to tell how the __call works.

I am afraid that I consider this a 'fix the process, not the tool' type of situation where it would be beneficial for everyone if specific @method calls are introduced by the generating code or library builder.

I am more than happy to hear other sides of the story, at the moment I am not persuaded (yet?).

@robforrest

Having read all of your comments, I'm tending to agree that a wildcard strategy is pretty pointless. Can someone remind me why a regex solution was ruled out? It would have the benefit or providing using the IDE with right structure for the findBySomethingAndSomethingElse calls and be generic enough to solve the wildcard requests. Is it simply a matter of performance?

@mvriel
Member
mvriel commented Mar 28, 2013

My opinion is that, while performance is certainly an issue, regular expressions make reading the source documentation complex and adds a complexity layer to maintaining it. If that had a clear benefit then it might be worth it but thus far I have not been able to think of, or read, of a use case where it simplifies usage and still provide strict and usable documentation.

@boenrobot
Contributor

... and in addition to that, there's no single regex language. There are multiple dialects, which differ on some more exotic features. Having that said

Which regex dialect to use?

  • We could explicitly say in the docs "PCRE", but that means Java based editors like NetBeans will have a hard time implementing this, since they use Java's native regex-es, not PCRE.
  • We could say this implementation defined, leaving "PCRE" as merely recommended. This will create a lot of problems when using exotic regex features, available in only one or another dialect.

Wildcards don't pose the same problem (they're trivial to implement as a wrapper around any regex library, if they're not natively available), but the "principle" issues like readability and usefulness still apply.

@lividgreen

You have to realize, that @property before class don't mean that this property is correctly recognized by __get() magic method. But this notation helps when we use some API.
Well, if we have generated class with strict number of properties, we don't need wildcards. But when we have properties where its name contains parameter to __get(), which can't be resolved before runtime, wildcards are useful.
Example: we have a list of predefined data structures: MongoDate, DateTime, MyDateImpl.... It can be extended.
So, we want our class return any property (data) using convertors, which are just classes like MongoDateConvertor:
$a->dateAsMongoDate // return MongoDate because it ends with AsMongoDate
$a->dateAsMyDate // returns MyDateImpl
...
We cant define @property for each property/convertor, because we don't know total amount of convertors.
We can use different words for compatibility
@property-regexp
@property-template
etc.

@boenrobot
Contributor

But that's just it - in principle, if you have an API design that demands the use of such defined-only-at-runtime properties, your API design is bad. A better one is to have a generic getter with arguments (that are documented properly, of course), as that's what you're actually doing anyway, or better yet, have an __invoke, e.g.

$a('date', 'MongoDate');

A common argument against the above is the fact that IDEs can't offer hints for the argument values, which is a fair point - that's why fixed set of __get/__set properties (with according @property docs) is still a preferable option for fixed sets of properties... but wildcards don't solve that problem for a runtime defined set.

Just because you can do something in PHP doesn't mean you should and be further encouraged by pointless additions like wildcards that miss the problem entirely.

EDIT: I should point out that at this point, I'm not "against" adding wildcards - but I'm not "pro" either, as the added value appears next to none, while causing the aforementioned readability and accuracy issues.

@pavoleichler

+1 for me. I agree that regexes are an overkill though and would rather stick with simple wildcards.

For me, a DB model / ORM class seems as a good example where they are useful.

Having the following magic methods:

getBy[ColumnName]($value)
getBy[ColumnName]Like($value)

We could phpdoc them:

@method-template \DatabaseRow getBy*(mixed $value) Fetches a row by the given column value.
@method-template \DatabaseRow getBy*Like(string $value) Fetches all rows satisfying the like condition.

Or even better, we could name the wildcards:

@method-template \DatabaseRow getBy[Column](mixed $value) Fetches a row by the given column value.
@method-template \DatabaseRow getBy[Column]Like(string $value) Fetches all rows satisfying the like condition.

Now the IDE could benefit of this in the following ways:

  • Provide code completion for these methods When I write 'getB', the IDE would propose me with getBy[Column] and getBy[Column]Like methods. When I choose one, it completes the whole method name and places my cursor at [Column]. NetBeans already does exactly this with code templates and several built-in code completions. When you write 'fun' and expands it to a function definition, NetBeans expands it to: function functionName($param){} Now 'functionName' is outlined with a red box and when you start typing, it is simply replaced with whatever you type. When you hit enter, it jumps to $param and highlights it with the red box in the same way. If the method name contains multiple wildcards (e.g. getBy[Table][Name]), the IDE should cycle through them in the very same way.
  • Type hinting. The IDE now acknowledges, that getByName() returns a \DatabaseRow object and could provide me with correct code completion when I type 'getByName('Mary')->'.

This even seems useful for me to be included in the human readable documentation as well.

@pavoleichler

Oh, and thinking about the concept, I eventually tried it for real. Finding out wildcards already work in NetBeans, at least Netbeans 7.3. :) The syntax is as follows:

@method getBy${Column}($value)

Code completion behaves exactly the same way, I have described. However type hinting for the returned values does not.

@boenrobot
Contributor

Whoa.

Well... that certainly changes the whole discussion. I was actually thinking yesterday about the moving cursor part you were talking about above (and started leaning "pro" due to that), but it already being implemented completely blows my mind.

On a closer inspection, this appears to be a bug rather than a feature at this point.

Disregarding that for the moment... I'm not sure what syntax would be better, considering we must also cater to properties though. I mean

@property MyClass $${table}index

vs

@property MyClass $*index

On the one hand, the first syntax is more verbose (and => descriptive), but with all the "$", it's becoming confusing. Then again, a "*" at the beginning doesn't make it much more readable. Perhaps if we simply remove the "$", i.e.

@property MyClass ${table}index
@method getBy{Column}($value)

?

Unfortunately, this doesn't currently work in NetBeans, but like I said - the current implementation is a bug anyway.

Also another thing to note: The type (of the return/property) should probably NOT be allowed to contain such abstractions, for the sake of providing the second benefit mentioned above.

@lividgreen

FYI, PhpSstorm proposed implementation of "Factory Method" in latest EAP build.
http://blog.jetbrains.com/webide/2013/04/phpstorm-6-0-1-eap-build-129-177/
This feature is very good and useful, but you have to place your descriptions in separate file in PhpStorm format.
That's why extending phpdoc is very valuable to make all IDEs work the same way. It's not hard to extend phpdoc to achieve "factory method" functionality right from annotations.

/**
  * @method-template Core myFactory({'core'}) Get Core
  * @method-template Model myFactory({'model'}, name) Get model by name 
  * @method-template Table_{Model} getTable({Model}) Get table by name
  * @method-template Table_{Model} get{Model}Table() Same as above but with magic method 
  */
@boenrobot
Contributor

Having a separate tag is probably a bad idea BTW, particularly due to properties - we'd have to also add @property-read-template and @property-write-template. That's too much IMHO. Some kind of a modifier that turns on these advanced features - maybe - but not separate tags.

Having that said, with regards to the syntaxes above:

@method Core myFactory({'core'}) Get Core

Looks confusing. It can easily be mistaken for a string literal. Why the apostrophes anyway?

@method Table_{Model} getTable({Model})

The value of the argument - and therefore, the return type - can't be determined statically, unless it's a scalar literal. This eliminates the whole point. That's not going in, for sure.

@method Table_{Model} get{Model}Table()

Interesting... that one could work... IF we make it explicit that it's invalid for the type of a method/property declaration to have a named wildcard that's not present in the name.

There's just one potential problem - there was another proposed feature discussed previously (#557) about enumerated values for properties, which then escalated into having ways to impose restrictions on types in general. This syntax would conflict with it. Care to propose a way to reconcile those two?

EDIT: Hmm... maybe if we use "<" and ">", but should we use them for this or for that?

@method Table_<Model> get<Model>Table()
@method int{0...2} getStream()

vs

@method Table_{Model} get{Model}Table()
@method int<0...2> getStream()
@lividgreen
@method Core myFactory({'core'})
class AClass {...}

that means Factory Method support implementation (see http://youtrack.jetbrains.com/issue/WI-6027)
That means:

AClass::myFactory('core') // returns an instance of Core

{Model} means identifier to be used in type resolution
{'core'} means exact string value

@boenrobot
Contributor

Why would you do that to begin with?

If there's a specific list of possible return types from the factory method, you can do that in a single @method declaration, separating the types with a "|". e.g.

@method Core|CoreExt|CoreAltExt myFactory($what) Creates an instance of an object, the type of which is determined by the value of $what

If there's a runtime determined pattern, then specifying a literal defeats the point. Something like

@method Model_{Type} myFactory({Type})

Makes more sense in terms of specifying the pattern.

But this goes back to what I said above - an argument's value can't be determined statically, again defeating the whole point.

If the idea is to enumerate the list of possible values of the argument, and how each would relate to the return type, then IMHO, the value enumeration, combined with the fact @method can occur multiple times and inline doc blocks should suffice, e.g.

/**
* @method
*     {{
*         Creates a Core object.
*         @param string<'core'> $what
*     }}
*    Core myFactory($what)
* 
* @method
*     {{
*         Creates a CoreExt object.
*         @param string<'ext','CoreExt'> $what
*     }}
*    CoreExt myFactory($what)
*
* @method
*     {{
*         Creates a Model object from the supplied data.
*         @param mixed $data Stuff to fill the model with.
*     }}
*    {Type}Model get{Type}Model($data)
*/
class AClass

Verbose? Hell yeah, but edge cases call for edge syntax.

@pavoleichler

@method and @property(-read/-write) tags and one of get<Model>Table() or get{Model}Table() look perfect to me. Not sure which one to choose. Probably I would go with curly braces for wildcards as they seem to be close to what the IDEs tend to use naturally. Additionally angle brackets for enumerations <1...3> may be a little less confusing, as they make it clear the syntax differs from the regex one (while {1...3} is a bit simillar to {1,3} in regex).

Thinking of the whole subject, I came across the following two issues:

  • Having multiple wildcard method definitions, it may be impossible for the IDE to choose the correct one. Imagine a class with these two in phpdoc: @method ObjectA getBy{Column}($value) and @method ObjectB getBy{Table}{Column}($value). Now, the IDE will be left unsure if the method getByBlahBlah() will return an ObjectA or ObjectB. Not sure if this could be resolved, but probably is not too serious.
  • Imagine a case simillar to the one described in the phpdoc @method documentation. Let's have a Parent class and a Child class. The Parent has a magic __call method, assume it acepts get{Anything}. The Child class relies on this method, but has only two valid formats: getInteger and getString . Now if we phpdoc the child class with @method getInteger() and @method getString() and the parent class with @method get{Anything} we will end up with an IDE suggesting get{Anything} for the child class as well, even though it is redundant. Should there be a way to overide the parent definition in the child class and disable the wildcarded method?

As for the following syntax:

@method Table_{Model} get{Model}Table()

It assumes that the names of the returned object types match the names of methods, which may not always be the case. That makes it useful in a very special cases only, doesn't it?

Finally, for the Factory method support. I understand this is a common practice in various frameworks and may be useful. It may be worth a separate discussion, as this is not really related to wildcards and may have a completely different syntax.
Additionally, if it would be possible to define a specific return value for a method with a specific string argument, it may be interesting to thought it out to be compatible with any argument value type as well. Something like:

@method mixed myFactory(int|string) General method definition.
@method Core myFactory('core') Returns a Core object when passed a string 'core'
@method Three myFactory(3) Returns a Three object when passed a number three
@method Digit myFactory(<0...9>) Returns a Digit object when passed one of the numbers 0 - 9
@method ExtendedObject myFactory(Object) Returns an ExtendedObject when passed an Object

Anyway, this could be useful in the IDE only when the argument is not passed as a variable, as it was pointed out already. When a variable is passed in, the IDE would have to use the broadest definition available or ignore the type hinting completely.

@boenrobot
Contributor

Having multiple wildcard method definitions, it may be impossible for the IDE to choose the correct one.

They'd simply have the union of all possibilities, which I'm pretty sure they're already doing in cases you have the same method name occurring more than once, and have it return different types different times.

If they aren't, IMHO, we should probably clarify that they should.

Imagine a case simillar to the one described in the phpdoc @method documentation.

That one's even easier. As the docs already imply, the definition from the parent is added to the child's one, as you already hinted. If the parent is able to deliver "{Anything}", then it seems logical to let the child handle it too. Otherwise, you simply wouldn't document it like that.

In the particular case you describe where the parent can't deliver "{Anything}" but forces children to abide a pattern, the best place to describe the pattern would be parent's __call method, while the concretes would be described at the child's @method tags.

As for the following syntax:

@method Table_{Model} get{Model}Table()

It assumes that the names of the returned object types match the names of methods, which may not always be the case. That makes it useful in a very special cases only, doesn't it?

Well, there's still the possibility of a multiple @method declarations to describe such cases. The cases of mismatch seem like special cases, rather than the other way around though. Besides, if you have both a wildcard call method that returns a mismatching wildcard type... what kind of a terrible API design are we talking about? That's not even a wildcard anymore... but a "map".

Finally, for the Factory method support.

That syntax here seems like it would pose quite a parsing challenge (especially in terms of efficiency), considering what the @method tag is currently allowed to contain.

It may be worth a separate discussion, as this is not really related to wildcards and may have a completely different syntax.

True, but whatever syntax we pick for this feature is going to affect future features (which is why I brought up the enumerated values and inline doc blocks into the mix), so it helps to keep it all in mind to avoid breaking the possibility of a nice new feature before said feature even appears.

@pavoleichler

I agree with all points.

I even changed my mind on get<Model>Table() vs. get{Model}Table() debate. I thought the earlier example with @method Core myFactory({'core'}) was part of the PhpStorm draft, but that was actually an example only. That makes my argument on IDEs adopting the curly braces invalid.

Concerning the Factory method, in any case I did not really like the proposed syntax with both curly braces and quotes:

@method Core myFactory({'core'})

and would rather have the quotes only or a completely different syntax.

@alganet alganet referenced this issue in Respect/Validation Jun 26, 2014
Closed

Change docblocks to reflect magic static methods #184

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment