Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

null vs NullPage with in-memory selector #896

Open
adrianbj opened this issue May 30, 2019 · 25 comments
Open

null vs NullPage with in-memory selector #896

adrianbj opened this issue May 30, 2019 · 25 comments

Comments

@adrianbj
Copy link

@ryancramerdesign - any chance you could please revisit:

"@todo check if selector, then call findOne(). If it returns null, return a NullPage instead."

https://github.com/processwire/processwire/blob/ad2f60b5445d2449920227f96f7ba1ab49777ece/wire/core/PageArray.php#L182-L187

The discrepancies between these is a little frustrating because you can't simply do a:

if($results->id)

on it like you could with a DB ($pages) selector.

This demonstrates pretty clearly:

image

Thanks!

@Toutouwai
Copy link

Toutouwai commented May 30, 2019

This would be a breaking change though. There will be lots of existing sites with code like...

$p = $page_array->get('title=foo');
if($p) {
    //....
}

...which will break if $p becomes a NullPage object when there is no match in the PageArray.

@adrianbj
Copy link
Author

@Toutouwai - yeah, I know but it's such a weird inconsistency that it regularly trips me up. We're often telling beginners to check if a page is returned by using $p->id, but then if they are using an in-memory selector, it doesn't work. Maybe this goes back to my argument that v4 needs lots of breaking changes. But I digress :)

@Toutouwai
Copy link

We're often telling beginners to check if a page is returned by using $p->id, but then if they are using an in-memory selector, it doesn't work.

My view is that what's needed is a prominent page in the docs (or maybe a highlighted section at the top of the Selectors page) that explains the important differences between PageArray (WireArray) selectors and PageFinder selectors, because there are several that trip people up. Besides what you've highlighted above, in a PageArray selector you...

  1. Cannot use time strings like "today"
  2. Cannot use page status strings like "hidden"
  3. Cannot use OR-groups
  4. Cannot use sub-selectors
  5. Must use correct case, whereas PageFinder selectors are case-insensitive

The return values of WireArray::get() and WireArray::findOne() are documented, so the problem is really that folks assume that everything will be the same as with PageFinder methods - if the differences were highlighted more prominently there might be less confusion.

@adrianbj
Copy link
Author

Maybe you're right that better docs would help, but I still don't think it's good that find returns an empty PageArray, but get returns null. It just doesn't seem logical to me.

I realize that breaking changes are not ideal, so if it's going to stay as is, if it was officially recognized and flagged as a known "bug", I'd feel better about it.

@LostKobrakai
Copy link
Collaborator

LostKobrakai commented May 31, 2019

To me this seems to be mostly a result of improper alignment of the WireArray api with the one of Pages. WireArray doesn't use null objects. It just returns WireData | boolean, so by the Liskov Substitution principle a PageArray cannot return a null object as well. This of course stands in contrast to the Pages class, which is mimicking the WireArray api, but does use the concept of returning null objects. I feel like a proper fix would need a breaking change one way or the other.

Edit:
Just to make my post more clear: NullPage is a WireData object, so it would fit the contract, but WireArray::findOne is documented like so:

Returns item from WireArray or false if the result is empty.

https://processwire.com/api/ref/wire-array/find-one/

@ryancramerdesign
Copy link
Member

ryancramerdesign commented Jun 21, 2019

PageArray and $pages may have some similar named methods, but that's because we try and maintain a similar naming convention across the entire API, for everything. But this does not mean that all similarly named methods return the same things. Regardless of name, every method has its own arguments and return value, so what appears in the documentation for one find() method will be referring to that find() method only and not others.

PageArray is a type of WireArray and built to behave as a WireArray rather than the $pages API variable. Likewise WireArray is built to be able to perform like or substitute for a PHP array in several respects. If PageArray did not behave as a WireArray, it could not be derived from it.

Also, any type of array is a container, something that can be empty (0 items) or non-empty (1+ items), and these kinds of array types should not return null objects because a null object is still an item with an object type.

@adrianbj
Copy link
Author

Also, any type of array is a container, something that can be empty (0 items) or non-empty (1+ items), and these kinds of array types should not return null objects because a null object is still an item with an object type.

Then why does a DB selector return NullPage - surely by this logic it should also return null What am I not understanding?

@ryancramerdesign
Copy link
Member

ryancramerdesign commented Jun 21, 2019 via email

@adrianbj
Copy link
Author

I understand that $pages is not an array.

I am obviously not explaining my point properly, so I'll try again. It would be a breaking change so it's not going to happen anyway, but:

Doing a $pages->findOne() returns NullPage but doing a $inMemoryPages->findOne returns false

This means you have to check the response in different ways depending on what you are doing the find on. Many newbies won't understand the difference between DB and in-memory selectors and I don't think they should have to know - it should be seamless IMHO.

I just don't understand why an in-memory selector can't return NullPage

Here's another example that is maybe clearer:

image

Whether it's a DB selector or in-memory selector, we get a PageArray with find, but as soon as we do findOne, the behavior is now different (NullPage vs false) - I find this unintuitive.

@LostKobrakai
Copy link
Collaborator

LostKobrakai commented Jun 21, 2019

$pages might not be a runtime array, but if you look at all the "page retrieval" functions (not the implementation, the names) I don't see how this is not acting like a collection. And quite a lot of code in processwire is in regards to retrieving the correct page from the available, so it would be nice to be able to write abstract code without needing to differenciate if the selection is happening on all the pages in the db or just a previously queried subset.

I can understand that besides retrieval $pages doesn't act like a collection, but I'd guess that this topic is not about that at all.

This means you have to check the response in different ways depending on what you are doing the find on. Many newbies won't understand the difference between DB and in-memory selectors and I don't think they should have to know - it should be seamless IMHO.

I think we need to be a little more careful here. New people need to understand that $pages does query the db and page arrays are already queried pages in memory. Where I see the problem is this:

$posts = $pages->find("template=post")
$event = $pages->findOne("template=event")
$post = $posts->findOne("sort=date")

[…]
// Way later, potentially another file
if($post){ //show }
if($event->id){ //show }

This means even though I call the same function on a class, which allows me to retrieve a single page out of a bigger total I need to be aware of the different return values. This can easily result in hard to find bugs and is especially problematic for refactoring. E.g. I might also decide to show more events on the same page and do:

- $event = $pages->findOne("template=event")
+ $event = $events->findOne("template=event")

This makes the return value change, without clear notice, potentially breaking later code, which relied on getting a nullpage not false (or the other way round as well).

@ryancramerdesign
Copy link
Member

ryancramerdesign commented Jun 21, 2019 via email

@adrianbj
Copy link
Author

but you are still coming from the
idea that a method like findOne() or get() is supposed to work the same
between $pages and a PageArray.

Yeah - I think they should work the same, after-all $pages->find and PageArray->find work the same, eg:

image

image

So surely if the find method works the same, why shouldn't the findOne method work the same?

In the second example, we are returning an empty PageArray for both $pages and PageArray - they are both working the same. I really think that findOne needs to work the same as well - otherwise it's just inconsistent.

If fixing this wasn't a breaking change, can you honestly say that the current behavior makes sense and you would build it the same way again?

That hook option might be nice if it could replace findOne, but I don't honestly think anyone is going to want to use a new findOnePage custom method just to get NullPage. FYI, in case someone wants to use that, you'll need $e->return = instead of return

I know we're not seeing eye to eye on this, but can you please explain why get returns null and findOne returns false - is there a logical reason for this difference?

@BernhardBaumrock
Copy link

As mentioned in the other issue where I was getting a little off-topic, my suggestion to solve and finally close this old issue would be to add the following method to the API which does not solve the conceptual problem that selecting pages from the DB and from in-memory objects can by design return different things but it can help in writing code that works as expected no matter what PW does under the hood:

if($wire->pageId($something)) {
    // page exists
} else {
    // page does not exist
}

This would solve the problem described by @adrianbj in the initial post:

The discrepancies between these is a little frustrating because you can't simply do a:
if($results->id)
on it like you could with a DB ($pages) selector.

And it would also solve the problem later described by @LostKobrakai.

@adrianbj what do you think?

@ryancramerdesign
Copy link
Member

@BernhardBaumrock I have gone ahead and added wirePageId() and pageId() functions. Both do the same thing, but pageId() is only present when the functions API is enabled. I couldn't find a reason to make them methods of $wire since the instance wouldn't matter for these functions.

@BernhardBaumrock
Copy link

BernhardBaumrock commented Aug 4, 2023

Hey @ryancramerdesign thx a lot!

I'm not sure if I'm happy with that and maybe it's just a matter of lacking knowledge from my side. So it would be great to get a little more details here.

I'm always using syntax like $this->wire->foo->bar because this seems to be the most logical way of writing code for me. I start with $this most of the time, then make sure to use the wire instance (to avoid naming conflicts if a class uses a property that is otherwise bypassed to the PW API if it does not exist) and then traverse down the PW classes to get to the feature that I want (like $this->wire->files->render(...) for example). I know that $this->wire() would be slightly more efficient because it saves one magic method call, but I just prefer $this->wire->foo ...

I'm never using function calls at all. So I'm a little concerned about these additions. And I'm even more concerned about pageId() as it is only available when the functions api is enabled. One of my main goals when writing code is always to make sure that things are reusable and just work as they are, no matter what environment the user has set up.

If I used if(pageId($foo)) ... in a module this would mean I'd have to instruct the user to enable functions API, right? Not ideal in my opinion.

On the other hand wirePageId() seems to work in any situation, right? I'm never ever using these function calls. Maybe there's nothing bad with them, but I guess they remind me too much about WordPress and it's the_loop() and the_post() etc.;

That's why I suggested to have a global $wire->pageId() method. We could then use that method on all wire derived objects. For me this is just more aligned with what I'm used to. I'm not really feeling good about using global function calls without any context. But maybe that's something I should work on rather than asking for a change in the API.

On the other hand I still think that this is a very important and helpful addition. And for me having to use wirePageId() feels weird if I know I could also use pageId(). I know this might seem picky, but I'm spending a lot of time with PW code and so little things matter. If using pageId() is only possible when the functions API is enabled it feels like an extra step necessary that would actually not be necessary. I hope you understand.

$wire->pageId(...) on the other hand is longer then wirePageId(...) but for me it still looks more logical. So my comment might be too philosophic, but still I'd be interested if there is a technical difference or if one is better than the other in any way.

Thx in advance for helping me to understand and improve.

@teppokoivula
Copy link

teppokoivula commented Aug 5, 2023

This is obviously a matter of preference, but (as Ryan mentioned above) in cases where the ProcessWire instance doesn't matter, wire-functions are perfectly fine. They are indeed always available and in scope (as long as you've booted up PW) and don't need to be called through a Wire derived object.

Global functions are also quite nice when working with static methods: it feels awkward (and likely adds some unnecessary overhead) if you have to get a full blown ProcessWire instance and call methods from there just because you want to use a simple helper function.

I get where you're coming from, but having and using some global helper functions is not a bad thing. Especially since in our case they are prefixed with "wire", which makes it unlikely that you'd have a competing implementation in your own code, or third party code you're relying on. For an example I would argue that there's less chance of clash with wirePageId() (ProcessWire) than with is_page() (WordPress) 🙂

And I'm even more concerned about pageId() as it is only available when the functions api is enabled. One of my main goals when writing code is always to make sure that things are reusable and just work as they are, no matter what environment the user has set up.

That's one of the reasons I personally don't use Functions API, especially in my modules. Another is that I just don't enjoy the syntax as much, and don't see much benefit in it (for my use cases). And third is that from a more philosophic point of view (😉) I think it's better not to have too many syntax options available when there's little functional difference.

That's why I suggested to have a global $wire->pageId() method. We could then use that method on all wire derived objects.

On the other hand: you would indeed need to get a Wire derived object to use it. In many cases one may already be available and you're perhaps already "in it" and can use $this, but at least for me that's often not true (it's a non-Wire class, I'm in a static function, etc.) For those use cases it's just more work and overhead without real benefits, apart from the syntax.

@teppokoivula
Copy link

teppokoivula commented Aug 5, 2023

Oh, and one more thing: adding new methods to the Wire class directly is not a good long term strategy from an architectural point of view, since it keeps adding more baggage to nearly every object in ProcessWire. Offloading such methods to separate on-demand classes may be slightly better, but not ideal either, in my opinion.

Not to mention that there's actually a decent chance that a custom class somewhere already has pageId() method 😁

@BernhardBaumrock
Copy link

Hey @teppokoivula thx a lot! Very good points :) I think I'll make myself more familiar with all the wire functions :) My IDE properly suggests me all of them, so there's really no reason not to use them.

The only concern left now is that if(wirePageId($something)) will not be available for older versions of PW whereas if("$something") will always work (unless $something is not something that can't be casted to a string...).

But time will solve that problem, so it's a very welcome addition :)

@matjazpotocnik
Copy link
Collaborator

@adrianbj (and others), what should we do with this? Are we ready to close it? I know it doesn't solve the issue...

@adrianbj
Copy link
Author

adrianbj commented Aug 7, 2023

Personally I think it should be kept open just in case Ryan decides at some point that breaking changes would be acceptable because improved consistency would definitely make the API nicer and easier to work with.

Not that I love the idea of having breaking changes, but I still think it's worth keeping a record of these issues - even if it's just a help for others who have similar confusion in the future.

@ryancramerdesign
Copy link
Member

...breaking changes would be acceptable because improved consistency would definitely make the API nicer and easier to work with.

PageArray is currently consistent with WireArray and all types that extend WireArray, so making such a change would be making it inconsistent with its type, which we're not likely to do as it has potential to break things that accept any type of WireArray. What would probably be used to accomplish the kind of addition you are looking for is an option to have it return results in an array type of object that likely isn't a WireArray type and perhaps has methods to let you continue querying the items within it using the database. In order to avoid the inherent efficiency problem of doing this, perhaps it could hold off on executing its DB query until you attempted to iterate the found items. That would let you stack find() calls without too much penalty.

@adrianbj
Copy link
Author

adrianbj commented Aug 9, 2023

@ryancramerdesign, thanks for thinking about this further, but I am not sure this is about PageArray vs WireArray. The mentioned inconsistencies are about differences in what is returned from in-memory vs database find operations and also the differences between what is returned (null vs false) when there are no results from a get vs a findOne call.

image image

With the DB calls we always get a PageArray, Page, or NullPage, but with the In-memory calls, we still get PageArray, and Page, but when there are no results, it's either null or false instead of NullPage. I just don't understand this discrepancy especially given that the original find returns a PageArray that I've assigned to $results

I also keep coming back to this note: https://github.com/processwire/processwire/blob/ad2f60b5445d2449920227f96f7ba1ab49777ece/wire/core/PageArray.php#L182-L187 so it seems like you have had similar thoughts in the past.

Anyway, I don't want to keep rehashing this one forever :)

@adrianbj
Copy link
Author

Just pinging because I updated the examples above from my initial version of that post.

@ryancramerdesign
Copy link
Member

Similarly named methods in WireArray (PageArray is one of many) are not supposed to behave the same as those from $pages because those methods originate from WireArray, which is not related to pages, but Wire derived objects in general. Anything that's of a derived type (like PageArray is a type of WireArray) has to behave according to the type, and not according to other classes outside its family. This is not a matter of preference or opinion.

I think there are more benefits for PageArray to be a type of WireArray than for it to become something different that starts querying the database on filtering operations, or that returns NullPage rather than null.

I'm not sure how to explain it better than I already have. What you are requesting is not an option in the PageArray type. Though I'm always happy to keep the conversion going, but would prefer to keep this issues repo focused on bugs. I'll definitely keep thinking about an option to support custom array types for return values from $pages methods, which is how we'd have to support what you are requesting.

@matjazpotocnik
Copy link
Collaborator

@adrianbj just to let you know that the text "DB get without results" should be "DB find without results" (and similar on the second screenshot)...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants