Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2.0] API Updates #90

Closed
rtheunissen opened this issue Jul 17, 2017 · 23 comments

Comments

@rtheunissen
Copy link
Member

commented Jul 17, 2017

@rtheunissen rtheunissen added the API label Jul 17, 2017

@rtheunissen rtheunissen self-assigned this Jul 17, 2017

@rtheunissen rtheunissen changed the title [WIP] [2.0] API Updates [2.0] API Updates Jul 19, 2017

@rtheunissen rtheunissen added this to the 2.0.0 milestone Jul 19, 2017

@wwulfric

This comment has been minimized.

Copy link

commented Sep 1, 2017

Hi What's the release plan about api 2.0?

@rtheunissen

This comment has been minimized.

Copy link
Member Author

commented Sep 1, 2017

As soon as I can get around to it I'm going to implement everything, test, update the documentation, and release a beta. By Christmas, hoopefully. 🎄

@riveraj

This comment has been minimized.

Copy link

commented Oct 25, 2017

@rtheunissen Would you consider adding functional iterators to the Collection interface (map() and filter(), for example)?

@marcospassos

This comment has been minimized.

Copy link

commented Nov 9, 2017

Please, consider adding a method for checking equality between collections.

@marcospassos

This comment has been minimized.

Copy link

commented Nov 10, 2017

@rtheunissen is in your plans allow objects of distinct hierarchies to be considered equals by the map? IMHO, this constraint should not exist. The contract should impose only that if $a->equals($b), then $b->equals($a).

@marcospassos

This comment has been minimized.

Copy link

commented Nov 10, 2017

Another point I'd like to suggest considering is to split Hashable into two new interfaces (it could be done right now without BC breaks):

interface Equality {
    public function equals(Equality $other) : bool;
}
interface Hashable extends Equality {
    public function hash() : int;
}

How about fixing the hash code type (int)? Leaving it open may be tricky in the future in case of switching the internal implementation of the hash table.

@Fleshgrinder

This comment has been minimized.

Copy link

commented Dec 23, 2017

Splitting the interfaces definitely makes sense for PHP, however, they should look as follows:

interface Equatable {
    function equals($other): bool;
}

It should not be required for the $other value to implement the interface. This makes the method more useful in every day situations where one just wants to know if two things are equal or not, regardless of what they are.

interface Hashable extends Equatable {
    function hash(): string;
}

Requiring an int limits us to a i32 on 32-bit builds or i64 on 64-bit builds. This is a problem! We should use u128 but that is not available in PHP, hence, using a string solves the problem. That can either be binary or hex.

Ideally the extension comes with a fast hashing function like MurmurHash3_x64_128 or recommends it.

@rtheunissen a thing I would like to see in 2.0 is the choice to have methods that return null and ones that trow exceptions. Using exceptions for control flow is simply wrong because unrolling the stack just to continue normally in the domain code is waste. Hence, every procedure that may fail should be offered in two manifestations, e.g.:

  • Set->first(): ?T + Set->firstChecked(): T throws
  • Set->last(): ?T + Set->lastChecked(): T throws

I also believe that methods like Collection.capacity() should not be instance methods but rather named constructors Collection::withCapacity(int $capacity) -> Collection. Changing the capacity on-the-fly makes no sense and should be handled internally.

Another thing that would be awesome would be the possibility to generate type constrainted versions of the various DS types. Something like MapGenerator::new(string $path)->generate(string $class_name) and MapGenerator::new(string $path)->generateScalar(string $type_name). Not sure how feasable that is for your 2.0 goal but this could be added at any point in time as a new feature.

I'm posting this here first to hear what you think about it instead of directly creating issues for everything.

@marcospassos

This comment has been minimized.

Copy link

commented Dec 23, 2017

@Fleshgrinder

This comment has been minimized.

Copy link

commented Dec 23, 2017

Sounds good, yeah. Important is that int is definitely not sufficient especially because it is not portable.

@rtheunissen

This comment has been minimized.

Copy link
Member Author

commented Dec 23, 2017

@Fleshgrinder the hashable interface does not require that the return value is an integer. It could actually be anything at all, and the internals determines what the hash of that value is. For example, hash() could return something like $this->name rather than the actual hash of that value.

The problem with having methods return null is that it creates theoretical ambiguity. What if the first value was null, and that null is an acceptable value in its domain? It's unfortunate that PHP doesn't have the equivalent of undefined. It's up to the developer to not use the exceptions for control, but to prevent that state from occurring at all, using things like ! $stack->isEmpty() etc. It's important that we prevent ambiguity here.

Capacity is likely out the door for 2.0, I honestly don't see the benefit in anyone using it. The overhead of the method call is likely to cancel out whatever performance gain you'd get from the preallocation. I haven't done any tests on that but in the context of PHP it's my opinion that it does nothing bad add noise and bloats the API.

As for types, I think we should wait for Generics to land in core. Once that happens we can add type constraints and optimisations for scalar types but until then it's like we're trying to invent the wheel with no idea what it's supposed to look like.

@Fleshgrinder I'm happy to split the interfaces. But why should we? equals is required for Hashable, so why not make it part of the interface? Otherwise we're extending an interface that will likely only ever be used by Hashable anyway. Seems like an unnecessary separation of concerns here.

@Fleshgrinder

This comment has been minimized.

Copy link

commented Dec 23, 2017

@rtheunissen what are you using to hash the value returned by hash? (Could not find it while skimming through the sources.) zend_inline_hash_func is used.

I know that null is problematic but offering a firstChecked() should solve that. The problem with combinations of if set.isEmpty { set.first() } is that static code analyzers cannot understand it, hence, one has to catch the exception or add comments to suppress the error/warning.

All good if capacity is out completely.

I have my doubt about generics landing soon but we can continue to wait, sure.

The split of the interfaces would mainly be of interest for types that want an equals but do not want to have a hash. There is definitely no use case for it in your extension. As I wrote in another issue of yours, other languages almost always put them together into the superclass (e.g. Object in Java, Basic in Ceylon, etc.) so keeping them together is also not bad.

@Fleshgrinder

This comment has been minimized.

Copy link

commented Dec 24, 2017

@rtheunissen Hashable.hash is actually a very confusing naming for the interface and method, since it is not at all a hash (internally but even there it boils down to a uint32_t and not a real proper hash). Not sure how this could be named to fix this situation or if anyone is actually interested in fixing it. The return type constraint annotation is also wrong of the interface method, it says mixed but actually, only scalar is allowed.

@rtheunissen

This comment has been minimized.

Copy link
Member Author

commented Dec 24, 2017

Hashable.hash is actually a very confusing naming for the interface and method.

I agree, it doesn't return the hash but the value to be hashed. I wanted to keep the names simple, but it's misleading now because you wouldn't be able to easily guess the responsibility of the method. Something more verbose like getHashValue would be more descriptive but it's such a simple interface that I'm not sold on a change to this.

The return type constraint annotation is also wrong of the interface method

This is a mistake - it should allow any value and leave it to the internals to determine how to hash it (htable, in this case). The user should never need to implement a hash function.

internally but even there it boils down to a uint32_t and not a real proper hash

That doesn't reeeally matter because the only function it serves is to distribute the keys in the hash table and to find the start of the collision chain. It's not used for equality or identity. The user could implement it to always return 0 but that would result in a terrible distribution. The requirement is that the user or class should produce a value that is not random, reproducible, and as unique as possible in the domain of the object. Whether it's a proper hash is insignificant, because we translate it to an integer anyway to find the effective index in the buffer (as a mod of the capacity).


My immediate resolution for this is to not rename anything, fix the return types to allow non-scalar types, and implement Hashable on all collections.

I've had a good think about the separation of equals, and I like it. I think it's a good idea to separate this from a responsibility point of view. If an object is considered equal in the context of collision resolution, it should also be equal outside of that. It's either equal or it's not. So that's a 👍 from me on that front.

@Fleshgrinder

This comment has been minimized.

Copy link

commented Dec 24, 2017

Broadening of the return type might have unwanted side effects. I've seen that you use the count of an array to determine the hash. Objects that return their properties for hashing (which would be a perfect thing to do for value objects) would always hash to the same in that case. Casting to object would not help since it would result in spl_object_hash

class C implements Hashable {
    function hash() {
        return \get_object_vars($this);
    }
}

Correct me if I'm wrong here.


I understand why you are hesitant to rename it. You should definitely expand on the documentation of the hash method if you want to keep it as is. I for one was totally confused and returned md5 results from it until I dug deeper into the sources and found out that I'm doing it wrong.

I would like to see Hashable.hashValues since it is super clear what it's for.


The split of equals into a dedicated Equatable interface just makes sense in PHP because we have spl_object_hash. That interface should be part of the core anyways but getting it from your extension is like the second best thing. 😉

@rtheunissen

This comment has been minimized.

Copy link
Member Author

commented Dec 28, 2017

I've seen that you use the count of an array to determine the hash

@Fleshgrinder internally we're actually serialising the array and using the resulting string's hash. The downside of that is obviously an O(n) hash function, so there's a lot more work to be done here.

@Fleshgrinder

This comment has been minimized.

Copy link

commented Dec 28, 2017

@rtheunissen that’s this one here. It stands and falls with the assumption that everything is serializable and one needs to deal with circular dependencies. Currently only scalar is permitted and this seems like a good idea to me. Leave it to the user to resolve all those issues manually, at least they need to think about what they are actually doing here. 😉

@teohhanhui

This comment has been minimized.

Copy link

commented Apr 19, 2018

Hashable.hash is actually a very confusing naming for the interface and method.

I agree, it doesn't return the hash but the value to be hashed. I wanted to keep the names simple, but it's misleading now because you wouldn't be able to easily guess the responsibility of the method. Something more verbose like getHashValue would be more descriptive but it's such a simple interface that I'm not sold on a change to this.

Pretty sure a misleading name is not worth it, no matter how "simple" it might sound.

@vudaltsov

This comment has been minimized.

Copy link

commented Sep 2, 2018

Hi, @rtheunissen ! Great extension, thanx for creating this project. Any updates on 2.0?

@rtheunissen

This comment has been minimized.

Copy link
Member Author

commented Sep 4, 2018

@vudaltsov it's on my list. 😅

@Majkl578

This comment has been minimized.

Copy link

commented Sep 10, 2018

@rtheunissen Now may be the right time to start preparing an RFC for 7.4. 😊

@rtheunissen

This comment has been minimized.

Copy link
Member Author

commented Mar 13, 2019

This is all outdated now. Please see https://github.com/php-ds/ext-ds/blob/2.0/DRAFT.php

@Majkl578

This comment has been minimized.

Copy link

commented Mar 13, 2019

Too bad there is no PR to review/discuss drafted changes.

@rtheunissen

This comment has been minimized.

Copy link
Member Author

commented Mar 13, 2019

Can discuss here if you want or can fork the 2.0 branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.