Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array numeric string key gets converted to int #9029

Closed
piku235 opened this issue Jul 16, 2022 · 17 comments
Closed

Array numeric string key gets converted to int #9029

piku235 opened this issue Jul 16, 2022 · 17 comments

Comments

@piku235
Copy link

piku235 commented Jul 16, 2022

Description

I ran into this bug while playing with query params. My script had declare(strict_types=1) and was iterating keys from $_GET using a function with the typed argument string ...$keys, but surprisingly for GET /script.php?123=foo my script ended with a fatal error stating that argument to that function must be of type string but got int.

At first, I thought it was a bug somewhere in my code, but as I delved deeper and deeper, it finally led me to arrays. When I passed a numeric string key such as "123" it got converted to int, where I expected it to remain as a string.

The below code:

<?php
$arr = array('123' => 'foo');
var_dump(array_keys($arr));

Resulted in this output:

array(1) {
  [0]=>
  int(123)
}

But I expected this output instead:

array(1) {
  [0]=>
  string(3) "123"
}

It looks like this bug has existed for a very long time, also tested on PHP 7.0 and the result was the same.

PHP Version

8.2.0-dev

Operating System

No response

@cmb69
Copy link
Contributor

cmb69 commented Jul 16, 2022

Not a bug, but a deliberate design decision. From the docs:

The key can either be an int or a string. The value can be of any type.

Additionally the following key casts will occur:

Strings containing valid decimal ints, unless the number is preceded by a + sign, will be cast to the int type. E.g. the key "8" will actually be stored under 8. On the other hand "08" will not be cast, as it isn't a valid decimal integer.

@piku235
Copy link
Author

piku235 commented Jul 16, 2022

Ah, ok, I didn't know that, thanks for explaining!

@still-dreaming-1
Copy link

still-dreaming-1 commented Jun 7, 2023

This seems like a very bad design. If the type that was used for the key started out as one of the two valid types, either int or string, neither the type nor the exact value actually stored as the key should change for any reason. This design effectively makes it so that we often cannot be guaranteed what type to expect for the key. Often when the programmer is trying to use a string for the key, there is some chance that it could be a valid int. This effectively means that when using the key later, if it is both expected and required to be a string, it will have to be cast to a string just in case it was converted to an int by php. It also means that when using tools like psalm, when you type the key as string, it might actually not always be a string, and psalm won't be able to statically detect that you are storing the wrong type for the key because you are not, php is. This can lead to an exception at runtime in simple situations like a function with a return type of string that ends up returning a particular key when it expected all the keys to be a string. These are the types of runtime exceptions a tool like psalm should be able to catch and help the programmer prevent, but this design makes it much more complex of a problem to detect. Even if psalm were adapted to be able to catch this, it would have to become more complicated, and the types declared in the code to help psalm would become much more complex, your code itself becoming more convoluted.

Therefore as is, this design causes using anything other than an int as the key to be a bad practice, which means you would then have do something much more complex like store an array of arrays of keys and values instead of just using the keys as the keys and the values as the values.

@KapitanOczywisty
Copy link

This seems like a very bad design.

This is 20 years old design, and changing that would likely break a lot of apps. More realistic is introduction of new Map object.

@still-dreaming-1
Copy link

still-dreaming-1 commented Jun 9, 2023

This seems like a very bad design.

This is 20 years old design, and changing that would likely break a lot of apps. More realistic is introduction of new Map object.

I ended up realizing this later after doing more research on it. Those are all good points. However, I personally still see merit in the idea of at least considering whether this should be changed. Backwards incompatible changes do make their way into the language sometimes, and I think this is a case that warrants at least considering it. While those types of changes should not be taken lightly, it is also true that having to make changes to existing project code as backwards incompatible changes are made to the language is just part of software, and is sometimes worth it to make improvements to the language. I'm not completely opposed to the idea of introducing a new Map object, but at the same time, I don't necessarily think that is more likely to be introduced or used, and it may not have all the benefits that arrays currently have. Arrays as they are in php are quite useful and improving them further in this way, if it is agreed this really would be an improvement, is something to consider. I would say that the decision to change arrays is not mutually exclusive to the decision to add Maps. A decision to add Maps may or may not get delayed for some time and a decision to change arrays could be considered in the meantime.

In this particular case, where string keys of arrays are sometimes converted to ints, while it is true changing this will break some code, the existing design is already the cause of many bugs in existing code that would get fixed if the design were changed. I know this sounds strange because developers are supposed to design for how the language currently works, but I do think I still have a point if you will hear me out. I have a suspicion that most php code that uses arrays and sets keys from strings assumes the keys will actually get set to a string, in the cases where the code cares about the types. I suspect this is the cause of some rare bugs in many of those projects where the code expects those keys to be strings. While changing a language to map to the perception of how people thought it worked is not automatically a good idea, and the idea of doing this should be met with some skepticism, I think in this case it makes sense. I suspect that the current behavior is not expected/known/understood by the majority of the reasonably experienced and competent php professionals. I suspect the current behavior is not only unknown to them, but actually contradicts a current belief a good portion of them have about how it currently works. Personally I have been professionally writing php code for about 20 years, and I am just now realizing this. Much of the code I wrote over the past several years assumed the keys would be strings if they were set from a string. I even used Psalm to document and validate that these keys are strings, and yet sometimes they are actually ints. This can lead to a runtime exception in some scenarios. For example, if the code is using strict_types, and a function return type is string, and it ends up returning an int instead.

@damianwadley
Copy link
Member

However, I personally still see merit in the idea of at least considering whether this should be changed.

And it has been. Many times.

There are problems with attempting to do so, most stemming from the fact that PHP's values are loosely-typed while this sort of change tries to make something strictly-typed. There would be very many "WTF?" moments when developers stumbled upon situations where ["0"] doesn't exist because it was entered as [0] - or worse if both keys were to get defined with different values. And much of it probably traces back to how things like $_GET and $_POST and $_COOKIE use string values even for data that looks numeric.

As with ideas like strict_types, the most likely outcome will be adding a new feature of a strongly-typed Map data structure that developers can opt-into using while leaving the existing functionality of arrays alone the way they are.

@still-dreaming-1
Copy link

still-dreaming-1 commented Jul 7, 2023

However, I personally still see merit in the idea of at least considering whether this should be changed.

And it has been. Many times.

There are problems with attempting to do so, most stemming from the fact that PHP's values are loosely-typed while this sort of change tries to make something strictly-typed. There would be very many "WTF?" moments when developers stumbled upon situations where ["0"] doesn't exist because it was entered as [0] - or worse if both keys were to get defined with different values. And much of it probably traces back to how things like $_GET and $_POST and $_COOKIE use string values even for data that looks numeric.

As with ideas like strict_types, the most likely outcome will be adding a new feature of a strongly-typed Map data structure that developers can opt-into using while leaving the existing functionality of arrays alone the way they are.

You made some really good points. I agree that it would not be good to change all arrays to consider an int key to be different from an equivalent string key. However, it seems to me that there might be a way to both preserve that behavior and make the change being discussed here, without introducing new settings or data types. I'm not necessarily opposed to $value = $arr['5']; retrieving the same value tied to the same key as $value = $arr[5];. What I don't like is that when you are retrieving the keys, they get retrieved as a different type from what they were set as. In other words, I'm not opposed to treating certain int keys and string keys as "equivalent" when setting/retrieving values with/from keys, I just don't like that the language goes about making that happen in a way that also creates other strange and unexpected side effects. There could be some custom behavior in how arrays work to make this happen without actually changing the types of the keys on us. If I set a key to a string, when I retrieve the key later, the key that I retrieve should still be a string. Php retrieving a different type for the key than what it was initially set to is causing strange and unexpected bugs to exist in people's code.

@still-dreaming-1
Copy link

Please fix this. In my previous comment I pointed out a behavior that would resolve this issue without negatively impacting existing code. It would basically just be a bug fix at that point.

@billynoah
Copy link

My fondest memory of this extremely odd design choice happened when I was reading a spreadsheet of part numbers and writing an array map with the part numbers as the key. It so happened that the part numbers looked like decimals and the the reader was treating them as such. I ended up building an array that looked something like this:

$arr = [
  1.1 => 'PART ONE',
  1.2 => 'PART TWO',
  1.3 => 'PART THREE'
];

Only to discover later (the hard way)

print_r($arr);
Array
(
    [1] => PART THREE
)

I'm glad to see the most recent versions at least throw a deprecated notice in this case:

Deprecated: Implicit conversion from float to int loses precision

@still-dreaming-1
Copy link

I think it is fine for it to only store keys as ints and strings. The code posted at the very top by the OP illustrates the problem very well. However, where this conversation has lead for me is that, when retrieving values, continuing to consider "123" and 123 to be the same key for backwards compatibility is fine. Letting it continue to convert the key from a string to an int as strategy for achieving this is problematic. If both string and int keys are supported, they should be left as is when provided as one of those types to begin with. I don't think it is quite fair to say this is a strict type vs loose type issue, even if that is ultimately why people care about it now. It could compare the equivalency in a loose way without unnecessarily changing the type from one that is supposed to be supported to another.

@billynoah
Copy link

billynoah commented Sep 24, 2023

@still-dreaming-1 - out of curiosity, how do you think it should treat the scenario I described above where a float is used as the key? Obviously it wasn't my intention and this was user error, but I think I would have preferred those values cast as strings rather than int which completely changes the value and in my case actually coalesced several of the keys into one.

@still-dreaming-1
Copy link

@billynoah Converting from float to string does seem more logical than from float to int. I'm not sure what the right change would be going forward though. If it is currently deprecated with a notice, does that mean it will be completely not allowed in future versions, throwing an exception instead?

@billynoah
Copy link

@still-dreaming-1 - not sure but an exception would be a heck of a lot better than silently casting them as int!

@sebastianstucke87
Copy link

I'm not doing anything wrong, yet it errors:

<?php

declare(strict_types=1);

$array = [
    '0' => 'my-value',
];

foreach ($array as $key => $value) {
    // Fatal error: Uncaught TypeError: {closure}(): Argument #1 ($_) must be of type string, int given
    (fn(string $_) => var_dump($_))($key);
}

bondarde added a commit to bondarde/lox that referenced this issue Dec 14, 2023
@paulshryock
Copy link

paulshryock commented Jan 16, 2024

Ah, this explains why array_keys is returning all my numerical string keys as integers. Yikes.

@jmossetc
Copy link

This is a really bad design decision in my opinion. Code that relies on strings being silently cast into int is really bad code and it should be punished in a context with strict typing.

I believe that fixing this would actually would actually fix a lot more bugs than it would create.

@MorganLOCode
Copy link

Odd that this has yet to be brought up for discussion on the mailing list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants