Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't parse all numbers using if ( is_numeric( ... ) ) #147

Open
alinnert opened this issue Jul 14, 2023 · 4 comments
Open

Don't parse all numbers using if ( is_numeric( ... ) ) #147

alinnert opened this issue Jul 14, 2023 · 4 comments

Comments

@alinnert
Copy link

alinnert commented Jul 14, 2023

Currently, every value gets checked if it's a number using is_numeric(). There are some types of values that pass this check depending on the concrete value. ZIP codes for instance: 12345 => number, but 01234 => string. Or color codes: 000000 => number, but ffffff => string.

This makes working with the data really difficult. If JS consumes this data using an API and I want to convert all color codes to lowercase it would look like this:

// assuming that `color` is of type `string | number`
const lowercaseColor = typeof color === 'string'
  ? color.toLowerCase()
  : color.toString().toLowerCase()

If I knew every color was a string I could just do:

const lowercaseColor = color.toLowerCase()

I have a few ideas how to solve this:

1) Add an option whether numbers should be parsed or not

$result = SimpleXLSX::parse('file.xlsx', parseNumbers: false);

This syntax requires PHP 8.0 or later. Users of older versions would need to write:

$result = SimpleXLSX::parse('file.xlsx', false, false, false);

This could be made nicer using an options array because you can skip all options you don't want to change:

$result = SimpleXLSX::parse($data, ['debug' => true, 'is_data' => true, 'parse_numbers' => false]);
$result = SimpleXLSX::parse('file.xlsx', ['parse_numbers' => false]);

I know, the options array makes the library more complex and is only helpful for older PHP versions. So, this is totally optional IMO.

If the option is set to false numbers can still be parsed later using intval() or floatval(). So, it's still pretty flexible.

2) Store the original value in a separate field

$cell['value'] // -> 123
$cell['rawValue'] // -> "123"

Easier to implement I guess, but probably does unnecessary work and needs a bit more memory.

@shuchkin
Copy link
Owner

@alinnert
Copy link
Author

Yes, I know this. What I'm saying is: It is not great how this currently works. Converting every potential number is not a good idea. Please let us developers control what gets converted to a number and what doesn't.

@shuchkin
Copy link
Owner

There not only is_numeric, see default case:

case 's':
    // Value is a shared string
    if ((string)$cell->v !== '') {
        $value = $this->sharedstrings[(int)$cell->v];
    }
    break;

case 'str': // formula?
    if ((string)$cell->v !== '') {
        $value = (string)$cell->v;
    }
    break;

case 'b':
    // Value is boolean
    $value = (string)$cell->v;
    if ($value === '0') {
        $value = false;
    } elseif ($value === '1') {
        $value = true;
    } else {
        $value = (bool)$cell->v;
    }

    break;

case 'inlineStr':
    // Value is rich text inline
    $value = $this->_parseRichText($cell->is);

    break;

case 'e':
    // Value is an error message
    if ((string)$cell->v !== '') {
        $value = (string)$cell->v;
    }
    break;

case 'D':
    // Date as float
    if (!empty($cell->v)) {
        $value = $this->datetimeFormat ? gmdate($this->datetimeFormat, $this->unixstamp((float)$cell->v)) : (float)$cell->v;
    }
    break;

case 'd':
    // Date as ISO YYYY-MM-DD
    if ((string)$cell->v !== '') {
        $value = (string)$cell->v;
    }
    break;

default:
    // Value is a string
    $value = (string)$cell->v;

    // Check for numeric values
    if (is_numeric($value)) {
        /** @noinspection TypeUnsafeComparisonInspection */
        if ($value == (int)$value) {
            $value = (int)$value;
        } /** @noinspection TypeUnsafeComparisonInspection */ elseif ($value == (float)$value) {
            $value = (float)$value;
        }
    }

this code cover 99% common data, use 'format' from rowsEx meta data to detect formatting.

PS1: In XML there only numbers integers or floats. In start of developing this class, I decided to leave the formatting to the developers (parsed only dates)

PS2: I have correct xlsx, when colors as strings
Screenshot_917

public function testFFFFFF() {
    $xlsx = SimpleXLSX::parseFile(__DIR__ . '/../xlsx/ffffff.xlsx', true);
    print_r($xlsx->rows());
}

and I have correct results

Array
(
    [0] => Array
        (
            [0] => 000000
        )

    [1] => Array
        (
            [0] => ffffff
        )

)

PS3: Just comment

    if (is_numeric($value)) {
        /** @noinspection TypeUnsafeComparisonInspection */
        if ($value == (int)$value) {
            $value = (int)$value;
        } /** @noinspection TypeUnsafeComparisonInspection */ elseif ($value == (float)$value) {
            $value = (float)$value;
        }
    }

And deny updates from composer

@alinnert
Copy link
Author

alinnert commented Jul 15, 2023

Oh, yes, I'm sorry. The color 000000 gets treated as a string because of the 0s. My bad. But 111111 would get parsed into a number.

Anyway, I've looked a little more into the format property. In Excel I can change the format of a cell by right clicking it, select "Format Cells..." and choose another format in the dialog. There's a "Text" format. If I select that the value of $cell['format'] becomes "@". Would it make sense to check for this format in the default case? Like this:

default:
    $value = (string) $cell->v;

    /* I'm not sure how to really check for the format here. But something like this. */
    if ( $cell['format'] !== '@' && is_numeric( $value ) ) {
        /* ... do number conversion ... */
    }

Would that be ok?

EDIT:

I've looked into the source code. I see it's not that simple. format is not yet available at this point. So, that's a problem. Hm...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants