
unable to handle 0-byte in pattern #31

@glensc

I wanted to strip control characters from input, like this PHP equivalent:

https://github.com/glensc/php-filename-normalizer/blob/d772aaad6b2a157787ae17320de5db4d3715df72/src/Normalizer.php#L30

select preg_replace('/[\x00-\x08\x0b-\x1f\x7f]/', 'a', concat('C',char(0x10),'kammkala'));
+------------------------------------------------------------------------------------+
| preg_replace('/[\x00-\x08\x0b-\x1f\x7f]/', 'a', concat('C',char(0x10),'kammkala')) |
+------------------------------------------------------------------------------------+
| aaaaaaaaa                                                                         |
+------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

First, \x00 is an escape that the PHP engine interprets; in a MySQL string literal \x is not a recognized escape sequence, so no control byte is produced and the pattern above ends up matching ordinary characters. I therefore needed a different approach. Both \0 and char(0) produce a NUL byte, but that is also the char * string terminator in C, which results in an error that the ending delimiter is missing:

mysql> select preg_replace(concat('/[', char(0), '-', 0x08, 0x0b, '-', 0x1f, 0x7f, ']/'), 'a', concat('C',char(0x10),'kammkala'));
ERROR:
No ending delimiter found
mysql> select preg_replace(concat('/[', '\0', '-', 0x08, 0x0b, '-', 0x1f, 0x7f, ']/'), 'a', concat('C',char(0x10),'kammkala'));
ERROR:
No ending delimiter found
mysql>
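The error is consistent with the pattern being scanned as a NUL-terminated string. Below is a minimal sketch in C of that failure mode. It assumes, without confirmation from the UDF source, that the closing delimiter is located with an ordinary C string function; both helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the UDF source: looks for the closing
 * delimiter while treating the pattern as a NUL-terminated C string.
 * strrchr() stops at the first NUL byte, so anything after it is invisible. */
const char *find_end_delim_cstr(const char *pattern)
{
    char delim = pattern[0];
    return strrchr(pattern + 1, delim);
}

/* Length-aware variant: scans backwards over exactly `len` bytes, so an
 * embedded NUL byte is treated like any other byte. */
const char *find_end_delim_len(const char *pattern, size_t len)
{
    if (len < 2)
        return NULL;

    char delim = pattern[0];
    for (size_t i = len - 1; i > 0; i--) {
        if (pattern[i] == delim)
            return pattern + i;
    }
    return NULL;
}
```

For the 7-byte pattern `/[`, NUL, `-`, 0x08, `]/`, the first helper returns NULL (no ending delimiter), while the second returns a pointer to the final `/`.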

So as a workaround for my problem, I'm using MySQL's native replace function to handle the NUL byte:

mysql> select replace(preg_replace(concat('/[', char(1), '-', 0x08, 0x0b, '-', 0x1f, 0x7f, ']/'), 'a', concat('C',char(0x0),'kammkala')), char(0), '!');
+-------------------------------------------------------------------------------------------------------------------------------------------+
| replace(preg_replace(concat('/[', char(1), '-', 0x08, 0x0b, '-', 0x1f, 0x7f, ']/'), 'a', concat('C',char(0x0),'kammkala')), char(0), '!') |
+-------------------------------------------------------------------------------------------------------------------------------------------+
| C!kammkala                                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

The UDF should be able to accept \0 (a NUL byte) in its input.
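One possible direction, sketched in C: MySQL's UDF interface passes each string argument together with an explicit length (the lengths array of UDF_ARGS), so the pattern could be rewritten before it reaches a NUL-terminated regex API. The helper below is hypothetical and not part of the UDF; it assumes the regex engine understands the text \x00 as an escape for a NUL byte, as PCRE does.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: copies `len` bytes of `src` into a newly allocated,
 * NUL-terminated buffer, rewriting every raw NUL byte as the four characters
 * \x00. The caller must free() the result. Returns NULL if allocation fails.
 * Limitation: a raw NUL directly preceded by a backslash in the pattern
 * would change meaning; this sketch does not handle that case. */
char *escape_nul_bytes(const char *src, size_t len)
{
    size_t nuls = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\0')
            nuls++;
    }

    /* Each NUL byte grows from 1 to 4 characters, plus the terminator. */
    char *out = malloc(len + 3 * nuls + 1);
    if (out == NULL)
        return NULL;

    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\0') {
            out[o++] = '\\';
            out[o++] = 'x';
            out[o++] = '0';
            out[o++] = '0';
        } else {
            out[o++] = src[i];
        }
    }
    out[o] = '\0';
    return out;
}
```

With this, the 7-byte pattern `/[`, NUL, `-`, 0x08, `]/` becomes a 10-character C string that starts with `/[\x00-`. On the SQL side, the same four characters can presumably be supplied directly by doubling the backslash in the literal under the default sql_mode, e.g. '/[\\x00-\\x08]/', though this is untested.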
