Issue with semicolon within field values in MySQL dump files #21

Open
Ara4Sh opened this issue May 22, 2024 · 2 comments

Comments


Ara4Sh commented May 22, 2024

Hello,

I was experimenting with the library recently and noticed an issue when a value in a MySQL dump file contains a semicolon. Specifically, the application stops obfuscating when it encounters a semicolon within the data. This is expected: since the statement delimiter is a semicolon, a semicolon inside a value completely changes the regex matching behaviour.

Example:

(11,'Sirius','Black','8009008090','sirius.black@gryffindor.co.uk','wizard','known as Padfoot; the last heir of the House of Black; son of Orion and Walburga Black','1959-11-03 10:32:27','1996-06-18 02:07:56'),

As we already know, the --hex-blob option only works with binary data. To address this, I tried preprocessing the dump file by changing the delimiter from a semicolon to another character. Then I modified the make_insert_statement() and rows_to_be_inserted() methods in the MySQL module, as well as the parse() method in the InsertStatementParser, to make it work. After processing, I reverted the delimiter back to a semicolon.

My question is: do you have any experience dealing with such issues? Would it be a good idea to make the delimiter configurable within the library?

Thank you for creating this library and contributing it to the open-source community.

Best regards,
Arash

Owner

josacar commented May 23, 2024

Hi,

Yes, I broke support for semicolons inside values when I added support for MariaDB multi-line dumps.

In the first phase, the statements are read from the file. Previously this was done line by line (the original mysqldump format), so everything worked except for MariaDB dumps. To fix that, I changed it to read until the ; character in order to get a full SQL statement, but that was too naive: it stopped at any ;, even one inside a string value.
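To illustrate the failure mode described above, here is a minimal Python sketch (not triki's actual code, which is written in Crystal). It contrasts a naive split on ; with a small scanner that ignores semicolons inside single-quoted strings. The function name split_statements and the sample table t are hypothetical, and the scanner assumes MySQL-style escaping (backslash escapes and doubled quotes) with no comments or custom DELIMITER lines.

```python
def split_statements(sql: str) -> list[str]:
    """Split SQL text on ';' characters that lie outside single-quoted strings."""
    statements = []
    current = []
    in_string = False
    i = 0
    while i < len(sql):
        ch = sql[i]
        current.append(ch)
        if in_string:
            if ch == "\\" and i + 1 < len(sql):
                # Backslash escape: copy the next character without inspecting it.
                current.append(sql[i + 1])
                i += 1
            elif ch == "'":
                if i + 1 < len(sql) and sql[i + 1] == "'":
                    # A doubled quote ('') is an escaped quote, not the end of the string.
                    current.append("'")
                    i += 1
                else:
                    in_string = False
        elif ch == "'":
            in_string = True
        elif ch == ";":
            # A semicolon outside a string terminates the statement.
            statements.append("".join(current).strip())
            current = []
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


dump = (
    "INSERT INTO t VALUES (1,'known as Padfoot; the last heir');\n"
    "INSERT INTO t VALUES (2,'x');\n"
)
naive = dump.split(";")          # cuts the first statement in the middle of the value
aware = split_statements(dump)   # keeps each statement whole
```

The naive split produces fragments that no longer match an INSERT pattern, which is consistent with the obfuscation stopping. The scanner visits every character, so it is likely slower than matching on a fixed terminator.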

So I pushed a 'quick fix' with a test that reads up to );\n. It is not perfect, but it may be faster than a regex or a full SQL parser.
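As a rough illustration of the "read up to );\n" idea, here is a Python sketch (again, not the actual Crystal patch). The name read_statements and the terminator parameter are hypothetical. It accumulates lines until one ends with the terminator, so semicolons inside values no longer end a statement early.

```python
import io
from typing import Iterator, TextIO


def read_statements(stream: TextIO, terminator: str = ");\n") -> Iterator[str]:
    """Yield chunks of the stream, each ending at a line that ends with the terminator."""
    buffer = []
    for line in stream:
        buffer.append(line)
        if line.endswith(terminator):
            yield "".join(buffer)
            buffer = []
    if buffer:
        # Whatever is left did not end with the terminator; emit it as-is.
        yield "".join(buffer)


dump = (
    "INSERT INTO `t` VALUES\n"
    "(1,'a; b'),\n"
    "(2,'c');\n"
    "INSERT INTO `t` VALUES (3,'d;');\n"
)
statements = list(read_statements(io.StringIO(dump)))
```

The limits of this sketch: a statement that does not end in ); (for example a DROP TABLE line) is merged into the following chunk, and a value that itself contains ); followed by a raw newline would still split early.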

Let me know if this works for you.

Author

Ara4Sh commented Jun 10, 2024

I tried the new patch with a simple table whose fields contain semicolons. Triki just prints the input back unchanged, without any errors. I tried both using triki from shards and requiring it directly from within the directory (require "./src/triki").
I will debug it further and post the results.
