Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[New bug] Issues with some weird characters inserted into DB #3423

Open
1 task done
ArranTuna opened this issue Jun 1, 2024 · 12 comments
Open
1 task done

[New bug] Issues with some weird characters inserted into DB #3423

ArranTuna opened this issue Jun 1, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@ArranTuna
Copy link
Collaborator

ArranTuna commented Jun 1, 2024

Describe the bug

After updating MTA server I started getting warnings about unable to insert into MySQL such as:

[script] FAIL: (1366) Incorrect string value: '\xED\xA0\xBD\xED\xB8\x81' for column 'msg' at row 1 [Query:INSERT INTO my_history (time, acc, msg) VALUES (1717270392, 'Arran', '😁')]

I narrowed it down to r22396 that has MySQL changes: https://buildinfo.multitheftauto.com/index.php?Revision=22396&Branch=

Steps to reproduce

  1. Windows 64 bit server with v1.6-release-22396 running.
  2. MySQL server you can dbConnect to.
  3. srun dbcon = dbConnect("mysql", "dbname=dbnamehere;host=localhost", "user", "password")
  4. srun dbExec(dbcon, "INSERT INTO table (message) VALUES (?)", "😁")
  5. Get error like FAIL: (1366) Incorrect string value: '\xED\xA0\xBD\xED\xB8\x81' for column 'msg' at row 1

Note that I have the MySQL table set to utf8mb4 and have also tried passing charset=utf8mb4 with dbConnect but it makes no difference. Only using the revision before this fixes it. I checked what gets inserted in the previous version and MySQL workbench shows: í ½í¸� for 😁

Version

Windows 64 bit v1.6-release-22396

Additional context

No response

Relevant log output

No response

Security Policy

  • I have read and understood the Security Policy and this issue is not security related.
@ArranTuna ArranTuna added the bug Something isn't working label Jun 1, 2024
@ArranTuna
Copy link
Collaborator Author

Also seems to be having issues loading from the DB as I was sent this:

KW5vs9q

@theSarrum
Copy link
Contributor

I can't reproduce it on r22493 (x64).

DB:
image
Code:

dbcon = dbConnect("mysql", "dbname=testdb;host=localhost", "root", "")
dbExec(dbcon, "SET NAMES utf8mb4")
dbExec(dbcon, "CREATE TABLE IF NOT EXISTS `table` (message TEXT)")
dbExec(dbcon, "INSERT INTO `table` (message) VALUES (?)", "😁")

Result:
image

Another test

Code:

dbcon = dbConnect("mysql", "dbname=testdb;host=localhost", "root", "")
dbExec(dbcon, "SET NAMES utf8mb4")

function query(...)
	local queryHandle = dbQuery(dbcon, ...)
	if (not queryHandle) then
		return nil
	end
	local rows = dbPoll(queryHandle, -1)
	return rows
end

iprint(query("SELECT message from `table`"))
iprint(query("SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'"))

Result:

INFO: { {
    message = "😁"
  } }
INFO: { {
    Value = "utf8mb4",
    Variable_name = "character_set_client"
  }, {
    Value = "utf8mb4",
    Variable_name = "character_set_connection"
  }, {
    Value = "utf8mb4",
    Variable_name = "character_set_database"
  }, {
    Value = "binary",
    Variable_name = "character_set_filesystem"
  }, {
    Value = "utf8mb4",
    Variable_name = "character_set_results"
  }, {
    Value = "latin1",
    Variable_name = "character_set_server"
  }, {
    Value = "utf8",
    Variable_name = "character_set_system"
  }, {
    Value = "utf8mb4_general_ci",
    Variable_name = "collation_connection"
  }, {
    Value = "utf8mb4_unicode_ci",
    Variable_name = "collation_database"
  }, {
    Value = "latin1_swedish_ci",
    Variable_name = "collation_server"
  } }

@ArranTuna
Copy link
Collaborator Author

Hmm, I wonder if the MySQL version running on the server being outdated could be causing the problem with the updated version of MTA. What MySQL version do you have running?

@botder
Copy link
Member

botder commented Jun 6, 2024

I couldn't reproduce the issue. I've used MySQL Community Server 8.4.0 with a utf8mb4_bin collation table and an MTA Windows x64 server.

@botder
Copy link
Member

botder commented Jun 6, 2024

The commit 31c68fd is unrelated. I simply couldn't connect to my local MySQL server without this option.

@theSarrum
Copy link
Contributor

What MySQL version do you have running?

8.3.0, but I just tested on a very old version (5.7.23) and it's ok too.

@ArranTuna
Copy link
Collaborator Author

I know why this couldn't be reproduced as I just tried the same code in a script and it worked fine, but when done through runcode it fails. This is just so weird though, that this code in runcode would work fine on... now this gets even more weird as I can now reproduce this bug on older versions that I'm sure worked fine, I'm going to try older client version to see if the problem is that the MTA client is sending garbled characters. I just tried with the oldest available nightly, 22388 and that also gives the error.

I've found a possible clue:
srun utf8.byte("😁")
Command results: 55357 [number]

But when the same character is done in a script:
outputChatBox("UTF8.Byte: "..utf8.byte("😁"))
UTF8.Byte: 3824460688

So this bug maybe have nothing to do with MySQL but when a player enters certain characters in MTA in their console, GUI, etc they are handled incorrectly.

I've tried in client side runcode and client side file and got:
Executing client-side command: utf8.byte("😁")
Command results: 55357 [number]
In client side file: 1026006976

@ArranTuna
Copy link
Collaborator Author

This could be why:
srun utf8.byte("😁", 1, 2)
Command results: 55357 [number], 56833 [number]

Whereas that executed in a script file:
local a, b = utf8.byte("😁", 1, 2)
outputChatBox("utf8-byte: "..tostring(a).." "..tostring(b))
utf8-byte: 3964768944 nil

On the client that 1 character is being split into 2. I just wish there was a way to fix the MySQL insertion errors, like what about utf8_bin?

@botder
Copy link
Member

botder commented Jun 7, 2024

For your information, the MTA client has a broken UTF16 to UTF8 conversion when the UTF16 string contains surrogate pairs. Use the function below to fix the broken UTF8 string:

function utf8_decode_utf16_surrogate_pairs(text)
    local characters = {}
    local highSurrogate = 0
    local length = 0

    for position, codepoint in utf8.next, text do
        if highSurrogate > 0 then
            if codepoint >= 0xDC00 and codepoint <= 0xDFFF then
                codepoint = (highSurrogate - 0xD800) * 0x400 + (codepoint - 0xDC00) + 0x10000

                length = length + 1
                characters[length] = utf8.char(codepoint)
            end

            highSurrogate = 0
        elseif codepoint >= 0xD800 and codepoint <= 0xDBFF then
            highSurrogate = codepoint
        else
            length = length + 1
            characters[length] = utf8.char(codepoint)
        end
    end

    return table.concat(characters)
end

@Daemant
Copy link

Daemant commented Jun 24, 2024

I see character_set_server and collation_server as latin1.
You can try to change server's collation to utf8mb4, editing my.ini like as here:

[mysqld]
collation_server = utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4'
character_set_server = utf8mb4

@ArranTuna
Copy link
Collaborator Author

Thanks for that code botder, it works for hack fixing this bug.

@botder
Copy link
Member

botder commented Jun 24, 2024

If my function worked for you, then this isn't an actual issue with the database nor the libraries used, but with user input conversion to UTF-8. Your database protected you from inserting garbage UTF-8 bytes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants