When I try to use TinyTds to retrieve a row containing some odd characters, it throws a TinyTds::Error exception:
ActiveRecord::StatementInvalid: TinyTds::Error: Some character(s) could not be converted into client's character set. Unconverted bytes were changed to question marks ('?'): SELECT notes FROM [dbo].[customers] WHERE ([dbo].[customers].[id] = 3893620)
The same query works using ODBC, albeit with the odd characters replaced with question marks.
It may be that TinyTds is doing the right thing here, but the net effect renders these data inaccessible. Also, the description doesn't agree with the behavior. I might suggest that TinyTds have a preference which determines whether it throws an exception or munges the invalid characters into question marks and sets a warning flag.
I'm of course happy to provide the invalid data, but I'm not sure how to extract it properly. The database is on SQL Server 2000, the collation character set is SQL_Latin1, and there are no encoding values set in my freetds or tinytds configuration files. The sql type of the column is text.
I started looking into this issue today and here is what I came up with.
1) Since this is an actual error by the DB with a severity level, it will always raise an error by default. Case in point, you want an exception if you are inserting bad data.
2) These error messages are wrapped up in two numbers/constants. First i SYBEICONVO 2402 Error converting characters into server's character set. Some character(s) could not be converted. Second SYBEICONVI 2403 Some character(s) could not be converted into client's character set. Unconverted bytes were changed to question marks ('?').
3) I start by adding a configuration that allowed end users to configure :encoding_errors to false. I then write a few test to verify default error of raising an exception with the expected code. I am not trying to see if I can allow the end user configuration to ignore on inserts. So far I am getting I am meeting some resistance and need to retool my approach.
Lastly, it could be the this will be left as default behavior. It makes total sense that you do not want bad data in the DB on inserts. I can see the silent read as a default false maybe too. But then again, maybe the real fix is to clean up old data and not use ODBC :)
I'll let you know what happens! Thanks!
Here is my progress so far. I have failed to get TinyTDS to allow me to insert bad data. I tried using an ASCII client to insert UTF8 data and no matter what, the SQL will never make it thru. I am OK with this and in actuality this is a good thing not to fight.
On the flip side, of allowing the client/result object to return badly converted data as ? marks, I have found that this is happening already. It was odd, I went thru all the work and put in a special query option to ignore encoding errors and the dberr number 2403 mentioned above. I then inserted unicode data into both an nvarchar and varchar field. I then pulled that value out with an ASCII client and it returned each value as a ? mark. So the test was prematurely passing for me.
Now a few things could be happing. (1) that master, future 0.4.0, has this fixed naturally. That I doubt. But you could test it by bundling to the git repo. (2) That my control was not of sufficient fucked upness to generate the error. (3) That my stack and/or configuration is different somehow? FYI, I have no encoding in freetds.conf and I was testing to SQL Server 2008. Maybe I should try 2000 like you are on.
Maybe you could do a little more investigation on your end? Perhaps let me know if you can insert something like a ✓ into an nvarchar column, then change you database.yml to ASCII for encoding and select that value back out?
Done deal :)
Donald, pay attention to issue #89 for more details.