Support database encodings other than UTF8 #101
Comments
Is this supported?
and trying to connect to a non-UTF8 database shows the above error. |
No, I don't think it is. The commit actually is transcoding only on write, there's nothing for read that I see, and as you note there's still a block right at the start checking for a utf backend. |
The thing I am running into is using ogr_fdw to connect (and import) from MSSQL using LATIN1 encoding. It works fine connecting and importing and the extended characters get inserted into the DB with no errors. However running a query with results that include that text throws an error. |
It's a little fiddly because OGR cannot provide a hard guarantee that its data is UTF8. Some drivers won't provide that... dunno if the MSSQL driver does. Mostly it seems like drivers just take in whatever encoding the source has, and don't do any transcoding. I'm not sure why the commit thought it fixed this issue, since it was only one step on the way there. So your PgSQL database is LATIN1 and your MSSQL data is... something? And OGR is in the middle. Basically, even if I fixed this, if it turns out OGR isn't providing UTF in the middle it will still not work for you. |
I have a MSSQL database that I have full connection to via ODBC. I created a PS database with the default UTF8 encoding, installed ogr_fdw, and used ogr_fdw_info to get the connection parameters. Everything is fine so far. To speed up queries, I create a local copy via: I just created a new database with WIN1252 encoding (I found out later MSSQL was LATIN1) and tried again, but I could not create the extension because of the UTF check early in the code. So I ended up using ogr2ogr from MSSQL -> .csv (the extended characters are clearly there) and then copied the csv into the WIN1252 encoded PS database. This does what I want, but ogr_fdw is a much cleaner solution: to grab a fresh snapshot, just run truncate table foo; insert into foo (select * from fdw_foo); without the intermediate step of creating the .csv files. |
So probably OGR does not convert the strings it gets from MSSQL into UTF, so they are in OGR in LATIN1. |
Thanks. I like the "handy knob" idea since text output from the two databases would be the same. Otherwise the output might look like The ODBC driver is a Microsoft product so the OGR might not be able to correctly know what it is pushing. |
@szekerest could probably comment better than me what is the status of the MSSQL driver regarding UTF-8 |
@rouault According to CPLODBCStatement::Fetch as soon as the text data is stored as WCHAR (nvarchar, ntext) the returned data is converted to UTF8. The single byte string representations (CHAR) are not being converted at the moment. |
@szekerest So the driver is probably close to being able to advertise OLCStringsAsUTF8? (Although @bowguy seems to get LATIN1 strings. Perhaps UTF8 conversion is something added recently. @bowguy Which GDAL version do you use?) |
In my understanding we have single byte string columns in the database with a specific encoding. That is not converted to UTF8 automatically by the driver. Applying that conversion would require querying the database global encoding setting. |
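To make the point about unconverted single byte columns concrete: a LATIN1 byte such as 0xE9 (é) is not a well-formed UTF-8 sequence on its own, which is one plausible reason a UTF8 database would complain when such text is read back. Below is a minimal standalone sketch of a UTF-8 well-formedness check. The function name `is_valid_utf8` is made up for illustration; it is not part of GDAL, the ODBC driver, or ogr_fdw.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a GDAL or ogr_fdw function).
 * Returns 1 if the NUL-terminated string is well-formed UTF-8, else 0. */
int is_valid_utf8(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;

    while (*s != '\0') {
        size_t extra;       /* continuation bytes expected after the lead byte */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */

        if (*s < 0x80) {                    /* plain ASCII */
            s++;
            continue;
        } else if ((*s & 0xE0) == 0xC0) {   /* 2-byte sequence */
            extra = 1; cp = *s & 0x1F; min = 0x80;
        } else if ((*s & 0xF0) == 0xE0) {   /* 3-byte sequence */
            extra = 2; cp = *s & 0x0F; min = 0x800;
        } else if ((*s & 0xF8) == 0xF0) {   /* 4-byte sequence */
            extra = 3; cp = *s & 0x07; min = 0x10000;
        } else {
            return 0;                       /* stray continuation or bad lead byte */
        }
        s++;

        while (extra-- > 0) {
            /* A NUL here also fails this test, so truncated input is rejected. */
            if ((*s & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (*s & 0x3F);
            s++;
        }

        /* Reject overlong forms, out-of-range values and surrogates. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
    }
    return 1;
}
```

With this, `is_valid_utf8("caf\xE9")` (LATIN1 bytes) returns 0, while `is_valid_utf8("caf\xC3\xA9")` (the same word in UTF-8) returns 1.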
OSGEO4W installed GDAL 3.0.3
ogrfdw_version.txt shows
OGR_FDW: 1.0.9
PostgreSQL: 12 w64
Built: 20200105
GIT_REPO: https://github.com/pramsey/pgsql-ogr-fdw
GIT_BRANCH:
GIT_REVISION:
GDAL_VER: 2.4.4
postgresql 12, postgis 3.0
Path is pointing to C:\Program Files\PostgreSQL\12\bin
|
What if it does not do any conversion (which seems to happen in my case)? The extension is loaded but the check is in the server to see if the source matches the database. Then instead of "OGR FDW only works with UTF-8 databases" the CREATE SERVER throws "Error: Source is encoded in LATIN1 but database is UTF-8" Not a perfect fix - ideal is as you say convert anything from source into destination but how often will that happen? I think I am kind of an edge case and I have a workaround. |
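A rough sketch of the check proposed here, assuming the source encoding is supplied as a name (for example by the user, since OGR cannot guarantee it). The functions `encoding_names_match` and `check_source_encoding` are hypothetical names for illustration only; they do not exist in ogr_fdw, and a real version would read the database encoding from the server and report through PostgreSQL's error machinery rather than a buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: compare encoding names ignoring case and
 * '-' / '_' separators, so "UTF8" and "utf-8" are treated as equal. */
int encoding_names_match(const char *a, const char *b)
{
    for (;;) {
        while (*a == '-' || *a == '_') a++;
        while (*b == '-' || *b == '_') b++;
        if (*a == '\0' || *b == '\0')
            return *a == *b;    /* equal only if both names ended */
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
}

/* Hypothetical helper: returns 1 when the encodings match and clears buf;
 * otherwise returns 0 and writes the proposed error message into buf. */
int check_source_encoding(const char *source_enc, const char *db_enc,
                          char *buf, size_t buflen)
{
    if (encoding_names_match(source_enc, db_enc)) {
        if (buflen > 0)
            buf[0] = '\0';
        return 1;
    }
    snprintf(buf, buflen, "Source is encoded in %s but database is %s",
             source_enc, db_enc);
    return 0;
}
```

Calling `check_source_encoding("LATIN1", "UTF-8", msg, sizeof msg)` returns 0 and leaves the message from the comment above in `msg`. Note that this sketch does not know about aliases such as LATIN1 versus ISO-8859-1; that mapping would need its own table.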
Yes, just stripping out the hard check for a UTF database might work...
|
I'll see if I can set up a build environment for it. (Argh, windows.) |
Made the changes but running into issues with GDAL versions. It looks like the windows version of postgis 3.0 is using an old gdal version (2.4.4 - see above). @robe2 is there a build of postgis 3.0 with GDAL 3.0 ? |
Success. I had to compile gdal 2.4.4 from source in MSYS2 because the default PostGIS installation (PG12,PG3) uses gdal 2.4.4 while everything else on windows is gdal 3.0. Then I could compile pgsql_ogr_fdw in MSYS2 and copy the files over. |
I'm planning to ship postgis 3.0.3 Windows bundle with GDAL 3.0. I think I've worked out the kinks already i was having with EDB library conflicts. If I haven't already I'll flip winnie to start building ogr_fdw with GDAL 3.0 and PostGIS 3.0 branch to build with 3.0. Aside from that sounds like no changes needed in ogr_fdw to support encoding? Can we close this out then @bowguy @pramsey @rouault |
@robe2 if you can ship 3.0.3 postgis with GDAL 3.0 I will test it ASAP. Maybe EDB 13? |
Correction to previous comment. |
How about removing the five lines and mark this fixed? For me it is working for encodings other than UTF8 just fine. I think there is something else going on with the encoding for me. |
Just for anyone looking at this. As discussed on PostGIS mailing lists, I have PG 13, PostGIS 3.0.2 (includes latest released ogrfdw) up on stackbuilder and in http://download.osgeo.org/postgis/windows/pg13/ but as I discovered my compile has no dependency on libiconv - https://lists.osgeo.org/pipermail/postgis-devel/2020-October/028655.html which @bowguy pointed out might be the issue. I'll report back on this after I have recompiled GDAL (and gotten libiconv back in there) |
Progress on my issue. I have LATIN1 encoding in my ODBC database. Some of the characters in a string are wchar which must be converted to multi byte characters for UTF8. This can be done with Note this no longer is part of the original issue, 'Support database encodings other than UTF8' (which is fixed and closed) so it might be better to move to another issue. |
Success! CPLRecode(cstr_in, CPL_ENC_ISO8859_1, CPL_ENC_UTF8) works perfectly for me. Now I need to test for the source encoding and destination encoding. The only guaranteed supported encodings are CPL_ENC_UTF8, CPL_ENC_ASCII and CPL_ENC_ISO8859_1. Currently, the following conversions are supported : |
If I add |
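For readers who want to see what the `CPLRecode(cstr_in, CPL_ENC_ISO8859_1, CPL_ENC_UTF8)` call mentioned above amounts to for this one encoding pair, here is a standalone sketch that does not depend on GDAL. It is not GDAL's implementation; `latin1_to_utf8` is a made-up name. It relies on the fact that every ISO-8859-1 byte maps directly to the Unicode code point of the same value, so bytes at or above 0x80 become two-byte UTF-8 sequences.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of GDAL or ogr_fdw).
 * Converts a NUL-terminated ISO-8859-1 string to a newly allocated
 * UTF-8 string. The caller frees the result with free().
 * Returns NULL if the allocation fails. */
char *latin1_to_utf8(const char *in)
{
    const unsigned char *s = (const unsigned char *)in;
    /* Each Latin-1 byte expands to at most two UTF-8 bytes. */
    char *out = malloc(2 * strlen(in) + 1);
    char *p = out;

    if (out == NULL)
        return NULL;

    for (; *s != '\0'; s++) {
        if (*s < 0x80) {
            /* ASCII range is identical in both encodings. */
            *p++ = (char)*s;
        } else {
            /* U+0080..U+00FF: lead byte 110xxxxx, continuation 10xxxxxx. */
            *p++ = (char)(0xC0 | (*s >> 6));
            *p++ = (char)(0x80 | (*s & 0x3F));
        }
    }
    *p = '\0';
    return out;
}
```

For example, `latin1_to_utf8("caf\xE9")` yields the bytes `63 61 66 C3 A9`. In the extension itself the GDAL call is the better choice, since it also covers the other encodings GDAL was built to support.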
Some people do run PostgreSQL databases with non-default encodings (though it's pretty rare in production, I hope, so this is a low priority issue). For them, there's really no solution except to add transcoding in the FDW extension. Fortunately, GDAL has an internal UTF model, so it's possible to count on that as a fixed reference point, otherwise it would just be too ugly.