New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't read a file in same connection if file has been edited since connection open (excel XLSX for sure) #29

Closed
robe2 opened this Issue May 14, 2015 · 11 comments

Comments

Projects
None yet
3 participants
@robe2
Contributor

robe2 commented May 14, 2015

I'm not sure if this is limited to just excel and open office workbooks. I suspect it's not but those are the only ones I commonly edit. Anyway what happens is this:

If I run a query against a Foreign table of a worksheet, then edit the file in Excel or whatever, and then try to rerun the query in the same connection as the original query, I get a weird failure like.

ERROR:  unable to connect to data source "//S/E$/Survey150513.xlsx"
HINT:  GotoFileOffset failed
********** Error **********

ERROR: unable to connect to data source "//S/E$/Survey150513.xlsx"
SQL state: HV00D
Hint: GotoFileOffset failed

To work around the issue, I have to start a new postgres connection.

@pramsey

This comment has been minimized.

Show comment
Hide comment
@pramsey

pramsey May 14, 2015

Owner

That's a GDAL error, implication is that the connection handle is hanging around between calls, and maybe it shouldn't. Though, then things get nasty, since for database backends we'd like it to hang around on principle that fast things should be fast.

Owner

pramsey commented May 14, 2015

That's a GDAL error, implication is that the connection handle is hanging around between calls, and maybe it shouldn't. Though, then things get nasty, since for database backends we'd like it to hang around on principle that fast things should be fast.

@robe2

This comment has been minimized.

Show comment
Hide comment
@robe2

robe2 May 14, 2015

Contributor

I was afraid of that. So in theory nothing can be done about this aside from maybe catch the error and kill the old handle and create a new one.

Contributor

robe2 commented May 14, 2015

I was afraid of that. So in theory nothing can be done about this aside from maybe catch the error and kill the old handle and create a new one.

@pramsey

This comment has been minimized.

Show comment
Hide comment
@pramsey

pramsey Jun 16, 2015

Owner

I wonder if @rouault has any suggestions...

Owner

pramsey commented Jun 16, 2015

I wonder if @rouault has any suggestions...

@robe2

This comment has been minimized.

Show comment
Hide comment
@robe2

robe2 Jun 16, 2015

Contributor

FWIW I haven't tried this again with GDAL 2.0. pramsey did you try with GDAL 2.0 or you don't go around editing files while you are trying to query them :)

GDAL 2.0 did fix my MS access issue it seems (of being able to generate a table structure from within postgres using copy from and this hack

http://www.postgresonline.com/journal/archives/345-Import-Foreign-Schema-hack-with-OGR_FDW-and-reading-LibreOffice-calc-workbooks.html

For some reason in older GDAL this worked for everything (including SQL Server which also uses ODBC) but didn't for MS Access (though I could see the structure if just calling from commandline - extremely bizarre)

Contributor

robe2 commented Jun 16, 2015

FWIW I haven't tried this again with GDAL 2.0. pramsey did you try with GDAL 2.0 or you don't go around editing files while you are trying to query them :)

GDAL 2.0 did fix my MS access issue it seems (of being able to generate a table structure from within postgres using copy from and this hack

http://www.postgresonline.com/journal/archives/345-Import-Foreign-Schema-hack-with-OGR_FDW-and-reading-LibreOffice-calc-workbooks.html

For some reason in older GDAL this worked for everything (including SQL Server which also uses ODBC) but didn't for MS Access (though I could see the structure if just calling from commandline - extremely bizarre)

@rouault

This comment has been minimized.

Show comment
Hide comment
@rouault

rouault Jun 16, 2015

Contributor

The error message comes from the /vsizip/ layer. Most (all) OGR drivers that deal with files don't like at all that they are changed behind its back. But in that instance, it is a bit weird since the XLSX driver ingests all the content at open time and should not read the file anymore afterwards.

Contributor

rouault commented Jun 16, 2015

The error message comes from the /vsizip/ layer. Most (all) OGR drivers that deal with files don't like at all that they are changed behind its back. But in that instance, it is a bit weird since the XLSX driver ingests all the content at open time and should not read the file anymore afterwards.

@robe2

This comment has been minimized.

Show comment
Hide comment
@robe2

robe2 Jun 16, 2015

Contributor

Okay still an issue with GDAL 2.0.0. By file @rouault do you consider an MS Access database a file or a database? Excel I still get an error even in GDAL 2.0.0, but MS Access I can edit to my hearts content and see the changes in PostgreSQL without closing the file or starting a new connection.

FWIW for the excel I am using expat 2.1.0. I suppose I could try connecting to my excel file using the ODBC driver to see if I run into the same issue.

Contributor

robe2 commented Jun 16, 2015

Okay still an issue with GDAL 2.0.0. By file @rouault do you consider an MS Access database a file or a database? Excel I still get an error even in GDAL 2.0.0, but MS Access I can edit to my hearts content and see the changes in PostgreSQL without closing the file or starting a new connection.

FWIW for the excel I am using expat 2.1.0. I suppose I could try connecting to my excel file using the ODBC driver to see if I run into the same issue.

@robe2

This comment has been minimized.

Show comment
Hide comment
@robe2

robe2 Jun 16, 2015

Contributor

Well couldn't get ODBC to work (but I think that's with the funkiness of the 64-bit MS Access driver thing and the driver name change).
However I did resave my file as an XLS file and tried to edit and query and that works just fine. Can see my changes from postgres and no errors while maintaining my same connection. The XLS I am using Sandro's FreeXLS (vs. 1.0.1) - https://www.gaia-gis.it/fossil/freexl/index

So maybe something wrong with the expat logic.

Contributor

robe2 commented Jun 16, 2015

Well couldn't get ODBC to work (but I think that's with the funkiness of the 64-bit MS Access driver thing and the driver name change).
However I did resave my file as an XLS file and tried to edit and query and that works just fine. Can see my changes from postgres and no errors while maintaining my same connection. The XLS I am using Sandro's FreeXLS (vs. 1.0.1) - https://www.gaia-gis.it/fossil/freexl/index

So maybe something wrong with the expat logic.

@robe2 robe2 changed the title from Can't read a file in same connection if file has been edited since connection open (excel for sure) to Can't read a file in same connection if file has been edited since connection open (excel XLSX for sure) Jun 16, 2015

@rouault

This comment has been minimized.

Show comment
Hide comment
@rouault

rouault Jun 17, 2015

Contributor

@robe2 If you can figure out a OGR Python script (or C, C++) that reproduces the problems outside of ogr_fdw, I might consider having a closer look ;-) http://www.gdal.org/ograpispy_8h.html could be a start, but the output will need hacking (inserting a shutil.copy() at the appropriate place) since it will not detect the change of file done behind its back

I've tested the following with first.xlsx and second.xlsx of very different file sizes, and this confirms that the XLSX driver ingests all the file in memory, since it doesn't see the effect of the copy.

from osgeo import ogr
import shutil

shutil.copy('first.xlsx', 'working.xlsx')
ds = ogr.Open('working.xlsx')
lyr = ds.GetLayer(0)
i = 0
for f in lyr:
    i += 1
    pass

shutil.copy('second.xlsx', 'working.xlsx')
j = 0
lyr.ResetReading()
for f in lyr:
    j += 1
    pass

print(i == j)
Contributor

rouault commented Jun 17, 2015

@robe2 If you can figure out a OGR Python script (or C, C++) that reproduces the problems outside of ogr_fdw, I might consider having a closer look ;-) http://www.gdal.org/ograpispy_8h.html could be a start, but the output will need hacking (inserting a shutil.copy() at the appropriate place) since it will not detect the change of file done behind its back

I've tested the following with first.xlsx and second.xlsx of very different file sizes, and this confirms that the XLSX driver ingests all the file in memory, since it doesn't see the effect of the copy.

from osgeo import ogr
import shutil

shutil.copy('first.xlsx', 'working.xlsx')
ds = ogr.Open('working.xlsx')
lyr = ds.GetLayer(0)
i = 0
for f in lyr:
    i += 1
    pass

shutil.copy('second.xlsx', 'working.xlsx')
j = 0
lyr.ResetReading()
for f in lyr:
    j += 1
    pass

print(i == j)
@rouault

This comment has been minimized.

Show comment
Hide comment
@rouault

rouault Jun 17, 2015

Contributor

Actually I reproduced the issue with a slight modification of the above, by closing and reopening the file after the second copy... Filed as https://trac.osgeo.org/gdal/ticket/6005 and fixed

Contributor

rouault commented Jun 17, 2015

Actually I reproduced the issue with a slight modification of the above, by closing and reopening the file after the second copy... Filed as https://trac.osgeo.org/gdal/ticket/6005 and fixed

@robe2

This comment has been minimized.

Show comment
Hide comment
@robe2

robe2 Jun 18, 2015

Contributor

Wow cool. Cause I was embarrassed to say I'm not setup with python and GDAL together. I'll give it a try in next day or so and reopen if its still an issue. :)

Contributor

robe2 commented Jun 18, 2015

Wow cool. Cause I was embarrassed to say I'm not setup with python and GDAL together. I'll give it a try in next day or so and reopen if its still an issue. :)

@robe2 robe2 closed this Jun 18, 2015

@robe2

This comment has been minimized.

Show comment
Hide comment
@robe2

robe2 Jun 29, 2015

Contributor

Okay just tested gdal 2.0 branch and that seems to do the trick.

Thanks Even,
Regina

Contributor

robe2 commented Jun 29, 2015

Okay just tested gdal 2.0 branch and that seems to do the trick.

Thanks Even,
Regina

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment