Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbReadTable/dbGetQuery returns Invalid Descriptor Index error on SQL server 2016 with NAV tables #112

Closed
simsonas opened this issue Sep 22, 2017 · 19 comments

Comments

@simsonas
Copy link

I found similar issue with Invalid Descriptor Index error, and the issue was with column type varchar(max), but I cant find this type of column in my case (see attached picture), using select * statement I cant load any Navision table.

odbc_error

Example:

library(odbc)
library(DBI)

con <- DBI::dbConnect(odbc::odbc(),
                      Driver   = "ODBC Driver 13 for SQL Server",
                      Server   = "xx",
                      Database = "xx",
                      UID      = "xx",
                      PWD      = "xx*",
                      Port     = 1433)

Bin <- dbReadTable(con, 'DM8$Bin Type')
# OR
Bin <- dbGetQuery(con, "SELECT * FROM [DM8$Bin Type]")

Result:

Error in result_fetch(res@ptr, n, ...) : 
  nanodbc.cpp:3102: 07009: [Microsoft][ODBC Driver 13 for SQL Server]Invalid Descriptor Index 

However, selecting certain columns, query returns data.frame. RODBC is working without an issue, I really want switch to ODBC for additional features.

Session Info
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DBI_0.6-1            odbc_1.0.1           RevoUtilsMath_10.0.0

loaded via a namespace (and not attached):
[1] compiler_3.4.0   RevoUtils_10.0.4 tools_3.4.0      tibble_1.3.3     Rcpp_0.12.11     blob_1.1.0       rlang_0.1.1     
@jimhester
Copy link
Contributor

I am not sure what is going on, did you try returning only a subset of the columns in that table? Does the order of the columns matter?

@simsonas
Copy link
Author

simsonas commented Sep 27, 2017

Order of column does matter, I put timestamp field to the end of select statement and it successfully loads table:

Bin <- dbGetQuery(con, "SELECT [Code], [Receive], [Ship], [Put Away], [Pick], [timestamp] FROM [DM8$Bin Type]")

I suspect timestamp field to be the problem here, as this field is present in every table and I cant load any table using select * statement. Timestamp here has nothing to do with actual date, it is more of rowversion and it looks like this: "0x00000000000007D3" more information here

@jimhester
Copy link
Contributor

Unfortunately there doesn't seem to be a way for Applications to fix this, it is really a Driver bug / feature. The only workaround I am aware of is to put these columns at the end of the query.

@r2evans
Copy link

r2evans commented Aug 20, 2018

I can see this as nothing other than deeply saddening for the state of database access in R. Reliance on a "standard" (ODBC) that hand-cuffs SQL so that the query-er must possess insider knowledge of the table's column constructs so that ordering the fields must be done according to a strict set of rules ...

ODBC does not appear to give any indication that there will be a problem. If you look at dbColumInfo(res) for the query about to be fetched, two (or more) fields can both say "-9", but one must always be after the other. No indication that this condition exists until failure happens, and in the presence of 3 or more columns, no hint is provided as to the true source column of the problem.

@jimhester, is there hope for order-agnostic queries against ODBC data sources? I can make a query against a table in (say) postgres that works, but in sqlserver it fails due solely to the order of columns. This (to my understanding of SQL) defeats the "simple" of SQL as well as the assumption of "standard". I can (and do) translate some things between DBMS's, such as top versus limit, but this is not verb or adjective SQL tokens ...

@jimhester
Copy link
Contributor

Certain drivers fix this, e.g. the RStudio professional drivers and those from Simba handle this situation transparently to the odbc package.

I don't know of a way to do this automatically in the odbc package, but perhaps it could be done either here or in nanodbc.

@r2evans
Copy link

r2evans commented Aug 20, 2018

@jimhester, I've re-written this comment about seventeen times, and each time I delete it because it feels like I'm just complaining (and you are not the problem). RStudio's fix is both encouraging (that a fix/workaround exists) and frustrating (linux only, and my reliable-data-access is effectively behind a ransom-wall).

Though this really should be fixed in ODBC itself, since MS has done nothing over the last decade the bug has been clearly documented, it needs to be done in upper levels. Nanodbc disagrees that it is theirs to fix.

I'm interested in finding a way to fix this somewhere, inviting nanodbc to use similar techniques to address the problem there. Unfortunately, I'm not familiar enough with nanodbc low-level functions or the ODBC protocol to find enough info to do this.

Specifically, I can find no way to know that this will be a problem. For instance:

library(DBI)
con <- dbConnect(odbc::odbc(), ...)
dbExecute(con, "create table tbl1 ( id BIGINT PRIMARY KEY NOT NULL, s1 varchar(32), s2 varchar(max) )")
dbExecute(con, "insert into tbl1 values (1,'a','a'),(2,'b','b')")
ret <- dbGetQuery(con, "select s1,s2 from tbl1") # works
dbGetQuery(con, "select s2,s1 from tbl1") # fails
# Error in result_fetch(res@ptr, n, ...) : 
#   nanodbc/nanodbc.cpp:2836: 07009: [Microsoft][ODBC Driver 13 for SQL Server]Invalid Descriptor Index 
res <- dbSendStatement(con, "select s2,s1 from tbl1")
ci <- dbColumnInfo(res)
ci
#   name type
# 1   s2   12
# 2   s1   12
dbClearResult(res)

I don't know, given that ci, which is a *(max) variable that needs to be moved to the end. So the first problem is: how to determine if we need to fix a column? Are there additional meta-columns available similar to dbColumnInfo's name/type?

Once that's been identified, what can we do with that information? Is there a way to rearrange the columns post-dbSendStatement to accommodate this constraint? Do we have to go back to the query and rearrange the order of requested columns? (This avenue seems a bit complicated with SQL-parsing, dealing with column as newname renaming, dealing with *, etc.)

I'm willing to do a bit of foot-work with this, but (1) it might need to be a new issue? (2) Can you provide some pointers on how to "play" around the column-metadata, possible reordering, or something else?

@r2evans
Copy link

r2evans commented Aug 20, 2018

I'm doing some testing, and I thought I'd try modifying odbc_result::result_to_dataframe to go through the for (size_t col = 0; col < types.size(); ++col) twice. Since "getting long data" suggests they need to be the last data retrieved (and retrieved in order), I thought I could simply unroll the loop a little and grab the contents based on a "0" column size (suggesting "max"?):

  while (!complete_) {
  ...
    for (size_t col = 0; col < types.size(); ++col) {
      if (r.column_size(col) == 0) continue;
      ...
    }
    for (size_t col = 0; col < types.size(); ++col) {
      if (r.column_size(col) != 0) continue;
      ...
    }
  }

but it's still failing. With debugging enabled, I've confirmed that on each row, it calls the "not-large" columns first and then all "large" columns in-order, but on the first row it still errs:

nanodbc/nanodbc.cpp:2820 SQLGetData(handle, column + 1, col.ctype_, buffer, buffer_size, &ValueLenOrInd)
<-- rc: SQL_ERROR | 
nanodbc/nanodbc.cpp:334 SQLGetDiagRec(handle_type, handle, (SQLSMALLINT)i, sql_state, &native_error, 0, 0, &total_bytes)
<-- rc: SQL_SUCCESS | 
nanodbc/nanodbc.cpp:352 SQLGetDiagRec(handle_type, handle, (SQLSMALLINT)i, sql_state, &native_error, sql_message.data(), (SQLSMALLINT)sql_message.size(), &total_bytes)
<-- rc: SQL_SUCCESS | 
Error in result_fetch(res@ptr, n, ...) : 
  nanodbc/nanodbc.cpp:2837: 07009: [Microsoft][ODBC Driver 13 for SQL Server]Invalid Descriptor Index 

I suspect that we need to be including SQLFetchScroll in the "blob" (large-column) (based on the MS link above). Thoughts? Experience with this?

@r2evans
Copy link

r2evans commented Aug 24, 2018

Interesting ...

For the record, I get this error with drivers "ODBC Driver 13 for SQL Server" and "SQL Server Native Client 11.0", but not with "SQL Server".

                            name APILevel ConnectFunctions CPTimeout DriverODBCVer FileUsage SQLLevel UsageCount
1                     SQL Server        2              YYY        60         03.50         0        1          1
20 ODBC Driver 13 for SQL Server        2              YYY        60         03.80         0        1          1
28 SQL Server Native Client 11.0        2              YYY        60         03.80         0        1          1

So the driver that I'm inferring is older (because of the older DriverODBCVer) works just fine, but the newer ones (installed with SSMS and updated recently by Microsoft) are broken.

@retorquere
Copy link

Where do I get the older SQL Server driver? I'm trying to convince my SQL-only team mates to join the RMarkdown revolution, but this is not helping.

@r2evans
Copy link

r2evans commented Jun 15, 2020

@retorquere, you might consider using the FreeTDS driver. I've successfully tested it on Windows and linux. The pros/cons so far:

Pros:

  • does not succumb to this "large field problem".

Cons:

  • allegedly slower, I've seen 10-15% but have not benchmarked it;
  • when used with pool, the pool package does not detect a disconnect properly so does not attempt to reconnect;
  • on Windows, I cannot get it to recognize SSL libraries so that I can enable an encrypted connection.

Update (Sep 2020): since I wrote this, I've tried FreeTDS on-and-off. Recently, using emacs-27.1, R-4.0.2, in Emacs/ESS on win10, it (somehow) eventually freezes emacs after several minutes, and it is affecting the whole system. If I had to guess, it's a memory leak that bogs everything else to a crawl. When I use Microsoft's ODBC driver, I get no instability.

@retorquere
Copy link

@r2evans thanks. The pool package is not much of an issue for us, and since we're using it for fairly simple RMarkdown documents the performance hit is likely tolerable, but most of my colleagues are on Windows, and we're connecting (a.o.) to Azure databases; I can't imagine Azure would allow non-secured connections.

@retorquere
Copy link

(but to test: how would I set up a FreeTDS connection? I'm on macOS myself)

@r2evans
Copy link

r2evans commented Jun 16, 2020

I don't know how to manage odbc drivers on macos, sorry. I was only able to find FreeTDS' appveyor jobs, which is how I found the windows binaries for installation, I wonder if there's an analog for macos. Once you have that, though, you use it just like you would another driver, using the driver="FreeTDS" instead of driver="ODBC Driver 13 for SQL Server" (or whichever version you're using).

Frankly, if pool weren't a problem, I would be completely switched over. Since much of my work suffers hard when a connection hiccups (e.g., RStudio Connect server), I'd rather not implement my own re-connect logic ... so I'm currently stuck with MS's odbc driver. (Which means that I'm already working around the "long data fields" fiasco ...)

@r2evans
Copy link

r2evans commented Jun 16, 2020

@retorquere, I would be very interested if somebody figures out how to get the freetds.dll registered driver in windows to recognize SSL libraries for the encrypted connection. Yes, I'd imagine that azure databases do (or at last should) require encrypted connections. In a fix, though, do you have the ability to ssh-tunnel in and connect to it as locally as it'll permit? That mitigates some of the encryption concern. (I don't know if azure instances allow ssh-tunneling.)

@retorquere
Copy link

I haven't seen anything that would indicate we can do this.

@nickv9
Copy link

nickv9 commented Jun 16, 2020

Since much of my work suffers hard when a connection hiccups (e.g., RStudio Connect server), I'd rather not implement my own re-connect logic ... so I'm currently stuck with MS's odbc driver. (Which means that I'm already working around the "long data fields" fiasco ...)

@r2evans I thought this problem does NOT occur when using RStudio's commercial drivers?See Jim Hester's comment

Specifically, this statement:

The only driver for SQL Server I know of which does is the ones supplied by Simba which are bundled with the Pro versions of RStudio products.

@r2evans
Copy link

r2evans commented Jun 16, 2020

First, the comment that suggests FreeTDS is going to have problems with "long data" because it doesn't implement his SQLGetData logic does not pan out: using FreeTDS, I can query the columns in any order, in ways that will cause the MS driver to fail. I don't know that FreeTDS implements both of those functions (that Jim references), but something it is doing is less-wrong than MS's driver. (There are other problems with FreeTDS, though ... namely the pool problem I mentioned above ...)

As to using RStudio pro drivers ... I had been working for way too long before those drivers were available to me (before I purchased RSC), that switching was effectively not the immediate priority. And the fact that those drivers are not for Windows means that I'm out of luck: RStudio pro drivers do not support Windows. Joy.

@nickv9
Copy link

nickv9 commented Jun 17, 2020

To give you some context behind my questions, I am writing about the problem and solutions in this WIP article: Debugging the Invalid Descriptor Index Error.

In regards to this statement...

RStudio pro drivers do not support Windows. Joy.

The Requirements for RStudio Professional Drivers page does say they support Windows. If this isn't a viable solution for Windows users, I'd like to update my post. I'm curious, have you attempted to use the SQL driver and found it didn't work on Windows?

@r2evans
Copy link

r2evans commented Jun 17, 2020

I thought it used to say "windows not supported", but effectively the same for me:

The drivers are installed using RStudio Desktop Pro

I have RSC, not RStudio Desktop Pro. I tried downloading the (1.2GB!) tarfile anyway, but no DLLs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants