Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default SQL data type for character values #102

Closed
krlmlr opened this issue May 5, 2016 · 7 comments
Closed

Default SQL data type for character values #102

krlmlr opened this issue May 5, 2016 · 7 comments
Milestone

Comments

@krlmlr
Copy link
Member

krlmlr commented May 5, 2016

https://github.com/rstats-db/DBI/pull/97/files/c4a8b60bf4b9d4661838af135a24c7f31590ddd5#r61682461

What is the most sensible default here?

  • VARCHAR(n) depends on data
  • TEXT may mean overhead by storing all character values in external BLOBS

CC @hadley @hannesmuehleisen

@hadley
Copy link
Member

hadley commented May 5, 2016

I'm pretty sure the default should be TEXT

@beanumber
Copy link

SQLite doesn't support VARCHAR(n), but it will gloss over it. So why not default to VARCHAR(n) since that will work in both MySQL and PostgreSQL?

@hannes
Copy link
Contributor

hannes commented May 5, 2016

OK I think DBI should choose CLOB for characters, because SQL 2008 says so (ISO/IEC 9075-2:2008(E), Section 6.1)

screen shot 2016-05-05 at 16 25 15

@krlmlr
Copy link
Member Author

krlmlr commented Jul 31, 2016

@hannesmuehleisen: How many DBMS actually support this? MySQL and PostgreSQL don't. (There seems to be a newer standard (2011), I doubt that much has changed there.)

TEXT is supported by SQLite, MySQL, and PostgreSQL, but not by SQL Server. I'll play it safe and keep it simple here -- return "TEXT".

@krlmlr krlmlr closed this as completed in 028bcd5 Jul 31, 2016
krlmlr pushed a commit that referenced this issue Jul 31, 2016
- Remove misleading parts in `dbConnect()` documentation (#118).
- Use `contains` argument instead of `representation()` to denote base classes (#93).
- `dbDataType()` maps `character` values to `"TEXT"` by default (#102).
- Remove misleading link in `dbDataType()` documentation.
krlmlr added a commit that referenced this issue Aug 11, 2016
- Interface changes
    - `dbDataType()` maps `character` values to `"TEXT"` by default (#102).
    - The default implementation of `dbQuoteString()` doesn't call `encodeString()` anymore: Neither SQLite nor Postgres understand e.g. `\n` in a string literal, and all of SQLite, Postgres, and MySQL accept an embedded newline (#121).

- Interface enhancements
    - New `dbSendStatement()` generic, forwards to `dbSendQuery()` by default (#20, #132).
    - New `dbExecute()`, calls `dbSendStatement()` by default (#109, @bborgesr).
    - New `dbWithTransaction()` that calls `dbBegin()` and `dbCommit()`, and `dbRollback()` on failure (#110, @bborgesr).
    - New `dbBreak()` function which allows aborting from within `dbWithTransaction()` (#115, #133).
    - Export `dbFetch()` and `dbQuoteString()` methods.

- Documentation improvements:
    - One example per function (except functions scheduled for deprecation) (#67).
    - Consistent layout and identifier naming.
    - Better documentation of generics by adding links to the class and related generics in the "See also" section under "Other DBI... generics" (#130). S4 documentation is directed to a hidden page to unclutter documentation index (#59).
    - Fix two minor vignette typos (#124, @mdsumner).
    - Add package documentation.
    - Remove misleading parts in `dbConnect()` documentation (#118).
    - Remove misleading link in `dbDataType()` documentation.
    - Remove full stop from documentation titles.
    - New help topic "DBIspec" that contains the full DBI specification (currently work in progress) (#129).
    - HTML documentation generated by `staticdocs` is now uploaded to http://rstats-db.github.io/DBI for each build of the "production" branch (#131).
    - Further minor changes and fixes.

- Internal
    - Use `contains` argument instead of `representation()` to denote base classes (#93).
    - Remove redundant declaration of transaction methods (#110, @bborgesr).
@jimhester
Copy link
Contributor

I don't think TEXT is playing it safe, It is not part of SQL-92 standard, is deprecated in SQL-Server and not supported in Oracle databases either.

If you are worried about new data not fitting into a VARCHAR(n) column, you could use something like the following, which would make the field width one power of two greater than it takes to hold the current data.

varchar_data_type <- function(x) {
  size <- 2L^(floor(log2(max(nchar(as.character(x))))) + 1L)
  paste0("VARCHAR(", size, ")")
}
varchar_data_type("t")
#> [1] "VARCHAR(2)"
varchar_data_type("te")
#> [1] "VARCHAR(4)"
varchar_data_type("test")
#> [1] "VARCHAR(8)"

@krlmlr
Copy link
Member Author

krlmlr commented Oct 12, 2016

@jimhester: This is only the fallback, DBI drivers are free to override. I think it will be difficult to find a solution that works in all cases, so TEXT may be just as good as any other option. What am I missing?

@jimhester
Copy link
Contributor

I agree it is difficult to find a good default, but I think the default should at least be part of the SQL 92 specification, this is explicitly mentioned in ?dbDataType.

The default method determines the SQL type of an R object according to the SQL 92 specification, which may serve as a starting point for driver implementations.

TEXT is not part of the standard and not implemented in a number of widely used databases.

Probably the most compatible solution would be just to use VARCHAR(255), this is very widely compatible, just about every database should support it and SQLite even translates it explicitly to its TEXT data type.

I agree this default is not terribly important because drivers can define their own method, but TEXT does not seem like a proper default.

@github-actions github-actions bot locked and limited conversation to collaborators Oct 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants