Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for case sensitive identifiers #2863

Open
cberner opened this issue May 4, 2015 · 32 comments
Open

Add support for case sensitive identifiers #2863

cberner opened this issue May 4, 2015 · 32 comments

Comments

@cberner
Copy link
Contributor

@cberner cberner commented May 4, 2015

  • We need to record the fact that identifiers are quoted/not-quoted in the sql
  • We need to extend the SPI, so that connectors can specify whether table/column..etc identifiers are case sensitive
  • This should work for fields in row types also
  • Rule should be that if either side is case insensitive then the match is done ignoring case
  • For case insensitive matching, if more than one item matches, throw an ambiguity exception
@jamiemccrindle
Copy link

@jamiemccrindle jamiemccrindle commented May 12, 2015

+1

@electrum
Copy link
Contributor

@electrum electrum commented May 12, 2015

I don't think the last one is correct. Case insensitive matching should only happen if the target is the correct case.

You can have multiple identifiers differing only in case. Case insensitive matching should match at most one.

@dain
Copy link
Contributor

@dain dain commented May 12, 2015

@electrum correct. I updated the description.

@XiLongZheng
Copy link

@XiLongZheng XiLongZheng commented Jul 27, 2015

Is there any plan to get this fixed so I could use Upper case in table name?

@dain
Copy link
Contributor

@dain dain commented Jul 27, 2015

I took a look at this a few weeks back. The problem is the change is spread throughout the codebase and in parts that are actively being changed. After @martint's planner changes are in and @haozhun's changes to field dereferencing, the change should be doable.

@ashish6976
Copy link

@ashish6976 ashish6976 commented Aug 17, 2015

Is there any other way to use upper case in database name and table name while querying from presto

@dain
Copy link
Contributor

@dain dain commented Aug 18, 2015

@ashish6976 This issues is about changing the current Presto behavior to support full case sensitive identifiers.

From the users perspective, the current behavior of only supporting case insensitive identifiers should just work, unless you happen to have an existing system containing identifiers that only differ in case. If you are seeing failures when there is only one identifier, please file a new issue since that is a bug.

@ashish6976
Copy link

@ashish6976 ashish6976 commented Aug 18, 2015

@dain Here are the scenarios in which I am facing isssues

My MYSQL Server is running on Centos.
I am using presto to query my MYSQL database using MYSQL connector where my catalog name is mysql.

Scenario 1 - DataBase name and Table name is combination of upper case and lower case letters

Database Name - TestDB
Table Names - EmployeeDetails, EmployeeTable
Query 1 - show schemas from mysql;
Output -
Schema

information_schema
performance_schema
testdb
(3 rows)

Query 20150818_064410_00003_837eu, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [3 rows, 61B] [25 rows/s, 524B/s]

Query 2 - show tables from mysql.testdb;
Output -
Table

(0 rows)

Query 20150818_064532_00004_837eu, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

In this case presto is not able to Fetch the table names which are present in database TestDB.

The Mysql output
mysql> show tables from TestDB;
+------------------+
| Tables_in_TestDB |
+------------------+
| EmployeeDetails |
| EmployeeTable |
+------------------+
2 rows in set (0.00 sec)

Scenario 2 - DataBase name is in lower case , Table name is combination of upper case and lower case letters

Database Name - lowercasedb
Table Names - TableOne, TableTwo
Query 1 - show schemas from mysql;
Output -
Schema

information_schema
lowercasedb
performance_schema
testdb
(4 rows)

Query 20150818_065347_00005_837eu, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [4 rows, 77B] [27 rows/s, 522B/s]

Query 2 - show tables from mysql.lowercasedb;
Output -
Table

tableone
tabletwo
(2 rows)

Query 20150818_065432_00006_837eu, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [2 rows, 66B] [15 rows/s, 505B/s]

Query 3 - select * from mysql.lowercasedb.tableone;
Output -
Query 20150818_065535_00007_837eu failed: Table mysql.lowercasedb.tableone does not exist

In this scenario presto is able to fetch the table names but when I am accessing the table the its giving me an error as shown above.

The Mysql output
mysql> select * from lowercasedb.TableOne;
+-----------+-----------+
| ColumnOne | ColumnTwo |
+-----------+-----------+
| 1 | Row 1 |
| 2 | Row 2 |
+-----------+-----------+
2 rows in set (0.00 sec)

Scenario 3 - DataBase name and Table name is in lower case letters

Database Name - lowercasedb
Table Names - tableone, tabletwo
Query 1 - show schemas from mysql;
Output -
Schema

information_schema
lowercasedb
lowercasetabledb
performance_schema
testdb
(5 rows)

Query 20150818_070234_00008_837eu, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [5 rows, 98B] [30 rows/s, 597B/s]

Query 2 - show tables from mysql.lowercasetabledb;
Output -
Table

tableone
tabletwo
(2 rows)

Query 20150818_070253_00009_837eu, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [2 rows, 76B] [17 rows/s, 652B/s]

Query 3 - select * from mysql.lowercasetabledb.tableone;
Output -
columnone | columntwo
-----------+-----------
1 | Row 1
2 | Row 2
(2 rows)

Query 20150818_070319_00010_837eu, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [2 rows, 0B] [8 rows/s, 0B/s]

In this scenario I am able to access the tables in the database.

@dain
Copy link
Contributor

@dain dain commented Aug 18, 2015

@ashish6976 This is a bug in the mysql connector. Please open a new issue for the mysql connector.

@saileshmittal
Copy link

@saileshmittal saileshmittal commented Feb 13, 2016

Any progress on this?

@samanthrao
Copy link

@samanthrao samanthrao commented Nov 10, 2016

I have the same issue as @ashish6976 but in my case the database is MongoDB. Some of the database names and collections are in title case per company standards. When Presto loads the catalog, it converts everything to lowercase and results in zero records being processed. The only thing that works is, when a collection name or database name is lowercase.

Are there any alternative to overcome this?

@umnya
Copy link

@umnya umnya commented Feb 20, 2017

Any update about this ?

@knoguchi
Copy link

@knoguchi knoguchi commented Feb 22, 2017

I've read the source a bit. There are places in the Presto code like this
https://github.com/prestodb/presto/blob/master/presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java#L49

I think we have to remove those toLowerCase().

@img22
Copy link

@img22 img22 commented May 19, 2017

Is there a fix for this? I'm not able to query upper case tables

@faisal00813
Copy link

@faisal00813 faisal00813 commented Jun 14, 2017

Any update about the fix?

@RameshByndoor
Copy link
Contributor

@RameshByndoor RameshByndoor commented Jul 14, 2017

I am using 0.180 version of presto and issue is still present. We have all our mysql table names in UPPERCASE and any alternative or patches available..?

@MichaelAlo
Copy link

@MichaelAlo MichaelAlo commented Jul 24, 2017

Hi there,
Is the issue about tables in upper case in MySQL and a Presto connection fixed? :)
Cheers

@Drizzt321
Copy link

@Drizzt321 Drizzt321 commented Jul 27, 2017

I'm testing out Presto with MySQL, running into this exact issue. Is there any will, at all, to fix this rather big issue? Clearly there's more than 1 or 2 people, and this would prevent us from using Presto as our tables are managed by Hibernate, which creates them. I have seen https://dev.mysql.com/doc/refman/5.7/en/identifier-case-sensitivity.html, which setting it to 2 might work. I'm having trouble getting my MySQL instance to get that set properly though, but I'll report back if I manage to and if that works.

@Drizzt321
Copy link

@Drizzt321 Drizzt321 commented Aug 1, 2017

So, sadly I can't get it set to '2', I suppose it's because I'm running on Linux, and so mysql really doesn't like it so it keeps reverting to 0. I can get it set to 1, which simply lowercases all table names and stores them on disk as lowercased, but that's not so useful for lots of existing tables.

Drizzt321 added a commit to Drizzt321/presto that referenced this issue Aug 4, 2017
Based on needs in issue prestodb#2863,
where in RDBMS that supports case-sensitive table names and have
tables with case-sensitive names, then those tables are unable to be
used with PrestoDB. This change adds optional (configuration controlled)
mapping for table names when executing queries that end up calling
getTableHandle(). It maps a lowercased name to the name of the table
from the database, and supports reloading the mapping in the event that
a table is referenced but doesn't currently have a mapping so new tables
can be used without restarting Presto.

A test was attempted to be written similar to the TestJdbcClient with a
new TestingDatabase, however it was discovered that H2 will return TRUE
for metadata.storesUpperCaseIdentifiers() even if
DATABASE_TO_UPPER=FALSE is set. The only way to have that return FALSE
is using MODE=MYSQL, however in that case H2 lowercases all table names.
So it is not possible to create a test with H2 without modifying the
section of code with the call to metadata.storesUpperCaseIdentifiers().
@martint
Copy link
Contributor

@martint martint commented Aug 4, 2017

For reference, here are the relevant parts from the SQL spec:

<delimited identifier> ::=
  <double quote> <delimited identifier body> <double quote>

<delimited identifier body> ::=  <delimited identifier part>...
<delimited identifier part> ::=
    <nondoublequote character>
  | <doublequote symbol>

<Unicode delimited identifier> ::=
  U <ampersand> <double quote> <Unicode delimiter body> <double quote>
      <Unicode escape specifier>
<Unicode escape specifier> ::=
  [ UESCAPE <quote> <Unicode escape character> <quote> ]
<Unicode delimiter body> ::=
  <Unicode identifier part>...
<Unicode identifier part> ::=
    <delimited identifier part>
  | <Unicode escape value>
24) For every <identifier body> IB there is exactly one corresponding case-normal form CNF. CNF is an <identifier body> derived from IB as follows:
Let n be the number of characters in IB. For i ranging from 1 (one) to n, the i-th character Mi of IB is transliterated into the corresponding character 
or characters of CNF as follows:
Case:
   a) If Mi is a lower case character or a title case character for which an equivalent upper case sequence U is de ned by Unicode, then let j be th
       e number of characters in U; the next j characters of CNF are U.
   b) Otherwise, the next character of CNF is Mi.
25) The case-normal form of the <identifier body> of a <regular identifier> is used for purposes such as and including determination of identifier 
      equivalence, representation in the Definition and Information Schemas, and representation in diagnostics areas.

...

27) Two <regular identifier>s are equivalent if the case-normal forms of their <identifier body>s, considered as the repetition of a <character string literal> 
that specifies a <character set specification> of SQL_IDENTIFIER and an implementation-defined collation IDC that is sensitive to case, compare equally 
according to the comparison rules in Subclause 8.2, “<comparison predicate>”.

28) A <regular identifier> and a <delimited identifier> are equivalent if the case-normal form of the <identifier body> of the <regular identifier> and the 
<delimited identifier body> of the <delimited identifier> (with all occurrences of <quote> replaced by <quote symbol> and all occurrences of 
<doublequote symbol> replaced by <double quote>), considered as the repetition of a <character string literal> that specifies a <character set specification>
 of SQL_IDENTIFIER and IDC, compare equally according to the comparison rules in Subclause 8.2, “<comparison predicate>”.


29) Two<delimited identifier>s are equivalent if their <delimited identifierbody>s,considered as the repetition of a <character string literal> that specifies
 a <character set specification> of SQL_IDENTIFIER and an implementation-defined collation that is sensitive to case, compare equally according to the
 comparison rules in Subclause 8.2, “<comparison predicate>”.

30) Two <Unicode delimited identifier>s are equivalent if their <Unicode delimiter body>s, considered as the repetition of a <character string literal> that
 specifies a <character set specification> of SQL_IDENTIFIER and an implementation-defined collation that is sensitive to case, compare equally according
 to the comparison rules in Subclause 8.2, “<comparison predicate>”.

31) A <Unicode delimited identifier> and a <delimited identifier> are equivalent if their <Unicode delimiter body> and <delimited identifier body>, 
respectively, each considered as the repetition of a <character string literal> that specifies a <character set specification> of SQL_IDENTIFIER and 
an implementation-defined collation that is sensitive to case, compare equally according to the comparison rules in Subclause 8.2, “<comparison predicate>”.

32) A <regular identifier> and a <Unicode delimited identifier> are equivalent if the case-normal form of the <identifier body> of the <regular identifier> 
and the <Unicode delimiter body> of the <Unicode delimited identifier> considered as the repetition of a <character string literal>, each specifying a
 <character set specification> of SQL_IDENTIFIER and an implementation-defined collation that is sensitive to case, compare equally according to the 
comparison rules in Subclause 8.2, “<comparison predicate>”.
@ghost
Copy link

@ghost ghost commented Apr 16, 2018

+1
Is there any progess on this? We are useing version 0.192 and the problem with Scenario 2 is still there.

@Drizzt321
Copy link

@Drizzt321 Drizzt321 commented Apr 18, 2018

I have an open PR #8674 that got stalled waiting for some replies from maintainers, and then I got busy with work and unable to continue it. I don't see any time in the foreseeable future for me to pick this back up.

@nihalbtq8
Copy link

@nihalbtq8 nihalbtq8 commented Jul 3, 2018

This is still an issue currently. Is there any alternative available?

@binque
Copy link

@binque binque commented Aug 28, 2018

Any update ?

@kokosing
Copy link
Contributor

@kokosing kokosing commented Aug 28, 2018

Any update ?

Unfortunately, no ;(

@dain
Copy link
Contributor

@dain dain commented Aug 28, 2018

@hgschmie told me the other day he was looking at this

@SanjayJosh
Copy link

@SanjayJosh SanjayJosh commented Sep 5, 2018

Any update on this? All the tables in my project are by default in upper case.
Or can any pointers be given in the code as to what can be changed so that a custom build for the same can be made?

@tooptoop4
Copy link

@tooptoop4 tooptoop4 commented Sep 13, 2018

any update? hive metastore tables in mysql like DBS, PARTITIONS can't be queried.

presto> select * from mysql.metastore.DBS;
Query 20180913_015448_00010_774as failed: line 1:15: Table mysql.metastore.dbs does not exist

@ciscoring
Copy link
Contributor

@ciscoring ciscoring commented Nov 9, 2018

Any updae on this issue? My collections in mongodb have names with upper case. Presto can't read these collections because it always change lower case.

@alvespat
Copy link

@alvespat alvespat commented Nov 22, 2018

I'm getting the same issue with Sqlserver connectors and tables defined in uppercase... ( only be able to query tables where names are defined in lower case.... :-( (presto 0.203)

@cberner
Copy link
Contributor Author

@cberner cberner commented Jun 20, 2019

I no longer am actively working on Presto

@mizunno
Copy link

@mizunno mizunno commented Dec 31, 2019

Any update on this? Thanks in advance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.