Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MongoDB collection name case-sensitivity #1102

Closed
xumingming opened this issue Jul 9, 2019 · 11 comments · Fixed by #3453
Closed

MongoDB collection name case-sensitivity #1102

xumingming opened this issue Jul 9, 2019 · 11 comments · Fixed by #3453

Comments

@xumingming
Copy link
Contributor

MongoDB Collection is case sensitive: It is totally valid for one database to have two collections: Hello, and hello, while Presto-MongoDB Connector use SchemaTableName to represent collection name, SchemaTableName itself is not case sensitive, it causes that we can not query mongodb collection which have uppercase letter in its name.

@findepi
Copy link
Member

findepi commented Jul 9, 2019

we can not query mongodb collection which have uppercase letter in its name.

This can be fixed by finding the collection case-insensitively.

It is totally valid for one database to have two collections: Hello, and hello

Fixing this requires #17

@tchunwei
Copy link
Member

I am having this problem too.

we can not query mongodb collection which have uppercase letter in its name.

This can be fixed by finding the collection case-insensitively.

@findepi any idea how to have a SELECT statement querying collection (table) name case-insensitively?

I am having some db (schema) name in camelCase too. SHOW tables FROM mongodb.camelCaseSchema will always return an error

@findepi
Copy link
Member

findepi commented Sep 10, 2019

@tchunwei This requires code changes. I implemented case-insensitive name resolution in JDBC connectors (#614).
Similar approach can be implemented in Mongo connector as well.
If someone wants to take a stab at this, I can offer guidance. We can chat on our Slack (https://prestosql.io/community.html).

@zgseed
Copy link

zgseed commented Sep 29, 2019

Hi, @findepi I'm trying to implement the case-insensitive in mongo connector.
I notice BaseJdbcClient cache the lower-case name(schemas & tables) and the real remote name. But why use JdbcIdentify as the cache key? Could the cached data be global for all users?

@findepi
Copy link
Member

findepi commented Sep 29, 2019

@zgseed good question.

Have at https://github.com/prestosql/presto/blob/9792498d4edefc3d47af03a7e627d048e2a91c30/presto-base-jdbc/src/main/java/io/prestosql/plugin/jdbc/ConnectionFactory.java#L25
-- we allow the connection to be dependent on the current user, so technically, users could see separate sets of data.
Without separation on the cache level, we could reveal actual object name eg in the error message even when current user doesn't have permissions to read the object.
Ideally we should not have JdbcIdentify as the cache key when user-dependent connections are not used (the default).

In Mongo connector you don't need all of this. No need for "MongoIdentity" or the like.

@martint
Copy link
Member

martint commented Sep 29, 2019

Be aware that there’s a change in progress to fix the underlying issue with case sensitive identifiers across the board: #1302

@findepi
Copy link
Member

findepi commented Sep 29, 2019

Thanks @martint for heads up.

There are two use-cases:

  1. collections differing in case, as described by @xumingming in this issue (eg Hello and hello)
  2. collections with non-lowercase names (eg. Hello)

Case-insensitive name resolution helps 2. but doesn't help 1.
#1302 (#17) will help both 1. and 2. and we hope this will be finished in couple of weeks.

@zgseed if you have use-case 2. (and not 1.) and you need a stop-gap solution
urgently, you can definitely work on it (following base-jdbc approach, sans "identity").
Keep in mind, however, that we want #1302 (#17) to be the ultimate solution,
so we consider case-insensitive name resolution an interim solution only.
I.e. once #1302 (#17) is done, we will deprecate case-insensitive name resolution
and later remove it.

@zgseed
Copy link

zgseed commented Oct 8, 2019

thanks @findepi ,@martint.

@tchunwei
Copy link
Member

Thanks @martint for heads up.

There are two use-cases:

  1. collections differing in case, as described by @xumingming in this issue (eg Hello and hello)
  2. collections with non-lowercase names (eg. Hello)

Case-insensitive name resolution helps 2. but doesn't help 1.
#1302 (#17) will help both 1. and 2. and we hope this will be finished in couple of weeks.

@zgseed if you have use-case 2. (and not 1.) and you need a stop-gap solution
urgently, you can definitely work on it (following base-jdbc approach, sans "identity").
Keep in mind, however, that we want #1302 (#17) to be the ultimate solution,
so we consider case-insensitive name resolution an interim solution only.
I.e. once #1302 (#17) is done, we will deprecate case-insensitive name resolution
and later remove it.

I have came across this stale PR prestodb/presto#6993, that is trying to implement case-insensitive name solution, which will help in case 2 that you mentioned. I tried to implement it over again by forking the latest source code, my implementation is here https://github.com/tchunwei/presto/tree/mongodb-case-insensitive .

I am not really sure I am working in a correct way, but it does seems to work for me. I would be grateful if anyone can help to review this solution, and submit a PR if it is helpful for anybody else.

@findepi
Copy link
Member

findepi commented Oct 31, 2019

I would be grateful if anyone can help to review this solution

@tchunwei you can just open a PR and let people provide feedback.

@tchunwei
Copy link
Member

@tchunwei you can just open a PR and let people provide feedback.

Thanks @findepi , submitted PR here #1915

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
5 participants