Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for MongoDB Database References #3134

Closed
ebyhr opened this issue Mar 18, 2020 · 14 comments · Fixed by #8549
Closed

Add support for MongoDB Database References #3134

ebyhr opened this issue Mar 18, 2020 · 14 comments · Fixed by #8549
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@ebyhr
Copy link
Member

ebyhr commented Mar 18, 2020

presto> select * from mongodb.test.test_ref;
 $ref |                 $id                 | $db  
------+-------------------------------------+------
 test | 5e 71 93 9b 7d 7f e6 32 5a 0b 33 07 | test 
@ebyhr ebyhr added the enhancement New feature or request label Mar 18, 2020
@findepi findepi added the good first issue Good for newcomers label Mar 18, 2020
@kristina-head
Copy link

Please may I work on this?

@findepi
Copy link
Member

findepi commented Apr 28, 2020

@kristina-head sure, go for it!

@achyudh
Copy link
Contributor

achyudh commented Jun 25, 2020

@kristina-head are you still working on this? I’d like to give this a shot

@kristina-head
Copy link

@achyudh it's all yours :)

@jasonyanwenl
Copy link
Member

Hi @achyudh are you still working on this? I would like to take a look at this issue as well.

@achyudh
Copy link
Contributor

achyudh commented Nov 9, 2020

@jasonyanwenl nope, feel free to take it up

@jasonyanwenl
Copy link
Member

Hi @ebyhr, may I ask which part of the code is a good place to start? Thanks!

@ebyhr
Copy link
Member Author

ebyhr commented Nov 10, 2020

@jasonyanwenl I think you can start with MongoSession#getTableMetadata that has "guess fields" mechanism.

@jasonyanwenl
Copy link
Member

Hi @ebyhr, currently I am trying to get more background about this issue. I have read two links you have shared. In my sense, DBRefs are kind of like pointers to another document in MongoDB. One thing I am confused is the example you are given:

presto> select * from mongodb.test.test_ref;
 $ref |                 $id                 | $db  
------+-------------------------------------+------
 test | 5e 71 93 9b 7d 7f e6 32 5a 0b 33 07 | test 

Essentially, I don't know what is the exact requirement for solving this issue. Here are two versions I could come up with:

  1. Do you mean that the above SQL query result is just what we expect to have? And if so, do you mean that the test_reftable is automatically created as long as the test table is created. And the fields inside test_ref is auto-filled with $ref, $id and $db?

  2. Do you mean when a document is queried, we need to automatically check if there is any DBRefs inside this doc and if so we need to automatically retrieve the referenced doc and finally return the entire doc along with referenced doc as the query result?

Which way is correct? Or I totally misunderstand any points? Thanks!

@ebyhr
Copy link
Member Author

ebyhr commented Nov 13, 2020

@jasonyanwenl Thanks for looking into the details.

  1. Though I almost forgot how I created the above example, test_ref should be created beforehand. If we use DBRef in MongoDB, the field has $ref, $id and $db. You can reproduce this issue by this steps:
  • Start MongoQueryRunner
  • Run mongo shell docker exec -it <container id> mongo
  • Prepare data:
use users;

db.creators.insert({
    "_id": ObjectId("5126bc054aed4daf9e2ab772"),
    "name": "Broadway Center",
    "url": "bc.example.net"
})

db.test_ref.insert(
{
  "_id" : ObjectId("5126bbf64aed4daf9e2ab771"),
  "col1" : "foo",
  "creator" : {
                  "$ref" : "creators",
                  "$id" : ObjectId("5126bc054aed4daf9e2ab772"),
                  "$db" : "users"
               }
})
  • Select the table from Presto. You will see creator column is missing because the type is DBRef
select * from mongodb.users.test_ref;
  1. It is the final goal, but let's start with simple step. The minimum goal is showing DBRef fields so that we can use it on join condition.

@academy-codex
Copy link
Contributor

academy-codex commented Jul 9, 2021

@ebyhr I have got this working with some nits.
If no one is working, I can take this up.

Sample Output:
trino:test> select * from test_ref;
col1 | creator
------+---------------------------------------------------------------------------------------
foo | {id=51 26 bc 05 4a ed 4d af 9e 2a b7 72, databaseName=users, collectionName=creators}
(1 row)

Sample Join:
trino:test> select * from creator, test_ref where creator._id=test_ref.creator.id;
name | url | col1 | creator
-----------------+----------------+------+----------------------------------------------------------------------------
Broadway Center | bc.example.net | foo | {id=51 26 bc 05 4a ed 4d af 9e 2a b7 72, databaseName=users, collectionName
(1 row)

@academy-codex
Copy link
Contributor

@ebyhr I have got this working with some nits.
If no one is working, I can take this up.

Sample Output:
trino:test> select * from test_ref;
col1 | creator
------+---------------------------------------------------------------------------------------
foo | {id=51 26 bc 05 4a ed 4d af 9e 2a b7 72, databaseName=users, collectionName=creators}
(1 row)

I am currently, mapping the $ref->collectionName, $db-> databaseName and $id->id.
However, I also see that MongoPageSource.writeSlice(..) has this logic:
if (type.equals(OBJECT_ID)) {
type.writeSlice(output, wrappedBuffer(((ObjectId) value).toByteArray()));
}

Any reason to use byte array here ?

@ebyhr
Copy link
Member Author

ebyhr commented Jul 9, 2021

The reason is that MongoDB ObjectId type is mapped to Slice internally. Does this answer your question?
https://github.com/trinodb/trino/blob/master/plugin/trino-mongodb/src/main/java/io/trino/plugin/mongodb/ObjectIdType.java

@academy-codex academy-codex self-assigned this Jul 10, 2021
@academy-codex
Copy link
Contributor

The reason is that MongoDB ObjectId type is mapped to Slice internally. Does this answer your question?
https://github.com/trinodb/trino/blob/master/plugin/trino-mongodb/src/main/java/io/trino/plugin/mongodb/ObjectIdType.java

I see. Got it. Thanks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Development

Successfully merging a pull request may close this issue.

6 participants