
When uploading a table without an explicit _key, _from, or _to we should allow setting an existing column for them #220

Closed
JackWilb opened this issue Feb 18, 2022 · 1 comment · Fixed by #295
@JackWilb
Member

Alex asked me to open an issue for some of the other upload problems he ran into, so we can track them better.

When a table is uploaded without a _key column, each row is assigned an incrementing integer as its _key. This lets a user upload the data, but it can cause headaches when creating networks, because those values must be known in order to fill in _from and _to.

To better support this use case, we should allow specifying a _key column (at upload and after), so a user can more easily create connections between nodes.
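For illustration, here is a minimal sketch of what choosing a _key column at upload time could look like. The helper name `assign_keys` is hypothetical and not part of Multinet's API. It assumes the fallback integer keys start at 0 and are stored as strings (Arango `_key` values are strings), and it rejects duplicate values, since a `_key` must be unique within a collection.

```python
import csv
import io
from typing import Optional


def assign_keys(csv_text: str, key_column: Optional[str] = None) -> list:
    """Hypothetical helper: attach a _key to every row of an uploaded CSV.

    If key_column is given, that column's value becomes the row's _key;
    otherwise fall back to an incrementing integer (assumed to start at 0).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    seen = set()
    for index, row in enumerate(rows):
        key = str(index) if key_column is None else row[key_column]
        # _key values must be unique within a collection.
        if key in seen:
            raise ValueError(f"duplicate _key {key!r}")
        seen.add(key)
        row["_key"] = key
    return rows
```

With `key_column="identity-col"`, a row whose `identity-col` is `1` gets `_key` `"1"`, so an edge table can refer to it by a value the user already knows.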

Additionally, we should let a user specify which column to match on when uploading an edge table. For example:

source,target
1,2
2,3
5,4

The user could then specify that source maps to a-node-table/identity-col and target maps to a-different-node-table/ID.

This would remove the restriction that edges can only be created by pointing at the _key column. As a technical detail, we'd likely need a way to convert the values the user provided into the actual _key value of the matching row.
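One possible way to do that conversion, sketched under assumptions: build a lookup from each chosen node-table column to the row's `_key`, then rewrite every edge row's `source`/`target` value into an Arango document handle (`collection/_key`) for `_from`/`_to`. The function names, the `mapping` shape, and the in-memory node tables are all hypothetical. Note that the chosen node column has to be unique for the lookup to be unambiguous.

```python
import csv
import io


def index_by_column(table_name, rows, column):
    """Map each value in `column` to the document handle of its row."""
    index = {}
    for row in rows:
        value = row[column]
        if value in index:
            raise ValueError(f"{table_name}.{column} is not unique: {value!r}")
        index[value] = f"{table_name}/{row['_key']}"
    return index


def resolve_edges(edge_csv, mapping, node_tables):
    """Hypothetical helper.

    mapping: {"_from": (edge_column, node_table, node_column), "_to": (...)}
    node_tables: {node_table: list of row dicts that already have a _key}
    """
    lookups = {
        field: (edge_col, index_by_column(table, node_tables[table], node_col))
        for field, (edge_col, table, node_col) in mapping.items()
    }
    edges = []
    for row in csv.DictReader(io.StringIO(edge_csv)):
        edge = dict(row)
        for field, (edge_col, lookup) in lookups.items():
            value = row[edge_col]
            if value not in lookup:
                raise KeyError(f"no node matches {edge_col}={value!r}")
            edge[field] = lookup[value]
        edges.append(edge)
    return edges
```

For the example above, the mapping would be `{"_from": ("source", "a-node-table", "identity-col"), "_to": ("target", "a-different-node-table", "ID")}`.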

@JackWilb JackWilb changed the title Whe uploading a table without an explicit _key, _from, or _to we should allow setting an existing column for them When uploading a table without an explicit _key, _from, or _to we should allow setting an existing column for them Feb 18, 2022
@waxlamp
Copy link
Contributor

waxlamp commented Feb 21, 2022

Thanks for filing the issue, @JackWilb.

Could you work up a very specific example that illustrates what we'd like to do here? Basically, expand your example with a couple of node tables as well. This is because, as stated, we can't do this in Arango without some further constraints which haven't been made explicit yet in this discussion. In particular, I'm not sure whether the ask involves being able to use different columns in a given node table file as the ID for different edge tables; if so, we'd need to do something a bit complex to make it work.
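To make the complex case concrete, a small check like the one below could flag when edge tables refer to the same node table through different columns. This is only a sketch: the name `conflicting_id_columns` and the `edge_specs` shape (edge table name to a list of `(node_table, node_column)` endpoints) are hypothetical.

```python
from collections import defaultdict


def conflicting_id_columns(edge_specs):
    """Hypothetical check.

    edge_specs: {edge_table: [(node_table, node_column), ...]}
    Returns {node_table: sorted columns} for every node table that the
    edge tables identify by more than one column.
    """
    columns = defaultdict(set)
    for endpoints in edge_specs.values():
        for node_table, node_column in endpoints:
            columns[node_table].add(node_column)
    return {table: sorted(cols) for table, cols in columns.items() if len(cols) > 1}
```

An empty result means each node table is matched on a single column, which could simply be treated as that table's key; a non-empty result is the case that would need extra machinery.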

The other way I've been thinking about doing this is to have the user upload all the tables (node and edge) at once, and ask them to pick out which are the key columns for each node table, and which are the from/to columns for each edge table (or "the" edge table, if we're following the existing capabilities). The latter might be a more unified way to make this work, if the use case you're describing fits (i.e., if we're talking about having a full network specified over a few files, like the silent movies dataset).

Let me know what you think. @AlmightyYakob and I have been thinking about this a little bit but I want to standardize on a concrete use case before we design/execute a solution.

@alexsb alexsb self-assigned this Feb 25, 2022
@JackWilb JackWilb mentioned this issue Jul 27, 2023

3 participants