When uploading a table without an explicit _key, _from, or _to we should allow setting an existing column for them #220

JackWilb · 2022-02-18T20:14:51Z

Alex asked me to add an issue for some of the other upload issues he faced so we could track them better.

When we upload a table without a key, an incrementing integer is used for each row. This works and allows a user to upload data but could cause headaches when trying to create networks, since those values need to be known for _from and _to.

To better support this use case, we should allow specifying a _key column (at upload and after), so a user can more easily create connections between nodes.

Additionally, we should allow a user to specify which column to match on when uploading an edge table. For example:

source,target
1,2
2,3
5,4

And the user could specify that source maps to a-node-table/identity-col and target maps to a-different-node-table/ID.

This means that the restriction on only creating edges by pointing at the _key column is removed. As a technical detail, we'd likely need to figure out some way to convert between the values the user provided and the actual _key value for that row.
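To make the conversion step concrete, here is a minimal sketch of one way it might work, not a description of how the upload code actually does it. It assumes the node rows are already available in memory with their generated _key values; the sample node rows, the helper names (build_key_lookup, resolve_endpoint, resolve_edges), and the choice to start generated keys at 0 are all made up for illustration. The only Arango-specific fact relied on is that _from and _to take the form collection/_key.

```python
import csv
import io

# Hypothetical node tables that were uploaded without an explicit key column,
# so each row received an incrementing integer _key (kept as a string).
node_table_rows = [
    {"_key": "0", "identity-col": "1"},
    {"_key": "1", "identity-col": "2"},
    {"_key": "2", "identity-col": "5"},
]
other_node_table_rows = [
    {"_key": "0", "ID": "2"},
    {"_key": "1", "ID": "3"},
    {"_key": "2", "ID": "4"},
]

# The edge table from the example above.
EDGE_CSV = "source,target\n1,2\n2,3\n5,4\n"


def build_key_lookup(rows, match_column):
    """Map each value of match_column to the _key of the row that holds it."""
    lookup = {}
    for row in rows:
        value = row[match_column]
        if value in lookup:
            # A match column only works if its values identify rows uniquely.
            raise ValueError(f"duplicate value {value!r} in column {match_column!r}")
        lookup[value] = row["_key"]
    return lookup


def resolve_endpoint(table, lookup, value):
    """Turn a user-provided value into an Arango-style 'collection/_key' id."""
    if value not in lookup:
        raise KeyError(f"no row in {table!r} matches value {value!r}")
    return f"{table}/{lookup[value]}"


def resolve_edges(edge_rows, source, target):
    """source and target are (table_name, lookup) pairs chosen by the user."""
    source_table, source_lookup = source
    target_table, target_lookup = target
    return [
        {
            "_from": resolve_endpoint(source_table, source_lookup, row["source"]),
            "_to": resolve_endpoint(target_table, target_lookup, row["target"]),
        }
        for row in edge_rows
    ]


edge_rows = list(csv.DictReader(io.StringIO(EDGE_CSV)))
edges = resolve_edges(
    edge_rows,
    source=("a-node-table", build_key_lookup(node_table_rows, "identity-col")),
    target=("a-different-node-table", build_key_lookup(other_node_table_rows, "ID")),
)
```

In this sketch a duplicate value in the match column or an edge endpoint with no matching node is treated as an error, which seems like the safest default, but that is a design choice we would need to agree on.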


waxlamp · 2022-02-21T15:43:15Z

Thanks for filing the issue, @JackWilb.

Could you work up a very specific example that illustrates what we'd like to do here? Basically, expand your example with a couple of node tables as well. This is because, as stated, we can't do this in Arango without some further constraints which haven't been made explicit yet in this discussion. In particular, I'm not sure if the ask involves being able to use different columns in a given node table file as the ID for different edge tables--if so, we'd need to do something a bit complex to make it work.

The other way I've been thinking about doing this is to have the user upload all the tables (node and edge) at once, and ask them to pick out which are the key columns for each node table, and which are the from/to columns for each edge table (or, "the" edge table if we're following the existing capabilities). The latter might be a more unified way to make this work, if the use case you're describing fits (i.e., if we're talking about having a full network specified over a few files, like the silent movies dataset).
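As a rough illustration of the all-at-once approach, the sketch below shows what the user's selections might look like as data, plus a check that they are consistent before anything is written. This is only a sketch under assumptions: the NodeTableSpec and EdgeTableSpec classes, the validate_upload function, and the people/films/appears_in tables are hypothetical and do not come from the existing codebase or from any real dataset.

```python
from dataclasses import dataclass


@dataclass
class NodeTableSpec:
    name: str
    columns: list[str]
    key_column: str  # column the user picked to act as _key


@dataclass
class EdgeTableSpec:
    name: str
    columns: list[str]
    from_column: str  # column the user picked to act as _from
    from_table: str   # node table that from_column points into
    to_column: str    # column the user picked to act as _to
    to_table: str     # node table that to_column points into


def validate_upload(node_specs, edge_specs):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    node_names = {spec.name for spec in node_specs}

    for spec in node_specs:
        if spec.key_column not in spec.columns:
            errors.append(f"{spec.name}: key column {spec.key_column!r} not found")

    for spec in edge_specs:
        endpoints = (
            (spec.from_column, spec.from_table),
            (spec.to_column, spec.to_table),
        )
        for column, table in endpoints:
            if column not in spec.columns:
                errors.append(f"{spec.name}: column {column!r} not found")
            if table not in node_names:
                errors.append(f"{spec.name}: unknown node table {table!r}")

    return errors


# Made-up tables, purely for illustration.
nodes = [
    NodeTableSpec("people", ["person_id", "name"], key_column="person_id"),
    NodeTableSpec("films", ["film_id", "title"], key_column="film_id"),
]
good_edge = EdgeTableSpec(
    "appears_in", ["person", "film"],
    from_column="person", from_table="people",
    to_column="film", to_table="films",
)
bad_edge = EdgeTableSpec(
    "appears_in", ["person", "film"],
    from_column="actor", from_table="people",  # no such column
    to_column="film", to_table="movies",       # no such node table
)

ok_errors = validate_upload(nodes, [good_edge])
bad_errors = validate_upload(nodes, [good_edge, bad_edge])
```

Because each node table has exactly one key column in this shape, it sidesteps the harder case of using different columns of the same node table as the ID for different edge tables.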

Let me know what you think. @AlmightyYakob and I have been thinking about this a little bit but I want to standardize on a concrete use case before we design/execute a solution.

JackWilb changed the title ~~Whe uploading a table without an explicit _key, _from, or _to we should allow setting an existing column for them~~ When uploading a table without an explicit _key, _from, or _to we should allow setting an existing column for them Feb 18, 2022

alexsb self-assigned this Feb 25, 2022

JackWilb mentioned this issue Jul 27, 2023

Data upload improvements #295

Merged

7 tasks

JackWilb closed this as completed in #295 Aug 4, 2023
