Alex asked me to add an issue for some of the other upload issues he faced so we could track them better.
When we upload a table without a key, an incrementing integer is used for each row. This works and allows a user to upload data but could cause headaches when trying to create networks, since those values need to be known for _from and _to.
To better support this use case, we should allow specifying a _key column (at upload and after), so a user can more easily create connections between nodes.
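As a rough sketch of what choosing a `_key` column at upload could look like (the `assign_keys` helper is hypothetical, not existing code, and starting the fallback counter at 1 is an assumption):

```python
from typing import Dict, List, Optional

Row = Dict[str, str]


def assign_keys(rows: List[Row], key_column: Optional[str] = None) -> List[Row]:
    """Return copies of the rows with a `_key` field added.

    If key_column is given, its values become the keys; otherwise fall back
    to an incrementing integer (assumed here to start at 1).
    """
    seen = set()
    keyed = []
    for index, row in enumerate(rows, start=1):
        key = str(index) if key_column is None else str(row[key_column])
        # Keys have to be unique within a table, so reject duplicates early.
        if key in seen:
            raise ValueError(f"duplicate _key value {key!r}")
        seen.add(key)
        keyed.append({**row, "_key": key})
    return keyed
```

A real implementation would also need to check that the chosen values are valid keys for the database, which this sketch skips.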
Additionally, we should allow a user to specify which column to match on when uploading an edge table. For example:
source,target
1,2
2,3
5,4
And the user could specify that source maps to a-node-table/identity-col and target maps to a-different-node-table/ID.
This means that the restriction on only creating edges by pointing at the _key column is removed. As a technical detail, we'd likely need to figure out some way to convert between the values the user provided and the actual _key value for that row.
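One possible way to do that conversion is a lookup table per node table, built from the column the user chose to match on. This is only a sketch: the helper names and the sample node rows are made up for illustration, and it assumes `_from`/`_to` take the form `table/_key`.

```python
from typing import Dict, List

Row = Dict[str, str]


def build_key_lookup(table_name: str, rows: List[Row], match_column: str) -> Dict[str, str]:
    """Map each value in match_column to that row's document ID (table/_key)."""
    lookup: Dict[str, str] = {}
    for row in rows:
        value = row[match_column]
        # The match column must identify rows unambiguously.
        if value in lookup:
            raise ValueError(f"duplicate value {value!r} in {table_name}/{match_column}")
        lookup[value] = f"{table_name}/{row['_key']}"
    return lookup


def resolve_edges(edge_rows: List[Row], source_lookup: Dict[str, str],
                  target_lookup: Dict[str, str]) -> List[Row]:
    """Translate user-provided source/target values into _from/_to IDs."""
    return [
        {"_from": source_lookup[row["source"]], "_to": target_lookup[row["target"]]}
        for row in edge_rows
    ]


# Hypothetical node tables whose auto-generated keys differ from the match columns.
node_a = [{"_key": "1", "identity-col": "1"},
          {"_key": "2", "identity-col": "2"},
          {"_key": "3", "identity-col": "5"}]
node_b = [{"_key": "1", "ID": "2"},
          {"_key": "2", "ID": "3"},
          {"_key": "3", "ID": "4"}]
# The edge table from the example above.
edge_rows = [{"source": "1", "target": "2"},
             {"source": "2", "target": "3"},
             {"source": "5", "target": "4"}]

edges = resolve_edges(
    edge_rows,
    build_key_lookup("a-node-table", node_a, "identity-col"),
    build_key_lookup("a-different-node-table", node_b, "ID"),
)
```

Edge rows whose values match no node raise a `KeyError` here; whether to reject the upload or skip those rows is an open design question.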
JackWilb changed the title from "Whe uploading a table without an explicit _key, _from, or _to we should allow setting an existing column for them" to "When uploading a table without an explicit _key, _from, or _to we should allow setting an existing column for them" on Feb 18, 2022
Could you work up a very specific example that illustrates what we'd like to do here? Basically, expand your example with a couple of node tables as well. This is because, as stated, we can't do this in Arango without some further constraints which haven't been made explicit yet in this discussion. In particular, I'm not sure if the ask involves being able to use different columns in a given node table file as the ID for different edge tables--if so, we'd need to do something a bit complex to make it work.
The other way I've been thinking about doing this is to have the user upload all the tables (node and edge) at once, and ask them to pick out which are the key columns for each node table, and which are the from/to columns for each edge table (or, "the" edge table if we're following the existing capabilities). The latter might be a more unified way to make this work, if the use case you're describing fits (i.e., if we're talking about having a full network specified across a few files, like the silent movies dataset).
Let me know what you think. @AlmightyYakob and I have been thinking about this a little bit but I want to standardize on a concrete use case before we design/execute a solution.
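To make the all-at-once idea a bit more concrete, here is a hedged sketch of what the user's selections might look like as data, with a basic consistency check. All class, field, and function names are hypothetical, and it assumes a single key column per node table, which sidesteps the harder case of different ID columns for different edge tables.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeTableSpec:
    name: str
    columns: List[str]
    key_column: str  # column the user picked as the key


@dataclass
class EdgeTableSpec:
    name: str
    columns: List[str]
    from_column: str  # edge column holding the source value
    from_table: str   # node table the source value refers to
    to_column: str    # edge column holding the target value
    to_table: str     # node table the target value refers to


def validate_upload(node_tables: List[NodeTableSpec],
                    edge_tables: List[EdgeTableSpec]) -> List[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    node_names = {table.name for table in node_tables}
    for table in node_tables:
        if table.key_column not in table.columns:
            problems.append(f"{table.name}: key column {table.key_column!r} not found")
    for edge in edge_tables:
        endpoints = (("from", edge.from_column, edge.from_table),
                     ("to", edge.to_column, edge.to_table))
        for role, column, table_name in endpoints:
            if column not in edge.columns:
                problems.append(f"{edge.name}: {role} column {column!r} not found")
            if table_name not in node_names:
                problems.append(f"{edge.name}: {role} table {table_name!r} was not uploaded")
    return problems
```

Checking everything up front like this would let the upload fail with a full list of problems before any table is written.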