Import Data #176
@jdanish @kalanicraig I have some questions about how to handle imports.
Currently NetCreate is designed to work with a single starting database. When you start up the app, you have to specify a specific database file (e.g. ./nc.js --dataset=tacitus).
So when you import new data, do you intend to:
a) Add new records to the existing database? And if there is an existing record, do you want to overwrite it?
or
b) Replace all existing records in the current database with the imported records?
or
c) Create a new database with the imported records, giving the database a new name (or really starting with a new empty database)?
Each of the three would require a slightly different use model and workflow. Or do you need to support all three use models?
|
I can see a use for a modified A (append only, no edits to existing rows) and C (for new or heavily modified datasets). B would be nice but could be accommodated by A and C, I think. I’d start with C since it lets us use the export feature to get a dataset out, mod it externally, and reimport into the Net.Create environment if necessary.
Responding to export requests next!
—k
(In reply to benloh, Dec 21, 2021, 5:05 PM)
|
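The three options in the original question differ mainly in what happens when an imported record shares an id with an existing one. The minimal JavaScript sketch below shows that difference under stated assumptions: records are plain objects with a unique `id`, and a database is an in-memory `Map`. The names `importRecords`, `importAsNewDatabase`, and the mode strings are hypothetical and not part of NetCreate. It includes Kalani's modified A (append only) alongside the overwrite variant originally asked about.

```javascript
// Hypothetical sketch of the import options discussed above. Assumes each
// record is a plain object with a unique `id` and that a database is an
// in-memory Map from id to record. Names are illustrative, not NetCreate's.
function importRecords(existing, incoming, mode) {
  switch (mode) {
    case "append": {
      // Modified option A: add records with new ids; never edit existing rows.
      const result = new Map(existing);
      for (const rec of incoming) {
        if (!result.has(rec.id)) result.set(rec.id, rec);
      }
      return result;
    }
    case "overwrite": {
      // Option A as originally asked: add new ids and replace matching ones.
      const result = new Map(existing);
      for (const rec of incoming) result.set(rec.id, rec);
      return result;
    }
    case "replace":
      // Option B: drop every existing record and keep only the imported ones.
      return new Map(incoming.map((rec) => [rec.id, rec]));
    default:
      throw new Error(`Unknown import mode: ${mode}`);
  }
}

// Option C: append into a brand-new, empty database (which would also get
// a new name), leaving the original database untouched.
function importAsNewDatabase(incoming) {
  return importRecords(new Map(), incoming, "append");
}
```

In this in-memory view, B and C yield the same records. The difference is that C leaves the original database intact and gives the result a new name.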
To clarify: the most frequent import needs we see are batch additions of new nodes with some new edges, or a big batch edge import with few new nodes. C first, then templates, then A (append-only) would get us more coverage of existing needs.
If we get to A (I’d prioritize template creation and editing after import option C), it would assume no node disambiguation on import, and edge lookup based on first-node-label-match.
I’d rather folks handle node disambiguation using the existing edit/delete/merge features for now than pile time into an import feature that is infrequently used.
(In reply to Kalani Craig, Dec 22, 2021, 10:06 AM)
|
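Kalani's proposed rule for option A (no node disambiguation, with each edge endpoint resolved to the first node whose label matches) can be sketched as below. This assumes nodes carry `id` and `label`, and that imported edge rows name their endpoints in hypothetical `sourceLabel`/`targetLabel` columns. It also makes visible the weakness Kalani raises later about labels versus IDs: when two nodes share a label, the edge silently attaches to whichever comes first.

```javascript
// Hypothetical sketch of "edge lookup based on first-node-label-match".
// Assumes nodes look like { id, label } and that imported edge rows name
// their endpoints in `sourceLabel` / `targetLabel` columns (illustrative).
function firstIdForLabel(nodes, label) {
  // Array.prototype.find returns the FIRST matching node, so nodes that
  // share a label are never disambiguated.
  const node = nodes.find((n) => n.label === label);
  return node === undefined ? undefined : node.id;
}

function resolveEdgesByLabel(nodes, importedEdges) {
  const resolved = [];
  const unresolved = [];
  for (const edge of importedEdges) {
    const source = firstIdForLabel(nodes, edge.sourceLabel);
    const target = firstIdForLabel(nodes, edge.targetLabel);
    if (source === undefined || target === undefined) {
      // No node has this label: report the row instead of guessing.
      unresolved.push(edge);
    } else {
      // Keep the row's other fields and attach the resolved node ids.
      resolved.push({ ...edge, source, target });
    }
  }
  return { resolved, unresolved };
}
```

With two nodes labeled "Rome", every imported edge mentioning "Rome" attaches to the first one; under Kalani's proposal, the existing edit/delete/merge features would be used to clean that up afterward.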
If C) is the priority, then importing is much more complex.
For any given record, you can have empty fields, but we expect the table format to have all of these fields defined. I'm guessing you probably need more flexibility than that? |
If we use nc-multiplex to create the new file, does that make it easier? Not sure that fits Kalani’s use case, but figured I’d ask.
|
If we added import to the regular nc.js startup script, we'd probably have to make a corresponding change to nc-multiplex to make it work. What might be slightly easier would be to figure out a way to initiate a new blank db with a new name, then allow the upload via the web interface (otherwise you'd need direct access to the server to upload files there and import them directly, which, now that I think about it, sounds like a terrible solution).
|
The issues of required or optional fields and template creation seem related.
If we assume that a re-import of data creates a new dataset, then the list of fields would be controllable by an exported file. Let’s require all fields to exist (even if the values are blank); the documentation will note that, along with suggesting an export from a template.
If we treat option C as a new network, then it would essentially be a new network and a new template, using existing values from an existing network for the template alone and then proceeding with the append option.
(In reply to benloh, Dec 22, 2021, 2:31 PM)
|
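Kalani's proposal (every field must be present as a column, even if its values are blank) reduces to a header check that runs before any rows are imported. Here is a minimal JavaScript sketch. The field names in `REQUIRED_FIELDS` are placeholders, not NetCreate's actual schema, and `missingColumns`/`validateHeader` are hypothetical helpers.

```javascript
// Hypothetical sketch of the "all fields must exist, values may be blank"
// rule. REQUIRED_FIELDS is illustrative only; a real check would take the
// field list from the project's template.
const REQUIRED_FIELDS = ["id", "label", "type", "notes"];

function missingColumns(headerRow, required = REQUIRED_FIELDS) {
  // Trim header cells so stray spaces around names don't cause false misses.
  const present = new Set(headerRow.map((h) => h.trim()));
  return required.filter((field) => !present.has(field));
}

function validateHeader(headerRow, required = REQUIRED_FIELDS) {
  const missing = missingColumns(headerRow, required);
  if (missing.length > 0) {
    throw new Error(`Missing required columns: ${missing.join(", ")}`);
  }
  // Only absent columns are rejected; blank cell values are still allowed.
  return true;
}
```

Column order and extra columns are tolerated in this sketch. Whether to reject extras is a separate design decision.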
In which case, we’d need to rethink my (bad) idea about using node labels instead of node IDs for edge imports.
(In reply to Kalani Craig, Dec 22, 2021, 2:47 PM)
|
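Kalani suggests switching from labels to node IDs for edge imports. If edges reference nodes by ID, an import must be self-consistent: each node ID must be unique, and every edge endpoint must name a node ID that exists. Below is a minimal sketch of those two checks. It assumes hypothetical `id` fields on nodes and `source`/`target` fields on edges, and the helper names are illustrative.

```javascript
// Hypothetical sketch: when edges reference nodes by id (not label), check
// that every endpoint exists among the known node ids, and that node ids
// are unique. Field names `id`, `source`, `target` are assumptions.
function findDanglingEdges(nodes, edges) {
  const knownIds = new Set(nodes.map((n) => n.id));
  return edges.filter((e) => !knownIds.has(e.source) || !knownIds.has(e.target));
}

function findDuplicateIds(nodes) {
  const seen = new Set();
  const dupes = new Set();
  for (const n of nodes) {
    if (seen.has(n.id)) dupes.add(n.id);
    seen.add(n.id);
  }
  return [...dupes];
}
```

Unlike labels, two nodes may share a label here without ambiguity. Note that CSV values arrive as strings, so ids would need one consistent type (e.g. all strings) before these comparisons.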
@kalanicraig This is getting complicated. It sounds like we should at least take a pass through the template editing design before we fully address this. But is this correct:
Main Use Model
Secondary Use Model
Tertiary Use Model
In all cases, I imagine it might be useful to have a Dry Run feature where you can test the import and get a report listing the nodes and edges that would be added or replaced. If you like the Dry Run results, you could then press Import to do the actual import?
|
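The Dry Run idea amounts to computing an import's effect without applying it. The minimal sketch below assumes overwrite-by-id semantics, and that both the current database and the import file expose `nodes` and `edges` arrays of objects with a unique `id`. `dryRunImport` and `dryRun` are hypothetical names.

```javascript
// Hypothetical sketch of a Dry Run: classify each imported record as
// "added" (new id) or "replaced" (id already exists) without changing
// anything. Assumes records carry a unique `id`; names are illustrative.
function dryRunImport(existingIds, incoming) {
  const known = new Set(existingIds);
  const report = { added: [], replaced: [] };
  for (const rec of incoming) {
    if (known.has(rec.id)) {
      report.replaced.push(rec.id);
    } else {
      report.added.push(rec.id);
      known.add(rec.id); // a later duplicate in the same file would replace it
    }
  }
  return report;
}

// Build separate reports for nodes and edges; neither input is modified.
function dryRun(db, importFile) {
  return {
    nodes: dryRunImport(db.nodes.map((n) => n.id), importFile.nodes),
    edges: dryRunImport(db.edges.map((e) => e.id), importFile.edges),
  };
}
```

Pressing Import would then apply exactly the changes listed in the report.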
I'll defer to Kalani but wanted to clarify: if we have 2 and 3 from the tertiary model, then really the only difference between the tertiary and secondary would be doing it "all at once," in which case I think we can drop the tertiary? Or am I missing something? Thanks!
|
I think the main difference in the tertiary model is the addition of the template file and not modifying an existing database.
Part of the reason I'm teasing these all out is to make sure that the workflow is supported by whatever scheme we come up with, especially if they require slightly different methods (e.g. creating a new db), and to bias the design toward one model vs. another (e.g. if you only rarely use the tertiary model, then it's OK if it's a little more difficult to do).
|
You’re right about the main difference, and I think that makes the tertiary model mostly unnecessary. We can handle it with documentation: create a new blank DB with the template creation and then work with the secondary model for import/export (it's just that the export would be blank).
(In reply to benloh, Jan 11, 2022, 1:26 PM)
|
@kalanicraig Just confirming, then, that with the emphasis on importing for the Feb 2022 pilots/tests, we want to prioritize:
1. Importing nodes/edges to an existing database
   The node/edge data might be created from scratch or created by first exporting existing nodes/edges (e.g. Main Use Model and Secondary Use Model, above).
2. Importing nodes/edges to a NEW database
   It sounds like this is not as urgent and can be handled via other existing means for editing templates and creating new databases (e.g. Tertiary Use Model, above). While this would be a nice addition, it requires a substantial amount of rework of both netcreate and nc-multiplex.
|
Correct.
(In reply to benloh, Jan 17, 2022, 1:46 PM)
|
@kalanicraig @jdanish One more question about importing: What kind of restrictions should we place on importing?
On a related note, I was assuming that we don't need to place similar restrictions on exporting. But perhaps you do want a way to lock a database, too, so that people can't arbitrarily export data?
|
If login is required to see the network, doesn't that mean you can't get to the import tab without logging in? Also, by the way, what are you using to edit the CSV files in your testing? The reason I ask is that we did a quick test and Excel appears to cause problems. Literally opening and saving a CSV in Excel seems to break the import, even without any intentional edits.
|
> If login is required to see the network, doesn't that mean you can't get to the import tab without logging in?
For a project that requires login, yes, you wouldn't see the tab. But for a project that doesn't require login, you see the tab immediately. However, we can still restrict it so that you have to log in to be able to import. We would just add an extra level of hiding: e.g. if you're not logged in, the import buttons are grayed out or missing. So, for example, you could allow users to import if they're logged in even if they are NOT admins.
> What are you using to edit the CSV files in your testing?
Export was broken: it was not properly accounting for missing data, so the fields were getting shifted. It's fixed in the latest branch (import), but a lot of other stuff is still broken.
I sometimes open the file directly in VSCode; other times I use Numbers and Excel.
But if you look at the exported data directly in VSCode and count the number of data points vs. the headers, you'll probably find that you're missing a few data points. That's what's causing the data corruption.
|
I’d say let’s require login as described.
And cool, we will test again when you tell us to.
Thanks!
Joshua Danish
http://www.joshuadanish.com
(In reply to benloh, Mar 5, 2022, 1:01 PM)
|
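benloh's diagnostic (count the data points in each row against the headers) is easy to automate. A check like this could run before an import to catch rows shifted by the export bug, and it may also catch some damage from re-saving in a spreadsheet if that damage shows up as missing values. Here is a minimal sketch. `splitCsvLine` and `findMisalignedRows` are hypothetical helpers, and the sketch assumes standard double-quote CSV quoting with no line breaks inside quoted values.

```javascript
// Hypothetical sketch: flag CSV rows whose number of values differs from the
// number of headers (missing data points shift later fields into the wrong
// columns). Assumes RFC 4180-style quoting and no line breaks inside quotes.
function splitCsvLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote ("") inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current); // unquoted comma ends the field
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

function findMisalignedRows(csvText) {
  // Accept both "\n" and "\r\n" line endings.
  const lines = csvText.split(/\r?\n/);
  const headerCount = splitCsvLine(lines[0]).length;
  const problems = [];
  for (let i = 1; i < lines.length; i++) {
    if (lines[i] === "") continue; // skip blank lines, e.g. a trailing newline
    const found = splitCsvLine(lines[i]).length;
    if (found !== headerCount) {
      // Report 1-based line numbers to match what a text editor shows.
      problems.push({ line: i + 1, expected: headerCount, found });
    }
  }
  return problems;
}
```

A blank value still counts as a data point (`2,Carthage,` has three fields). Only rows with too few or too many separators are flagged.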
Sorry, thinking this through some more, I can see a situation where you don't want any old user doing imports: e.g. it's the first class with 50 students, and everyone's working on a shared network. You don't want some wiseass to clobber the whole network.
But later on you might want to allow students to import mini-networks.
This suggests that we add a Template option allowImport. By default, it's false and only admins can import. If it's true, then anyone logged in can import. For a network that does not require login, the Import section is hidden or grayed out.
Or maybe I'm overthinking it?
|
I like this idea a lot. It lets admins turn import on once they have a sense of whether imports are a good or bad idea, and it sets up some control without making it too complicated.
(In reply to benloh, Mar 5, 2022, 1:32 PM)
|
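The `allowImport` rule proposed above fits in a few lines. This minimal sketch assumes the template is a plain object where `allowImport` may be absent (treated as the default `false`), and a user object with hypothetical `loggedIn`/`isAdmin` flags, or `null` for an anonymous visitor. It also folds in the earlier point that, on a network without required login, import stays unavailable until the user logs in.

```javascript
// Hypothetical sketch of the proposed `allowImport` Template option.
// Assumes a user object like { loggedIn, isAdmin } (or null for an
// anonymous visitor); these names are illustrative, not NetCreate's.
function canImport(template, user) {
  // Not logged in: the Import section stays hidden or grayed out, even on
  // a network that does not otherwise require login.
  if (!user || !user.loggedIn) return false;
  // Admins can always import.
  if (user.isAdmin) return true;
  // Everyone else needs allowImport, which defaults to false when absent.
  return template.allowImport === true;
}
```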
For the Feb 2022 pilots/tests, we want to prioritize:
1. Importing nodes/edges to an existing database
   The node/edge data might be created from scratch or created by first exporting existing nodes/edges (e.g. Main Use Model and Secondary Use Model, above).
2. Importing nodes/edges to a NEW database
   This may be addressed in the future. It will not be implemented for now, since there is a workaround via manual template editing and nc-multiplex.