Export #179
…ile that should not be in the repo.
…teredOut'. The parameters are redundant.
…gger the graph redraw.
…TA. D3DATA is the core data. We always only draw filtered data changes.
…his is necessary for table updates to display current filtered data.
…ecrease font size, adjust column size.
…ated via radius, which in turn is already based on edge count.
… does not crowd the "More..." button.
@jdanish @kalanicraig Alright, I think the layout issues have mostly been fixed now. The whole app can probably use a relayout/cleanup, especially for narrower screens, but we'll save that for the end. Fixes:
Please give it a whirl and if things look good, let's merge this so we can move on to import.
Kalani wrote:
@kalanicraig Some questions:
...you want this:
@kalanicraig One more question/issue on exports: For edges, I believe you had said that you wanted to be able to specify the source and target nodes via the node labels rather than the node ID numbers. The problem is that we HAVE to use the ID numbers because the labels are not guaranteed to be unique, e.g. you can have two nodes named "Alexandria". If you want to be able to link via labels only, then we need to:
1. Always check to make sure node labels are unique (e.g. when editing a node, if someone enters a duplicate label, we would need to prevent that and tell the user to enter something unique).
2. Change the edges export format to either also include labels with ID numbers, or remove the ID numbers.
This seems doable so long as it matches your workflow. I don't have a sense of the capabilities and limitations of the tools you're using outside of NetCreate to manipulate the data. So should we remove IDs from the edges and just use Source and Target labels AND change the editor to not allow duplicate node labels?
Item 2 would be the preferred option so that we can do easy lookups.
We still want the edge ID to go with the edge export, but yes, the edge ID should be at the end of the line so that less technical folks are focused on the Source and Target IDs as the important key-value references for the edge table.
ID (all caps) and Label (sentence case) in the nodes field, and "Source" and "Target" in the edges field (with IDs rather than labels in those columns) are the key pieces of Gephi and Cytoscape's imports. That's where I fail alllllllll the time, because I've forgotten to label them correctly. If we can fix those on import, it'll be a nice QoL even for technical folks. I usually title the Label columns "Source Label" and "Target Label" if I want an explicit human-readable value to go with the IDs in the edge table.
(Sent by email in reply to benloh's comment of Dec 22, 2021, 1:25 PM.)
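To make the uniqueness problem concrete, here is a minimal sketch of the duplicate-label check that option 1 would need. It is written in Python purely for illustration and is not NetCreate's code; the `id`/`label` record layout and the case-insensitive comparison are assumptions.

```python
from collections import defaultdict

def find_duplicate_labels(nodes):
    """Return {label: [ids]} for every label shared by more than one node."""
    ids_by_label = defaultdict(list)
    for node in nodes:
        # Assumption: labels are compared case-insensitively, ignoring
        # surrounding whitespace. The thread does not specify this.
        key = node["label"].strip().lower()
        ids_by_label[key].append(node["id"])
    return {label: ids for label, ids in ids_by_label.items() if len(ids) > 1}

# Hypothetical node records, using the "Alexandria" example from the comment.
nodes = [
    {"id": 1, "label": "Alexandria"},
    {"id": 2, "label": "Rome"},
    {"id": 3, "label": "Alexandria"},
]
duplicates = find_duplicate_labels(nodes)
print(duplicates)
```

An editor could run a check like this before saving a node and reject the label if it already maps to a different ID. Keeping numeric IDs in the Source and Target columns avoids the need for it.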
@kalanicraig I think the challenge here is that you're talking about three different types of labeling:
1. matching our internal code representation
2. matching Gephi/Cytoscape labeling
3. making the label human readable
For example, if we export edges with "Source Label", does Gephi/Cytoscape recognize that? Don't they need it to be "Source"? Do you need the ability to designate different label mappings? Or can we keep things simpler and just choose one (e.g. only have a Gephi/Cytoscape mapping, not also a human-readable one)? Also, I wanted to confirm that the attributes and meta information mappings are working. I wasn't sure how Gephi/Cytoscape handle those extra fields.
Right. So, given that list, I'd privilege interoperability between Gephi and Net.Create, with a side of human readability, over full interoperability with all possible systems.
Gephi node tables require:
- ID: numeric only
- Label: any
Gephi's edge table requires:
- Source: numeric ID from node table
- Target: numeric ID from node table
Gephi prefers a "Type" column in edge import that is "Directed" or "Undirected", but there's a batch setting in the import process itself that supports users in choosing directed/undirected.
Everything else that Gephi imports comes in or goes out as an attribute. If we prioritize items 1 and 2, then I imagine it would look something like this:
We highly recommend exporting from an existing database and using the output as a guide for import, with specific limits that require:
- NodeID matching for Source/Target
- EdgeID maintenance for existing edges that need to be modified in some way
On import, require:
NODE TABLE IMPORT:
- A numeric-only "ID" column and a "Label" column
- All other columns on import are matched to the attributes in the template, and the user gets a big giant warning that any columns that don't match existing attribute values in the template won't be imported.
EDGE TABLE IMPORT:
- "Source" and "Target" numeric ID columns that relate to entries in the node ID table
- EdgeID column with NetCreate's edgeID value for an edge that already exists, and blank for new edges
- Recommended SourceLabel and TargetLabel columns, which Net.Create import will ignore, so that the user can spot-check human-readable labels in import data against Net.Create import results
- All other columns on import are matched to the attributes in the template, and the user gets a big giant warning that any columns that don't match existing attribute values in the template won't be imported.
It's not super human readable, but it's got enough there that it would function with a little documentation (and now we have some of that in nascent form here to adapt).
(Sent by email in reply to benloh's comment of Jan 17, 2022, 2:13 PM.)
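The import requirements listed above could be checked along these lines. This is a rough Python sketch, not the NetCreate importer: the function name, the message wording, and the `template_attributes` set are hypothetical. Only the column names (ID, Label, Source, Target, EdgeID, SourceLabel, TargetLabel) come from the thread.

```python
import csv
import io

# Column names from the discussion; everything else here is an assumption.
RESERVED_COLUMNS = {"ID", "Label", "Source", "Target",
                    "EdgeID", "SourceLabel", "TargetLabel"}

def validate_import(nodes_csv, edges_csv, template_attributes):
    """Check node/edge CSV text and return (errors, warnings)."""
    errors, warnings = [], []

    # Node table: numeric-only ID and a non-empty Label.
    node_reader = csv.DictReader(io.StringIO(nodes_csv))
    node_ids = set()
    for line_no, row in enumerate(node_reader, start=2):  # line 1 is the header
        node_id = (row.get("ID") or "").strip()
        if node_id.isdecimal():
            node_ids.add(node_id)
        else:
            errors.append(f"nodes line {line_no}: ID must be numeric, got {node_id!r}")
        if not (row.get("Label") or "").strip():
            errors.append(f"nodes line {line_no}: Label is missing")

    # Edge table: Source and Target must reference IDs from the node table.
    edge_reader = csv.DictReader(io.StringIO(edges_csv))
    for line_no, row in enumerate(edge_reader, start=2):
        for column in ("Source", "Target"):
            ref = (row.get(column) or "").strip()
            if ref not in node_ids:
                errors.append(f"edges line {line_no}: {column} {ref!r} is not a node ID")

    # Any other column must match a template attribute, or it is skipped.
    columns = (node_reader.fieldnames or []) + (edge_reader.fieldnames or [])
    for column in columns:
        if column not in RESERVED_COLUMNS and column not in template_attributes:
            warnings.append(f"column {column!r} is not in the template and will not be imported")
    return errors, warnings

# Made-up sample data: "Mood" is not a template attribute, and edge 2 points at a missing node.
nodes_csv = "ID,Label,Node_Type,Mood\n1,Alexandria,City,calm\n2,Rome,City,busy\n"
edges_csv = ("Source,Target,EdgeID,SourceLabel,TargetLabel\n"
             "1,2,7,Alexandria,Rome\n"
             "1,9,,Alexandria,Carthage\n")
errors, warnings = validate_import(nodes_csv, edges_csv, {"Node_Type"})
print(errors)
print(warnings)
```

SourceLabel and TargetLabel are accepted but never read, which matches the suggestion that import should ignore them and leave them for human spot-checking.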
@kalanicraig I think we probably need to make some of this editable via the template too. Can you send me a prototypical Gephi export? I know you've sent one before, but let's start fresh with a real-world use case. I'm especially confused by the Edge Type. Should we be representing that internally as well? Popping up a level, it kind of seems like the ideal case is that we are able to import and export in Gephi format rather than some Net.Create proprietary format?
@kalanicraig @jdanish This may have gotten lost in the slew of updates/emails I was sending: I think we probably need to make some of this editable via the template too. Can you send me a prototypical Gephi export? I know you've sent one before, but let's start fresh with a real-world use case. I'd like to see what the raw file looks like (I'm assuming it's CSV). I'm especially confused by the Edge Type field. Should we be representing that internally as well? Popping up a level, it kind of seems like the ideal case is that we are able to import and export in Gephi format rather than some Net.Create proprietary format?
Hi! I'm attaching an Excel file that was used to import into Gephi as well as the export CSVs that came from an export of that network from Gephi. Bonus points for mixed character sets and some double quotes.
I also used this Excel file to concatenate JSON lines for the Nodes and Edges, but that document is in a colleague's OneDrive and I can't get to it right now.
(Sent by email in reply to benloh's comment of Wed, Jan 19, 2022, 12:58 PM.)
…laced by a template definition.
@kalanicraig Thanks! Unfortunately, replying to GitHub by email doesn't attach the file. You'll have to either log into GitHub and attach it there, or you can just email me directly. Thanks!
Merging export for now.
IMPORTANT: Merge #169 before merging this!
To Do
Collapsible Search/Node -- Won't do, not necessary (maybe for future)
This implements #177.
Branch: `dev-bl/export`
This adds the ability to Export and download a CSV file to a local file.
Prototype Implementation
Currently this takes a preliminary approach to exporting. We can work out exactly how you want this to work.
To Test
git fetch; git checkout dev-bl/export
./nc.js --dataset=junk
http://localhost:3000
Where to put the Export button?
There is now an "Export" button in the "Help" tab.
I didn't want to add another tab because we already have so many tabs, and on narrower windows the tabs do not flow well. We can revisit where to put the button.
What is exported?
To keep things simple, we currently just export whatever is being drawn on the graph. Any nodes/edges that are filtered are not exported. Any nodes/edges that are highlighted ARE exported but there is no marker to indicate that they are highlighted.
We can add a second button "EXPORT FILTERED DATA" to "EXPORT DATA" if you think we need that distinction.
What is the export format?
Rather than exporting two separate files, I've put all the data in one file, something like this:
It seems more convenient to be able to keep things in a single file. You can easily copy and paste to move them as needed.
If they should be exported as two separate files, let me know.
What is the CSV format?
Currently we're just building up the nodes and edges data based on the data in the database. The specific fields that are exported can easily be configured via the code. But right now there is no end-user customization available.
It's not clear we want to support end-user customization as that can get pretty complicated. The issue is that the data is not a simple flat table but consists of nested fields, so creating the index for the fields is something only an expert should do.
How to handle 'attributes' fields?
Our file format uses nested attributes fields for each node/edge record. I believe this came from whichever package we were originally importing the data from (Gephi?). e.g. a node record might look like this:
When exporting the data, we flatten out the data so that everything can fit within a single record. We do this by adding the 'attributes' tag to the field names, e.g. we have `attributes:Node_Type`, `attributes:Extra Info`, and `attributes:Notes`, e.g.:
Let me know if you need a different way of handling that data.
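As a rough illustration of the flattening described above, here is a Python sketch (not the actual export code). The record layout is hypothetical; only the `attributes:` prefix and the three field names come from the description.

```python
def flatten_record(record, nested_key="attributes"):
    """Return a flat copy of record with 'attributes:<name>' columns."""
    flat = {}
    for key, value in record.items():
        if key == nested_key and isinstance(value, dict):
            # Promote each nested field to a top-level column,
            # prefixed with the name of the nested object.
            for name, nested_value in value.items():
                flat[f"{nested_key}:{name}"] = nested_value
        else:
            flat[key] = value
    return flat

# Hypothetical node record; the real schema may differ.
node = {
    "id": 1,
    "label": "Alexandria",
    "attributes": {"Node_Type": "City", "Extra Info": "", "Notes": "Port"},
}
flat_node = flatten_record(node)
print(list(flat_node))
```

The keys of the flattened record can serve directly as CSV column headers, and an importer can reverse the mapping by splitting each header on the first colon.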
How are sources and targets referenced in edges?
Currently, we're using numeric IDs to reference sources and targets in edges. We can also add labels if needed, but that does get more complicated. Using IDs seemed like the most efficient format.
Let me know if you need labels as well.
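If labels are wanted alongside the IDs, one possible approach is to look them up from the node list at export time. The Python sketch below is illustrative only: the `source`/`target`/`id` field names are assumptions, and the column names and the position of the edge ID follow suggestions made in this thread.

```python
def add_endpoint_labels(edges, nodes):
    """Build edge rows keyed by node ID, with labels added for readability."""
    label_by_id = {node["id"]: node["label"] for node in nodes}
    rows = []
    for edge in edges:
        rows.append({
            "Source": edge["source"],  # numeric node ID remains the real key
            "Target": edge["target"],
            # Labels are informational only; unknown IDs get an empty label.
            "SourceLabel": label_by_id.get(edge["source"], ""),
            "TargetLabel": label_by_id.get(edge["target"], ""),
            "EdgeID": edge["id"],      # kept last, as suggested in the thread
        })
    return rows

# Hypothetical records for illustration.
nodes = [{"id": 1, "label": "Alexandria"}, {"id": 2, "label": "Rome"}]
edges = [{"id": 10, "source": 1, "target": 2}]
rows = add_endpoint_labels(edges, nodes)
print(rows[0])
```

Because the IDs stay in the Source and Target columns, duplicate labels such as two "Alexandria" nodes remain unambiguous.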
How to handle Commas, Quotes, and Special Characters?
To keep things simple, right now all fields are wrapped in double quotes ("") which should theoretically support commas in descriptions. We can strip out other characters if needed, but I wasn't sure how aggressive we should be.
The critical symbols are probably:
I expect to tackle the encoding issues later.
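For reference, here is a small Python sketch of the wrap-everything-in-double-quotes approach using the standard `csv` module. This is not the export code itself; the header and sample row are made up. Under the usual CSV convention, a double quote inside a field is escaped by doubling it, and `csv.QUOTE_ALL` does this automatically.

```python
import csv
import io

def rows_to_csv(header, rows):
    """Serialize rows to CSV text with every field wrapped in double quotes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

# A note containing both a comma and embedded double quotes.
text = rows_to_csv(
    ["ID", "Label", "attributes:Notes"],
    [[1, "Alexandria", 'A "big" city, on the coast']],
)
print(text)

# Reading the text back recovers the original field values (as strings).
parsed = list(csv.reader(io.StringIO(text)))
```

Quoting every field keeps commas safe, but embedded quotes still need the doubling step. Without it, a spreadsheet or Gephi import may split the field in the wrong place.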
How to set the default filename?
By default, the filename is '_export.csv'. We can use a different default filename if you prefer.
How to display date and time?
Both nodes and edges keep track of `created` and `updated` date and time information. We export the created and updated dates in UTC format. We can relatively easily select a different format if you prefer. Or of course we can remove dates altogether.
Any other issues?