Export #179

benloh · 2021-11-19T01:45:11Z

IMPORTANT: Merge #169 before merging this!

To Do

Export as two files dataset_nodes.csv / dataset_edges.csv
Combine "Vocabulary" and "Help" as "More"
Add TOC for "More"
~~Collapsible Search/Node~~ -- Won't do, not necessary (maybe for future)
Autocollapse Search/Node when Filter is opened for smaller screens
Filter display in collapsible right panel

This implements #177.

Branch: dev-bl/export

This adds the ability to export the graph data and download it as a CSV file to your local drive.

Prototype Implementation

Currently this takes a preliminary approach to exporting. We can work out exactly how you want this to work.

To Test

git fetch; git checkout dev-bl/export
Start NetCreate with your own data, e.g. ./nc.js --dataset=junk
Go to http://localhost:3000
Log in if necessary
Click the "Help" tab
Click the "Export" button.
Set the download name.
Click "Save" to save the CSV file to your local hard drive.
Open the CSV in Excel

Where to put the Export button?

There is now an "Export" button in the "Help" tab.

I didn't want to add another tab because we already have so many tabs, and on narrower windows the tabs do not flow well. We can revisit where to put the button.

What is exported?

To keep things simple, we currently just export whatever is being drawn on the graph. Any nodes/edges that have been filtered out are not exported. Any nodes/edges that are highlighted ARE exported, but there is no marker to indicate that they are highlighted.

We can add a second button "EXPORT FILTERED DATA" to "EXPORT DATA" if you think we need that distinction.

What is the export format?

Rather than exporting two separate files, I've put all the data in one file, something like this:

NODES
<node headers>
<nodes>

EDGES
<edge headers>
<edges>

It seems more convenient to be able to keep things in a single file. You can easily copy and paste to move them as needed.

If they should be exported as two separate files, let me know.
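As a rough illustration of the single-file layout described above, here is a minimal sketch. The function name `buildCombinedExport` and its parameters are hypothetical and not taken from the branch; it assumes the field values contain no double quotes, so it only wraps each field in quotes.

```javascript
// Hypothetical helper (not from the branch): lays out nodes and edges in one
// CSV file, with a "NODES" section followed by an "EDGES" section.
// Assumes field values contain no double quotes; each field is simply wrapped.
function buildCombinedExport(nodeHeaders, nodeRows, edgeHeaders, edgeRows) {
  const toLine = (fields) => fields.map((field) => `"${field}"`).join(",");
  const lines = [
    "NODES",
    toLine(nodeHeaders),
    ...nodeRows.map(toLine),
    "", // blank line separates the two sections
    "EDGES",
    toLine(edgeHeaders),
    ...edgeRows.map(toLine),
  ];
  return lines.join("\n");
}

// Sample data below is made up for illustration.
console.log(
  buildCombinedExport(
    ["id", "label"],
    [["1", "Alexandria"]],
    ["id", "source", "target"],
    [["10", "1", "2"]]
  )
);
```

Splitting this into two files later would only mean writing the two halves of `lines` separately.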

What is the CSV format?

Currently we're just building up the nodes and edges data based on the data in the database. The specific fields that are exported can easily be configured via the code. But right now there is no end-user customization available.

It's not clear we want to support end-user customization as that can get pretty complicated. The issue is that the data is not a simple flat table but consists of nested fields, so creating the index for the fields is something only an expert should do.

How to handle 'attributes' fields?

Our file format uses nested attributes fields for each node/edge record. I believe this came from whichever package we were originally importing the data from (Gephi?). For example, a node record might look like this:

node = {
  id,
  label,
  attributes: {
    Node_Type,
    Extra Info,
    Notes,
  }
}

When exporting, we flatten the data so that everything fits within a single record. We do this by prefixing the nested field names with 'attributes', giving attributes:Node_Type, attributes:Extra Info, and attributes:Notes, e.g.:

node = [
  id,
  label,
  attributes:Node_Type,
  attributes:Extra Info,
  attributes:Notes
]

Let me know if you need a different way of handling that data.
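The flattening described above could be sketched roughly as follows. This is an illustration only, not the code in the branch: `flattenRecord` is a hypothetical name, and the sample values are made up.

```javascript
// Hypothetical sketch: flatten one level of nested fields into
// "parent:child" column names, e.g. attributes.Notes -> "attributes:Notes".
function flattenRecord(record, separator = ":") {
  const flat = {};
  for (const [key, value] of Object.entries(record)) {
    const isNested =
      value !== null && typeof value === "object" && !Array.isArray(value);
    if (isNested) {
      for (const [subKey, subValue] of Object.entries(value)) {
        flat[`${key}${separator}${subKey}`] = subValue;
      }
    } else {
      flat[key] = value;
    }
  }
  return flat;
}

// Made-up sample node using the field names from the description above.
const sampleNode = {
  id: 1,
  label: "Alexandria",
  attributes: { Node_Type: "Place", "Extra Info": "", Notes: "" },
};
console.log(Object.keys(flattenRecord(sampleNode)));
```

Only one level of nesting is handled here, which matches the record shape shown above; deeper nesting would need a recursive version.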

How are source and targets referenced in Edge nodes?

Currently, we're using numeric IDs to reference source and targets in edges. We can also add labels if needed, but that does get more complicated. Using ids seemed like the most efficient format.

Let me know if you need labels as well.

How to handle Commas, Quotes, and Special Characters?

To keep things simple, right now all fields are wrapped in double quotes ("") which should theoretically support commas in descriptions. We can strip out other characters if needed, but I wasn't sure how aggressive we should be.

The critical symbols are probably:

commas
double quotes -- depending on the application reading the final csv, we might be able to replace double quotes with single quotes, or otherwise encode them.
line feeds/carriage returns -- how do you want to preserve these? line feeds are the record delimiters in CSV.
control characters

I expect to tackle the encoding issues later.
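For reference when we get to the encoding work, one common convention (the one described in RFC 4180) is to keep every field wrapped in double quotes and to escape an embedded double quote by doubling it. The sketch below assumes that convention; the helper names are hypothetical, and whether Excel, Gephi, and Cytoscape all read embedded line breaks correctly is something we would still need to test.

```javascript
// Hypothetical helpers sketching RFC 4180-style quoting.
// Every field is wrapped in double quotes; embedded double quotes are doubled.
// Commas and line breaks are left as-is because they sit inside the quotes.
function quoteCsvField(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return `"${text.replace(/"/g, '""')}"`;
}

function toCsvLine(fields) {
  return fields.map(quoteCsvField).join(",");
}

// Made-up sample row with a comma and double quotes in the second field.
console.log(toCsvLine([595, 'He said "no", then left', ""]));
```

This avoids replacing double quotes with single quotes, so the original text survives a round trip through a reader that follows the same convention.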

How to set the default filename?

By default, the filename is '_export.csv'. We can use a different default filename if you prefer.

How to display date and time?

Both nodes and edges keep track of created and updated date and time information. We export the created and updated dates in UTC format. We can relatively easily select a different format if you prefer. Or of course we can remove dates altogether.

Thu, 11 Nov 2021 23:56:54 GMT
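The sample above has the same shape as the output of JavaScript's built-in `Date.prototype.toUTCString()`. As a sketch of what an alternative could look like, `toISOString()` produces a sortable ISO 8601 string instead. The snippet only uses the standard `Date` API and is not taken from the branch.

```javascript
// Build the sample timestamp in UTC (months are zero-based, so 10 = November).
const sampleCreated = new Date(Date.UTC(2021, 10, 11, 23, 56, 54));

// Same shape as the exported sample: "Thu, 11 Nov 2021 23:56:54 GMT"
console.log(sampleCreated.toUTCString());

// ISO 8601 alternative: "2021-11-11T23:56:54.000Z"
console.log(sampleCreated.toISOString());
```

Note that the UTC string form contains a comma, so it relies on the field being quoted in the CSV.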

Any other issues?

benloh · 2021-12-18T00:17:04Z

@jdanish @kalanicraig Alright, I think the layout issues have mostly been fixed now. The whole app can probably use a relayout/cleanup, especially for narrower screens, but we'll save that for the end.

Fixes:

FILTER button no longer crowds the "More..." button
The Node and Edge Table resizer has been restored to its correct position (the drag position was off by 40px)
The Node and Edge Table heights are now correct (the scrollbar would extend beyond the bottom of the visible area by 40px).
The Node and Edge Table headers now stay fixed as the table body scrolls.
The Search label "Type to search or add a node:" no longer crowds the "Add New Node" button.

Please give it a whirl and if things look good, let's merge this so we can move on to import.

benloh · 2021-12-20T20:01:45Z

Kalani wrote:

I tested export several times and the only thing I see happening is with the ID label in the nodes and edges export. The nodes “id” field needs to be uppercase “ID” and the ID field for the edges should be at the end of the export column list. The Source and Target ID fields in the edge table need to be labeled with the sentence case.

benloh · 2021-12-21T18:03:19Z

@kalanicraig Some questions:

> The nodes “id” field needs to be uppercase “ID”
Do you mean the header needs to say "ID"?

> the ID field for the edges should be at the end of the export column list
Here you're saying that for exported edges, you want the ID field to be the last item in the list? e.g.
For ID = 595, instead of this:

// OLD
"595","The Briber","","","","0","Fri, 04 Sep 2020 13:53:39 GMT",

...you want this:

// NEW
"The Briber","","","","0","Fri, 04 Sep 2020 13:53:39 GMT","595"

> The Source and Target ID fields in the edge table need to be labeled with the sentence case.
I'm not sure what you're referring to here. Are you saying in the exported edges csv the header should read `Source` and `Target` instead of the current `source` and `target`?

benloh · 2021-12-22T18:25:21Z

@kalanicraig One more question/issue on exports: For edges I believe you had said that you wanted to be able to specify the source and target nodes via the node labels rather than the node ID numbers. The problem is that we HAVE to use the ID numbers because the labels are not guaranteed to be unique, e.g. you can have two nodes named "Alexandria". If you want to be able to link via labels only, then we need to:

Always check to make sure node labels are unique (e.g. when editing a node, if someone enters a duplicate label, we would need to prevent that and tell the user to enter something unique).
Change the edges export format to either also include labels with id numbers, or remove the id numbers.

This seems doable so long as it matches your workflow. I don't have a sense of the capabilities and limitations of the tools you're using outside of NetCreate to manipulate the data.

So should we remove IDs from the edges and just use Source and Target labels AND change the editor to not allow duplicate node labels?
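To make the uniqueness problem concrete, here is a minimal sketch of the kind of check the editor would need before labels could serve as keys. `findDuplicateLabels` is a hypothetical name, and the sample nodes are made up apart from the "Alexandria" case mentioned above.

```javascript
// Hypothetical check: report every label that is shared by more than one node.
function findDuplicateLabels(nodes) {
  const idsByLabel = new Map();
  for (const node of nodes) {
    const ids = idsByLabel.get(node.label) || [];
    ids.push(node.id);
    idsByLabel.set(node.label, ids);
  }
  return [...idsByLabel]
    .filter(([, ids]) => ids.length > 1)
    .map(([label, ids]) => ({ label, ids }));
}

// Made-up sample: two different nodes share the label "Alexandria".
const sampleNodes = [
  { id: 1, label: "Alexandria" },
  { id: 2, label: "Rome" },
  { id: 3, label: "Alexandria" },
];
console.log(findDuplicateLabels(sampleNodes));
```

The comparison here is exact and case-sensitive; whether "alexandria" should count as a duplicate of "Alexandria" is a separate decision.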

kalanicraig · 2021-12-22T19:54:19Z

Item 2 would be the preferred option so that we can do easy lookups. We still want the edge ID to go with the edge export, but yes, the edge ID should be at the end of the line so that less technical folks are focused on the Source and Target IDs as the important key-value references for the edge table. ID (all caps) and Label (sentence case) in the nodes field, and “Source” and “Target” in the edges field (with IDs rather than labels in those columns) are the key pieces of Gephi and Cytoscape’s imports. That’s where I fail alllllllll the time, because I’ve forgotten to label them correctly. If we can fix those on import, it’ll be a nice QoL even for technical folks. I usually title the Label columns “Source Label” and “Target Label” if I want an explicit human-readable value to go with the IDs in the edge table.

benloh · 2022-01-17T19:13:33Z

@kalanicraig I think the challenge here is that you're talking about three different types of labeling:

matching our internal code representation
matching Gephi/Cytoscape labeling
making the label human readable.

For example, if we export edges with "Source Label", does Gephi/Cytoscape recognize that? Don't they need it to be "Source"?

Do you need the ability to designate different label mappings? Or can we keep things simpler and just choose one (e.g. only a Gephi/Cytoscape mapping, not also a human-readable one)?

Also, I wanted to confirm that the attributes and meta information mappings are working. I wasn't sure how Gephi/Cytoscape handle those extra fields.

kalanicraig · 2022-01-17T20:43:31Z

Right. So, given that list, I’d privilege interoperability between Gephi and Net.Create, with a side of human readability, over full interoperability with all possible systems.

Gephi node tables require:
ID: numeric only
Label: Any

Gephi’s edge table requires:
Source: numeric ID from node table
Target: numeric ID from node table
* Gephi prefers a “Type” column in edge import that is “Directed” or “Undirected” but there’s a batch setting in the import process itself that supports users in choosing directed/undirected

Everything else that Gephi imports comes in or goes out as an attribute.

If we prioritize items 1 and 2, then I imagine it would look something like this:

We highly recommend exporting from an existing database and using the output as a guide for import, with specific limits that require:
NodeID matching for Source/Target
EdgeID maintenance for existing edges that need to be modified in some way

On import, require:

NODE TABLE IMPORT:
Numeric-only “ID” and “Label” columns
All other columns on import are matched to the attributes in the template and user gets big giant warning that any columns that don’t match existing attribute values in the template won’t be imported.

EDGE TABLE IMPORT:
“Source” and “Target” numericID columns that relate to entry in nodeID table
EdgeID column with NetCreate’s edgeID value for an edge that already exists and blank for new edges
Recommended SourceLabel and TargetLabel which Net.Create import will ignore so that user can spot-check human-readable labels in import data against Net.Create import results
All other columns on import are matched to the attributes in the template and user gets big giant warning that any columns that don’t match existing attribute values in the template won’t be imported.

It’s not super human readable, but it’s got enough there that it would function with a little documentation (and now we have some of that in nascent form here to adapt)
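The required-column rules above could be checked with something like the sketch below. It is only an illustration of the header and numeric-ID requirements as stated in this comment: the names `REQUIRED_HEADERS`, `missingHeaders`, and `isNumericId` are hypothetical, and matching the remaining columns against the template is not covered.

```javascript
// Hypothetical sketch of the import checks described above.
// Header names are matched exactly (case-sensitive), as Gephi expects.
const REQUIRED_HEADERS = {
  nodes: ["ID", "Label"],
  edges: ["Source", "Target"],
};

// Returns the required headers that are absent from an import file.
function missingHeaders(tableType, headers) {
  return REQUIRED_HEADERS[tableType].filter(
    (required) => !headers.includes(required)
  );
}

// IDs must be numeric only (digits, nothing else).
function isNumericId(value) {
  return /^\d+$/.test(String(value).trim());
}

// Made-up sample: a lowercase "id" header does not satisfy the "ID" rule.
console.log(missingHeaders("nodes", ["id", "Label", "Notes"]));
console.log(isNumericId("595"), isNumericId("n595"));
```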

benloh · 2022-01-17T20:57:48Z

@kalanicraig I think we probably need to make some of this editable via the template too.

Can you send me a prototypical Gephi export? I know you've sent one before, but let's start fresh with a real-world use case.

I'm especially confused by the Edge Type. Should we be representing that internally as well?

Popping up a level, it kind of seems like the ideal case is that we are able to import and export in Gephi format rather than some Net.Create proprietary format?

benloh · 2022-01-19T17:58:15Z

@kalanicraig @jdanish This may have gotten lost in the slew of updates/emails I was sending:

I think we probably need to make some of this editable via the template too.

Can you send me a prototypical Gephi export? I know you've sent one before, but let's start fresh with a real-world use case. I'd like to see what the raw file looks like (I'm assuming it's csv).

I'm especially confused by the Edge Type field. Should we be representing that internally as well?

Popping up a level, it kind of seems like the ideal case is that we are able to import and export in Gephi format rather than some Net.Create proprietary format?

kalanicraig · 2022-01-19T18:10:17Z

Hi! I'm attaching an Excel file that was used to import into Gephi as well as the export CSVs that came from an export of that network from Gephi. Bonus points for mixed character sets and some double quotes. I also used this Excel file to concatenate JSON lines for the Nodes and Edges but that document is in a colleague's OneDrive and I can't get to it right now.

benloh · 2022-01-19T18:18:59Z

@kalanicraig Thanks! Unfortunately, replying to GitHub by email doesn't attach the file. You'll have to either log into GitHub and attach it there, or you can just email me directly. Thanks!

benloh · 2022-01-19T18:24:48Z

Merging export for now.
This has a hacked in override for defining headers, e.g. id is exported as ID. This will be replaced with a template definition with #175.
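For readers following along, the header override and the "edge ID last" ordering discussed in this thread might look roughly like the sketch below. This is not the code that was merged: `HEADER_OVERRIDES`, `exportHeader`, and `moveIdLast` are hypothetical names, and the mapping only covers the headers named in the discussion.

```javascript
// Hypothetical sketch: map internal field names to the export headers
// requested in this thread; anything not listed is exported unchanged.
const HEADER_OVERRIDES = new Map([
  ["id", "ID"],
  ["label", "Label"],
  ["source", "Source"],
  ["target", "Target"],
]);

function exportHeader(field) {
  return HEADER_OVERRIDES.has(field) ? HEADER_OVERRIDES.get(field) : field;
}

// For edges, move the "id" field to the end of the column list.
function moveIdLast(fields) {
  const rest = fields.filter((field) => field !== "id");
  return fields.includes("id") ? [...rest, "id"] : rest;
}

// Made-up sample edge field list.
const edgeFields = ["id", "source", "target", "attributes:Notes"];
console.log(moveIdLast(edgeFields).map(exportHeader));
```

Keeping the mapping in one table is what would make it straightforward to move into the template definition later.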

jdanish and others added 30 commits February 4, 2021 09:28

Merge pull request #162 from netcreateorg/dev

86ce3d1

Dev

filter: Stop tracking netcreate-config.js. This is an autogenerated f…

e2c1bdf

…ile that should not be in the repo.

filter: Move comment to the correct code block.

23ee166

filter: lint - disable complexity complaint.

7287873

filter: Rename "Filters" panel to "Highlight"

a47e7ee

filter: Add Filter tab.

15f84f2

filter: Add 'filterAction' to Filter components.

3288d1d

filter: Show different help text based on filter action.

ac62e84

filter: Disallow input if filter operator is not yet selected.

79a8bf8

filters: Just use 'filteredTransparency' instead of also using 'isFil…

c8426b5

…teredOut'. The parameters are redundant.

filters: Move 'componentWillUnmount' call to proper order (before cus…

a97cce6

…tom methods)

filters: Combine Highlight and Filter into a single tab with a toggle…

6bed8f2

… button.

filters: Update FILTERED_D3DATA when D3DATA is updated. This will tri…

3d43e0b

…gger the graph redraw.

filters: Only do graph redraw on FILTERED_D3DATA changes, not on D3DA…

b6f67c8

…TA. D3DATA is the core data. We always only draw filtered data changes.

filter: Rewrite m_FiltersApplyTo* to handle either highlighting or …

60f9245

…filtering.

filter: Clarify filter behavior.

45fc055

filter: Fix InfoPanel tab indices.

4b0cf18

filter: Disable select filter value if NO_OP.

a87673e

filter: Specify summary type. Remove summary field if no filters are …

5285d60

…set.

filter: Lint/clean up formatting.

59d36b3

filter: Change FILTERED_D3DATA message to FILTEREDD3DATA app state. T…

26454d6

…his is necessary for table updates to display current filtered data.

filter: Lint

2dde6a2

filter: Improve NodeTable and EdgeTable layout. Tighten up spacing, d…

8b64258

…ecrease font size, adjust column size.

filter: Remove edgeCount code, replaced by degrees, which is calcul…

1d30df4

…ated via radius, which in turn is already based on edge count.

filter: Lint/doc.

a246604

filter: Remove stray edgeCount refernces.

32f5196

filter: Show highlighted/filtered state in NodeTable.

897f553

filter: Show highlighted/filtered state in EdgeTable

2588af7

filter: Doc

ced9393

filter: Prevent "Enter" on Filters from submitting form. Addresses #172.

6e491d5

benloh added 11 commits December 17, 2021 09:56

export: Lint fixes.

9cb693c

export: Remove unused bIgnoreTableUpdates hack.

d73d447

export: Fix InfoPanel dragger height calculation. Addresses #182.

733d202

export: Use 'const' not 'let'

5250199

export: Lint fixes.

5f2d8de

export: Make NodeTable and EdgeTable headers sticky so they don't scr…

fed9445

…oll offscreen.

export: Remove Vocabular from InfoPanel

ab69101

export: Fix Search label overlapping "Add Node" button. Addresses #178

e8915b9

export: InfoPanel tab buttons are now narrower so the "FILTER" button…

7ae2478

… does not crowd the "More..." button.

export: Lint

112dba8

export: Clear DBG state.

bb9ed70

netcreateorg deleted a comment from Kalani Dec 21, 2021

benloh mentioned this pull request Jan 19, 2022

Improve Template Editing #175

Open

38 tasks

export: Hacky override export headers. This will be eventually be rep…

6f2a1ed

…laced by a template definition.

benloh merged commit 3ed097f into dev Jan 19, 2022

Version 1.4 automation moved this from In Review to Done Jan 19, 2022

benloh mentioned this pull request Jan 20, 2022

Gephi Export/Import Format #190

Open

benloh deleted the dev-bl/export branch January 22, 2022 21:43
