Export #179

Merged
merged 68 commits into from
Jan 19, 2022

benloh
Collaborator

@benloh benloh commented Nov 19, 2021

IMPORTANT: Merge #169 before merging this!

To Do

  • Export as two files dataset_nodes.csv / dataset_edges.csv
  • Combine "Vocabulary" and "Help" as "More"
  • Add TOC for "More"
  • Collapsible Search/Node -- Won't do, not necessary (maybe for future)
  • Autocollapse Search/Node when Filter is opened for smaller screens
  • Filter display in collapsible right panel

This implements #177.

Branch: dev-bl/export

This adds the ability to export the graph data and download it as a CSV file on your local machine.

Prototype Implementation

Currently this takes a preliminary approach to exporting. We can work out exactly how you want this to work.

To Test

  1. git fetch; git checkout dev-bl/export
  2. Start NetCreate with your own data, e.g. ./nc.js --dataset=junk
  3. Go to http://localhost:3000
  4. Log in if necessary
  5. Click the "Help" tab
  6. Click the "Export" button.
  7. Set the download name.
  8. Click "Save" to save the CSV file to your local hard drive.
  9. Open the CSV in Excel

Where to put the Export button?

There is now an "Export" button in the "Help" tab.

I didn't want to add another tab because we already have so many tabs, and on narrower windows the tabs do not flow well. We can revisit where to put the button.

What is exported?

To keep things simple, we currently just export whatever is being drawn on the graph. Any nodes/edges that are filtered are not exported. Any nodes/edges that are highlighted ARE exported but there is no marker to indicate that they are highlighted.

We can add a second button "EXPORT FILTERED DATA" to "EXPORT DATA" if you think we need that distinction.

What is the export format?

Rather than exporting two separate files, I've put all the data in one file, something like this:

NODES
<node headers>
<nodes>

EDGES
<edge headers>
<edges>

It seems more convenient to be able to keep things in a single file. You can easily copy and paste to move them as needed.

If they should be exported as two separate files, let me know.
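
For discussion, here is a minimal sketch of the single-file layout described above. The helper names (toCsvRow, buildSection, buildExport) are hypothetical and are not the functions in this branch; the sketch assumes records have already been flattened and does not handle embedded quotes.

```javascript
// Hypothetical helper: wrap every field in double quotes, as this PR does.
// Embedded double quotes are NOT handled in this sketch.
function toCsvRow(values) {
  return values.map(v => `"${String(v ?? '')}"`).join(',');
}

// Hypothetical helper: one section = title line, header row, data rows.
function buildSection(title, headers, records) {
  const rows = records.map(r => toCsvRow(headers.map(h => r[h])));
  return [title, toCsvRow(headers), ...rows].join('\n');
}

// Combine the NODES and EDGES sections, separated by a blank line.
function buildExport(nodes, edges, nodeHeaders, edgeHeaders) {
  return [
    buildSection('NODES', nodeHeaders, nodes),
    buildSection('EDGES', edgeHeaders, edges)
  ].join('\n\n');
}
```

Splitting into two files later would only mean calling the section builder once per file instead of joining the two sections.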

What is the CSV format?

Currently we're just building up the nodes and edges data based on the data in the database. The specific fields that are exported can easily be configured via the code. But right now there is no end-user customization available.

It's not clear we want to support end-user customization as that can get pretty complicated. The issue is that the data is not a simple flat table but consists of nested fields, so creating the index for the fields is something only an expert should do.

How to handle 'attributes' fields?

Our file format uses nested attributes fields for each node/edge record. I believe this came from whichever package we were originally importing the data from (Gephi?). For example, a node record might look like this:

node = {
  id,
  label,
  attributes: {
    Node_Type,
    Extra Info,
    Notes,
  }
}

When exporting, we flatten each record so that everything fits within a single row. We do this by prefixing the nested field names with 'attributes:', which gives attributes:Node_Type, attributes:Extra Info, and attributes:Notes, e.g.:

node = [
  id,
  label,
  attributes:Node_Type,
  attributes:Extra Info,
  attributes:Notes
]

Let me know if you need a different way of handling that data.
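
A rough sketch of the flattening step, to make the mapping concrete. flattenRecord is a hypothetical name, not the function in this branch, and the sample attribute values are made up for illustration.

```javascript
// Hypothetical helper: copy top-level fields as-is and lift each entry of
// `attributes` to a top-level key prefixed with "attributes:".
function flattenRecord(record) {
  const { attributes, ...rest } = record;
  const flat = { ...rest };
  for (const [key, value] of Object.entries(attributes || {})) {
    flat[`attributes:${key}`] = value;
  }
  return flat;
}

// Sample record; the attribute values are invented for this example.
const node = {
  id: 595,
  label: 'The Briber',
  attributes: { Node_Type: 'Person', 'Extra Info': '', Notes: '' }
};
const flatNode = flattenRecord(node);
```

The keys of flatNode can then serve directly as the CSV header row.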

How are sources and targets referenced in edges?

Currently, we're using numeric IDs to reference source and targets in edges. We can also add labels if needed, but that does get more complicated. Using ids seemed like the most efficient format.

Let me know if you need labels as well.
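
If labels are wanted, one possible approach is to look them up by ID at export time. This is only a sketch: addEndpointLabels and the sourceLabel/targetLabel column names are hypothetical, and it assumes each edge stores numeric node IDs in source and target.

```javascript
// Hypothetical helper: add sourceLabel/targetLabel columns by looking up each
// edge's numeric source/target ID in a Map built from the node list.
function addEndpointLabels(edges, nodes) {
  const labelById = new Map(nodes.map(n => [n.id, n.label]));
  return edges.map(e => ({
    ...e,
    sourceLabel: labelById.get(e.source) ?? '', // empty if the node is missing
    targetLabel: labelById.get(e.target) ?? ''
  }));
}
```

The IDs stay in the export, so the labels are purely informational and duplicate labels do not break the links.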

How to handle Commas, Quotes, and Special Characters?

To keep things simple, right now all fields are wrapped in double quotes ("") which should theoretically support commas in descriptions. We can strip out other characters if needed, but I wasn't sure how aggressive we should be.

The critical symbols are probably:

  • commas
  • double quotes -- depending on the application reading the final csv, we might be able to replace double quotes with single quotes, or otherwise encode them.
  • line feeds/carriage returns -- how do you want to preserve these? line feeds are the record delimiters in CSV.
  • control characters

I expect to tackle the encoding issues later.
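
As a starting point for that discussion, the common CSV convention (RFC 4180) is to keep every field quoted and to double any embedded double quote. The sketch below illustrates that convention; it is not what the branch currently does, and escapeCsvField is a hypothetical name. Control characters are left untouched.

```javascript
// Hypothetical helper following the RFC 4180 convention: wrap the field in
// double quotes and double any embedded double quote. Commas and line breaks
// are then legal inside the quoted field.
function escapeCsvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return `"${text.replace(/"/g, '""')}"`;
}
```

Whether the applications that read the file accept line breaks inside quoted fields would still need to be checked.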

How to set the default filename?

By default, the filename is '_export.csv'. We can use a different default filename if you prefer.

How to display date and time?

Both nodes and edges keep track of created and updated date and time information. We export the created and updated dates in UTC format. We can relatively easily select a different format if you prefer. Or of course we can remove dates altogether.

Thu, 11 Nov 2021 23:56:54 GMT
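
The sample above matches what JavaScript's standard Date.prototype.toUTCString() produces; that the export calls exactly this method is an assumption. For comparison, the sketch below also shows the ISO 8601 form, one candidate if a different format is preferred.

```javascript
// Build the sample timestamp in UTC (months are 0-based, so 10 = November).
const created = new Date(Date.UTC(2021, 10, 11, 23, 56, 54));

// Format shown in the sample above.
const asUtcString = created.toUTCString(); // "Thu, 11 Nov 2021 23:56:54 GMT"

// ISO 8601 alternative, which sorts correctly as plain text.
const asIso = created.toISOString(); // "2021-11-11T23:56:54.000Z"
```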

Any other issues?

jdanish and others added 30 commits February 4, 2021 09:28
…TA. D3DATA is the core data. We always only draw filtered data changes.
…his is necessary for table updates to display current filtered data.
…ated via radius, which in turn is already based on edge count.
@benloh
Collaborator Author

benloh commented Dec 18, 2021

@jdanish @kalanicraig Alright, I think the layout issues have mostly been fixed now. The whole app can probably use a relayout/cleanup, especially for narrower screens, but we'll save that for the end.

Fixes:

  • FILTER button no longer crowds the "More..." button
  • Node and Edge Table resizer correct position has been restored (the drag position was off by 40px)
  • The Node and Edge Table heights are now correct (the scrollbar would extend beyond the bottom of the visible area by 40px).
  • The Node and Edge Table headers now stay fixed as the table body scrolls.
  • The Search label "Type to search or add a node:" no longer crowds the "Add New Node" button.

Please give it a whirl and if things look good, let's merge this so we can move onto import.

@benloh
Collaborator Author

benloh commented Dec 20, 2021

Kalani wrote:

I tested export several times and the only thing I see happening is with the ID label in the nodes and edges export. The nodes “id” field needs to be uppercase “ID” and the ID field for the edges should be at the end of the export column list. The Source and Target ID fields in the edge table need to be labeled with the sentence case.

@benloh
Collaborator Author

benloh commented Dec 21, 2021

@kalanicraig Some questions:

The nodes “id” field needs to be uppercase “ID”
Do you mean the header needs to say "ID"?

the ID field for the edges should be at the end of the export column list
Here you're saying that for exported edges, you want the ID field to be the last item in the list? e.g.
For ID = 595, instead of this:

// OLD
"595","The Briber","","","","0","Fri, 04 Sep 2020 13:53:39 GMT",

...you want this:

// NEW
"The Briber","","","","0","Fri, 04 Sep 2020 13:53:39 GMT","595"

The Source and Target ID fields in the edge table need to be labeled with the sentence case.
I'm not sure what you're referring to here. Are you saying that in the exported edges csv the headers should read Source and Target instead of the current source and target?


@netcreateorg netcreateorg deleted a comment from Kalani Dec 21, 2021
@benloh
Collaborator Author

benloh commented Dec 22, 2021

@kalanicraig One more question/issue on exports: For edges, I believe you had said that you wanted to be able to specify the source and target nodes via the node labels rather than the node ID numbers. The problem is that we HAVE to use the ID numbers because the labels are not guaranteed to be unique, e.g. you can have two nodes named "Alexandria". If you want to be able to link via labels only, then we need to:

  1. Always check to make sure node labels are unique (e.g. when editing a node, if someone enters a duplicate label, we would need to prevent that and tell the user to enter something unique).
  2. Change the edges export format to either also include labels with id numbers, or remove the id numbers.

This seems doable so long as it matches your workflow. I don't have a sense of the capabilities and limitations of the tools you're using outside of NetCreate to manipulate the data.

So should we remove IDs from the edges and just use Source and Target labels AND change the editor to not allow duplicate node labels?
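
To gauge how much existing data would be affected by a uniqueness rule, a check along these lines could be run against a dataset first. This is a sketch only; findDuplicateLabels is a hypothetical helper, and it assumes each node has a label field and compares labels exactly (case-sensitive).

```javascript
// Hypothetical helper: return the labels used by more than one node.
function findDuplicateLabels(nodes) {
  const counts = new Map();
  for (const n of nodes) {
    counts.set(n.label, (counts.get(n.label) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count > 1)
    .map(([label]) => label);
}
```

The same count could back the editor check in step 1: reject a new label if it is already present on a different node.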

@kalanicraig
Collaborator

kalanicraig commented Dec 22, 2021 via email

@benloh
Collaborator Author

benloh commented Jan 17, 2022

@kalanicraig I think the challenge here is that you're talking about three different types of labeling:

  1. matching our internal code representation
  2. matching Gephi/Cytoscape labeling
  3. making the label human readable.

For example, if we label edges with "Source Label", does Gephi/Cytoscape recognize that? Don't they need it to be "Source"?

Do you need the ability to designate different label mappings? Or can we keep things simpler and just choose one (e.g. only have a Gephi/Cytoscape mapping, not also a human-readable one)?


Also, can you confirm that the attributes and meta information mappings are working? I wasn't sure how Gephi/Cytoscape handle those extra fields.

@kalanicraig
Collaborator

kalanicraig commented Jan 17, 2022 via email

@benloh
Collaborator Author

benloh commented Jan 17, 2022

@kalanicraig I think we probably need to make some of this editable via the template too.

Can you send me a prototypical Gephi export? I know you've sent one before, but let's start fresh with a real-world use case.

I'm especially confused by the Edge Type. Should we be representing that internally as well?

Popping up a level, it kind of seems like the ideal case is that we are able to import and export in Gephi format rather than some Net.Create proprietary format?

@benloh
Collaborator Author

benloh commented Jan 19, 2022

@kalanicraig @jdanish This may have gotten lost in the slew of updates/emails I was sending:

I think we probably need to make some of this editable via the template too.

Can you send me a prototypical Gephi export? I know you've sent one before, but let's start fresh with a real-world use case. I'd like to see what the raw file looks like (I'm assuming it's csv).

I'm especially confused by the Edge Type field. Should we be representing that internally as well?

Popping up a level, it kind of seems like the ideal case is that we are able to import and export in Gephi format rather than some Net.Create proprietary format?

@benloh benloh mentioned this pull request Jan 19, 2022
38 tasks
@kalanicraig
Collaborator

kalanicraig commented Jan 19, 2022 via email

@benloh
Collaborator Author

benloh commented Jan 19, 2022

@kalanicraig Thanks! Unfortunately, replying to GitHub by email doesn't attach the file. You'll have to either log into GitHub and attach it there, or you can just email me directly. Thanks!

@benloh
Collaborator Author

benloh commented Jan 19, 2022

Merging export for now.
This has a hacked in override for defining headers, e.g. id is exported as ID. This will be replaced with a template definition with #175.
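
For readers following along, a header override of this kind can be as small as a lookup table. The sketch below is illustrative only and is not the code that was merged; HEADER_OVERRIDES and exportHeaders are hypothetical names, and only the id to ID mapping comes from this thread.

```javascript
// Hypothetical override table: internal field name -> exported header.
// Only the id -> ID entry is taken from this PR; others could be added.
const HEADER_OVERRIDES = new Map([['id', 'ID']]);

// Use the override when one exists, otherwise keep the internal field name.
function exportHeaders(fields) {
  return fields.map(f => (HEADER_OVERRIDES.has(f) ? HEADER_OVERRIDES.get(f) : f));
}
```

A template-driven version would populate the table from the template instead of hard-coding it.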

@benloh benloh merged commit 3ed097f into dev Jan 19, 2022
Version 1.4 automation moved this from In Review to Done Jan 19, 2022
@benloh benloh deleted the dev-bl/export branch January 22, 2022 21:43
3 participants