This repository has been archived by the owner on Jul 18, 2022. It is now read-only.

Importing Data

Benjamin Bach edited this page Aug 2, 2021 · 11 revisions

The Vistorian has two ways of importing data.

  1. Basic: you create a node table, a link table, a node schema, and a link schema yourself. This is explained on the Home page.
  2. Through importers that can read a specific data format. Ultimately, all importers go through the basic import above, which acts as the common intermediate format.

Basic import

If data is loaded other than through one of the importers, a DataSet object must be created manually. A dataset object looks as follows:

var dataset = new networkcube.DataSet({
    name: 'myAwesomeNetwork',
    nodeTable: [[..some data..]], // node table here
    linkTable: [[..some data..]], // edge table here
    nodeSchema: {id:0, label:1}, // some node schema
    linkSchema: {id:0, source:1, target:2} // some link schema
});

All of these fields are required. They guarantee that the network is in a minimal format. The basic import takes two tables:

  • a node table that contains one row per node with its respective attributes, and
  • a link table that contains one row per link with its respective attributes.

Once the DataSet object is created, it can be imported via:

    var session = 'someSessionStringHere';
    window.vc.main.importData(session, dataset);

Note that this method is recommended only when no suitable importer function is available.
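As an illustration, the sketch below builds a tiny node table and link table by hand and bundles them with matching schemas. The data, the extra weight column, and the name dataSetParams are made up for this example. The calls into networkcube are left as comments because they require the library to be loaded in the page.

```javascript
// Hypothetical example data: three nodes and two links.
// Columns follow the schemas below; ids start at 0.
const nodeTable = [
    [0, 'Ana'],
    [1, 'Bob'],
    [2, 'Cyril']
];
const linkTable = [
    // id, source, target, weight
    [0, 0, 1, 4],
    [1, 2, 0, 10]
];
const nodeSchema = { id: 0, label: 1 };
const linkSchema = { id: 0, source: 1, target: 2, weight: 3 };

// Plain object holding the fields a DataSet requires.
const dataSetParams = {
    name: 'myAwesomeNetwork',
    nodeTable: nodeTable,
    linkTable: linkTable,
    nodeSchema: nodeSchema,
    linkSchema: linkSchema
};

// With networkcube loaded in the page, this would become:
// var dataset = new networkcube.DataSet(dataSetParams);
// window.vc.main.importData(session, dataset);
```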

Node and Link Schemas

The node and link schemas are simple JSON objects that map data attributes to column indices in the respective table.

For example,

var nodeSchema = {
    id: 0
};
var linkSchema = {
    id: 0,
    source: 1,
    target: 2
};

The above code creates two schemas assigning the required attributes for each type, node and link.

You can add optional properties to any schema, e.g. an age to a node or some other value to a link: specify the attribute in the respective schema and make sure its column index refers to an existing table column. The property is then part of the schema.
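To make the mapping concrete, the following sketch reads attributes from a table row through a schema. The helper readAttribute and the age column are hypothetical and not part of networkcube; they only illustrate that a schema entry is a column index.

```javascript
// Hypothetical node table: id, label, age.
const nodes = [
    [0, 'Ana', 34],
    [1, 'Bob', 27]
];

// The schema maps each attribute name to its column index.
// 'age' is an optional, user-defined attribute.
const schema = { id: 0, label: 1, age: 2 };

// Illustrative helper (not a networkcube function): look up an
// attribute of a row, or return undefined if the schema lacks it.
function readAttribute(row, rowSchema, attribute) {
    const column = rowSchema[attribute];
    return column === undefined ? undefined : row[column];
}

console.log(readAttribute(nodes[1], schema, 'age'));      // 27
console.log(readAttribute(nodes[1], schema, 'nodeType')); // undefined
```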

Node Schema

A node schema can have the following attributes:

  • id (required) - Node ID. Must be unique, start at 0, and be consecutive.
  • label (optional) - Label shown for this node. If not provided, the node ID should be shown.
  • location (optional) - a node's geographic location. Can change over time.
  • time (optional) - associates a time stamp with this node. Attributes in this row are valid only at this time stamp. This can be used to, e.g., change a node's type or label over time.
  • nodeType (optional) - a node's type

Link Schema

A link schema can have the following attributes:

  • id (required) - Link ID. Must be unique, start at 0, and be consecutive.
  • source (required) - ID of source node
  • target (required) - ID of target node
  • time (optional) - associates a time stamp with this link. Attributes in this row are valid only at this time stamp. This can be used to, e.g., change a link's weight or type over time.
  • weight (optional) - a link's weight. Can change over time.
  • linkType (optional) - a link's type. Can change over time.
  • directed (optional) - whether a link is directed. If not explicitly indicated, a link is treated as directed.

Table Formatting

Tables used in the basic import have to be normalized and must link node, link, and location tables through IDs. That means:

  • Each table's first column is an id (e.g., a node id, link id, location id).
  • Ids start from 0.
  • source and target (in the link table) contain ids from the node table.
  • location (in the node table) contains ids that refer to a location table.

Importing data this way keeps the memory footprint small and makes it possible to load larger networks.
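The rules above can be checked before import. The following is a minimal sketch, not part of networkcube: validateTables is a hypothetical helper, and it assumes that rows are stored in id order, so that each id equals its row index.

```javascript
// Illustrative check (not a networkcube function): returns a list
// of problems found in a node table and a link table.
function validateTables(nodeTable, linkTable, nodeSchema, linkSchema) {
    const problems = [];

    // Assumption: ids equal the row index (0, 1, 2, ...).
    nodeTable.forEach(function (row, i) {
        if (row[nodeSchema.id] !== i) {
            problems.push('node row ' + i + ' has id ' + row[nodeSchema.id]);
        }
    });
    linkTable.forEach(function (row, i) {
        if (row[linkSchema.id] !== i) {
            problems.push('link row ' + i + ' has id ' + row[linkSchema.id]);
        }
        // source and target must refer to existing node ids.
        ['source', 'target'].forEach(function (end) {
            const nodeId = row[linkSchema[end]];
            if (!Number.isInteger(nodeId) || nodeId < 0 || nodeId >= nodeTable.length) {
                problems.push('link row ' + i + ' has unknown ' + end + ' ' + nodeId);
            }
        });
    });
    return problems;
}

// The second link points to node 5, which does not exist.
const problems = validateTables(
    [[0, 'Ana'], [1, 'Bob']],
    [[0, 0, 1], [1, 1, 5]],
    { id: 0, label: 1 },
    { id: 0, source: 1, target: 2 }
);
console.log(problems); // one entry: 'link row 1 has unknown target 5'
```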

Importers

Using the basic import is recommended only if no proper importer is available. The Vistorian provides functions for some common data formats.

loadLinkTable()

networkcube.loadLinkTable(url, callBack, linkSchema, delimiter, timeFormat?)

This function reads a table in link format (each row is a link) with table columns indicating link attributes such as source, target, weight, time, etc. If a link's weight differs between time points, each time point must be given in its own row. Below is an example of a CSV-formatted file; the file extension can be anything.

    source, target, weight, time, type
    Ana, Bob, 4, 2010, letters
    Ana, Bob, 10, 2010, visits  
    Cyril, Ana, 10, 2009, visits

The function must be given a linkSchema, which is a simple JSON object specifying which column of the input table maps to which attribute. For the example above, the linkSchema would look as follows:

    {
        source: 0,
        target: 1,
        weight: 2,
        linkType: 4,
        time: 3
    }

Note that not every attribute or column must be assigned. Non-assigned columns are ignored and not imported into networkcube. If an attribute is not assigned a column, it is ignored by the visualizations. For example, if no linkType is set, no link types are shown. If no time is specified, the network is shown as a static, non-temporal network.
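The sketch below illustrates how a link schema selects columns from delimited text. It is not how the networkcube loader is implemented: parseLinkRows is a hypothetical helper that assumes a header row, splits naively on the delimiter, and does not handle quoted fields.

```javascript
// Illustrative only: turn delimited text into link objects using a
// link schema. Assumes a header row and no quoted fields.
function parseLinkRows(text, linkSchema, delimiter) {
    const lines = text.split('\n').filter(function (line) {
        return line.trim() !== '';
    });
    // Skip the header row, then split each line into trimmed cells.
    return lines.slice(1).map(function (line) {
        const cells = line.split(delimiter).map(function (cell) {
            return cell.trim();
        });
        const link = {};
        // Copy only the columns named in the schema; others are ignored.
        Object.keys(linkSchema).forEach(function (attribute) {
            link[attribute] = cells[linkSchema[attribute]];
        });
        return link;
    });
}

const csv = [
    'source, target, weight, time, type',
    'Ana, Bob, 4, 2010, letters',
    'Cyril, Ana, 10, 2009, visits'
].join('\n');

// No linkType in this schema, so the "type" column is dropped.
const links = parseLinkRows(csv, { source: 0, target: 1, weight: 2, time: 3 }, ',');
console.log(links[0].source, links[0].target); // Ana Bob
console.log('linkType' in links[0]);           // false
```

All values remain strings in this sketch; converting weight to a number or time to a time stamp would be a separate step.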

Parameter delimiter

The delimiter parameter specifies the character used to separate fields within a row of the CSV file, e.g. a comma (,), a tab (\t), or a semicolon (;).

Parameter timeFormat

If any field in the table indicates a time stamp, timeFormat specifies how that time stamp is formatted. See http://momentjs.com/docs/#/parsing/ for information on how to specify your time format. If no timeFormat is given but the linkSchema contains a time field, networkcube will assume that the time is given in Unix milliseconds (milliseconds since 01/01/1970).
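For data that is imported without a timeFormat, time values need to be in Unix milliseconds. The sketch below converts year values, such as those in the example table, using only the standard Date object. The helper yearToUnixMillis is hypothetical, and mapping a year to midnight UTC on 1 January is an assumption made for this example.

```javascript
// Illustrative helper: convert a four-digit year to Unix
// milliseconds, taking midnight UTC on 1 January of that year.
function yearToUnixMillis(year) {
    return Date.UTC(Number(year), 0, 1);
}

console.log(yearToUnixMillis('2010')); // 1262304000000
console.log(new Date(yearToUnixMillis('2009')).toISOString()); // 2009-01-01T00:00:00.000Z
```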

Information for developers

The TypeScript file implementing the loaders is core/importers.ts. Add loaders there.