
WebEntity Links update process

jrault edited this page Dec 21, 2012 · 1 revision

The WebEntity Links update process keeps the retrieval of links between web entities fast, while also keeping the insertion of new links between Nodes fast. In effect, it caches the WebEntity Links. We do not want to rebuild this cache on every change to a WebEntity (a Node insertion, with its links), nor to recompute the links each time the core asks for data (no cache at all).

The problem

One of our goals is to provide users with a graph of the web entities they have defined. This graph is an expected and common output of our system, but it is highly aggregated and thus costly to build. As a reminder, these are the aggregation stages we use:

  • At the lowest level, we have pages and the links they contain.
  • We use this information to build a graph of nodes. Nodes are only an approximation of the pages; we use them to reduce the complexity (and size) of the graph.
  • At the highest level, we have the web entities, defined by the user, whose links are aggregated from the links between nodes.
In short, we have a list of web entities, a list of the nodes they contain (we know which node belongs to which web entity, and we can retrieve the nodes of a given web entity) and a list of Node Links (links between nodes). We want to build the WebEntity Links from this data. This page explains when and how.
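
As an illustration of the aggregation described above, here is a minimal Python sketch. The function name `aggregate_links`, the dictionary `node_to_entity` and the choice of a simple link count as the weight are assumptions made for this example; they are not taken from the actual code base.

```python
from collections import Counter


def aggregate_links(node_to_entity, node_links):
    """Count WebEntity-to-WebEntity links from Node-to-Node links.

    node_to_entity: dict mapping a node id to its web entity id (hypothetical layout).
    node_links: iterable of (source_node, target_node) pairs.
    """
    weights = Counter()
    for source, target in node_links:
        source_entity = node_to_entity.get(source)
        target_entity = node_to_entity.get(target)
        # Assumption: links touching a node outside any web entity are ignored.
        if source_entity is None or target_entity is None:
            continue
        weights[(source_entity, target_entity)] += 1
    return dict(weights)


node_to_entity = {"a1": "A", "a2": "A", "b1": "B"}
node_links = [("a1", "b1"), ("a2", "b1"), ("b1", "a1"), ("a1", "x")]
entity_links = aggregate_links(node_to_entity, node_links)
```

Recomputing this over every Node Link on each request is exactly the cost the update process is meant to avoid.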

Objects involved

  • A WebEntity (we describe the process for a given WebEntity)
    • Including some timestamps:
      • last insert timestamp
      • last update timestamp
  • Nodes contained by this WebEntity
  • Node Links outbound from these Nodes
  • The WebEntity Links that will be built from these Node Links.
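
The objects listed above could be modelled as follows. This is only a sketch: the class and field names (`NodeLink`, `WebEntity`, `inserted_at`, and so on) are hypothetical and chosen for readability, and the real storage layout may differ.

```python
from dataclasses import dataclass, field


@dataclass
class NodeLink:
    source: str         # id of the Node the link starts from
    target: str         # id of the Node the link points to
    inserted_at: float  # insertion time, used to select links for an update


@dataclass
class WebEntity:
    name: str
    last_insert: float = 0.0  # last insert timestamp
    last_update: float = 0.0  # last update timestamp
    nodes: set = field(default_factory=set)          # ids of contained Nodes
    node_links: list = field(default_factory=list)   # outbound NodeLink objects
    links: dict = field(default_factory=dict)        # WebEntity Links: target entity -> weight


entity = WebEntity("example")
entity.nodes.add("n1")
entity.node_links.append(NodeLink("n1", "n2", inserted_at=1.0))
```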

The process as an algorithm

We now describe the process in pseudo-code.

  • When a Node and its outbound Node Links are inserted:
    • Retrieve the WebEntity containing the Node and update its last insert timestamp to now
    • Check whether the update process is already running on the WebEntity
      • If it is not running, launch it.
  • Update process of the WebEntity:
    • Retrieve from the WebEntity the last insert timestamp and the last update timestamp, and compare them.
    • If last insert timestamp > last update timestamp (Node Links were inserted since the last update), we need to update:
      • Set the current update timestamp to now
      • Retrieve from the WebEntity all the Node Links inserted after last update timestamp and before current update timestamp
      • Build the WebEntity Links from these Node Links
      • Finally, set the last update timestamp to current update timestamp.
      • Then iterate (trigger the process again), so that Node Links inserted while the update was running are picked up
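
The steps above can be sketched in Python as follows. This is a single-threaded illustration under stated assumptions, not the actual implementation: the class `WebEntityCache`, its attributes and the injected `clock` function are hypothetical, outbound links are assumed to be already resolved to a target web entity, and the update is assumed to run while the last insert timestamp is later than the last update timestamp.

```python
import itertools


class WebEntityCache:
    """Hypothetical holder for one WebEntity's timestamps and cached links."""

    def __init__(self, clock):
        self.clock = clock      # function returning the current time
        self.last_insert = 0    # last insert timestamp
        self.last_update = 0    # last update timestamp
        self.running = False    # is the update process running?
        self.node_links = []    # (inserted_at, source_node, target_entity)
        self.links = {}         # WebEntity Links: target_entity -> weight

    def insert_node(self, node, outbound_targets):
        now = self.clock()
        for target_entity in outbound_targets:
            self.node_links.append((now, node, target_entity))
        self.last_insert = now
        if not self.running:    # launch the update only if it is not running
            self.update()

    def update(self):
        self.running = True
        try:
            # Iterate until no insertion is newer than the last update.
            while self.last_insert > self.last_update:
                current_update = self.clock()
                for inserted_at, _node, target in self.node_links:
                    # Assumption: the window is (last_update, current_update].
                    if self.last_update < inserted_at <= current_update:
                        self.links[target] = self.links.get(target, 0) + 1
                self.last_update = current_update
        finally:
            self.running = False


# A counter stands in for a real clock so that the run is deterministic.
cache = WebEntityCache(clock=itertools.count(1).__next__)
cache.insert_node("n1", ["B", "C"])
cache.insert_node("n2", ["B"])
```

In a concurrent setting the `running` flag and the timestamps would need to be read and written atomically; the sketch leaves that out.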