Library

Stephen Oliver edited this page Mar 30, 2017 · 2 revisions

The search plugin for Freenet.

Basic functionality

The Library plugin has two main functions:

  • To search old format and new format indexes through a user interface similar to the old XMLLibrarian plugin. This is primarily for searching freesites, and includes basic tools such as adjacent word search (with quotes), negative matches, and basic ranking. In principle, indexes can also include files of other types.

  • To build new format indexes.

Old format indexes are based on XML: index.xml lists the sub-indexes, which live in the same directory (manifest). Each sub-index contains a list of files and a list of keywords. Each keyword has a list of the files it appears in (as indexes into the files list) and, optionally, word numbers (to enable adjacent word searches and similar features). Because the sub-indexes all sit in one manifest, old format indexes can rapidly reach the point where just downloading the manifest takes significant time. The XMLSpider plugin generates old format indexes.
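The following is a minimal sketch of how a sub-index of this shape supports adjacent word search and negative matches. It is an illustration only: the in-memory layout, the sample data, and the function names (`files_with`, `phrase_search`, `search`) are assumptions made here, not the actual XML schema or the Library code.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical in-memory model of one sub-index (not the real XML layout).
files: List[str] = ["site-a/index.html", "site-b/about.html", "site-c/faq.html"]

# keyword -> {index into `files` -> word numbers where the keyword occurs}
keywords: Dict[str, Dict[int, List[int]]] = {
    "freenet": {0: [0, 7], 1: [3], 2: [1]},
    "search": {0: [1], 1: [9], 2: [2]},
    "plugin": {0: [2]},
}


def files_with(word: str) -> Set[int]:
    """File indexes that contain the keyword at least once."""
    return set(keywords.get(word, {}))


def phrase_search(words: Sequence[str]) -> Set[int]:
    """File indexes where the words occur at consecutive word numbers."""
    if not words:
        return set()
    # Only files containing every word can contain the phrase.
    candidates = set.intersection(*(files_with(w) for w in words))
    hits = set()
    for f in candidates:
        for start in keywords[words[0]][f]:
            if all(start + i in keywords[w][f] for i, w in enumerate(words)):
                hits.add(f)
                break
    return hits


def search(phrase: Sequence[str], exclude: Sequence[str] = ()) -> List[str]:
    """Phrase search with negative matches: drop files containing excluded words."""
    hits = phrase_search(phrase)
    for word in exclude:
        hits -= files_with(word)
    return sorted(files[i] for i in hits)


print(search(["freenet", "search"]))
# ['site-a/index.html', 'site-c/faq.html']
print(search(["freenet", "search"], exclude=["plugin"]))
# ['site-c/faq.html']
```

Without the word numbers, only the set intersection step is possible, which is why they are needed for quoted searches.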

Dependency diagram of work to do on Library.

New format indexes

New format indexes are based on forkable, scalable B-tree indexes, a new on-Freenet data structure by infinity0.

The new Spider plugin spiders the freesite web (much like XMLSpider) and feeds data to Library (version 10 or later) in chunks of configurable size. Library stores the data, merges it into an on-disk index, and then merges the on-disk index into an on-Freenet index, uploading only the changed nodes and their parents (up to the root) and pushing a USK after each CHK-based tree update. An on-disk cache of all updated data speeds up future updates and prevents problems with data not being retrievable when it needs to be updated; unfortunately this cache is not garbage collected. This seems to be working now, and has been released. It needs a 1GB memory limit.
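To illustrate why only the changed nodes and their parents need uploading, here is a rough sketch of a tree whose nodes are addressed by a hash of their content, loosely analogous to CHKs. Everything in it is an assumption made for illustration: the `store` dictionary stands in for Freenet, the node layout is invented, and `put`, `get`, and `update` are hypothetical helpers, not Library's API or its real B-tree format.

```python
import hashlib
import json

store = {}    # stands in for Freenet: content-hash key -> serialized node
uploads = []  # keys actually inserted, to show how little is re-sent


def put(node):
    """Insert a node under the hash of its content; identical content is not re-sent."""
    data = json.dumps(node, sort_keys=True).encode()
    key = hashlib.sha256(data).hexdigest()
    if key not in store:
        store[key] = data
        uploads.append(key)
    return key


def get(key):
    return json.loads(store[key])


def update(key, path, term, file):
    """Add (term, file) at the leaf reached by `path`; return the new key of this subtree."""
    node = get(key)
    if not path:
        node["entries"].setdefault(term, []).append(file)
    else:
        i = path[0]
        node["children"][i] = update(node["children"][i], path[1:], term, file)
    # The node's content changed, so its key changes, so its parent must change too.
    return put(node)


def leaf(entries):
    return put({"entries": entries, "children": []})


def inner(children):
    return put({"entries": {}, "children": children})


# Build a three-level tree: 4 leaves, 2 inner nodes, 1 root.
leaves = [leaf({w: [f]}) for w, f in [("alpha", "a.html"), ("beta", "b.html"),
                                      ("gamma", "c.html"), ("delta", "d.html")]]
root_v1 = inner([inner(leaves[0:2]), inner(leaves[2:4])])
uploads_after_build = len(uploads)  # 7 nodes

# Update one leaf: only that leaf, its parent, and the root are re-uploaded.
root_v2 = update(root_v1, [1, 0], "gamma", "e.html")
print(len(uploads) - uploads_after_build)  # 3

# A mutable pointer (the role a USK plays) is then moved to the new root.
usk_target = root_v2
```

Because old nodes are never modified, the tree under `root_v1` remains readable after the update, and the untouched subtree is shared between both versions.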

Distributed searching

Originally infinity0's Summer of Code project was to build a good index format and then use it to build a distributed searching system based on the Web of Trust. This was not practical in the time available, but is still intended in the long run. There are design documents about this and a small amount of code in Library.

This would primarily be for sharing files. Users would announce their own files, and those they have found, in indexes (which would include some additional data, e.g. file size, that isn't supported in current indexes), and would link to and rank other users' indexes. Library would then search all the indexes published by users that are sufficiently trusted. There are lots of possible optimizations to avoid having to pull everything (e.g. Bloom filters for what terms are indexed) and to improve search results (e.g. aliases). Hopefully some users will take it upon themselves to maintain large, high quality indexes, and these may be cross-merged, in which case the new format will really shine.
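As a rough idea of how a Bloom filter could let a searcher skip indexes that cannot contain a term, consider the sketch below. The class, its parameters, and the index names are all hypothetical; the design documents may specify something quite different.

```python
import hashlib


class BloomFilter:
    """Compact set summary: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, term):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, term):
        for p in self._positions(term):
            self.bits |= 1 << p

    def might_contain(self, term):
        return all((self.bits >> p) & 1 for p in self._positions(term))


# Hypothetical: each published index ships a small filter of its terms.
filters = {"alice-index": BloomFilter(), "bob-index": BloomFilter()}
filters["alice-index"].add("freenet")
filters["bob-index"].add("darknet")


def indexes_worth_fetching(term):
    """Only indexes whose filter admits the term need to be downloaded."""
    return [name for name, bf in filters.items() if bf.might_contain(term)]
```

A `False` from `might_contain` is definitive, so that index can be skipped. A `True` only means the index is worth fetching, since false positives are possible.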

Using it for searching freesites would provide a convenient means to automatically announce sites and for users to rate sites, and might eventually allow for collaborative spidering (see XMLSpider: scale is rapidly becoming a problem, and eventually it will be impossible for one person's spider to find everything).

infinity0 is working on this for his undergraduate degree. He has a working non-Freenet prototype and hopes to implement the Freenet version at some point in summer 2010 (the rest of Library needs more work first). nikotyan is now working on this with infinity0 as a Summer of Code project in 2010.