A Metalsmith plugin to find related files within collections.
Files are "related" if they share important terms in their contents.
For each file in a collection, Term Frequency-Inverse Document Frequency (TF-IDF) is used to:
- Find the top `natural.maxTerms` important terms in the file's contents
- Find how much weight those terms have in every other file in the collection
- Filter matches that have at least `natural.minTfIdf` weight
- Sort by descending weight (most "related" first)
- Limit to `maxRelated` number of matches
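
The steps above can be sketched in plain JavaScript. This is a minimal illustration under stated assumptions, not the plugin's implementation: the helper names (`tokenize`, `termFrequencies`, `findRelated`) are hypothetical, the tokenizer is deliberately naive, and the smoothed IDF formula is one common variant chosen for the example.

```javascript
// Hypothetical helpers for illustration only; the plugin itself may tokenize
// and weight terms differently.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

function termFrequencies(tokens) {
  const counts = new Map();
  for (const token of tokens) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  return counts;
}

function findRelated(docs, index, { maxTerms = 10, minTfIdf = 0, maxRelated = 3 } = {}) {
  const frequencies = docs.map((doc) => termFrequencies(tokenize(doc)));

  // Smoothed inverse document frequency (an assumed variant).
  const idf = (term) => {
    const containing = frequencies.filter((tf) => tf.has(term)).length;
    return 1 + Math.log(docs.length / (1 + containing));
  };
  const tfidf = (term, tf) => (tf.get(term) || 0) * idf(term);

  // Find the top `maxTerms` terms of the source file.
  const source = frequencies[index];
  const topTerms = [...source.keys()]
    .sort((a, b) => tfidf(b, source) - tfidf(a, source))
    .slice(0, maxTerms);

  return frequencies
    // Sum the weight of those terms in every file.
    .map((tf, i) => ({
      index: i,
      weight: topTerms.reduce((sum, term) => sum + tfidf(term, tf), 0),
    }))
    // Skip the source file itself.
    .filter((match) => match.index !== index)
    // Keep matches with at least `minTfIdf` weight.
    .filter((match) => match.weight >= minTfIdf)
    // Most related first.
    .sort((a, b) => b.weight - a.weight)
    // Limit to `maxRelated` matches.
    .slice(0, maxRelated);
}

// Usage: find files related to the first document.
const docs = ['apple banana apple', 'apple cherry', 'dog cat'];
const matches = findRelated(docs, 0, { minTfIdf: 0.5 });
```

With the default `minTfIdf` of 0, files that share no terms at all still pass the filter with a weight of 0, so raising the threshold is what actually excludes unrelated files in this sketch.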
npm install --save metalsmith-collections-related

Collections need to be processed before related files can be found:
const Metalsmith = require('metalsmith');
const collections = require('metalsmith-collections');
const related = require('metalsmith-collections-related');
Metalsmith(__dirname)
    .use(collections({
        // options here
    }))
    .use(related({
        // options here
    }))
    .build((err) => {
        if (err) {
            throw err;
        }
    });

This plugin adds a metadata field named `related` to each file in the format:
{
    "contents": "...",
    "path": "...",
    "related": {
        "[collection name]": [
            { "contents": "...", "path": "..." },
            { "contents": "...", "path": "..." }
            // up to the `maxRelated` number of files
        ],
        "[another collection name]": [
            { "contents": "...", "path": "..." },
            { "contents": "...", "path": "..." }
            // up to the `maxRelated` number of files
        ]
        // up to as many collections as the file is in
    }
}

The `related` field can be used with templating engines such as Handlebars.
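
As a minimal sketch of consuming this structure without any templating engine, the plain JavaScript below builds an HTML list of links for one collection. The helper name `renderRelated`, the sample file object, and the choice to link by `path` are assumptions made for illustration.

```javascript
// Hypothetical helper: renders the related files of one collection as links.
function renderRelated(file, collectionName) {
  // Fall back to an empty list when the file has no matches in the collection.
  const matches = (file.related && file.related[collectionName]) || [];
  const items = matches.map(
    (match) => `<li><a href="/${match.path}">${match.path}</a></li>`
  );
  return `<ul>${items.join('')}</ul>`;
}

// Sample metadata shaped like the format above (values are made up).
const file = {
  path: 'posts/a.md',
  related: {
    posts: [{ path: 'posts/b.md' }, { path: 'posts/c.md' }],
  },
};
const html = renderRelated(file, 'posts');
```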
Type: string Default: **/*
A micromatch glob pattern to find input files.
Type: number Default: 3
The maximum number of related files to add to each file's metadata (`maxRelated`).
Type: object Default:
{
    "minTfIdf": 0,
    "maxTerms": 10
}

Type: number Default: 0
The minimum Term Frequency-Inverse Document Frequency (TF-IDF) weight a match must have (`natural.minTfIdf`).

Type: number Default: 10
The maximum number of terms to use for TF-IDF weighting (`natural.maxTerms`).

Type: object Default:
{
    "allowedTags": [],
    "allowedAttributes": {},
    "nonTextTags": ["pre"]
}

An object of sanitize-html options.
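
As a small sketch of overriding the documented defaults, the options object below raises `maxRelated` and tightens the `natural` settings. The specific values are arbitrary assumptions chosen for illustration, and both `natural` keys are set explicitly so the example does not depend on how partial objects are merged with the defaults.

```javascript
// Illustrative values only; the defaults are maxRelated: 3, minTfIdf: 0, maxTerms: 10.
const relatedOptions = {
  maxRelated: 5,
  natural: {
    minTfIdf: 0.5,
    maxTerms: 20,
  },
};

// Passed to the plugin in the build chain as: .use(related(relatedOptions))
```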