NodeJournal reads a site and summarizes it like a newspaper.

In concrete terms, it first reads the site's text contents, then classifies them into titles and details with a model.
(neuraln needs a C compiler when installing.)
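As an illustration of the classification step, each block of site text could be mapped to a small numeric feature vector before being fed to the model. The five features below (length, word count, and so on) are assumptions made for this sketch, chosen only to match the five-input layer in `model.json`; NodeJournal's actual features may differ.

```javascript
// Hypothetical feature extraction for one block of site text.
// The five features are an assumption for illustration, not
// necessarily what NodeJournal computes.
function toFeatures(text) {
  const trimmed = text.trim();
  const words = trimmed.split(/\s+/).filter(Boolean);
  return [
    Math.min(trimmed.length / 100, 1),            // normalized character count
    Math.min(words.length / 20, 1),               // normalized word count
    /[.!?]$/.test(trimmed) ? 1 : 0,               // ends like a sentence (detail-ish)
    trimmed === trimmed.toUpperCase() ? 1 : 0,    // all caps (title-ish)
    (trimmed.match(/\d/g) || []).length /
      Math.max(trimmed.length, 1)                 // digit ratio
  ];
}

// One short headline-like block becomes a 5-element vector.
const features = toFeatures("Breaking News");
```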
You can train the model as follows.
- Download the site contents. You will get `dataset.txt`.

  ```
  node training/GetDataset.js (site-url)
  ```
- Copy `dataset.txt` to `training.txt` and add labels for supervised learning.
- Prepare `model.json` to define the model (layer architecture).

  ```json
  {
    "layers": [5, 10, 10, 3]
  }
  ```
- Train the model. The training result is saved to `modelMemory.txt`.

  ```
  node training/Training.js
  ```

  If you already have `modelMemory.txt`, it will be loaded before learning.
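The `layers` array in `model.json` describes the network shape end to end: 5 input features, two hidden layers of 10 neurons each, and 3 output classes. A minimal feed-forward pass over such an architecture can be sketched in plain JavaScript (this is not neuraln's API, and it uses random weights rather than the trained `modelMemory.txt`):

```javascript
// Sketch of a forward pass for the layer shape [5, 10, 10, 3].
// Weights are random here; in NodeJournal they would come from training.
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

function randomLayer(inputs, outputs) {
  // One neuron = a weight per input plus a bias.
  return Array.from({ length: outputs }, () => ({
    weights: Array.from({ length: inputs }, () => Math.random() * 2 - 1),
    bias: Math.random() * 2 - 1
  }));
}

function forward(layers, input) {
  // Each layer maps the previous activations to new ones.
  return layers.reduce(
    (acts, layer) =>
      layer.map((n) =>
        sigmoid(n.weights.reduce((sum, w, i) => sum + w * acts[i], n.bias))
      ),
    input
  );
}

const shape = [5, 10, 10, 3]; // as in model.json
const net = shape.slice(1).map((size, i) => randomLayer(shape[i], size));

// One 5-element feature vector in, three class scores out.
// (Interpreting the 3 outputs as title / detail / other is an assumption.)
const output = forward(net, [0.1, 0.5, 0, 1, 0.2]);
```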
Then, you can read the site. The result is saved to `crawled.txt`.

```
node Crawl.js (site-url)
```