Extract and classify Spanish terms from wiki pages, with TypeScript
CervantesJS is a TypeScript library for extracting Spanish terms from wiki pages. It is also a plugin for JardineroJS that creates a SQLite dictionary of Spanish terms by parsing Wikcionario.
To install the package as a plugin, please refer to the documentation of JardineroJS.
The current version of the plugin requires Jardinero 2.x.
Otherwise, to install it as a library reference within a project:
npm install @giancosta86/cervantes
or
yarn add @giancosta86/cervantes
The public API entirely resides in the root package index, so you shouldn't reference specific modules.
CervantesJS is first and foremost a plugin for JardineroJS: please consult its documentation for details.
However, you can also reference the package as a standalone library for extracting Spanish terms from wiki pages!
In this case, you can just import names directly from its root:
import {...} from "@giancosta86/cervantes"
In particular, you may want to consider:
- the SpanishTerm union type - and the related types like Noun, Article, ...
- extractTerms() - to extract Spanish terms from a given wiki page
- SpanishTransform - a transform stream applying extractTerms() to a flow of wiki pages
- SPANISH_SQLITE_SCHEMA - a string containing the DDL code for SQLite
- createSpanishWritableBuilder() - creating a WritableBuilder (from the sqlite-writable library) with the required type registrations and with a suitable transaction capacity
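To give a rough idea of what "a transform stream applying an extraction function to a flow of wiki pages" can look like, here is a minimal sketch built only on Node's standard stream module. It does not use or reproduce the CervantesJS API: PageStandIn, TermStandIn, extractTermsStandIn(), TermTransformStandIn and collectTerms() are all hypothetical names made up for this illustration, and the "[noun]" / "[article]" markers are invented placeholders, not real Wikcionario markup. Please refer to the package's type declarations for the actual signatures of extractTerms() and SpanishTransform.

```typescript
import { Readable, Transform } from "node:stream";
import type { TransformCallback } from "node:stream";

// Hypothetical shapes: the real page and term types of the library may differ.
type PageStandIn = { title: string; text: string };
type TermStandIn = { kind: "noun" | "article"; entry: string };

// Hypothetical stand-in for an extraction function: NOT the library's logic.
// It only looks for invented markers in the page text.
function extractTermsStandIn(page: PageStandIn): TermStandIn[] {
  const terms: TermStandIn[] = [];
  if (page.text.includes("[noun]")) {
    terms.push({ kind: "noun", entry: page.title });
  }
  if (page.text.includes("[article]")) {
    terms.push({ kind: "article", entry: page.title });
  }
  return terms;
}

// Object-mode transform: consumes pages, emits zero or more terms per page.
class TermTransformStandIn extends Transform {
  constructor() {
    super({ objectMode: true });
  }

  _transform(
    page: PageStandIn,
    _encoding: BufferEncoding,
    callback: TransformCallback
  ): void {
    for (const term of extractTermsStandIn(page)) {
      this.push(term);
    }
    callback();
  }
}

// Pipes an in-memory list of pages through the transform and gathers the terms.
async function collectTerms(pages: PageStandIn[]): Promise<TermStandIn[]> {
  const terms: TermStandIn[] = [];
  const stream = Readable.from(pages).pipe(new TermTransformStandIn());
  for await (const term of stream) {
    terms.push(term as TermStandIn);
  }
  return terms;
}
```

In a sketch like this, the extraction function stays a plain, easily testable function, while the stream wrapper only deals with flow control. A page that yields no terms simply emits nothing downstream.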
Please, feel free to explore:
- JardineroJS - the web stack itself, designed for extensible linguistic analysis
- JardineroJS - SDK - the development kit for creating your own plugins