CervantesJS

Extract and classify Spanish terms from wiki pages, with TypeScript

GitHub CI npm version MIT License

Overview

CervantesJS is a TypeScript library for extracting Spanish terms from wiki pages. It is also a plugin for JardineroJS that creates a SQLite dictionary of Spanish terms by parsing Wikcionario.

Installation

To install the package as a plugin, please refer to the documentation of JardineroJS.

The current version of the plugin requires Jardinero 2.x.

Otherwise, to install it as a library reference within a project:

npm install @giancosta86/cervantes

or

yarn add @giancosta86/cervantes

The public API entirely resides in the root package index, so you shouldn't reference specific modules.

Usage

CervantesJS is first and foremost a plugin for JardineroJS; please consult its documentation for details.

However, you can also reference the package as a standalone library for extracting Spanish terms from wiki pages!

In this case, you can just import names directly from its root:

import {...} from "@giancosta86/cervantes"

In particular, you may want to consider:

  • the SpanishTerm union type - and the related types like Noun, Article, ...

  • extractTerms() - to extract Spanish terms from a given wiki page

  • SpanishTransform - a transform stream applying extractTerms() to a flow of wiki pages

  • SPANISH_SQLITE_SCHEMA - a string containing the DDL code for SQLite

  • createSpanishWritableBuilder() - creating a WritableBuilder (from the sqlite-writable library) with the required type registrations and with a suitable transaction capacity
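
The general pattern behind the items above (a union of term types, an extraction function, and a transform stream that applies it to a flow of pages) can be illustrated with a small, self-contained sketch. Everything in it is a hypothetical stand-in, not the library's actual API: the WikiPage shape, the Term union and its fields, the "#noun" and "#article" markers, and the names extractToyTerms, ToyTermTransform and collectTerms are all invented for illustration. Only Node's built-in stream module is used.

```typescript
import { Readable, Transform } from "node:stream";

// Hypothetical page shape; the library's real page type may differ.
type WikiPage = { title: string; text: string };

// Hypothetical discriminated union, loosely mirroring the idea of a
// term union type made of Noun, Article, ...
type Noun = { kind: "noun"; entry: string };
type Article = { kind: "article"; entry: string };
type Term = Noun | Article;

// Toy extractor: the "#noun" and "#article" markers are invented for this
// sketch and are NOT real wiki markup.
function extractToyTerms(page: WikiPage): Term[] {
  const terms: Term[] = [];
  if (page.text.includes("#noun")) {
    terms.push({ kind: "noun", entry: page.title });
  }
  if (page.text.includes("#article")) {
    terms.push({ kind: "article", entry: page.title });
  }
  return terms;
}

// Object-mode transform stream: pages go in, zero or more terms come out
// for each page.
class ToyTermTransform extends Transform {
  constructor() {
    super({ objectMode: true });
  }

  _transform(
    page: WikiPage,
    _encoding: BufferEncoding,
    callback: (error?: Error | null) => void
  ): void {
    for (const term of extractToyTerms(page)) {
      this.push(term);
    }
    callback();
  }
}

// Narrowing on the discriminant field selects the concrete term type.
function describeTerm(term: Term): string {
  switch (term.kind) {
    case "noun":
      return `noun: ${term.entry}`;
    case "article":
      return `article: ${term.entry}`;
  }
}

// Pipes an in-memory list of pages through the transform and collects
// every emitted term.
async function collectTerms(pages: WikiPage[]): Promise<Term[]> {
  const terms: Term[] = [];
  const stream = Readable.from(pages).pipe(new ToyTermTransform());
  for await (const term of stream) {
    terms.push(term as Term);
  }
  return terms;
}
```

With the actual package, the corresponding names would be extractTerms() and SpanishTransform; their exact signatures are not stated here, so check the type declarations exported from the package root before relying on them.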

Further reference

Feel free to explore:

  • JardineroJS - the web stack itself, designed for extensible linguistic analysis

  • JardineroJS - SDK - the development kit for creating your own plugins
