fr-compromise
modest computational linguistics
npm install fr-compromise
work-in-progress!

fr-compromise is a port of compromise to French.

The goal of this project is to provide a small, basic, rules-based POS tagger.

import tal from 'fr-compromise'

let doc = tal(`Je m'baladais sur l'avenue le cœur ouvert à l'inconnu`)
doc.match('#Noun').out('array')
// [ 'je', 'avenue', 'cœur', 'inconnu' ]

or on the client side:

<script src="https://unpkg.com/fr-compromise"></script>
<script>
  let txt = `J'avais envie de dire bonjour à n'importe qui`
  let doc = frCompromise(txt) // global namespace
  console.log(doc.sentences(1).json())
  // { text:'J'avais...', terms:[ ... ] }
</script>
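
To give a rough idea of what "rules-based POS tagging" means, here is a minimal, self-contained sketch. It is not how fr-compromise is implemented: the lexicon, the suffix rules, and the names `LEXICON`, `SUFFIX_RULES` and `tagSentence` are all assumptions made up for illustration.

```javascript
// Toy rules-based tagger for illustration only.
// LEXICON, SUFFIX_RULES and tagSentence are hypothetical names,
// not fr-compromise internals.
const LEXICON = new Map([
  ['le', 'Determiner'], ['la', 'Determiner'], ['les', 'Determiner'],
  ['je', 'Pronoun'], ['nous', 'Pronoun'],
  ['sur', 'Preposition'], ['à', 'Preposition'],
])

// Each rule pairs a suffix pattern with the tag it suggests.
const SUFFIX_RULES = [
  [/ons$/, 'Verb'], // e.g. "jetons"
  [/(er|ir)$/, 'Verb'], // infinitives such as "jeter"
]

function tagSentence(text) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean)
  const terms = []
  for (const word of words) {
    const prev = terms.length > 0 ? terms[terms.length - 1].tag : null
    let tag = 'Noun' // fallback when no rule fires
    if (LEXICON.has(word)) {
      tag = LEXICON.get(word) // rule 1: the word is in the lexicon
    } else if (prev === 'Determiner') {
      tag = 'Noun' // rule 2: a word right after a determiner is a noun
    } else {
      const rule = SUFFIX_RULES.find(([pattern]) => pattern.test(word))
      if (rule) tag = rule[1] // rule 3: guess from the suffix
    }
    terms.push({ text: word, tag })
  }
  return terms
}

console.log(tagSentence('Nous jetons les chaussures').map((t) => t.tag))
// [ 'Pronoun', 'Verb', 'Determiner', 'Noun' ]
```

A real tagger needs a far larger lexicon and many more rules, but the shape is the same: look the word up, then fall back on context and word endings.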

API

fr-compromise includes all the methods of compromise/one:

Output
  • .text() - return the document as text
  • .json() - return the document as data
  • .debug() - pretty-print the interpreted document
  • .out() - a named or custom output
  • .html({}) - output custom html tags for matches
  • .wrap({}) - produce custom output for document matches
Utils
  • .found [getter] - is this document non-empty? (true if there are any matches)
  • .docs [getter] - get term objects as json
  • .length [getter] - count the number of matches in the document
  • .isView [getter] - identify a compromise object
  • .compute() - run a named analysis on the document
  • .clone() - deep-copy the document, so that no references remain
  • .termList() - return a flat list of all Term objects in match
  • .cache({}) - freeze the current state of the document, for speed-purposes
  • .uncache() - un-freezes the current state of the document, so it may be transformed
Accessors
Match

(match methods use the match-syntax.)

  • .match('') - return a new Doc, with this one as a parent
  • .not('') - return all results except for this
  • .matchOne('') - return only the first match
  • .if('') - return each current phrase, only if it contains this match ('only')
  • .ifNo('') - filter out any current phrases that have this match ('notIf')
  • .has('') - return true if this match exists, false otherwise
  • .before('') - return all terms before a match, in each phrase
  • .after('') - return all terms after a match, in each phrase
  • .union() - return combined matches without duplicates
  • .intersection() - return only duplicate matches
  • .complement() - get everything not in another match
  • .settle() - remove overlaps from matches
  • .growRight('') - add any matching terms immediately after each match
  • .growLeft('') - add any matching terms immediately before each match
  • .grow('') - add any matching terms before or after each match
  • .sweep(net) - apply a series of match objects to the document
  • .splitOn('') - return a Document with three parts for every match ('splitOn')
  • .splitBefore('') - partition a phrase before each matching segment
  • .splitAfter('') - partition a phrase after each matching segment
  • .lookup([]) - quick find for an array of string matches
  • .autoFill() - create type-ahead assumptions on the document
Tag
  • .tag('') - give all terms the given tag
  • .tagSafe('') - only apply the tag to terms if it is consistent with their current tags
  • .unTag('') - remove this tag from the given terms
  • .canBe('') - return only the terms that can be this tag
Case
Whitespace
  • .pre('') - add this punctuation or whitespace before each match
  • .post('') - add this punctuation or whitespace after each match
  • .trim() - remove start and end whitespace
  • .hyphenate() - connect words with hyphen, and remove whitespace
  • .dehyphenate() - remove hyphens between words, and set whitespace
  • .toQuotations() - add quotation marks around these matches
  • .toParentheses() - add brackets around these matches
Loops
  • .map(fn) - run each phrase through a function, and create a new document
  • .forEach(fn) - run a function on each phrase, as an individual document
  • .filter(fn) - return only the phrases that return true
  • .find(fn) - return a document with only the first phrase that matches
  • .some(fn) - return true or false if there is one matching phrase
  • .random(n) - sample a random subset of the results
Insert
Transform
Lib

(these methods are on the main nlp object)

Numbers:

fr-compromise can parse both written-out and numeric numbers:

import nlp from 'fr-compromise'

let doc = nlp(`j'ai moins quarante dollars`).debug()
doc.numbers().add(50)
doc.text()
// "j'ai dix dollars"
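
The arithmetic behind that result is -40 + 50 = 10, so "moins quarante" is rewritten as "dix". As a rough, self-contained sketch of the parse, compute, and format round trip (not the library's implementation), the snippet below uses a tiny made-up lexicon; `WORD_TO_VALUE`, `parseNumber` and `formatNumber` are hypothetical names that cover only a few round tens.

```javascript
// Hypothetical mini-lexicon: only a handful of French number words.
const WORD_TO_VALUE = new Map([
  ['dix', 10], ['vingt', 20], ['trente', 30], ['quarante', 40], ['cinquante', 50],
])
// Reverse lookup, from value back to word.
const VALUE_TO_WORD = new Map([...WORD_TO_VALUE].map(([word, value]) => [value, word]))

// Turn "quarante" or "moins quarante" into a signed number.
function parseNumber(phrase) {
  const words = phrase.trim().toLowerCase().split(/\s+/)
  const negative = words[0] === 'moins'
  const word = negative ? words[1] : words[0]
  if (!WORD_TO_VALUE.has(word)) throw new Error(`unknown number word: ${word}`)
  const value = WORD_TO_VALUE.get(word)
  return negative ? -value : value
}

// Turn a signed number back into words, prefixing "moins" when negative.
function formatNumber(n) {
  const word = VALUE_TO_WORD.get(Math.abs(n))
  if (word === undefined) throw new Error(`no word for ${n}`)
  return n < 0 ? `moins ${word}` : word
}

console.log(formatNumber(parseNumber('moins quarante') + 50))
// dix
```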

Lemmatization:

it can reduce conjugated and inflected words to their root form:

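import nlp from 'fr-compromise'

let doc = nlp('Nous jetons les chaussures')
doc.compute('root') // attach a root form to each term
doc.has('{jeter} les {chaussure}') // {word} matches against the root form
// true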

Date parsing:

with the help of the fr-compromise-dates plugin, it can convert natural-language dates into ISO-format dates:

import nlp from 'fr-compromise'
import plg from 'fr-compromise-dates'
nlp.plugin(plg)
// reference date and timezone used to resolve relative or partial dates
let opts = { timezone: 'UTC', today: '2023-03-02' }

let doc = nlp('Je peux emprunter votre voiture entre le 2 mai et le 14 juillet')
let res = doc.dates(opts).json()[0]
/*
  {
    text: 'entre le 2 mai et le 14 juillet',
    dates: [
      {
        start: '2023-05-02T00:00:00.000Z',
        end: '2023-07-14T23:59:59.999Z'
      }
    ]
  }
*/
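
The `start` and `end` values in that output form an inclusive range: the first millisecond of 2 May through the last millisecond of 14 July, in UTC. The year is presumably taken from the `today` reference date. A minimal sketch of how such a range can be built with the standard `Date` object is shown below; `dayRangeUTC` is a hypothetical helper, not part of fr-compromise-dates.

```javascript
// Build an inclusive UTC range covering two calendar days and everything between.
// `dayRangeUTC` is a hypothetical helper, not part of fr-compromise-dates.
function dayRangeUTC(year, startMonth, startDay, endMonth, endDay) {
  // Date.UTC expects zero-based months, so subtract 1 from the 1-based inputs.
  const start = new Date(Date.UTC(year, startMonth - 1, startDay, 0, 0, 0, 0))
  const end = new Date(Date.UTC(year, endMonth - 1, endDay, 23, 59, 59, 999))
  return { start: start.toISOString(), end: end.toISOString() }
}

console.log(dayRangeUTC(2023, 5, 2, 7, 14))
// { start: '2023-05-02T00:00:00.000Z', end: '2023-07-14T23:59:59.999Z' }
```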

Contributing

Please join to help!

help with first PR

git clone https://github.com/nlp-compromise/fr-compromise.git
cd fr-compromise
npm install
npm test
npm run watch

See also

MIT