Command-line wrapper around SpaCy and trankit, focused on dependency parsing

spacy-wrap

A convenient command-line interface for applying spaCy and trankit pipelines. It supports word tokenization and sentence segmentation, either for a given language or with automatic language detection, and can print spaCy parses directly as JSON.

Install

$ pipx install git+https://github.com/mwestera/spacy-wrap

This will make three commands available:

  • tokenize
  • sentencize
  • spacyjson
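
After installing, one quick way to confirm that the three commands ended up on your PATH is a short Python check. This is only a sketch using the standard library; the helper name `missing_commands` is made up for illustration and is not part of spacy-wrap.

```python
import shutil

# The three commands this README says the install makes available.
COMMANDS = ["tokenize", "sentencize", "spacyjson"]


def missing_commands(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_commands(COMMANDS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All spacy-wrap commands are available.")
```

If a command is reported missing, the pipx binary directory may not be on your PATH; running `pipx ensurepath` and opening a new shell usually resolves that.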

Examples

$ echo "Here's just a short text. For you to parse." | tokenize --info --tree

Or, to process each line from a file separately (and this time using a transformer model, --trf):

$ cat some_dutch_sentences.txt | tokenize --info --trf --lang nl --lines --tree
$ cat texts_in_various_languages.txt | sentencize --trf --lines

Note: in the second command no --lang is given, so the language is detected separately for each input line.
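
Because --lines treats every input line as a separate text, texts that contain line breaks of their own need to be flattened before they are piped in. A minimal sketch of that preparation step in Python follows; the function name `to_line_per_text` is hypothetical and not part of spacy-wrap.

```python
def to_line_per_text(texts):
    """Collapse each text onto a single line, one text per output line."""
    lines = []
    for text in texts:
        # str.split() without arguments splits on any run of whitespace
        # (spaces, tabs, newlines) and ignores leading/trailing whitespace.
        flat = " ".join(text.split())
        if flat:  # skip texts that are empty after flattening
            lines.append(flat)
    if not lines:
        return ""
    return "\n".join(lines) + "\n"


texts = ["Dit is een zin.\nEn nog een.", "This one is\tEnglish."]
print(to_line_per_text(texts), end="")
```

The resulting string can be written to a file and fed to, for example, `sentencize --trf --lines` as in the commands above.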

Or output full sentence parses in JSON format:

$ cat texts_in_various_languages.txt | sentencize --lines --lang nl --json

Or entire spaCy docs as JSON:

$ cat texts_in_various_languages.txt | spacyjson --lines --lang nl --json > parses.jsonl
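
The .jsonl extension suggests one JSON object per output line. Under that assumption (this README does not confirm it), the parses can be loaded back into Python with the standard library. The sketch below also shows how a command could be driven through `subprocess` instead of a shell pipe. `run_cli` and `read_jsonl` are illustrative names, and the sample record is a made-up stand-in, not real spacyjson output.

```python
import json
import subprocess


def run_cli(command, text):
    """Pipe `text` to `command` on stdin and return its stdout as a string."""
    result = subprocess.run(
        command, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


def read_jsonl(lines):
    """Yield one parsed JSON value per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up stand-in for command output; the real keys are not documented here.
sample_output = '{"text": "Hallo wereld."}\n\n{"text": "Nog een tekst."}\n'
docs = list(read_jsonl(sample_output.splitlines()))
print(len(docs), "documents; keys of the first:", sorted(docs[0]))
```

With spacy-wrap installed, the same helpers could be combined as `read_jsonl(run_cli(["spacyjson", "--lines", "--lang", "nl", "--json"], text).splitlines())`, reusing the flags from the command above. Inspecting the keys of the first record is a reasonable first step before relying on any particular field.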
