Add examples of basic usage without POS tags
sorenlind committed Apr 24, 2019
1 parent 1ea4c98 commit 9f84340
Showing 2 changed files with 35 additions and 7 deletions.
README.md: 38 changes (33 additions, 5 deletions)
````diff
@@ -11,15 +11,18 @@ supports training on your own dataset.
 
 The models included in Lemmy were evaluated on the respective Universal Dependencies dev
 datasets. The Danish model scored > 99% accuracy, while the Swedish model scored > 97%.
+All reported scores were obtained when supplying Lemmy with POS tags.
 
 You can use Lemmy as a spaCy extension, more specifcally a spaCy pipeline component.
 This is highly recommended and makes the lemmas easily accessible from the spaCy tokens.
 Lemmy makes use of POS tags to predict the lemmas. When wired up to the spaCy pipeline,
 Lemmy has the benefit of using spaCy’s builtin POS tagger.
 
 Lemmy can also by used without spaCy, as a standalone lemmatizer. In that case, you will
-have to provide the POS tags. Alternatively, you can train a Lemmy model which does not
-depend on POS tags, though most likely the accuracy will suffer.
+have to provide the POS tags. Alternatively, you can use Lemmy without POS tags, though
+most likely the accuracy will suffer. Currrently, only the Danish Lemmy model comes with
+a model trained for use without POS tags. That is, if you want to use Lemmy on Swedish
+text without POS tags, you must train your own Lemmy model.
 
 Lemmy is heavily inspired by the [CST Lemmatizer for
 Danish](https://cst.dk/online/lemmatiser/).
@@ -30,18 +33,43 @@ Danish](https://cst.dk/online/lemmatiser/).
 pip install lemmy
 ```
 
-## Usage
+## Basic Usage Without POS tags
 
 ```python
-import da_custom_model as da # name of your spaCy model
+import lemmy
+
+# Create an instance of the standalone lemmatizer.
+lemmatizer = lemmy.load("da")
+
+# Find lemma for the word 'akvariernes'. First argument is an empty POS tag.
+lemmatizer.lemmatize("", "akvariernes")
+```
+
+## Basic Usage With POS tags
+
+```python
+import lemmy
+
+# Create an instance of the standalone lemmatizer.
+# Replace 'da' with 'sv' for the Swedish lemmatizer.
+lemmatizer = lemmy.load("da")
+
+# Find lemma for the word 'akvariernes'. First argument is the user-provided POS tag.
+lemmatizer.lemmatize("NOUN", "akvariernes")
+```
+
+## Usage with spaCy Model
+
+```python
+import da_custom_model as da # replace da_custom_model with name of your spaCy model
 import lemmy.pipe
 nlp = da.load()
 
 # Create an instance of Lemmy's pipeline component for spaCy.
 # Replace 'da' with 'sv' for the Swedish lemmatizer.
 pipe = lemmy.pipe.load('da')
 
-# Add the comonent to the spaCy pipeline.
+# Add the component to the spaCy pipeline.
 nlp.add_pipe(pipe, after='tagger')
 
 # Lemmas can now be accessed using the `._.lemmas` attribute on the tokens.
````
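The README examples above call the standalone lemmatizer as `lemmatizer.lemmatize(pos, word)`, passing an empty string when no POS tag is available. When tags exist for only some tokens, a small wrapper can choose per token. The sketch below is hypothetical: `Lemmatizer`, `lemmatize_tokens`, `allow_untagged` and `FakeLemmatizer` are illustrative names, not part of Lemmy. It assumes only the `lemmatize(pos, word)` call shape from the README and uses a stand-in object so it runs without Lemmy installed.

```python
from typing import Iterable, List, Optional, Protocol, Tuple


class Lemmatizer(Protocol):
    """Any object with the call shape used in the README: lemmatize(pos, word)."""

    def lemmatize(self, pos: str, word: str) -> object: ...


def lemmatize_tokens(
    lemmatizer: Lemmatizer,
    tokens: Iterable[Tuple[str, Optional[str]]],
    allow_untagged: bool = True,
) -> List[object]:
    """Lemmatize (word, pos) pairs, passing "" when a POS tag is missing.

    Hypothetical helper, not part of Lemmy. Pass allow_untagged=False when the
    loaded model was not trained for use without POS tags.
    """
    results: List[object] = []
    for word, pos in tokens:
        if pos is None:
            if not allow_untagged:
                raise ValueError(f"missing POS tag for {word!r}")
            pos = ""  # empty POS tag, as in the README's no-POS example
        results.append(lemmatizer.lemmatize(pos, word))
    return results


class FakeLemmatizer:
    """Hypothetical stand-in for lemmy.load("da"); echoes its arguments."""

    def lemmatize(self, pos: str, word: str) -> object:
        return (pos, word)


if __name__ == "__main__":
    tokens = [("akvariernes", "NOUN"), ("skibene", None)]
    print(lemmatize_tokens(FakeLemmatizer(), tokens))
    # [('NOUN', 'akvariernes'), ('', 'skibene')]
```

With Lemmy installed, `lemmy.load("da")` would take the place of `FakeLemmatizer()`. Since the README states that only the Danish model ships with a variant trained for use without POS tags, `allow_untagged=False` is the safer setting for the Swedish model.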
notebooks/da/04 examples.ipynb: 4 changes (2 additions, 2 deletions)
````diff
@@ -43,14 +43,14 @@
 ],
 "source": [
 "lemmatizer = lemmy.load(\"da\")\n",
-"lemmatizer.lemmatize(\"\", full_form=\"skibene\")"
+"lemmatizer.lemmatize(\"\", \"skibene\")"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Using Lemy with spaCy\n",
+"## Using Lemmy with spaCy\n",
 "This is an example of how to use lemma with spaCy. \n",
 "\n",
 "**Caution**: The Danish model included with spaCy is not trained for POS tagging. This model can not be used with Lemmy since the Lemmy pipeline component for spaCy requires POS tags. You must train your own spaCy model capable of POS tagging. The example below assumes you have your own model called `da_custom_model`."
````
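The notebook's caution, together with the README's `nlp.add_pipe(pipe, after='tagger')` call, means Lemmy's spaCy component should only be added to a pipeline that already has a POS tagger. A hedged sketch of a guard for that: `add_lemmy_after_tagger` and `FakePipeline` are hypothetical names, not part of Lemmy or spaCy. The guard relies only on `nlp.pipe_names` and the spaCy v2-style `add_pipe(component, after=...)` call used in the README. It only checks that a component named `tagger` exists, which `after='tagger'` needs anyway; it cannot tell whether that tagger was actually trained to produce POS tags.

```python
from typing import Any, List


def add_lemmy_after_tagger(nlp: Any, lemmy_pipe: Any) -> None:
    """Hypothetical guard: add Lemmy's component right after 'tagger'.

    Only checks that a component named 'tagger' exists; it cannot tell
    whether that tagger was trained to produce POS tags.
    """
    if "tagger" not in nlp.pipe_names:
        raise ValueError(
            "Lemmy needs POS tags, but this pipeline has no 'tagger' "
            f"component (found: {nlp.pipe_names})"
        )
    # Same call as in the README's spaCy example.
    nlp.add_pipe(lemmy_pipe, after="tagger")


class FakePipeline:
    """Hypothetical stand-in for a spaCy pipeline: tracks component names only."""

    def __init__(self, names: List[str]) -> None:
        self.pipe_names = list(names)

    def add_pipe(self, component: Any, after: str) -> None:
        # Insert a placeholder name directly after the named component.
        index = self.pipe_names.index(after) + 1
        self.pipe_names.insert(index, "lemmatizer")


if __name__ == "__main__":
    nlp = FakePipeline(["tagger", "parser", "ner"])
    add_lemmy_after_tagger(nlp, lemmy_pipe=object())
    print(nlp.pipe_names)  # ['tagger', 'lemmatizer', 'parser', 'ner']
```

Failing early with a message that names the missing tagger is the design choice here; with a real pipeline, `nlp` would be the object returned by `da.load()` and `lemmy_pipe` the component from `lemmy.pipe.load('da')`.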
