Small update to getting started guide
markgw committed Jul 7, 2020
1 parent 6991624 commit 61e1dcb
Showing 2 changed files with 17 additions and 13 deletions.
2 changes: 2 additions & 0 deletions admin/newproject.py
@@ -188,6 +188,8 @@ def _rem(filename):


TEMPLATE_CONF = """\
#!./pimlico.sh
[pipeline]
name={pipeline_name}
release={latest_release}
28 changes: 15 additions & 13 deletions docs/guides/setup.rst
@@ -92,8 +92,10 @@ it's what distinguishes the storage locations.
 set to the latest one, which has been downloaded.
 
 If you later try running the same pipeline with an updated version of Pimlico,
-it will work fine as long as it's the same major version (the first digit).
-Otherwise, there may be backwards incompatible changes, so you'd
+it will work fine as long as it's the same minor version (the second part).
+The minor-minor third part can be updated and may bring some improvements.
+If you use a higher minor version (e.g. 0.10.x when you started with 0.9.24),
+there may be backwards incompatible changes, so you'd
 need to update your config file, ensuring it plays nicely with the later
 Pimlico version.
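For context, the ``release`` setting discussed in the changed paragraph lives in the ``[pipeline]`` section at the top of a pipeline config file, as in the template in ``admin/newproject.py``. A minimal header might look like the following sketch (the project name and version number here are hypothetical, not taken from this commit):

```ini
#!./pimlico.sh
[pipeline]
name=myproject
release=0.9
```

When a newer Pimlico is used to run this pipeline, it is this ``release`` value that the compatibility caveats above are measured against.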

@@ -113,28 +115,28 @@ the home directory.
 
 .. code-block:: ini
 
-   [input-text]
+   [input_text]
    type=pimlico.modules.input.text.raw_text_files
    files=%(home)s/data/europarl_demo/*
 
 .. todo::
 
    Continue writing from here

 Doing something: tokenization
 -----------------------------
 
 Now, some actual linguistic processing, albeit somewhat uninteresting. Many NLP tools assume that
-their input has been divided into sentences and tokenized. The OpenNLP-based tokenization module does both of these
-things at once, calling OpenNLP tools.
+their input has been divided into sentences and tokenized. To keep things simple, we use a very
+basic, regular expression-based tokenizer.
 
-Notice that the output from the previous module feeds into the input for this one, which we specify simply by naming
-the module.
+Notice that the output from the previous module feeds into the
+input for this one, which we specify simply by naming the module.
 
 .. code-block:: ini
 
    [tokenize]
-   type=pimlico.modules.opennlp.tokenize
-   input=tar-grouper
+   type=pimlico.modules.text.simple_tokenize
+   input=input_text
 
 .. todo::
 
    Continue writing from here
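To give a feel for what a basic regular-expression tokenizer does, here is a minimal sketch. This is an illustration only, not Pimlico's actual ``text.simple_tokenize`` implementation:

```python
import re

def simple_tokenize(text):
    # Treat each run of word characters, or each single punctuation
    # character, as one token; whitespace is discarded.
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_tokenize("Hello, world!"))
# → ['Hello', ',', 'world', '!']
```

Real tokenizers handle contractions, abbreviations and sentence boundaries more carefully; this sketch just conveys the flavour of the regex-based approach.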

Doing something more interesting: POS tagging
---------------------------------------------
