
Commit

docs: Move citations into a global references ReST file
lentinj committed Nov 16, 2018
1 parent e519c52 commit a0ae951
Showing 4 changed files with 11 additions and 15 deletions.
1 change: 1 addition & 0 deletions docs/index.rst

@@ -12,6 +12,7 @@ CLiC
    advanced
    appendices
    footnotes
+   references

 * :ref:`genindex`
 * :ref:`search`
6 changes: 6 additions & 0 deletions docs/references.rst

@@ -0,0 +1,6 @@
+References
+==========
+
+.. [ICU] http://userguide.icu-project.org/boundaryanalysis
+.. [UAX29] https://www.unicode.org/reports/tr29/tr29-33.html#Word_Boundaries
+.. [UNIDECODE] https://pypi.org/project/Unidecode/
6 changes: 1 addition & 5 deletions server/clic/region/chapter.py

@@ -149,7 +149,7 @@
 blank line in the text).

 ``chapter.paragraph`` are then broken up into ``chapter.sentence``, using the
-Unicode sentence segmentation in [UAX29], using the implementation in the [ICU]
+Unicode sentence segmentation in [UAX29]_, using the implementation in the [ICU]_
 library.

 * We use the ``en_GB@ss=standard`` locale (ss=standard tells ICU to not treat

@@ -207,10 +207,6 @@
  ('chapter.sentence', 141, 236, 2, 'Above the door was p...Oliver, News Agent."'),
  ('chapter.paragraph', 238, 395, 2, 'So if you wish to st...et all these things.'),
  ('chapter.sentence', 238, 395, 3, 'So if you wish to st...et all these things.')]
-
-.. [ICU] http://userguide.icu-project.org/boundaryanalysis
-.. [UAX29] https://www.unicode.org/reports/tr29/tr29-33.html#Word_Boundaries
-.. [UNIDECODE] https://pypi.org/project/Unidecode/
 """
 import re
13 changes: 3 additions & 10 deletions server/clic/tokenizer.py

@@ -7,8 +7,8 @@
 Method
 ------

-To extract tokens, we use Unicode text segmentation as described in [UAX29],
-using the implementation in the [ICU] library and standard rules for en_GB, and
+To extract tokens, we use Unicode text segmentation as described in [UAX29]_,
+using the implementation in the [ICU]_ library and standard rules for en_GB, and
 then apply our own additions (see later).

 Please read the document for a full description of ICU word boundaries, however

@@ -43,7 +43,7 @@
 Tokens are then normalised into types by:-

 * Lower-casing, ``The`` -> ``the``.
-* Normalising any non-ascii characters with [UNIDECODE], e.g.
+* Normalising any non-ascii characters with [UNIDECODE]_, e.g.

   * ``can’t`` -> ``can't``.
   * ``café`` -> ``cafe``.

 * Removing any surrounding underscores, e.g. ``_connoisseur_`` -> ``connoisseur``.

@@ -133,13 +133,6 @@
 ... ''')]
 ['we', 'have', 'books', 'everywhere',
  'moo', 'oi', 'nk']
-
-References
-----------
-
-.. [ICU] http://userguide.icu-project.org/boundaryanalysis
-.. [UAX29] https://www.unicode.org/reports/tr29/tr29-33.html#Word_Boundaries
-.. [UNIDECODE] https://pypi.org/project/Unidecode/
 """
 import re
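
The tokenizer docstring above lists how tokens are normalised into types: lower-casing, ASCII-folding non-ASCII characters, and stripping surrounding underscores. A standard-library sketch of those steps follows. Note the real code cites the third-party Unidecode package; here ``unicodedata`` NFKD folding plus an explicit apostrophe mapping stands in for it, and covers far fewer characters:

```python
# Sketch of the token -> type normalisation steps listed in the docstring.
# Stands in for Unidecode with stdlib unicodedata; illustration only.
import unicodedata

# NFKD folding does not map the curly apostrophe, so translate it explicitly
# (a deliberately tiny subset of what Unidecode handles).
PUNCT = {ord('\u2019'): "'"}

def normalise(token):
    """Lower-case, strip surrounding underscores, and fold to ASCII."""
    t = token.lower().strip('_')
    t = t.translate(PUNCT)
    # NFKD decomposes accented characters; encoding to ASCII drops the
    # combining marks, e.g. 'café' -> 'cafe'
    return unicodedata.normalize('NFKD', t).encode('ascii', 'ignore').decode('ascii')

print(normalise('The'))            # the
print(normalise('can\u2019t'))     # can't
print(normalise('caf\u00e9'))      # cafe
print(normalise('_connoisseur_'))  # connoisseur
```

These are exactly the worked examples from the docstring, so the sketch can be checked against them directly.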
