This directory contains release 1.2 of the Xerox Part-of-Speech tagger. For more information, print the file doc/tagger/tagger.ps.


Until this project is added to the Quicklisp repository, it must be installed manually. Assuming Quicklisp is already installed:

  1. Download the project sources, either by cloning the repository or by downloading an archive.
  2. Unpack them into a directory and note its path.
  3. cd to the ~/quicklisp/local-projects directory.
  4. Create symbolic links there to the .asd files in the directory from step 2.
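
If symbolic links are inconvenient, a different technique (not the one described in the steps above) is to tell ASDF directly where the sources live. The following is only a sketch: the path `/path/to/tagger/` is a hypothetical placeholder for the directory from step 2, and the variable name `*tagger-source-dir*` is invented for illustration. The trailing slash matters, because ASDF expects a directory pathname.

```lisp
;; Sketch: register the unpacked source directory with ASDF instead of
;; symlinking the .asd files into ~/quicklisp/local-projects.
(require "asdf")

;; Hypothetical location of the unpacked sources (step 2); adjust as needed.
(defparameter *tagger-source-dir* #p"/path/to/tagger/")

;; ASDF searches the directories in this list for .asd files.
(pushnew *tagger-source-dir* asdf:*central-registry* :test #'equal)
```

This setting lasts only for the current Lisp session unless it is placed in an init file.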

The system can now be loaded, either in parts or as a whole. To load all of it:

(ql:quickload "tagger")

When the loading is complete, you can run some simple queries:

(tag-analysis:tag-string "I saw the man on the hill with the telescope.")

I saw the man on the hill with the telescope.
ppss/2 vbd/3 at nn in at nn in/2 at nn/2

(The number following the tag is the arity of the ambiguity class assigned by the lexicon. Words without a number are unambiguous.)
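
To make the `tag/arity` notation concrete, here is a small sketch that parses a printed tag line such as the one above into (tag . arity) pairs. It uses only standard Common Lisp and is not part of the tagger; the function names `split-on-spaces`, `parse-tag-item` and `parse-tag-line` are invented for illustration, and treating an unnumbered tag as arity 1 is an assumption based on the note that such words are unambiguous.

```lisp
;; Split LINE on spaces, dropping empty pieces.
(defun split-on-spaces (line)
  (loop with start = 0
        for end = (position #\Space line :start start)
        for piece = (subseq line start end)
        unless (string= piece "") collect piece
        while end
        do (setf start (1+ end))))

;; Parse one "tag" or "tag/N" item. Returns two values: the tag name
;; as a string, and the arity (assumed to be 1 when no number is printed).
(defun parse-tag-item (item)
  (let ((slash (position #\/ item :from-end t)))
    (if slash
        (values (subseq item 0 slash)
                (parse-integer item :start (1+ slash)))
        (values item 1))))

;; Parse a whole printed tag line into a list of (tag . arity) pairs.
(defun parse-tag-line (line)
  (mapcar (lambda (item)
            (multiple-value-bind (tag arity) (parse-tag-item item)
              (cons tag arity)))
          (split-on-spaces line)))

;; (parse-tag-line "ppss/2 vbd/3 at nn")
;; => (("ppss" . 2) ("vbd" . 3) ("at" . 1) ("nn" . 1))
```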

Programmatic Tagging

To use the tagger in a program, create a tagging-ts and use the values of calls to the generic function next-token. Note that reinitialize-instance redirects tagging to a new text with minimal initialization overhead.

For example, the following function, my-tag-files, calls my-process-token-and-tag on each token/tag pair generated by tagging each file in the argument files:

(use-package :tdb)
(use-package :tag-analysis)

(defun my-tag-files (files)
  ;; One token stream is created and then reused for every file.
  (let ((token-stream (make-instance 'tagging-ts)))
    (dolist (file files)
      (with-open-file (char-stream file)
        ;; Redirect the existing stream to the newly opened file.
        (reinitialize-instance token-stream :char-stream char-stream)
        (loop (multiple-value-bind (token tag)
                  (next-token token-stream)
                ;; A null token marks the end of the text.
                (unless token (return))
                (my-process-token-and-tag token tag)))))))
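
The function my-process-token-and-tag is left to the caller. As one possible sketch, the definition below counts how often each tag occurs. The names `*tag-counts*` and `tag-counts-alist` are invented for illustration. The code does not depend on the concrete representation of tags, since an `equal` hash table works for symbols and strings alike; what the tagger actually returns is not specified here.

```lisp
(defvar *tag-counts* (make-hash-table :test #'equal)
  "Maps each tag seen so far to the number of tokens that received it.")

;; Callback for my-tag-files: ignore the token, count the tag.
(defun my-process-token-and-tag (token tag)
  (declare (ignore token))
  (incf (gethash tag *tag-counts* 0)))

;; Return the counts as a list of (tag . count) pairs, most frequent first.
(defun tag-counts-alist ()
  (let ((pairs '()))
    (maphash (lambda (tag count) (push (cons tag count) pairs))
             *tag-counts*)
    (sort pairs #'> :key #'cdr)))
```

After a call to my-tag-files, `(tag-counts-alist)` would give the tag frequencies accumulated over all files. Call `(clrhash *tag-counts*)` to start a fresh count.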