

LibCPV::Categorizer - Class for Hierarchical CPV-Number Categorizing via AI::Categorizer.


  my $doc_set = LibCPV::Categorizer::DocumentSet->new(
    dirname         => '/path/to/docset/dir',
  );

  my $categorizer = LibCPV::Categorizer->new(
    document_set         => $doc_set,
    learner_rootdir      => '/path/to/learner/output/dir',
  );


We use AI::Categorizer. Because AI::Categorizer does not support hierarchical categorization, we added our own hierarchy scheme based on the semantics of CPV numbers.

For introduction to cpv numbers see

In LibCPV::Categorizer we try to use consistent wording. These are the most important terms:

learner - An AI::Categorizer instance used to learn (or train).

category - a CPV number, which is simply an 8-digit number. CPV numbers are built hierarchically: the first 2 digits form one common level of accuracy, and each following digit forms a further accuracy level. We derive the term "group" from this definition of accuracy levels.
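The accuracy levels described above can be illustrated with a minimal sketch. The helper name cpv_groups is hypothetical and is not part of LibCPV::Categorizer; it assumes that a group is simply the leading digits of a CPV number at a given accuracy level, starting with the first 2 digits.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of LibCPV::Categorizer):
# return the prefix of an 8-digit CPV number at every accuracy level,
# from the 2-digit level down to the full 8-digit number.
sub cpv_groups {
    my ($cpv) = @_;
    die "not an 8-digit CPV number: $cpv\n" unless $cpv =~ /\A\d{8}\z/;
    return map { substr($cpv, 0, $_) } 2 .. 8;
}

# Prints: 03, 031, 0311, 03111, 031110, 0311100, 03111000
print join(', ', cpv_groups('03111000')), "\n";
```

Under this reading, an 8-digit number has 7 accuracy levels: one for the first 2 digits and one for each of the 6 remaining digits.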

Exercise some Affe dance.

Quite funky Zomtec




moo foo bar


Some code examples

  # a verbatim block
  sub cut { 42 }
  my $foo = cut();
  sub affe {
          do_something_strong($foo, $zomtec, @tiger);
          print STDERR $foo, "\n";
  }

  # another verbatim block after a single empty line
  # although that is not the only reason for confusion
  sub kram {
  }

If every possible 8-digit CPV number were used, the tree would have one root-level learner categorizing into 99 categories, 99 learners at the next level each categorizing into 9 categories (that is, 9 learners below each of the 99 categories), and at each following level 9 more learners for each category.
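The growth of the tree can be worked through in a short sketch. It takes the figures from the paragraph above (99 categories at the root, 9 categories per learner below it) and additionally assumes that the last learner level is the one deciding the 8th digit, which gives 7 learner levels. The function name learners_per_level is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: number of learners at each level of a fully
# populated tree, assuming 99 root categories and 9 categories per
# learner on every level below the root.
sub learners_per_level {
    my @counts   = (1);    # a single root learner decides the first 2 digits
    my $learners = 99;     # one learner per 2-digit category
    for my $digit (3 .. 8) {
        push @counts, $learners;   # learners deciding digit number $digit
        $learners *= 9;            # each of them feeds 9 learners below
    }
    return @counts;
}

my @counts = learners_per_level();
my $total  = 0;
$total += $_ for @counts;

printf "level %d: %d learners\n", $_ + 1, $counts[$_] for 0 .. $#counts;
print "total: $total\n";
```

Under these assumptions the levels hold 1, 99, 891, 8019, 72171, 649539 and 5845851 learners, 6576571 in total. In practice far fewer learners should be needed, because only a fraction of the possible 8-digit numbers are assigned as CPV numbers.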

