Integrated Natural Language System

  • November 2022.

This is a revised version of the earlier Unsupervised Natural Language Learning README Version 3, updated to reflect a new integrated process that enables continuous learning.

A general overview, including prerequisites, is provided by the Unsupervised Natural Language Learning README file.

This integration is in development; the instructions below are incomplete. Look at the older version for general guidance.

There are technical challenges to fully implementing continuous learning. The instructions below will be a hybrid of the older-style batch process, and the newer style.

Table of Contents

  1. Processing Overview
  2. Preliminaries

Preliminaries

Setting up the integrated pipeline requires many prerequisites and preliminaries. These are given in the following sections of the earlier README:

  1. Setting up the AtomSpace
  2. Bulk Pair Counting
  3. Mutual Information of Word Pairs
  4. The Vector Structure Encoded in Pairs
  5. Maximum Spanning Trees
  6. MST Disjunct Counting
  7. Disjunct Marginal Statistics
  8. Determining Grammatical Classes
  9. Creating Grammatical Classes
  10. Exporting a Lexis
  11. Clustering
  12. Precomputed LXC containers

Mutual Information of Word Pairs

The goal is to have MI computed dynamically, on the fly. The code for this is only half written. So, for now, compute MI as before, as a batch process: run the code in run-common/marginals-pair.scm. It works.
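For readers who want a reminder of what the batch step produces, here is a minimal Python sketch of mutual information for word pairs. It is not the code in run-common/marginals-pair.scm; it assumes the usual pointwise definition, MI(l,r) = log2 [ N(l,r) N(*,*) / ( N(l,*) N(*,r) ) ], and the function name `pair_mi` and the toy counts are made up for illustration.

```python
import math
from collections import Counter

def pair_mi(pair_counts):
    """Map {(left, right): count} to {(left, right): MI in bits}.

    Hypothetical helper, assuming the pointwise MI definition
    log2( N(l,r) * N(*,*) / (N(l,*) * N(*,r)) ).
    """
    total = sum(pair_counts.values())      # N(*,*)
    left = Counter()                       # N(l,*): left-word marginals
    right = Counter()                      # N(*,r): right-word marginals
    for (l, r), n in pair_counts.items():
        left[l] += n
        right[r] += n
    return {
        (l, r): math.log2(n * total / (left[l] * right[r]))
        for (l, r), n in pair_counts.items()
        if n > 0                           # MI is undefined for unseen pairs
    }

# Toy counts, invented for this example.
counts = {("the", "dog"): 4, ("the", "cat"): 4,
          ("a", "dog"): 1, ("a", "cat"): 7}
mi = pair_mi(counts)
```

The marginals are the expensive part on a real dataset, which is why they are computed once, in batch, rather than per query.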

MST Parsing Demo

As a demo of what is about to happen, aim the link-parser at a running instance of the CogServer that contains word-pairs (and word-pair MI data). Type in any sentence, and then patiently wait (about 5-10 seconds) for data to fly over the net. The resulting parses will be maximal planar graphs (MPG), which are similar to maximal spanning trees (MST), but contain loops. What's being maximized is the sum of the MI of all the links between words.

Use the dictionary in run-config/dict-combined, after adjusting the URL in it, like so:

link-parser run-config/dict-combined
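To make "maximizing the sum of the MI of the links" concrete, here is a small Python sketch of a plain maximum spanning tree over the words of one sentence. It is only an illustration: it uses Kruskal's algorithm, does not enforce planarity, and does not add the extra loop-forming links of an MPG, so it is not what the link-parser actually does. The function name and the MI scores are invented for the example.

```python
def max_spanning_tree(num_words, score):
    """Pick links between word positions that maximize total MI.

    num_words: number of words in the sentence.
    score: {(i, j): mi} for word positions i < j (hypothetical input).
    Returns a list of (i, j) links forming a maximum spanning tree
    (or forest, if some words have no scored pair at all).
    """
    parent = list(range(num_words))        # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    links = []
    # Kruskal: consider candidate links from highest MI to lowest.
    for (i, j), _mi in sorted(score.items(), key=lambda kv: kv[1], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:                       # skip links that would close a loop
            parent[ri] = rj
            links.append((i, j))
    return links

words = ["the", "dog", "barked"]
# Made-up MI values for each word pair in the sentence.
score = {(0, 1): 2.0, (1, 2): 1.5, (0, 2): -0.5}
links = max_spanning_tree(len(words), score)
total_mi = sum(score[link] for link in links)
```

An MPG would go one step further and keep adding the best remaining links, as long as they do not cross existing ones.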

MST Disjunct Counting

As before, but with modernized infrastructure. (This is not yet the "continuous learning" design...)

  • Edit run-config/3-mpg-conf.sh and modify as needed.
  • Start the CogServer with run/3-mst-parsing/run-mst-cogserver.sh or simply guile -l run-common/cogserver-mst.scm.
  • Place text data into $CORPORA_DIR, as configured in 3-mpg-conf.sh.
  • Run ./run/3-mst-parsing/mst-submit.sh.

That's all for now!

THE END.