Natural Language Processing Library for Pharo Smalltalk

License: MIT

Note: the most frequent updates to this Pharo Smalltalk package will appear on the GitHub repo for this project.

Note 2: on 4/25/2021 I converted this project to use the Iceberg GitHub support for Pharo Smalltalk. All source code and data have been moved to the subdirectory src.

Iceberg/GitHub documentation: https://books.pharo.org/booklet-ManageCode/pdf/2019-03-24-ManageCode.pdf

Add this repository using the Iceberg Browser.

Setup to be done one time after loading the code via Iceberg

Part Of Speech Tagging

Open a File Browser and fileIn the KBSnlp.st source file. Open a Class Browser and look at the code in the KBnlp class.

Open a Workspace and one time only evaluate:

NLPtagger initializeLexicon

Try tagging a sentence to make sure the data was read from disk correctly:

NLPtagger pptag: 'The dog ran down the street'

If this does not work, the nlp_smalltalk directory is probably not in the default directory. The code containing the file path is:

read := (FileStream fileNamed: './nlp_smalltalk/lexicon.txt') readOnly.
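
To check where Pharo is looking, the following sketch prints the working directory and whether the lexicon file is visible from it. It assumes the relative path above resolves against FileSystem workingDirectory, which may differ between Pharo versions; lexiconFile is only a local name used for illustration.

```smalltalk
| lexiconFile |
"Assumption: the relative path is resolved against the working directory."
lexiconFile := FileSystem workingDirectory / 'nlp_smalltalk' / 'lexicon.txt'.
Transcript
    show: 'Working directory: ', FileSystem workingDirectory fullName; cr;
    show: 'Lexicon found: ', lexiconFile exists printString; cr.
```

If the second line prints false, move or copy the nlp_smalltalk directory into the directory shown on the first line.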

Categorization

I am using NeoJSON to parse the category word count data, so make sure NeoJSON is installed. NeoJSON can be installed using:

Gofer it
   smalltalkhubUser: 'SvenVanCaekenberghe' project: 'Neo';
   configurationOf: 'NeoJSON';
   loadStable.
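
SmalltalkHub has since been retired to a read-only archive, so the Gofer script above may fail on recent Pharo images. As an alternative, the NeoJSON project documents a Metacello load from its GitHub repository. A sketch, assuming network access and an image that ships with Metacello:

```smalltalk
"Load NeoJSON from GitHub instead of SmalltalkHub."
Metacello new
    repository: 'github://svenvc/NeoJSON/repository';
    baseline: 'NeoJSON';
    load.
```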

One time initialization:

NLPcategories initializeCategoryHash

Try it:

NLPcategories classify: 'The economy is bad and taxes are too high.'
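
To illustrate the general idea behind word-count categorization, here is a minimal bag-of-words sketch. It is not the code in NLPcategories: the category names, the counts, and the scoring rule (sum the counts of each word found in the text) are all assumptions made up for the example.

```smalltalk
| counts scores text best |
"Hypothetical per-category word counts; the real data is loaded from JSON."
counts := Dictionary new.
counts at: 'economy' put: (Dictionary newFrom: { 'economy' -> 10. 'taxes' -> 8. 'bad' -> 1 }).
counts at: 'sports' put: (Dictionary newFrom: { 'game' -> 9. 'team' -> 7. 'bad' -> 1 }).

text := 'The economy is bad and taxes are too high.'.
scores := Dictionary new.
counts keysAndValuesDo: [ :category :wordCounts |
    | score |
    score := 0.
    "Split on spaces and simple punctuation, then add up the counts of known words."
    (text asLowercase substrings: ' .,') do: [ :word |
        score := score + (wordCounts at: word ifAbsent: [ 0 ]) ].
    scores at: category put: score ].

"Pick the category with the highest score."
best := scores keys detectMax: [ :category | scores at: category ].
best
```

With these made-up counts the 'economy' category scores 19 and 'sports' scores 1, so best is 'economy'.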

Entity Recognition

Implemented for products, companies, places, and people's names.

One time initialization:

NLPentities initializeEntities

Example:

NLPentities entities: 'The Coca Cola factory is in London'

        -->  a Dictionary('companies'->a Set('Coca Cola') 'places'->a Set('London') 'products'->a Set('Coca Cola') )

NLPentities humanNameHelper: 'John Alex Smith and Andy Jones went to the store.'

                    --> a Set('John Alex Smith' 'Andy Jones')

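The example output above suggests a lookup of word sequences against lists of known names. A minimal sketch of that idea follows; it is an assumption about the approach, not the NLPentities code, and the two small sets stand in for the real name lists.

```smalltalk
| places companies tokens found |
"Hypothetical stand-ins for the name lists loaded by initializeEntities."
places := Set withAll: #('London' 'Paris').
companies := Set withAll: #('Coca Cola' 'IBM').

tokens := 'The Coca Cola factory is in London' substrings: ' '.
found := Dictionary new.
found at: 'places' put: Set new.
found at: 'companies' put: Set new.

"Try every one-word and two-word phrase and keep those found in a list."
1 to: tokens size do: [ :i |
    1 to: 2 do: [ :n |
        (i + n - 1 <= tokens size) ifTrue: [
            | phrase |
            phrase := ' ' join: (tokens copyFrom: i to: i + n - 1).
            (places includes: phrase) ifTrue: [ (found at: 'places') add: phrase ].
            (companies includes: phrase) ifTrue: [ (found at: 'companies') add: phrase ] ] ] ].
found
```
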
Sentence Segmentation

One time initialization:

NLPsentences loadData

NLPsentences sentences: 'Today Mr. Jones went to town. He bought gas.'

  --> an OrderedCollection(an OrderedCollection('Today' 'Mr.' 'Jones' 'went' 'to' 'town' '.') an OrderedCollection('He' 'bought' 'gas' '.'))

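Note that 'Mr.' in the output above does not end a sentence. One way to get that behavior is to keep a list of tokens that legitimately contain a period. The sketch below assumes such a list and splits only on whitespace; it is an illustration, not the NLPsentences implementation.

```smalltalk
| abbreviations sentences current |
"Hypothetical list of tokens whose trailing period is not a sentence end."
abbreviations := Set withAll: #('Mr.' 'Mrs.' 'Dr.').
sentences := OrderedCollection new.
current := OrderedCollection new.

('Today Mr. Jones went to town. He bought gas.' substrings: ' ') do: [ :token |
    ((token endsWith: '.') and: [ (abbreviations includes: token) not ])
        ifTrue: [
            "Split off the final period as its own token and close the sentence."
            current add: token allButLast; add: '.'.
            sentences add: current.
            current := OrderedCollection new ]
        ifFalse: [ current add: token ] ].
current isEmpty ifFalse: [ sentences add: current ].
sentences
```
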
Summarization

No additional data needs to be loaded for summarization, but all other data should be loaded as per the above directions. Here is a short example:

NLPsummarizer summarize: 'The administration and House Republicans have asked a federal appeals court for a 90-day extension in a case that involves federal payments to reduce deductibles and copayments for people with modest incomes who buy their own policies. The fate of $7 billion in "cost-sharing subsidies" remains under a cloud as insurers finalize their premium requests for next year. Experts say premiums could jump about 20 percent without the funding. In requesting the extension, lawyers for the Trump administration and the House said the parties are continuing to work on measures, including potential legislative action, to resolve the issue. Requests for extensions are usually granted routinely.'

--> #('The administration and House Republicans have asked a federal appeals court for a 90-day extension in a case that involves federal payments to reduce deductibles and copayments for people with modest incomes who buy their own policies .' 'The fate of $ 7 billion in "cost-sharing subsidies" remains under a cloud as insurers finalize their premium requests for next year .' 'In requesting the extension , lawyers for the Trump administration and the House said the parties are continuing to work on measures , including potential legislative action , to resolve the issue .')

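Since the summarizer depends on the other data sets, it can be convenient to run all of the one-time initializations from the sections above in a single Workspace evaluation. The sketch below only combines calls already shown in this README, then prints each summary sentence on its own line, assuming (as in the output above) that summarize: answers a collection of strings.

```smalltalk
"One-time setup: load every data set described in the sections above."
NLPtagger initializeLexicon.
NLPcategories initializeCategoryHash.
NLPentities initializeEntities.
NLPsentences loadData.

"Print each sentence of a summary on its own Transcript line."
(NLPsummarizer summarize: 'Experts say premiums could jump about 20 percent without the funding. Requests for extensions are usually granted routinely.')
    do: [ :sentence | Transcript show: sentence; cr ].
```
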
Limitations

Does not currently handle special characters like: —
Categorization and summarization should also use "bag of ngrams" in addition to "bag of words" (BOW)
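
As a rough illustration of what a "bag of ngrams" would add, this sketch builds the 2-grams (adjacent word pairs) of a phrase and counts them in a Bag. It uses only standard collection classes and is not part of the library.

```smalltalk
| tokens bigrams bag |
tokens := 'taxes are too high' substrings: ' '.

"Join each word with its right-hand neighbor."
bigrams := (1 to: tokens size - 1) collect: [ :i |
    (tokens at: i), ' ', (tokens at: i + 1) ].

"A Bag keeps an occurrence count per distinct 2-gram."
bag := Bag withAll: bigrams.
bag occurrencesOf: 'too high'
```

A 2-gram such as 'too high' carries information that the separate words 'too' and 'high' do not.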

