Skip to content

Narsese similarity of unstructured data

PtrMan edited this page Mar 15, 2019 · 2 revisions

NARS is able to compute the similarity between arbitrary unstructured information.

<(*,this, _d0) --> wordin>.
<(*,duck, _d0) --> wordin>.
<(*,is, _d0) --> wordin>.
<(*,green, _d0) --> wordin>.

<(*,green, _d1) --> wordin>.
<(*,ants, _d1) --> wordin>.
<(*,may, _d1) --> wordin>.
<(*,be, _d1) --> wordin>.
<(*,ducks, _d1) --> wordin>.

<duck <-> ducks>.

<_d0<->_d1>?"

A script to generate the narsese from text may look like this:

# print the narsese for the tokens in a article or section
def printTokensN(text, datasetNumber):
	text = text.lower()

	# replace special letters with spaces
	for i in[".",",",";","(",")","[","]"]:
	    text=text.replace(i, " ")

	tokens = text.split(" ")
	for iToken in tokens:
	    if not iToken in ["", ".", ";"]:
	        print("<(*," + iToken + ", _d" + str(datasetNumber) + ") --> wordin>.")




text = "A neural network is a network or circuit of neurons, or in a modern sense, an artificial neural network, composed of artificial neurons or nodes.[1] Thus a neural network is either a biological neural network, made up of real biological neurons, or an artificial neural network, for solving artificial intelligence (AI) problems."
datasetNumber = 0
printTokensN(text, datasetNumber)



text = "Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model of sample data, known as \"training data\", "
datasetNumber = 1
printTokensN(text, datasetNumber)


text = "Machine learning has gone from the the realm of a relatively small number of data scientists to the mainstream of analysis and business. And while this has resulted in a plethora of innovations and improvements for organizations worldwide, its also provoked reactions ranging from curiosity to anxiety among people everywhere."
datasetNumber = 2
printTokensN(text, datasetNumber)


# ask questions to know how similar the text pieces are

print("<_d0<->_d1>?")
print("<_d1<->_d2>?")

Additionally n-grams may get used to increase the accuracy.

visit #nars (more active) and ##nars on FreeNode

Clone this wiki locally