Systematically comparing the semantic proximity of words is a very useful operation when developing semantic systems.

Indra facilitates this process by providing a set of pre-built distributional semantic models (DSMs) and a simple API for experimenting with them.

With Indra you can experiment with different DSMs (Word2Vec, GloVe, LSA, ESA) built from different corpora and in different languages (12 languages; please check the documentation).

In the first example we compute the semantic relatedness between the word pairs 'wife'/'spouse' and 'wife'/'car', using the Word2Vec (W2V) model built from the Wikipedia 2014 corpus and the COSINE similarity measure.

In [2]:
import http.client

conn = http.client.HTTPConnection("alphard.fim.uni-passau.de:8916")

payload = '''{
    "corpus": "wiki-2014",
    "model": "W2V",
    "language": "EN",
    "scoreFunction": "COSINE",
    "pairs": [{
        "t2": "wife",
        "t1": "spouse"
    }, {
        "t2": "wife",
        "t1": "car"
    }]
}
'''

headers = {
    'accept': "application/json",
    'content-type': "application/json",
    'authorization':  "Basic aW5kcmE6UVk4SDVkcm9ZODQ9",
    'cache-control': "no-cache"
}

conn.request("POST", "/indra/v1/relatedness", payload, headers)
res = conn.getresponse()
data = res.read()

print(data.decode("utf-8"))

{"corpus":"wiki-2014","model":"W2V","language":"EN","pairs":[{"t1":"car","t2":"wife","score":-0.02258930864194628},{"t1":"spouse","t2":"wife","score":0.5078599088550189}],"scoreFunction":"COSINE"}


Now an example over a Portuguese DSM, computing the semantic relatedness between the corresponding Portuguese word pairs ('mulher'/'esposa' and 'mulher'/'carro').

In [2]:
import http.client

conn = http.client.HTTPConnection("indra.amtera.net:80")

payload = '''{
    "corpus": "wiki-2014",
    "model": "W2V",
    "language": "PT",
    "scoreFunction": "COSINE",
    "pairs": [{
        "t2": "mulher",
        "t1": "esposa"
    }, {
        "t2": "mulher",
        "t1": "carro"
    }]
}
'''

headers = {
    'accept': "application/json",
    'content-type': "application/json",
    'authorization':  "Basic aW5kcmE6UVk4SDVkcm9ZODQ9",
    'cache-control': "no-cache"
}

conn.request("POST", "/indra/v1/relatedness", payload, headers)
res = conn.getresponse()
data = res.read()

print(data.decode("utf-8"))

{"corpus":"wiki-2014","model":"W2V","language":"PT","pairs":[{"t1":"carro","t2":"mulher","score":-0.047018436961208644},{"t1":"esposa","t2":"mulher","score":0.5239692741368474}],"scoreFunction":"COSINE"}


In [5]:
import http.client

conn = http.client.HTTPConnection("indra.amtera.net:80")

payload = '''{
    "corpus": "wiki-2014",
    "model": "LSA",
    "language": "FR",
    "scoreFunction": "EUCLIDEAN",
    "pairs": [{
        "t2": "mere",
        "t1": "pere"
    }, {
        "t2": "mere",
        "t1": "voiture"
    }]
}
'''

headers = {
    'accept': "application/json",
    'content-type': "application/json",
    'authorization':  "Basic aW5kcmE6UVk4SDVkcm9ZODQ9",
    'cache-control': "no-cache"
}

conn.request("POST", "/indra/v1/relatedness", payload, headers)
res = conn.getresponse()
data = res.read()

print(data.decode("utf-8"))

{"corpus":"wiki-2014","model":"LSA","language":"FR","pairs":[{"t1":"voiture","t2":"mere","score":0.010360440900461221},{"t1":"pere","t2":"mere","score":0.0140639897264022}],"scoreFunction":"EUCLIDEAN"}


The example above performs the same kind of query over a French DSM, using another model (Latent Semantic Analysis, LSA) and another distance measure (Euclidean).
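
To make the difference between the two score functions used in this notebook concrete, here is a small sketch of the textbook definitions of cosine similarity and Euclidean distance on made-up toy vectors. The vectors are invented for illustration and are not taken from any Indra model. The source does not state how Indra turns a Euclidean distance into the returned score, so this sketch illustrates the measures themselves, not Indra's internals.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: depends on direction, not length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between u and v: depends on length as well."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Made-up toy vectors (assumption: for illustration only).
a = [1.0, 2.0, 0.0]
b = [2.0, 4.0, 0.0]  # same direction as a, twice the length
c = [0.0, 0.0, 3.0]  # orthogonal to a

print("cosine(a, b)    =", cosine_similarity(a, b))
print("cosine(a, c)    =", cosine_similarity(a, c))
print("euclidean(a, b) =", euclidean_distance(a, b))
print("euclidean(a, c) =", euclidean_distance(a, c))
```

Cosine treats `a` and `b` as maximally similar because they point in the same direction, while their Euclidean distance is non-zero because their lengths differ. Scores from the two measures are therefore not directly comparable across requests.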