Sketch Engine API for the OntoLex FrAC Module
===
OntoLex Module for Frequency, Attestation and Corpus Information (FrAC)

Ranka Stanković, University of Belgrade; 25.1.2021

Detailed documentation for the Sketch Engine API is available at: https://www.sketchengine.eu/documentation/api-documentation/  
See the Fair Use Policy (FUP) at https://www.sketchengine.eu/fair-use-policy/  
According to the FUP, how long to wait between consecutive requests depends on how many requests you send: for fewer than 100 requests, you may not need to wait at all; for 100–900 requests, a 4-second wait is recommended; for more than 900 requests, it is best to wait at least 45 seconds.
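The waiting rule above can be applied mechanically. The following is a minimal sketch, not part of the Sketch Engine API: `recommended_wait` and `run_throttled` are hypothetical helper names, the thresholds are the ones quoted above from the FUP, and each request is assumed to be passed in as a zero-argument callable (for example a lambda wrapping one `requests.get` call from the cells below).

```python
import time
from typing import Callable, List, Sequence


def recommended_wait(total_requests: int) -> float:
    """Seconds to pause between requests, using the FUP thresholds quoted above."""
    if total_requests < 100:
        return 0.0
    if total_requests <= 900:
        return 4.0
    return 45.0


def run_throttled(calls: Sequence[Callable[[], object]],
                  sleep: Callable[[float], None] = time.sleep) -> List[object]:
    """Run the calls in order, pausing between consecutive calls (not before the first)."""
    pause = recommended_wait(len(calls))  # the total number of requests decides the pause
    results = []
    for i, call in enumerate(calls):
        if i > 0 and pause > 0:
            sleep(pause)
        results.append(call())
    return results
```

The `sleep` argument defaults to `time.sleep`; it is a parameter only so the pacing can be checked without actually waiting.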

For the methods and attributes of the Sketch Engine API, see https://www.sketchengine.eu/documentation/methods-documentation/
    

In [2]:
import requests

# insert your user name for https://auth.sketchengine.eu
USERNAME = ''
# to get an API key: when logged in, click the three-dot icon at the top-right corner of the screen and select My account,
# then click the Generate new API key button (the API key is a long string of letters and numbers)
API_KEY = ''
base_url = 'https://api.sketchengine.eu/bonito/run.cgi'


Reading from the preloaded British National Corpus (BNC)
---
For the association measures used, see https://www.sketchengine.eu/documentation/statistics-used-in-sketch-engine/ or
https://www.sketchengine.eu/wp-content/uploads/ske-statistics.pdf

Main parameters of the thes method (lemma and lpos are also used by wsketch):

Parameter | Type | Default | Description
--------- | ---- | ------- | -----------
lemma | string | | lemma to look up (required)
lpos | string | | part of speech, in notation ‘-n’, ‘-v’, …; the exact notation depends on the corpus
maxthesitems | integer | 60 | maximum number of items
clusteritems | integer (boolean) | 0 | cluster the items; in wsketch the corresponding parameter is clustercolls (cluster collocations)
minsim | | | minimum similarity between clustered items; relevant only when clustering is enabled (clusteritems, or clustercolls in wsketch, set to 1)
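The table maps directly onto the dictionary passed as `params` in the cells below. A small sketch of that mapping follows; `thes_params` is a hypothetical helper, the defaults are those in the table, and `minsim` is added only when clustering is switched on, following the table's note. The value 0.2 in the example is illustrative only, not a recommended setting.

```python
from typing import Dict, Optional


def thes_params(corpname: str, lemma: str, lpos: str = "",
                maxthesitems: int = 60, clusteritems: bool = False,
                minsim: Optional[float] = None) -> Dict[str, object]:
    """Build the query parameters for a 'thes' request from the table above."""
    params: Dict[str, object] = {
        "corpname": corpname,
        "format": "json",
        "lemma": lemma,                     # required
        "maxthesitems": maxthesitems,
        "clusteritems": int(clusteritems),  # sent as 0/1
    }
    if lpos:
        params["lpos"] = lpos               # notation such as '-n' depends on the corpus
    if clusteritems and minsim is not None:
        params["minsim"] = minsim           # only meaningful when clustering is on
    return params


print(thes_params("preloaded/bnc2", "risk", lpos="-n"))
print(thes_params("preloaded/bnc2", "risk", clusteritems=True, minsim=0.2))
```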

In [8]:
# Thesaurus ('thes') request: words similar to the lemma, with similarity score and frequency
page = 'thes'
databnc = {
 'corpname': 'preloaded/bnc2',
 'format': 'json',
 'lemma': 'risk',
 'lpos': '-n',
 'tab': 'basic'
}
url = base_url + '/%s?corpname=%s' % (page, databnc['corpname'])

# send the request with HTTP basic authentication (user name + API key) and parse the JSON response
d = requests.get(url, params=databnc, auth=(USERNAME, API_KEY)).json()
d['Words']



[{'freq': 7440, 'score': 0.3, 'word': 'danger', 'id': 3209},
 {'freq': 9403, 'score': 0.271, 'word': 'possibility', 'id': 469},
 {'freq': 12998, 'score': 0.262, 'word': 'difficulty', 'id': 6671},
 {'freq': 26757, 'score': 0.257, 'word': 'cost', 'id': 4093},
 {'freq': 7763, 'score': 0.242, 'word': 'consequence', 'id': 3382},
 {'freq': 15115, 'score': 0.238, 'word': 'benefit', 'id': 8285},
 {'freq': 24807, 'score': 0.231, 'word': 'value', 'id': 4141},
 {'freq': 6910, 'score': 0.228, 'word': 'threat', 'id': 7728},
 {'freq': 7619, 'score': 0.228, 'word': 'impact', 'id': 6192},
 {'freq': 15289, 'score': 0.223, 'word': 'loss', 'id': 5108},
 {'freq': 30204, 'score': 0.222, 'word': 'rate', 'id': 3128},
 {'freq': 55745, 'score': 0.222, 'word': 'problem', 'id': 2482},
 {'freq': 26921, 'score': 0.218, 'word': 'need', 'id': 2413},
 {'freq': 33315, 'score': 0.218, 'word': 'effect', 'id': 807},
 {'freq': 7024, 'score': 0.212, 'word': 'damage', 'id': 7691},
 {'freq': 16956, 'score': 0.212, 'word': 'a
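The thesaurus entries come back as a list of dictionaries under `'Words'`. The sketch below ranks and filters them by similarity score; `ThesItem` and `similar_words` are hypothetical names, and it assumes each entry has the `word`, `score` and `freq` keys seen in the output above. The sample reuses the first four BNC entries from that output.

```python
from typing import Dict, List, NamedTuple


class ThesItem(NamedTuple):
    word: str
    score: float
    freq: int


def similar_words(response: Dict, min_score: float = 0.0) -> List[ThesItem]:
    """Extract (word, score, freq) from a 'thes' JSON response, best score first."""
    items = [ThesItem(w["word"], w["score"], w["freq"]) for w in response.get("Words", [])]
    kept = [it for it in items if it.score >= min_score]
    return sorted(kept, key=lambda it: it.score, reverse=True)


# first four entries of the BNC output above
thes_sample = {"Words": [
    {"freq": 7440, "score": 0.3, "word": "danger", "id": 3209},
    {"freq": 9403, "score": 0.271, "word": "possibility", "id": 469},
    {"freq": 12998, "score": 0.262, "word": "difficulty", "id": 6671},
    {"freq": 26757, "score": 0.257, "word": "cost", "id": 4093},
]}

for item in similar_words(thes_sample, min_score=0.26):
    print(item.word, item.score, item.freq)
```

With `min_score=0.26`, "cost" (0.257) is dropped and the other three are printed in score order.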

In [17]:
# Word Sketch ('wsketch') page: collocations and word combinations, reusing the databnc parameters
page = 'wsketch'

url = base_url + '/%s?corpname=%s' % (page, databnc['corpname'])

d = requests.get(url, params=databnc, auth=(USERNAME, API_KEY)).json()

print("There are %d grammar relations for %s%s (lemma+PoS) in corpus %s." % (
    len(d['Gramrels']), databnc['lemma'], databnc['lpos'], databnc['corpname']))

for relation in d['Gramrels']:
    # '%w' in a relation name stands for the lemma
    print(relation["name"].replace('%w', databnc["lemma"]))
    for word in relation["Words"]:
        if "name" in word:
            # some items (e.g. usage patterns) have their own name, possibly with a '%w' placeholder
            text = word["name"].replace('%w', word["headword"])
        else:
            # collocation items: 'cm' holds the word combination shown in the output (it can be empty)
            text = word["cm"]
        print(text, word["score"])
    print("\n")

There are 30 grammar relations for risk (lemma+PoS) in corpus preloaded/bnc2.
usage patterns
Sfin 4.43
VPing 1.36
poss 1.3
Swh 0.73
VPto 0.44
Sing 0.14
it+ 0.09


modifier
an increased risk of 9.41
of subjective risk 9.32
the relative risk 9.11
a grave risk that 7.94
greater risk of 7.81
potential risks 7.59
a calculated risk 7.45
serious risk of 7.31
high risk of 7.23
the greatest risk 7.04
cancer risk 7.0
a higher risk of 6.95
measure of objective risk 6.93
unnecessary risks 6.81
low risk 6.76
a real risk of 6.73
default risk 6.64
the credit risk 6.6
of exemplar risk 6.52
non-market risk 6.52
systematic risk 6.51
an unacceptable risk 6.49
estimated risk 6.44
health risks 6.42
excess risk 6.41


object_of
to minimise the risk of 9.17
reduce the risk of 8.35
 8.05
to minimize the risk 8.01
run the risk of 7.75
increase the risk of 7.43
to avoid the risk of 7.27
eliminate the risk 7.17
risks associated with 7.06
 7.04
insured risks 6.94
lessen the risk of 6.93
 6.81
the risks involved 6
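In the output above, some object_of lines show only a score, presumably because `'cm'` is empty for those items. The sketch below collects a word sketch into a dictionary mapping each relation name to a list of (text, score) pairs, and falls back to the item's `'word'` value when `'cm'` is missing or empty. `word_sketch_table` is a hypothetical helper, the `'word'` key is an assumption about the wsketch JSON (it is not visible in the output above), and the sample response is hypothetical, only showing the shape.

```python
from typing import Dict, List, Tuple


def word_sketch_table(response: Dict, lemma: str) -> Dict[str, List[Tuple[str, float]]]:
    """Group a 'wsketch' JSON response by grammatical relation.

    Mirrors the printing loop above, but falls back to the item's 'word'
    value (an assumed key) when 'cm' is missing or empty.
    """
    table: Dict[str, List[Tuple[str, float]]] = {}
    for relation in response.get("Gramrels", []):
        rel_name = relation["name"].replace("%w", lemma)
        rows = []
        for item in relation.get("Words", []):
            if "name" in item:
                text = item["name"].replace("%w", item.get("headword", lemma))
            else:
                text = item.get("cm") or item.get("word", "")
            rows.append((text, item["score"]))
        table[rel_name] = rows
    return table


# hypothetical response shape; the second item mimics an empty 'cm'
sketch_sample = {"Gramrels": [{"name": "object_of", "Words": [
    {"cm": "reduce the risk of", "word": "reduce", "score": 8.35},
    {"cm": "", "word": "collocate", "score": 8.05},
]}]}

for rel, rows in word_sketch_table(sketch_sample, "risk").items():
    print(rel)
    for text, score in rows:
        print(" ", text, score)
```

Returning a dictionary instead of printing also removes the need to repeat the same loop for the user corpus further below.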

In [26]:
# Concordance: the 'first' method with a simple query ('iquery')
page = 'first'
databnc = {
 'corpname': 'preloaded/bnc2',
 'format': 'json',
 'iquery': 'risk',
 'tab': 'basic',
}
url = base_url + '/%s?corpname=%s' % (page, databnc['corpname'])

d = requests.get(url, params=databnc, auth=(USERNAME, API_KEY)).json()

# print each line as: left context [[ keyword ]] right context
# (only the first segment of each part is used)
for line in d['Lines']:
    left = line["Left"][0]["str"]
    right = line["Right"][0]["str"]
    mid = line["Kwic"][0]["str"]
    print(left + " [[" + mid + " ]] " + right)

Health Officers have said that the [[ risks ]]  to health are" staggering" and have urged a
's shell industry puts hawksbill turtle at [[ risk ]] <p>
floods the reactor core, thereby avoiding the [[ risk ]]  of a partial" meltdown", similar to that which
children and old people. * Increased cancer [[ risk ]]  from high concentrations of benzene and
the Gulf as resting places, would shortly be at [[ risk ]]  as they arrived en route from East Africa to
all suffered from the oil spill, presenting a" [[ risk ]]  that they may not respond to conventional
Pollution: Water Leningrad's water at [[ risk ]] <p>
suggested that some unacknowledged health [[ risk ]] , such as an outbreak of the waterborne parasite
the wealthiest EC countries, Britain as a whole [[ risks ]]  becoming a backward `region' of the Community.
the changes taking place, without ignoring the [[ risks ]]  involved.
, accused the government of playing down the [[ risks ]]  so as not to undermine its £4 billion Mersey
, which

In [23]:
# inspect the structure of the retrieved concordance lines
d['Lines']


[{'Right': [{'class': '',
    'str': ' to health are" staggering" and have urged a'}],
  'hitlen': 1,
  'Tbl_refs': ['Applied science'],
  'linegroup_id': 0,
  'Links': [],
  'Kwic': [{'class': 'col0 coll', 'str': ' risks'}],
  'Refs': ['Applied science'],
  'toknum': 73399,
  'linegroup': '_',
  'Left': [{'class': '', 'str': 'Health Officers have said that the'}]},
 {'Right': [{'class': 'strc', 'str': '<p>'},
   {'class': '', 'str': 'Japan is jeopardising the existence of the'}],
  'hitlen': 1,
  'Tbl_refs': ['Applied science'],
  'linegroup_id': 0,
  'Links': [],
  'Kwic': [{'class': 'col0 coll', 'str': ' risk'}],
  'Refs': ['Applied science'],
  'toknum': 77562,
  'linegroup': '_',
  'Left': [{'class': '',
    'str': "'s shell industry puts hawksbill turtle at"}]},
 {'Right': [{'class': '',
    'str': ' of a partial" meltdown", similar to that which'}],
  'hitlen': 1,
  'Tbl_refs': ['Applied science'],
  'linegroup_id': 0,
  'Links': [],
  'Kwic': [{'class': 'col0 coll', 'str': ' ri
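The concordance loop prints only the first segment of each part, so for the second line everything after the `<p>` structure tag in the right context is lost (in the JSON above, its `'Right'` holds two segments). A sketch that joins all segments instead, and also copes with an empty context list, which `line["Left"][0]` would not; `join_segments` and `kwic_line` are hypothetical helpers, and the sample is the second line of the JSON above.

```python
from typing import Dict, List


def join_segments(segments: List[Dict[str, str]]) -> str:
    """Concatenate the 'str' values of all segments of one context."""
    return "".join(seg.get("str", "") for seg in segments)


def kwic_line(line: Dict) -> str:
    """Format one concordance line as: left [[ keyword ]] right."""
    left = join_segments(line.get("Left", []))
    mid = join_segments(line.get("Kwic", []))
    right = join_segments(line.get("Right", []))
    return left + " [[" + mid + " ]] " + right


# second line of the JSON output above
sample_line = {
    "Left": [{"class": "", "str": "'s shell industry puts hawksbill turtle at"}],
    "Kwic": [{"class": "col0 coll", "str": " risk"}],
    "Right": [{"class": "strc", "str": "<p>"},
              {"class": "", "str": "Japan is jeopardising the existence of the"}],
}
print(kwic_line(sample_line))
```

In the concordance cell, `print(kwic_line(line))` could replace the three `[0]` lookups.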

Reading from a corpus created by a user
---
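In the cells below, the user corpus is addressed by a corpname in which "/" appears percent-encoded as %2F. A minimal illustration with the standard `urllib.parse` module of how the encoded string relates to the plain name `user/AleksandraTomasevic/rudkor` it decodes to; it only shows the encoding, not how Sketch Engine interprets either form.

```python
from urllib.parse import quote, unquote

encoded = "user%2FAleksandraTomasevic%2Frudkor"  # corpname used in the cells below
plain = unquote(encoded)
print(plain)                  # user/AleksandraTomasevic/rudkor
print(quote(plain, safe=""))  # user%2FAleksandraTomasevic%2Frudkor
```

`safe=""` is needed because `quote` leaves "/" unencoded by default.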

In [14]:
# Thesaurus ('thes') page for a user corpus
page = 'thes'  
datauser = {
 'corpname': 'user%2FAleksandraTomasevic%2Frudkor',
 'format': 'json',
 'lemma': 'rizik',
 'tab': 'basic',    
 'showScores' :'1',
 'showresults':'1',
}
url = base_url + '/%s?corpname=%s' % (page, datauser['corpname'])

d = requests.get(url, params=datauser, auth=(USERNAME, API_KEY)).json()
d['Words']


[{'freq': 1177, 'score': 0.312, 'word': 'opasnost', 'id': 3647},
 {'freq': 2512, 'score': 0.307, 'word': 'riziko', 'id': 107},
 {'freq': 1173, 'score': 0.171, 'word': 'posledica', 'id': 1237},
 {'freq': 2387, 'score': 0.166, 'word': 'problem', 'id': 1574},
 {'freq': 2222, 'score': 0.166, 'word': 'trošak', 'id': 465},
 {'freq': 1972, 'score': 0.154, 'word': 'faktor', 'id': 1385},
 {'freq': 1976, 'score': 0.148, 'word': 'aktivnost', 'id': 51},
 {'freq': 1177, 'score': 0.135, 'word': 'resurs', 'id': 6448},
 {'freq': 1486, 'score': 0.135, 'word': 'mogućnost', 'id': 1399},
 {'freq': 664, 'score': 0.133, 'word': 'ograničenje', 'id': 1102},
 {'freq': 3046, 'score': 0.132, 'word': 'cilj', 'id': 48},
 {'freq': 1299, 'score': 0.129, 'word': 'odluka', 'id': 570},
 {'freq': 672, 'score': 0.128, 'word': 'situacija', 'id': 747},
 {'freq': 293, 'score': 0.123, 'word': 'neizvesnost', 'id': 6709},
 {'freq': 1285, 'score': 0.122, 'word': 'kvalitet', 'id': 2878},
 {'freq': 3741, 'score': 0.122, 'word': '

In [13]:
# Word Sketch ('wsketch') page for the user corpus (lpos left empty)
page = 'wsketch'
datauser = {
 'corpname': 'user%2FAleksandraTomasevic%2Frudkor',
 'format': 'json',
 'lemma': 'rizik',
 'lpos': '',
 'tab': 'basic',
}

url = base_url + '/%s?corpname=%s' % (page, datauser['corpname'])

d = requests.get(url, params=datauser, auth=(USERNAME, API_KEY)).json()

print("There are %d grammar relations for %s%s (lemma+PoS) in corpus %s." % (
    len(d['Gramrels']), datauser['lemma'], datauser['lpos'], datauser['corpname']))

for relation in d['Gramrels']:
    # '%w' in a relation name stands for the lemma
    print(relation["name"].replace('%w', datauser["lemma"]))
    for word in relation["Words"]:
        if "name" in word:
            # some items (e.g. usage patterns) have their own name, possibly with a '%w' placeholder
            text = word["name"].replace('%w', word["headword"])
        else:
            # collocation items: 'cm' holds the word combination shown in the output (it can be empty)
            text = word["cm"]
        print(text, word["score"])
    print("\n")

There are 12 grammar relations for rizik (lemma+PoS) in corpus user%2FAleksandraTomasevic%2Frudkor.
verb prepositional phrases
"rizik" od ... 14.65
... od "rizik" 14.65
"rizik" sa ... 11.62
... sa "rizik" 11.62
"rizik" u ... 4.97
... u "rizik" 4.97
"rizik" po ... 4.55
... po "rizik" 4.55
"rizik" na ... 3.87
... na "rizik" 3.87
"rizik" za ... 2.1
... za "rizik" 2.1
"rizik" prema ... 0.76
... prema "rizik" 0.76
"rizik" o ... 0.67
... o "rizik" 0.67
"rizik" kroz ... 0.51
... kroz "rizik" 0.51
"rizik" tokom ... 0.25
... tokom "rizik" 0.25
"rizik" nad ... 0.25
... nad "rizik" 0.25


modifiers of "rizik"
radnom mestu sa povećanim rizikom 12.49
potencijalni rizik 9.68
politički rizik 9.06
identifikovani i rangirani rizici 8.19
procenjeni rizici 8.01
neprihvatljiv Uopšteno • visoki rizik 7.97
su zaposleni posebno izloženi rizicima u slučaju nestanka 7.92
neophodna zbog upravljanja OHS rizicima . To uključuje 7.86
kojima se pojavljuje specifičan rizik od nastanka povreda 7.8
zaposlenih rizicima