This notebook reviews all of the modules that make up the variable search and exploration components.

## Table of contents:
1. [wiktiwordnetapi.py](#sec-wwn)
2. [wikipediaapi.py](#sec-wapi)
3. [svoapi.py](#sec-svoapi)

## 1. wiktiwordnetapi.py <a class="anchor" id="sec-wwn"></a>

This module loads the generated WiktiWordNet data and has two interaction functions:
 - check_domain(term) - checks if the selected term refers to a domain
 - get_category(term) - returns a dictionary of {category:definition} pairs for the selected term
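
As a rough illustration of the return shapes described above, the sketch below uses a hypothetical in-memory stand-in (`FakeWiktiWordNet`, with placeholder definitions) instead of the real data. It assumes that `check_domain` returns an `[is_domain, definition]` pair and that a term counts as a domain when it has a `Domain` category; the second assumption in particular is a guess and is not confirmed by the module.

```python
class FakeWiktiWordNet:
    """Hypothetical stand-in for the WiktiWordNet object (not the real class)."""

    def __init__(self, data):
        # data maps term -> {category: definition}
        self._data = data

    def get_category(self, term):
        # Return a copy of the {category: definition} pairs; empty if the term is unknown.
        return dict(self._data.get(term, {}))

    def check_domain(self, term):
        # Assumption: a term is a domain when it has a 'Domain' category.
        definition = self._data.get(term, {}).get('Domain', '')
        return [definition != '', definition]


# Placeholder entries; the definitions are invented for illustration only.
fake_wwn = FakeWiktiWordNet({
    'astronomy': {'Domain': 'placeholder definition'},
    'dog': {'Body': 'placeholder definition', 'Role': 'placeholder definition'},
})

is_domain, definition = fake_wwn.check_domain('astronomy')
print(is_domain, sorted(fake_wwn.get_category('dog')))
```

An unknown term yields an empty dictionary from `get_category` and `[False, '']` from `check_domain` in this stand-in.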

The wiktiwordnetapi module can be imported with the following command:

In [1]:
import wiktiwordnetapi as wwnapi

Next, instantiate a WiktiWordNet object with the following command:

In [2]:
wwn = wwnapi.wiktiwordnet()

Test the functionality of the two methods available:

In [3]:
def print_domain_status(term):
    [is_domain, definition] = wwn.check_domain(term)
    
    print('According to WiktiWordNet, {} is {}a domain.'\
          .format(term, '' if is_domain else 'NOT '))
    
for term in ['dogs', 'astronomy', 'geology', 'astrology']:
    print_domain_status(term)


According to WiktiWordNet, dogs is NOT a domain.
According to WiktiWordNet, astronomy is a domain.
According to WiktiWordNet, geology is a domain.
According to WiktiWordNet, astrology is NOT a domain.


In [4]:
def print_term_categories(term):
    # get_category returns a dictionary of {category: definition} pairs
    categories = wwn.get_category(term)

    num_categories = len(categories)
    if num_categories == 0:
        print('WiktiWordnet does not have any categories for the term {}.'\
              .format(term))
    else:
        print('Found the following {} for {}:'\
              .format('categories' if num_categories > 1 else 'category', term))
        print(', '.join(categories))

for term in ['dog', 'astronomy', 'butter']:
    print_term_categories(term)

Found the following categories for dog:
Body, Role
Found the following category for astronomy:
Domain
WiktiWordnet does not have any categories for the term butter.


## 2. wikipediaapi.py <a class="anchor" id="sec-wapi"></a>

The functions contained in this module can be used to interact with Wikipedia. They can
- perform a search and return the top (most relevant) result according to Wikipedia's search ranking
- get the "bulk" text from a Wikipedia page (discarding panel information)

The main function to be used from this module is:
- get_wikipedia_text(term) : returns the text and metadata of the most closely related Wikipedia page

There are also two helper functions:
- get_top_wikipedia_entry(term) : returns metadata about the most closely related Wikipedia page
- parse_wikipedia_page(pageid) : returns the text of the Wikipedia page and whether it is a disambiguation page
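
For orientation, the metadata keys used in this section (`title`, `pageid`, `redirecttitle`, `sectiontitle`) match fields of the MediaWiki search API. The sketch below shows one way a top-entry lookup could be structured; it is an assumption about the approach, not the module's actual code. `build_search_params` and `pick_top_entry` are hypothetical names, and the sample response is hand-written so that the example runs without network access.

```python
def build_search_params(term):
    """Query parameters for a MediaWiki full-text search limited to one hit."""
    return {
        'action': 'query',
        'list': 'search',
        'srsearch': term,
        'srlimit': 1,
        'srprop': 'redirecttitle|sectiontitle',
        'format': 'json',
    }


def pick_top_entry(response):
    """Return the first search hit as a dict, or {} when there are no hits."""
    hits = response.get('query', {}).get('search', [])
    return hits[0] if hits else {}


# Hand-written sample shaped like a MediaWiki search response.
sample = {'query': {'search': [
    {'ns': 0, 'title': 'Agriculture', 'pageid': 627,
     'redirecttitle': 'Crop production'},
]}}

print(build_search_params('crop production')['srsearch'])
print(pick_top_entry(sample).get('title'))
print(pick_top_entry({'query': {'search': []}}))
```

In practice these parameters would be sent to a wiki's `api.php` endpoint (for English Wikipedia, `https://en.wikipedia.org/w/api.php`), and the parsed JSON response passed to `pick_top_entry`.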

Load the module as follows:

In [1]:
import wikipediaapi as wapi

First, test the helper functions.

In [2]:
def get_wikipedia_page_info(term):
    results = wapi.get_top_wikipedia_entry(term)
    
    if results == {}:
        print('Did not find a relevant Wikipedia page for {}.'.format(term))
    else:
        print('Found the following Wikipedia page for {}:'.format(term))
        if 'title' in results.keys():
            print('Title: {}'.format(results['title']))
        if 'pageid' in results.keys():
            print('Page ID: {}'.format(results['pageid']))
        if 'redirecttitle' in results.keys():
            print('Redirect Title: {}'.format(results['redirecttitle']))
        if 'sectiontitle' in results.keys():
            print('Section Title: {}'.format(results['sectiontitle']))

for term in ['dog', 'crop production', 'conductivity', 'hafdkj']:
    get_wikipedia_page_info(term)

Found the following Wikipedia page for dog:
Title: Dog
Page ID: 4269567
Found the following Wikipedia page for crop production:
Title: Agriculture
Page ID: 627
Redirect Title: Crop production
Found the following Wikipedia page for conductivity:
Title: Conductivity
Page ID: 403990
Did not find a relevant Wikipedia page for hafdkj.


In [3]:
def get_wikipedia_text_info(pageid):
    [text, disambig] = wapi.parse_wikipedia_page(pageid)
    
    if disambig:
        print('Page with id {} is a disambiguation page.'.format(pageid))
    if text != []:
        print('Here are the first few lines of page id {}:'.format(pageid) )
        print(text[0])
        
for term in [4269567, 403990, 0]:
    get_wikipedia_text_info(term)

Here are the first few lines of page id 4269567:

Page with id 403990 is a disambiguation page.
Here are the first few lines of page id 403990:
Electrical conductivity
Here are the first few lines of page id 0:



Now, test the main function:

In [4]:
def get_wikipedia_text(term):
    [text, disambig, title, redirecttitle] = wapi.get_wikipedia_text(term)
    
    if title == '':
        print('No page found for term {}.'.format(term))
    else:
        print('Page Title for term {}: {}'.format(term, title))
    if redirecttitle != '':
        print('Redirect title for term {} page is: {}.'.format(term, redirecttitle))
    if disambig:
        print('Page for term {} is a disambiguation page.'.format(term))
    if text != []:
        print('Here is the first paragraph of the page for term {}:'.format(term) )
        print(text[0])

for term in ['dog', 'crop production', 'conductivity', 'hafdkj']:
    get_wikipedia_text(term)
    print('==================================')
    

Page Title for term dog: dog
Here is the first paragraph of the page for term dog:
Canis familiaris Linnaeus, 1758[2][3]
Page Title for term crop production: agriculture
Redirect title for term crop production page is: crop production.
Here is the first paragraph of the page for term crop production:
Agriculture is the science and art of cultivating plants and livestock.[1] Agriculture was the key development in the rise of sedentary human civilization, whereby farming of domesticated species created food surpluses that enabled people to live in cities. The history of agriculture began thousands of years ago. After gathering wild grains beginning at least 105,000 years ago, nascent farmers began to plant them around 11,500 years ago. Pigs, sheep and cattle were domesticated over 10,000 years ago. Plants were independently cultivated in at least 11 regions of the world. Industrial agriculture based on large-scale monoculture in the twentieth century came to dominate agricultural output,

## 3. svoapi.py <a class="anchor" id="sec-svoapi"></a>

This module queries the SVO SPARQL endpoint to search for terms.
 
The main function to be used from this module is:
- rank_search(terms) : returns a pandas DataFrame of directly labeled entities and linked entities related to the search term(s), along with a rank (from 0 to 1) for each match

There are three helper functions in the module:
- search(term) : returns a pandas DataFrame of directly labeled entities and linked entities related to the search term(s)
- search_entity_links(entities) : returns a pandas DataFrame containing the columns: term, entity, entitylabel, entityclass, linkedentity, linkedentitylabel, linkedentityclass
    - linkedentity (with its label and class) is one of the entities passed in
    - entity (with its label and class) is an entity linked to that linkedentity
- search_label(term) : returns a pandas DataFrame containing the columns: term, entity, entitylabel, entityclass
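
The `entity` values in these DataFrames are IRIs. In the results shown in this section, the last path segment has the form `class#name`, and the name may be percent-encoded. A small helper like the one below can make such values easier to read. `short_name` is hypothetical and not part of svoapi, and the IRI prefix in the example is a placeholder.

```python
from urllib.parse import unquote


def short_name(entity_iri):
    """Split an entity IRI into (class, percent-decoded name)."""
    last_segment = entity_iri.split('/')[-1]
    # If there is no '#', the whole segment is returned as the class
    # and the name is an empty string.
    entity_class, _, name = last_segment.partition('#')
    return entity_class, unquote(name)


# Placeholder prefix; only the final segment mirrors the notebook output.
iri = 'http://example.org/svo/variable#sea%40context%7Ein_%28water_eddy%29__viscosity'
print(short_name(iri))
```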

Load the module as follows:

In [1]:
import svoapi

Test the rank search functionality:

In [2]:
def rank_search_test(terms):
    print('Performing search for {}'.format(', '.join(terms)))
    results = svoapi.rank_search(terms)
    print("Here are the top ten search results overall:")
    for _,row in results.head(10).iterrows():
        print('\t{}\t{}'.format(row['entity'].split('/')[-1],row['rank']))
    print()
    print("Here are the top ten search results for variables:")
    for _,row in results.loc[results['entityclass']=='Variable'].head(10).iterrows():
        print('\t{}\t{}'.format(row['entity'].split('/')[-1],row['rank']))
    
search_terms = [['viscosity'], ['volume viscosity'], ['rainfall', 'precipitation']]
for term in search_terms:
    rank_search_test(term)
    print('++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')

Performing search for viscosity
Here are the top ten search results overall:
	property#viscosity	1.0
	property#viscosity_term	0.475
	property#dynamic_viscosity	0.46
	property#apparent_viscosity	0.455
	property#kinematic_viscosity	0.45
	property#power-law-fluid_viscosity	0.42
	property#log10_of_dynamic_viscosity	0.2633333333333333
	property#shear_dynamic_viscosity	0.2633333333333333
	property#volume_dynamic_viscosity	0.2583333333333333
	property#shear_kinematic_viscosity	0.2533333333333333

Here are the top ten search results for variables:
	variable#sea%40context%7Ein_%28water_eddy%29__viscosity	0.14500000000000002
	variable#air__shear_dynamic_viscosity	0.10500000000000001
	variable#air__volume_dynamic_viscosity	0.1
	variable#air__shear_kinematic_viscosity	0.09500000000000001
	variable#polymer__extensional_kinematic_viscosity	0.09500000000000001
	variable#water__shear_dynamic_viscosity	0.09500000000000001
	variable#air__volume_kinematic_viscosity	0.09000000000000001
	variable#water__vo