# Scikic API v0.2
 
Here are some examples of the API in action. Every call is a POST request, in case the data being sent is too large to fit in a GET request. The API does not use the other HTTP methods.
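Every example below sends the same kind of JSON body: 'version', 'data', 'apikey' and 'action'. The following is a minimal sketch of a wrapper that uses only the Python standard library. The helper names `make_payload` and `call_scikic` are hypothetical, and the URL and API key are placeholders.

```python
import json
import urllib.request


def make_payload(action, data, apikey, version=1):
    """Build the JSON body used by every call in this document (hypothetical helper)."""
    return {'version': version, 'data': data, 'apikey': apikey, 'action': action}


def call_scikic(apiurl, action, data, apikey):
    """POST the payload as JSON and decode the JSON reply (hypothetical helper)."""
    body = json.dumps(make_payload(action, data, apikey)).encode('utf-8')
    request = urllib.request.Request(
        apiurl,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    with urllib.request.urlopen(request) as response:
        # json.loads ignores the leading/trailing newlines seen in the example replies
        return json.loads(response.read().decode('utf-8'))
```

Some error replies are plain text rather than JSON, such as the one returned for the deprecated 'processanswer' action. For those, `json.loads` raises a `ValueError`.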

### Overview

The scikic API is an inference tool. It takes a set of question/answer items and queries a series of local and remote databases to generate conditional probability distributions over various features. The API is highly modular, and some modules don't use this probabilistic framework. For example, the music module simply contacts api.bandsintown.com to suggest local bands to go and see.

The conditional probabilities are combined in a Bayesian network built with the PyMC module. Each module can provide PyMC 'features', which create functions that output the relevant probability distributions.
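scikic's PyMC network is not reproduced here. The plain-Python sketch below is a much simpler illustration of what combining conditional probabilities over a categorical feature involves. It assumes that answers are conditionally independent given the feature (a naive Bayes assumption, not necessarily what scikic's network does). It multiplies a prior by one likelihood vector per answer and then renormalises. The function name and numbers are illustrative only.

```python
def combine(prior, likelihoods):
    """Posterior over a categorical feature, assuming answers are
    conditionally independent given the feature (naive Bayes).

    prior:       P(feature = k) for each category k
    likelihoods: one list per answer, giving P(answer | feature = k)
    """
    posterior = list(prior)
    for likelihood in likelihoods:
        posterior = [p * l for p, l in zip(posterior, likelihood)]
    total = sum(posterior)
    if total == 0:
        raise ValueError("the answers are incompatible under this model")
    return [p / total for p in posterior]


# Two categories, two answers, each favouring the first category.
print(combine([0.5, 0.5], [[0.8, 0.2], [0.6, 0.4]]))  # roughly [0.857, 0.143]
```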

### Question/Answer dictionaries

Questions and answers are organised as four-value items. This document calls them tuples, but they are passed as dictionaries. Each item contains:

- dataset: tells the system which class to instantiate, e.g. postal, census, movielens.
- dataitem: tells the class which aspect of the dataset the item concerns. For example, in the movielens dataset one could be interested in whether the user has seen a film, or in what rating they've given it.
- detail: often unused by the classes. It could hold, for example, the ID of the film we want to ask about.
- answer: the user's answer.
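An item is therefore just a dictionary with these keys. The sketch below builds one. `make_item` is a hypothetical convenience function, not part of the API. Items returned by the 'question' action have no 'answer' until you add the user's reply. As the geoloc example later shows, 'detail' may itself be a JSON-encoded string.

```python
import json


def make_item(dataset, dataitem, detail='', answer=None):
    """Build a question/answer dictionary (hypothetical helper).

    'answer' is left out until the user has replied.
    """
    item = {'dataset': dataset, 'dataitem': dataitem, 'detail': detail}
    if answer is not None:
        item['answer'] = answer
    return item


# A plain postcode question with its answer:
postcode = make_item('postal', 'postcode', answer='s63af')
# An unanswered question whose 'detail' is a JSON-encoded string:
near = make_item('geoloc', 'nearcity', json.dumps({'city': 'Sheffield', 'country': 'UK'}))
```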

### 1. Get a suggestion for a question to answer *[action: question]*

#### Parameters
Pass this call a dictionary in 'data' containing:
 - 'questions_asked'
 - 'unprocessed_questions'
 - 'facts'
 - 'target'

'questions_asked' includes all the questions we've asked, so that the same question isn't asked twice.
'facts' contains information found from earlier answers. It caches the results of calculations from earlier calls to the API.
'unprocessed_questions' is a list of question/answer items that we've asked before but whose results haven't yet been added to 'facts'.
'target' is currently unused. In future it will allow the question to be chosen so as to maximise the information about a particular feature.

#### Returns

This call returns a dictionary containing two items:

 - 'facts': a dictionary you can pass back in future calls, so that the method doesn't have to recalculate or regenerate earlier results.
 - 'question': a dictionary describing the next question, with the fields dataset, dataitem and detail.

The 'data' you pass in lists the previously asked (and answered) questions, which lets the method choose the best question to ask next.
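The reply is a JSON object, surrounded by newlines in the example output. The following is a minimal sketch of unpacking it, assuming the reply body is available as a string. `parse_question_reply` is a hypothetical name. The sample reply is copied from the example output for this action.

```python
import json


def parse_question_reply(raw):
    """Split a 'question' reply into the facts to keep and the next question."""
    reply = json.loads(raw)  # surrounding whitespace is allowed by json.loads
    return reply['facts'], reply['question']


raw = '\n{"facts": {"guess_loc": {}, "where": {}}, "question": {"detail": "{}", "dataitem": "country", "dataset": "geoloc"}}\n'
facts, question = parse_question_reply(raw)
print(question['dataset'], question['dataitem'])  # geoloc country
```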

#### Usage example

1. Initially you might call this method with all these fields empty. It will return an empty 'facts' dictionary and a question dictionary describing the first question to ask.
2. Once you have an answer from the user, call the method again. This time, put the question/answer item in both 'questions_asked' and 'unprocessed_questions'.
3. The method now returns a facts dictionary, possibly containing some results from processing the last answer, along with another question to ask.
4. When you call the method a third time (with the user's second answer), pass all the question/answer items asked so far in 'questions_asked', and only the latest one in 'unprocessed_questions'. Also pass the new facts dictionary, which now has some content.
5. The process continues in the same way. The facts dictionary and 'questions_asked' grow each time, while 'unprocessed_questions' holds just one item.
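The bookkeeping in these steps can be written as one small client-side function. `record_answer` is a hypothetical name. The sketch assumes, as described above, that the facts returned by the latest call already reflect every earlier answer.

```python
def record_answer(questions_asked, returned_facts, question, answer):
    """Return the (questions_asked, unprocessed_questions, facts) to send next.

    questions_asked: every item asked so far (not modified in place)
    returned_facts:  the 'facts' dictionary from the latest API reply
    question:        the question dictionary the user just answered
    """
    item = dict(question, answer=answer)
    # All asked items accumulate, but only the newest one is unprocessed,
    # because returned_facts already reflects the earlier answers.
    return questions_asked + [item], [item], returned_facts


asked, unprocessed, facts = [], [], {}
q1 = {'dataset': 'postal', 'dataitem': 'postcode', 'detail': ''}
asked, unprocessed, facts = record_answer(asked, {}, q1, 's63af')
q2 = {'dataset': 'music', 'dataitem': 'favourite_artist', 'detail': ''}
asked, unprocessed, facts = record_answer(asked, {'where': {}}, q2, 'Eels')
print(len(asked), len(unprocessed))  # 2 1
```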

To summarise:

Generating a question requires a 'data' dictionary of 'questions_asked', 'unprocessed_questions', 'facts' and 'target' (all optional). 'questions_asked' is a list of dictionaries describing previous questions, which you want to avoid asking again.
'unprocessed_questions' are questions you've already asked whose answers haven't yet been incorporated into the 'facts' dictionary.

```python
# Alternative endpoints:
# apiurl = 'http://scikic.org/api/api.cgi'
# apiurl = 'http://127.0.0.1/~lionfish/scikic/api.cgi'
# apiurl = 'http://52.18.184.63/~ubuntu/scikic/api.cgi'
apiurl = 'http://production-backend-lb-no-ssl-1389362950.eu-west-1.elb.amazonaws.com/~ubuntu/scikic/api.cgi'
```

```python
import requests

# We provide data about previous questions, etc. 'data' is a dictionary of:
#   'questions_asked': a list of dictionaries, one per question (and answer) asked so far.
#   'unprocessed_questions': asked questions whose answers aren't yet reflected in 'facts'.
#   'facts': if you've stored the facts dictionary from an earlier call, pass it back.
#            The API uses it to improve its choice of questions.
#   'target': the feature we want to know more about (e.g. 'age', 'gender', 'location'). NOT YET IMPLEMENTED.
# All of these are optional.
questions_asked = [{'dataset': 'postal', 'dataitem': 'postcode', 'detail': ''},
                   {'dataset': 'music', 'dataitem': 'favourite_artist', 'detail': ''}]
unprocessed_questions = questions_asked
facts = {}
data = {'unprocessed_questions': unprocessed_questions, 'questions_asked': questions_asked, 'facts': facts}
# data = {}  # also valid, since every field is optional
payload = {'version': 1, 'data': data, 'apikey': 'YOUR_API_KEY_HERE', 'action': 'question'}
r = requests.post(apiurl, json=payload)
r.content  # the raw JSON reply
```

'\n{"facts": {"guess_loc": {}, "where": {}}, "question": {"detail": "{}", "dataitem": "country", "dataset": "geoloc"}}\n'

### 2. Get a text string of the question *[action: questionstring]*

Once you have a question tuple, like the one generated above, you may want a human-readable string for it. This action takes the tuple (in 'data') and returns a dictionary containing:

- 'question': the text of the question (e.g. "What's your postcode?")
- 'type': the type of question. It is 'text' if a free-text reply is wanted, or 'select' if the user should choose from a list.
- 'options': included only when 'type' is 'select', listing the choices.
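A front end might turn this reply into a prompt and check the user's choice. The following is a minimal sketch. `build_prompt` and `is_valid_reply` are hypothetical helpers. The sample reply is the one shown for the Sheffield example in this section.

```python
def build_prompt(reply):
    """Make a prompt string from a 'questionstring' reply."""
    prompt = reply['question']
    if reply.get('type') == 'select':
        prompt += ' [' + ' / '.join(reply['options']) + ']'
    return prompt


def is_valid_reply(reply, user_input):
    """Free-text questions accept anything; 'select' questions need a listed option."""
    if reply.get('type') == 'select':
        return user_input in reply['options']
    return True


reply = {'type': 'select', 'question': 'Is your home in or near Sheffield, UK?',
         'options': ['yes', 'no', "don't know"]}
print(build_prompt(reply))
# Is your home in or near Sheffield, UK? [yes / no / don't know]
```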

```python
import requests

data = {'dataset':'postal','dataitem':'postcode','detail':''}
#data = {"detail": "", "dataitem": "country", "dataset": "postal"}
payload = {"version":1, 'data': data, 'apikey': 'YOUR_API_KEY_HERE', 'action':'questionstring'}
r = requests.post(apiurl,json=payload)
r.content
```

'\n{"type": "text", "question": "What\'s your postcode?"}\n'

```python
import json
import requests

# Here 'detail' carries extra information, JSON-encoded
data = {'dataset':'geoloc','dataitem':'nearcity','detail':json.dumps({'city':'Sheffield','country':'UK'})}
payload = {"version":1, 'data': data, 'apikey': 'YOUR_API_KEY_HERE', 'action':'questionstring'}
r = requests.post(apiurl,json=payload)
r.content
```

'\n{"type": "select", "question": "Is your home in or near Sheffield, UK?", "options": ["yes", "no", "don\'t know"]}\n'

### 3. Process answer *[action: processanswer]* (DEPRECATED)

<b>This is now done through other calls and by storing extra information in the facts dictionary.</b>

Previously, some answers needed improving or processing. For example, if someone used the 'where' class to give the city they were in, 'processanswer' could be called to get more details. To use it, you passed the question/answer dictionary as 'data' with action 'processanswer'. The result was a new dictionary: either the same as the one entered, or with its contents changed by the class. In the example below, it would have looked up the latitude and longitude of Sheffield. The API now replies that the action is not available.

```python
import requests
data = {'dataset':'where','dataitem':'city','detail':'','answer':'sheffield'}
payload = {"version":1, 'data': data, 'apikey': 'YOUR_API_KEY_HERE', 'action':'processanswer'}
r = requests.post('http://scikic.org/api/api.cgi',json=payload)
r.content
```

"\nThis action is not available. Please use 'inference','getfacts','question','questionstring' or 'metadata'\n"

### 4. Inference *[action: inference]*

#### Parameters

The 'data' dictionary should contain three items, as for the 'question' action above:

- questions_asked: list of question tuples that we've asked (with their answers)
- unprocessed_questions: list of question tuples that we've asked (with their answers) that have not yet been added to the facts dictionary
- facts: the current 'facts' dictionary (possibly provided by earlier calls using action 'question')

Previously one also passed a 'features' list naming the features to infer. The call now simply returns all features.

#### Returns

This call returns a dictionary containing:

 - features: a dictionary of quantities with associated probabilities. For example, one of its items is 'household', with the following fields:

   `{"distribution": [0.029, 0.058, 0.23, 0.034, 0.070, 0.026, 0.055, 0.036, 0.14, 0.024, 0.023, 0.035, 0.24], "quartiles": {"upper": 11, "lower": 2, "mean": 6.46}}`

   Here 'distribution' gives how likely the person is to be in each household category. The categories can be found in the module's metadata, or elsewhere. The quartiles don't mean much here, because this is categorical data; they make more sense for data such as age.
 - facts: as mentioned previously, a set of truths about the user generated by processing their answers.
 - insights: a list of strings generated by the modules, for example:

   `["I can't tell which country you're in, just looking at your facebook likes, as I can't see your facebook likes!", "You are aged between 20 and 33.", "You don't have children living at home", " I think you are Christian or of no religion."]`
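This document does not say how the API computes 'quartiles'. The sketch below is one reading of it, under assumed definitions. 'lower' and 'upper' are taken as the first category indices at which the cumulative probability reaches 0.25 and 0.75. 'mean' is taken as the probability-weighted mean index. These definitions reproduce the 'household' figures above up to the rounding of the listed probabilities. Treat this as an assumption, not scikic's implementation. `summarise` is a hypothetical name, and the 'mode' field is an extra added here.

```python
from itertools import accumulate


def summarise(distribution):
    """Summary statistics for a categorical distribution (assumed definitions).

    lower/upper: first index where the cumulative probability reaches 0.25 / 0.75
    mean:        probability-weighted mean of the category indices
    mode:        index of the most likely category
    """
    cumulative = list(accumulate(distribution))
    lower = next(i for i, c in enumerate(cumulative) if c >= 0.25)
    upper = next(i for i, c in enumerate(cumulative) if c >= 0.75)
    mean = sum(i * p for i, p in enumerate(distribution))
    mode = max(range(len(distribution)), key=lambda i: distribution[i])
    return {'lower': lower, 'upper': upper, 'mean': mean, 'mode': mode}


household = [0.029, 0.058, 0.23, 0.034, 0.070, 0.026, 0.055,
             0.036, 0.14, 0.024, 0.023, 0.035, 0.24]
print(summarise(household))  # lower 2, upper 11, mean about 6.44, mode 12
```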
 
Note on the distributions above: trailing zero probabilities are truncated from the list. For example, if inference is certain the user is male, the output will be {"factor_gender": [1.0]}. If the user is certainly female, it will be {"factor_gender": [0.0, 1.0]}.
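Because trailing zeros are dropped, a client that expects a fixed number of categories may want to pad the list back out. The following is a minimal sketch. It assumes the category count is known from the module's metadata or elsewhere; here it is 2 for 'factor_gender'. `pad_distribution` is a hypothetical name.

```python
def pad_distribution(distribution, n_categories):
    """Restore trailing zero probabilities that the API omits."""
    if len(distribution) > n_categories:
        raise ValueError("distribution has more entries than categories")
    return list(distribution) + [0.0] * (n_categories - len(distribution))


print(pad_distribution([1.0], 2))       # [1.0, 0.0]  (certainly the first category)
print(pad_distribution([0.0, 1.0], 2))  # [0.0, 1.0]  (certainly the second category)
```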

```python
import requests
questions_asked = [{'dataset':'postal','dataitem':'postcode','detail':'','answer':'s63af'}]
unprocessed_questions = [{'dataset':'postal','dataitem':'postcode','detail':'','answer':'s63af'}]
facts = {}

data = {'questions_asked':questions_asked,'unprocessed_questions':unprocessed_questions,'facts':facts}
payload = {"version":1, 'data': data, 'apikey': 'YOUR_API_KEY_HERE', 'action':'inference'}
r = requests.post('http://scikic.org/api/api.cgi',json=payload)
r.content
```

'\n{"facts": {"guess_loc": {}, "where": {"ukcensus": [{"item": "E00172420", "probability": 1.0, "level": "oa"}], "city": [{"item": ["Sheffield", "uk"], "probability": 1.0}], "country": [{"item": "gb", "probability": 1.0}]}, "where_history": {"error": "no_fb_likes"}}, "features": {"religion": {"distribution": [0.28444444444444444, 0.03333333333333333, 0.029777777777777778, 0.032, 0.058222222222222224, 0.028888888888888888, 0.041777777777777775, 0.4915555555555556], "quartiles": {"upper": 7, "lower": 0, "mean": 4.257777777777777}}, "household": {"distribution": [0.02711111111111111, 0.05555555555555555, 0.22266666666666668, 0.02311111111111111, 0.08755555555555555, 0.032, 0.05733333333333333, 0.04533333333333334, 0.14666666666666667, 0.018222222222222223, 0.022222222222222223, 0.03111111111111111, 0.2311111111111111], "quartiles": {"upper": 11, "lower": 2, "mean": 6.41688888888889}}, "factor_gender": {"distribution": [0.352, 0.648], "quartiles": {"upper": 1, "lower": 0, "mean": 0.648}}, 

```python
import requests
questions_asked = [{'dataset':'postal','dataitem':'zipcode','detail':'','answer':'86021'}]
unprocessed_questions = [{'dataset':'postal','dataitem':'zipcode','detail':'','answer':'86021'}]
facts = {}

data = {'questions_asked':questions_asked,'unprocessed_questions':unprocessed_questions,'facts':facts}

payload = {"version":1, 'data': data, 'apikey': 'YOUR_API_KEY_HERE', 'action':'inference'}
r = requests.post('http://scikic.org/api/api.cgi',json=payload)
r.content
```

'\n{"facts": {"guess_loc": {}, "where": {"uscensus": [{"item": ["04", "015", "950100", "1"], "probability": 0.514, "level": "blockgroup"}, {"item": ["04", "015", "950100", "3"], "probability": 0.486, "level": "blockgroup"}], "city": [{"item": ["Colorado City, AZ", "us"], "probability": 1.0}], "country": [{"item": "us", "probability": 1.0}]}, "where_history": {"error": "no_fb_likes"}}, "features": {"bg": {"distribution": [0.35333333333333333, 0.6466666666666666], "quartiles": {"upper": 1, "lower": 0, "mean": 0.6466666666666666}}, "factor_gender": {"distribution": [0.3168888888888889, 0.6831111111111111], "quartiles": {"upper": 1, "lower": 0, "mean": 0.6831111111111111}}, "factor_age": {"distribution": [0.02266666666666667, 0.03288888888888889, 0.025777777777777778, 0.024444444444444446, 0.03111111111111111, 0.042666666666666665, 0.036444444444444446, 0.036444444444444446, 0.03911111111111111, 0.035111111111111114, 0.029333333333333333, 0.029333333333333333, 0.032, 0.03288888888888889, 0

### 5. Metadata *[action: metadata]*

Some classes provide metadata about their results; use the 'metadata' action to retrieve it. Pass a dictionary in 'data' containing the name of the dataset, or leave it empty to get the metadata of all the classes.

In this example we display the citation information for the 'babynames' dataset.

```python
import requests
import json
data = {'dataset':'babynames'}
payload = {"version":1, 'data': data, 'apikey': 'YOUR_API_KEY_HERE', 'action':'metadata'}
r = requests.post('http://scikic.org/api/api.cgi',json=payload)
for item in json.loads(r.content):
    if 'citation' in item:
        print(item['citation'])
```

The ONS provide statistics on the distribution of the names of baby's in the UK: <a href="http://www.ons.gov.uk/ons/about-ons/business-transparency/freedom-of-information/what-can-i-request/published-ad-hoc-data/pop/august-2014/baby-names-1996-2013.xls">1996-2013</a> and <a href="http://www.ons.gov.uk/ons/rel/vsob1/baby-names--england-and-wales/1904-1994/top-100-baby-names-historical-data.xls">1904-1994</a>.


In this example we get all citations:

```python
import requests
import json
data = {}  # no dataset specified, so all metadata is returned
payload = {"version":1, 'data': data, 'apikey': 'YOUR_API_KEY_HERE', 'action':'metadata'}
r = requests.post('http://scikic.org/api/api.cgi',json=payload)
for item in json.loads(r.content):
    if 'citation' in item:
        print(item['citation'])
```

The <a href="facebook.com">facebook</a> graph API
The <a href="http://www.census.gov/developers/">US census bureau</a>
The <a href="http://files.grouplens.org/datasets/movielens">movielens</a> database
The ONS provide statistics on the distribution of the names of baby's in the UK: <a href="http://www.ons.gov.uk/ons/about-ons/business-transparency/freedom-of-information/what-can-i-request/published-ad-hoc-data/pop/august-2014/baby-names-1996-2013.xls">1996-2013</a> and <a href="http://www.ons.gov.uk/ons/rel/vsob1/baby-names--england-and-wales/1904-1994/top-100-baby-names-historical-data.xls">1904-1994</a>.
The <a href="https://geoportal.statistics.gov.uk">UK office of national statistics</a> (see <a href="http://www.ons.gov.uk/ons/guide-method/geography/products/census/lookup/other/index.html">details</a> and <a href="https://geoportal.statistics.gov.uk/geoportal/catalog/search/resource/details.page?uuid={A33B0569-97E2-4F44-836C-B656A6D082B6} ">information</a>) and the US zipcode data 

### Typical API usage

The scikic front end may use the API in the following way.

```python
import json
import requests

apiurl = 'http://scikic.org/api/api.cgi'
questions_asked = []
unprocessed_questions = []
facts = {}
for _ in range(3):
    # Ask the API for the next question, passing everything we know so far.
    data = {'unprocessed_questions': unprocessed_questions, 'questions_asked': questions_asked, 'facts': facts}
    payload = {'version': 1, 'data': data, 'apikey': 'YOUR_API_KEY_HERE', 'action': 'question'}
    r = requests.post(apiurl, json=payload)
    question_query_result = json.loads(r.content)
    # The returned facts now include the unprocessed answers, so clear that list.
    facts = question_query_result['facts']
    unprocessed_questions = []
    print(question_query_result)

    # Get a human-readable version of the question.
    data = question_query_result['question']
    payload = {'version': 1, 'data': data, 'apikey': 'YOUR_API_KEY_HERE', 'action': 'questionstring'}
    r = requests.post(apiurl, json=payload)
    question_string_result = json.loads(r.content)

    userinput = input(question_string_result['question'])

    # Record the answer; it is new, so it is also unprocessed.
    question = question_query_result['question']
    question['answer'] = userinput
    questions_asked.append(question)
    unprocessed_questions.append(question)

# Run inference over all the answers collected.
data = {'questions_asked': questions_asked, 'unprocessed_questions': unprocessed_questions, 'facts': facts}
payload = {'version': 1, 'data': data, 'apikey': 'YOUR_API_KEY_HERE', 'action': 'inference'}
r = requests.post(apiurl, json=payload)
inference_results = json.loads(r.content)

print("\nInsights\n")
for insight in inference_results['insights']:
    print(insight)
```

{u'facts': {}, u'question': {u'dataset': u'music', u'detail': u'', u'dataitem': u'favourite_artist'}}
What's your favourite band or artist? (be honest!)Eels
{u'facts': {}, u'question': {u'dataset': u'movielens', u'detail': 1704, u'dataitem': u'seen'}}

