In [162]:
%%bash
# TODO: move these into a requirements file;
# installed here for convenience.
# note: json is part of the Python standard library
conda install -c anaconda urllib3 --yes
conda install -c jmcmurray json --yes
conda install -c conda-forge sparqlwrapper --yes

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /Users/plewis/opt/anaconda3

  added / updated specs:
    - urllib3


The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates    conda-forge::ca-certificates-2020.4.5~ --> anaconda::ca-certificates-2020.1.1-0
  certifi            conda-forge::certifi-2020.4.5.2-py37h~ --> anaconda::certifi-2020.4.5.1-py37_0
  conda              conda-forge::conda-4.8.3-py37hc8dfbb8~ --> anaconda::conda-4.8.3-py37_0
  openssl            conda-forge::openssl-1.1.1g-h0b31af3_0 --> anaconda::openssl-1.1.1g-h1de35cc_0


Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /Users/plewis/opt/ana
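
After the installs, it can help to confirm that the modules imported in the following cells are actually importable. The check below is a minimal sketch and not part of the original notebook; `missing_modules` is a hypothetical helper name introduced here for illustration. `SPARQLWrapper` is assumed to be the import name of the `sparqlwrapper` package.

```python
import importlib.util

def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# module names as they would appear in an import statement
print(missing_modules(['json', 'urllib3', 'SPARQLWrapper']))
```

An empty list means every module was found; any name that is printed still needs installing.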

In [163]:
from __future__ import print_function
import json
import urllib3
# from https://developers.google.com/knowledge-graph

'''
Code to pull information from the Google Knowledge Graph
and store it in a simple dictionary 

Author: P. Lewis
Date: 16 June 2020

'''

def justDoit(query='Taylor Swift',itemNumber=0):
    '''
    Pull some information on a topic (person, company etc)
    described in `query` (e.g. 'Mickey Mouse')
    
    itemNumber(s) : which item(s) to select (by score)
                    default is 0 (1st item: highest score)
                    This will generally be fine, but may fail
                    for something obscure, so you may need a different 
                    itemNumber. It can be a list, in which case the 
                    return is a list of dictionaries.
                 
    return:         Dictionary of attributes of entry in 
                    Google knowledge graph. This will be a list if
                    itemNumber is a list
    
    Assumes api_key available in .api_key.txt

    '''
    # ensure itemNumber is a list
    if not isinstance(itemNumber, list):
        itemNumber = [itemNumber]
    
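    # not secure at the moment, but easier for a prototype
    try:
        # strip whitespace so a trailing newline does not end up in the key
        with open('.api_key.txt') as f:
            api_key = f.read().strip()
    except OSError:
        print("failed to read API key from file .api_key.txt")
        # adjacent string literals are joined, so the URL prints unbroken
        print("see: https://console.developers.google.com/"
              "apis/credentials?folder=&organizationId=&project=")
        # re-raise rather than exit(0), which reports success
        raise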

    service_url = "https://kgsearch.googleapis.com/v1/entities:search"
    params = {
        'query'  : query,
        'limit'  : 10,
        'indent' : True,
        'key'    : api_key
    }

    # get information from url
    http = urllib3.PoolManager()
    r = http.request('GET', service_url, fields=params)
    response = json.loads(r.data.decode('utf-8'))
    
    # get items from list
    retval = []
    for value in itemNumber:
        # if any attribute is missing for an item, an empty
        # dictionary is stored for that item only;
        # could make this more tolerant
        try:
            # select the requested item (0 is the highest score match)
            item = response['itemListElement'][value]
            # pull some attributes and put them in 
            # a dictionary called attributes
            attributes = {
                'name'       : item['result']['name'],
                'type'       : item['result']['@type'],
                'blurb'      : item['result']['detailedDescription']['articleBody'],
                'url'        : item['result']['detailedDescription']['url'],
                'image'      : item['result']['image']['url'],
                'resultScore': item['resultScore'],
                'description': item['result']['description']
            }
            retval.append(attributes)
        except (KeyError, IndexError, TypeError):
            # item or one of its attributes is missing
            retval.append({})
            
    # un-list it if appropriate
    if len(retval) == 1:
        retval = retval[0]
        
    return retval
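
The comment in the loop above notes that the extraction could be made more tolerant. One possible way, sketched here under assumptions, is to look up each nested field separately so that a missing image or description leaves a `None` instead of discarding the whole item. The helpers `dig` and `extract_attributes` are hypothetical names introduced for illustration, and the `sample` item is made up rather than a real API response.

```python
def dig(mapping, *keys, default=None):
    """Follow `keys` through nested dicts; return `default` if any step is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

def extract_attributes(item):
    """Build the same attribute dictionary as justDoit, with None for gaps."""
    result = item.get('result', {})
    return {
        'name'       : dig(result, 'name'),
        'type'       : dig(result, '@type'),
        'blurb'      : dig(result, 'detailedDescription', 'articleBody'),
        'url'        : dig(result, 'detailedDescription', 'url'),
        'image'      : dig(result, 'image', 'url'),
        'resultScore': item.get('resultScore'),
        'description': dig(result, 'description'),
    }

# made-up item with no image, description or detailedDescription
sample = {'result': {'name': 'Example', '@type': ['Thing']}, 'resultScore': 1.0}
print(extract_attributes(sample))
```

With this approach an obscure entry that lacks one field still returns its name, type and score.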

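The function above reads the key from `.api_key.txt` and its own comment flags this as not secure. A common alternative is to check an environment variable first and fall back to the file. The sketch below assumes a variable called `KG_API_KEY`; that name and the helper `load_api_key` are hypothetical and not part of the original code.

```python
import os

def load_api_key(env_var='KG_API_KEY', path='.api_key.txt'):
    """Return the API key from the environment if set, else from `path`."""
    key = os.environ.get(env_var)
    if key:
        return key.strip()
    # fall back to the file used by justDoit; raises OSError if it is absent
    with open(path) as f:
        return f.read().strip()
```

Stripping whitespace matters in both branches, since a trailing newline would otherwise be sent as part of the key.
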
In [164]:
# example runs and printing
import pprint
names = ['mickey mouse', 'donald duck', 'nike', 'audi']


pp = pprint.PrettyPrinter(indent=4)

# loop over names and print out info
for this in names:
    print('-'*80+'\n'+this.title()+'\n'+'-'*80+'\n')  
    # intended to use 2 sources; the wikidata lookup is currently commented out
    result = justDoit(this)#.update(get_wikidata(this))
    pp.pprint(result)
    #print(get_wikidata(this))

--------------------------------------------------------------------------------
Mickey Mouse
--------------------------------------------------------------------------------

{   'blurb': 'Mickey Mouse is a cartoon character and the mascot of The Walt '
             'Disney Company. He was created by Walt Disney and Ub Iwerks at '
             'the Walt Disney Studios in 1928. ',
    'description': 'Cartoon character',
    'image': 'https://commons.wikimedia.org/wiki/File:Mickey-Mouse.png',
    'name': 'Mickey Mouse',
    'resultScore': 17994.41015625,
    'type': ['Thing'],
    'url': 'https://en.wikipedia.org/wiki/Mickey_Mouse'}
--------------------------------------------------------------------------------
Donald Duck
--------------------------------------------------------------------------------

{   'blurb': 'Donald Fauntleroy Duck is a cartoon character created in 1934 at '
             'Walt Disney Productions. Donald is an anthropomorphic white duck '
             'with a ye