# Automatically matching new Wikipedia articles with Wikidata items using Python - Task 3

In this notebook we:
1. Implement a function that searches for a data item via the API, given a string as input. It returns up to a chosen number of results, each with a description that helps the user pick the best match.
2. Create a function that lists the properties used in an item, or checks whether the properties in an input list are present in that item. Together with Task 2, this makes it possible to look up an item (T3), check its properties (T3), and return only the desired statements (T2).

In [2]:
import pywikibot
from pywikibot.data import api
import requests
from task2 import item_page, print_wikidata

In [3]:
# Connect to enwiki
enwiki = pywikibot.Site('en', 'wikipedia')
# and then to wikidata
enwiki_repo = enwiki.data_repository()

In [4]:
# From the provided example.py
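def search_entities(site, itemtitle, limit):
    # wbsearchentities accepts at most 50 results per request for regular
    # accounts, so fail early instead of silently returning None
    if limit > 50:
        raise ValueError('limit must be 50 or less')
    params = {'action': 'wbsearchentities',
              'format': 'json',
              'language': 'en',
              'limit': limit,
              'continue': 0,
              'type': 'item',
              'search': itemtitle}
    request = api.Request(site=site, parameters=params)
    return request.submit()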
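
`search_entities` returns the decoded JSON response, whose `search` list holds one dictionary per hit with keys such as `id`, `label` and `description`. As a minimal offline sketch of how that structure can be read, the helper below formats each hit on one line. The name `format_search_results` and the sample dictionary are illustrative assumptions, not part of the provided example.py; the sample is hand-written from the search output shown later in this notebook.

```python
def format_search_results(response):
    """Turn a wbsearchentities-style response into printable lines."""
    lines = []
    # Tolerate a missing response or a response without a 'search' list
    for hit in (response or {}).get('search', []):
        # Not every item has a description in the requested language
        description = hit.get('description', '(no description)')
        lines.append(hit['id'] + ' ' + hit.get('label', '') + ': ' + description)
    return lines


# Hand-written sample shaped like the API response (not fetched live).
# The last entry omits 'description' on purpose to exercise the fallback.
sample_response = {'search': [
    {'id': 'Q2', 'label': 'Earth',
     'description': 'third planet from the Sun in the Solar System'},
    {'id': 'Q21152267', 'label': 'dirt',
     'description': 'natural surface of the ground'},
    {'id': 'Q83697636', 'label': 'Earth'},
]}

for line in format_search_results(sample_response):
    print(line)
```

Keeping the formatting separate from the network call makes it possible to check the output without contacting Wikidata.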

In [5]:
def search_item_description(label, limit):
    # Use the API to return QIDs and their descriptions
    dic = search_entities(enwiki_repo, label, limit)
    results = dic.get('search', []) if dic else []
    # Record of found QIDs
    qids = [item['id'] for item in results]

    for item in results:
        page = item_page(item['id'])
        # Some items lack an English description, so avoid a KeyError
        api_desc = item.get('description', '')
        page_desc = page.get()['descriptions'].get('en', '')
        # Cross-check the API result against the item page from pywikibot
        if page_desc == api_desc:
            print(item['id'] + ' ' + item['label'])
            # Show the description for the user to choose
            print(api_desc)
            print('\n')
        else:
            print(api_desc + ' does not match')

    print('Select QID:')
    qid = input().strip().upper()
    print('\n')
    print('+++++++++++++++++++++++++++++++')

    if qid not in qids:
        print('Warning: selected QID not a search result')

    return qid

In [6]:
search_item_description('earth',3)

Q2 Earth
third planet from the Sun in the Solar System


Q21152267 dirt
natural surface of the ground


Q83697636 Earth
planet Earth as depicted in the 1987–1996 Teenage Mutant Ninja Turtles animated television series


Select QID:
q2


+++++++++++++++++++++++++++++++


'Q2'
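
The selection step above upper-cases whatever the user types and only warns when the QID is not among the search results. A minimal sketch of that check as a standalone helper, which also rejects input that does not look like a QID at all, could read as follows. The name `normalise_qid` and the pattern are assumptions made for illustration, not part of the notebook's functions.

```python
import re

# Assumption: item IDs are "Q" followed by a positive integer
QID_PATTERN = re.compile(r'Q[1-9]\d*')


def normalise_qid(raw, candidates):
    """Return (qid, is_candidate) for a user-typed QID.

    raw:        the text the user typed, e.g. ' q2 '
    candidates: the QIDs returned by the search
    """
    qid = raw.strip().upper()
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError('not a QID: ' + repr(raw))
    return qid, qid in candidates


print(normalise_qid(' q2 ', ['Q2', 'Q21152267', 'Q83697636']))
# prints ('Q2', True)
```

Returning the membership flag instead of printing a warning leaves it to the caller to decide whether an off-list QID is acceptable.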

In [7]:
# This function bridges the gap between Tasks 2 and 3
def item_properties(qid, n=False):
    page = item_page(qid)
    item_dict = page.get()

    print(page.title() + ' ' + item_dict['labels']['en'])
    print('\n+++++++++++++++++++++++++++++++')
    print('-------------------------------')

    # Display either all the item properties,
    if n is False:
        props = list(item_dict['claims'])
    # or just the first n,
    elif isinstance(n, int):
        props = list(item_dict['claims'])[:n]
    # or only those in a passed list of props
    elif isinstance(n, list):
        props = n
    else:
        print('Error. Please check the property list.')
        return None

    for prop in props:
        # Check whether the requested properties are in our data item
        if prop in item_dict['claims']:
            # Show both the P code and the property name, which is easier for the user to choose from
            prop_page = pywikibot.PropertyPage(enwiki_repo, 'Property:' + prop)
            prop_name = prop_page.get()['labels']['en']
            print(prop + ' ' + prop_name)
        else:
            print('Warning: Property ' + prop + ' not in ' + qid)

In [8]:
item_properties('Q2',5)

Q2 Earth

+++++++++++++++++++++++++++++++
-------------------------------
P1589 lowest point
P1419 shape
P527 has part
P522 type of orbit
P1036 Dewey Decimal Classification
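
The branching on `n` in `item_properties` (all properties, the first `n`, or an explicit list) can be tried without a network connection by separating it from the printing. The sketch below is one way to do that, assuming only that the claims behave like a dictionary keyed by property ID. The name `select_properties` and the sample dictionary are hypothetical, and it uses `None` rather than `False` as the "all properties" default.

```python
def select_properties(claims, n=None):
    """Return (present, missing) property IDs for an item.

    claims: mapping of property ID -> claims, shaped like item_dict['claims']
            (a plain dict is enough for testing)
    n:      None for all properties, an int for the first n,
            or a list of property IDs to look up
    """
    available = list(claims)
    if n is None:
        return available, []
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(n, int) and not isinstance(n, bool):
        return available[:n], []
    if isinstance(n, list):
        present = [p for p in n if p in claims]
        missing = [p for p in n if p not in claims]
        return present, missing
    raise TypeError('n must be None, an int, or a list of property IDs')


# Hypothetical stand-in for item_dict['claims'], holding only the five
# property IDs printed for Q2 above; the real item has many more.
sample_claims = {'P1589': [], 'P1419': [], 'P527': [], 'P522': [], 'P1036': []}

first_two, _ = select_properties(sample_claims, 2)
# 'P9999999' is a placeholder ID chosen to be absent from the sample
present, missing = select_properties(sample_claims, ['P527', 'P9999999'])
print(first_two)
print(present, missing)
```

Returning the missing IDs lets the caller decide how to report them, where `item_properties` prints a warning for each one.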


In [9]:
# Integrating finding QID through label and then showing data page info
def search_item_data(label, n, p):
    # n = number of items to display and select
    # p = number or list of properties to display
    qid = search_item_description(label, n)
    return print_wikidata(qid, p)

In [10]:
search_item_data('earth',3,2)

Q2 Earth
third planet from the Sun in the Solar System


Q21152267 dirt
natural surface of the ground


Q83697636 Earth
planet Earth as depicted in the 1987–1996 Teenage Mutant Ninja Turtles animated television series


Select QID:
q2


+++++++++++++++++++++++++++++++
Q2 Earth

+++++++++++++++++++++++++++++++
-------------------------------
P1589 lowest point


Value: Challenger Deep
Code: Q459173


Value: Galathea Depth
Code: Q1491734


Value: Vityas Deep 1
Code: Q2586548
-------------------------------
-------------------------------
P1419 shape


Value: oblate spheroid
Code: Q3241540


Value: geoid
Code: Q185969


Value: ball
Code: Q838611


Value: disk
Code: Q238231
-------------------------------
-------------------------------
