# Automatically matching new Wikipedia articles with Wikidata items using Python - Task 2

In this notebook:
1. We logged into our Wikimedia account after setting up pywikibot and user-config.py as per instructions
2. Connected to Wikidata
3. Printed the Wikidata page for our [previous task](https://www.wikidata.org/wiki/User:A1exGP/Outreachy_1)
4. Added 'hello' at the end of the same page
5. Created a function that loads a Wikidata item and prints specified statements including qualifiers

6. Created a function that adds statements to an item
7. Created a function that adds qualifiers to statements 

All the changes made to Wikidata items have been reverted so  the scripts can be tested again.


In [1]:
import pywikibot
from textwrap import indent
# Indent allows to indent multi-line strings

# Connect to enwiki
enwiki = pywikibot.Site('en', 'wikipedia')
# and then to wikidata
enwiki_repo = enwiki.data_repository()

pywikibot.Site().login()

In [2]:
def item_page(qid=str):
    return pywikibot.ItemPage(enwiki_repo, qid)

def data_page(url=str):
    return pywikibot.Page(enwiki_repo, url)

def property_page(pid=str):
    return pywikibot.PropertyPage(enwiki_repo, 'Property:' + pid)

def wiki_claim(pid=str):
    return pywikibot.Claim(enwiki_repo, pid)

def print_page(url=str):
    print(data_page(url).text)

In [3]:
print_page('User:A1exGP/Outreachy_1')

= Deserts =

Notes:

Properties were listed in the order they appeared in the Wikidata item.

There were some properties that had multiple values both in article and data page. Only the first one was included.

Articles where selected with diversity in mind. Some of them where taken from [[:en:List of deserts by area]]. At least one desert from each type was chosen, as well as at least one from North America, South America, Europe, Australia, Western Asia, Eastern Asia, Northern Africa, Southern Africa each.

===== Properties used =====

*{{P|P17}}
*{{P|P18}}
*{{P|P30}}
*{{P|P31}}
*{{P|P131}}
*{{P|P242}}
*{{P|P361}}
*{{P|P610}}
*{{P|P625}}
*{{P|P948}}
*{{P|P1425}}
*{{P|P1589}}
*{{P|P2043}}
*{{P|P2044}}
*{{P|P2046}}
*{{P|P2049}}

Unique properties listed: 16. Counting qualifiers: 18

*{{P|P518}}
*{{P|P2096}}

Qualifiers allow to expand on the information that can be displayed in simple property-value pairs. They are attached to values and include a property and value themselves. These w

In [4]:
def add_wikidata_text(url=str, added=str):
    # Adds text at the end of the page in a new line
    # Inputs are the string url after '/wiki/' and the added text
    page = data_page(url)    
    page.text = page.text + '\n\n' + added
    
    try:
        page.save("Text added through pywikibot")
    except:
        print('Error. Please check log in status.')
    
add_wikidata_text('user:A1exGP/Outreachy_1', 'hello')

Sleeping for 8.7 seconds, 2021-11-04 13:56:12
Page [[wikidata:User:A1exGP/Outreachy 1]] saved


In [5]:
def print_wikidata_p(item_page=str, n = False):
    # We'll access this dic a number of times
    item_dict = item_page.get() 
    
    print(item_page.title() + ' ' + item_dict['labels']['en'])
    print('\n+++++++++++++++++++++++++++++++')
    print('-------------------------------')
    
    # Create a list of all the P numbers from our item page
    if n == False:
        props = [prop for prop in item_dict['claims']]
    # Work with passed list of P numbers
    elif type(n) == list:
        props = n
    # Lastly, use the first n properties
    elif type(n) == int:
        props = [prop for prop in item_dict['claims']][:n]
    else:
        print('Error. Please check the property list.')
        return 0
    
    for prop in props:        
        try:
            # Retrieve the property page object and extract the name
            # Printing just the P number is a faster, less descriptive alternative
            prop_page = property_page(prop)
            prop_label = prop_page.labels['en']
            print(prop + ' ' + prop_label)
            
            for claim in item_dict['claims'][prop]:
                # There are two scenarios: the value is a data item or not
                # Try to print the value Q number and label
                try:
                    qid = claim.getTarget().title()
                    label = claim.getTarget().labels['en']
                    print(qid + ' ' + label)
                # Else print the value object as is
                except:
                    print(claim.getTarget())
                    
                # Getting the qualifiers when relevant
                # Code is analogous to the regular statements
                if claim.qualifiers:
                    for qprop, qvalues in claim.qualifiers.items():
                        qprop_page = property_page(qprop)
                        qprop_label = qprop_page.labels['en']
                        print(indent(qprop + ' ' + qprop_label, '     '))
                        for qvalue in qvalues:
                            # Same distinction. Qualifier value might be an item or not
                            try:
                                qid = qvalue.getTarget().title()
                                label = claim.getTarget().labels['en']
                                print(indent(str(qid) + ' ' + str(label), '     '))
                            except:
                                print(indent(str(qvalue.getTarget()), '     '))
                    
            print('-------------------------------')
            print('-------------------------------')
            
        except:
            print('Error. Please check the property list.')
            break
            
        # The outer try can fail because: an element of props isn't a valid string P number,
        # or one of the P numbers isn't a key in item_dict['claims'] a.k.a that property can't be found
        # in our passed item page.
        
# Calling the function by a string is more intuitive
def print_wikidata(s, n = False):
    page = pywikibot.ItemPage(enwiki_repo, s)
    
    return print_wikidata_p(page, n)

In [6]:
# First 2 statements from Sahara
print_wikidata('Q6583',2)
print('\n')

# Sandbox with qualifier
print_wikidata('Q4115189',['P735', 'P7763'])

Q6583 Sahara

+++++++++++++++++++++++++++++++
-------------------------------
P31 instance of
Q8514 desert
-------------------------------
-------------------------------
P30 continent
Q15 Africa
-------------------------------
-------------------------------


Q4115189 Wikidata Sandbox

+++++++++++++++++++++++++++++++
-------------------------------
P735 given name
Q1242905 Penelope
-------------------------------
-------------------------------
P7763 copyright status as a creator
Q73555012 works protected by copyrights
     P1001 applies to jurisdiction
     Q87048619 works protected by copyrights
-------------------------------
-------------------------------


In [7]:
def add_statement(qid=str, pid=str, value=str, value_type = 's'):
    # We will consider 2 scenarions: value_type = 's' for a string and 'q' for an item
    item = item_page(qid)
    claim = wiki_claim(pid)
    
    if value_type == 'q':
        try:
            target = item_page(value)
            claim.setTarget(target)
            item.addClaim(claim, summary=u'pywikibot test Outreachy')
            print(pid + ':' + value + ' added to ' + qid)
        except:
            print('Error. Please check value QID or credentials')
            
    if value_type == 's':
        try:
            target = pywikibot.WbMonolingualText(value,'en')
            claim.setTarget(target)
            item.addClaim(claim, summary=u'pywikibot test Outreachy')
            print(pid + ':' + value + ' added to ' + qid)
        except:
            print('Error. Please check credentials')

In [8]:
# Adding Occupation:Educator and Title:Wikipedia Sandbox
add_statement('Q4115189', 'P106', 'Q974144', 'q')
add_statement('Q4115189', 'P1476', 'Wikipedia Sandbox', 's')

Sleeping for 4.2 seconds, 2021-11-04 13:56:26


P106:Q974144 added to Q4115189


Sleeping for 9.1 seconds, 2021-11-04 13:56:31


P1476:Wikipedia Sandbox added to Q4115189


In [9]:
def add_qualifier(qid=str, pid1=str, value1=str, pid2=str, value2=str):
    # Similar to previous function with item values, but now we take outer and inner pids 
    item = item_page(qid)
    # We look for a the speficic property-value pair pid1-value1.
    for claim in item.claims[pid1]: 
        if claim.getTarget().title() == value1:
            qualifier = wiki_claim(pid2)
            target = item_page(value2)
            qualifier.setTarget(target)
            claim.addQualifier(qualifier, summary=u'pywikibot test Outreachy')
            print(pid2 + ':' + value2 + ' added to ' + pid1 + ':' + value1 + ' in ' + qid)

In [10]:
print_wikidata('Q4115189',['P31'])
print('\n')

# We are looking for the P31:Q95074 statement in Q4115189
# If it exists, add a qualifier P1080:Q171
add_qualifier(qid = 'Q4115189', pid1 = 'P31', value1 = 'Q95074', pid2 = 'P1080', value2 = 'Q171')
print('\n')

print_wikidata('Q4115189',['P31'])

Q4115189 Wikidata Sandbox

+++++++++++++++++++++++++++++++
-------------------------------
P31 instance of
Q95074 fictional character
-------------------------------
-------------------------------




Sleeping for 7.7 seconds, 2021-11-04 13:56:43


P1080:Q171 added to P31:Q95074 in Q4115189


Q4115189 Wikidata Sandbox

+++++++++++++++++++++++++++++++
-------------------------------
P31 instance of
Q95074 fictional character
     P1080 from narrative universe
     Q171 fictional character
-------------------------------
-------------------------------
