### Link rot WD SWERIK
* [code](https://github.com/salgo60/Wikidata_riksdagen-corpus/blob/main/Notebook/SWERIKS%20check.ipynb)
* see [issue 27](https://github.com/swerik-project/riksdagen-persons/issues/27#issuecomment-2456456383)

**Description:** check for link rot in the SWERIK links stored in Wikidata as [P12192](https://www.wikidata.org/wiki/Property:P12192)

Version 0.1 adds a progress bar.

In [1]:
from datetime import datetime
start_time = datetime.now()
print("Last run: ", datetime.now())

Last run:  2024-12-07 10:49:58.118580


In [2]:
import sys
!{sys.executable} -m pip install wikibaseintegrator



In [3]:
from wikibaseintegrator.wbi_helpers import execute_sparql_query
from wikibaseintegrator import WikibaseIntegrator 
from wikibaseintegrator.wbi_config import config as wbi_config

In [4]:
wbi_config['USER_AGENT'] = 'WikibaseIntegrator in PAWS by salgo60'
wbi = WikibaseIntegrator()
results = execute_sparql_query("""
SELECT ?wd ?swerik WHERE {
  ?wd wdt:P12192 ?swerik.
} 
""")


In [5]:
bindings = results["results"]["bindings"]
print(f"Found {len(bindings)} results")
# Counters updated by checkurl() below
NrnotValid = 0
NrValid = 0

Found 6177 results
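
Each entry in `bindings` follows the standard SPARQL JSON results layout: one dictionary per row, keyed by variable name, with the actual string under `"value"`. The sketch below shows how such rows can be flattened into `(QID, SWERIK id)` pairs. The helper name `to_pairs` is hypothetical and not part of the notebook; the sample row reuses one identifier pair from the error list further down.

```python
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def to_pairs(bindings):
    """Turn SPARQL JSON bindings into (qid, swerik_id) tuples."""
    pairs = []
    for row in bindings:
        entity_uri = row["wd"]["value"]
        # Keep only the QID when the URI has the expected Wikidata prefix.
        if entity_uri.startswith(ENTITY_PREFIX):
            qid = entity_uri[len(ENTITY_PREFIX):]
        else:
            qid = entity_uri
        pairs.append((qid, row["swerik"]["value"]))
    return pairs


sample = [
    {
        "wd": {"type": "uri", "value": "http://www.wikidata.org/entity/Q4934552"},
        "swerik": {"type": "literal", "value": "i-PCZrYEHwPaEeNTZphEsWTv"},
    }
]
print(to_pairs(sample))  # [('Q4934552', 'i-PCZrYEHwPaEeNTZphEsWTv')]
```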


In [6]:
import requests

def checkurl(wd, swerik):
    global NrValid
    global NrnotValid
    base_url = f"https://swerik-project.github.io/person-catalog/{swerik}"
    try:
        response = requests.get(base_url, timeout=30)  # do not hang forever on a stalled connection
        if response.status_code == 200:
            NrValid += 1
            return True, None  # Success, no error message
        else:
            NrnotValid += 1
            return False, f"WD {wd} - {base_url} - Status Code: {response.status_code}"
    except requests.exceptions.RequestException as e:
        NrnotValid += 1
        return False, f"WD {wd} - {base_url} - Error: {e}"
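
A possible variant of the check above that avoids the global counters and returns only `(ok, message)`, leaving the counting to the caller. This is a sketch under assumptions, not the notebook's method: it swaps `requests` for the standard library's `urllib`, sends a HEAD request on the assumption that the server answers HEAD the same way as GET, and the names `head_status`, `check_url`, and `fetch_status` are hypothetical.

```python
import urllib.error
import urllib.request

BASE_URL = "https://swerik-project.github.io/person-catalog/"


def head_status(url, timeout=10.0):
    """Return the HTTP status code of a HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as e:
        # urllib reports 4xx/5xx responses as exceptions; the code is what we want.
        return e.code


def check_url(wd, swerik, fetch_status=head_status):
    """Return (True, None) on HTTP 200, else (False, message)."""
    url = f"{BASE_URL}{swerik}"
    try:
        status = fetch_status(url)
    except OSError as e:  # URLError and socket timeouts are OSError subclasses
        return False, f"WD {wd} - {url} - Error: {e}"
    if status == 200:
        return True, None
    return False, f"WD {wd} - {url} - Status Code: {status}"
```

Passing the status lookup in as `fetch_status` keeps the function testable without network access, for example `check_url("Q1", "i-abc", fetch_status=lambda url: 404)`.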


In [7]:
# pip install tqdm
from tqdm import tqdm

In [8]:
# List to store errors
errors = []
for result in tqdm(bindings,
                   total=len(bindings),
                   desc="Processing records"):
    swerik = result["swerik"]["value"]
    wdurl = result["wd"]["value"]
    wd = str(wdurl).replace("http://www.wikidata.org/entity/","")
    success, error_message = checkurl(wd, swerik)
    if not success and error_message:
        errors.append(error_message)



Processing records: 100%|█████████████████████████████████████████| 6177/6177 [32:28<00:00,  3.17it/s]
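
The run above checks the 6177 URLs one after another and takes about 32 minutes. Since the time is spent waiting on the network, a thread pool could shorten it. The following is only a sketch: `check_all` and its `max_workers` default are hypothetical, and `check` is assumed to be a function that returns `(ok, message)` without touching global counters, because `+=` on a shared global is not safe across threads.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(pairs, check, max_workers=8):
    """Run check(wd, swerik) for every pair using a pool of threads.

    Returns (n_valid, n_invalid, errors); errors keep the input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input pairs.
        outcomes = list(pool.map(lambda pair: check(*pair), pairs))
    errors = [message for ok, message in outcomes if not ok and message]
    n_valid = sum(1 for ok, _ in outcomes if ok)
    return n_valid, len(outcomes) - n_valid, errors
```

A small worker count keeps the load on the target site modest; the right value is a judgment call and has not been measured here.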


In [9]:
# Print the results
print(f"Number of valid URLs: {NrValid}")
print(f"Number of invalid URLs: {NrnotValid}")

if errors:
    print("\nErrors encountered:")
    for e in errors:
        print(e)
else:
    print("\nAll records processed without errors.")

Number of valid URLs: 6109
Number of invalid URLs: 68

Errors encountered:
WD Q4934552 - https://swerik-project.github.io/person-catalog/i-PCZrYEHwPaEeNTZphEsWTv - Status Code: 404
WD Q4957371 - https://swerik-project.github.io/person-catalog/i-31gPpUoSm7zqzQckVmfPGy - Status Code: 404
WD Q4970175 - https://swerik-project.github.io/person-catalog/i-UX4D3JJdrTjFBf2zyfHx5t - Status Code: 404
WD Q4976825 - https://swerik-project.github.io/person-catalog/i-NvxzaU2RSok83zCskNAuhg - Status Code: 404
WD Q97971262 - https://swerik-project.github.io/person-catalog/i-RH6VCPhyxs9yYcfXJzPxYT - Status Code: 404
WD Q97971276 - https://swerik-project.github.io/person-catalog/i-Cdgsqn4Ts9WMwbjXcE4537 - Status Code: 404
WD Q98271639 - https://swerik-project.github.io/person-catalog/i-x1CuoKmRHYgQr9i2kh3B5 - Status Code: 404
WD Q98538839 - https://swerik-project.github.io/person-catalog/i-TUyWWYGDFXW92GhiG3CLwF - Status Code: 404
WD Q98937434 - https://swerik-project.github.io/person-catalog/i-EzcxskgMA

In [10]:
print("End run: ", datetime.now())
print('Time elapsed (hh:mm:ss.ms) {}'.format(datetime.now() - start_time))

End run:  2024-12-07 11:22:28.108041
Time elapsed (hh:mm:ss.ms) 0:32:29.989612
