## Arken - Wikidata
version 1.1

WD property [Property:P8899](https://www.wikidata.org/wiki/Property:P8899)
* this [notebook](https://github.com/salgo60/open-data-examples/blob/master/Arken.ipynb)  
* Task [T269064](https://phabricator.wikimedia.org/T269064)
----


#### Other sources we sync
* [Kulturpersoner Uppsalakyrkogård](https://github.com/salgo60/open-data-examples/blob/master/Check%20WD%20kulturpersoner%20uppsalakyrkogardar.ipynb)
* [Litteraturbanken](https://github.com/salgo60/open-data-examples/blob/master/Litteraturbanken%20Author.ipynb) 
  * WD property [P5101](https://www.wikidata.org/wiki/Property_talk:P5101) [P5123](https://www.wikidata.org/wiki/Property_talk:P5123)
* [Nobelprize.org](https://github.com/salgo60/open-data-examples/blob/master/Nobel%20API.ipynb)
  * WD [property 8024](https://www.wikidata.org/wiki/Property:P8024)
* [SBL](https://github.com/salgo60/open-data-examples/blob/master/SBL.ipynb) 
  * WD [property 3217](https://www.wikidata.org/wiki/Property:P3217) 
* [SKBL](https://github.com/salgo60/open-data-examples/blob/master/Svenskt%20Kvinnobiografiskt%20lexikon%20part%203.ipynb)
  * WD [property 4963](https://www.wikidata.org/wiki/Property:P4963)
* [Svenska Akademien](https://github.com/salgo60/open-data-examples/blob/master/Svenska%20Akademien.ipynb) 
  * WD [property 5325](https://www.wikidata.org/wiki/Property:P5325) 


In [1]:
from datetime import datetime
now = datetime.now()
print("Last run: ", now)

Last run:  2020-12-01 15:33:51.464472


In [2]:
import urllib3, json
import pandas as pd   
from bs4 import BeautifulSoup
import sys
import pprint
from SPARQLWrapper import SPARQLWrapper, JSON
from tqdm.notebook import trange  
from wikidataintegrator import wdi_core, wdi_login

endpoint_url = "https://query.wikidata.org/sparql"

SparqlQuery = """SELECT ?item ?arkid WHERE {
?item wdt:P8899 ?arkid
}"""


http = urllib3.PoolManager()

# Query https://w.wiki/Vo5
def get_results(endpoint_url, query):
    user_agent = "user  salgo60/%s.%s" % (sys.version_info[0], sys.version_info[1])
    sparql = SPARQLWrapper(endpoint_url, agent=user_agent)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()

SparQlResults = get_results(endpoint_url, SparqlQuery)
bindings = SparQlResults["results"]["bindings"]
length = len(bindings)

# Collect rows in a list and build the DataFrame once at the end;
# DataFrame.append was deprecated and then removed in pandas 2.0.
rows = []
for r in trange(0, length):
    resultSparql = bindings[r]
    wd = resultSparql["item"]["value"].replace("http://www.wikidata.org/entity/", "")
    # The query always binds ?arkid, but fall back to "" if it is missing
    wdArkid = resultSparql.get("arkid", {}).get("value", "")
    rows.append({'WD': wd, 'arkid': wdArkid})
df = pd.DataFrame(rows, columns=['WD', 'arkid'])
  





In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1691 entries, 0 to 1690
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   WD      1691 non-null   object
 1   arkid   1691 non-null   object
dtypes: object(2)
memory usage: 26.5+ KB


In [4]:
df.head(200)

|     | WD | arkid |
| --- | --- | --- |
| 0 | Q1777178 | Ahnlund,-Nils |
| 1 | Q4934967 | Alfons,-Harriet |
| 2 | Q54945 | alfven-hannes |
| 3 | Q522079 | Alfvén,-Inger |
| 4 | Q22250694 | Alin,-Hans |
| ... | ... | ... |
| 195 | Q1554722 | Brusewitz,-Gunnar |
| 196 | Q4941084 | Bråkenhielm,-Malvina |
| 197 | Q84423 | Buber,-Martin |
| 198 | Q571710 | Bull,-Francis |
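
The `arkid` values above are path segments of Arken URLs, some with commas and non-ASCII letters (`Alfvén,-Inger`) and some in a plain slug form (`alfven-hannes`). The following is a minimal sketch of converting between an identifier and an entry URL. It assumes that an entry lives at `https://arken.kb.se/<arkid>`, which matches how this notebook builds `urlentry` from `urlbase_entry` and the link's `href`. The helper names `arken_url` and `arkid_from_href` are hypothetical and only used for illustration.

```python
from urllib.parse import quote, unquote

ARKEN_BASE = "https://arken.kb.se"  # same host as urlbase_entry in this notebook


def arken_url(arkid: str) -> str:
    """Build an entry URL from a P8899 value (assumed pattern: base + "/" + id)."""
    # Percent-encode non-ASCII characters; keep the comma readable.
    return f"{ARKEN_BASE}/{quote(arkid, safe=',')}"


def arkid_from_href(href: str) -> str:
    """Recover the identifier from a relative href such as "/Alfv%C3%A9n,-Inger"."""
    # First path segment after the leading slash, percent-decoded.
    return unquote(href.split("/")[1])


for arkid in ["Ahnlund,-Nils", "alfven-hannes", "Alfvén,-Inger"]:
    print(arken_url(arkid))
```

Decoding with `unquote` before comparing keeps identifiers scraped from Arken in the same form as the values stored in Wikidata.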


In [5]:
import urllib.parse
urlbase = "https://arken.kb.se/actor/browse?sort=alphabetic&sortDir=asc&page="
urlbase_entry = "https://arken.kb.se"
columns = ['nameAuth', 'urlAuth', 'Auktoriserad', 'Datum', 'Auktoritetspost']
rows = []  # collect dicts here; DataFrame.append was removed in pandas 2.0

# Browse pages 1..79 of the alphabetical actor list
for i in range(1,80):
    url = urlbase + str(i)
    print(url)
    r = http.request('GET', url)
    soup = BeautifulSoup(r.data, "html.parser")
    for link in soup.select('div.search-result-description a[href]'):
        nameAuth = link.string
        # The Arken identifier is the first path segment of the link, percent-decoded
        urlAuth = urllib.parse.unquote(link['href'].split("/")[1])
        urlentry = urlbase_entry + link['href']
        try:
            # Fetch the entry page and read its labelled fields (h3 label, div value)
            r_entry = http.request('GET', urlentry)
            soup_entry = BeautifulSoup(r_entry.data, "html.parser")
            Auktoriserad = ""
            Datum = ""
            Auktoritetspost = ""
            for f in soup_entry.select('div.field'):
                h3 = f.select("h3")
                divText = f.select("div")
                if len(h3) > 0:
                    label = h3[0].get_text()
                    if "Auktoriserad" in label:
                        Auktoriserad = divText[0].get_text().strip()
                    if "Datum för verksamhetstid" in label:
                        Datum = divText[0].get_text().strip()
                    if "Auktoritetspost" in label:
                        Auktoritetspost = divText[0].get_text().strip()
            rows.append({'nameAuth': nameAuth, 'urlAuth': urlAuth, 'Auktoriserad': Auktoriserad,
                         'Datum': Datum, 'Auktoritetspost': Auktoritetspost})
        except Exception as e:
            # Skip entries that cannot be fetched or parsed, but report which one failed
            print("Error", urlentry, e)

df = pd.DataFrame(rows, columns=columns)
                
 

https://arken.kb.se/actor/browse?sort=alphabetic&sortDir=asc&page=1
https://arken.kb.se/actor/browse?sort=alphabetic&sortDir=asc&page=2
https://arken.kb.se/actor/browse?sort=alphabetic&sortDir=asc&page=3
https://arken.kb.se/actor/browse?sort=alphabetic&sortDir=asc&page=4
https://arken.kb.se/actor/browse?sort=alphabetic&sortDir=asc&page=5
https://arken.kb.se/actor/browse?sort=alphabetic&sortDir=asc&page=6
https://arken.kb.se/actor/browse?sort=alphabetic&sortDir=asc&page=7
https://arken.kb.se/actor/browse?sort=alphabetic&sortDir=asc&page=8
https://arken.kb.se/actor/browse?sort=alphabetic&sortDir=asc&page=9
https://arken.kb.se/actor/browse?sort=alphabetic&sortDir=asc&page=10
https://arken.kb.se/actor/browse?sort=alphabetic&sortDir=asc&page=11
https://arken.kb.se/actor/browse?sort=alphabetic&sortDir=asc&page=12
https://arken.kb.se/actor/browse?sort=alphabetic&sortDir=asc&page=13
https://arken.kb.se/actor/browse?sort=alphabetic&sortDir=asc&page=14
https://arken.kb.se/actor/browse?sort=alpha

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5888 entries, 0 to 5887
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   nameAuth         5888 non-null   object
 1   urlAuth          5888 non-null   object
 2   Auktoriserad     5888 non-null   object
 3   Datum            5888 non-null   object
 4   Auktoritetspost  5888 non-null   object
dtypes: object(5)
memory usage: 230.1+ KB


In [7]:
df.head(200)

|     | nameAuth | urlAuth | Auktoriserad | Datum | Auktoritetspost |
| --- | --- | --- | --- | --- | --- |
| 0 | Abenius, Margit | Abenius,-Margit | Abenius, Margit | 1899-1970 | https://libris.kb.se/ljx00mt45v0dfx5#it |
| 1 | Abenius, Vera | Abenius,-Vera | Abenius, Vera | 1890-1967 | ediffah:kb:636923:1147851925 |
| 2 | Aber, Erich | Aber,-Erich | Aber, Erich | 1904-1995 | ediffah:kb:294903:1160049953 |
| 3 | Abildgaard, Nicolai | Abildgaard,-Nicolai | Abildgaard, Nicolai | 1743-1809 | https://libris.kb.se/sq4671cb16gj9q4#it |
| 4 | Abrahamson, August | Abrahamson,-August | Abrahamson, August | 1817-1898 | https://libris.kb.se/wt7bkc9f1h1tt4z#it |
| ... | ... | ... | ... | ... | ... |
| 195 | Andersson, Ingeborg | Andersson,-Ingeborg | Andersson, Ingeborg |  |  |
| 196 | Andersson, John Ulf | Andersson,-John-Ulf | Andersson, John Ulf | 1934-2013 | https://libris.kb.se/zw9dl30h19vj8rr#it |
| 197 | Andersson, Kaj | andersson-kaj | Andersson, Kaj | 1897-1991 | https://libris.kb.se/b8nqswmv3g69j8t#it |
| 198 | Andersson, Kaleb | Andersson,-Kaleb | Andersson, Kaleb | 1889-1983 | https://libris.kb.se/fcrv3vrz0m8jvnz#it |
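
The `Datum` and `Auktoritetspost` columns above are free text: a year range such as `1899-1970` (or empty), and either a Libris URI ending in `#it`, an `ediffah:` identifier, or nothing. A minimal sketch of turning them into structured values could look like the following. It is based only on the formats visible in this output, so other formats on the site would simply yield `None`. The helper names `parse_years` and `libris_id` are hypothetical.

```python
import re
from typing import Optional, Tuple

# "1899-1970" style ranges; either year may be missing.
YEAR_RANGE = re.compile(r"^\s*(\d{4})?\s*-\s*(\d{4})?\s*$")
# "https://libris.kb.se/<id>#it" style URIs.
LIBRIS_URI = re.compile(r"^https://libris\.kb\.se/([^/#]+)(?:#it)?$")


def parse_years(datum: str) -> Tuple[Optional[int], Optional[int]]:
    """Split a Datum value into (start_year, end_year); None where unknown."""
    m = YEAR_RANGE.match(datum or "")
    if not m:
        return (None, None)
    start, end = m.groups()
    return (int(start) if start else None, int(end) if end else None)


def libris_id(auktoritetspost: str) -> Optional[str]:
    """Return the Libris identifier from a Libris URI, or None for other values."""
    m = LIBRIS_URI.match((auktoritetspost or "").strip())
    return m.group(1) if m else None


print(parse_years("1899-1970"))
print(libris_id("https://libris.kb.se/ljx00mt45v0dfx5#it"))
print(libris_id("ediffah:kb:636923:1147851925"))
```

On the DataFrame these could be applied with `df["Datum"].map(parse_years)` and `df["Auktoritetspost"].map(libris_id)`.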


In [8]:
df.to_csv(r'Arken.csv')
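
With both the Wikidata identifiers (`arkid`) and the scraped Arken identifiers (`urlAuth`) available, a natural next step is to see which identifiers appear on only one side. The sketch below does this with plain sets and exact string comparison. It is an illustration, not necessarily how the rest of this notebook proceeds. The function name `compare_ids` is hypothetical and the sample lists are toy data (`Example,-Only-In-WD` is made up).

```python
def compare_ids(wd_ids, arken_ids):
    """Partition identifiers into shared, Wikidata-only and Arken-only groups."""
    wd, arken = set(wd_ids), set(arken_ids)
    return {
        "both": sorted(wd & arken),
        "only_wikidata": sorted(wd - arken),  # possibly stale or mistyped P8899 values
        "only_arken": sorted(arken - wd),     # candidates for adding P8899 in Wikidata
    }


# Toy data for illustration only
wd_sample = ["alfven-hannes", "Alfvén,-Inger", "Example,-Only-In-WD"]
arken_sample = ["alfven-hannes", "Alfvén,-Inger", "Abenius,-Margit"]

result = compare_ids(wd_sample, arken_sample)
for group, ids in result.items():
    print(group, ids)
```

Note that `In [5]` reuses the name `df`, so the Wikidata frame from `In [2]` would have to be kept under a separate name (for example a hypothetical `df_wd = df.copy()` before `In [5]`) to call `compare_ids(df_wd["arkid"], df["urlAuth"])`.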