# Adding main subject to RT-LAMP papers on Wikidata

- Search Wikidata for "RT-LAMP" using WikidataIntegrator
- Check which results refer to scientific articles
- Add the property "main subject" (P921) pointing to RT-LAMP (Q18394182), with the reference "based on heuristic" (S887) set to "inferred from title" (Q69652283)
- Run the batch on QuickStatements


In [1]:

from wikidataintegrator import wdi_core

rtlamp_results = wdi_core.WDItemEngine.get_wd_search_results("RT-LAMP") 

In [3]:
rtlamp_results

['Q50618516', 'Q93219964', 'Q18394182']

That was not the expected result: the search retrieves only some of the matches. I need to (somehow) run a [CirrusSearch](https://www.mediawiki.org/wiki/Help:CirrusSearch) query programmatically to retrieve all results of interest.

[This](https://stackoverflow.com/questions/37170179/wikidata-api-wbsearchentities-why-are-results-not-the-same-in-python-than-in-wi) StackOverflow answer by Addshore gives some directions. 

In [9]:
import requests

wikidata_api_command = "https://www.wikidata.org/w/api.php?action=query&list=search&srsearch=RT-LAMP&format=json"
rtlamp_results = requests.get(wikidata_api_command)
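
The query string above can also be assembled from a dictionary, which makes it easier to add or change parameters and takes care of URL encoding. This is only a minimal sketch: `build_search_url` and `WIKIDATA_API` are names I made up for illustration, and the parameters are the same ones used in the hand-written URL.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(term, **extra_params):
    """Return a full-text search URL for the Wikidata API (hypothetical helper)."""
    params = {
        "action": "query",   # generic query module
        "list": "search",    # full-text search, as in the cell above
        "srsearch": term,    # the search term
        "format": "json",
    }
    # Any additional API parameters are appended after the defaults.
    params.update(extra_params)
    return WIKIDATA_API + "?" + urlencode(params)


print(build_search_url("RT-LAMP"))
# https://www.wikidata.org/w/api.php?action=query&list=search&srsearch=RT-LAMP&format=json
```

The resulting string can be passed to `requests.get` exactly like the hand-written one.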

In [19]:
import pandas as pd

pd.json_normalize(rtlamp_results.json()["query"]["search"])

Unnamed: 0,ns,title,pageid,size,wordcount,snippet,timestamp
0,0,Q88975954,88202463,,0,scientific article (preprint),2020-07-31T12:08:10Z
1,0,Q88976679,88203191,,0,scientific article (preprint),2020-07-31T12:08:21Z
2,0,Q18394182,19928324,,0,molecular biology method used to detect specif...,2020-07-31T12:06:57Z
3,0,Q28282644,29980190,,0,scientific article,2020-07-31T12:08:56Z
4,0,Q60917851,60771765,,0,,2019-11-25T08:31:30Z
5,0,Q85735284,85011481,,0,scientific article published on 22 January 2013,2020-07-19T11:54:57Z
6,0,Q97903475,96210081,,0,scientific article published on 27 July 2020,2020-07-30T08:52:01Z
7,0,Q88977451,88203962,,0,scientific article (preprint),2020-07-31T12:08:34Z
8,0,Q90799372,89994016,,0,scientific article published on 26 December 2018,2020-07-31T03:13:49Z
9,0,Q97417101,95840113,,0,scientific article published on 07 July 2020,2020-07-24T20:54:18Z


That's much better, but we only have 10 results. I think getting more of them just takes an extra API parameter.

In [22]:

# srlimit raises the number of results per request (the default is 10)
wikidata_api_command = "https://www.wikidata.org/w/api.php?action=query&list=search&srsearch=RT-LAMP&format=json&srlimit=500"
rtlamp_results = requests.get(wikidata_api_command)
pd.json_normalize(rtlamp_results.json()["query"]["search"])

Unnamed: 0,ns,title,pageid,size,wordcount,snippet,timestamp
0,0,Q88975954,88202463,,0,scientific article (preprint),2020-07-31T12:08:10Z
1,0,Q88976679,88203191,,0,scientific article (preprint),2020-07-31T12:08:21Z
2,0,Q18394182,19928324,,0,molecular biology method used to detect specif...,2020-07-31T12:06:57Z
3,0,Q28282644,29980190,,0,scientific article,2020-07-31T12:08:56Z
4,0,Q60917851,60771765,,0,,2019-11-25T08:31:30Z
...,...,...,...,...,...,...,...
118,0,Q48369815,49397387,,0,scientific article published on 22 December 2009,2019-02-01T06:30:03Z
119,0,Q92752127,91915756,,0,scientific article published on 18 January 2020,2020-05-31T13:17:32Z
120,0,Q54174867,54485964,,0,scientific article published in May 1995,2019-05-12T13:03:55Z
121,0,Q71694179,71351725,,0,scientific article published on 01 September 1996,2020-01-24T03:27:15Z


That's what I'm talking about! Now we have all 122 of them.
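
All matches fit in one request here. For a search term with more matches than the per-request limit, the MediaWiki API returns a `continue` object whose entries are meant to be merged into the next request. Below is a rough sketch of that loop. It is not something this notebook needed, and `collect_all_results` and `fetch_page` are hypothetical names; `fetch_page` is assumed to take a dict of query parameters and return the decoded JSON response.

```python
def collect_all_results(fetch_page, term, page_size=500):
    """Gather every search hit by following the API's continuation values.

    fetch_page is a hypothetical callable: dict of query parameters in,
    decoded JSON response (a dict) out.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "format": "json",
        "srlimit": page_size,
    }
    hits = []
    while True:
        response = fetch_page(params)
        hits.extend(response["query"]["search"])
        # No "continue" key means the last page has been reached.
        if "continue" not in response:
            return hits
        # Merge the continuation values (e.g. sroffset) into the next request.
        params = {**params, **response["continue"]}
```

With `requests`, `fetch_page` could be something like `lambda p: requests.get("https://www.wikidata.org/w/api.php", params=p).json()`.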

I am going to apply a heuristic: results with "scientific" in the "snippet" column will be considered articles.


In [23]:
rtlamp_df = pd.json_normalize(rtlamp_results.json()["query"]["search"])

In [25]:
# Keep only the rows whose snippet mentions "scientific" (the article heuristic)
rtlamp_df = rtlamp_df[rtlamp_df['snippet'].str.contains('scientific')]
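
The snippet filter is a shortcut. A stricter check, which I did not use here, would look at each item's "instance of" (P31) statements and keep only those pointing to "scholarly article" (Q13442814). The sketch below assumes the entity JSON has already been downloaded (for example with the `wbgetentities` API module) and only shows how to read it; the function names are made up, and items typed as something else (such as a dedicated preprint class) would need extra QIDs.

```python
SCHOLARLY_ARTICLE = "Q13442814"  # Wikidata item for "scholarly article"


def instance_of_ids(entity):
    """Return the QIDs found in an entity's P31 ("instance of") statements."""
    ids = []
    for claim in entity.get("claims", {}).get("P31", []):
        snak = claim.get("mainsnak", {})
        # "somevalue"/"novalue" snaks carry no datavalue, so skip them.
        if snak.get("snaktype") == "value":
            ids.append(snak["datavalue"]["value"]["id"])
    return ids


def is_scholarly_article(entity):
    """True if the entity is declared an instance of scholarly article."""
    return SCHOLARLY_ARTICLE in instance_of_ids(entity)


# Toy entity in the shape of the Wikidata JSON format (not real data).
example = {
    "claims": {
        "P31": [
            {"mainsnak": {"snaktype": "value",
                          "datavalue": {"value": {"id": "Q13442814"}}}}
        ]
    }
}
print(is_scholarly_article(example))  # True
```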

In [27]:
for qid in rtlamp_df["title"]:
    print(qid + "|P921|Q18394182" + "|S887|Q69652283")

Q88975954|P921|Q18394182|S887|Q69652283
Q88976679|P921|Q18394182|S887|Q69652283
Q28282644|P921|Q18394182|S887|Q69652283
Q85735284|P921|Q18394182|S887|Q69652283
Q97903475|P921|Q18394182|S887|Q69652283
Q88977451|P921|Q18394182|S887|Q69652283
Q90799372|P921|Q18394182|S887|Q69652283
Q97417101|P921|Q18394182|S887|Q69652283
Q21231950|P921|Q18394182|S887|Q69652283
Q88974682|P921|Q18394182|S887|Q69652283
Q36820267|P921|Q18394182|S887|Q69652283
Q97435960|P921|Q18394182|S887|Q69652283
Q46302286|P921|Q18394182|S887|Q69652283
Q92672066|P921|Q18394182|S887|Q69652283
Q34042661|P921|Q18394182|S887|Q69652283
Q34292405|P921|Q18394182|S887|Q69652283
Q35628887|P921|Q18394182|S887|Q69652283
Q27347920|P921|Q18394182|S887|Q69652283
Q34257266|P921|Q18394182|S887|Q69652283
Q95650716|P921|Q18394182|S887|Q69652283
Q94523184|P921|Q18394182|S887|Q69652283
Q45711171|P921|Q18394182|S887|Q69652283
Q95628286|P921|Q18394182|S887|Q69652283
Q94523179|P921|Q18394182|S887|Q69652283
Q54502723|P921|Q18394182|S887|Q69652283
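
For reuse, the same lines can be produced by a small function that also drops repeated QIDs and returns one string ready to paste into QuickStatements. This is just a sketch: `build_quickstatements` is a hypothetical name, and the defaults are the property and item IDs used above.

```python
def build_quickstatements(qids, prop="P921", target="Q18394182",
                          ref_prop="S887", ref_value="Q69652283"):
    """Return pipe-separated QuickStatements lines, one per unique QID."""
    seen = set()
    lines = []
    for qid in qids:
        if qid in seen:
            continue  # skip duplicates so no statement is submitted twice
        seen.add(qid)
        lines.append("|".join([qid, prop, target, ref_prop, ref_value]))
    return "\n".join(lines)


print(build_quickstatements(["Q88975954", "Q88976679", "Q88975954"]))
# Q88975954|P921|Q18394182|S887|Q69652283
# Q88976679|P921|Q18394182|S887|Q69652283
```

In this notebook the call would be `build_quickstatements(rtlamp_df["title"])`.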
