# Why this code exists
This notebook takes the final annotation dataset **final_annotation_set.csv** and verifies the references that workers reported as inaccessible.

In [1]:
import json
import copy
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import requests

In [2]:
final_annotation_set = pd.read_csv('data/final/all_aggregated_annotations/final_annotation_set.csv')
final_annotation_set.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2310 entries, 0 to 2309
Data columns (total 25 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   reference_id           2310 non-null   object 
 1   url                    2310 non-null   object 
 2   language_crawl         2310 non-null   object 
 3   statement_node         2310 non-null   object 
 4   subject                2310 non-null   object 
 5   predicate              2310 non-null   object 
 6   object                 2310 non-null   object 
 7   url_type               2310 non-null   object 
 8   is_inferred_from       2310 non-null   object 
 9   stated_in              2310 non-null   object 
 10  external_id_prop       2310 non-null   object 
 11  external_id            2310 non-null   object 
 12  internal_urls          2310 non-null   object 
 13  external_url           2310 non-null   object 
 14  wikimedia_import_urls  2310 non-null   object 
 15  retr

In [3]:
final_annotation_set.sample(5)

Unnamed: 0,reference_id,url,language_crawl,statement_node,subject,predicate,object,url_type,is_inferred_from,stated_in,...,retrieved,publication_date,is_present,difficulty,reason,author,publisher,sub_publisher,long_sub_publisher,authoritative
2150,43a1455e231f247818137723d71116df5d029805,https://web.archive.org/web/20170330183441/htt...,ja,Q50317871-bf6eacd2-474e-ba40-0585-a38188b87090,西ただす,読み仮名,にし ただす,external_url,,,...,,,1,3,-1,1,2,-1,2.0,Yes
728,13bfa5f7da675f86f160713935213af04541ea30,https://viaf.org/viaf/11703791,es,Q33123291-16BC7E5F-76A6-4D8B-BD67-B4B553A2AC79,Jaume Alonso-Cuevillas,ID WorldCat de identidades,lccn-n2004097312,internal_url,,,...,,,1,2,-1,1,1,3,1.3,Yes
143,9a9fb043035d2bc7ad595422f1b062426fb406e6,https://pubmed.ncbi.nlm.nih.gov/1384599,en,Q43713334-DE0727AB-7BE9-441A-B933-CCD6A7CA6940,Dynamic properties of the colicin E1 ion channel.,author name string,Cohen FS,internal_url,,Q5412157,...,"{'time': '+2017-11-26T00:00:00Z', 'timezone': ...",,1,3,-1,1,0,0,0.0,Yes
2228,a45c6dbfdd832bbbede07f89cd2b3e4bc969971a,https://viaf.org/viaf/258111558,ja,Q212713-2930FDDF-C876-49F2-83F5-4D0B7B36DFDA,文京区,WorldCat 識別ID,viaf-258111558,internal_url,,,...,,,1,2,-1,1,1,3,1.3,Yes
583,b0677578f4ee287ac2d089b8c0f7d69f05da4622,https://tools.wmflabs.org/heritage/api/api.php...,es,Q43112386-F0C16BD1-9061-41ED-8247-8132001D541A,Grabado Rupestre Monte Da Rocha Laxe Das Chave...,estatus patrimonial,Bien de Interés Cultural,external_url,,Q28563569,...,,"{'time': '+2017-11-13T00:00:00Z', 'timezone': ...",1,3,-1,3,5,-1,5.0,Inaccessible


# Checking references deemed inaccessible

We'll take the references deemed inaccessible by the workers and check whether they are indeed inaccessible.

## By task 1 results

For task 1, these would be those with barrier code 0 (hard inaccessibility) and 1-3 (soft inaccessibility).

Code 4 is not inaccessibility, but irrelevancy due to inability to digest the content.

Codes 5 and 6 are not inaccessibility, but irrelevancy due to inability to find the content after digesting it.

So we'll check codes 0-3 and see if they really reflect the website's content, or if the workers misunderstood the concept of inaccessibility.
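
The grouping of barrier codes described above can be written down as a small lookup, which may make the per-code cells further down easier to follow. This is only a sketch: the names `BARRIER_CATEGORIES`, `barrier_category` and `is_inaccessible` are hypothetical, and the labels paraphrase the description above rather than any official codebook.

```python
# Hypothetical lookup that paraphrases the task 1 barrier codes described above.
BARRIER_CATEGORIES = {
    0: "hard inaccessibility",
    1: "soft inaccessibility",
    2: "soft inaccessibility",
    3: "soft inaccessibility",
    4: "irrelevant: content could not be digested",
    5: "irrelevant: content not found after digesting it",
    6: "irrelevant: content not found after digesting it",
}


def barrier_category(code):
    """Return the category for a barrier code, or 'other' for any other value."""
    return BARRIER_CATEGORIES.get(code, "other")


def is_inaccessible(code):
    """True only for codes 0-3, the ones checked in this section."""
    return code in (0, 1, 2, 3)
```

For example, `barrier_category(2)` gives `"soft inaccessibility"`, while `is_inaccessible(4)` is `False` because code 4 marks irrelevancy.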

Let's look at some of the feedback we got first, so that we are informed about the workers' reasoning:
1. "On 5/6, the object was \"Ministerio de la Presidencia, Relaciones con las Cortes y Memoria Democrática\" and what appeared in the Wikipedia page was \"Ministerio de la Presidencia, Relaciones con las Cortes e Igualdad\". Which turns out is the same thing, but I selected 'No' because it didn't appear as stated here."
2. "On some of the questions, where the object is a Wikimedia Commons image, the URL given is not exactly the URL of the image in the Wikipedia article, but the images are the same, so I selected 'Yes'."
3. "The \"Podemos\" (last reference) is kinda a tricky one, since \"Podemos\" is a party but it's acronym is the one that shows in the website being \"PTN\", so people might get the wrong idea and say that it the object \"podemos\" doesn't show up in the page, when it actually does, but with a different name."
4. "Subject: Aristoteles Jose de Souza Silva\rPredicate: ocupação\rObject: mercador\r\rThe word \"mercador\" is kinda obsolete being used \"Comerciante\" instead, they are synonyms, but still... could confuse some people"
5. "Task is very good, only think i disliked is that the link for museum of São Paulo isn't to the OBJECTS PAGE but for the \"ICONOGRAFIA\" which made me on the first hits don't find what was asked, later navigating through the page i understood how to do it! Also would be good to not lose all the work when refreshing the page or closing it by accident. Thanks for the good value and interesting task!"
6. "https://collectie.nederlandsfotomuseum.nl/fotografen/detail/8f5e3c43-e6fe-1b66-7db7-7a0f1f0fff61 all data object name etc where there however it is not clear is there is an actual work of this photographer in the collection. All other data are lacking"
7. "Het steegje is infindable http://resolver.mskgent.be/collection/work/data/1978-M this link did not work as well.\rHad to navigate extra for other one:\rhttp://www.vlaamsekunstcollectie.be/collection.aspx?p=0848cab7-2776-4648-9003-25957707491a&inv=1850-C"
8. "I have to disagree with question 4: the object in the task was in English, but the English title was nowhere to be found on the website."
9. "RUDOLF BOS\tMechanisch recht\t00003788583\tComponist\t1\tSTEMRA\r\rOfficial name apparently Rudolf and \"calling name\" Ruud"
10. "2: In the text it was specified that the building was started in July 1850 and was finished in 1851, so the object 1850-01-01 is incorrect."
11. "6: an \"arrangeur\" adapts music; the object was the person who did the lights though."
12. "For questions 2 and 4 I answered YES, but they had multiple answers. For question 2, another answer (Nijmegen) would have been more correct."
13. "https://www.mleuven.be/nl seems not te be able to find anything on it not even with search option\rhttps://www.gidsvoornederland.nl/werken-met-gids/meerwaarde-voor-bibliotheken/bibliotheken-in-nederland?pi=364&organisation_id=199527&startindex=1250 the subject was mentioned here, but object and predicate where not. No search option on site\rFor other one i had to navigate to additional site: https://viaf.org/viaf/280808699/viaf.xml"
14. "https://hart.amsterdam/collectie/object/amcollect/38168 for the dating of this one, a wide range of years is given \r\r1837 – 1897 it implies that they do not know for sure the given year. Taking the average here just does not do it justice."
15. "OS-I-167 so object was close but lacking a minus sign"
16. "Object 5 wasn't wrong, just very incomplete"
17. "https://collectie.nederlandsfotomuseum.nl/fotografen/detail/8f5e3c43-e6fe-1b66-7db7-7a0f1f0fff61 it is not clear if there are works from him, they are not visible here.\rhttp://data.kunsten.be/productions/437143 she is mentioned as actor"
18. "https://data.collectienederland.nl/page/aggregation/catharijneconvent/BMH-s3153 the data provider is Museum Catharijnenconvent it is not completely clear that this is the actual location for th object"
19. "ADVN - archief voor nationale bewegingen (volledig) Just approximately it :)"
20. "https://rkd.nl/nl/explore/artists/8719 was born in Suriname technically long time nationality would be there indeed Kingdom of the Netherland or Dutch nationality. In current time he'd have their own nationality i guess..\rhttps://rkd.nl/nl/explore/images/271056 the letter G is lacking before the inventorynumber"
21. "1600 - 1609 was the date of the maria portret thing, not an exact date but an estimation of the possible date. Tasking 1604 as average is a bit unusual, making it seem like the date is exactly known, while in fact it is not."
22. "Number 6: the full name wasn't given (middle name was missing)"
23. "Searching for Verrusalem on that site also gives no results:\rhttps://www.mleuven.be/nl/search?search=Portret%20van%20ridder%20Verrusalem\r\rhttps://www.ecal.nu/?title=archief&mivast=26&mizig=210&miadt=26&micode=0263&miview=inv2\rIs not just called Winterswijk but Gemeentebestuur Winterswijk it is about the municipality the board of the city, not just the city of Winterswijk"
24. "https://www.franshalsmuseum.nl/nl/search-results/query/evert+doublet could not find this even when searching webpage.\rusually when measuring width length etc you set the unit of measurement with it breedte 138.5 cm  or 138.5cm not just 138.5 cause then no one really knows what size it actually is.\rhttps://rkd.nl/nl/explore/artists/265464 only his birth year is mentioned not the actual date."
25. "Object 3 has \"cm\" missing, for object 6 I could only find the year, while you stated a specific date."
26. "https://collectie.nederlandsfotomuseum.nl/fotografen/detail/78d5cd8c-e602-081e-a9ae-1527f8eb6c2a photographer is mentioned by name only, no visible works are found when you click further, so not so sure if they actually have works from him in the collection or just have his name on the webpage. It just isn't clear."
27. "on of them had a very unclear birth date: \rantwerpen 1583/1585\rca. 1583 (Saur 2013); 1585 (DataBnF); Grivel 1986\rFurthermore this one where Amsterdam was object as workplace it  was only there for 1 year, for many years he worked at Nijmegen i think 15 + years"
28. "it keep refering me to https://www.mskgent.be/nl this site instead of the link i clicked need to paste additional info in url of browser to actually see the information"
29. "https://rkd.nl/nl/explore/artists/463883 his nationality is Dutch, the answer as posed said. Nationality: Kingdom of the Netherlands. Which is the Nation, not necessarily the nationality as mantioned on the website :) though deduction one can determine that what is mentioned is that nationality = Nederlands"
30. "Those for which you say you know the answer to the question, can still be found if you think a bit:\rhttp://www.vlaamsekunstcollectie.be/collection.aspx?p=0848cab7-2776-4648-9003-25957707491a&inv=2005-N\rIt is just on a different website than the url mentioned, but still findable"
31. "https://collectie.nederlandsfotomuseum.nl/fotografen/detail/8f5e3c43-e6fe-1b66-7db7-7a0f1f0fff61\rThe Photographers it is unclear whether or not there is actual work of them in this musuem, it is not visible at least. All information about the object was present on the website though."
32. "http://www.vlaamsekunstcollectie.be/collection.aspx?p=0848cab7-2776-4648-9003-25957707491a&inv=2008-E-165 had to navigate to this website that is not using any certificate in its browser window, to find the object info\rSame for  https://hart.amsterdam/collectie/object/amcollect/38882\radditional url that i had to use to find object info"
33. "https://www.mleuven.be/nl/search?search=Henri%20Doupagne searched on the Belgian website, could not find anything"
34. "https://www.rijksmuseum.nl/nl/collectie/SK-A-89 the height was 18,5 cm \r\rfor http://resolver.mskgent.be/collection/work/data/1978-M need to go to \rhttp://www.vlaamsekunstcollectie.be/collection.aspx?p=0848cab7-2776-4648-9003-25957707491a&inv=1883-G to actually see information"
35. "https://viaf.org/viaf/50344159/ not clear what ULAN code is, did see the same number JPG|500084702 can't tell whether this is an ULAN number or not."
36. "http://data.kunsten.be/organisations/cc-de-velinx de Velinx is the space where it is held not the production company\rhttps://www.mskgent.be/nl/collectie can't really find it \"het steegje\" in this website."
37. "https://data.collectienederland.nl/page/aggregation/rijkscollectie-rce/K112 the object 3.2 it is not the thickness but the depth: diepte: 3.2 cm Lijst  of the List of the painting."
38. "Yahoo Japan link always gives me this message and I have to click on the link and search the subject. サービス終了のお知らせ\rいつもYahoo! JAPANのサービスをご利用いただき誠にありがとうございます。\rお客様がアクセスされたサービスは本日までにサービスを終了いたしました。\r今後ともYahoo! JAPANのサービスをご愛顧くださいますよう、よろしくお願いいたします"
39. "Part 4: Object was 宋廷芬 rather than 宋庭芬. It seemed to me that I should still count it since it's just an error"
40. "The last one about ブレイディ・テネル used the alternate spelling of ブレイディー on the site."
41. "In 2, the answer was abbreviated but I think it was obvious so I still counted it."
42. "The one i could not find, i even searched for the \"de zaaier\" https://www.mleuven.be/nl/search?search=de%20zaaier\rdoes not return any hits"
43. "In question 6, I would have picked a different date: the average of the two dates given"
44. "https://rkd.nl/nl/explore/artists/36849 born 1919-1939 they say approximately aroun 1929 \rhttps://data.collectienederland.nl/page/aggregation/rijkscollectie-rce/SZ60224 the website is from RCE but not clear if the artwork is located in their depot, that is not mentioned anywhere explicitly"
45. "Last question was about the Wereldmuseum. It is clear that the collection belongs to that, just not clear if the Wereldmuseum is actually located in the city of Rotterdam. Would have to google to find which city it is actually located in."
46. "With question 6, I only found the year on the website, and the Object was a specific date in February."


With this feedback, here is what we can conclude about why workers might claim that information is either inaccessible or irrelevant:

- **Surface forms**: When things are the *same entity* but have *different names*, workers might consider the reference relevant or irrelevant based on personal opinion, e.g. feedback 1, 3, 4, 39
	- This includes when things are the same but in *different languages*, e.g. feedback 8
	- This also includes *nicknames* or incomplete names, e.g. feedback 9, 22
	- This also includes *slight variations*, e.g. feedback 19, 40, 41
	- This even includes errors, e.g. feedback 39
	- Stretching the concept a bit: when the object is a URL and the URL in the reference is not the same but *what they represent is*, workers might consider it relevant, e.g. feedback 2
- **Learning curve**: Workers might not find information at a specific domain in one task, but find it later after learning how to navigate the site, e.g. feedback 5
- **Insufficient proof**: Workers might want more convincing or explicit proof of the statements than what is in the reference, e.g. feedback 6, 17, 18, 26, 31, 44, 45
- **Insufficient information**: Information in references is sometimes partial, e.g. feedback 24, 25, 27, 34
- **Insufficient context**: Some references lack the context workers need to understand them, e.g. feedback 35
- **Use of common sense**: Workers might dispute or confirm the information based on their common sense, e.g. feedback 20, 21, 23, 27, 29, 36, 43, 45
- **Date format mismatch**: Wikidata sometimes stores only the *year* as a date, even for predicates such as 'inception' or 'creation'. When the reference carries the whole date, workers might disagree because, while the year is correct, the day and month shown for the object are not the ones in the reference, e.g. feedback 10
	- The opposite might happen too, if the object in Wikidata contains the full date but the reference only has the year, e.g. feedback 24, 25, 46
- **Wrong or missing information**: Some information is *simply wrong or missing*, e.g. feedback 11, 14, 21, 33, 34, 37, 38, 42
- **Slightly wrong identifiers**: Some IDs are marked as incorrect by workers because of *minor differences*, e.g. feedback 15, 20
- **Worker misunderstanding**: Workers might wrongly think that the object should contain *all possible answers* or *the most correct answer* to the subject-predicate question, e.g. feedback 12, 16
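
The date format mismatch described in the list above can be illustrated with a small comparison that ignores the fields below a given precision. This is a sketch under stated assumptions: it relies on Wikidata's precision codes (9 = year, 10 = month, 11 = day), it only handles positive (CE) years in the `+YYYY-MM-DD...` form seen in the sample rows, and the function names are hypothetical.

```python
from datetime import date

# Wikidata time precision codes: 9 = year, 10 = month, 11 = day.
YEAR, MONTH, DAY = 9, 10, 11


def parse_wikidata_time(time_string):
    """Split a time such as '+2017-11-26T00:00:00Z' into (year, month, day).

    Assumption: positive (CE) years only; negative years are not handled.
    """
    date_part = time_string.lstrip("+").split("T")[0]
    year, month, day = (int(part) for part in date_part.split("-"))
    return year, month, day


def agrees_at_precision(time_string, precision, reference_date):
    """Compare with a reference date, ignoring fields finer than the precision."""
    year, month, day = parse_wikidata_time(time_string)
    if year != reference_date.year:
        return False
    if precision >= MONTH and month != reference_date.month:
        return False
    if precision >= DAY and day != reference_date.day:
        return False
    return True
```

With a year-only object rendered as 1 January 1850 and a reference that says July 1850, `agrees_at_precision("+1850-01-01T00:00:00Z", YEAR, date(1850, 7, 1))` holds, while the same call with `DAY` does not. A worker who reads the rendered date literally is effectively doing the second comparison.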

It is important to note that, when unable to find the information:
- Some workers might look for the information on their own and find it elsewhere, e.g. feedback 7, 30, 32
- Some workers might instead give up at the first redirection, e.g. feedback 28

In [4]:
task_1_inaccessible_0 = sorted(final_annotation_set[
    (final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 0)
]['url'].unique())

Reference URLs with barrier code 0:
- http://acervo.mp.usp.br/IconografiaV2.aspx: OK, Leads to a search page which is difficult and vague to use,
- http://data.collectienederland.nl/resource/aggregation/fries-museum/328: OK, gave me a 404 error at one time,
- http://geonames.nga.mil/: OK, Is sometimes not accessible due to security issues (see feedback for task 2),
- http://mleuven.be/collection/data/LP_79: OK, Redirects to home,
- http://nvpub.vic-metria.nu/naturvardsregistret/rest/omrade/2001394/G%C3%A4llande: OK, was giving 404 error at one point, but stopped. Also, is a JSON.
- http://nvpub.vic-metria.nu/naturvardsregistret/rest/omrade/2001958/G%C3%A4llande: OK, was giving 404 error at one point, but stopped. Also, is a JSON.
- http://resolver.smak.be/collection/work/data/1581: OK, Is a nonexisting item in the smak,
- http://resolver.smak.be/collection/work/data/3289: OK, Is a nonexisting item in the smak,
- http://resolver.smak.be/collection/work/data/5509: OK, Is a nonexisting item in the smak,
- http://talent.yahoo.co.jp/pf/detail/pp11116: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp13199: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp16393: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp169519: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp201611: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp203257: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp213709: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp234365: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp242286: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp246352: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp276773: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp280474: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp315692: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp3195: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp4524: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp4735: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp832: OK, Yahoo Japan has discontinued this service,
- http://talent.yahoo.co.jp/pf/detail/pp932: OK, Yahoo Japan has discontinued this service,
- http://www.bumastemra.nl/over-bumastemra/titelcatalogus/: OK, Is actually a search page, might have been off for some workers. See the first link in this list.
- http://www.magrama.gob.es/es/prensa/noticias/derribada-la-presa-de-robledo-de-chavela-(madrid)-sobre-el-r%C3%ADo-cofio-la-m%C3%A1s-alta-desmantelada-en-espa%C3%B1a-con-casi-23-metros-de-altura/tcm7-345305-16: OK, Is off,
- https://sok.riksarkivet.se/bildvisning/00196550_00387#?c=&m=&s=&cv=386&xywh=237%2C2761%2C2397%2C1724: NOT OK, Is working for me, but might have been off during the task, or people did not understand the scribbles and did not know what to answer in this case
- https://sok.riksarkivet.se/bildvisning/A0012916_00057#?c=&m=&s=&cv=56&xywh=1358%2C1852%2C2615%2C1489: NOT OK, Is working for me, but might have been off during the task, or people did not understand the scribbles and did not know what to answer in this case
- https://sok.riksarkivet.se/bildvisning/F0000678_00018#?c=&m=&s=&cv=17&xywh=615%2C3594%2C2474%2C1758: NOT OK, Is working for me, but might have been off during the task, or people did not understand the scribbles and did not know what to answer in this case
- https://sync.nm.delorean.se/objects?id=212705: MAYBE OK, Is a JSON and some people might not identify this as being a website
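
The notes in the list above were written after opening each URL by hand. A first automated pass could look like the sketch below, which labels a fetch as an HTTP error, a redirect to a different address, or reachable. It uses the standard library's `urllib` instead of the `requests` import at the top of the notebook so that it has no extra dependencies. The function names and labels are hypothetical, and a page that loads normally but only shows a notice (such as the discontinued-service message workers reported) may still come out as reachable, so manual inspection remains necessary.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse


def _host_and_path(url):
    """Reduce a URL to (host without 'www.', path without trailing slash)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host, parsed.path.rstrip("/")


def classify_outcome(requested_url, status_code, final_url):
    """Label a fetch result; status_code is None when no response arrived."""
    if status_code is None:
        return "unreachable"
    if status_code >= 400:
        return f"http error {status_code}"
    # A scheme upgrade or an added 'www.' alone is not counted as a redirect.
    if _host_and_path(requested_url) != _host_and_path(final_url):
        return f"redirected to {final_url}"
    return "reachable"


def fetch_outcome(url, timeout=10):
    """Fetch a URL (needs network access) and classify what happened."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_outcome(url, response.status, response.geturl())
    except urllib.error.HTTPError as error:
        return classify_outcome(url, error.code, url)
    except (urllib.error.URLError, OSError):
        return classify_outcome(url, None, url)
```

Something like `{url: fetch_outcome(url) for url in task_1_inaccessible_0}` would then give a starting point for the manual notes. Results can change over time, as several entries above that gave a 404 only temporarily show.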

In [13]:
task_1_inaccessible_1 = sorted(final_annotation_set[
    (final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 1)
]['url'].unique())

Reference URLs with barrier code 1:
- http://geonames.nga.mil/: OK, Sometimes gives security warnings

In [6]:
task_1_inaccessible_2 = sorted(final_annotation_set[
    (final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 2)
]['url'].unique())

Reference URLs with barrier code 2:
- http://patrimonio.ipac.ba.gov.br/bem/antigo-hospital-portugues-e-jardins: OK, Raises permission error

In [7]:
task_1_inaccessible_3 = sorted(final_annotation_set[
    (final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 3)
]['url'].unique())

Reference URLs with barrier code 3:
- https://www.ethnologue.com/language/luj: OK, Required payment

### Sanity check: references that are accessible in task 1 but irrelevant

In [8]:
task_1_inaccessible_4 = sorted(final_annotation_set[
    (final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 4)
]['url'].unique())

Reference URLs with barrier code 4:
- http://purl.uniprot.org/uniprot/E8TGH3: These four are all hard to understand without domain knowledge,
- http://purl.uniprot.org/uniprot/P95187,
- http://purl.uniprot.org/uniprot/Q7W0C7,
- http://purl.uniprot.org/uniprot/Q9BXT6

In [82]:
task_1_inaccessible_5 = sorted(final_annotation_set[
    (final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 5)
]['url'].unique())

# NOT MUCH TO SAY ABOUT THEM, AS IT WOULD REQUIRE CHECKING AGAINST THE TRIPLE STATEMENT

Reference URLs with barrier code 5:

- http://acervo.mp.usp.br/IconografiaV2.aspx
- http://alo.co/actualidad-y-entretenimiento/morella-zuleta-y-su-hija-ilenia-antonine
- http://am.adlibhosting.com/amonline/details/collect/38882
- http://dbe.rah.es/biografias/18334/juan-de-arce-o-arze
- http://europepmc.org/abstract/MED/28959891
- http://germanyinnyc.org/index.php?section=catevent&cat_evt_id=1795&cat_id=9
- http://girp.uma.pt/download/Curriculo%20breve-Rcarita-2016-s%C3%A9%20do%20Funchal.pdf
- http://id.worldcat.org/fast/255136
- http://identifiers.org/ensembl/FBpp0309278
- http://kulturarvsdata.se/raa/fornvannen/html/1994_233
- http://mleuven.be/collection/data/LP_100: These are all redirecting to home
- http://mleuven.be/collection/data/LP_372
- http://mleuven.be/collection/work/data/C_420
- http://mleuven.be/collection/work/data/S_27_B
- http://mleuven.be/collection/work/id/C_254
- http://mskgent.be/collection/work/data/1883-G: These are all redirecting to home
- http://mskgent.be/collection/work/data/1950-AA-6
- http://mskgent.be/collection/work/data/1978-M
- http://mskgent.be/collection/work/data/1998-B-82
- http://mskgent.be/collection/work/data/2008-E-148
- http://mskgent.be/collection/work/data/2009-AW
- http://mskgent.be/collection/work/data/2013-BT
- http://mskgent.be/collection/work/id/1850-C
- http://mskgent.be/collection/work/id/1950-AA-9
- http://mskgent.be/collection/work/id/1978-K
- http://mskgent.be/collection/work/id/1982-K-72
- http://mskgent.be/collection/work/id/1984-G-9
- http://mskgent.be/collection/work/id/2003-P
- http://mskgent.be/collection/work/id/2005-N
- http://mskgent.be/collection/work/id/2008-E-165
- http://mskgent.be/collection/work/id/2013-SD
- http://mskgent.be/collection/work/id/2013-U
- http://mubevirtual.com.br/pt_br?Dados&area=ver&id=111
- http://nvpub.vic-metria.nu/naturvardsregistret/rest/omrade/2001193/G%C3%A4llande
- http://purl.uniprot.org/uniprot/A0A0E0Y2Z7
- http://web.archive.org/web/20190208034730/http://cemdp.sdh.gov.br/modules/desaparecidos/acervo/ficha/cid/259
- http://www.biodiversitylibrary.org/bibliography/139004
- http://www.biografischportaal.nl/persoon/00451338?
- http://www.biografischportaal.nl/persoon/50223056?
- http://www.city.mobara.chiba.jp/0000000511.html
- http://www.congreso.es/portal/page/portal/Congreso/Congreso/Diputados/BusqForm?_piref73_1333155_73_1333154_1333154.next_page=/wc/fichaDiputado?idDiputado=140&idLegislatura=13
- http://www.congreso.es/portal/page/portal/Congreso/Congreso/Iniciativas?_piref73_2148295_73_1335437_1335437.next_page=/wc/servidorCGI&CMD=VERLST&BASE=DIPH&FMT=DIPHXD1S.fmt&DOCS=2-2&DOCORDER=FIFO&OPDEF=Y&NUM1=&DES1=&QUERY=%2841230%29.NDIP.
- http://www.ebi.ac.uk/QuickGO/annotations?geneProductId=UniProtKB:Q62839
- http://www.ebi.ac.uk/QuickGO/annotations?geneProductId=UniProtKB:Q8CJP2
- http://www.ebi.ac.uk/interpro/protein/A0A0H3CDN6
- http://www.ebi.ac.uk/interpro/protein/Q180S9
- http://www.ebi.ac.uk/interpro/protein/Q7NG95
- http://www.ebi.ac.uk/interpro/protein/Q9X279
- http://www.franshalsmuseum.nl/nl/collectie/zoeken-de-collectie/portret-van-evert-doublet-als-kind-563/: Back to home
- http://www.hpip.org/Default/pt/Homepage/Obra?a=2287: Back to home
- http://www.imae.co.jp/works/item/18-2015-09-04-11-17-24.html
- http://www.lavanguardia.com/
- http://www.viaf.org/viaf/50344159/
- https://api.crossref.org/works/10.1080%2F13691050802233454
- https://dialnet.unirioja.es/servlet/articulo?codigo=2691318
- https://es.wikipedia.org/w/index.php?title=Azumi_Asakura&oldid=114132230
- https://especiais.gazetadopovo.com.br/eleicoes/2018/resultados/
- https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pmc&linkname=pmc_refs_pubmed&retmode=json&id=4413150
- https://id.loc.gov/authorities/n88026327
- https://ja.wikipedia.org/w/index.php?title=アクリルアミド&oldid=67932077
- https://libris.kb.se/auth/278188
- https://ndclist.com/ndc/59212-700
- https://nl.wikipedia.org/w/index.php?title=Arge_enodis&oldid=41510703
- https://s.maho.jp/profile/629be1g88c08789d/
- https://sync.nm.delorean.se/objects?id=16539
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=be-bru&srlanguage=nl&srid=2043-0589/0: These are all difficult to read and understand, and their content might be inside the wiki link
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=cl&srlang=es&srid=357
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0001519
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=se-bbr&srlanguage=sv&srid=21300000013264
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=se-bbr&srlanguage=sv&srid=21300000014535
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=se-fornmin&srlanguage=sv&srid=10082400490001
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=sv&srlang=es&srid=02100012
- https://uma-furusato.com/i_search/detail_horse/_id_0000262926
- https://viaf.org/viaf/253351577
- https://w3id.org/oc/corpus/br/479749.html
- https://www.cuitonline.com/detalle/27223718677/grinspan-javier-mauro-isaac.html
- https://www.gidsvoornederland.nl/werken-met-gids/meerwaarde-voor-bibliotheken/bibliotheken-in-nederland?pi=364&organisation_id=199527&startindex=1250
- https://www.grid.ac/institutes/grid.475802.c
- https://www.ncbi.nlm.nih.gov/gene/19832484
- https://www.ncbi.nlm.nih.gov/gene/3169437
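
Because a long list like the one above is dominated by a few sites, counting URLs per host can help decide which domains to inspect first. The helper below is a minimal sketch using only the standard library; the name `count_by_host` is hypothetical, and treating `www.example.org` and `example.org` as the same host is an assumption made for convenience.

```python
from collections import Counter
from urllib.parse import urlparse


def count_by_host(urls):
    """Count URLs per host, ignoring case and a leading 'www.'."""
    hosts = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        hosts.append(host[4:] if host.startswith("www.") else host)
    return Counter(hosts)
```

Calling `count_by_host(task_1_inaccessible_5).most_common(5)` would list the five most frequent hosts among the code 5 references.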

In [10]:
task_1_inaccessible_6 = sorted(final_annotation_set[
    (final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 6)
]['url'].unique())

# NOT MUCH TO SAY ABOUT THEM, AS IT WOULD REQUIRE CHECKING AGAINST THE TRIPLE STATEMENT

Reference URLs with barrier code 6:

- http://acervo.mp.usp.br/IconografiaV2.aspx: Local search engine
- http://am.adlibhosting.com/amonline/details/collect/107867
- http://am.adlibhosting.com/amonline/details/collect/38168
- http://cadenaser.com/ser/2018/08/08/politica/1533758734_279512.html?ssm=tw
- http://collectie.nederlandsfotomuseum.nl/fotografen/detail/185a2c7c-d980-9c73-c249-fb3c07748c42
- http://collectie.nederlandsfotomuseum.nl/fotografen/detail/78d5cd8c-e602-081e-a9ae-1527f8eb6c2a
- http://data.collectienederland.nl/resource/aggregation/catharijneconvent/ABM-s91
- http://dbe.rah.es/biografias/67026/manuel-cantos
- http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2016/2/46256/130000032320
- http://hoogleraren.ub.rug.nl/hoogleraren/1197
- http://id.worldcat.org/fast/25418
- http://nvpub.vic-metria.nu/naturvardsregistret/rest/omrade/2000154/G%C3%A4llande
- http://nvpub.vic-metria.nu/naturvardsregistret/rest/omrade/2005093/G%C3%A4llande
- http://rkd.nl/explore/artists/41135
- http://vanherck.collectionkbf.be/nl/Ontwerp-voor-de-versiering-van-een-galerij
- http://www.archiefbank.be/dlnk/AE_3401
- http://www.azumi-ghp.jp/about/history/
- http://www.biodiversitylibrary.org/bibliography/77712
- http://www.congreso.es/portal/page/portal/Congreso/Congreso/Diputados/BusqForm?_piref73_1333155_73_1333154_1333154.next_page=/wc/fichaDiputado?idDiputado=276&idLegislatura=13
- http://www.ebi.ac.uk/QuickGO/annotations?geneProductId=UniProtKB:A0A0H3CNZ8
- http://www.pref.yamagata.jp/ou/kikakushinko/020052/tokei/copy_of_jinkm.html
- http://www.sdk.co.jp/about/corporate/outline.html
- https://api.crossref.org/works/10.1177%2F0898264313518066
- https://data.collectienederland.nl/resource/aggregation/rijkscollectie-rce/NK1557
- https://data.collectienederland.nl/resource/aggregation/rijkscollectie-rce/SZ60224
- https://en.wikipedia.org/w/index.php?title=The_Art_of_Jazz:_Live_in_Leverkusen&oldid=794357271
- https://es.wikipedia.org/w/index.php?title=Estación_de_Las_Retamas&oldid=112148354
- https://es.wikipedia.org/w/index.php?title=Ricardo_Ciciliano&oldid=118501964
- https://es.wikipedia.org/w/index.php?title=Saint_Seiya_Tenkai-hen_~Overture~&oldid=120649477
- https://pt.wikipedia.org/w/index.php?title=Eu_Sou_Mais_Eu&oldid=54229920: IDs like IMDB are seen in the footer, but to get the ID one needs to follow the link or hover over it and read the URL (the ID is often there)
- https://rkd.nl/explore/artists/265464
- https://rkd.nl/explore/artists/36849
- https://sv.wikipedia.org/w/index.php?title=Herråkra_socken&oldid=43177855
- https://sync.nm.delorean.se/objects?id=14990
- https://sync.nm.delorean.se/objects?id=15536
- https://sync.nm.delorean.se/objects?id=15787
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=cl&srlang=es&srid=749
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=se-fornmin&srlanguage=sv&srid=10119200010001
- https://www.brasilianaiconografica.art.br/obras/17624/desmodium: There are two identifiers for two different systems, and this confuses the reader
- https://www.gemeentemuseum.nl/nl/collectie/landschap-bij-dekkersduin
- https://www.genealogics.org/getperson.php?personID=I00348924&tree=LEO: Dates often are only the year in Wikidata, which then become 1/1/YEAR when transformed, which is often not the same as the actual date shown in the references
- https://www.mfa.org/collections/object/william-lloyd-garrison-30992: Here, people were really divided, all voting for a different barrier. Maybe interpreting that the title of something is that same something is a really subjective concept.
- https://www.museodelprado.es/coleccion/obra-de-arte/marina/13a947d7-65d6-4046-9790-e384a8253389
- https://www.museodelprado.es/en/the-collection/art-work/the-surrender-of-julich/2765e3e7-36ef-4e14-9f24-50313835f81c
- https://www.ncbi.nlm.nih.gov/gene/3082014
- https://www.ncbi.nlm.nih.gov/gene/3718006
- https://www.ncbi.nlm.nih.gov/gene/7789707
- https://www.rijksmuseum.nl/nl/collectie/SK-A-89

## By task 2 results

In [16]:
task_2_inaccessible = sorted(final_annotation_set[
    (final_annotation_set['author'] == 3) |
    (final_annotation_set['publisher'] == 5)
]['url'].unique())
task_2_inaccessible

Some feedback we got on the task, before looking at the references:

- "My computer security software prevents me from accessing geonames.nga.mil web addresses."
- "I cannot access geonames.nga.mil on my computer. It is loaned to me by employer."
- "I am working on a laptop provided by my employer. Links to NGA resources are blocked by my web filter. That is the reason I could not access NGA.mil or .gov websites."
- "https://api.crossref.org/works/10.1111%2FCBDD.12883  looks like a load of broken JSON, but I didn't look at it too hard. I am not able to see the site itself, but it seems to be about neuroscience journals like many of the other sites in this task."
- "Here's another broken JSON page: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pmc&linkname=pmc_refs_pubmed&retmode=json&id=3722411"
- "https://api.crossref.org/works/10.1111%2FPALA.12330 and https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=EXT_ID:22832743%20AND%20SRC:MED&resulttype=core&format=json appear as JSON code. I am not sure how to classify them and I have been choosing the option that indicates the site is not accessible when I run into sites like this."
- "Some websites didn't loaded at all specially two related to:  https://tools.wmflabs.org"
- "http://data.collectienederland.nl/resource/aggregation/fries-museum/328 This resource produced a 404 page not found in one of the questions which I had not seen before."
- "The Yahoo Talent website is no longer in service, the link leads to an 'ending service' message (サービス終了のお知らせ). This is the link: http://talent.yahoo.co.jp/pf/detail/pp13199"

Based on this, we can gather that:

- geonames.nga.mil is blocked for some workers by security software or web filters
- JSON endpoints such as api.crossref.org, eutils.ncbi.nlm.nih.gov and tools.wmflabs.org are seen as "broken JSON": most workers probably did not understand them, or mistook them for pages that failed to load, because they lacked context. **We'll replace these annotations with our own.**
- data.collectienederland.nl was giving 404 errors at the time
- Yahoo Talent has been discontinued
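Conditions such as a 404 or an unreachable host can change over time, so it may help to re-check a URL's status when reviewing these notes. The following is a minimal sketch only: `http_status` and its `opener` parameter are hypothetical names, it uses the standard library rather than the `requests` import at the top of this notebook, and it assumes the server answers `HEAD` requests (some do not).

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, or None if no response arrives.

    `opener` is injectable so the function can be exercised without network
    access; by default it is urllib.request.urlopen.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered with an error status, e.g. 404 for a missing page.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, blocked host, and so on.
        return None


# Example call (requires network access, so it is left commented out):
# http_status("http://resolver.smak.be/collection/work/data/1581")
```

A `None` result here only says that this machine could not reach the host at this moment; as the worker feedback shows, the same URL may load for someone else.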

The references:

- http://data.collectienederland.nl/resource/aggregation/fries-museum/328: OK, from the feedback of one of the workers it seems as if this page was giving 404 errors at the time of the task run
- http://geonames.nga.mil/: OK, sometimes does not resolve due to security issues
- http://mubevirtual.com.br/pt_br?Dados&area=ver&id=111: OK, takes you to another random page
- http://resolver.smak.be/collection/work/data/1581: OK, 404
- http://resolver.smak.be/collection/work/data/3289: OK, 404
- http://resolver.smak.be/collection/work/data/5509: OK, 404
- http://talent.yahoo.co.jp/pf/detail/pp11116: OK, talent yahoo has been deactivated
- http://talent.yahoo.co.jp/pf/detail/pp16393: OK, talent yahoo has been deactivated
- http://talent.yahoo.co.jp/pf/detail/pp169519: OK, talent yahoo has been deactivated
- http://talent.yahoo.co.jp/pf/detail/pp201611: OK, talent yahoo has been deactivated
- http://talent.yahoo.co.jp/pf/detail/pp203257: OK, talent yahoo has been deactivated
- http://talent.yahoo.co.jp/pf/detail/pp213709: OK, talent yahoo has been deactivated
- http://talent.yahoo.co.jp/pf/detail/pp234365: OK, talent yahoo has been deactivated
- http://talent.yahoo.co.jp/pf/detail/pp242286: OK, talent yahoo has been deactivated
- http://talent.yahoo.co.jp/pf/detail/pp246352: OK, talent yahoo has been deactivated
- http://talent.yahoo.co.jp/pf/detail/pp276773: OK, talent yahoo has been deactivated
- http://talent.yahoo.co.jp/pf/detail/pp280474: OK, talent yahoo has been deactivated
- http://talent.yahoo.co.jp/pf/detail/pp315692: OK, talent yahoo has been deactivated
- http://talent.yahoo.co.jp/pf/detail/pp3195: OK, talent yahoo has been deactivated
- http://talent.yahoo.co.jp/pf/detail/pp4524: OK, talent yahoo has been deactivated
- http://talent.yahoo.co.jp/pf/detail/pp4735: OK, talent yahoo has been deactivated
- http://talent.yahoo.co.jp/pf/detail/pp832: OK, talent yahoo has been deactivated
- http://talent.yahoo.co.jp/pf/detail/pp932: OK, talent yahoo has been deactivated
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=bo&srlang=es&srid=S/00-014: OK, as this is a JSON file, it might be difficult to work out author and publisher from it, and workers might have felt forced to choose the inaccessible option. The same goes for all tools.wmflabs.org references.
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=cl&srlang=es&srid=357
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=cl&srlang=es&srid=749
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=co&srlang=es&srid=05-089
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0001519
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0001804
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0001819
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0001936
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0003465
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0003846
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0004038
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0004952
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0005116
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0005143
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0005852
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0007685
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0007849
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0008022
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0008043
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0008096
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0009439
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0009885
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-55-0000240
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-55-0000418
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=sv&srlang=es&srid=02100012
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=sv&srlang=es&srid=05170001
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=sv&srlang=es&srid=06140016
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=uy&srlang=es&srid=020-106
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=ve&srlang=es&srid=AMAAUA-0099
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=ve&srlang=es&srid=ARAGIA-0165
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=ve&srlang=es&srid=DELANS-0021
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=ve&srlang=es&srid=FALCOA-0276
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=ve&srlang=es&srid=GUA088
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=ve&srlang=es&srid=LARMON-0540
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=ve&srlang=es&srid=LARTOS-0757
- https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=ve&srlang=es&srid=MERLIR-0343
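Since the list above is dominated by a handful of hosts, a count per hostname makes the pattern easier to see. This is a small sketch under stated assumptions: `count_by_host` is a hypothetical helper, and `sample_urls` is a hand-picked subset of the list above, not the full `task_2_inaccessible` result.

```python
from collections import Counter
from urllib.parse import urlparse


def count_by_host(urls):
    """Count how many URLs belong to each hostname."""
    return Counter(urlparse(url).netloc for url in urls)


# Hand-picked subset of the references listed above, for illustration only.
sample_urls = [
    "http://geonames.nga.mil/",
    "http://talent.yahoo.co.jp/pf/detail/pp11116",
    "http://talent.yahoo.co.jp/pf/detail/pp16393",
    "https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=cl&srlang=es&srid=357",
]

host_counts = count_by_host(sample_urls)
for host, count in host_counts.most_common():
    print(host, count)
```

In the notebook, the same helper could be applied to `task_2_inaccessible` directly.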

## Checking references which are inaccessible in one task but not in the other

Some references might have been voted as inaccessible in one task but not in the other. This might be due to various issues, not only spamming or low-quality annotations. Let's explore them.

### Checking those which are inaccessible in task 2 but accessible in task 1

In [36]:
# WE LOOK AT THOSE WHICH ARE INACCESSIBLE IN TASK 2 BUT GIVEN AS RELEVANT IN TASK 1

task_diff_inaccessible = sorted(final_annotation_set[
    ((final_annotation_set['author'] == 3) &
    (final_annotation_set['publisher'] == 5)) &
    (final_annotation_set['is_present'] == 1)
]['url'].unique())
task_diff_inaccessible

# WE CAN SEE FROM THE LIST THAT THESE ARE GEONAMES.NGA.MIL AND TOOLS.WMFLABS.ORG. THE FIRST GIVES SECURITY ISSUES
# ONLY SOMETIMES, SO SOME PEOPLE MIGHT NOT HAVE GOTTEN THEM, OR IGNORED THEM, AND MANAGED TO FIND INFORMATION
# WHILE OTHERS DID NOT. THE SECOND HAS INFORMATION, BUT THE PAGES ARE JSON AND GIVE NO INDICATION OF AUTHOR OR
# PUBLISHER, SO THE DIFFERENCE HERE MAKES SENSE

['http://geonames.nga.mil/',
 'https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=bo&srlang=es&srid=S/00-014',
 'https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=co&srlang=es&srid=05-089',
 'https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0001804',
 'https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0001819',
 'https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0001936',
 'https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0003465',
 'https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0003846',
 'https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0004038',
 'https://tools.wmflabs.org/her

In [52]:
# WE LOOK AT THOSE WHICH ARE INACCESSIBLE IN TASK 2 BUT GIVEN AS IRRELEVANT IN TASK 1 WITH CODE 4 (STILL ACCESSIBLE)

task_diff_inaccessible = sorted(final_annotation_set[
    ((final_annotation_set['author'] == 3) &
    (final_annotation_set['publisher'] == 5)) &
    ((final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 4))
]['url'].unique())
task_diff_inaccessible

# THIS LIST SHOWS REFERENCES WHICH ARE INACCESSIBLE IN TASK 2 BUT NOT IN TASK 1, WITH CODE 4.
# IT IS EMPTY.

[]

In [55]:
# WE LOOK AT THOSE WHICH ARE INACCESSIBLE IN TASK 2 BUT GIVEN AS IRRELEVANT IN TASK 1 WITH CODE 5 (STILL ACCESSIBLE)

task_diff_inaccessible = sorted(final_annotation_set[
    ((final_annotation_set['author'] == 3) &
    (final_annotation_set['publisher'] == 5)) &
    ((final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 5))
]['url'].unique())
task_diff_inaccessible

# THIS LIST SHOWS REFERENCES WHICH ARE INACCESSIBLE IN TASK 2 BUT NOT IN TASK 1, WITH CODE 5.
# THIS MAKES SENSE, AS IT IS SIMILAR TO THE FIRST CASE (INACCESSIBLE IN TASK 2 BUT RELEVANT IN TASK 1).

['https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=cl&srlang=es&srid=357',
 'https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=es&srlang=es&srid=RI-51-0001519',
 'https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=sv&srlang=es&srid=02100012']

In [57]:
# WE LOOK AT THOSE WHICH ARE INACCESSIBLE IN TASK 2 BUT GIVEN AS IRRELEVANT IN TASK 1 WITH CODE 6 (STILL ACCESSIBLE)

task_diff_inaccessible = sorted(final_annotation_set[
    ((final_annotation_set['author'] == 3) &
    (final_annotation_set['publisher'] == 5)) &
    ((final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 6))
]['url'].unique())
task_diff_inaccessible

# THIS LIST SHOWS REFERENCES WHICH ARE INACCESSIBLE IN TASK 2 BUT NOT IN TASK 1, WITH CODE 6.
# THIS MAKES SENSE, SAME AS ABOVE.

['https://tools.wmflabs.org/heritage/api/api.php?action=search&format=json&srcountry=cl&srlang=es&srid=749']

### For sanity, checking those which are inaccessible in task 2 and also in task 1

In [83]:
# WE LOOK AT THOSE WHICH ARE INACCESSIBLE IN TASK 2 AND GIVEN AS INACCESSIBLE IN TASK 1 WITH CODE 0

task_diff_inaccessible = sorted(final_annotation_set[
    ((final_annotation_set['author'] == 3) &
    (final_annotation_set['publisher'] == 5)) &
    ((final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 0))
]['url'].unique())
task_diff_inaccessible

# THIS LIST SHOWS REFERENCES WHICH ARE INACCESSIBLE IN BOTH TASKS, AND LOOKS CORRECT, AS PER THE CONCLUSIONS
# WE HAVE DRAWN SO FAR

['http://data.collectienederland.nl/resource/aggregation/fries-museum/328',
 'http://geonames.nga.mil/',
 'http://resolver.smak.be/collection/work/data/1581',
 'http://resolver.smak.be/collection/work/data/3289',
 'http://resolver.smak.be/collection/work/data/5509',
 'http://talent.yahoo.co.jp/pf/detail/pp11116',
 'http://talent.yahoo.co.jp/pf/detail/pp16393',
 'http://talent.yahoo.co.jp/pf/detail/pp169519',
 'http://talent.yahoo.co.jp/pf/detail/pp201611',
 'http://talent.yahoo.co.jp/pf/detail/pp203257',
 'http://talent.yahoo.co.jp/pf/detail/pp213709',
 'http://talent.yahoo.co.jp/pf/detail/pp234365',
 'http://talent.yahoo.co.jp/pf/detail/pp242286',
 'http://talent.yahoo.co.jp/pf/detail/pp246352',
 'http://talent.yahoo.co.jp/pf/detail/pp276773',
 'http://talent.yahoo.co.jp/pf/detail/pp280474',
 'http://talent.yahoo.co.jp/pf/detail/pp315692',
 'http://talent.yahoo.co.jp/pf/detail/pp3195',
 'http://talent.yahoo.co.jp/pf/detail/pp4524',
 'http://talent.yahoo.co.jp/pf/detail/pp4735',
 'http

In [84]:
# WE LOOK AT THOSE WHICH ARE INACCESSIBLE IN TASK 2 AND GIVEN AS INACCESSIBLE IN TASK 1 WITH CODE 1

task_diff_inaccessible = sorted(final_annotation_set[
    ((final_annotation_set['author'] == 3) &
    (final_annotation_set['publisher'] == 5)) &
    ((final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 1))
]['url'].unique())
task_diff_inaccessible

# THIS LIST SHOWS REFERENCES WHICH ARE INACCESSIBLE IN TASK 2 AND ALSO IN TASK 1 BUT FOR SECURITY ISSUES.
# IT LOOKS CORRECT.

['http://geonames.nga.mil/']

In [85]:
# WE LOOK AT THOSE WHICH ARE INACCESSIBLE IN TASK 2 AND GIVEN AS INACCESSIBLE IN TASK 1 WITH CODE 2

task_diff_inaccessible = sorted(final_annotation_set[
    ((final_annotation_set['author'] == 3) &
    (final_annotation_set['publisher'] == 5)) &
    ((final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 2))
]['url'].unique())
task_diff_inaccessible

# THIS LIST SHOWS REFERENCES WHICH ARE INACCESSIBLE IN TASK 2 AND ALSO IN TASK 1 BUT FOR CREDENTIAL ISSUES.
# IT IS EMPTY.

[]

In [86]:
# WE LOOK AT THOSE WHICH ARE INACCESSIBLE IN TASK 2 AND GIVEN AS INACCESSIBLE IN TASK 1 WITH CODE 3

task_diff_inaccessible = sorted(final_annotation_set[
    ((final_annotation_set['author'] == 3) &
    (final_annotation_set['publisher'] == 5)) &
    ((final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 3))
]['url'].unique())
task_diff_inaccessible

# THIS LIST SHOWS REFERENCES WHICH ARE INACCESSIBLE IN TASK 2 AND ALSO IN TASK 1 BUT FOR PAYMENT ISSUES.
# IT IS EMPTY.

[]
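The cells above repeat the same filter and change only the Task 1 reason code. As a sketch of how that could be factored out, the hypothetical helper `urls_for_reason` below wraps the same conditions used in this notebook (`author == 3` and `publisher == 5` for Task 2 inaccessibility, `is_present == 0` plus a `reason` code for Task 1). The `toy` frame is invented data for illustration, not rows from the annotation set.

```python
import pandas as pd


def urls_for_reason(df, reason, task2_inaccessible=True):
    """Sorted unique URLs with is_present == 0 and the given Task 1 reason code.

    task2_inaccessible=True keeps rows where author == 3 and publisher == 5;
    False keeps the complement (author != 3 or publisher != 5).
    """
    inaccessible_t2 = (df["author"] == 3) & (df["publisher"] == 5)
    task2_mask = inaccessible_t2 if task2_inaccessible else ~inaccessible_t2
    task1_mask = (df["is_present"] == 0) & (df["reason"] == reason)
    return sorted(df.loc[task2_mask & task1_mask, "url"].unique())


# Invented rows, for illustration only.
toy = pd.DataFrame({
    "url": ["http://a.example/1", "http://b.example/2", "http://a.example/1"],
    "author": [3, 1, 3],
    "publisher": [5, 2, 5],
    "is_present": [0, 0, 0],
    "reason": [0, 0, 0],
})

print(urls_for_reason(toy, 0))
print(urls_for_reason(toy, 0, task2_inaccessible=False))
```

With `final_annotation_set` in place of `toy`, looping over the reason codes would reproduce the lists printed in the individual cells.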

### Checking those which are inaccessible in task 1 but are accessible in task 2

In [63]:
# WE LOOK AT THOSE WHICH ARE INACCESSIBLE IN TASK 1 WITH CODE 0 BUT ACCESSIBLE IN TASK 2

task_diff_inaccessible = sorted(final_annotation_set[
    ((final_annotation_set['author'] != 3) |
    (final_annotation_set['publisher'] != 5)) &
    ((final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 0))
]['url'].unique())
#task_diff_inaccessible

# OK, SEE BELOW

- http://acervo.mp.usp.br/IconografiaV2.aspx: OK, browse screen might have been inaccessible for finding information (difficult) but not to get author/publisher info (obvious)
- http://mleuven.be/collection/data/LP_79: OK, takes you to home screen, so inaccessible for finding information (not there) but not to get author/publisher info (obvious)
- http://nvpub.vic-metria.nu/naturvardsregistret/rest/omrade/2001394/G%C3%A4llande: OK, is a JSON, so might have been badly formatted for those in task 1 and thus inaccessible, with enough information in the URL for author/publisher. Was also giving me 404 at one point.
- http://nvpub.vic-metria.nu/naturvardsregistret/rest/omrade/2001958/G%C3%A4llande: OK, see above
- http://talent.yahoo.co.jp/pf/detail/pp13199: OK, deactivated service but not website, so information not accessible but author/publisher is possible to be inferred
- http://www.bumastemra.nl/over-bumastemra/titelcatalogus/: OK, same as the first one
- http://www.magrama.gob.es/es/prensa/noticias/derribada-la-presa-de-robledo-de-chavela-(madrid)-sobre-el-r%C3%ADo-cofio-la-m%C3%A1s-alta-desmantelada-en-espa%C3%B1a-con-casi-23-metros-de-altura/tcm7-345305-16: OK, same as the yahoo one
- https://sok.riksarkivet.se/bildvisning/00196550_00387#?c=&m=&s=&cv=386&xywh=237%2C2761%2C2397%2C1724: MAYBE OK, might have been difficult to read the scribbles (inaccessible information) but we can be sure of author/publisher easily 
- https://sok.riksarkivet.se/bildvisning/A0012916_00057#?c=&m=&s=&cv=56&xywh=1358%2C1852%2C2615%2C1489: MAYBE OK, see above
- https://sok.riksarkivet.se/bildvisning/F0000678_00018#?c=&m=&s=&cv=17&xywh=615%2C3594%2C2474%2C1758: MAYBE OK, see above
- https://sync.nm.delorean.se/objects?id=212705: MAYBE OK, see nvpub

In [87]:
# WE LOOK AT THOSE WHICH ARE INACCESSIBLE IN TASK 1 WITH CODE 1 BUT ACCESSIBLE IN TASK 2

task_diff_inaccessible = sorted(final_annotation_set[
    ((final_annotation_set['author'] != 3) |
    (final_annotation_set['publisher'] != 5)) &
    ((final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 1))
]['url'].unique())
task_diff_inaccessible

# LIST IS EMPTY. MAKES SENSE, AS SECURITY ISSUES IN TASK 1 WOULD ALSO IMPEDE TASK 2

[]

In [88]:
# WE LOOK AT THOSE WHICH ARE INACCESSIBLE IN TASK 1 WITH CODE 2 BUT ACCESSIBLE IN TASK 2

task_diff_inaccessible = sorted(final_annotation_set[
    ((final_annotation_set['author'] != 3) |
    (final_annotation_set['publisher'] != 5)) &
    ((final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 2))
]['url'].unique())
task_diff_inaccessible

# LOOKS OK, AS THIS LOGIN REQUEST STILL LETS YOU KNOW WHAT THE AUTHOR/PUBLISHER ARE

['http://patrimonio.ipac.ba.gov.br/bem/antigo-hospital-portugues-e-jardins']

In [89]:
# WE LOOK AT THOSE WHICH ARE INACCESSIBLE IN TASK 1 WITH CODE 3 BUT ACCESSIBLE IN TASK 2

task_diff_inaccessible = sorted(final_annotation_set[
    ((final_annotation_set['author'] != 3) |
    (final_annotation_set['publisher'] != 5)) &
    ((final_annotation_set['is_present'] == 0) &
    (final_annotation_set['reason'] == 3))
]['url'].unique())
task_diff_inaccessible

# LOOKS OK, AS THIS PAYMENT REQUEST STILL LETS YOU KNOW WHAT THE AUTHOR/PUBLISHER ARE

['https://www.ethnologue.com/language/luj']

# Conclusion

All things considered, the annotations look healthy, with no major conflicts that cannot be explained by subjectivity or by differences in access time or origin (e.g. a 404 in one case and normal service in the other, or geonames being reachable or not because of security issues).

The only links that need reviewing for the ML part are the tools.wmflabs.org links, with regard to author/publisher. Workers had no context to classify them, but we do.

More specifically, here's what we concluded:

## References voted inaccessible in Task 1

References deemed inaccessible in this group fall into one of these cases:
- A search page (or a redirect to it)
- A redirect to a home page
- Security issues
- A 404 at some point (either reported by workers or encountered by us)
- Discontinued service
- A JSON response: some JSONs are reported as 'not structured' by workers, and the information in them might be hard or impossible to understand or retrieve, prompting some workers to mark them inaccessible
- Permission errors
- Payment required

**We did not identify any references that would not fit here**

## References voted inaccessible in Task 2

References deemed inaccessible in this group fall into one of these cases:
- A redirect to another page
- Security issues
- A 404 page
- Discontinued service: Workers might not assume that the host of the service is the author/publisher and deem the reference inaccessible
- A JSON response (tools.wmflabs.org): JSONs from tools.wmflabs give no indication of which author/publisher created them, the URL does not help much, and they provide no home page. The only way to reach a conclusion would be to know it is a wiki project, or to access the base URL manually.

As we have the context to attribute an author/publisher code to the tools.wmflabs.org JSON references, we'll do so.

**Apart from the JSONs, we did not identify any references that would not fit here**
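A minimal sketch of how the tools.wmflabs.org override could look. Everything specific here is an assumption: `override_wmflabs` is a hypothetical helper, the codes `99` and `98` are obviously fake placeholders (the real numeric codes for our own author/publisher annotation are not shown in this section), and restricting the override to rows that were voted inaccessible (`author == 3` and `publisher == 5`) is one possible reading of the plan, not a confirmed rule.

```python
import pandas as pd


def override_wmflabs(df, author_code, publisher_code):
    """Return a copy of df where tools.wmflabs.org rows that were voted
    inaccessible in Task 2 receive the given author/publisher codes."""
    out = df.copy()
    is_wmflabs = out["url"].str.startswith("https://tools.wmflabs.org/")
    voted_inaccessible = (out["author"] == 3) & (out["publisher"] == 5)
    target = is_wmflabs & voted_inaccessible
    out.loc[target, "author"] = author_code
    out.loc[target, "publisher"] = publisher_code
    return out


# Invented rows, for illustration only.
toy = pd.DataFrame({
    "url": [
        "https://tools.wmflabs.org/heritage/api/api.php?srid=1",
        "https://tools.wmflabs.org/heritage/api/api.php?srid=2",
        "http://talent.yahoo.co.jp/pf/detail/pp832",
    ],
    "author": [3, 1, 3],
    "publisher": [5, 2, 5],
})

# 99 and 98 are placeholder codes, NOT the real annotation codes.
fixed = override_wmflabs(toy, author_code=99, publisher_code=98)
print(fixed[["author", "publisher"]])
```

Working on a copy keeps the crowd annotations intact, so the original votes stay available for the comparisons made elsewhere in this notebook.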

## References voted inaccessible in both Tasks 1 and 2

References deemed inaccessible by both tasks are either:
- A redirect to another page
- Security issues
- A 404 page
- Discontinued service

**We did not identify any references that would not fit here**

## References voted inaccessible in Task 1 but accessible in Task 2

These references would be:
- Inaccessible for getting their content, but
- Accessible for getting their author/publisher info

References in this group fall into one of these cases:
- A search page (or redirect to it): Information inaccessible (depending on user search ability), but author/publisher obvious
- A redirect to a home page: Information inaccessible, but author/publisher obvious
- A 404 page: Specific page inaccessible, but the worker can use the domain or home page to infer/get author/publisher, e.g. http://nvpub.vic-metria.nu/naturvardsregistret/rest/omrade/2001958/G%C3%A4llande
- Discontinued service: Information inaccessible, but author/publisher obvious, e.g. http://talent.yahoo.co.jp/pf/detail/pp13199. This is very subjective, as each worker decides through their own reasoning whether Yahoo was the author/publisher or was only hosting the content. This prompts them either to classify this discontinued service as inaccessible in Task 2 or to use Yahoo as author/publisher and deem it accessible.
- A JSON response: Same as with a 404 page, but the information is unintelligible to some workers rather than truly inaccessible
- Permission errors: Same as a 404 page
- Payment required: Same as a 404 page

There were 3 references from sok.riksarkivet.se/bildvisning/ in this group. They all link to scans of handwritten records. Workers might not have been able to read them and so classified them as inaccessible in task 1, while still knowing how to get author/publisher from the website. Alternatively, the service might have been unavailable at the time of the task, given that all URLs from this domain were marked inaccessible.

**Apart from that, we did not identify any references that would not fit here**

## References voted inaccessible in Task 2 but accessible in Task 1

These references would be:
- Accessible for getting their content, but
- Inaccessible for getting their author/publisher info

References in this group are either:
- A tools.wmflabs.org JSON: The information might have been accessible (intelligible), but there is no way of finding author/publisher from the URL alone, and no way of accessing a home page.
- geonames.nga.mil: Some workers manage to access this website and some don't

**Again, apart from the tools.wmflabs JSONs, we did not identify any references that would not fit here**

# A detail about tools.wmflabs
Only tools.wmflabs references in SPANISH got MAJORITY VOTED as inaccessible in task 2.
The tools.wmflabs references in both Swedish and Dutch got MAJORITY VOTED as author=collective and publisher=selfpublished.
We have no idea why.

And only 4 of the 36 Spanish references were voted as inaccessible in task 1.
So it might be some bias from Spanish speakers? Tasks 1 and 2 were live at the same time, so the references could not have been both accessible AND inaccessible.
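One way to look at this language split is a cross-tabulation of tools.wmflabs.org rows by `language_crawl` against the Task 2 inaccessibility flag. This is a sketch only: `wmflabs_votes_by_language` is a hypothetical helper, and the `toy` frame is invented data, not the actual counts from the annotation set.

```python
import pandas as pd


def wmflabs_votes_by_language(df):
    """Cross-tabulate tools.wmflabs.org rows: crawl language vs. whether the
    row was voted inaccessible in Task 2 (author == 3 and publisher == 5)."""
    wmflabs = df[df["url"].str.contains("tools.wmflabs.org", regex=False)]
    flag = (wmflabs["author"] == 3) & (wmflabs["publisher"] == 5)
    # String labels are easier to index than boolean column names.
    label = flag.map({True: "inaccessible", False: "other"}).rename("task2_vote")
    return pd.crosstab(wmflabs["language_crawl"], label)


# Invented rows, for illustration only.
toy = pd.DataFrame({
    "url": [
        "https://tools.wmflabs.org/heritage/api/api.php?srid=1",
        "https://tools.wmflabs.org/heritage/api/api.php?srid=2",
        "https://tools.wmflabs.org/heritage/api/api.php?srid=3",
        "http://talent.yahoo.co.jp/pf/detail/pp832",
    ],
    "language_crawl": ["es", "es", "sv", "es"],
    "author": [3, 3, 1, 3],
    "publisher": [5, 5, 2, 5],
})

table = wmflabs_votes_by_language(toy)
print(table)
```

Run against `final_annotation_set`, a table like this would show at a glance whether the inaccessible votes are concentrated in one crawl language.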