# Analysis of Described Contributions in Comparisons and Papers in the ORKG
In this notebook, I analyze the design complexity of the described contributions in various comparisons and papers in the Open Research Knowledge Graph ([ORKG](https://www.orkg.org)). I want to understand how complex the contributions in existing comparisons and papers are, so that I can compare them with my own described contributions. These belong to a comparison on *[Tailored Forming Process Chain for the Manufacturing of Hybrid Components with Bearing Raceways Using Different Material Combinations](https://orkg.org/comparison/R187049/)* from the research field *Mechanical Process Engineering*. This lets me assess how comparable my comparison and its described contributions are with other comparisons and described contributions that are cited in published articles. In particular, I count the distinct resources, literals, and predicates used in the described contributions, because these numbers indicate design complexity.

In the [first part](#p1), I analyze nine sets of described contributions from papers and comparisons in the ORKG that are cited in published articles.

In the [second part](#p2), I analyze all comparisons in the ORKG that have been published with a DOI. The first analysis and its results are limited because they require knowing which articles cite papers or comparisons in the ORKG, and the ORKG has no feature to identify such citing articles. For this reason, I approach the question from the other direction and analyze the comparisons that have been published with a DOI in the ORKG. These comparisons and their described contributions are meant to be cited and are thus similar to those in the first analysis.

<a id='p1'></a>
## 1) Analysis of Nine Sets of Described Contributions in Papers and Comparisons Cited in Published Articles

Below, I list the citing articles, followed by a table that combines the results of the following code cell with some additional data.

List of articles that cite described contributions in papers or comparisons in the ORKG:
1. Carsten Knoll: [Examining the ORKG Towards Representation of Control Theoretic Knowledge - Preliminary Experiences and Conclusions](https://www2022.thewebconf.org/PaperFiles/116.pdf)
2. Mila Runnwerth et al.: [Operational Research Literature as a Use Case for the Open Research Knowledge Graph](https://link.springer.com/chapter/10.1007/978-3-030-52200-1_32)
3. Oliver Karras et al.: [Researcher or Crowd Member? Why not both! The Open Research Knowledge Graph for Applying and Communicating CrowdRE Research](https://ieeexplore.ieee.org/document/9582384)
4. Sören Auer et al.: [Improving Access to Scientific Literature with Knowledge Graphs](https://www.degruyter.com/document/doi/10.1515/bfp-2020-2042/html)
5. Marco Anteghini et al.: [Representing Semantified Biological Assays in the Open Research Knowledge Graph](https://link.springer.com/chapter/10.1007/978-3-030-64452-9_8)

| Publication ID | Research field                     | #Comparisons | #Papers | #Contributions | #Resources | #Literals | #Predicates | #RelatedResources | #RelatedFigures |
|----------------|------------------------------------|--------------|---------|----------------|------------|-----------|-------------|-------------------|----------------|
| 1              | Control Theory                     | 0            | 5       | 5              | 30         | 11        | 9           | 0                 | 0              |
| 2              | Numerical Analysis and Computation | 0            | 6       | 6              | 28         | 0         | 6           | 0                 | 0              |
| 3              | Software Engineering               | 1            | 19      | 19             | 212        | 398       | 12          | 0                 | 0              |
| 3              | Software Engineering               | 1            | 27      | 27             | 181        | 323       | 17          | 0                 | 0              |
| 4              | Artificial Intelligence            | 1            | 17      | 17             | 82         | 17        | 22          | 0                 | 0              |
| 4              | Virology                           | 1            | 21      | 31             | 139        | 197       | 17          | 2                 | 2              |
| 4              | Inorganic Chemistry                | 1            | 16      | 16             | 19         | 95        | 8           | 0                 | 0              |
| 4              | Databases/Information Systems      | 1            | 6       | 6              | 13         | 38        | 23          | 0                 | 0              |
| 5              | Molecular Biology                  | 1            | 3       | 3              | 79         | 1         | 22          | 0                 | 0              |

Data of my comparison:
| Research field                     | #Comparisons | #Papers | #Contributions | #Resources | #Literals | #Predicates | #RelatedResources | #RelatedFigures |
|------------------------------------|--------------|---------|----------------|------------|-----------|-------------|-------------------|-----------------|
| Mechanical Process Engineering     | 1            | 5       | 10             | 670        | 131       | 30          | 1                 | 8               |
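To make the comparison between the two tables explicit, the following sketch counts, for each metric, how many of the nine reference sets exceed my comparison. The numbers are copied from the tables above; `count_exceeding` is a hypothetical helper introduced only for this illustration.

```python
# Values of the nine reference sets, copied from the first table (in table order).
REFERENCE = {
    "#Resources": [30, 28, 212, 181, 82, 139, 19, 13, 79],
    "#Literals": [11, 0, 398, 323, 17, 197, 95, 38, 1],
    "#Predicates": [9, 6, 12, 17, 22, 17, 8, 23, 22],
}
# Values of my comparison (Mechanical Process Engineering), copied from the second table.
MINE = {"#Resources": 670, "#Literals": 131, "#Predicates": 30}


def count_exceeding(reference, value):
    """Return how many reference values are strictly greater than `value` (hypothetical helper)."""
    return sum(1 for v in reference if v > value)


for metric, values in REFERENCE.items():
    higher = count_exceeding(values, MINE[metric])
    print(f"{metric}: {higher} of {len(values)} reference sets exceed {MINE[metric]}")
```

With these numbers, no reference set exceeds my comparison in resources or predicates, while three sets (398, 323, and 197 literals) use more literals.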

```python
import sparql_dataframe
import seaborn as sns
import matplotlib.pyplot as plt
import dataframe_image as dfi
import pandas as pd
from orkg import ORKG
from datetime import datetime

orkg = ORKG(host="https://www.orkg.org/")

# Contributions of publication ID 1
pub1_contrib_res_id = ['R164083', 'R164072', 'R164067', 'R163740', 'R162666']
# Contributions of publication ID 2
pub2_contrib_res_id = ['R12073', 'R12068', 'R12059', 'R12018', 'R10193', 'R12165']
# Contributions of publication ID 3, first entry
pub3_1_contrib_res_id = ['R78434', 'R77179', 'R76800', 'R78373', 'R78394', 'R78468', 'R76825', 'R78457', 'R108210', 'R109123', 'R111925', 'R111971', 'R111981', 'R111990', 'R112002', 'R112017', 'R112023', 'R112040', 'R112046']
# Contributions of publication ID 3, second entry
pub3_2_contrib_res_id = ['R108201', 'R76820', 'R113069', 'R113215', 'R113206', 'R113198', 'R76128', 'R76120', 'R113183', 'R113175', 'R76355', 'R76125', 'R76794', 'R113162', 'R113153', 'R113139', 'R113124', 'R113087', 'R113056', 'R113033', 'R113010', 'R112474', 'R112436', 'R112427', 'R112418', 'R112409', 'R111443']
# Contributions of publication ID 4, first entry
pub_4_1_contrib_res_id = ['R6381', 'R6354', 'R6351', 'R6365', 'R6314', 'R6384', 'R6271', 'R6323', 'R6399', 'R6304', 'R6301', 'R6269', 'R6275', 'R6402', 'R6320', 'R6317', 'R6396']
# Contributions of publication ID 4, second entry
pub_4_2_contrib_res_id = ['R44727', 'R44766', 'R44771', 'R44777', 'R44781', 'R44785', 'R44914', 'R44921', 'R44744', 'R44749', 'R44754', 'R44789', 'R44794', 'R44880', 'R44808', 'R44866', 'R44832', 'R44828', 'R44875', 'R44843', 'R44852', 'R44857', 'R44861', 'R44838', 'R44902', 'R44906', 'R44801', 'R44815', 'R44820', 'R44732', 'R44738']
# Contributions of publication ID 4, third entry
pub_4_3_contrib_res_id = ['R41147', 'R41145', 'R41143', 'R41141', 'R41139', 'R41137', 'R41135', 'R41133', 'R41131', 'R41129', 'R41127', 'R41125', 'R41123', 'R41121', 'R41118', 'R41116']
# Contributions of publication ID 4, fourth entry
pub_4_4_contrib_res_id = ['R50007', 'R50009', 'R50013', 'R49596', 'R78018', 'R78016']
# Contributions of publication ID 5
pub_5_contrib_res_id = ['R48195', 'R48179', 'R48147']
# Contributions of my comparison
mechanical_process_eng_contrib_res_id = ['R171849', 'R172247', 'R172160', 'R172322', 'R162790', 'R162733', 'R162788', 'R145734', 'R145731', 'R175728']

all_contributions = [pub1_contrib_res_id, pub2_contrib_res_id, pub3_1_contrib_res_id, pub3_2_contrib_res_id, pub_4_1_contrib_res_id, pub_4_2_contrib_res_id, pub_4_3_contrib_res_id, pub_4_4_contrib_res_id, pub_5_contrib_res_id, mechanical_process_eng_contrib_res_id]


def count_RPL(res_ids):
    """Count the distinct resources, literals, and predicates in the
    statement bundles of the given contribution IDs."""
    resources = set()
    literals = set()
    predicates = set()

    for contrib_id in res_ids:
        # Fetch all statements reachable from the contribution
        statements = orkg.statements.bundle(thing_id=contrib_id).content['statements']

        for statement in statements:
            # Every subject/object that is not a resource is counted as a literal
            for node in (statement['subject'], statement['object']):
                if node['_class'] == 'resource':
                    resources.add(node['id'])
                else:
                    literals.add(node['id'])

            predicates.add(statement['predicate']['id'])

    return len(resources), len(literals), len(predicates)


# DataFrame.append was removed in pandas 2.0, so collect the rows first
rows = []
for element in all_contributions:
    r, l, p = count_RPL(element)
    rows.append({'#Resources': r, '#Literals': l, '#Predicates': p})

results = pd.DataFrame(rows, columns=['#Resources', '#Literals', '#Predicates'], dtype='int')
results.head()
```

|   | #Resources | #Literals | #Predicates |
|---|------------|-----------|-------------|
| 0 | 30         | 11        | 9           |
| 1 | 28         | 0         | 6           |
| 2 | 212        | 398       | 12          |
| 3 | 181        | 323       | 17          |
| 4 | 82         | 17        | 22          |
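Because `count_RPL` collects IDs in sets that persist across all contributions of a set, a resource or predicate shared by several contributions is counted only once. The offline sketch below mirrors the loop in `count_RPL` on three hand-made statements. The statement dictionaries are an assumption: they contain only the fields the loop reads (`subject`, `object`, `predicate`, `_class`, `id`), not the full ORKG bundle format, and `count_rpl_from_statements` is a hypothetical name.

```python
def count_rpl_from_statements(statements):
    """Count distinct resources, literals, and predicates in a list of statements."""
    resources, literals, predicates = set(), set(), set()
    for st in statements:
        for node in (st["subject"], st["object"]):
            # Same rule as count_RPL: every non-resource node counts as a literal
            if node["_class"] == "resource":
                resources.add(node["id"])
            else:
                literals.add(node["id"])
        predicates.add(st["predicate"]["id"])
    return len(resources), len(literals), len(predicates)


# Hand-made statements: R1 and P1 occur twice but are counted once.
mock = [
    {"subject": {"_class": "resource", "id": "R1"}, "predicate": {"id": "P1"},
     "object": {"_class": "resource", "id": "R2"}},
    {"subject": {"_class": "resource", "id": "R2"}, "predicate": {"id": "P2"},
     "object": {"_class": "literal", "id": "L1"}},
    {"subject": {"_class": "resource", "id": "R1"}, "predicate": {"id": "P1"},
     "object": {"_class": "resource", "id": "R3"}},
]
print(count_rpl_from_statements(mock))  # (3, 1, 2)
```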


```python
results.describe()
```

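|       | #Resources | #Literals  | #Predicates |
|-------|------------|------------|-------------|
| count | 10.0       | 10.0       | 10.0        |
| mean  | 145.3      | 121.1      | 16.6        |
| std   | 197.19818  | 142.568073 | 7.77746     |
| min   | 13.0       | 0.0        | 6.0         |
| 25%   | 28.5       | 12.5       | 9.75        |
| 50%   | 80.5       | 66.5       | 17.0        |
| 75%   | 170.5      | 180.5      | 22.0        |
| max   | 670.0      | 398.0      | 30.0        |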
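The statistics above aggregate all ten rows, including my own comparison, which is the last entry of `all_contributions`. For a reference distribution without my comparison, pandas would use `results.iloc[:-1].describe()`. The stdlib sketch below computes the same kind of summary from the nine reference values copied from the first table.

```python
import statistics

# Nine reference sets copied from the first table; my comparison is excluded.
REFERENCE = {
    "#Resources": [30, 28, 212, 181, 82, 139, 19, 13, 79],
    "#Literals": [11, 0, 398, 323, 17, 197, 95, 38, 1],
    "#Predicates": [9, 6, 12, 17, 22, 17, 8, 23, 22],
}

for metric, values in REFERENCE.items():
    # statistics.stdev is the sample standard deviation, as in pandas' describe()
    print(f"{metric}: mean={statistics.mean(values):.1f}, "
          f"median={statistics.median(values)}, std={statistics.stdev(values):.1f}")
```

Excluding my comparison lowers the mean number of resources from 145.3 to 87.0 and the median from 80.5 to 79, which shows how strongly the single value of 670 resources shifts the mean.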


<a id='p2'></a>
## 2) Analysis of Described Contributions in Comparisons Published with DOI

For this analysis, I first query the ORKG via its SPARQL endpoint to obtain all comparisons with a DOI, together with their research fields, included papers, and described contributions, as well as the number of related resources and related figures. Related resources are supplementary materials such as Jupyter Notebooks, while related figures are additional figures that supplement the respective comparison.

```python
ENDPOINT_URL = "https://www.orkg.org/triplestore"

PREFIXES = """
PREFIX orkgr: <http://orkg.org/orkg/resource/>
PREFIX orkgc: <http://orkg.org/orkg/class/>
PREFIX orkgp: <http://orkg.org/orkg/predicate/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
"""

# Standard SPARQL 1.1: aggregates in parentheses and an explicit GROUP BY
query = """
SELECT ?comparison ?contribution ?paper
       (COUNT(DISTINCT ?related_resource) AS ?numberOfRelatedResources)
       (COUNT(DISTINCT ?related_figure) AS ?numberOfRelatedFigures)
       ?researchField
WHERE {
    ?comparison a orkgc:Comparison;
                orkgp:P26 ?DOI;
                orkgp:compareContribution ?contribution.
    ?paper orkgp:P31 ?contribution.
    OPTIONAL { ?comparison orkgp:hasSubject ?field.
               ?field rdfs:label ?researchField }
    # Separate OPTIONALs, so that a comparison with only related resources
    # (or only related figures) still gets its count
    OPTIONAL { ?comparison orkgp:RelatedResource ?related_resource }
    OPTIONAL { ?comparison orkgp:RelatedFigure ?related_figure }
}
GROUP BY ?comparison ?contribution ?paper ?researchField
ORDER BY ?comparison
"""
data = sparql_dataframe.get(ENDPOINT_URL, PREFIXES + query)
data.head()
```

|   | comparison | contribution | paper | numberOfRelatedResources | numberOfRelatedFigures | researchField |
|---|------------|--------------|-------|--------------------------|------------------------|---------------|
| 0 | http://orkg.org/orkg/resource/R107854 | http://orkg.org/orkg/resource/R107624 | http://orkg.org/orkg/resource/R107618 | 0 | 0 | Learner-Interface Interaction |
| 1 | http://orkg.org/orkg/resource/R107854 | http://orkg.org/orkg/resource/R107845 | http://orkg.org/orkg/resource/R107843 | 0 | 0 | Learner-Interface Interaction |
| 2 | http://orkg.org/orkg/resource/R107854 | http://orkg.org/orkg/resource/R107665 | http://orkg.org/orkg/resource/R107663 | 0 | 0 | Learner-Interface Interaction |
| 3 | http://orkg.org/orkg/resource/R107854 | http://orkg.org/orkg/resource/R107836 | http://orkg.org/orkg/resource/R107834 | 0 | 0 | Learner-Interface Interaction |
| 4 | http://orkg.org/orkg/resource/R108358 | http://orkg.org/orkg/resource/R108130 | http://orkg.org/orkg/resource/R108129 | 0 | 0 | Geology |


```python
# DataFrame.append was removed in pandas 2.0, so collect the rows first
rows = []

# One group per comparison; sort=False keeps the order of first appearance
for comparison, group in data.groupby('comparison', sort=False):
    # count_RPL expects bare IDs such as 'R107624', not full IRIs
    contribution_list = {uri.split('/')[-1] for uri in group['contribution']}
    paper_list = set(group['paper'])
    # Related resources/figures and research field are per comparison: use the first row
    first = group.iloc[0]
    r, l, p = count_RPL(contribution_list)
    rows.append({
        'comparison': comparison,
        'number_of_papers': len(paper_list),
        'number_of_contributions': len(contribution_list),
        'number_of_resources': r,
        'number_of_literals': l,
        'number_of_predicates': p,
        'number_of_related_resources': first['numberOfRelatedResources'],
        'number_of_related_figures': first['numberOfRelatedFigures'],
        'research_field': first['researchField'],
    })
    print(comparison, r, l, p)

result = pd.DataFrame(rows)

now = datetime.now()
result.to_csv('query_result_' + now.strftime('%Y-%m-%d') + '.csv', encoding='utf-8')
result.head(16)
```

```
http://orkg.org/orkg/resource/R107854 29 1 5
http://orkg.org/orkg/resource/R108358 90 249 29
http://orkg.org/orkg/resource/R108601 6 24 7
http://orkg.org/orkg/resource/R108719 6 47 19
http://orkg.org/orkg/resource/R109041 10 126 18
http://orkg.org/orkg/resource/R109236 21 56 15
http://orkg.org/orkg/resource/R109546 17 40 12
http://orkg.org/orkg/resource/R109612 14 160 17
http://orkg.org/orkg/resource/R109904 15 35 4
http://orkg.org/orkg/resource/R110071 8 30 9
http://orkg.org/orkg/resource/R110124 26 25 9
http://orkg.org/orkg/resource/R110138 4 41 9
http://orkg.org/orkg/resource/R110188 4 13 5
http://orkg.org/orkg/resource/R110245 4 13 5
http://orkg.org/orkg/resource/R110361 43 21 15
http://orkg.org/orkg/resource/R110651 6 20 18
http://orkg.org/orkg/resource/R110655 23 28 15
http://orkg.org/orkg/resource/R110777 7 11 5
http://orkg.org/orkg/resource/R110991 7 30 7
http://orkg.org/orkg/resource/R111117 5 14 6
http://orkg.org/orkg/resource/R111151 7 6 5
http://orkg.org/orkg/resource/R1111
```

|   | comparison | number_of_papers | number_of_contributions | number_of_resources | number_of_literals | number_of_predicates | number_of_related_resources | number_of_related_figures | research_field |
|---|------------|------------------|-------------------------|---------------------|--------------------|----------------------|-----------------------------|---------------------------|----------------|
| 0 | http://orkg.org/orkg/resource/R107854 | 4  | 4  | 29 | 1   | 5  | 0 | 0 | Learner-Interface Interaction |
| 1 | http://orkg.org/orkg/resource/R108358 | 12 | 12 | 90 | 249 | 29 | 0 | 0 | Geology |
| 2 | http://orkg.org/orkg/resource/R108601 | 4  | 4  | 6  | 24  | 7  | 0 | 0 | Digital Communications and Networking |
| 3 | http://orkg.org/orkg/resource/R108719 | 3  | 3  | 6  | 47  | 19 | 0 | 0 | Plant Pathology |
| 4 | http://orkg.org/orkg/resource/R109041 | 9  | 9  | 10 | 126 | 18 | 0 | 0 | Atomic, Molecular and Optical Physics |
| 5 | http://orkg.org/orkg/resource/R109236 | 5  | 5  | 21 | 56  | 15 | 0 | 0 | Geology |
| 6 | http://orkg.org/orkg/resource/R109546 | 4  | 4  | 17 | 40  | 12 | 0 | 0 | Industrial and Organizational Psychology |
| 7 | http://orkg.org/orkg/resource/R109612 | 10 | 10 | 14 | 160 | 17 | 0 | 0 | Oceanography |
| 8 | http://orkg.org/orkg/resource/R109904 | 12 | 12 | 15 | 35  | 4  | 0 | 0 | Information Science |
| 9 | http://orkg.org/orkg/resource/R110071 | 4  | 4  | 8  | 30  | 9  | 0 | 0 | Biomedical Engineering and Bioengineering |
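The per-comparison aggregation in the cell above can be illustrated offline. The stdlib sketch below groups the five sample rows of the query output by comparison, strips the IRI prefix from the contributions as `count_RPL` requires, and counts distinct papers and contributions. `aggregate_by_comparison` is a hypothetical helper; note that R108358 appears only once in the sample, so its counts here are not its full counts.

```python
from collections import defaultdict

BASE = "http://orkg.org/orkg/resource/"
# (comparison, contribution, paper) IDs copied from the five sample rows of the query output
SAMPLE_ROWS = [
    ("R107854", "R107624", "R107618"),
    ("R107854", "R107845", "R107843"),
    ("R107854", "R107665", "R107663"),
    ("R107854", "R107836", "R107834"),
    ("R108358", "R108130", "R108129"),
]
rows = [tuple(BASE + rid for rid in row) for row in SAMPLE_ROWS]


def aggregate_by_comparison(rows):
    """Map each comparison IRI to (distinct papers, distinct contributions)."""
    papers = defaultdict(set)
    contributions = defaultdict(set)
    for comparison, contribution, paper in rows:
        # Keep only the bare ID, e.g. 'R107624'
        contributions[comparison].add(contribution.split('/')[-1])
        papers[comparison].add(paper)
    return {c: (len(papers[c]), len(contributions[c])) for c in papers}


for comparison, (n_papers, n_contribs) in aggregate_by_comparison(rows).items():
    print(comparison, n_papers, n_contribs)
```

For R107854, the four sample rows already contain all of its rows, and the result of four papers and four contributions matches the first row of the table above.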


```python
result.describe()
```

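|       | number_of_papers | number_of_contributions | number_of_resources | number_of_literals | number_of_predicates | number_of_related_resources | number_of_related_figures |
|-------|------------------|-------------------------|---------------------|--------------------|----------------------|-----------------------------|---------------------------|
| count | 147.0            | 147.0                   | 147.0               | 147.0              | 147.0                | 147.0                       | 147.0                     |
| mean  | 7.870748         | 9.14966                 | 57.857143           | 114.278912         | 19.346939            | 0.020408                    | 0.027211                  |
| std   | 7.61152          | 8.473451                | 100.4211            | 114.431937         | 12.563075            | 0.183922                    | 0.232484                  |
| min   | 1.0              | 2.0                     | 4.0                 | 0.0                | 2.0                  | 0.0                         | 0.0                       |
| 25%   | 4.0              | 5.0                     | 9.0                 | 26.0               | 8.0                  | 0.0                         | 0.0                       |
| 50%   | 6.0              | 6.0                     | 26.0                | 84.0               | 18.0                 | 0.0                         | 0.0                       |
| 75%   | 8.0              | 9.5                     | 63.5                | 161.5              | 26.0                 | 0.0                         | 0.0                       |
| max   | 55.0             | 55.0                    | 781.0               | 612.0              | 84.0                 | 2.0                         | 2.0                       |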
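To place my comparison within this distribution, the sketch below hard-codes the quartile boundaries reported by `result.describe()` above and determines the quartile band of each of my values (670 resources, 131 literals, 30 predicates). `quartile_band` is a hypothetical helper, and the boundaries are a snapshot of the ORKG at query time; a value equal to a boundary is assigned to the lower band.

```python
import bisect

# (min, 25%, 50%, 75%, max) copied from result.describe() for the 147 comparisons
QUARTILES = {
    "resources": [4.0, 9.0, 26.0, 63.5, 781.0],
    "literals": [0.0, 26.0, 84.0, 161.5, 612.0],
    "predicates": [2.0, 8.0, 18.0, 26.0, 84.0],
}
MINE = {"resources": 670, "literals": 131, "predicates": 30}
BANDS = ["Q1 (min to 25%)", "Q2 (25% to 50%)", "Q3 (50% to 75%)", "Q4 (75% to max)"]


def quartile_band(bounds, value):
    """Return the quartile band of `value` given (min, 25%, 50%, 75%, max)."""
    if value < bounds[0]:
        return "below min"
    if value > bounds[-1]:
        return "above max"
    # Number of interior boundaries (25%, 50%, 75%) strictly below the value
    return BANDS[bisect.bisect_left(bounds[1:-1], value)]


for metric, bounds in QUARTILES.items():
    print(f"{metric}: {MINE[metric]} lies in {quartile_band(bounds, MINE[metric])}")
```

With these boundaries, my comparison falls into the top quartile for resources and predicates and into the third quartile for literals.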


```python
print(result[['research_field']].value_counts())
```

research_field                                                             
Oceanography                                                                   24
Earth Sciences                                                                 15
Ecology and Biodiversity of Animals and Ecosystems, Organismic Interactions    11
Planetary Sciences                                                              9
Medicinal Chemistry and Pharmaceutics                                           8
Artificial Intelligence                                                         8
Natural Language Processing                                                     6
Software Engineering                                                            5
Semantic Web                                                                    5
Computer Sciences                                                               4
Information Science                                                             4
Geology               