# Antimicrobial Resistance Knowledge Graph 

## Abstract

This study explores the integration of antibiotic resistance data from GenBank and RefSeq databases into the publicly accessible Knowledge Graph, Wikidata. Antibiotic resistance poses a significant global health threat, with the misuse of antibiotics leading to the emergence of resistant strains. The project focuses on modeling proteins, genes, chromosomes, bacterial strains, and species in Wikidata, using a comprehensive dataset from the National Center for Biotechnology Information (NCBI). The data cleaning process involves addressing discrepancies and extracting crucial information from the linked databases. Despite challenges, the implementation in Wikidata progresses, with ongoing efforts to link bacterial strains, genes, and proteins. The study highlights the need for standardized entries in databases and emphasizes the potential impact of integrating antibiotic resistance information into Wikidata for global accessibility and collaborative contributions.

## Background and Motivation

Antibiotics play a crucial role in the treatment of bacterial infections. They save millions of lives every year, but in factory farming they are also used to accelerate the growth of livestock. Because of this increased and sometimes unnecessary use, individual microorganisms develop resistance, which renders antibiotics ineffective. Antibiotic resistance arises from genetic mutations that occur randomly in bacteria. If a mutation improves the bacterium's chance of survival, the bacterium survives the antibiotic treatment and passes the beneficial trait on to the next generation or to other bacteria.

Improper use of antibiotics, especially in developing countries, has led to the prevalence of many antibiotic-resistant bacteria today. Just as different antibiotics have different mechanisms of action (for example, beta-lactams target the bacterial cell wall), bacteria develop various defense mechanisms. For instance, bacteria can produce efflux pumps, which expel antibiotics that have already entered the cell.

Infection with antibiotic-resistant bacteria poses a high risk to humans because medical treatment may become impossible. The World Health Organization (WHO) classifies Antimicrobial Resistance (AMR) as one of the three greatest medical threats and calls it a silent pandemic because of its high annual death toll (1.27 million people). The WHO estimates that by 2050 this number could rise to 10 million deaths per year, which would exceed the annual number of cancer deaths.

Although various health organizations maintain databases of bacterial strains that encode antibiotic-resistance proteins, these databases are not interconnected, despite the urgent need for such integration. This work focuses on incorporating the largest publicly accessible of these databases, provided by the National Center for Biotechnology Information (NCBI), into a publicly accessible Knowledge Graph (Wikidata), where medical staff, scientists, and other interested people can easily access the knowledge. [1]

## Material and Methods

Since the data in the publicly accessible Wikidata Knowledge Graph is intended to be available to everyone, the implementation models the full chain of bacterial species, bacterial strain, chromosome, gene, and protein. The premise is that the bacterial species already exists in Wikidata, so only the rest of the chain needs to be modeled.
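To make the modeled chain concrete, here is a minimal sketch of the entities as Python dataclasses. The class and field names are hypothetical and not part of the project code; the only premise carried over from the text is that the species item already exists in Wikidata, so it is referenced by its Wikidata ID. The example values are taken from row 4043 of the final table at the end of this notebook.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Species:
    name: str
    wikidata_id: str  # premise: the species item already exists in Wikidata


@dataclass
class Strain:
    name: str
    ncbi_taxonomy_id: str
    species: Species


@dataclass
class Chromosome:
    accession: Optional[str]  # often unknown in the source data
    strain: Strain


@dataclass
class Gene:
    symbol: str
    chromosome: Chromosome


@dataclass
class Protein:
    name: str
    refseq_accession: str
    resistance_class: str
    gene: Gene


# Hypothetical chain built from the values of row 4043 (Sphingobium indicum B90A)
species = Species("Sphingobium indicum", wikidata_id="Q18392374")
strain = Strain("Sphingobium indicum B90A", ncbi_taxonomy_id="861109", species=species)
chromosome = Chromosome(accession=None, strain=strain)  # refseq_genome is empty for this row
gene = Gene("blaSGM", chromosome)  # gene family of this row
protein = Protein("class A beta-lactamase SGM-3", "WP_007688792.1", "BETA-LACTAM", gene)

# Walk the chain from the protein back to the species
print(protein.gene.chromosome.strain.species.name)
```

Each level only points to its parent, which mirrors the order in which the items would have to be created: species first (already present), then strain, chromosome, gene and protein.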

As a data source, a public database from the National Center for Biotechnology Information (NCBI) is chosen [2]. It contains approximately 10,000 proteins that confer antibiotic resistance in bacteria. Each protein is linked to its type of antibiotic resistance and to the corresponding nucleotide and protein records in the RefSeq and GenBank databases. To model the chain described above (protein, gene, chromosome, and bacterial strain, starting from the bacterial species), the names of the gene and the bacterial strain must be extracted from the linked RefSeq and GenBank records. This information is retrieved with the EUtils interface of the "bioservices" package, using the RefSeq or GenBank accessions. Redundancy is built in deliberately (e.g., the bacterial species is extracted via both RefSeq and GenBank, and via both the protein and the nucleotide record) so that the information can later be combined or the most complete source chosen.
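The lookups go through NCBI's E-utilities. The helper module used below (`bioservice_fetcher`) is not shown, so the following is only a sketch of what such a request can look like: it builds an `efetch` URL that returns the flat-file record for one accession. The function name is hypothetical; `db`, `id`, `rettype` and `retmode` are standard efetch parameters, and the accession is the RefSeq protein of row 3774 in the sample below.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Flat-file formats: GenPept ("gp") for protein records, GenBank ("gb") for nucleotide records
RETTYPE_BY_DB = {"protein": "gp", "nuccore": "gb"}


def efetch_flatfile_url(accession: str, db: str) -> str:
    """Build an efetch URL returning the flat-file record of one accession (hypothetical helper)."""
    if db not in RETTYPE_BY_DB:
        raise ValueError(f"unsupported database: {db!r}")
    params = {"db": db, "id": accession, "rettype": RETTYPE_BY_DB[db], "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


print(efetch_flatfile_url("WP_136512103.1", "protein"))
```

Fetching the record is then a plain HTTP GET, for example with `urllib.request.urlopen`. NCBI asks clients without an API key to stay below three requests per second, which likely contributes to the long runtime of the full download.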


In [1]:
import os
import time
import urllib.error
import urllib.request
from typing import Optional

import numpy as np
import pandas as pd

import bioservice_fetcher as biof  # local helper module (not shown) that queries the NCBI databases

### Fetch data from NCBI database 

The downloaded table contains the accessions needed to look up each protein and nucleotide record, either in the Reference Sequence (RefSeq) database or in GenBank. Let's load the table and look at a few rows. The first attempt is to get the data from the RefSeq database.

In [2]:
# Reads everything from the linked RefSeq and GenBank databases that could possibly be interesting for this project

def fetch_data(read_from_web: bool = False, amount: Optional[int] = None) -> Optional[pd.DataFrame]:
    """
    If read_from_web is True, the reference gene catalog is downloaded from NCBI and the interesting fields
    are looked up in the GenBank and RefSeq databases -- caution: this takes ages (~12 h).
    Otherwise, the data saved by the last run is read from disk -- this should be used in most cases.
    amount optionally restricts the web run to a random sample of that many rows.
    """
    if read_from_web:
        # will take ~ 12h
        url = "https://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database/latest/ReferenceGeneCatalog.txt"
        df = pd.read_csv(urllib.request.urlopen(url), delimiter="\t")
        if amount is not None:
            df = df.sample(amount)
        df[["refseq_parent_taxon", "refseq_protein", "refseq_parent_taxon2"]] = df.apply(biof.get_protein_and_parent, axis=1)
        print("RefSeq protein and parent taxon fetched")
        df[["refseq_gene", "refseq_protein2", "refseq_genome", "refseq_organism", "refseq_tax_id"]] = df.apply(biof.get_strain_and_gene, axis=1)
        print("RefSeq gene, genome and organism fetched")
        df[["genbank_organism", "genbank_strain"]] = df.apply(biof.get_organism_strain_via_prot, axis=1)
        print("GenBank organism and strain (via protein) fetched")
        df[["genbank_organism2", "genbank_strain2", "genbank_tax_id"]] = df.apply(biof.get_organism_strain_via_nuc, axis=1)
        df.to_csv("resistance_df.csv", index=False)
    else:
        if not os.path.exists("resistance_df.csv"):
            print("Cannot read from hard drive because the file does not exist -- set read_from_web to True")
            return None
        df = pd.read_csv("resistance_df.csv")
    return df


df = fetch_data(read_from_web=False)
df.sample(5, random_state=19)


| | allele | gene_family | whitelisted_taxa | product_name | scope | type | subtype | class | subclass | refseq_protein_accession | ... | refseq_gene | refseq_protein2 | refseq_genome | refseq_organism | refseq_tax_id | genbank_organism | genbank_strain | genbank_organism2 | genbank_strain2 | genbank_tax_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3774 | blaOXA-823 | blaOXA | | OXA-10 family class D beta-lactamase OXA-823 | core | AMR | AMR | BETA-LACTAM | BETA-LACTAM | WP_136512103.1 | ... | Pseudomonas aeruginosa HUPM19015969 blaOXA | OXA-10 family class D beta-lactamase OXA-823 | | Pseudomonas aeruginosa | taxon:287 | Pseudomonas aeruginosa | HUPM19015969 | Pseudomonas aeruginosa | HUPM19015969 | taxon:287 |
| 6706 | | narB | | ionophore ABC transporter permease subunit NarB | plus | AMR | AMR | IONOPHORE | MADURAMICIN/NARASIN/SALINOMYCIN | | ... | | | | | | Enterococcus faecium | WT1145 | Enterococcus faecium | WT1145 | taxon:1352 |
| 139 | aac(6')-29a | aac(6')-29 | | aminoglycoside N-acetyltransferase AAC(6')-29a | core | AMR | AMR | AMINOGLYCOSIDE | AMINOGLYCOSIDE | WP_064190968.1 | ... | Pseudomonas aeruginosa aac(6')-29 | aminoglycoside N-acetyltransferase AAC(6')-29a | | Pseudomonas aeruginosa | taxon:287 | | | Pseudomonas aeruginosa | | taxon:287 |
| 7508 | | tet(D) | | tetracycline efflux MFS transporter Tet(D) | core | AMR | AMR | TETRACYCLINE | TETRACYCLINE | WP_001039466.1 | ... | Shigella sonnei 119 tet(D) | tetracycline efflux MFS transporter Tet(D) | | Shigella sonnei | taxon:624 | Shigella sonnei | 119 | Shigella sonnei | 119 | taxon:624 |
| 1398 | blaCAR-1 | blaCAR | | subclass B3 metallo-beta-lactamase CAR-1 | core | AMR | AMR | BETA-LACTAM | CARBAPENEM | WP_011094382.1 | ... | Pectobacterium atrosepticum SCRI1043 blaCAR | subclass B3 metallo-beta-lactamase CAR-1 | | Pectobacterium atrosepticum SCRI1043 | taxon:218491 | Pectobacterium atrosepticum SCRI1043 | SCRI1043 | Pectobacterium atrosepticum SCRI1043 | SCRI1043 | taxon:218491 |
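Columns such as `genbank_organism2`, `genbank_strain2` and `genbank_tax_id` presumably come from the qualifiers of the `source` feature in a GenBank flat-file record (`/organism`, `/strain` and `/db_xref="taxon:<id>"`, which matches the `taxon:287` format above). Since `bioservice_fetcher` is not shown, here is a minimal regex-based sketch of that extraction. The record text is a hand-written, abbreviated excerpt modeled on row 3774, not a downloaded record, and the function name is hypothetical.

```python
import re
from typing import Dict, Optional

# Hand-written, abbreviated excerpt of a GenBank "source" feature (modeled on row 3774, not a real record)
SAMPLE_RECORD = '''FEATURES             Location/Qualifiers
     source          1..1000
                     /organism="Pseudomonas aeruginosa"
                     /mol_type="genomic DNA"
                     /strain="HUPM19015969"
                     /db_xref="taxon:287"
'''


def parse_source_qualifiers(record: str) -> Dict[str, Optional[str]]:
    """Extract organism, strain and NCBI taxon from the first matching qualifiers (None if absent)."""

    def first(pattern: str) -> Optional[str]:
        match = re.search(pattern, record)
        return match.group(1) if match else None

    return {
        "organism": first(r'/organism="([^"]*)"'),
        "strain": first(r'/strain="([^"]*)"'),
        "tax_id": first(r'/db_xref="(taxon:\d+)"'),
    }


print(parse_source_qualifiers(SAMPLE_RECORD))
```

A missing `/strain` qualifier yields `None`, which would explain the empty strain cells above. Long qualifier values that wrap across lines would keep the line break and indentation in this sketch; a full parser such as Biopython's `SeqIO` handles that case.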


## Strain

Because several columns carry overlapping information, my goal is to extract the correct name of the bacterial strain. To do so, I am going to find the best data source for the organism name and combine it with the best source for the exact strain name.

In [3]:
df.keys()

Index(['allele', 'gene_family', 'whitelisted_taxa', 'product_name', 'scope',
       'type', 'subtype', 'class', 'subclass', 'refseq_protein_accession',
       'refseq_nucleotide_accession', 'curated_refseq_start',
       'genbank_protein_accession', 'genbank_nucleotide_accession',
       'genbank_strand', 'genbank_start', 'genbank_stop', 'refseq_strand',
       'refseq_start', 'refseq_stop', 'pubmed_reference', 'blacklisted_taxa',
       'synonyms', 'hierarchy_node', 'db_version', 'refseq_parent_taxon',
       'refseq_protein', 'refseq_parent_taxon2', 'refseq_gene',
       'refseq_protein2', 'refseq_genome', 'refseq_organism', 'refseq_tax_id',
       'genbank_organism', 'genbank_strain', 'genbank_organism2',
       'genbank_strain2', 'genbank_tax_id'],
      dtype='object')

In [4]:
# Just by looking at this small random sample, "refseq_organism", "genbank_organism" and "genbank_organism2" yield similar results,
# although "refseq_organism" has less information.
# "refseq_parent_taxon" and "refseq_parent_taxon2" sometimes have the same information as the others and sometimes a much higher taxon (e.g. row 8816: "Bacteria").
# To extract the correct organism name, I am going to look more closely at "refseq_organism", "genbank_organism" and "genbank_organism2".
df[["refseq_parent_taxon", "refseq_parent_taxon2",  "refseq_organism", "genbank_organism", "genbank_organism2"]].sample(10, random_state=100)

| | refseq_parent_taxon | refseq_parent_taxon2 | refseq_organism | genbank_organism | genbank_organism2 |
|---|---|---|---|---|---|
| 758 | | | | Acidiphilium multivorum | Acidiphilium multivorum |
| 9222 | Enterobacteriaceae | Enterobacteriaceae | | Shigella sonnei | Shigella sonnei |
| 8816 | Bacteria | Bacteria | | Escherichia coli str. K-12 substr. MG1655 | Escherichia coli str. K-12 substr. MG1655 |
| 3993 | Klebsiella oxytoca | Klebsiella oxytoca | Klebsiella oxytoca | Klebsiella oxytoca | Klebsiella oxytoca |
| 4874 | Escherichia coli | Escherichia coli | Escherichia coli | Escherichia coli | Escherichia coli |
| 6956 | Citrobacter freundii | Citrobacter freundii | Citrobacter freundii | | Citrobacter freundii |
| 277 | Pseudomonadaceae | Pseudomonadaceae | Pseudomonas putida | | Pseudomonas putida |
| 3978 | Klebsiella michiganensis | Klebsiella michiganensis | Klebsiella michiganensis | Klebsiella michiganensis | Klebsiella michiganensis |
| 8247 | Enterococcus | Enterococcus | | Enterococcus faecium | Enterococcus faecium |
| 2520 | Lelliottia amnigena | Lelliottia amnigena | Lelliottia amnigena | Lelliottia amnigena | Lelliottia amnigena |


In [5]:
# The above impression solidifies: refseq_organism clearly contains the most empty fields
df[["genbank_organism", "genbank_organism2", "refseq_organism"]].isna().sum()

genbank_organism      368
genbank_organism2       0
refseq_organism      2499
dtype: int64

In [6]:
# Cases where the organisms found via GenBank protein (genbank_organism), GenBank nucleotide (genbank_organism2) and RefSeq do not match

diff_organism_df = df.loc[(df["genbank_organism"] != df["genbank_organism2"]) | (df["genbank_organism"] != df["refseq_organism"]), ["genbank_organism", 
                                                                                                                                    "genbank_organism2", 
                                                                                                                                    "refseq_organism"]]
diff_organism_df.sample(10, random_state=8)

| | genbank_organism | genbank_organism2 | refseq_organism |
|---|---|---|---|
| 7123 | Staphylococcus aureus | Staphylococcus aureus | |
| 6122 | Escherichia coli | Escherichia coli | |
| 770 | Enterococcus sp. JM4C | Enterococcus sp. JM4C | |
| 8458 | Campylobacter jejuni subsp. jejuni NCTC 11168 ... | Campylobacter jejuni subsp. jejuni NCTC 11168 ... | |
| 6540 | Enterobacter roggenkampii | Enterobacter roggenkampii | |
| 999 | | Acinetobacter baumannii AB4A3 | Acinetobacter baumannii AB4A3 |
| 8553 | Pseudomonas aeruginosa | Pseudomonas aeruginosa | |
| 5718 | Escherichia coli | Escherichia coli | |
| 3126 | | Pseudomonas aeruginosa | Pseudomonas aeruginosa |
| 5948 | Bacillus bingmayongensis | Bacillus bingmayongensis | |
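Note that `!=` treats a missing value as different from everything, so most rows above are flagged only because `refseq_organism` or `genbank_organism` is empty; the next cells remove those with `.dropna()`. As a sketch, a NaN-aware comparison that flags only genuine conflicts looks like this (toy frame built from organism names shown in this section; the function name is hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "genbank_organism": ["Escherichia coli", None, "Acinetobacter sp.", "Klebsiella oxytoca"],
    "refseq_organism": [None, "Pseudomonas aeruginosa", "Acinetobacter baumannii", "Klebsiella oxytoca"],
})


def conflicting(a: pd.Series, b: pd.Series) -> pd.Series:
    """True only where both values are present and differ."""
    return a.ne(b) & a.notna() & b.notna()


naive = toy["genbank_organism"] != toy["refseq_organism"]
print(naive.tolist())  # [True, True, True, False]
print(conflicting(toy["genbank_organism"], toy["refseq_organism"]).tolist())  # [False, False, True, False]
```

Only the third row, a real naming conflict of the kind examined below, survives the NaN-aware check.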


In [7]:
# As can be seen, there are 35 cases where the organism found via GenBank (nucleotide) and via GenBank (protein) differ;
# several pairs are a bacterium and a phage (e.g. Bacillus cereus ATCC 14579 vs. Bacillus phage phBC6A52).
# What should be done here? Are these alternative names? Expert knowledge required: I'm going to drop them
(x1 := diff_organism_df.loc[diff_organism_df["genbank_organism"] != diff_organism_df["genbank_organism2"], ["genbank_organism", "genbank_organism2"]].dropna())

| | genbank_organism | genbank_organism2 |
|---|---|---|
| 126 | Salmonella enterica subsp. enterica serovar Ty... | Salmonella virus Fels2 |
| 601 | Bacillus anthracis str. Ames | Bacillus phage lambda Ba02 |
| 5475 | Escherichia coli K-12 | Escherichia coli |
| 5697 | Clostridioides difficile 630 | Peptoclostridium phage p630P2 |
| 5721 | Bacillus cereus ATCC 14579 | Bacillus phage phBC6A52 |
| 5942 | Bacillus cereus ATCC 14579 | Bacillus phage phBC6A52 |
| 6214 | Bacillus cereus ATCC 14579 | Bacillus phage phBC6A52 |
| 6217 | Staphylococcus epidermidis RP62A | Staphylococcus epidermidis RP62A phage SP-beta |
| 6220 | Bacillus phage lambda Ba03 | Bacillus phage lambda Ba02 |
| 6257 | Salmonella enterica subsp. enterica serovar Ty... | Salmonella virus Fels2 |


In [8]:
# Furthermore, there are 5 conflicts between GenBank via protein (genbank_organism) and RefSeq
# Are these alternative names? Expert knowledge required: I'm going to drop them as I'm not sure
(x2 := diff_organism_df.loc[diff_organism_df["genbank_organism"] != diff_organism_df["refseq_organism"], ["genbank_organism", 
                                                                                                          "refseq_organism"]].dropna())

| | genbank_organism | refseq_organism |
|---|---|---|
| 2032 | Acinetobacter sp. | Acinetobacter baumannii |
| 4045 | Klebsiella michiganensis | Klebsiella michiganensis M5al |
| 5056 | Salmonella enterica | Salmonella enterica subsp. enterica serovar In... |
| 6220 | Bacillus phage lambda Ba03 | Bacillus anthracis str. Ames |
| 6236 | Cytobacillus massiliigabonensis | Bacillus massiliigabonensis |


I am going to choose the organism found via GenBank (nucleotide), i.e. "genbank_organism2", as the base organism, because it has no NaN values and seems to be complete.
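The same choice can be made programmatically by picking the candidate column with the fewest missing values. A small sketch on a toy frame (the helper name is hypothetical; ties go to the first candidate because `idxmin` returns the first minimum):

```python
from typing import List

import pandas as pd


def most_complete_column(df: pd.DataFrame, candidates: List[str]) -> str:
    """Return the candidate column with the fewest missing values (first one wins on ties)."""
    return df[candidates].isna().sum().idxmin()


toy = pd.DataFrame({
    "genbank_organism": ["Escherichia coli", None, "Shigella sonnei"],
    "genbank_organism2": ["Escherichia coli", "Pseudomonas aeruginosa", "Shigella sonnei"],
    "refseq_organism": [None, "Pseudomonas aeruginosa", None],
})
print(most_complete_column(toy, ["genbank_organism", "genbank_organism2", "refseq_organism"]))  # genbank_organism2
```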

In [9]:
# drop data instances where I'm unsure 
df.drop(x1.index.union(x2.index), inplace=True)

In [10]:
# genbank_strain carries either redundant information or no information at all (the query below returns no rows) -- genbank_strain2 will therefore be selected
df.loc[(df["genbank_strain2"] != df["genbank_strain"]) & ~(df["genbank_strain"].isna()), ["genbank_strain2", "genbank_strain", "genbank_organism2"]]

| | genbank_strain2 | genbank_strain | genbank_organism2 |
|---|---|---|---|


In [11]:
# Usually the strain name is, as expected, in genbank_strain2; sometimes it is already included in genbank_organism2,
# e.g. row 7743: Clostridium sp. MLG080-1
df.loc[:, ["genbank_strain2", "genbank_organism2"]].sample(10, random_state=50)

| | genbank_strain2 | genbank_organism2 |
|---|---|---|
| 9018 | FA19 | Neisseria gonorrhoeae |
| 7743 | MLG080-1 | Clostridium sp. MLG080-1 |
| 9165 | WHO_U | Neisseria gonorrhoeae |
| 3013 | 2318902 | Pseudomonas aeruginosa |
| 4 | FC1K | Mycolicibacterium fortuitum |
| 6795 | SKLX003475 | Klebsiella pneumoniae |
| 1363 | G4074 | Elizabethkingia miricola |
| 4295 | 185584 | Pseudomonas aeruginosa |
| 9014 | FA19 | Neisseria gonorrhoeae |
| 5168 | 13S00929-3 | Escherichia coli |


To get the correct and full strain name, I will combine "genbank_organism2" and "genbank_strain2", but only if the strain name is not already included in the organism name. Otherwise, the organism name itself is selected.

In [12]:
def check_for_combination(df_row: pd.Series) -> bool:
    """
    Checks whether genbank_organism2 and genbank_strain2 should be joined into one strain name:
    True if a strain is given and is not already part of the organism name (case-insensitive).
    """
    if not isinstance(df_row["genbank_strain2"], str):
        # strain is NaN: don't combine organism and strain --> Is this correct?
        return False
    return df_row["genbank_strain2"].upper() not in df_row["genbank_organism2"].upper()


df["strain"] = np.where(df.apply(check_for_combination, axis=1),
                        df["genbank_organism2"] + " " + df["genbank_strain2"],
                        df["genbank_organism2"])
df[["genbank_strain2", "genbank_organism2", "strain"]].sample(10, random_state=5)

| | genbank_strain2 | genbank_organism2 | strain |
|---|---|---|---|
| 5358 | | Staphylococcus aureus | Staphylococcus aureus |
| 850 | HA-2 | Hafnia alvei | Hafnia alvei HA-2 |
| 9038 | VRCO0432 | Klebsiella pneumoniae | Klebsiella pneumoniae VRCO0432 |
| 4021 | SG271 | Klebsiella spallanzanii | Klebsiella spallanzanii SG271 |
| 7357 | NIPH56 | Acinetobacter baumannii | Acinetobacter baumannii NIPH56 |
| 5862 | 13H1 | Escherichia coli | Escherichia coli 13H1 |
| 2529 | HD24 | Klebsiella pneumoniae | Klebsiella pneumoniae HD24 |
| 613 | | uncultured bacterium | uncultured bacterium |
| 8818 | K-12 | Escherichia coli str. K-12 substr. MG1655 | Escherichia coli str. K-12 substr. MG1655 |
| 4861 | KK19 | Klebsiella pneumoniae | Klebsiella pneumoniae KK19 |


In [13]:
# As can be seen, strain sometimes contains only the parent taxon (species) name (e.g. row 5358: Staphylococcus aureus), which is not correct.
# I will drop instances whose strain name is no longer than two words. Expert knowledge required
df = df[~(df["strain"].str.split().str.len() <= 2)]
df[["genbank_strain2", "genbank_organism2", "strain"]].sample(10, random_state=5)

| | genbank_strain2 | genbank_organism2 | strain |
|---|---|---|---|
| 6455 | EC3769 | Escherichia coli | Escherichia coli EC3769 |
| 5176 | CLSiS 1590/96 | Klebsiella pneumoniae | Klebsiella pneumoniae CLSiS 1590/96 |
| 8198 | VC4477 | Burkholderia cenocepacia | Burkholderia cenocepacia VC4477 |
| 2566 | blaCLHK-3 | Laribacter hongkongensis | Laribacter hongkongensis blaCLHK-3 |
| 1466 | BS | Escherichia coli | Escherichia coli BS |
| 5081 | 3343 | Escherichia coli | Escherichia coli 3343 |
| 8839 | PAO1 | Pseudomonas aeruginosa PAO1 | Pseudomonas aeruginosa PAO1 |
| 3313 | 268/2C | Acinetobacter baumannii | Acinetobacter baumannii 268/2C |
| 3345 | XM1570 | Acinetobacter calcoaceticus | Acinetobacter calcoaceticus XM1570 |
| 9152 | PNUSAS062732 | Salmonella enterica | Salmonella enterica PNUSAS062732 |


## Gene

In [14]:
# The gene is only available in one column (refseq_gene) -- every row where it is NaN is dropped
# gene_family was included in the original dataframe and matches the gene found -- should be okay: expert knowledge required
# If gene_family is not contained in the found gene (refseq_gene), the data instance is dropped
# Where do I get the Entrez Gene ID from? -- Expert knowledge required; I think it does not exist?
df.dropna(subset="refseq_gene", inplace=True)
df = df[df.apply(lambda x: x["gene_family"] in x["refseq_gene"], axis=1)]
df[["gene_family", "refseq_gene"]].sample(10, random_state=10)

| | gene_family | refseq_gene |
|---|---|---|
| 2444 | blaKPC | Klebsiella pneumoniae 1B blaKPC |
| 7566 | tet(M) | Staphylococcus aureus 4520 tet(M) |
| 5341 | blaVIM | Klebsiella oxytoca NMI1536_13 unnamed blaVIM |
| 5226 | blaTMB | Acinetobacter pittii MRY12-0142 blaTMB |
| 4928 | blaSHV | Klebsiella pneumoniae 86-4 blaSHV |
| 4600 | blaPDC | Pseudomonas aeruginosa Paebeta-25 blaPDC |
| 3715 | blaOXA | Acinetobacter baumannii 17-VT7014T-1 blaOXA |
| 5030 | blaTEM | Klebsiella pneumoniae Va32447 blaTEM |
| 3169 | blaOXA | Acinetobacter baumannii 255 blaOXA |
| 2386 | blaKPC | Klebsiella pneumoniae ES06_R04 blaKPC |
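In every row shown, `refseq_gene` appears to follow the pattern organism, strain, optionally a plasmid or other label, and finally the gene symbol. Assuming that pattern holds (an observation from the samples, not a documented format), the symbol can be taken from the last token and compared exactly with `gene_family`, which is stricter than the substring test used above. The helper names are hypothetical; the examples are rows from this notebook.

```python
def gene_symbol(refseq_gene: str) -> str:
    """Return the last whitespace-separated token, assumed to be the gene symbol."""
    return refseq_gene.split()[-1]


def gene_matches_family(gene_family: str, refseq_gene: str) -> bool:
    """Exact comparison instead of the substring test `gene_family in refseq_gene`."""
    return gene_symbol(refseq_gene) == gene_family


examples = [
    ("blaKPC", "Klebsiella pneumoniae 1B blaKPC"),
    ("blaVIM", "Klebsiella oxytoca NMI1536_13 unnamed blaVIM"),
    ("tet(D)", "Shigella sonnei 119 tet(D)"),
]
for family, label in examples:
    print(family, gene_symbol(label), gene_matches_family(family, label))
```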


## Search for connection to Wikidata

In [15]:
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

In [16]:


from typing import Optional, Union


def search_parent_taxon(df_row: pd.Series, max_retries: int = 5) -> Optional[Union[str, list]]:
    """
    Searches Wikidata for a species of bacterium whose English label contains the first two words of the strain.
    Returns the entity URI, a list of candidate URIs if the name is abbreviated, or None if nothing matches.
    Sometimes the name is abbreviated, for example Achromobacter sp. -- nobody knows whether this is
    Achromobacter spanius or Achromobacter spiritinus, so these results are rather unusable.
    """
    parent = df_row["strain"].split()[:2]
    print(df_row["strain"])
    abbreviation = False
    for i, word in enumerate(parent):
        if word[-1] == ".":
            abbreviation = True
            parent[i] = word[:-1]
    parent = " ".join(parent).lower()
    query = f"""SELECT ?item ?itemLabel ?itemDescription
    WHERE {{
      ?item rdfs:label ?label;
            schema:description "species of bacterium"@en.

      FILTER(LANG(?label) = "en" && CONTAINS(LCASE(?label), "{parent}"))

      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
    }}
    LIMIT 10
    """
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    for _ in range(max_retries):
        try:
            bindings = sparql.query().convert()["results"]["bindings"]
            break
        except Exception:
            # e.g. HTTP 429 (rate limiting) or a timeout from the public endpoint: wait and retry
            time.sleep(5)
    else:
        # every attempt failed: give up for this row instead of retrying forever
        return None
    results = [(item["item"]["value"], item["itemLabel"]["value"]) for item in bindings]
    if not results:
        return None
    if abbreviation:
        return [res[0] for res in results] if len(results) > 1 else results[0][0]
    for res in results:
        if res[1].lower() in parent:
            return res[0]
    return None


get_from_wikidata_switch = False
if get_from_wikidata_switch:
    # Will take ~ 2h
    df["parent_taxon"] = df.apply(search_parent_taxon, axis=1)
    df.to_csv("resistance_df2.csv", index=False)
else: 
    df = pd.read_csv("resistance_df2.csv")

In [17]:
# The dataframe now contains a column called parent taxon which leads to the corresponding wikidata species of bacterium 
# Some parent taxon could not be found -- will be dropped later 
df[["product_name", "refseq_organism",  "strain", "parent_taxon", "refseq_gene"]].sample(5, random_state=9)

| | product_name | refseq_organism | strain | parent_taxon | refseq_gene |
|---|---|---|---|---|---|
| 1420 | extended-spectrum class A beta-lactamase CTX-M-74 | Enterobacter cloacae | Enterobacter cloacae JF216 | http://www.wikidata.org/entity/Q4038096 | Enterobacter cloacae JF216 blaCTX-M |
| 5304 | tetracycline efflux MFS transporter Tet(L) | Latilactobacillus sakei | Latilactobacillus sakei Rits9 | | Lactobacillus sakei Rits9 pLS55 tet(L) |
| 1453 | class C beta-lactamase DHA-18 | Morganella morganii | Morganella morganii 984080 | http://www.wikidata.org/entity/Q2696880 | Morganella morganii 984080 blaDHA |
| 3415 | class C beta-lactamase PDC-156 | Pseudomonas aeruginosa | Pseudomonas aeruginosa 1231451 | http://www.wikidata.org/entity/Q31856 | Pseudomonas aeruginosa 1231451 blaPDC |
| 780 | extended-spectrum class C beta-lactamase ADC-256 | Acinetobacter baumannii | Acinetobacter baumannii 20A3025 | http://www.wikidata.org/entity/Q3241189 | Acinetobacter baumannii 20A3025 blaADC |


### Handle special cases

For example, Escherichia coli K-12 should point to its own Wikidata item instead of the species.
I found this case by chance; there could be more, and finding them requires expert knowledge.

In [18]:
df.keys()

Index(['allele', 'gene_family', 'whitelisted_taxa', 'product_name', 'scope',
       'type', 'subtype', 'class', 'subclass', 'refseq_protein_accession',
       'refseq_nucleotide_accession', 'curated_refseq_start',
       'genbank_protein_accession', 'genbank_nucleotide_accession',
       'genbank_strand', 'genbank_start', 'genbank_stop', 'refseq_strand',
       'refseq_start', 'refseq_stop', 'pubmed_reference', 'blacklisted_taxa',
       'synonyms', 'hierarchy_node', 'db_version', 'refseq_parent_taxon',
       'refseq_protein', 'refseq_parent_taxon2', 'refseq_gene',
       'refseq_protein2', 'refseq_genome', 'refseq_organism', 'refseq_tax_id',
       'genbank_organism', 'genbank_strain', 'genbank_organism2',
       'genbank_strain2', 'genbank_tax_id', 'strain', 'parent_taxon'],
      dtype='object')

In [19]:
df["parent_taxon"] = np.where((df["genbank_organism2"].str.lower().str.contains("escherichia coli")) & (df["genbank_strain2"] == "K-12"), 
                             "https://www.wikidata.org/entity/Q21399437", 
                             df["parent_taxon"])
df[df["parent_taxon"] == "https://www.wikidata.org/entity/Q21399437"]

| | allele | gene_family | whitelisted_taxa | product_name | scope | type | subtype | class | subclass | refseq_protein_accession | ... | refseq_genome | refseq_organism | refseq_tax_id | genbank_organism | genbank_strain | genbank_organism2 | genbank_strain2 | genbank_tax_id | strain | parent_taxon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 459 | | aph(4)-Ia | | aminoglycoside O-phosphotransferase APH(4)-Ia | core | AMR | AMR | AMINOGLYCOSIDE | HYGROMYCIN | WP_000742814.1 | ... | | Escherichia coli K-12 | taxon:83333 | Escherichia coli K-12 | K-12 | Escherichia coli K-12 | K-12 | taxon:83333 | Escherichia coli K-12 | https://www.wikidata.org/entity/Q21399437 |
| 4443 | | catA2 | | type A-2 chloramphenicol O-acetyltransferase C... | core | AMR | AMR | PHENICOL | CHLORAMPHENICOL | WP_012477888.1 | ... | | Escherichia coli K-12 | taxon:83333 | Escherichia coli K-12 | K-12 | Escherichia coli K-12 | K-12 | taxon:83333 | Escherichia coli K-12 | https://www.wikidata.org/entity/Q21399437 |
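The special case above stores an `https://` entity URI, whereas the SPARQL results use `http://www.wikidata.org/entity/...`. Before uploading, it may be safer to reduce both forms to the bare Q-ID. A minimal sketch (the helper name is hypothetical); a stringified list of candidates, as produced for abbreviated names, is not a single URI and yields `None`:

```python
import re
from typing import Optional

# Matches both http:// and https:// Wikidata entity URIs and captures the Q-ID
ENTITY_URI = re.compile(r"^https?://www\.wikidata\.org/entity/(Q\d+)$")


def to_qid(uri: str) -> Optional[str]:
    """Return the bare Q-ID of a Wikidata entity URI, or None if the value is not a single URI."""
    match = ENTITY_URI.match(uri.strip())
    return match.group(1) if match else None


print(to_qid("https://www.wikidata.org/entity/Q21399437"))  # Q21399437
print(to_qid("http://www.wikidata.org/entity/Q31856"))  # Q31856
```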


## Drop useless data instances

In [20]:
# drop rows where no wikidata parent taxon (species of bacterium) was found 
df = df.dropna(subset="parent_taxon")

In [21]:
# These data instances are made up of abbreviations, which make it unclear to which taxon they belong,
# e.g. Streptomyces sp. 769 could belong to Streptomyces sp. myrophorea (Q60748847), Streptomyces spiramyceticus (Q104909301) or Streptomyces sporangiiformans (Q104957131).
# Expert knowledge required -- they also need to be dropped. Such rows hold a list of candidate URIs instead of a single one.
df.loc[:, "parent_taxon"] = df["parent_taxon"].astype(str)
df.loc[df["parent_taxon"].str.contains(r"\[|\]"), ["strain", "parent_taxon"]]

| | strain | parent_taxon |
|---|---|---|
| 38 | Streptomyces sp. 769 | `['http://www.wikidata.org/entity/Q60748845', '...` |
| 46 | Streptomyces sp. GBA 94-10 4N24 | `['http://www.wikidata.org/entity/Q60748845', '...` |
| 52 | Streptomyces sp. SPB78 | `['http://www.wikidata.org/entity/Q60748845', '...` |
| 59 | Streptomyces sp. NRRL S-1831 | `['http://www.wikidata.org/entity/Q60748845', '...` |
| 74 | Streptomyces sp. MBRL 601 | `['http://www.wikidata.org/entity/Q60748845', '...` |
| 80 | Streptomyces sp. M10 | `['http://www.wikidata.org/entity/Q60748845', '...` |
| 82 | Streptomyces sp. KE1 | `['http://www.wikidata.org/entity/Q60748845', '...` |
| 89 | Streptomyces sp. NRRL F-4711 | `['http://www.wikidata.org/entity/Q60748845', '...` |
| 90 | Streptomyces sp. NRRL F-4707 | `['http://www.wikidata.org/entity/Q60748845', '...` |
| 1953 | Streptomyces sp. NRRL S-1868 | `['http://www.wikidata.org/entity/Q60748845', '...` |


In [22]:
df = df[~df["parent_taxon"].str.contains("\[|\]")]

In [23]:
len(df)  # About half of the data is lost after all cleaning steps so far

4563

In [24]:
# I now have a dataframe containing the antibiotic resistance class / subclass, protein name, gene, species of bacterium (Wikidata) and bacterial strain
# This is the information that could be added to Wikidata
df[["class", "subclass", "product_name", "refseq_gene", "parent_taxon", "strain"]].sample(5, random_state=5)

| | class | subclass | product_name | refseq_gene | parent_taxon | strain |
|---|---|---|---|---|---|---|
| 3752 | BETA-LACTAM | CEPHALOSPORIN | inhibitor-resistant class C beta-lactamase PDC... | Pseudomonas aeruginosa 208176 blaPDC | http://www.wikidata.org/entity/Q31856 | Pseudomonas aeruginosa 208176 |
| 266 | AMINOGLYCOSIDE | KANAMYCIN/TOBRAMYCIN | aminoglycoside 6'-N-acetyltransferase AacA34 | Klebsiella pneumoniae KP-PNK-1 aacA34 | http://www.wikidata.org/entity/Q132592 | Klebsiella pneumoniae KP-PNK-1 |
| 543 | BETA-LACTAM | CEPHALOSPORIN | cephalosporin-hydrolyzing class C beta-lactama... | Enterobacter cloacae 963327 blaACT | http://www.wikidata.org/entity/Q4038096 | Enterobacter cloacae 963327 |
| 5074 | QUINOLONE | QUINOLONE | quinolone resistance pentapeptide repeat prote... | Citrobacter freundii V1 pCFV1 qnrB | http://www.wikidata.org/entity/Q5122842 | Citrobacter freundii V1 |
| 1095 | BETA-LACTAM | CEPHALOSPORIN | class C beta-lactamase CMY-124 | Citrobacter freundii DNS-2 blaCMY | http://www.wikidata.org/entity/Q5122842 | Citrobacter freundii DNS-2 |


In [25]:
# The NCBI taxonomy ID is the same in all data instances, regardless of whether it was accessed via RefSeq or GenBank
any(df["genbank_tax_id"] != df["refseq_tax_id"])

False
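One caveat on this check: `!=` also returns True where both values are missing, so `any(...)` being False additionally shows that neither column has gaps. If missing values in the same position should count as equal, `Series.equals` treats them that way. A small illustration on toy data:

```python
import numpy as np
import pandas as pd

a = pd.Series(["taxon:287", np.nan])
b = pd.Series(["taxon:287", np.nan])

print(any(a != b))  # True, because NaN != NaN
print(a.equals(b))  # True, because NaN in the same position counts as equal
```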

In [26]:
# But the NCBI taxonomy ID does not always have enough depth -- e.g. line 3664, taxon 470 -> https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi leads to Acinetobacter baumannii
# but not to the corresponding strain (16-02P46T-1), which does not exist there yet.
# Line 59, Serratia marcescens W2.3, leads to the wanted result: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi --> 1218513
# What should be done here? Expert knowledge required -- I don't feel comfortable implementing the data without the correct taxonomy ID
# I will drop rows where strain and genbank_organism2 don't match -- basically undoing the combination step further up
df[["strain", "genbank_organism2", "genbank_strain2", "genbank_tax_id"]].sample(10, random_state=1)

| | strain | genbank_organism2 | genbank_strain2 | genbank_tax_id |
|---|---|---|---|---|
| 5129 | Citrobacter braakii 107 | Citrobacter braakii | 107 | taxon:57706 |
| 2998 | Acinetobacter baumannii 16-02P46T-1 | Acinetobacter baumannii | 16-02P46T-1 | taxon:470 |
| 3013 | Acinetobacter baumannii 17A1872 | Acinetobacter baumannii | 17A1872 | taxon:470 |
| 4113 | Klebsiella pneumoniae 1409130 | Klebsiella pneumoniae | 1409130 | taxon:573 |
| 30 | Serratia marcescens W2.3 | Serratia marcescens W2.3 | W2.3 | taxon:1218513 |
| 3965 | Pseudomonas libanensis DSM 17149 | Pseudomonas libanensis | DSM 17149 | taxon:75588 |
| 865 | Acinetobacter baumannii 23A3701 | Acinetobacter baumannii | 23A3701 | taxon:470 |
| 4389 | Vibrio alginolyticus Vb1833 | Vibrio alginolyticus | Vb1833 | taxon:663 |
| 445 | Salmonella enterica subsp. enterica serovar Ty... | Salmonella enterica subsp. enterica serovar Ty... | | taxon:90371 |
| 3497 | Pseudomonas aeruginosa 163604 | Pseudomonas aeruginosa | 163604 | taxon:287 |
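Whether a taxonomy ID is specific enough could also be checked directly, because NCBI Taxonomy records carry a rank. The sketch below reads the rank from an efetch response (`db=taxonomy`, XML). The XML is a hand-written, heavily simplified excerpt for taxon 470 (the Acinetobacter baumannii ID in the table), not a real response, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hand-written, simplified excerpt of an efetch (db=taxonomy) XML response, not a real record
SAMPLE_TAXONOMY_XML = """<TaxaSet>
  <Taxon>
    <TaxId>470</TaxId>
    <ScientificName>Acinetobacter baumannii</ScientificName>
    <Rank>species</Rank>
    <LineageEx>
      <Taxon>
        <ScientificName>Acinetobacter</ScientificName>
        <Rank>genus</Rank>
      </Taxon>
    </LineageEx>
  </Taxon>
</TaxaSet>"""


def taxon_rank(xml_text: str) -> Optional[str]:
    """Return the rank of the first top-level Taxon; lineage entries are nested deeper and ignored."""
    root = ET.fromstring(xml_text)
    rank = root.find("Taxon/Rank")  # direct child of the top-level Taxon only
    return rank.text if rank is not None else None


# A species-level ID cannot identify strain 16-02P46T-1
print(taxon_rank(SAMPLE_TAXONOMY_XML))  # species
```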


In [27]:
# These are the data instances I feel comfortable including in Wikidata, because they come with the correct NCBI taxonomy ID for others to check.
# Now only 312 instances are left
df = df.loc[df.apply(lambda x: pd.notna(x["genbank_strain2"]) and str(x["genbank_strain2"]) in x["genbank_organism2"], axis=1), :]
print(len(df))
df[["strain", "genbank_organism2", "genbank_strain2", "genbank_tax_id"]].sample(10)

312


| | strain | genbank_organism2 | genbank_strain2 | genbank_tax_id |
|---|---|---|---|---|
| 248 | Macrococcus caseolyticus JCSC5402 | Macrococcus caseolyticus JCSC5402 | JCSC5402 | taxon:458233 |
| 20 | Saccharophagus degradans 2-40 | Saccharophagus degradans 2-40 | 2-40 | taxon:203122 |
| 2594 | Acinetobacter bouvetii DSM 14964 = CIP 107468 | Acinetobacter bouvetii DSM 14964 = CIP 107468 | CIP 107468 | taxon:1120925 |
| 4649 | Mycobacterium tuberculosis CCDC5180 | Mycobacterium tuberculosis CCDC5180 | CCDC5180 | taxon:443150 |
| 1022 | Vibrio parahaemolyticus S105 | Vibrio parahaemolyticus S105 | S105 | taxon:1394641 |
| 5461 | Desulfitobacterium hafniense PCP-1 | Desulfitobacterium hafniense PCP-1 | PCP-1 | taxon:1090321 |
| 4715 | Vibrio cholerae MO10 | Vibrio cholerae MO10 | MO10 | taxon:345072 |
| 2560 | Acinetobacter calcoaceticus ANC 3680 | Acinetobacter calcoaceticus ANC 3680 | ANC 3680 | taxon:1217653 |
| 627 | Acinetobacter baumannii AYE | Acinetobacter baumannii AYE | AYE | taxon:509173 |
| 2256 | Acinetobacter baumannii A424 | Acinetobacter baumannii A424 | A424 | taxon:1082934 |


In [28]:
df.to_csv("resistance_df3.csv", index=False)

## Results

After selecting and combining the best data sources (see the commented code above), a dataframe remains that contains information about the protein, its encoding gene, and the bacterial strain. To establish a connection with Wikidata, the first two words of the bacterial strain name (usually the name of the corresponding bacterial species) are used in a SPARQL query to find the associated bacterial species and its Wikidata item identifier (Q-ID). In addition to the respective names, NCBI identifiers such as the taxonomy ID are available to link each entry to the NCBI database. This connection is crucial to give users more comprehensive information and to allow experts to make improvements.

Furthermore, data instances are removed wherever details are missing or conflicting at critical points. The most extensive cleaning step, which causes the greatest data loss, follows from the observation that the NCBI Taxonomy ID found does not always refer to the bacterial strain but often to a higher taxon. This is unacceptable for the Wikidata implementation, so such instances are removed.

In the final step, the data is implemented in Wikidata, using the Pywikibot library to connect to it. First, the bacterial strain is created with its NCBI taxonomy ID, unless it is already present. Then the identified gene is created, referencing the bacterial strain. Finally, the gene encodes a protein that makes the bacterium antibiotic-resistant; this protein is also created, with a reference to the gene and to the QuickGO term "response to antibiotic" (GO:0046677).
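The upload order described above can be sketched as a list of (subject, property, value) statements per row, which a Pywikibot script would then write one by one (for example with `pywikibot.Claim` and `ItemPage.addClaim`). The property choices are assumptions to be checked against Wikidata's property documentation before use: P171 (parent taxon), P685 (NCBI taxonomy ID), P703 (found in taxon), P688 (encodes), P702 (encoded by) and P682 (biological process). Subjects are plain labels standing in for the items that would be created; the values come from row 4043 of the final table, and the gene label is illustrative.

```python
from typing import List, Tuple

# (subject, property, value); subjects are labels standing in for the Wikidata items to create
Statement = Tuple[str, str, str]

# Placeholder for the Wikidata item of GO:0046677 "response to antibiotic"
RESPONSE_TO_ANTIBIOTIC = "GO:0046677"


def statements_for_row(strain: str, tax_id: str, species_qid: str,
                       gene: str, protein: str) -> List[Statement]:
    """Hypothetical plan following the order in the text: strain, then gene, then protein."""
    numeric_tax_id = tax_id.split(":")[-1]  # "taxon:861109" -> "861109"
    return [
        (strain, "P171", species_qid),              # parent taxon: the existing species item
        (strain, "P685", numeric_tax_id),           # NCBI taxonomy ID
        (gene, "P703", strain),                     # found in taxon
        (gene, "P688", protein),                    # encodes
        (protein, "P702", gene),                    # encoded by
        (protein, "P682", RESPONSE_TO_ANTIBIOTIC),  # biological process
    ]


for statement in statements_for_row(
    strain="Sphingobium indicum B90A",
    tax_id="taxon:861109",
    species_qid="Q18392374",
    gene="Sphingobium indicum B90A blaSGM",  # illustrative label
    protein="class A beta-lactamase SGM-3",
):
    print(statement)
```

Keeping the plan as plain data makes it easy to review before any write to Wikidata happens.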

In [29]:
df[df["genbank_organism2"] == "Sphingobium indicum B90A"]

Unnamed: 0,allele,gene_family,whitelisted_taxa,product_name,scope,type,subtype,class,subclass,refseq_protein_accession,...,refseq_genome,refseq_organism,refseq_tax_id,genbank_organism,genbank_strain,genbank_organism2,genbank_strain2,genbank_tax_id,strain,parent_taxon
4043,blaSGM-3,blaSGM,,class A beta-lactamase SGM-3,core,AMR,AMR,BETA-LACTAM,BETA-LACTAM,WP_007688792.1,...,,Sphingobium indicum B90A,taxon:861109,Sphingobium indicum B90A,B90A,Sphingobium indicum B90A,B90A,taxon:861109,Sphingobium indicum B90A,http://www.wikidata.org/entity/Q18392374
4155,blaSIE-1,blaSIE,,subclass B3 metallo-beta-lactamase SIE-1,core,AMR,AMR,BETA-LACTAM,BETA-LACTAM,WP_007683232.1,...,,Sphingobium indicum B90A,taxon:861109,Sphingobium indicum B90A,B90A,Sphingobium indicum B90A,B90A,taxon:861109,Sphingobium indicum B90A,http://www.wikidata.org/entity/Q18392374


In [30]:
df.iloc[215:]

Unnamed: 0,allele,gene_family,whitelisted_taxa,product_name,scope,type,subtype,class,subclass,refseq_protein_accession,...,refseq_genome,refseq_organism,refseq_tax_id,genbank_organism,genbank_strain,genbank_organism2,genbank_strain2,genbank_tax_id,strain,parent_taxon
4043,blaSGM-3,blaSGM,,class A beta-lactamase SGM-3,core,AMR,AMR,BETA-LACTAM,BETA-LACTAM,WP_007688792.1,...,,Sphingobium indicum B90A,taxon:861109,Sphingobium indicum B90A,B90A,Sphingobium indicum B90A,B90A,taxon:861109,Sphingobium indicum B90A,http://www.wikidata.org/entity/Q18392374
4044,blaSGM-4,blaSGM,,class A beta-lactamase SGM-4,core,AMR,AMR,BETA-LACTAM,BETA-LACTAM,WP_013846594.1,...,,Sphingobium chlorophenolicum L-1,taxon:690566,Sphingobium chlorophenolicum L-1,L-1,Sphingobium chlorophenolicum L-1,L-1,taxon:690566,Sphingobium chlorophenolicum L-1,http://www.wikidata.org/entity/Q7576783
4048,blaSHD-1,blaSHD,,subclass B1 metallo-beta-lactamase SHD-1,core,AMR,AMR,BETA-LACTAM,CARBAPENEM,WP_011495276.1,...,,Shewanella denitrificans OS217,taxon:318161,Shewanella denitrificans OS217,OS217,Shewanella denitrificans OS217,OS217,taxon:318161,Shewanella denitrificans OS217,http://www.wikidata.org/entity/Q21327861
4049,blaSHN-1,blaSHN,,subclass B1 metallo-beta-lactamase SHN-1,core,AMR,AMR,BETA-LACTAM,CARBAPENEM,WP_011497575.1,...,,Shewanella denitrificans OS217,taxon:318161,Shewanella denitrificans OS217,OS217,Shewanella denitrificans OS217,OS217,taxon:318161,Shewanella denitrificans OS217,http://www.wikidata.org/entity/Q21327861
4155,blaSIE-1,blaSIE,,subclass B3 metallo-beta-lactamase SIE-1,core,AMR,AMR,BETA-LACTAM,BETA-LACTAM,WP_007683232.1,...,,Sphingobium indicum B90A,taxon:861109,Sphingobium indicum B90A,B90A,Sphingobium indicum B90A,B90A,taxon:861109,Sphingobium indicum B90A,http://www.wikidata.org/entity/Q18392374
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5562,,vanX-I,,D-Ala-D-Ala dipeptidase VanX-I,core,AMR,AMR,GLYCOPEPTIDE,VANCOMYCIN,WP_015943580.1,...,,Desulfitobacterium hafniense DCB-2,taxon:272564,Desulfitobacterium hafniense DCB-2,DCB-2,Desulfitobacterium hafniense DCB-2,DCB-2,taxon:272564,Desulfitobacterium hafniense DCB-2,http://www.wikidata.org/entity/Q3706079
5614,,vatI,,streptogramin A O-acetyltransferase Vat(I),core,AMR,AMR,STREPTOGRAMIN,STREPTOGRAMIN,WP_071218948.1,...,,Paenibacillus sp. LC231,taxon:1120679,Paenibacillus sp. LC231,LC231,Paenibacillus sp. LC231,LC231,taxon:1120679,Paenibacillus sp. LC231,http://www.wikidata.org/entity/Q26270468
5626,,vga(G),,ABC-F type ribosomal protection protein Vga(G),core,AMR,AMR,LINCOSAMIDE,LINCOSAMIDE,WP_010989615.1,...,,Listeria monocytogenes EGD-e,taxon:169963,Listeria monocytogenes EGD-e,EGD-e,Listeria monocytogenes EGD-e,EGD-e,taxon:169963,Listeria monocytogenes EGD-e,http://www.wikidata.org/entity/Q292015
5628,,vgbC,,streptogramin B lyase Vgb(C),core,AMR,AMR,STREPTOGRAMIN,STREPTOGRAMIN,WP_071219112.1,...,,Paenibacillus sp. LC231,taxon:1120679,Paenibacillus sp. LC231,LC231,Paenibacillus sp. LC231,LC231,taxon:1120679,Paenibacillus sp. LC231,http://www.wikidata.org/entity/Q26270468


In [31]:
# Add the bacterial strain to Wikidata; if no matching item exists yet, it is created

import time

import pandas as pd

from WikidataAdder import StrainWikidataAdder


def wikidata_wrapper(df_row: pd.Series) -> str | None:
    # sim=True only simulates the edits; keep it True unless you really want to write to Wikidata
    wa = StrainWikidataAdder(df_row, sim=False)
    created = wa.create_strain()
    # Return code as interpreted here: 0 = nothing to edit, non-zero = an item exists, 2 = it was just created
    if created != 0:
        wa.add_instance_of_strain()
        wa.add_taxon_name()
        wa.add_parent_taxon()
        wa.add_ncbi_taxonomy_id()
        if created == 2:
            print("wait after creation")
            # Give Wikidata time to make the new item visible to the lookup;
            # otherwise the risk of creating a duplicate item is high
            time.sleep(300)
        return wa.strain_page.getID()
    return None
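Each wrapper call talks to Wikidata over the network, and the output of the protein run further below contains a `pywikibot.exceptions.ServerError` (read timeout). A generic retry-with-backoff helper is one way to make such long runs more robust. The helper `with_retries` and its parameters are hypothetical and not part of the notebook or of Pywikibot.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], retries: int = 3, base_delay: float = 30.0,
                 exceptions: tuple[type[BaseException], ...] = (Exception,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on one of `exceptions`, wait base_delay * 2**attempt and try again."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return fn()
        except exceptions:
            if attempt == retries:
                raise  # last attempt failed: re-raise the original error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Hypothetical usage with the wrapper above:
# from pywikibot.exceptions import ServerError
# df["strain_wd_id"] = df.apply(
#     lambda row: with_retries(lambda: wikidata_wrapper(row), exceptions=(ServerError,)), axis=1)
```

Retrying is only safe if the wrapped steps are idempotent. The "claim already existent" messages suggest that existing statements are skipped, but a retry right after an item was created could still produce a duplicate, which is the same risk the 300-second wait addresses.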

In [32]:

# z = df.iloc[1:2].copy()  # uncomment to try the wrapper on a single row first
df["strain_wd_id"] = df.apply(wikidata_wrapper, axis=1)

[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124664031'}}]
description already exists
alias already exists
Q124664031
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124663344'}}]
description already exists
alias already exists
Q124663344
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q21102987'}}]
description already exists
alias already exists
Q21102987
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q21398890'}}]
description already exists
alias already exists
Q21398890
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q21398562'}}]
description already exists
alias already exists
Q21398562
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124

In [33]:
df[["genbank_start", "genbank_stop", "refseq_gene", "gene_family", "strain_wd_id"]]

Unnamed: 0,genbank_start,genbank_stop,refseq_gene,gene_family,strain_wd_id
0,7204881,7205417,Pseudomonas aeruginosa PA38182 aac(2')-I(A267),aac(2')-I(A267),Q124664031
2,1,723,Paenibacillus sp. LC231 aac(2')-IIb,aac(2')-IIb,Q124663344
4,373,918,Mycobacterium tuberculosis H37Rv aac(2')-Ic,aac(2')-Ic,Q21102987
20,2333620,2334096,Saccharophagus degradans 2-40 aac(3)-Ig,aac(3)-Ig,Q21398890
21,638262,638720,Sphingopyxis alaskensis RB2256 aac(3)-Ii,aac(3)-Ii,Q21398562
...,...,...,...,...,...
5562,1776505,1777113,Desulfitobacterium hafniense DCB-2 vanX-I,vanX-I,Q124704709
5614,1,633,Paenibacillus sp. LC231 vatI,vatI,Q124663344
5626,66031,67602,Listeria monocytogenes EGD-e vga(G),vga(G),Q21102981
5628,1,891,Paenibacillus sp. LC231 vgbC,vgbC,Q124663344


In [34]:
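# Add the gene to Wikidata; note that no Entrez Gene ID is added for these genes

import pandas as pd

from WikidataAdder import GeneWikidataAdder


def gene_wrapper(df_row: pd.Series) -> str | None:
    # Second argument presumably the simulation flag, as for StrainWikidataAdder (False = write)
    gwa = GeneWikidataAdder(df_row, False)
    success = gwa.create_gene()
    if success:
        # Statements: instance of (gene), subclass of, and found in taxon (the strain)
        gwa.add_instance_of_gene()
        gwa.add_subclass_of()
        gwa.add_found_in_taxon()
        return gwa.gene_page.getID()
    return None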

In [35]:
df["wd_gene_id"] = df.apply(gene_wrapper, axis=1)    

[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124678817'}}]
description already exists
Q124678817
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124683975'}}]
description already exists
Q124683975
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124704728'}}]
description already exists
Q124704728
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124704731'}}]
description already exists
Q124704731
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124704734'}}]
description already exists
Q124704734
claim already existent
claim alr

In [36]:
# Add the protein to Wikidata. The same protein can occur in several bacterial strains,
# which is why some proteins end up with multiple "found in taxon" values

import time

import pandas as pd

from WikidataAdder import ProteinWikidataAdder


def protein_wrapper(df_row: pd.Series) -> str | None:
    pwa = ProteinWikidataAdder(df_row, False)  # False presumably disables simulation, as above
    created = pwa.create_protein()
    if created != 0:
        pwa.add_instance_of()
        pwa.add_subclass_of()
        pwa.add_found_in_taxon()
        pwa.add_encoded_by()
        pwa.add_identifier()
        pwa.add_antibiotic_resistance()  # "response to antibiotic" (GO:0046677)
        if created == 2:
            print("wait after creation")
            time.sleep(300)  # let the new item become visible before the next lookup
        return pwa.protein_page.getID()
    return None
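Because `protein_wrapper` runs once per dataframe row, a protein that occurs in many strains is visited many times in a row (the output below shows the same item, for example Q124707638, repeatedly, each pass followed by throttling pauses). As an alternative design, not what the notebook does, the strain QIDs could first be collected per protein so that each protein item is edited once. The sketch assumes that rows describing the same protein share the same `refseq_protein_accession`; the function name is hypothetical.

```python
import pandas as pd


def strains_per_protein(df: pd.DataFrame, protein_col: str = "refseq_protein_accession",
                        strain_col: str = "strain_wd_id") -> pd.Series:
    """Map each protein accession to the sorted, de-duplicated list of strain QIDs."""
    return (df.dropna(subset=[strain_col])        # rows without a strain item cannot be linked
              .groupby(protein_col)[strain_col]
              .agg(list)                            # all strain QIDs per protein
              .map(lambda qids: sorted(set(qids))))  # drop duplicates, stable order
```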

In [37]:
df["wd_protein_id"] = df.apply(protein_wrapper, axis=1)

[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124683955'}}]
description already exists
Q124683955
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124683978'}}]
description already exists
Q124683978
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707533'}}]
description already exists
Q124707533
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707538'}}]
description already exists
Q124707538
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/

Sleeping for 9.3 seconds, 2024-03-03 20:55:13


Q124707638
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.3 seconds, 2024-03-03 20:55:23


Q124707638
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.2 seconds, 2024-03-03 20:55:33


Q124707638
claim already existent
claim already existent


Sleeping for 9.6 seconds, 2024-03-03 20:55:43
Sleeping for 9.6 seconds, 2024-03-03 20:55:53
Sleeping for 9.6 seconds, 2024-03-03 20:56:03


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.2 seconds, 2024-03-03 20:56:13


Q124707638
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 20:56:23
Sleeping for 9.4 seconds, 2024-03-03 20:56:33
Sleeping for 9.6 seconds, 2024-03-03 20:56:43


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.2 seconds, 2024-03-03 20:56:53


Q124707638
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 20:57:03
Sleeping for 9.5 seconds, 2024-03-03 20:57:13
Sleeping for 9.6 seconds, 2024-03-03 20:57:23


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.2 seconds, 2024-03-03 20:57:33


Q124707638
claim already existent
claim already existent


Sleeping for 9.6 seconds, 2024-03-03 20:57:43
Sleeping for 9.6 seconds, 2024-03-03 20:57:53
Sleeping for 9.6 seconds, 2024-03-03 20:58:03


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.2 seconds, 2024-03-03 20:58:13


Q124707638
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 20:58:23
Sleeping for 9.6 seconds, 2024-03-03 20:58:33
Sleeping for 9.6 seconds, 2024-03-03 20:58:43


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.2 seconds, 2024-03-03 20:58:53


Q124707638
claim already existent
claim already existent


Sleeping for 9.6 seconds, 2024-03-03 20:59:03
Sleeping for 9.6 seconds, 2024-03-03 20:59:13
Sleeping for 9.6 seconds, 2024-03-03 20:59:23


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.2 seconds, 2024-03-03 20:59:33


Q124707638
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 20:59:43
Sleeping for 9.6 seconds, 2024-03-03 20:59:53
Sleeping for 9.6 seconds, 2024-03-03 21:00:03


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.1 seconds, 2024-03-03 21:00:14


Q124707638
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 21:00:23
Sleeping for 9.5 seconds, 2024-03-03 21:00:33
Sleeping for 9.6 seconds, 2024-03-03 21:00:43


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.1 seconds, 2024-03-03 21:00:54


Q124707638
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 21:01:03
Sleeping for 9.5 seconds, 2024-03-03 21:01:13
Sleeping for 9.6 seconds, 2024-03-03 21:01:23


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707694'}}]
description already exists
Q124707694
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707698'}}]
description already exists
Q124707698
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707700'}}]
description already exists
Q124707700
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707703'}}]
description already exists
Q124707703
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'h

Sleeping for 6.3 seconds, 2024-03-03 21:01:37


Q124707716
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 21:01:43
Sleeping for 9.4 seconds, 2024-03-03 21:01:53
Sleeping for 9.6 seconds, 2024-03-03 21:02:03


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707716'}}]


Sleeping for 9.3 seconds, 2024-03-03 21:02:14


Q124707716
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 21:02:23
Sleeping for 9.6 seconds, 2024-03-03 21:02:33
Sleeping for 9.6 seconds, 2024-03-03 21:02:43


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707716'}}]


Sleeping for 9.2 seconds, 2024-03-03 21:02:54


Q124707716
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707716'}}]


Sleeping for 9.3 seconds, 2024-03-03 21:03:04


Q124707716
claim already existent
claim already existent


Sleeping for 9.6 seconds, 2024-03-03 21:03:13
Sleeping for 9.5 seconds, 2024-03-03 21:03:23
Sleeping for 9.5 seconds, 2024-03-03 21:03:33


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.2 seconds, 2024-03-03 21:03:44


Q124707638
claim already existent
claim already existent


Sleeping for 9.4 seconds, 2024-03-03 21:03:54
Sleeping for 9.6 seconds, 2024-03-03 21:04:03
Sleeping for 9.5 seconds, 2024-03-03 21:04:13


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.1 seconds, 2024-03-03 21:04:24


Q124707638
claim already existent
claim already existent


Sleeping for 9.4 seconds, 2024-03-03 21:04:34
Sleeping for 9.6 seconds, 2024-03-03 21:04:43
Sleeping for 9.5 seconds, 2024-03-03 21:04:53


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.2 seconds, 2024-03-03 21:05:04


Q124707638
claim already existent
claim already existent


Sleeping for 9.4 seconds, 2024-03-03 21:05:14
Sleeping for 9.6 seconds, 2024-03-03 21:05:23
Sleeping for 9.5 seconds, 2024-03-03 21:05:34


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.0 seconds, 2024-03-03 21:05:44


Q124707638
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 21:05:54
Sleeping for 9.5 seconds, 2024-03-03 21:06:04
Sleeping for 9.5 seconds, 2024-03-03 21:06:14


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.1 seconds, 2024-03-03 21:06:24


Q124707638
claim already existent
claim already existent


Sleeping for 9.4 seconds, 2024-03-03 21:06:34
Sleeping for 9.6 seconds, 2024-03-03 21:06:44
Sleeping for 9.5 seconds, 2024-03-03 21:06:54


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.0 seconds, 2024-03-03 21:07:04


Q124707638
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 21:07:14
Sleeping for 9.6 seconds, 2024-03-03 21:07:24
Sleeping for 9.4 seconds, 2024-03-03 21:07:34


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.1 seconds, 2024-03-03 21:07:44


Q124707638
claim already existent
claim already existent


Sleeping for 9.4 seconds, 2024-03-03 21:07:54
Sleeping for 9.6 seconds, 2024-03-03 21:08:04
Sleeping for 9.6 seconds, 2024-03-03 21:08:14


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.1 seconds, 2024-03-03 21:08:24


Q124707638
claim already existent
claim already existent


Sleeping for 9.4 seconds, 2024-03-03 21:08:34
Sleeping for 9.6 seconds, 2024-03-03 21:08:44
Sleeping for 9.5 seconds, 2024-03-03 21:08:54


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.1 seconds, 2024-03-03 21:09:04


Q124707638
claim already existent
claim already existent


Sleeping for 9.4 seconds, 2024-03-03 21:09:14
Sleeping for 9.5 seconds, 2024-03-03 21:09:24
Sleeping for 9.5 seconds, 2024-03-03 21:09:34


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.2 seconds, 2024-03-03 21:09:44


Q124707638
claim already existent
claim already existent


Sleeping for 9.4 seconds, 2024-03-03 21:09:54
Sleeping for 9.6 seconds, 2024-03-03 21:10:04
Sleeping for 9.2 seconds, 2024-03-03 21:10:14


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.1 seconds, 2024-03-03 21:10:24


Q124707638
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 21:10:34
Sleeping for 9.5 seconds, 2024-03-03 21:10:44
Sleeping for 9.5 seconds, 2024-03-03 21:10:54


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.0 seconds, 2024-03-03 21:11:04


Q124707638
claim already existent
claim already existent


Sleeping for 9.4 seconds, 2024-03-03 21:11:14
Sleeping for 9.6 seconds, 2024-03-03 21:11:24
Sleeping for 9.5 seconds, 2024-03-03 21:11:34


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.1 seconds, 2024-03-03 21:11:44


Q124707638
claim already existent
claim already existent


Sleeping for 9.4 seconds, 2024-03-03 21:11:54
Sleeping for 9.5 seconds, 2024-03-03 21:12:04
Sleeping for 9.4 seconds, 2024-03-03 21:12:14


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.1 seconds, 2024-03-03 21:12:24


Q124707638
claim already existent
claim already existent


Sleeping for 9.4 seconds, 2024-03-03 21:12:34
Sleeping for 9.6 seconds, 2024-03-03 21:12:44
Sleeping for 9.4 seconds, 2024-03-03 21:12:54


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.1 seconds, 2024-03-03 21:13:04


Q124707638
claim already existent
claim already existent


Sleeping for 9.4 seconds, 2024-03-03 21:13:14
Sleeping for 9.6 seconds, 2024-03-03 21:13:24
Sleeping for 9.5 seconds, 2024-03-03 21:13:34


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707638'}}]


Sleeping for 9.1 seconds, 2024-03-03 21:13:44


Q124707638
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 21:13:54
Sleeping for 9.5 seconds, 2024-03-03 21:14:04
Sleeping for 9.4 seconds, 2024-03-03 21:14:14


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707729'}}]
description already exists
Q124707729
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707734'}}]
description already exists
Q124707734
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707734'}}]


Sleeping for 8.2 seconds, 2024-03-03 21:14:25


Q124707734
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707734'}}]


Sleeping for 8.9 seconds, 2024-03-03 21:14:35


Q124707734
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707743'}}]
description already exists
Q124707743
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707745'}}]
description already exists
Q124707745
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707745'}}]


Sleeping for 8.4 seconds, 2024-03-03 21:14:45


Q124707745
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707752'}}]
description already exists
Q124707752
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707755'}}]
description already exists
Q124707755
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707756'}}]
description already exists
Q124707756
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707757'}}]
description already exists
Q124707757
claim already existent
claim already existent
cla

Sleeping for 7.5 seconds, 2024-03-03 21:14:56


Q124707757
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 21:15:04
Sleeping for 9.5 seconds, 2024-03-03 21:15:14
Sleeping for 9.6 seconds, 2024-03-03 21:15:24


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707761'}}]
description already exists
Q124707761
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707764'}}]
description already exists
Q124707764
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707764'}}]


Sleeping for 8.3 seconds, 2024-03-03 21:15:35


Q124707764
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 21:15:44
Sleeping for 9.3 seconds, 2024-03-03 21:15:54
Sleeping for 9.6 seconds, 2024-03-03 21:16:04


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707764'}}]


Sleeping for 9.2 seconds, 2024-03-03 21:16:14


Q124707764
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 21:16:24
Sleeping for 9.6 seconds, 2024-03-03 21:16:34
Sleeping for 9.5 seconds, 2024-03-03 21:16:44


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707764'}}]


Sleeping for 9.2 seconds, 2024-03-03 21:16:54


Q124707764
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 21:17:04
Sleeping for 9.6 seconds, 2024-03-03 21:17:14
Sleeping for 9.5 seconds, 2024-03-03 21:17:24


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707764'}}]


Sleeping for 9.2 seconds, 2024-03-03 21:17:35


Q124707764
claim already existent
claim already existent


Sleeping for 9.6 seconds, 2024-03-03 21:17:44
Sleeping for 9.3 seconds, 2024-03-03 21:17:54
Sleeping for 9.5 seconds, 2024-03-03 21:18:04
ERROR: Traceback (most recent call last):
  File "/home/finn/Documents/python-venvs/lodkg/lib/python3.10/site-packages/pywikibot/data/api/_requests.py", line 682, in _http_request
    response = http.request(self.site, uri=uri,
  File "/home/finn/Documents/python-venvs/lodkg/lib/python3.10/site-packages/pywikibot/comms/http.py", line 283, in request
    r = fetch(baseuri, headers=headers, **kwargs)
  File "/home/finn/Documents/python-venvs/lodkg/lib/python3.10/site-packages/pywikibot/comms/http.py", line 457, in fetch
    callback(response)
  File "/home/finn/Documents/python-venvs/lodkg/lib/python3.10/site-packages/pywikibot/comms/http.py", line 333, in error_handling_callback
    raise ServerError(response)
pywikibot.exceptions.ServerError: HTTPSConnectionPool(host='www.wikidata.org', port=443): Read timed out. (read timeout=45)



claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707791'}}]
description already exists
Q124707791
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707791'}}]


Sleeping for 7.9 seconds, 2024-03-03 21:35:41


Q124707791
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 21:35:49
Sleeping for 9.4 seconds, 2024-03-03 21:35:59
Sleeping for 9.6 seconds, 2024-03-03 21:36:09


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707791'}}]


Sleeping for 9.2 seconds, 2024-03-03 21:36:20


Q124707791
claim already existent
claim already existent


Sleeping for 9.6 seconds, 2024-03-03 21:36:29
Sleeping for 9.6 seconds, 2024-03-03 21:36:39
Sleeping for 9.6 seconds, 2024-03-03 21:36:49


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707800'}}]
description already exists
Q124707800
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707800'}}]


Sleeping for 8.2 seconds, 2024-03-03 21:37:01


Q124707800
claim already existent
claim already existent


Sleeping for 9.5 seconds, 2024-03-03 21:37:09
Sleeping for 9.5 seconds, 2024-03-03 21:37:19
Sleeping for 9.6 seconds, 2024-03-03 21:37:29


claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707810'}}]
description already exists
Q124707810
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707822'}}]
description already exists
Q124707822
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707827'}}]
description already exists
Q124707827
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124707829'}}]
description already exists
Q124707829
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'h

Sleeping for 9.2 seconds, 2024-03-03 21:38:43


Q124717549
claim already existent
claim already existent
claim already existent
claim already existent
claim already existent
[{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q124717549'}}]


Sleeping for 9.3 seconds, 2024-03-03 21:38:53


Q124717549
claim already existent
claim already existent
claim already existent
In [38]:
# Once the proteins have been added, each gene can be linked to its protein via the "encodes" property

def add_encodes_to_gene(df_row: pd.Series) -> None:
    """Link the gene item of this row to the protein it encodes."""
    gwa = GeneWikidataAdder(df_row, False)
    gwa.encodes_protein(df_row["wd_gene_id"])
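The implementation of `GeneWikidataAdder` is not shown in this section. As a hypothetical illustration of the linking step, the sketch below models items as plain dictionaries and adds an "encodes" statement (assumed here to be Wikidata property P688) from a gene to its protein only if that statement is not already present, so the step can be re-run safely. The names `Item` and `link_gene_to_protein` are illustrative and not part of the project code.

```python
from dataclasses import dataclass, field

# Assumption: P688 is the Wikidata property "encodes" (gene -> protein).
ENCODES = "P688"


@dataclass
class Item:
    """Hypothetical in-memory stand-in for a Wikidata item."""
    qid: str
    claims: dict[str, list[str]] = field(default_factory=dict)


def link_gene_to_protein(gene: Item, protein_qid: str) -> bool:
    """Add 'gene encodes protein' unless that claim already exists.

    Returns True if a new claim was added, False if it was already present.
    """
    targets = gene.claims.setdefault(ENCODES, [])
    if protein_qid in targets:
        return False  # claim already present; nothing to write
    targets.append(protein_qid)
    return True


# Gene and protein QIDs taken from the example discussed in the conclusion.
gene = Item("Q124707459")
print(link_gene_to_protein(gene, "Q124720636"))  # True: new claim added
print(link_gene_to_protein(gene, "Q124720636"))  # False: already present
```

Checking for an existing value before writing keeps repeated runs from creating duplicate statements.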

In [39]:
df.apply(add_encodes_to_gene, axis=1)

0       None
2       None
4       None
20      None
21      None
        ... 
5562    None
5614    None
5626    None
5628    None
5630    None
Length: 312, dtype: object

## Discussion and Conclusion

In conclusion, data was successfully queried from two databases (GenBank and RefSeq). However, because some entries are incomplete or inconsistent (for example, they reference only higher taxa, use abbreviations that make the bacterial species ambiguous, or list different names in GenBank and RefSeq), the original dataset of approximately 10,000 entries was reduced to a more manageable set of roughly 300 entries that can be implemented reliably. Most of the retrieved data was newly entered into Wikidata, while some items were already present. The dataframe below shows the Wikidata IDs (existing or newly created) of the bacterial strain, the gene, and the encoded protein. For example, the bacterial strain "Campylobacter jejuni subsp. jejuni 81-176" (Q21382941, already present in Wikidata) has the gene "Campylobacter jejuni subsp. jejuni 81-176 pTet tet(O)" (Q124707459, newly entered into Wikidata), which encodes the protein "tetracycline resistance ribosomal protection protein Tet(O)" (Q124720636, newly entered into Wikidata).
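The reduction from roughly 10,000 to about 300 entries can be illustrated with a simple filter. The sketch below is a hypothetical reconstruction, not the project's cleaning code: it assumes each record carries the organism name as reported by GenBank and by RefSeq, and it drops records where a name is missing, where only a genus (a higher taxon) is given, where the genus is abbreviated (e.g. "E. coli"), or where the two databases disagree. The `Record` type and its field names are assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches an abbreviated genus such as "E. coli" (capital letter, dot, space).
ABBREVIATED_GENUS = re.compile(r"^[A-Z]\.\s")


@dataclass
class Record:
    """Hypothetical record with the organism name as reported by each database."""
    genbank_name: Optional[str]
    refseq_name: Optional[str]


def is_usable(rec: Record) -> bool:
    """Return True if the record maps unambiguously to one species or strain."""
    names = (rec.genbank_name, rec.refseq_name)
    if any(not name for name in names):
        return False  # incomplete entry
    gb, rs = (name.strip() for name in names)
    if len(gb.split()) < 2:
        return False  # only a genus or higher taxon is given
    if ABBREVIATED_GENUS.match(gb) or ABBREVIATED_GENUS.match(rs):
        return False  # abbreviation makes the species ambiguous
    return gb == rs  # GenBank and RefSeq must agree on the name


records = [
    Record("Campylobacter jejuni subsp. jejuni 81-176",
           "Campylobacter jejuni subsp. jejuni 81-176"),
    Record("Escherichia", "Escherichia"),            # genus only
    Record("E. coli K-12", "Escherichia coli K-12"),  # abbreviation and mismatch
    Record("Klebsiella pneumoniae", None),            # missing RefSeq name
]
usable = [r for r in records if is_usable(r)]
print(len(usable))  # 1
```

A stricter rule set discards more data but keeps every remaining entry unambiguous, which matches the trade-off described above.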

The selected entries are modeled on existing bacterium, gene, and protein items in Wikidata, with the additional annotation "response to antibiotic". However, unlike many genes already present in Wikidata, no Entrez Gene ID could be found for these genes. Despite this missing link, the genes were added to Wikidata in the hope that users will contribute this information in the future.
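To make "modeled on existing gene items" concrete, the sketch below lists properties a gene item might be expected to carry and reports which ones a new item lacks. The property IDs are standard Wikidata properties (P31 instance of, P703 found in taxon, P688 encodes, P682 biological process, P351 Entrez Gene ID), but treating this set as required, and linking the strain via P703, are assumptions for illustration. The QID for "response to antibiotic" is left as a placeholder.

```python
# Assumed set of properties that existing gene items typically carry.
# Treating them as "required" is an illustrative assumption, not a Wikidata rule.
GENE_MODEL = {
    "P31": "instance of",
    "P703": "found in taxon",
    "P688": "encodes",
    "P682": "biological process",  # e.g. "response to antibiotic"
    "P351": "Entrez Gene ID",
}


def missing_properties(claims: dict[str, list[str]]) -> list[str]:
    """Return labels of model properties that have no value on the item."""
    return [label for pid, label in GENE_MODEL.items() if not claims.get(pid)]


# Hypothetical new gene item: linked to strain and protein, but no Entrez Gene ID.
new_gene = {
    "P31": ["Q7187"],        # gene
    "P703": ["Q21382941"],   # strain (assumed link via "found in taxon")
    "P688": ["Q124720636"],  # encoded protein
    "P682": ["<response to antibiotic QID>"],  # placeholder, QID not looked up
}
print(missing_properties(new_gene))  # ['Entrez Gene ID']
```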

In summary, only a small percentage of all antibiotic-resistant bacteria known to NCBI (roughly 300 of approximately 10,000 entries) could be implemented in Wikidata. The main obstacles to adding further data are incomplete or non-standardized entries in the GenBank and RefSeq databases and the lack of expert domain knowledge.

In [40]:
df[["strain_wd_id", "wd_gene_id", "wd_protein_id"]].sample(10)

|      | strain_wd_id | wd_gene_id | wd_protein_id |
|-----:|--------------|------------|---------------|
| 5325 | Q21382941    | Q124707459 | Q124720636    |
| 641  | Q124691337   | Q124705134 | Q124711886    |
| 5214 | Q124695818   | Q124707428 | Q124719919    |
| 5194 | Q21398305    | Q124707421 | Q124719885    |
| 2940 | Q124691944   | Q124705925 | Q124714396    |
| 4438 | Q21382032    | Q124707110 | Q124719010    |
| 414  | Q21102994    | Q124705024 | Q124707800    |
| 467  | Q21398129    | Q124705073 | Q124707864    |
| 192  | Q124691052   | Q124704923 | Q124707729    |
| 259  | Q124691090   | Q124704946 | Q124707745    |
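A sampled row can be checked against the live knowledge graph with a SPARQL ASK query on the Wikidata Query Service. The sketch below only builds the query string and parses entity URIs of the form `http://www.wikidata.org/entity/Q<number>` from SPARQL JSON result bindings back into QIDs; it assumes that "encodes" is property P688 and does not perform the HTTP request. The function names are hypothetical.

```python
import re

ENTITY_PREFIX = "http://www.wikidata.org/entity/"
QID_PATTERN = re.compile(r"^Q[1-9][0-9]*$")


def encodes_ask_query(gene_qid: str, protein_qid: str) -> str:
    """Build a SPARQL ASK query testing 'gene encodes protein' (assumed P688)."""
    for qid in (gene_qid, protein_qid):
        if not QID_PATTERN.match(qid):
            raise ValueError(f"not a QID: {qid!r}")
    return f"ASK {{ wd:{gene_qid} wdt:P688 wd:{protein_qid} . }}"


def qids_from_bindings(bindings: list[dict], var: str = "item") -> list[str]:
    """Extract QIDs from SPARQL JSON result bindings for variable `var`."""
    qids = []
    for row in bindings:
        uri = row[var]["value"]
        if uri.startswith(ENTITY_PREFIX):
            qids.append(uri[len(ENTITY_PREFIX):])
    return qids


# First sampled row: gene Q124707459 should encode protein Q124720636.
print(encodes_ask_query("Q124707459", "Q124720636"))
# ASK { wd:Q124707459 wdt:P688 wd:Q124720636 . }
bindings = [{"item": {"type": "uri", "value": ENTITY_PREFIX + "Q124720636"}}]
print(qids_from_bindings(bindings))  # ['Q124720636']
```

Validating the QIDs before interpolating them keeps malformed IDs from producing a broken or unintended query.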


## Link to GitHub


https://github.com/gjmm07/DS_LOD_and_Knowledge_Graphs_2023_Finn_Heydemann

## Literature

[1] Salam et al.: Antimicrobial Resistance: A Growing Serious Threat for Global Public Health, 2023

[2] National Center for Biotechnology Information: National Database of Antibiotic Resistant Organisms (NDARO), URL: https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/, last accessed: 26.02.2024