<a href="https://colab.research.google.com/github/nafis-momeni/BioRED_LLM/blob/main/Biored_gemini.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# from google.colab import drive
# drive.mount('/gdrive')

In [2]:
!pip install -U -q google-generativeai

In [3]:
# Install the client library and import necessary modules.
import google.generativeai as genai
import time
import json
import mimetypes
import pathlib
import pprint
import requests
import random

import IPython.display
from IPython.display import Markdown

In [4]:
with open('/content/new_test.json') as file:
    new_test = json.load(file)
with open('/content/new_dev.json') as file:
    new_dev = json.load(file)
with open('/content/new_train.json') as file:
    new_train = json.load(file)

In [5]:
from google.colab import userdata

API_KEY=userdata.get('GOOGLE_API_KEY')

In [6]:
# Configure the client library by providing your API key.
genai.configure(api_key=API_KEY)

In [7]:
# Set up the model
generation_config = {
  "temperature": 0,
  "top_p": 1,
  "top_k": 1,
  "max_output_tokens": 2048,
}


safety_settings = [
  {
    "category": "HARM_CATEGORY_HARASSMENT",
    "threshold": "BLOCK_NONE"
  },
  {
    "category": "HARM_CATEGORY_HATE_SPEECH",
    "threshold": "BLOCK_NONE"
  },
  {
    "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "threshold": "BLOCK_NONE"
  },
  {
    "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
    "threshold": "BLOCK_NONE"
  }
]

model = genai.GenerativeModel(model_name="gemini-pro",
                              generation_config=generation_config,
                              safety_settings=safety_settings)



In [8]:
# @title prompt one
init_prompt = '''
You are an AI assistant specializing in extracting biomedical relations from scientific articles. Your task is to identify relevant relations between biomedical entities in the given text and output them in a structured format.

Input Format:
1. Title of the article
2. Abstract text
3. List of entities in the format: [Entity Names], Entity ID, Entity Type

Entity Types:
- GeneOrGeneProduct: Genes, proteins, mRNA, and other gene products (NCBI Gene ID)
- ChemicalEntity: Chemicals and drugs (MeSH ID)
- DiseaseOrPhenotypicFeature: Diseases, symptoms, and phenotypes (MeSH/OMIM ID)
- SequenceVariant: Genomic and protein variants (dbSNP ID or component representation)
- OrganismTaxon: Species names (NCBI Taxonomy ID)
- CellLine: Cell line names (Cellosaurus ID)

Relation Types:
- Association: Relation between two entities where the association is unclear
- Comparison: Comparison of effects or properties of two chemicals/drugs
- Conversion: Transformation of one chemical into another
- Cotreatment: Use of two or more chemicals/drugs as combination therapy
- Negative_Correlation: Inverse or opposing effect between two entities
- Positive_Correlation: Direct or reinforcing effect between two entities
- Bind: Physical interaction or binding between two entities
- Drug_Interaction: Pharmacological interaction between two co-administered drugs

Novelty:
- Novel: Relation related to the main point or novelty of the abstract
- No: Relation providing background information or context

Output Format:
Relation Type, Entity 1 ID, Entity 2 ID, Novelty

Guidelines:
- Extract all relevant relations between the given entities from the abstract text
- Ensure the output follows the specified format exactly
- Each entiy pairs has only one relation. there should not be two relations with the same entities in results.

'''

In [None]:
# @title comment
ex1 = '''
Example Input:
The differential effects of bupivacaine and lidocaine on prostaglandin E2 release, cyclooxygenase gene expression and pain in a clinical pain model.
BACKGROUND: In addition to blocking nociceptive input from surgical sites, long-acting local anesthetics might directly modulate inflammation. In the present study, we describe the proinflammatory effects of bupivacaine on local prostaglandin E2 (PGE2) production and cyclooxygenase (COX) gene expression that increases postoperative pain in human subjects. METHODS: Subjects (n = 114) undergoing extraction of impacted third molars received either 2% lidocaine or 0.5% bupivacaine before surgery and either rofecoxib 50 mg or placebo orally 90 min before surgery and for the following 48 h. Oral mucosal biopsies were taken before surgery and 48 h after surgery. After extraction, a microdialysis probe was placed at the surgical site for PGE2 and thromboxane B2 (TXB2) measurements. RESULTS: The bupivacaine/rofecoxib group reported significantly less pain, as assessed by a visual analog scale, compared with the other three treatment groups over the first 4 h. However, the bupivacaine/placebo group reported significantly more pain at 24 h and PGE2 levels during the first 4 h were significantly higher than the other three treatment groups. Moreover, bupivacaine significantly increased COX-2 gene expression at 48 h as compared with the lidocaine/placebo group. Thromboxane levels were not significantly affected by any of the treatments, indicating that the effects seen were attributable to inhibition of COX-2, but not COX-1. CONCLUSIONS: These results suggest that bupivacaine stimulates COX-2 gene expression after tissue injury, which is associated with higher PGE2 production and pain after the local anesthetic effect dissipates.
[bupivacaine],D002045,ChemicalEntity
[lidocaine],D008012,ChemicalEntity
[prostaglandin E2/ PGE2],5732,GeneOrGeneProduct
[cyclooxygenase/ COX],4512,4513,GeneOrGeneProduct
[pain],D010146,DiseaseOrPhenotypicFeature
[inflammation],D007249,DiseaseOrPhenotypicFeature
[postoperative pain],D010149,DiseaseOrPhenotypicFeature
[human],9606,OrganismTaxon
[rofecoxib],C116926,ChemicalEntity
[thromboxane B2/ TXB2],D013929,ChemicalEntity
[COX-2],4513,GeneOrGeneProduct
[Thromboxane],D013931,ChemicalEntity
[COX-1],4512,GeneOrGeneProduct
[tissue injury],D017695,DiseaseOrPhenotypicFeature

Example Output:
Association,4512,D010149,Novel
Negative_Correlation,D013931,4513,Novel
Association,4513,D010149,Novel
Positive_Correlation,4513,5732,Novel
Association,4513,D010146,Novel
Positive_Correlation,5732,D010149,Novel
Negative_Correlation,C116926,D010146,Novel
Positive_Correlation,D002045,D010149,Novel
Association,D002045,4512,Novel
Association,D002045,D017695,Novel
Positive_Correlation,D002045,4513,Novel
Positive_Correlation,D002045,5732,Novel
Positive_Correlation,D002045,D010146,Novel
Cotreatment,D002045,C116926,Novel
Comparison,D002045,D008012,Novel

'''

In [None]:
# @title prompt two
init_prompt_pair = '''You are an AI assistant specializing in extracting biomedical relations from scientific articles. Your task is to identify relevant relations between biomedical entities in the given text and output them in a structured format.

Input Format:
1. Title of the article
2. Abstract text
3. List of entities in the format: [Entity Names], Entity ID, Entity Type
4. list of entity pairs with relation in article: Entity 1 ID, Entity 2 ID

Entity Types:
- GeneOrGeneProduct: Genes, proteins, mRNA, and other gene products (NCBI Gene ID)
- ChemicalEntity: Chemicals and drugs (MeSH ID)
- DiseaseOrPhenotypicFeature: Diseases, symptoms, and phenotypes (MeSH/OMIM ID)
- SequenceVariant: Genomic and protein variants (dbSNP ID or component representation)
- OrganismTaxon: Species names (NCBI Taxonomy ID)
- CellLine: Cell line names (Cellosaurus ID)

Relation Types:
- Association: Relation between two entities where the association is unclear
- Comparison: Comparison of effects or properties of two chemicals/drugs
- Conversion: Transformation of one chemical into another
- Cotreatment: Use of two or more chemicals/drugs as combination therapy
- Negative_Correlation: Inverse or opposing effect between two entities
- Positive_Correlation: Direct or reinforcing effect between two entities
- Bind: Physical interaction or binding between two entities
- Drug_Interaction: Pharmacological interaction between two co-administered drugs

Novelty:
- Novel: Relation related to the main point or novelty of the abstract
- No: Relation providing background information or context

Output Format:
Relation Type, Entity 1 ID, Entity 2 ID, Novelty'''

In [None]:
# @title pair
from collections import defaultdict

pairs_data = defaultdict(list)
dup =0
with open('' , 'r') as file:
  content = file.read()
  results = content.split('\n\n')
  for result in results:
    lines = result.strip().split('\n')
    pmid = lines[0].split(':')[1].strip(' ')
    for line in lines[1:]:
      parts = line.split(',')
      relation = tuple(parts)
      if relation not in pairs_data[pmid]:
        pairs_data[pmid].append(relation)
      else:
        # print("duplication: "+ line + '\n')
        dup+=1
print(dup)

In [9]:
# @title Utils


def add_exaples_prompt(init_prompt):
  exs = create_examples(5, new_train, False)
  sec_prompt = init_prompt
  sec_prompt += "examples of inputs and outputs:" + '\n'
  for e in exs:
    sec_prompt += e +'\n'
  return sec_prompt


def prompt_model(i):
  global Tokens
  global RDP

  pmid = new_test[str(i)]["pmid"]
  print(i, "." , pmid)
  prompt = add_exaples_prompt(init_prompt)
  prompt += "produce similar output for this article:" + '\n'
  prompt += inputs[i]
  # prompt += "identify the entity pairs with relations in this article:" + '\n'
  # prompt += create_input_pair(new_test[str(i)])
  try:
    response = model.generate_content(prompt)
    Tokens += len(prompt.split())
    RDP+=1
  except:
    print("no response try again")
    response = model.generate_content(prompt)
    Tokens += len(prompt.split())
    RDP+=1
  return response.text

def correct_format(result):
  try:
    first_par = result.split('\n')[0].split(',')[0]
    if first_par in types:
      return True
    else:
      return False
  except:
    return False

def create_inputs_list():
  inputs = []
  for i in range(len(new_test)):
    example = ""
    doc = new_test[str(i)]
    ents = ""
    for e in doc["entities"]:
      ents += '[' + "/ ".join(e["names"]) + ']' + "," + e["id"]+ ","  + e["type"] + "\n"

    # example += "pmid:" + doc["pmid"] +"\n"
    example += doc["title"] +"\n"
    example += doc["article"] +"\n"
    example +=  ents +"\n"
    i += 1
    inputs.append(example)
  return inputs

def create_input_pair(doc):
  pmid = doc["pmid"]
  input = ""
  ents = ""
  for e in doc["entities"]:
    ents += '[' + "/ ".join(e["names"]) + ']' + "," + e["id"]+ ","  + e["type"] + "\n"
  pairs = ""
  for p in pairs_data[pmid]:
    pairs+= p + '\n'
  # example += "pmid:" + doc["pmid"] +"\n"
  input += doc["title"] +"\n"
  input += doc["article"] +"\n"
  input +=  ents +"\n\n"
  input + "entity pairs:\n"
  input += pairs
  return input


In [None]:
# @title example pair
def create_examples_pairs(shots, filename, rand):
  set_5 = [399,185,319, 339,196]
  set_10 = set_5 + [263,288, 68, 359, 78]
  set_15 = set_10 + [22, 368, 367, 373, 51]
  zero_docs = [128,169,205,315,323, 363]
  if rand:
    docs = rand_set(shots, filename,zero_docs)
  else:
    match shots:
      case 5:
        docs = set_5
      case 10:
        docs = set_10
      case 20:
        excluded_values = set_15 + zero_docs
        docs = set_15 + rand_set(5, filename,excluded_values)

  examples=[]

  for i in docs:
    example = ""
    doc = filename[str(i)]
    ents = ""
    for e in doc["entities"]:
      ents += '[' + "/ ".join(e["names"]) + ']' + "," + e["id"]+ ","  + e["type"] + "\n"
    pairs = ""
    for r in doc["relation"]:
      pairs +=  r["infons"]["entity1"]+ "," + r["infons"]["entity2"] + "\n"
    rels = ""
    for r in doc["relation"]:
      rels += r["infons"]["type"]+ "," + r["infons"]["entity1"]+ "," + r["infons"]["entity2"]+ "," + r["infons"]["novel"] + "\n"
      # rels += entity_id_to_name(doc, r)

    # example += "pmid:" + doc["pmid"] +"\n"
    example += doc["title"] +"\n"
    example += doc["article"] +"\n"
    example +=  ents +"\n\n"
    example += pairs + '\n\n'
    example += rels +"\n\n"

    examples.append(example)
  return examples

In [17]:
types = ['Association', 'Comparison', 'Conversion', 'Cotreatment', 'Negative_Correlation', 'Positive_Correlation', 'Bind', 'Drug_Interaction']
doc , RDP, Tokens =0,0,0
inputs = create_inputs_list()

def rand_set(shots, file_name, excluded_values):
  all_numbers = set(range(0, len(file_name)-1))
  valid_numbers = all_numbers - set(excluded_values)
  docs = random.sample(list(valid_numbers), shots)
  return docs

def create_examples(shots, filename, rand):
  # set_5 = [399,185,319, 339,196]
  # set_10 = set_5 + [263,288, 68, 359, 78]
  # set_15 = set_10 + [22, 368, 367, 373, 51]
  zero_docs = [128,169,205,315,323]
  # if rand:
  #   docs = rand_set(shots, filename,zero_docs)
  # else:
  #   match shots:
  #     case 5:
  #       docs = set_5
  #     case 10:
  #       docs = set_10
  #     case 20:
  #       excluded_values = set_15 + zero_docs
  #       docs = set_15 + rand_set(5, filename,excluded_values)

  # docs = [22, 399, 263, 359, 196]
  # docs = [355, 185, 373, 288, 263, 339, 196, 363, 359, 399]
  # docs = [196, 363, 399, 56, 95] # comb 38
  # docs = [279, 46, 358, 196, 201, 76, 320, 225, 56, 95]

  docs= [355, 185, 373, 288, 263, 339, 196, 363, 359, 399]

  examples=[]

  for i in docs:
    example = ""
    doc = filename[str(i)]
    ents = ""
    for e in doc["entities"]:
      ents += '[' + "/ ".join(e["names"]) + ']' + "," + e["id"]+ ","  + e["type"] + "\n"
    rels = ""
    for r in doc["relation"]:
      rels += r["infons"]["type"]+ "," + r["infons"]["entity1"]+ "," + r["infons"]["entity2"]+ "," + r["infons"]["novel"] + "\n"
      # rels += entity_id_to_name(doc, r)

    # example += "pmid:" + doc["pmid"] +"\n"
    example += doc["title"] +"\n"
    example += doc["article"] +"\n"
    example += "\n" + ents +"\n"
    example += rels +"\n\n"

    examples.append(example)
  return examples

In [None]:
# @title sys
sys_content1 = '''You are an AI assistant specializing in extracting biomedical relations from scientific articles. Your task is to identify the biomedical entity pairs with relations in the given text and output them in a structured format.

Input Format:
1. Title of the article
2. Abstract text
3. List of entities in the format: [Entity Names], Entity ID, Entity Type

Entity Types:
- Gene: Genes, proteins, mRNA, and other gene products (NCBI Gene ID)
- Chemical: Chemicals and drugs (MeSH ID)
- Disease: Diseases, symptoms, and phenotypes (MeSH/OMIM ID)
- Variant: Genomic and protein variants (dbSNP ID or component representation)
- Species: Species names (NCBI Taxonomy ID)
- CellLine: Cell line names (Cellosaurus ID)

Output Format:
Entity 1 ID, Entity 2 ID
'''
sys_content = '''You are an AI assistant specializing in extracting biomedical relations from scientific articles. Your task is to identify relevant relations between biomedical entities in the given text and output them in a structured format.

Input Format:
1. Title of the article
2. Abstract text
3. List of entities in the format: [Entity Names], Entity ID, Entity Type

Entity Types:
- Gene: Genes, proteins, mRNA, and other gene products (NCBI Gene ID)
- Chemical: Chemicals and drugs (MeSH ID)
- Disease: Diseases, symptoms, and phenotypes (MeSH/OMIM ID)
- Variant: Genomic and protein variants (dbSNP ID or component representation)
- Species: Species names (NCBI Taxonomy ID)
- CellLine: Cell line names (Cellosaurus ID)

Relation Types:
- Association: Relation between two entities where the association is unclear
- Comparison: Comparison of effects or properties of two chemicals/drugs
- Conversion: Transformation of one chemical into another
- Cotreatment: Use of two or more chemicals/drugs as combination therapy
- Negative_Correlation: Inverse or opposing effect between two entities
- Positive_Correlation: Direct or reinforcing effect between two entities
- Bind: Physical interaction or binding between two entities
- Drug_Interaction: Pharmacological interaction between two co-administered drugs

Novelty:
- Novel: Relation related to the main point or novelty of the abstract
- No: Relation providing background information or context

Output Format:
Relation Type, Entity 1 ID, Entity 2 ID, Novelty

Guidelines:
- Extract all relevant relations between the given entities from the abstract text
- Ensure the output follows the specified format exactly
- Each entiy pairs has only one relation. there should not be two relations with the same entities in results.

'''

In [None]:
'''
Free of charge
Rate Limits*

15 RPM (requests per minute)
32,000 TPM (tokens per minute)
1500 RPD (requests per day)
'''



0 86 89 0


In [18]:
print( doc , RDP, Tokens)


0 0 0


In [16]:
c = 0
doc , RDP, Tokens =0,0,0

In [12]:
# inputs = create_inputs_list()
doc =98

In [19]:
# blocked_docs= ["27959387", "19484664", "17935240"]
blocked_docs = []

In [22]:
%time
while (doc < 100 and RDP < 1500):
  c=0
  Tokens = 0

  while( c <15 and Tokens < 8000  and doc < 100):
    outputs = ""
    pmid = new_test[str(doc)]["pmid"]
    if pmid not in blocked_docs:
      result = prompt_model(doc)
      c +=1
      if correct_format(result):
        outputs += "pmid:" + pmid+ "\n" + result + "\n\n"
        doc +=1
        with open("/content/10_shots_diverse_samples2.txt", "a") as f:
          f.write(outputs)
      else:
        print(doc, "." , pmid, "incorrect format")


  print("last doc: ", doc)
  print("tokens:", Tokens)

  time.sleep(60)


  # time.sleep(6)

CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 8.34 µs
43 . 28846666
44 . 15459975
45 . 16330669
last doc:  46
tokens: 11245
46 . 19208385
47 . 19681452
48 . 20709368
last doc:  49
tokens: 10954
49 . 21879313
50 . 27798239
51 . 27993978
last doc:  52
tokens: 11134
52 . 15111599
53 . 15820770
54 . 16186368
last doc:  55
tokens: 11203
55 . 17511042
56 . 26110643
57 . 18179903
last doc:  58
tokens: 11120
58 . 19353688
59 . 21126715
60 . 28068934
last doc:  61
tokens: 11200
61 . 16252083
62 . 16820346
63 . 17003923
last doc:  64
tokens: 11217
64 . 24671324
65 . 16116131
66 . 19484664
last doc:  67
tokens: 10951
67 . 26449539
68 . 28481876
69 . 16731636
last doc:  70
tokens: 11269
70 . 16849419
71 . 19789368
72 . 24442316
last doc:  73
tokens: 11413
73 . 28684635
74 . 21750150
75 . 16629641
last doc:  76
tokens: 11108
76 . 17166870
77 . 17491223
78 . 18507837
no response try again


ReadTimeout: HTTPConnectionPool(host='localhost', port=46623): Read timed out. (read timeout=600.0)

In [None]:
p2 = '''produce similar output for this article:
A novel splicing mutation in SLC12A3 associated with Gitelman syndrome and idiopathic intracranial hypertension.
We report a case of Gitelman syndrome (GS) in a dizygotic twin who presented at 12 years of age with growth delay, metabolic alkalosis, hypomagnesemia and hypokalemia with inappropriate kaliuresis, and idiopathic intracranial hypertension with bilateral papilledema (pseudotumor cerebri). The patient, her twin sister, and her mother also presented with cerebral cavernous malformations. Based on the early onset and normocalciuria, Bartter syndrome was diagnosed first. However, mutation analysis showed that the proband is a compound heterozygote for 2 mutations in SLC12A3: a substitution of serine by leucine at amino acid position 555 (p.Ser555Leu) and a novel guanine to cytosine transition at the 5' splice site of intron 22 (c.2633+1G>C), providing the molecular diagnosis of GS. These mutations were not detected in 200 normal chromosomes and cosegregated within the family. Analysis of complementary DNA showed that the heterozygous nucleotide change c.2633+1G>C caused the appearance of 2 RNA molecules, 1 normal transcript and 1 skipping the entire exon 22 (r.2521_2634del). Supplementation with potassium and magnesium improved clinical symptoms and resulted in catch-up growth, but vision remained impaired. Three similar associations of Bartter syndrome/GS with pseudotumor cerebri were found in the literature, suggesting that electrolyte abnormalities and secondary aldosteronism may have a role in idiopathic intracranial hypertension. This study provides further evidence for the phenotypical heterogeneity of GS and its association with severe manifestations in children. It also shows the independent segregation of familial cavernomatosis and GS.

[SLC12A3],6559,GeneOrGeneProduct
[Gitelman syndrome/ GS],D053579,DiseaseOrPhenotypicFeature
[idiopathic intracranial hypertension/ pseudotumor cerebri],D011559,DiseaseOrPhenotypicFeature
[growth delay],D006130,DiseaseOrPhenotypicFeature
[metabolic alkalosis],D000471,DiseaseOrPhenotypicFeature
[hypomagnesemia],C537153,DiseaseOrPhenotypicFeature
[hypokalemia],D007008,DiseaseOrPhenotypicFeature
[bilateral papilledema],D010211,DiseaseOrPhenotypicFeature
[patient],9606,OrganismTaxon
[cerebral cavernous malformations],D002543,DiseaseOrPhenotypicFeature
[Bartter syndrome/ secondary aldosteronism],D001477,DiseaseOrPhenotypicFeature
[serine by leucine at amino acid position 555/ p.Ser555Leu],rs148038173,SequenceVariant
[guanine to cytosine],c|SUB|G||C,SequenceVariant
[c.2633+1G>C],c|SUB|G|2633+1|C,SequenceVariant
[r.2521_2634del],r|DEL|2521_2634|,SequenceVariant
[potassium],D011188,ChemicalEntity
[magnesium],D008274,ChemicalEntity
[electrolyte abnormalities],D014883,DiseaseOrPhenotypicFeature
[familial cavernomatosis],D006392,DiseaseOrPhenotypicFeature
'''

p3 = '''
Cardioprotective effect of tincture of Crataegus on isoproterenol-induced myocardial infarction in rats.
Tincture of Crataegus (TCR), an alcoholic extract of the berries of hawthorn (Crataegus oxycantha), is used in herbal and homeopathic medicine. The present study was done to investigate the protective effect of TCR on experimentally induced myocardial infarction in rats. Pretreatment of TCR, at a dose of 0.5 mL/100 g bodyweight per day, orally for 30 days, prevented the increase in lipid peroxidation and activity of marker enzymes observed in isoproterenol-induced rats (85 mg kg(-1) s. c. for 2 days at an interval of 24 h). TCR prevented the isoproterenol-induced decrease in antioxidant enzymes in the heart and increased the rate of ADP-stimulated oxygen uptake and respiratory coupling ratio. TCR protected against pathological changes induced by isoproterenol in rat heart. The results show that pretreatment with TCR may be useful in preventing the damage induced by isoproterenol in rat heart.

[tincture of Crataegus/ TCR/ alcoholic extract of the berries of hawthorn/ Crataegus oxycantha],C007145,ChemicalEntity
[isoproterenol],D007545,ChemicalEntity
[myocardial infarction],D009203,DiseaseOrPhenotypicFeature
[rats/ rat],10116,OrganismTaxon
[lipid],D008055,ChemicalEntity
[ADP],D000244,ChemicalEntity
[oxygen],D010100,ChemicalEntity
'''

In [None]:
response = model.generate_content(init_prompt + ex1 + p3)
print(response.text)

In [20]:
ddd= create_one_input(new_test, 38)
print(ddd)
#,185,319, 339,196

pmid:19207031
Growth hormone dose in growth hormone-deficient adults is not associated with IGF-1 gene polymorphisms.
AIMS: Several SNPs and a microsatellite cytosine-adenine repeat promoter polymorphism of the IGF-1 gene have been reported to be associated with circulating IGF-1 serum concentrations. Variance in IGF-1 concentrations due to genetic variations may affect different response to growth hormone (GH) treatment, resulting in different individually required GH-doses in GH-deficient patients. The aim of this study was to test if the IGF-1 gene polymorphisms are associated with the GH-dose of GH-deficient adults. MATERIALS & METHODS: A total of nine tagging SNPs, five additionally selected SNPs and a cytosine-adenine repeat polymorphism were determined in 133 German adult patients (66 men, 67 women; mean age 45.4 years +/- 13.1 standard deviation; majority Caucasian) with GH-deficiency (GHD) of different origin, derived from the prospective Pfizer International Metabolic Study (

In [13]:
def create_one_input(file_name,i):
  example = ""
  doc = file_name[str(i)]
  ents = ""
  for e in doc["entities"]:
    ents += '[' + "/ ".join(e["names"]) + ']' + "," + e["id"]+ ","  + e["type"] + "\n"
  rels = ""
  for r in doc["relation"]:
    rels += r["infons"]["type"]+ "," + r["infons"]["entity1"]+ "," + r["infons"]["entity2"]+ "," + r["infons"]["novel"] + "\n"


  example += "pmid:" + doc["pmid"] +"\n"
  example += doc["title"] +"\n"
  example += doc["article"] +"\n"
  example += "\n" + ents +"\n"
  example += rels +"\n\n"

  return example

In [None]:
ddd= create_pairs(new_train, 196)
print(ddd)

pmid:16506214
Genetic variation in the COX-2 gene and the association with prostate cancer risk.
COX-2 is a key enzyme in the conversion of arachidonic acid to prostaglandins. The prostaglandins produced by COX-2 are involved in inflammation and pain response in different tissues in the body. Accumulating evidence from epidemiologic studies, chemical carcinogen-induced rodent models and clinical trials indicate that COX-2 plays a role in human carcinogenesis and is overexpressed in prostate cancer tissue. We examined whether sequence variants in the COX-2 gene are associated with prostate cancer risk. We analyzed a large population-based case-control study, cancer prostate in Sweden (CAPS) consisting of 1,378 cases and 782 controls. We evaluated 16 single nucleotide polymorphisms (SNPs) spanning the entire COX-2 gene in 94 subjects of the control group. Five SNPs had a minor allele frequency of more than 5% in our study population and these were genotyped in all case patients and contr

In [None]:
def create_pairs(file_name,i):
  example = ""
  doc = file_name[str(i)]
  ents = ""
  for e in doc["entities"]:
    ents += '[' + "/ ".join(e["names"]) + ']' + "," + e["id"]+ ","  + e["type"] + "\n"
  rels = ""
  for r in doc["relation"]:
    rels +=   r["infons"]["entity1"]+ "," + r["infons"]["entity2"] + "\n"


  example += "pmid:" + doc["pmid"] +"\n"
  example += doc["title"] +"\n"
  example += doc["article"] +"\n"
  example += "\n" + ents +"\n"
  example += rels +"\n\n"

  return example


#convert