### Using MedCPT embeddings

- "malteos/PubMedNCL" is likely used for single-text input tasks (e.g., classifying a single sentence or passage).

In [9]:
import warnings
import os
import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification
warnings.filterwarnings('ignore')
os.environ["TOKENIZERS_PARALLELISM"] = "false"
from sklearn.metrics.pairwise import cosine_similarity

### Case 1. Using the MedCPT Query Encoder

In [None]:
import torch
from transformers import AutoTokenizer, AutoModel

model = AutoModel.from_pretrained("ncbi/MedCPT-Query-Encoder")
tokenizer = AutoTokenizer.from_pretrained("ncbi/MedCPT-Query-Encoder")

queries = [
    "diabetes treatment", 
    "How to treat diabetes?", 
    "A 45-year-old man presents with increased thirst and frequent urination over the past 3 months.",
]

with torch.no_grad():
    # tokenize the queries
    encoded = tokenizer(
        queries, 
        truncation=True, 
        padding=True, 
        return_tensors='pt', 
        max_length=64,
    )
    
    # encode the queries (use the [CLS] last hidden states as the representations)
    embeds = model(**encoded).last_hidden_state[:, 0, :]

    print(embeds)
    print(embeds.size())
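
The cell above encodes three queries in a single call. For longer query lists it may be safer to encode in fixed-size batches. The following is a minimal sketch of that idea, not part of the original notebook: `batched`, `encode_in_batches`, and `toy_encode` are hypothetical names, and `toy_encode` merely stands in for the tokenizer plus model call so the example runs without downloading a model.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def encode_in_batches(
    items: Sequence[str],
    encode_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 2,
) -> List[List[float]]:
    """Apply `encode_fn` batch by batch and concatenate the per-item results."""
    embeddings: List[List[float]] = []
    for batch in batched(items, batch_size):
        embeddings.extend(encode_fn(batch))
    return embeddings


def toy_encode(batch: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for tokenizer + model: a 1-d 'embedding' per text."""
    return [[float(len(text))] for text in batch]


queries = ["diabetes treatment", "How to treat diabetes?", "increased thirst"]
print(encode_in_batches(queries, toy_encode, batch_size=2))
```

In a real run, `encode_fn` would wrap the `tokenizer(...)` and `model(**encoded).last_hidden_state[:, 0, :]` steps shown above and return one vector per query.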


### Case 2. Using the MedCPT Article Encoder

In [2]:
model = AutoModel.from_pretrained("ncbi/MedCPT-Article-Encoder")
tokenizer = AutoTokenizer.from_pretrained("ncbi/MedCPT-Article-Encoder")

In [3]:
with open('/work/acslab/users/svalipou/docs/13_exp/exp1.txt', 'r') as file:
    explanation_text = file.read().strip()

reference_files = [
    '/work/acslab/users/svalipou/docs/13/32493475.txt',
    '/work/acslab/users/svalipou/docs/13/32493478.txt',
    '/work/acslab/users/svalipou/docs/13/32616067.txt'
]
references = []

for ref_file in reference_files:
    with open(ref_file, 'r') as file:
        references.append(file.read().strip())

In [4]:
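# The tokenizer treats a two-element list as a text pair. The model card example
# passes [title, abstract]; here the same text fills both slots.
explanation = [[explanation_text, explanation_text]]
references = [[ref, ref] for ref in references]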
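
The cell above passes the same text twice for each document. If the files store a title followed by the abstract, as the reference texts in this notebook appear to, one alternative is to split each text into a two-element list. The sketch below assumes the title is everything up to the first ". " in the text. That heuristic, and the helper name `split_title_abstract`, are illustrative assumptions rather than anything the MedCPT tooling provides, and the heuristic will mis-split titles that contain abbreviations such as "vs. ".

```python
from typing import List


def split_title_abstract(text: str) -> List[str]:
    """Split at the first '. ' into [title, abstract].

    Assumption: the first sentence is the title. If no sentence boundary is
    found, the whole text is returned as the abstract with an empty title.
    """
    title, sep, rest = text.partition(". ")
    if not sep:
        return ["", text.strip()]
    return [title.strip() + ".", rest.strip()]


example = "A trial of drug X in adults. Primary objective: to test efficacy. Design: randomised."
print(split_title_abstract(example))
```

The resulting lists could then replace the `[ref, ref]` pairs, e.g. `[split_title_abstract(ref) for ref in references]`. Whether this improves the embeddings for these documents has not been tested here.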

In [12]:
with torch.no_grad():
    explanation_encoded = tokenizer(explanation, truncation=True, padding=True, return_tensors='pt', max_length=512)
    explanation_embedding = model(**explanation_encoded).last_hidden_state[:, 0, :]

    references_encoded = tokenizer(references, truncation=True, padding=True, return_tensors='pt', max_length=512)
    reference_embeddings = model(**references_encoded).last_hidden_state[:, 0, :]

In [16]:
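# cosine_similarity returns an array of shape (1, n_references).
similarities = cosine_similarity(explanation_embedding.numpy(), reference_embeddings.numpy())

# With a single row, the flat argmax is the index of the best-matching reference.
most_similar_index = int(similarities.argmax())
# Each entry of `references` is a [text, text] pair, so the printed value is a list.
most_similar_reference = references[most_similar_index]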

print(f"Most similar reference: {most_similar_reference}")

Most similar reference: ["ChemoPROphyLaxIs with hydroxychloroquine For covId-19 infeCtious disease (PROLIFIC) to prevent covid-19 infection in frontline healthcare workers: A structured summary of a study protocol for a randomised controlled trial. PRIMARY OBJECTIVE: To determine whether chemoprophylaxis with hydroxychloroquine versus placebo increases time to contracting coronavirus disease 2019 (COVID-19) in frontline healthcare workers. 1) To determine whether chemoprophylaxis with daily versus weekly dosing of hydroxychloroquine increases time to contracting COVID-19 disease in frontline healthcare workers. 2) To compare the number of COVID-19 cases between each trial arm on the basis of positive tests (as per current clinical testing methods and/or serology) 3) To compare the percentage of COVID-19 positive individuals with current testing methods versus serologically-proven COVID-19 in each trial arm 4) To compare COVID-19 disease severity in each trial arm 5) To compare recovery
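
To make the selection step explicit, here is a small dependency-free sketch of what the cosine-similarity ranking computes. The vectors are toy values, not MedCPT embeddings, and the function names `cosine` and `most_similar` are illustrative only.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: dot(u, v) / (|u| * |v|). Returns 0.0 for a zero vector
    (a convention chosen for this sketch)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_vec: Sequence[float], candidates: List[Sequence[float]]) -> int:
    """Return the index of the candidate with the highest cosine similarity."""
    scores = [cosine(query_vec, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


query_vec = [1.0, 0.0]
candidates = [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]
print(most_similar(query_vec, candidates))
```

Because cosine similarity ignores vector length, the second candidate wins even though its norm differs from the query's.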

### Case 3. Using the MedCPT Cross-Encoder

In [11]:
tokenizer = AutoTokenizer.from_pretrained("ncbi/MedCPT-Cross-Encoder")
model = AutoModelForSequenceClassification.from_pretrained("ncbi/MedCPT-Cross-Encoder")

In [18]:
query = "The indirect relationship between amiodarone and mefloquine, based on the provided scientific abstracts, can be understood through their interactions with the cytochrome P450 enzyme system, specifically CYP2C8, and the implications for drug resistance and metabolism in the context of malaria treatment.\n\n1. **Amiodarone and CYP2C8 Interaction**: Amiodarone is mentioned as a weak time-dependent inhibitor (TDI) of CYP2C8 across various enzyme preparations, including human liver microsomes (HLMs) and recombinant CYP2C8. This suggests that amiodarone can influence the metabolism of other drugs that are substrates of CYP2C8 by potentially reducing their clearance.\n\n2. **Mefloquine and Drug Resistance**: Mefloquine is a registered antimalarial drug that has been used in routine clinical practice. The genomic architecture of antimalarial drug resistance, including mutations in the Plasmodium falciparum chloroquine resistance transporter gene (pfcrt), affects the efficacy of antimalarial drugs. Although the abstracts do not directly link mefloquine resistance to CYP2C8, the broader context of drug metabolism and resistance mechanisms is relevant.\n\n3. **Indirect Relationship through CYP2C8**: The indirect relationship between amiodarone and mefloquine can be inferred from their involvement with CYP2C8. Amiodarone's inhibition of CYP2C8 could theoretically affect the metabolism of drugs metabolized by this enzyme, including antimalarials. However, the specific impact on mefloquine is not directly addressed in the abstracts. Additionally, the genetic polymorphisms in CYP2C8 among different populations could influence the pharmacokinetics and dynamics of both drugs, potentially affecting their efficacy and safety profiles.\n\n4. **Malaria Treatment and Drug Resistance**: The abstracts highlight the complexity of malaria treatment, including the challenges posed by drug resistance. 
While amiodarone is not an antimalarial drug, its interaction with CYP2C8 and potential to influence the metabolism of drugs like mefloquine indirectly connects it to the broader issue of antimalarial treatment efficacy. The presence of genetic mutations that confer resistance to antimalarials underscores the importance of understanding drug-drug interactions and metabolic pathways in the management of malaria.\n\nIn summary, the indirect relationship between amiodarone and mefloquine is situated within their interactions with the CYP2C8 enzyme and the broader context of malaria treatment and drug resistance. While the abstracts do not provide a direct link between the two drugs in terms of their pharmacological actions against malaria, they underscore the complexity of drug metabolism and the potential for interactions that could impact the efficacy of antimalarial therapies"

articles = [
    "A randomized multicenter clinical trial to evaluate the efficacy of melatonin in the prophylaxis of SARS-CoV-2 infection in high-risk contacts (MeCOVID Trial): A structured summary of a study protocol for a randomised controlled trial. Primary objective: to evaluate the efficacy of melatonin as a prophylactic treatment on prevention of symptomatic SARS-CoV-2 infection among healthcare workers at high risk of SARS-CoV-2 exposure. Secondary objectives: To evaluate the efficacy of melatonin as a prophylactic treatment on prevention of asymptomatic SARS-CoV-2 infection.To evaluate the efficacy of melatonin to prevent the development of severe COVID-19 in the participants enrolled in this study who develop SARS-CoV-2 infection along the trial.To evaluate the duration of COVID-19 symptoms in participants receiving melatonin before the infection.To evaluate seroconversion timing post-symptom onset. Exploratory objectives:To compare severity of COVID-19 between men and women.To evaluate the influence of sleep and diet on prevention from SARS-CoV-2 infection.To evaluate the effect of melatonin on the incidence and characteristics of lymphopenia and increase of inflammatory cytokines related to COVID-19. This is a two-arm parallel randomised double-blind controlled trial to evaluate the efficacy of melatonin versus placebo in the prophylaxis of coronavirus disease 2019 among healthcare workers. Inclusion Criteria: Male or female participants >= 18 and <= 80 years of age. Healthcare workers from the public and private Spanish hospital network at risk of SARS-CoV 2 infection.Not having a previous COVID19 diagnosis. Understanding the purpose of the trial and not having taken any pre-exposure prophylaxis (PrEP) including HIV PrEP from March 1st 2020 until study enrolment. Having a negative SARS-CoV 2 reverse-transcription PCR (RT-PCR) result or a negative serologic rapid test (IgM/IgG) result before randomization. 
Premenopausal women must have a negative urinary pregnancy test in the 7 days before starting the trial treatment. Premenopausal women and males with premenopausal couples must commit to using a high efficiency anticonceptive method. HIV infection. Active hepatitis B infection. Renal failure (CrCl < 60 mL/min/1.73 m2) or need for hemodialysis. Osteoporosis. Myasthenia gravis.Pre-existent maculopathy. Retinitis pigmentosa. Bradycardia (less than 50 bpm). Weight less than 40 Kg. Participant with any immunosuppressive condition or hematological disease. Treatment with drugs that may prolong QT in the last month before randomization for more than 7 days including: azithromycin, chlorpromazine, cisapride, clarithromycin, domperidone, droperidol, erythromycin, halofantrine, haloperidol, lumefantrine, mefloquine, methadone, pentamidine, procainamide, quinidine, quinine, sotalol, sparfloxacin, thioridazine, amiodarone. Hereditary intolerance to galactose, Lapp lactase deficiency or glucose or galactose malabsorption. Treatment with fluvoxamine. Treatment with benzodiazepines or benzodiazepine analogues such as zolpidem, zopiclone or zaleplon. Pregnancy. Breastfeeding. History of potentially immune derived diseases such as: lupus, Crohn's disease, ulcerative colitis, vasculitis or rheumatoid arthritis. Insulin-dependent diabetes mellitus.Known history of hypersensitivity to the study drug or any of its components. Patients that should not be included in the study at the judgment of the research team. Participants will be recruited from the following eight hospitals in Madrid, Spain: Hospital Universitario La Paz, Hospital Ramon y Cajal, Hospital Infanta Sofia, Hospital 12 de Octubre, Hospital Clinico San Carlos, Hospital Central de la defensa Gomez Ulla,Hospital de La Princesa and Hospital Infanta Leonor. Experimental: Melatonin (Circadin(r), Exeltis Healthcare, Spain): 2 mg of melatonin orally before bedtime for 12 weeks. 
Comparator: Identical looking placebo (Laboratorios Liconsa, Spain) orally before bedtime for 12 weeks. Number of SARS-CoV-2 (COVID-19) symptomatic infections confirmed by polymerase chain reaction (PCR) test or serologic test or according to each centre diagnosis protocol. Primary outcome will be measured until the end of treatment for each participant (until the date of the last dose taken by each patient). Patients who meet all inclusion and no exclusion criteria will be randomised, stratified by centres, sex and age (<50 and >= 50 years old). The randomisation sequence was created using SAS version 9.4 statistical software (procedure 'PROC PLAN') with a 1:1 allocation. No randomisation seed was specified. The randomisation seed was generated taking the hour of the computer where the program was executed. Randomization will be done centrally through the electronic system RedCAP(r) in order to conceal the sequence until interventions are assigned BLINDING (MASKING): Participants, caregivers, and those assessing the outcomes are blinded to group assignment. A total of 450 participants are planned to be enrolled in this clinical trial, 225 in the experimental arm and 225 in the placebo arm. Protocol version 3.0, 17th of April 2020. Recruitment ongoing. First participant was recruited on the 21st of April 2020. The final participant is anticipated to be recruited on the 31st of May 2020. As of May 18th, 2020, a total of 312 participants have been enrolled (154 at Hospital La Paz, 85 at Hospital Infanta Sofia and 73 at Hospital 12 de Octubre). EU Clinical Trials Register: 2020-001530-35; Date of trial registration: 13th of April 2020; https://www.clinicaltrialsregister.eu/ctr-search/trial/2020-001530-35/ES FULL PROTOCOL: The full protocol is attached as an additional file, accessible from the Trials website (Additional file 1). 
In the interest in expediting dissemination of this material, the familiar formatting has been eliminated; this Letter serves as a summary of the key elements of the full protocol.",
    "Efficacy of hydroxychloroquine for post-exposure prophylaxis to prevent severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection among adults exposed to coronavirus disease (COVID-19): a structured summary of a study protocol for a randomised controlled trial. Primary Objective  To test the efficacy of Hydroxychloroquine (HCQ) (400 mg orally daily for 3 days then 200 mg orally daily for an additional 11 days, to complete 14 days) to prevent incident SARS-CoV-2 infection, compared to ascorbic acid among contacts of persons with SARS-CoV-2 infection Secondary objectives  To determine the safety and tolerability of HCQ as SARS-CoV-2 Post-exposure Prophylaxis (PEP) in adults  To test the efficacy of HCQ (400 mg orally daily for 3 days then 200 mg orally daily for an additional 11 days, to complete 14 days) to prevent incident SARS-CoV-2 infection 2 weeks after completing therapy, compared to ascorbic acid among contacts of persons with SARS-CoV-2 infection  To test the efficacy of HCQ to shorten the duration of SARS-CoV-2 shedding among those with SARS-CoV-2 infection in the HCQ PEP group  To test the efficacy of HCQ to prevent incident COVID-19 TRIAL DESIGN: This is a randomized, multi-center, placebo-equivalent (ascorbic acid) controlled, blinded study of HCQ PEP for the prevention of SARS-CoV-2 infection in adults exposed to the virus. This study will enroll up to 2000 asymptomatic adults 18 to 80 years of age (inclusive) at baseline who are close contacts of persons with polymerase chain reaction (PCR)-confirmed SARS-CoV-2 or clinically suspected COVID-19 and a pending SARS-CoV-2 PCR test. This multisite trial will be conducted at seven sites in Seattle (UW), Los Angeles (UCLA), New Orleans (Tulane), Baltimore (UMB), New York City (NYU), Syracuse (SUNY-Upstate), and Boston (BMC). 
Inclusion criteria Participants are eligible to be included in the study only if all of the following criteria apply: 1.Men or women 18 to 80 years of age inclusive, at the time of signing the informed consent2. Willing and able to provide informed consent3.Had a close contact of a person (index) with known PCR-confirmed SARS-CoV-2 infection or index who is currently being assessed for COVID-19 Close contact is defined as: a. Household contact (i.e., residing with the index case in the 14 days prior to index diagnosis or prolonged exposure within a residence/vehicle/enclosed space without maintaining social distance)b. Medical staff, first responders, or other care persons who cared for the index case without personal protection (mask and gloves)4. Less than 4 days since last exposure (close contact with a person with SARS-CoV-2 infection) to the index case5. Access to device and internet for Telehealth visits6.Not planning to take HCQ in addition to the study medication Exclusion criteria Participants are excluded from the study if any of the following criteria apply: 1.Known hypersensitivity to HCQ or other 4-aminoquinoline compounds2. Currently hospitalized3. Symptomatic with subjective fever, cough, or shortness of breath4. Current medications exclude concomitant use of HCQ5. Concomitant use of other anti-malarial treatment or chemoprophylaxis, including chloroquine, mefloquine, artemether, or lumefantrine.6. History of retinopathy of any etiology7.Psoriasis8.Porphyria9.Known bone marrow disorders with significant neutropenia (polymorphonuclear leukocytes <1500) or thrombocytopenia (<100 K)10. Concomitant use of digoxin, cyclosporin, cimetidine, amiodarone, or tamoxifen11.Known moderate or severe liver disease12.Known long QT syndrome13. 
Severe renal impairment14.Use of any investigational or non-registered drug or vaccine within 30 days preceding the first dose of the study drugs or planned use during the study period INTERVENTION AND COMPARATOR: Households will be randomized 1:1 (at the level of household), with close contact participants receiving one of the following therapies: HCQ 400 mg orally daily for 3 days then 200 mg orally daily for an additional 11 days Placebo-like control (ascorbic acid) 500 mg orally daily for 3 days then 250 mg orally daily for 11 days MAIN OUTCOMES: The primary outcome of the study is the incidence of SARS-CoV-2 infection through day 14 among participants who are SARS-CoV-2 negative at baseline by randomization group. Participants will be randomized in a 1:1 ratio to HCQ or ascorbic acid at the level of the household (all eligible participants in 1 household will receive the same intervention). The randomization code and resulting allocation list will be generated and maintained by the Study Statistician. The list will be blocked and stratified by site and contact type (household versus healthcare worker). This is a blinded study. HCQ and ascorbic acid will appear similar, and taste will be partially masked as HCQ can be bitter and ascorbic acid will be sour. The participants will be blinded to their randomization group once assigned. Study team members, apart from the Study Pharmacist and the unblinded statistical staff, will be blinded. Laboratory staff are blinded to the group allocation. The sample size for the study is N=2 000 participants randomized 1:1 to either HCZ (n=1 000) and ascorbic acid (n=1 000). Protocol version: 1.2 05 April 2020 Recruitment is ongoing, started March 31 and anticipated end date is September 30, 2020. 
ClinicalTrials.gov, Protocol Registry Number: NCT04328961 Date of registration: April 1, 2020, retrospectively registered FULL PROTOCOL: The full protocol is attached as an additional file, accessible from the Trials website (Additional file 1). In the interest in expediting dissemination of this material, the familiar formatting has been eliminated; this Letter serves as a summary of the key elements of the full protocol. " ,  
    "ChemoPROphyLaxIs with hydroxychloroquine For covId-19 infeCtious disease (PROLIFIC) to prevent covid-19 infection in frontline healthcare workers: A structured summary of a study protocol for a randomised controlled trial. PRIMARY OBJECTIVE: To determine whether chemoprophylaxis with hydroxychloroquine versus placebo increases time to contracting coronavirus disease 2019 (COVID-19) in frontline healthcare workers. 1) To determine whether chemoprophylaxis with daily versus weekly dosing of hydroxychloroquine increases time to contracting COVID-19 disease in frontline healthcare workers. 2) To compare the number of COVID-19 cases between each trial arm on the basis of positive tests (as per current clinical testing methods and/or serology) 3) To compare the percentage of COVID-19 positive individuals with current testing methods versus serologically-proven COVID-19 in each trial arm 4) To compare COVID-19 disease severity in each trial arm 5) To compare recovery time from COVID-19 infection in each trial arm EXPLORATORY OBJECTIVES: 1) To determine compliance (as measured by trough pharmacokinetic hydroxychloroquine levels) on COVID-19 positive tests 2) To determine if genetic factors determine susceptibility to COVID-19 disease or response to treatment 3) To determine if blood group determines susceptibility to COVID-19 disease 4) To compare serum biomarkers of COVID-19 disease in each arm TRIAL DESIGN: Double-blind, multi-centre, 2-arm (3:3:2 ratio) randomised placebo-controlled trial PARTICIPANTS: National Health Service (NHS) workers who have direct patient contact delivering care to patients with COVID-19. Participants in the trial will be recruited from a number of NHS hospitals directly caring for patients with COVID-19. 
To be included in the trial the participant MUST: 1) Have given written informed consent to participate 2) Be aged 18 years to 70 years 3) Not previously have been diagnosed with COVID-19 4) Work in a high-risk secondary or tertiary healthcare setting (hospitals accepting COVID-19 patients) with direct patient-facing care EXCLUSION CRITERIA: The presence of any of the following will mean participants are ineligible: 1) Known COVID-19 positive test at baseline (if available) 2) Symptomatic for possible COVID-19 at baseline 3) Known hypersensitivity reaction to hydroxychloroquine, chloroquine or 4-aminoquinolines 4) Known retinal disease 5) Known porphyria 6) Known chronic kidney disease (CKD; eGFR<30ml/min) 7) Known epilepsy 8) Known heart failure or conduction problems 9) Known significant liver disease (Gilbert's syndrome is permitted) 10) Known glucose-6-phosphate dehydrogenase (G6PD) deficiency 11) Currently taking any of the following contraindicated medications: Digoxin, Chloroquine, Halofantrine, Amiodarone, Moxifloxacin, Cyclosporin, Mefloquine, Praziquantel, Ciprofloxacin, Clarithromycin, Prochlorperazine, Fluconazole 12) Currently taking hydroxychloroquine or having a clinical indication for taking hydroxychloroquine 13) Currently breastfeeding 14) Unable to be followed-up during the trial 15) Current or future involvement in the active treatment phase of other interventional research studies (excluding observational/non-interventional studies) before study follow-up visit 16) Not able to use or have access to a modern phone device/web-based technology 17) Any other clinical reason which may preclude entry in the opinion of the investigator INTERVENTION AND COMPARATOR: Interventions being evaluated are: A) Daily hydroxychloroquine or B) Weekly hydroxychloroquine or C) Placebo The maximum treatment period is approximately 13 weeks per participant. 
Hydroxychloroquine-identical matched placebo tablets will ensure that all participants are taking the same number and dosing regimen of tablets across the three trial arms. There is no variation in the dose of hydroxychloroquine by weight. The dosing regimen for the three arms of the study (A, B, C) are described in further detail below. Arm A: Active Hydroxychloroquine (- daily dosing and placebo-matched hydroxychloroquine - weekly dosing). Form: Tablets Route: Oral. Dose and Frequency: Active hydroxychloroquine: Days 1-2: Loading phase - 400mg (2 x 200mg tablets) taken twice a day for 2 days Days 3 onwards: Maintenance Phase - 200mg (1 x 200mg tablet) taken once daily, every day for 90 days (~3 months) Matched Placebo hydroxychloroquine: Days 3 onwards: Maintenance Phase - 2 tablets taken once a week on the same day each week (every 7th day) for 90 days (~3 months) Arm B: Active Hydroxychloroquine (- weekly dosing and placebo matched hydroxychloroquine - daily dosing.) Form: Tablets Route: Oral. Dose and Frequency: Active hydroxychloroquine: Days 1-2: Loading Phase - 400mg (2 x 200mg tablets) taken twice daily for 2 days Days 3 onwards: Maintenance Phase - 400mg (2 x 200mg tablets) taken once a week on the same day each week (every 7th day) for 90 days (~3 months) Matched Placebo hydroxychloroquine: Days 3 onwards: Maintenance Phase - 1 tablet taken once daily for 90 days (~3 months) Arm C: Matched placebo Hydroxychloroquine (- daily dosing and matched placebo hydroxychloroquine - weekly dosing.) Form: Table. Route: Oral. 
Frequency: Matched placebo hydroxychloroquine - daily dosing: Days 1-2: Loading Phase - 2 tablets taken twice daily for 2 days Days 3 onwards: Maintenance Phase - 1 tablet taken once daily for 90 days (~3 months) Matched placebo hydroxychloroquine - weekly dosing: Days 3 onwards: Maintenance Phase - 2 tablets taken once a week on the same day each week (every 7th day) for 90 days (~3 months) A schematic of the dosing schedule can be found in the full study protocol (Additional File 1). Time to diagnosis of positive COVID-19 disease (defined by record of date of symptoms onset and confirmed by laboratory test) RANDOMISATION: Participants will be randomised to either hydroxychloroquine dosed daily with weekly placebo, HCQ dosed weekly with daily placebo, or placebo dosed daily and weekly. Randomisation will be in a 3:3:2 ratio [hydroxychloroquine-(daily), hydroxychloroquine-(weekly), placebo], using stratified block randomisation. Random block sizes will be used, and stratification will be by study site. Participants and trial investigators consenting participants, delivering trial assessments and procedures will be blinded to intervention. A sufficient number of participants will be enrolled so that approximately 1000 participants in total will have data suitable for the primary statistical analysis. It is anticipated that approximately 1,200 participants will need to be enrolled in total, to allow for a 20% dropout over the period of the trial. This would result in approximately 450:450:300 participants randomised to hydroxychloroquine daily, hydroxychloroquine weekly+daily matched placebo or matched-placebo daily and weekly. V 1.0, 7th April 2020 EU Clinical Trials Register EudraCT Number: 2020-001331-26 Date of registration: 14th April 2020 Trial registered before first participant enrolment. Trial site is Cambridge University Hospitals NHS Foundation Trust. Recruitment started on 11th May 2020. It is anticipated that the trial will run for 12 months. 
The recruitment end date cannot yet be accurately predicted. The full protocol is attached as an additional file, accessible from the Trials website (Additional file 1). In the interest of expediting dissemination of this material, the familiar formatting has been eliminated; this Letter serves as a summary of the key elements of the full protocol. The study protocol has been reported in accordance with the Standard Protocol Items: Recommendations for Clinical Interventional Trials (SPIRIT) guidelines (Additional file 2)."
]

# combine the query with each article into [query, article] pairs
pairs = [[query, article] for article in articles]

with torch.no_grad():
    encoded = tokenizer(
        pairs,
        truncation=True,
        padding=True,
        return_tensors="pt",
        max_length=512,
    )

    logits = model(**encoded).logits.squeeze(dim=1)
    
    print(logits)

tensor([6.7278, 4.4231, 5.2699])
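
The logits are relevance scores, with higher meaning more relevant, so ranking the articles amounts to sorting indices by score. A minimal sketch using the three values printed above (the helper name `rank_by_score` is illustrative):

```python
from typing import List, Sequence


def rank_by_score(scores: Sequence[float]) -> List[int]:
    """Return item indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Scores copied from the printed tensor above.
logits = [6.7278, 4.4231, 5.2699]
print(rank_by_score(logits))  # [0, 2, 1]
```

With these scores the first article (the melatonin trial) ranks highest, followed by the third and then the second.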


In [16]:
with open('/work/acslab/users/svalipou/docs/13_exp/exp1.txt', 'r') as file:
    explanation = file.read().strip()

with open('/work/acslab/users/svalipou/docs/13/32493475.txt', 'r') as file:
    reference = file.read().strip()
   
pair = (explanation, reference)

with torch.no_grad():
    encoded = tokenizer([pair], truncation=True, padding=True, return_tensors="pt", max_length=512)
    logits = model(**encoded).logits.squeeze(dim=1)

    print(f"Relevance score: {logits.item()}")


Relevance score: 5.999639987945557


### Using MedCPT-Cross-Encoder on concatenated text

In [8]:
tokenizer = AutoTokenizer.from_pretrained("ncbi/MedCPT-Cross-Encoder")
model = AutoModelForSequenceClassification.from_pretrained("ncbi/MedCPT-Cross-Encoder")

with open('/work/acslab/users/svalipou/docs/13_exp/exp1.txt', 'r') as file:
    explanation_text = file.read().strip()

reference_files = [
    '/work/acslab/users/svalipou/docs/13/32493475.txt',
    '/work/acslab/users/svalipou/docs/13/32493478.txt',
    '/work/acslab/users/svalipou/docs/13/32616067.txt'
]

for ref_file in reference_files:
    with open(ref_file, 'r') as file:
        reference = file.read().strip()

    # Concatenate explanation and reference into a single sequence. Unlike the
    # [query, article] pairs used earlier, the model sees no separator between the two texts.
    pair = explanation_text + " " + reference

    with torch.no_grad():
        encoded = tokenizer(pair, truncation=True, padding=True, return_tensors="pt", max_length=512)
        logits = model(**encoded).logits.squeeze(dim=1)

    print(f"Relevance score for {ref_file}: {logits.item()}")

Relevance score for /work/acslab/users/svalipou/docs/13/32493475.txt: 11.007904052734375
Relevance score for /work/acslab/users/svalipou/docs/13/32493478.txt: 9.12633228302002
Relevance score for /work/acslab/users/svalipou/docs/13/32616067.txt: 9.501900672912598
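
The score for `32493475.txt` here (about 11.0) differs from the score for the same file in the pair-based cell earlier (about 6.0). One difference between the two cells is how the input is laid out: a BERT-style tokenizer places a `[SEP]` token and a second segment id between the two texts of a pair, whereas a concatenated string is encoded as a single segment. The sketch below illustrates that layout with whitespace tokens only. It is not the real tokenizer, and `bert_style_layout` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def bert_style_layout(
    text_a: str, text_b: Optional[str] = None
) -> Tuple[List[str], List[int]]:
    """Illustrate BERT-style input layout using whitespace tokens.

    Returns (tokens, segment_ids). This mimics only the placement of special
    tokens and segment ids, not subword tokenization or truncation.
    """
    tokens_a = text_a.split()
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if text_b is not None:
        tokens_b = text_b.split()
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


# Passed as a pair: the two texts are separated and get different segment ids.
print(bert_style_layout("query text", "article text"))
# Concatenated into one string: a single segment with no internal separator.
print(bert_style_layout("query text" + " " + "article text"))
```

Passing `[explanation_text, reference]` as a pair, as in the earlier cells, keeps the two texts distinguishable to the model. The concatenated variant does not.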
