In [1]:
import sys
import os
sys.path.append('..')  # make the parent folder (where config.py and utils/ live) importable

from utils.clinfoAI import ClinfoAI

from config import OPENAI_API_KEY, NCBI_API_KEY, EMAIL
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

# Using Clinfo.AI (In Depth Tutorial)

In this tutorial, we will see how to run Clinfo.AI as a module, with all of the pipeline steps abstracted away.

If you have already completed tutorial 01, you can skip the setup steps below. Otherwise, follow them to obtain the credentials (API keys) needed to run Clinfo.AI.

### 1.- Setting up the environment:
1.a.- Install the conda environment using the provided environment.yaml file.

``` conda env create -f environment.yaml ```

1.b.- Select this environment as the kernel for running the notebook. I recommend using VS Code.



### 2.- Creating Accounts

You will need at least one account and at most two (depending on how many requests you plan to make):
* OpenAI account: If you create a free account for the first time, you will get $5 in API credits.
* NCBI account (for NCBI_API_KEY): This is only necessary if you plan to exceed NCBI's default E-utilities rate limit (3 requests per second without a key; 10 per second with one).


Once you have created your account(s), open the **src\config.py** file and:

* Set OPENAI_API_KEY to your OpenAI API key.

If you created an NCBI account, add your key and email to the following values:
* NCBI_API_KEY
* EMAIL

Otherwise, leave them as None.
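For reference, a filled-in **src\config.py** could look like the minimal sketch below. Only the three names imported by the notebook (`OPENAI_API_KEY`, `NCBI_API_KEY`, `EMAIL`) come from this tutorial; the string `"YOUR_OPENAI_API_KEY"` is a hypothetical placeholder that you must replace with your own key.

```python
# src\config.py: a minimal sketch; "YOUR_OPENAI_API_KEY" is a hypothetical placeholder.

# Required: your OpenAI API key (the notebook copies it into os.environ).
OPENAI_API_KEY = "YOUR_OPENAI_API_KEY"

# Optional: set these only if you created an NCBI account; otherwise leave them as None.
NCBI_API_KEY = None
EMAIL = None
```

If you do have an NCBI account, replace the two `None` values with your key and the email address registered with NCBI.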





### 3.- Defining your own prompts:
We have designed prompts for each step of the Clinfo.AI workflow, leveraging the power of in-context learning. If you want to use your own prompts, you can edit them in **src\prompts**.
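Before editing, it can help to see what is currently in the prompt folder. The log printed when `ClinfoAI` is created shows that each step loads a system prompt and a template from JSON files (`task_1_sys.json` and `task_1_prompt.json` through `task_4_sys.json` and `task_4_prompt.json`). The sketch below is a hypothetical helper, not part of Clinfo.AI: it only parses those files and makes no assumption about the JSON structure Clinfo.AI uses inside them.

```python
import json
from pathlib import Path


def load_prompt_files(prompt_dir):
    """Parse every *.json file in prompt_dir into a dict keyed by file name.

    Hypothetical helper (not part of Clinfo.AI): it does not interpret the
    JSON content, it only loads it so you can inspect it before editing.
    """
    prompts = {}
    for path in sorted(Path(prompt_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            prompts[path.name] = json.load(f)
    return prompts


# Example (the relative path is an assumption; adjust it to your checkout):
# for name, content in load_prompt_files("../prompts").items():
#     print(name, type(content).__name__)
```

Inspecting the parsed content first makes it easier to keep the same structure when you write your own versions of these files.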


In [2]:
### Step 1 : Ask a question ###
question = "What is the prevalence of COVID-19 in the United States?"
clinfo = ClinfoAI(openai_key=OPENAI_API_KEY, email=EMAIL, engine="SemanticScholar")
answer = clinfo.forward(question=question)  # Runs the full pipeline


Task Name: pubmed_query_prompt
------------------------------------------------------------------------
Loading prompt: system  from file task_1_sys.json
Loading prompt: template  from file task_1_prompt.json

Task Name: relevance_prompt
------------------------------------------------------------------------
Loading prompt: system  from file task_2_sys.json
Loading prompt: template  from file task_2_prompt.json

Task Name: summarization_prompt
------------------------------------------------------------------------
Loading prompt: system  from file task_3_sys.json
Loading prompt: template  from file task_3_prompt.json

Task Name: synthesize_prompt
------------------------------------------------------------------------
Loading prompt: system  from file task_4_sys.json
Loading prompt: template  from file task_4_prompt.json


user_prompt input_variables=['question', 'article_summaries_str'] output_parser=None partial_variables={} template='Below is a list of article summaries, and their citations. Using ONLY the articles provided and no other articles, synthesize the information into a single paragraph summary. Cite the articles in-line appropriately and provide a list of articles cited at the end. Focus the summary on findings from studies with the strongest level of evidence (large sample size, strong study design, low risk of bias, etc).Using this summary, provide a one-line TL;DR answer to the following question, hedging appropriately given the strength of the evidence:\n\nQuestion: "{question}"\n\nArticle summaries:\n"""{article_summaries_str}"""\n\nDesired format:\nLiterature Summary: <summary_of_evidence>\n\nTL;DR: <answer_to_question>\n\nReferences:\n1. <citation_1>\n2. <citation_2>\n3. <citation_3>\n...\n' template_format='f-string' validate_template=True




=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#
Literature Summary: The prevalence of COVID-19 in the United States varies across studies, populations, and time periods. A study by Chiu and Ndeffo-Mbah (2021) estimated a nationwide prevalence of 1.4% and a seroprevalence of 13.2% as of December 31, 2020, using a Bayesian semi-empirical modeling framework[3]. A study by Jones et al. (2023) found that 96.4% of persons aged ≥16 years had SARS-CoV-2 antibodies by the third quarter of 2022[4]. The prevalence of COVID-19 was also found to be influenced by socioeconomic factors, with more disadvantaged neighborhoods having higher prevalence rates[2], and rural counties showing an increase in prevalence rates over time[5]. Prevalence rates among specific populations, such as dental hygienists, were reported to be low[1]. The use of at-home COVID-19 tests also increased significantly over time, indicating a possible rise in prevalence[6]. A study by Benatia et al. (2020) estimated a median population 

In [3]:
# The answer dictionary contains the outputs of every step of Clinfo.AI (as explained in tutorial 01):
print(answer.keys())

dict_keys(['synthesis', 'article_summaries', 'irrelevant_articles', 'queries'])


In [4]:
print(answer["synthesis"])

Literature Summary: The prevalence of COVID-19 in the United States varies across studies, populations, and time periods. A study by Chiu and Ndeffo-Mbah (2021) estimated a nationwide prevalence of 1.4% and a seroprevalence of 13.2% as of December 31, 2020, using a Bayesian semi-empirical modeling framework[3]. A study by Jones et al. (2023) found that 96.4% of persons aged ≥16 years had SARS-CoV-2 antibodies by the third quarter of 2022[4]. The prevalence of COVID-19 was also found to be influenced by socioeconomic factors, with more disadvantaged neighborhoods having higher prevalence rates[2], and rural counties showing an increase in prevalence rates over time[5]. Prevalence rates among specific populations, such as dental hygienists, were reported to be low[1]. The use of at-home COVID-19 tests also increased significantly over time, indicating a possible rise in prevalence[6]. A study by Benatia et al. (2020) estimated a median population infection rate of 0.9% between March 31 a
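The synthesis prompt asks the model to answer in a fixed layout (`Literature Summary:`, `TL;DR:`, `References:`), which is also visible in the printed synthesis. If you only need one part, such as the one-line TL;DR, a small parser can split the string by those labels. The sketch below is a hypothetical helper that assumes the model actually followed that layout; any section it cannot find comes back as `None`.

```python
import re


def split_synthesis(text):
    """Split a synthesis string into its labelled sections.

    Hypothetical helper: assumes each label starts a line and is followed by
    a colon, as in the synthesis prompt's "Desired format". Missing sections
    are returned as None.
    """
    labels = ["Literature Summary", "TL;DR", "References"]
    pattern = r"^(" + "|".join(re.escape(label) for label in labels) + r"):"
    sections = dict.fromkeys(labels)
    matches = list(re.finditer(pattern, text, flags=re.MULTILINE))
    for i, match in enumerate(matches):
        # Each section runs until the next label (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.end():end].strip()
    return sections


# Example usage with the notebook's output:
# sections = split_synthesis(answer["synthesis"])
# print(sections["TL;DR"])
```

Matching labels only at the start of a line keeps a phrase like "TL;DR:" inside the summary text from being treated as a new section.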