# Extracting information from paper

This notebook illustrates some examples of working with text data using small, local language models.

## Running this notebook on a newer MacBook with Apple Silicon Chip

You will need an environment with Python and Jupyter installed. To create an environment with Anaconda for Python 3.12, execute:

```
conda create --name llm-narrative python=3.12
conda activate llm-narrative
conda install jupyter
jupyter notebook
```

## Running this notebook on older MacBooks or any other machine

Please run this script on [Google Colab](https://colab.research.google.com/). After opening the notebook there, please change the settings to using a GPU, check [here](https://www.geeksforgeeks.org/how-to-use-gpu-in-google-colab/) for instructions on how to do that.



## Install required libraries

For the newer MacBooks with Apple Chips we will use `mlx-lm` to load a small, quantized version of the Llama 3 8b instruct model, so that it can run on a single laptop (https://ollama.com/library/llama3). For older MacBooks and other machines we will use a quantized version of the model provided by the hugging face community (https://huggingface.co/astronomer/Llama-3-8B-Instruct-GPTQ-4-Bit).

Depending on the machine, different packages are required and will be installed below.

In [2]:
import platform
import requests

# For the paper analyser
import torch
import transformers
import argparse
import logging
import json
import os
import accelerate



In [1]:
# for the newer MacBooks with the Apple Chip
# changed for testing but change back later
! pip install torch transformers optimum accelerate auto-gptq bitsandbytes



### Install Llama 3 - 8b
Next we install the quantized version of the Llama 8b language model.



In [3]:
if platform.processor() == 'arm':
    from mlx_lm import load, generate
    model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")
else:
    from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
    import torch

    MODEL_ID="astronomer/Llama-3-8B-Instruct-GPTQ-4-Bit"
    tokenizer = AutoTokenizer.from_pretrained("astronomer/Llama-3-8B-Instruct-GPTQ-4-Bit")

    config = AutoConfig.from_pretrained(MODEL_ID)
    config.quantization_config["disable_exllama"] = False
    config.quantization_config["exllama_config"] = {"version":2}

    model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID,
            device_map='auto',
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
            # low_cpu_mem_usage=True,
            # load_in_4bit=True,
            config=config,
        )

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using `disable_exllama` is deprecated and will be removed in version 4.37. Use `use_exllama` instead and specify the version with `exllama_config`.The value of `use_exllama` will be overwritten by `disable_exllama` passed in `GPTQConfig` or stored in your config file.
2024-07-04 20:45:05.716699: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-04 20:45:05.716763: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-04 20:45:05.718136: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has al

model.safetensors:   0%|          | 0.00/5.74G [00:00<?, ?B/s]

Some weights of the model checkpoint at astronomer/Llama-3-8B-Instruct-GPTQ-4-Bit were not used when initializing LlamaForCausalLM: ['model.layers.0.mlp.down_proj.bias', 'model.layers.0.mlp.gate_proj.bias', 'model.layers.0.mlp.up_proj.bias', 'model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.o_proj.bias', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.1.mlp.down_proj.bias', 'model.layers.1.mlp.gate_proj.bias', 'model.layers.1.mlp.up_proj.bias', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.1.self_attn.o_proj.bias', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.10.mlp.down_proj.bias', 'model.layers.10.mlp.gate_proj.bias', 'model.layers.10.mlp.up_proj.bias', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.10.self_attn.o_proj.bias', 'model.layers.10.self_attn.q_proj.bias', 'model.layers.10.self_attn.v_proj.bias', 'model.layers.11.mlp.down_proj.bias', 'model.layers.1

generation_config.json:   0%|          | 0.00/136 [00:00<?, ?B/s]

### Running the model with an example prompt

We show that the model can run with an example prompt. First we define the system prompt, which tells the model what character to adopt. Then we give it an instruction to introduce itself. Again, depending on the machine and therefore model used, we use slightly different functions to generate output.

In [6]:
# neccessary because Kaggles GPUs are weird'
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

In [7]:
from transformers import pipeline
from IPython.display import display

SYSTEM_MSG = "You are a helpful chatbot assistant."

def generateFromPrompt(promptStr,maxTokens=100):
    if platform.processor() == 'arm':
      messages = [ {"role": "system", "content": SYSTEM_MSG},
              {"role": "user", "content": promptStr}, ]
      input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
      prompt = tokenizer.decode(input_ids)
      response = generate(model, tokenizer, prompt=prompt,max_tokens=maxTokens)
    else:
      message = [{"role": "user", "content": promptStr},]
      pipe = pipeline("text-generation", model=model, tokenizer=tokenizer,max_new_tokens=maxTokens)
      result = pipe(message)
      response = result[0]['generated_text'][1]['content']
    return(response)


response = generateFromPrompt("Please introduce yourself")

print(response+"...")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Nice to meet you! I'm LLaMA, an AI assistant developed by Meta AI that can understand and respond to human input in a conversational manner. I'm trained on a massive dataset of text from the internet and can generate human-like responses to a wide range of topics and questions. I'm here to help answer your questions, provide information, and even have a fun conversation with you! What would you like to talk about?...


###  Now we need the following functions to search the internet for papers. We will use the API OpenAlex for this.

First, we define two functions that we need to porperly use OpenAlex.<br>
`reconstruct_text` is used to extract the abstract.<br>
`search_openalex` searches OpenAlex for a given search phrase and returns a specified number of found articles.

In [10]:
# function to parse the text
def reconstruct_text(inverted_index):
    word_index = []
    for k,v in inverted_index.items():
        for index in v:
            word_index.append([k,index])

    word_index = sorted(word_index,key = lambda x : x[1])

    word_list = []
    for i in range(len(word_index)):
        word_list.append(word_index[i][0])

    separator = ' '
    reconstructed_text = separator.join(word_list)

    return reconstructed_text

# function that uses openalex to search web for papers
def search_openalex(search_phrase, result_count=10, min_year='2013'):
    base_url = "https://api.openalex.org/works"  # Replace with the actual API endpoint

    # Create filters
    filters = [
        "has_abstract:true",
        "has_fulltext:true",
        f"from_publication_date:{min_year}-01-01"
    ]

    # Construct the query parameters
    params = {
    "search": search_phrase,
    "filter": str.join(",", filters),  # Only return works with abstracts
    "per_page": result_count,  # Limit the search
    }

    r = requests.get(base_url, params=params)
    res_json = r.json()

    abstract_list = []
    for i in range(len(res_json["results"])):
        abstract_list.append(reconstruct_text(res_json["results"][i]['abstract_inverted_index']))

    return res_json["results"], abstract_list

### Let's try getting papers for our search now!

In [11]:
# enter your search phrase
search_phrase = 'depression randomized control trial'
# enter how many abstracts you want to receive
number_of_abstracts = 10
# collect the abstracts
res_, abstract = search_openalex(search_phrase, number_of_abstracts)
# show title
print(res_[0]['title'])
# show abstract
abstract[0]

Rethinking the Dose-Response Relationship Between Usage and Outcome in an Online Intervention for Depression: Randomized Controlled Trial


'There is now substantial evidence that Web-based interventions can be effective at changing behavior and successfully treating psychological disorders. However, interest in the impact of usage on intervention outcomes has only been developed recently. To date, persistence with or completion of the intervention has been the most commonly reported metric of use, but this does not adequately describe user behavior online. Analysis of alternative measures of usage and their relationship to outcome may help to understand how much of the intervention users may need to obtain a clinically significant benefit from the program.The objective of this study was to determine which usage metrics, if any, are associated with outcome in an online depression treatment trial.Cardiovascular Risk E-couch Depression Outcome (CREDO) is a randomized controlled trial evaluating an unguided Web-based program (E-couch) based on cognitive behavioral therapy and interpersonal therapy for people with depression a

### Now, we want to analyse them

The next section provides you with a function you can call to search OpenALex and get the LLMs summary returned. <br>
It comes with default settings, so you only have to give a search phrase. However, we can refine and customize it!

In [12]:
# Function to extract knowledge graphs from paper/ abstract given via input
#
# Written 2024 by Joshua Sammet, Chair of medical Knowledge and Decision, University of St. Gallen

def parse_args():
    parser = argparse.ArgumentParser(description="Simple example of a training script.")
    parser.add_argument(
        "--pretrained_model_name_or_path",
        type=str,
        default=None,
        required=True,
        help="Path to pretrained model or model identifier from huggingface.co/models.",
    )

#--------------Different Prompts--------------
PICO_message_abstract="""You are an expert agent specialized in extracting PICO elements on abstracts from scientific publications.
The PICO elements are population, intervention, comparison and outcome. Your task is to identify the entities and
relations requested from an abstract of an scientific paper that is given to you in a prompt.
You must generate the output in a JSON containing a list with JSON objects having the following keys:
"head", "head_type", "relation", "tail", and "tail_type".
The "head" key must contain the text of the extracted entity from the provided user prompt,
the "head_type" key must contain the type of the extracted head entity which must be one of the PICO elements, the "relation" key must contain the type of relation
between the "head" and the "tail", the "tail" key must represent the text of an extracted entity which is the tail
of the relation, and the "tail_type" key must contain the type of the tail entity. Attempt to extract around 10 entities and relations.
"""
PICO_bulletpoints_abstract="""You are an expert agent specialized in extracting PICO elements on abstracts from scientific publications.
The PICO elements are population, intervention, comparison and outcome. Your task is to identify and extract the content from an abstract of an scientific paper that is given to you in a prompt.
You must generate the output in the form of a list of bullet points. The content of each bullet point should summarize an aspect of the abstract with regard to one PICO element. Please mention the relevant PICO elements at the beginning of each bullet point.
"""

"""
Function to find publications of interest and analyse them
INPUT:
model_name - name of specified model that should be used
search_phrase - Topic that should be search for (e.g. 'depression treatment random controlled trial'
number_of_abstracts - How many papers should be checked
entities - define the entities in the knowledge graph
relations - define the relations in the knowledge graph
prompt - Specify which prompt should be used to instruct the model

OUTPUT:
returns json file with knowledge graph for each apper
"""
# This version uses the llama3 model that was already loaded beforehand
def find_and_analyse_llama3(search_phrase, number_of_abstracts=1, prompt=None, input_text=None):
    # Get papers from OpenAlex
    results, abstracts = search_openalex(search_phrase, number_of_abstracts)

    # Define system prompt message
    if prompt==None:
        system_message = PICO_bulletpoints_abstract
    else:
        system_message = prompt

    prompt = f"<|im_start|>system\n{system_message}<|im_end|>\n <|im_start|> assistant\n "
    inputs = tokenizer(prompt, return_tensors='pt', return_attention_mask=False).to("cuda")
    output_ids = model.generate(inputs["input_ids"], max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)

    if input_text == None:
        for i in range(len(abstracts)):
            print(f"For paper number {i+1}, titled {results[i]['title']}, the abstract gives the following information:\n")
            prompt = f"<|im_start|>user\n Extract the PICO elements fom the following abstract:\n {abstracts[i]}<|im_end|>\n<|im_start|> assistant\n "
            #prompt = f"<|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n Extract the knowledge graph fom the following abstract:\n {abstracts[i]}<|im_end|>\n<|im_start|> assistant\n "

            inputs = tokenizer(prompt, return_tensors='pt', return_attention_mask=False).to("cuda")
            output_ids = model.generate(inputs["input_ids"],max_new_tokens=100)
            answer = tokenizer.batch_decode(output_ids)[0]
            cut_answer = answer.split("<|im_start|> assistant\n",1)[1]
            print(cut_answer + '\n')
    else:
        print(f"For the given text, the abstract gives the following information:\n")
        prompt = f"<|im_start|>user\n Extract the PICO elements fom the following abstract:\n {abstracts[i]}<|im_end|>\n<|im_start|> assistant\n "
        #prompt = f"<|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n Extract the knowledge graph fom the following abstract:\n {input_text}<|im_end|>\n<|im_start|> assistant\n "

        inputs = tokenizer(prompt, return_tensors='pt', return_attention_mask=False).to("cuda")
        output_ids = model.generate(inputs["input_ids"],max_new_tokens=100)
        answer = tokenizer.batch_decode(output_ids)[0]
        cut_answer = answer.split("<|im_start|> assistant\n",1)[1]
        print(cut_answer + '\n')


    return None

In [13]:
# enter your search phrase
search_phrase = 'depression randomized control trial'
# analyse the paper
find_and_analyse_llama3(search_phrase)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


For paper number 1, titled Rethinking the Dose-Response Relationship Between Usage and Outcome in an Online Intervention for Depression: Randomized Controlled Trial, the abstract gives the following information:

  * PICO elements:
    * P: Problem/Population: People with depression and cardiovascular disease
    * I: Intervention: Un-guided Web-based program (E-couch) based on cognitive behavioral therapy and interpersonal therapy
    * C: Comparison: Not applicable (randomized controlled trial)
    * O: Outcome: Clinically significant improvement in depression score on the Patient Health Questionnaire (PHQ-9) of ≥ 5 points
    * E: Exposure: Usage metrics (



### Now it is your turn!

Play around with the model. You can tackle different questions, depending on your interests and skills:
- What effect has a change in the search phrase?
- What effect has a change in the prompt? Can we shift the models focus? How can we make the summaries briefer or more detailed?
- Test your own inputs! Is the models performance satisfactory? What should be changed to make it better?

In [None]:
# enter your search phrase
search_phrase = 'depression randomized control trial'
# enter your prompt
own_prompt = None
# enter your input text
own_abstract = None
# analyse the paper
find_and_analyse_llama3(search_phrase, prompt=own_prompt, input_text=own_abstract)