### **Config**

In [2]:
import os
import sys

sys.path.insert(0, '/home/marco/epfl/magma/')
import config

from torch import cuda
device = 'cuda' if cuda.is_available() else 'cpu'

In [3]:
MODEL = 'led'
MODELS = {}

In [4]:
# Dataset path
data_dir = config.MAGMA_DIR + 'datasets/karger_books_base/'

### **Init**

In [5]:
import matplotlib.pyplot as plt
import numpy as np
import torch
import re
import pandas as pd
from tqdm import tqdm
from textwrap import fill
tqdm.pandas()

### **Function Definition**

##### Import Model and Tokenizer

In [6]:
def import_model_tok(model_name_or_path, verbose=False):
    """Load a seq2seq model and its tokenizer, caching them in MODELS."""
    # MODELS is only mutated (never reassigned), so no `global` is needed.
    if model_name_or_path in MODELS:
        if verbose:
            print('[+] model already present in cache\n')
        return MODELS[model_name_or_path]
    if verbose:
        print('[*] importing the model\n')

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

    if verbose:
        print(model.config)
    MODELS[model_name_or_path] = model, tokenizer
    if verbose:
        print('[+] the model is now present in cache\n')
    return MODELS[model_name_or_path]
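
`import_model_tok` memoizes loaded checkpoints in the module-level `MODELS` dict, so repeated calls from `print_examples` do not reload weights from disk. The same pattern can be sketched as a small decorator. This is a minimal illustration, assuming the loader is keyed only by its path argument; `memoize_by_path` and `fake_loader` are hypothetical names, and `fake_loader` stands in for the `from_pretrained` calls so the snippet runs without `transformers`.

```python
from typing import Any, Callable, Dict


def memoize_by_path(loader: Callable[[str], Any]) -> Callable[[str], Any]:
    """Cache the loader's result per path, mirroring the MODELS dict."""
    cache: Dict[str, Any] = {}

    def wrapper(path: str) -> Any:
        if path not in cache:
            # Only call the (expensive) loader the first time a path is requested.
            cache[path] = loader(path)
        return cache[path]

    wrapper.cache = cache  # expose the cache so entries can be inspected or evicted
    return wrapper


# Hypothetical stand-in for AutoModelForSeq2SeqLM / AutoTokenizer.from_pretrained.
load_calls = []


@memoize_by_path
def fake_loader(path: str):
    load_calls.append(path)
    return ('model@' + path, 'tokenizer@' + path)


first = fake_loader('checkpoint-60')
second = fake_loader('checkpoint-60')
print(first is second, len(load_calls))  # True 1
```

`functools.lru_cache(maxsize=None)` would give the same behavior; an explicit dict, as in the notebook, makes it easy to check what is cached or to drop a checkpoint to free GPU memory.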

##### Print Examples

In [7]:
def print_examples(model_name_list, df, n_examples=10):
    # Draw a reproducible sample of chapters (same rows on every run).
    df_examples = df.sample(n_examples, axis='index', random_state=config.SEED)

    for idx, row in df_examples.iterrows():
        print(idx)  # (book, chapter) index of the sampled row
        #print(fill(row.text, 100))
        print()
        for model_name in model_name_list:
            model, tokenizer = import_model_tok(model_name)
            model = model.to(device)

            # Tokenize the chapter (truncated to 8192 tokens) and move the
            # input ids to the same device as the model.
            text_enc = tokenizer.encode(
                row.text,
                truncation=True,
                max_length=8192,
                return_tensors='pt').to(device)

            # Beam-search decoding; [0] takes the sequence for the single input.
            summ_enc = model.generate(
                text_enc,
                min_length=70,
                max_length=512,
                length_penalty=1,
                num_beams=config.NUM_BEAMS,
                no_repeat_ngram_size=3,
                early_stopping=True)[0]
            summ_num_tok = len(summ_enc)
            summ = tokenizer.decode(summ_enc, skip_special_tokens=True)

            print('Prediction\n%s (%d tok):\n' % (model_name, summ_num_tok))
            # Summaries may contain several bullets separated by '<BULL>'.
            if '<BULL>' in summ:
                summ_bullets = [b for b in summ.split('<BULL>') if len(b) > 0]
                for b in summ_bullets:
                    print(fill(b, 100))
                    print()
            else:
                print(fill(summ, 100))
                print()

        print('Reference:')
        bullets = [b for b in row.bullets.split('<BULL>') if len(b) > 0]
        for b in bullets:
            print(fill(b, 100))
            print()

        print('#' * 100)
        print()
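
Both the predictions and the references mark bullet points with a `<BULL>` separator, and `print_examples` splits on it and drops empty pieces. The reference bullets keep a leading space (visible in the outputs below). A small helper that also strips whitespace could replace the two splitting branches; `split_bullets` and `format_bullets` are hypothetical names, not part of the original notebook.

```python
from textwrap import fill


def split_bullets(text: str, sep: str = '<BULL>') -> list:
    """Split a '<BULL>'-delimited summary into non-empty, stripped bullets.

    Text without any separator comes back as a single bullet, so the same
    call covers both branches of the if/else in print_examples.
    """
    return [b.strip() for b in text.split(sep) if b.strip()]


def format_bullets(text: str, width: int = 100) -> str:
    """Wrap each bullet to `width` columns and separate bullets with a blank line."""
    return '\n\n'.join(fill(b, width) for b in split_bullets(text))


example = '<BULL> First point.<BULL> Second point. <BULL>'
print(split_bullets(example))  # ['First point.', 'Second point.']
print(format_bullets(example))
```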

## **LED on Karger Books**

In [9]:
df_train = pd.read_csv(data_dir+'train.csv').set_index(['book', 'chapter'])
df_val = pd.read_csv(data_dir+'val.csv').set_index(['book', 'chapter'])
df_test = pd.read_csv(data_dir+'test.csv').set_index(['book', 'chapter'])
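
Because each split is indexed by the (`book`, `chapter`) pair, `iterrows()` in `print_examples` yields tuple labels such as `(9781910797150, 'ch04')`, and fixing `random_state` makes `df.sample` pick the same rows on every run. A minimal check on a toy frame; the rows are made up, only the column names `book`, `chapter`, `text` and `bullets` mirror the notebook, and the seed `42` is an assumption standing in for `config.SEED`.

```python
import pandas as pd

# Toy rows (made up); only the column names mirror the Karger CSVs.
toy = pd.DataFrame({
    'book': [1, 1, 2, 2],
    'chapter': ['ch01', 'ch02', 'ch01', 'ch02'],
    'text': ['a', 'b', 'c', 'd'],
    'bullets': ['<BULL> A', '<BULL> B', '<BULL> C', '<BULL> D'],
}).set_index(['book', 'chapter'])

# Same seed -> same rows, so runs of print_examples stay comparable.
s1 = toy.sample(2, axis='index', random_state=42)
s2 = toy.sample(2, axis='index', random_state=42)

for idx, row in s1.iterrows():
    print(idx, row.text)  # idx is a (book, chapter) tuple
```

Note that `sample` without `replace=True` raises a `ValueError` if a split has fewer rows than `n_examples`.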

### **Print and Summarization**

##### Print Train Examples

In [14]:
print_examples([
    config.MAGMA_DIR+'fine-tuning/allenai?led-base-16384_karger_books_base/checkpoint-60/'],
    df_train)

(9781910797150, 'ch04')

Prediction
/home/marco/epfl/magma/fine-tuning/allenai?led-base-16384_karger_books_base/checkpoint-60/ (124 tok):

The use of antiemetics should follow the same principles as those used to treat adults.

Prophylactic control of CINV in patients receiving highly emetogenic chemotherapy (HEC) requires a
combination of a 5-HT RA, dexamethasone and an NK-1 RA (aprepitant) for the prevention of
chemotherapy-induced nausea and vomiting (CINV).

Antiemetic therapy should be started before chemotherapy is administered on day 1 and continued
through the acute and delayed phase for as long as the chemotherapy is emetic (usually 2-3 days).

Reference:
 The primary goal of CINV therapy is the prevention of nausea and vomiting.

 Patients should be individually evaluated for their specific risk factors as well as the level of
anxiety present before the first course of treatment.

 Outcomes are improved by following international guidelines when selecting the antiemetic regim

Prediction
/home/marco/epfl/magma/fine-tuning/allenai?led-base-16384_karger_books_base/checkpoint-60/ (123 tok):

The principles of ethical biomedical ethics are at the core of research ethics and should be
carefully considered during the study design phase and ethics review process.

The principle of beneficence focuses on the fair distribution of the benefits and burdens of
research and recruitment protocols that are inclusive of those most likely to benefit from knowledge
gained.

When using third-party commercial apps or measurement tools, the likelihood of a loss of privacy is
100% for all users - yet the consequences will vary. For most people, these will be negligible, but
for a domestic abuse survivor or undocumented migrant, the consequences might be severe.

Reference:
 The principles that guide the ethical conduct of biomedical and behavioral research include:
respect for persons, beneficence, justice (Belmont Report) and respect for law and public interest
(Menlo Report).



##### Print Val Examples

In [15]:
print_examples([
    config.MAGMA_DIR+'fine-tuning/allenai?led-base-16384_karger_books_base/checkpoint-60/'],
    df_val)

(9781910797471, 'ch02')

Prediction
/home/marco/epfl/magma/fine-tuning/allenai?led-base-16384_karger_books_base/checkpoint-60/ (146 tok):

The sympathetic nervous system (SNS) plays a significant role in the pathophysiology of heart
failure (HF).

Neurohormonal pathways activated in HF are the sympathetic nervous nervous system and the
natriuretic peptide (NP) system.

Neuronal vasoconstriction is an important manifestation of HF, reflecting inactivity, consequences
of circulating substances such as tumor growth factor (TGF)-beta and reduced cardiac output.

Aldosterone (which may be released even in the setting of ACE inhibition) contributes to myocardial
and vascular fibrosis.

The functional status of patients in ACC/AHA class D (with marked symptoms and signs of HF) is
usually limited.

Reference:
 Heart failure (HF) is a disease of response to injury, which is initially appropriate and becomes
inappropriate.

 Inappropriate non-cardiac responses include activation of a variety of 

Prediction
/home/marco/epfl/magma/fine-tuning/allenai?led-base-16384_karger_books_base/checkpoint-60/ (214 tok):

High-grade gliomas are highly cellular, with varied nuclear morphology and a scanty perinuclear rim
of cytoplasm.

The diagnosis of glioblastoma requires a multidisciplinary approach, including the use of dynamic
susceptibility contrast-enhanced (DSC, T2) and susceptibility weighted imaging (SWI).

Mutations in the normal metabolite distribution are interpreted as markers of disease processes.

Diseases that are not yet identified, such as choline, N -acetyl aspartate, lactate, lipids, etc.,
should always be considered when assessing glioma infiltration.

Mutation-specific antibodies can be used to identify non-enhancing gliomatous tumors.

A variety of amino acid (non-FDG) tracers have been developed that looks at epigenetic changes
caused by a combination of mutations and the cell of origin.

Multivoxel spectroscopy is the most demanding and increasingly available modalit
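
`print_examples` decodes with `no_repeat_ngram_size = 3`, which forbids any trigram from appearing twice in an output. It does not block shorter repeats, which is why a doubled word such as "nervous nervous" in the first validation prediction can still get through. The pure-Python sketch below illustrates the rule on token-id lists; it shows the constraint only and is not the `transformers` implementation, and `banned_next_tokens` is a hypothetical name.

```python
from typing import List, Set


def banned_next_tokens(tokens: List[int], n: int = 3) -> Set[int]:
    """Return the tokens that would complete an n-gram already present in `tokens`."""
    if n <= 0 or len(tokens) < n - 1:
        return set()
    # The last n-1 tokens are the prefix the next token would extend.
    prefix = tuple(tokens[len(tokens) - n + 1:])
    banned = set()
    for i in range(len(tokens) - n + 1):
        # Every earlier occurrence of the prefix bans the token that followed it.
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


print(banned_next_tokens([5, 7, 9, 5, 7]))  # {9}: "5 7 9" already occurred
print(banned_next_tokens([1, 2], 3))        # set(): repeating 2 is still allowed
```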

##### Print Test Examples

In [16]:
print_examples([
    config.MAGMA_DIR+'fine-tuning/allenai?led-base-16384_karger_books_base/checkpoint-60/'],
    df_test)

(9781910797105, 'ch05')

Prediction
/home/marco/epfl/magma/fine-tuning/allenai?led-base-16384_karger_books_base/checkpoint-60/ (95 tok):

The neurodevelopmental hypothesis of schizophrenia is based on a static lesion in the maturational
stage of the brain.

Drug use increases the risk of developing schizophrenia.

The risk is further increased if minor psychotic symptoms pre-exist and if cannabis use starts early
in adolescence.

There is emerging evidence for the role of non-biological risk factors in schizophrenia, such as the
effects of urban upbringing and of ethnicity, as noted in Chapter 3.

Reference:
 Early neurodevelopmental, non-genetic risk factors exist for schizophrenia.

 Like genetic factors, environmental factors are many and varied.

 Birth complications increase the child's risk of schizophrenia in later life fourfold.

 Psychosocial risk factors are being re-established as important risk factors.

 Cannabis use increases the risk of schizophrenia as well as relapse.


Prediction
/home/marco/epfl/magma/fine-tuning/allenai?led-base-16384_karger_books_base/checkpoint-60/ (106 tok):

Binge eating disorder (BED) is associated with obstructive sleep apnea and restless legs syndrome.

Behavioral therapy (CBT) is a key component of long-term obesity management.

A food and activity diary should clarify eating patterns and behaviors, particularly events that
trigger eating, with a view to changing those patterns once they have been identified, at a
realistic rate of improvement.

Stimulus control: avoiding situations that lead to harmful behavior, and promoting situations that
influence healthy activity.

Reference:
 Behavioral therapy, alongside diet and lifestyle advice, is one of the three key components of
obesity management.

 A person's degree of motivation and expectations should be assessed.

 Techniques include goal setting, self-monitoring and stimulus control.

 Binge eating disorder and night eating syndrome are specific eating disorders and shou