This notebook demonstrates inference for text summarization with the Falcon-7b-instruct model.

Running inference comes down to defining three components: the model, the tokenizer, and the pipeline.

In [1]:
!pip install transformers
!pip install langchain



In [2]:
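import transformers
from transformers import AutoTokenizer

# Load the instruction-tuned Falcon-7B weights; trust_remote_code=True allows
# transformers to run the custom model code shipped in the model repository.
model = transformers.AutoModelForCausalLM.from_pretrained(
  'tiiuae/falcon-7b-instruct',
  trust_remote_code=True
)

# Load the matching tokenizer from the same repository
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct")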





Loading the tokenizer from the `special_tokens_map.json` and the `added_tokens.json` will be removed in `transformers 5`,  it is kept for forward compatibility, but it is recommended to update your `tokenizer_config.json` by uploading it again. You will see the new `added_tokens_decoder` attribute that will store the relevant information.


In [3]:
from transformers import pipeline
from langchain import HuggingFacePipeline
pipe = pipeline(
    'text-generation', model=model, tokenizer=tokenizer,
    max_new_tokens=100, do_sample=True, use_cache=True,
    eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.eos_token_id,
)
llm = HuggingFacePipeline(pipeline=pipe, model_kwargs={'temperature': 0.1})

In [4]:
from langchain import PromptTemplate,  LLMChain

template = """
              Write a concise summary of the following text delimited by triple backquotes.
              ```{text}```
              SUMMARY:
           """

prompt = PromptTemplate(template=template, input_variables=["text"])

llm_chain = LLMChain(prompt=prompt, llm=llm)
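
To make the prompt that reaches the model concrete, the sketch below renders the same template text with plain `str.format`, which approximates what the prompt template does with its `{text}` variable. The helper names `render_prompt` and `dedent_template` are hypothetical and not part of LangChain. It also shows that the leading spaces inside the triple-quoted string are sent to the model as-is, and how `textwrap.dedent` could strip them if that is not wanted.

```python
import textwrap

# Triple backquotes, built programmatically so this example stays readable.
FENCE = "`" * 3

# Same wording and indentation as the template defined in the cell above.
TEMPLATE = (
    "\n"
    "              Write a concise summary of the following text delimited by triple backquotes.\n"
    f"              {FENCE}{{text}}{FENCE}\n"
    "              SUMMARY:\n"
    "           "
)


def render_prompt(template: str, text: str) -> str:
    """Hypothetical helper: substitute the article into the {text} placeholder."""
    return template.format(text=text)


def dedent_template(template: str) -> str:
    """Hypothetical helper: drop the common indentation and surrounding blank space."""
    return textwrap.dedent(template).strip()


article = "Example article body."

# Prompt as the notebook builds it: every line keeps its leading spaces.
print(repr(render_prompt(TEMPLATE, article)))

# Variant without the indentation.
print(render_prompt(dedent_template(TEMPLATE), article))
```

Whether the indentation affects output quality is not measured here; the indented line breaks in some generated summaries further down may simply mirror the prompt layout.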

In [5]:
!pip install datasets
from datasets import load_dataset

# Load the XSum dataset
dataset = load_dataset('xsum')

# Access the splits, e.g., 'test'
test_dataset = dataset['test']

test_df = test_dataset.to_pandas()



In [6]:
test_df.head()
# test_df.drop(columns=['id'], inplace=True)


Unnamed: 0,document,summary,id
0,"Prison Link Cymru had 1,099 referrals in 2015-...","There is a ""chronic"" need for more housing for...",38264402
1,Officers searched properties in the Waterfront...,"A man has appeared in court after firearms, am...",34227252
2,"Jordan Hill, Brittany Covington and Tesfaye Co...",Four people accused of kidnapping and torturin...,38537698
3,The 48-year-old former Arsenal goalkeeper play...,West Brom have appointed Nicky Hammond as tech...,36175342
4,Restoring the function of the organ - which he...,The pancreas can be triggered to regenerate it...,39070183


In [7]:
test_df.drop(columns=['id'], inplace=True)

In [8]:
test_df.head()

Unnamed: 0,document,summary
0,"Prison Link Cymru had 1,099 referrals in 2015-...","There is a ""chronic"" need for more housing for..."
1,Officers searched properties in the Waterfront...,"A man has appeared in court after firearms, am..."
2,"Jordan Hill, Brittany Covington and Tesfaye Co...",Four people accused of kidnapping and torturin...
3,The 48-year-old former Arsenal goalkeeper play...,West Brom have appointed Nicky Hammond as tech...
4,Restoring the function of the organ - which he...,The pancreas can be triggered to regenerate it...


In [9]:
test_df['document'][0]

'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation.\nWorkers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders.\nThe Welsh Government said more people than ever were getting help to address housing problems.\nChanges to the Housing Act in Wales, introduced in 2015, removed the right for prison leavers to be given priority for accommodation.\nPrison Link Cymru, which helps people find accommodation after their release, said things were generally good for women because issues such as children or domestic violence were now considered.\nHowever, the same could not be said for men, the charity said, because issues which often affect them, such as post traumatic stress disorder or drug dependency, were often viewed as less of a priority.\nAndrew Stevens, who works in Welsh prisons trying to secure housing for prison leavers, said the need for acc

In [10]:
test_df['summary'][0]

'There is a "chronic" need for more housing for prison leavers in Wales, according to a charity.'

In [11]:
# Generate the summaries

In [12]:
# Create an empty column 'model_generated' in test_df to store the generated summaries
test_df['model_generated'] = ""

# Define a function that generates the summary for one row; the result is
# written to 'model_generated' by the assignment below
def generate_and_store_summary(row):
    article_text = row['document']
    summary = llm_chain.run(article_text)
    return summary

# Generate summaries for the first 25 articles (.loc slicing is label-based and inclusive, so :24 covers rows 0 to 24)
test_df.loc[:24, 'model_generated'] = test_df.loc[:24].apply(generate_and_store_summary, axis=1)

# Display the updated DataFrame with generated summaries for the first 25 articles
print(test_df[['document', 'model_generated']].head(25))


                                             document  \
0   Prison Link Cymru had 1,099 referrals in 2015-...   
1   Officers searched properties in the Waterfront...   
2   Jordan Hill, Brittany Covington and Tesfaye Co...   
3   The 48-year-old former Arsenal goalkeeper play...   
4   Restoring the function of the organ - which he...   
5   But there certainly should be.\nThese are two ...   
6   Media playback is not supported on this device...   
7   It's no joke. But Kareem Badr says people did ...   
8   Relieved that the giant telecoms company would...   
9   "I'm really looking forward to it - the home o...   
10  The move is in response to an £8m cut in the s...   
11  The leaflets said the patient had been referre...   
12  Emily Thornberry said Labour would not "frustr...   
13  The National League sold the Republic of Irela...   
14  Iwan Wyn Lewis of Penygroes, Gwynedd, had been...   
15  The 33-year-old has featured only twice for th...   
16  Dr Waleed Abdalati told the

In [13]:
test_df.head()

Unnamed: 0,document,summary,model_generated
0,"Prison Link Cymru had 1,099 referrals in 2015-...","There is a ""chronic"" need for more housing for...",Prisons are currently struggling to house a l...
1,Officers searched properties in the Waterfront...,"A man has appeared in court after firearms, am...",Police conducted a search in the Waterfront P...
2,"Jordan Hill, Brittany Covington and Tesfaye Co...",Four people accused of kidnapping and torturin...,A Chicago court charged four suspects with ha...
3,The 48-year-old former Arsenal goalkeeper play...,West Brom have appointed Nicky Hammond as tech...,"- In four years with the Royals, Gunners' form..."
4,Restoring the function of the organ - which he...,The pancreas can be triggered to regenerate it...,I went to the US to try out a new way of eati...


Inspecting a few randomly chosen generated summaries

In [14]:
test_df['model_generated'][11]

' East Sussex Healthcare NHS Trust apologizes for sending incorrect appointment letters which contained incorrect information leaflets. The error was caught and resolved quickly and letters of apology were sent out to affected patients. #UKNews #Health #PatientError'

In [15]:
test_df['model_generated'][24]

' The groom and bride, both 32, exchanged vows at Nevis Range ski resort before sliding down a slalom in their wedding attire.\n            They wed in front of photographer Hamish Frost amid recent bouts of snow in Scotland.\n            The ceremony was officiated by a Humanist Society of Scotland official at the top of a run, followed by a dinner party with their family and friends.\n            The groom, Jonathan, is the first wedding officiated in a ski resort, according to the couple,'

Printing reference and model summaries side by side

In [16]:
for i in range(25):
    model_summary = test_df['model_generated'][i]
    reference_summary = test_df['summary'][i]
    print(f"{i + 1} - Reference Summary: {reference_summary}\nModel Summary: {model_summary}\n")

1 - Reference Summary: There is a "chronic" need for more housing for prison leavers in Wales, according to a charity.
Model Summary:  Prisons are currently struggling to house a large number of inmates and are relying on government-funded housing solutions, but with limited funding available for ex-offenders finding suitable accommodation is still challenging.
              The Welsh Government is in the process of building more one-bedroom flats in prison to accommodate more inmates.
              Homeless charity Emmaus currently provides stable accommodation to inmates through their partner, Emmaus South Wales.
              There is still much work to be done to provide suitable housing for former offenders in Wales.

2 - Reference Summary: A man has appeared in court after firearms, ammunition and cash were seized by police in Edinburgh.
Model Summary:  Police conducted a search in the Waterfront Park and Colonsay View areas of the city on Wednesday where they recovered three fir

In [17]:
!pip install rouge



In [18]:
from rouge import Rouge

# Initialize the ROUGE evaluator
rouge = Rouge()

# Select the first 25 rows of your DataFrame for evaluation
num_samples = 25
sampled_df = test_df.head(num_samples)

# Extract the generated summaries and reference summaries for the selected samples
generated_summaries = sampled_df['model_generated'].tolist()
reference_summaries = sampled_df['summary'].tolist()

# Calculate ROUGE scores for the selected samples
rouge_scores = rouge.get_scores(generated_summaries, reference_summaries, avg=True)

# Print the ROUGE scores
print("ROUGE Scores:", rouge_scores)


ROUGE Scores: {'rouge-1': {'r': 0.23566997741194806, 'p': 0.10291413629245508, 'f': 0.139615156458874}, 'rouge-2': {'r': 0.029025909702986023, 'p': 0.01052877857992537, 'f': 0.01493005677923789}, 'rouge-l': {'r': 0.19250589190949946, 'p': 0.0840009751102861, 'f': 0.11384451314342627}}
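
To help read the `r`, `p`, and `f` fields above, here is a minimal sketch of ROUGE-1 computed by hand from unigram overlap. It follows the textbook definition with whitespace tokenization and clipped counts; the `rouge` package may tokenize and count differently, so the numbers are not expected to match it exactly. The function name `rouge_1` and the two example sentences are made up for illustration.

```python
from collections import Counter


def rouge_1(hypothesis: str, reference: str) -> dict:
    """Textbook ROUGE-1: unigram overlap between a hypothesis and one reference."""
    hyp_tokens = hypothesis.lower().split()
    ref_tokens = reference.lower().split()

    # Clipped overlap: each word counts at most as often as it occurs in both texts.
    overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())

    precision = overlap / len(hyp_tokens) if hyp_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    f_score = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {"r": recall, "p": precision, "f": f_score}


# Illustrative sentences (not taken from the dataset).
reference = "a man has appeared in court"
hypothesis = "police said a man appeared in court on monday after a search"

scores = rouge_1(hypothesis, reference)
print(scores)  # 5 shared words: r = 5/6, p = 5/12, f = 5/9
```

A hypothesis that is much longer than its reference tends to score higher on recall than on precision, which is the same pattern as in the scores above, where the generated summaries run to several sentences against one-sentence references.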


In [19]:
from nltk.translate.bleu_score import corpus_bleu

# Select the first 25 rows of your DataFrame for evaluation
num_samples = 25
sampled_df = test_df.head(num_samples)

# Extract the generated summaries and reference summaries for the selected samples
generated_summaries = sampled_df['model_generated'].tolist()
reference_summaries = sampled_df['summary'].tolist()

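# Calculate BLEU score for the selected samples.
# Note: corpus_bleu expects tokenized input (for each hypothesis, a list of reference
# token lists, and the hypothesis itself as a token list). Raw strings are passed here,
# so they are iterated character by character; see the near-zero score and warnings below.
bleu_score = corpus_bleu(reference_summaries, generated_summaries)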
print("BLEU Score for 25 Summaries:", bleu_score)


BLEU Score for 25 Summaries: 9.225829346520394e-232


The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
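
The near-zero score and the warnings above are what NLTK produces when `corpus_bleu` receives raw strings instead of token lists. A minimal sketch of the expected input shape follows, assuming simple whitespace tokenization and one reference per summary. The helper `to_bleu_inputs` and the example sentences are hypothetical. The NLTK call is guarded so that the shaping code still runs where NLTK is not installed.

```python
def to_bleu_inputs(reference_summaries, generated_summaries):
    """Hypothetical helper: shape plain strings into the structure corpus_bleu expects."""
    # One list of references per hypothesis; each reference is a list of tokens.
    list_of_references = [[ref.split()] for ref in reference_summaries]
    # Each hypothesis is a list of tokens.
    hypotheses = [hyp.split() for hyp in generated_summaries]
    return list_of_references, hypotheses


# Illustrative sentences (not taken from the dataset).
example_references = ["A man has appeared in court after a police search."]
example_generated = ["A man appeared in court after police searched two properties."]

list_of_references, hypotheses = to_bleu_inputs(example_references, example_generated)

try:
    from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu
except ImportError:
    bleu = None  # NLTK not available; only the input shaping is demonstrated
else:
    # Smoothing avoids a hard zero when some higher-order n-gram has no match.
    bleu = corpus_bleu(
        list_of_references,
        hypotheses,
        smoothing_function=SmoothingFunction().method1,
    )

print("BLEU:", bleu)
```

In the notebook, the same shaping could be applied to `reference_summaries` and `generated_summaries` before the `corpus_bleu` call; the resulting score would differ from the value printed above and is not reported here.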


In [20]:
!pip install bert_score



In [21]:
from bert_score import score
# Select the first 25 rows of your DataFrame for evaluation
num_samples = 25
sampled_df = test_df.head(num_samples)

# Extract the generated summaries and reference summaries for the selected samples
generated_summaries = sampled_df['model_generated'].tolist()
reference_summaries = sampled_df['summary'].tolist()

# Calculate BERT Score
P, R, F1 = score(generated_summaries, reference_summaries, lang="en", verbose=True)

# Print BERT Score
print("BERT Precision:", P.mean().item())
print("BERT Recall:", R.mean().item())
print("BERT F1 Score:", F1.mean().item())


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.



computing greedy matching.



done in 3.17 seconds, 7.89 sentences/sec
BERT Precision: 0.7872003316879272
BERT Recall: 0.8636748790740967
BERT F1 Score: 0.8231661915779114
