# Stage 1: Role Labeling

One way to identify narratives in newspaper text is to look at the character archetypes that an article's framing relies on. The main figures in an article may be cast as heroes, villains, or victims, which guides the reader to interpret the article through the qualities implicit in these archetypes. Gomez-Zara et al. present a dictionary-based method for computationally determining the hero, villain, and victim in a newspaper text, which Stammbach et al. adapt by using an LLM for the same task.

## Fetch Articles (for Testing)

In [38]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from huggingface_hub import InferenceClient
from transformers import BertTokenizer
from utils.preprocessing import *
from utils.accelerators import *
from utils.multithreading import *
from utils.database import *
from utils.model import *
from utils.files import *
from datasets import Dataset
from rouge import Rouge
from tqdm import tqdm
import statistics
import hashlib
import random
import openai
import time
import math
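import re
import torch  # needed for torch.cuda.device_count() below
from bson import ObjectId  # needed by updateArticle() below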

### Connect to Database

Credentials are sourced from the `.env` file.

In [39]:
_, db = getConnection(use_dotenv=True)

### Query Database

Fetch up to 50 articles that have not been processed yet (no `triplets` field) and whose `parsing_result.text_length` is below 1000.
Only the `url`, `title`, and `parsing_result.text` fields are returned.

In [41]:
collection = "articles.sampled.triplets"
fields = {"url": 1, "title": 1, "parsing_result.text": 1}
query = {"triplets": {"$exists": False}, 
         "parsing_result.text_length": {"$lt": 1000}}
articles = fetchArticleTexts(db, 50, 0, fields, query, collection)

Example article:

In [42]:
example_article = random.choice(articles)
title = example_article.get("title")
text = example_article.get("parsing_result").get("text")
print(f"Title: {title}\nText: {text}")


Title: Electronic Music Awards & Foundation Show 2016: Facts & Photos
Text: Oliver Willis It looks like nothing was found at this location. Maybe try searching? AI Crime Watch



Clean the text in each article's `parsing_result`, then filter out articles
that lack a `title` or a `parsing_result`.


In [43]:
# Basic text cleaning, e.g. removing newlines, tabs, etc.
articles = cleanArticles(articles)

Cleaning articles: 100%|██████████| 50/50 [00:00<00:00, 12360.91it/s]


In [44]:
# Filter out articles with no title or no parsing result 
articles = [article for article in articles if article.get(
    "title", "") and article.get("parsing_result", "")]

print("Number of articles:", len(articles))

Number of articles: 50


### Export as JSON

Saves the given data to a JSON file for optional visual inspection.

In [45]:
exportAsJSON("../data/input/articles.json",  articles)

***

## Load Model

Vicuna-13B is an open-source chatbot created by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. According to its authors, a preliminary evaluation using GPT-4 as a judge indicates that Vicuna-13B achieves more than 90% of the quality of OpenAI ChatGPT and Google Bard and outperforms other models such as LLaMA and Stanford Alpaca in more than 90% of cases.

See:
* https://github.com/lm-sys/FastChat
* https://huggingface.co/lmsys/vicuna-13b-v1.5-16k

```bash
# Start the controller service
nohup python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001 &

# Start the model_worker service
nohup python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-13b-v1.5-16k --num-gpus 2 &
nohup python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-33b-v1.3 --num-gpus 2 &

# Start the gradio_web_server service
nohup python3 -m fastchat.serve.gradio_web_server --host 0.0.0.0 --port 7860 &

# Launch the RESTful API server
nohup python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8080 &
```
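
The API server started above exposes an OpenAI-compatible interface, which is why `RemoteModel` is pointed at a `/v1` base URL later in this notebook. As a minimal sketch of what such a request could look like without the `openai` package, the following uses only the standard library. The function names `build_completion_request` and `complete`, the default values, and the assumption that the server accepts `POST /v1/completions` with `model`, `prompt`, `max_tokens`, `temperature`, and `n` are illustrative and not taken from `utils.model`.

```python
import json
import urllib.request


def build_completion_request(api_base, model_name, prompt,
                             max_tokens=50, temperature=0.6, n=1):
    """Build (but do not send) a completion request for an OpenAI-compatible server."""
    payload = {
        "model": model_name,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "n": n,  # number of completions to sample for the prompt
    }
    return urllib.request.Request(
        url=f"{api_base.rstrip('/')}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(api_base, model_name, prompt, timeout=60, **kwargs):
    """Send the request and return the generated texts (assumes a `choices` list in the reply)."""
    request = build_completion_request(api_base, model_name, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return [choice["text"] for choice in body["choices"]]
```

For example, `complete("http://localhost:8080/v1", "vicuna-13b-v1.5-16k", "Once upon a time", n=5)` would be expected to return a list of five continuations, assuming the server is reachable at that address.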

Check the number of available GPUs:

In [46]:
num_gpus = torch.cuda.device_count()
print(f'Number of available GPUs: {num_gpus}')

Number of available GPUs: 2


List information about the available GPUs:

In [47]:
gpu_info_list = listAvailableGPUs()

GPU 0:
  Name: Tesla P100-PCIE-16GB
  Memory: 16276.00 MiB
  Compute Capability: 6.0

GPU 1:
  Name: Tesla P100-PCIE-16GB
  Memory: 16276.00 MiB
  Compute Capability: 6.0



In [48]:
!nvidia-smi

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Sun Oct 22 15:51:06 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:17:00.0 Off |                    0 |
| N/A   46C    P0    34W / 250W |  13224MiB / 16384MiB |      0%      Default |
|                               |            

Test Model:

In [49]:
model = RemoteModel(model_name="vicuna-13b-v1.5-16k",
                    api_base="http://merkur72.inf.uni-konstanz.de:8080/v1",
                    api_key="EMPTY")

params = {'n': 5}
result = model.generateAnswers("Once upon a time", params=params)
print(result, type(result))

[<OpenAIObject at 0x7fccb25041d0> JSON: {
  "index": 0,
  "text": "in a faraway land, there was a beautiful princess named Sophia.",
  "logprobs": null,
  "finish_reason": "length"
}, <OpenAIObject at 0x7fccb2504590> JSON: {
  "index": 1,
  "text": ", there was a young man who dreamed of becoming a great musician.",
  "logprobs": null,
  "finish_reason": "length"
}, <OpenAIObject at 0x7fccb2504040> JSON: {
  "index": 2,
  "text": ", there was a man called Jack. Jack was a simple man and lived a",
  "logprobs": null,
  "finish_reason": "length"
}, <OpenAIObject at 0x7fccb25047c0> JSON: {
  "index": 3,
  "text": "in a small village, there was a young girl named Sophia. Sophia",
  "logprobs": null,
  "finish_reason": "length"
}, <OpenAIObject at 0x7fccb2504810> JSON: {
  "index": 4,
  "text": "there was a little girl named Lily. She was a happy, curious and",
  "logprobs": null,
  "finish_reason": "length"
}] <class 'list'>


***

## Define Prompt Template

In [50]:
# PROMPT_TEMPLATE = "Please identify entities which are portrayed as hero, villain and victim in the following news article. A hero is an individual, organisation, or entity admired for their courage, noble qualities, and outstanding achievements. A villain is a character, organisation, or entity known for their wickedness or malicious actions, often serving as an antagonist in a story or narrative. A victim is an individual, organisation, or entity who suffers harm or adversity, often due to an external force or action. Every entity can only be one of those roles. The solution must be returned in this format {{hero: \"Name\", villain: \"Name\", victim: \"Name\"}}. Article Headline: ''{headline}''. Article Text: ''{article_text}''  Solution: "

# PROMPT_TEMPLATE = "Please identify entities which are portrayed as hero, villain and victim in the following news article. Every entity can only be one of those roles. If not existing return None as name. The solution must be returned in this format {{hero: \"Name\", villain: \"Name\", victim: \"Name\"}}. Article Headline: ''{headline}''. Article Text: ''{article_text}''  Solution: "

# PROMPT_TEMPLATE = "Please identify entities which are portrayed as hero, villain and victim in the following news article. Each entity can only assume one role. If none apply, use 'None'. The solution must be returned in this format {{hero: \"Name\", villain: \"Name\", victim: \"Name\"}}. Article Headline: ''{headline}''. Article Text: ''{article_text}''  Solution: "

PROMPT_TEMPLATE = "Given the news article below, identify entities categorized as a hero, villain, or victim. Each entity can only assume one role. If none apply, use 'None'. The solution must be provided in this format: {{hero: \"Name\", villain: \"Name\", victim: \"Name\"}}. \n Headline: '{headline}' \n Text: '{article_text}' \n Solution: "

# Test the template with a dummy text
prompt_test = PROMPT_TEMPLATE.format(headline = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.', article_text='Lorem ipsum dolor sit amet, consectetur adipiscing elit.')
print(prompt_test)


Given the news article below, identify entities categorized as a hero, villain, or victim. Each entity can only assume one role. If none apply, use 'None'. The solution must be provided in this format: {hero: "Name", villain: "Name", victim: "Name"}. 
 Headline: 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.' 
 Text: 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.' 
 Solution: 


## Define Parameters for Text Generation

Each parameter influences the text generation in a specific way. Below are the parameters along with a brief explanation:

**`max_length`**:
* Sets the maximum number of tokens in the generated text (default is 50).
* Generation stops if the maximum length is reached before the model produces an EOS token.
* A higher `max_length` allows for longer generated texts but may increase the time and computational resources required.

**`min_length`**:
* Sets the minimum number of tokens in the generated text (default is 10).
* The EOS token is suppressed until this minimum length is reached, so generation cannot stop earlier.

**`num_beams`**:
* In beam search, sets the number of "beams" or hypotheses to keep at each step (default is 4).
* A higher number of beams increases the chances of finding a good output but also increases the computational cost.

**`num_return_sequences`**:
* Specifies the number of independently computed sequences to return (default is 3).
* When using sampling, multiple different sequences are generated independently from each other.

**`early_stopping`**:
* Controls when beam search stops (default is True): generation ends as soon as `num_beams` finished candidates exist, even if the predefined maximum length is not reached.
* A candidate is finished once it produces the EOS (end-of-sequence) token, which marks the logical end of a text (often represented as `</s>`).

**`do_sample`**:
* Tokens are selected probabilistically based on their likelihood scores (default is True).
* Introduces randomness into the generation process for diverse outputs.
* The level of randomness is controlled by the 'temperature' parameter.

**`temperature`**:
* Adjusts the probability distribution used for sampling the next token (default is 0.7).
* Higher values make the generation more random, while lower values make it more deterministic.

**`top_k`**:
* Limits the number of tokens considered for sampling at each step to the top K most likely tokens (default is 50).
* Can make the generation process faster and more focused.

**`top_p`**:
* Also known as nucleus sampling, sets a cumulative probability threshold (default is 0.95).
* Tokens are sampled only from the smallest set whose cumulative probability exceeds this threshold.

**`repetition_penalty`**:
* Discourages the model from repeating the same token by modifying the token's score (default is 1.5).
* Values greater than 1.0 penalize repetitions, and values less than 1.0 encourage repetitions.
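
How `temperature`, `top_k`, and `top_p` interact can be illustrated on a toy distribution. The following is a simplified sketch in plain Python, not the implementation used by the model server; the function names `filter_distribution` and `sample_token` and the toy logits are made up for illustration.

```python
import math
import random


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature: divide the logits before the softmax; values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Rank token indices from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top_k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalise the surviving tokens so that they sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, rng=random, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = filter_distribution(logits, **kwargs)
    indices = list(dist)
    return rng.choices(indices, weights=[dist[i] for i in indices], k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]
print(filter_distribution(toy_logits, temperature=0.6, top_k=3, top_p=0.9))
```

Lowering `temperature` or `top_p` concentrates the probability mass on fewer tokens, which is why low values make the sampled answers more alike.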


In [51]:
params = {'do_sample': True,
        'early_stopping': True,
        # 'max_length': 100,
        # 'min_length': 1,
        'logprobs': 1,
        'n': 3,
        #'best_of': 1,
        
        'num_beam_groups': 2,
        'num_beams': 5,
        'num_return_sequences': 5,
        'max_tokens': 50,
        'min_tokens': 0,
        'output_scores': True,
        'repetition_penalty': 1.0,
        'temperature': 0.6,
        'top_k': 50,
        'top_p': 1.0 
        }

## Define Helper Functions

In [52]:
def extractTriplet(answer):
    """ Extracts the triplet from the answer string. """
    
    # Extract keys and values using regex
    keys = re.findall(r'(\w+):\s*\"', answer)
    values = re.findall(r'\"(.*?)\"', answer)
    result = dict(zip(keys, values))

    if result == {}:    
        keys = re.findall(r'(\w+):\s*([^,]+)', answer)
        result = dict((k, v.strip('"')) for k, v in keys)
    
    return result
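
The two-step regex above pairs keys and values by position, so unquoted answers can leave stray characters such as `None}` in a value. A stricter variant is sketched below under the assumption that only the three roles from the prompt matter and that placeholder values should become `None`. The name `parse_triplet` and its return shape are illustrative and not part of the notebook's helpers.

```python
import re

ROLES = ("hero", "villain", "victim")

# A role name, a colon, then either a double-quoted value or a bare value
# that ends at the next comma, closing brace, or line break.
ROLE_PATTERN = re.compile(
    r'\b(hero|villain|victim)\b\s*:\s*(?:"([^"]*)"|([^,}\n]*))',
    re.IGNORECASE,
)


def parse_triplet(answer):
    """Return a dict with all three roles; missing or 'None' values become None."""
    triplet = dict.fromkeys(ROLES)
    seen = set()
    for match in ROLE_PATTERN.finditer(answer or ""):
        role = match.group(1).lower()
        # Only the first occurrence of each role counts, so explanatory text
        # that follows the braces cannot overwrite the answer.
        if role in seen:
            continue
        seen.add(role)
        quoted, bare = match.group(2), match.group(3)
        value = (quoted if quoted is not None else bare).strip(" '\"")
        if value and value.lower() != "none":
            triplet[role] = value
    return triplet


print(parse_triplet('\n{hero: None, villain: "Trump Spokesperson", victim: "Pure Breeds"}'))
```

Returning every role, with `None` for absent ones, keeps the result shape constant regardless of how the model formats its answer.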

In [53]:
def getAnswersTriplets(article, model, template, params):
    """ Generates answers for the given article using the model and template. """

    # Extract the article headline and text
    article_headline=article.get("title", "")
    article_text = article.get("parsing_result").get("text")

    # Generate the answer
    prompt = template.format(headline = article_headline, article_text = article_text)
    answers = model.generateAnswers(prompt, params)

    return answers

In [54]:
def splitText(text, n_tokens, tokenizer, overlap=10):
    """Splits the input text into chunks with n_tokens tokens using HuggingFace tokenizer, 
    with an overlap of overlap tokens from the previous and the next chunks."""
    
    tokens = tokenizer.tokenize(text)
    chunks = []
    i = 0

    # No previous chunk at the beginning, so no need for overlap
    chunks.append(tokenizer.convert_tokens_to_string(tokens[i:i+n_tokens]))
    i += n_tokens

    while i < len(tokens):
        # Now, we include overlap from the previous chunk
        start_index = i - overlap
        end_index = start_index + n_tokens
        chunk = tokens[start_index:end_index]
        chunks.append(tokenizer.convert_tokens_to_string(chunk))
        i += n_tokens - overlap  # Moving the index to account for the next overlap

    return chunks
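
The windowing logic in `splitText` can be looked at in isolation from the tokenizer. The sketch below works on any list of tokens and produces the same window boundaries, with one added assumption: an `overlap` that is not smaller than `n_tokens` is rejected, because the window would otherwise never advance. The name `window_tokens` is illustrative.

```python
def window_tokens(tokens, n_tokens, overlap=0):
    """Split a token list into windows of at most n_tokens that share `overlap` tokens."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    if not 0 <= overlap < n_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < n_tokens")

    step = n_tokens - overlap  # number of new tokens contributed by each window
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + n_tokens])
        if start + n_tokens >= len(tokens):
            break  # this window already reaches the end of the token list
        start += step
    return windows


# Whitespace "tokens" stand in for real tokenizer output here.
words = "Missouri Gov. Nixon to attend Detroit auto show".split()
for window in window_tokens(words, n_tokens=4, overlap=1):
    print(" ".join(window))
```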

In [55]:
def processBatch(articles, model, template, params, chunk_size=1024, overlap=256, show_progress=False, verbose=False):
    """Processes a batch of articles and extracts the triplets."""
    runtimes = []  # List to store the runtime for each article

    # Iterate over the articles
    for article in tqdm(articles, desc="Generating answers", disable=not show_progress):
        start_time = time.time()  # Start the timer

        # Extract the article headline and text
        article_headline = article.get("title", "")
        article_text = article.get("parsing_result").get("text")

        # Split the article text into chunks
        chunks = splitText(article_text, chunk_size,
                            model.tokenizer, overlap=overlap)

        # print("Chunks:", len(chunks))

        chunk_results = []
        for chunk_id, chunk in enumerate(chunks):

            if verbose:
                print("Chunk:", chunk_id)
                print("Chunk Length:", calcInputLength(model.tokenizer, chunk))

            prompt = template.format(headline=article_headline, article_text=chunk)
            answers = model.generateAnswers(prompt, params)

            # Extract the triplet from each answer
            for answer in answers:
                answer["triplet"] = extractTriplet(answer.get("text"))

            results = {
                "chunk_id": chunk_id,
                "chunk": chunk,
                "answers": answers
            }
            chunk_results.append(results)

        article["triplets"] = chunk_results

        end_time = time.time()  # End the timer
        runtime = end_time - start_time  # Calculate the runtime
        runtimes.append(runtime)  # Store the runtime

    return articles, runtimes

In [56]:
def updateArticle(db, id: str, values: dict = None, collection="articles"):
    """Updates the fields in `values` on the article with the given id."""
    query = {"_id": ObjectId(id)}
    update = {"$set": dict(values or {})}
    return db[collection].update_one(query, update)

In [57]:
def updateArticles(db, articles, collection = "articles"):
    """Updates the articles in the database."""

    for article in tqdm(articles, desc="Uploading results"):
        id = article.get("_id")
        values = {"triplets": article.get("triplets", [])}
        updateArticle(db, id, values, collection)

## Test Examples

In [58]:
article = articles[40]
print("Article Title:", article.get("title"))
print("Article Text:", article.get("parsing_result").get("text")[:200])

Article Title: Missouri Gov. Nixon to attend Detroit auto show
Article Text: Biden doesn’t want to be president, so let’s help him with that Nails on a chalkboard: Biden’s disastrous trip to Maui GOP presidential contest has come down to Trump vs. DeSantis DETROIT (AP) - Misso


In [59]:
splitText(article.get("title"), 5, model.tokenizer, overlap=1)

['Missouri Gov. N', 'Nixon to attend Detroit', 'Detroit auto show']

In [60]:
title = article.get("title")
text = article.get("parsing_result").get("text")
prompt = PROMPT_TEMPLATE.format(headline =title, article_text = text)
input_length = calcInputLength(model.tokenizer, prompt)

print("Prompt: >>>", prompt, "<<<")
print("Prompt Input Length:", input_length)

Prompt: >>> Given the news article below, identify entities categorized as a hero, villain, or victim. Each entity can only assume one role. If none apply, use 'None'. The solution must be provided in this format: {hero: "Name", villain: "Name", victim: "Name"}. 
 Headline: 'Missouri Gov. Nixon to attend Detroit auto show' 
 Text: 'Biden doesn’t want to be president, so let’s help him with that Nails on a chalkboard: Biden’s disastrous trip to Maui GOP presidential contest has come down to Trump vs. DeSantis DETROIT (AP) - Missouri Gov. Jay Nixon is going to this year’s North American International Auto Show in Detroit. Nixon said in a statement that he plans to promote Missouri’s automotive industry while at the auto show Tuesday and Wednesday. Nixon touted what he called a comeback of the industry in Missouri in an announcement of his trip. He says he’ll meet with executives from Ford Motor Company and General Motors. This will be Nixon’s sixth year at the show as governor. Copyright

In [61]:
answer = getAnswersTriplets(article, model, PROMPT_TEMPLATE, params)
print("Answer:", answer)

Answer: [<OpenAIObject at 0x7fccb2507290> JSON: {
  "index": 0,
  "text": "\n{hero: \"Jay Nixon\", villain: \"None\", victim: \"None\"}",
  "logprobs": null,
  "finish_reason": "stop"
}, <OpenAIObject at 0x7fccb2506160> JSON: {
  "index": 1,
  "text": "\n{hero: \"Jay Nixon\", villain: \"None\", victim: \"None\"}",
  "logprobs": null,
  "finish_reason": "stop"
}, <OpenAIObject at 0x7fccb2507f60> JSON: {
  "index": 2,
  "text": "\n{hero: \"Jay Nixon\", villain: \"None\", victim: \"None\"}",
  "logprobs": null,
  "finish_reason": "stop"
}]


In [62]:
articles, runtimes = processBatch(articles[:5], model, PROMPT_TEMPLATE, params, chunk_size = 1024, overlap= 64, show_progress=True)

Generating answers: 100%|██████████| 5/5 [00:57<00:00, 11.50s/it]


In [63]:
articles[2]

{'_id': ObjectId('64d8eb3a516b2658722949b1'),
 'title': 'Trump Calls Kim Jong Un A ‘Maniac,’ Then Showers Him With Nausea-Inducing Praise',
 'url': 'http://www.ifyouonlynews.com/politics/trump-calls-kim-jong-un-a-maniac-then-showers-him-with-nausea-inducing-praise/',
 'parsing_result': {'text': 'View more information »'},
 'triplets': [{'chunk_id': 0,
   'chunk': 'View more information »',
   'answers': [<OpenAIObject at 0x7fccb2507100> JSON: {
      "index": 0,
      "text": "\n{hero: None, villain: None, victim: None}",
      "logprobs": null,
      "finish_reason": "stop",
      "triplet": {
        "hero": "None",
        "villain": "None",
        "victim": "None}"
      }
    },
    <OpenAIObject at 0x7fccb25040e0> JSON: {
      "index": 1,
      "text": "\n{hero: \"None\", villain: \"Donald Trump\", victim: \"North Korea\"}\n\nThe article reports on Donald Trump's conflicting statements about Kim Jong Un. On one hand, Trump called Kim a \"",
      "logprobs": null,
      "finish

In [64]:
if runtimes:
    avg_runtime = sum(runtimes) / len(runtimes)
    print(f"Average runtime: {avg_runtime:.4f} seconds")
else:
    avg_runtime = 0

if len(runtimes) > 1:
    std_runtime = statistics.stdev(runtimes)
    print(f"Standard Deviation of runtime: {std_runtime:.4f} seconds")
else:
    std_runtime = 0

Average runtime: 11.4986 seconds
Standard Deviation of runtime: 1.4202 seconds


## Make Predictions

In [65]:
LIMIT = 10 # Number of articles to process in each batch
CHUNK_SIZE = 20_000 # Number of tokens in each chunk
OVERLAP = 64 # Number of overlapping tokens between chunks
COLLECTION = "articles.sampled.triplets"

In [68]:
batch_id = 0

while True:
    print(f"------ Batch {batch_id} ------")

    # Fetch the next batch of articles
    articles = fetchArticleTexts(db, LIMIT, 0, fields, query, COLLECTION)
    
    # Stop if no more articles are available
    if not articles:
        break
    
    # Process the batch of articles
    articles, runtimes = processBatch(articles, model, PROMPT_TEMPLATE, params, chunk_size=CHUNK_SIZE, overlap=OVERLAP, show_progress=True)

    # Update the articles in the database
    updateArticles(db, articles, COLLECTION)
    print(f"Updated {len(articles)} articles", end="\n\n")

    batch_id += 1
    break  # stops after the first batch; remove to process all remaining articles

------ Batch 0 ------


Generating answers: 100%|██████████| 10/10 [01:44<00:00, 10.45s/it]
Uploading results: 100%|██████████| 10/10 [00:00<00:00, 639.26it/s]

Updated 10 articles






In [69]:
articles[0]

{'_id': ObjectId('64d8eb3a516b2658722945bb'),
 'title': 'Trump Spokesperson Channels Inner Death Eater: ‘Are There Any Pure Breeds Left’',
 'url': 'http://www.ifyouonlynews.com/politics/trump-spokesperson-channels-inner-death-eater-are-there-any-pure-breeds-left/',
 'parsing_result': {'text': 'View more information »'},
 'triplets': [{'chunk_id': 0,
   'chunk': 'View more information »',
   'answers': [<OpenAIObject at 0x7fccb26a1030> JSON: {
      "index": 0,
      "text": "\n{hero: None, villain: \"Trump Spokesperson\", victim: \"Pure Breeds\"}\n\nExplanation:\n\n* The article does not mention any hero.\n* The Trump Spokesp",
      "logprobs": null,
      "finish_reason": "length",
      "triplet": {
        "villain": "Trump Spokesperson",
        "victim": "Pure Breeds"
      }
    },
    <OpenAIObject at 0x7fccb26a1490> JSON: {
      "index": 1,
      "text": "\n{hero: \"None\", villain: \"Katrina Pierson\", victim: \"None\"}",
      "logprobs": null,
      "finish_reason": "stop"

In [None]:
raise SystemExit("Stopped before updating the database!")

SystemExit: Stopped before updating the database!

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)


## Convert Results to Old Format

In [None]:
# Fetch articles from database
fields = {"triplets": 1}
query = {"triplets": {"$exists": True}}
articles = fetchArticleTexts(db,  limit=0, skip=0, fields=fields, query=query, collection="articles")


In [None]:
articles[0]

In [None]:
invalid_answers = [
    "None",
    "'None'",
    "'none'",
    "None}",
    "{None",
    "Not applicable",
    "Not available",
    "Not specified",
    "No data",
    "No value",
    "Invalid",
    "Unspecified",
    "Empty",
    "Missing",
    "Null",
    "Undefined",
    "N/A",
    "NA",
    "Not provided",
    "No information",
    "Not set",
    "No entry",
    "No response",
    "Not applicable",
    "Not determined",
    "No result",
    "No answer",
    "No record",
    "No match",
    "No selection",
    "Not found",
    "Not valid",
    "Not given",
    "Not filled",
    "Not assigned",
    "No choice",
    "Not used",
    "No sample",
    "Not measured",
    "No response",
    "Not reported",
    "Not registered",
    "Not logged",
    "No feedback",
    "No score",
    "No grade",
    "No rating",
    "No rating available",
    "No rating provided",
    "No rating assigned",
    "No rating given",
    "No rating received",
    "No rating found",
    "No rating available",
    "No rating recorded",
    "No rating obtained",
    "No rating submitted",
    "No rating included",
]

In [None]:
def isNone(input_string, alternative_names):
    """Checks whether the stripped input string exactly matches one of the alternative names."""
    return input_string.strip() in alternative_names
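
Because the comparison is exact and case-sensitive, the placeholder list has to enumerate variants such as `'none'` or `None}` explicitly. One possible alternative, sketched here as an assumption rather than as the notebook's method, normalises the value before comparing it. The names `is_missing` and `PLACEHOLDERS` and the shortened placeholder set are illustrative.

```python
# A shortened, lower-cased placeholder set for illustration.
PLACEHOLDERS = frozenset(
    p.lower() for p in ("None", "N/A", "NA", "Null", "Not applicable", "Not specified")
)


def is_missing(value, placeholders=PLACEHOLDERS):
    """True if value is empty or matches a placeholder after normalisation."""
    if value is None:
        return True
    # Drop surrounding whitespace, quotes and stray braces, then compare case-insensitively.
    cleaned = str(value).strip().strip("'\"{}").strip().lower()
    return cleaned == "" or cleaned in placeholders


for candidate in ["None}", " 'none' ", "{None", "Jay Nixon"]:
    print(repr(candidate), is_missing(candidate))
```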

In [None]:
# WARNING: removes the "triplets" field from every document in COLLECTION
result = db[COLLECTION].update_many({}, {"$unset": {"triplets": ""}})
#result = db.articles.sampled.triplets.update_many({}, {"$unset": {"processing_result": ""}})
#result = db.articles.sampled.triplets.update_many({}, {"$unset": {"denoising_result": ""}})
#result = db.articles.sampled.triplets.update_many({}, {"$unset": {"embedding_result": ""}})

In [None]:
# Iterate through all documents in the collection
for article in tqdm(articles, desc="Uploading results"):
    chunks = article.get("triplets", [])  # Get the "triplets" property

    #print(chunks)

    heros, villains, victims = [], [], []

    # Extract data from the "triplets" property
    for chunk in chunks:
        # Each chunk stores one parsed triplet per generated answer (see processBatch)
        for answer in chunk.get("answers", []):
            triplet = answer.get("triplet", {})
            hero = triplet.get("hero", "None")
            villain = triplet.get("villain", "None")
            victim = triplet.get("victim", "None")

            #print(hero, villain, victim)

            if not isNone(hero, invalid_answers):
                heros.append(hero)
            if not isNone(villain, invalid_answers):
                villains.append(villain)
            if not isNone(victim, invalid_answers):
                victims.append(victim)

    # Create the processing_result structure
    processing_result = {
        "hero": heros,
        "villain": villains,
        "victim": victims
    }
   
    #print(processing_result)
    
    # Update the document in the database   
    id = article.get("_id")
    values = {"processing_result": processing_result}
    updateArticle(db, id, values)