import os
import re
import pickle
import random
import time
from datetime import datetime
from concurrent.futures import ThreadPoolExecutor, as_completed

from openai import OpenAI

    
def chunk_prompt_maker(text_pair):
    full_text, chunk_text = text_pair
    prompt_message = [
        {
            "role": "user",
            "content": (
                """
You are an NLP model specialized in threat intelligence extraction. Your task is to extract a knowledge graph related to cybersecurity threats from a given threat intelligence report or blog article, including entities (nodes) and relationships (edges) among those entities, and output the results in a specified format.
[Entities (Nodes)]
What's an entity:

In cyber threat intelligence, "entity" refers to any unit of information that can be independently identified, described, and analyzed, and it forms a fundamental component of each link in a chain of threat activity. Entities are not limited to verified indicators of compromise (IOCs) used for detection, such as IP addresses, domain names, or file hashes; the concept is broader. Entities can be objects with clear names and characteristics (such as "get-logon-history.ps1") or data objects such as a "RAR file", even when they have no fixed naming convention.

The main characteristics of entities are:

    Independence: Each entity exists as an independent unit of information and can be extracted and analyzed separately;

    Relevance: There may be inherent connections between entities, and by associating this information, a complete attack chain or threat portrait can be constructed;

    Diversity: Entities can cover various forms of information such as files, scripts, configuration files, registry entries, network traffic data, log records, etc.;

    Contextual significance: Even if an entity (such as a downloaded "RAR file") does not have a specific name, as long as it has contextual significance and analytical value in threat intelligence analysis, it is still a valid entity.

Important Note

    Extract only entities that appear directly in the current chunk, i.e., entities explicitly mentioned in the current chunk's text.

    Exception: during relationship extraction, if an entity in the current chunk is found to be related to an entity that does not appear directly in the chunk (for example, through pronoun references, chapter hints, or contextual implications), add that outside-chunk entity to the entity list and extract the corresponding relationship.

Entity Extraction Steps
Stage 1 – Entity Extraction and Classification

Stage 1.1 Fully scan the #current chunk's text# to identify all entities related to cybersecurity threats, including:

    Explicitly named indicators (IPs, domains, filenames, hash values, etc.)

    Implicit threat components (attack stages, undocumented tools, generic file types)

    Contextually significant objects ("RAR file", "registry entry") even without specific names

Stage 1.2 – Type Assignment

    Assign one predefined category to each entity.

    Use threat-actor/intrusion-set formatted names for unnamed attackers/attacks.

    Use the "other" category only when no predefined type matches.

Stage 1.3 – Alias & Lineage Handling

    Record all aliases or alternative names for entities.

    Identify evolutionary relationships for the "mother entity" field (e.g., malware variants).

Stage 1.4 – Recheck (Self-Reflection on Completeness)
Before finalizing entity extraction, pause and critically reflect on whether any relevant entities may have been overlooked.
First, check if any explicitly named entities were unintentionally ignored.
Then, examine whether non-explicitly named entities have been missed—these may be referenced using generic terms like file, image, or script, but actually point to specific entities in context.

Predefined entity types for categorization:

    threat-actor
    Individuals, groups, or organizations involved in cyberattacks, such as hacker groups or APT organizations.

        If an attacker is mentioned but not named, name the entity Attacker(using: <malware or hacker-tool the attacker used>). Example: Attacker(using: EternalBlue)

    campaign
    A series of attack actions with specific objectives and a timeframe, such as nation-state hacker campaigns.

    intrusion-set
    A set of attack activities with common goals and tactics, techniques, and procedures (TTPs), usually executed by a specific threat actor.

        If an attack is mentioned but not named, name the entity Attacking(from: <attacker>); if no attacker is mentioned, use Attacking(using: <malware or hacker-tool used>) instead. Example: Attacking(from: APT1), Attacking(using: EternalBlue)

    malware
    Any software used to disrupt systems, steal data, or perform malicious activities, such as viruses or ransomware.

    hacker-tool
    Software specifically designed for cyberattacks and used by threat actors to execute malicious tasks. This includes hacking tools and penetration testing tools (e.g., Mimikatz, Metasploit) that were originally created with offensive cyber capabilities in mind.

    detailed-part-of-malware/tool
    Descriptions of the internal details of malware or tools, such as function names, component names, or module names.

    general_entity
    Entities related to cyberattacks that are not inherently malicious but are exploited during such activities. This includes components, software, or parts of tools (like functions or modules) that were originally designed for general use but are repurposed in cyberattack scenarios.

    indicator
    Characteristics used to detect or identify malicious activity, such as malicious URLs, IP addresses, and hash values.

    file
    A file object in a computer, used to describe malware or infected files.

    ipv4-addr
    An IPv4 address, which can be used to describe attack origins or C2 servers.

    ipv6-addr
    An IPv6 address, similar to ipv4-addr but used for modern IP protocols.

    domain-name
    A network domain name used to describe malicious websites or C2 servers.

    course-of-action
    Recommended defensive strategies or remediation measures, such as blocking IPs or updating antivirus software.

    url
    A webpage address, usually used to describe phishing sites or malicious links.

    attack-pattern
    Describes the strategies and techniques used by attackers or malware, such as phishing attacks or SQL injection.

    vulnerability
    A security flaw in a computer or software that can be exploited by attackers, such as CVE-2023-1234.

    observed-data
    Recorded events in a network or system, such as file executions or IP access logs.

    location
    Describes geographic locations, such as countries, cities, or organizational addresses, which may be linked to threat actors or attacks.

    identity
    Describes individuals, organizations, or companies, which can be used to identify victims or intelligence sources.

    infrastructure
    Hardware or software resources utilized to facilitate malicious activities, including but not limited to C2 servers, phishing sites, and other attack-enabling components. If the entity has a concrete identifier, such as a URL, IPv4 address, IPv6 address, or domain name, use the corresponding specific type instead of this generic category.

    other
    Any other entities related to cyber threats; you should list the entity under this type if none of the above fits.

Stage 2 – Relationship Extraction and Classification

Stage 2.1 Extract direct and indirect relationships (rel) among entities in the current chunk and classify them into predefined relationship types (rel_type).

    The value of “rel” should be taken from the text of the chunk (simplified if necessary).

    The value of “rel_type” should be chosen from the predefined relationship types provided in the Predefined 'rel_type' supplement. If there is no match, categorize as “other.”
    
    Pay attention! The value of 'rel' should be derived directly from the original text to maintain accuracy and traceability. Simplify it only when the original wording is too long or complex. Only when the relationship is not explicitly stated in the text may you use the 'rel_type' as the 'rel', based on context. In most cases, 'rel' should not be the same as 'rel_type'.

Stage 2.2 If an entity in the current chunk has a relationship with an entity that is not mentioned in the chunk but appears elsewhere in the full text (e.g., through pronoun references, chapter references, or contextual hints), add that outside-chunk entity to the entity list and extract the corresponding relationship. A common case is a chunk that lists multiple URLs, hashes, filenames, or domains without stating any relationship; consider whether the listed entities relate to the main threat entity of the article. Such a list usually means those entities are related to the main threat entity through an 'indicates' or 'characterizes' relationship.

Stage 2.3 – Recheck
Ensure that every entity extracted in Stage 1 has at least one corresponding relationship description. There must not be an entity listed without any relationship.
If necessary, infer that the entity might have some indirect relationship with another entity in the chunk. Otherwise, that entity might be related to a main entity in the broader text (e.g., an APT, Malware, Threat Actor, Vulnerability, etc.) that was introduced outside the current chunk. In that case, add the main entity (from outside the chunk) to the current chunk’s entity list and extract their relationship.

Stage 2.4 – Attack Tactic Classification
Based on the MITRE ATT&CK framework, determine which phase(s) of the attack each relationship corresponds to (e.g., Reconnaissance, Initial Access, etc.). One or multiple phases can be assigned; if it cannot be determined or does not apply, use ["other"].

Supplement: Predefined 'rel_type'

    indicates

        Used when an indicator (e.g., IP address) points to an attack pattern or malware, meaning the indicator can be used to detect the threat.

        Example: An IP address (indicator) indicates (indicates) a malware.

    characterizes

        Links observed data to a STIX object, meaning the observed data describes the object's behavior.

        Example: An observed-data characterizes (characterizes) an attack-pattern.

    dynamic-analysis-of

        Represents dynamic analysis of malware.

        Example: A malware-analysis tool dynamic-analysis-of some malware.

    analysis-of

        Represents an analysis based on another object (e.g., malware, tools).

        Example: A malware-analysis report analysis-of some malware, or a malware analysis-of the target's network.

    authored-by

        Indicates a STIX object was created by an identity.

        Example: malware authored-by hacker group (the malware was developed by the group).

    attributed-to

        Indicates an attack activity, malware, or infrastructure is attributed to a threat actor.

        Example: intrusion-set attributed-to threat-actor.

    controls

        Indicates an entity controls another entity, often used for infrastructure and tools.

        Example: threat-actor controls C2 server.

    exfiltrate-to

        Indicates data was stolen and transmitted to a target.

        Example: malware exfiltrate-to server.

    delivers

        Indicates an attack object (e.g., phishing email) delivered malware or tools.

        Example: attack-pattern delivers malware.

    consists-of

        Indicates an object consists of multiple subcomponents.

        Example: A malware consists-of multiple modules and functions.

    uses

        Indicates an entity uses another entity to conduct attacks.

        Example: threat-actor uses tool.

    owns

        Indicates an identity owns an infrastructure.

        Example: hacker group owns server.

    located-at

        Indicates the geographic location of an entity.

        Example: C2 server located-at a country.

    mitigates

        Indicates a defense measure mitigates an attack.

        Example: patch mitigates vulnerability.

    has

        Indicates an object has a specific component.

        Example: malware has backdoor feature.

    originates-from

        Indicates an attack originates from a location.

        Example: attack-pattern originates-from Russia.

    variant-of

        Indicates an entity is a variant of another entity.

        Example: ransomware variant-of malware family.

    static-analysis-of

        Indicates static analysis of malware.

        Example: malware-analysis static-analysis-of malware.

    investigates

        Indicates an object investigates another object.

        Example: report investigates threat-actor.

    targets

        Indicates the target of an attack.

        Example: malware targets bank.

    compromises

        Indicates an entity compromised a system.

        Example: APT attack compromises government network.

    exploits

        This relationship type is used when one entity exploits or attacks a vulnerability or defect of another entity, or when the target entity itself is a vulnerability-type entity.

        Example: threat-actor exploits vulnerability.

    other

        If a relationship exists but does not fit into the categories above.

    

Supplement: Predefined 'tactic'
    **Reconnaissance**  
    Before launching an attack, the attacker gathers information about the target—such as organizational structure, public-facing services, and employee details—to support planning.

    **Resource Development**  
    The attacker prepares tools and infrastructure, such as malicious domains, phishing sites, malware, or fake identities.

    **Initial Access**  
    The attacker gains entry into the target system through methods like phishing, exploiting vulnerabilities, or malicious attachments.

    **Execution**  
    Malicious code or scripts are executed on the compromised system to initiate further attack steps.

    **Persistence**  
    Techniques are used to maintain access over time, such as backdoors, configuration changes, or scheduled tasks.

    **Privilege Escalation**  
    The attacker gains higher-level access by exploiting flaws or misconfigurations to obtain admin-level control.

    **Defense Evasion**  
    Security mechanisms are bypassed using tactics like code obfuscation, log deletion, or abuse of legitimate tools.

    **Credential Access**  
    User credentials (e.g., passwords or tokens) are stolen or cracked to expand access within the environment.

    **Discovery**  
    The attacker maps the internal network, systems, and security controls to plan further movement.

    **Lateral Movement**  
    The attacker uses compromised accounts or tools to move across systems within the network.

    **Collection**  
    Sensitive data is gathered from the compromised environment for later use or exfiltration.

    **Command and Control**  
    The attacker communicates with infected systems through external servers to issue commands or extract data.

    **Exfiltration**  
    Stolen data is transmitted out of the target environment to an external system controlled by the attacker.

    **Impact**  
    The attacker causes disruption or damage, such as encrypting data, deleting files, or halting operations.


Stage 3 – Output Generation
[Normalization of Obfuscated URLs, IPs, and Emails]

Obfuscated URLs, IP addresses, and email addresses must be converted to their original format:

    Replace [.] with . in URLs and IPs (e.g., 192[.]168[.]1[.]1 → 192.168.1.1).

    Replace # with @ and [.] with . in emails (e.g., contact#example[.]com → contact@example.com).

    Only apply this to URLs, IPs, and emails—leave other obfuscations unchanged.

Both the entity list and the relationship list must strictly follow the formats below. Ensure the entity names are consistent in both the entity list and the relationships, and use correct JSON formatting:
Part 1: Entity List

    The entire JSON array must be strictly enclosed between #Entity_List_Start# and #Entity_List_End#.

    Each entity node must include the following attributes (all attribute values should be strings or string arrays):

        name: The specific name of the entity.

        type: The category of the entity (each entity node must have only one type value).

        alias: The alias name of the entity mentioned in the text.

            If multiple aliases exist, format as ["Alias1", "Alias2"].

            If there is only one, format as ["Actual_Value"].

            If none, use ["None"].

        mother entity: If the entity is a variant or evolution of another entity, give the parent entity's name as an array (e.g., ["ParentName"]); otherwise, use ["None"].

Part 2: Entity Relationships

    Extract relationship descriptions between entities from the text and output them as a JSON array of objects, with keys sub, rel, rel_type, tactic, and obj.

    The entire JSON array must be strictly enclosed between #Relationship_List_Start# and #Relationship_List_End#.

    Each relationship object must follow this format:

    {
        "sub": "<Source Entity>",
        "rel": "<Relationship Text>",
        "rel_type": ["<Relationship Type Category>"],
        "tactic": ["<Tactic Category>"],
        "obj": "<Target Entity>"
    }

        sub: Must exactly match the source entity name extracted in Part 1.

        rel: A verb or phrase summarizing the relationship as described in the text (if the original text is long, it can be simplified).

        rel_type: An array listing one or more of the predefined relationship types (e.g., "uses", "targets"). Even if there is only one, it should still be formatted as an array.

        tactic: A string array containing one or more mapped attack tactics. Use ["other"] if the tactic cannot be determined or does not apply.

        obj: Must exactly match the target entity name extracted in Part 1.

Below is an example:

#Entity_List_Start#
[
  { "name": "exampleAPT", "type": "threat-actor", "alias": ["exampleAPTnickname"], "mother entity": ["None"] },
  { "name": "exampleTool", "type": "hacker-tool", "alias": ["None"], "mother entity": ["None"] },
  { "name": "exampleCVE", "type": "vulnerability", "alias": ["None"], "mother entity": ["None"] }
]

#Entity_List_End#

#Relationship_List_Start#

[
  {
    "sub": "exampleAPT",
    "rel": "utilized",
    "rel_type": ["uses"],
    "tactic": ["Lateral Movement"],
    "obj": "exampleTool"
  },
  {
    "sub": "exampleTool",
    "rel": "is using",
    "rel_type": ["exploits"],
    "tactic": ["Execution", "Privilege Escalation"],
    "obj": "exampleCVE"
  },
  {
    "sub": "exampleAPT",
    "rel": "leverages vulnerability",
    "rel_type": ["exploits"],
    "tactic": ["other"],
    "obj": "exampleCVE"
  }
]

#Relationship_List_End#
Now, here is the full text of the article and the current chunk of text. Please extract the entities and relationships according to the above rules:
"""+str(full_text)+"""
The current chunk of text is:
"""+str(chunk_text)+"""
"""
            )
        }
    ]
    return prompt_message
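
# --- Sketch: parsing the marked output of chunk_prompt_maker (hypothetical helper) ---
# The extraction prompt asks the model to place each JSON array between
# #Entity_List_Start#/#Entity_List_End# and #Relationship_List_Start#/#Relationship_List_End#.
# Models may also wrap the array in a triple-backtick json fence. `extract_marked_json` is an
# illustrative helper, not part of the original pipeline. It assumes the model followed the
# format and returns None when a marker pair is missing or the enclosed text is not valid JSON.
import json
import re

_FENCE = "`" * 3  # built at runtime to avoid a literal triple backtick in this cell


def extract_marked_json(text, start_marker, end_marker):
    """Return the JSON value between two markers, or None if absent or malformed."""
    pattern = re.escape(start_marker) + r"(.*?)" + re.escape(end_marker)
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        return None
    body = match.group(1).strip()
    # Strip an optional opening fence (with or without "json") and an optional closing fence.
    body = re.sub(r"^" + _FENCE + r"(?:json)?\s*|\s*" + _FENCE + r"$", "", body)
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


# Example:
# reply = '#Entity_List_Start#\n[{"name": "APT1"}]\n#Entity_List_End#'
# extract_marked_json(reply, "#Entity_List_Start#", "#Entity_List_End#")  # -> [{'name': 'APT1'}]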

def merger_prompt_maker(merged_text):
    prompt_message = [
        {
            "role": "user",
            "content": (
                """
You are working on merging results from a distributed knowledge graph construction task for cybersecurity threat intelligence. Before your work, a single article was split into multiple chunks, and each chunk has gone through entity and relationship extraction. Now, you need to merge the processing results of these chunks to form a complete knowledge graph.

Your core task: Please merge the results of multiple chunks according to the following strict rules:

### Merging Rules
1. Entity Merging:
   - Consider two entities the same if their names are identical or semantically equivalent (including case differences).
   - Merging Strategy:
     * Keep the simplest naming format (e.g., "APT28" and "APT28 (Fancy Bear)" merge into the former, while "APT28" adds an attribute `"alias": ["Fancy Bear"]`), and add a relationship:  
       `"sub": "APT28", "rel": "variant of", "rel_type": ["variant-of"], "tactic": ["other"], "obj": "Fancy Bear"`.
     * Merge all alias lists (removing duplicates).
     * Merge all mother entity relationships (keep the most complete evolution chain).

2. Relationship Merging:
   - Only merge relationships if all the following conditions are met:
     a) The subject (sub) is the same after entity merging.
     b) The object (obj) is the same after entity merging.
     c) The relationship description (rel) is exactly the same text.
   - Differences in 'rel_type' and 'tactic' do not prevent merging.
   - Merging Strategy:
     * When merged relationships differ in rel_type or tactic, keep the union of their values.
     * Maintain the complete original relationship description.

3. Special Handling:
   - Only perform merging; all unique entities and relationships must be retained.
   - Cross-chunk implied relationships (inferred through mother entities) must be explicitly created.
   - Ensure the final result contains all original information, only eliminating redundant expressions.

### Output Requirements
Maintain the same JSON structure as the original, but:
1. Streamline the entity list according to the merging rules.
2. Streamline the relationship list according to the merging rules.
3. Retain all aliases and mother entity relationships.
4. Keep all distinct relationship descriptions.
5. The tactic field is similar to the rel_type field; preserve all unique values.

Please strictly output the final merged results in the following format:

### [Final Entity List]
#Final_Entity_List_Start#
[
  {
    "name": "<standardized name>",
    "type": "<best classification>",
    "alias": ["<original name 1>", "<alias 2>", ...],
    "mother entity": ["<complete mother entity chain>"]
  },
  ...
]
#Final_Entity_List_End#

### [Final Relationship List]
#Final_Relationship_List_Start#
[
  {
    "sub": "<merged subject>",
    "rel": "<original relationship description>",
    "rel_type": ["<deduplicated type list>"],
    "tactic": ["<deduplicated tactic list>"],
    "obj": "<merged object>"
  },
  ...
]
#Final_Relationship_List_End#

Below are the entities and relationships I extracted from multiple chunks:

"""+str(merged_text)
                
            )
        }
    ]
    return prompt_message
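
# --- Sketch: deterministic relationship merging (hypothetical helper) ---
# Rule 2 of the merger prompt merges relationships whose sub, obj and rel text are identical
# and keeps the union of their rel_type and tactic lists. `merge_relationships` applies that
# rule in plain Python as an illustrative pre-merge step; the original pipeline does not call
# it. It assumes sub/obj names are already canonical and does not attempt Rule 1 (semantic
# entity merging), which still needs the LLM.


def merge_relationships(relationships):
    """Merge relationships sharing (sub, rel, obj); union rel_type and tactic in first-seen order."""
    merged = {}
    for rel in relationships:
        key = (rel["sub"], rel["rel"], rel["obj"])
        if key not in merged:
            merged[key] = {
                "sub": rel["sub"],
                "rel": rel["rel"],
                "rel_type": [],
                "tactic": [],
                "obj": rel["obj"],
            }
        target = merged[key]
        for field in ("rel_type", "tactic"):
            for value in rel.get(field, []):
                if value not in target[field]:
                    target[field].append(value)
    # dicts preserve insertion order, so the output follows first appearance.
    return list(merged.values())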

model_dict = {
    "gpt4t": 'gpt-4-turbo',
    "gpt41": 'gpt-4.1',
    "gpt41mini": 'gpt-4.1-mini',
    "gpt41nano": 'gpt-4.1-nano',
    "gpt4o": 'gpt-4o-2024-11-20',
    "gpt4omini": 'gpt-4o-mini-2024-07-18',
    
    "gpto4mini": "o4-mini",
    "gpto3": "o3",
    "gpto3mini": 'o3-mini-2025-01-31',
    "gpto1": 'o1-2024-12-17',
    "gpto1mini": 'o1-mini-2024-09-12',
    
    "hsr1": 'deepseek-r1-250120',
}

reason_model=['o1','o3','o4']
gpt_reason_model_for_json=['o1','o3','o4']
reason_model_higheffort=['o3-mini','o4-mini']
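
# --- Sketch: how `ask` routes a resolved model id (hypothetical helper) ---
# For OpenAI-compatible models in text mode, `ask` treats any resolved model id containing
# "o1", "o3" or "o4" as a reasoning model and sends it through the Responses API. It requests
# high reasoning effort when the id also contains "o3-mini" or "o4-mini". Everything else goes
# through Chat Completions. `route_for` restates that rule as a pure function so it can be
# checked offline. Its default arguments mirror reason_model and reason_model_higheffort above.
# Because the check is a plain substring match, any local model id that happens to contain
# "o3" would also be routed as a reasoning model.


def route_for(setmodel, reasoning_markers=("o1", "o3", "o4"), high_effort_markers=("o3-mini", "o4-mini")):
    """Return (api, effort) for a resolved model id; effort None means no effort is requested."""
    if any(marker in setmodel for marker in reasoning_markers):
        effort = "high" if any(m in setmodel for m in high_effort_markers) else None
        return "responses", effort
    return "chat.completions", None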

def tokenlen(text):
    import tiktoken
    encoding = tiktoken.get_encoding("cl100k_base")
    token_integers = encoding.encode(str(text))
    return len(token_integers)

def _run_parallel(prompt_list, model, token, temp, streamprint, max_workers):
    """Send each prompt to `ask` concurrently and return the answers in input order."""
    results = [None] * len(prompt_list)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its prompt index so completion order does not matter.
        futures = {
            executor.submit(ask, prompt, token, temp, model, streamprint): idx
            for idx, prompt in enumerate(prompt_list)
        }
        for future in as_completed(futures):
            idx = futures[future]
            try:
                results[idx] = future.result()
            except Exception as e:
                # A failed request leaves None in its slot instead of aborting the batch.
                print(f"An error occurred: {e}")
    return results


def ask_group_link(prompt_list, model, token=4096, temp=0.6, streamprint=False, max_workers=8, forcegpt=False):
    """Query `model` once per prompt in parallel; returns answers in input order (None on failure)."""
    if 'gpt' in model or 'hs' in model:
        # Hosted APIs: refuse batches of 10,000+ tokens unless forcegpt is set.
        total_length = tokenlen(str(prompt_list))
        if total_length >= 10000 and not forcegpt:
            print("The total length of the prompt is too long, total length is:", total_length,
                  "; set forcegpt=True to send it anyway")
            return None
        # Hosted APIs use a fixed pool of 128 workers; max_workers is ignored here.
        return _run_parallel(prompt_list, model, token, temp, streamprint, max_workers=128)
    # Local/self-hosted servers honor the caller's max_workers.
    return _run_parallel(prompt_list, model, token, temp, streamprint, max_workers)

def ask(prompt, token, temp, model, streamprint=True, mode="text"):
    """Send one chat prompt to `model`, archive the exchange, and return the reply text.

    `model` is a short key from `model_dict`, "local" (whatever model the local
    OpenAI-compatible server is serving), or an id containing "/" for that server.
    """
    import uuid

    # Pick credentials and endpoint from the model key.
    api_key = "EMPTY"
    api_base = None
    if model.startswith("gpt"):
        api_key = "<GPT_API_KEY>"
        api_base = 'https://api.openai.com/v1'
    if model.startswith("hs"):
        api_key = "<HS_API_KEY>"
    if '/' in model or 'local' in model:
        api_base = "http://localhost:8000/v1"

    if model == "local":
        # Use the first model reported by the local server's /v1/models endpoint.
        client = OpenAI(api_key=api_key, base_url=api_base)
        served = client.models.list().data
        setmodel = served[0].id if served else None
    else:
        setmodel = model_dict.get(model, '<MODEL_DICT_MISSING_KEY>')
    if streamprint:
        print('Using model:', setmodel)

    final_response = None
    if "/" in model or "local" in model or "gpt" in model:
        client = OpenAI(api_key=api_key, base_url=api_base)
        is_reasoning = any(reason in setmodel for reason in reason_model)
        if mode == "image":
            response = client.chat.completions.create(
                messages=prompt,
                model=setmodel,
                max_tokens=token,
                temperature=temp,
            )
            final_response = response.choices[0].message.content
            print(f"Response in image mode: {final_response}")
        elif mode == "text" and not is_reasoning:
            if streamprint:
                stream = client.chat.completions.create(
                    model=setmodel,
                    messages=prompt,
                    stream=True,
                    max_tokens=token,
                    temperature=temp,
                )
                final_response = ""
                for chunk in stream:
                    # Skip chunks without text (e.g., role-only or usage chunks).
                    if chunk.choices and chunk.choices[0].delta.content:
                        print(chunk.choices[0].delta.content, end="")
                        final_response += chunk.choices[0].delta.content
            else:
                response = client.chat.completions.create(
                    model=setmodel,
                    messages=prompt,
                    temperature=temp,
                    max_tokens=token,
                    stream=False,
                )
                final_response = response.choices[0].message.content
        elif mode == "text":
            # Reasoning models (o1/o3/o4 families) go through the Responses API; the
            # temperature argument is not passed for them. o3-mini/o4-mini get high effort.
            use_high_effort = any(a_model in setmodel for a_model in reason_model_higheffort)
            reasoning_param = {"effort": "high"} if use_high_effort else {}
            response = client.responses.create(
                model=setmodel,
                input=prompt,
                text={"format": {"type": "text"}},
                reasoning=reasoning_param,
                tools=[],
                max_output_tokens=token,
                top_p=1,
                store=False,
            )
            # output_text joins the text parts and skips the reasoning item.
            final_response = response.output_text
    if "hsr1" in model:
        # DeepSeek-R1 served through Volcengine Ark (requires the Ark SDK).
        from volcenginesdkarkruntime import Ark
        client = Ark(
            api_key=api_key,
            timeout=1800,
        )
        response = client.chat.completions.create(
            model=setmodel,
            messages=prompt,
            temperature=temp,
            max_tokens=token,
        )
        # Re-attach the reasoning trace in <think> tags so callers can strip it.
        reasoning_content = getattr(response.choices[0].message, 'reasoning_content', "")
        formatted_reasoning = f"<think>{reasoning_content}</think>" if reasoning_content else ""
        final_response = f"{formatted_reasoning}{response.choices[0].message.content}"

    # Archive the exchange under <HISTORY_DIR>/<model>/<yymmdd>/.
    if '/' in setmodel:
        setmodel = setmodel.split('/')[-1]
    record = {
        "prompt": str(prompt),
        "setmodel": setmodel,
        "temp": temp,
        "max_tokens": token,
        "response": final_response
    }
    date_str = datetime.now().strftime("%y%m%d")
    prompt_str = re.sub(r'\W+', '', str(prompt))[:30]
    folder_path = f"<HISTORY_DIR>/{setmodel}/{date_str}"
    os.makedirs(folder_path, exist_ok=True)
    # A random UUID makes name clashes practically impossible; mode 'xb' refuses
    # to overwrite an existing file instead of silently replacing it.
    file_path = os.path.join(folder_path, f"{prompt_str}RANDOMKEY{uuid.uuid4().hex}.pkl")
    with open(file_path, 'xb') as f:
        pickle.dump(record, f)
    return final_response
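
# --- Sketch: reading back the archived history records (hypothetical helper) ---
# Every call to `ask` pickles a dict with the keys prompt, setmodel, temp, max_tokens and
# response into <HISTORY_DIR>/<setmodel>/<yymmdd>/. `load_history` is an illustrative reader
# for one such folder. Only unpickle files you created yourself, because loading untrusted
# pickle data can execute arbitrary code.
import os
import pickle


def load_history(folder_path):
    """Return the records stored as .pkl files in folder_path, sorted by file name."""
    records = []
    for name in sorted(os.listdir(folder_path)):
        if name.endswith(".pkl"):
            with open(os.path.join(folder_path, name), "rb") as f:
                records.append(pickle.load(f))
    return records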

def process_article(full_text, mode, server):
    """Build a threat-intelligence knowledge graph for one article.

    The article is split into ~400-token chunks (40-token overlap); each chunk is
    sent together with the full text to the extraction prompt, and the per-chunk
    answers are merged by one final LLM call. Returns the merged answer text, or
    None if any step fails. `server` is currently unused.
    """
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # Mode 'o' runs up to 16 chunk requests in parallel; any other mode runs them one at a time.
    max_workers_forchunk = 16 if mode == 'o' else 1

    try:
        text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
            model_name="gpt-4",
            chunk_size=400,
            chunk_overlap=40
        )
        chunks = text_splitter.split_text(full_text)
    except Exception as e:
        print("process_article: Error in text splitting:", str(e))
        return None

    try:
        all_prompts = []
        for chunk in chunks:
            print("Chunk length:", len(chunk), tokenlen(chunk))
            all_prompts.append(chunk_prompt_maker((full_text, chunk)))
    except Exception as e:
        print("process_article: Error generating prompt:", str(e))
        return None

    try:
        ans = ask_group_link(
            all_prompts,
            token=32 * 1024,
            temp=0.7,
            model="local",
            streamprint=False,
            max_workers=max_workers_forchunk,
            forcegpt=True
        )
    except Exception as e:
        print("process_article: Error in getting chunk responses:", str(e))
        return None
    if not ans:
        print("process_article: Error in getting chunk responses: returned empty list")
        return None

    # Wrap each chunk's answer in numbered markers, dropping any reasoning trace before </think>.
    processed_answers = []
    for idx, response in enumerate(ans):
        if response is None:
            print(f"process_article: Error processing chunk {idx} response: no response")
            return None
        chunk_content = response.split("</think>")[-1]
        processed_answers.append(f"\n\n[Chunk{idx}_START]\n{chunk_content.strip()}\n[Chunk{idx}_END]\n")
    merged_chunks_text = "".join([
        f"/* TOTAL {len(processed_answers)} CHUNKS START: */\n",
        *processed_answers,
        "\n/* END OF MERGED CHUNKS */"
    ])

    try:
        final_ans = ask_group_link(
            [merger_prompt_maker(merged_chunks_text)],
            token=32 * 1024,
            temp=0.7,
            model="local",
            streamprint=False,
            max_workers=1,
            forcegpt=True
        )
    except Exception as e:
        print("process_article: Error in getting final response:", str(e))
        return None
    if not final_ans or final_ans[0] is None:
        print("process_article: Error in getting final response: no response")
        return None
    # Keep only the text after the reasoning trace, if any.
    return final_ans[0].split("</think>")[-1]
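
# --- Sketch: deterministic refanging of obfuscated indicators (hypothetical helper) ---
# Stage 3 of the extraction prompt asks the model to replace "[.]" with "." in URLs and IPs,
# and to replace "#" with "@" in e-mail addresses. The same rules could be re-applied to entity
# names in the parsed output as a cheap safety net. `refang` is illustrative and is not called
# by process_article. It handles only those two obfuscations and leaves others (e.g. "hxxp" or
# "[at]") unchanged, as the prompt instructs. The e-mail test is a heuristic and would also
# rewrite a name such as "C#notes.txt".
import re

_HASH_EMAIL_RE = re.compile(r"^[\w.+-]+#[\w-]+(?:\.[\w-]+)+$")


def refang(value):
    """Undo the '[.]' and e-mail '#' obfuscations described in the extraction prompt."""
    value = value.replace("[.]", ".")
    # Treat '#' as '@' only when the whole string looks like local-part#domain.tld.
    if _HASH_EMAIL_RE.match(value):
        value = value.replace("#", "@", 1)
    return value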
