In [2]:
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")

text = "While Musk pushes for rapid expansion, analysts worry about margin compression and demand."

# Define CUSTOM labels on the fly
labels = ["Person", "Business Metric", "Strategic Move"]

entities = model.predict_entities(text, labels)

for entity in entities:
    print(f"{entity['text']} => {entity['label']}")

# Output (matches the actual run):
# Musk => Person
# analysts => Person
# margin compression => Business Metric
# demand => Business Metric

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Musk => Person
analysts => Person
margin compression => Business Metric
demand => Business Metric


In [None]:
new_text = """Air Force One returned to Joint Base Andrews, an air base in Maryland, out of an abundance of caution, Press Secretary Karoline Leavitt said. It landed shortly after 11 pm (local time), after about an hour and 20 minutes in the air.
Journalists travelling with Trump reported that the lights in the cabin went out briefly after takeoff, as per the news agency AFP.
Trump and his entourage resumed their trip to the World Economic Forum after switching to another plane. Trump took off two-and-a-half hours after his initial departure. He is scheduled to arrive on Wednesday and leave on Thursday."""
labels = ['person', 'place', 'object']  # removed the stray empty-string label
entities = model.predict_entities(new_text, labels)

for entity in entities:
    print(f"{entity['text']} => {entity['label']}")

Air Force One => object
Joint Base Andrews => place
Maryland => place
Press Secretary Karoline Leavitt => person
Trump => person
Trump => person
World Economic Forum => place
Trump => person


This is interesting: changing the label set changes how the same spans get classified.
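
One way to make this observation concrete is to diff two `predict_entities` runs by span text. The sketch below assumes each run is a list of dicts with `text` and `label` keys, which is the shape GLiNER returns in this notebook. `label_changes` is a hypothetical helper, and the two hard-coded runs are illustrative values modelled on outputs in this notebook, not a fresh model run.

```python
def label_changes(run_a, run_b):
    """Return {span_text: (label_a, label_b)} for spans labelled differently in two runs."""
    labels_a = {e["text"]: e["label"] for e in run_a}
    labels_b = {e["text"]: e["label"] for e in run_b}
    # Only spans found in both runs can be compared.
    shared = labels_a.keys() & labels_b.keys()
    return {
        span: (labels_a[span], labels_b[span])
        for span in shared
        if labels_a[span] != labels_b[span]
    }


# Illustrative data (assumption): same spans, two different label sets.
run_a = [
    {"text": "Trump", "label": "person"},
    {"text": "World Economic Forum", "label": "place"},
]
run_b = [
    {"text": "Trump", "label": "person"},
    {"text": "World Economic Forum", "label": "meeting"},
]

changes = label_changes(run_a, run_b)
print(changes)
```

If a span appears more than once in a run, the dict keeps only its last label, which is good enough for a quick comparison.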

In [15]:
text = """ U.S. President Donald Trump barrels into Davos, Switzerland, on Wednesday, where he is likely to escalate his push for acquiring Greenland despite European protests in the biggest fraying of transatlantic ties in decades.
Trump, who marked the end of his turbulent first year in office on Tuesday, is expected to overshadow the annual World Economic Forum (WEF) gathering where global elites discuss economic and political trends in the Swiss mountain resort.
"""
labels = ['person','place','object','time','political event','meeting']
#labels = ["Person", "Business Metric", "Strategic Move"]
entities = model.predict_entities(text, labels)

for entity in entities:
    print(f"{entity['text']} => {entity['label']}")

Donald Trump => person
Davos => place
Switzerland => place
Wednesday => time
Greenland => place
European protests => political event
Tuesday => time
World Economic Forum => meeting


In [18]:
text = """ U.S. President Donald Trump barrels into Davos, Switzerland, on Wednesday, where he is likely to escalate his push for acquiring Greenland despite European protests in the biggest fraying of transatlantic ties in decades.
Trump, who marked the end of his turbulent first year in office on Tuesday, is expected to overshadow the annual World Economic Forum (WEF) gathering where global elites discuss economic and political trends in the Swiss mountain resort.
"""
labels = ['Donald Trump','Davos','World Economic Forum','European Protests','Switzerland']
entities = model.predict_entities(text, labels)

for entity in entities:
    print(f"{entity['text']} => {entity['label']}")

Donald Trump => Donald Trump
Davos => Davos
Switzerland => Switzerland
European protests => European Protests
World Economic Forum => World Economic Forum
WEF => World Economic Forum


# Coreference resolution

In [1]:
%pip install fastcoref

Collecting fastcoref
  Downloading fastcoref-2.1.6.tar.gz (27 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting scipy>=1.7.3 (from fastcoref)
  Downloading scipy-1.17.0-cp313-cp313-macosx_14_0_arm64.whl.metadata (62 kB)
Collecting spacy>=3.0.6 (from fastcoref)
  Downloading spacy-3.8.11-cp313-cp313-macosx_11_0_arm64.whl.metadata (27 kB)
Collecting datasets>=2.5.2 (from fastcoref)
  Downloading datasets-4.5.0-py3-none-any.whl.metadata (19 kB)
Collecting dill<0.4.1,>=0.3.0 (from datasets>=2.5.2->fastcoref)
  Downloading dill-0.4.0-py3-none-any.whl.metadata (10 kB)
Collecting multiprocess<0.70.19 (from datasets>=2.5.2->fastcoref)
  Downloading multiprocess-0.70.18-py313-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2025.10.0,>=2023.1.0 (from fsspec[http]<=2025.10.0,>=2023.1.0->datasets>=2.5.2->fastcoref)
  Downloading fsspec-2025.10.0-py3-none-any.whl.metad

In [3]:
from fastcoref import FCoref

# 1. Load the model (optimized for speed)
model = FCoref(device='cpu') # Use 'cuda:0' if you have a GPU

text = "U.S. President Donald Trump barrels into Davos, Switzerland, on Wednesday, where he is likely to escalate his push for acquiring Greenland despite European protests in the biggest fraying of transatlantic ties in decades."

# 2. Predict Clusters
preds = model.predict(texts=[text])
clusters = preds[0].get_clusters(as_strings=True)

# Output: [['U.S. President Donald Trump', 'he', 'his']]
print(f"Detected Entity Chain: {clusters[0]}")

01/23/2026 14:19:27 - INFO - 	 missing_keys: []
01/23/2026 14:19:27 - INFO - 	 unexpected_keys: []
01/23/2026 14:19:27 - INFO - 	 mismatched_keys: []
01/23/2026 14:19:27 - INFO - 	 error_msgs: []
01/23/2026 14:19:27 - INFO - 	 Model Parameters: 90.5M, Transformer: 82.1M, Coref head: 8.4M
01/23/2026 14:19:27 - INFO - 	 Tokenize 1 inputs...
Map: 100%|██████████| 1/1 [00:00<00:00, 154.99 examples/s]
01/23/2026 14:19:27 - INFO - 	 ***** Running Inference on 1 texts *****
Inference: 100%|██████████| 1/1 [00:00<00:00, 28.37it/s]

Detected Entity Chain: ['U.S. President Donald Trump', 'he', 'his']





In [9]:
from fastcoref import LingMessCoref

# Load the SOTA model (LingMess) instead of the distilled one
model = LingMessCoref(device='cpu')
text = """U.S. President Donald Trump barrels into Davos, Switzerland, on Wednesday, where he is likely to escalate his push for acquiring Greenland despite European protests in the biggest fraying of transatlantic ties in decades.
Trump, who marked the end of his turbulent first year in office on Tuesday, is expected to overshadow the annual World Economic Forum (WEF) gathering where global elites discuss economic and political trends in the Swiss mountain resort.
"""
# 2. Predict Clusters
preds = model.predict(texts=[text])
clusters = preds[0].get_clusters(as_strings=True)
print(f"Detected Entity Chain: {clusters}")

ValueError: LongformerModel does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please request the support for this architecture: https://github.com/huggingface/transformers/issues/28005. If you believe this error is a bug, please open an issue in Transformers GitHub repository and load your model with the argument `attn_implementation="eager"` meanwhile. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="eager")`

In [13]:
from fastcoref import FCoref
from gliner import GLiNER



text = "U.S. President Donald Trump barrels into Davos, Switzerland, on Wednesday, where he is likely to escalate his push for acquiring Greenland despite European protests in the biggest fraying of transatlantic ties in decades."

# Named-entity recognition with GLiNER
ner_model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")
labels = ['person', 'place', 'object']
entities = ner_model.predict_entities(text, labels)

# Coreference resolution with the distilled FCoref model.
# Separate variable names keep the NER model from being overwritten.
coref_model = FCoref(device='cpu')  # Use 'cuda:0' if you have a GPU
preds = coref_model.predict(texts=[text])
clusters = preds[0].get_clusters(as_strings=True)

print(entities)
print(clusters)

Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00, 54899.27it/s]
01/23/2026 14:40:54 - INFO - 	 Loading the following GLiNER type: <class 'gliner.model.UniEncoderSpanGLiNER'>...
01/23/2026 14:41:01 - INFO - 	 missing_keys: []
01/23/2026 14:41:01 - INFO - 	 unexpected_keys: []
01/23/2026 14:41:01 - INFO - 	 mismatched_keys: []
01/23/2026 14:41:01 - INFO - 	 error_msgs: []
01/23/2026 14:41:01 - INFO - 	 Model Parameters: 90.5M, Transformer: 82.1M, Coref head: 8.4M
01/23/2026 14:41:01 - INFO - 	 Tokenize 1 inputs...
Map: 100%|██████████| 1/1 [00:00<00:00, 166.61 examples/s]
01/23/2026 14:41:02 - INFO - 	 ***** Running Inference on 1 texts *****
Inference: 100%|██████████| 1/1 [00:00<00:00, 29.69it/s]

[{'start': 15, 'end': 27, 'text': 'Donald Trump', 'label': 'person', 'score': 0.9866982102394104}, {'start': 41, 'end': 46, 'text': 'Davos', 'label': 'place', 'score': 0.9498966932296753}, {'start': 48, 'end': 59, 'text': 'Switzerland', 'label': 'place', 'score': 0.8984706401824951}, {'start': 129, 'end': 138, 'text': 'Greenland', 'label': 'place', 'score': 0.5037164092063904}]
[['U.S. President Donald Trump', 'he', 'his']]
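
The two outputs above can be joined on character offsets, so that pronouns such as "he" and "his" are attributed to the NER entity they refer to. This is a minimal sketch, assuming `get_clusters(as_strings=False)` returns `(start, end)` character spans as the fastcoref README describes. `link_clusters_to_entities` is a hypothetical helper, the cluster spans are hard-coded for illustration, and the sentence is cut off after "Greenland" to keep the example short (the offsets are unchanged).

```python
def spans_overlap(a_start, a_end, b_start, b_end):
    """True if two half-open character spans share at least one character."""
    return a_start < b_end and b_start < a_end


def link_clusters_to_entities(entities, cluster_spans, text):
    """Attach each coref cluster to the first NER entity that overlaps one of its mentions."""
    linked = []
    for cluster in cluster_spans:
        match = next(
            (
                entity
                for entity in entities
                for (start, end) in cluster
                if spans_overlap(entity["start"], entity["end"], start, end)
            ),
            None,
        )
        if match is not None:
            linked.append({
                "entity": match["text"],
                "label": match["label"],
                "mentions": [text[start:end] for (start, end) in cluster],
            })
    return linked


text = (
    "U.S. President Donald Trump barrels into Davos, Switzerland, on Wednesday, "
    "where he is likely to escalate his push for acquiring Greenland."
)

# Entity offsets copied from the GLiNER output above (scores omitted).
entities = [
    {"start": 15, "end": 27, "text": "Donald Trump", "label": "person"},
    {"start": 41, "end": 46, "text": "Davos", "label": "place"},
    {"start": 48, "end": 59, "text": "Switzerland", "label": "place"},
    {"start": 129, "end": 138, "text": "Greenland", "label": "place"},
]

# Assumed shape of preds[0].get_clusters(as_strings=False): one list of spans per cluster.
cluster_spans = [[(0, 27), (81, 83), (106, 109)]]

linked = link_clusters_to_entities(entities, cluster_spans, text)
print(linked)
```

Entities that never appear in a cluster (here the three places) are simply left out of `linked`; they can still be scored on their own mention.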





In [1]:
%pip install setfit

Collecting setfit
  Downloading setfit-1.1.3-py3-none-any.whl.metadata (12 kB)
Collecting sentence-transformers>=3 (from sentence-transformers[train]>=3->setfit)
  Using cached sentence_transformers-5.2.0-py3-none-any.whl.metadata (16 kB)
Collecting evaluate>=0.3.0 (from setfit)
  Downloading evaluate-0.4.6-py3-none-any.whl.metadata (9.5 kB)
Collecting scikit-learn (from setfit)
  Downloading scikit_learn-1.8.0-cp313-cp313-macosx_12_0_arm64.whl.metadata (11 kB)
Collecting accelerate>=0.20.3 (from sentence-transformers[train]>=3->setfit)
  Downloading accelerate-1.12.0-py3-none-any.whl.metadata (19 kB)
Collecting threadpoolctl>=3.2.0 (from scikit-learn->setfit)
  Using cached threadpoolctl-3.6.0-py3-none-any.whl.metadata (13 kB)
Downloading setfit-1.1.3-py3-none-any.whl (75 kB)
Downloading evaluate-0.4.6-py3-none-any.whl (84 kB)
Using cached sentence_transformers-5.2.0-py3-none-any.whl (493 kB)
Downloading accelerate-1.12.0-py3-none-any.whl (380 kB)
Downloading scikit_learn-1.8.0-cp313-

In [None]:
from datasets import Dataset
from setfit import SetFitModel, SetFitTrainer
from sentence_transformers.losses import CosineSimilarityLoss

# 1. Prepare your data (Simulated output from your GLiNER/Coref pipeline)
# Note: You likely have this in a Pandas DataFrame already.
data = [
    {"text": "Target: Interest Rates | Text: The fed's decision to hike rates is necessary to curb inflation.", "label": "For"},
    {"text": "Target: Interest Rates | Text: Higher rates are going to strangle the housing market completely.", "label": "Against"},
    {"text": "Target: AI Regulation | Text: We need strict safety guardrails before deploying these models.", "label": "For"},
    {"text": "Target: AI Regulation | Text: Over-regulation will only stifle innovation in the tech sector.", "label": "Against"},
    # ... add a few more examples per class, including "Neutral", which has none yet ...
]

# Convert to Hugging Face Dataset
dataset = Dataset.from_list(data)

# Map labels to integers
label_mapping = {"Against": 0, "Neutral": 1, "For": 2}
def encode_labels(record):
    return {"label": label_mapping[record["label"]]}

dataset = dataset.map(encode_labels)

# 2. Load a Sentence Transformer model
# 'paraphrase-mpnet-base-v2' is a strong general-purpose sentence encoder.
# 'sentence-transformers/all-MiniLM-L6-v2' is a smaller alternative worth trying later if speed matters.
model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    labels=["Against", "Neutral", "For"]
)

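# 3. Initialize Trainer
# Note: SetFitTrainer is deprecated in setfit >= 1.0 in favor of Trainer + TrainingArguments,
# which is why this cell emits a warning.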
trainer = SetFitTrainer(
    model=model,
    train_dataset=dataset,
    loss_class=CosineSimilarityLoss, # The magic of SetFit: Contrastive Learning
    batch_size=16,
    num_iterations=20, # 20 positive + 20 negative pairs per sample (4 samples -> 160 pairs)
    num_epochs=1
)

# 4. Train
trainer.train()

# 5. Inference (Simulating your pipeline)
target = "Donald Trump"
sentence = "U.S. President Donald Trump barrels into Davos, Switzerland, on Wednesday, where he is likely to escalate his push for acquiring Greenland despite European protests in the biggest fraying of transatlantic ties in decades."
formatted_input = f"Target: {target} | Text: {sentence}"

preds = model([formatted_input])
print(f"Stance on '{target}': {preds[0]}")
# Prints: Stance on 'Donald Trump': <predicted label>

Map: 100%|██████████| 4/4 [00:00<00:00, 773.25 examples/s]
model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.
  trainer = SetFitTrainer(
Map: 100%|██████████| 4/4 [00:00<00:00, 1582.16 examples/s]
***** Running training *****
  Num unique pairs = 160
  Batch size = 16
  Num epochs = 1


Step,Training Loss
1,0.1733


Stance on 'Crypto Ban': Against


In [4]:
# 5. Inference (Simulating your pipeline)
target = "Donald Trump"
sentence = "U.S. President Donald Trump barrels into Davos, Switzerland, on Wednesday, where he is likely to escalate his push for acquiring Greenland despite European protests in the biggest fraying of transatlantic ties in decades."
formatted_input = f"Target: {target} | Text: {sentence}"

preds = model([formatted_input])
print(f"Stance on '{target}': {preds[0]}")

Stance on 'Donald Trump': For


# sample article summary:
Recovery efforts are underway after Hurricane Melissa left a path of devastation in the Caribbean this week.
The United Nations said the damage in Jamaica, where the storm made landfall on Tuesday (October 28) as a Category 5 hurricane, was on a level "never seen before." Cuba is also reported to be calculating cost of damages after homes collapsed and blocked roads, with an estimated 735,000 people reported to be in shelters and the full extent of damage undetermined.
At least 31 people have died in relation to Hurricane Melissa's devastation across several countries. At least 25 people died and several remain trapped in homes in Petit-Goáve, Haiti, after a river was flooded by the powerful storm, Mayor Jean Bertrans Subrème told the Associated Press.
“I am overwhelmed by the situation,” Subrème said, adding that he’d requested assistance from the government.
At least three other deaths, including two caused by a landslide, were also reported in Haiti in relation to Hurricane Melissa, the Haitian Civil Protection Agency confirmed in a statement. At least one person has died in the Dominican Republic, according to officials, who confirmed more than 1,000 others were evacuated or displaced via CNN.
Melissa made landfall in Cuba Wednesday (October 29) morning as an "extremely dangerous" Category 3, the National Hurricane Center in Miami confirmed via NBC News. The storm previously made landfall in Jamaica on Tuesday as a Category 5 at maximum sustained winds of 185 MPH, which tied with the Labor Day Hurricane of 1935 and Hurricane Dorian in 2019 in the Caribbean and the second-highest wind speed recorded in the Atlantic, behind only Hurricane Allen in 1980.
Severe flooding was reported as heavy rains and strong winds hit the province, with more than 750,000 residents had evacuated their homes across the country. The storm was downgraded to Category 4 at 4:00 p.m. ET on Tuesday and a Category 3 early Wednesday morning.
Jamaica was reported to have "suffered major impact" after the hurricane made landfall, with at least two or three hospitals suffering severe damage and housing expected to be "severely impacted" in the storm's path, Prime Minister Andrew Holness said via NBC News.

Next, we extract every entity in this summary
and assign a stance to each one: positive, negative, or neutral.
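
A minimal sketch of that step: build one `Target: ... | Text: ...` input per unique entity, using the same format as the SetFit training rows. `build_stance_inputs` is a hypothetical helper, and the entity list is hand-written for illustration rather than taken from a GLiNER run on this summary.

```python
def build_stance_inputs(entities, passage):
    """Return (target, formatted_input) pairs, one per unique entity text, in first-seen order."""
    seen = set()
    inputs = []
    for entity in entities:
        target = entity["text"]
        if target in seen:
            continue  # repeated mentions of the same entity need only one input
        seen.add(target)
        inputs.append((target, f"Target: {target} | Text: {passage}"))
    return inputs


passage = (
    "Recovery efforts are underway after Hurricane Melissa left a path of "
    "devastation in the Caribbean this week."
)

# Hand-written entities (assumption): stand-ins for model.predict_entities output.
entities = [
    {"text": "Hurricane Melissa", "label": "object"},
    {"text": "Caribbean", "label": "place"},
    {"text": "Hurricane Melissa", "label": "object"},
]

stance_inputs = build_stance_inputs(entities, passage)
for target, formatted in stance_inputs:
    print(target, "->", formatted)

# With a trained SetFit model, the formatted strings would then be passed in one batch:
# preds = model([formatted for _, formatted in stance_inputs])
```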

Plan of action: there are three granularities at which we can run this:
1. sentence by sentence
2. paragraph by paragraph
3. the entire passage at once

We'll work backwards, from coarsest to finest.

1. Train a v1 of the SetFit stance model to completion,
with enough training examples.

2. Run it on the entire passage.

3. Chunk the passage into paragraphs and run it on each one.

4. Chunk each paragraph into sentences and run it on each one.

5. Mix and match?
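
The three granularities can be sketched with the standard library alone. This assumes paragraphs are separated by newlines, as in the summary above, and uses a deliberately naive regex for sentences. `chunk` and its `level` argument are hypothetical names, and the sample passage is abbreviated from the article.

```python
import re


def split_paragraphs(passage):
    """Split on newlines and drop empty lines."""
    return [p.strip() for p in passage.splitlines() if p.strip()]


def split_sentences(paragraph):
    """Naive split: sentence punctuation, whitespace, then an uppercase letter or opening quote."""
    parts = re.split(r'(?<=[.!?])\s+(?=[A-Z"“])', paragraph)
    return [s.strip() for s in parts if s.strip()]


def chunk(passage, level):
    """Return the passage as a list of chunks at the requested granularity."""
    if level == "passage":
        return [passage.strip()]
    paragraphs = split_paragraphs(passage)
    if level == "paragraph":
        return paragraphs
    if level == "sentence":
        return [s for p in paragraphs for s in split_sentences(p)]
    raise ValueError(f"unknown level: {level!r}")


passage = (
    "Recovery efforts are underway after Hurricane Melissa left a path of "
    "devastation in the Caribbean this week.\n"
    "At least 31 people have died across several countries. "
    "At least 25 people died in Petit-Goáve, Haiti."
)

for level in ("passage", "paragraph", "sentence"):
    print(level, len(chunk(passage, level)))
```

The regex will wrongly split after abbreviations followed by a capital letter, such as "4:00 p.m. ET" in the article, so a proper sentence segmenter is likely worth swapping in before comparing results across granularities.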

# Training SETFIT

In [None]: