## Smolagents for Financial Named Entity Recognition

#### 1. Install the required packages

In [1]:
%pip install gliner smolagents huggingface_hub



Collecting sympy==1.13.1 (from torch>=2.0.0->gliner)
  Using cached sympy-1.13.1-py3-none-any.whl.metadata (12 kB)
Using cached sympy-1.13.1-py3-none-any.whl (6.2 MB)
Installing collected packages: sympy
Successfully installed sympy
Note: you may need to restart the kernel to use updated packages.


#### 2. Import the libraries the notebook needs

In [56]:
from smolagents import CodeAgent, HfApiModel
from huggingface_hub import login, InferenceClient 

In [3]:
import torch, json, os, re, time, random, math
import numpy as np
import pickle

from collections import defaultdict
from datetime import datetime

In [4]:
import math, h5py

import scipy
from PIL import Image
from scipy import ndimage

import pandas as pd
import os

In [15]:
# json, os and torch are already imported above; only NLTK is new here.
import nltk
from nltk import tokenize

#### 3. Download the Punkt tokenizer model from the Natural Language Toolkit (NLTK) library

In [16]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to /home/moebius/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True
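
The notebook downloads Punkt but does not use it in the later cells. One plausible use is to split a long article into sentences and pack them into smaller chunks before running NER. The sketch below assumes the sentences come from `tokenize.sent_tokenize(text)`; the helper name `chunk_sentences` and the `max_words` budget are illustrative choices, not part of the original notebook.

```python
def chunk_sentences(sentences, max_words=200):
    """Greedily pack consecutive sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept whole in its own chunk.
    """
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


# In the notebook, the list would come from: tokenize.sent_tokenize(text)
sample_sentences = [
    "Google plans to combine two teams.",
    "Waze keeps its app.",
    "No layoffs are expected.",
]
sample_chunks = chunk_sentences(sample_sentences, max_words=10)
print(sample_chunks)
```

Each chunk can then be passed to the NER model separately, which keeps inputs short for models with a limited context window.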

#### 4. Download the FiNER-139 dataset. `finer_tag_names` holds all the tag names

In [40]:
import datasets

finer_train = datasets.load_dataset("nlpaueb/finer-139", split="train")

finer_tag_names = finer_train.features["ner_tags"].feature.names
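
To see which entity types the tag list covers, the tag names can be collapsed into unique type names. This is a minimal sketch that assumes the tags follow the IOB2 scheme (`O`, `B-<Type>`, `I-<Type>`); the sample tags below are illustrative stand-ins, and `entity_types` is a hypothetical helper.

```python
def entity_types(tag_names):
    """Collapse IOB2 tags such as 'B-Revenues' / 'I-Revenues' into unique type names."""
    types = []
    for tag in tag_names:
        if tag == "O":  # the "outside" tag carries no entity type
            continue
        # Split off the 'B' / 'I' prefix; keep everything after the first hyphen.
        _, _, name = tag.partition("-")
        if name and name not in types:
            types.append(name)
    return types


# Illustrative sample; in the notebook, pass finer_tag_names instead.
sample_tags = ["O", "B-Revenues", "I-Revenues", "B-Goodwill", "I-Goodwill"]
sample_types = entity_types(sample_tags)
print(sample_types)
```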

In [62]:
text="""Google combines Maps and Waze teams as pressures mount to cut costs
        Rebecca Bellan		
        Google plans to combine the teams working on its Maps product and on Waze, a mapping service that Google acquired in 2013. The merger comes as the search engine giant feels the pressure to cut costs and consolidate operations, reports The Wall Street Journal.
        Waze’s team of 500 employees will fall under Google’s Geo organization, which oversees Maps, Earth and Street View, starting Friday. Neha Parikh, Waze’s current CEO, will leave her role.
        Google told WSJ it plans to keep Waze as a standalone service — Waze is known for its crowdsourcing of en route information like locations of speed cameras, cop cars and roadkill.
        Google also said it didn’t expect any layoffs as part of the reorganization. However, layoffs abound in the tech world, whether you’re a startup or an Amazon. And they often hit the hardest where there are redundancies between teams. Indeed, Google said it expects the restructuring of the different mapping services to reduce overlap in mapmaking.
        Alphabet and Google CEO Sundar Pichai has said he hopes to make Google 20% more productive by running “on fewer resources.” Speaking at Code Conference in September, the executive said the company had become slower due to overhiring and seemed to hint that merging teams that work on overlapping products would help the company stay on top.
        """

In [66]:
# Print the text to confirm it was stored correctly.
print(text)

Google combines Maps and Waze teams as pressures mount to cut costs
        Rebecca Bellan		
        Google plans to combine the teams working on its Maps product and on Waze, a mapping service that Google acquired in 2013. The merger comes as the search engine giant feels the pressure to cut costs and consolidate operations, reports The Wall Street Journal.
        Waze’s team of 500 employees will fall under Google’s Geo organization, which oversees Maps, Earth and Street View, starting Friday. Neha Parikh, Waze’s current CEO, will leave her role.
        Google told WSJ it plans to keep Waze as a standalone service — Waze is known for its crowdsourcing of en route information like locations of speed cameras, cop cars and roadkill.
        Google also said it didn’t expect any layoffs as part of the reorganization. However, layoffs abound in the tech world, whether you’re a startup or an Amazon. And they often hit the hardest where there are redundancies between teams. Indeed, Google

#### 5. Define the labels (entity types) you want to recognize

In [69]:
labels = ["Source", "Financial Metric", "Location", "Date", "Organization", "Person", "Product", "Percentage", "Monetary Value", "Duration"]


#### 6. Run GLiNER, which lets you use ModernBERT for NER

In [65]:
from gliner import GLiNER

# Load the pre-trained GLiNER model with ModernBERT as the backbone
model = GLiNER.from_pretrained("knowledgator/modern-gliner-bi-large-v1.0")


# Predict entities in the text
entities = model.predict_entities(text, labels, threshold=0.3)

# Display the recognized entities
for entity in entities:
    print(f"{entity['text']} => {entity['label']}")


Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Google => Organization
Maps => Product
Waze => Product
Rebecca Bellan => Person
Google => Organization
Maps => Product
Waze => Product
Google => Organization
The Wall Street Journal => Source
Google => Organization
Maps => Product
Earth => Product
Street View => Product
Neha Parikh => Person
Google => Organization
Google => Organization
Google => Organization
Alphabet => Organization
Google => Organization
Sundar Pichai => Person
Google => Organization
20% => Percentage
September => Date
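
The raw output repeats the same mention many times (for example `Google`). A short post-processing step can group the mentions by label and drop duplicates. This sketch relies only on the `text` and `label` keys used above; the optional `score` filter assumes each entity dict also carries a `score` key, and `group_entities` is a hypothetical helper name.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.0):
    """Group entity mentions by label, keeping each distinct text once, in order."""
    grouped = defaultdict(list)
    for entity in entities:
        # Assumption: a 'score' key may be present; treat it as 1.0 when missing.
        if entity.get("score", 1.0) < min_score:
            continue
        if entity["text"] not in grouped[entity["label"]]:
            grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Illustrative sample shaped like the entities printed above.
sample_entities = [
    {"text": "Google", "label": "Organization"},
    {"text": "Maps", "label": "Product"},
    {"text": "Google", "label": "Organization"},
    {"text": "Waze", "label": "Product"},
]
sample_grouped = group_entities(sample_entities)
print(sample_grouped)
```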


#### 7. Log in to Hugging Face so the agent can call the hosted model

In [24]:
hf_token = "<YOUR TOKEN>"

login(hf_token, add_to_git_credential=False)
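
Rather than pasting the token into the notebook, it can be read from an environment variable. This is a small sketch; the variable name `HF_TOKEN` and the helper `get_hf_token` are assumptions for illustration.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Return the Hugging Face token stored in the given environment variable."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token


# Usage in the notebook (requires the variable to be set):
# login(get_hf_token(), add_to_git_credential=False)
```

This keeps the secret out of the saved notebook and its version history.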

#### 8. Pick the LLM. The Qwen model is a good starting point for NER

In [49]:
repo_id = "Qwen/Qwen2.5-Coder-32B-Instruct"
llm_engine = HfApiModel(model_id=repo_id, provider="together", timeout=3000)  

#### 9. Set up your agent. If you are wondering why it's a CodeAgent, check out the GAIA multi-agent flow from Hugging Face

In [55]:
agent = CodeAgent(tools=[], model=llm_engine)
agent.run(f"Make a list of entities and explain their context in 1-2 sentences for all of these entities {labels} in this text {text}. Reason step by step what the result of this action could be and add it to the output dataset.")



{'Source': 'TechCrunch',
 'Financial Metric': 'Productivity',
 'Location': "Not explicitly mentioned, but likely refers to Google's headquarters or offices",
 'Date': "10 months ago (from the article's publication date)",
 'Organization': "Google, Waze, Google's Geo organization",
 'Person': 'Neha Parikh, Sundar Pichai',
 'Product': 'Google Maps, Waze',
 'Percentage': '20%',
 'Monetary Value': 'Not explicitly mentioned',
 'Duration': 'Not explicitly mentioned, but the reorganization starts on Friday'}
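
Some values in this result do not appear in the input text (for example `TechCrunch`), and others are placeholder phrases. A light clean-up pass can split the comma-separated strings into lists, empty out the "Not explicitly mentioned" entries, and optionally keep only items found verbatim in the source text. The sketch assumes the agent returns a dict of strings as shown above, which may vary between runs; `clean_agent_output` is a hypothetical helper.

```python
def clean_agent_output(result, source_text=None):
    """Turn the agent's dict of strings into a dict of lists of entity mentions."""
    cleaned = {}
    for label, value in result.items():
        if value.lower().startswith("not explicitly mentioned"):
            items = []  # placeholder phrase, no entity found
        else:
            items = [item.strip() for item in value.split(",") if item.strip()]
        if source_text is not None:
            # Keep only mentions that occur verbatim in the source text.
            items = [item for item in items if item in source_text]
        cleaned[label] = items
    return cleaned


# Illustrative sample shaped like the agent output above.
sample_text = (
    "Neha Parikh will leave her role. "
    "Sundar Pichai hopes to make Google 20% more productive."
)
sample_result = {
    "Source": "TechCrunch",
    "Person": "Neha Parikh, Sundar Pichai",
    "Percentage": "20%",
    "Monetary Value": "Not explicitly mentioned",
}
sample_cleaned = clean_agent_output(sample_result, source_text=sample_text)
print(sample_cleaned)
```

Verbatim matching is strict: a correct but rephrased answer such as `Google Maps` for `Maps` would be dropped, so the filter is best treated as a conservative check rather than a final verdict.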

#### 10. That's it