Extract token-level probability scores from generative language models (GLMs) without fine-tuning. It is often useful to request a probability assessment of binary or multi-class outcomes, but GLMs are not well suited to this task out of the box. LogitExtractor instead obtains label probabilities directly from the model's output logits, without any fine-tuning.
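Conceptually, the extraction works like this (a minimal sketch with made-up logit values, not the library's internals): take the model's next-token logits, keep only the entries for the candidate label tokens, and normalize them with a softmax.

```python
import math

# Illustrative next-token logits for three candidate labels
# (invented values; the real ones come from the model's forward pass).
logits = {"positive": 2.1, "neutral": 0.9, "negative": -0.3}

# Numerically stable softmax restricted to the label tokens
z = max(logits.values())
exps = {tok: math.exp(v - z) for tok, v in logits.items()}
total = sum(exps.values())
probs = {tok: e / total for tok, e in exps.items()}

print(probs)  # probabilities over the labels, summing to 1
```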
Install with pip:
conda create -n TokenProbs python=3.11 # Note: not available for 3.13
conda activate TokenProbs
pip3 install TokenProbs

Install via GitHub repository:
conda create -n TokenProbs python=3.12 # Note: not available for 3.13
conda activate TokenProbs
git clone https://github.com/francescoafabozzi/TokenProbs.git
cd TokenProbs
pip3 install -e . # Install in editable mode

See examples/FinancialPhrasebank.ipynb for an example of using LogitExtractor to extract token-level probabilities for a sentiment classification task.
from TokenProbs import LogitExtractor
extractor = LogitExtractor(
model_name = 'mistralai/Mistral-7B-Instruct-v0.1',
quantization="8bit" # None = full precision; "4bit" also supported
)
# Test sentence
sentence = "AAPL shares were up in morning trading, but closed even on the day."
# Prompt sentence
prompt = \
"""Instructions: What is the sentiment of this news article? Select from {positive/neutral/negative}.
\nInput: %text_input
Answer:"""
prompted_sentence = prompt.replace("%text_input",sentence)
# Provide tokens to extract (can be TokenIDs or strings)
pred_tokens = ['positive','neutral','negative']
# Extract normalized token probabilities
probabilities = extractor.logit_extraction(
input_data = prompted_sentence,
tokens = pred_tokens,
batch_size=1
)
print(f"Probabilities: {probabilities}")
Probabilities: {'positive': 0.7, 'neutral': 0.2, 'negative': 0.1}
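To turn the returned dictionary into a hard prediction, take the highest-probability label. A small sketch using the illustrative output above:

```python
# Illustrative output from logit_extraction, copied from above
probabilities = {'positive': 0.7, 'neutral': 0.2, 'negative': 0.1}

# The highest-probability label is the predicted class
predicted_label = max(probabilities, key=probabilities.get)
print(predicted_label)  # positive
```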
# Compare to text output
text_output = extractor.text_generation(prompted_sentence, batch_size=1)

Import Errors due to torch
If you receive import errors due to torch, a specific torch version may be required. Follow the steps below:
Step 1: Identify the CUDA versions (for GPU users):
nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0

In this case, the CUDA version is 12.3.
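If you want to detect this programmatically, the release number can be parsed out of the `nvcc --version` output. `cuda_version` below is a hypothetical helper, shown here against the sample output above:

```python
import re

def cuda_version(nvcc_output: str) -> str:
    """Extract the CUDA release (e.g. '12.3') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else "unknown"

sample = "Cuda compilation tools, release 12.3, V12.3.107"
print(cuda_version(sample))  # 12.3
```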
Step 2: Navigate to the PyTorch website and select the build that matches your CUDA version.
PyTorch does not publish a build for CUDA 12.3, so select the nearest lower CUDA version available (e.g., 12.1).
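The "nearest lower version" rule can be sketched as follows. `pick_wheel_tag` is a hypothetical helper, and the list of published CUDA builds is an assumption that changes over time:

```python
def pick_wheel_tag(local_cuda: str, available=("11.8", "12.1")) -> str:
    """Pick the newest published CUDA build not exceeding the local CUDA version.

    `available` is an assumed snapshot of the CUDA versions PyTorch ships wheels for.
    """
    def key(v):  # "12.1" -> (12, 1) so versions compare numerically
        return tuple(int(part) for part in v.split("."))
    candidates = [v for v in available if key(v) <= key(local_cuda)]
    return "cu" + max(candidates, key=key).replace(".", "")

print(pick_wheel_tag("12.3"))  # cu121
```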
Step 3: Pip uninstall torch and download with the correct version:
pip3 uninstall torch
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Issues with bitsandbytes
If you receive a "CUDA Setup failed despite GPU being available." error, identify the location of the CUDA driver, typically found under /usr/local/, and run the following commands via the command line. The example below shows this for cuda-12.3:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-12.3 # change 12.3 to appropriate location
export BNB_CUDA_VERSION=123 # 123 (i.e., 12.3) also needs to be changed

Troubleshooting for CPU users
torch is compatible with CPU, but requires a numpy version < 2.0. Thus, to run via CPU:
pip3 uninstall numpy
pip3 install numpy==1.26.4

Troubleshooting for gated repositories
If you receive errors due to gated repositories, you must log in to Hugging Face via the command-line interface (CLI):
huggingface-cli login # You will be prompted for your access token
# Then try to run again