TokenProbs

Extract token-level probability scores from generative language models (GLMs) without fine-tuning. It is often useful to obtain probability estimates for binary or multi-class outcomes, but GLMs are not well suited to producing them directly. Instead, use LogitExtractor to obtain label probabilities without fine-tuning.
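To illustrate the underlying idea (a minimal sketch of the general technique, not the library's actual implementation): given the model's next-token logits for each candidate label, a softmax restricted to just those labels yields normalized label probabilities. The logit values below are hypothetical.

```python
import math

def normalize_label_probs(logits):
    # Softmax over only the candidate-label logits, so the
    # resulting probabilities sum to 1 across those labels.
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - m) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical next-token logits for three label tokens
raw_logits = {"positive": 2.0, "neutral": 0.5, "negative": -1.0}
probs = normalize_label_probs(raw_logits)
print(probs)
```

Because the softmax is taken over the label tokens only, the rest of the vocabulary is ignored and the three probabilities always sum to 1.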

Installation

Install with pip:

conda create -n TokenProbs python=3.11 # Note: not available for 3.13
conda activate TokenProbs
pip3 install TokenProbs 

Install via the GitHub repository:

conda create -n TokenProbs python=3.12 # Note: not available for 3.13
conda activate TokenProbs

git clone https://github.com/francescoafabozzi/TokenProbs.git
cd TokenProbs
pip3 install -e . # Install in editable mode 

Usage

See examples/FinancialPhrasebank.ipynb for an example of using LogitExtractor to extract token-level probabilities for a sentiment classification task.

from TokenProbs import LogitExtractor

extractor = LogitExtractor(
    model_name = 'mistralai/Mistral-7B-Instruct-v0.1',
    quantization="8bit" # None = full precision; "4bit" also supported
)

# Test sentence
sentence = "AAPL shares were up in morning trading, but closed even on the day."

# Prompt sentence
prompt = \
"""Instructions: What is the sentiment of this news article? Select from {positive/neutral/negative}.
\nInput: %text_input
Answer:"""

prompted_sentence = prompt.replace("%text_input",sentence)

# Provide tokens to extract (can be token IDs or strings)
pred_tokens = ['positive','neutral','negative']


# Extract normalized token probabilities
probabilities = extractor.logit_extraction(
    input_data = prompted_sentence,
    tokens = pred_tokens,
    batch_size=1
)

print(f"Probabilities: {probabilities}")
# Probabilities: {'positive': 0.7, 'neutral': 0.2, 'negative': 0.1}

# Compare to the generated text output
text_output = extractor.text_generation(prompted_sentence, batch_size=1)
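The returned probability dictionary can then be post-processed however the task requires. A minimal, hypothetical example (not part of the TokenProbs API) collapses the three sentiment probabilities into a single score:

```python
# Hypothetical post-processing of the extracted probabilities:
# collapse the three labels into one sentiment score in [-1, 1].
probabilities = {"positive": 0.7, "neutral": 0.2, "negative": 0.1}
score = probabilities["positive"] - probabilities["negative"]
print(f"Sentiment score: {score:.2f}")  # -> Sentiment score: 0.60
```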

Troubleshooting Installation

Import Errors due to torch

If you receive import errors caused by torch, a specific torch version may be required. Follow the steps below:

Step 1: Identify the CUDA versions (for GPU users):

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0

In this case, the CUDA version is 12.3.

Step 2: Navigate to the PyTorch website and select the build that matches your CUDA version.

If there is no torch build for your exact CUDA version (here, 12.3), select the closest CUDA version below it (i.e., 12.1).

Step 3: Uninstall torch and reinstall the correct CUDA build:

pip3 uninstall torch
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Issues with bitsandbytes

If you receive the "CUDA Setup failed despite GPU being available." error, identify the location of the CUDA driver, typically found under /usr/local/, and run the following commands. The example below is for cuda-12.3:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-12.3 # change 12.3 to appropriate location
export BNB_CUDA_VERSION=123 # 123 (i.e., 12.3) also needs to be changed

Troubleshooting for CPU Users

torch is compatible with CPU, but requires a numpy version below 2.0. To run on CPU:

pip3 uninstall numpy
pip3 install numpy==1.26.4

Troubleshooting Gated Repositories

If you receive errors due to gated repositories, you must log in to Hugging Face via the command-line interface (CLI):

huggingface-cli login # You will be prompted for your access token
# Then try to run again
