FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain

Introduction

FinTrust is the first comprehensive benchmark designed specifically to evaluate the trustworthiness of Large Language Models (LLMs) in financial applications. As finance is a high-stakes domain with strict trustworthiness standards, our benchmark provides a systematic framework to assess LLMs across seven critical dimensions: trustfulness, robustness, safety, fairness, privacy, transparency, and knowledge discovery.

Our benchmark comprises 15,680 answer pairs spanning textual, tabular, and time-series data. Unlike existing benchmarks that primarily focus on task completion, FinTrust evaluates alignment issues in practical contexts with fine-grained tasks for each dimension of trustworthiness.

Dataset Access

📊 HuggingFace Full Dataset: https://huggingface.co/datasets/HughieHu/FinTrust
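
If you use the Hugging Face datasets library, the benchmark can presumably be loaded directly by the dataset ID above. A minimal sketch, assuming the dataset exposes a default configuration (the actual split names and features on the Hub may differ):

    # Minimal sketch using the Hugging Face `datasets` library.
    # The dataset ID comes from the link above; configuration and split
    # names are assumptions, so inspect the printed object before relying on them.
    from datasets import load_dataset

    dataset = load_dataset("HughieHu/FinTrust")  # loads all available splits
    print(dataset)  # shows split names, sizes, and features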

Repository Structure

This repository contains the following directories:

  • fairness/: Evaluates models' ability to provide unbiased responses
  • knowledge_discovery/: Tests models' capability to uncover non-trivial investment insights
  • privacy/: Assesses resistance to information leakage
  • robustness/: Examines models' resilience and ability to abstain when confidence is low
  • safety/: Tests handling of various LLM attack strategies with financial crime scenarios
  • transparency/: Evaluates disclosure of limitations and potential conflicts of interest
  • trustfulness/: Measures models' accuracy and factuality in financial contexts

Each directory contains:

  • api_call.py: Script to call the model API and generate responses
  • A sample dataset of 100 test cases
  • postprocess_response.py: Script to process and evaluate model responses

Usage Instructions

  1. First, create a .env file in the root directory with the following parameters:

    PROMPT_JSON_PATH=""  # Path to the sample test data
    MODEL_KEY=""         # Model name (must match a key in MODEL_CATALOG)
    MAX_PARALLEL=""      # Number of parallel API calls
    OPENAI_API_KEY=""    # Your OpenAI API key
    TOGETHER_API_KEY=""  # Your Together API key
    
  2. Run the API call script to generate responses:

    python [dimension]/api_call.py
    
  3. Process the responses to get evaluation results:

    python [dimension]/postprocess_response.py
    
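For example, to evaluate the fairness dimension end to end:

    python fairness/api_call.py
    python fairness/postprocess_response.py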

Note: The MODEL_KEY should correspond to a key in the MODEL_CATALOG dictionary defined in each api_call.py file.
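
The exact MODEL_CATALOG contents are defined per dimension inside each api_call.py; the sketch below only illustrates the pattern the note describes, assuming python-dotenv for reading the .env file and the OpenAI client for API access. All catalog entries, the example prompts, and the ask helper are hypothetical; check the real dictionary in the repository for the valid keys.

    # Hypothetical sketch of how api_call.py might resolve MODEL_KEY.
    # The MODEL_CATALOG entries below are invented examples.
    import os
    from concurrent.futures import ThreadPoolExecutor

    from dotenv import load_dotenv  # pip install python-dotenv
    from openai import OpenAI       # pip install openai

    load_dotenv()  # reads MODEL_KEY, MAX_PARALLEL, API keys, etc. from .env

    MODEL_CATALOG = {
        "gpt-4o": {"provider": "openai", "model": "gpt-4o"},  # hypothetical entry
        "llama-3-70b": {"provider": "together",               # hypothetical entry
                        "model": "meta-llama/Llama-3-70b-chat-hf"},
    }

    entry = MODEL_CATALOG[os.environ["MODEL_KEY"]]  # KeyError if the key is not in the catalog
    max_parallel = int(os.environ["MAX_PARALLEL"])

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment

    def ask(prompt: str) -> str:
        # One chat-completion call per test case; the prompt format is an assumption.
        response = client.chat.completions.create(
            model=entry["model"],
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    prompts = ["Example prompt 1", "Example prompt 2"]  # placeholder inputs
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        responses = list(pool.map(ask, prompts))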
