FinTrust is the first comprehensive benchmark designed specifically to evaluate the trustworthiness of Large Language Models (LLMs) in financial applications. Because finance is a high-stakes domain with strict standards of trustworthiness, our benchmark provides a systematic framework to assess LLMs across seven critical dimensions: trustfulness, robustness, safety, fairness, privacy, transparency, and knowledge discovery.
Our benchmark comprises 15,680 answer pairs spanning textual, tabular, and time-series data. Unlike existing benchmarks that primarily focus on task completion, FinTrust evaluates alignment issues in practical contexts with fine-grained tasks for each dimension of trustworthiness.
📊 HuggingFace Full Dataset: [https://huggingface.co/datasets/HughieHu/FinTrust](https://huggingface.co/datasets/HughieHu/FinTrust)
This repository contains the following directories:
- `fairness/`: Evaluates models' ability to provide unbiased responses
- `knowledge_discovery/`: Tests models' capability to uncover non-trivial investment insights
- `privacy/`: Assesses resistance to information leakage
- `robustness/`: Examines models' resilience and ability to abstain when confidence is low
- `safety/`: Tests handling of various LLM attack strategies with financial crime scenarios
- `transparency/`: Evaluates disclosure of limitations and potential conflicts of interest
- `trustfulness/`: Measures models' accuracy and factuality in financial contexts
Each directory contains:
- `api_call.py`: Script to call the model API and generate responses
- `postprocess_response.py`: Script to process and evaluate model responses
- A sample dataset of 100 test cases
1. First, create a `.env` file in the root directory with the following parameters:

   ```
   PROMPT_JSON_PATH=""   # Path to the sample test data
   MODEL_KEY=""          # Model name (must match a key in MODEL_CATALOG)
   MAX_PARALLEL=""       # Number of parallel API calls
   OPENAI_API_KEY=""     # Your OpenAI API key
   TOGETHER_API_KEY=""   # Your Together API key
   ```

2. Run the API call script to generate responses:

   ```
   python [dimension]/api_call.py
   ```

3. Process the responses to get evaluation results:

   ```
   python [dimension]/postprocess_response.py
   ```
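The `MAX_PARALLEL` setting caps the number of concurrent API calls. As a minimal sketch of what that fan-out might look like (the `call_model` function and prompt format here are hypothetical stand-ins, not the repository's actual implementation):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real provider API call."""
    return f"response to: {prompt}"

def generate_responses(prompts, max_parallel: int):
    """Fan prompts out across a bounded thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(call_model, prompts))

if __name__ == "__main__":
    # MAX_PARALLEL comes from the .env file; default to 4 for this sketch.
    max_parallel = int(os.getenv("MAX_PARALLEL", "4"))
    print(generate_responses(["prompt one", "prompt two"], max_parallel))
```

A thread pool is a reasonable fit here because the workload is I/O-bound: each worker spends most of its time waiting on the network.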
Note: The MODEL_KEY should correspond to a key in the MODEL_CATALOG dictionary defined in each api_call.py file.
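For illustration only, a catalog of this shape might map short model keys to provider-specific identifiers. The keys and entries below are hypothetical; the real `MODEL_CATALOG` is defined in each `api_call.py` and may differ:

```python
import os

# Hypothetical catalog; the actual keys live in each api_call.py.
MODEL_CATALOG = {
    "gpt-4o": {"provider": "openai", "model_id": "gpt-4o"},
    "llama-3-70b": {
        "provider": "together",
        "model_id": "meta-llama/Llama-3-70b-chat-hf",
    },
}

def resolve_model(key=None):
    """Look up MODEL_KEY (from the .env file) in the catalog."""
    key = key or os.getenv("MODEL_KEY", "")
    if key not in MODEL_CATALOG:
        raise KeyError(
            f"MODEL_KEY {key!r} must be one of {sorted(MODEL_CATALOG)}"
        )
    return MODEL_CATALOG[key]
```

Failing fast with an explicit list of valid keys makes a misconfigured `.env` file easy to diagnose.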