The experiment has two parts. The code and data are saved in `ENRON/` and `LAMA/`, respectively.
The Enron email dataset is downloaded from http://www.cs.cmu.edu/~enron/ and extracted to `ENRON/enron/maildir/`.
Some data files are too large to upload. Here is how to prepare them:
- `ENRON/enron/parsed_emails.pkl`: run the scripts in `mailparser.ipynb`
- `ENRON/enron_count/cooccur.pkl`: run `python ENRON/enron_count/cooccurrence.py`
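The parsing step can be sketched roughly as follows. This is a minimal sketch, not the actual notebook logic: the field names and the single-message example are illustrative assumptions, and `mailparser.ipynb` may keep different fields.

```python
import email
import pickle

def parse_email(raw):
    """Parse one raw RFC 822 message into a plain dict.

    Illustrative sketch; the real notebook may extract different fields.
    """
    msg = email.message_from_string(raw)
    return {
        "from": msg.get("From", ""),
        "to": msg.get("To", ""),
        "subject": msg.get("Subject", ""),
        # Enron maildir messages are plain text, so multipart handling is skipped
        "body": msg.get_payload() if not msg.is_multipart() else "",
    }

raw = """From: alice@enron.com
To: bob@enron.com
Subject: meeting

See you at 3pm."""

parsed = parse_email(raw)

# Serialize the parsed messages, mirroring the role of parsed_emails.pkl
with open("parsed_emails.pkl", "wb") as f:
    pickle.dump([parsed], f)
```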
The prediction script is reused from this repo: https://github.com/jeffhj/LM_PersonalInfoLeak. Some prediction results are uploaded under `ENRON/final_result_pkl/`.
`ENRON/analysis-email*.py` are the analysis scripts used in the experiments.
The LAMA dataset is downloaded from https://dl.fbaipublicfiles.com/LAMA/data.zip. We also used a Wikipedia dump to extract contexts; the preprocessing script is `LAMA/wikidump-prepare.py`.
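The dump preprocessing can be sketched as pulling `(title, text)` pairs out of the dump XML. This is a simplified sketch under stated assumptions: real dumps are namespaced and far too large for `fromstring`, so `LAMA/wikidump-prepare.py` likely streams the file instead (e.g. with `iterparse` or a dedicated extractor).

```python
import xml.etree.ElementTree as ET

def iter_pages(xml_text):
    """Yield (title, text) pairs from a Wikipedia-dump-style XML string.

    Assumes the standard <page>/<title>/<revision>/<text> layout without
    namespaces; a real dump would be processed as a stream instead.
    """
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        title = page.findtext("title", "")
        text = page.findtext(".//text", "")
        yield title, text

# Tiny hand-written example in the dump's general shape
sample = """<mediawiki>
  <page>
    <title>Paris</title>
    <revision><text>Paris is the capital of France.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(sample))
```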
- Prepare prompts: `prompt-prepare.py`
- Prepare contexts: `LAMA/find_occurrence.py`, `LAMA/occurrence_agg.py`, `LAMA/extract_context.py`
- Predict: `LAMA/pred.py`
- Predict (with context): `LAMA/pred_context.py`
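The context-preparation steps above can be sketched as a single pass: find the passages that mention a subject entity, then cut a token window around the first mention. This is an illustrative sketch only; the matching rules and window size are assumptions, and the real logic is split across the three scripts listed above.

```python
def find_occurrences(passages, entity):
    """Return indices of passages that mention the entity (case-insensitive)."""
    needle = entity.lower()
    return [i for i, p in enumerate(passages) if needle in p.lower()]

def extract_context(passage, entity, window=5):
    """Return up to `window` tokens on each side of the first mention."""
    tokens = passage.split()
    lowered = [t.lower().strip(".,") for t in tokens]
    for i, t in enumerate(lowered):
        if t == entity.lower():
            return " ".join(tokens[max(0, i - window): i + window + 1])
    return ""

passages = [
    "Paris is the capital of France.",
    "The Seine flows through Paris.",
    "Berlin is the capital of Germany.",
]

hits = find_occurrences(passages, "Paris")
contexts = [extract_context(passages[i], "Paris") for i in hits]
```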
`LAMA/analysis*.py` are the analysis scripts used in the experiments.