ConceptX

Analyzing Latent Concept in Pre-trained Transformer Models

The code is split into three parts: 1) get concept clusters, 2) generate auto-label data, and 3) calculate the alignment between auto-labels and concepts.

Get concept clusters

get_clusters/getclusters.sh provides step-by-step commands to create concept clusters. You need to specify the path to a sentence file.

Preprocessing

This step involves tokenizing the input sentences and extracting word-level contextualized embeddings. It requires the neurox environment, which can be created from env_neurox.yml.

conda env create --file=env_neurox.yml
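To illustrate what "word-level" means here: a transformer tokenizer may split one word into several subwords, so the subword vectors have to be combined into one vector per word. The sketch below averages them. This is only an assumption for illustration; the scripts in get_clusters may aggregate differently (for example, by keeping the last subword). The function name `pool_subwords` and the `word_ids` layout (one word index per subword, `None` for special tokens) are hypothetical.

```python
from collections import defaultdict


def pool_subwords(subword_vectors, word_ids):
    """Average the subword vectors that belong to the same word.

    subword_vectors: list of equal-length float lists, one per subword token.
    word_ids: for each subword, the index of the word it came from,
              or None for special tokens such as [CLS] and [SEP].
    Returns one averaged vector per word, ordered by word index.
    """
    grouped = defaultdict(list)
    for vector, word_id in zip(subword_vectors, word_ids):
        if word_id is not None:  # skip special tokens
            grouped[word_id].append(vector)

    pooled = []
    for word_id in sorted(grouped):
        vectors = grouped[word_id]
        dim = len(vectors[0])
        pooled.append(
            [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        )
    return pooled


# Word 0 is split into two subwords, word 1 is a single subword.
example = pool_subwords(
    [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]],
    [None, 0, 0, 1, None],
)
```

Averaging keeps information from every subword, while last-subword pooling is cheaper; which one suits a given analysis is a design choice.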

Run clustering

Cluster the word-level contextualized embeddings. This step requires setting up the clustering environment.

conda env create --file=env_clustering
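As a rough picture of what the clustering step does, the sketch below groups word vectors with a naive average-linkage agglomerative procedure. The algorithm choice, the function name `agglomerative_clusters`, and the `num_clusters` parameter are assumptions made for illustration; the actual method and settings are the ones in get_clusters/getclusters.sh. The code is cubic in the number of vectors, so it is only suitable for toy inputs.

```python
import math


def agglomerative_clusters(vectors, num_clusters):
    """Naive average-linkage agglomerative clustering (illustration only).

    vectors: list of equal-length float lists.
    num_clusters: number of clusters to stop at (must be >= 1).
    Returns a list of clusters, each a list of indices into `vectors`.
    """
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")

    # Start with every vector in its own cluster.
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Average Euclidean distance over all cross-cluster pairs.
        total = sum(math.dist(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > num_clusters:
        # Find the closest pair of clusters and merge it.
        _, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


toy_vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
toy_clusters = agglomerative_clusters(toy_vectors, 2)
```

Each resulting cluster is a candidate concept: a set of word occurrences whose contextualized embeddings lie close together.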

Generate Auto-labels

Label the words in the sentence file with pre-defined concepts.

Linguistic Annotations

Label words with linguistic information such as part-of-speech tags, suffixes, WordNet categories, etc. The following command tags the sentence file with part-of-speech information.

python --model_name "QCRI/bert-base-multilingual-cased-pos-english" --sentence_file data/text.in --output_file text.in.pos

Trivial labels

auto_labels/Trivial/README provides step-by-step instructions to create trivial labels for the input sentence file.

Calculate alignment

This step calculates the alignment score between a given label file and the concept clusters.

python scripts/align_with_single_auto_tag.py label_file sentence_file cluster_file

cluster_file is the cluster output of step 1. sentence_file is the list of sentences whose words are labeled.
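One way to think about the alignment score is sketched below. It treats a cluster as aligned with a label when at least a threshold fraction of its words carry that label, and reports the fraction of aligned clusters. The in-memory data layout, the function names, and the `threshold=0.9` default are all assumptions for illustration; scripts/align_with_single_auto_tag.py defines the actual file formats and scoring.

```python
from collections import Counter


def align_clusters(clusters, word_labels, threshold=0.9):
    """Map each aligned cluster to its dominant label.

    clusters: dict of cluster id -> list of (sentence_index, word_index) pairs.
    word_labels: list of per-sentence label lists, parallel to the sentences.
    threshold: hypothetical minimum fraction of a cluster's words that must
               share one label for the cluster to count as aligned.
    """
    aligned = {}
    for cluster_id, members in clusters.items():
        if not members:
            continue  # an empty cluster cannot align with anything
        counts = Counter(word_labels[s][w] for s, w in members)
        label, count = counts.most_common(1)[0]
        if count / len(members) >= threshold:
            aligned[cluster_id] = label
    return aligned


def alignment_score(clusters, word_labels, threshold=0.9):
    """Fraction of clusters that align with some label."""
    if not clusters:
        return 0.0
    return len(align_clusters(clusters, word_labels, threshold)) / len(clusters)


labels = [["DET", "NOUN", "VERB"], ["DET", "NOUN", "VERB"]]
clusters = {
    "c0": [(0, 0), (1, 0)],  # both words are DET
    "c1": [(0, 1), (1, 2)],  # one NOUN and one VERB: mixed
}
aligned = align_clusters(clusters, labels)
score = alignment_score(clusters, labels)
```

In this toy example only c0 is aligned (with DET), so half of the clusters align. A lower threshold admits noisier clusters; a higher one counts only near-pure clusters.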
