Data Measurements Tool

title	emoji	colorFrom	colorTo	sdk	sdk_version	app_file	pinned
DataMeasurementsTool	🤗	indigo	red	streamlit	1.0.0	app.py	false

🚧 Under Construction 🚧

For more information, check out our blog post!

How to run:

After cloning (and potentially setting up your virtual environment), run:

pip install -r requirements.txt

This installs all the requirements for the tool.

Command Line Interface

From there, you can measure different aspects of a dataset by running run_data_measurements.py. Its options specify the HF Dataset, the Dataset config, the Dataset split, the Dataset columns being measured, the measurements to use, and further details about caching and saving.

To see the full list of options, run either of the following:

python3 run_data_measurements.py -h

python3 run_data_measurements.py --help

Example for the hate_speech18 dataset:

python3 run_data_measurements.py --dataset="hate_speech18" --config="default" --split="train" --feature="text"

Example for getting just the nPMI measurement from hate_speech18:

python3 run_data_measurements.py --dataset=hate_speech18 --config default --split train --feature text --calculation npmi

Example for the IMDB dataset:

python3 run_data_measurements.py --dataset="imdb" --config="plain_text" --split="train" --label_field="label" --feature="text"
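If you want to run several of these measurements in one go, the examples above can be scripted. The following is a minimal sketch, not part of the tool: build_command, RUNS, and main are hypothetical names, and the only flags used are the ones shown in the examples above (check --help for the authoritative list). By default it only prints the commands it would run.

```python
import shlex
import subprocess
import sys


def build_command(dataset, config, split, feature, label_field=None, calculation=None):
    """Build the argument list for one run_data_measurements.py invocation.

    Hypothetical helper: it only uses flags that appear in the examples above.
    """
    cmd = [
        sys.executable,  # the current Python interpreter, instead of a hard-coded python3
        "run_data_measurements.py",
        f"--dataset={dataset}",
        f"--config={config}",
        f"--split={split}",
        f"--feature={feature}",
    ]
    # Optional flags are appended only when a value is given.
    if label_field is not None:
        cmd.append(f"--label_field={label_field}")
    if calculation is not None:
        cmd.append(f"--calculation={calculation}")
    return cmd


# The three example runs from this README, expressed as keyword arguments.
RUNS = [
    {"dataset": "hate_speech18", "config": "default", "split": "train", "feature": "text"},
    {"dataset": "hate_speech18", "config": "default", "split": "train", "feature": "text",
     "calculation": "npmi"},
    {"dataset": "imdb", "config": "plain_text", "split": "train", "feature": "text",
     "label_field": "label"},
]


def main(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for run in RUNS:
        cmd = build_command(**run)
        print(shlex.join(cmd))
        if not dry_run:
            # Must be run from the repository root so the script is found.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main(dry_run=True)
```

Call main(dry_run=False) from the repository root once the printed commands look right.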

User Interface

To launch the Streamlit app, run:

streamlit run app.py
