Corpus-Specific Automatic Hyperlinking

python preprocess.py <datadir>

python train_test_split.py <datadir> <min_examples> <test_size>
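The README does not say how `<min_examples>` and `<test_size>` are interpreted. As a rough illustration only, the sketch below assumes `min_examples` drops labels that have too few examples and `test_size` is the fraction of each remaining label held out for testing. The function name `split_examples` and the `(text, label)` input format are hypothetical and are not taken from `train_test_split.py`.

```python
import random
from collections import defaultdict


def split_examples(examples, min_examples, test_size, seed=0):
    """Split (text, label) pairs into train and test lists.

    Assumptions (not confirmed by the project):
    - labels with fewer than `min_examples` examples are dropped;
    - `test_size` is a fraction in [0, 1] applied per label.
    """
    # Group the examples by label so each label is split separately.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for items in by_label.values():
        if len(items) < min_examples:
            continue  # too few examples for this label
        rng.shuffle(items)
        n_test = round(len(items) * test_size)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


if __name__ == "__main__":
    data = [(f"doc {i}", "a") for i in range(10)] + [("x", "b"), ("y", "b")]
    train, test = split_examples(data, min_examples=5, test_size=0.2)
    print(len(train), len(test))
```

Splitting per label rather than over the whole dataset keeps every retained label represented in the training set, which is one plausible reason for a minimum-examples threshold.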

Use the following to save a summary of the final dataset:

TOKENIZERS_PARALLELISM=false python -m spacy debug data config.cfg --ignore-warnings --verbose --no-format --paths.train train.spacy --paths.dev test.spacy > data-summary.txt

To get more detail on the span labels, you may have to switch the spancat component in config.cfg from the spancat_singlelabel factory to the spancat factory.
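The factory switch can be made by hand in config.cfg. As a minimal sketch, the helper below does the same thing on the config text, assuming the component is declared with a line of the form `factory = "spancat_singlelabel"`. The function name `use_spancat_factory`, the sample config text, and the `spans_key` value are hypothetical. Other settings in the component block may also need adjusting when the factory changes.

```python
import re


def use_spancat_factory(config_text: str) -> str:
    """Return the config text with spancat_singlelabel replaced by spancat.

    Only `factory = "spancat_singlelabel"` lines are rewritten; every other
    line is left untouched.
    """
    return re.sub(
        r'^(\s*factory\s*=\s*)"spancat_singlelabel"',
        r'\1"spancat"',
        config_text,
        flags=re.MULTILINE,
    )


if __name__ == "__main__":
    # Hypothetical excerpt; the real config.cfg will contain more settings.
    example = (
        "[components.spancat]\n"
        'factory = "spancat_singlelabel"\n'
        'spans_key = "sc"\n'
    )
    print(use_spancat_factory(example))
```

Working on a copy of the config keeps the training setup unchanged while the summary is generated.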


rynoV/CPSC-599-NLP-project
