DCS full master - SQLite (CoNLL-U 2026-03-05)

gasyoun released this 06 Jun 20:24
· 6 commits to main since this release

Full Digital Corpus of Sanskrit (DCS) as a queryable SQLite master, built by the M1-M7 pipeline in src/DCS-data-2026/ from gasyoun/dcs-conllu @ 04e0778 (2026-03-05).

Contents: 270 texts / 754,726 sentences / 5,688,416 tokens / 98,606 attested lemmas / 74 treebank texts.

Validated: cross-walk 0 mismatches, coverage matches upstream.

Tables: text / chapter / sentence / token (flatten-all UD FEATS+MISC) / mwt / lemma / provenance.

Use: gunzip dcs_full.sqlite.gz ; sqlite3 dcs_full.sqlite

Regenerate: python src/DCS-data-2026/import_dcs_conllu.py --all --db dcs_full.sqlite

Size: 287 MB gzipped (~920 MB uncompressed).

License: CC BY 4.0 (Oliver Hellwig / DCS).
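For a first sanity check from Python rather than the sqlite3 shell, the following is a minimal sketch that unpacks the archive and prints a row count per table. It assumes only the two file names given in these notes. The helper names decompress and table_row_counts are hypothetical and not part of the pipeline. Because the column layout is not described here, the sketch reads table names from the database itself instead of guessing at columns.

```python
import gzip
import shutil
import sqlite3
from pathlib import Path


def decompress(gz_path, db_path):
    """Stream-decompress the .gz archive to a plain file, leaving the archive in place."""
    with gzip.open(gz_path, "rb") as src, open(db_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table listed in sqlite_master."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    # Names come from the database's own catalog; quote them as identifiers.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


if __name__ == "__main__":
    # File names as given in the release notes; nothing happens if neither exists.
    gz, db = Path("dcs_full.sqlite.gz"), Path("dcs_full.sqlite")
    if not db.exists() and gz.exists():
        decompress(gz, db)
    if db.exists():
        conn = sqlite3.connect(db)
        try:
            for name, count in table_row_counts(conn).items():
                print(f"{name}: {count}")
        finally:
            conn.close()
```

If the sentence and token tables hold one row per sentence and per token, their counts should line up with the totals listed in the notes, though the notes do not state that layout explicitly.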