-
Get the tags inventory, in order to replace the HTML tags.
-
Check the validity of files as XML files.
-
Check the validity of files as TEI files.
-
Get the inventory of all the characters in `body` elements.
-
Split the words.
-
Generate the index (full text, titles, ??) (maybe ask https://michaelmeyer.fr/ for a lexicon?).
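The two inventory tasks above (tags and characters in `body` elements) could be sketched as follows. This is only a minimal sketch using Python's standard library; the function names are hypothetical, and it assumes the files use the TEI namespace `http://www.tei-c.org/ns/1.0` that appears elsewhere in these notes.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"  # namespace seen in these notes


def tag_inventory(root):
    """Count every element tag (in {namespace}local form) under root."""
    # Comments and processing instructions have non-string tags; skip them.
    return Counter(el.tag for el in root.iter() if isinstance(el.tag, str))


def body_character_inventory(root):
    """Count every character found in the text content of body elements."""
    chars = Counter()
    # Assumption: body elements are not nested; nested ones would be counted twice.
    for body in root.iter(f"{{{TEI_NS}}}body"):
        chars.update("".join(body.itertext()))
    return chars
```

For a file on disk, `ET.parse(path).getroot()` gives the `root` argument. The tag counts show which leftover HTML tags still need replacing, and the character counts help spot unexpected delimiters.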
-
Fixed some duplicated `@xml:id` values within the same document.
-
For the `@xml:id` attributes that were invalid NCNames, I made the following replacements: `,` with `_comma_`; `*` with `_asterisk_`; `@` with `_at-sign_`; `:` with `_colon_`; `[` with `_left-square-bracket_`; `]` with `_right-square-bracket_`; `|` with `_pipeline_`; `/` with `_forward-slash_`; `(` with `_left-parenthesis_`; `)` with `_right-parenthesis_`; space with nothing.
-
Replaced the `{http://www.tei-c.org/ns/1.0}NOTE` tag with `{http://www.tei-c.org/ns/1.0}note`.
-
Replaced the `{http://www.tei-c.org/ns/1.0}p` tag with `{http://www.tei-c.org/ns/1.0}lg`.
-
Added correct markers for divisions in the file `sa_viSNupurANa-crit.xml`.
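The `@xml:id` replacements listed above can be expressed as a single translation table. This is a sketch, not necessarily how the fix was actually made: the function name is hypothetical, and the mapping is copied from the list in these notes.

```python
# Replacement table copied from the notes; the names here are hypothetical.
XML_ID_REPLACEMENTS = {
    ",": "_comma_",
    "*": "_asterisk_",
    "@": "_at-sign_",
    ":": "_colon_",
    "[": "_left-square-bracket_",
    "]": "_right-square-bracket_",
    "|": "_pipeline_",
    "/": "_forward-slash_",
    "(": "_left-parenthesis_",
    ")": "_right-parenthesis_",
    " ": "",  # spaces are simply dropped
}

_TABLE = str.maketrans(XML_ID_REPLACEMENTS)


def sanitize_xml_id(value: str) -> str:
    """Apply the character replacements to one xml:id value."""
    return value.translate(_TABLE)
```

The sketch only covers the characters listed above. It does not check other NCName rules, such as the rule that a name cannot start with a digit, so a separate check may still be needed.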
-
Further segmentation for the contents of `lg` elements that contain the delimiter `|` (see file `sa_viSNupurANa-crit.xml`).
-
There are `lg` elements without `@xml:id`.
-
Check that `@corresp` values point to existing IDs.
-
For some files, the word separator is the full stop; see `mahān.mahī.astabhayad.`.
-
For some files, should the `p` element be kept?
-
For the file `sa_bhartRhari-vAkyapadIya.xml`, some `note` elements are, in fact, `head` elements, but no text divisions are set, and I think there have to be divisions.
-
The file `sa_kiraNatantra1-6.xml` is very poorly segmented.
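Several of the points above (duplicated `@xml:id` values, `lg` elements without `@xml:id`, `@corresp` pointing to missing IDs) could be checked in one pass. The following is a rough sketch using the standard library. The function name is hypothetical, and it assumes that `@corresp` holds space-separated local pointers of the form `#id`; pointers into other documents are ignored.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id


def audit_ids(root):
    """Report duplicate xml:id values, lg elements lacking xml:id, and dangling corresp."""
    ids = [el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None]
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
    lg_without_id = sum(
        1 for el in root.iter(f"{{{TEI_NS}}}lg") if el.get(XML_ID) is None
    )
    known = set(ids)
    dangling = []
    for el in root.iter():
        for target in (el.get("corresp") or "").split():
            # Assumption: only local "#id" pointers are checked.
            if target.startswith("#") and target[1:] not in known:
                dangling.append(target)
    return {
        "duplicates": duplicates,
        "lg_without_id": lg_without_id,
        "dangling_corresp": dangling,
    }
```

Running this per file gives a short report that can be rechecked after each round of fixes.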
-
The XML files have to be valid.
-
The TEI files have to be valid according to the TEI schema.
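The first requirement can be checked with the standard library alone. A minimal sketch, with a hypothetical function name; strictly speaking it tests well-formedness, which is what a parser without a schema can verify.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text: str):
    """Return None if xml_text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None
```

Validation against the TEI schema is a separate step that needs a schema-aware validator (for example a RELAX NG validator loaded with the TEI schema) and is not covered by this sketch.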
SwiftLaTeX, a WYSIWYG Browser-based LaTeX Editor - maybe this can help with generation in the browser of a LaTeX file from a TEI file.
MuPDF WASM - maybe this can help with generation in the browser of a PDF file from a TEI file.
xml_schema_generator - for generation of XML schema from XML files.
Convert HTML to PDF using JavaScript
How To Convert HTML to PDF using JavaScript
Sebastian Nehrdich via INDOLOGY, 2024.11.03:
For those of you who want to use the dharmamitra Sanskrit grammatical capabilities that Oliver Hellwig and I published recently there is now a simple python package that calls our API: https://pypi.org/project/dharmamitra-sanskrit-grammar/
Please be aware that we might need to rate-limit the API so in case you have trouble to access or need more volume, feel free to reach out to me.
Also, for those of you who use emacs there is now a simple dharmamitra emacs extension that integrates translation and grammatical analysis capabilities into emacs: https://github.com/dharmamitra/dharmamitra-emacs
If you want to learn more about how the grammar model works, our preprint (which is more or less identical with the submitted version of the paper) is available here: https://arxiv.org/abs/2409.13920