Corrections to Burnouf IAST #420

funderburkjim · 2018-09-27T20:04:45Z

In the review of Sanskrit coding conventions, it was noted (see):

However, the non-italic Sanskrit proper names have not been converted
to modern IAST; with @sanskritisampada 's help to identify the non-italic Sanskrit words, these will
also soon be converted to modern IAST.

This work has now been done. This issue aims to provide some documentation.

funderburkjim · 2018-09-27T20:24:38Z

plainwords

The files mentioned are in this Burnouf/iastwork directory.

There are some Sanskrit words in Burnouf that appear in plain (non-italic) text.
These have not yet been converted to standard IAST; they remain in
Burnouf's own transliteration. We want to identify these words and then change
their spelling to standard IAST for Sanskrit.

The first step is to generate a list of all 'words' that appear in plain text in the Burnouf digitization.
There are about 20000 distinct such words identified.
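As a rough illustration of this first step, the sketch below counts the distinct plain-text words in a string. It is not the program actually used. It assumes, purely for illustration, that italic text is wrapped in `<i>...</i>` tags; the real Burnouf digitization may mark italics differently, and the name `plain_word_counts` is hypothetical.

```python
import re
import unicodedata
from collections import Counter

# Assumption: italic spans look like <i>...</i>. Adjust to the real markup.
ITALIC = re.compile(r"<i>.*?</i>", re.DOTALL)
# A 'word' is a run of Unicode letters, optionally joined by ' or -.
WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")

def plain_word_counts(text):
    """Return a Counter of the words that occur outside italic spans."""
    # NFC keeps letters with diacritics as single code points.
    text = unicodedata.normalize("NFC", text)
    plain = ITALIC.sub(" ", text)
    return Counter(WORD.findall(plain))

sample = "Le dieu <i>kṛṣṇa</i> Crishna, le dieu."
counts = plain_word_counts(sample)
for word, n in sorted(counts.items()):
    print(word, n)
```

Words are counted case-sensitively here, so 'Le' and 'le' are distinct entries; whether the original list folded case is not stated.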

Of course, many of these words are French. Using the pyenchant python library, and a related dictionary of French words, we may identify many of the 20000 words as French.

Note on pyenchant:
Although still available and working with Python 2.7, this library is apparently no longer maintained. Here is the repository for it. It would be good to know whether there
is a replacement for this library that works with Python 3, since Python 2 will become obsolete in 2020.

This filter resulted in plainwords_french.txt (15202 words) and plainwords_other.txt (4843 words).

Each of these files shows a word on each line, along with how often it occurs (as a plain word) in the Burnouf digitization.

The program using pyenchant is fr_pyenchant.py.
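The following is a minimal sketch of this kind of filter, not a copy of fr_pyenchant.py. It splits a word-to-frequency mapping into 'French' and 'other' using any yes/no checker. The helper `enchant_french_checker` uses pyenchant's `enchant.Dict(...).check(...)` and assumes that pyenchant and a French dictionary with the tag `fr_FR` are installed; the function names and the sample data are hypothetical.

```python
def split_by_french(counts, is_french):
    """Split {word: frequency} into (french, other) using is_french(word)."""
    french, other = {}, {}
    for word, n in counts.items():
        target = french if is_french(word) else other
        target[word] = n
    return french, other

def enchant_french_checker(tag="fr_FR"):
    """Return a checker backed by pyenchant (assumes the dictionary exists)."""
    import enchant  # imported lazily so the rest runs without pyenchant
    dictionary = enchant.Dict(tag)
    return dictionary.check

# Stand-in checker so the sketch runs without pyenchant installed.
known_french = {"dieu", "le", "arbre"}
sample_counts = {"dieu": 12, "arbre": 3, "Crishna": 3, "Vichnu": 7}
french, other = split_by_french(sample_counts, lambda w: w.lower() in known_french)
print(sorted(french), sorted(other))
```

Passing the checker in as a parameter makes it easy to swap pyenchant for another spell-checking library.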

funderburkjim · 2018-09-27T20:54:50Z

Initial work

An HTML file (plainwords_other.html) was prepared to help provide context for the 4843 'other' words.

At this point, the task was turned over to @sanskritisampada. Her goal was to mark the French words (with an 'F') and the Sanskrit words (with an 'S') in the plainwords_other list.

Even with context, this is a difficult task, partly because of the kinds of words the Burnouf dictionary contains:

cognate words in many languages
scientific (Latinate) names of plants and animals
modern versions of place names
probable French words borrowed from Sanskrit
probably other word categories not yet noticed

funderburkjim · 2018-09-27T21:21:27Z

Google word detection tool

Sampada reported that this word identification was quite slow-going. This prompted a search for
ways to speed the process, and somewhere along the line I became aware of the language detection
functionality of Google Translate. In particular, there is a Python API, as described [here].

This was adapted for the current purposes in the sample_detect.py program.

After merging with what had been done thus far, the result was burnouf_sampada_detect.txt.
Note that each line now has, in addition to the word and its frequency:

a placeholder for the word identification
the language according to the Google language detection tool
a confidence number, also provided by the language detection tool

Interestingly, even though the language detection is often quite odd, Sampada found it sped
up the process of identification.
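To make the record layout concrete, here is a small sketch that builds one record per word from a frequency mapping and a language detector. It is not sample_detect.py. The detector is passed in as a plain function returning `(language, confidence)`, so no particular Google client library is assumed, and the colon-separated output format is an assumption made only for this example.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple

class Record(NamedTuple):
    word: str
    count: int
    mark: str          # placeholder for the identification: '', 'F', 'S' or 'P'
    lang: str          # language code reported by the detector
    confidence: float  # confidence reported by the detector

def annotate(counts: Dict[str, int],
             detect: Callable[[str], Tuple[str, float]]) -> List[Record]:
    """Attach a detected language and confidence to every word."""
    records = []
    for word, n in sorted(counts.items()):
        lang, confidence = detect(word)
        records.append(Record(word, n, "", lang, confidence))
    return records

def format_record(r: Record) -> str:
    """Hypothetical one-line text form of a record."""
    return f"{r.word}:{r.count}:{r.mark}:{r.lang}:{r.confidence:.2f}"

# Stand-in detector; a real one would call the language detection service.
def fake_detect(word: str) -> Tuple[str, float]:
    return ("fr", 0.5) if word.islower() else ("hi", 0.25)

records = annotate({"dieu": 12, "Vichnu": 7}, fake_detect)
lines = [format_record(r) for r in records]
print("\n".join(lines))
```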

The end result of the identifications thus far is in burnouf_sampada_detect_all.txt, with

1525 words marked as French
968 marked as Sanskrit
86 marked as place names ('P')

All in all, 2579 words were marked, and 2264 remain unmarked.
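The reported counts can be cross-checked with a few lines of arithmetic. The sketch below also includes a small tally helper of the kind one might run over the mark column. The helper name `tally_marks` and the idea of reading the marks as a list of strings are assumptions for illustration, since the exact file layout is not described here.

```python
from collections import Counter

def tally_marks(marks):
    """Count marks 'F', 'S', 'P'; anything else counts as unmarked ('')."""
    return Counter(m if m in ("F", "S", "P") else "" for m in marks)

# Counts as reported for burnouf_sampada_detect_all.txt.
reported = {"F": 1525, "S": 968, "P": 86}
unmarked = 2264

marked = sum(reported.values())
total = marked + unmarked
print(marked, total)

# Tiny usage example of the tally helper on made-up marks.
example = tally_marks(["F", "S", "", "F", "P", "?"])
print(example)
```

The total agrees with the 4843 words listed for plainwords_other.txt.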

funderburkjim · 2018-09-27T21:24:56Z

French corrections

During the process of marking, Sampada identified many spelling corrections for French words.
With some editing, these were converted into about 270 digitization correction transactions.

funderburkjim · 2018-09-27T21:41:13Z

Sanskrit corrections and markup

The plainwords identified as Sanskrit were examined with regard to their spelling correctness in light of
modern IAST spelling conventions. As mentioned in the discussion of Burnouf's use of diacritics in representing Sanskrit words, many of these conventions differ from the modern IAST conventions.
Spelling changes were made so that the resulting digitization uses modern IAST spellings for these non-italic Sanskrit words.

After these modernization changes, the resulting Sanskrit words were converted to SLP1 and compared to the spellings of headwords in the Monier-Williams dictionary. This resulted in several spelling
corrections (for instance, 'Crishna' was changed to 'Kṛṣṇa' in 3 places).

The identified Sanskrit words, whether needing correction or not, were entered in a form which maintains their identification as Sanskrit words:

<s1 slp1="tretAyuga">Tretāyuga</s1>

This markup form had previously been used for a similar purpose in the revision to the MW digitization.
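The conversion and markup steps can be sketched as follows. This is an illustrative, simplified IAST-to-SLP1 converter, not the tool actually used: it lowercases the word, matches two-letter units (aspirates, ai, au) before single letters, and does not handle hiatus such as a separate a + i. The function names and the tiny headword set are hypothetical.

```python
import unicodedata

_TABLE = {
    "ai": "E", "au": "O",
    "kh": "K", "gh": "G", "ch": "C", "jh": "J",
    "ṭh": "W", "ḍh": "Q", "th": "T", "dh": "D", "ph": "P", "bh": "B",
    "ā": "A", "ī": "I", "ū": "U", "ṛ": "f", "ṝ": "F", "ḷ": "x", "ḹ": "X",
    "ṃ": "M", "ḥ": "H", "ṅ": "N", "ñ": "Y", "ṭ": "w", "ḍ": "q", "ṇ": "R",
    "ś": "S", "ṣ": "z",
}
# Normalize keys so the lookup works however this file is encoded.
IAST_TO_SLP1 = {unicodedata.normalize("NFC", k): v for k, v in _TABLE.items()}

def iast_to_slp1(word):
    """Convert an IAST word to lowercase-based SLP1 (simplified)."""
    s = unicodedata.normalize("NFC", word).lower()
    out, i = [], 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in IAST_TO_SLP1:           # two-letter unit first
            out.append(IAST_TO_SLP1[pair])
            i += 2
        else:                              # single letter, or pass through
            out.append(IAST_TO_SLP1.get(s[i], s[i]))
            i += 1
    return "".join(out)

def mark_sanskrit(word):
    """Wrap a word in the <s1 slp1="...">...</s1> form shown above."""
    return '<s1 slp1="%s">%s</s1>' % (iast_to_slp1(word), word)

def not_in_headwords(words, headwords):
    """Return the words whose SLP1 spelling is not a known headword."""
    return [w for w in words if iast_to_slp1(w) not in headwords]

print(mark_sanskrit("Tretāyuga"))
print(not_in_headwords(["Kṛṣṇa", "Crishna"], {"kfzRa"}))
```

Words reported by `not_in_headwords` are only candidates for correction; a proper name can be correctly spelled and still be absent from the headword list.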

All the Sanskrit plain word digitization changes are present in the manualByLine_sancorr.txt file.

gasyoun · 2018-09-29T22:42:56Z

'Crishna' was changed to 'Kṛṣṇa'

This is amazing.

drdhaval2785 · 2020-12-18T05:42:27Z

@funderburkjim Can the IAST conversion in BUR be treated as over?

All in all, about 2579 were marked, and 2264 remain unmarked.

This line stopped me from pressing the close button.

funderburkjim · 2020-12-19T21:08:56Z

There appears to be more that could be done to improve Burnouf, starting with further
examination based on burnouf_sampada_detect_all.txt.

gasyoun · 2020-12-19T21:29:03Z

There appears to be more that could be done to improve Burnouf

Let a French Sanskrit scholar be born and finalize it.

sanskritisampada · 2020-12-19T21:44:15Z

Perhaps I could contribute further after the AP 90 task is complete.


gasyoun · 2020-12-19T22:08:02Z

I could contribute further after the AP 90 task is complete.

You're a true miracle, Sampada.

funderburkjim · 2020-12-19T23:00:24Z

@sanskritisampada Good idea

funderburkjim added enhancement Documentation labels Sep 27, 2018

funderburkjim added a commit that referenced this issue Sep 27, 2018

Burnouf iast changes. See #420

5fa65c1

drdhaval2785 mentioned this issue Dec 20, 2020

todo list in 2021 (in descending order of importance) sanskrit-lexicon/COLOGNE#325

Open
