Corrections to Burnouf IAST #420
plainwords
The files mentioned are in the Burnouf/iastwork directory. There are some Sanskrit words in Burnouf that appear in plain text, in IAST. The first step is to generate a list of all 'words' that appear in plain text in the Burnouf digitization. Of course, many of these words are French. Using the pyenchant Python library and a related dictionary of French words, we may identify many of the roughly 20000 words as French.
This filter resulted in plainwords_french.txt (15202 words) and plainwords_other.txt (4843 words). Each of these files shows one word per line, along with how often it occurs (as a plain word) in the Burnouf digitization. The program using pyenchant is fr_pyenchant.py.
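The split described above can be sketched as follows. This is a minimal illustration, not the actual fr_pyenchant.py: the function name `split_by_french`, the sample sentence, the stand-in French word set and the space-separated output are all assumptions. With pyenchant and a French dictionary installed, `enchant.Dict("fr_FR").check` could be passed as the `is_french` predicate instead.

```python
from collections import Counter
from typing import Callable, Dict, Tuple


def split_by_french(
    counts: Dict[str, int], is_french: Callable[[str], bool]
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Partition word counts into (french, other) using a spell-check predicate."""
    french: Dict[str, int] = {}
    other: Dict[str, int] = {}
    for word, n in counts.items():
        target = french if is_french(word) else other
        target[word] = n
    return french, other


# Hypothetical stand-in for a real dictionary lookup, so the sketch runs anywhere.
# With pyenchant one might instead use: is_french = enchant.Dict("fr_FR").check
SAMPLE_FRENCH = {"le", "feu", "qui", "brûle"}

# Hypothetical plain-text sample; the real input is the Burnouf digitization.
counts = Counter("le feu agni qui brûle le soma".split())
french, other = split_by_french(counts, lambda w: w.lower() in SAMPLE_FRENCH)

# One word per line with its count (the separator is an assumption).
for word, n in sorted(other.items()):
    print(f"{word} {n}")
```

Passing the checker in as a function keeps the counting logic independent of which spelling dictionary is available.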
Initial work
An HTML file (plainwords_other.html) was prepared to provide context for the 4843 'other' words. At this point, the task was turned over to @sanskritisampada. Her goal was to mark the French words (with an 'F') and the Sanskrit words (with an 'S') in the list of plainwords_other words. Even with context, this is a difficult task, partly due to the nature of the Burnouf dictionary:
Google word detection tool
Sampada reported that this word identification was quite slow-going. This prompted a search for
This was adapted for the current purposes in the sample_detect.py program. After merging with what had been done thus far, the result was burnouf_sampada_detect.txt.
Interestingly, even though the language detection is often quite odd, Sampada found it sped
The end result of the identifications thus far is in burnouf_sampada_detect_all.txt, with
All in all, 2579 words were marked, and 2264 remain unmarked.
French corrections
During the process of marking, Sampada identified many spelling corrections for French words.
Sanskrit corrections and markup
The plainwords identified as Sanskrit were examined with regard to their spelling correctness in light of
After such modernization changes, the resulting Sanskrit words were converted to SLP1 and compared to the spellings of headwords in the Monier-Williams dictionary. This resulted in several corrections
The identified Sanskrit words, whether needing correction or not, were entered in a form which maintains their identification as Sanskrit words:
This markup form had previously been used for a similar purpose in the revision to the MW digitization. All the Sanskrit plain word digitization changes are present in the manualByLine_sancorr.txt file.
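The SLP1 comparison step can be sketched roughly as below. This is only an illustration under stated assumptions, not the program actually used: the names `iast_to_slp1` and `not_in_headwords` and the three-word headword sample are hypothetical, and the real check would load the Monier-Williams headword list. The table follows the standard IAST and SLP1 correspondences, and digraphs such as `kh` or `ai` are matched greedily, so a genuine hiatus like a + i would be misread.

```python
import unicodedata

# IAST -> SLP1 correspondences; two-letter sequences are tried before single letters.
_RAW_MAP = {
    "ai": "E", "au": "O",
    "kh": "K", "gh": "G", "ch": "C", "jh": "J",
    "ṭh": "W", "ḍh": "Q", "th": "T", "dh": "D", "ph": "P", "bh": "B",
    "ā": "A", "ī": "I", "ū": "U", "ṛ": "f", "ṝ": "F", "ḷ": "x", "ḹ": "X",
    "ṃ": "M", "ḥ": "H", "ṅ": "N", "ñ": "Y",
    "ṭ": "w", "ḍ": "q", "ṇ": "R", "ś": "S", "ṣ": "z",
}
# Normalize keys so precomposed and decomposed diacritics compare equal.
IAST_TO_SLP1 = {unicodedata.normalize("NFC", k): v for k, v in _RAW_MAP.items()}


def iast_to_slp1(word: str) -> str:
    """Transliterate a lowercase IAST word to SLP1 by greedy longest match."""
    text = unicodedata.normalize("NFC", word.lower())
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in IAST_TO_SLP1:
            out.append(IAST_TO_SLP1[pair])
            i += 2
        else:
            ch = text[i]
            # Letters shared by both schemes (a, k, m, ...) pass through unchanged.
            out.append(IAST_TO_SLP1.get(ch, ch))
            i += 1
    return "".join(out)


def not_in_headwords(iast_words, headwords_slp1):
    """Return the IAST words whose SLP1 form is absent from the headword set."""
    return [w for w in iast_words if iast_to_slp1(w) not in headwords_slp1]


# Hypothetical stand-in for the Monier-Williams headword list (SLP1 spellings).
MW_SAMPLE = {"kfzRa", "Darma", "SAstra"}

print(not_in_headwords(["kṛṣṇa", "dharmma", "śāstra"], MW_SAMPLE))
```

A word reported by `not_in_headwords` is only a candidate for correction; inflected forms and compounds may be spelled correctly yet not appear as headwords.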
This is amazing.
@funderburkjim Can the IAST conversion in BUR be treated as finished?
This line stopped me from pressing the close button.
There appears to be more that could be done to improve Burnouf, starting with further
Let a French Sanskrit scholar be born and finalize it.
Perhaps I could contribute further after the AP 90 task is complete.
On Sat, 19 Dec 2020, 22:29 Mārcis Gasūns wrote:
There appears to be more that could be done to improve Burnouf
Let a French Sanskrit scholar be born and finalize it.
You're a true miracle, Sampada.
@sanskritisampada Good idea
In the review of Sanskrit coding conventions, it was noticed (see):
This work has now been done. This issue aims to provide some documentation.