o vs O Corrections in PWG, Part 1 #130

Closed
zaaf2 opened this issue Oct 17, 2015 · 135 comments

@zaaf2

zaaf2 commented Oct 17, 2015

This issue is about an analysis of the data contained in the file http://drdhaval2785.github.io/o_vs_O/output1/PWG.html,
generated by the o_vs_O method of highest probability (one dictionary in first word and more dictionaries in second word), as applied to PWG.
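One possible reading of the selection rule, sketched in Python. This is an assumption inferred from the pairs discussed in this thread (for example dAvikAkUla ― dAvikakUla), not the actual o_vs_O code: two SLP1 headwords are paired when they differ only in letter case, the first form is attested in the target dictionary alone, and the second is attested in more than one dictionary. The function name and the sample attestations are hypothetical.

```python
from collections import defaultdict

def case_variant_pairs(headwords_by_dict, target):
    """Return (suspect, candidate) pairs for the `target` dictionary.

    headwords_by_dict maps a dictionary code to a set of SLP1 headwords.
    """
    # Record which dictionaries attest each spelling.
    attested = defaultdict(set)
    for code, words in headwords_by_dict.items():
        for w in words:
            attested[w].add(code)
    # Group spellings that become identical once letter case is ignored.
    groups = defaultdict(set)
    for w in attested:
        groups[w.lower()].add(w)
    pairs = []
    for forms in groups.values():
        for suspect in forms:
            if attested[suspect] != {target}:
                continue  # the suspect must occur in the target dictionary only
            for candidate in forms:
                if candidate != suspect and len(attested[candidate]) > 1:
                    pairs.append((suspect, candidate))
    return sorted(pairs)

# Made-up attestations, for illustration only (not real dictionary data).
sample = {
    "PWG": {"kftakama", "keSara"},
    "MW": {"kftakAma", "keSara", "kesara"},
    "PW": {"kftakAma", "kesara"},
}
print(case_variant_pairs(sample, "PWG"))
```

Every pair produced this way is only a candidate: as the cases below show, it may be an OCR error, a print error, an accepted variant, or a different word altogether.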

    1. अपराःणक → अपराह्णक

OCR error.

image

@zaaf2
Author

zaaf2 commented Oct 17, 2015

    2. असुक → असूक

False positive.
PWG has:
image

(…)

image

असूक in AP has another origin and meaning: “असूक a. See असूयक.”

@zaaf2
Author

zaaf2 commented Oct 17, 2015

    3. आच्यदोह → आच्यादोह

Factual error:
image

As mentioned by PW, the Tandya Brahmana 21,2,5 has आच्यादोह :
image

According to MW आच्या in आच्यादोह (with ā) comes from the Vedic ind. p. (aka gerund) of आच् (< आ-√अच्), instead of the regular ind. p. with ă आच्य. Here the MW screenshot:

image

Regarding the Vedic ind.p., v. MacDonell, A Vedic Grammar for Students:
image

@zaaf2
Author

zaaf2 commented Oct 17, 2015

    4. आतीषादीय → आतीषादीय

Factual error.
As mentioned by PW, the Tandya Brahmana 12,11,15 has आतीषादीय :

image

@zaaf2
Author

zaaf2 commented Oct 17, 2015

    5. आषाडी → आषाढी

Factual error, corrected by PWG itself in the section „Verbesserungen und Nachträge“ (vol. 5):

आषाडी [L=9704] [p= 1-0728] (°ढी?) f. N. pr. einer Localität R. 4, 27, 11.

आषाडी [L=67037] [p= 5-1128] zu streichen, da an der angeführten Stelle आषाढी in der gangbaren Bed. zu lesen ist.

@gasyoun
Member

gasyoun commented Oct 17, 2015

@funderburkjim is Devanagari OK with you?
@zaaf2 amazing work! Wonder how many thousands of mistakes are already documented in the „Verbesserungen und Nachträge“ part. Similar work was integrated only in MW. Never in PWG or any other.

@drdhaval2785
Contributor

drdhaval2785 commented Oct 18, 2015 via email

@zaaf2
Author

zaaf2 commented Oct 18, 2015

@drdhaval2785 I should have mentioned in the opening of this issue that its object is an analysis of the data contained in the file http://drdhaval2785.github.io/o_vs_O/output1/PWG.html,
generated by the o_vs_O method of highest probability (one dictionary in first word and more dictionaries in second word), as applied to PWG.
I will now edit the opening of this issue to correct this.

@zaaf2
Author

zaaf2 commented Oct 18, 2015

    6. इंकार → ईंकार

False positive.

image

Just another way to write इङ्कार, as in MW:
इङ्कार [p= 164] : and इङ्-कृत = हिङ्-कार, हिङ्कृत q.v. [L=28636]

@zaaf2
Author

zaaf2 commented Oct 18, 2015

    7. इन्दुकेशरिन् → इन्दुकेसरिन्

False positive.

image

केशरिन् and केसरिन् are alternative forms of the same word. Cf. MW:

केशरिन् [p= 311] : mfn. having a mane MBh. i, iii [L=56139]; m. (ई) a lion MBh. Suṡr. Bhartṛ. &c [L=56140]
केसरिन् [p= 311] : mfn. having a mane MBh. i, iii [L=56151] m. a lion MBh. Suṡr. Bhartṛ. &c [L=56152]

@zaaf2
Author

zaaf2 commented Oct 18, 2015

    8. इलाय् → ईलय्

False positive

PWG:
image

MW:
एलाय [p= 232] : Nom. P. एलायति, to be wanton or playful, be merry. [L=40174]
ईल् [p= 170] : Caus. P. ईलयति, to move TS. vi, 4, 2, 6 (cf. ईर्, Caus.) [L=29820]

SCH:
īlay [L=7805] [p= 107-2], īláyati sich bewegen , TS. 6 , 4 , 2 , 6. Vgl. Kaus. von īr. -- Auch: von der Stelle bewegen , Āpast. Śr. 1 , 16 , 11. 4

@zaaf2
Author

zaaf2 commented Oct 18, 2015

    9. उत्पलवती ― उत्पलावती

Insufficient elements to reach a conclusion.

PWG:
image

MW:
उत्पला-वती [p= 181] : f. N. of a river MBh. [L=31710]; f. of an अप्सरस्. [L=31711]
PW:
उत्पलावती [L=18409] [p= 1225-1] f. N.pr. eines Flusses Mbh.6,342. = ताम्रपर्णी Gal.

@zaaf2
Author

zaaf2 commented Oct 18, 2015

    10. उपदेशसहस्री → उपदेशसाहस्री

Factual error. Śaṅkara’s work is called उपदेशसाहस्री

PWG:
image

MW:
उप-देश-साहस्री [p= 199] : f. N. of certain works. [L=34706]
साहस्र [p= 1212] : mf(ई, or आ)n. (fr. सहस्र) relating or belonging to a thousand, consisting of or bought with or paid for a thousand, thousand fold, exceedingly numerous, infinite VS. &c [L=243642]

VCP :
image

@zaaf2
Author

zaaf2 commented Oct 18, 2015

    11. उष्मपुर ― ऊष्मपुर

Acceptable alternative forms.

PWG:

image

MW:
उष्मन् [p= 220] : m. heat, ardour, steam Mn. MBh. Suṡr. &c (in many cases, where the initial उ is combined with a preceding अ, not to be distinguished from ऊष्मन् q.v.) [L=37852]
ऊष्मन् [p= 223] : m. ( √उष् cf. उष्मन्), heat, glow, ardour, hot vapour, steam, vapour AV. vi, 18, 3 VS. ṠBr. KātyṠr. BhP. (also figuratively said of passion or of money &c ) [L=38352]

@zaaf2
Author

zaaf2 commented Oct 18, 2015

    12. एकव्यवहारिक → एकव्यावहारिक

Factual error.
Already corrected in PWG in „Nachträge“ (vol. 7):
एकव्यवहारिक [L=119212] [p= 7-1722] (Nachträge), wohl °व्यावहारिक zu verbessern.

@zaaf2
Author

zaaf2 commented Oct 18, 2015

    13. एकशीति → एकाशीति

OCR error.

image

@zaaf2
Author

zaaf2 commented Oct 18, 2015

    14. ऐन्द्रवरुण → ऐन्द्रावारुण

Factual error. The change should include ऐन्द्रावरुण and ऐन्द्रावारुण (both forms incorrectly mentioned in PWG with first ă instead of ā)

PWG:
ऐन्द्रवरुण [L=69068] [p= 5-1223] adj. zu Indra und Varuṇa in Beziehung stehend Ait. [Page05.1224] Br. 6, 14. 25. °वारुण Pańḱav. Br. 8, 8, 6.

MW:
ऐन्द्रावरुण [p= 234] : mfn. relating to इन्द्र and वरुण AitBr. Vait. [L=40471]
ऐन्द्रावारुण [p= 234] : mfn. = ऐन्द्रावरुण above TāṇḍyaBr. [L=40473]

The Tandya Brahmana 8.8.6 has ऐन्द्रावारुण :
image

@zaaf2
Author

zaaf2 commented Oct 18, 2015

    15. कन्यकुमारि → कन्यकुमारी

Factual error.

PWG:
image

MW:
कन्य-कुमारी [p= 249] : f. N. of दुर्गा TĀr. [L=43075]
कन्या-कुमारी [p= 249] : f. = कन्य-कु°
कुमारि [p= 292] : (shortened for °री q.v. ; cf. Pāṇ. 6-3, 63)
कुमारी a [p= 292] : f. a young girl, one from ten to twelve years old, maiden, daughter AV. AitBr. &c [L=52291]

Taittiriya Aranyaka 10.1.7:
image

@zaaf2
Author

zaaf2 commented Oct 18, 2015

    16. करालकेशर ― करालकेसर

False positive.
केशर and केसर are alternative forms of the same word (cf. case 7).

PWG:
image

MW:
केशर [p= 310] : &c » केसर. [L=56028]

@zaaf2
Author

zaaf2 commented Oct 18, 2015

    17. कुडुहुञ्ची ― कुडूहुञ्ची

Insufficient elements to reach a conclusion.
PWG:
image

PW:
image
The mentioned work is in a manuscript edition:
image

MW:
कुडूहुञ्ची [p= 289] : f. (a Mahratti N. of) Solanum trilobatum Npr. [L=51760]

(Npr. = निघण्टुप्रकाश)

@zaaf2
Author

zaaf2 commented Oct 18, 2015

    18. कृतकम → कृतकाम

OCR error

image

@zaaf2
Author

zaaf2 commented Oct 18, 2015

    19. क्रोलायन → क्रौलायन

Factual error.
Although the form क्रोलायन is found in a manuscript (MS., v. MW), it is ungrammatical. The secondary suffix आयन, forming patronymics, requires vṛddhi-strengthening of the first syllable (cf. Whitney, Sanskrit Grammar, 1219).
PWG:
image

PW:
image

MW:
क्रौलायन a [p= 323] : m. patr. fr. क्रोल (for °ड) Pravar. (क्रोल्° MS.) [L=58522]
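
The vṛddhi argument made for this case can be illustrated with a small Python sketch over SLP1 strings. It is a deliberate simplification, not a full account of Sanskrit derivation: the vowel table covers only the common substitutions, and `patronymic_ayana` is a hypothetical helper that strengthens the first vowel, drops a final -a, and appends -Ayana.

```python
# Simplified vrddhi table for SLP1 vowels (an illustration, not a full grammar).
VRDDHI = {"a": "A", "A": "A", "i": "E", "I": "E", "e": "E", "E": "E",
          "u": "O", "U": "O", "o": "O", "O": "O", "f": "Ar", "F": "Ar"}

def vrddhi_first_vowel(stem):
    """Replace the first vowel of an SLP1 stem with its vrddhi grade."""
    for i, ch in enumerate(stem):
        if ch in VRDDHI:
            return stem[:i] + VRDDHI[ch] + stem[i + 1:]
    return stem

def patronymic_ayana(stem):
    """Hypothetical helper: strengthen the first vowel, drop a final -a, add -Ayana."""
    base = vrddhi_first_vowel(stem)
    if base.endswith("a"):
        base = base[:-1]
    return base + "Ayana"

print(patronymic_ayana("krola"))  # krOlAyana, i.e. क्रौलायन
```

Under these assumptions the stem krola can only yield krOlAyana, which is why the form with o looks irregular even though it is attested.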

@zaaf2
Author

zaaf2 commented Oct 18, 2015

    20. गृहचार → गृहाचार

OCR error.
image

@zaaf2
Author

zaaf2 commented Oct 18, 2015

    21. चतुःस्तन ― चतुःस्थान

False positive. Different words.

PWG:
image
चतुस्तन [L=24654] [p= 2-0935] (चतुर् + स्तन) adj. f. vierzitzig: गौः Çat. Br. 6, 5, 2, 18.
स्तन 1) die weibliche Brust, Zitze (bei Menschen und Thieren)

MW:
चतुः-स्थान [p= 384] : » चतु-स्°. [L=71129]
चतु-स्थान [p= 383] : mfn. having a fourfold basis Nār. i, 8. [L=71082]
स्तन [p= 1257] : m. (…) the female breast (either human or animal) , teat, dug, udder RV. &c [L=254308]

@gasyoun
Member

gasyoun commented Oct 19, 2015

@Shalu411 have you ever heard of a Marathi word like the one in 17?

@zaaf2
Author

zaaf2 commented Oct 19, 2015

Regarding case 19 (क्रोलायन → क्रौलायन), now I think it is better to preserve the reading क्रोलायन. It is an attested form, mentioned as such by MW and PW. I think it is important to preserve as much as possible the correspondence between the digital and the printed version, which should be treated as a historical document, with its imperfections and all.

@zaaf2
Author

zaaf2 commented Oct 19, 2015

    22. चतुरुषण ― चतुरूषण

Acceptable alternative form.
image

MW:
उषण [p= 220] : n. black pepper [L=37701]; n. the root of Piper Longum [L=37702]
ऊषण [p= 223] : n. black pepper Suṡr. [L=38346]

@zaaf2
Author

zaaf2 commented Oct 19, 2015

    23. चिलिचिमि ― चिलिचीमि

Alternative form, mentioned as such by PWG:
image
(I think each of these forms should be accessible as headwords)

MW:
चिलिचिम [p= 399] : m. a kind of fish Car. i, 25 Suṡr. i, 20, 3 and 8. [L=74362]
चिलिची°मि [p= 399] : m. id. L. Sch. » also चिलमीलिका. [L=74364]

@zaaf2
Author

zaaf2 commented Oct 19, 2015

    24. जम्बूनदमय ― जाम्बुनदमय

False positive. A form reported as incorrect by PWG itself:
image

BEN:
jAmbunadamaya [L=5388] [p= 0330-b] and jAmbUnadamaya [Page0331-a+ 39]; jâmbu¤10nada + maya, adj., f. yî, Golden, Pańch. 175, 8.

@zaaf2
Author

zaaf2 commented Nov 2, 2015

@gasyoun Re 92. I proposed “no change” following what was discussed above, about changing or not lexicographical errors. Of course it is much more probable that PW corrected a previous error in PWG. But I think our objective is not to correct the errors committed by the PWG author, but the errors made in spite of his intentions.

@zaaf2
Author

zaaf2 commented Nov 2, 2015

Re 748. dAvikAkUla ― dAvikakUla MW72,PWG (दाविकाकूल ― दाविककूल)
I am not so sure any more. PWG’s दाविककूल could be defended. कूल is neuter. But then there is Böhtlingk’s Pāṇini edition. A change here would be problematic. Better to leave it as it is. No change.

@zaaf2
Author

zaaf2 commented Nov 2, 2015

As could be observed at #131 (Re 247. niHzAmam -> niHzamam), there is an OCR error under PWG निःषम (due to the poor quality of the printed text):

दुःपमम् → दुःषमम्

PWG:

  • निःषम [L=40248] [p= 4-0255], (निस् + सम) P. 8, 3, 88. निःषमम् adv. gaṇa तिष्ठद्ग्वादि zu P. 2, 1, 17. = दुःपमम् zur Unzeit Ak. 3, 5, 14.

image

Pāṇini 8.3.88 (Böhtlingk’s edition, Leipzig 1887):
image

@gasyoun
Member

gasyoun commented Nov 3, 2015

On 748: I just had a talk with Sergey from Moscow. He said the same thing a few hours ago, but I could not post it earlier. Böhtlingk’s edition (Leipzig 1887) is a great source for comparison in those rare cases where it is quoted. I even have the original edition on my desk but, to my shame, did not open it. So "no change" is a bad idea.
dAvikakUla in MW72 is wrong, based on PWG. Böhtlingk has it right, and so does MW. This one has to be fixed, as the error is gross and well documented.
247. poor quality of the printed text = invisible 👍
@zaaf2, it's time to close Part 1 and start Part 2 before it gets too long.

@funderburkjim
Contributor

Re 46. लक्षणवादरहस्य ― लक्षणावादरहस्य

This was concluded to be a NO-CHANGE.

While not disagreeing with the choice, the thought occurs that we should consider the two spellings to be variants. Currently there is no provision in the dictionaries to handle variant spellings. If there were a system for identifying 'equivalent' spellings, this would be such a case.

@funderburkjim
Contributor

Re: 51. विचित्वरा ― विचित्वारा

The form of the record (having the parenthetical (विचित्वारा) following the headword) may be a pattern used in PWG to identify alternate spellings.

Everyone should realize that we are now applying to other dictionaries (PWG in this case) the kind of scrutiny that was applied to MW several years ago. One upshot of this scrutiny is that we see places where additional markup would help to expose (and therefore make usable) features of the dictionary. In particular, adding markup to identify alternate spellings, as here, would probably add to the utility of the dictionary.

To give an idea of what I mean by 'additional markup', here's a seat-of-the-pants possibility for additional markup in this case (I'm adding markup to a record of pwg.txt):

current record of pwg.txt:
<H1>000{vicitvarA}1{vicitvarA}¦ ({#vicitvArA#}) s. u. {#vijitvara#} .

possible additional markup:  put the author-identified variant spelling in an '<OR>' tag:
<H1>000{vicitvarA}1{vicitvarA}¦ (<OR>{#vicitvArA#}</OR>) s. u. {#vijitvara#} .

Note that only markup (XML-tags) has been added - the text has not been changed.

With such markup, programs could make use of the markup, for instance, to generate a list
of headwords INCLUDING VARIANTS. Perhaps such a list could replace pwghw2.txt.
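
A minimal sketch of such a program in Python, assuming records look like the sample above and that variants are wrapped as `<OR>{#...#}</OR>`. The function name and both regular expressions are illustrative guesses, not part of any existing tool.

```python
import re

# Matches the headword in a record such as <H1>000{vicitvarA}1{vicitvarA}¦ ...
HEAD_RE = re.compile(r"^<H\d>\d+\{([^}]*)\}")
# Matches a variant wrapped in the proposed <OR>{#...#}</OR> markup.
OR_RE = re.compile(r"<OR>\{#([^#]*)#\}</OR>")

def headwords_with_variants(record):
    """Return the headword followed by any author-identified variants."""
    m = HEAD_RE.match(record)
    if m is None:
        return []
    return [m.group(1)] + OR_RE.findall(record)

record = "<H1>000{vicitvarA}1{vicitvarA}¦ (<OR>{#vicitvArA#}</OR>) s. u. {#vijitvara#} ."
print(headwords_with_variants(record))  # ['vicitvarA', 'vicitvArA']
```

The cross-reference {#vijitvara#} is not picked up, because only text inside the `<OR>` tag counts as a variant.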

Just a thought.

@funderburkjim
Contributor

Re. 9. उत्पलवती ― उत्पलावती

Acc. to the Smith digitization of Mahabharata, utpalAvatim occurs at 06010033.

@gasyoun
Member

gasyoun commented Nov 3, 2015

It will remain just a thought if Jim is not around. But anyway, that's not a top priority.
Although it might increase the total number of possible words, 434k is quite impressive already.

@funderburkjim
Contributor

re '® is a markup for plants in PW.' @gasyoun is right. This was markup that Thomas put in the original digitization. This feature is documented in the 'pw-meta.txt' file, which is part of pwtxt.zip, one of the pw download items.

Incidentally, in MW this would be marked as <bot>xxxx</bot>. It could not do any harm to bring the markup conventions into greater consistency across the dictionaries.

@funderburkjim
Contributor

Regarding 66: makes sense. I agree with the change. I've come down from the ledge of 'OCR changes only'; thanks for talking me down before I jumped!

Am trying to think how to add markup to the digitization. Current idea is that the markup should be simple such as :

<pc old="OLD">NEW</pc>
'pc' == Print Change
meaning that the printed form was OLD, and we have changed to NEW.
The markup would be the same, regardless of the reason.

Such changes should also be documented in a file for each dictionary, the file being called something like pwg_printchange.txt. This is a more neutral-sounding name than 'corrections_factual'.

The displays can use the markup to provide a brief indication that the digitization intentionally differs from the print edition, and link to the printchange.txt file.

The printchange file can have the free form of current corrections_factual, and in particular
have links to relevant github issues (such as this #130 issue).

For cases where the change is to a headword, we could also take this into account via the hw2 file,
as mentioned for the <OR> suggested markup mentioned above.

The above sounds like it might have the virtues of

  • being simple enough to feasibly implement and maintain
  • being complete enough to allow a knowledgeable reader to be notified of print-digitization differences and be able to track down the reasoning behind a particular change.
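
The proposal above can be sketched in Python as follows. This assumes the `<pc old="OLD">NEW</pc>` form exactly as written; the helper names and the sample string are hypothetical, with the sample modelled on case 102 of this issue.

```python
import re

# Matches the proposed print-change markup: <pc old="OLD">NEW</pc>
PC_RE = re.compile(r'<pc old="([^"]*)">(.*?)</pc>')

def print_changes(text):
    """Return (old, new) pairs for every <pc> element in the text."""
    return PC_RE.findall(text)

def restore_print(text):
    """Undo the changes, giving back the reading of the printed edition."""
    return PC_RE.sub(lambda m: m.group(1), text)

# Made-up sample in SLP1, modelled on case 102 (printed form first).
sample = '{#<pc old="mUKadUzaRa">muKadUzaRa</pc>#}'
print(print_changes(sample))   # [('mUKadUzaRa', 'muKadUzaRa')]
print(restore_print(sample))   # {#mUKadUzaRa#}
```

A display could use `print_changes` to flag an entry and link to the printchange file, while `restore_print` shows that the printed reading stays recoverable from the digitization.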

@funderburkjim
Contributor

@zaaf2 Would you elaborate on your 'crowdsourcing' idea?

@gasyoun
Member

gasyoun commented Nov 4, 2015

How about pwg_printerrata.txt instead of pwg_printchange.txt?

@zaaf2
Author

zaaf2 commented Nov 4, 2015

Suggestion for crowdsourcing the work on @drdhaval2785's lists.
A MW List Display search for दाविकाकूल (case 748),for example, would result in a screen such as this:

image

In the next screen we would have something like this:

image

@funderburkjim
Contributor

@zaaf2 Such a well-presented suggestion! Would you transfer it to another issue, so that it may
remain under consideration when the corrections of this issue are installed?

@funderburkjim
Contributor

re केतसाप् ― केतसप् Also, there was a 'pad/pAd' similar case. I think it is normally true in PWG that the stem form is presented for nominals, as in MW. I wonder how prevalent it is to find that, as in these
sap/sAp and pad/pAd cases, PWG uses a nominative singular form as the headword citation form.

From 81, you've also identified 'vah/vAh' as a similar phenomenon. There you use the term 'strong form', which may be a better way to think of it than 'nominative singular'.

This is similar to the 'vat/vant' spelling variation.

So, maybe these can be tailored as additional alternate form spelling rules for hwnorm1.
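
A rough Python sketch of what such rules might look like. The actual rule format of hwnorm1 is not shown in this thread, so the rule table and function names below are purely hypothetical: each rule maps a strong-stem ending to the corresponding weak one, and two headwords count as equivalent when they normalize to the same string.

```python
import re

# Hypothetical rules: map a strong-stem ending to the corresponding weak one.
STRONG_TO_WEAK = [
    (re.compile(r"sAp$"), "sap"),
    (re.compile(r"pAd$"), "pad"),
    (re.compile(r"vAh$"), "vah"),
    (re.compile(r"vant$"), "vat"),
]

def normalize_headword(hw):
    """Apply the first matching ending rule to an SLP1 headword."""
    for pattern, weak in STRONG_TO_WEAK:
        if pattern.search(hw):
            return pattern.sub(weak, hw)
    return hw

def same_headword(a, b):
    """True when two spellings normalize to the same form."""
    return normalize_headword(a) == normalize_headword(b)

print(same_headword("ketasAp", "ketasap"))  # True
```

Purely suffix-based rules like these can also merge unrelated words that happen to share an ending, so any real rule set would need checking against the headword lists.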

@funderburkjim
Contributor

@zaaf2 re 102. मूखदूषण → मुखदूषण Should this be called a print error?

@funderburkjim
Contributor

@gasyoun re "How about pwg_printerrata.txt instead of pwg_printchange.txt?": I prefer the word
'change' to the word 'errata'. The word 'change' is descriptive of what we are doing (changing the printed edition in the digital edition). The word 'errata' seems more presumptuous.

@funderburkjim
Contributor

re 748. दाविककूल → दाविकाकूल I think this change should be made. This is a compound, the first element of which is the name of a river, such names being always(?) feminine, i.e. kA.

@zaaf2 Agree?

@funderburkjim
Contributor

@zaaf2 Here is my summary of the corrections to be made based on this issue.

Would you double check that I've interpreted things properly?

Then, I'll install the corrections.

@gasyoun
Member

gasyoun commented Nov 5, 2015

@zaaf2 Maybe Lexicographer errors instead of Lexicography errors?

@zaaf2
Author

zaaf2 commented Nov 5, 2015

re 748. दाविककूल → दाविकाकूल
Error in the PWG printed edition. I agree.

देविका f. is the name of the river. दाविक is the adjective, “(water) coming from the river देविका”. दाविकाकूल itself is also an adjective, “(rice etc.) coming from the banks (कूल) of the देविका”. I was not sure about the change because I thought the first member of the compound was the adj. दाविक, and I could not explain the second ā in दाविकाकूल. Now I see my doubt is unfounded. As one can see in the commentary to Pāṇini’s rule, the adj. दाविकाकूल comes directly from the Tatpuruṣa compound देविकाकूल n. (which may be translated as “bank of the देविका river”). When देविकाकूल as a whole is transformed into the adjective by an (absorbed) -a suffix (v. Whitney 1208.h), then the special rule in question takes effect, and दे- is changed to दा-, the rest of the word remaining unchanged.
image

@zaaf2
Author

zaaf2 commented Nov 5, 2015

@gasyoun I am not aware I used the expression Lexicography errors. Lexicographer errors? Perhaps Lexicographer’s errors would be better? I would go for Lexicographical errors. We say typographical errors, not typographer errors.

@zaaf2
Author

zaaf2 commented Nov 5, 2015

@funderburkjim re 102. मूखदूषण → मुखदूषण
Yes. Error in the printed PWG edition.
I mistakenly saw an OCR error.
image
There is no मूख.
MW:
मुख-दूषण [p= 819] : n. (L. ) (Bhpr. ) " mouth-defiler ", an onion. [L=164884]
मुख [p= 819] : n. (m. g. अर्धर्चा*दि ; ifc. f(आ, or ई). cf. Pāṇ. iv, 1, 54, 58) the mouth, face, countenance RV. &c , &c [L=164836]

@funderburkjim
Contributor

Re: 71. आज्ञाप्ति ― आज्ञप्ति No change. OCR error.

I think this should be changed as an "OCR error" (typo), since MW has AjYapti but not AjYApti.
@zaaf2 Agree?

@funderburkjim
Contributor

Corrections now installed.
pwg_printchange.txt also made part of this CORRECTIONS repository.
