Further <ls> cleanup #136

funderburkjim · 2022-07-08T03:17:49Z

This issue is devoted to continuing (from #134) the cleanup of ls markup in mw.txt.
Two fertile areas:

876 matches in 874 lines for "<ls[^<]* and"
Example: 
OLD
<ls>Mn. ix, 49 and 51.</ls>  
NEW
<ls>Mn. ix, 49</ls> and <ls n="Mn. ix,">51.</ls>

1098 matches in 1085 lines for "<ls[^<]*;"
Example:
OLD
<ls>Mn. iii, 257; v, 73</ls>
NEW
<ls>Mn. iii, 257</ls>; <ls n="Mn.">v, 73</ls>
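A minimal Python sketch of both steps: counting matches of the two search patterns quoted above, and mechanically splitting the two simple shapes shown in the examples. The splitting rules are assumptions inferred only from these two examples (for the "and" case the `n` value is everything up to the last comma; for the ";" case it is the first space-delimited token). Other shapes are left untouched, and even the matched ones should be reviewed before use.

```python
import re

# The two search patterns quoted in the issue.
AND_PAT = re.compile(r"<ls[^<]* and")
SEMI_PAT = re.compile(r"<ls[^<]*;")

# Plain <ls> elements without an n attribute (already-split ones are skipped).
LS_PLAIN = re.compile(r"<ls>([^<]*)</ls>")
# "Mn. ix, 49 and 51." -> prefix "Mn. ix,", first "49", second "51."
AND_BODY = re.compile(r"([^;]*,) (\d+) and (\d+\.?)")
# "Mn. iii, 257; v, 73" -> abbrev "Mn.", first "iii, 257", second "v, 73"
SEMI_BODY = re.compile(r"(\S+) ([^;]+); (.+)")


def count_matches(lines, pattern):
    """Return (total matches, number of lines containing a match)."""
    total = 0
    hit_lines = 0
    for line in lines:
        k = len(pattern.findall(line))
        total += k
        if k:
            hit_lines += 1
    return total, hit_lines


def _split(match):
    body = match.group(1)
    m = AND_BODY.fullmatch(body)
    if m:
        prefix, first, second = m.groups()
        return f'<ls>{prefix} {first}</ls> and <ls n="{prefix}">{second}</ls>'
    m = SEMI_BODY.fullmatch(body)
    if m:
        abbrev, first, second = m.groups()
        return f'<ls>{abbrev} {first}</ls>; <ls n="{abbrev}">{second}</ls>'
    return match.group(0)  # any other shape is left for manual review


def split_ls(line):
    """Apply the two splitting rules to every plain <ls> element in a line."""
    return LS_PLAIN.sub(_split, line)


print(split_ls("<ls>Mn. ix, 49 and 51.</ls>"))
# <ls>Mn. ix, 49</ls> and <ls n="Mn. ix,">51.</ls>
print(split_ls("<ls>Mn. iii, 257; v, 73</ls>"))
# <ls>Mn. iii, 257</ls>; <ls n="Mn.">v, 73</ls>
```

Bodies with several semicolons are only split at the first one, so a second pass (or hand editing) would be needed for those.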


gasyoun · 2022-07-09T17:35:43Z

fertile areas

💯


funderburkjim · 2022-07-15T16:54:18Z

This week's changes to mw.txt, primarily to ls markup, are now complete.
The work is in the issue136 directory.

The sequence of changes is in files change_1.txt through change_4.txt, with corresponding notes in the readme.txt file of the issue136 directory.

The change_all.txt file lists all 4221 lines changed in mw.txt.

The ls markup in mw.txt has received considerable attention in the last few weeks.
Thanks to @Andhrabharati and @gasyoun for pointing out areas that needed attention.

At the moment, I have no further improvements to ls markup in mw.txt in mind, and will likely return to ls markup improvements in pw and pwg.

Andhrabharati · 2022-07-15T17:50:26Z

Perhaps @funderburkjim could consider removing the 3 <s1 slp1=...> tags within the <ls> strings:

change <ls>Kielhorn., <s1 slp1="mahABAzya">Mahābhāṣya</s1>, vol. i, preface, p.9 f.</ls>
as <ls>Kielhorn., Mahābhāṣya, vol. i, preface, p.9 f.</ls>

change <ls>VS. (<s1 slp1="kARva">Kāṇva</s1>) ii, 24</ls>
as <ls>VS. (Kāṇva.) ii, 24</ls>

change <ls>YajurV. <s1 slp1="parIS">Parīś</s1>. xv</ls>
as <ls>YajurV. Parīś. xv</ls>

Andhrabharati · 2022-07-15T17:59:20Z

Also correct the space-between-digits (`[0-9] [0-9]`) errors inside the <ls> blocks (45 occurrences, which link to a wrong place) and the <pc> blocks (4 occurrences, which lead to a wrong page).
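The thread does not say how these occurrences were found. A minimal Python sketch for locating (not fixing) them, assuming <ls> content runs to the closing </ls> and that a <pc> value runs up to the next tag, as in the metalines quoted later in this thread (e.g. `<L>81877<pc>433,1<k1>tattvaboDa`):

```python
import re

# <ls ...>body</ls>; the body may contain nested tags such as <s1>, hence .*?
LS_BODY = re.compile(r"<ls\b[^>]*>(.*?)</ls>")
# <pc>value, assumed to run until the next tag, e.g. <pc>433,1<k1>...
PC_VALUE = re.compile(r"<pc>([^<]*)")
DIGIT_GAP = re.compile(r"[0-9] [0-9]")


def find_digit_gaps(line):
    """Return (tag, text) pairs for <ls>/<pc> contents with a digit-space-digit."""
    hits = []
    for tag, pattern in (("ls", LS_BODY), ("pc", PC_VALUE)):
        for m in pattern.finditer(line):
            if DIGIT_GAP.search(m.group(1)):
                hits.append((tag, m.group(1)))
    return hits
```

Whether each gap should simply be closed up or replaced by a comma still has to be decided case by case, which is why the sketch only reports matches.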

Andhrabharati · 2022-07-15T18:36:15Z

Some more minor corrections, on a quick look--

change <ls>R , i, 27, 7</ls>
as <ls>R. i, 27, 7</ls>
change <ls>Daś. -</ls>
as <ls>Daś.</ls>
change <ls>L. -</ls>
as <ls>L.</ls>
change <ls>Rājat. -</ls>
as <ls>Rājat.</ls>
change <ls>R. -</ls>
as <ls>R.</ls>
change <ls>L., also</ls>
as <ls>L.</ls>, also
change <ls>ŚBr., as</ls>
as <ls>ŚBr.</ls>, as
change <ls>Beta bengalensis</ls>
as <bot>Beta bengalensis</bot>
change <ls>T., but according to</ls>; <ls>Uṇ. i, 67</ls>
as <ls>T.</ls>, but according to <ls>Uṇ. i, 67</ls>
change <ls>MBh. etc</ls>,
as <ls>MBh.</ls> etc.
change <ls>Kāv. etc.</ls>
as <ls>Kāv.</ls> etc.

It is also worth mentioning that many punctuation errors, like cases 9 and 10 above, occur throughout the text; there are also a handful of hyphen marks at wrong places, which indicate the <hom> numbers of the next entry word.

funderburkjim · 2022-07-15T20:39:29Z

@Andhrabharati Will take a look at these flaws; is there a systematic way to find other instances like 9 and 10?
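One possible heuristic (an assumption, not something proposed in the thread) is to flag <ls> bodies that end in a dangling hyphen or that contain English connective words like those in cases 9 and 10 and in the "also"/"as" cases above. The word list below is a hypothetical starting point and will produce both misses and false positives.

```python
import re

LS_BODY = re.compile(r"<ls\b[^>]*>(.*?)</ls>")
# Hypothetical starter list of words that suggest non-reference text inside <ls>.
SUSPECT_WORDS = {"also", "as", "but", "according", "etc", "etc."}


def suspicious_ls(line):
    """Return the bodies of <ls> elements that look like they contain extra text."""
    hits = []
    for m in LS_BODY.finditer(line):
        body = m.group(1)
        words = body.replace(",", " ").split()
        if body.rstrip().endswith("-") or SUSPECT_WORDS.intersection(words):
            hits.append(body)
    return hits
```

For example, `suspicious_ls("<ls>T., but according to</ls>; <ls>Uṇ. i, 67</ls>")` flags only the first element.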

Andhrabharati · 2022-07-16T05:11:54Z

a handful cases of hyphen marks at wrong places which indicate the <hom> numbers of the next entry word are present.

Though these are not related to <ls> items, they are more important, as they concern the HWs (and metalines) themselves; some of them (6) can be found with the regex `-[0-9].<info`.
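A minimal Python sketch of running that regex over mw.txt line by line and reporting line numbers; the file path in the usage comment is an assumption.

```python
import re

# The regex quoted above; note that "." matches any character, not only a period.
HOM_HYPHEN = re.compile(r"-[0-9].<info")


def find_misplaced_hom(lines):
    """Yield (line_number, line) for each line matching the regex (1-based)."""
    for number, line in enumerate(lines, start=1):
        if HOM_HYPHEN.search(line):
            yield number, line


# Usage (path assumed):
# with open("mw.txt", encoding="utf-8") as f:
#     for number, line in find_misplaced_hom(f):
#         print(number, line.rstrip())
```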

There is also a <hom>-related issue, #131, that @funderburkjim has yet to look at!


funderburkjim · 2022-07-21T03:10:21Z

further cleanup.

These take into account suggestions made since the previous commit.
The change transactions are in change_5.txt.
All in all, about 800 lines were changed.

A large number of 'new' ls abbreviations were added to mwauth as 'Unknown'.
(Also, the Maṇḍ. abbreviation was given a tooltip -- see the 'pywork' commit above.)

@Andhrabharati could help by providing tooltips for these Unknown cases.
The ls_abbrev_instances_unknown.txt file has instances of most of these Unknown cases.

An attempt was made to define a 'normal' ls instance programmatically. Using this rule,
about 40 'abnormal' instances remain; they are listed in file lsabnormal_5.txt.
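The thread does not spell out the rule behind lsabnormal_5.txt. The sketch below shows one hypothetical way such a rule could be written in Python; it is not that rule, and it would wrongly flag legitimate multi-word sources such as `VS. (Kāṇva.) ii, 24`.

```python
import re

# Hypothetical "normal" shape: one abbreviation ending in "." (no spaces),
# optionally followed by a locator of roman or arabic numbers separated by ", ",
# with an optional final period.  E.g. "ŚBr.", "Mn. ix, 49", "Vīrac. xv.".
NUMBER = r"(?:[ivxlc]+|[0-9]+)"
NORMAL_LS = re.compile(rf"\S+\.(?: {NUMBER}(?:, {NUMBER})*\.?)?")
LS_BODY = re.compile(r"<ls\b[^>]*>(.*?)</ls>")


def is_normal(body):
    """True if the whole <ls> body fits the hypothetical normal shape."""
    return NORMAL_LS.fullmatch(body) is not None


def abnormal_instances(lines):
    """Return (line_number, body) for <ls> bodies that fail the rule."""
    found = []
    for number, line in enumerate(lines, start=1):
        for m in LS_BODY.finditer(line):
            if not is_normal(m.group(1)):
                found.append((number, m.group(1)))
    return found
```

With this rule, cases such as `R , i, 27, 7`, `Daś. -` and `T., but according to` from the earlier comments come out as abnormal.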

If there are no more correction suggestions for 'ls' in mw, this issue can be closed and I'll take
a look at the hom issue mentioned above.

Andhrabharati · 2022-07-21T06:04:16Z

I see that the space-between-digits and other corrections related to <ls> items have all been handled now.

The remaining point in this issue is #136 (comment), which could be handled before moving on to #131, or kept in mind while doing the <hom> corrections.

I would prefer that they be corrected here, as these are not explicitly marked <hom>.

So Jim can decide how to proceed with closing this issue.

Andhrabharati · 2022-07-21T06:10:29Z

A large number of 'new' ls abbreviations were added to mwauth as 'Unknown'. (Also, the Maṇḍ. abbreviation was given a tooltip -- see the 'pywork' commit above.)

@Andhrabharati could help by providing tooltips for these Unknown cases. The ls_abbrev_instances_unknown.txt file has instances of most of these Unknown cases.

An attempt was made to define programmatically a 'normal' ls instance. Using this rule, there remain about 40 'abnormal' instances identified in file lsabnormal_5.txt.

If @funderburkjim would like to do it here, I can certainly help resolve these, but I would suggest doing this piece of work when action is taken on issue #135 (which is related and also has some more relevant points).

Hoping to hear Jim's opinion.

Andhrabharati · 2022-07-21T06:50:13Z

Some small extra corrections related to spaces:

There are 112 double-space instances in mw.txt that should be made single spaces.
10 cases of `,`, 9 cases of `;`, and 10 cases of `)` need the preceding space deleted.
3 dangling `>` characters need to be deleted.
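A minimal Python sketch of the first two space fixes, assuming no run of spaces in mw.txt is intentional. The counting helper can be used to compare against the numbers above before applying anything; the dangling `>` characters are not handled here and are better fixed by hand.

```python
import re


def fix_spaces(text):
    """Collapse runs of spaces and drop a space before ',', ';' or ')'.

    Assumes no run of spaces in the text is intentional.
    """
    text = re.sub(r" {2,}", " ", text)
    return re.sub(r" ([,;)])", r"\1", text)


def count_space_issues(text):
    """Counts of each pattern, to compare against the numbers in the comment."""
    return {
        "double space": len(re.findall(r" {2,}", text)),
        "space before ,": text.count(" ,"),
        "space before ;": text.count(" ;"),
        "space before )": text.count(" )"),
    }
```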

Andhrabharati · 2022-07-21T07:10:10Z

In tooltip.txt,

99.98 ib. int the same place [Cologne Addition] Title

to be corrected as

99.98 ib. in the same place [Cologne Addition] Title

Andhrabharati · 2022-07-21T07:19:58Z

There are two instances of ** in mw.txt, under <L>229710 and <L>237718, that may be deleted.

Incidentally, <ls>ĀpastPray.</ls>, which is at <L>237718, has no tooltip, but it is not listed in either abbrevlist_unknown.txt or ls_abbrev_instances_unknown.txt, though it is present in both mwauth.txt and tooltip.txt.

When checking the equivalence among these 4 files, I noticed that mwauth.txt and tooltip.txt both have 168 "Unknown reference" entries still to be expanded, whereas abbrevlist_unknown.txt and ls_abbrev_instances_unknown.txt both list just 147.

What is the reason for the difference of 21 between the two sets of files?

The 21 additional entries in tooltip.txt are--

ĀpGṛh.
ĀpastPray.
Śak. (Chézy)
Śak. (Pi.)
AV. Paipp.
AV., SBE.
Kaegi, Der Ṛgveda
Ludwig, RV.
Muir's Sanskrit Texts
Muir, S. T.
Pañc. B.
Pat. (K.)
R. (B)
R. (B.)
R. G.
R. [B.]
RV. AnuvAnukr.
SV.Anukr.
Uttamac.²
YajurV. Parīś.
Zachariae, Beiträge

Of these, Śak. (Pi.) occurs 18 times!!
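A comparison like the one above amounts to a set difference between the "Unknown reference" entries of two files. A minimal Python sketch, where the separator and column index are hypothetical parameters because the thread does not describe the layout of these files:

```python
def unknown_abbrevs(lines, field, sep="\t", marker="Unknown reference"):
    """Collect the abbreviation in column `field` of lines containing `marker`.

    The column layout is a hypothetical parameter; adjust it to each file.
    """
    found = set()
    for line in lines:
        if marker in line:
            columns = line.rstrip("\n").split(sep)
            if len(columns) > field:
                found.add(columns[field].strip())
    return found


def compare(a, b):
    """Return entries only in a, and entries only in b, sorted."""
    return sorted(a - b), sorted(b - a)
```

Running `compare` on the sets read from tooltip.txt and abbrevlist_unknown.txt would list the extra entries directly, instead of counting 168 against 147.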

Andhrabharati · 2022-07-21T09:16:14Z

Resolving 9 out of 10 instances of `<ls n="Unknown">`:

under <L>67611, <ls n="Unknown">lii, 19</ls> to be made as <ls n="AV.Pariś.">lii, 19</ls>, taking the prev. ls item (AV.Pariś.) as the ref.
[cf. PWG entry of govITI.]
under <L>71651, <ls n="Unknown">xxx.</ls> to be made as <ls n="Vīrac.">xxx.</ls>, taking the prev. ls item (Vīrac.) as the ref.
[cf. pwk entry candraketu and the Ind. St. 14 thereupon (p. 159).]
under <L>71652, <ls n="Unknown">xxx.</ls> to be made as <ls n="Vīrac.">xxx.</ls>, taking the prev. ls item (Vīrac.) as the ref.
[cf. pwk entry candrakeSa and the Ind. St. 14 thereupon (p. 159).]
under <L>71680, <ls n="Unknown">xv</ls> to be made as <ls n="Vīrac.">xv.</ls>, taking the prev. ls item (Vīrac.) as the ref.
[cf. pwk entry candracUqa and the Ind. St. 14 thereupon (p. 159).]
under <L>71827, <ls n="Unknown">xxx.</ls> to be made as <ls n="Vīrac.">xxx.</ls>, taking the prev. ls item (Vīrac.) as the ref.
[cf. pwk entry candravikrama and the Ind. St. 14 thereupon (p. 159).]
under <L>71874, <ls n="Unknown">xxx.</ls> to be made as <ls n="Vīrac.">xxx.</ls>, taking the prev. ls item (Vīrac.) as the ref.
[cf. pwk entry candrasena and the Ind. St. 14 thereupon (p. 159).]
under <L>84603, <s1 slp1="saMgIta-darpaRa">Saṃgīta-darpaṇa</s1>, <ls n="Unknown">vi</ls> to be made as <ls>Saṃgīta-darpaṇa, vi</ls>
under <L>95073.91, <ls n="Unknown">52, 5</ls> to be made as <ls n="R.">2, 52, 5</ls>; this is a print correction, and has the prev. ls item (MBh. &c.) as the ref.
[cf. pwk entry darh having "— 5) दृढ꣫ , दृळ्ह꣫" and the PWG entry darh having "°स्थूण R. 2, 105, 16. नौ 2, 52, 5."]
under <L>95074.05, <ls n="Unknown">52, 5</ls> to be made as <ls n="R.">2, 52, 5</ls>; this is a print correction, and has the prev. ls item (MBh. &c.) as the ref.
[cf. pwk entry darh having "— 5) दृढ꣫ , दृळ्ह꣫" and the PWG entry darh having "°स्थूण R. 2, 105, 16. नौ 2, 52, 5."]
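Most of the resolutions above borrow the abbreviation from the preceding ls item. A minimal Python sketch that proposes such candidates, assuming the abbreviation is the first space-delimited token of the preceding <ls> (or of its `n` value). As the <L>95073.91 and <L>95074.05 cases show (previous item MBh., correct source R.), each proposal still needs checking against PWG/pwk.

```python
import re

# <ls>body</ls> or <ls n="...">body</ls>
LS_TAG = re.compile(r'<ls(?: n="([^"]*)")?>([^<]*)</ls>')


def resolve_unknown(text):
    """Propose n="..." for <ls n="Unknown"> from the preceding <ls> in `text`."""
    previous = None

    def replace(m):
        nonlocal previous
        n_attr, body = m.group(1), m.group(2)
        if n_attr == "Unknown":
            if previous:
                return f'<ls n="{previous}">{body}</ls>'
            return m.group(0)  # nothing earlier to borrow from
        source = n_attr if n_attr is not None else body
        tokens = source.split()
        if tokens:
            previous = tokens[0]  # assume the abbreviation is the first token
        return m.group(0)

    return LS_TAG.sub(replace, text)
```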

Shouldn't the last two "दृढ (or दृळ्ह॑)" be marked as or-group candidates?

There are plenty more such "unmarked groups", separated out as different HWs, throughout mw.txt.
#132 (comment)

Andhrabharati · 2022-07-21T15:59:50Z

@funderburkjim

while you're on this MW work, would you mind generating the IAST version of mw.txt again [so that I can work better (rather, faster) with it]?
[I have a version that is more than one year old (Apr 2021); many updates have been made to the text since then.]
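For reference, SLP1 uses one ASCII character per Sanskrit sound, so the core of an SLP1-to-IAST conversion can be a character table. The Python sketch below covers only the basic alphabet (no accents, candrabindu, avagraha, or the Vedic ḷ/ḷh letters) and assumes the SLP1 spans, such as <k1> values or `slp1="..."` attributes, have already been isolated; it is not the Cologne conversion code.

```python
# Basic SLP1 -> IAST table; letters not listed (a i u e o k g c j t d n p b m
# y r l v s h) are identical in both schemes and pass through unchanged.
SLP1_TO_IAST = str.maketrans({
    "A": "ā", "I": "ī", "U": "ū", "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "E": "ai", "O": "au", "M": "ṃ", "H": "ḥ",
    "K": "kh", "G": "gh", "N": "ṅ", "C": "ch", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "T": "th", "D": "dh", "P": "ph", "B": "bh", "S": "ś", "z": "ṣ",
})


def slp1_to_iast(text):
    """Convert an SLP1 string (basic alphabet only) to IAST."""
    return text.translate(SLP1_TO_IAST)


print(slp1_to_iast("mahABAzya"))  # mahābhāṣya
print(slp1_to_iast("tattvaboDa"))  # tattvabodha
```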


funderburkjim · 2022-07-22T02:34:00Z

2nd batch of corrections.

These take into account the preceding comments by @Andhrabharati.
About 170 lines were changed in mw.txt.
Change transaction details are in change_6.txt.

Note 1: The one remaining n="Unknown" was resolved:

 <L>81877<pc>433,1<k1>tattvaboDa
   knowledge or understanding of truth, <ls n="Sarvad.">xii, 46</ls>
   [cf. PWG, and MW tattvaprakASa]

Note 2: Shouldn't the last two "दृढ (or दृळ्ह॑)" be marked as or-group candidates?

  They are already marked so in
   <L>95073.9<pc>490,2<k1>df|a and <L>95074<pc>490,2<k1>dfQa
which have the 'or' markup: <info or="95074,dfQa;95073.9,df|a"/>
 The `or` markup is not repeated for the '2a' subsidiary entries.
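The `or` attribute shown above packs the group as `L-number,key1` pairs separated by `;`. A small Python sketch of reading it back, assuming that layout holds for every `or` value:

```python
import re

OR_ATTR = re.compile(r'<info or="([^"]*)"/>')


def parse_or_group(line):
    """Return [(L-number, key1), ...] from an <info or="..."/> tag, or []."""
    m = OR_ATTR.search(line)
    if not m:
        return []
    pairs = []
    for item in m.group(1).split(";"):
        lnum, key = item.split(",", 1)
        pairs.append((lnum, key))
    return pairs
```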

The iast version of revised mw.txt is temp_mw_6_iast.zip.

The unknown ls abbreviations file, abbrevlist_unknown.txt, has been revised and contains 170 items.

Instances of the abbreviations with unknown tooltips are in
ls_abbrev_instances_unknown1.txt
and ls_abbrev_instances_unknown1_iast.txt (based on temp_mw_6_iast.txt).

funderburkjim · 2022-07-22T02:47:19Z

We can discuss tooltips for the unknown literary source abbreviations under #135.
The best format for me would be via an edit of abbrevlist_unknown.txt, where
each Unknown reference text is replaced by the appropriate tooltip text for the abbreviation.
The ls_abbrev_instances_unknown1_iast.txt file might be helpful in examining the cases.

Perhaps now we can consider this #136 closeable?

Andhrabharati · 2022-07-22T03:47:26Z

@funderburkjim

Wonderful updates!
And, thanks for the IAST file.

About to finish resolving the unknown reference entities (just another 15 remaining).
Will post the results in #135.

Andhrabharati · 2022-07-22T03:47:44Z

And you can close this issue now.


funderburkjim added a commit that referenced this issue Jul 15, 2022

ls cleanup continues. #136

0d6a703

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Jul 15, 2022

mw: further ls cleanup.

22529fc

Ref: sanskrit-lexicon/MWS#136

funderburkjim closed this as completed Jul 15, 2022

funderburkjim reopened this Jul 15, 2022

funderburkjim added a commit to sanskrit-lexicon/csl-pywork that referenced this issue Jul 21, 2022

mwauth additional unknown literary source abbreviations.

15c92a6

Ref: sanskrit-lexicon/MWS#136

funderburkjim added a commit that referenced this issue Jul 21, 2022

#136 continued

bcf913f

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Jul 21, 2022

MW. Further ls cleanup.

a741782

Ref: sanskrit-lexicon/MWS#136

funderburkjim added a commit to sanskrit-lexicon/csl-pywork that referenced this issue Jul 21, 2022

mwauth edit. Ref: sanskrit-lexicon/MWS#136

8eb9d64

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Jul 21, 2022

MW: 2nd batch of corrections,

1d017dc

Ref: sanskrit-lexicon/MWS#136

funderburkjim added a commit that referenced this issue Jul 22, 2022

change_6.txt in #136

6a0a8a8

funderburkjim closed this as completed Jul 22, 2022

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Jul 27, 2022

MW: ls abbreviation corrections.

bca895c

Ref: sanskrit-lexicon/MWS#136
