MWS accent correction, continue phase 3 #142

Closed
funderburkjim opened this issue Oct 21, 2022 · 26 comments

@funderburkjim
Contributor

In #141, the last comments began what was called phase 3. This is a page-by-page, column-by-column comparison of the scanned images with the Cologne digitization in mw.txt. The comparison focuses primarily on the accents in the metaline and headline portion (before the broken bar) of the digitization.

This issue continues that task.

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Oct 21, 2022
funderburkjim added a commit that referenced this issue Oct 21, 2022
@funderburkjim
Contributor Author

funderburkjim commented Oct 21, 2022

Recall from #141 that the changes for pages 1-59 are in issue141/change_mw_6.txt, dated 10-17-2022.

The current work is done in the mwissues/issue142 directory.

This table is a log of progress.

| date | page range | change-file |
|---|---|---|
| 10-17-2022 | 0001-0059 | issue141/change_mw_6.txt |
| 10-20-2022 | 0060-0130 | change_mw_01.txt |
| 10-24-2022 | 0131-0220 | change_mw_02.txt |
| 10-27-2022 | 0221-0299 | change_mw_03.txt |
| 10-31-2022 | 0300-0399 | change_mw_04.txt |
| 11-06-2022 | 0400-0499 | change_mw_05.txt |
| 11-12-2022 | 0500-0599 | change_mw_06.txt |
| 11-15-2022 | 0600-0699 | change_mw_07.txt |
| 11-19-2022 | 0700-0799 | change_mw_08.txt |
| 11-25-2022 | 0800-0899 | change_mw_09.txt |
| 11-28-2022 | 0900-0999 | change_mw_10.txt |
| 12-05-2022 | 1000-1099 | change_mw_11.txt |
| 12-13-2022 | 1100-1199 | change_mw_12.txt |
| 12-20-2022 | 1200-1308 | change_mw_13.txt |

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Oct 24, 2022
funderburkjim added a commit that referenced this issue Oct 24, 2022
funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Oct 27, 2022
funderburkjim added a commit that referenced this issue Oct 27, 2022
funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Nov 1, 2022
funderburkjim added a commit that referenced this issue Nov 1, 2022
funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Nov 6, 2022
funderburkjim added a commit that referenced this issue Nov 6, 2022
funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Nov 10, 2022
funderburkjim added a commit that referenced this issue Nov 10, 2022
@Andhrabharati
Contributor

Andhrabharati commented Nov 12, 2022

So far about 48800 changes in 499 pages, i.e. almost 100/page on average [in this issue alone].

I guess @funderburkjim feels this exercise is worth his time, and will be allotting further time to continue the work on the remaining pages.

@funderburkjim
Contributor Author

Yes, currently at page 617. Almost half-way. Probably 5-7 weeks to end (about 20 pages/day).
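The estimate above, and the per-page average quoted just before it, can be checked with a few lines of arithmetic. This is only a rough sketch: the final page number 1308 is taken from later in this thread, and the variable names are illustrative.

```python
# Figures quoted in this thread; 1308 is the last page of the review (stated later).
pages_total = 1308
page_now = 617
pages_per_day = 20

days_left = (pages_total - page_now) / pages_per_day
weeks_left = days_left / 7

# Average number of changes per page, from "about 48800 changes in 499 pages".
changes_per_page = 48800 / 499

print(f"{days_left:.2f} days, {weeks_left:.1f} weeks of daily work")
print(f"{changes_per_page:.1f} changes per page")
```

At a steady 20 pages every day this comes to about five weeks, the lower end of the 5-7 week range given above.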

@Andhrabharati
Contributor

Andhrabharati commented Nov 12, 2022

Today's correction file (06) is dated as 20th, instead of 12th, by error.

@gasyoun
Member

gasyoun commented Nov 12, 2022

> Probably 5-7 weeks to end

So until end of 2022. When should we plan for our yearly call? Right after?

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Nov 15, 2022
funderburkjim added a commit that referenced this issue Nov 15, 2022
@funderburkjim
Contributor Author

> Today's correction file (06) is dated as 20th, instead of 12th, by error.

Corrected.

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Nov 19, 2022
funderburkjim added a commit that referenced this issue Nov 19, 2022
funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Nov 25, 2022
funderburkjim added a commit that referenced this issue Nov 25, 2022
funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Nov 28, 2022
funderburkjim added a commit that referenced this issue Nov 28, 2022
funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Dec 5, 2022
funderburkjim added a commit that referenced this issue Dec 5, 2022
funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Dec 14, 2022
@Andhrabharati
Contributor

Andhrabharati commented Dec 17, 2022

@funderburkjim

You forgot to add the last commit (1100-1199) in the 'progress log table' above.

Also, please see my post at sanskrit-lexicon/SKD#16 (comment) regarding the annexure pages.

I hope you will finish this commendable task before Christmas; this is indeed the most worthwhile exercise (in my view) in the last 25+ years of MW work at CDSL.

@funderburkjim
Contributor Author

Updated progress log table. Thanks for mentioning.

@funderburkjim
Contributor Author

Regarding annexure pages accent review, I was hoping you would cover that when I finish the body text accent review.

However, if you decide not to do that, then I will consider it later.

@Andhrabharati
Contributor

Sure, I can resume my unfinished task (referred above) once you're done with your work and give me the updated iast file.

@gasyoun
Member

gasyoun commented Dec 17, 2022

> Sure, I can resume my unfinished task (referred above) once you're done with your work and give me the updated iast file.

Good to know.

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Dec 20, 2022
funderburkjim added a commit that referenced this issue Dec 20, 2022
@funderburkjim
Contributor Author

accent review completed.

Review ends at page 1308.
Hurray! I plan to deal with a few (~100) cases noticed along the way before preparing a version for perusal by @Andhrabharati.

@funderburkjim
Contributor Author

two accents.

There are relatively few cases where an entry headword is marked with two accents.
two_accents.txt is the current list of 177.

These should be reviewed manually sometime.
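A list like two_accents.txt could be produced along the following lines. This is a minimal sketch, not the script actually used: it assumes that accents appear as `/` or `^` inside the `<k2>` field of a metaline (as the grep patterns later in this thread suggest), and the sample metalines are hypothetical.

```python
import re

# Capture the contents of the <k2> field of a metaline.
K2_FIELD = re.compile(r"<k2>([^<]*)")
# Accent markers assumed here: '/' and '^'.
ACCENT = re.compile(r"[/^]")

def headwords_with_two_accents(lines):
    """Return the <k2> values that carry two or more accent markers."""
    hits = []
    for line in lines:
        m = K2_FIELD.search(line)
        if m and len(ACCENT.findall(m.group(1))) >= 2:
            hits.append(m.group(1))
    return hits

# Hypothetical metalines, for illustration only.
sample = [
    "<L>1<pc>1,1<k1>agni<k2>agni/<e>1",
    "<L>2<pc>1,1<k1>agnIvaruRO<k2>agnI/-va/ruRO<e>3",
]
print(headwords_with_two_accents(sample))
```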

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Dec 21, 2022
funderburkjim added a commit that referenced this issue Dec 21, 2022
@funderburkjim
Contributor Author

a few extra

Based on notes made during the accent review, several additional entries were reviewed.
Some entries were changed, some were identified as open questions, and some were identified as no change required.
The commit above (b42..) can be used to review the changes.
readme_extra.txt may be consulted.

@gasyoun
Member

gasyoun commented Dec 21, 2022

> Review ends at page 1308.

I bow to the way you deal with issues.

> two_accents.txt is the current list of 177.

agnī/-va/ruṇau should be read as agnī́-varuṇau and agnī-váruṇau or agnī́-váruṇau

> The commit above (b42..) can be used to review the changes.

@Andhrabharati would you be willing to look at the 200 lines?
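The candidate readings of agnī/-va/ruṇau mentioned above can be rendered mechanically. The sketch below assumes that a `/` marks an accent on the vowel immediately before it and simply replaces it with a combining acute (U+0301); the function name is illustrative and not part of any CDSL tool.

```python
import unicodedata

def slash_to_acute(word):
    """Replace each '/' with a combining acute on the preceding letter (NFC-normalized)."""
    return unicodedata.normalize("NFC", word.replace("/", "\u0301"))

# The three readings under discussion: accent on the first member only,
# on the second member only, or on both.
for marked in ("agnī/-varuṇau", "agnī-va/ruṇau", "agnī/-va/ruṇau"):
    print(marked, "->", slash_to_acute(marked))
```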

@Andhrabharati
Contributor

@funderburkjim hasn't yet made room for me to step in, @gasyoun; he still wants to do something (see his previous post).

@funderburkjim
Contributor Author

funderburkjim commented Dec 21, 2022

A crude statistic shows that the primary difference between the original and final versions of MW in this exercise is due to removal of 'inherited' accents in compounds.

$ grep -E "<k2>[^<]*[\/^]" temp_mw_00.txt | wc -l  
BEFORE:  107142  metalines with accents

$ grep -E "<k2>[^<]*[\/^]" temp_mw_extra.txt | wc -l
AFTER: 48852  

(- 107142 48852) = 58290   metalines whose accents  are removed

------------------------------------------------------------------------
<e>[34]  are, roughly, the compounds.
$ grep -E "<k2>[^<]*[\/^].*?<e>[34]" temp_mw_00.txt | wc -l
BEFORE:  73303  metalines (of compounds) with accents

$ grep -E "<k2>[^<]*[\/^].*?<e>[34]" temp_mw_extra.txt | wc -l
AFTER: 15001 
-----------------------------------------------------------------------
(- 73303 15001) = 58302 metalines (of compounds) whose accents are removed

NOTE: Since 58302 is almost equal to 58290, the net accent removals are predominantly
attributable to removal of accents in compounds.
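The same counts can be reproduced in one pass with a short script. This is a sketch under the same assumptions as the grep commands above (accents are `/` or `^` inside `<k2>`, and `<e>` codes starting with 3 or 4 are roughly the compounds); the sample metalines are hypothetical, and real input would be the lines of temp_mw_00.txt or temp_mw_extra.txt.

```python
import re

# Accent marker ('/' or '^') somewhere in the <k2> field, as in the grep pattern.
ACCENTED_K2 = re.compile(r"<k2>[^<]*[/^]")
# Same, restricted to metalines whose <e> code starts with 3 or 4.
ACCENTED_COMPOUND = re.compile(r"<k2>[^<]*[/^].*?<e>[34]")

def count_accented(lines):
    """Return (accented metalines, accented compound metalines)."""
    total = compounds = 0
    for line in lines:
        if ACCENTED_K2.search(line):
            total += 1
            if ACCENTED_COMPOUND.search(line):
                compounds += 1
    return total, compounds

# Hypothetical metalines, for illustration only.
sample = [
    "<L>1<pc>1,1<k1>agni<k2>agni/<e>1",
    "<L>2<pc>1,1<k1>agnikarman<k2>agni/-karman<e>3",
    "<L>3<pc>1,1<k1>agnikArya<k2>agni-kArya<e>3",
]
print(count_accented(sample))   # (2, 1)

# Differences reported above.
print(107142 - 48852)           # 58290
print(73303 - 15001)            # 58302
```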

@gasyoun
Member

gasyoun commented Dec 21, 2022

> due to removal of 'inherited' accents in compounds

15001 changes?

@funderburkjim
Contributor Author

See revision of comment above. Roughly 58000+ metalines changed.

@funderburkjim
Contributor Author

Closing this issue.
#145 continues this review.

@Andhrabharati
Contributor

Andhrabharati commented Dec 22, 2022

> > two_accents.txt is the current list of 177.
>
> agnī/-va/ruṇau should be read as agnī́-varuṇau and agnī-váruṇau or agnī́-váruṇau
>
> > The commit above (b42..) can be used to review the changes.
>
> @Andhrabharati would you be willing to look at the 200 lines?

@gasyoun

See what pwk says on this word--

[image]

@funderburkjim

I would again request you to make a page like https://sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/pwkvn/03/ (option 3 of https://sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/pwkvn/03/), for MW, PWG and pwk+pwkvn; this definitely would be helpful to easily track such queries, as MW is heavily depending on those two works.
[I asked for this some time back; it might have escaped your notice.]

@funderburkjim
Contributor Author

Such a display with MW is not as easy as the pwkvn/03/ page, because there are differences in spelling conventions between MW and Boehtlingk. e.g. kar (pw) vs kf (mw) [slp1 spelling of root 'to do']. Without handling these spelling differences, the pwkvn/03 display could be adapted, but would sometimes stumble.
Another approach is https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/sample/dalglob1.php.
This handles the spelling differences well, but is visually less useful.

So getting a perfected display is non-trivial.

My solution when working with the mw accents and occasionally consulting pw or pwg has been to open the simple-search list displays (input=slp1) in two tabs (one for pw, one for mw), and open servepdf (for mw) in a third tab (or these views could be opened in separate windows). This configuration makes it fairly easy to consult PW when necessary. The setup can be facilitated in Windows 11 using a collection.
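To illustrate why the spelling differences matter for a side-by-side display, here is a minimal sketch of key matching with an exception table. Everything in it is hypothetical: the table holds only the single kar/kf pair mentioned above, and the function names do not correspond to any existing CDSL code.

```python
# Hypothetical exception table: pw-style spelling -> MW-style slp1 spelling.
# Only the pair mentioned in this thread is listed.
PW_TO_MW = {"kar": "kf"}

def mw_key(pw_key):
    """Map a pw headword key to the spelling expected in MW (identity if unknown)."""
    return PW_TO_MW.get(pw_key, pw_key)

def pair_entries(pw_keys, mw_keys):
    """Pair each pw key with its MW counterpart, or None when no match is found."""
    mw_set = set(mw_keys)
    pairs = []
    for k in pw_keys:
        candidate = mw_key(k)
        pairs.append((k, candidate if candidate in mw_set else None))
    return pairs

print(pair_entries(["kar", "agni", "xyz"], ["kf", "agni"]))
```

Without the exception table, "kar" would find no MW partner, which is the kind of stumble described above.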

@gasyoun
Member

gasyoun commented Dec 22, 2022

> MW, PWG and pwk+pwkvn; this definitely would be helpful to easily track such queries, as MW is heavily depending on those two works.

@funderburkjim kar (pw) vs kf (mw): with the accented words this will not be an issue at all. @Andhrabharati is not asking for a universal tool for all cases.

> Another approach is https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/sample/dalglob1.php.

Yeah, it's not even reachable from homepage. Hope it can get some love in 2023.

> two tabs (one for pw, one for mw), and open servepdf (for mw) in a third tab (or these views could be opened in separate windows)

Three open tabs is two tabs too many for me.

> See what pwk says on this word--

Thanks, so the last one. Is it still a single pada?

@Andhrabharati
Contributor

> Thanks, so the last one. Is it still a single pada?

yes, it is a dvandvasamAsa.

@gasyoun
Member

gasyoun commented Dec 22, 2022

> it is a dvandvasamAsa

But why only a small part of them have two accents at once? Archaic ones?

@Andhrabharati
Contributor

Andhrabharati commented Dec 22, 2022

> > it is a dvandvasamAsa
>
> But why only a small part of them have two accents at once? Archaic ones?

Almost every 'dual' category entry that I have come across has a double accent; just wait till I finish reading through the MW entries.
