Greek text #6

Closed
funderburkjim opened this issue May 10, 2022 · 9 comments
Labels
enhancement New feature or request

Comments

@funderburkjim
Contributor

This issue is devoted to mining the Greek text for Benfey dictionaries
from the version prepared by @Andhrabharati (reference).

@gasyoun gasyoun added the enhancement New feature or request label May 11, 2022
funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue May 20, 2022
funderburkjim added a commit that referenced this issue May 20, 2022
@funderburkjim
Contributor Author

Greek text added

The work was done in the greek directory.
The method was to transfer the Greek text fragments from Andhrabharati's BEN_main_L2a.txt file (see link above) into the ben.txt digitization of csl-orig/v02/ben.
First, a few adjustments are needed so that the CSL and AB versions contain the same list of entries.
Then, for each corresponding entry, we require that the number of <lang n="greek">.*?</lang> patterns be the same. This requirement is not met initially, for various reasons; the main one is that Greek text spanning a line break is marked with two lang fragments in CSL but only one in AB. This is generally resolved by removing a lang element in the CSL version. After several adjustments, the requirement is met.
Then, for a given entry, we align the N Greek text fragments from CSL and AB, and replace the first Greek text fragment in CSL with the first Greek text fragment in AB, and similarly for the 2nd, 3rd, ..., Nth fragments.
When this has been done for all entries, the transfer is considered complete.
Both the CSL and AB versions then have 1243 Greek text fragments.
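
The per-entry step described above can be sketched roughly as follows. This is only an illustration, not the script actually used in the greek directory: the function name `transfer_greek` is hypothetical, and it assumes each entry is available as a single string in which the `<lang n="greek">` fragments do not cross line breaks.

```python
import re

# Non-greedy pattern for one Greek fragment, as quoted in the comment above.
GREEK = re.compile(r'<lang n="greek">.*?</lang>')

def transfer_greek(csl_entry: str, ab_entry: str) -> str:
    """Replace the k-th Greek fragment of the CSL entry with the k-th
    Greek fragment of the AB entry (hypothetical helper)."""
    csl_frags = GREEK.findall(csl_entry)
    ab_frags = GREEK.findall(ab_entry)
    # The method requires equal fragment counts; otherwise the entry
    # needs a manual adjustment first.
    if len(csl_frags) != len(ab_frags):
        raise ValueError(
            f"fragment count mismatch: CSL={len(csl_frags)}, AB={len(ab_frags)}"
        )
    replacements = iter(ab_frags)
    # A function replacement avoids backslash/group escapes being
    # interpreted in the AB text.
    return GREEK.sub(lambda m: next(replacements), csl_entry)

csl = 'a <lang n="greek">?</lang> b <lang n="greek">?</lang>'
ab = 'x <lang n="greek">ἄχος</lang> y <lang n="greek">κραῦρος</lang>'
merged = transfer_greek(csl, ab)
```

Raising on a count mismatch mirrors the requirement that the counts agree before any text is transferred, so a misaligned entry is reported rather than silently filled with the wrong fragments.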

@funderburkjim
Contributor Author

@Andhrabharati
There was one entry (L=14998) where I think your version was missing a small text fragment.
I attempted to add the Greek text as κρανρος, but I think the ν is wrong. Please review (use change_ab_3.txt to see the exact change) and post the correction to κρανρος in a comment below; I will then enter it separately into csl-orig/v02/ben/ben.txt.

@funderburkjim
Contributor Author

proof reading

proof_greek.txt was prepared to help someone proofread the Greek text.
This listing can be used alongside the scan-page display URL:
https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/servepdf.php?dict=ben&page=969.

Note that the page number is given in the <pc> element in proof_greek.txt.
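
For a proofreader working through the listing, a small helper along these lines could turn a line of proof_greek.txt into the matching scan URL. It is a sketch under assumptions: the exact layout of proof_greek.txt is not shown in this thread, so the code only assumes that the `<pc>` element begins with the page digits and passes them to the URL unchanged. The name `scan_url` is hypothetical.

```python
import re

# URL template taken from the scan-page link above; only the page varies.
SCAN_URL = (
    "https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/"
    "servepdf.php?dict=ben&page={page}"
)

# Assumption: the <pc> element starts with the page digits.
PC = re.compile(r"<pc>(\d+)")

def scan_url(line: str):
    """Return the scan URL for the page named in a <pc> element, or None."""
    m = PC.search(line)
    return SCAN_URL.format(page=m.group(1)) if m else None

url = scan_url("<L>14998<pc>969</pc>")
```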

Perhaps @jmigliori could undertake this proof-reading task when his time permits?

@Andhrabharati

Here is the replacement text--
κρανρος > κραῦρος

@Andhrabharati

(change_ab_3.txt) line 73:
ἄχνυμαι ἄχος > ἄχνυμαι, ἄχος
(comma missing in the beginning portion)

funderburkjim added a commit that referenced this issue May 20, 2022
@funderburkjim
Contributor Author

The two items above have been adjusted.
I think we can close this now. I will open a separate issue for proofreading, in the hope that @jmigliori or someone else undertakes it.

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue May 21, 2022
@funderburkjim
Contributor Author

funderburkjim commented May 21, 2022

punctuation

In ref, @Andhrabharati observes

more often than not, the punctuation mark (mostly , and ;) are missed after the ending Greek tag.

This comment also applies to the Benfey Greek text.
I was able to find a way to add the punctuation (from the AB version) to the ben.txt file, and this has now been done.
About 750 lines were changed.
One way to examine the changes is in the diff file for revised proof_greek.txt (see).

Alternatively, see the change file: change_7.txt
I will investigate a similar punctuation revision for the recent Greek text additions to the bop and bur dictionaries.
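
The comment above does not say how the punctuation was carried over. One possible approach, sketched here purely as an assumption, is to look at the character immediately after each closing Greek tag in the AB line and copy a `,` or `;` into the CSL line wherever the CSL line has none. The name `copy_punctuation` is hypothetical, and the sketch assumes the two lines already contain the same number of Greek fragments.

```python
import re

# A Greek fragment plus an optional , or ; directly after the closing tag.
GREEK_PUNCT = re.compile(r'(<lang n="greek">.*?</lang>)([,;]?)')

def copy_punctuation(csl_line: str, ab_line: str) -> str:
    """Add to the CSL line any , or ; that follows the corresponding
    Greek fragment in the AB line (hypothetical helper)."""
    ab_marks = [m.group(2) for m in GREEK_PUNCT.finditer(ab_line)]
    csl_count = len(GREEK_PUNCT.findall(csl_line))
    # Leave the line untouched if the fragments cannot be aligned.
    if csl_count != len(ab_marks):
        return csl_line
    marks = iter(ab_marks)

    def fix(m):
        ab_mark = next(marks)
        # Keep punctuation already present in CSL; otherwise use AB's.
        return m.group(1) + (m.group(2) or ab_mark)

    return GREEK_PUNCT.sub(fix, csl_line)

csl = 'cf. <lang n="greek">ἄχος</lang> and <lang n="greek">κραῦρος</lang>; x'
ab = 'cf. <lang n="greek">ἄχος</lang>, and <lang n="greek">κραῦρος</lang>; x'
fixed = copy_punctuation(csl, ab)
```

Existing CSL punctuation is never overwritten, so the change is purely additive and easy to review in a diff.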

@Andhrabharati

Probably the csl INM file also needs this punctuation 'revisit' for the Greek strings. (I haven't seen Jim's file so far.)

@funderburkjim
Contributor Author

greek text in addenda section

This comes from the ben_Addenda.txt file provided by @Andhrabharati (reference).

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue May 24, 2022
funderburkjim added a commit that referenced this issue May 24, 2022

3 participants