Numbering under <ls>...</ls> portions of Koeln MW99 data #95

Closed
Andhrabharati opened this issue Jan 5, 2021 · 2 comments

Andhrabharati commented Jan 5, 2021

In the MW99 print, the main division of a book is always marked in lowercase Roman numerals [ivxc], followed by a comma and then the further numbers in Indo-Arabic digits [0-9].

(a) replace "([ix]). ([0-9])" with "\1, \2" : 55 occurrences (proofing errors)

(b) replace "([^r])iv. ([0-9])" with "\1iv, \2" : 7 occurrences (proofing errors)
;; While checking the "v." cases in different combinations, I found that puṇyamaheśākhya has the ls marked wrongly: "<s1 slp1="divya">Divya</s1>'v." instead of "<ls>Divyâv.</ls>".

(c) Pāṇ. is the largest deviant: in the Koeln data these numerals have been changed (either unwittingly or deliberately) to Indo-Arabic digits [0-9], followed by a dash or erroneously marked/tagged! All of these can be found with "Pāṇ. ([0-9])" & "(.?)[0-9]-(.*)</l" and appropriately corrected : over 8000 occurrences.

(d) I also saw that in many places i and 1 were mistaken for one another. In some fonts (0,1,2,6,8) look smaller, staying within the 'x-height' (Marcis will know this term, as he has worked on fonts as well!!), while (3,4,5,7,9) look bigger, extending below the baseline. (All of these could be found by checking for an isolated i, i.e. " i ".)
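For anyone wanting to try the substitutions in (a) and (b), here is a minimal Python sketch. It makes two assumptions that are not stated above: the literal dot is escaped (the patterns as quoted leave it bare, so "." would match any character), and the replacements are applied only inside `<ls>...</ls>` spans. The function name `fix_ls_numbering` and the sample citations are made up for illustration. The `[^r]` guard in (b) presumably exists to skip abbreviations such as "Hariv."; that reading is a guess, and every hit should still be checked against the print.

```python
import re

# (a) Roman numeral ending in "i" or "x", then ". " and a digit.
#     Assumption: the dot is escaped so it only matches a literal full stop.
PAT_A = re.compile(r"([ix])\. ([0-9])")
# (b) "iv. " before a digit, unless the preceding character is "r".
PAT_B = re.compile(r"([^r])iv\. ([0-9])")
# Assumption: only text inside <ls>...</ls> should be touched.
LS_SPAN = re.compile(r"<ls>.*?</ls>")


def fix_ls_numbering(line: str) -> str:
    """Replace 'roman. digit' with 'roman, digit' inside <ls> spans only."""

    def fix_span(match: re.Match) -> str:
        span = match.group(0)
        span = PAT_A.sub(r"\1, \2", span)
        span = PAT_B.sub(r"\1iv, \2", span)
        return span

    return LS_SPAN.sub(fix_span, line)


if __name__ == "__main__":
    # Illustrative strings, not taken from the actual data file.
    for sample in ("<ls>RV. x. 85</ls>", "<ls>MBh. iv. 12</ls>", "<ls>Hariv. 1234</ls>"):
        print(sample, "->", fix_ls_numbering(sample))
```

Running this in a report-only mode first (printing the before/after pairs rather than writing them back) seems the safer choice, given that the counts above are small enough to review by hand.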

I see that a part of this topic (limited to my point (c), Pāṇ.) was raised earlier by Marcis (@gasyoun) as issue #63, which also got closed, but I am not sure what the conclusion was there. The file I got from Jim a few days back has all of these uncorrected.

@------------------
There certainly are different styles adopted in different books, and in my opinion we should not strive to make them uniform (or normalised) in the data portion.

Of course, we can (and should) have such normalization (done internally) for search purposes, as was proposed by Dhaval (@drdhaval2785) elsewhere.
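To illustrate what an internal, search-only normalization might look like, here is a rough Python sketch. It is not Dhaval's proposal; the helper names `roman_to_int` and `search_key`, the sample citation strings, and the assumption that the abbreviation ends at the first ". " are all hypothetical. The idea is only that the stored text keeps each book's style while the search index compares a derived key.

```python
import re

# Values for the lowercase Roman letters that appear in citations.
ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}


def roman_to_int(numeral: str) -> int:
    """Convert a lowercase Roman numeral such as 'xiv' to an integer."""
    total = 0
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        value = ROMAN[ch]
        # Subtractive notation: a smaller value before a larger one.
        if nxt in ROMAN and ROMAN[nxt] > value:
            total -= value
        else:
            total += value
    return total


def search_key(ref: str):
    """Build a style-independent key; the stored citation is left untouched.

    Assumption: the work abbreviation ends at the first '. '.
    """
    abbr, _, rest = ref.partition(". ")
    parts = []
    for token in re.split(r"[,\-\s.]+", rest):
        if not token:
            continue
        if token.isdigit():
            parts.append(int(token))
        elif re.fullmatch(r"[ivxlc]+", token):
            parts.append(roman_to_int(token))
        else:
            parts.append(token)  # keep anything unrecognised as-is
    return abbr, tuple(parts)


if __name__ == "__main__":
    # Two hypothetical spellings of the same reference.
    print(search_key("Pāṇ. i, 1, 1") == search_key("Pāṇ. 1-1, 1"))
```

With a key like this, a search for either spelling would find both, while the data portion still reflects the print.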

Andhrabharati commented

@funderburkjim

I have just seen that, apart from point (c), which you resolved some time later, the other points [(a), (b) and (d)] still need attention.
I hope you will look at this and do the needful soon, so that the issue can be closed.

Andhrabharati commented

I've corrected the above points (a, b & d) in my file now.

As no one else seems to have the time (or interest?) to look at these observations, I am closing this issue now.
