Just noticed that, except for point (c), which you resolved some time later, the other points [(a), (b) and (d)] still need attention.
Hope you will look at this and do the needful soon, so that the issue can be closed.
In the MW99 print, the main division of a book is always marked in lowercase Roman numerals [ivxc], followed by a comma and then the further numbers in Indo-Arabic digits [0-9].
(a) correct "([ix]). ([0-9])" with "\1, \2" : 55 occurrences (proofing errors)
(b) correct "([^r])iv. ([0-9])" with "\1iv, \2" : 7 occurrences (proofing errors)
;; While checking "v." cases in different combinations, found that the ls under puṇyamaheśākhya is marked wrongly: "<s1 slp1="divya">Divya</s1>'v." instead of "<ls>Divyâv.</ls>".
(c) Pāṇ. is the largest deviant: in the Koeln data its Roman numerals have been changed (either unwittingly or deliberately) to Indo-Arabic numerals [0-9], followed by a dash or erroneously marked/tagged! All of these can be found with "Pāṇ. ([0-9])" & "(.?)[0-9]-(.*)</l" and corrected appropriately : over 8000 occurrences.
(d) Also seen that at many places i and 1 were confused (one taken for the other). In some fonts (0,1,2,6,8) look smaller, staying within the 'x-height' (Marcis should know this term, as he has worked on fonts as well!!), and (3,4,5,7,9) look bigger, extending below the baseline. (All of these could be found by checking for isolated i, i.e. " i " places.)
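The replacements in (a) and (b) and the searches in (c) and (d) could be scripted roughly as below. This is only a minimal sketch, not a tested correction script: the function names and sample strings are made up for illustration, and the "." in each pattern is escaped on the assumption that a literal full stop is meant. Points (c) and (d) are only flagged here, not changed, since they need a manual decision.

```python
import re

# Assumption: the "." in the patterns from (a) and (b) is a literal full stop,
# so it is escaped here. The patterns are otherwise taken as written above.
RULE_A = (re.compile(r"([ix])\. ([0-9])"), r"\1, \2")
RULE_B = (re.compile(r"([^r])iv\. ([0-9])"), r"\1iv, \2")

# Candidate finders for (c) and (d): these only flag lines for review.
PAN_ARABIC = re.compile(r"Pāṇ\. ([0-9])")
ISOLATED_I = re.compile(r" i ")


def apply_rules(line):
    """Apply rules (a) and (b); return (new_line, number_of_substitutions)."""
    total = 0
    for pattern, replacement in (RULE_A, RULE_B):
        line, count = pattern.subn(replacement, line)
        total += count
    return line, total


def find_candidates(line):
    """Report whether a line needs manual review for (c) or (d)."""
    return {
        "pan_arabic": bool(PAN_ARABIC.search(line)),
        "isolated_i": bool(ISOLATED_I.search(line)),
    }


if __name__ == "__main__":
    # Hypothetical sample strings, not taken from the actual data file.
    for sample in ["RV. x. 85", "MBh. iv. 12", "Pāṇ. 3-1, 12"]:
        print(sample, "->", apply_rules(sample), find_candidates(sample))
```

Printing every changed line next to its original, rather than writing the result straight back, makes it possible to compare the substitution counts with the 55 and 7 occurrences quoted above before anything is committed.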
I see that a part of this topic (limited to my point (c), Pāṇ.) was raised earlier by Marcis (@gasyoun) as issue #63, which also got closed, but I am not sure what the conclusion was there. The file I got a few days back from Jim has all of these uncorrected.
@------------------
There sure are different styles adopted in different books, and in my opinion we should not strive to make them uniform (or normalised) in the data portion.
Of course, we can (and should) have such normalization (done internally) for search purposes, as was proposed by Dhaval (@drdhaval2785) elsewhere.
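One way such internal normalization could look is sketched below. It is purely an illustration and not the scheme Dhaval proposed: the stored text keeps the print's Roman numerals, and only a derived search key converts a lowercase Roman numeral standing directly before a comma into digits. The names `roman_to_int` and `search_key` and the sample references are hypothetical.

```python
import re

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}

# A well-formed lowercase Roman numeral (1 to 399) directly followed by a comma.
# The strict shape keeps ordinary words such as "civil," from being converted.
ROMAN_BEFORE_COMMA = re.compile(
    r"\b(?=[ivxlc])c{0,3}(?:xc|xl|l?x{0,3})(?:ix|iv|v?i{0,3})(?=,)"
)


def roman_to_int(numeral):
    """Convert a lowercase Roman numeral such as 'xiv' to an integer."""
    total = 0
    highest = 0
    for char in reversed(numeral):
        value = ROMAN_VALUES[char]
        if value < highest:
            total -= value  # subtractive notation, e.g. the 'i' in 'iv'
        else:
            total += value
            highest = value
    return total


def search_key(reference):
    """Build a key for searching only; the stored reference is not modified."""
    return ROMAN_BEFORE_COMMA.sub(
        lambda match: str(roman_to_int(match.group(0))), reference
    )


if __name__ == "__main__":
    # Hypothetical references for illustration.
    print(search_key("Pāṇ. vi, 1, 12"))
    print(search_key("RV. xiv, 3"))
```

Because the key is computed on the fly, a search for either numeral style can be matched against it while the data portion stays faithful to the print.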