non-english mw words #127
The end results are in two text files:
Suggested first task
Although the Enchant dictionaries do not recognize these words as English words, a large portion of them actually look like English words. These can be checked using the browser's 'define: X' search and/or https://www.merriam-webster.com/. This is just a suggestion. What do you think, @AnnaRybakovaT? Is this a good place to start? What else would you
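The Enchant check described above amounts to partitioning a word list by whether an English spell-checker accepts each word. A minimal sketch, assuming the checker is any function mapping a word to True/False; with PyEnchant installed, `enchant.Dict("en_US").check` would be one such function. The function name `partition_words` is hypothetical, and the small set-based checker in the usage part is only a stand-in.

```python
from typing import Callable, Iterable


def partition_words(
    words: Iterable[str], is_english: Callable[[str], bool]
) -> tuple[list[str], list[str]]:
    """Split words into (accepted, rejected) by an English word checker.

    `is_english` can be any predicate; with PyEnchant one could pass
    enchant.Dict("en_US").check (assumption: PyEnchant is installed).
    """
    accepted: list[str] = []
    rejected: list[str] = []
    for word in words:
        if is_english(word):
            accepted.append(word)
        else:
            rejected.append(word)
    return accepted, rejected


if __name__ == "__main__":
    # Stand-in lexicon used only for this demonstration.
    lexicon = {"acacia", "river"}
    eng, noneng = partition_words(
        ["Acacia", "Capricornus", "river"], lambda w: w.lower() in lexicon
    )
    print(eng)     # ['Acacia', 'river']
    print(noneng)  # ['Capricornus']
```

Passing the checker in as a function keeps the partitioning logic independent of any particular spell-checking library.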
Dear Jim,
@AnnaRybakovaT Your system for identifying the two cases looks consistent and easy to work with. 👍 When this step is done, the next step will be to examine the nf (not-found) cases further in the context of MW usage; we'll think about this further when the time comes.
Construction details
This is a documentation summary of the files constructed leading up to the two files mentioned above.
Dear Jim,
This is only my suggestion. If it is better to do the first step as you described above (that is, to add only "found" and "nf"), I will do it that way.
Adding those extra comments to the 'nf' cases is fine, since they will help in the next step of further analysis of the nf words.
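The "found"/"nf" marking, with optional comments on the nf entries, can be tallied mechanically. A minimal sketch, assuming a hypothetical tab-separated line format `word<TAB>status[<TAB>comment]` (the actual files may be laid out differently); the helper names are hypothetical, and the sketch also reports what fraction of entries has been examined so far.

```python
from collections import Counter


def parse_annotations(lines):
    """Tally 'found'/'nf' marks and collect comments on the nf entries.

    Assumed (hypothetical) line format: word<TAB>status[<TAB>comment].
    A line with no status yet is counted as 'unmarked'.
    """
    counts = Counter()
    not_found = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        word = parts[0]
        status = parts[1] if len(parts) > 1 and parts[1] else "unmarked"
        comment = parts[2] if len(parts) > 2 else ""
        counts[status] += 1
        if status == "nf":
            not_found.append((word, comment))
    return counts, not_found


def fraction_examined(counts):
    """Share of entries already marked 'found' or 'nf'."""
    total = sum(counts.values())
    return (counts["found"] + counts["nf"]) / total if total else 0.0


if __name__ == "__main__":
    sample = ["Capricornus\tfound", "xyz\tnf\tpossibly Latin", "abc"]
    counts, nf = parse_annotations(sample)
    print(nf)                                   # [('xyz', 'possibly Latin')]
    print(round(fraction_examined(counts), 2))  # 0.67
```

Keeping the nf comments alongside the word makes the later analysis of the nf cases a matter of filtering this list.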
It seems @AnnaRybakovaT is back where she belongs; thanks, @funderburkjim, for the guidance.
Dear Jim,
Hi, Anna -- we must have been communicating telepathically, as I was thinking 'Where is Anna?' earlier today!
Indeed. I heard your question a day before you asked it, and asked Anna to push what she has. The task is big, so I proposed that she split it into parts. Shall we have our annual call on the 26th of December? @funderburkjim @drdhaval2785 @AnnaRybakovaT @Andhrabharati @SergeA? Last time it was around 12:00 Moscow time, wasn't it?
Noon Moscow would be 8PM in New York (my time zone). That time is OK with me. I suggest that one discussion point be how to proceed with less from me. I want to spend considerably more time on (a) improving my Sanskrit literacy and (b) a long-standing mathematics project that I have ignored for almost 4 years now. There is a huge backlog of sanskrit-lexicon tasks that are currently assigned to me. I aim to address these, but at a less intensive pace. Perhaps others will adopt some of these tasks, or perhaps others may wish to move the sanskrit-lexicon project in new directions. It will be interesting to see how things unfold.
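Time-zone arithmetic is easy to slip on, so here is a minimal check, assuming fixed offsets of UTC+3 for Moscow and UTC-5 (EST) for New York in late December; the year 2021 and the helper name `convert` are assumptions. With these offsets, noon in Moscow comes out as 4 AM in New York, and 8 PM in New York is 4 AM the next day in Moscow.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for late December: Moscow stays on UTC+3 all year,
# New York is on Eastern Standard Time (UTC-5) in winter.
MOSCOW = timezone(timedelta(hours=3), "MSK")
NEW_YORK_WINTER = timezone(timedelta(hours=-5), "EST")


def convert(wall_clock: datetime, src: timezone, dst: timezone) -> datetime:
    """Interpret a naive wall-clock time in `src` and express it in `dst`."""
    return wall_clock.replace(tzinfo=src).astimezone(dst)


if __name__ == "__main__":
    noon_moscow = datetime(2021, 12, 26, 12, 0)
    print(convert(noon_moscow, MOSCOW, NEW_YORK_WINTER).strftime("%Y-%m-%d %H:%M %Z"))
    # 2021-12-26 04:00 EST
    eight_pm_ny = datetime(2021, 12, 26, 20, 0)
    print(convert(eight_pm_ny, NEW_YORK_WINTER, MOSCOW).strftime("%Y-%m-%d %H:%M %Z"))
    # 2021-12-27 04:00 MSK
```

Fixed offsets avoid depending on a time-zone database; for dates near daylight-saving changes, `zoneinfo` would be the safer choice.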
Many of these words (if not all) can be traced in the mw text by a regex search for the word followed by [^\.], i.e., xxx[^\.]. You seem to have missed some of these because you had removed the ending punctuation mark! So you may want to update your (above) lists after checking.
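The suggested search `xxx[^\.]` can be run over the whole text with Python's `re` module. A minimal sketch, assuming the mw text is loaded as a single string; the function name `occurrences_not_before_period` is hypothetical.

```python
import re


def occurrences_not_before_period(word: str, text: str) -> list[int]:
    """Offsets where `word` starts and the next character is not a period.

    Follows the suggested pattern xxx[^\\.]: re.escape protects special
    characters in `word`, and a leading \\b stops matches in the middle of a
    longer word. Like the original pattern, it needs one more character after
    the word (so a match at the very end of the text is missed) and it also
    accepts a following letter.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"[^.]")
    return [m.start() for m in pattern.finditer(text)]


if __name__ == "__main__":
    text = "Acacia. Capricornus is Acacia tree"
    print(occurrences_not_before_period("Acacia", text))  # [23]
```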
@AnnaRybakovaT Some thoughts after looking at 'words_mw_noneng_temp.txt':
We should probably make use, in other dictionaries, of the words accepted in mw.txt (i.e., those whose spelling we decide to leave unchanged). For example, the word 'Capricornus' appears in AP90.txt and is one that you 'found'. It seems you have examined about 56% of the cases. Keep going!
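Reusing the MW decisions in another dictionary (as with 'Capricornus' in AP90.txt) could start from a simple set lookup. A minimal sketch, assuming both inputs are plain word lists; `reuse_decisions` is a hypothetical name and the sample words are made up.

```python
def reuse_decisions(
    accepted_in_mw: set[str], other_words: list[str]
) -> tuple[list[str], list[str]]:
    """Split another dictionary's word list into (already decided in MW, still to review).

    Order of `other_words` is preserved; matching is exact and case-sensitive.
    """
    already = [w for w in other_words if w in accepted_in_mw]
    pending = [w for w in other_words if w not in accepted_in_mw]
    return already, pending


if __name__ == "__main__":
    mw_accepted = {"Capricornus"}
    ap90_words = ["Capricornus", "exampleword"]  # made-up sample list
    print(reuse_decisions(mw_accepted, ap90_words))
    # (['Capricornus'], ['exampleword'])
```

Exact matching is deliberately conservative: a word differing only in case or spelling stays in the "still to review" list.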
@Andhrabharati re words_notmw.txt: Note that within my analysis (see the Construction details note above). For example, 'Acacia' appears in words_notmw.txt. Within mw.txt, this word DOES occur 113 times,
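Before concluding that a listed word is absent from mw.txt, its whole-word occurrences can be counted directly in the text. A minimal sketch, assuming the text is already loaded as one string; `count_whole_word` and `present_anyway` are hypothetical names, and the counts are case-sensitive.

```python
import re


def count_whole_word(word: str, text: str) -> int:
    """Case-sensitive count of `word` with no word characters adjacent on either side."""
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text))


def present_anyway(candidates: list[str], text: str) -> dict[str, int]:
    """Return the candidates that do occur in `text`, with their counts."""
    found = {}
    for word in candidates:
        n = count_whole_word(word, text)
        if n > 0:
            found[word] = n
    return found


if __name__ == "__main__":
    text = "Acacia Catechu ... an Acacia. Acacias"
    print(present_anyway(["Acacia", "Zzz"], text))  # {'Acacia': 2}
```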
Yes, I checked that they are all marked now, but they weren't at the time I was working on this (March 2021). These are the 4 lines from your mw_iast.txt (dated 04.04.21), which was the last version I considered (after which I stopped tracking mw and shifted to other work):
Anyway, there are only about 500 words in "words_notmw.txt", so it is not a big issue worth discussing further.
Dear Jim,
What would be the agenda, @gasyoun?
One does not know in advance.
Yes, it will increase in 2022-2032.
Sounds like a plan.
Can I send you a mathematician to help out so you can ignore it even longer?
As for Sanskrit literacy, may I ask what exactly you want to read?
Exactly, a kind of ghostword or newEnglish. But since we have German dictionaries with the same issues, could 'ghostword' be used?
Exactly.
So glad @AnnaRybakovaT is back; she is not only beautiful but also smart and hard-working.
For starters, Kale's Hitopadesha, Lanman reader stories, Bhagavad Gita, Peter's Ramopakhyana, maybe Indische Sprüche verses -- I would like to be able to dip into any of these and sight-read with ease.
@funderburkjim I am presently working on SCH and am likely to post the results before the end of this month.
Dear Jim, Addendum to Anna's comment of Jan 24, 2022 (Jim)
May you never feel weakness.
Absolutely impressed.
It's good you started with Kale. Indische Sprüche are mostly hard to understand, as is the Bhagavad Gita at times. Peter's Ramopakhyana is interesting, but still more advanced than the Lanman reader stories.
Good work, @AnnaRybakovaT; you are indeed a smart worker, as @gasyoun mentioned above. I just noticed that there are some omissions and errors in your file, and I'm sure @funderburkjim will review them all before incorporating them into the Cologne files. Here are a few quick ones:
Thanks a lot for checking and for explaining the missing cases (I had no idea what they could be)!
Would you mind regenerating the "latest" IAST and Devanagari files for mw.txt? I have noticed quite a few issues that need correction, and I thought of doing a complete proofing once and for all. This time, I estimate a time frame of about 6-8 months for the full proofing. I hope to see your response on this soon.
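Regenerating IAST files comes down to transliterating the Sanskrit portions of the text. A minimal sketch, assuming (as an illustration) that Sanskrit in mw.txt is written in SLP1 inside `{#...#}` markup; it uses the standard one-character-per-sound SLP1 to IAST correspondences and passes anything else (accent marks, punctuation) through unchanged. The function names are hypothetical, and real files may need extra handling for symbols not in this table.

```python
import re

# Standard SLP1 -> IAST correspondences (SLP1 uses one character per sound).
SLP1_TO_IAST = {
    "a": "a", "A": "ā", "i": "i", "I": "ī", "u": "u", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "e": "e", "E": "ai", "o": "o", "O": "au",
    "M": "ṃ", "H": "ḥ",
    "k": "k", "K": "kh", "g": "g", "G": "gh", "N": "ṅ",
    "c": "c", "C": "ch", "j": "j", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "t": "t", "T": "th", "d": "d", "D": "dh", "n": "n",
    "p": "p", "P": "ph", "b": "b", "B": "bh", "m": "m",
    "y": "y", "r": "r", "l": "l", "v": "v",
    "S": "ś", "z": "ṣ", "s": "s", "h": "h",
}


def slp1_to_iast(s: str) -> str:
    """Transliterate SLP1 to IAST character by character; unknown characters pass through."""
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in s)


def convert_marked_sanskrit(line: str) -> str:
    """Convert only the text inside {#...#}, assumed here to mark SLP1 Sanskrit."""
    return re.sub(r"\{#(.*?)#\}", lambda m: "{#" + slp1_to_iast(m.group(1)) + "#}", line)


if __name__ == "__main__":
    print(convert_marked_sanskrit("{#kfzRa#}, name of a hero"))  # {#kṛṣṇa#}, name of a hero
```

Because SLP1 maps each sound to a single ASCII character, a plain per-character lookup is enough; no longest-match tokenizing is needed in this direction.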
Would you be interested in doing this [as @funderburkjim is either not interested in this proposal or has not yet "seen" the post above (being busy with the PWG ls work)]? Otherwise, I will take up some other big, long-term work starting a few days from now.
If you want new Devanagari files, I can make them.
https://github.com/sanskrit-lexicon/csl-devanagari/blob/main/v02/mw/mw.txt is the latest MW Devanagari version.
In the last file by @AnnaRybakovaT in #127 (comment), both
are correct in the text, being the plurals of Rakshas and Ushas respectively, so no change is required in those words.
I had seen you copying Anna's work after a gap of 6 months, and now another year and a half has elapsed.
@Andhrabharati I am taking up the review of words_mw_noneng_1.txt.
Processing of non-English words
The work directory is unique_eng.
For a few old words, these were useful:
For Latin words, this was sometimes useful: https://www.online-latin-dictionary.com/latin-english-dictionary.php
Further research and usage
There is a lot of good information in the research by @AnnaRybakovaT and @Andhrabharati. It is not clear where to put it so that it will be available when needed another time. Maybe where @drdhaval2785 has put his word studies.
Though you mentioned that (Anna's and) my 'research' contained some good info, you ignored/skipped this post above.
A quick look at the 40 print changes prompted me to comment as follows:
;; AB there are a few more cases of such 's-z' variants: realization (5) vs. realisation (4); cauterization (4) vs. cauterisation (1)
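Spelling variants like these can be found automatically by pairing each -iz- form with its -is- twin and comparing counts. A minimal sketch over a plain-text string; `ize_ise_pairs` is a hypothetical name, it checks only a few suffixes, and it would also pair genuinely different words (e.g. prize/prise), so its output is a list of candidates for manual review, not a list of errors.

```python
import re
from collections import Counter


def ize_ise_pairs(text: str) -> dict[str, tuple[int, int]]:
    """Find words spelled both with -iz- and -is- (e.g. realization / realisation).

    Returns {z_spelling: (z_count, s_count)} for words where both spellings occur.
    Words are lowercased; only the suffixes listed in the regex are checked.
    """
    counts = Counter(w.lower() for w in re.findall(r"[A-Za-z]+", text))
    pairs = {}
    for word, n in counts.items():
        m = re.fullmatch(r"(\w+)iz(e|ed|es|ing|ation|ations)", word)
        if m:
            twin = m.group(1) + "is" + m.group(2)
            if twin in counts:
                pairs[word] = (n, counts[twin])
    return pairs


if __name__ == "__main__":
    sample = "realization realization realisation cauterization cauterisation size"
    print(ize_ise_pairs(sample))
    # {'realization': (2, 1), 'cauterization': (1, 1)}
```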
Another piece of information that I wanted to present here:
This does not mean "per annum" as Anna thought; from the context (there are a few more places where pw uses "per anum"), it seems to mean "through/by the anus", anum being the inflected (accusative) form of the Latin word anus.
@Andhrabharati Revised per your comment(s). For details, see commits above. |
I presumed that these two plurals would/should also be marked, as
@Andhrabharati
This comment is one branch of #99.
By various means (which I'll describe below, tomorrow), a list of 1509 words was developed which