Metaline <pc> not containing the column data #15

Closed
Andhrabharati opened this issue Mar 26, 2022 · 43 comments

@Andhrabharati

Andhrabharati commented Mar 26, 2022

The SKD metalines lack the column ('c') part of the 'pc' (page-column) field!

Since @drdhaval2785 says he regularly consults SKD, he might be interested in incorporating the values with a small script; without the column indication, it is somewhat time-consuming to look for the entry word on the page (especially in the online scans).

Or @funderburkjim might do this himself.

Here is the corresponding necessary data generated from the text file itself--
SKD pc values in metalines.txt

@funderburkjim
Contributor

This is similar to the pc (page-column) errors in md.txt (refer: sanskrit-lexicon/MD#7).

@AnnaRybakovaT This would be a good next project for you, if time permits before you leave for the season. What do you say?

@Andhrabharati
Author

@funderburkjim
You might wish to make the posted file uniform throughout (I deliberately did not do this myself!) before @AnnaRybakovaT uses it for the replacement.

@funderburkjim
Contributor

@Andhrabharati Looks uniform to me. Where is it not uniform?

@Andhrabharati
Author

The 2nd column has the <pc> tag only in the first 200+ lines (out of 42K+ lines).

Either it should be present throughout, or absent everywhere.
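
One way to cope with both row shapes is a pattern that treats the tag as optional. This is only a sketch: it assumes the rows are tab- or space-separated and that the untagged rows simply omit `<pc>` (the untagged sample row and the name `normalize_row` are made up for illustration, not taken from the posted file).

```python
import re

# Assumed row shapes (the untagged form is a guess based on the description):
#   <L>1     <pc>1-001   [1-001-a]
#   <L>250   1-010       [1-010-b]
ROW_RE = re.compile(r'^<L>(\S+)\s+(?:<pc>)?(\S+)\s+\[(\S+?)\]\s*$')

def normalize_row(line, with_tag=True):
    """Return the row in one uniform shape, or None if it does not parse."""
    m = ROW_RE.match(line)
    if m is None:
        return None
    L, pcold, pcnew = m.groups()
    tag = '<pc>' if with_tag else ''
    return '<L>%s\t%s%s\t[%s]' % (L, tag, pcold, pcnew)

tagged = normalize_row('<L>1\t<pc>1-001\t[1-001-a]')
untagged = normalize_row('<L>250\t1-010\t[1-010-b]')
print(tagged)
print(untagged)
```

Passing `with_tag=False` would instead drop the tag everywhere, which matches the "absent everywhere" alternative.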

@funderburkjim
Contributor

ok. We can work around that difference

@Andhrabharati
Author

Here is the updated uniformly 'constructed' version of the above file--
SKD pc values in metalines.txt

@AnnaRybakovaT
Contributor

This would be a good next project for you, if time permits before you leave for the season. What do you say?

I agree))

@funderburkjim
Contributor

@AnnaRybakovaT I'll get instructions for you on this soon.
In the meantime, do some reading on Python dictionaries, since this will be useful in
the solution of this improvement that @Andhrabharati has set for us.

An introduction to Python dictionaries: https://www.w3schools.com/python/python_dictionaries.asp.

As usual, our programs use only some of the features of dictionaries.
You can try the following in an interactive session (python -i) or in a test program (temp.py)

d = {}  # initialize an empty dictionary
d['a'] = 0  # set 'a' to be a *key* of the dictionary with value 0
d['a'] = d['a'] + 1  # set the value of the dictionary at 'a' to be 1 more than it was
if 'b' in d:  # test if 'b' is a key of the dictionary
    print("'b' is a key of d")
else:
    print("'b' is not a key of d")

keys = d.keys()  # gather all the keys of the dictionary
for key in keys:  # loop over the keys
    print("value of d at key %s is %s" % (key, d[key]))

Small exercise: write a program that takes a string (such as
the dog and the cat sat in a hat) and writes a count of each letter appearing in the string.
e.g.

t 5
h 3
etc.

Use a dictionary.

You might vary your program by using the Python 'sorted' function so that the
letters print in alphabetical order.
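
One possible solution sketch for the exercise, assuming that only alphabetic characters are counted and spaces are skipped (the function name `count_letters` is made up for illustration):

```python
def count_letters(text):
    """Return a dictionary mapping each letter in text to its count."""
    counts = {}
    for ch in text:
        if not ch.isalpha():  # skip spaces and punctuation
            continue
        if ch in counts:
            counts[ch] = counts[ch] + 1
        else:
            counts[ch] = 1
    return counts

counts = count_letters("the dog and the cat sat in a hat")
for letter in sorted(counts):  # sorted() yields the keys in alphabetical order
    print(letter, counts[letter])
```

The `if ... in counts` test is the same key test shown earlier; `counts.get(ch, 0) + 1` would be a shorter way to write the same update.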

@funderburkjim
Contributor

@AnnaRybakovaT
Please clone this repository (https://github.com/sanskrit-lexicon/SKD).

I've made a stub directory (corrections/issue15/) where our work can go.
Thus far, there is only a brief readme.txt file in issue15 directory.

This project seems very similar to the MD pc errors project.

Rather than having me set up the project, why don't you give it a try?
Bring over to the issue15 directory as much of the material from https://github.com/sanskrit-lexicon/MD/tree/master/deva_iast_comp/step2a as needed.
And begin the process of altering the programs and readme, etc. to fit our issue 15 task.

Get as far as you can, then push your revised skd repository, and formulate questions where you might get stuck.

@AnnaRybakovaT
Contributor

Please clone this repository

Dear Jim,
Thanks so much for your detailed instructions!
I need a few days before I can start this task. In a few days we have to open our tourist shops, but they are still not ready. From morning until late evening I am busy with preparations, and when I come home my brain and body refuse to do anything. So for now I prefer to focus only on the shops, to finish that work as soon as possible.

@AnnaRybakovaT
Contributor

Get as far as you can, then push your revised skd repository, and formulate questions where you might get stuck.

Dear Jim,
I am here again. Sorry that "some days" took so long; to be honest, today is my first free evening since the beginning of this month. Unexpectedly, our island has a lot of tourists, and every working day after 2 difficult years is important.

So I suppose we need:

  • digentry.py
  • updateByLine.py
  • SKD.pc.values.in.metalines.txt
  • test_make_change_pc.py (I can rename this file regarding our current task)
  • temp_SKD_0.txt (where can I find this file?)

Just now I can't push the revised skd repository:

Rybakova@ST-Rybakova MINGW64 ~/Documents/sanskrit-lexicon/SKD/corrections/issue15 (master)
$ git push
remote: Permission to sanskrit-lexicon/SKD.git denied to AnnaRybakovaT.
fatal: unable to access 'https://github.com/sanskrit-lexicon/SKD/': The requested URL returned error: 403

@funderburkjim
Contributor

@AnnaRybakovaT Hi! Nice to hear from you. Which is 'our island'?

Try push again, think it should work for you now.

@Andhrabharati
Author

Andhrabharati commented Apr 26, 2022

She lives somewhere in 'Greece', as I understand, @funderburkjim!

@Andhrabharati
Author

Andhrabharati commented Apr 26, 2022

I am here again. Sorry that "some days" took so long; to be honest, today is my first free evening since the beginning of this month.

In fact, you have returned much sooner, @AnnaRybakovaT (another post of yours elsewhere mentioned that your return would be sometime after next November)!

@AnnaRybakovaT
Contributor

AnnaRybakovaT commented Apr 26, 2022

Try push again, think it should work for you now.

Yes, it works!

I live on Patmos; it is a tiny, beautiful island with 3000 local inhabitants and 20 000 tourists during the summer.
I am Russian. I was living in Moscow, but 6 years ago I had a vacation on this island and met my future husband, for whom life in Moscow was absolutely impossible, so I had to move to Greece. In this way I opened a new page of my life, 100% different from the previous one.

@AnnaRybakovaT
Contributor

(another post of yours elsewhere mentioned that your return would be sometime after next November

That is true, but first I would like to finish the current task.

@funderburkjim
Contributor

@AnnaRybakovaT

These suggestions may help you get started.

get temp_skd.txt

Get latest version of skd.txt from https://github.com/sanskrit-lexicon/csl-orig/blob/master/v02/skd/skd.txt, rename the file as temp_skd.txt, and move
temp_skd.txt into this skd/corrections/issue15 directory

You can use the 'download' button on the page above,
or use the following 'curl' command

curl https://raw.githubusercontent.com/sanskrit-lexicon/csl-orig/master/v02/skd/skd.txt -o temp_skd.txt

start modifying program

test_make_change_pc.py is the program previously used.

The command to run this program will be
python test_make_change_pc.py temp_skd.txt SKD.pc.values.in.metalines.txt changes.txt

First, put an 'exit(1)' statement after 'entries = digentry.init(filein)',
and run the program. It should properly read in temp_skd.txt into an
array of entries.

modify Pcerror class

The __init__ method of the class parses a line of SKD.pc.values.in.metalines.txt.
Our first line is

<L>1	<pc>1-001	[1-001-a]

We need three attributes for the instances of our Pcerror class.
Let's call the attribute names L, pcold, and pcnew.
For the first line, the values will be the strings 1, 1-001, and 1-001-a.
Design a regex to do the parsing:
m = re.search(regex,line)
self.L = m.group(1), etc.
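
A minimal sketch of such a class, assuming the three fields are separated by tabs or spaces exactly as in the sample line. The regex and the error handling are illustrative guesses, not the code in the repository.

```python
import re

# Hypothetical pattern for rows such as "<L>1\t<pc>1-001\t[1-001-a]"
LINE_RE = re.compile(r'^<L>(\S+)\s+<pc>(\S+)\s+\[(\S+?)\]\s*$')

class Pcerror(object):
    def __init__(self, line):
        line = line.rstrip('\r\n')
        m = LINE_RE.search(line)
        if m is None:
            # Fail loudly so that a malformed row is noticed early
            raise ValueError('cannot parse line: %r' % line)
        self.L = m.group(1)      # entry number, e.g. '1'
        self.pcold = m.group(2)  # current page value, e.g. '1-001'
        self.pcnew = m.group(3)  # page-column value, e.g. '1-001-a'

rec = Pcerror('<L>1\t<pc>1-001\t[1-001-a]\n')
print(rec.L, rec.pcold, rec.pcnew)
```

Raising an error on a non-matching line is one design choice; skipping and logging such lines would be another.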

Also, in init_pcrecs, remove the dbg statements (they are not relevant now).
Finally, move the exit(1) statement to go after 'pcrecs = init_pcrecs(...)'
and rerun your program.

When this part is working properly, we'll be ready to modify generate_changes.

Note: Keep readme.txt up to date.

@funderburkjim
Contributor

@AnnaRybakovaT Haven't heard from you for a while.
Are you waiting on me, or just busy with other things?

@AnnaRybakovaT
Contributor

@AnnaRybakovaT Haven't heard from you for a while. Are you waiting on me, or just busy with other things?

Dear Jim,
I am sorry again for the pause; I will be back in a week.

Next week I am taking part in an annual conference on Oriental studies in St. Petersburg (via Zoom, of course), and I have to devote all my free time to preparing my presentation. Since such scientific work is no longer my professional field and I have lost a lot of skills over the last 4 years, it takes me much more time just to write one article or prepare one speech. In any case it is a great pleasure for me: I have the opportunity to learn something new (the topic of my research is Nepal) and still keep in touch with the Russian Oriental studies community.

@AnnaRybakovaT
Contributor

Haven't heard from you for a while.

Dear Jim,
I hope you will excuse me; I have disappeared again.
I brought my laptop to the store about 2 weeks ago (since my current schedule is 10 am - 10 pm in the store without a day off, and soon it will be until midnight), expecting to work a bit from the store, but only today have I managed to switch it on. Every day we have more and more customers in the shop, so I would like to ask you to let me continue this task after our tourist season. I feel so sorry that I can't complete this task now; every day I think about this unfinished business. To be honest, I hoped to do at least something today, but in just the last 15 minutes I had to attend to people inside the store, and I realize that in such conditions it is impossible to focus on anything else. So I hope we can continue this task in the autumn. If everything goes well, I will come back in November or December.

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Jun 7, 2022
funderburkjim added a commit that referenced this issue Jun 7, 2022
@funderburkjim
Contributor

@Andhrabharati Went ahead and did this correction. Enjoy!
Old:
image

new:
image

@funderburkjim
Contributor

@AnnaRybakovaT
When you get back to this, the solution may be of interest.
I used a 'slow', but conceptually simple, linear search to match records in SKD.pc.values.in.metalines.txt to entries in skd.txt.

A good next learning step would be to
replace this linear search with a much faster python 'dictionary' lookup.
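
To illustrate the idea, here is a rough comparison of the two approaches. The tuples below are hypothetical stand-ins for the parsed records (L, pcold, pcnew); the record objects in the actual program may look different.

```python
# Hypothetical parsed records: (L, pcold, pcnew)
pcrecs = [
    ('1', '1-001', '1-001-a'),
    ('2', '1-001', '1-001-a'),
    ('3', '1-001', '1-001-b'),
]

def find_linear(L, recs):
    """Linear search: scans the list again for every lookup."""
    for rec in recs:
        if rec[0] == L:
            return rec
    return None

def build_index(recs):
    """Build the dictionary once: the key is L, the value is the record."""
    index = {}
    for rec in recs:
        index[rec[0]] = rec
    return index

index = build_index(pcrecs)
for L in ['3', '1', '99']:
    rec = index.get(L)  # returns None when L is not a key
    if rec is None:
        print("no pc record for L=%s" % L)
    else:
        print("L=%s: %s -> %s" % (L, rec[1], rec[2]))
```

With tens of thousands of records, the linear search repeats a scan of the list for each entry, while the dictionary is built once and each lookup afterwards takes roughly constant time.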

@Andhrabharati
Author

@Andhrabharati Went ahead and did this correction. Enjoy!

As I had mentioned elsewhere, I rarely refer to SKD; so nothing much to enjoy for me.

@drdhaval2785 might feel it so, probably.

@drdhaval2785

Yes. I do use SKD. Any improvement there is useful to me.

@gasyoun
Member

gasyoun commented Jun 16, 2022

Any improvement there is useful to me.

What major improvements are still lacking?

@AnnaRybakovaT
Contributor

I used a 'slow', but conceptually simple, linear search to match records in SKD.pc.values.in.metalines.txt to entries in skd.txt.

Dear Jim and dear all,
I hope you are well.

I am back!!!)))
Fortunately, this tourist season in Greece has been quite long, busy and successful. Now I have a "winter" break for a few months. Shall I focus on this issue, or would you prefer that I do other tasks?

@funderburkjim
Contributor

@AnnaRybakovaT
Hi, Welcome back!

AFAIK, this issue has been resolved: all the metalines for skd now have a column designation, like the '-b' in
<L>25050<pc>3-511-b<k1>BikzukaH<k2>BikzukaH.
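
For reference, a metaline of this shape can be split into its fields with a regex. This is a sketch that assumes the fields appear in the order `<L>`, `<pc>`, `<k1>`, `<k2>` and that a column is a single lowercase letter; `parse_metaline` and `has_column` are hypothetical names, not functions from the repository.

```python
import re

# Each field runs up to the next '<'; anything after <k2>'s value is ignored
META_RE = re.compile(r'^<L>([^<]*)<pc>([^<]*)<k1>([^<]*)<k2>([^<]*)')

def parse_metaline(line):
    """Return a dict of the metaline fields, or None if the line does not match."""
    m = META_RE.match(line)
    if m is None:
        return None
    return {'L': m.group(1), 'pc': m.group(2), 'k1': m.group(3), 'k2': m.group(4)}

def has_column(pc):
    """True for values like '3-511-b', False for values like '3-511'."""
    return re.fullmatch(r'\d+-\d+-[a-z]', pc) is not None

meta = parse_metaline('<L>25050<pc>3-511-b<k1>BikzukaH<k2>BikzukaH')
print(meta['pc'], has_column(meta['pc']))
```

Running `has_column` over every metaline would be one way to confirm that no entry is still missing its column.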

I have a non-programming task regarding quality of pdf images for mw dictionary. Will open a new issue and describe it further as exercise for you. I aim to do this soon, but my mind is elsewhere at the moment.

@Andhrabharati
Author

Andhrabharati commented Dec 10, 2022

A good next learning step would be to
replace this linear search with a much faster python 'dictionary' lookup.

@funderburkjim
I think Anna would like to learn this.

And I can arrange the replacement for MW 'bad' pages from my physical copy (London print), if required.

Also I have made a better copy from the archive version of the pdf, from its unprocessed (uncompressed) image (jpeg2) pages, if it sounds interesting.

@Andhrabharati
Author

Andhrabharati commented Dec 10, 2022

Speaking of having the better scan pages, how about replacing the present SKD scans with the excellent SKD scans shared by Thomas recently (appropriately downsized)?
#14 (comment)

@AnnaRybakovaT
Contributor

I have a non-programming task regarding quality of pdf images for mw dictionary. Will open a new issue and describe it further as exercise for you.

Excellent! I am waiting for further details.

@gasyoun
Member

gasyoun commented Dec 12, 2022

Speaking of having the better scan pages, how about replacing the present SKD scans with the excellent SKD scans shared by Thomas recently (appropriately downsized)?

@funderburkjim I believe there is no reason why not?

@funderburkjim
Contributor

retract mw scan review?

In reviewing my notes regarding alleged bad scan pages for MW, I had noted about 10 such instances.
However, it seems all but two of these were NOT bad scans! Maybe I was hurrying and misidentified.
The only 2 of those 10 that do need replacement are

  • page 63, 1st column aBi-dUzita bad print
  • page 87 bottom

Thus, we should find replacements for these 2 pages.
But maybe a review of ALL mw pages is not a good use of time for @AnnaRybakovaT.
@Andhrabharati . What is your opinion?

@funderburkjim
Contributor

skd scan replacement

@Andhrabharati - If you have a replacement of all the scans, I can develop instructions for you to get those in a form
which I can easily install into sanskrit-lexicon web site.

@funderburkjim
Contributor

all dictionaries needing better scans.

I know there is at least one dictionary that could use better quality scans - GRA (Grassmann). A few of the sanskrit-lexicon scans for GRA have missing data (e.g. page 115), and many pages are of marginal quality and highly skewed. It would be nice to have good quality scans for the entire dictionary.

The scan quality of many of the Cologne dictionaries is unknown to me. I think it would be good to review the scans for all of the dictionaries, evaluate their quality, and determine whether a few scans or all of the scans should be replaced; and then find good replacement scans and go through the process of installing these at Cologne.
As with GRA, it would be nice to be assured that the scan links are as good as possible for the dictionaries covered by the Cologne sanskrit-lexicon.

Taking the lead in such a comprehensive review could be a valuable contribution for @AnnaRybakovaT.

Request comments by others.

@Andhrabharati
Author

Andhrabharati commented Dec 14, 2022

But maybe a review of ALL mw pages is not a good use of time for @AnnaRybakovaT.
@Andhrabharati . What is your opinion?

I feel spending a few hours (3-4) [or even a day or two] is not a bad deal, as it would eliminate any bad scans (if present) in the repo.

@Andhrabharati
Author

Andhrabharati commented Dec 14, 2022

I know there is at least one dictionary that could use better quality scans - GRA (Grassmann).
Request comments by others.

I do have a good scan of GRA (1873) from the Bayerische Staatsbibliothek, and also a revision of the work by Maria Kozianka (1996).
This revision is similar to the revision/update of Bloomfield's Vedic Concordance; and incidentally both these revisions took place about a century later (with respect to the original editions)!

It would be a plausible option to update Cologne's GRA with this 1996 work-- @thomasincambodia and @funderburkjim may ponder on this suggestion.

Finally, isn't it better to talk about this scan pages matter at a 'new' issue, instead of here at this 'closed' issue?

@Andhrabharati
Author

skd scan replacement

@Andhrabharati - If you have a replacement of all the scans, I can develop instructions for you to get those in a form which I can easily install into sanskrit-lexicon web site.

Yes, I have stored the SKD scan pages from Thomas and can do the needful.
[The pcloud link, where these were shared earlier by Thomas, is expired now.]

@gasyoun
Member

gasyoun commented Dec 14, 2022

It would be a plausible option to update Cologne's GRA with this 1996 work-- @thomasincambodia and @funderburkjim may ponder on this suggestion.

It will become a copyright issue.

good scan of GRA (1873) from the Bayerische Staatsbibliothek, and also a revision of the work by Maria Kozianka (1996).
This revision is similar to the revision/update of Bloomfield's Vedic Concordance; and incidentally both these revisions took place about a century later

Interesting note

I feel spending a few hours (3-4) [or even a day or two] is not a bad deal, as it would eliminate any bad scans (if present) in the repo.

4 hours would not be enough to even open all these pages.

Request comments by others.

We could try.

@funderburkjim
Contributor

@AnnaRybakovaT Instructions for reviewing mw scans posted at sanskrit-lexicon/MWS#144

@funderburkjim
Contributor

Discussion regarding skd scans is moved to #16.

@funderburkjim
Contributor

python dictionary learning

Anna has made some progress with Python. From the comments above, the next reasonable step in her Python study seems to be the Python 'dictionary' data structure.

@AnnaRybakovaT Are you still interested in furthering your Python skills?

If so, there are many online resources that can get you started with Python dictionaries.
I think the w3schools Python dictionary material is a good starting point.

@funderburkjim
Contributor

thomas new github name

Thomas changed his github name to @maltenth.
@thomasincambodia is no longer present.

@AnnaRybakovaT
Contributor

Are you still interested in furthering your Python skills?

Dear Jim,
Thanks for such kind words. Yes, I am interested in this skill.
After the mw scan task I can study the 'dictionary' data structure further, unless you offer me another, more pressing issue.
