Replace Pali lookup with new system from digitalpalidictionary (DPD) #2733
Starting to look at this issue, it appears that /suttacentral/client/elements/lookups/sc-lookup-pli.js is the code that needs to be modified on SC. There is this file in the TheBuddha'sWords repo that I believe is used for the definition lookup: https://github.com/thebuddhaswords/BW2/blob/main/js/dpd_deconstructor.js |
So I wanted to dig in and see what might actually happen when we change this. So I went to MN111 and clicked on

This breaks the word in two and gives links to two separate sc definition pages: puthu and pañña

I'm assuming that we would be using similar data to that found here that includes the deconstructor data that splits compounds in dpd_deconstructor.js, the inflected to the non-inflected (aka headword) in dpd_i2h.js and finally the definition in dpd_ebts.js

So if we use the DPD data, we first have to look up

So we would end up with something like this:

This means that the clickable headword of the definition would be

Is this what we want? Would we expect that the individual parts be turned into links to their own definition pages?

Question on integration

So is the intention to remove the CPED and replace it with the definitions from the DPD? Will the CPED be needed at all after this change is implemented? @buddhist-uni Thoughts? |
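To make the three-step lookup described above concrete, here is a minimal sketch. The table shapes (a word mapped to a list of parts, an inflected form mapped to a list of headwords, a headword mapped to a definition string) are assumptions for illustration and may not match the real dpd_deconstructor.js, dpd_i2h.js and dpd_ebts.js exports; the entries are toy values with placeholder definitions, and `lookupWord` is a hypothetical name.

```javascript
// Toy stand-ins for the three DPD tables. Shapes and entries are assumptions
// for illustration; the definitions are placeholders, not real DPD text.
const deconstructor = new Map([["puthupañño", ["puthu", "pañño"]]]);
const inflectedToHeadwords = new Map([
  ["puthu", ["puthu"]],
  ["pañño", ["pañña"]],
]);
const definitions = new Map([
  ["puthu", "(placeholder definition of puthu)"],
  ["pañña", "(placeholder definition of pañña)"],
]);

// Hypothetical helper: resolve a clicked word to parts, headwords and definitions.
function lookupWord(word) {
  // Normalise so that composed and decomposed diacritics hit the same key.
  const key = word.normalize("NFC").toLowerCase();
  // Step 1: use the word as-is if it is a known inflected form,
  // otherwise fall back to its compound split (or the bare word).
  const parts = inflectedToHeadwords.has(key)
    ? [key]
    : deconstructor.get(key) ?? [key];
  return parts.map((part) => ({
    part,
    // Step 2: inflected form -> headwords. Step 3: headword -> definition.
    entries: (inflectedToHeadwords.get(part) ?? []).map((headword) => ({
      headword,
      definition: definitions.get(headword) ?? null,
    })),
  }));
}
```

With this shape, each part of a compound carries its own headword, so turning the individual parts into links to their own definition pages would only need the `headword` field of each entry.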
Yes, that sounds pretty good, I haven't looked at the details. The main thing would be to look at what the DPD offers and make the best use of that, rather than trying to make it fit in with what we already have. As for removing CPED, yes, that would be the result. The CPED is only half done, as we didn't (and still don't) have all the updated entries from Cone. DPD seems to incorporate these as well (judging from headword definitions) so there's no point in duplicating it. We only really did the CPED because of lack of anything better. |
I agree that this would be a helpful improvement to the site 😊 |
One thing that I hadn't considered is that the lookup feature doesn't apply just to English, but several languages: It appears that if you have, for example, Spanish set as the site language and Spanish lookup set, that when you click at the bottom to go to the definition page (e.g. https://suttacentral.net/define/bhaga?lang=es) that it will use the Spanish version of the NCPED. So I guess even if we replace the English NCPED, we will need to keep the same code working for the other languages? |
I have made some tests in this PR: #3142 |
@cittadhammo Glad to see you're on the case! It is already looking good. Here are my suggestions for improvement.
Let me know your thoughts. @sujato what would your suggestion be for integrating a monthly DPD update? I can either export to a specified location in my own dpd-db repo or make a commit / pull request to a repo of your choice. |
Not sure if I understand point 1:
This gives the following result:

This does not add any more information in my opinion; doesn't it just clutter the UI? Oh, but I can see now in
I guess we could try to group similar headwords to give the following result:
Why are the values in

I understand the other point; I will try to work on it soon. ;-) |
Ok, I modified the behaviour to have something like this:

I think this is what you meant @bdhrs (https://www.phind.com/ was my friend on this one...) |
There is currently no user reference for this on SC. I believe the philosophy is to keep things as simple as possible. "Decisions not options" and all. But this is something Bhante @sujato needs to weigh in on.

I'm not sure if everyone has noticed that when you click on one of the headwords in the lookup definitions (so in the image above

The other bit that is problematic is that the DPD often has an entry for what would be considered a compound word. So the clickable headword in the lookup definition will (even once the DPD is showing there) take you to a definition page that won't include the PED.

I would love it if, for example, the words circled in green below could be clickable and take you to a definition page. I'm not sure if there is a simple way to only have the real words clickable, e.g. not have

I think that once this (wonderful!) feature is ready for wider testing, we should have it live on the staging site for a while so more people can test it out. Thank you so much for working on this. It's going to be such a vast improvement over the existing lookup feature. |
@thesunshade Yes, that would be nice, and it was an old idea. It is like this in the Tipitaka Pali Reader, I believe ;-) Can you invite the main dev on this one? At present I do not understand how the Pali lookup picks the words in the page. A few hints would be nice. |
@ihongda is the main dev. @blake-sc was the one who originally adapted the lookup feature from the DPR, although I don't believe he is active here now.
Do you mean how the clicking works, or what happens between the click and the definition display? I'm not sure either. |
I was just outside, will reply later. |
https://github.com/suttacentral/suttacentral/blob/main/client/elements/text/sc-text-bilara.js line 1260
When Pali lookup is enabled, the event will be added to the span where the Pali word is located, and when clicked, a Pali definition search will be performed. |
Ok, thank you @ihongda. Maybe it suffices to add those specific spans around the Pali words generated by the dictionary and bind them to the event listener. From what I can see we could do:
This should make them clickable. I can see that it is quite intricate, but let me know if I am on the right track ;-) |
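A rough sketch of the span idea discussed above, for reference. The class name `lookup-word` and both function names are hypothetical and are not taken from sc-text-bilara.js; the real lookup may use different markup and a different binding mechanism. The sketch wraps each word of a dictionary-generated string in a span and uses a single delegated click listener on the container instead of one listener per span.

```javascript
// Escape text before placing it inside generated HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap every token that contains a letter in a span; separators such as
// "+" and whitespace are left unwrapped. The class name is an assumption.
function wrapWordsInSpans(text, className = "lookup-word") {
  return text
    .split(/(\s+)/) // the capture group keeps the whitespace tokens
    .map((token) =>
      /\p{L}/u.test(token)
        ? `<span class="${className}">${escapeHtml(token)}</span>`
        : escapeHtml(token)
    )
    .join("");
}

// Delegated listener: one handler on the container covers all current and
// future word spans inside it.
function bindLookup(root, onLookup, className = "lookup-word") {
  root.addEventListener("click", (event) => {
    const span = event.target.closest(`.${className}`);
    if (span) onLookup(span.textContent);
  });
}
```

Delegation is a design choice here: since the lookup popup content is regenerated on every click, a listener on the container avoids re-binding each time the inner HTML changes.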
I just noticed in the DevTools that the new version of the lookup seems to involve the downloading of two large assets:

They aren't that big, but it's worth considering the performance impact. Related to this, I wonder if it might make sense to create a dictionary API to be able to serve up just the definitions that are needed. Of course, once there is a dictionary API, there are probably lots of other cool things we could be doing with it.

Edit to add: OK, it appears that there is indeed some kind of dictionary glossary. It is fetched on the definition pages: https://suttacentral.net/api/glossary

And there is one for specific words that returns the PTS entry, e.g. https://suttacentral.net/api/dictionary_full/saddhi%E1%B9%81?language=en

So now I'm wondering if we can't offer up the full DPD entry using that same API. Here is the json file of the PED: https://github.com/suttacentral/sc-data/blob/main/dictionaries/complex/en/pli2en_pts.json And here is an example of an entry:
You see that the definition is just html. |
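As a sketch of what fetching only the needed definitions might look like on the client, here is a small helper built around the `dictionary_full` URL quoted above. The function names and the in-memory cache are assumptions, and the shape of the JSON response is not specified here, so the result is returned unparsed beyond `response.json()`.

```javascript
// Base path taken from the example URL in this thread.
const API_BASE = "https://suttacentral.net/api/dictionary_full";

// Build the request URL; encodeURIComponent handles Pali diacritics.
function buildDictionaryUrl(word, language = "en") {
  return `${API_BASE}/${encodeURIComponent(word)}?language=${encodeURIComponent(language)}`;
}

// Hypothetical per-session cache so a repeated click on the same word
// does not trigger another request.
const definitionCache = new Map();

async function fetchDefinition(word, language = "en", fetchImpl = fetch) {
  const url = buildDictionaryUrl(word, language);
  if (!definitionCache.has(url)) {
    const response = await fetchImpl(url);
    if (!response.ok) {
      throw new Error(`Dictionary lookup failed with status ${response.status}`);
    }
    definitionCache.set(url, await response.json());
  }
  return definitionCache.get(url);
}
```

Compared with downloading the full lookup tables up front, this trades one larger initial download for a small request per clicked word, which may or may not be the better fit for offline use.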
Yes, this is one of my questions, as stated in the PR:
Also, if the DPD is implemented in the API, it would also serve the definition page https://suttacentral.net/define, wouldn't it? |
Not sure where it's best to have discussions (here or on gitter) but I wanted to make a record of two outstanding issues:
|
Yes, I have them in my important to-do list: https://github.com/suttacentral/suttacentral/wiki/DPD-Pali-Lookup-Implementation#to-do I think I will focus on that in the next few days. |
Our Pali lookup was originally created by Blake based on the then-best approach. A new approach that is far more accurate has been created by the Digital Pali Dictionary project.
We should investigate how this works and see if it can be applied to SC and Bilara.
related but deleted files
https://github.com/digitalpalidictionary/dpd-db/blob/main/tbw/tbw_exporter.py
https://github.com/digitalpalidictionary/dpd-db/tree/main/tbw
https://github.com/digitalpalidictionary/dpd-db/tree/main/tbw/output