

Replace Pali lookup with new system from digitalpalidictionary (DPD) #2733

Open
sujato opened this issue Jul 31, 2023 · 19 comments
Labels
Type: improvement Make stuff better

Comments

@sujato
Contributor

sujato commented Jul 31, 2023

Our Pali lookup was originally created by Blake, based on the best approach available at the time. The Digital Pali Dictionary project has since created a new approach that is far more accurate.

We should investigate how this works and see if it can be applied to SC and Bilara.

Related (but since deleted) files:

https://github.com/digitalpalidictionary/dpd-db/blob/main/tbw/tbw_exporter.py

https://github.com/digitalpalidictionary/dpd-db/tree/main/tbw

https://github.com/digitalpalidictionary/dpd-db/tree/main/tbw/output

@thesunshade thesunshade added the Type: improvement Make stuff better label Feb 28, 2024
@thesunshade thesunshade self-assigned this Apr 14, 2024
@thesunshade thesunshade changed the title replace pali lookup with new system from digitalpalidictionary Replace Pali lookup with new system from digitalpalidictionary (DPD) Apr 14, 2024
@thesunshade
Collaborator

Having started to look at this issue, it appears that /suttacentral/client/elements/lookups/sc-lookup-pli.js is the code that needs to be modified on SC.

There is also this file in The Buddha's Words (BW2) repo that I believe is used for the definition lookup:

https://github.com/thebuddhaswords/BW2/blob/main/js/dpd_deconstructor.js

@thesunshade
Collaborator

thesunshade commented Apr 14, 2024

I wanted to dig in and see what might actually happen when we change this, so I went to MN111 and clicked on puthupañño.

[screenshot]

This breaks the word in two and gives links to two separate SC definition pages: puthu and pañña.

I'm assuming that we would be using data similar to that found here: the deconstructor data that splits compounds (dpd_deconstructor.js), the data that maps inflected forms to uninflected forms, aka headwords (dpd_i2h.js), and finally the definitions (dpd_ebts.js).

So if we use the DPD data, we first have to look up puthupañño in i2h to get puthupañña. We then check ebts and find a definition. Although this is a compound word, it has its own definition as such. Therefore the deconstructor data is never needed.
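The lookup order described above can be sketched as follows. This is a minimal illustration, assuming the three DPD exports are loaded as plain objects: `i2h` maps an inflected form to an array of headwords, `ebts` maps a headword to its definition, and `deconstructor` maps a compound to its possible splits. Those shapes and the function name `lookupPali` are assumptions, not the actual format of the DPD files or existing SC code.

```javascript
// HYPOTHETICAL data shapes, assumed for illustration only:
//   i2h:           { inflectedForm: [headword, ...] }
//   ebts:          { headword: definitionHtml }
//   deconstructor: { compound: ["part + part", ...] }
const own = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function lookupPali(word, { i2h, ebts, deconstructor }) {
  const form = word.normalize('NFC').toLowerCase();

  // 1. Map the inflected form (e.g. puthupañño) to its headword(s) (puthupañña).
  const headwords = own(i2h, form) ? i2h[form] : [];

  // 2. Keep only the headwords that actually have a definition.
  const definitions = headwords
    .filter((hw) => own(ebts, hw))
    .map((hw) => ({ headword: hw, definition: ebts[hw] }));

  // A compound with its own entry stops here; the deconstructor is not needed.
  if (definitions.length > 0) return { definitions, splits: [] };

  // 3. Otherwise fall back to the compound splits, if there are any.
  return { definitions: [], splits: own(deconstructor, form) ? deconstructor[form] : [] };
}
```

Checking `ebts` before `deconstructor` is what makes a compound like puthupañña resolve to its own entry rather than to its parts.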

So we would end up with something like this:
[screenshot]

This means that the clickable headword of the definition would be puthupañña. When you go to https://suttacentral.net/define/puthupañña you wouldn't get anything at all from the PED. You would only see the same definition from the DPD.

Is this what we want?

Would we expect that the individual parts be turned into links to their own definition pages?

[screenshot]

Question on integration

So is the intention to remove the CPED and replace it with the definitions from the DPD? Will the CPED be needed at all after this change is implemented?

@buddhist-uni Thoughts?

@sujato
Contributor Author

sujato commented Apr 14, 2024

Yes, that sounds pretty good, I haven't looked at the details. The main thing would be to look at what the DPD offers and make the best use of that, rather than trying to make it fit in with what we already have.

As for removing CPED, yes, that would be the result. The CPED is only half done, as we didn't (and still don't) have all the updated entries from Cone. DPD seems to incorporate these as well (judging from headword definitions) so there's no point in duplicating it. We only really did the CPED because of lack of anything better.

@khemarato
Contributor

I agree that this would be a helpful improvement to the site 😊

@thesunshade
Collaborator

One thing that I hadn't considered is that the lookup feature doesn't apply just to English, but several languages:
[screenshot]

It appears that if you have, for example, Spanish set as the site language and Spanish lookup selected, then when you click the link at the bottom to go to the definition page (e.g. https://suttacentral.net/define/bhaga?lang=es), it will use the Spanish version of the NCPED.

So I guess even if we replace the English NCPED, we will need to keep the same code working for the other languages?
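One way to keep the other languages working is to branch on the selected lookup language and leave the existing path untouched. A sketch, where `makeLookup`, `dpdLookup`, `legacyLookup` and the `DPD_LANGUAGES` set are hypothetical names rather than existing SC code:

```javascript
// HYPOTHETICAL: languages served from DPD data. Only English for now.
const DPD_LANGUAGES = new Set(['en']);

// Returns a lookup function that sends DPD languages to the new code path and
// every other language to the existing (NCPED-based) code, which stays as it is.
function makeLookup({ dpdLookup, legacyLookup }, dpdLanguages = DPD_LANGUAGES) {
  return (word, lang) =>
    dpdLanguages.has(lang) ? dpdLookup(word) : legacyLookup(word, lang);
}

// Usage with stand-in implementations:
const demoLookup = makeLookup({
  dpdLookup: (w) => `DPD entry for ${w}`,
  legacyLookup: (w, l) => `${l} NCPED entry for ${w}`,
});
console.log(demoLookup('bhaga', 'es'));
```

Keeping the legacy path as its own function means Spanish and the other lookup languages would not need to change until DPD data exists for them.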

@cittadhammo

I have made some tests in this PR: #3142

@bdhrs

bdhrs commented Apr 17, 2024

@cittadhammo Glad to see you're on the case! It is already looking good, here are my suggestions for improvement.

  1. Start with the headword (dhamma 1.1) before the part of speech (masc) because very often one inflected form is shared by two very different headwords.

  2. With the whole dictionary contained in the footer section, you could make any word in that section clickable, which would perform another dictionary search. That will be useful for deconstructed compounds, components of words, roots, etc. Maybe forward and back arrows to move back and forth between new and old dict searches.

  3. A future possibility is to include the English to Pāḷi dictionary for reverse lookup.

  4. It would be a nice touch if the DPD headword and content followed the user preference for niggahīta ṃ or ṁ. It's probably quite easy to check the preference and .replace().
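For point 4, the `.replace()` approach could look like this. It is a sketch that assumes a preference value of either ṃ or ṁ is available (the setting itself is hypothetical). Normalising to NFC first matters because ṃ can also be stored as m followed by a combining dot below (U+0323), which a plain replace would miss.

```javascript
// ṃ = U+1E43 (dot below), ṁ = U+1E41 (dot above); uppercase Ṃ = U+1E42, Ṁ = U+1E40.
function applyNiggahitaPreference(text, preferred) {
  // Compose decomposed forms (m + U+0323, m + U+0307) into single code points.
  const s = text.normalize('NFC');
  if (preferred === '\u1E41') {
    // ṁ preferred: convert dot-below forms to dot-above forms.
    return s.replace(/\u1E43/g, '\u1E41').replace(/\u1E42/g, '\u1E40');
  }
  if (preferred === '\u1E43') {
    // ṃ preferred: convert dot-above forms to dot-below forms.
    return s.replace(/\u1E41/g, '\u1E43').replace(/\u1E40/g, '\u1E42');
  }
  throw new Error(`Unknown niggahīta preference: ${preferred}`);
}
```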

Let me know your thoughts.

@sujato what would your suggestion be for integrating a monthly DPD update? I can either export to a specified location in my own dpd-db repo or make a commit / pull request to a repo of your choice.

@cittadhammo

cittadhammo commented Apr 17, 2024

@bdhrs

Not sure if I understand point 1:

  1. Start with the headword (dhamma 1.1) before the part of speech (masc) because very often one inflected form is shared by two very different headwords.

This gives the following result:

[screenshot]

In my opinion this doesn't add any more information; it just clutters the UI?

Oh, but I can see now in dpd_i2h.js that some words do map to different headwords, like:

    "akiñci": [
      "akaci 1",
      "akiñci",
      "akaci 2"
    ],

I guess we could try to group similar headwords to give the following result:

akiñci:

  1. akaci:
    1. def of "akaci 1"
    2. def of "akaci 2"
  2. akiñci:
    1. def of "akiñci"

Why are the values in dpd_i2h.js not sorted? (I can sort the array before displaying the result, no problem.)
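The grouping and sorting above could be done in one pass. A sketch, assuming homonyms differ only by a trailing number such as "akaci 1" or "dhamma 1.1"; `baseOf` and `groupHeadwords` are hypothetical helper names.

```javascript
// Strip a trailing homonym number such as " 1", " 2" or " 1.1".
function baseOf(headword) {
  return headword.replace(/\s+\d+(?:\.\d+)*$/, '');
}

function groupHeadwords(headwords) {
  const groups = new Map();
  for (const hw of headwords) {
    const base = baseOf(hw);
    if (!groups.has(base)) groups.set(base, []);
    groups.get(base).push(hw);
  }
  // numeric: true sorts "x 2" before "x 10" and "x 1.2" before "x 1.10".
  const cmp = (a, b) => a.localeCompare(b, undefined, { numeric: true });
  return [...groups.entries()]
    .sort(([a], [b]) => cmp(a, b))
    .map(([base, members]) => ({ base, headwords: members.slice().sort(cmp) }));
}

// Usage with the example from dpd_i2h.js:
console.log(JSON.stringify(groupHeadwords(['akaci 1', 'akiñci', 'akaci 2'])));
```

`localeCompare` follows Roman-alphabet collation, not traditional Pāḷi alphabetical order; a custom comparator would be needed if that order matters.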

I understand the other point, I will try to work on it soon. ;-)

@cittadhammo

Ok, I modified the behaviour to have something like this:

[screenshot]

I think this is what you meant, @bdhrs (https://www.phind.com/ was my friend on this one...)

@thesunshade
Collaborator

4. It would be a nice touch if the DPD headword and content followed the user preference for niggahīta ṃ or ṁ. It's probably quite easy to check the preference and .replace().

There is currently no user preference for this on SC. I believe the philosophy is to keep things as simple as possible: "decisions, not options" and all. But this is something Bhante @sujato needs to weigh in on.

I'm not sure if everyone has noticed that when you click on one of the headwords in the lookup definitions (in the image above, mahantā and mahata), it will take you to a definition page like https://suttacentral.net/define/mahatā. On the current production site, as long as the words in the lookup have at least a definition in the NCPED, that will at a minimum be shown on the definition page. I don't think the DPD definitions have been added to that definition page, but that should probably be part of this whole project.

The other bit that is problematic is that often the DPD has an entry for what would be considered a compound word. So the clickable headword in the lookup definition will (even once the DPD is showing there) take you to a definition page that won't include the PED.

I would love it if, for example, the words circled in green below could be clickable and take you to a definition page. I'm not sure if there is a simple way to make only the real words clickable, e.g. so that ena is not clickable.
[screenshot]
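One possible way to decide which parts of a construction such as [appa + ussukka + tā] become links: split on "+" and only link parts that are themselves headwords. This is a sketch; `splitConstruction` is a hypothetical name, and `headwordSet` is assumed to hold base headwords without homonym numbers (for example, built from the DPD headword data).

```javascript
// HYPOTHETICAL helper: mark which parts of a construction should be clickable.
function splitConstruction(construction, headwordSet) {
  return construction
    .trim()
    .replace(/^\[|\]$/g, '') // drop surrounding brackets, e.g. "[appa + ussukka + tā]"
    .split('+')
    .map((part) => part.trim().normalize('NFC'))
    .filter((part) => part.length > 0)
    .map((part) => ({ part, clickable: headwordSet.has(part) }));
}
```

Whether a suffix like tā or ena ends up clickable then depends only on what goes into the set, so the filtering rule lives in the data rather than in the UI code.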

I think that once this (wonderful!) feature is ready for wider testing, we should have it live on the staging site for a while so more people can test it out.

Thank you so much for working on this. It's going to be such a vast improvement over the existing lookup feature.

@cittadhammo

@thesunshade Yes, that would be nice, and it was an old idea. It works like this in the Tipitaka Pali Reader, I believe ;-)

Can you invite the main dev on this one? At present I don't understand how the Pali lookup picks out the words on the page. A few hints would be nice.

@thesunshade
Collaborator

@ihongda is the main dev. @blake-sc was the one who originally adapted the lookup feature from the DPR, although I don't believe he is active here now.

how the pali lookup pick the words in the page.

Do you mean how the clicking works, or what happens between the click and the definition display? I'm not sure either.

@ihongda
Contributor

ihongda commented Apr 17, 2024

@cittadhammo @thesunshade 🙏

I was just outside, will reply later.

@ihongda
Contributor

ihongda commented Apr 18, 2024

https://github.com/suttacentral/suttacentral/blob/main/client/elements/text/sc-text-bilara.js

line 1260

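  _enablePaliLookup() {
    // Wrap each word of the root (Pali) text in a span with class "word",
    // but only if the spans have not been generated yet.
    if (!this.spansForWordsGenerated) {
      this._putWordsIntoSpans('.root .text', 'word');
    }
    // Give the word spans ids.
    this._addWordSpanId('span.word');
    // Defer to the next task, then attach the click handler that runs the
    // Pali definition lookup to every word span in the root text.
    setTimeout(() => {
      this._addPaliLookupEvent('.root .text .word');
    }, 0);
  }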

When Pali lookup is enabled, the event will be added to the span where the Pali word is located, and when clicked, a Pali definition search will be performed.

@cittadhammo

Ok, thank you @ihongda. Maybe it is enough to add those specific spans around the Pali words generated by the dictionary and bind them to the event listener. From what I can see, we could do:

  • When onPaliWordClick(e) is triggered (opening the bottom sheet):
    • Add a span with class word
    • Add the id word_#
    • Attach the event listener with _addPaliLookupEvent(selector) to the Pali words inside the DPD definition

This should make them clickable. I can see that it is quite intricate, but let me know if I am on the right track ;-)
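The first two steps (a span with class word and an id word_#) could look roughly like this for plain text. This is a hypothetical helper, not the actual `_putWordsIntoSpans` implementation, and it only handles plain text, not text that already contains HTML tags. The `startId` parameter is an assumption meant to avoid clashing with ids already used in the sutta text.

```javascript
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

// HYPOTHETICAL: wrap every run of letters (incl. combining marks) in a word span.
function wrapWordsInSpans(text, startId = 0) {
  const normalized = text.normalize('NFC');
  let html = '';
  let last = 0;
  let id = startId;
  for (const m of normalized.matchAll(/[\p{L}\p{M}]+/gu)) {
    // Escape the non-word text between matches, then emit the word span.
    html += escapeHtml(normalized.slice(last, m.index));
    html += `<span class="word" id="word_${id++}">${escapeHtml(m[0])}</span>`;
    last = m.index + m[0].length;
  }
  html += escapeHtml(normalized.slice(last));
  return { html, nextId: id };
}
```

In the browser, the returned HTML would go into the bottom sheet, and the existing `_addPaliLookupEvent` could then be called with a selector scoped to the sheet. How that selector is scoped depends on the component's internals.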

@thesunshade
Collaborator

thesunshade commented Apr 19, 2024

I just noticed in the DevTools that the new version of the lookup seems to involve the downloading of two large assets:
[screenshot]

They aren't that big, but it's worth considering the performance impact.

Related to this, I wonder if it might make sense to create a dictionary API to be able to serve up just the definitions that are needed. Of course once there was a dictionary API there are probably lots of other cool things we could be doing with it.

Edit to add:

OK, it appears that there is indeed some kind of dictionary glossary API. It is fetched on the definition pages: https://suttacentral.net/api/glossary

And there is one for specific words that returns the PTS entry, e.g. https://suttacentral.net/api/dictionary_full/saddhi%E1%B9%81?language=en

So now I'm wondering if we can't offer up the full DPD entry using that same API.
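If DPD entries were served per word, the client would only fetch the entry for the clicked word instead of downloading the whole dataset up front. A sketch of the client side, assuming a DPD entry would come from the same `dictionary_full` endpoint shown above; whether it would, and the JSON it would return, are assumptions. `dictionaryUrl` and `fetchDefinition` are hypothetical names.

```javascript
const SC_API = 'https://suttacentral.net/api';

// Builds a URL in the same form as the dictionary_full example above.
// encodeURIComponent turns diacritics into UTF-8 percent-escapes (ṁ -> %E1%B9%81).
function dictionaryUrl(word, language = 'en') {
  return `${SC_API}/dictionary_full/${encodeURIComponent(word.normalize('NFC'))}?language=${encodeURIComponent(language)}`;
}

// ASSUMPTION: the response shape is not specified here, so it is returned unchanged.
async function fetchDefinition(word, language = 'en', fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(dictionaryUrl(word, language));
  if (!res.ok) throw new Error(`dictionary_full returned ${res.status} for "${word}"`);
  return res.json();
}
```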

Here is the JSON file of the PED: https://github.com/suttacentral/sc-data/blob/main/dictionaries/complex/en/pli2en_pts.json

And here is an example of an entry:

  {
    "word": "akaṭa",
    "text": "<dl id='akaṭa'><dt><dfn>Akaṭa</dfn></dt><dd><p><span class='grammar'>adjective</span> not made, not artificial, natural; <i class='term'>-yūsa</i> natural juice <a class='ref' href='https://suttacentral.net/pli-tv-kd6/en/brahmali#14.5.12'>Vin.i.206</a>.</p><p class='eti'>a + kaṭa</p></dd></dl>"
  },

As you can see, the definition is just HTML.
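Because the existing entries are just `{ word, text }` with HTML in `text`, one option would be to render DPD data into the same shape so the current definition page code could display it. A sketch, where the input fields `headword`, `pos`, `meaning` and `construction` are a hypothetical DPD entry shape, not the real DPD export format, and `toPedStyleEntry` is a hypothetical name.

```javascript
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// HYPOTHETICAL input shape: { headword, pos, meaning, construction }.
// Output mirrors the { word, text } shape and dl/dt/dd markup of the PED entry above.
function toPedStyleEntry({ headword, pos, meaning, construction }) {
  const hw = escapeHtml(headword);
  const eti = construction ? `<p class='eti'>${escapeHtml(construction)}</p>` : '';
  return {
    word: headword,
    text:
      `<dl id='${hw}'><dt><dfn>${hw}</dfn></dt>` +
      `<dd><p><span class='grammar'>${escapeHtml(pos)}</span> ${escapeHtml(meaning)}</p>` +
      `${eti}</dd></dl>`,
  };
}
```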

@cittadhammo

cittadhammo commented Apr 19, 2024

Yes, this is one of my questions, as stated in the PR:

I have not used the Arango database, just the JSON file from the link above. I don't know which is the best approach: loading JS, or feeding the data into the database and making requests. I have not used databases much in my programming life.

These JSON files could be fetched automatically from the DPD repo when building SuttaCentral for production.
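Fetching the exports at build time could be a small Node script run before the production build. This is only a sketch: the base URL and file names are placeholders (where DPD would publish its monthly export was still open in this thread), and `syncDpdFiles` is a hypothetical name. The `fetchImpl` parameter exists so the script can be tested without network access.

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');

// HYPOTHETICAL: baseUrl and fileNames are placeholders to be agreed with DPD.
async function syncDpdFiles(baseUrl, fileNames, outDir, fetchImpl = globalThis.fetch) {
  await fs.mkdir(outDir, { recursive: true });
  const written = [];
  for (const name of fileNames) {
    // baseUrl must end with "/" so that new URL() appends rather than replaces.
    const url = new URL(name, baseUrl).toString();
    const res = await fetchImpl(url);
    if (!res.ok) throw new Error(`Fetching ${url} failed with status ${res.status}`);
    const body = await res.text();
    // Fail the build rather than ship an empty dictionary file.
    if (body.length === 0) throw new Error(`${url} is empty`);
    const dest = path.join(outDir, path.basename(name));
    await fs.writeFile(dest, body, 'utf8');
    written.push(dest);
  }
  return written;
}
```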

Also, if the DPD is implemented in the API, it would also serve the definition page https://suttacentral.net/define, wouldn't it?

@thesunshade
Collaborator

Not sure where it's best to have discussions (here or on gitter) but I wanted to make a record of two outstanding issues:

  • We need to make sure that when non-English Pali lookup is selected, it continues to work the same way it did before.
  • We need to decide if we are going to make the compound breakdown clickable, for example, [appa + ussukka + tā]

@cittadhammo

cittadhammo commented Apr 26, 2024

Yes, I have them in my important to do list: https://github.com/suttacentral/suttacentral/wiki/DPD-Pali-Lookup-Implementation#to-do

I think I will focus on that over the next few days.

6 participants