#1 Create HadithDiffer class to compare two texts of a Hadith #2

Open
wants to merge 1 commit into
base: main

suhailmahmood

Notes:

  1. The result of comparison is a number in the range 0 to 1.
  2. Words are not reduced to their roots before comparison. As far as my knowledge of Hadith goes, I do not think this would change the final similarity score much. If two hadith texts differ in a word (meaning they are essentially the same hadith, differing only slightly), the differing words are most likely different words altogether (different words with similar meanings), rather than different forms of a word derived from the same root. So reducing each word to its root and comparing the words as they are should give nearly the same result. Please feel free to share your thoughts/arguments on this.
  3. I have used two external pip packages: BeautifulSoup for stripping any HTML markup, and lxml as the parser for BeautifulSoup. I could have used the built-in HTML parser instead, but lxml is faster. Let me know if I should switch to the built-in parser.
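
For reference on note 3, here is a minimal sketch of what markup stripping could look like with only the standard library, in case dropping both external packages is preferred. It uses `html.parser.HTMLParser` directly rather than BeautifulSoup, so it is not the code in this PR; `strip_markup` and `_TextExtractor` are hypothetical names used for illustration only.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace left behind by removed tags.
        return " ".join("".join(self._chunks).split())


def strip_markup(html_text):
    """Return html_text with tags removed. Assumed stand-in, not the PR's code."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return parser.text()
```

This trades lxml's speed for zero dependencies, and it is less forgiving of badly broken markup than BeautifulSoup.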

Usage example:

To compare ignoring the diacritics:

similarity = HadithDiffer().set_hadith_texts(text1, text2).ignore_diacritics().compare()

To compare without ignoring the diacritics:

similarity = HadithDiffer().set_hadith_texts(text1, text2).ignore_diacritics(False).compare()
# or simply,
similarity = HadithDiffer().set_hadith_texts(text1, text2).compare()
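
To make the chaining in the usage above concrete, here is a rough sketch of how such a fluent interface can be built. Only the method names `set_hadith_texts`, `ignore_diacritics`, and `compare` come from this PR; the class name `HadithDifferSketch`, the helper `remove_diacritics`, and the scoring via `difflib.SequenceMatcher` over whitespace-separated words are assumptions and may differ from the actual implementation.

```python
import difflib
import unicodedata


def remove_diacritics(text):
    """Drop Unicode combining marks (category Mn), which include Arabic harakat."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


class HadithDifferSketch:
    """Illustrative stand-in for HadithDiffer; not the code in this PR."""

    def __init__(self):
        self._text1 = ""
        self._text2 = ""
        self._ignore_diacritics = False

    def set_hadith_texts(self, text1, text2):
        self._text1, self._text2 = text1, text2
        return self  # returning self is what allows method chaining

    def ignore_diacritics(self, ignore=True):
        self._ignore_diacritics = ignore
        return self

    def compare(self):
        a, b = self._text1, self._text2
        if self._ignore_diacritics:
            a, b = remove_diacritics(a), remove_diacritics(b)
        # Assumed metric: word-level SequenceMatcher ratio, a float in [0, 1].
        return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()
```

With this sketch, two texts that differ only in diacritics score 1.0 when diacritics are ignored and lower otherwise, since each word is compared as a whole.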

@hasankhan

jazakAllah khair

@ahadith can you take a look please.

2 participants