Text Similarity using BM25 & WordNet

Prerequisite for running code

Python 2.x - https://askubuntu.com/questions/101591/how-do-i-install-the-latest-python-2-7-x-or-3-x-on-ubuntu
Numpy - pip install numpy
Scipy - pip install scipy
NLTK - pip install nltk
Pattern - pip install pattern
Scikit-learn - pip install scikit-learn

Command for running code

python execute.py

OUTPUT

(screenshot: output.JPG)

Algorithms

Syntactic Similarity -

I used the BM25 (Best Matching 25) algorithm for syntactic similarity. It generates a similarity score between two sentences.

BM25 Algorithm -

bm25_score(CD, QD) = Σ(i=1 to n) idf(qi) * (f(qi, CD) * (k1 + 1)) / (f(qi, CD) + k1 * (1 - b + b * |CD| / avgdl))
idf(t) = 1 + log(C / (1 + df(t)))

Where,

    CD = corpus document, e.g. the list of all the answers
    QD = query document, e.g. the list of model answers
    qi = i-th term of QD (i = 1 to n)
    idf(qi) = inverse document frequency (IDF) of the term qi in CD
    C = total number of documents in CD
    df(t) = number of documents in CD that contain the term t
    f(qi, CD) = frequency of the term qi in CD
    |CD| = total length of CD
    avgdl = average document length in CD
    k1, b = constants
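
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the code in bm25.py: it assumes documents are already tokenized into lists of words, it reads f(qi, CD) and |CD| as the term frequency and length of one answer taken from the corpus, and the defaults k1=1.5 and b=0.75 are commonly used values rather than values taken from this project.

```python
from __future__ import division  # true division on Python 2.x as well

import math
from collections import Counter


def idf(term, corpus):
    """idf(t) = 1 + log(C / (1 + df(t)))."""
    doc_count = len(corpus)                              # C
    doc_freq = sum(1 for doc in corpus if term in doc)   # df(t)
    return 1 + math.log(doc_count / (1 + doc_freq))


def bm25_score(document, query, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a tokenized query.

    k1 and b are assumed defaults, not values from the original code.
    """
    avgdl = sum(len(doc) for doc in corpus) / len(corpus)
    term_freq = Counter(document)
    score = 0.0
    for term in query:
        f = term_freq[term]  # 0 when the term is absent from the document
        denominator = f + k1 * (1 - b + b * len(document) / avgdl)
        score += idf(term, corpus) * f * (k1 + 1) / denominator
    return score


# Hypothetical toy data: three tokenized answers and one tokenized model answer.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["a", "cat", "ran", "fast"],
]
query = ["cat", "sat"]
scores = [bm25_score(doc, query, corpus) for doc in corpus]
```

An answer that shares no term with the query scores 0, and answers that share more (or rarer) terms score higher.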

Semantic Similarity -

I used NLTK's WordNet corpus to generate the semantic similarity score between two sentences. For each word, I used the synsets function to get all of its synsets, calculated the path similarity between words, and took the maximum value among all the synsets of that word. I then averaged these scores over the sentence, and that average is the semantic similarity score.
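
The steps described here could look roughly like the sketch below. It is one possible reading of the description, not the code in semantic_similarity.py: the function names are hypothetical, sentences are assumed to be tokenized already, and each word in the first sentence is matched against its best counterpart in the second. The WordNet corpus has to be downloaded once with nltk.download('wordnet').

```python
from __future__ import division  # true division on Python 2.x as well


def wordnet_word_similarity(word_a, word_b):
    """Maximum path similarity over all synset pairs of two words."""
    from nltk.corpus import wordnet  # imported lazily; needs the WordNet data

    best = 0.0
    for synset_a in wordnet.synsets(word_a):
        for synset_b in wordnet.synsets(word_b):
            similarity = synset_a.path_similarity(synset_b)  # may be None
            if similarity is not None and similarity > best:
                best = similarity
    return best


def semantic_similarity_score(tokens_a, tokens_b,
                              word_similarity=wordnet_word_similarity):
    """Average, over the words of tokens_a, of their best match in tokens_b."""
    if not tokens_a or not tokens_b:
        return 0.0
    best_scores = [
        max(word_similarity(word_a, word_b) for word_b in tokens_b)
        for word_a in tokens_a
    ]
    return sum(best_scores) / len(best_scores)
```

The word-level function is passed in as a parameter so that the averaging step can be tried with any word similarity measure, for example `semantic_similarity_score(["dog", "runs"], ["cat", "walks"])` with the WordNet default.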

The final score is the average of bm25_score and semantic_similarity_score.
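
As a small illustration of this last step, assuming both scores are plain floats (the function name final_score is hypothetical and not taken from execute.py):

```python
from __future__ import division  # true division on Python 2.x as well


def final_score(bm25_score, semantic_similarity_score):
    """Arithmetic mean of the syntactic and semantic scores."""
    return (bm25_score + semantic_similarity_score) / 2
```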

shubham16394/Text-Similarity

Folders and files

Latest commit

History

Repository files navigation

Text Similarity using BM25 & WordNet

Prerequisite for running code

Command for running code

OUTPUT

Algorithms

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages