Keyword-Based Search Engine with Inverted Index (TF-IDF)

SearchEngine_Inverted Index.ipynb: the source code file, which performs four main processing tasks:

Scraping text from 6 website URLs and storing only the preprocessed text in a separate text file for each HTML page.
Creating an inverted index and document frequencies with posting locations.
Finding the similarity between the 6 documents using the cosine similarity metric.
Implementing the inverted index as a simple search engine for information retrieval (a cross-platform desktop application built with Python Tkinter).
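
As an illustration of the second task, here is a minimal sketch of an inverted index that records posting locations and document frequency. It is not the notebook's code: the function names (`tokenize`, `build_inverted_index`, `document_frequency`) and the two toy documents are assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} for every document containing it."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


def document_frequency(index):
    """Count the number of documents in which each term appears."""
    return {term: len(postings) for term, postings in index.items()}


# Hypothetical toy documents, not the scraped pages.
docs = {
    "doc1": "machine learning uses data",
    "doc2": "data mining finds patterns in data",
}
index = build_inverted_index(docs)
df = document_frequency(index)

print(index["data"])  # {'doc1': [3], 'doc2': [0, 5]}
print(df["data"])     # 2
```

Storing positions per document keeps the structure usable for both plain keyword lookup and, if needed later, phrase or proximity queries.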

Dataset Link

Wiki page

Final Result and Conclusion

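Computing similarity between the documents using the cosine similarity metric

Cosine similarity is a metric used to determine how similar documents are irrespective of their size. It measures the cosine of the angle between two document vectors, so it depends on the relative term weights rather than on document length.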
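
The size independence can be seen in a small sketch. The code below is an assumption-laden illustration, not the notebook's implementation: it uses only the standard library, a smoothed IDF formula chosen for this example, and hypothetical token lists. A document and a copy of it repeated twice point in the same direction, so their cosine similarity is 1.0.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized_docs for term in set(tokens))
    vectors = []
    for tokens in tokenized_docs:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption for this sketch) keeps every weight positive.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical token lists: the second is the first repeated twice.
short_doc = ["data", "mining"]
long_doc = short_doc * 2
other_doc = ["tkinter", "gui"]

vecs = tfidf_vectors([short_doc, long_doc, other_doc])
print(round(cosine_similarity(vecs[0], vecs[1]), 6))  # same direction despite different length
print(cosine_similarity(vecs[0], vecs[2]))            # no shared terms
```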

From the above results we can conclude the following, where the documents are:

Doc 1 – Machine Learning
Doc 2 – Engineering
Doc 3 – Research
Doc 4 – Data mining
Doc 5 – Data mining (#datamining)
Doc 6 – ss chung

Scores of 1.0 that come from comparing a document with itself (di, di) are eliminated, because they carry no information for the analysis.
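
A minimal sketch of this filtering step follows. It assumes the scores are held in a symmetric matrix; the function name `ranked_pairs` is hypothetical, and the matrix contains only the scores this README reports for Doc 1, Doc 2, and Doc 4. Iterating over index pairs with i < j skips the diagonal (di, di) and avoids listing each pair twice.

```python
from itertools import combinations

labels = ["Doc 1", "Doc 2", "Doc 4"]
# Symmetric similarity matrix; the diagonal holds the trivial self-similarity of 1.0.
similarity = [
    [1.00, 0.82, 0.87],
    [0.82, 1.00, 0.78],
    [0.87, 0.78, 1.00],
]


def ranked_pairs(labels, matrix):
    """Return (label_a, label_b, score) for each distinct pair, highest score first."""
    pairs = [
        (labels[i], labels[j], matrix[i][j])
        for i, j in combinations(range(len(labels)), 2)  # i < j, so no (di, di)
    ]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


for a, b, score in ranked_pairs(labels, similarity):
    print(f"({a}, {b}): {score:.2f}")
```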

Top matches, sorted by cosine score

(Doc 4, Doc 5): 1.0 (the content of Doc 5 is part of Doc 4)
(Doc 1, Doc 4): 0.87 (Machine Learning vs Data mining)
(Doc 1, Doc 2): 0.82 (Machine Learning vs Engineering)
(Doc 2, Doc 4): 0.78 (Engineering vs Data mining)
(Doc 5, Doc 6): 0.65
(Doc 1, Doc 6): 0.61
(Doc 1, Doc 3): 0.61
(Doc 2, Doc 3): 0.61
(Doc 3, Doc 4): 0.59
(Doc 3, Doc 5): 0.59
(Doc 2, Doc 6): 0.55

Simple Search Engine Implementation using Python Tkinter

GUI – Cross-Platform Application Software

Search for the term ‘research’

Search for the term ‘data’
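
For readers without the screenshots, the following is a rough sketch of how a keyword lookup could be wired to a Tkinter window. It is not the notebook's GUI: the function names (`search`, `launch_gui`), the widget layout, and the small index are all assumptions for illustration.

```python
def search(index, query):
    """Return sorted (doc_id, positions) postings for a single keyword."""
    term = query.strip().lower()
    return sorted(index.get(term, {}).items())


def launch_gui(index):
    """Open a small window with a search box and a result list."""
    # Imported here so that search() stays usable on machines without a display.
    import tkinter as tk

    root = tk.Tk()
    root.title("Simple Search Engine")

    entry = tk.Entry(root, width=40)
    entry.pack(padx=10, pady=5)
    results = tk.Listbox(root, width=60)

    def on_search():
        # Clear old results, then list every document that contains the term.
        results.delete(0, tk.END)
        hits = search(index, entry.get())
        if not hits:
            results.insert(tk.END, "No documents found")
        for doc_id, positions in hits:
            results.insert(tk.END, f"{doc_id}: positions {positions}")

    tk.Button(root, text="Search", command=on_search).pack(pady=5)
    results.pack(padx=10, pady=10)
    root.mainloop()


# Hypothetical index in the form term -> {doc_id: [positions]}.
index = {
    "research": {"doc3": [0, 7]},
    "data": {"doc4": [1], "doc5": [0, 4]},
}

print(search(index, "data"))  # [('doc4', [1]), ('doc5', [0, 4])]
```

Calling `launch_gui(index)` opens the window. Keeping `search` separate from the widgets means the lookup logic can be tested without starting Tkinter.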

Reference

Credits: https://github.com/matteobertozzi/blog-code/blob/master/py-inverted-index/invindex.py

sabareeswarans11/SearchEngine_InvertedIndex
