Academic project to design a web application for text search (user input) over a directory of documents containing different types of files (PDF, TXT, HTML, XML, DOCX, XLSX, etc., parsed using Apache Tika) using Apache Lucene.

jtsharma2308/Document-Search-Using-Lucene

Document-Search-Using-Lucene

About the App

The amount of information available to a person grows every day, so retrieving the correct information in a timely manner matters. This project indexes a document collection and fetches the right information from a database (a folder of documents). Lucene performs the indexing of the collection, and the search application is tightly integrated with that database. The result is an efficient, customized search tool built on Lucene. It can index and search databases, PDF documents, Word documents, text files, HTML files, XML files, PowerPoint presentations, Excel sheets, JSON files, and more.

Brief Description:

  • A web application that searches user-entered text across a directory of documents of different file types (PDF, TXT, HTML, XML, DOCX, XLSX, etc., parsed using Apache Tika) using Apache Lucene.
  • The query output identifies the documents most relevant to the query and lists the matching words and phrases, each with a ranking score that represents relevance and the corresponding document file.
  • Calculates the corresponding precision and recall to show the improved accuracy of the relevance ranking technique.
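
The precision and recall mentioned in the last point can be illustrated with a minimal sketch. It assumes the standard definitions (precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that were retrieved) and a hand-labelled set of relevant files. The class name `PrecisionRecall`, its methods, and the file names are hypothetical and do not come from this repository, which may compute these metrics differently.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the repository.
class PrecisionRecall {

    // Number of distinct retrieved documents that are also in the relevant set.
    static int truePositives(List<String> retrieved, Set<String> relevant) {
        Set<String> hits = new HashSet<>(retrieved);
        hits.retainAll(relevant);
        return hits.size();
    }

    // precision = |retrieved AND relevant| / |retrieved| (0 when nothing was retrieved)
    static double precision(List<String> retrieved, Set<String> relevant) {
        Set<String> distinct = new HashSet<>(retrieved);
        if (distinct.isEmpty()) {
            return 0.0;
        }
        return (double) truePositives(retrieved, relevant) / distinct.size();
    }

    // recall = |retrieved AND relevant| / |relevant| (0 when nothing is relevant)
    static double recall(List<String> retrieved, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        return (double) truePositives(retrieved, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // Assumed example data: files returned for a query, and the files
        // a human judged relevant to that query.
        List<String> retrieved = List.of("a.pdf", "b.txt", "c.html", "d.docx");
        Set<String> relevant = Set.of("a.pdf", "c.html", "e.xml");

        System.out.println("precision = " + precision(retrieved, relevant));
        System.out.println("recall    = " + recall(retrieved, relevant));
    }
}
```

In this example two of the four retrieved files are relevant, so precision is 2/4, and two of the three relevant files were found, so recall is 2/3. In the application, the retrieved list would be the file names from the ranked search results for a query.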
