Intranet_Search_Engine

A search facility for the web pages and PDFs on an intranet.

The code is written in Python; the web interface is written in PHP and styled with Bootstrap.

The code base has two crawlers: one performs a depth-first search (DFS), using urllib2 to request the HTML source of each URL, and the other is implemented in Scrapy.
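The depth-first crawl can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: it assumes Python 3, where urllib.request replaces urllib2, and the names LinkExtractor, fetch_html, crawl_dfs and the max_pages limit are hypothetical. The fetch function is passed in as a parameter so the traversal can be tried without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page (urllib.request is the Python 3 successor of urllib2)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_dfs(start_url, fetch=fetch_html, max_pages=100):
    """Return the URLs in the order a depth-first traversal visits them."""
    visited = []
    seen = set()
    stack = [start_url]
    while stack and len(visited) < max_pages:
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        visited.append(url)
        try:
            html = fetch(url)
        except Exception:
            continue  # page could not be fetched: keep the URL, skip its links
        parser = LinkExtractor()
        parser.feed(html)
        # Push links in reverse so the first link on the page is explored first.
        for href in reversed(parser.links):
            absolute = urldefrag(urljoin(url, href)).url
            if absolute not in seen:
                stack.append(absolute)
    return visited
```

Calling crawl_dfs("http://intranet.local/") with the default fetch would crawl a live site (the host name here is a placeholder); the returned list could then be written to List_Of_URLS_to_be_indexed.txt, one URL per line.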

The incremental indexer indexes the crawled URLs listed in List_Of_URLS_to_be_indexed.txt, extracting the essential HTML content of each page. Any PDFs that were crawled are downloaded and converted to text for indexing, using Linux system calls.
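A rough sketch of the indexing step is shown below. It is an illustration under stated assumptions rather than the logic of Incremental_Indexer.py: the index is a plain in-memory dictionary, the helper names are hypothetical, and the PDF conversion assumes the pdftotext command from poppler-utils is installed (the README only says that Linux system calls are used).

```python
import re
import subprocess
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps visible text and drops the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Reduce an HTML page to its whitespace-normalised visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def pdf_to_text(pdf_path):
    """Assumption: the pdftotext CLI is available; '-' sends text to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def index_documents(index, indexed_urls, documents):
    """Add only documents whose URL has not been indexed before.

    index maps each term to the set of URLs containing it;
    documents is an iterable of (url, text) pairs.
    Returns the list of URLs that were newly added.
    """
    added = []
    for url, text in documents:
        if url in indexed_urls:
            continue  # incremental step: skip what is already indexed
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            index.setdefault(term, set()).add(url)
        indexed_urls.add(url)
        added.append(url)
    return added
```

Keeping the set of already indexed URLs alongside the index is what makes the run incremental in this sketch: a second pass over the same URL list touches only the new entries.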

The 'searcher' files (those with 'searcher' in their file names) parse the query received from engine.php and search the indexed directory (text_indexed_directory). They also return recommendations, such as spelling corrections.
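The query handling and spelling recommendation can be illustrated with the sketch below. It does not reproduce the searcher files: it assumes a dictionary index that maps each term to a set of URLs, uses difflib from the Python standard library as a stand-in for whatever spell checker the project uses, and the function names and the cutoff value are hypothetical.

```python
import difflib
import re


def parse_query(query):
    """Lowercase the query and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", query.lower())


def search(index, query):
    """Return the URLs that contain every term of the query."""
    terms = parse_query(query)
    if not terms:
        return set()
    results = None
    for term in terms:
        postings = index.get(term, set())
        results = set(postings) if results is None else results & postings
    return results


def suggest(index, query, cutoff=0.75):
    """Replace terms missing from the index with the closest indexed term.

    Returns the corrected query, or None when nothing needs correcting.
    """
    vocabulary = list(index)
    corrected = []
    changed = False
    for term in parse_query(query):
        if term in index:
            corrected.append(term)
            continue
        matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            corrected.append(term)  # no close match: leave the term as typed
    return " ".join(corrected) if changed else None
```

A front end such as engine.php could show the result of suggest as a "did you mean" line whenever search returns no results.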

Folders and files:

css
fonts
js
screenshots
text_indexed_directory
Incremental_Indexer.py
List_Of_URLS_to_be_indexed.txt
README.md
crawling_with_URLLIB2.py
engine.php
file.txt
index.php
index1.php
new_pdf_searcher.py
new_searcher.py
scrapy_Crawler.py
wrongQuery_pdf_searcher.py
wrongQuery_searcher.py
wrong_engine.php

geeteshtabjul/Intranet_Search_Engine

Folders and files

Latest commit

History

Repository files navigation

Intranet_Search_Engine

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages