Python Web Indexing

This script searches for data within a specific domain. For example, to gather data about "chocolate" from the domain "wikipedia.com", the script automates the search by listing every URL it finds for that domain in the urls file.

Installation

Dependencies

- requests
- beautifulsoup4

Install Dependencies

pip install -r ./requirements.txt

Run program

python main.py

Configuration

After running the program, you may wonder why the URL output looks odd and how to fix it. You can enable strict mode in "config/url.py": set "must start with" to true and enable the domain under "must contain". The results will be cleaner, but keep in mind that strict mode returns fewer URLs than the bare-minimum or standard config.
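The effect of strict mode can be sketched roughly as follows. This is only an illustration, not the code in "config/url.py": the names `MUST_START_WITH`, `MUST_CONTAIN_DOMAIN`, and `filter_urls` are hypothetical, and it assumes "must start with" means the URL begins with an http or https scheme and "must contain" means the URL includes the target domain.

```python
# Hypothetical settings; the real names in config/url.py may differ.
MUST_START_WITH = True      # assumed: keep only URLs starting with http:// or https://
MUST_CONTAIN_DOMAIN = True  # assumed: keep only URLs that include the target domain


def filter_urls(urls, domain,
                must_start_with=MUST_START_WITH,
                must_contain_domain=MUST_CONTAIN_DOMAIN):
    """Return the URLs that pass the enabled strict-mode checks."""
    kept = []
    for url in urls:
        # Drop relative links and other non-http entries.
        if must_start_with and not url.startswith(("http://", "https://")):
            continue
        # Drop URLs that do not mention the target domain at all.
        if must_contain_domain and domain.lower() not in url.lower():
            continue
        kept.append(url)
    return kept
```

With both checks enabled, relative links and off-domain results are discarded, which is why strict mode yields a shorter list than the looser configs.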

Update

To search across all websites, provide only a keyword without a specific domain. At the domain input prompt, enter '*' as a wildcard; the script automatically interprets it as a search across all domains.
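One way the wildcard could be handled is sketched below. The function name `build_query` is hypothetical, and the use of the `site:` search operator to restrict results to a domain is an assumption; the actual script may build its queries differently.

```python
def build_query(keyword, domain):
    """Build a search query string from a keyword and a domain input."""
    domain = domain.strip()
    # '*' (or an empty input) is treated as "search every domain".
    if domain in ("*", ""):
        return keyword
    # Assumed: restrict results with the common `site:` search operator.
    return f"{keyword} site:{domain}"
```

Under these assumptions, `build_query("chocolate", "*")` yields just the keyword, while a concrete domain appends a `site:` restriction to it.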

Unit Test

Run Unit Test

python test.py

Notes

This program is for educational purposes only; use it at your own risk.

linuxhackingid/lhi-search-engine-scraper
