Simple website crawler to get Meta tags and <H1>, in Python

sergeymusenko/simple-crawler

simple-crawler

Simple website crawler that collects all URLs, Meta tags and <H1> headings from your website.

Open main.py and set the init_url variable to your start URL.
Adjust the use_pause variable so that you do not overload your web server.
The crawler does not follow redirects (see allow_redirects=False).
It ignores React/JavaScript links if the website uses them.

Written in Python, using BeautifulSoup. Saves the report to a CSV file.
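
The settings above can be illustrated with a minimal sketch of a crawl loop. This is not the code from main.py: the names `LinkCollector`, `fetch_page`, `crawl` and `max_pages` are hypothetical, `use_pause` is assumed to be a delay in seconds, and the use of the `requests` library is inferred from the `allow_redirects=False` parameter.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

init_url = "https://example.com/"  # placeholder start URL
use_pause = 0.5  # assumption: seconds to wait between requests


class LinkCollector(HTMLParser):
    """Collects href values from plain <a> tags in static HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def fetch_page(url):
    """Fetch one page without following redirects (needs `requests`)."""
    import requests  # imported lazily so the rest of the sketch runs without it

    response = requests.get(url, allow_redirects=False, timeout=10)
    return response.status_code, response.text


def crawl(start_url, fetch=fetch_page, pause=use_pause, max_pages=100):
    """Breadth-first crawl of same-host pages; returns {url: status_code}."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    report = {}
    while queue and len(report) < max_pages:
        url = queue.popleft()
        status, html = fetch(url)
        report[url] = status
        if status == 200:  # 3xx responses are recorded but not followed
            collector = LinkCollector()
            collector.feed(html)
            for href in collector.hrefs:
                link = urldefrag(urljoin(url, href)).url
                parsed = urlparse(link)
                # Skips javascript:, mailto: and off-site links.
                same_site = parsed.scheme in ("http", "https") and parsed.netloc == host
                if same_site and link not in seen:
                    seen.add(link)
                    queue.append(link)
        if queue:
            time.sleep(pause)  # throttle so the server is not hammered
    return report
```

Calling `crawl(init_url)` would visit pages on the same host one at a time. Links that only exist after JavaScript runs never appear in the static HTML, so a loop like this cannot see them.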
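
As a rough illustration of the extraction and reporting step, the sketch below pulls the meta description, meta keywords and first <H1> from a page with BeautifulSoup and writes them as CSV. The function names `extract_row` and `write_report` and the column names are assumptions made for this example, not the layout main.py actually produces.

```python
import csv
import io

from bs4 import BeautifulSoup

FIELDS = ["url", "description", "keywords", "h1"]  # assumed report columns


def extract_row(url, html):
    """Return the meta description, meta keywords and first <h1> of a page."""
    soup = BeautifulSoup(html, "html.parser")

    def meta(name):
        tag = soup.find("meta", attrs={"name": name})
        return tag.get("content", "").strip() if tag else ""

    h1 = soup.find("h1")
    return {
        "url": url,
        "description": meta("description"),
        "keywords": meta("keywords"),
        "h1": h1.get_text(strip=True) if h1 else "",
    }


def write_report(rows, stream):
    """Write the collected rows to any text stream as CSV with a header."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    sample = '<head><meta name="description" content="Demo"></head><h1>Hi</h1>'
    buffer = io.StringIO()
    write_report([extract_row("https://example.com/", sample)], buffer)
    print(buffer.getvalue())
```

To write to disk instead of a buffer, pass a file opened with `open("report.csv", "w", newline="", encoding="utf-8")`; the `newline=""` argument is what the `csv` module recommends to avoid blank rows on Windows.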

https://github.com/sergeymusenko/simple-crawler/tree/main

Installation:

pip install bs4
