reference: https://github.com/c4software/python-sitemap
Crawl one domain and create a sitemap.xml of all public links in it.
Language supported: Python 3
$ python main.py --domain https://www.sedna.com --output sitemap.xml
$ python main.py --domain https://www.sedna.com/ --output sitemapimages.xml --images --debug --verbose
$ python main.py --domain https://www.sedna.com/ --output sitemapimages.xml --images --report --skipext pdf --skipext xml --parserobots --num-workers 4
Number of found URL : 110
Number of links crawled : 108
Number of link block by robots.txt : 0
Number of link exclude : 2
Nb Code HTTP 200 : 108
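The `--skipext` and `--parserobots` options used above both decide whether a link gets crawled. The following is a minimal sketch of that kind of filtering, not the crawler's actual code: the function names (`has_skipped_extension`, `build_robot_parser`, `should_crawl`) and the sample robots.txt content are assumptions made for illustration, and the robots.txt text is passed in as a string so the example runs without network access.

```python
import posixpath
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def has_skipped_extension(url, skipext):
    """Return True if the URL path ends with an extension listed in skipext."""
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return ext in skipext


def build_robot_parser(robots_txt):
    """Build a parser from robots.txt content that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt content and extension list, for illustration only.
robots = build_robot_parser("User-agent: *\nDisallow: /private/\n")
skipext = {"pdf", "xml"}


def should_crawl(url):
    """A link is crawled only if its extension is allowed and robots.txt permits it."""
    return not has_skipped_extension(url, skipext) and robots.can_fetch("*", url)
```

With these sample rules, `should_crawl("https://www.sedna.com/about")` is true, while a URL ending in `.pdf` or living under `/private/` is rejected.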
sitemapimagessecretweapon.xml is a sitemap.xml with images, produced from the crawl output. To retrieve other kinds of static files, the crawler can be extended in a similar way as for images.
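To make the shape of a sitemap with images concrete, here is a rough sketch that builds one with the standard library. It assumes the Google image sitemap extension namespace and is not taken from the crawler itself; `build_sitemap` and the sample image URL are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_sitemap(pages):
    """Build sitemap XML from a dict mapping page URL -> list of image URLs."""
    # Register prefixes so the output uses a default namespace and "image:".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = page_url
        # One <image:image> child per image found on the page.
        for image_url in image_urls:
            image_el = ET.SubElement(url_el, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image_el, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical crawl result: one page with one image.
xml_text = build_sitemap({"https://www.sedna.com/": ["https://www.sedna.com/logo.png"]})
```

Other static file types could follow the same pattern: collect the extra URLs per page during the crawl, then emit one child element per file.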
$ python main.py --domain https://www.sedna.com --output sitemap.xml --drop "id=[0-9]{5}"
$ python main.py --domain https://www.sedna.com --output sitemap.xml --exclude "action=edit"
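The two commands above differ in what they do to a URL. As a hedged sketch, assuming `--drop` strips the part of a URL that matches a regular expression and `--exclude` skips any URL containing the given text, the behaviour could look like this. The helper names `apply_drop` and `is_excluded` and the sample URLs are illustrative, not the tool's API.

```python
import re


def apply_drop(url, drop_patterns):
    """Remove every part of the URL that matches one of the regex patterns."""
    for pattern in drop_patterns:
        url = re.sub(pattern, "", url)
    return url


def is_excluded(url, exclude_parts):
    """Return True if the URL contains any of the excluded substrings."""
    return any(part in url for part in exclude_parts)


# Hypothetical URLs, using the patterns from the commands above.
cleaned = apply_drop("https://www.sedna.com/page?id=12345", ["id=[0-9]{5}"])
skipped = is_excluded("https://www.sedna.com/wiki?action=edit", ["action=edit"])
```

Under these assumptions, a dropped pattern still leaves the page in the sitemap under its shortened URL, whereas an excluded URL is left out entirely.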