super-Django-CC is a simple web interface for commoncrawl.org

imfht/super-Django-CC

About

Online site

Visit url.fht.im

Preview

Build

Install from source

Make sure python3 and virtualenv are installed.

1. Create a virtual environment and activate it.

virtualenv venv -p /usr/bin/python3  # or run `which python3` to find your python3 path
source venv/bin/activate

2. Install the requirements

cd super-Django-CC && pip install -r requirements.txt

3. Run it

python manager.py runserver 127.0.0.1:8001

Then visit localhost:8001 to see a preview.

Build with Docker

Get the code and build the image:

git clone https://github.com/imfht/super-Django-CC && cd super-Django-CC && docker build . -t super_django_cc 

Run it

docker run -p8001:8001 -d super_django_cc

Then visit localhost:8001 to see a preview.
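
Whether the server was started from source or with Docker, a quick reachability check can be scripted instead of opening a browser. The following is a minimal sketch, not part of the project: the helper name `is_up` is hypothetical, and it assumes the server listens on 127.0.0.1:8001 as in the commands above.

```python
import urllib.error
import urllib.request

# Skip any proxy configured in the environment, since the target is local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_up(url="http://127.0.0.1:8001/", timeout=3.0):
    """Return True if `url` answers with any HTTP response (hypothetical helper)."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (4xx/5xx).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or DNS failure: nothing is listening.
        return False


if __name__ == "__main__":
    print("server reachable" if is_up() else "server not reachable")
```

Any HTTP answer, including an error status, is treated as "up" here, because the goal is only to confirm that something is listening on the port.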

Q&A

  1. What is this?
    It shows how many URLs and websites have been exposed to web crawlers.
  2. Why do I get so few results for my site?
    All the data comes from commoncrawl.org. Although it has crawled a huge number of pages on the internet, crawling every page of every website is impossible.
  3. TOS & rate limiting
    The terms of service for this site are the same as http://commoncrawl.org/terms-of-use/. Respectful robots are welcome; respectful means a maximum rate of 5 req/s. If you need a higher rate, please use Common Crawl's open data or contact me.
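
For anyone scripting against the site, the 5 req/s limit can be respected with a small client-side throttle. This is a minimal sketch under stated assumptions: the `Throttle` class is hypothetical and not part of this project, and the `clock` and `sleep` parameters exist only so the timing can be tested without real delays.

```python
import time


class Throttle:
    """Space out calls so that at most `max_per_second` happen per second."""

    def __init__(self, max_per_second=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may proceed

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            # Re-read the clock, but never count less than the planned wait.
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.min_interval
        return delay
```

Call `throttle.wait()` immediately before each request, for example inside the loop that fetches pages. With the default of 5 requests per second, consecutive requests end up at least 0.2 seconds apart.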
