data
README
api.py
blacklist
clus_sub.py
clus_top.py
common.py
config.py
feature.py
logger.py
robotexclusionrulesparser.py
scanner.py
test_input.csv

README

Dependencies:
sudo pip install requests==2.3.0
sudo pip install logutils
sudo pip install pymysql
sudo pip install python-hashes
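Before running the pipeline, you can check that the dependencies above are importable. The following is a minimal sketch. It assumes the import names are requests, logutils, pymysql and hashes (the module that python-hashes is assumed to install). The helper name missing_modules is hypothetical and not part of this repository.

```python
import importlib.util

# Assumed import names for the packages listed above; "hashes" is assumed to
# be the module provided by the python-hashes package.
REQUIRED = ["requests", "logutils", "pymysql", "hashes"]


def missing_modules(names=REQUIRED):
    """Return the names that cannot be found on the current Python path."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```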

1. config.py: set the basic parameters.
2. scanner.py: probing and webpage fetching.
3. feature.py: feature extraction.
4. clus_top.py: top-level clustering.
5. clus_sub.py: produce the final clustering results.
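The steps above are presumably run in the listed order once the parameters are set. The following is a minimal driver sketch under that assumption. It also assumes that each script runs without command-line arguments, which this README does not specify. The names PIPELINE and run_pipeline are hypothetical and not part of the repository.

```python
import subprocess
import sys

# Stage order taken from the list above; config.py only holds parameters,
# so it is not executed directly. Passing no arguments is an assumption.
PIPELINE = ["scanner.py", "feature.py", "clus_top.py", "clus_sub.py"]


def run_pipeline(scripts=PIPELINE, dry_run=False):
    """Run each stage in order with the current interpreter.

    With dry_run=True the commands are only printed. Otherwise the driver
    stops at the first stage that exits with a non-zero status.
    """
    commands = [[sys.executable, script] for script in scripts]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            continue
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run="--dry-run" in sys.argv)
```

Running the stages as separate processes keeps each script's behaviour unchanged, and check=True keeps a later stage from running on incomplete output from an earlier one.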