utsusemi

utsusemi = "空蝉"

A tool to generate a static website by crawling the original site.

Using framework

  • Serverless Framework

How to deploy

:octocat: STEP 1. Clone

$ git clone https://github.com/k1LoW/utsusemi.git
$ cd utsusemi
$ npm install

📝 STEP 2. Set environment variables OR Edit config.yml

Set environment variables.

OR

Copy config.example.yml to config.yml and edit it.

The environment variables and config.yml settings are described in the project's configuration document 📖.

🚀 STEP 3. Deploy to AWS

$ AWS_PROFILE=XXxxXXX npm run deploy

After the deploy finishes, note the endpoint URLs and the UtsusemiWebsiteURL.

💣 Destroy utsusemi

Run the following command.

$ AWS_PROFILE=XXxxXXX npm run destroy

Usage

Start crawling /in?path={startPath}&depth={crawlDepth}

Start crawling targetHost. Quote the URL so the shell does not interpret `&`.

$ curl "https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/in?path=/&depth=3"

Then access UtsusemiWebsiteURL.

force option

Disable the cache.

$ curl "https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/in?path=/&depth=3&force=1"
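
The /in request URL can be assembled with a small shell helper. This is only a sketch: `build_in_url` and the `ENDPOINT` variable are hypothetical names and not part of utsusemi, and the start path is assumed to need no URL encoding. The query string contains `&`, so the result is passed to curl inside double quotes.

```shell
# Hypothetical helper (not part of utsusemi): prints the /in URL for a crawl request.
# Args: endpoint base URL, start path, crawl depth, optional force flag (e.g. 1).
build_in_url() {
  endpoint=$1
  start_path=$2
  crawl_depth=$3
  force=${4:-}

  url="${endpoint}/in?path=${start_path}&depth=${crawl_depth}"
  # Append force only when it was given.
  if [ -n "$force" ]; then
    url="${url}&force=${force}"
  fi
  printf '%s\n' "$url"
}

# Usage (ENDPOINT is an assumed variable holding the deployed endpoint base URL):
#   curl "$(build_in_url "$ENDPOINT" / 3)"      # normal crawl
#   curl "$(build_in_url "$ENDPOINT" / 3 1)"    # crawl with force=1
```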

Purge crawling queue /purge

Cancel crawling.

$ curl https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/purge

Delete object of utsusemi content /delete?prefix={objectPrefix}

Delete S3 object.

$ curl "https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/delete?prefix=/"

Show crawling queue status /status

$ curl https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/status

Set N crawling actions POST /nin

Start crawling targetHost with N crawling actions.

$ curl -X POST -H "Content-Type: application/json" -d @nin-sample.json https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/nin

Architecture

Crawling rule

  • HTML -> depth = depth - 1
  • CSS -> Requests for sources referenced in the CSS do not consume depth.
  • Other contents -> End ( depth = 0 )
  • 403, 404, 410 -> Delete S3 object
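
The rules above can be sketched as a single shell function. This is one reading of the list, not utsusemi's actual implementation: `crawl_action` is a hypothetical name, and matching on the `Content-Type` value is an assumption about how HTML and CSS are told apart. The function prints either `delete` or the depth that would be passed on to resources linked from the fetched one.

```shell
# Hypothetical sketch of the crawling rule (not utsusemi's real code).
# Args: HTTP status code, Content-Type value, current depth.
# Prints "delete" or "depth N", where N is the depth passed on to linked resources.
crawl_action() {
  http_status=$1
  content_type=$2
  depth=$3

  # 403, 404, 410 -> the stored S3 object is deleted.
  case "$http_status" in
    403|404|410)
      echo "delete"
      return 0
      ;;
  esac

  case "$content_type" in
    text/html*) echo "depth $((depth - 1))" ;;  # HTML consumes one level
    text/css*)  echo "depth $depth" ;;          # CSS sources do not consume depth
    *)          echo "depth 0" ;;               # other contents end the crawl
  esac
}
```

For example, `crawl_action 200 text/css 3` keeps the depth at 3, while any other non-HTML content type drops it to 0.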