anchor

::: warning STATEMENT: data is for personal use only!!! :::

0. Goal

Anchor data found on the web and save it automatically at regular intervals.

Currently running tasks:

1. Why do this

Things are always changing, and I want an easy way to record those changes.

A web crawler is a great tool for collecting data from the web efficiently. So is GitHub Actions, which automates the process.

Solving real problems by combining existing tools is what anchor does.

2. Design concept

2.1 Tools

GitHub Actions
Python 3

2.2 Architecture

Inspired by Scrapy.

The process is very simple.

stateDiagram
    [*] --> Requester
    Requester --> Exception
    Exception --> [*]

    Requester --> Processor
    Processor --> Exception
    Processor --> Exporter
    Exporter --> Exception
    Exporter --> [*]

DataItem: a user-defined data model
Requester: issues a network request
Responser: stores information about the response
Processor: a pure function that converts the data returned by the Requester into DataItems
Exporter: handles DataItems, for example by saving them to a .json file or exporting them to a database
Task: a unit of work scheduled by the Anchor Engine
Anchor Engine: an asynchronous task handler
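
The flow above can be sketched in a few lines of Python. This is only a minimal illustration of the Requester, Processor, Exporter chain, not the code in this repository: the class fields, the type aliases, and the helper names (`run_task`, `run_engine`, `fake_requester`, `parse_jobs`, `make_json_exporter`) are all assumptions made for the example, and the network request is replaced by a stub.

```python
import asyncio
import json
from dataclasses import asdict, dataclass
from typing import Awaitable, Callable, List


@dataclass
class DataItem:
    """User-defined data model (these fields are hypothetical)."""
    title: str
    url: str


@dataclass
class Responser:
    """Stores information about a response."""
    status: int
    body: str


# Hypothetical type aliases for the three pipeline stages.
Requester = Callable[[], Awaitable[Responser]]
Processor = Callable[[Responser], List[DataItem]]
Exporter = Callable[[List[DataItem]], None]


@dataclass
class Task:
    name: str
    requester: Requester
    processor: Processor
    exporter: Exporter


async def run_task(task: Task) -> bool:
    """Run Requester -> Processor -> Exporter; an exception at any stage ends the task."""
    try:
        response = await task.requester()
        items = task.processor(response)
        task.exporter(items)
        return True
    except Exception as exc:
        print(f"[{task.name}] failed: {exc!r}")
        return False


async def run_engine(tasks: List[Task]) -> List[bool]:
    """Run all tasks concurrently and report which ones succeeded."""
    return await asyncio.gather(*(run_task(t) for t in tasks))


async def fake_requester() -> Responser:
    # Stand-in for a real network request.
    await asyncio.sleep(0)
    body = '[{"title": "Engineer", "url": "https://example.com/1"}]'
    return Responser(status=200, body=body)


def parse_jobs(response: Responser) -> List[DataItem]:
    # Pure function: a response goes in, DataItems come out.
    return [DataItem(**row) for row in json.loads(response.body)]


def make_json_exporter(sink: List[str]) -> Exporter:
    # Collects JSON strings in memory; a real exporter might write a .json file.
    def export(items: List[DataItem]) -> None:
        sink.append(json.dumps([asdict(item) for item in items]))
    return export


if __name__ == "__main__":
    collected: List[str] = []
    demo = Task("demo", fake_requester, parse_jobs, make_json_exporter(collected))
    print(asyncio.run(run_engine([demo])), collected)
```

Keeping the Processor a pure function means it can be tested without any network access, and because each task catches its own exceptions, one failing task does not stop the others.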

Changelogs

0.9.4 (2023-01-30)

Feature

add jd, bilibili, meituan, netease, pdd, and 360 career tasks

0.9.3 (2023-01-18)

Feature

add alibaba-career-task and byte-dance-career-task
add retry to GitHub Actions

0.9.2 (2023-01-17)

Feature

basic functions completed
add tencent-career-task and baidu-career-task

Reference:

https://scrapy.org/
