
Welcome to the Marmot wiki!

See the example/lesson directory to learn how to use it.

World-Wide-Web robots are also known as spiders or crawlers. The principle: construct HTTP requests that imitate a normal client, send them to a specified host, and extract the data that comes back. The web holds a huge amount of information, and copying and pasting it from pages by hand is time-consuming and laborious, which is what gave rise to the data acquisition industry.

Batch access to public web data is not illegal in itself, but indiscriminate, uncontrolled, and very aggressive crawling can destabilize other people's services, so most resource providers filter traffic that looks forged. Against that background, even small-scale batch data acquisition becomes a problem. Add related needs such as API integration and automated software testing (which rely on the same technical principles), and this project was born (it is very simple).

Marmot is very easy to understand, much like Python's requests library (not quite there yet, smile~). It enhances Go's native HTTP library, handling trivial logic for you (such as collecting information and checking parameters) and adding fault-tolerance mechanisms (such as locking and closing streams in time, so that highly concurrent runs do not blow up). It provides a human-friendly API that you can reuse again and again. It conveniently supports cookie persistence, crawler proxy settings, and other common settings such as HTTP request headers, timeouts/pauses, and data upload/POST. It supports all HTTP methods (POST/PUT/GET/DELETE/...), and has a built-in spider pool and browser UA pool, making it easy to build a distributed spider with UA + cookie persistence.
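For context, this is roughly the boilerplate Marmot wraps when it is not used: a minimal sketch with only the standard library net/http, showing cookie persistence (a cookie jar), a proxy setting, a timeout, and a custom User-Agent header. The proxy address and UA string here are placeholder assumptions, and this is not Marmot's own API; see example/lesson for that.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/cookiejar"
	"net/url"
	"time"
)

func main() {
	// Cookie persistence: one jar shared by every request made with this client.
	jar, _ := cookiejar.New(nil)

	// Crawler proxy setting (placeholder address; drop the Transport to go direct).
	proxyURL, _ := url.Parse("http://127.0.0.1:8080")

	client := &http.Client{
		Jar:     jar,
		Timeout: 20 * time.Second, // timeout setting
		Transport: &http.Transport{
			Proxy: http.ProxyURL(proxyURL),
		},
	}

	// Request header settings, e.g. a browser User-Agent from a UA pool.
	req, err := http.NewRequest(http.MethodGet, "https://example.com", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")

	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.StatusCode, len(body))
}
```

Every spider repeats some variation of this setup; Marmot's goal is to collapse it into a few reusable calls.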

In addition, it provides a third-party tool package. The library is simple and practical: a few lines of code replace the previous spaghetti code. It has been applied in large projects such as a full-Golang distributed Amazon crawler/spider, where it has withstood two thousand long-lived proxy IPs and high concurrency, fetching millions of records per day on a single machine.

Main uses: WeChat development / API integration / automated testing / ticket-rush scripts / site monitoring / voting plugins / data crawling
