raphaellu/cafe-map-crawler

cafe-map-crawler

Description

A web crawler that collects basic information on 82,463 restaurants in Shanghai: the title, rating, area, address, and average price per person of each restaurant.

Purpose

I built this project mainly to get familiar with web crawling, in particular (1) sending requests to the server and parsing the returned HTML, and (2) bypassing the scraping restrictions imposed by the website.
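As a rough illustration of the HTML-parsing half, here is a minimal sketch that extracts the five fields from the Description using only Python's standard library. The markup in `SAMPLE_HTML`, the class names, and the `ShopParser` and `parse_shops` helpers are all hypothetical; the real page structure and the parsing library used in this project are not documented here.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real page structure is not described in this README.
SAMPLE_HTML = """
<ul>
  <li class="shop">
    <span class="title">Cafe A</span>
    <span class="rating">4.5</span>
    <span class="area">Xuhui</span>
    <span class="address">1 Example Road</span>
    <span class="avg-price">58</span>
  </li>
  <li class="shop">
    <span class="title">Cafe B</span>
    <span class="rating">4.0</span>
    <span class="area">Pudong</span>
    <span class="address">2 Example Road</span>
    <span class="avg-price">42</span>
  </li>
</ul>
"""

# Class names assumed to mark the fields we want to keep.
FIELDS = {"title", "rating", "area", "address", "avg-price"}


class ShopParser(HTMLParser):
    """Collects one dict per <li class="shop"> element."""

    def __init__(self):
        super().__init__()
        self.shops = []
        self._current = None  # dict for the shop being read, if any
        self._field = None    # name of the field whose text is being read

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "shop":
            self._current = {}
        elif tag == "span" and self._current is not None and cls in FIELDS:
            self._field = cls

    def handle_data(self, data):
        # Append text only while inside a recognised field span.
        if self._field is not None:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.shops.append(self._current)
            self._current = None


def parse_shops(html):
    """Return a list of field dicts, one per shop found in the HTML."""
    parser = ShopParser()
    parser.feed(html)
    parser.close()
    return parser.shops


shops = parse_shops(SAMPLE_HTML)
```

Each entry in `shops` is a plain dict keyed by the assumed class names, so it can be written straight to CSV or JSON.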

I think it is useful to know how to prevent malicious users from scraping valuable data from a company's website. However, just as one cannot expect a coach who does not know the game to be a good one, learning anti-scraping techniques first requires knowing how to scrape and how to get around anti-scraping restrictions. I found it very interesting to explore different ways to bypass the website's crawling defenses while still respecting its robots.txt rules (and crawling some websites can be very tough!).
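Respecting robots.txt can be checked programmatically before each request. The sketch below uses Python's standard `urllib.robotparser`; the rules in `SAMPLE_ROBOTS`, the `example.com` URLs, the user-agent string, and the `build_rules` and `allowed` helpers are assumptions made for illustration, not the actual rules of any site or the code used in this project.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed offline so no network access is needed.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_rules(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    rules = RobotFileParser()
    rules.parse(robots_text.splitlines())
    return rules


def allowed(rules, url, user_agent="cafe-map-crawler"):
    """Return True if the rules permit user_agent to fetch url."""
    return rules.can_fetch(user_agent, url)


rules = build_rules(SAMPLE_ROBOTS)
can_fetch_shop = allowed(rules, "https://example.com/shop/1")
can_fetch_private = allowed(rules, "https://example.com/private/data")
delay_seconds = rules.crawl_delay("cafe-map-crawler")
```

A crawler would skip any URL for which `allowed` returns `False` and, when a crawl delay is declared, sleep for that many seconds between requests.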

Dependencies

Comments

More details

All data was collected from public Dianping pages and is for academic/non-profit use only. The data has not been distributed anywhere.
