Crawl Disqus comments from a blog into a local MongoDB database
- Clone the GitHub repository and cd into it
git clone git@github.com:louisguitton/disqus-crawler.git
cd disqus-crawler
python3 -m venv venv
source venv/bin/activate
pip install --upgrade -r requirements.txt
- Open main.sh and change the URL to the blog page you want to crawl
- Make sure a mongod instance is running on your computer (see the MongoDB documentation for installation instructions)
mongod --config /usr/local/etc/mongod.conf
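If you want to verify the connection from Python before crawling, a short check like the sketch below can help. It assumes mongod on the default localhost:27017; this snippet is illustrative and not part of the repository.

```python
# Illustrative connectivity check; assumes mongod on the default localhost:27017.
from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

client = MongoClient("mongodb://localhost:27017", serverSelectionTimeoutMS=2000)
try:
    client.admin.command("ping")  # raises if no mongod is reachable
    print("MongoDB is up")
except ServerSelectionTimeoutError:
    print("MongoDB is not running")
```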
- Make sure a Splash instance is running (see the Splash documentation for more information)
$ docker run -p 8050:8050 scrapinghub/splash
2019-10-10 12:03:39.116598 [-] Server listening on http://0.0.0.0:8050
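Splash exposes an HTTP rendering API, so you can smoke-test the container before crawling. The snippet below is only an illustration, using example.com as a stand-in URL:

```python
# Smoke-test the Splash container through its render.html endpoint.
import requests

resp = requests.get(
    "http://localhost:8050/render.html",
    params={"url": "http://example.com", "wait": 1},
)
print(resp.status_code)   # 200 if Splash rendered the page
print(resp.text[:200])    # start of the rendered HTML
```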
- Run the main.sh script
$ sh main.sh
CRAWLING ... http://www.purseblog.com/louis-vuitton/louis-vuitton-spring-2016-bag-ad-campaign/
2019-10-10 14:07:28 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: purseblog)
...
- Usage
mongo
use disqus
db.comments.count()
db.comments.find().pretty().limit(2)
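The same queries can be run from Python with pymongo. The sketch below assumes the crawler wrote to the disqus database and comments collection, as in the shell session above:

```python
# Read back the crawled comments with pymongo (assumes a recent pymongo version).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["disqus"]

print(db.comments.count_documents({}))    # total number of crawled comments
for doc in db.comments.find().limit(2):   # peek at two documents
    print(doc)
```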
I wrote this project in 2016 for my master's thesis on Paid/Owned/Earned Media and on measuring brands across social channels and blogs.
For the crawling, this project uses Scrapy. It stores the comments in a MongoDB database using the pymongo client. A good tutorial to follow is this one.
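As an illustration of that setup, a Scrapy item pipeline writing comments to MongoDB usually follows the pattern below. The class name, connection URI, and collection name are assumptions, not necessarily what this repository uses:

```python
# Hypothetical MongoDB pipeline sketch; the repository's actual pipeline may differ.
import pymongo

class MongoPipeline:
    def __init__(self):
        self.mongo_uri = "mongodb://localhost:27017"
        self.mongo_db = "disqus"

    def open_spider(self, spider):
        # One client per crawl, opened when the spider starts.
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Store each scraped comment as one document.
        self.db["comments"].insert_one(dict(item))
        return item
```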
When scraping the web, two kinds of problems arise:
- the target page is slow to render because it relies heavily on JavaScript
- the target page renders everything quickly, but the content you were interested in disappears once the page is fully rendered
To handle both situations, one can run a small headless browser locally and have it render pages on demand. This project uses Splash, running in a local Docker container. A good tutorial to follow is this one.
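Concretely, a spider built on the scrapy-splash plugin delegates rendering to the container started earlier. The sketch below is a minimal illustration: the spider name and wait time are assumptions, and scrapy-splash must be enabled in settings.py (SPLASH_URL plus its middlewares, as described in the scrapy-splash README):

```python
# Minimal scrapy-splash sketch; assumes Splash is listening on localhost:8050
# and that scrapy-splash is configured in settings.py.
import scrapy
from scrapy_splash import SplashRequest

class CommentsSpider(scrapy.Spider):
    name = "comments"  # hypothetical spider name

    def start_requests(self):
        url = "http://www.purseblog.com/louis-vuitton/louis-vuitton-spring-2016-bag-ad-campaign/"
        # Ask Splash to render the page, waiting for the Disqus widget to load.
        yield SplashRequest(url, callback=self.parse, args={"wait": 3})

    def parse(self, response):
        # response.text now contains the JavaScript-rendered HTML.
        self.log(f"Rendered page of {len(response.text)} characters")
```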