Rust Async Web Crawler: IMDb & Douban

This project is a high-performance, modular, fully asynchronous crawler written in Rust, capable of scraping the IMDb Top 1000 and Douban Movie Rankings.

🚀 Designed with performance, scalability, and clean architecture in mind.

✨ Features

✅ Asynchronous HTTP requests powered by reqwest and tokio
✅ HTML parsing using CSS-like selectors via scraper
✅ Strategy pattern to configure pluggable fetch behavior
✅ PageParser trait for modular IMDb / Douban parsing
✅ Built-in logging and an optional HTML dump for debugging
✅ Clean architecture with extensibility in mind
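To illustrate how a PageParser trait can keep site-specific parsing modular, here is a minimal, std-only sketch. It is not the project's actual code: the `Movie` struct, the `base_url` method, the `extract_links` helper, and the `create_parser` factory are hypothetical names, and plain string search stands in for the `scraper` selectors the real crawler uses.

```rust
// Hypothetical sketch: the real project parses HTML with the `scraper` crate;
// naive string search is used here only to keep the example std-only.

/// One scraped entry (field names are assumptions).
#[derive(Debug, PartialEq)]
struct Movie {
    title: String,
    link: String,
}

/// Common interface for site-specific parsers.
trait PageParser {
    /// Base URL that relative links are resolved against.
    fn base_url(&self) -> &str;

    /// Extract movies from one page of HTML. A real parser would override
    /// this with site-specific selectors.
    fn parse(&self, html: &str) -> Vec<Movie> {
        extract_links(html, self.base_url())
    }
}

/// Collect every `<a href="...">title</a>` pair found in `html`.
fn extract_links(html: &str, base: &str) -> Vec<Movie> {
    let mut movies = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find("<a href=\"") {
        rest = &rest[start + "<a href=\"".len()..];
        let Some(href_end) = rest.find('"') else { break };
        let href = &rest[..href_end];
        rest = &rest[href_end..];
        let Some(tag_end) = rest.find('>') else { break };
        rest = &rest[tag_end + 1..];
        let Some(text_end) = rest.find("</a>") else { break };
        let title = rest[..text_end].trim().to_string();
        // Relative links get the site's base URL prepended.
        let link = if href.starts_with("http") {
            href.to_string()
        } else {
            format!("{base}{href}")
        };
        movies.push(Movie { title, link });
        rest = &rest[text_end..];
    }
    movies
}

struct ImdbParser;
struct DoubanParser;

impl PageParser for ImdbParser {
    fn base_url(&self) -> &str {
        "https://www.imdb.com"
    }
}

impl PageParser for DoubanParser {
    fn base_url(&self) -> &str {
        "https://movie.douban.com"
    }
}

/// Factory: choose a parser at runtime and return it as a trait object.
fn create_parser(site: &str) -> Option<Box<dyn PageParser>> {
    match site {
        "imdb" => Some(Box::new(ImdbParser)),
        "douban" => Some(Box::new(DoubanParser)),
        _ => None,
    }
}

fn main() {
    let html = r#"<li><a href="/title/tt0111161/">The Shawshank Redemption</a></li>"#;
    let parser = create_parser("imdb").expect("known site");
    for (i, movie) in parser.parse(html).iter().enumerate() {
        println!("{:03}: title:{:<40} link:{}", i + 1, movie.title, movie.link);
    }
}
```

Because callers hold a `Box<dyn PageParser>`, supporting another site in this sketch only means adding one more struct and one more match arm.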

📦 Dependencies

# Cargo.toml

[dependencies]
reqwest = "0.12.20"
scraper = "0.23.1"
tokio = { version = "1.36", features = ["full"] }
futures = "0.3"
rand = "0.9.1"
async-trait = "0.1"

🚀 Getting Started

1. Clone the repo

git clone https://github.com/Levio-z/async-scrape
cd async-scrape

2. Run the IMDb crawler

cargo run --bin imdb

3. Run the IMDb crawler with HTML output enabled

cargo run --bin imdb --features output_html

4. Run the Douban crawler

cargo run --bin douban

💡 The recommended layout is Cargo's multi-binary structure, with one entry point per crawler: src/bin/imdb.rs and src/bin/douban.rs
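The output_html feature used in step 3 could gate the HTML dump roughly as sketched below. This is an assumption about the wiring, not the project's actual code: it presumes Cargo.toml declares the feature (for example `output_html = []` under `[features]`), and the `dump_html` function and the file name are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write the raw HTML to `path` only when the crate is built with
/// `--features output_html`. Returns Ok(true) if a file was written.
fn dump_html(html: &str, path: &Path) -> io::Result<bool> {
    // `cfg!` is evaluated at compile time; without the feature this
    // branch is always false and no file is touched.
    if cfg!(feature = "output_html") {
        fs::write(path, html)?;
        Ok(true)
    } else {
        Ok(false)
    }
}

fn main() -> io::Result<()> {
    // Hypothetical dump location, used only for this sketch.
    let path = std::env::temp_dir().join("imdb_page_dump.html");
    let written = dump_html("<html><body>page</body></html>", &path)?;
    println!("HTML dump written: {written}");
    Ok(())
}
```

Using `cfg!` keeps both branches type-checked in every build. Using `#[cfg(feature = "output_html")]` on the function instead would remove the code entirely when the feature is off.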

🧠 Design Patterns Used

Pattern	Usage
Strategy	`FetchStrategy` enables flexible control over request behavior (e.g., logging, proxy)
Singleton	`LazyLock<Arc<Client>>` provides a single, globally shared HTTP client instance
Decorator	Wraps a fetch strategy to add behavior (e.g., logging) without changing the wrapped strategy
Factory	`create_douban_parser()` and `create_client()` centralize instantiation logic
Trait Object	`PageParser` allows runtime selection of site-specific parsers
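The Strategy, Decorator, and Singleton rows above can be sketched together as follows. This is a simplified, synchronous, std-only illustration, not the project's implementation: the real crawler is asynchronous and uses `reqwest::Client`. Here `Client` is a stand-in struct, and `PlainFetch`, `LoggingFetch`, `shared_client`, `crawl`, and the `fetch` signature are hypothetical. `std::sync::LazyLock` requires Rust 1.80 or later.

```rust
use std::sync::{Arc, LazyLock};

/// Stand-in for an HTTP client (the project uses `reqwest::Client`).
#[derive(Debug)]
struct Client {
    user_agent: String,
}

/// Singleton: initialized once on first access, then shared.
static CLIENT: LazyLock<Arc<Client>> = LazyLock::new(|| {
    Arc::new(Client {
        user_agent: "async-scrape-sketch".to_string(),
    })
});

/// Hand out another handle to the one shared client.
fn shared_client() -> Arc<Client> {
    Arc::clone(&*CLIENT)
}

/// Strategy: how a URL is turned into a response body.
trait FetchStrategy {
    fn fetch(&self, url: &str) -> Result<String, String>;
}

/// Base strategy. Returns a canned body instead of doing network I/O.
struct PlainFetch {
    client: Arc<Client>,
}

impl FetchStrategy for PlainFetch {
    fn fetch(&self, url: &str) -> Result<String, String> {
        if url.starts_with("https://") {
            Ok(format!(
                "<html><!-- {} via {} --></html>",
                url, self.client.user_agent
            ))
        } else {
            Err(format!("unsupported URL: {url}"))
        }
    }
}

/// Decorator: wraps any strategy and logs around the inner call.
struct LoggingFetch<S: FetchStrategy> {
    inner: S,
}

impl<S: FetchStrategy> FetchStrategy for LoggingFetch<S> {
    fn fetch(&self, url: &str) -> Result<String, String> {
        eprintln!("[fetch] GET {url}");
        let result = self.inner.fetch(url);
        match &result {
            Ok(body) => eprintln!("[fetch] ok, {} bytes", body.len()),
            Err(e) => eprintln!("[fetch] failed: {e}"),
        }
        result
    }
}

/// Caller code depends only on the trait, so strategies are interchangeable.
fn crawl(strategy: &dyn FetchStrategy, url: &str) -> Result<usize, String> {
    strategy.fetch(url).map(|body| body.len())
}

fn main() {
    let plain = PlainFetch { client: shared_client() };
    let logged = LoggingFetch {
        inner: PlainFetch { client: shared_client() },
    };
    println!("plain:  {:?}", crawl(&plain, "https://www.imdb.com/"));
    println!("logged: {:?}", crawl(&logged, "https://www.imdb.com/"));
}
```

The decorator implements the same trait as the strategy it wraps, so logging can be added or removed at the call site without touching `PlainFetch`.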

📌 Output Sample

001: title:The Shawshank Redemption                link:https://www.imdb.com/title/tt0111161/
002: title:The Godfather                           link:https://www.imdb.com/title/tt0068646/
...

🤝 Contributing

Feel free to submit issues, suggestions, or implement a new parser for your favorite movie site!
