Python Crawler Template / Python 爬虫工具包

A production-ready Python web scraping template with dual framework support (Scrapy & Requests), built-in anti-scraping strategies, and multi-storage backends.

一个生产级的 Python 网络爬虫模板，支持 Scrapy 和 Requests 双框架切换，内置反爬策略和多数据存储后端。

🌟 Features / 项目特色

Dual Framework Support - Switch between Scrapy and Requests easily
Anti-Scraping Built-in - UA pool, proxy rotation, randomized delays
Multi-Storage Backends - CSV, Excel, MySQL, MongoDB
Production Ready - Configuration files, error handling, logging
Well Documented - Bilingual README, runnable examples

🚀 Quick Start / 快速开始

Installation / 安装

git clone https://github.com/martin00000/python-crawler-template.git
cd python-crawler-template
pip install -r requirements.txt
pip install -e .

Basic Usage / 基础用法

from src import BaseScraper

# Create a simple scraper
scraper = BaseScraper(url="https://example.com")
data = scraper.fetch()

# Save to CSV
scraper.save("output.csv")

📖 Documentation / 文档

API Reference / API 参考

BaseScraper / 基础爬虫类

Method	Description / 描述
`fetch()`	Fetch page content / 抓取页面内容
`parse(response)`	Parse response data / 解析响应数据
`save(filename, format)`	Save to file / 保存到文件
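The three methods above suggest a fetch, parse, save lifecycle. The following is a minimal standalone sketch of that flow, not the project's actual `BaseScraper`: the class names `SketchScraper` and `_LinkParser`, the link-extraction logic, the `"csv"` default for `format`, and the CSV column names are all assumptions made for illustration. It uses only the standard library.

```python
import csv
from html.parser import HTMLParser
from urllib.request import urlopen


class _LinkParser(HTMLParser):
    """Hypothetical helper: collects href/text pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only accumulate text while inside an <a> that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append({"href": self._href, "text": text})
            self._href = None


class SketchScraper:
    """Hypothetical stand-in that mirrors fetch() / parse() / save()."""

    def __init__(self, url):
        self.url = url
        self.rows = []

    def fetch(self):
        # Download the page body, then hand it to parse().
        with urlopen(self.url, timeout=10) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        self.rows = self.parse(html)
        return self.rows

    def parse(self, html):
        # Turn raw HTML into a list of row dictionaries.
        parser = _LinkParser()
        parser.feed(html)
        return parser.links

    def save(self, filename, format="csv"):
        # Only CSV is sketched here; other backends are out of scope.
        if format != "csv":
            raise ValueError(f"unsupported format in this sketch: {format}")
        with open(filename, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=["href", "text"])
            writer.writeheader()
            writer.writerows(self.rows)
```

Keeping `parse()` separate from `fetch()` means the parsing logic can be exercised against a saved HTML string without touching the network.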

Utils / 工具函数

get_user_agents() - Get random user agent / 获取随机 UA
get_proxy() - Get available proxy / 获取可用代理
random_delay(min, max) - Random sleep / 随机延时
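As a rough illustration of what helpers like these might do, here is a sketch of user-agent selection and a randomized delay. It is not the project's implementation: `pick_user_agent`, `build_headers`, the `sleep` and `rng` parameters, and the sample user-agent strings are assumptions introduced so the example can run on its own.

```python
import random
import time

# Illustrative pool only; a real deployment would keep current UA strings in config.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def pick_user_agent(rng=random):
    """Return one user-agent string chosen at random from the pool."""
    return rng.choice(USER_AGENTS)


def build_headers(rng=random):
    """Build request headers carrying a randomly chosen User-Agent."""
    return {"User-Agent": pick_user_agent(rng)}


def random_delay(min_s, max_s, sleep=time.sleep, rng=random):
    """Sleep for a random duration between min_s and max_s seconds.

    Returns the chosen delay so callers can log it.
    """
    if min_s < 0 or max_s < min_s:
        raise ValueError("expected 0 <= min_s <= max_s")
    delay = rng.uniform(min_s, max_s)
    sleep(delay)
    return delay
```

Passing `sleep` and `rng` as parameters is a design choice for testability: a test can supply a seeded `random.Random` and a fake sleep function instead of actually waiting.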

📝 Examples / 示例

See the examples/ directory for more:

simple_scraper.py - Basic usage / 基础用法
pagination_scraper.py - Multi-page scraping / 分页爬取
login_scraper.py - With authentication / 登录验证
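For readers who want a feel for multi-page scraping before opening the examples, the sketch below shows one common pattern: request numbered pages until a page comes back empty. It is not taken from `pagination_scraper.py`; the function name `paginate`, the `fetch_page` callback, and the `max_pages` safety cap are assumptions for illustration.

```python
from typing import Callable, Iterator, List


def paginate(
    fetch_page: Callable[[int], List[dict]],
    start: int = 1,
    max_pages: int = 50,
) -> Iterator[dict]:
    """Yield items page by page until an empty page or the page cap is hit.

    fetch_page is a caller-supplied function that returns the parsed
    items for a given page number.
    """
    for page in range(start, start + max_pages):
        items = fetch_page(page)
        if not items:
            # An empty page is treated as the end of the listing.
            break
        yield from items
```

A caller would typically wrap its own fetch-and-parse step in `fetch_page`, for example a function that builds a URL from the page number and returns the parsed rows.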

Running Examples / 运行示例

cd examples
python simple_scraper.py

🔧 Configuration / 配置

Edit config.yaml to customize request pacing, retries, proxy rotation, and storage defaults:

# config.yaml
settings:
  delay_between_requests: 1.0
  max_retries: 3
  
proxies:
  enabled: true
  rotation_interval: 10
  
storage:
  default_format: csv
  output_dir: ./outputs
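One way to consume a file shaped like this is to map the parsed YAML onto a typed settings object. The sketch below assumes the nested dictionary has already been loaded; `CrawlerSettings`, `settings_from_dict`, the fallback defaults, and the validation rules are hypothetical and are not part of the project's documented API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlerSettings:
    """Hypothetical typed view of the keys shown in config.yaml."""

    delay_between_requests: float = 1.0
    max_retries: int = 3
    proxies_enabled: bool = False
    proxy_rotation_interval: int = 10
    default_format: str = "csv"
    output_dir: str = "./outputs"


def settings_from_dict(raw: dict) -> CrawlerSettings:
    """Flatten the nested config mapping and apply basic sanity checks."""
    settings = raw.get("settings") or {}
    proxies = raw.get("proxies") or {}
    storage = raw.get("storage") or {}

    result = CrawlerSettings(
        delay_between_requests=float(settings.get("delay_between_requests", 1.0)),
        max_retries=int(settings.get("max_retries", 3)),
        proxies_enabled=bool(proxies.get("enabled", False)),
        proxy_rotation_interval=int(proxies.get("rotation_interval", 10)),
        default_format=str(storage.get("default_format", "csv")),
        output_dir=str(storage.get("output_dir", "./outputs")),
    )

    # Assumed constraints: negative delays or retry counts make no sense.
    if result.delay_between_requests < 0:
        raise ValueError("delay_between_requests must be >= 0")
    if result.max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    return result
```

With PyYAML installed, the input dictionary could come from `yaml.safe_load` applied to the opened config.yaml file.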

🧪 Testing / 测试

pytest tests/ -v

📄 License / 授权

This project is licensed under the MIT License - see the LICENSE file for details.

本项目采用 MIT 许可证 - 详见 LICENSE 文件。

🤝 Contributing / 贡献

Contributions are welcome! Please feel free to submit a Pull Request.

欢迎贡献！请随时提交 Pull Request。

📧 Contact / 联系

alan - dingying02@gmail.com

Project Link: https://github.com/martin00000/python-crawler-template
