Possibly the smartest Douban-group rental listing filter on the web

project structure

.
├── VERSION.txt
├── auto_git_version.sh
├── readme.md
├── requirements.txt
└── src
    ├── config.yaml
    ├── douban
    │   ├── data
    │   │   ├── 2022-03-09-beijingzufang.csv
    │   │   ├── 2022-03-09-beijingzufang_凤凰汇购物中心.csv
    │   │   ├── 2022-03-09-zhufang.csv
    │   │   └── 2022-03-09-zhufang_凤凰汇购物中心.csv
    │   ├── s1_crawl
    │   │   ├── crawl_base.py
    │   │   ├── crawl_via_api.py
    │   │   ├── crawl_via_html.py
    │   │   ├── db.py
    │   │   ├── errors.py
    │   │   ├── readme.md
    │   │   └── settings.py
    │   ├── s2_analysis
    │   │   └── main.py
    │   ├── settings.py
    │   └── utils.py
    ├── gaode
    │   ├── base.py
    │   ├── client.py
    │   └── globals
    │       ├── addr2coords.json
    │       ├── coords2name.json
    │       └── coordspair2duration.json
    ├── main.py
    └── settings.py

init

environment variables

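For security, the following values must be set as environment variables, which the program then reads.

GAODE_KEY: Gaode (Amap) API key
DOUBAN_COOKIE: the web cookie from a logged-in Douban session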
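
As an illustration, the sketch below shows one way a program could read these two variables and fail early when either is missing. The helper name `require_env` is hypothetical and is not taken from this project's code.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


if __name__ == "__main__":
    # Both names come from the list above; everything else here is illustrative.
    gaode_key = require_env("GAODE_KEY")
    douban_cookie = require_env("DOUBAN_COOKIE")
    print("credentials loaded")
```

Failing at startup keeps a missing key from surfacing later as a confusing API error halfway through a scrape.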

python environment

# a Python virtualenv is strongly recommended; otherwise install the requirements directly
pip install -r requirements.txt

usage sample

test that the project works

# `-c` means `city`: tells the Gaode API which area to search in
# `-a` means `address`: usually your workplace address, which may be a building name
# `-g` means `groups`: a list of Douban group ids joined by the separator `|`
# `-o` means `output_format`: use `-o CSV` to write the results to CSV for later analysis
# `-n` means `count`: how many items to scrape per group; 5000 is a reasonable amount (this test run uses 50)
python src/main.py -c 北京 -a 凤凰汇购物中心 -g "zhufang|beijingzufang" -o CSV -n 50
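
The project's actual argument parser is not shown here, so the following is only a sketch of how these five flags could be declared with `argparse`. The long option names (`--city`, `--address`, and so on) and the helper `split_groups` are assumptions made for illustration.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags described above (long names are assumed, not confirmed)."""
    parser = argparse.ArgumentParser(description="Douban group rental filter (illustrative)")
    parser.add_argument("-c", "--city", help="city used to scope the Gaode search")
    parser.add_argument("-a", "--address", help="target address, e.g. an office building")
    parser.add_argument("-g", "--groups", help="Douban group ids joined by '|'")
    parser.add_argument("-o", "--output_format", help="output format, e.g. CSV")
    parser.add_argument("-n", "--count", type=int, help="items to scrape per group")
    return parser


def split_groups(groups: str) -> List[str]:
    """Split a '|'-joined group string into individual group ids, dropping empties."""
    return [g for g in groups.split("|") if g]


# Parse the same arguments as the sample command above.
args = build_parser().parse_args(
    ["-c", "北京", "-a", "凤凰汇购物中心", "-g", "zhufang|beijingzufang", "-o", "CSV", "-n", "50"]
)
print(args.city, args.count, split_groups(args.groups))
```

Quoting the `-g` value on the command line matters because an unquoted `|` would be interpreted by the shell as a pipe.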

alternatively, configure the run via `src/config.yaml`

city: 北京
target_address: 凤凰汇购物中心
groups: zhufang|beijingzufang
count: 5000
max_duration: 30
min_budget: 3000
max_budget: 6000
after_date: null
enable_scrape: true
enable_analysis: true
exclude_only_for_girls: false
include_only_from_personal: false
exclude_unknown_duration: false
exclude_unknown_price: false
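
To make the filter keys more concrete, here is a rough sketch of how the budget and duration settings might be applied to a single listing. The semantics are assumed rather than confirmed by the project: `max_duration` is taken to be a commute time, the budget bounds a monthly rent, and `None` stands for a value that could not be extracted. The function `keep_listing` is hypothetical.

```python
from typing import Optional

# A subset of the keys from config.yaml above, as a plain dict.
CONFIG = {
    "max_duration": 30,
    "min_budget": 3000,
    "max_budget": 6000,
    "exclude_unknown_duration": False,
    "exclude_unknown_price": False,
}


def keep_listing(price: Optional[int], duration: Optional[float], cfg: dict) -> bool:
    """Return True if a listing passes the (assumed) budget and duration filters."""
    if price is None:
        # Unknown price: dropped only when the config asks for it.
        if cfg["exclude_unknown_price"]:
            return False
    elif not cfg["min_budget"] <= price <= cfg["max_budget"]:
        return False

    if duration is None:
        # Unknown commute duration: dropped only when the config asks for it.
        if cfg["exclude_unknown_duration"]:
            return False
    elif duration > cfg["max_duration"]:
        return False

    return True


print(keep_listing(4500, 20, CONFIG))
print(keep_listing(7000, 20, CONFIG))
```

With both `exclude_unknown_*` keys set to `false`, listings with missing data are kept, which favors recall over precision when posts are loosely formatted.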

Then simply run:

python src/main.py

scrape without analyzing

python src/main.py --no-enable_analysis

analyze without scraping

python src/main.py --no-enable_scrape
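
Paired flags of the form `--enable_x` / `--no-enable_x` can be produced with `argparse.BooleanOptionalAction` (Python 3.9+). Whether this project uses that mechanism is an assumption; the sketch below only shows that the two commands above are consistent with it.

```python
import argparse

parser = argparse.ArgumentParser()
# BooleanOptionalAction registers both --enable_scrape and --no-enable_scrape.
parser.add_argument("--enable_scrape", action=argparse.BooleanOptionalAction, default=True)
parser.add_argument("--enable_analysis", action=argparse.BooleanOptionalAction, default=True)

# Equivalent to: python src/main.py --no-enable_analysis
args = parser.parse_args(["--no-enable_analysis"])
print(args.enable_scrape, args.enable_analysis)
```

Defaulting both flags to `True` matches the `enable_scrape: true` and `enable_analysis: true` entries in the sample config, so a bare invocation runs both stages.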

MarkShawn2020/douban-renting
