Douban is a community website that offers services such as book and video recommendations, offline local events, and group discussions. This project uses Selenium and BeautifulSoup to crawl information from Douban groups.
The Douban mobile app can generate a long image of a post. The image has a layout suited to reading on a phone and contains the post's content, its popular comments, and a QR code linking to the post.
However, this feature is not available on the Douban website, so this program uses the mobile automation tool Appium.
Scraping all group member information by Douban group ID
- User's ID
- User's Name
- User's Home Page URL
- User's Location
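The four fields above map naturally onto one record per member. The sketch below shows one way to store them and write them to CSV. It assumes member home pages follow the `https://www.douban.com/people/<id>/` pattern, and the names `GroupMember`, `user_id_from_url`, and `save_members` are hypothetical, not taken from this project's code. The crawling step itself (Selenium/BeautifulSoup) is not shown.

```python
from __future__ import annotations

import csv
from dataclasses import asdict, dataclass, fields
from urllib.parse import urlparse


@dataclass
class GroupMember:
    """One row of the member table (hypothetical field names)."""
    user_id: str
    name: str
    home_url: str
    location: str


def user_id_from_url(home_url: str) -> str:
    """Extract the user ID from a home page URL.

    Assumes the URL path looks like /people/<id>/ (an assumption about
    Douban's URL scheme, not confirmed by this project).
    """
    parts = [p for p in urlparse(home_url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "people":
        return parts[1]
    raise ValueError(f"unexpected home page URL: {home_url}")


def save_members(members: list[GroupMember], path: str) -> None:
    """Write member records to a UTF-8 CSV file with a header row."""
    header = [f.name for f in fields(GroupMember)]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=header)
        writer.writeheader()
        for member in members:
            writer.writerow(asdict(member))
```

For example, `save_members([GroupMember(user_id_from_url(url), name, url, location)], "members.csv")` writes one row per member scraped from the group's member pages.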
Scraping all elite post information (requests)
- post's text content (txt)
- post's link (csv, txt)
- post's content images / GIFs (named by their order in the post)
- post's comment information (commenter's ID, text content)
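Naming images "by their order in the post" can be done by numbering them as they appear and keeping each file's extension. The sketch below is a hypothetical helper (`ordered_image_names`), not the project's own code. It zero-pads the index so the files sort correctly in a file browser and, as an assumption, falls back to `.jpg` when the URL carries no extension. Downloading the images is not shown.

```python
from __future__ import annotations

import os
from urllib.parse import urlparse


def ordered_image_names(image_urls: list[str], width: int = 3) -> list[str]:
    """Name each image by its 1-based position in the post.

    The extension (e.g. .jpg, .gif) is taken from the URL path; query
    strings are ignored. URLs without an extension get ".jpg" (assumed).
    """
    names = []
    for index, url in enumerate(image_urls, start=1):
        ext = os.path.splitext(urlparse(url).path)[1].lower() or ".jpg"
        names.append(f"{index:0{width}d}{ext}")
    return names
```

Passing the image URLs in document order yields names such as `001.jpg`, `002.gif`, so GIFs keep their extension and the post's image order is preserved on disk.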
Scraping all elite post information (Selenium)
- post's text content (txt)
- post's links (txt)
- post's images / gifs (named by the order in the post)
- post's screenshot (pdf)
- Solve the anti-crawling problem with Selenium
- Log in either manually, by scanning the QR code, or automatically with find_element_by_class_name().click()
- No anti-crawling problems occur once logged in
- Demo Result
- Create a directory named after the post title and organize the collected data in it
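Post titles often contain characters that cannot appear in file names (`/`, `:`, `?`, quotes), so the title usually needs cleaning before it becomes a directory. The sketch below is one hypothetical way to do this (`post_dir` is not from the project). It replaces characters that are invalid on Windows or Unix with `_`, trims trailing dots, caps the length at an assumed 80 characters, and creates the directory.

```python
from __future__ import annotations

import re
from pathlib import Path

# Characters that are invalid in file names on Windows and/or Unix,
# plus ASCII control characters.
_INVALID = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def post_dir(base: str | Path, title: str, max_len: int = 80) -> Path:
    """Create (if needed) and return a directory named after the post title."""
    safe = _INVALID.sub("_", title).strip().rstrip(".")
    safe = safe[:max_len] or "untitled"  # fall back if nothing usable remains
    path = Path(base) / safe
    path.mkdir(parents=True, exist_ok=True)
    return path
```

The text, link files, images, and PDF screenshot of one post can then all be written into the returned directory.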
The proxy used in DoubanPostCollector.ipynb is from
Step 1: Enter the following command
open .bash_profile
Step 2: Add the following three lines to .bash_profile
export JAVA_HOME=$(/usr/libexec/java_home)
export ANDROID_HOME=${HOME}/Library/Android/sdk
export PATH="${JAVA_HOME}/bin:${ANDROID_HOME}/tools:${ANDROID_HOME}/platform-tools:${PATH}"
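After reopening the terminal (or running `source ~/.bash_profile`), it can help to confirm that the variables took effect before starting Appium. The sketch below is a hypothetical check (`check_android_env` is not part of the project). It verifies that `JAVA_HOME` and `ANDROID_HOME` point to existing directories and that `adb`, which lives in `platform-tools`, is reachable on `PATH`.

```python
from __future__ import annotations

import os
import shutil
from collections.abc import Mapping


def check_android_env(env: Mapping[str, str] | None = None) -> list[str]:
    """Return a list of problems with the Appium-related environment.

    An empty list means JAVA_HOME and ANDROID_HOME point to existing
    directories and adb can be found on PATH.
    """
    env = os.environ if env is None else env
    problems = []
    for var in ("JAVA_HOME", "ANDROID_HOME"):
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{var} points to a missing directory: {value}")
    # An empty PATH makes shutil.which return None, i.e. "not found".
    if shutil.which("adb", path=env.get("PATH", "")) is None:
        problems.append("adb not found on PATH")
    return problems
```

Running `print(check_android_env())` in a fresh terminal should print `[]` when the three lines above are in place.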
Refer to the link above for the solution: download swt.jar and copy it into the uiautomatorviewer path.
Note: the path described in the link above may not match the uiautomatorviewer path on your own computer.
Just add swt.jar to the x86_64 folder under your own uiautomatorviewer path, then maximize the window and change its size; the phone icon should reappear.
Download links for different versions of swt.jar: https://download.eclipse.org/eclipse/downloads/index.html
If the latest version still doesn't work, switch to a lower version and repeat the same steps.
My system is macOS Monterey 12.2, and version 4.20 runs successfully.
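Because the x86_64 folder's location differs between machines, a small script can locate it instead of relying on the path in the linked article. The sketch below assumes uiautomatorviewer's libraries live somewhere under `$ANDROID_HOME/tools`; `find_x86_64_dirs` and `install_swt` are hypothetical helpers, not part of the project or the SDK.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def find_x86_64_dirs(sdk_root: str | Path) -> list[Path]:
    """List every directory named x86_64 under <sdk_root>/tools.

    Assumption: uiautomatorviewer ships inside the SDK's tools folder;
    adjust sdk_root if your installation differs.
    """
    tools = Path(sdk_root) / "tools"
    if not tools.is_dir():
        return []
    return sorted(p for p in tools.rglob("x86_64") if p.is_dir())


def install_swt(swt_jar: str | Path, target_dir: str | Path) -> Path:
    """Copy the downloaded swt.jar into target_dir, replacing any old copy."""
    dest = Path(target_dir) / "swt.jar"
    shutil.copyfile(swt_jar, dest)
    return dest
```

For instance, `find_x86_64_dirs(os.environ["ANDROID_HOME"])` lists the candidate folders; after copying the jar with `install_swt`, restart uiautomatorviewer and resize its window as described above. Keep a backup of any existing swt.jar so you can switch versions if one does not work.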
adb shell dumpsys window | grep -E 'mCurrentFocus'
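The command above prints the window that currently has focus; its last token has the form `<package>/<activity>`, which supplies Appium's `appPackage` and `appActivity` capabilities for the app in the foreground. The sketch below parses such a line. `parse_current_focus` is a hypothetical helper, and the sample line in the comment uses a made-up package name rather than Douban's real one.

```python
from __future__ import annotations

import re

# Matches the trailing "<package>/<activity>}" of an mCurrentFocus line, e.g.
# mCurrentFocus=Window{1a2b3c u0 com.example.app/com.example.app.MainActivity}
_FOCUS = re.compile(r"mCurrentFocus=Window\{.*?\s(\S+)/(\S+)\}")


def parse_current_focus(line: str) -> dict[str, str]:
    """Turn an mCurrentFocus line into appPackage/appActivity values."""
    match = _FOCUS.search(line)
    if match is None:
        raise ValueError(f"no focused app in: {line!r}")
    package, activity = match.groups()
    # A leading dot means the activity name is relative to the package.
    if activity.startswith("."):
        activity = package + activity
    return {"appPackage": package, "appActivity": activity}
```

Run the command while the Douban app is open on the device, pass the printed line to `parse_current_focus`, and merge the resulting dictionary into the capabilities used to start the Appium session.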