dynamic-WebCrawler

When I tried to fetch a webpage built with React, I found that Selenium could help me do this.
The mechanism is that Selenium can automatically open a real browser and "drive" it.
Hence, we can wait until the browser finishes rendering the components, and then read the content of the components we want. A minimal sketch of this idea follows.
However, in the end I switched to fetching another dynamic page that uses a simpler framework than React. XD
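
A minimal sketch of the waiting idea, not taken from this repo; the URL and CSS selector below are placeholder assumptions:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()  # needs geckodriver; or webdriver.Chrome() with chromedriver
driver.get("https://example.com/some-react-app")  # placeholder URL

# Wait (up to 10 seconds) until the component we want has actually been rendered.
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".news-item"))  # placeholder selector
)
print(element.text)
driver.quit()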

udn.py

It's a web crawler that uses Selenium to automatically scroll the page until enough news items have been loaded.
"my_url" is the news list page of a category on udn.com.
Before using it, we need to download a driver binary for Firefox or Chrome.
Selenium opens the browser (Firefox or Chrome), loads the page, then alternates between fetching links and scrolling down to load more.
After fetching all the news links we need, use BeautifulSoup to extract the content you need (Selenium can also do this).
Then get the specific block you need and output it as JSON. A rough sketch of this flow is shown below.
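
A rough sketch of the scroll-and-collect flow, not the actual udn.py; the category URL, link filter, scroll count, and selector are placeholder assumptions:

import json
import time

from bs4 import BeautifulSoup
from selenium import webdriver

my_url = "https://udn.com/news/cate/SOME_CATEGORY"  # placeholder category list page

driver = webdriver.Firefox()  # or webdriver.Chrome(); needs the matching driver binary
driver.get(my_url)

links = set()
for _ in range(10):  # scroll a fixed number of times, or until enough links are collected
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give the page time to load the next batch of news items
    soup = BeautifulSoup(driver.page_source, "html.parser")
    for a in soup.select("a"):  # udn.py likely uses a stricter selector; this is a placeholder
        href = a.get("href", "")
        if "/news/story/" in href:  # keep only links that look like news articles
            links.add(href)

driver.quit()

with open("links.json", "w", encoding="utf-8") as f:
    json.dump(sorted(links), f, ensure_ascii=False, indent=2)

After this step, each collected link can be fetched and parsed with BeautifulSoup to pull out the specific block of content and write it to JSON.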

mobile01.py

It's a simple static web crawler: it fetches data from mobile01, parses it with BeautifulSoup, and outputs JSON.
I encountered strange problems while crawling: the connection drops sometimes.
If it disconnects, read the articlesNum printed on the console and update the "articleNum" parameter in this file; it can then continue fetching from the following articles. The sketch below shows this resume pattern.
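
A minimal sketch of the resume-by-counter pattern, not the actual mobile01.py; the forum URL, CSS selector, and output fields are placeholder assumptions:

import json
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

articleNum = 0  # after a disconnect, set this to the last number printed below and rerun

list_url = "https://www.mobile01.com/topiclist.php?f=SOME_FORUM_ID"  # placeholder list page
headers = {"User-Agent": "Mozilla/5.0"}
soup = BeautifulSoup(requests.get(list_url, headers=headers).text, "html.parser")

# Collect article links from the list page (the selector here is a placeholder).
links = [a["href"] for a in soup.select("a.topic_gen") if a.get("href")]

articles = []
for i, link in enumerate(links):
    if i < articleNum:  # skip articles already fetched in a previous run
        continue
    page = requests.get(urljoin(list_url, link), headers=headers)
    page_soup = BeautifulSoup(page.text, "html.parser")
    title = page_soup.title.get_text(strip=True) if page_soup.title else ""
    articles.append({"url": link, "title": title})
    print("articlesNum:", i + 1)  # note this value to resume after a disconnect

with open("mobile01.json", "w", encoding="utf-8") as f:
    json.dump(articles, f, ensure_ascii=False, indent=2)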
