pyce3: Multilingual Web Page Content Extractor for Python3

Introduction

pyce3 is a Python 3 package for multilingual web page content extraction. It extracts the main content of article-style web pages, such as news stories and blog posts.

Usage

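import pyce3
import requests

# Fetch the raw page bytes; pyce3 reports the detected encoding itself.
url = "http://caijing.chinadaily.com.cn/a/201911/21/WS5dd62455a31099ab995ed438.html"
html = requests.get(url).content

# parse() returns the encoding, publish time, title, body text, and next-page link.
encoding, time, title, text, next_link = pyce3.parse(url, html)
print("Encoding:", encoding)
print('='*10)
print("Title:", title)
print("Time:", time)
print('='*10)
print("Content:", text)
print("NextPageLink:", next_link)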
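
The five values unpacked above can be awkward to pass around as a bare tuple. The following is a minimal sketch of one way to wrap and print them. `ParseResult` and `summarize` are hypothetical helpers, not part of pyce3, and the sketch assumes that any field may come back empty or as `None` when the page does not provide it.

```python
from typing import NamedTuple, Optional


class ParseResult(NamedTuple):
    """Hypothetical wrapper; field order mirrors the tuple unpacked in the usage example."""
    encoding: Optional[str]
    time: Optional[str]
    title: Optional[str]
    text: Optional[str]
    next_link: Optional[str]


def summarize(result: ParseResult, preview_chars: int = 80) -> str:
    """Build a short report, tolerating missing fields (an assumption, see above)."""

    def show(value: Optional[str]) -> str:
        # Empty strings and None are both reported as "(none)".
        return value if value else "(none)"

    text = result.text or ""
    # Truncate long article bodies so the report stays readable.
    preview = text[:preview_chars] + ("..." if len(text) > preview_chars else "")
    lines = [
        "Encoding: " + show(result.encoding),
        "Title: " + show(result.title),
        "Time: " + show(result.time),
        "Content: " + show(preview),
        "NextPageLink: " + show(result.next_link),
    ]
    return "\n".join(lines)
```

With the variables from the usage example, this could be called as `print(summarize(ParseResult(*pyce3.parse(url, html))))`.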
