Title-Squeezer

This is a simple Python program that reads an HTML page, parses it, and tries to determine the page's title and description.

This program tries to read as little input as possible and to consume less CPU and memory than other HTML parsers.

Usage

curl -L -s --compressed https://www.yahoo.com/ | ./title_squeezer.py -v

With -v, the program prints every HTML tag it successfully parses.

Programmable Interface

>>> import title_squeezer
# First construct a Squeezer instance
>>> squeezer = title_squeezer.Squeezer()
# Then feed in HTML data
>>> squeezer.feed(b'<html><head><title>Hello wo')
Title(
    enough=False,
    title='Hello wo',
    description=None,
    charset=None
)
# Feed more data
>>> squeezer.feed(b'rld!</title></head>')
Title(
    enough=True,
    title='Hello world!',
    description=None,
    charset=None
)
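
The session above suggests a natural loop: keep feeding chunks and stop as soon as the result reports enough=True, so the rest of the page is never read. The following is a minimal sketch of that loop, not part of title_squeezer itself. The helper name squeeze_chunks is hypothetical, and it assumes only what the transcript shows: that feed() accepts bytes and returns an object with an enough attribute.

```python
from typing import Any, Iterable, Optional


def squeeze_chunks(squeezer: Any, chunks: Iterable[bytes]) -> Optional[Any]:
    """Feed chunks to a squeezer and stop once it reports enough=True.

    `squeezer` is assumed to behave like title_squeezer.Squeezer in the
    session above: feed(bytes) returns a result with an `enough` attribute.
    Returns the last result from feed(), or None if no chunk was fed.
    """
    result = None
    for chunk in chunks:
        result = squeezer.feed(chunk)
        if result.enough:
            # The parser has what it needs; leave the remaining input unread.
            break
    return result
```

For example, passing title_squeezer.Squeezer() together with iter(lambda: sys.stdin.buffer.read(4096), b'') would read standard input in 4 KiB blocks and stop at the first block that completes the title. The 4096-byte block size is an arbitrary choice for illustration.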

License

The original author of this program, Title-Squeezer, is StarBrilliant. This file is released under the GNU General Public License version 3. You should have received a copy of the General Public License text along with this program. If not, you can obtain it at http://gnu.org/copyleft/gpl.html. This program comes with no warranty; the author will not be responsible for any damage or problems caused by this program.
