 This module defines a class :class:`HTMLParser` which serves as the basis for
 parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.

-.. class:: HTMLParser(*, convert_charrefs=True)
+.. class:: HTMLParser(*, convert_charrefs=True, scripting=False)

    Create a parser instance able to parse invalid markup.

-   If *convert_charrefs* is ``True`` (the default), all character
-   references (except the ones in ``script``/``style`` elements) are
+   If *convert_charrefs* is true (the default), all character
+   references (except the ones in elements like ``script`` and ``style``) are
    automatically converted to the corresponding Unicode characters.

+   If *scripting* is false (the default), the content of the ``noscript``
+   element is parsed normally; if it's true, it's returned as is without
+   being parsed.
+
    An :class:`.HTMLParser` instance is fed HTML data and calls handler methods
    when start tags, end tags, text, comments, and other markup elements are
    encountered.  The user should subclass :class:`.HTMLParser` and override its
@@ -37,6 +41,9 @@ parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
    .. versionchanged:: 3.5
       The default value for argument *convert_charrefs* is now ``True``.

+   .. versionchanged:: 3.14.1
+      Added the *scripting* parameter.
+

 Example HTML Parser Application
 -------------------------------
@@ -161,24 +168,24 @@ implementations do nothing (except for :meth:`~HTMLParser.handle_startendtag`):
 .. method:: HTMLParser.handle_data(data)

    This method is called to process arbitrary data (e.g. text nodes and the
-   content of ``<script>...</script>`` and ``<style>...</style>``).
+   content of elements like ``script`` and ``style``).


 .. method:: HTMLParser.handle_entityref(name)

    This method is called to process a named character reference of the form
    ``&name;`` (e.g. ``&gt;``), where *name* is a general entity reference
-   (e.g. ``'gt'``).  This method is never called if *convert_charrefs* is
-   ``True``.
+   (e.g. ``'gt'``).
+   This method is only called if *convert_charrefs* is false.


 .. method:: HTMLParser.handle_charref(name)

    This method is called to process decimal and hexadecimal numeric character
    references of the form :samp:`&#{NNN};` and :samp:`&#x{NNN};`.  For example, the decimal
    equivalent for ``&gt;`` is ``&#62;``, whereas the hexadecimal is ``&#x3E;``;
-   in this case the method will receive ``'62'`` or ``'x3E'``.  This method
-   is never called if *convert_charrefs* is ``True``.
+   in this case the method will receive ``'62'`` or ``'x3E'``.
+   This method is only called if *convert_charrefs* is false.


 .. method:: HTMLParser.handle_comment(data)
@@ -292,8 +299,8 @@ Parsing an element with a few attributes and a title:
    Data     : Python
    End tag  : h1

-The content of ``script`` and ``style`` elements is returned as is, without
-further parsing:
+The content of elements like ``script`` and ``style`` is returned as is,
+without further parsing:

 .. doctest::

@@ -304,10 +311,10 @@ further parsing:
    End tag  : style

    >>> parser.feed('<script type="text/javascript">'
-   ...             'alert("<strong>hello!</strong>");</script>')
+   ...             'alert("<strong>hello! &#9786;</strong>");</script>')
    Start tag: script
         attr: ('type', 'text/javascript')
-   Data     : alert("<strong>hello!</strong>");
+   Data     : alert("<strong>hello! &#9786;</strong>");
    End tag  : script

 Parsing comments:
@@ -336,7 +343,7 @@ correct char (note: these 3 references are all equivalent to ``'>'``):

 Feeding incomplete chunks to :meth:`~HTMLParser.feed` works, but
 :meth:`~HTMLParser.handle_data` might be called more than once
-(unless *convert_charrefs* is set to ``True``):
+if *convert_charrefs* is false:

 .. doctest::

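The chunked-feeding behaviour described in this hunk can be sketched as follows. This is a minimal illustration and not part of the change itself: ``DataCollector`` is a hypothetical helper name, and the parser is created with ``convert_charrefs=False``, the case the reworded sentence covers. The number of ``handle_data`` calls depends on where the input is split, so the sketch only relies on the concatenated result.

```python
from html.parser import HTMLParser


class DataCollector(HTMLParser):
    """Hypothetical helper that records every handle_data() call."""

    def __init__(self):
        # convert_charrefs=False is the case the sentence above describes.
        super().__init__(convert_charrefs=False)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


parser = DataCollector()
# The markup is split at arbitrary points, including inside tags.
for piece in ["<sp", "an>buff", "ered ", "text</s", "pan>"]:
    parser.feed(piece)
parser.close()

# Joining the recorded pieces restores the text however it was split.
text = "".join(parser.chunks)
print(len(parser.chunks), repr(text))
```

Code that needs the complete text of an element should therefore accumulate the pieces and join them at the end tag, instead of assuming a single ``handle_data`` call.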
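The reworded ``handle_entityref`` and ``handle_charref`` entries can be checked with a small sketch. ``RefLogger`` and ``parse`` are hypothetical names introduced only for this illustration. The sketch avoids the new *scripting* parameter so that it also runs on versions that predate it.

```python
from html.parser import HTMLParser


class RefLogger(HTMLParser):
    """Hypothetical helper: records which handler saw each piece of input."""

    def __init__(self, *, convert_charrefs):
        super().__init__(convert_charrefs=convert_charrefs)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        self.events.append(("charref", name))


def parse(markup, *, convert_charrefs):
    """Feed *markup* to a fresh RefLogger and return the recorded events."""
    parser = RefLogger(convert_charrefs=convert_charrefs)
    parser.feed(markup)
    parser.close()
    return parser.events


markup = "a &gt; b &#62; c"

# With convert_charrefs false, the references reach their own handlers.
raw = parse(markup, convert_charrefs=False)

# With convert_charrefs true, they are converted and arrive as plain data.
converted = parse(markup, convert_charrefs=True)
converted_text = "".join(value for kind, value in converted if kind == "data")

print(raw)
print(converted_text)
```

In the first run the names ``'gt'`` and ``'62'`` are passed to the reference handlers. In the second run neither handler is called and the text arrives through ``handle_data`` only.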