:mod:`email.parser`: Parsing email messages

.. module:: email.parser
   :synopsis: Parse flat text email messages to produce a message object structure.

Source code: :source:`Lib/email/parser.py`


Message object structures can be created in one of two ways: they can be created from whole cloth by instantiating :class:`~email.message.Message` objects and stringing them together via :meth:`~email.message.Message.attach` and :meth:`~email.message.Message.set_payload` calls, or they can be created by parsing a flat text representation of the email message.

The :mod:`email` package provides a standard parser that understands most email document structures, including MIME documents. You can pass the parser a string or a file object, and the parser returns the root :class:`~email.message.Message` instance of the object structure. For simple, non-MIME messages the payload of this root object will likely be a string containing the text of the message. For MIME messages, the root object's :meth:`~email.message.Message.is_multipart` method returns ``True``, and the subparts can be accessed via the :meth:`~email.message.Message.get_payload` and :meth:`~email.message.Message.walk` methods.
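
As a rough illustration of the paragraph above, the following sketch parses one plain message and one MIME message from strings. The message text, addresses, and boundary are made up for the example, and the top-level ``email.message_from_string`` helper is used as a shortcut for the standard parser.

```python
from email import message_from_string

# A simple, non-MIME message: the payload is expected to be a plain string.
plain = message_from_string("Subject: hi\n\nJust text.\n")
plain_payload = plain.get_payload()

# A hypothetical two-part MIME message, written out only for illustration.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="BOUNDARY"\n'
    "\n"
    "--BOUNDARY\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello, Bob.\n"
    "--BOUNDARY\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hello, Bob.</p>\n"
    "--BOUNDARY--\n"
)

msg = message_from_string(raw)

# For a multipart message, get_payload() returns the list of subparts.
is_multi = msg.is_multipart()
subparts = msg.get_payload()

# walk() visits the root first, then each subpart in depth-first order.
part_types = [part.get_content_type() for part in msg.walk()]
```

Here ``walk()`` is convenient when the nesting depth is unknown, while ``get_payload()`` gives direct access to the immediate children.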

There are two parser interfaces available: the classic :class:`Parser` API and the incremental :class:`FeedParser` API. The classic :class:`Parser` API is fine if you have the entire text of the message in memory as a string, or if the entire message lives in a file on the file system. :class:`FeedParser` is more appropriate when you're reading the message from a stream that might block waiting for more input (e.g. reading an email message from a socket). The :class:`FeedParser` can consume and parse the message incrementally, and only returns the root object when you close the parser [1].

Note that the parser can be extended in limited ways, and of course you can implement your own parser completely from scratch. There is no magical connection between the :mod:`email` package's bundled parser and the :class:`~email.message.Message` class, so your custom parser can create message object trees any way it finds necessary.

FeedParser API

The :class:`FeedParser`, imported from the :mod:`email.feedparser` module, provides an API that is conducive to incremental parsing of email messages, such as would be necessary when reading the text of an email message from a source that can block (e.g. a socket). The :class:`FeedParser` can of course be used to parse an email message fully contained in a string or a file, but the classic :class:`Parser` API may be more convenient for such use cases. The semantics and results of the two parser APIs are identical.
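
The sketch below feeds a message to a :class:`FeedParser` in small pieces, as might happen when reading from a socket, and compares the result with the classic :class:`Parser`. The message text and the chunk size of 7 characters are arbitrary choices for the example.

```python
from email.feedparser import FeedParser
from email.parser import Parser

# Illustrative message text; any well-formed message would do.
raw = "From: alice@example.com\nSubject: Chunked\n\nLine one.\nLine two.\n"

# Simulate a blocking source by feeding the text a few characters at a time.
# Chunks do not need to end on line boundaries.
feed_parser = FeedParser()
for start in range(0, len(raw), 7):
    feed_parser.feed(raw[start:start + 7])
incremental = feed_parser.close()

# Parse the same text in one call with the classic API.
whole = Parser().parsestr(raw)

# Both parsers should produce equivalent message objects.
same = incremental.as_string() == whole.as_string()
```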

The :class:`FeedParser`'s API is simple: you create an instance, feed it text until there's no more to feed it, then close the parser to retrieve the root message object. The :class:`FeedParser` is extremely accurate when parsing standards-compliant messages, and it does a very good job of parsing non-compliant messages, providing information about how a message was deemed broken. It populates a message object's ``defects`` attribute with a list of any problems it found in the message. See the :mod:`email.errors` module for the list of defects that it can find.
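
To make the defect reporting concrete, here is a minimal sketch that parses a deliberately broken message: a multipart message whose closing boundary is missing. The message text is invented for the example, and the exact set of defects reported may vary between Python versions.

```python
from email import errors
from email.feedparser import FeedParser

# A multipart message that never supplies its closing "--XYZ--" boundary.
raw = (
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "body\n"
)

parser = FeedParser()
parser.feed(raw)
msg = parser.close()

# Parsing still succeeds; the problems are recorded on the message object.
defect_names = [type(defect).__name__ for defect in msg.defects]
missing_close = any(
    isinstance(defect, errors.CloseBoundaryNotFoundDefect)
    for defect in msg.defects
)
```

Inspecting ``msg.defects`` after parsing lets calling code decide whether to accept, repair, or reject a non-compliant message.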

Here is the API for the :class:`FeedParser`:

The :class:`BytesFeedParser` class works exactly like :class:`FeedParser` except that the input to the :meth:`~FeedParser.feed` method must be bytes and not a string.

.. versionadded:: 3.2
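
A short sketch of the bytes-based variant, assuming it is imported as ``BytesFeedParser`` from :mod:`email.feedparser`. The chunks below stand in for data received from a binary stream and are made up for the example.

```python
from email.feedparser import BytesFeedParser

# Hypothetical chunks as they might arrive from a socket; note that the
# Subject header is split across two chunks.
chunks = [
    b"From: alice@example.com\r\n",
    b"Subject: Bytes in",
    b"put\r\n\r\n",
    b"Body text.\r\n",
]

parser = BytesFeedParser()
for chunk in chunks:
    parser.feed(chunk)  # each argument must be bytes, not str
msg = parser.close()

subject = msg["Subject"]
sender = msg["From"]
```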

Parser class API

The :class:`Parser` class, imported from the :mod:`email.parser` module, provides an API that can be used to parse a message when the complete contents of the message are available in a string or file. The :mod:`email.parser` module also provides header-only parsers, called :class:`HeaderParser` and :class:`BytesHeaderParser`, which can be used if you're only interested in the headers of the message. :class:`HeaderParser` and :class:`BytesHeaderParser` can be much faster in these situations, since they do not attempt to parse the message body, instead setting the payload to the raw body as a string. They have the same API as the :class:`Parser` and :class:`BytesParser` classes.
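
As an illustration of the header-only behaviour, the following sketch runs :class:`HeaderParser` over a multipart message. The message text is invented for the example.

```python
from email.parser import HeaderParser

# Illustrative multipart message; the body is intentionally left unparsed.
raw = (
    "From: alice@example.com\n"
    "Subject: Headers only\n"
    'Content-Type: multipart/mixed; boundary="B"\n'
    "\n"
    "--B\n"
    "Content-Type: text/plain\n"
    "\n"
    "not parsed\n"
    "--B--\n"
)

msg = HeaderParser().parsestr(raw)

# Headers are available as usual.
subject = msg["Subject"]

# The body is kept as one raw string rather than a list of subparts.
body = msg.get_payload()
treated_as_multipart = msg.is_multipart()
```

Because the body is not split into subparts, this approach suits tasks such as filtering or indexing on header values.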

.. versionadded:: 3.3
   The BytesHeaderParser class.


Since creating a message object structure from a string or a file object is such a common task, four functions are provided as a convenience. They are available in the top-level :mod:`email` package namespace.

.. currentmodule:: email

.. function:: message_from_string(s, _class=email.message.Message, *, \
                                  policy=policy.compat32)

   Return a message object structure from a string.  This is exactly equivalent to
   ``Parser().parsestr(s)``.  *_class* and *policy* are interpreted as
   with the :class:`~email.parser.Parser` class constructor.

   .. versionchanged:: 3.3
      Removed the *strict* argument.  Added the *policy* keyword.

.. function:: message_from_bytes(s, _class=email.message.Message, *, \
                                 policy=policy.compat32)

   Return a message object structure from a :term:`bytes-like object`.  This is exactly
   equivalent to ``BytesParser().parsebytes(s)``.  Optional *_class* and
   *policy* are interpreted as with the :class:`~email.parser.Parser` class
   constructor.

   .. versionadded:: 3.2
   .. versionchanged:: 3.3
      Removed the *strict* argument.  Added the *policy* keyword.

.. function:: message_from_file(fp, _class=email.message.Message, *, \
                                policy=policy.compat32)

   Return a message object structure tree from an open :term:`file object`.
   This is exactly equivalent to ``Parser().parse(fp)``.  *_class*
   and *policy* are interpreted as with the :class:`~email.parser.Parser` class
   constructor.

   .. versionchanged:: 3.3
      Removed the *strict* argument.  Added the *policy* keyword.

.. function:: message_from_binary_file(fp, _class=email.message.Message, *, \
                                       policy=policy.compat32)

   Return a message object structure tree from an open binary :term:`file
   object`.  This is exactly equivalent to ``BytesParser().parse(fp)``.
   *_class* and *policy* are interpreted as with the
   :class:`~email.parser.Parser` class constructor.

   .. versionadded:: 3.2
   .. versionchanged:: 3.3
      Removed the *strict* argument.  Added the *policy* keyword.

Here's an example of how you might use :func:`message_from_string` at an interactive Python prompt:

>>> import email
>>> msg = email.message_from_string(myString)  # doctest: +SKIP
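
A self-contained sketch of all four convenience functions follows. The message text is made up, and in-memory ``io.StringIO`` and ``io.BytesIO`` objects stand in for real open files.

```python
import io

import email
from email import policy

# Illustrative message in both text and bytes form.
text = "From: alice@example.com\nSubject: Convenience\n\nHello.\n"
data = text.encode("ascii")

# One call per input type: str, bytes, text file, binary file.
from_str = email.message_from_string(text)
from_bytes = email.message_from_bytes(data)
from_file = email.message_from_file(io.StringIO(text))
from_binary = email.message_from_binary_file(io.BytesIO(data))

# All four should yield the same headers and payload.
messages = (from_str, from_bytes, from_file, from_binary)
subjects = {message["Subject"] for message in messages}
payloads = {message.get_payload() for message in messages}

# The keyword-only *policy* argument selects a different policy.
modern = email.message_from_string(text, policy=policy.default)
modern_subject = str(modern["Subject"])
```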

Additional notes

Here are some notes on the parsing semantics:

Footnotes

[1] As of email package version 3.0, introduced in Python 2.4, the classic :class:`~email.parser.Parser` was re-implemented in terms of the :class:`~email.parser.FeedParser`, so the semantics and results are identical between the two parsers.