Welcome
=======
`emaildata <https://pypi.python.org/pypi/emaildata/>`__ is a Python
package for extracting content from email messages. It is a fork of the
`emailcontent <https://pypi.python.org/pypi/emailcontent/>`__ package
but adds more features.

emaildata features
------------------
`emaildata <https://pypi.python.org/pypi/emaildata/>`__ extracts these
types of content from email messages:

- Metadata.
- Text (plain text and HTML).
- Attachments.

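The examples in this README read a file named `message.eml`. If you do not have one at hand, a minimal sketch like the following writes a small test message with a plain text body, an HTML alternative and one attachment. It uses only the standard library and assumes Python 3.6 or later. The helper names, addresses, subject and file contents are made up for illustration and are not part of emaildata.

```python
from email.message import EmailMessage


def build_sample_message():
    """Build a small multipart message; every value here is illustrative."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Sample message"
    # Plain text body plus an HTML alternative.
    msg.set_content("Hello in plain text.")
    msg.add_alternative("<p>Hello in <b>HTML</b>.</p>", subtype="html")
    # One small binary attachment that carries a filename.
    msg.add_attachment(b"\x00\x01\x02", maintype="application",
                       subtype="octet-stream", filename="data.bin")
    return msg


def write_sample(path):
    """Serialize the sample message to the given path."""
    with open(path, "wb") as stream:
        stream.write(build_sample_message().as_bytes())
```

Calling `write_sample('message.eml')` in the working directory produces a file that the examples can read.
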
Extracting metadata
-------------------
This feature comes from the *metadata* module of the
`emailcontent <https://pypi.python.org/pypi/emailcontent/>`__ package. The
module was copied, a few methods of its *MetaData* class were
removed, and the module was made more pylint friendly.

To extract metadata from the headers of an email message, create an instance
of the *MetaData* class, passing the message to the constructor, and
retrieve the metadata with the *to\_dict* method. To reuse the same
extractor for another message, call *set\_message* with the new message
and then *to\_dict* again::
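
    import email
    from emaildata.metadata import MetaData

    # Parse the message and extract the metadata from its headers.
    with open('message.eml') as stream:
        message = email.message_from_file(stream)
    extractor = MetaData(message)
    data = extractor.to_dict()
    print(data.keys())

    # Reuse the same extractor for a second message.
    with open('message2.eml') as stream:
        message2 = email.message_from_file(stream)
    extractor.set_message(message2)
    data2 = extractor.to_dict()
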
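To give an idea of the kind of data that lives in message headers, here is a minimal standard-library sketch (Python 3) that collects header values into a dictionary and decodes RFC 2047 encoded words along the way. It is not the implementation of *MetaData*, and the keys and values returned by *to\_dict* may differ. The helper name `headers_to_dict` and the sample message are hypothetical.

```python
import email
from email.header import decode_header, make_header


def headers_to_dict(message):
    """Collect decoded header values into a dict of lists (hypothetical helper)."""
    result = {}
    for name, value in message.items():
        # decode_header/make_header turn RFC 2047 encoded words into text.
        decoded = str(make_header(decode_header(value)))
        # A header can appear more than once, so keep a list per name.
        result.setdefault(name.lower(), []).append(decoded)
    return result


raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: =?utf-8?q?Caf=C3=A9?=\n"
    "\n"
    "Body text.\n"
)
message = email.message_from_string(raw)
print(headers_to_dict(message)["subject"])
```

Header names are lower-cased here only to make lookups predictable; that is a choice of this sketch.
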
Extracting text
---------------
The `Text` class in the `text` module has static methods for extracting
plain text and HTML from messages::

    import email
    from emaildata.text import Text

    with open('message.eml') as stream:
        message = email.message_from_file(stream)
    text = Text.text(message)  # plain text content
    html = Text.html(message)  # HTML content

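For readers curious about what locating the text and HTML of a message involves, the following standard-library sketch (Python 3) walks the MIME parts and returns the first part of a given content type that is not an attached file. It only illustrates the general idea and is not the implementation of `Text`. The helper name `first_part` and the sample message are hypothetical.

```python
import email


def first_part(message, content_type):
    """Return the decoded text of the first non-attachment part of a type."""
    for part in message.walk():
        if part.get_content_type() != content_type:
            continue
        if part.get_filename():
            continue  # a part with a filename is treated as an attachment
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "utf-8"
        return payload.decode(charset, errors="replace")
    return None


raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="sep"\n'
    "\n"
    "--sep\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "Hello in plain text.\n"
    "--sep\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "\n"
    "<p>Hello in HTML.</p>\n"
    "--sep--\n"
)
message = email.message_from_string(raw)
print(first_part(message, "text/plain"))
print(first_part(message, "text/html"))
```

The sketch falls back to UTF-8 when a part declares no charset; real messages may need a more careful fallback.
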
Extracting attachments
-----------------------
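The `extract` method of the `Attachment` class returns an iterator over the
attachments of a message. Each item is a tuple with the decoded content, the
filename, the MIME type and, when the attachment is itself an email message,
the parsed message (otherwise `None`)::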

    import email
    from emaildata.attachment import Attachment

    with open('message.eml') as stream:
        message = email.message_from_file(stream)
    for content, filename, mimetype, submessage in Attachment.extract(message):
        print(filename)
        # The content is already decoded, so write it in binary mode.
        with open(filename, 'wb') as stream:
            stream.write(content)
        # If submessage is not None then it is an instance of email.message.Message
        if submessage:
            print("The file {0} is a message with attachments.".format(filename))

By default this method iterates only over attachments that have a filename. To retrieve all
attachments, pass `False` as the second parameter (`only_with_filename`)::

    import email
    import mimetypes
    import uuid
    from emaildata.attachment import Attachment

    with open('message.eml') as stream:
        message = email.message_from_file(stream)
    for content, filename, mimetype, submessage in Attachment.extract(message, False):
        if not filename:
            # Make up a unique name, guessing the extension from the MIME type.
            filename = str(uuid.uuid1()) + (mimetypes.guess_extension(mimetype) or '.txt')
        print(filename)
        with open(filename, 'wb') as stream:
            stream.write(content)
        # If submessage is not None then it is an instance of email.message.Message
        if submessage:
            print("The file {0} is a message with attachments.".format(filename))

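The examples above write each attachment under the filename found in the message. That name is chosen by the sender, so it may contain directory components or unusual characters. A minimal sketch of a cleanup step, using only the standard library, could look like this. The helper name `safe_filename` and its default value are hypothetical and not part of emaildata.

```python
import os
import re


def safe_filename(filename, default="attachment.bin"):
    """Reduce an untrusted attachment name to a plain file name."""
    # Treat both slash styles as separators and keep only the last component.
    name = os.path.basename((filename or "").replace("\\", "/"))
    # Replace anything outside a conservative set of characters,
    # then drop leading dots so the result is never '..' or a hidden file.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).lstrip(".")
    return name or default


print(safe_filename("../../etc/passwd"))
print(safe_filename("C:\\Users\\me\\report 1.pdf"))
print(safe_filename(None))
```

Passing the result to `open` instead of the raw filename keeps the written files inside the current directory.
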
Changelog
---------
Version 0.3 (2015-05-03)
~~~~~~~~~~~~~~~~~~~~~~~~
- Implemented a class for extracting attachments from messages.

Version 0.2 (2015-05-03)
~~~~~~~~~~~~~~~~~~~~~~~~
- Implemented a class for extracting plain text and HTML from messages.

Version 0.1 (2015-03-15)
~~~~~~~~~~~~~~~~~~~~~~~~
- Initial version.
- Support for metadata extraction.