Release 0.1.1
* Pin RTFDE to lark 1.1.8
* Exclude test modules from being installed as top level module when building the wheel from source.
* Remove unnecessary shebang from non-standalone code
* Updated all licenses to consistently state LGPLv3
* Updated Python version to v3.8 and removed use of v3.9 byte-manipulation methods
   - In the next major change, once 3.9 is more commonly used, we will move up to 3.9 and restore those methods.

---------
Co-authored-by: Sandro <shfu29r4bu@liamekaens.com>
seamustuohy committed Dec 3, 2023
1 parent 66780b8 commit 5dda668
Showing 21 changed files with 178 additions and 106 deletions.
8 changes: 0 additions & 8 deletions CONTRIBUTING.md
@@ -109,14 +109,6 @@ log.setLevel(logging.DEBUG)
```




### Grammar Debugging

RTFDE



### Lark Debug Logs
If you want to see underlying Lark language parsing toolkit's logging you can activate its logger like this.

7 changes: 3 additions & 4 deletions RTFDE/__init__.py
@@ -1,12 +1,11 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Date Format: YYYY-MM-DD
#
# This file is part of RTFDE, a RTF De-Encapsulator.
# Copyright © 2020 seamus tuohy, <code@seamustuohy.com>
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# under the terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option)
# any later version.
#
@@ -21,8 +20,8 @@
"""

__author__ = 'seamus tuohy'
__date__ = '2023-06-18'
__version__ = '0.1.0'
__date__ = '2023-12-03'
__version__ = '0.1.1'

import logging
from logging import NullHandler
3 changes: 1 addition & 2 deletions RTFDE/deencapsulate.py
@@ -1,11 +1,10 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# This file is part of RTFDE, a RTF De-Encapsulator.
# Copyright © 2020 seamus tuohy, <code@seamustuohy.com>
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# under the terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option)
# any later version.
#
3 changes: 1 addition & 2 deletions RTFDE/exceptions.py
@@ -1,11 +1,10 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# This file is part of RTFDE, a RTF De-Encapsulator.
# Copyright © 2020 seamus tuohy, <code@seamustuohy.com>
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# under the terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option)
# any later version.
#
3 changes: 1 addition & 2 deletions RTFDE/grammar.py
@@ -1,11 +1,10 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# This file is part of RTFDE, a RTF De-Encapsulator.
# Copyright © 2020 seamus tuohy, <code@seamustuohy.com>
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# under the terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option)
# any later version.
#
10 changes: 7 additions & 3 deletions RTFDE/text_extraction.py
@@ -1,11 +1,10 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# This file is part of RTFDE, a RTF De-Encapsulator.
# Copyright © 2022 seamus tuohy, <code@seamustuohy.com>
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# under the terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option)
# any later version.
#
@@ -257,7 +256,12 @@ def unicode_escape_to_chr(item: bytes) -> str:
ValueError: The escaped unicode character is not valid.
"""
try:
nnnn = int(item.removeprefix(b'\\u')) # raises ValueError if not int.
prefix = b'\\u'
if item.startswith(prefix):
nnnn = item[len(prefix):]
else:
nnnn = item
nnnn = int(nnnn) # raises ValueError if not int.
except ValueError as _e:
raise ValueError(f"`{item}` is not a valid escaped unicode character.") from _e
if nnnn < 0: # § -NNNNN is a negative integer expressed in decimal digits
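Review note: the hunk above replaces `bytes.removeprefix`, which only exists from Python 3.9 onward, with an explicit `startswith` check and a slice so the code runs on 3.8. A minimal sketch of that pattern follows. The helper names `strip_prefix` and `parse_unicode_escape_number` are hypothetical and are not part of RTFDE; the real function goes on to convert the number into a character, which is not shown here.

```python
def strip_prefix(item: bytes, prefix: bytes) -> bytes:
    """Return item without a leading prefix (hypothetical helper, not RTFDE API).

    Mirrors bytes.removeprefix (Python 3.9+) using only 3.8-compatible calls.
    """
    if item.startswith(prefix):
        return item[len(prefix):]
    return item


def parse_unicode_escape_number(item: bytes) -> int:
    """Parse the signed decimal number from an escape such as b'\\u8364'."""
    try:
        # int() accepts ASCII digits in a bytes object and raises
        # ValueError for anything that is not a valid integer.
        return int(strip_prefix(item, b'\\u'))
    except ValueError as err:
        raise ValueError(f"{item!r} is not a valid escaped unicode character.") from err
```

When the prefix is absent the input is passed through unchanged, which matches the `else` branch in the hunk.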
3 changes: 1 addition & 2 deletions RTFDE/transformers.py
@@ -1,11 +1,10 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# This file is part of RTFDE, a RTF De-Encapsulator.
# Copyright © 2020 seamus tuohy, <code@seamustuohy.com>
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# under the terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option)
# any later version.
#
3 changes: 1 addition & 2 deletions RTFDE/utils.py
@@ -1,11 +1,10 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# This file is part of package name, a package description short.
# Copyright © 2022 seamus tuohy, <code@seamustuohy.com>
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# under the terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option)
# any later version.
#
61 changes: 41 additions & 20 deletions docs/RTFDE/deencapsulate.html
@@ -26,14 +26,13 @@ <h1 class="title">Module <code>RTFDE.deencapsulate</code></h1>
<summary>
<span>Expand source code</span>
</summary>
<pre><code class="python">#!/usr/bin/env python3
# -*- coding: utf-8 -*-
<pre><code class="python"># -*- coding: utf-8 -*-
#
# This file is part of RTFDE, a RTF De-Encapsulator.
# Copyright © 2020 seamus tuohy, &lt;code@seamustuohy.com&gt;
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# under the terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option)
# any later version.
#
@@ -47,12 +46,13 @@ <h1 class="title">Module <code>RTFDE.deencapsulate</code></h1>
from lark import Lark
from lark.tree import Tree
from lark.lexer import Token
from lark.exceptions import UnexpectedInput

from RTFDE.transformers import RTFCleaner, StripControlWords
from RTFDE.transformers import StripNonVisibleRTFGroups
from RTFDE.transformers import StripUnusedSpecialCharacters
from RTFDE.utils import encode_escaped_control_chars
from RTFDE.utils import log_validators, log_transformations
from RTFDE.utils import log_validators, log_transformations, is_logger_on
from RTFDE.transformers import get_stripped_HTMLRTF_values, DeleteTokensFromTree, strip_binary_objects
from RTFDE.grammar import make_concise_grammar
from RTFDE.text_extraction import TextDecoder
@@ -136,8 +136,12 @@ <h1 class="title">Module <code>RTFDE.deencapsulate</code></h1>
self.found_binary = found_binary
log.info(&#34;Binary data found and extracted from rtf file.&#34;)
escaped_rtf = encode_escaped_control_chars(non_binary_rtf)
log_transformations(escaped_rtf)
self.parse_rtf(escaped_rtf)
if is_logger_on(&#34;RTFDE.transform_logger&#34;) is True:
log_transformations(escaped_rtf)
try:
self.parse_rtf(escaped_rtf)
except UnexpectedInput as _e:
raise MalformedEncapsulatedRtf(f&#34;Malformed encapsulated RTF discovered:&#34;) from _e
Decoder = TextDecoder()
Decoder.update_children(self.full_tree)
self.get_doc_tree()
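Review note: the hunk above wraps the parse call so that a parser-level `UnexpectedInput` surfaces to callers as the package's own `MalformedEncapsulatedRtf`, with the original error kept as the cause. The sketch below shows the same translate-and-chain pattern using stand-in names only. `ParserError`, `MalformedInput`, `low_level_parse` and `parse` are hypothetical and do not reproduce Lark's or RTFDE's actual classes.

```python
class ParserError(Exception):
    """Hypothetical stand-in for an exception raised by a parsing library."""


class MalformedInput(ValueError):
    """Hypothetical stand-in for a package-level 'malformed input' error."""


def low_level_parse(data: bytes) -> bytes:
    """Pretend parser: rejects anything that does not open an RTF group."""
    if not data.startswith(b"{\\rtf1"):
        raise ParserError("unexpected input at position 0")
    return data


def parse(data: bytes) -> bytes:
    """Translate the library error into the package error, keeping the cause."""
    try:
        return low_level_parse(data)
    except ParserError as err:
        # 'from err' stores the original exception on __cause__, so the
        # traceback still shows where parsing actually failed.
        raise MalformedInput("Malformed encapsulated RTF discovered") from err
```

Callers then only need to catch the package-level exception, and the underlying parser error stays available through `__cause__` for debugging.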
@@ -260,7 +264,8 @@ <h1 class="title">Module <code>RTFDE.deencapsulate</code></h1>
# debug=True,
propagate_positions=True)
self.full_tree = self.parser.parse(rtf)
log_transformations(self.full_tree)
if is_logger_on(&#34;RTFDE.transform_logger&#34;) is True:
log_transformations(self.full_tree)


def strip_htmlrtf_tokens(self) -&gt; Tree:
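Review note: several hunks in this file guard `log_transformations` and `log_validators` behind `is_logger_on(...)`, presumably so that large values are not formatted when nobody is listening. The body of `is_logger_on` is not shown in this diff, so the sketch below is an assumption about how such a guard could be built on the standard `logging` module. `logger_enabled`, `log_if_enabled` and the logger name `example.transform_logger` are hypothetical.

```python
import logging
from typing import Callable


def logger_enabled(name: str, level: int = logging.DEBUG) -> bool:
    """Return True if the named logger would process a record at this level.

    Hypothetical stand-in for a check like is_logger_on; the real
    implementation may differ.
    """
    return logging.getLogger(name).isEnabledFor(level)


def log_if_enabled(name: str, make_message: Callable[[], str],
                   level: int = logging.DEBUG) -> bool:
    """Build and emit the message only when the logger is enabled.

    make_message is called lazily, so costly formatting is skipped
    entirely when the logger is off. Returns True if a record was logged.
    """
    if not logger_enabled(name, level):
        return False
    logging.getLogger(name).log(level, make_message())
    return True
```

Passing a callable rather than a ready-made string is what saves the work: an f-string argument would already be evaluated before any check inside the logging function could run.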
@@ -317,7 +322,8 @@ <h1 class="title">Module <code>RTFDE.deencapsulate</code></h1>
operating_tokens.append(token)
else:
operating_tokens += list(token.scan_values(lambda t: t.type == &#39;CONTROLWORD&#39;))
log_validators(f&#34;Header tokens being evaluated: {operating_tokens}&#34;)
if is_logger_on(&#34;RTFDE.validation_logger&#34;) is True:
log_validators(f&#34;Header tokens being evaluated: {operating_tokens}&#34;)

for token in operating_tokens:
cw_found,found_token = self.check_from_token(token=token, cw_found=cw_found)
@@ -386,7 +392,8 @@ <h1 class="title">Module <code>RTFDE.deencapsulate</code></h1>
first_token = doc_tree.children[0].value
if first_token != b&#34;\\rtf1&#34;:
log.debug(&#34;RTF stream does not contain valid valid RTF document heading. The file must start with \&#34;{\\rtf1\&#34;&#34;)
log_validators(f&#34;First child object in document tree is: {first_token!r}&#34;)
if is_logger_on(&#34;RTFDE.validation_logger&#34;) is True:
log_validators(f&#34;First child object in document tree is: {first_token!r}&#34;)
raise MalformedRtf(&#34;RTF stream does not start with {\\rtf1&#34;)

@staticmethod
@@ -540,8 +547,12 @@ <h2 id="raises">Raises</h2>
self.found_binary = found_binary
log.info(&#34;Binary data found and extracted from rtf file.&#34;)
escaped_rtf = encode_escaped_control_chars(non_binary_rtf)
log_transformations(escaped_rtf)
self.parse_rtf(escaped_rtf)
if is_logger_on(&#34;RTFDE.transform_logger&#34;) is True:
log_transformations(escaped_rtf)
try:
self.parse_rtf(escaped_rtf)
except UnexpectedInput as _e:
raise MalformedEncapsulatedRtf(f&#34;Malformed encapsulated RTF discovered:&#34;) from _e
Decoder = TextDecoder()
Decoder.update_children(self.full_tree)
self.get_doc_tree()
@@ -664,7 +675,8 @@ <h2 id="raises">Raises</h2>
# debug=True,
propagate_positions=True)
self.full_tree = self.parser.parse(rtf)
log_transformations(self.full_tree)
if is_logger_on(&#34;RTFDE.transform_logger&#34;) is True:
log_transformations(self.full_tree)


def strip_htmlrtf_tokens(self) -&gt; Tree:
@@ -721,7 +733,8 @@ <h2 id="raises">Raises</h2>
operating_tokens.append(token)
else:
operating_tokens += list(token.scan_values(lambda t: t.type == &#39;CONTROLWORD&#39;))
log_validators(f&#34;Header tokens being evaluated: {operating_tokens}&#34;)
if is_logger_on(&#34;RTFDE.validation_logger&#34;) is True:
log_validators(f&#34;Header tokens being evaluated: {operating_tokens}&#34;)

for token in operating_tokens:
cw_found,found_token = self.check_from_token(token=token, cw_found=cw_found)
@@ -790,7 +803,8 @@ <h2 id="raises">Raises</h2>
first_token = doc_tree.children[0].value
if first_token != b&#34;\\rtf1&#34;:
log.debug(&#34;RTF stream does not contain valid valid RTF document heading. The file must start with \&#34;{\\rtf1\&#34;&#34;)
log_validators(f&#34;First child object in document tree is: {first_token!r}&#34;)
if is_logger_on(&#34;RTFDE.validation_logger&#34;) is True:
log_validators(f&#34;First child object in document tree is: {first_token!r}&#34;)
raise MalformedRtf(&#34;RTF stream does not start with {\\rtf1&#34;)

@staticmethod
@@ -952,7 +966,8 @@ <h2 id="raises">Raises</h2>
first_token = doc_tree.children[0].value
if first_token != b&#34;\\rtf1&#34;:
log.debug(&#34;RTF stream does not contain valid valid RTF document heading. The file must start with \&#34;{\\rtf1\&#34;&#34;)
log_validators(f&#34;First child object in document tree is: {first_token!r}&#34;)
if is_logger_on(&#34;RTFDE.validation_logger&#34;) is True:
log_validators(f&#34;First child object in document tree is: {first_token!r}&#34;)
raise MalformedRtf(&#34;RTF stream does not start with {\\rtf1&#34;)</code></pre>
</details>
</dd>
@@ -981,8 +996,12 @@ <h3>Methods</h3>
self.found_binary = found_binary
log.info(&#34;Binary data found and extracted from rtf file.&#34;)
escaped_rtf = encode_escaped_control_chars(non_binary_rtf)
log_transformations(escaped_rtf)
self.parse_rtf(escaped_rtf)
if is_logger_on(&#34;RTFDE.transform_logger&#34;) is True:
log_transformations(escaped_rtf)
try:
self.parse_rtf(escaped_rtf)
except UnexpectedInput as _e:
raise MalformedEncapsulatedRtf(f&#34;Malformed encapsulated RTF discovered:&#34;) from _e
Decoder = TextDecoder()
Decoder.update_children(self.full_tree)
self.get_doc_tree()
@@ -1148,7 +1167,8 @@ <h2 id="args">Args</h2>
# debug=True,
propagate_positions=True)
self.full_tree = self.parser.parse(rtf)
log_transformations(self.full_tree)</code></pre>
if is_logger_on(&#34;RTFDE.transform_logger&#34;) is True:
log_transformations(self.full_tree)</code></pre>
</details>
</dd>
<dt id="RTFDE.deencapsulate.DeEncapsulator.set_content"><code class="name flex">
@@ -1234,7 +1254,8 @@ <h2 id="raises">Raises</h2>
operating_tokens.append(token)
else:
operating_tokens += list(token.scan_values(lambda t: t.type == &#39;CONTROLWORD&#39;))
log_validators(f&#34;Header tokens being evaluated: {operating_tokens}&#34;)
if is_logger_on(&#34;RTFDE.validation_logger&#34;) is True:
log_validators(f&#34;Header tokens being evaluated: {operating_tokens}&#34;)

for token in operating_tokens:
cw_found,found_token = self.check_from_token(token=token, cw_found=cw_found)
@@ -1363,4 +1384,4 @@ <h4><code><a title="RTFDE.deencapsulate.DeEncapsulator" href="#RTFDE.deencapsula
<p>Generated by <a href="https://pdoc3.github.io/pdoc" title="pdoc: Python API documentation generator"><cite>pdoc</cite> 0.10.0</a>.</p>
</footer>
</body>
</html>
</html>
7 changes: 3 additions & 4 deletions docs/RTFDE/exceptions.html
@@ -26,14 +26,13 @@ <h1 class="title">Module <code>RTFDE.exceptions</code></h1>
<summary>
<span>Expand source code</span>
</summary>
<pre><code class="python">#!/usr/bin/env python3
# -*- coding: utf-8 -*-
<pre><code class="python"># -*- coding: utf-8 -*-
#
# This file is part of RTFDE, a RTF De-Encapsulator.
# Copyright © 2020 seamus tuohy, &lt;code@seamustuohy.com&gt;
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# under the terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option)
# any later version.
#
@@ -200,4 +199,4 @@ <h4><code><a title="RTFDE.exceptions.UnsupportedRTFFormat" href="#RTFDE.exceptio
<p>Generated by <a href="https://pdoc3.github.io/pdoc" title="pdoc: Python API documentation generator"><cite>pdoc</cite> 0.10.0</a>.</p>
</footer>
</body>
</html>
</html>
7 changes: 3 additions & 4 deletions docs/RTFDE/grammar.html
@@ -26,14 +26,13 @@ <h1 class="title">Module <code>RTFDE.grammar</code></h1>
<summary>
<span>Expand source code</span>
</summary>
<pre><code class="python">#!/usr/bin/env python3
# -*- coding: utf-8 -*-
<pre><code class="python"># -*- coding: utf-8 -*-
#
# This file is part of RTFDE, a RTF De-Encapsulator.
# Copyright © 2020 seamus tuohy, &lt;code@seamustuohy.com&gt;
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# under the terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option)
# any later version.
#
@@ -595,4 +594,4 @@ <h1>Index</h1>
<p>Generated by <a href="https://pdoc3.github.io/pdoc" title="pdoc: Python API documentation generator"><cite>pdoc</cite> 0.10.0</a>.</p>
</footer>
</body>
</html>
</html>
7 changes: 3 additions & 4 deletions docs/RTFDE/index.html
@@ -28,15 +28,14 @@ <h1 class="title">Package <code>RTFDE</code></h1>
<summary>
<span>Expand source code</span>
</summary>
<pre><code class="python">#!/usr/bin/env python3
# -*- coding: utf-8 -*-
<pre><code class="python"># -*- coding: utf-8 -*-
# Date Format: YYYY-MM-DD
#
# This file is part of RTFDE, a RTF De-Encapsulator.
# Copyright © 2020 seamus tuohy, &lt;code@seamustuohy.com&gt;
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# under the terms of the GNU Lesser General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option)
# any later version.
#
@@ -124,4 +123,4 @@ <h1>Index</h1>
<p>Generated by <a href="https://pdoc3.github.io/pdoc" title="pdoc: Python API documentation generator"><cite>pdoc</cite> 0.10.0</a>.</p>
</footer>
</body>
</html>
</html>
