BUG/ENH: make attachements compatible with kids, and allow list in RF #2197
base: main
Codecov Report
Attention:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2197      +/-   ##
==========================================
- Coverage   94.45%   92.53%   -1.93%
==========================================
  Files          43       43
  Lines        7650     7930     +280
  Branches     1511     1576      +65
==========================================
+ Hits         7226     7338     +112
- Misses        262      393     +131
- Partials      162      199      +37
@MartinThoma therefore attachments should return a dictionary whose keys are the names in the dictionary. The point of returning the output within a list is to not change the interface. Based on a better understanding of the spec, a list cannot have more than one entry. |
I'm sorry, I don't understand the dilemma. What are the options we have?
I don't understand. Can you give me a pseudo-code example? |
As the keys in the name tree are unique, the attachments can only return one file specification per name. My dilemma is that we will get the following interface:
class AttachmentBytes(bytes):
    ...

class PdfReader:  # the same for PdfWriter
    ...

    @property
    def attachments(self) -> Mapping[str, Union[AttachmentBytes, Dict[str, AttachmentBytes]]]:
        ... |
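To make the trade-off of the proposed return type concrete, here is a minimal sketch of what calling code would need to do with a `Union[AttachmentBytes, Dict[str, AttachmentBytes]]` value. `AttachmentBytes` below is only a stand-in for the class named above, `iter_payloads` is a hypothetical helper, and the meaning of the inner dictionary keys is an assumption, since the thread does not spell it out.

```python
from typing import Dict, Iterator, Mapping, Tuple, Union


class AttachmentBytes(bytes):
    """Stand-in for the class proposed above; the real one may carry metadata."""


# Assumption: the inner dict maps some secondary name to its payload.
AttachmentValue = Union[AttachmentBytes, Dict[str, AttachmentBytes]]


def iter_payloads(
    attachments: Mapping[str, AttachmentValue],
) -> Iterator[Tuple[str, str, bytes]]:
    """Yield (name, sub_name, data); sub_name is "" for a plain entry."""
    for name, value in attachments.items():
        if isinstance(value, dict):
            # Nested case: one entry per inner key.
            for sub_name, data in value.items():
                yield name, sub_name, bytes(data)
        else:
            # Plain case: the value is the payload itself.
            yield name, "", bytes(value)


sample: Dict[str, AttachmentValue] = {
    "report.txt": AttachmentBytes(b"hello"),
    "bundle": {"a.txt": AttachmentBytes(b"A"), "b.txt": AttachmentBytes(b"B")},
}
flat = list(iter_payloads(sample))
```

Every consumer has to branch on the value type, which is the cost of the `Union` in the signature.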
The current signature is def attachments(self) -> Mapping[str, List[bytes]] What is Changing the signature to |
Sorry, I forgot about this 🙈 A few notes for myself: "RF" is short for "Related Files" and "EF" is short for "Embedded File"; see Table 44 – Entries in a file specification dictionary. It is an optional entry with a dictionary:
This PR attempts to solve two issues:
I'd prefer to have a PR + tests which deal with #2087 first before we fix #2090 or do bigger refactorings. First dealing with #2087 would make the current PR smaller, right? |
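As a rough illustration of the RF/EF notes above, the sketch below models a file specification dictionary with plain Python dicts and lists. It assumes the layout I understand from the PDF specification: `/EF` maps a key such as `/F` to an embedded file stream, and `/RF` maps the same key to a related files array of alternating name and stream entries. Plain `bytes` stand in for streams, `related_files` is a hypothetical helper, and none of this is pypdf's actual object model.

```python
from typing import Any, Dict


# Plain-dict stand-in for a file specification dictionary (not pypdf objects).
file_spec: Dict[str, Any] = {
    "/Type": "/Filespec",
    "/F": "report.txt",
    "/EF": {"/F": b"main file bytes"},
    # Related files array: name, stream, name, stream, ...
    "/RF": {
        "/F": ["report.txt", b"main file bytes", "report.rsrc", b"resource bytes"]
    },
}


def related_files(spec: Dict[str, Any], key: str = "/F") -> Dict[str, bytes]:
    """Pair up the alternating name/stream entries of the /RF array for `key`."""
    pairs = spec.get("/RF", {}).get(key, [])
    return dict(zip(pairs[0::2], pairs[1::2]))


related = related_files(file_spec)
```

Since `/RF` is optional, the helper returns an empty dict when the entry is missing.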
It's super hard for me to review this PR. Would you mind if I made a couple of smaller merges with the trivial parts (e.g. the constants, some smaller tests)?
I'd use the GitHub Co-authored-by feature to give you full credit, of course.
    @property
    def attachments_names(self) -> List[str]:
        """
        Returns:
            dictionary of filename -> Union[bytestring or List[ByteString]]
            if the filename exists multiple times a List of the different version will be provided
            List of names
        """
        catalog = cast(DictionaryObject, self.trailer["/Root"])
        # From the catalog get the embedded file names
        try:
            filenames = cast(
                ArrayObject,
                cast(
                    DictionaryObject,
                    cast(DictionaryObject, catalog["/Names"])["/EmbeddedFiles"],
                )["/Names"],
            )
        except KeyError:
            return {}
        attachments: Dict[str, Union[bytes, List[bytes]]] = {}
        # Loop through attachments
        for i in range(len(filenames)):
            f = filenames[i]
            if isinstance(f, str):
                if filename is not None and f != filename:
                    continue
                name = f
                f_dict = filenames[i + 1].get_object()
                f_data = f_dict["/EF"]["/F"].get_data()
                if name in attachments:
                    if not isinstance(attachments[name], list):
                        attachments[name] = [attachments[name]]  # type:ignore
                    attachments[name].append(f_data)  # type:ignore
                else:
                    attachments[name] = f_data
        return attachments
        return self.attachments.keys()
I'd prefer not to have the new attachments_names property. I think users and pypdf itself should call self.attachments.keys() directly.
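A small sketch of what calling `keys()` directly looks like from the user's side. `FakeReader` is a hypothetical stand-in so the snippet runs without a PDF file; it only mimics a reader exposing an `attachments` mapping with the current `Mapping[str, List[bytes]]` shape.

```python
from typing import Dict, List, Mapping


class FakeReader:
    """Hypothetical stand-in for a reader exposing an `attachments` mapping."""

    def __init__(self, files: Dict[str, List[bytes]]) -> None:
        self._files = files

    @property
    def attachments(self) -> Mapping[str, List[bytes]]:
        return self._files


reader = FakeReader({"a.txt": [b"A"], "b.txt": [b"B"]})

# keys() gives a view over the names; wrap it in list() if a list is needed.
names_view = reader.attachments.keys()
names = list(names_view)
```

Note that `keys()` on a mapping returns a view rather than a `list`, so a wrapper annotated as `List[str]` would need the `list(...)` call anyway.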
F = "/F"  # A file specification string of the file as described in Section 3.10.1
UF = "/UF"  # A unicode string of the file as described in Section 3.10.1
EF = "/EF"  # dictionary, containing a subset of the keys F, UF, DOS, Mac, and Unix
RF = "/RF"  # dictionary, containing arrays of /EmbeddedFile
DESC = "/Desc"  # description of the file as de
Those could be in an individual PR - that would be easy to review/quick to merge :-)
@pubpub-zz Should we close this PR? |
closes #2087
closes #2090
also adds compatibility with RF (adding list)
still in progress