Certificate parser update #1981

Open
wants to merge 11 commits into base: master

@patrickdalla (Collaborator) commented Nov 14, 2023

Closes #1978
CertificateParser was refactored to extract certificates as subitems, so it can be used in conjunction with Tika's PKCS7Parser.

Tika's PKCS7Parser ignores all certificate information and extracts only the signed content of the PKCS7 file for parsing. When configured together with it, CertificateParser is therefore responsible for extracting information from the embedded certificates.

To test:

1. User certificates can be exported from Windows as p7b files containing the full certificate chain. For files of this kind, each certificate is extracted as a subitem. PKCS7Parser will still throw an exception, because the file has no signed content; the exception is recorded in the metadata field X-TIKA:EXCEPTION:embedded_exception.
2. Real signed files in PKCS7 format contain both the certificates and the corresponding signed content. For files of this kind, the certificates used for signing are parsed as subitems, and the content is parsed by Tika's PKCS7Parser.
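
To check independently which certificates a p7b test file contains before comparing against the extracted subitems, something like the sketch below may help. It is not the CertificateParser code from this PR and does not use Tika; it relies only on the JDK's `CertificateFactory`, whose `generateCertificates` method accepts a PKCS#7 certificate chain. The class name `P7bSubitemSketch` and the method `listSubjects` are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.security.cert.Certificate;
import java.security.cert.CertificateException;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper for manual verification; not part of IPED or Tika.
final class P7bSubitemSketch {

    // Returns one subject DN per X.509 certificate found in the given bytes.
    // Each entry corresponds to what the PR description calls a "subitem".
    static List<String> listSubjects(byte[] data) {
        try {
            CertificateFactory factory = CertificateFactory.getInstance("X.509");
            // For an X.509 factory, the input may be a PKCS#7 certificate chain
            // (a SignedData object whose significant field is the certificates).
            Collection<? extends Certificate> certs =
                    factory.generateCertificates(new ByteArrayInputStream(data));

            List<String> subjects = new ArrayList<>();
            for (Certificate cert : certs) {
                if (cert instanceof X509Certificate) {
                    X509Certificate x509 = (X509Certificate) cert;
                    subjects.add(x509.getSubjectX500Principal().getName());
                }
            }
            return subjects;
        } catch (CertificateException e) {
            // Assumption of this sketch: unparsable input is reported as an
            // unchecked exception to keep the calling code short.
            throw new IllegalArgumentException("Input is not a parsable certificate bundle", e);
        }
    }
}
```

Calling `P7bSubitemSketch.listSubjects(Files.readAllBytes(path))` on a p7b exported with the full chain should give one entry per certificate in the chain, which can be compared with the number of subitems produced for the same file.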

@lfcnassif lfcnassif mentioned this pull request Nov 16, 2023
@patrickdalla
Collaborator Author

For testing:
ARQUIVOS_PROCESSO_202311161041346480.zip

Successfully merging this pull request may close these issues.

Review CertificateParser to support new tika "x-x509-cert" contentType.