Extracting and showing PDF metadata (title and author) #447

dinhani · 2023-05-05T05:17:42Z

I have some PDF files whose filenames are just MD5 hashes.
For these files, I need to show the title stored in the PDF metadata instead of the filename.
So this PR extracts the author and title from the metadata and shows them when available.
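
To illustrate the "show them when available" behavior, here is a minimal sketch. The names `DocMetadata`, `display_title`, and `author_tag` are hypothetical and chosen for illustration only; they are not taken from the actual change in this PR. It assumes that a missing or blank title should fall back to the filename.

```rust
/// Metadata pulled from a document. Both fields are optional because
/// many PDFs leave them empty. (Hypothetical type, for illustration only.)
#[derive(Debug, Default, Clone, PartialEq)]
pub struct DocMetadata {
    pub title: Option<String>,
    pub author: Option<String>,
}

/// Returns the trimmed value, or `None` when it is missing or blank.
fn non_blank(value: &Option<String>) -> Option<&str> {
    value.as_deref().map(str::trim).filter(|s| !s.is_empty())
}

/// Picks the text to display for a result: the metadata title when one
/// is available, otherwise the filename (e.g. an MD5 hash).
pub fn display_title<'a>(filename: &'a str, meta: &'a DocMetadata) -> &'a str {
    non_blank(&meta.title).unwrap_or(filename)
}

/// Builds an ("Author", value) tag only when the author is available.
pub fn author_tag(meta: &DocMetadata) -> Option<(&'static str, &str)> {
    non_blank(&meta.author).map(|author| ("Author", author))
}
```

With this sketch, a file with no usable metadata keeps its filename as the label, and no empty "Author" tag is produced.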

Personally, I would prefer the title to be displayed alongside the other tags, like Author/Owner, but I don't know if tags are appropriate for that.

If you think this is useful, I will add some tests for PDF parsing because they are missing.

a5huynh · 2023-05-05T16:27:07Z

Hi @dinhani, thanks for opening this pull request! The "Author" tag is definitely appropriate here if you don't mind adding that in. Let us know when you're ready for this to go through review.

dinhani · 2023-05-06T01:32:55Z

@a5huynh It is ready for review.

I decided to show the document title under the filename because I think it is important to show both pieces of information, so I created a subtitle section.

The general idea is to have something reusable for extracting metadata from other file types like .docx, .epub and .mobi, for example.
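
One way the "reusable" part could look is sketched below, as an assumption and not the code in this PR. The trait `MetadataExtractor`, the `PdfExtractor` stub, and the `extractor_for` helper are all hypothetical names. The stub does no real PDF parsing; it only marks where format-specific code would go.

```rust
use std::path::Path;

/// Title/author pair shared by every file type. (Hypothetical type.)
#[derive(Debug, Default, Clone, PartialEq)]
pub struct DocMetadata {
    pub title: Option<String>,
    pub author: Option<String>,
}

/// One implementation per file format (.pdf, .docx, .epub, .mobi, ...).
pub trait MetadataExtractor {
    /// Lowercase file extensions (without the dot) this extractor handles.
    fn extensions(&self) -> &[&str];
    /// Reads title/author from the raw file bytes.
    fn extract(&self, bytes: &[u8]) -> DocMetadata;
}

/// Placeholder only: a real implementation would parse the PDF's
/// metadata here. This stub always returns empty metadata.
pub struct PdfExtractor;

impl MetadataExtractor for PdfExtractor {
    fn extensions(&self) -> &[&str] {
        &["pdf"]
    }

    fn extract(&self, _bytes: &[u8]) -> DocMetadata {
        DocMetadata::default()
    }
}

/// Finds the first registered extractor that handles the file's
/// extension, compared case-insensitively. Returns `None` for files
/// without an extension or with an unsupported one.
pub fn extractor_for<'a>(
    path: &Path,
    extractors: &'a [Box<dyn MetadataExtractor>],
) -> Option<&'a dyn MetadataExtractor> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    for extractor in extractors {
        if extractor.extensions().contains(&ext.as_str()) {
            return Some(extractor.as_ref());
        }
    }
    None
}
```

Supporting another format would then mean adding one more type that implements the trait and registering it in the list, without touching the display code.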

a5huynh · 2023-05-09T06:26:52Z

Thanks @dinhani! I'll take a look tomorrow and merge this in if all looks good 😄. We'll try and get a release out by the end of the week w/ the updates!

a5huynh · 2023-05-09T16:44:16Z

Everything looks good @dinhani, merging this in!

feat: extracting pdf title and author from metadata and showing them in the search (aa2939a)

dinhani marked this pull request as ready for review May 6, 2023 01:25

a5huynh merged commit 867936e into spyglass-search:main May 9, 2023
