Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add pdf component #138

Merged
merged 21 commits into from
May 30, 2024
Merged

feat: add pdf component #138

merged 21 commits into from
May 30, 2024

Conversation

chuang8511
Copy link
Member

@chuang8511 chuang8511 commented May 20, 2024

Because

  • We want users to transform PDF to markdown format file

This commit

  • Transform PDF to markdown in VDP.

Note

  • Python code should have its own test code. So, I mock cmd.
    • However, the current pdfTransformer is PoC for the first phase, so I did not add the test code for python.

Copy link

linear bot commented May 20, 2024

@chuang8511 chuang8511 marked this pull request as ready for review May 21, 2024 13:07
pkg/operator/pdf/v0/config/definition.json Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/config/definition.json Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/config/definition.json Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/config/definition.json Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/config/tasks.json Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/convertPdfToMarkdown.go Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/main.go Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/main.go Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/convertPdfToMarkdown_test.go Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/python/pdfTransformer.py Outdated Show resolved Hide resolved
Copy link
Member Author

@chuang8511 chuang8511 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replied comments and will push the modification soon!

pkg/operator/pdf/v0/main.go Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/main.go Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/python/pdfTransformer.py Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/mockCommand.go Outdated Show resolved Hide resolved
@chuang8511
Copy link
Member Author

image
I was trying to look for where we set this Pdf.
But, it seems it transforms in frontend.

pkg/mock/generator.go Outdated Show resolved Hide resolved
@chuang8511 chuang8511 requested a review from jvallesm May 22, 2024 14:23
@jvallesm
Copy link
Collaborator

image I was trying to look for where we set this Pdf. But, it seems it transforms in frontend.

I see, I think it would be beneficial to have a title field in the base task object, so we only rely on the frontend transformation as a fallback. In cases where we have acronyms like here we can't just use the simplistic transformation (_ -> + Title case). Please report this to frontend / product so we can prioritise this. For now I don't think this should prevent us from shipping the feature.

Copy link
Collaborator

@jvallesm jvallesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my suggestions. There are a few unaddressed comments but they're design / cosmetic suggestions rather than blocking change requests. Let me know what you think about them, if you intend to address any of them, and let's move forward when all the conversations are resolved.

pkg/mock/generator.go Outdated Show resolved Hide resolved
pkg/mock/generator.go Outdated Show resolved Hide resolved
pkg/mock/usage_handler_mock.gen.go Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/main.go Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/struct.go Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/convert_pdf_to_markdown_test.go Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/convertPdfToMarkdown_test.go Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/main.go Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/main.go Outdated Show resolved Hide resolved
pkg/operator/pdf/v0/python/pdfTransformer.py Outdated Show resolved Hide resolved
@chuang8511
Copy link
Member Author

@jvallesm
#138 (comment)

I am not quite familiar with GitHub UXUI.
I have taken this out and fixed all Pdf to PDF already!
Just sync this information here with you.

@chuang8511 chuang8511 requested a review from jvallesm May 23, 2024 13:19
jvallesm
jvallesm previously approved these changes May 23, 2024
Copy link
Collaborator

@jvallesm jvallesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏋️ 💦 Thanks for the thorough revision.

This is good to go 🚀

@chuang8511 chuang8511 marked this pull request as draft May 23, 2024 17:56
@chuang8511
Copy link
Member Author

For the reviewers @jvallesm (( cc'd @donch1989
I tested with Dockerfile, and it seems the code is compiled.
So, it seems I cannot execute the code with the way by the path.

So, I am testing to put all python code in Go file and execute it directly, which I also do not like.
But, it seems to be the easy way to deal with it.

I will push my code on Friday and notify you again to sync with you the result.

@donch1989
Copy link
Member

donch1989 commented May 27, 2024

@chuang8511
You can try to use go embed. This can help you embed non-golang resources in the golang package.

@donch1989
Copy link
Member

Hi @chuang8511
We did some structural refactoring in this PR. You will need to rebase your branch. I guess it won't be too difficult since most of the changes are just moving files.

@chuang8511 chuang8511 marked this pull request as ready for review May 28, 2024 11:29
@donch1989
Copy link
Member

@chuang8511
I'll merge this PR first. We can fix any bugs later if there are any.

@donch1989 donch1989 merged commit 517afcf into main May 30, 2024
8 checks passed
@donch1989 donch1989 deleted the chunhao/ins-3098 branch May 30, 2024 03:04
donch1989 pushed a commit that referenced this pull request Jun 6, 2024
🤖 I have created a release *beep* *boop*
---


##
[0.19.0-beta](v0.18.0-beta...v0.19.0-beta)
(2024-06-05)


### Features

* add pdf component
([#138](#138))
([517afcf](517afcf))
* add task chunk text
([#139](#139))
([7b36553](7b36553))
* **instill:** adopt latest Model endpoints
([#146](#146))
([7f2537b](7f2537b))
* optimise ux for slack component
([#143](#143))
([ed60235](ed60235))
* refactor package structure
([#140](#140))
([4853d4c](4853d4c))
* support markdown to text function in text operator
([#149](#149))
([dcbae37](dcbae37))
* unify component interface
([#144](#144))
([ad35e10](ad35e10))


### Bug Fixes

* bug of failure of document component
([#152](#152))
([aed51f8](aed51f8))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
namwoam pushed a commit to namwoam/component that referenced this pull request Jun 24, 2024
Because

- We want users to transform PDF to markdown format file

This commit

- Transform PDF to markdown in VDP.

Note
- Python code should have its own test code. So, I mock `cmd`.
- However, the current pdfTransformer is PoC for the first phase, so I
did not add the test code for python.
namwoam pushed a commit to namwoam/component that referenced this pull request Jun 24, 2024
🤖 I have created a release *beep* *boop*
---


##
[0.19.0-beta](instill-ai/component@v0.18.0-beta...v0.19.0-beta)
(2024-06-05)


### Features

* add pdf component
([instill-ai#138](instill-ai#138))
([517afcf](instill-ai@517afcf))
* add task chunk text
([instill-ai#139](instill-ai#139))
([7b36553](instill-ai@7b36553))
* **instill:** adopt latest Model endpoints
([instill-ai#146](instill-ai#146))
([7f2537b](instill-ai@7f2537b))
* optimise ux for slack component
([instill-ai#143](instill-ai#143))
([ed60235](instill-ai@ed60235))
* refactor package structure
([instill-ai#140](instill-ai#140))
([4853d4c](instill-ai@4853d4c))
* support markdown to text function in text operator
([instill-ai#149](instill-ai#149))
([dcbae37](instill-ai@dcbae37))
* unify component interface
([instill-ai#144](instill-ai#144))
([ad35e10](instill-ai@ad35e10))


### Bug Fixes

* bug of failure of document component
([instill-ai#152](instill-ai#152))
([aed51f8](instill-ai@aed51f8))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: 👋 Done
Development

Successfully merging this pull request may close these issues.

4 participants