Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for MS Azure Vision OCR #85

Merged
merged 23 commits into from
Jun 12, 2024
Merged

Conversation

ml5ah
Copy link
Contributor

@ml5ah ml5ah commented Jun 11, 2024

This PR implements support for MS Azure Vision OCR. This is part of the roadmap for Tarsier and provides an alternative to the existing Google Cloud Vision OCR option.

It has been reported that MS Azure OCR performs better on webpages and therefore it makes this addition a worthwhile enhancement.

Please share feedback or potential improvements.

Issue: #84

@asim-shrestha
Copy link
Contributor

Very cool. Ideally we'd have some docs or notes for how to get the azure credential

@asim-shrestha
Copy link
Contributor

also, is @anishhiranandani the same account as you @ml5ah or are you collaborating on this PR?

@ml5ah
Copy link
Contributor Author

ml5ah commented Jun 12, 2024

Sure, @asim-shrestha, I updated the readme file with details of how to create the credentials JSON for Azure as well as a link that shows the UI screen where the credentials are.

Yes, anishhiranandani and ml5ah are both me. My bad, I think my Mac was logged on to the old account (anishhiranandani).

@asim-shrestha
Copy link
Contributor

No worries! @ml5ah i'll make an azure account tomorrow and check it out. Thanks for doing this, has been on the back burner for quite some time

@asim-shrestha
Copy link
Contributor

Would you be interested in doing some additional work in this repo?

pyproject.toml Outdated
Comment on lines 15 to 16
google-cloud-vision = "^3.4.5"
azure-ai-vision-imageanalysis = "^1.0.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally we should make these optional dependancies going forward



async def main(credentials_path: str, url: str, verbose: bool) -> None:

async def main(credentials_path: str, ocr_service: str, url: str, verbose: bool) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will break existing versions, lets make ocr_service and optional dependancy:

Suggested change
async def main(credentials_path: str, ocr_service: str, url: str, verbose: bool) -> None:
async def main(credentials_path: str, url: str, verbose: bool, ocr_service: str = "google") -> None:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @awtkns, done.

@ml5ah
Copy link
Contributor Author

ml5ah commented Jun 12, 2024

Would you be interested in doing some additional work in this repo?

Sure, I can try to spend some time to help out based on my bandwidth. Do you have any specific items in mind?



async def main(credentials_path: str, url: str, verbose: bool) -> None:

async def main(credentials_path: str, url: str, verbose: bool, ocr_service: str = "google") -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of having ocr_service be type str we should probably make a type:

OcrProvider = Literal["google", "microsoft"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, did it slightly differently using StrEnum. Defined in tarsier/ocr

Comment on lines 17 to 23
match ocr_service:
case "google":
ocr_service = GoogleVisionOCRService(credentials)

case "microsoft":
ocr_service = MicrosoftAzureOCRService(credentials)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to throw an error here as the default case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@awtkns awtkns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Would you be able to share a couple screen shot examples of it in action?

@asim-shrestha
Copy link
Contributor

@ml5ah I do have some items that are more involved. Here's an open list of them: https://reworkd.notion.site/Tarsier-optimizations-37ad18b8301a4ac085c64adf7533dfbb?pvs=25

How about two phase tagging in tag optimizations?

@ml5ah
Copy link
Contributor Author

ml5ah commented Jun 12, 2024

Here's 2 screenshots of it in action:

screenshot
Screenshot 2024-06-12 at 15 27 22

@ml5ah
Copy link
Contributor Author

ml5ah commented Jun 12, 2024

Looking good! Would you be able to share a couple screen shot examples of it in action?

Sure, @awtkns , shared. I'm having some issues with the poetry.lock file -- not sure why. Are you able to take a quick look? Thanks!

@awtkns
Copy link
Member

awtkns commented Jun 12, 2024

Yup, i can fix the poetry lock issues~

@awtkns awtkns merged commit 03eb16d into reworkd:main Jun 12, 2024
@awtkns
Copy link
Member

awtkns commented Jun 12, 2024

Thanks for this @ml5ah! Tested, looks good. I removed the strenum dependancy in favor of the built-in Literal type. Also added an e2e test.

@ml5ah
Copy link
Contributor Author

ml5ah commented Jun 13, 2024

Thanks for this @ml5ah! Tested, looks good. I removed the strenum dependancy in favor of the built-in Literal type. Also added an e2e test.

Great! Thanks @awtkns

@ml5ah
Copy link
Contributor Author

ml5ah commented Jun 13, 2024

@ml5ah I do have some items that are more involved. Here's an open list of them: https://reworkd.notion.site/Tarsier-optimizations-37ad18b8301a4ac085c64adf7533dfbb?pvs=25

How about two phase tagging in tag optimizations?

Sure, @asim-shrestha. Sounds like a very interesting problem!

@asim-shrestha
Copy link
Contributor

Cool @ml5ah! Can you make an issue and we can collaborate / answer questions there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enhancement] Add support for MS Azure Vision OCR 👀 Integrate Microsoft Doc Intelligence for OCR
4 participants