
🚀 Remove Annotations and Tag All text elements (optionally) #8

Merged
merged 6 commits into main
Nov 15, 2023


awtkns (Contributor) commented Nov 14, 2023

  • Tarsier now tags elements using spans with unique IDs, which makes them easy to remove
  • Removes the annotations once Tarsier is done with them
  • Adds an option to tag all text elements

Bonus: since we now annotate using spans, we can add CSS to the annotations so that the vision model can see them more easily.
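A minimal Python sketch of why unique span IDs make the annotations easy to remove. It operates on an HTML string rather than the live DOM (Tarsier's real tagging runs as JavaScript in the page), and the `__tarsier_tag_` ID prefix, the inline CSS, and the helpers `make_tag_span` and `remove_tag_spans` are all hypothetical.

```python
import re

# Hypothetical ID prefix: the PR only says each tag span gets a unique ID.
TAG_ID_PREFIX = "__tarsier_tag_"

# Hypothetical high-contrast styling so a vision model can spot the tags.
TAG_STYLE = "background:yellow;color:black;border:1px solid red;font-weight:bold;"


def make_tag_span(tag_id: int) -> str:
    """Build the annotation span for one tagged element, e.g. '[3]'."""
    return f'<span id="{TAG_ID_PREFIX}{tag_id}" style="{TAG_STYLE}">[{tag_id}]</span>'


# Matches only spans whose id carries the prefix, so page content is untouched.
_TAG_SPAN_RE = re.compile(
    r'<span id="' + re.escape(TAG_ID_PREFIX) + r'\d+"[^>]*>\[\d+\]</span>'
)


def remove_tag_spans(html: str) -> str:
    """Strip every annotation span previously inserted by make_tag_span."""
    return _TAG_SPAN_RE.sub("", html)


original = '<a href="/x">Home</a><p>Some text</p>'
# Tag the link (id 0) and the paragraph text (id 1).
tagged = original.replace("<a", make_tag_span(0) + "<a", 1)
tagged = tagged.replace("<p>", "<p>" + make_tag_span(1), 1)
print(tagged)
print(remove_tag_spans(tagged) == original)  # True
```

Because removal keys on the tag IDs alone, the page's own markup survives; a regex is adequate here only because the sketch controls the exact markup it generates.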

fixes #5 #7

tarsier/core.py Outdated
@@ -16,14 +16,15 @@ def __init__(self, ocr_service: OCRService):
         with open(self._JS_TAG_UTILS, "r") as f:
             self._js_utils = f.read()
 
-    async def page_to_image(self, driver: AnyDriver) -> Tuple[bytes, Dict[int, str]]:
+    async def page_to_image(self, driver: AnyDriver, tagUninteractableText: bool = False) -> Tuple[bytes, Dict[int, str]]:
Contributor

Suggested change
-    async def page_to_image(self, driver: AnyDriver, tagUninteractableText: bool = False) -> Tuple[bytes, Dict[int, str]]:
+    async def page_to_image(self, driver: AnyDriver, tag_text_elements: bool = False) -> Tuple[bytes, Dict[int, str]]:
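The suggested name follows PEP 8's snake_case convention for parameters. Whatever it is called, the Python bool still has to reach the page's JavaScript as `true` or `false`. A hedged sketch, assuming a hypothetical in-page function `tagPage` and a hypothetical helper `build_tag_script`; `json.dumps` performs the conversion.

```python
import json


def build_tag_script(tag_text_elements: bool = False) -> str:
    """Build the JS call that tags the page; `tagPage` is a hypothetical name."""
    # json.dumps maps Python True/False to the JS literals true/false.
    return f"tagPage({json.dumps(tag_text_elements)});"


print(build_tag_script())                        # tagPage(false);
print(build_tag_script(tag_text_elements=True))  # tagPage(true);
```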

tarsier/core.py Outdated
        return {int(key): value for key, value in tag_to_xpath.items()}

    async def _remove_tags(self, adapter: BrowserAdapter) -> None:
        # await adapter.run_js(self._js_utils)
Contributor

delete?

Collaborator

Why? We call this after tagging the page

Collaborator

Oh, do you mean the comment? Yes, I think we could delete it. Since it's private and only ever called after running _tag_page, the utils should already be loaded.

Contributor

Yeah just the comment
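The reasoning in this thread (`_remove_tags` is private and only runs after `_tag_page`, which has already injected the JS utils) can be illustrated with a toy model. This is a sketch, not Tarsier's code: `FakeAdapter`, `ToyTagger`, the utils string, and the `tagPage()` call are hypothetical; only `run_js`, `_tag_page`, `_remove_tags`, and `removeTags()` come from the diff and discussion.

```python
import asyncio
from typing import List


class FakeAdapter:
    """Stand-in for a BrowserAdapter that records every script it runs."""

    def __init__(self) -> None:
        self.scripts: List[str] = []
        self.utils_loaded = False

    async def run_js(self, script: str) -> None:
        self.scripts.append(script)
        if script.startswith("/* utils */"):
            self.utils_loaded = True


class ToyTagger:
    # Hypothetical utils bundle; the real one is read from a JS file.
    _js_utils = "/* utils */ window.removeTags = () => {};"

    async def _tag_page(self, adapter: FakeAdapter) -> None:
        # Inject the utils once; later private helpers rely on them.
        await adapter.run_js(self._js_utils)
        await adapter.run_js("tagPage();")  # hypothetical tagging entry point

    async def _remove_tags(self, adapter: FakeAdapter) -> None:
        # No re-injection needed: this helper only ever runs after _tag_page.
        if not adapter.utils_loaded:
            raise RuntimeError("_remove_tags called before _tag_page")
        await adapter.run_js("removeTags();")

    async def page_to_image(self, adapter: FakeAdapter) -> None:
        await self._tag_page(adapter)
        # ... screenshot and OCR would happen here ...
        await self._remove_tags(adapter)


adapter = FakeAdapter()
asyncio.run(ToyTagger().page_to_image(adapter))
print(adapter.scripts)  # utils injected once, then tagPage(), then removeTags()
```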

tarsier/core.py Outdated
Comment on lines 66 to 69
        script = "removeTags();"
        if isinstance(adapter, SeleniumAdapter):
            script = f"return window.{script}"

Contributor

Abstractions shouldn't leak like this. We could add a call_method function or something similar to the driver, or just pass in the JS code to run directly.

Collaborator

Hm yeah, true. Thoughts @awtkns?

Contributor Author

done
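One way to stop the abstraction leaking, along the lines suggested above, is to let each adapter decide how a script's value is returned, so core code never checks `isinstance(adapter, SeleniumAdapter)`. A minimal sketch: the class names `BrowserAdapter` and `SeleniumAdapter` mirror the diff, but their bodies, the Playwright-style adapter, the fake drivers, and the `call_function` helper are hypothetical rather than the merged implementation. It relies on two real behaviours: Selenium's `execute_script` runs the script as a function body and returns a value only when the script uses `return`, while Playwright's `page.evaluate` returns the result of the evaluated expression.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any


class BrowserAdapter(ABC):
    @abstractmethod
    async def run_js(self, js: str) -> Any:
        """Run a JS expression in the page and return its value."""

    async def call_function(self, name: str) -> Any:
        # Hypothetical helper: core code names the function, the adapter runs it.
        return await self.run_js(f"window.{name}()")


class FakeSeleniumDriver:
    def execute_script(self, script: str) -> str:
        return f"selenium ran: {script}"


class FakePlaywrightPage:
    async def evaluate(self, expression: str) -> str:
        return f"playwright ran: {expression}"


class SeleniumAdapter(BrowserAdapter):
    def __init__(self, driver: FakeSeleniumDriver) -> None:
        self.driver = driver

    async def run_js(self, js: str) -> Any:
        # Selenium needs an explicit `return` to hand a value back.
        return self.driver.execute_script(f"return {js};")


class PlaywrightAdapter(BrowserAdapter):
    def __init__(self, page: FakePlaywrightPage) -> None:
        self.page = page

    async def run_js(self, js: str) -> Any:
        # page.evaluate already returns the expression's value.
        return await self.page.evaluate(js)


async def remove_tags(adapter: BrowserAdapter) -> Any:
    # No isinstance checks: the same call works for every adapter.
    return await adapter.call_function("removeTags")


print(asyncio.run(remove_tags(SeleniumAdapter(FakeSeleniumDriver()))))
print(asyncio.run(remove_tags(PlaywrightAdapter(FakePlaywrightPage()))))
```

The Selenium-specific `return` prefix now lives inside `SeleniumAdapter.run_js`, which is where the review suggested the driver-specific detail belongs.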

asim-shrestha (Contributor) commented

Run black ⬛️

awtkns (Contributor Author) commented Nov 14, 2023

LGTM!

pyproject.toml Outdated
awtkns merged commit bcb3be0 into main on Nov 15, 2023
4 checks passed
Development

Successfully merging this pull request may close these issues.

🐛 Annotations not removed when filling out inputs
3 participants