wrong coordinate #7

Closed
daaniyaan opened this issue Nov 28, 2023 · 11 comments

daaniyaan commented Nov 28, 2023

I asked it to play Spotify, and it guessed that the play button was at x 78%, y 46%, which is wrong.

screenshot_with_grid

Maybe we could add more gridlines to get a more precise guess?
Something like this:

CleanShot 2023-11-28 at 14 58 46
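
To see how far off a percentage guess lands in practice, here is a minimal sketch that converts percentage coordinates into pixels. The function name `percent_to_pixels` and the 1920x1080 screen size are assumptions for illustration, not taken from the project.

```python
def percent_to_pixels(x_percent, y_percent, screen_width, screen_height):
    """Convert percentage coordinates (0-100) into integer pixel coordinates."""
    if not (0 <= x_percent <= 100 and 0 <= y_percent <= 100):
        raise ValueError("percentages must be between 0 and 100")
    x = round(screen_width * x_percent / 100)
    y = round(screen_height * y_percent / 100)
    # Clamp so that 100% maps to the last valid pixel, not one past the edge.
    return min(x, screen_width - 1), min(y, screen_height - 1)


# The guess reported above, on an assumed 1920x1080 screen.
click_x, click_y = percent_to_pixels(78, 46, 1920, 1080)
print(click_x, click_y)
```

On a screen of that size, each percentage point covers roughly 19 pixels horizontally, so a guess that is off by only a few percent can already miss a small button.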

daaniyaan commented Nov 28, 2023

I tried setting grid_interval to 100 instead of 500, still without success.
This time it guessed 50 93, which is again incorrect.

screenshot_with_grid
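
For a rough sense of what changing `grid_interval` does, here is a small sketch that only computes where the gridlines would fall. It assumes `grid_interval` is a spacing in pixels and that lines start one interval in from the top-left corner. The helper `grid_lines` is hypothetical and is not the project's drawing code.

```python
def grid_lines(width, height, grid_interval):
    """Return the x positions of vertical lines and y positions of horizontal lines."""
    if grid_interval <= 0:
        raise ValueError("grid_interval must be positive")
    xs = list(range(grid_interval, width, grid_interval))
    ys = list(range(grid_interval, height, grid_interval))
    return xs, ys


# Compare the two settings mentioned above on an assumed 1920x1080 screenshot.
for interval in (500, 100):
    xs, ys = grid_lines(1920, 1080, interval)
    print(interval, len(xs), len(ys), len(xs) * len(ys))
```

Under these assumptions, an interval of 500 gives 3 vertical and 2 horizontal lines, while 100 gives 19 and 10. A denser grid offers more reference points, but it also covers more of the screenshot with lines and labels.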

Mekcyed commented Nov 30, 2023

@rohanarun Can you please stop spamming your message under EVERY issue? Thanks.

Mekcyed commented Nov 30, 2023

@daaniyaan I tested another approach today. I think it would be best to use a multi-step approach:

  1. Use a wider grid to let GPT Vision define areas of interest and return coarse coordinates.
  2. Use these coordinates to crop the screenshot and let GPT Vision make a better guess at finding the needed buttons, etc.

With an image as wide as a fullscreen screenshot, it might be necessary to repeat this approach more than once.
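
The two steps above can be sketched as plain coordinate arithmetic: pick a crop box around the coarse guess, then map a guess made inside the crop back to full-screen pixels. This is only an illustration. The function names, the 600x400 crop size, and the 1920x1080 screen are all assumptions, and the actual model calls and image cropping are left out.

```python
def crop_box_around(cx, cy, crop_w, crop_h, img_w, img_h):
    """Return (left, top, right, bottom) for a crop centred on (cx, cy), kept inside the image."""
    crop_w = min(crop_w, img_w)
    crop_h = min(crop_h, img_h)
    # Shift the box back inside the image if the centre is near an edge.
    left = min(max(cx - crop_w // 2, 0), img_w - crop_w)
    top = min(max(cy - crop_h // 2, 0), img_h - crop_h)
    return left, top, left + crop_w, top + crop_h


def crop_percent_to_screen(x_percent, y_percent, box):
    """Map a percentage guess made on the cropped image back to full-screen pixels."""
    left, top, right, bottom = box
    x = left + (right - left) * x_percent / 100
    y = top + (bottom - top) * y_percent / 100
    return round(x), round(y)


# Step 1: a coarse guess at pixel (1498, 497) defines the area of interest.
box = crop_box_around(1498, 497, 600, 400, 1920, 1080)
# Step 2: a refined guess of 50%, 50% inside the crop maps back to the screen.
print(box, crop_percent_to_screen(50, 50, box))
```

Because the crop is smaller than the screen, each percentage point inside it covers fewer pixels, which is why a second guess on the crop can be more precise than the first.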

daaniyaan commented Nov 30, 2023

This could be a good approach. How many times did you test it? Did you notice any improvement?

Mekcyed commented Nov 30, 2023

I tested it quite a lot today and noticed that it handles wider grid sizes better.

This was the first image:
tmp_50499e74-bebb-4e81-a653-cde599e3cdc8.png

And this is the image after cropping to the area of interest:

tmp_512325ca-89b0-4588-b060-908a3b891d66.png

Mekcyed commented Nov 30, 2023

Dude, I am just a random guy who finds the idea of using GPT Vision interesting. I haven't even contributed a single thing to this repo.

Chill out.

OthersideAI deleted a comment from rohanarun on Dec 6, 2023

joshbickett (Contributor) commented

wide Image as the fullscreen Images it might be necessary to do this approach not only once?

@Mekcyed This is indeed an interesting approach. I recommend checking out this PR, which is working on something similar: #57

If you want to work on this initiative, I'd love to see a PR. Essentially, you could do a "binary search" on the screen, cutting it into ever smaller pieces to find the right coordinates to click!
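
A rough sketch of that "binary search" idea, under stated assumptions: the region is split into four quadrants each round, and a callback stands in for the vision model by naming the quadrant that contains the target. `narrow_region` and `make_oracle` are hypothetical names. This is not the code from #57, and no model is called here.

```python
def narrow_region(box, choose_quadrant, min_size=50):
    """Repeatedly keep the quadrant picked by choose_quadrant until the box is small."""
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    left, top, right, bottom = box
    while (right - left) > min_size or (bottom - top) > min_size:
        mid_x = (left + right) // 2
        mid_y = (top + bottom) // 2
        quadrant = choose_quadrant((left, top, right, bottom))
        if quadrant in ("top-left", "bottom-left"):
            right = mid_x
        else:
            left = mid_x
        if quadrant in ("top-left", "top-right"):
            bottom = mid_y
        else:
            top = mid_y
    # Click the centre of the final small box.
    return (left + right) // 2, (top + bottom) // 2


def make_oracle(target_x, target_y):
    """Stand-in for the vision model: always names the quadrant holding the target."""
    def choose(box):
        left, top, right, bottom = box
        mid_x = (left + right) // 2
        mid_y = (top + bottom) // 2
        vertical = "top" if target_y < mid_y else "bottom"
        horizontal = "left" if target_x < mid_x else "right"
        return f"{vertical}-{horizontal}"
    return choose


x, y = narrow_region((0, 0, 1920, 1080), make_oracle(1400, 300))
print(x, y)
```

With a perfect oracle, the result ends up within `min_size` pixels of the target. A real model can pick the wrong quadrant, and a button that straddles a split line is ambiguous, so a practical version would probably need overlapping crops or a way to back out of a bad choice.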

joshbickett (Contributor) commented

Sorry about @rohanarun. I've blocked him from this repo now.

joshbickett (Contributor) commented

@Mekcyed @daaniyaan We created a channel for Self-Operating Computer on our Discord here, if you want to join and discuss more. Great seeing all these ideas!

joshbickett (Contributor) commented

Let me know if you guys are done with this discussion here and want to move to Discord, and I'll close this ticket.

joshbickett (Contributor) commented

I've not heard more on this discussion, so I'll close this ticket for now. Feel free to open another issue if you have one, or join our Discord to continue the discussion.
