idea #66

Closed
alafortu opened this issue Dec 2, 2023 · 5 comments

@alafortu

alafortu commented Dec 2, 2023

I played with GPT-4V on other projects, and it definitely has a hard time figuring out coordinates. I used another model, trained on image identification, to find the coordinates of the bounding box drawn around each detected object, and then passed those to GPT-4 to perform an action. For your use case, I just tested this model: https://huggingface.co/foduucom/web-form-ui-field-detection. It is far from perfect, but maybe an idea to build on. If your auto computer can detect the input fields in an image and get their proper coordinates, it could help, or at least add a level of redundancy to improve accuracy when clicking and typing in the right places.
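To make the idea concrete, here is a minimal sketch of the step between the detector and the language model: turning bounding boxes into click targets. It assumes the detector returns boxes as (x_min, y_min, x_max, y_max) in screenshot pixels along with a label and a confidence score. The `Detection` class, the function names, and the 0.5 threshold are all hypothetical and do not come from the linked model or from this project.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected UI field. This layout is an assumption for illustration."""
    label: str
    confidence: float
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in image pixels


def click_point(det, image_size, screen_size):
    """Return the center of the box, rescaled from image pixels to screen pixels."""
    x_min, y_min, x_max, y_max = det.box
    scale_x = screen_size[0] / image_size[0]
    scale_y = screen_size[1] / image_size[1]
    center_x = (x_min + x_max) / 2 * scale_x
    center_y = (y_min + y_max) / 2 * scale_y
    return round(center_x), round(center_y)


def clickable_targets(detections, image_size, screen_size, min_confidence=0.5):
    """Drop low-confidence boxes and list the remaining fields with click coordinates."""
    targets = []
    for det in detections:
        if det.confidence < min_confidence:
            continue  # skip detections the model is unsure about
        x, y = click_point(det, image_size, screen_size)
        targets.append({"label": det.label, "x": x, "y": y})
    return targets
```

The resulting list could be put in the prompt so the language model only has to choose a field by label instead of guessing pixel coordinates itself.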

@Bunger-Beesechurger

Bunger-Beesechurger commented Dec 2, 2023

@rohanarun
I'm not a contributor to this github, just part of the audience usually, but this seems earlier than your video. Early August is when this article came out, so it's been in the works even earlier than that. Stop spamming every issue. You said you've been working on your thing for over a year, but how much of the info came out before your video? I don't know whether it's plagiarizing or not, and if it is, I'm sorry. However, I can still be annoyed that on what should be a cool new project for tech advancement, we have to figure out if something is stealing or not.

[Image: Screenshot (813)]

Article says "HyperWriteAI" and from this github's own main page: "Ongoing Development
At HyperwriteAI, we are developing Agent-1-Vision a multimodal model with more accurate click location predictions" so it is referencing this project.

@Kreijstal

I mean, you are saying you have a custom model, but all I see is proprietary and business products. Your custom model is handwritten for those cases, but this is GPT-4V, so it's not a rip-off; they just had the idea (wouldn't it be cool if GPT-4 could control computers) and open sourced it first 🤷. It can't be a rip-off, because you started without GPT-4V and trained a proprietary custom model; these guys just did prompt engineering and got it to work with GPT-4V, without taking any custom models.

If these guys get more fame, it's because they open sourced it first, and then it's first come, first served. I think it's fair. imho.

Also, your insecurity is showing. If your product were really good, there would be no need to spam it on every issue. Just give us something better and people will naturally flock to it.

@James4Ever0

James4Ever0 commented Dec 3, 2023

Posting these over and over will not help. AGI is for everyone, truly democratic.
For a long time, not a single company had raised its wand toward the field of autonomous computers, until now. I have been waiting for this very moment for so long. It must be open source, and it will change human history for good.

@James4Ever0

James4Ever0 commented Dec 3, 2023

For inspiration, please check #37 #32

@michaelhhogue
Collaborator

@alafortu Thanks for the suggestion. Low accuracy with GPT-4V is a known issue at the moment, and support for other models is planned.

@OthersideAI OthersideAI deleted a comment from rohanarun Dec 4, 2023
@OthersideAI OthersideAI deleted a comment from rohanarun Dec 4, 2023