How do people usually label their data? With a training set of 5000 images, how can I label them quickly? #163
Comments
Sigh, I've been labeling until I'm sick of it too. This step feels like the least scientific part of the whole process.
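Work out which library generated the captchas, write a sample program with that same library to generate captcha images, and name each image following the pattern "captcha text_serial number".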
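A minimal sketch of that naming scheme in Python, assuming the captcha text contains only letters and digits. The names `generate_labeled_set`, `label_from_filename`, and `placeholder_render` are hypothetical, and `placeholder_render` only stands in for whatever rendering call the real library provides (it does not draw an image).

```python
import random
import string
from pathlib import Path
from typing import Callable, List, Optional


def placeholder_render(text: str) -> bytes:
    """Hypothetical stand-in for the real captcha library's render call.

    It returns the text as raw bytes, NOT a real image. Replace it with
    the rendering function of the library the target site uses.
    """
    return text.encode("ascii")


def generate_labeled_set(
    out_dir: str,
    count: int,
    render: Callable[[str], bytes] = placeholder_render,
    length: int = 4,
    seed: Optional[int] = None,
) -> List[Path]:
    """Write `count` files named <captcha text>_<serial number>.png."""
    rng = random.Random(seed)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Assumption: the alphabet has no underscore, so "_" safely
    # separates the label from the serial number.
    alphabet = string.ascii_uppercase + string.digits
    paths = []
    for serial in range(count):
        text = "".join(rng.choices(alphabet, k=length))
        path = out / f"{text}_{serial:05d}.png"
        path.write_bytes(render(text))
        paths.append(path)
    return paths


def label_from_filename(path) -> str:
    """Recover the label: everything before the last underscore."""
    return Path(path).stem.rsplit("_", 1)[0]
```

Because the label is encoded in the file name, the training code can read it back with `label_from_filename` and no separate annotation file is needed.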
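If you need captchas from the target site itself rather than ones generated by a program, I recommend a Python crawler script plus a paid captcha-solving service (usually a few fen per captcha). Whenever a submitted answer passes verification on the target site, save that captcha together with its label. That gives you a label set that is guaranteed correct. Manual labeling is just too hard on the eyes...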
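The loop described above could be sketched as follows. This is only an outline under stated assumptions: `fetch_captcha`, `solve`, and `verify` are hypothetical callables that you would implement against the target site and whichever solving service you pay for; no real service API is shown here.

```python
from pathlib import Path
from typing import Any, Callable, Optional, Tuple


def collect_verified_samples(
    fetch_captcha: Callable[[], Tuple[Any, bytes]],
    solve: Callable[[bytes], Optional[str]],
    verify: Callable[[Any, str], bool],
    out_dir: str,
    target: int,
    max_attempts: int,
) -> int:
    """Save only the captchas whose guessed text the site accepts.

    fetch_captcha() -> (session, image_bytes)  hypothetical crawler step
    solve(image_bytes) -> guessed text or None hypothetical paid service
    verify(session, guess) -> bool             hypothetical site check
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = 0
    for _ in range(max_attempts):
        if saved >= target:
            break
        session, image = fetch_captcha()
        guess = solve(image)
        if not guess:
            continue  # the service gave no answer; skip this captcha
        if verify(session, guess):
            # The site accepted the answer, so the label is trusted.
            # Assumption: the guess is safe to use in a file name.
            (out / f"{guess}_{saved:05d}.png").write_bytes(image)
            saved += 1
    return saved
```

Wrong guesses are simply discarded, so the cost per saved sample depends on the solving service's accuracy. `max_attempts` caps the spend if the accuracy turns out to be low.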
My project happened to have 5000 samples as well, and this is how I labeled them:
It was all done in one morning.
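My approach is to check whether the target site's software can be found online. If it can, pull out the part of the program that generates the captchas, generate random strings, render them into images with the original generation code, and save them.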
I scraped 5000 captcha images from my project's pages.
Before training, I presumably need to label these 5000 images. How do you all label them quickly?