Question about fine-tuning #4

Closed

YooSungHyun opened this issue Apr 21, 2022 · 1 comment

YooSungHyun commented Apr 21, 2022 (edited)

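The Hugging Face model has a vocab_size of 32, so it looks like it uses the vocab list provided by fairseq plus 4 special tokens.

If so, since KsponSpeech is transcribed entirely in Hangul, did you convert the data for fine-tuning using a Hangul jamo-to-alphabet transliteration scheme?

Also, Konglish and numbers appear as dual transcriptions such as (1/하나). Did you preprocess all of these and pick one of the two forms?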
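
To make the dual-transcription question concrete, here is a minimal sketch of picking one of the two forms. It assumes the annotation has the shape `(written)/(spoken)`, e.g. `(1)/(하나)`, which may differ from the shorthand used above and from the actual corpus files. The function name `pick_transcription` and the `use_spoken` flag are hypothetical, and this is not the preprocessing code used for the model.

```python
import re

# Assumed annotation shape: "(written form)/(spoken form)", e.g. "(1)/(하나)".
# This pattern is an assumption, not taken from the corpus specification.
DUAL = re.compile(r"\(([^()]*)\)/\(([^()]*)\)")


def pick_transcription(text: str, use_spoken: bool = True) -> str:
    """Replace each dual transcription with one of its two forms."""
    group = 2 if use_spoken else 1
    return DUAL.sub(lambda m: m.group(group), text)


if __name__ == "__main__":
    line = "사과 (1)/(하나) 주세요"
    print(pick_transcription(line, use_spoken=True))
    print(pick_transcription(line, use_spoken=False))
```

Choosing the spoken form keeps the labels close to what is actually pronounced, which is usually what a character-level acoustic model needs. The written form keeps digits and Latin letters in the vocabulary.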

YooSungHyun closed this as completed

YooSungHyun reopened this

YooSungHyun changed the title ~~Question about the vocab file~~ Question about fine-tuning

Author

YooSungHyun commented Jun 29, 2022

Answering my own question:

The vocab is generated automatically when you use kospeech.
Yes, the data was preprocessed before use (based on kospeech).
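
For readers wondering what "generated automatically" might look like, the following is a rough sketch of building a character-level vocab from preprocessed transcripts. It is not kospeech's actual implementation. The special-token names, their order, and the function `build_char_vocab` are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical special tokens; the real names and order depend on the toolkit.
SPECIAL_TOKENS = ["<pad>", "<s>", "</s>", "<unk>"]


def build_char_vocab(transcripts):
    """Map every character seen in the transcripts to an integer id.

    Characters are ordered by descending frequency, with ties broken by
    code point so the result is deterministic. Inner spaces count as
    characters; leading and trailing whitespace is stripped.
    """
    counts = Counter(ch for line in transcripts for ch in line.strip())
    chars = sorted(counts, key=lambda c: (-counts[c], c))
    return {tok: i for i, tok in enumerate(SPECIAL_TOKENS + chars)}


if __name__ == "__main__":
    vocab = build_char_vocab(["가나", "가다"])
    for token, idx in vocab.items():
        print(idx, repr(token))
```

A vocab built this way from Hangul syllables would typically be much larger than 32 entries, so the size of the result depends heavily on which unit (syllable, jamo, or Latin letter) the preprocessing step emits.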

YooSungHyun closed this as completed
