
Add support for Vietnamese sign language #73

Closed


nqkhanh2002

Description:

This PR adds support for a Vietnamese Sign Language dataset. Download the required video files manually from this link and place them in the sign_language_datasets/datasets/vn_sign/manual/vn_sign directory.

@AmitMY
Contributor

AmitMY commented Jun 22, 2024

Hi! This is not really a dataset; it only contains the alphabet.
Wouldn't you want to include a full dataset of Vietnamese signs?

@nqkhanh2002
Author

I will review the format of the datasets already available in the repo and recreate the dataset.

@nqkhanh2002
Author

Hi @AmitMY ,
I have collected about 4000 sign language videos from https://qipedc.moet.gov.vn/dictionary. I have looked at previous datasets like autsl; the next thing I need to create is the OpenPose and Holistic poses, so that translation can support Vietnamese, right? Thank you for your quick support!

@AmitMY
Contributor

AmitMY commented Jun 27, 2024

If your collection is automated, you can update the data loader you wrote to download these videos.
If you downloaded them some other way, you can use video_to_pose from https://github.com/sign-language-processing/pose to create MediaPipe poses, and then use them directly in the spoken-to-signed-translation library.
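For a few thousand videos, the extraction step can be scripted. The sketch below assumes the `video_to_pose` CLI from the pose repository accepts `--format mediapipe -i <video> -o <pose>` as shown in that project's README (verify against your installed version); the helper names and the set of video extensions are hypothetical. It defaults to a dry run that only prints the commands, and it skips videos whose `.pose` file already exists so an interrupted run can be resumed.

```python
import subprocess
from pathlib import Path

# Hypothetical: extensions treated as videos; adjust to what you downloaded.
VIDEO_SUFFIXES = {".mp4", ".webm", ".mov"}


def build_pose_commands(video_dir, pose_dir):
    """Return one video_to_pose command per video that has no .pose file yet."""
    video_dir, pose_dir = Path(video_dir), Path(pose_dir)
    commands = []
    for video in sorted(video_dir.iterdir()):
        if video.suffix.lower() not in VIDEO_SUFFIXES:
            continue  # ignore non-video files (notes, thumbnails, ...)
        target = pose_dir / (video.stem + ".pose")
        if target.exists():
            continue  # already extracted; skipping makes reruns cheap
        # Assumed CLI flags, taken from the pose-format README.
        commands.append(["video_to_pose", "--format", "mediapipe",
                         "-i", str(video), "-o", str(target)])
    return commands


def extract_all(video_dir, pose_dir, dry_run=True):
    """Print (dry run) or execute the extraction commands."""
    Path(pose_dir).mkdir(parents=True, exist_ok=True)
    commands = build_pose_commands(video_dir, pose_dir)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failure
    return commands
```

Run it once with `dry_run=True` to check the file mapping, then with `dry_run=False` to extract.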

@nqkhanh2002
Author

Yes @AmitMY, I was able to create a .pose file and render it as a GIF following this notebook https://colab.research.google.com/drive/1UtBmfBIhUa2EdLMnWJr0hxAOZelQ50_9?usp=sharing (image attached), but I still need to add Vietnamese support to the main pipeline, as you said before:

  • Add support for this dictionary in sign-language-processing/datasets (Make a PR)

I have seen that the definitions of other datasets (like the autsl dataset) include additional data such as OpenPose and Holistic poses and .poseheader files.
Honestly, after reading many issues I'm still confused about what needs to be done. I would be very grateful if you could give specific instructions. Thank you very much!

@AmitMY
Contributor

AmitMY commented Jun 27, 2024

You have two paths:

Managed by sign.mt

If you make a dataloader in a PR to this repository, with all the videos (or links to videos) and their Vietnamese words, I would download this data and load it into sign.mt, where poses would be extracted, etc.
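At its core, such a dataloader pairs each Vietnamese word with its video. The stdlib-only sketch below shows that pairing in the `(key, example)` shape that a TFDS builder's `_generate_examples` method yields; it is not the repository's actual loader. The input CSV with `word` and `video_url` columns, the `vn_sign_` key prefix, and the feature names `id`, `text`, `video` are all hypothetical; a real loader should reuse the feature names of the existing datasets.

```python
import csv
from pathlib import Path
from typing import Dict, Iterator, Tuple


def generate_examples(index_csv: Path) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (key, example) pairs, one per word that has a video link.

    Hypothetical input: a UTF-8 CSV with 'word' and 'video_url' columns.
    """
    with open(index_csv, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f)):
            word = row["word"].strip()
            url = row["video_url"].strip()
            if not word or not url:
                continue  # skip incomplete rows instead of failing the build
            key = f"vn_sign_{i:05d}"  # keys must be unique within a split
            yield key, {"id": key, "text": word, "video": url}
```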

Managed by yourself

If you don't want to make a dataloader, or you want to run the translation service yourself, you need to create a lexicon in the https://github.com/sign-language-processing/spoken-to-signed-translation project (for example, https://github.com/sign-language-processing/spoken-to-signed-translation/tree/main/assets/dummy_lexicon)

There, your CSV file will define all paths and words. Your directory will include all pose files you extract yourself.
Then, you could run the commands in that repository and it will generate sentences for you.
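A self-managed lexicon is a folder of `.pose` files plus an `index.csv`. The sketch below generates that CSV by walking the folder. It assumes the column layout `path, spoken_language, signed_language, start, end, words, glosses, priority` for the dummy lexicon's `index.csv`, and that `start`/`end` of 0 means "use the whole file"; check both against `assets/dummy_lexicon` before relying on them. Deriving the word from the file name (underscores become spaces) and the upper-cased gloss are hypothetical conventions, and the sign language code is left to the caller.

```python
import csv
from pathlib import Path

# Assumed to match assets/dummy_lexicon/index.csv; verify before use.
LEXICON_COLUMNS = ["path", "spoken_language", "signed_language",
                   "start", "end", "words", "glosses", "priority"]


def write_lexicon_index(lexicon_dir, spoken_language, signed_language):
    """Write index.csv listing every .pose file under lexicon_dir."""
    lexicon_dir = Path(lexicon_dir)
    rows = []
    for pose_file in sorted(lexicon_dir.rglob("*.pose")):
        # Hypothetical naming convention: xin_chao.pose -> "xin chao"
        word = pose_file.stem.replace("_", " ")
        rows.append({
            "path": pose_file.relative_to(lexicon_dir).as_posix(),
            "spoken_language": spoken_language,
            "signed_language": signed_language,
            "start": 0,  # assumed: 0/0 means the whole pose file
            "end": 0,
            "words": word,
            "glosses": word.upper(),  # simple placeholder gloss
            "priority": 0,
        })
    with open(lexicon_dir / "index.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=LEXICON_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```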

@nqkhanh2002
Author

Hi @AmitMY
I have extracted all the necessary poses from the videos, and now I am editing download_lexicon.py. But I see that the loading code relies on the existing sign_suisse dataset, for example _POSE_HEADERS from tfds. How can I get this for my data? Thank you very much.

@AmitMY
Contributor

AmitMY commented Jun 29, 2024

If you are going with "Managed by yourself", you don't need to modify download_lexicon.
You just need to create a lexicon (CSV file + folder) set up the same way as the dummy lexicon.
You can use download_lexicon as an example.
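Whichever way the lexicon is built, a quick consistency check before running the translation commands can catch broken rows early. This sketch only assumes that `index.csv` sits in the lexicon folder and has a `path` column relative to that folder, as in the dummy lexicon; the function name is hypothetical.

```python
import csv
from pathlib import Path


def find_missing_poses(lexicon_dir):
    """Return the index.csv paths that do not point to an existing file."""
    lexicon_dir = Path(lexicon_dir)
    with open(lexicon_dir / "index.csv", newline="", encoding="utf-8") as f:
        return [row["path"] for row in csv.DictReader(f)
                if not (lexicon_dir / row["path"]).is_file()]
```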

@nqkhanh2002
Author

nqkhanh2002 commented Jun 29, 2024

Thank you @AmitMY.
I see, but the functions that automatically create the index.csv file are already in download_lexicon.py, so I am trying to modify them. The problem is that processing with text_to_gloss raises the error "Language vi is not supported",
even though I found it in IANA_TAGS.

@AmitMY
Contributor

AmitMY commented Jun 29, 2024

Possibly because Vietnamese is not supported by https://github.com/adbar/simplemma.
spaCy does support Vietnamese, so use the spaCy lemmatizer.
I'm closing this PR.
If you have issues with that repository, open them there.
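A related point for Vietnamese: spaces in written Vietnamese separate syllables rather than words, so a multi-syllable word such as "xin chào" spans two tokens, while inflection is largely absent, so lemmatization changes little. The sketch below does not use spaCy; it swaps in a different, stdlib-only technique, greedy longest-match segmentation against the lexicon's own word list, so that multi-syllable entries are matched as single signs. The function name and the four-syllable window are hypothetical; unknown syllables are kept as single tokens (for example, for fingerspelling).

```python
import re
import unicodedata


def segment_to_lexicon(text, lexicon_words, max_syllables=4):
    """Greedy longest-match: group syllables into the longest lexicon phrase."""
    def norm(s):
        # NFC so precomposed diacritics match; lower-case for lookup
        return unicodedata.normalize("NFC", s).lower()

    vocab = {norm(w) for w in lexicon_words}
    syllables = re.findall(r"\w+", norm(text))  # drops punctuation
    result, i = [], 0
    while i < len(syllables):
        # Try the longest window first, shrinking down to one syllable.
        for n in range(min(max_syllables, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if candidate in vocab or n == 1:
                result.append(candidate)  # unknown single syllables kept
                i += n
                break
    return result
```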

AmitMY closed this on Jun 29, 2024