Adds CodeClippy dataset [WIP] #2666

Closed
wants to merge 4 commits into from


arampacha

CodeClippy is an open-source code dataset scraped from GitHub during the Flax/JAX Community Week:
https://the-eye.eu/public/AI/training_data/code_clippy_data/

albertvillanova added the "dataset contribution" (Contribution to a dataset script) label on Sep 23, 2022
@albertvillanova
Member

Thanks for your contribution, @arampacha. Are you still interested in adding this dataset?

We are removing the dataset scripts from this GitHub repo and moving them to the Hugging Face Hub: https://huggingface.co/datasets

We suggest you create this dataset there. Please feel free to tell us if you need any help.

@oaguy1

oaguy1 commented Jul 26, 2023

Sorry to resurrect a dead issue, but any chance the dataset will make it to HuggingFace? I would love to use it to finetune Llama 2 and HF makes this a breeze. Also happy to submit a PR prepping it for HF if that is needed.


3 participants