GPT-4 Instruction dataset #31

Open
KnutJaegersberg opened this issue Apr 2, 2023 · 9 comments

Comments

@KnutJaegersberg

Take a look:

https://github.com/teknium1/GPTeacher

@PhoebusSi
Owner

We will soon collect them and thank you for your support.

@KnutJaegersberg
Author

This one is a mixture of other datasets, but it should contain a few new records. It has now landed on Hugging Face.

https://huggingface.co/datasets/swype/instruct

@PhoebusSi
Owner

Thank you very much for the reminder. We'll collect it soon.

@PhoebusSi reopened this Apr 6, 2023
@KnutJaegersberg
Author

Here is another one: Alpaca, but generated by GPT-4. Includes Chinese translations :)

https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM#fine-tuning-with-the-data

@KnutJaegersberg
Author

Related to your project, because you started out with chain-of-thought fine-tuning:

Researchers fine-tuned Galactica on Alpaca data to produce Galpaca, which seems to have better reasoning in scientific and technological domains than LLaMA:

https://twitter.com/oijna/status/1637566839235518464

https://huggingface.co/GeorgiaTechResearchInstitute/galpaca-30b

@dkqkxx
Collaborator

dkqkxx commented Apr 11, 2023

I'll pay attention to these, thx.

@KnutJaegersberg
Author

This is so insanely fast moving, I get confused.

https://github.com/databrickslabs/dolly/tree/master/data

@KnutJaegersberg
Author

Author description (not mine):
"CAMEL datasets: Physics, Chemistry and Biology. Each dataset contains 20K problem-solution pairs, consisting of 25 topics, 25 subtopics and 32 problems for each "topic, subtopic" pair generated and solved by GPT4"

https://github.com/lightaime/camel#data-hosted-on-hugging-face

@KnutJaegersberg
Author


3 participants