about data #206

Open
Luoqiu76 opened this issue Mar 21, 2024 · 0 comments

Comments

@Luoqiu76

May I ask how sharegpt_clean.json was converted into openchat_v3.2_super.train.parquet? The two files differ considerably: some conversations appear to have been truncated for being too long, and some garbled entries appear to have been discarded. However, many entries in sharegpt_clean.json have a Model field that is not marked as GPT3.5 or GPT4. How was it decided whether these entries belong to GPT3.5 or GPT4, or were they all treated as GPT3.5?
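To make the question concrete, the label distribution can be tallied with a minimal sketch like the one below. It is not the project's conversion code. It assumes that sharegpt_clean.json is a JSON array of objects and that the label sits under a key named `model`. Both the key name and the helper names `load_records` and `tally_model_labels` are hypothetical and may need adjusting to the actual file.

```python
import json
from collections import Counter


def load_records(path):
    """Load a JSON file that is assumed to hold a list of conversation objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def tally_model_labels(records, field="model"):
    """Count how often each label appears under `field`.

    The key name "model" is an assumption; records that are not dicts,
    or that have a missing or empty label, are grouped as "<unlabeled>".
    """
    counts = Counter()
    for record in records:
        label = record.get(field) if isinstance(record, dict) else None
        counts[label if label else "<unlabeled>"] += 1
    return counts
```

Calling `tally_model_labels(load_records("sharegpt_clean.json"))` would show how many entries carry each label and how many carry none, which is the group this question is about.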
