Llama 3 is not reproducible in any meaningful capacity without a list of the dataset sources.
Please release a list of the sources.
bennmann changed the title from List the "publically available sources" 15T dataset list from Llama 3 to List the "publicly available sources" 15T dataset list from Llama 3 on Apr 18, 2024
Related question: why train only on publicly available data from the internet? If you want quality language and good knowledge, wouldn't you want to train on things like textbooks, historical documents, scientific research papers, and the like, the kind of material you could get in a library? I'm talking about classic, fundamental knowledge. Training on classical philosophy would probably improve reasoning skills, and training on the OG programming textbooks would be very good for programming.