How did you get the ratios of various data sources in the pre-training data for existing LLMs shown in Fig. 2?
The data in Fig. 2 differs from the papers I have read.
For example, the GPT-3 paper (https://arxiv.org/abs/2005.14165) does not mention conversation or code data, yet Fig. 2 shows GPT-3 using conversation and code data for pre-training.
Likewise for PaLM, the data proportions in Table 2 (https://arxiv.org/pdf/2204.02311.pdf) differ from your ratios.