Experiment: Audio model - Unimodal data mix #1978

@potsawee

Description

The first audio experiment #1699 (600M model, 500B tokens) yields a reasonable audio model, but it lacks semantic knowledge. We're experimenting with adding unimodal data (e.g., text-only DCLM) into the pre-training data mix. We're going to sweep the percentage of text-only data over 0%, 10%, 20%, and 50% at a smaller scale (e.g., 150M model, 100B tokens).
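A minimal sketch of the sweep grid described above. All names (`speech_text`, `dclm_text_only`, the config fields) are hypothetical placeholders, not the actual experiment configuration:

```python
# Hypothetical sweep grid: each config mixes the paired (speech, text)
# corpus with text-only DCLM at a given fraction, at 150M/100B scale.
TEXT_ONLY_FRACTIONS = [0.0, 0.1, 0.2, 0.5]

def mix_weights(text_only_fraction: float) -> dict[str, float]:
    """Return normalized sampling weights for the two data sources."""
    assert 0.0 <= text_only_fraction < 1.0
    return {
        "speech_text": 1.0 - text_only_fraction,  # paired (speech, text) data
        "dclm_text_only": text_only_fraction,     # unimodal text (DCLM)
    }

configs = [
    {
        "model_size": "150m",             # smaller than #1699's 600M model
        "total_tokens": 100_000_000_000,  # 100B tokens vs. 500B in #1699
        "data_weights": mix_weights(f),
    }
    for f in TEXT_ONLY_FRACTIONS
]
```

The 0% run serves as the baseline matching the original unimodal-audio mix, so any gain on T->T and S->T can be attributed to the added text-only data.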

cc. @Helw150

Hypothesis or Goal

Find an optimal ratio between (speech, text) and text-only data for pre-training, and understand the impact of the amount of text-only data on S->S, T->T, S->T, and T->S performance.

Links

(Delete any that aren't applicable.)

  • WandB Report: (link)
  • Data Browser: (link)
  • (etc.)

Results

(What did you find, including relevant evaluation metrics, etc.)
