clean up dataset conversion readme #168

Merged
merged 9 commits into from
May 22, 2023

codestar12
Contributor

Cleans up existing readme and adds finetuning dataset example

scripts/data_prep/README.md: 5 review threads (outdated, resolved)
codestar12 and others added 2 commits May 18, 2023 19:04
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
Contributor

@alextrott16 alextrott16 left a comment

The finetuning stuff looks good to me, so I'm approving that part. But, as mentioned in a comment, I have caught some QOL issues with the preprocessing function that I think need to be addressed in a separate PR. That will require me to edit the finetuning section, but what you added here is a very useful starting point for that.

Collaborator

@dakinggg dakinggg left a comment

Could you add a link in the top level description to streaming?

@codestar12 codestar12 merged commit d691eb3 into mosaicml:main May 22, 2023
6 checks passed
bmosaicml pushed a commit that referenced this pull request Jun 6, 2023
bmosaicml pushed a commit that referenced this pull request Jun 8, 2023
* clean up dataset conversion readme

* Update scripts/data_prep/README.md

Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>

* Update scripts/data_prep/README.md

Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>

* addresses feedback on PR

* add links to relevant preprocessing functions

* add link to streaming

---------

Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>

3 participants