-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add support for csv files #592
Conversation
6704251
to
9f49717
Compare
9f49717
to
31199ff
Compare
d34e894
to
909247b
Compare
909247b
to
623a446
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments
Co-authored-by: George Mastrapas <32414777+gmastrapas@users.noreply.github.com> Co-authored-by: Michael Günther <guenthermi50@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's also add an integration test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some minor comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Some minor comments
Co-authored-by: George Mastrapas <32414777+gmastrapas@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great job!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
dcb8084
to
17d199b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, only added some minor comments
|
||
At the model saving time, you will discover, we are saving two models to your local directory. | ||
``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to add a note that csv data is before the finetuning loaded into memory (a DocumentArray object) and thereby locally stored images are also loaded into memory
finetuner/data.py
Outdated
def load_finetune_data_from_csv( | ||
file: Union[str, TextIO], | ||
task: str = 'text-to-text', | ||
options: CSVOptions = CSVOptions(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is not recommended to call a constructor in a function declaration, since it is then only called once and every function while use the exact same instance.
So it is better to do:
def load_finetune_data_from_csv(
[...]
options=None,
[...]
):
options = options or CSVOptions()
17d199b
to
8cb3f77
Compare
📝 Docs are deployed on https://ft-feat-support-csv--jina-docs.netlify.app 🎉 |
This PR adds support for csv files to the
finetuner.fit
function.Both
train_data
andeval_data
can now be supplied as paths to a csv file or an in memoryTextIO
stream. The format of the provided csv can be any that are listed by thecsv.list_dialects
function. Currently, each row of the csv file can contain:Each row must of the same format, and to indicate that the second column represents labels and not text, provide a dictionary with
is_labeled = True
as thecsv_options
argument.