feat: add score document support in csv #696

Merged: 17 commits, Mar 21, 2023

@bwanglzu (Member) commented Mar 19, 2023

This PR allows users to create a CSV file containing three columns: col1 and col2 hold the content, and col3 indicates the similarity between col1 and col2. In addition, I refactored the build_finetuning_dataset function.


  • This PR references an open issue
  • I have added a line about this change to CHANGELOG
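To make the three-column format concrete, here is a minimal sketch of reading such a file with Python's standard csv module. The function name read_score_csv and the strict three-column check are hypothetical, chosen for illustration; this is not finetuner's actual parsing code, and it assumes the file has no header row.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_score_csv(lines: Iterable[str], delimiter: str = ',') -> List[Tuple[str, str, float]]:
    """Parse rows of (content_1, content_2, similarity_score).

    Hypothetical helper for illustration only; it is not finetuner's API.
    Assumes no header row.
    """
    triples = []
    reader = csv.reader(lines, delimiter=delimiter)
    for row_no, row in enumerate(reader, start=1):
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        if len(row) != 3:
            raise ValueError(f'row {row_no}: expected 3 columns, got {len(row)}')
        left, right, score = row
        # col3 holds the similarity between col1 and col2
        triples.append((left, right, float(score)))
    return triples


if __name__ == '__main__':
    sample = io.StringIO(
        'a cute cat,a kitten sleeping on a sofa,0.9\n'
        'a cute cat,a red sports car,0.1\n'
    )
    for triple in read_score_csv(sample):
        print(triple)
```

For a file on disk, one would pass an open handle, e.g. `with open('pairs.csv', newline='') as f: read_score_csv(f)`, where pairs.csv is a placeholder name; `newline=''` is what the csv module documentation recommends so quoted fields containing newlines are handled correctly.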

github-actions bot added size/m and removed size/s labels Mar 19, 2023
github-actions bot added the area/testing label (This issue/PR affects testing) Mar 19, 2023
@bwanglzu self-assigned this Mar 20, 2023
@bwanglzu marked this pull request as ready for review March 20, 2023 10:16

@LMMilliken (Contributor) left a comment

LGTM! I think the refactoring makes this much easier to read.
I think CSVHandler is a bit vague, though; maybe CSVParser or CSVReader?

Resolved review threads: CHANGELOG.md (2, outdated), finetuner/data.py (4, outdated)

@guenthermi (Member) left a comment

Added some comments. If we really introduce the CSVContext class, we also need to update the documentation.

Resolved review threads: finetuner/data.py (3, two of them outdated)

@guenthermi (Member)

> Added some comments. If we really introduce the CSVContext class, we also need to update the documentation.

OK, there is probably nothing in the documentation about building a dataset from CSV, because we only apply it automatically if one provides a file path in the CSV.

@bwanglzu (Member Author)

Documentation will be added in a separate PR: #688 (comment)

Resolved review thread: CHANGELOG.md (outdated)
Co-authored-by: George Mastrapas <32414777+gmastrapas@users.noreply.github.com>

@guenthermi (Member) left a comment

LGTM

@@ -70,7 +70,7 @@ def list_experiments(self, page: int = 1, size: int = 50) -> Dict[str, Any]:
         ..note:: The maximum number for `size` per page is 100.
         """
         params = {'page': page, 'size': size}
-        url = self._construct_url(self._base_url, API_VERSION, EXPERIMENTS)
+        url = self._construct_url(self._base_url, API_VERSION, EXPERIMENTS) + '/'

Member

Can this be done in the construct_url function?

Member Author

Later, we will investigate why this is happening.
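The reviewer's suggestion could look roughly like the sketch below: the trailing slash moves into the URL helper behind an opt-in flag, so list_experiments can request it while other endpoints keep their current URLs. This is a hypothetical stand-alone stand-in; the real _construct_url body is not shown in this PR, the trailing_slash parameter is an assumption, and 'v1' and 'experiments' are placeholder values for API_VERSION and EXPERIMENTS. It does not explain why the slash is needed, which the author left for later investigation.

```python
def construct_url(base: str, *segments: str, trailing_slash: bool = False) -> str:
    """Join a base URL and path segments with single slashes.

    Hypothetical stand-in for the client's _construct_url helper; the
    trailing_slash flag is an assumption, not part of the real client.
    """
    parts = [base.rstrip('/')]
    # Drop surrounding slashes from each segment and skip empty ones
    parts += [s.strip('/') for s in segments if s.strip('/')]
    url = '/'.join(parts)
    return url + '/' if trailing_slash else url


if __name__ == '__main__':
    # 'v1' and 'experiments' are placeholders for API_VERSION and EXPERIMENTS
    print(construct_url('https://api.example.com/', 'v1', 'experiments', trailing_slash=True))
```

With such a flag, the changed line in list_experiments would become a call with `trailing_slash=True` instead of string concatenation at the call site.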

@bwanglzu merged commit a24d95e into main Mar 21, 2023
@bwanglzu deleted the feat-score-csv branch March 21, 2023 08:15

4 participants