Add ClassificationDataset API #125

Merged
merged 19 commits into from
Jun 14, 2023

Conversation

capjamesg
Collaborator

Description

This PR introduces a new ClassificationDataset API for working with classification datasets in supervision. Users can:

  • Create a ClassificationDataset object;
  • Save data in a multiclass classification folder structure; and
  • Load data from a dataset structured as a multiclass classification folder.
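
For readers unfamiliar with the layout, here is a minimal sketch of what a multiclass classification folder is assumed to look like: one directory per class, with each image stored inside the directory of its class. The function names `save_as_folder_structure` and `load_from_folder_structure` are hypothetical and are not the methods added in this PR; the sketch uses only the standard library and placeholder bytes instead of real images.

```python
import tempfile
from pathlib import Path
from typing import Dict


def save_as_folder_structure(
    root: str, labels: Dict[str, str], contents: Dict[str, bytes]
) -> None:
    """Write each file into root/<class name>/<file name>.

    Assumes file names are unique across classes (hypothetical helper).
    """
    for name, class_name in labels.items():
        class_dir = Path(root) / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        (class_dir / name).write_bytes(contents[name])


def load_from_folder_structure(root: str) -> Dict[str, str]:
    """Rebuild the file name -> class name mapping from the directory names."""
    labels: Dict[str, str] = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for file in sorted(class_dir.iterdir()):
            if file.is_file():
                labels[file.name] = class_dir.name
    return labels


with tempfile.TemporaryDirectory() as tmp:
    labels = {"a.jpg": "cat", "b.jpg": "dog", "c.jpg": "cat"}
    contents = {name: b"placeholder" for name in labels}
    save_as_folder_structure(tmp, labels, contents)
    loaded = load_from_folder_structure(tmp)
    class_dirs = sorted(p.name for p in Path(tmp).iterdir())
```

In this sketch the class label lives only in the directory name, so saving and then loading returns the same mapping.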

Type of change

  • New feature (non-breaking change which adds functionality)

How has this change been tested? Please provide a test case or example of how you tested the change.

I will add a test case once we have discussed the API.

Docs

I will add the requisite docs once we have discussed the API.

@capjamesg capjamesg requested a review from SkalskiP June 7, 2023 15:27
@capjamesg capjamesg self-assigned this Jun 7, 2023
@SkalskiP
Collaborator

SkalskiP commented Jun 9, 2023

A summary:

  • Classifications class should not be in the datasets package.
  • Splitting dataset is not a responsibility of as_multiclass_folder_structure.
  • Please do not use + "/" + to build file paths.
  • Classification API should be consistent with detections API. When users want to load train, test, and validation subsets, they should create three separate instances of ClassificationDataset. You can find more info in one of the code comments.
  • Running as_multiclass_folder_structure and from_multiclass_folder_structure should result in precisely the same dataset.
  • Make sure to run isort --profile black supervision/ and black supervision before committing.
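
To illustrate the file path point above, here is a small sketch comparing string concatenation with the standard library alternatives. The directory and file names are made up for the example and do not come from the PR.

```python
import os
from pathlib import Path

# Hypothetical path components, for illustration only.
root, class_name, image_name = "dataset", "cat", "001.jpg"

# Fragile: hard-codes the separator, so it is not portable across platforms.
fragile = root + "/" + class_name + "/" + image_name

# Portable alternatives from the standard library.
joined = os.path.join(root, class_name, image_name)
path = Path(root) / class_name / image_name
```

Both `os.path.join` and `pathlib.Path` pick the separator for the current platform, which is why they are usually preferred over manual concatenation.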

@capjamesg
Collaborator Author

@SkalskiP I have responded to a lot of your feedback in my latest commit. Please review. I am still confused about this comment:

Splitting dataset is not a responsibility of as_multiclass_folder_structure.

Can you outline in pseudocode the expected way this function would work?

@SkalskiP
Collaborator

SkalskiP commented Jun 13, 2023

@capjamesg The test, train, and valid sets will be represented by separate ClassificationDataset objects. That is why there is no reason for you to split the dataset into parts inside as_multiclass_folder_structure. It will already be split by logic operating at a higher level. as_multiclass_folder_structure is here to save the dataset on the hard drive and do nothing more than that. Here is an example of the usage of the Classification API:

ds = ClassificationDataset(...)
ds_train, ds_test = ds.split(split_ratio=0.8)
ds_test, ds_val = ds_test.split(split_ratio=0.5)

ds_train.as_multiclass_folder_structure(...)
ds_test.as_multiclass_folder_structure(...)
ds_val.as_multiclass_folder_structure(...)

Notice that the split call is not inside as_multiclass_folder_structure, but outside it.
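
The separation described above can be sketched with a toy class. This is only an illustration under stated assumptions: `ToyClassificationDataset`, its fields, the `seed` parameter, and `as_folder_structure` are hypothetical stand-ins, not the ClassificationDataset implementation from this PR. Splitting returns new dataset objects, and saving writes whatever dataset it is called on without splitting anything.

```python
import random
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyClassificationDataset:
    classes: List[str]
    annotations: Dict[str, int]  # image name -> index into classes

    def split(
        self, split_ratio: float = 0.8, seed: Optional[int] = None
    ) -> Tuple["ToyClassificationDataset", "ToyClassificationDataset"]:
        """Shuffle the image names and return two disjoint datasets."""
        names = sorted(self.annotations)
        random.Random(seed).shuffle(names)
        cut = int(len(names) * split_ratio)
        first = {n: self.annotations[n] for n in names[:cut]}
        second = {n: self.annotations[n] for n in names[cut:]}
        return (
            ToyClassificationDataset(self.classes, first),
            ToyClassificationDataset(self.classes, second),
        )

    def as_folder_structure(self, root: str) -> None:
        """Only saves: writes an empty placeholder file per image, no splitting."""
        for name, class_index in self.annotations.items():
            class_dir = Path(root) / self.classes[class_index]
            class_dir.mkdir(parents=True, exist_ok=True)
            (class_dir / name).touch()


ds = ToyClassificationDataset(
    classes=["cat", "dog"],
    annotations={f"{i}.jpg": i % 2 for i in range(10)},
)
ds_train, ds_rest = ds.split(split_ratio=0.8, seed=0)
ds_test, ds_val = ds_rest.split(split_ratio=0.5, seed=0)

with tempfile.TemporaryDirectory() as tmp:
    for subset_name, subset in [("train", ds_train), ("test", ds_test), ("valid", ds_val)]:
        subset.as_folder_structure(str(Path(tmp) / subset_name))
    written = sorted(p.name for p in Path(tmp).rglob("*.jpg"))
```

Keeping the split outside the save method means each subset is an ordinary dataset object, so the same save call works for all three.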

@capjamesg capjamesg added the enhancement New feature or request label Jun 13, 2023
@capjamesg
Collaborator Author

My latest commit incorporates all your feedback. I left a comment in response to one of your comments re: the class_id [0] issue.

@capjamesg
Collaborator Author

My latest commits add:

  1. Documentation for the new methods;
  2. Unit tests for get_top_k;
  3. Linting for all changes; and
  4. A solution for the if self.annotations[image].class_id[0] == self.classes.index( issue.

@capjamesg
Collaborator Author

I have incorporated all of your feedback into my latest commit. We now have five test cases for the get_top_k() function.
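
For context, a top-k selection over classification results might look like the sketch below. It assumes that each prediction is a class id paired with a confidence score and that "top k" means the k highest-confidence entries; the standalone function `top_k` and its signature are hypothetical and are not the get_top_k() implementation in this PR.

```python
from typing import List, Sequence, Tuple


def top_k(
    class_ids: Sequence[int], confidences: Sequence[float], k: int
) -> Tuple[List[int], List[float]]:
    """Return the class ids and confidences of the k most confident entries."""
    if len(class_ids) != len(confidences):
        raise ValueError("class_ids and confidences must have the same length")
    if k < 0:
        raise ValueError("k must be non-negative")
    # Indices sorted by descending confidence, truncated to the first k.
    order = sorted(
        range(len(confidences)), key=lambda i: confidences[i], reverse=True
    )[:k]
    return [class_ids[i] for i in order], [confidences[i] for i in order]


ids, scores = top_k([0, 1, 2], [0.1, 0.7, 0.2], k=2)
```

If k exceeds the number of predictions, this sketch simply returns all of them in descending order of confidence.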

@SkalskiP SkalskiP merged commit de3ccc5 into main Jun 14, 2023
@SkalskiP SkalskiP mentioned this pull request Jun 14, 2023
1 task
@SkalskiP SkalskiP added the version: 0.10.0 Feature to be added in `0.10.0` release label Jun 14, 2023
@capjamesg capjamesg deleted the add-classification branch July 5, 2023 08:48
Labels
enhancement New feature or request version: 0.10.0 Feature to be added in `0.10.0` release

2 participants