
docs: add tutorials of fine-tune on a custom dataset #711

Merged: 3 commits into mindspore-lab:main on Sep 26, 2023

Conversation


@XixinYang (Collaborator) commented Jul 25, 2023

Thank you for your contribution to the MindCV repo.
Before submitting this PR, please make sure:

Motivation

Add a tutorial on fine-tuning with a custom dataset, along with the relevant code.

Test Plan

(How should this PR be tested? Do you require special setup to run the test or repro the fixed bug?)

Related Issues and PRs

(Is this PR part of a group of changes? Link the other relevant PRs and Issues here. Use https://help.github.com/en/articles/closing-issues-using-keywords for help on GitHub syntax)

@XixinYang XixinYang linked an issue Jul 26, 2023 that may be closed by this pull request
@SamitHuang (Collaborator) left a comment


Very comprehensive:)


### Read Custom Dataset

For a custom dataset, you can either organize the files locally into a tree structure similar to ImageNet's and then read them with the `create_dataset` function (the offline way), or read all the images directly into an iterable object, which replaces both the file-splitting and the `create_dataset` steps (the online way).
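The offline way boils down to reorganizing files on disk before training. A minimal pure-Python sketch of that step (the `to_imagenet_tree` helper and the flat `(image_path, class_name)` input format are assumptions for illustration, not MindCV's actual API):

```python
import shutil
from pathlib import Path

def to_imagenet_tree(samples, out_root):
    """Copy (image_path, class_name) pairs into out_root/class_name/filename,
    the folder layout that folder-based loaders such as ImageFolderDataset expect."""
    out_root = Path(out_root)
    for image_path, class_name in samples:
        class_dir = out_root / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(image_path, class_dir / Path(image_path).name)
```

Once a `train/` (and `val/`) tree exists in this layout, `create_dataset` can load it with no further custom code.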

It is a bit confusing why they are named offline/online. It would be clearer to use the interface names directly: ImageFolderDataset, GeneratorDataset.


Since the structure of downstream datasets can be diverse, the following two approaches are common practice for handling this diversity:

Offline processing: manually reorganize the data into a standard format that existing interfaces can load. The keyword "offline" refers to the manual, ahead-of-time formatting of the dataset.

Online processing: build a new interface that can load the dataset in its specific format. The keyword "online" refers to the fact that the newly constructed interface parses the data structure at runtime.
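The online approach can be sketched as a generator that parses a dataset-specific annotation file at runtime; MindSpore's GeneratorDataset accepts such a callable as its source. The `filename,class_name` CSV format and the `csv_sample_source` helper below are assumptions for illustration:

```python
import csv

def csv_sample_source(annotation_csv, class_to_index):
    """Build a zero-argument generator function that yields
    (image_path, label_index) pairs from a `filename,class_name` CSV file.
    The returned callable is the kind of source object GeneratorDataset
    can wrap directly, with no file reorganization on disk."""
    def generator():
        with open(annotation_csv, newline="") as f:
            for filename, class_name in csv.reader(f):
                yield filename, class_to_index[class_name]
    return generator
```

The design choice here is that all format-specific parsing lives inside the generator, so swapping annotation formats only requires swapping the source function.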

docs/en/how_to_guides/finetune_with_a_custom_dataset.md (several review comments, resolved)
```python
# Freeze all parameters except those of the classifier head
for param in network.trainable_params():
    if param.name not in classifier_names:
        param.requires_grad = False
```

It would be better to wrap this backbone-freezing code into a function, allowing easy integration with other datasets/tasks.

```python
def freeze_backbone(net, cfg):
    ...
    return net
```
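A hedged sketch of such a helper, filling in the reviewer's stub. It assumes only the MindSpore convention that parameters expose `.name` and `.requires_grad`; taking an explicit `classifier_names` set instead of a `cfg` object is a simplification for illustration:

```python
def freeze_backbone(network, classifier_names):
    """Freeze every trainable parameter whose name is not in
    classifier_names, leaving only the classifier head trainable."""
    for param in network.trainable_params():
        param.requires_grad = param.name in classifier_names
    return network
```

Because the function returns the network, it can be dropped into an existing setup pipeline, e.g. `network = freeze_backbone(network, {"classifier.weight", "classifier.bias"})`.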

@XixinYang XixinYang merged commit bf0838d into mindspore-lab:main Sep 26, 2023
5 checks passed
Successfully merging this pull request may close these issues.

Add generalization support for datasets from multiple scenarios
3 participants