Where can I get files named 'train_1000' and 'test_1000'? #5

Closed
yeahQing opened this issue Jul 26, 2021 · 10 comments

Comments

@yeahQing

I don't understand why I should use split().

for dataset_root in config['train_dataset'].split(','):

yeahQing reopened this Jul 26, 2021
@yeahQing
Author

How can I create an LMDB dataset for Chinese characters?

@JingyeChen
Member

The dataset is available at http://www.nlpr.ia.ac.cn/databases/handwriting/Home.html

@yeahQing
Author

> The dataset is available at http://www.nlpr.ia.ac.cn/databases/handwriting/Home.html

Hi, Chen, thanks for your reply. I have downloaded the dataset, but I don’t understand why a string is looped here. The code is:

def get_data_package():
    train_dataset = []
    # 'train_dataset': './data/mydata/train_1000' why loop this path?
    for dataset_root in config['train_dataset'].split(','):
        _, dataset = get_dataloader(dataset_root, shuffle=True)
        train_dataset.append(dataset)

@yeahQing
Author

What type of dataset should I use in place of the path './data/mydata/train_1000'?

@JingyeChen
Member

> > The dataset is available at http://www.nlpr.ia.ac.cn/databases/handwriting/Home.html
>
> Hi, Chen, thanks for your reply. I have downloaded the dataset, but I don’t understand why a string is looped here. The code is:
>
> def get_data_package():
>     train_dataset = []
>     # 'train_dataset': './data/mydata/train_1000' why loop this path?
>     for dataset_root in config['train_dataset'].split(','):
>         _, dataset = get_dataloader(dataset_root, shuffle=True)
>         train_dataset.append(dataset)

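The loop is there so that multiple datasets can be combined. The config value is a comma-separated list of paths, so split(',') yields one path per dataset, and a single path simply yields a one-element list. For example, the dataset can be specified like this: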

'train_dataset': './data/mydata/train_1000,./data/mydata/train_1500,./data/mydata/train_2000'
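
To illustrate the point above, here is a minimal sketch of how the comma-separated value breaks into one path per dataset. The `dataset_roots` helper and its whitespace stripping are assumptions added for illustration; the code in the thread calls `split(',')` directly.

```python
# Config entry in the same shape as the one discussed in this thread.
config = {
    "train_dataset": "./data/mydata/train_1000,./data/mydata/train_1500,./data/mydata/train_2000",
}


def dataset_roots(value):
    """Split a comma-separated config value into individual dataset paths.

    Hypothetical helper: stripping whitespace and dropping empty entries
    is an extra convenience, not something the original code does.
    """
    return [path.strip() for path in value.split(",") if path.strip()]


# A single path still works: the loop just runs once.
single = dataset_roots("./data/mydata/train_1000")
print(single)  # ['./data/mydata/train_1000']

# Several paths give one loop iteration (and one dataset) per path.
multiple = dataset_roots(config["train_dataset"])
for root in multiple:
    print("would load:", root)
```

So a config with one path and a config with several paths go through the same loop; the only difference is how many datasets end up in the list.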

@JingyeChen
Member

> What type of dataset should I use in place of the path './data/mydata/train_1000'?

The dataset should be in LMDB format.
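
For readers wondering what writing an LMDB dataset might look like, here is a rough sketch. It assumes the key layout commonly used by text-recognition loaders (`image-000000001`, `label-000000001`, `num-samples`); this thread does not confirm that the repository uses that layout, so check the repository's dataset class before relying on it. `build_records` and `write_lmdb` are hypothetical helper names, and `lmdb` is the third-party Python binding (`pip install lmdb`).

```python
def build_records(samples):
    """Turn (image_bytes, label) pairs into LMDB key/value pairs.

    Assumed key layout (not confirmed by the thread): 1-based,
    zero-padded 'image-%09d' / 'label-%09d' keys plus 'num-samples'.
    """
    records = {}
    for index, (image_bytes, label) in enumerate(samples, start=1):
        records[f"image-{index:09d}".encode()] = image_bytes
        # UTF-8 keeps Chinese characters intact in the label value.
        records[f"label-{index:09d}".encode()] = label.encode("utf-8")
    records[b"num-samples"] = str(len(samples)).encode()
    return records


def write_lmdb(output_path, records, map_size=1 << 30):
    """Write the key/value pairs into an LMDB environment at output_path."""
    import lmdb  # third-party dependency, imported only when writing

    env = lmdb.open(output_path, map_size=map_size)
    with env.begin(write=True) as txn:
        for key, value in records.items():
            txn.put(key, value)
    env.close()


# Usage (paths and labels are placeholders):
# with open("char_0001.png", "rb") as f:
#     samples = [(f.read(), "你")]
# write_lmdb("./data/mydata/train_1000", build_records(samples))
```

The `map_size` of 1 GiB is an arbitrary default for the sketch; a real dataset would likely need a larger value.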

@yeahQing
Author

Thank you very much, it has helped me a lot!

@cptbtptp125

Hello, have you successfully converted your data to LMDB format? I would like to know how to do the conversion; I have tried many methods without success.

@yeahQing
Author

yeahQing commented Sep 8, 2022

> Hello, have you successfully converted your data to LMDB format? I would like to know how to do the conversion; I have tried many methods without success.

Hi, you can find the answer in #57.

@cptbtptp125

Hello, I am a little confused about looping over multiple datasets and joining them. May I ask why this is done, and how it differs from training directly on a single dataset? Thank you very much; I would appreciate any help.
'train_dataset': './data/mydata/train_1000,./data/mydata/train_1500,./data/mydata/train_2000'
