[Errno 21] Is a directory: 'data/mixture/Syn90k/label.lmdb' #1105

MingyuLau · 2022-06-23T12:15:13Z

When I tried to train MASTER on GPUs, it raised the error shown below. However, I had organized my data correctly, and the directory "label.lmdb" does contain the two files "data.mdb" and "lock.mdb".

(screenshot of the error traceback)

Mountchicken · 2022-06-23T12:24:58Z

Hi @MingyuLau

This looks like a problem with the dataset config. Try replacing the code below.

mmocr/configs/_base_/recog_datasets/ST_SA_MJ_train.py

Lines 9 to 23 in 1f888c9

    
train1 = dict(
    type='OCRDataset',
    img_prefix=train_img_prefix1,
    ann_file=train_ann_file1,
    loader=dict(
        type='AnnFileLoader',
        repeat=1,
        file_format='lmdb',
        parser=dict(
            type='LineStrParser',
            keys=['filename', 'text'],
            keys_idx=[0, 1],
            separator=' ')),
    pipeline=None,
    test_mode=False)

Replace it with:

train1 = dict(
    type='OCRDataset',
    img_prefix=train_img_prefix1,
    ann_file=train_ann_file1,
    loader=dict(
        type='AnnFileLoader',
        repeat=1,
        file_format='lmdb',
        parser=dict(
            type='LineJsonParser',
            keys=['filename', 'text']),
    pipeline=None,
    test_mode=False)

MingyuLau · 2022-06-23T12:35:45Z

Thank you for looking into this so quickly. The change works and the previous error is fixed, but now a new "pipeline" error is raised.

(screenshot of the new error)

Mountchicken · 2022-06-23T12:44:35Z

We may need to fix more. Try:

mmocr/configs/_base_/recog_datasets/ST_SA_MJ_train.py

Lines 39 to 41 in 1f888c9

    
train3['loader']['file_format'] = 'txt'
train_list = [train1, train2, train3]

Replace it with:

train3['loader']['file_format'] = 'txt'
train3['loader']['parser'] = dict(
    type='LineStrParser',
    keys=['filename', 'text'],
    keys_idx=[0, 1],
    separator=' ')
train_list = [train1, train2, train3]

MingyuLau · 2022-06-23T12:49:04Z

I replaced it, but it raises the same error. I will check my code again.

MingyuLau · 2022-06-24T01:46:08Z

@Mountchicken Sorry to bother you. I found a typo in your code from yesterday: a ")" is missing from "loader=dict(......)". After I fixed the typo, the console raises the same "Is a directory" error as before.

Mountchicken · 2022-06-24T02:15:50Z

@MingyuLau
Can you show me your dataset config? This problem is caused by using LineStrParser to load a dataset in lmdb format, when LineJsonParser should be used instead.
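For context, `[Errno 21] Is a directory` is what Python reports on Linux when a directory is opened as if it were a regular file. An lmdb database is a directory (holding data.mdb and lock.mdb), so the message would be consistent with some code path treating the lmdb path as a plain text file. The sketch below reproduces only the OS-level error using a temporary directory. The helper name and the fake path are made up for illustration, and this is not MMOCR's loading code.

```python
import errno
import os
import tempfile


def open_as_text(path):
    """Try to read `path` as a text file; return the errno on failure, else None."""
    try:
        with open(path, 'r') as f:
            f.read()
    except OSError as exc:
        return exc.errno
    return None


with tempfile.TemporaryDirectory() as root:
    # Hypothetical stand-in for an lmdb directory such as label.lmdb
    fake_lmdb = os.path.join(root, 'label.lmdb')
    os.mkdir(fake_lmdb)
    result = open_as_text(fake_lmdb)

# On Linux and macOS the errno is errno.EISDIR; Windows reports a different error.
print(result == errno.EISDIR)
```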

MingyuLau · 2022-06-24T07:26:23Z

@Mountchicken

# Text Recognition Training set, including:
# Synthetic Datasets: SynthText, Syn90k

train_root = 'data/mixture'

train_img_prefix1 = f'{train_root}/Syn90k/mnt/ramdisk/max/90kDICT32px'
train_ann_file1 = f'{train_root}/Syn90k/label.lmdb'

train1 = dict(
    type='OCRDataset',
    img_prefix=train_img_prefix1,
    ann_file=train_ann_file1,
    loader=dict(
        type='AnnFileLoader',
        repeat=1,
        file_format='lmdb',
        parser=dict(
            type='LineJsonParser',
            keys=['filename', 'text'])),
    pipeline=None,
    test_mode=False)

train_img_prefix2 = f'{train_root}/SynthText/' + \
    'synthtext/SynthText_patch_horizontal'
train_ann_file2 = f'{train_root}/SynthText/label.lmdb'

train_img_prefix3 = f'{train_root}/SynthText_Add'
train_ann_file3 = f'{train_root}/SynthText_Add/label.txt'

train2 = {key: value for key, value in train1.items()}
train2['img_prefix'] = train_img_prefix2
train2['ann_file'] = train_ann_file2

train3 = {key: value for key, value in train1.items()}
train3['img_prefix'] = train_img_prefix3
train3['ann_file'] = train_ann_file3
train3['loader']['file_format'] = 'txt'

train_list = [train1, train2, train3]

Mountchicken · 2022-06-24T08:12:20Z

@MingyuLau
Sorry for the misunderstanding. We have merged a new PR to fix this problem. The correct config should be

train_root = 'data/mixture'

train_img_prefix1 = f'{train_root}/Syn90k/mnt/ramdisk/max/90kDICT32px'
train_ann_file1 = f'{train_root}/Syn90k/label.lmdb'

train1 = dict(
    type='OCRDataset',
    img_prefix=train_img_prefix1,
    ann_file=train_ann_file1,
    loader=dict(
        type='AnnFileLoader',
        repeat=1,
        file_format='lmdb',
        parser=dict(type='LineJsonParser', keys=['filename', 'text'])),
    pipeline=None,
    test_mode=False)

train_img_prefix2 = f'{train_root}/SynthText/' + \
    'synthtext/SynthText_patch_horizontal'
train_ann_file2 = f'{train_root}/SynthText/label.lmdb'

train_img_prefix3 = f'{train_root}/SynthText_Add'
train_ann_file3 = f'{train_root}/SynthText_Add/label.txt'

train2 = {key: value for key, value in train1.items()}
train2['img_prefix'] = train_img_prefix2
train2['ann_file'] = train_ann_file2

train3 = {key: value for key, value in train1.items()}
train3['img_prefix'] = train_img_prefix3
train3['ann_file'] = train_ann_file3
train3['loader']['file_format'] = 'txt'
train3['loader']['parser'] = dict(
    type='LineStrParser',
    keys=['filename', 'text'],
    keys_idx=[0, 1],
    separator=' ')

train_list = [train1, train2, train3]

Mountchicken · 2022-06-24T08:13:40Z

SA is in txt format and needs LineStrParser. The parser settings are admittedly confusing; we are working on this and will provide more specific documentation later.
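To make the distinction concrete, here is a rough sketch of the two annotation line shapes, assuming the lmdb values are JSON strings and the txt lines are space-separated. The two helper functions are hypothetical stand-ins written for illustration, not MMOCR's parser classes, and the sample lines are invented.

```python
import json


def parse_str_line(line, keys=('filename', 'text'), keys_idx=(0, 1), separator=' '):
    """Split a plain-text line and pick fields by position."""
    parts = line.strip().split(separator)
    return {key: parts[idx] for key, idx in zip(keys, keys_idx)}


def parse_json_line(line, keys=('filename', 'text')):
    """Decode a JSON line and pick fields by name."""
    record = json.loads(line)
    return {key: record[key] for key in keys}


txt_line = 'img_0001.jpg hello'
json_line = '{"filename": "img_0001.jpg", "text": "hello"}'

print(parse_str_line(txt_line))
print(parse_json_line(json_line))

# Feeding a JSON line to the string parser does not fail loudly;
# it silently yields the wrong fields.
print(parse_str_line(json_line))
```

The last call is the interesting one: a mismatched parser may not raise at parse time, which can make this kind of config mistake hard to spot.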

MingyuLau · 2022-06-24T08:53:12Z

@Mountchicken
But it still gives me the same error. Did I do something wrong when I converted "label.txt" to "label.lmdb"? I chose the "label-only" option of "lmdb_converter.py".

MingyuLau · 2022-06-24T09:00:03Z

And this is the structure of my dataset, organized as in the official documentation:

(screenshot of the dataset directory structure)

Mountchicken · 2022-06-24T09:25:48Z

@MingyuLau
The problem is not in lmdb_converter.py. It still lies in using LineStrParser for lmdb annotations, which the modified config should have solved. My guess is that something is wrong with your version of MMOCR, or that something was missed during installation. Please try installing the newest version of MMOCR: after cloning it, run pip install -v -e .

MingyuLau · 2022-06-24T11:13:00Z

@Mountchicken
Strangely, I have fixed my dataset config file, but the change doesn't seem to take effect: the console output shows that the config still uses "str", not "json".

Mountchicken · 2022-06-24T12:55:07Z

Try pip install -v -e .

MingyuLau · 2022-06-24T12:57:40Z

It makes no difference

MingyuLau · 2022-06-24T13:22:20Z

@Mountchicken
Why does the output show that nothing has changed, even though I changed the config file?

Mountchicken · 2022-06-24T14:30:55Z

@MingyuLau
I am confused too. Is it possible that you specified two dataset configs at the same time, so that the correct one is overridden by the wrong one?
E.g.

_base_ = [
    '../../_base_/recog_datasets/ST_SA_MJ_train.py',
    '../../_base_/recog_datasets/ST_MJ_train.py',
]

Are you training with master_r31_12e_ST_MJ_SA.py ?

MingyuLau · 2022-06-24T15:18:41Z

@Mountchicken
Yes, I am training with master_r31_12e_ST_MJ_SA.py. This is my training file:

_base_ = [
    '../../_base_/default_runtime.py', '../../_base_/recog_models/master.py',
    '../../_base_/schedules/schedule_adam_step_12e.py',
    '../../_base_/recog_pipelines/master_pipeline.py',
    '../../_base_/recog_datasets/ST_SA_MJ_train.py',
    '../../_base_/recog_datasets/academic_test.py'
]

train_list = {{_base_.train_list}}
test_list = {{_base_.test_list}}

train_pipeline = {{_base_.train_pipeline}}
test_pipeline = {{_base_.test_pipeline}}

data = dict(
    samples_per_gpu=8,
    workers_per_gpu=4,
    val_dataloader=dict(samples_per_gpu=8),
    test_dataloader=dict(samples_per_gpu=8),
    train=dict(
        type='UniformConcatDataset',
        datasets=train_list,
        pipeline=train_pipeline),
    val=dict(
        type='UniformConcatDataset',
        datasets=test_list,
        pipeline=test_pipeline),
    test=dict(
        type='UniformConcatDataset',
        datasets=test_list,
        pipeline=test_pipeline))

evaluation = dict(interval=1, metric='acc')

MingyuLau · 2022-06-27T01:21:46Z

@Mountchicken
I know where the error lies. I debugged and found that in ST_SA_MJ_train.py the parameters of train3 overwrite the parameters of train1.
Can you help me fix this error?
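The behaviour described here matches how a dict comprehension copies a dict: only the top level is new, while nested dicts such as loader are still shared between the copies. A minimal sketch with trimmed-down configs (the values are shortened for illustration):

```python
train1 = dict(
    ann_file='label.lmdb',
    loader=dict(
        file_format='lmdb',
        parser=dict(type='LineJsonParser')))

# Shallow copy: a new outer dict, but 'loader' is the very same object
train3 = {key: value for key, value in train1.items()}
train3['ann_file'] = 'label.txt'          # rebinding a top-level key is safe
train3['loader']['file_format'] = 'txt'   # mutates the shared nested dict

print(train3['loader'] is train1['loader'])   # True
print(train1['ann_file'])                     # label.lmdb
print(train1['loader']['file_format'])        # txt
```

So train1 keeps its lmdb annotation path but ends up with file_format='txt', which would explain why edits to train1's loader appeared to have no effect.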

Mountchicken · 2022-06-27T02:35:40Z

@MingyuLau
Thanks for pointing that out. That's a really well hidden bug. Using the following config can fix the problem.

# Text Recognition Training set, including:
# Synthetic Datasets: SynthText, Syn90k

train_root = 'data/mixture'

train_img_prefix1 = f'{train_root}/Syn90k/mnt/ramdisk/max/90kDICT32px'
train_ann_file1 = f'{train_root}/Syn90k/label.lmdb'

train1 = dict(
    type='OCRDataset',
    img_prefix=train_img_prefix1,
    ann_file=train_ann_file1,
    loader=dict(
        type='AnnFileLoader',
        repeat=1,
        file_format='lmdb',
        parser=dict(type='LineJsonParser', keys=['filename', 'text'])),
    pipeline=None,
    test_mode=False)

train_img_prefix2 = f'{train_root}/SynthText/' + \
    'synthtext/SynthText_patch_horizontal'
train_ann_file2 = f'{train_root}/SynthText/label.lmdb'

train_img_prefix3 = f'{train_root}/SynthText_Add'
train_ann_file3 = f'{train_root}/SynthText_Add/label.txt'

train2 = {key: value for key, value in train1.items()}
train2['img_prefix'] = train_img_prefix2
train2['ann_file'] = train_ann_file2

train3 = dict(
    type='OCRDataset',
    img_prefix=train_img_prefix3,
    ann_file=train_ann_file3,
    loader=dict(
        type='AnnFileLoader',
        repeat=1,
        file_format='txt',
        parser=dict(
            type='LineStrParser',
            keys=['filename', 'text'],
            keys_idx=[0, 1],
            separator=' ')),
    pipeline=None,
    test_mode=False)

train_list = [train1, train2, train3]
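An alternative that keeps the copy-and-override style is copy.deepcopy from the standard library, which gives each dataset its own loader dict. The sketch below uses shortened values and is only an illustration, not the config that was merged. Whether an import is acceptable inside a config file depends on the config loader in use, so treat that as an assumption to verify.

```python
import copy

train1 = dict(
    type='OCRDataset',
    ann_file='label.lmdb',
    loader=dict(
        type='AnnFileLoader',
        repeat=1,
        file_format='lmdb',
        parser=dict(type='LineJsonParser', keys=['filename', 'text'])),
    pipeline=None,
    test_mode=False)

# Deep copy: nested dicts are duplicated, so edits stay local to train3
train3 = copy.deepcopy(train1)
train3['ann_file'] = 'label.txt'
train3['loader']['file_format'] = 'txt'
train3['loader']['parser'] = dict(
    type='LineStrParser',
    keys=['filename', 'text'],
    keys_idx=[0, 1],
    separator=' ')

print(train1['loader']['file_format'], train1['loader']['parser']['type'])
print(train3['loader']['file_format'], train3['loader']['parser']['type'])
```

Writing train3 out in full, as in the config above, avoids the import entirely at the cost of some repetition.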

Mountchicken · 2022-06-27T02:37:24Z

BTW, it would be appreciated if you could also raise a PR to help us fix it.

MingyuLau · 2022-06-27T06:13:42Z

It's my pleasure, and I'm sincerely grateful for all your help in this problem!

mm-assistant bot assigned gaotongxiao Jun 23, 2022

Mountchicken closed this as completed Jun 27, 2022
