
KeyError: "['file_path_mask_eroded_3'] not in index" #119

Open
XYAskWhy opened this issue May 29, 2018 · 39 comments
@XYAskWhy commented May 29, 2018

When running locally in pure Python with python main.py -- train_evaluate_predict --pipeline_name unet --chunk_size 5000, the following error occurs. Any help?

neptune: Executing in Offline Mode.
neptune: Executing in Offline Mode.
2018-05-29 16-16-52 mapping-challenge >>> training
neptune: Executing in Offline Mode.
2018-05-29 16-16-55 steps >>> step xy_train adapting inputs
2018-05-29 16-16-55 steps >>> step xy_train fitting and transforming...
Traceback (most recent call last):
File "main.py", line 282, in
action()
File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 722, in call
return self.main(*args, **kwargs)
File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "main.py", line 79, in train
_train(pipeline_name, dev_mode)
File "main.py", line 106, in _train
pipeline.fit_transform(data)
File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 103, in fit_transform
step_inputs[input_step.name] = input_step.fit_transform(data)
File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 103, in fit_transform
step_inputs[input_step.name] = input_step.fit_transform(data)
File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 103, in fit_transform
step_inputs[input_step.name] = input_step.fit_transform(data)
[Previous line repeated 5 more times]
File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 109, in fit_transform
step_output_data = self._cached_fit_transform(step_inputs)
File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 120, in _cached_fit_transform
step_output_data = self.transformer.fit_transform(**step_inputs)
File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 253, in fit_transform
return self.transform(*args, **kwargs)
File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/preprocessing/misc.py", line 17, in transform
y = meta[self.y_columns].values
File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/pandas/core/frame.py", line 2133, in getitem
return self._getitem_array(key)
File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/pandas/core/frame.py", line 2177, in _getitem_array
indexer = self.loc._convert_to_indexer(key, axis=1)
File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1269, in _convert_to_indexer
.format(mask=objarr[mask]))
KeyError: "['file_path_mask_eroded_3'] not in index"

@jakubczakon
Collaborator

@XYAskWhy Hi, did you generate the metadata?
You can do that by running python main.py -- prepare_metadata, and you also need to prepare the masks with python main.py -- prepare_masks.

If you have already done that, then open your metadata csv and check which columns are available. Remember that you can choose how to generate your target masks, so your csv may contain different columns. You can choose which column should be used as the target mask in pipeline_config.py:

Y_COLUMNS = ['file_path_mask_eroded_0_dilated_0']
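The failure in the traceback can be reproduced with a minimal pandas sketch (the DataFrame contents below are made up to mimic a stage1_metadata.csv generated before the masks existed): selecting a Y_COLUMNS entry that is missing from the metadata raises exactly this KeyError.

```python
import pandas as pd

# Hypothetical slice of stage1_metadata.csv before prepare_masks was run,
# so no file_path_mask_* column exists yet
meta = pd.DataFrame({
    "ImageId": ["000000001", "000000002"],
    "file_path_image": ["/data/images/1.jpg", "/data/images/2.jpg"],
})

Y_COLUMNS = ["file_path_mask_eroded_0_dilated_0"]

try:
    y = meta[Y_COLUMNS].values  # the same selection the pipeline performs
except KeyError as err:
    print("KeyError:", err)
```

The exact message wording depends on the pandas version, but the KeyError itself is what the pipeline surfaces when the mask column is absent.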

@dslwz2008

I also encountered this problem. I will have a try. Thanks!

@XYAskWhy
Author

Thanks @jakubczakon, I had already run prepare_metadata and prepare_masks. The problem is that the masks must be prepared first.

@dslwz2008

@XYAskWhy After I executed python main.py -- prepare_masks, this error still exists. What columns are in your stage1_metadata.csv? There are only the following columns in my file: ImageId, file_path_image, is_train, is_valid, is_test, n_buildings. Is there anything wrong? What else do I need to do?

@XYAskWhy
Author

XYAskWhy commented May 30, 2018

@dslwz2008 If you prepared the metadata first, you need to redo it after you prepare the masks. The newly generated csv file will then include an extra column like 'file_path_mask_eroded_0_dilated_0'.

@jakubczakon
Collaborator

jakubczakon commented May 30, 2018

@XYAskWhy @dslwz2008 I will fix the readme today, but yes, as @XYAskWhy said: when the metadata is created, it looks for the folders with target masks and creates the columns based on that information.
It may seem over the top at first glance but creating target masks for this problem is very far from trivial.
The following ideas are all viable options:

  • overlay target masks
  • erode masks first and overlay
  • erode large masks but dilate small masks and overlay (to increase signal for the small objects)
  • drop border masks that are very thin (<2 pixels) and then overlay to decrease false signals from mislabeled edge objects

I hope this helps!
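As a rough illustration of the erode/dilate options listed above, scipy.ndimage can be applied to a toy mask. This is only a sketch of the general morphology operations, not the project's actual mask-generation code:

```python
import numpy as np
from scipy import ndimage

# Toy binary "building" mask: a 3x3 block of True pixels
mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True

# Erosion shrinks objects (separates touching buildings),
# dilation grows them (boosts signal for small objects)
eroded = ndimage.binary_erosion(mask, iterations=1)
dilated = ndimage.binary_dilation(mask, iterations=1)

print(mask.sum(), eroded.sum(), dilated.sum())  # eroded < original < dilated
```

Combining the two (erode large objects, dilate small ones, then overlay) is what produces target-mask variants like eroded_2_dilated_3.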

@dslwz2008

I re-executed the commands
python main.py -- prepare_masks and
neptune experiment run main.py -- prepare_metadata \ --train_data \ --valid_data \ --test_data in order. However, there is still no file_path_mask_eroded_0_dilated_0 column in my stage1_metadata.csv. I am using the master branch. What else do I need to do? @XYAskWhy @jakubczakon

@jakubczakon
Collaborator

jakubczakon commented May 30, 2018

@dslwz2008 what are your paths in the neptune.yaml ?

  data_dir:                   /path/to/data
  meta_dir:                   /path/to/data
  masks_overlayed_dir:        /path/to/masks_overlayed
  masks_overlayed_eroded_dir: /path/to/masks_overlayed_eroded
  experiment_dir:             /path/to/work/dir

Can you confirm that your masks did generate? The masks_overlayed folder should be around 100 GB.

@dslwz2008

dslwz2008 commented May 30, 2018

This is my neptune.yaml:

data_dir: /home/shenshen/Programs/mc_data
meta_dir: /home/shenshen/Programs/mc_data
masks_overlayed_dir: /home/shenshen/Programs/mc_dat_eroded_2_dilated_3
masks_overlayed_eroded_dir: /home/shenshen/Programs/mc_dat_eroded_2_dilated_3
experiment_dir: /home/shenshen/Programs/open-solution-mapping-challenge

I am not sure if the masks_overlayed_dir setting is correct. @jakubczakon

@jakubczakon
Collaborator

Ok I see. You just need to have something like:

masks_overlayed_dir: /home/shenshen/Programs/masks_overlayed/

and it will create this particular eroded_2_dilated_3 setting automatically.
Below is the piece of the code that deals with this part:

            images_path_to_write = images_path
            masks_overlayed_dir_ = masks_overlayed_dir[:-1]
            masks_dir_prefix = os.path.split(masks_overlayed_dir_)[1]
            masks_overlayed_sufix_to_write = []
            for masks_dir in os.listdir(meta_dir):
                masks_dir_name = os.path.split(masks_dir)[1]
                if masks_dir_name.startswith(masks_dir_prefix):
                    masks_overlayed_sufix_to_write.append(masks_dir_name[len(masks_dir_prefix):])

So you need to define your path ending with /. I will submit an issue to clean that up right away, but I am not sure if I will have time to change it today, as I want to do some last-minute postprocessing of the newest models and start generating the final submission.
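The trailing-slash requirement can be seen by pulling the prefix/suffix logic above into a standalone sketch (the paths and directory names are made up for illustration):

```python
import os

def mask_suffixes(masks_overlayed_dir, sibling_dir_names):
    # Same trick as the snippet above: drop the last character (meant to be
    # the trailing slash), use the basename as a prefix, and keep the
    # remainder of each sibling directory name that matches it
    prefix = os.path.split(masks_overlayed_dir[:-1])[1]
    return [name[len(prefix):]
            for name in sibling_dir_names
            if name.startswith(prefix)]

siblings = ["masks_overlayed_eroded_2_dilated_3"]

# With the trailing slash the suffix comes out right:
print(mask_suffixes("/data/masks_overlayed/", siblings))  # ['_eroded_2_dilated_3']

# Without it, the last character of the directory name itself is chopped
# off and the extracted suffix is garbled:
print(mask_suffixes("/data/masks_overlayed", siblings))   # ['d_eroded_2_dilated_3']
```

That garbled suffix is then baked into the metadata column name, which is why the expected file_path_mask_* column never appears.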

@jakubczakon
Collaborator

@dslwz2008 @XYAskWhy by the way, I updated the readme.
The most important part is that the best training results were achieved when training with a distance- and size-weighted loss, so the pipeline that needs to be chosen is unet_weighted instead of unet. Also, when running predictions, using replication padding + test-time augmentation gave us significant improvements. The pipeline to run is called unet_padded_tta.

@jakubczakon
Collaborator

@dslwz2008 also I would change the

experiment_dir: /home/shenshen/Programs/open-solution-mapping-challenge

to something particular to this experiment. All the models will be saved in that directory so I am not sure if you want to have it as generic as open-solution-mapping-challenge.
I usually have something like this:

experiment_dir: ...mapping-challenge/experiments/resnet34_crop256_erode2_dilate_3

or something like that.

@dslwz2008

OK. Thanks!
After python main.py -- prepare_masks, folder mc_dat_eroded_2_dilated_3 was generated. According to my statistics, it takes up 123.1GB of space.
When was this folder generated?
masks_overlayed_dir: /home/shenshen/Programs/masks_overlayed/

@jakubczakon
Collaborator

Well you misspecified the

masks_overlayed_dir: 

So I don't think you have that folder.

Now there are 2 options. You could either specify it correctly:

masks_overlayed_dir: /home/shenshen/Programs/masks_overlayed/

and rerun generation of the masks (takes time)

or you could simply go

mv /home/shenshen/Programs/mc_dat_eroded_2_dilated_3  /home/shenshen/Programs/masks_overlayed

and rerun metadata creation

@dslwz2008

Thank you very much! I understand. I did not create masks_overlayed folder before prepare_masks.
So how is this item set up? When is the content in it generated?
masks_overlayed_eroded_dir: ???

@jakubczakon
Collaborator

This one should actually be dropped. It is a remnant of older days when we only thought of 2 configurations of those target masks :) I will drop it from the readme and yamls.

@jakubczakon
Collaborator

Well, you generated all those masks with prepare_masks; you just put them in the wrong directory.

@dslwz2008

I finally configured it correctly! Unfortunately, I did not find the column file_path_mask_eroded_0_dilated_0 in the generated stage1_metadata.csv file...
The first time I created the metadata it took several hours, but now it's generated in less than a minute. So I wonder: does this site (neptune.ml) cache results?

@jakubczakon
Collaborator

Well, metadata generation should be pretty fast; it's only filepath munging. Also, since you are generating masks with erosion 2 and dilation 3, your column is actually file_path_mask_eroded_2_dilated_3, if I am correct.

@dslwz2008

[screenshot: head of stage1_metadata.csv]

This is the head of stage1_metadata.csv. No similar column appears.

@jakubczakon
Collaborator

Did you change the mask_overlayed dir in neptune.yaml to

masks_overlayed_dir: /home/shenshen/Programs/masks_overlayed/

and recreate the metadata?
Can you remove stage1_metadata.csv and run it again?

@dslwz2008

Yes, I changed the masks_overlayed_dir in neptune.yaml and deleted stage1_metadata.csv. Then I ran prepare_metadata again. The result is the same as in the picture above.

@jakubczakon
Collaborator

jakubczakon commented May 30, 2018

what does this folder /home/shenshen/Programs/masks_overlayed/ contain ?

@dslwz2008

[screenshot: contents of the masks_overlayed folder]

@jakubczakon
Collaborator

Okay, I checked on my setup and I actually have folders like:

.../masks_overlayed_eroded_3_dilated_2

So I believe you should change the name of your directory to

../masks_overlayed_eroded_2_dilated_3

and rerun the metadata generation, and you will be ready to go.

@dslwz2008

Still no column file_path_mask_eroded_2_dilated_3. I'm going to carefully analyze the code and try again. Thank you very much.

@jakubczakon
Collaborator

Ok, cool.
But one last try:

Change the folder name by:

mv ../masks_overlayed ../masks_overlayed_eroded_2_dilated_3

But LEAVE the name in the neptune.yaml as:

masks_overlayed_dir: ../masks_overlayed/

Rerun the metadata generation.

I think @XYAskWhy got it to work pretty quickly. Any advice?

@dslwz2008

Still not working... This is really a weird thing.
How about re-clone this repo and start over again? Which branch do you recommend?
@XYAskWhy How did you get it to work?

@XYAskWhy
Author

XYAskWhy commented May 31, 2018

Got the local training running using the older master version, but still struggling with evaluating/predicting. The updated master version should be OK as well. @dslwz2008

@jakubczakon
Collaborator

jakubczakon commented May 31, 2018

Master should work, and dev too, as I am generating the final predictions with it right now. @XYAskWhy are you running unet_padded_tta? Check the evaluate_checkpoint.py script to see how to add missing transformers (just run touch transformer_name in the transformers dir).

@XYAskWhy
Author

XYAskWhy commented Jun 3, 2018

@jakubczakon Thanks for the tip. When I run python main.py -- evaluate --pipeline_name unet_padded_tta --chunk_size 200, it does raise the following error:

(pytorch0.3) rs@rsLab:/media/rs/3EBAC1C7BAC17BC1/Xavier/Segmentation/open-solution-mapping-challenge$ python main.py -- evaluate --pipeline_name unet_padded_tta --chunk_size 200
neptune: Executing in Offline Mode.
neptune: Executing in Offline Mode.
2018-06-02 21-48-05 mapping-challenge >>> evaluating
/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py:895: DtypeWarning: Columns (6,7) have mixed types. Specify dtype option on import or set low_memory=False.
return ctx.invoke(self.callback, **ctx.params)
neptune: Executing in Offline Mode.
0%| | 0/5 [00:00<?, ?it/s]2018-06-02 21-48-13 steps >>> step xy_inference adapting inputs
2018-06-02 21-48-13 steps >>> step xy_inference loading transformer...
2018-06-02 21-48-13 steps >>> step xy_inference transforming...
2018-06-02 21-48-13 steps >>> step xy_inference adapting inputs
2018-06-02 21-48-13 steps >>> step xy_inference loading transformer...
2018-06-02 21-48-13 steps >>> step xy_inference transforming...
2018-06-02 21-48-13 steps >>> step tta_generator adapting inputs
Traceback (most recent call last):
File "main.py", line 282, in
action()
File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 722, in call
return self.main(*args, **kwargs)
File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "main.py", line 117, in evaluate
_evaluate(pipeline_name, dev_mode, chunk_size)
File "main.py", line 130, in _evaluate
prediction = generate_prediction(meta_valid, pipeline, logger, CATEGORY_IDS, chunk_size)
File "main.py", line 238, in generate_prediction
return _generate_prediction_in_chunks(meta_data, pipeline, logger, category_ids, chunk_size)
File "main.py", line 271, in _generate_prediction_in_chunks
output = pipeline.transform(data)
File "/media/rs/3EBAC1C7BAC17BC1/Xavier/Segmentation/open-solution-mapping-challenge/steps/base.py", line 152, in transform
step_inputs[input_step.name] = input_step.transform(data)
File "/media/rs/3EBAC1C7BAC17BC1/Xavier/Segmentation/open-solution-mapping-challenge/steps/base.py", line 152, in transform
step_inputs[input_step.name] = input_step.transform(data)
File "/media/rs/3EBAC1C7BAC17BC1/Xavier/Segmentation/open-solution-mapping-challenge/steps/base.py", line 152, in transform
step_inputs[input_step.name] = input_step.transform(data)
[Previous line repeated 8 more times]
File "/media/rs/3EBAC1C7BAC17BC1/Xavier/Segmentation/open-solution-mapping-challenge/steps/base.py", line 158, in transform
step_output_data = self._cached_transform(step_inputs)
File "/media/rs/3EBAC1C7BAC17BC1/Xavier/Segmentation/open-solution-mapping-challenge/steps/base.py", line 168, in _cached_transform
raise ValueError('No transformer cached {}'.format(self.name))
ValueError: No transformer cached tta_generator

I looked into evaluate_checkpoint.py and added 'tta_generator' to MISSING_TRANSFORMERS, but it didn't help. What might be the problem, and what should I do? Also, I don't think I fully understand your comment 'just run touch transformer_name in transformers dir'.

@kamil-kaczmarek
Contributor

@apyskir, @taraspiotr can you provide some help here (check the previous message)? Thx :)

@jakubczakon
Collaborator

@XYAskWhy when you run evaluate or predict, all transformers for a given pipeline need to be persisted in the transformers folder. Since we did not train some pieces of this pipeline during training, you need to either run training on unet_padded_tta (I don't advise it, though you could with --dev_mode) or simply create those transformers by running touch PATH/TO/TRANSFORMER/DIR/TRANSFORMER_NAME, just as I did in evaluate_checkpoint.py.

This issue will be solved by having an is_trainable flag in the Step constructor, but for now you need to persist all transformers (trainable or not).
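The touch workaround described above can also be done from Python with pathlib. This is a sketch: the transformer names below are examples, and a temporary directory stands in for your actual experiment_dir/transformers folder:

```python
import tempfile
from pathlib import Path

# In practice this would be <experiment_dir>/transformers; a temp dir
# keeps the sketch self-contained
transformers_dir = Path(tempfile.mkdtemp()) / "transformers"
transformers_dir.mkdir(parents=True, exist_ok=True)

# Persist empty placeholder files for the non-trainable steps, equivalent
# to running `touch transformers/<name>` in a shell. Use the step names
# from your own pipeline here.
for name in ["tta_generator", "tta_aggregator"]:
    (transformers_dir / name).touch()

print(sorted(p.name for p in transformers_dir.iterdir()))
# prints ['tta_aggregator', 'tta_generator']
```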

@jakubczakon
Collaborator

@XYAskWhy the is_trainable flag was added in #142 and is now on master.

@XYAskWhy
Author

@jakubczakon Thanks! Will check it out soon.

@XYAskWhy
Author

XYAskWhy commented Jun 21, 2018

@jakubczakon Thanks for the update. But the README still doesn't include the prepare_masks step, which is necessary, right? Also, the script has been running for two days since I executed python main.py -- prepare_masks; does this seem normal to you?

@jakubczakon
Collaborator

jakubczakon commented Jun 21, 2018

@XYAskWhy it is included in the readme now. Thanks for spotting that.
When it comes to performance it depends on the number of workers (and threads) you are using.

@Aayushktyagi

@jakubczakon is there a shorter way to evaluate the model on test images, or do we have to prepare the masks first, which is time-consuming?
Thanks

@jakubczakon
Collaborator

Well, you can simply use predict_on_dir, which takes a directory of images as input:

python main.py predict_on_dir \
--pipeline_name unet_tta_scoring_model \
--chunk_size 1000 \
--dir_path path/to/inference_directory \
--prediction_path path/to/predictions.json

That will get you predicted segmentation masks, which you can later plot using the results exploration notebook.

It is not a proper evaluation but it is definitely quicker.
