KeyError(f"None of [{key}] are in the [{axis_name}]" #50

lmohit95 · 2022-11-18T02:08:45Z

I am getting KeyError(f"None of [{key}] are in the [{axis_name}]" while running python models/load_data.py.
The error occurs in this line.

I have set the appropriate path for criteo_dataset in load_data.py. The download, extraction, and local file creation steps all succeed. I downloaded the dataset from this website: https://www.kaggle.com/competitions/criteo-display-ad-challenge/data.
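For readers who hit the same message: pandas raises this KeyError when a list of labels is used for indexing and none of them exist on that axis. The sketch below is only an illustration of one common way this happens with headerless, tab-separated data, not a description of the actual bug in load_data.py. The sample rows and the column names `label`, `I1`, and `C1` are assumptions made for the example.

```python
import io

import pandas as pd

# Tiny stand-in for a headerless, tab-separated file (made-up values).
RAW = "1\t5\t68fd1e64\n0\t2\t80e26c9b\n"
COLUMNS = ["label", "I1", "C1"]  # illustrative names, not taken from load_data.py


def select_without_names():
    # Without header=None the first data row becomes the header,
    # so none of the requested labels exist in the columns.
    df = pd.read_csv(io.StringIO(RAW), sep="\t")
    return df[["label", "I1"]]


def select_with_names():
    # Passing explicit names makes the same selection valid.
    df = pd.read_csv(io.StringIO(RAW), sep="\t", header=None, names=COLUMNS)
    return df[["label", "I1"]]


try:
    select_without_names()
    error_message = ""
except KeyError as exc:
    error_message = str(exc)

fixed = select_with_names()
print(error_message)
print(fixed)
```

Printing `df.columns` right before the failing line is usually the quickest way to see which labels pandas actually has.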


Hankpipi · 2022-11-19T13:19:39Z

Hi @lmohit95, this has been fixed in #52. Sorry for the delay.

lmohit95 · 2022-11-19T23:47:25Z

Thank you. load_data.py works perfectly now. The dataset is downloaded and processed.
But when I run the bash tests/local_dcn_criteo.sh command, I get the following error. I have tried changing the batch size at line 62 in run_hetu.py and rerunning the command, but I still get the same error.

I am able to run other DLRM implementations, such as Facebook's open-source DLRM, on the GPU, so I believe the CUDA setup is correct.

Hankpipi · 2022-11-20T02:42:56Z

@lmohit95, the Hetu main branch has been updated to enable dynamic memory allocation. Please pull the new code and try again.

lmohit95 · 2022-11-20T02:45:42Z

> @lmohit95, Hetu main branch has been updated, please try to pull the new code and try again.

Thank you. It works now. Sorry for asking a lot of questions. I am now facing this issue while training on the Criteo dataset.

#50 (comment)

Hankpipi · 2022-11-20T02:59:52Z

> > @lmohit95, Hetu main branch has been updated, please try to pull the new code and try again.
>
> Thank you. It works now. Sorry for asking a lot of questions. I am now facing this issue while training on the Criteo dataset.
>
> #50 (comment)

I mean that this problem was solved by #47, which was merged recently. Does it still happen after you pull these changes?

lmohit95 · 2022-11-20T03:12:53Z

I get the following error when I run bash tests/local_dcn_criteo.sh. I created a hetu_config.yaml file in the tmp folder and copied in the contents provided in README.MD. I am trying to run HET on a single GPU.

To avoid this error, I deliberately set file = None in the __init__ function of distribute.py. With that change, I get the out-of-memory error instead.

Hankpipi · 2022-11-20T03:26:42Z

@lmohit95, you can also replace line 120 with the following code to avoid the first error:

Hetu/examples/ctr/run_hetu.py

Lines 120 to 126 in 1684091

    
           if args.comm is None: 
        
               executor = ht.Executor(eval_nodes, ctx=ht.gpu(0), cstable_policy=args.cache, 
        
                                      bsp=args.bsp, cache_bound=args.bound, seed=123, log_path=executor_log_path) 
        
           else: 
        
               strategy = ht.dist.DataParallel(aggregate=args.comm) 
        
               executor = ht.Executor(eval_nodes, dist_strategy=strategy, cstable_policy=args.cache, 
        
                                      bsp=args.bsp, cache_bound=args.bound, seed=123, log_path=executor_log_path)

For the OOM error, #47 implements dynamic memory allocation, and peak GPU memory usage will be halved when you run bash tests/local_dcn_criteo.sh.

Maybe you haven't pulled the latest code yet?

lmohit95 · 2022-11-21T20:39:47Z

Thanks a lot for everything. The tests are working perfectly now. I had been using the forked repo mentioned in the HET paper.
When I run the python run_hetu.py --model dcn_criteo --all --val command, I get the following error:

I pulled the latest code and downloaded the Criteo dataset by running load_data.py.

Hankpipi · 2022-11-22T10:47:24Z

@lmohit95, thanks for your feedback, and sorry for my mistake.
You are right that there were still some errors in the dataset processing; I have fixed them in #54.
Please pull my code and run python load_data.py again before running run_hetu.py --model dcn_criteo --all --val.

lmohit95 · 2022-11-22T19:03:00Z

Thank you. Everything works now. I just wanted to clarify something regarding run_hetu.py --model dcn_criteo --all --val. This command trains and tests HET on the Criteo dataset, right?

The paper mentions that the training process can take hours (Fig 6), but in my case the training runs for a total of 10 epochs with far less overall runtime.

Hsword · 2022-11-22T20:19:37Z

It looks like you are running in local execution mode rather than distributed training, which is why it is much faster.
Also note that test_auc is reported every 1/10 of an epoch, as described in this help string:

Hetu/examples/ctr/run_hetu.py

Line 190 in acae42a

    help="num of epochs, each train 1/10 data")
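As a quick illustration of the arithmetic implied by that help string: if each reported epoch trains on 1/10 of the data, then 10 epochs amount to one full pass over the training set. The helper name `full_passes` below is made up for this sketch and is not part of run_hetu.py.

```python
from fractions import Fraction


def full_passes(num_epochs, fraction_per_epoch=Fraction(1, 10)):
    """Return how many complete passes over the data num_epochs covers."""
    return num_epochs * fraction_per_epoch


# The 10 epochs reported in this thread cover the data once in total.
passes = full_passes(10)
print(passes)
```

Under the same assumption, 100 such epochs would be needed to cover the data ten times.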

lmohit95 · 2022-11-23T05:46:32Z

Got it. Thanks for all the help!!!

lmohit95 · 2022-12-09T20:17:40Z

Hello,
Thanks for all the help so far.
I am running HET on the Criteo dataset on a single GPU node, with HETU_VERSION = 'gpu', in HYBRID mode. I ran `bash examples/ctr/tests/hybrid_wdl_criteo.sh`, but I am getting the following error:

This is my configuration file

shared :
  DMLC_PS_ROOT_URI : 127.0.0.1
  DMLC_PS_ROOT_PORT : 13100
  DMLC_NUM_WORKER : 2
  DMLC_NUM_SERVER : 1
  DMLC_PS_VAN_TYPE : p3
launch :
  worker : 2
  server : 1
  scheduler : true
nodes:
  - host: lmohit95
    servers: 1
    workers: 2
    chief: true
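The error text is not shown here, so the following does not diagnose it. It is only a small sanity check, sketched under the assumption that the worker and server counts repeated in the shared, launch, and nodes sections are meant to agree. Whether Hetu actually requires this is not confirmed in this thread. The dict mirrors the counts from the configuration above, and the function name `count_mismatches` is made up for the example.

```python
# Counts copied from the configuration above; the dict layout is illustrative.
config = {
    "shared": {"DMLC_NUM_WORKER": 2, "DMLC_NUM_SERVER": 1},
    "launch": {"worker": 2, "server": 1, "scheduler": True},
    "nodes": [{"host": "lmohit95", "servers": 1, "workers": 2, "chief": True}],
}


def count_mismatches(cfg):
    """Return messages for worker/server counts that disagree across sections."""
    problems = []
    roles = (("worker", "DMLC_NUM_WORKER"), ("server", "DMLC_NUM_SERVER"))
    for role, shared_key in roles:
        shared = cfg["shared"][shared_key]
        launch = cfg["launch"][role]
        # Node entries use the plural form: "workers" and "servers".
        per_node = sum(node.get(role + "s", 0) for node in cfg["nodes"])
        if not (shared == launch == per_node):
            problems.append(
                f"{role}: shared={shared}, launch={launch}, nodes={per_node}"
            )
    return problems


mismatches = count_mismatches(config)
print(mismatches or "worker and server counts agree")
```

For the configuration posted above, the three sections agree on 2 workers and 1 server, so this particular check finds nothing to report.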

Hankpipi mentioned this issue Nov 19, 2022

fix criteo dataset error #52

Merged

Hsword closed this as completed in #52 Nov 20, 2022

Hsword added the bug and HET labels Dec 30, 2022
