NaiveBayes hang in the middle of localAlgorithm.compute #2059

Closed
xwu99 opened this issue Jan 27, 2022 · 2 comments

@xwu99
Contributor

xwu99 commented Jan 27, 2022

Describe the bug

Each local table is a 150 x 262144 (row x column) sparse table. The process hangs at this line:
https://github.com/oap-project/oap-mllib/blob/master/mllib-dal/src/main/native/NaiveBayesDALImpl.cpp#L46

To Reproduce
Steps to reproduce the behavior:

The workload uses Naive Bayes to classify spam. The feature table is a 600 x 262144 sparse table.
With a small table such as 16 x 262144, it passes.
I am not sure whether I can save a DAL sparse numeric table to a file. If that works, I can provide the dataset.
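
One way to share such a dataset without relying on any table serialization API is to dump the raw CSR arrays to a plain text file. The following is a minimal sketch under stated assumptions: `dumpCsrToFile` is a hypothetical helper (not part of oneDAL or OAP MLlib), and it assumes the values, column indices, and row offsets have already been copied out of the table into vectors. The `indexBase` parameter is 1 for one-based arrays and 0 for zero-based ones.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of oneDAL or OAP MLlib.
// Writes a header line "rows cols nnz" followed by one "row col value"
// line per stored entry. Indices in the file are zero-based.
bool dumpCsrToFile(const std::string &path,
                   const std::vector<float> &values,
                   const std::vector<std::size_t> &colIndices,
                   const std::vector<std::size_t> &rowOffsets,
                   std::size_t numCols, std::size_t indexBase) {
    if (rowOffsets.empty() || colIndices.size() != values.size()) return false;
    std::ofstream out(path);
    if (!out) return false;

    const std::size_t numRows = rowOffsets.size() - 1;
    out << numRows << ' ' << numCols << ' ' << values.size() << '\n';

    for (std::size_t r = 0; r < numRows; ++r) {
        // Refuse malformed offsets instead of reading out of bounds.
        if (rowOffsets[r] < indexBase || rowOffsets[r + 1] < rowOffsets[r]) return false;
        const std::size_t begin = rowOffsets[r] - indexBase;
        const std::size_t end = rowOffsets[r + 1] - indexBase;
        if (end > values.size()) return false;
        for (std::size_t k = begin; k < end; ++k) {
            if (colIndices[k] < indexBase) return false;
            out << r << ' ' << (colIndices[k] - indexBase) << ' ' << values[k] << '\n';
        }
    }
    out.flush();
    return static_cast<bool>(out);
}
```

A text dump like this is large for a 600 x 262144 table only in proportion to the number of non-zero entries, so it should stay small for sparse spam features.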

Expected behavior
The computation should not hang, or it should print an error message.

Output/Screenshots

Features row x column: 150 x 262144
oneDAL (native): Number of CPU threads used: 1
oneDAL (native): training model with fastCSR method
NaiveBayes (native): start local step compute
... something should be printed here once localAlgorithm.compute finishes, but the process hangs forever.

The following is the correct output when I use a smaller local table (row x column: 16 x 262144):
Features row x column: 16 x 262144
oneDAL (native): Number of CPU threads used: 1
oneDAL (native): training model with fastCSR method
NaiveBayes (native): start local step compute
local step compute finished
NaiveBayes (native): local step compute took 0.022 secs
NaiveBayes (native): start ccl::gather
NaiveBayes (native): ccl::gather took 0.228 secs
NaiveBayes (native): start master step compute
NaiveBayes (native): master step compute took 0.041 secs
oneDAL (native): training model finished
training took 1.058 secs
NaiveBayesDAL compute took 1.063481252 secs
NaiveBayesDAL result conversion took 0.109307419 secs

Environment:

  • OS: Ubuntu 18.04
  • Compiler: Intel(R) oneAPI DPC++/C++ Compiler 2021.4.0 (2021.4.0.20210924)
  • Version: oneAPI 2021.4.0
@lordoz234 lordoz234 self-assigned this Feb 8, 2022
@lordoz234
Contributor

Thanks for raising this issue. I am working on fixing this problem.

@xwu99
Contributor Author

xwu99 commented Feb 22, 2022

@lordoz234 Thanks for the work. We found that the root cause is in our sparse data conversion code. We will let you know if there are any additional problems. The issue is closed.
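
The thread does not say what exactly was wrong in the conversion code, so the following is only a generic sanity check that could run on CSR arrays before they are handed to a native compute step. It is a sketch under stated assumptions: `validateCsr` is a hypothetical helper (not an API of oneDAL or OAP MLlib), and `indexBase` must be set to whatever indexing convention the consuming library expects (1 for one-based, 0 for zero-based). Malformed offsets or out-of-range column indices are a plausible way for a sparse kernel to read invalid memory or loop unexpectedly, though that is an assumption and not a confirmed diagnosis of this issue.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of oneDAL or OAP MLlib.
// Returns an empty string if the arrays form a consistent CSR layout,
// otherwise a description of the first problem found.
std::string validateCsr(const std::vector<std::size_t> &colIndices,
                        const std::vector<std::size_t> &rowOffsets,
                        std::size_t numValues, std::size_t numCols,
                        std::size_t indexBase) {
    if (rowOffsets.empty()) return "rowOffsets is empty";
    if (colIndices.size() != numValues)
        return "colIndices and values differ in length";
    if (rowOffsets.front() != indexBase)
        return "first row offset does not equal the index base";
    if (rowOffsets.back() != numValues + indexBase)
        return "last row offset does not match the number of stored values";

    // Row offsets must never decrease from one row to the next.
    for (std::size_t r = 0; r + 1 < rowOffsets.size(); ++r) {
        if (rowOffsets[r] > rowOffsets[r + 1])
            return "row offsets decrease at row " + std::to_string(r);
    }
    // Every column index must lie in [indexBase, numCols + indexBase).
    for (std::size_t k = 0; k < colIndices.size(); ++k) {
        if (colIndices[k] < indexBase || colIndices[k] >= numCols + indexBase)
            return "column index out of range at position " + std::to_string(k);
    }
    return "";
}
```

Printing the returned message and aborting before the local compute step would also match the expected behavior stated above: an error message in place of a silent hang.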

@xwu99 xwu99 closed this as completed Feb 22, 2022