h2o flow unresponsive? #16331

Closed
blgodwin opened this issue Jul 16, 2024 · 14 comments

@blgodwin

Connecting to h2o through R v. 4.3.3
H2O cluster version: 3.46.0.4

I connect to h2o through R and then run a PCA model on a large dataset (30k+ data points) through the GUI web interface. I import the data, parse it, view it, and then run the job. The job runs through all the iterations of alternating minimization until it says "100% progress," which takes about 1 minute 20 seconds. After that the progress bar does not change, but the job status still says "RUNNING." If I click the "View" action, a grey line appears that says "getModel 'modelname'" but nothing more, and a yellow bar appears at the bottom saying "Requesting http://localhost:54321/...."

Then it just stays this way. It has not given me a termination error, but it also remains unresponsive. I am unsure if it is just taking a long time to run or if it is indeed broken and not working. At the time of this writing I have let it sit for about 2 days. If I try to investigate the status using R it is also unresponsive.

Is this working as intended or have I encountered an error?

Thanks!

image

image

image

@blgodwin blgodwin added the bug label Jul 16, 2024
@tomasfryda
Contributor

Thank you for reporting it. It doesn't seem to be working as expected. Is it possible it ran out of memory?

Could you provide us with the backend logs (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/logs.html)? They will likely suffice to find out what is wrong, but any additional information you can provide will make it easier for us.

Would you be able to check whether it is a browser/serialization issue? I would start by right-clicking on the page -> "Inspect" and checking whether there are any errors in the "Console" tab. If there are none, could you rerun it with the "Network" tab open to see if there is a long reply that could be blocking the backend?
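
For anyone following along, one way to separate a browser problem from a stuck backend is to query the server directly, outside Flow. The sketch below is only an illustration: it assumes the default address http://localhost:54321 and assumes that the REST path /3/Cloud returns cluster status as JSON. Both function names are made up for this example.

```python
import http.client
import json
import urllib.request


def cloud_url(base_url):
    """Build the cluster-status URL (the /3/Cloud path is an assumption)."""
    return base_url.rstrip("/") + "/3/Cloud"


def is_h2o_responsive(base_url="http://localhost:54321", timeout=5.0):
    """Return True if the backend answers a status request within `timeout` seconds."""
    try:
        with urllib.request.urlopen(cloud_url(base_url), timeout=timeout) as resp:
            json.load(resp)  # a healthy backend is expected to reply with JSON
            return True
    except (OSError, ValueError, http.client.HTTPException):
        # OSError: refused connection, timeout, or HTTP error status
        # ValueError: the reply was not valid JSON
        return False
```

If `is_h2o_responsive()` returns False while the Flow page is still open, the backend itself has probably stopped answering, which would point away from the browser.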

@blgodwin
Author

h2o_127.0.0.1_54321-1-trace.log
h2o_127.0.0.1_54321-2-debug.log
h2o_127.0.0.1_54321-3-info.log
h2o_127.0.0.1_54321-4-warn.log

Thanks so much for your response! I uploaded some of the log files I found. It does seem like it was an OOM error. I admit to being out of my depth here - h2o was suggested to me because the PCA was too big for my computer but I suppose I don't know how to set up h2o properly. Is it possible to get enough memory to run this analysis? If so, can you please explain how? You can see in my screenshot of RStudio above I requested 20GB but it said 9.98 cluster memory. I had requested 9 days prior, so perhaps I need to clear something? I'm assuming I will actually need to have much more than even 20GB?

Thanks again!

@tomasfryda
Contributor

In the RStudio screenshot, h2o is already running, and h2o.init() just connects to the running instance, so it can't change the memory size. You can shut down h2o and then start it again with h2o.init(), but please make sure to also set the maximum memory size (just to rule out the possibility of being limited by a maximum size that is smaller than the minimum).
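
A minimal sketch of that restart, written for the Python client (in R, h2o.shutdown() and h2o.init() take the same min_mem_size and max_mem_size arguments). The 10G/20G values, the five-second pause, and the helper names are assumptions for illustration only, not recommendations.

```python
import re
import time

_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def mem_to_bytes(size):
    """Convert a size string such as '20G' or '512m' to bytes."""
    match = re.fullmatch(r"(\d+)([KMGT])B?", size.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised memory size: {size!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def check_mem_sizes(min_mem_size, max_mem_size):
    """Raise ValueError if the requested minimum exceeds the maximum."""
    if mem_to_bytes(min_mem_size) > mem_to_bytes(max_mem_size):
        raise ValueError("min_mem_size must not exceed max_mem_size")


def restart_h2o(min_mem_size="10G", max_mem_size="20G"):
    """Stop the running cluster, then start a new one with explicit limits."""
    import h2o  # imported lazily so the helpers above work without h2o installed

    check_mem_sizes(min_mem_size, max_mem_size)
    h2o.init()                # attach to the instance that is already running
    h2o.cluster().shutdown()  # stop it so the next init() starts a fresh JVM
    time.sleep(5)             # assumed pause to let the old JVM exit
    h2o.init(min_mem_size=min_mem_size, max_mem_size=max_mem_size)
```

Calling `restart_h2o("10G", "20G")` would then replace the old instance instead of silently reconnecting to it with its original memory settings.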

I would also try to specify different pca_method parameters (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/pca.html).
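
As a rough sketch of what trying the different methods could look like in the Python client: the four method names below are the ones listed in the PCA documentation linked above, while k=4, impute_missing=True, and the helper names are assumptions chosen for illustration.

```python
PCA_METHODS = ("GramSVD", "Power", "Randomized", "GLRM")


def pca_configs(k, impute_missing=True):
    """Return one parameter dict per pca_method value."""
    return [
        {"k": k, "pca_method": method, "impute_missing": impute_missing}
        for method in PCA_METHODS
    ]


def try_pca_methods(frame, k=4):
    """Train one PCA model per method; map each method to a model or its error."""
    from h2o.estimators.pca import H2OPrincipalComponentAnalysisEstimator

    results = {}
    for params in pca_configs(k):
        model = H2OPrincipalComponentAnalysisEstimator(**params)
        try:
            model.train(x=frame.names, training_frame=frame)
            results[params["pca_method"]] = model
        except Exception as err:  # keep going so one failing method does not hide the rest
            results[params["pca_method"]] = err
    return results
```

Here `frame` is an H2OFrame that has already been imported, for example with h2o.import_file().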

@wendycwong do you have any other ideas?

@blgodwin
Author

blgodwin commented Jul 18, 2024

Hello,

I tried specifying min_mem_size and max_mem_size as seen below. Knowing that 10G was probably not enough, I just went up to 100G to see what happened. I uploaded my data through R and then went to the GUI interface to build the model, hoping that any errors would be clearly displayed there. Now when I go to build the PCA model, before even running it, it says H2O is no longer connected. I get the same error in both Chrome and Firefox.

What am I doing wrong?

Thanks again!

image
image
image

@wendycwong
Contributor

This is baffling. Your dataset size is not that big and your memory allocation is fine. There really is no reason to see the failure. Is it possible to share your dataset so we can reproduce the error here locally and fix it?

@blgodwin
Author

blgodwin commented Jul 18, 2024

Of course! I'm sure there is some mistake on my end that I'm not expert enough to know to even mention here. It does have missing data if that helps troubleshoot, though I thought I picked the correct parameters to deal with that. I was going to test several combinations of the PCA methods (e.g., standardized or not) after I confirmed one of them worked.

Additionally, I was and still am able to get a PCA working with a smaller dataset.

Zipped data file is attached.
pntest_mean_SNP_PCA_noheader.zip

Thanks so much for your help! It's very much appreciated.

@wendycwong
Contributor

@blgodwin

Thank you so much for providing me with the information. Will try it out and let you know.

@wendycwong
Contributor

@blgodwin

I played with your dataset in Python and executed the following:

import h2o
from h2o.estimators.pca import H2OPrincipalComponentAnalysisEstimator

h2o.init(strict_version_check=False)
data = h2o.import_file("pntest_mean_SNP_PCA_noheader.txt")
# impute_missing=True because you have many NAs in your dataset
fitModel = H2OPrincipalComponentAnalysisEstimator(k=4, impute_missing=True)
fitModel.train(x=data.names, training_frame=data)

I got the following result:

image

@wendycwong
Contributor

However, I did run into one problem. When I set pca_method="GLRM" like this:

fitModel = H2OPrincipalComponentAnalysisEstimator(k=4, pca_method="GLRM", use_all_factor_levels=True)
fitModel.train(x=data.names, training_frame=data)

I run into an NPE (NullPointerException). I have opened an issue to resolve this: #16335

This is embarrassing.

@wendycwong
Contributor

Okay, I loaded your dataset into Flow using Chrome and chose the following parameters when building my model:

image

I was able to get a model out:

image

@blgodwin
Author

That's great! Thank you for taking the time to try that out!

I'm still having the same issue. Unless I'm missing something, I set up the PCA just like your screenshot above.

image

And I get an unresponsive error almost immediately. The only difference this time is that the progress bar doesn't go up to 100% before it quits.

image

I see this error in both Chrome and Firefox.

Is it something about how I'm connecting to h2o?

@tomasfryda
Contributor

@blgodwin I would recommend trying it in R or Python. Flow gets much less attention than the clients for those languages, so it may have more bugs than the R or Python client.

One thing that is probably different is that we use macOS and Linux for development and testing, so it's possible the bug is related to the OS you use, or it might be due to a newer version of Java. IIRC from the logs, you use a Java version that's not yet officially supported by us.

If you can run h2o on a different OS, it might help. If that would be too complicated, you might try a different (older) Java version; AFAIK we support Java 8 to 17. Or you might try running h2o in Windows Subsystem for Linux.
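
A quick way to see which Java major version is on the PATH, sketched in Python. The 8 to 17 range is taken from the comment above ("AFAIK"), not from an official compatibility table, and the function names are invented for this example.

```python
import re
import subprocess

SUPPORTED_MAJORS = range(8, 18)  # Java 8 to 17, per the comment above


def java_major(version_output):
    """Extract the major version from `java -version` output, or None if not found."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    if major == 1 and match.group(2):
        # Legacy numbering: "1.8.0_292" means Java 8
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version` and return the major version, or None if Java is missing."""
    try:
        proc = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    # `java -version` normally writes to stderr, so both streams are checked
    return java_major(proc.stderr + proc.stdout)
```

If `installed_java_major()` returns a value that is not in `SUPPORTED_MAJORS`, installing an older JDK would be the first thing to try.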

@wendycwong knows more about our PCA implementation, so she might have more ideas about what to try if you feel uncomfortable with installing a different Java version, etc.

@blgodwin
Author

blgodwin commented Jul 19, 2024

I got it to work if I did not specify any min_mem_size, max_mem_size, or threads!

PCA ran with a simple "localh2o = h2o.init()" to connect

@tomasfryda
Contributor

That's great! Thank you for mentioning how you resolved that!
