KryoException: Buffer overflow. Available: 1, required: 4 #350
If you wish, I can generate a CSV file for this data.
Is anything on your cluster setting spark.kryoserializer.buffer.max to something? Even if that's in MB (a historical default in Spark), that seems small. The default should be 64MB, and it's safe to set the max up to about 2047m. You can try setting this value, at least, to make sure.
Can you please give a hint on how to set this value correctly? As a param to spark-submit?
If I haven't forgotten the HOCON syntax, it should be sufficient to add something like this to your app config:
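A minimal sketch of such a setting, assuming Oryx's HOCON config passes arbitrary Spark properties through to the batch layer via a `config` block (the key names below are illustrative and should be checked against Oryx's reference.conf):

```hocon
# Hypothetical sketch: pass spark.kryoserializer.buffer.max through
# the app's HOCON config to the batch layer's Spark context.
oryx {
  batch {
    streaming {
      config = {
        spark.kryoserializer.buffer.max = "256m"
      }
    }
  }
}
```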
This will cause it to be sent by spark-submit. You should also look at the logs, because they'll print the exact command and config used. This will help verify that the right params are being set.
After fixing the memory issues, I get these crashes in the speed layer:
That kind of error indicates a problem in the Spark cluster. It suddenly failed to communicate with executors. I would look for other errors, on the executor side and in their logs. Do other Spark jobs work at all?
I have restarted all the services, and all of them seem to be running correctly. All the data from the model is gone, though I hadn't deleted any files from HDFS. Now when I try to push data into Oryx, I get this error message:
The system was crashing under high load, so I've added more memory to Kryo and more total memory.
I see you have 8 cores and 12GB of RAM. This is pretty small unless you also reduce the resource requirements in your config. YARN is telling you it can't give you the resources you are asking for.
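On a cluster that small, one option is to shrink what the layers request from YARN. A hedged sketch of scaled-down settings (the key names are illustrative assumptions, not confirmed Oryx keys; verify them against Oryx's reference.conf):

```hocon
# Hypothetical sketch: request fewer/smaller containers from YARN
# so they fit on an 8-core / 12GB node.
oryx {
  batch.streaming {
    num-executors   = 2
    executor-cores  = 2
    executor-memory = "2g"
    driver-memory   = "1g"
  }
  speed.streaming {
    num-executors   = 1
    executor-cores  = 2
    executor-memory = "1g"
  }
}
```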
OK, I've added more total memory. I don't see any problem in the batch logs:
The speed layer also has no crashes:
If I feed the 1.9M data.csv again, I get this crash:
How much memory does it need? Or maybe some older data somewhere in the queue caused the crash? Why does it crash?
That says your driver is out of memory. It really depends on how big your workloads are, but I'd generally give drivers at least 1GB and executors at least 2GB.
I've added more memory:
It doesn't seem to crash, but I still get a 503 result after uploading the CSV file.
speed:
serving:
Why does the serving layer still give 503?
It means no model has been built yet. You need to make sure the batch layer has built a model successfully and pushed it onto the Kafka topic. Here it's set to build every 5 minutes. It may also be unable to build a model from the data; check the logs.
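The 5-minute build interval mentioned above comes from configuration. A hedged sketch of where such an interval might live in the app config (the key name is an assumption; check Oryx's reference.conf for the actual setting):

```hocon
oryx {
  batch {
    streaming {
      # Interval between model-building runs, in seconds (5 minutes here).
      # Key name is illustrative; confirm against reference.conf.
      generation-interval-sec = 300
    }
  }
}
```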
You are right. Please be so kind as to take a look here; this is the tail of the batch layer log:
My guess is that the huge 4GB data set I was using for testing previously is still somewhere in Kafka or elsewhere, but what can I do to fix that now?
The speed layer will ignore previous data, but batch won't. It may be taking a long time to build a model on all that data. You can just delete and recreate the topics. |
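A hedged sketch of deleting and recreating the topics with Kafka's own CLI. The topic names, partition counts, and ZooKeeper address below are assumptions for illustration (Oryx's defaults are typically an input topic and an update topic), and topic deletion must be enabled on the brokers:

```shell
# Hypothetical host, topic names, and sizing; adjust to your deployment.
kafka-topics.sh --zookeeper localhost:2181 --delete --topic OryxInput
kafka-topics.sh --zookeeper localhost:2181 --delete --topic OryxUpdate
kafka-topics.sh --zookeeper localhost:2181 --create --topic OryxInput \
  --partitions 4 --replication-factor 1
kafka-topics.sh --zookeeper localhost:2181 --create --topic OryxUpdate \
  --partitions 1 --replication-factor 1
```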
OK, but what does this message mean: "No more replicas available"?
It means executors died, and enough died that some broadcasted blocks of data were lost and the task failed. It's just a symptom of executors dying. These are really Spark questions rather than related to Oryx. I think it's mostly that you're not giving the cluster enough resources.
Solved the problem: I didn't have enough space on /tmp, so I moved to /var, reformatted, and now it works! Thanks for your support!
Feeding huge data (4GB) through the serving layer causes a crash of the speed layer:
How can I fix it?