
In-memory analytics benchmark run question #436

Closed
minkyuSnow opened this issue Jul 14, 2023 · 8 comments

Comments

@minkyuSnow

Hello

I am running the in-memory analytics application on an Arm CPU with 4 GB of memory.

Running the benchmark with the 144 MB dataset produces the "Movies recommended" and "Benchmark execution time" results.

However, the output during the run contains what looks like an error message, so I wonder whether the benchmark actually ran normally.

Here are the commands I used to run the benchmark.
I am running everything on one node (node spec: Arm CPU + 4 GB memory).

$ docker create --name movielens-data cloudsuite/movielens-dataset
$ docker run -dP --net host --name spark-master cloudsuite/spark:3.3.2 master
$ docker run -dP --net host --volumes-from movielens-data --name spark-worker-01 cloudsuite/spark:3.3.2 worker \
    spark://NODE_IP:7077
$ docker run --rm --net host --volumes-from movielens-data cloudsuite/in-memory-analytics /data/ml-latest \
    /data/myratings.csv --driver-memory 2g --executor-memory 2g --master spark://NODE_IP:7077

[Screenshots of the run output and error logs, 2023-07-14]

@xusine
Contributor

xusine commented Jul 14, 2023

Hello,

After looking at the first error log, I believe the cause is running out of memory (java.lang.OutOfMemoryError). If possible, give the workload more memory, or consider redistributing the memory allocation between the driver and the executor, e.g., 1 GB for the driver and 3 GB for the executor.
Please let me know if that helps!

@minkyuSnow
Author


Thank you for the reply.

$ docker create --name movielens-data cloudsuite/movielens-dataset
$ docker run -dP --net host --name spark-master cloudsuite/spark:3.3.2 master
$ docker run -dP --net host --volumes-from movielens-data --name spark-worker-01 cloudsuite/spark:3.3.2 worker \
    spark://NODE_IP:7077
$ docker run --rm --net host --volumes-from movielens-data cloudsuite/in-memory-analytics /data/ml-latest \
    /data/myratings.csv --driver-memory 1g --executor-memory 3g --master spark://NODE_IP:7077

As you suggested, I gave the driver 1 GB and the executor 3 GB, but unlike the 2 GB/2 GB split, the run now fails with insufficient resources.
Conversely, with 3 GB for the driver and 1 GB for the executor it does not run either.

If there is nothing else I can tune, should I conclude that the physical memory is simply insufficient?

[Screenshot of the failed run, 2023-07-14]

@xusine
Contributor

xusine commented Jul 14, 2023

Hello,

Thanks for running the test. Indeed, this means the memory is not enough to run the workload.

There might be another way around it: you can try restricting the number of cores allocated to the container using --cpuset-cpus. Memory consumption may drop when the worker uses fewer cores, at the cost of a longer run time.
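The suggestion above might look like the following sketch, which recreates the worker container from the earlier commands pinned to a single core. This is only an illustration: NODE_IP is a placeholder as before, and core 0 is chosen arbitrarily.

```shell
# Remove the existing worker, then recreate it restricted to one CPU core.
# Fewer cores means fewer concurrent Spark tasks, which can lower peak
# memory use at the cost of a longer benchmark run.
docker rm -f spark-worker-01
docker run -dP --net host --cpuset-cpus 0 --volumes-from movielens-data \
    --name spark-worker-01 cloudsuite/spark:3.3.2 worker spark://NODE_IP:7077
```

`--cpuset-cpus` accepts a list or range (e.g. `0-1`), so you can step the core count down until the run fits in memory.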

@minkyuSnow
Author

Thank you for reply.

Are you saying the problem is that the physical memory is too small for the amount of data?

The 2 GB/2 GB run reported errors, but since it still produced a result, can it be considered a normal run?

@xusine
Contributor

xusine commented Jul 14, 2023

Yes, it is an indication that the physical memory is not enough. You can treat it as a normal case, but not a representative one.

@minkyuSnow
Author


Thank you for the reply.

I understand a little better now.
So you are saying that even though the result came out, it is hard to consider the run normal because of the memory errors?

As an example, I will show you a picture of the result.
[Screenshot of the benchmark result, 2023-07-14]

@xusine
Contributor

xusine commented Jul 14, 2023

Hello,

Yes. Even though you eventually got the result and the workload finished successfully, my understanding is that it still cannot represent a real server: this workload is supposed to run on a server with a large amount of memory, so you should not see any out-of-memory errors during the run.

However, it is OK if your intended use case is not a server :)

Best,

@minkyuSnow
Author


Thank you for your kind reply. You have been very helpful.
