
In-memory analytics benchmark run question #436

Closed
minkyuSnow opened this issue Jul 14, 2023 · 8 comments

Comments

@minkyuSnow

Hello

I am running the in-memory analytics application on an Arm CPU with 4 GB of memory.

Running the benchmark with the 144 MB dataset produces the "Movies recommended" and "Benchmark execution time" results.

However, the output during the run contains what looks like an error message, so I wonder whether the benchmark actually ran normally.

Here are the commands I used to run the benchmark.
I am running everything on one node (node spec: Arm CPU + 4 GB memory).

$ docker create --name movielens-data cloudsuite/movielens-dataset
$ docker run -dP --net host --name spark-master cloudsuite/spark:3.3.2 master
$ docker run -dP --net host --volumes-from movielens-data --name spark-worker-01 cloudsuite/spark:3.3.2 worker \
    spark://NODE_IP:7077
$ docker run --rm --net host --volumes-from movielens-data cloudsuite/in-memory-analytics /data/ml-latest \
    /data/myratings.csv --driver-memory 2g --executor-memory 2g --master spark://NODE_IP:7077

[Screenshots of the run output and error logs, 2023-07-14]

@xusine
Contributor

xusine commented Jul 14, 2023

Hello,

After looking at the first error log, I believe the cause is running out of memory (java.lang.OutOfMemoryError). If possible, give the workload more memory, or consider redistributing the memory allocation between the driver and the executor, e.g., 1 GB for the driver and 3 GB for the executor.
Please let me know if that helps!

@minkyuSnow
Author


Thank you for the reply.

$ docker create --name movielens-data cloudsuite/movielens-dataset
$ docker run -dP --net host --name spark-master cloudsuite/spark:3.3.2 master
$ docker run -dP --net host --volumes-from movielens-data --name spark-worker-01 cloudsuite/spark:3.3.2 worker \
    spark://NODE_IP:7077
$ docker run --rm --net host --volumes-from movielens-data cloudsuite/in-memory-analytics /data/ml-latest \
    /data/myratings.csv --driver-memory 1g --executor-memory 3g --master spark://NODE_IP:7077

As you suggested, I gave the driver 1 GB and the executor 3 GB, but unlike the 2 GB/2 GB split, the run now fails with insufficient resources.
Conversely, with 3 GB for the driver and 1 GB for the executor it does not run either.

If there is nothing else I can tune, should I conclude that the physical memory is simply insufficient?

[Screenshot of the failed run, 2023-07-14]

@xusine
Contributor

xusine commented Jul 14, 2023

Hello,

Thanks for running the test. Indeed, this means the memory is not enough to run the workload.

There might be another way around it: you can try restricting the number of cores allocated to the container using --cpuset-cpus. Memory consumption may drop when the worker uses fewer cores, at the cost of a longer run time.
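The suggestion above might look like the following sketch, which recreates the worker container from the earlier commands pinned to a single core. This is only an illustration: NODE_IP is a placeholder as before, and core 0 is chosen arbitrarily.

```shell
# Remove the existing worker, then recreate it restricted to one CPU core.
# Fewer cores means fewer concurrent Spark tasks, which can lower peak
# memory use at the cost of a longer benchmark run.
docker rm -f spark-worker-01
docker run -dP --net host --cpuset-cpus 0 --volumes-from movielens-data \
    --name spark-worker-01 cloudsuite/spark:3.3.2 worker spark://NODE_IP:7077
```

`--cpuset-cpus` accepts a list or range (e.g. `0-1`), so you can step the core count down until the run fits in memory.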

@minkyuSnow
Author

Thank you for reply.

Are you saying the problem is that the physical memory is too small for the amount of data?

The 2 GB/2 GB run reported errors, but since it still produced a result, can it be considered a normal run?

@xusine
Contributor

xusine commented Jul 14, 2023

Yes, it is an indication that the physical memory is not enough. You can treat it as a normal case, but not a representative one.

@minkyuSnow
Author


Thank you for the reply.

I understand a little better now.
So you are saying that even though the result came out, it is hard to consider the run normal because of the memory errors?

As an example, I will show you a picture of the result.
[Screenshot of the benchmark result, 2023-07-14]

@xusine
Contributor

xusine commented Jul 14, 2023

Hello,

Yes. Even though you eventually got the result and the workload finished successfully, my understanding is that it still cannot represent a real server: this workload is supposed to run on a server with a large amount of memory, so you should not see any out-of-memory errors during the run.

However, it is OK if your intended use case is not a server :)

Best,

@minkyuSnow
Author


Thank you for your kind reply. You have been very helpful.
