Removing spark-user-id #933

Closed
wants to merge 2 commits into from


pmahindrakar-oss
Contributor

Signed-off-by: pmahindrakar-oss <prafulla.mahindrakar@gmail.com>

Without this change, the Spark driver pod fails with a permission-denied error when writing to /.aws or to anything under the /tmp directory:

+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.0.130.1 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///opt/venv/bin/entrypoint.py pyflyte-execute --inputs s3://union-j1-us-east-2-mockjackorg/metadata/propeller/flytesnacks-development-a7wbdflrdzs7sr7lhrff/n0/data/inputs.pb --output-prefix s3://union-j1-us-east-2-mockjackorg/metadata/propeller/flytesnacks-development-a7wbdflrdzs7sr7lhrff/n0/data/0 --raw-output-data-prefix s3://union-j1-us-east-2-mockjackorg/5a/a7wbdflrdzs7sr7lhrff-n0-0 --checkpoint-path s3://union-j1-us-east-2-mockjackorg/5a/a7wbdflrdzs7sr7lhrff-n0-0/_flytecheckpoints --prev-checkpoint '""' --resolver flytekit.core.python_auto_container.default_task_resolver -- task-module k8s_spark.pyspark_pi task-name hello_spark
22/12/22 22:55:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-t41x9d5w because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.

{"asctime": "2022-12-22 22:55:26,788", "name": "flytekit", "levelname": "ERROR", "message": "Error from command '['aws', 's3', 'cp', 's3://union-j1-us-east-2-mockjackorg/metadata/propeller/flytesnacks-development-a7wbdflrdzs7sr7lhrff/n0/data/inputs.pb', '/tmp/flyte-bmtatnmx/sandbox/local_flytekit/inputs.pb']':\nb\"fatal error: [Errno 13] Permission denied: '/.aws'\\n\"\n"}
{"asctime": "2022-12-22 22:55:27,312", "name": "flytekit", "levelname": "ERROR", "message": "Error from command '['aws', '--no-sign-request', 's3', 'cp', 's3://union-j1-us-east-2-mockjackorg/metadata/propeller/flytesnacks-development-a7wbdflrdzs7sr7lhrff/n0/data/inputs.pb', '/tmp/flyte-bmtatnmx/sandbox/local_flytekit/inputs.pb']':\nb'fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden\\n'\n"}
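The log above shows two tools trying to write under the container's root directory (`/.aws` and `/.config/matplotlib`), which is what typically happens when the process's home directory resolves to `/`. As a minimal sketch of a non-root workaround, assuming that some temporary directory is writable and that the AWS CLI and Matplotlib derive their paths from `HOME` and `MPLCONFIGDIR`, the helper below builds an environment whose home points at a writable location. The function names are hypothetical and are not part of flytekit or of this PR.

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if path is an existing directory the current user can write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


def env_with_writable_home(env=None):
    """Return a copy of env whose HOME and MPLCONFIGDIR are writable directories.

    Hypothetical helper: if HOME is unset or not writable (for example "/"
    for a UID that has no passwd entry), a fresh temporary directory is used.
    """
    new_env = dict(os.environ if env is None else env)

    home = new_env.get("HOME", "")
    if not home or not is_writable_dir(home):
        # Assumption: the default temp location is writable for this user.
        home = tempfile.mkdtemp(prefix="home-")
        new_env["HOME"] = home

    # The Matplotlib warning in the log recommends setting MPLCONFIGDIR.
    mpl_dir = new_env.get("MPLCONFIGDIR", "")
    if not mpl_dir or not is_writable_dir(mpl_dir):
        mpl_dir = os.path.join(home, ".config", "matplotlib")
        os.makedirs(mpl_dir, exist_ok=True)
        new_env["MPLCONFIGDIR"] = mpl_dir

    return new_env
```

The returned mapping could be passed as `env=` to `subprocess.run` when invoking a command such as `aws s3 cp`. This only addresses the `Permission denied: '/.aws'` error; the later 403 on `HeadObject` is a credentials problem and would not be affected.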


Removing the usage of spark-user-id

Signed-off-by: pmahindrakar-oss <prafulla.mahindrakar@gmail.com>
Signed-off-by: pmahindrakar-oss <prafulla.mahindrakar@gmail.com>
@samhita-alla
Contributor

@pmahindrakar-oss, I'm not sure we want to have a root user; see this comment for reference: #914 (comment). It actually worked on the local sandbox cluster with a non-root user.

@pmahindrakar-oss
Contributor Author

pmahindrakar-oss commented Dec 23, 2022

I agree, @samhita-alla. IMO this is a temporary workaround until we fix the underlying permissions issue.
We hit this issue in a cloud tenant and are proposing to use https://github.com/flyteorg/flytesnacks/releases/tag/v0.3.154 until it is fixed.

It's interesting that it works on the local sandbox, since we consistently ran into this issue on the cloud tenant:
"Error from command '['aws', 's3', 'cp', 's3://union-j1-us-east-2-mockjackorg/metadata/propeller/flytesnacks-development-a7wbdflrdzs7sr7lhrff/n0/data/inputs.pb', '/tmp/flyte-bmtatnmx/sandbox/local_flytekit/inputs.pb']':\nb\"fatal error: [Errno 13] Permission denied: '/.aws
Shouldn't we create the user with spark_uid=1001 first and then switch to that user? It may be that this user doesn't exist and needs to be created before it is used.
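One way to check the hypothesis that the UID has no corresponding user is to inspect the runtime identity from inside the container. The following is a minimal diagnostic sketch using only the Python standard library on a POSIX system; the function name and the returned keys are hypothetical, chosen for illustration.

```python
import os
import pwd


def describe_runtime_user():
    """Collect facts about the current UID, its passwd entry, and writable paths."""
    uid = os.getuid()
    try:
        entry = pwd.getpwuid(uid)
        user_name, passwd_home = entry.pw_name, entry.pw_dir
    except KeyError:
        # No passwd entry: the UID was set without creating a matching user.
        user_name, passwd_home = None, None

    # expanduser uses HOME if set, otherwise falls back to the passwd entry.
    resolved_home = os.path.expanduser("~")

    return {
        "uid": uid,
        "has_passwd_entry": user_name is not None,
        "user_name": user_name,
        "passwd_home": passwd_home,
        "resolved_home": resolved_home,
        "home_writable": os.access(resolved_home, os.W_OK),
        "tmp_writable": os.access("/tmp", os.W_OK),
    }


if __name__ == "__main__":
    for key, value in describe_runtime_user().items():
        print(f"{key}: {value}")
```

If `has_passwd_entry` is false, or `resolved_home` is `/` and not writable, that would be consistent with the `Permission denied: '/.aws'` error quoted above.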

@samhita-alla
Contributor

samhita-alla commented Dec 24, 2022

@pmahindrakar-oss, the issue arose because the Spark setup wasn't correctly configured on the development uniondemo cluster. It seems to have been fixed now. Let me test with user 1001 and see if that works. I'll let you know the status.

@pmahindrakar-oss
Contributor Author

Sure, @samhita-alla, let me know what you find.
