Updates Spark dockerfile #914
Signed-off-by: Samhita Alla <aallasamhita@gmail.com>
ENV SPARK_VERSION 3.2.1
ENV PYSPARK_PYTHON ${VENV}/bin/python3
ENV PYSPARK_DRIVER_PYTHON ${VENV}/bin/python3

# Copy the makefile targets to expose on the container. This makes it easier to register.
# Delete this after we update CI
COPY in_container.mk /root/Makefile
When can we get rid of this stuff?
Thank you for taking over this work! One comment.
Thank you! One comment, and it's probably more because I'm forgetting...
RUN wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.2/hadoop-aws-3.2.2.jar -P /opt/spark/jars && \
    wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.563/aws-java-sdk-bundle-1.11.563.jar -P /opt/spark/jars
Why do we need this? I thought we managed to make it work without further customizations, did we not?
This is required if, say, a Spark DataFrame needs to be passed between Flyte tasks. If these libraries aren't installed, we get an UnsupportedFileSystemException: No FileSystem for scheme "s3" error. These libraries are required to read from or write to S3 in a Spark setup.
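As a rough sanity check, something like the sketch below could confirm that both jars actually landed in the image. It assumes the /opt/spark/jars directory and the jar file names from the wget commands above; the helper name missing_s3_jars is hypothetical and is not part of Flyte or Spark.

```python
from pathlib import Path

# Jar file names taken from the wget commands in this PR.
REQUIRED_JARS = (
    "hadoop-aws-3.2.2.jar",
    "aws-java-sdk-bundle-1.11.563.jar",
)


def missing_s3_jars(jars_dir="/opt/spark/jars", required=REQUIRED_JARS):
    """Return the required jar names that are not present in jars_dir.

    Hypothetical helper for illustration only; jars_dir defaults to the
    directory that the Dockerfile downloads the jars into.
    """
    root = Path(jars_dir)
    # A missing directory simply reports every jar as missing.
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_s3_jars()
    if missing:
        print("Missing S3 jars:", ", ".join(missing))
    else:
        print("All S3 jars are present.")
```

Running this inside the container only shows whether the files exist; it does not prove that Spark can reach S3, which still depends on credentials and the Hadoop configuration.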
Signed-off-by: Samhita Alla <aallasamhita@gmail.com>