Fix spark logging and spark tests action workflow #1413
Conversation
The tests began to pass when I fixed the configuration file. I suspect that in the earlier failing builds the PySpark dependencies got mixed up, resulting in a version mismatch and the strange errors. Since the tests now pass, I also migrated the test setup to Spark 3.*. Spark 3 has a breaking change with respect to schema declaration: in Spark 2, schema fields were automatically sorted alphabetically, but this is no longer the case in Spark 3. I have manually reordered the fields into alphabetical order where needed to preserve this behaviour, since many parts of the codebase depend on it.
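A minimal sketch of the reordering described above, assuming illustrative field names (these are not the project's actual schemas): Spark 2 sorted StructType fields alphabetically, Spark 3 keeps declaration order, so schemas now have to be declared with their fields already sorted.

```python
# Hypothetical example fields; plain tuples stand in for pyspark
# StructField declarations to keep the sketch self-contained.
fields = [
    ("user_name", "string"),
    ("artist_name", "string"),
    ("listened_at", "timestamp"),
]

# Emulate Spark 2's behaviour by sorting on the field name before
# building the schema (or simply declare the fields in this order).
sorted_fields = sorted(fields, key=lambda field: field[0])
print([name for name, _ in sorted_fields])
# → ['artist_name', 'listened_at', 'user_name']
```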
Dockerfile.spark (outdated)

LABEL org.label-schema.vcs-url="https://github.com/metabrainz/listenbrainz-server.git" \
      org.label-schema.vcs-ref=$GIT_COMMIT_SHA \
      org.label-schema.schema-version="1.0.0-rc1" \
      org.label-schema.vendor="MetaBrainz Foundation" \
      org.label-schema.name="ListenBrainz" \
      org.metabrainz.based-on-image="airdock/oraclejdk:$JAVA_VERSION"
We should add the labels in the new version. Also, add the changes from #1424.
There is still one change pending for the tests, which I will probably do some other time. The Hadoop containers used in the test setup are still on Java 8. The relevant files to build them are at https://github.com/metabrainz/hadoop-cluster-docker. Once this task is done, that repository can be archived.
We had removed the Flask logger from LB Spark in #1400 and the BU-related dependencies in #1410. However, we forgot to configure the base Python logger correctly at that time.
Also, the Spark tests workflow has been broken ever since we moved to GitHub Actions. We never noticed because the exit codes were not being reported correctly.
The PR fixes both of these issues.
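As a hedged illustration of how exit codes can silently go unreported (this is a generic sketch, not the actual workflow step): a CI step reports the status of the *last* command, so piping the test command into something else makes the pipe's final command determine the reported status.

```shell
#!/bin/sh
# Stand-in for the real test command; always fails with status 1.
run_tests() { return 1; }

# BROKEN: the step's status comes from `tee`, which succeeds,
# so CI sees 0 even though the tests failed.
run_tests | tee /dev/null
echo "piped status: $?"

# FIXED: capture the test command's status directly and report it.
run_tests; rc=$?
echo "captured status: $rc"
```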