Fix spark logging and spark tests action workflow #1413

Merged
merged 40 commits into from May 7, 2021

amCap1712 (Member) commented Apr 26, 2021

We removed the Flask logger from LB Spark in #1400, and BU and related dependencies in #1410. However, we forgot to configure the base Python logger correctly at that time.
Also, the Spark tests workflow has been broken ever since we moved to GitHub Actions. We never noticed because the exit codes were not being reported correctly.
This PR fixes both of these issues.
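
For readers unfamiliar with what "configuring the base Python logger" involves once a framework logger is gone, here is a minimal sketch using only the standard library. It is not the code from this PR: the helper name `configure_root_logger`, the marker attribute, and the log format are assumptions made for illustration.

```python
import logging
import sys


def configure_root_logger(level=logging.INFO):
    """Attach one stderr handler to the root logger (hypothetical helper)."""
    root = logging.getLogger()
    root.setLevel(level)
    # Skip adding a handler if this sketch already installed one, so that
    # repeated calls do not produce duplicate log lines.
    if not any(getattr(h, "_sketch_handler", False) for h in root.handlers):
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        handler._sketch_handler = True  # marker attribute, an assumption of this sketch
        root.addHandler(handler)
    return root


configure_root_logger()
# Module-level loggers propagate to the root logger by default.
logging.getLogger("example_module").info("root logger configured")
```

Without a handler on the root logger, the standard library's last-resort handler only emits messages at WARNING level or above, so INFO-level messages are silently dropped.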

amCap1712 changed the title from "Test sentry in test setup" to "Fix spark logging and spark tests action workflow" on Apr 28, 2021
amCap1712 (Member, Author) commented May 2, 2021

The tests began to pass once I fixed the configuration file. I think that in the earlier failing test builds the pyspark dependencies got mixed up, resulting in a version mismatch and the weird errors.

Since the tests began to pass on their own, I also migrated the test setup to Spark 3.*. Spark 3 has a breaking change with respect to schema declaration: in Spark 2, the schema fields are automatically sorted alphabetically, which is no longer the case in Spark 3. I have manually reordered the fields alphabetically where needed to preserve this behaviour, as many parts of the codebase depend on it.
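
The manual reordering described above can be illustrated with a small, dependency-free sketch. The field names and the helpers `sort_fields` and `is_alphabetical` are hypothetical and are not taken from the codebase. With PySpark, each `(name, type)` pair would correspond to a `StructField` inside a `StructType`.

```python
def sort_fields(fields):
    """Return (name, type) pairs ordered alphabetically by field name."""
    return sorted(fields, key=lambda field: field[0])


def is_alphabetical(fields):
    """Check whether the field names are already in alphabetical order."""
    names = [name for name, _ in fields]
    return names == sorted(names)


# Hypothetical schema declared in an arbitrary order.
fields = [
    ("user_name", "string"),
    ("listened_at", "timestamp"),
    ("artist_name", "string"),
]

# Reorder explicitly so the declaration no longer relies on implicit sorting.
ordered = sort_fields(fields)
for name, type_name in ordered:
    print(name, type_name)
```

A check such as `is_alphabetical` could be used in a test to guard the ordering that downstream code relies on, though whether the project does so is not stated in this thread.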

Dockerfile.spark Outdated
Comment on lines 6 to 11
LABEL org.label-schema.vcs-url="https://github.com/metabrainz/listenbrainz-server.git" \
org.label-schema.vcs-ref=$GIT_COMMIT_SHA \
org.label-schema.schema-version="1.0.0-rc1" \
org.label-schema.vendor="MetaBrainz Foundation" \
org.label-schema.name="ListenBrainz" \
org.metabrainz.based-on-image="airdock/oraclejdk:$JAVA_VERSION"
Collaborator

We should add these labels to the new version. Also incorporate the changes from #1424.

amCap1712 (Member, Author) commented

There is still one change pending for the tests, which I will probably do at a later time. The Hadoop containers used in the test setup are still on Java 8. The files needed to build them are at https://github.com/metabrainz/hadoop-cluster-docker. Once this task is done, that repository can be archived.
