Fix spark logging and spark tests action workflow #1413
Conversation
The tests began to pass when I fixed the configuration file. I suspect that in the earlier failing builds the PySpark dependencies got mixed up, resulting in a version mismatch and the strange errors. Since the tests now pass, I also migrated the test setup to Spark 3.*. Spark 3 has a breaking change with respect to schema declaration: in Spark 2, schema fields were automatically sorted alphabetically, but this is no longer the case in Spark 3. I have manually reordered the fields into alphabetical order where needed to preserve this behaviour, since many parts of the codebase depend on it.
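A minimal sketch of the reordering described above, assuming illustrative field names (these are not the project's actual schemas): Spark 2 sorted StructType fields alphabetically, Spark 3 keeps declaration order, so schemas now have to be declared with their fields already sorted.

```python
# Hypothetical example fields; plain tuples stand in for pyspark
# StructField declarations to keep the sketch self-contained.
fields = [
    ("user_name", "string"),
    ("artist_name", "string"),
    ("listened_at", "timestamp"),
]

# Emulate Spark 2's behaviour by sorting on the field name before
# building the schema (or simply declare the fields in this order).
sorted_fields = sorted(fields, key=lambda field: field[0])
print([name for name, _ in sorted_fields])
# → ['artist_name', 'listened_at', 'user_name']
```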
Dockerfile.spark (outdated)

LABEL org.label-schema.vcs-url="https://github.com/metabrainz/listenbrainz-server.git" \
      org.label-schema.vcs-ref=$GIT_COMMIT_SHA \
      org.label-schema.schema-version="1.0.0-rc1" \
      org.label-schema.vendor="MetaBrainz Foundation" \
      org.label-schema.name="ListenBrainz" \
      org.metabrainz.based-on-image="airdock/oraclejdk:$JAVA_VERSION"
We should add the labels in the new version. Also, add the changes from #1424.
There is still one change pending for the tests, which I will probably do some other time. The Hadoop containers used in the test setup are still on Java 8. The relevant files to build them are at https://github.com/metabrainz/hadoop-cluster-docker. Once this task is done, that repository can be archived.
We had removed the Flask logger from LB Spark in #1400 and the BU-related dependencies in #1410. However, we forgot to configure the base Python logger correctly at that time.
Also, the Spark tests workflow has been broken ever since we moved to GitHub Actions. We never noticed because the exit codes were not being reported correctly.
The PR fixes both of these issues.
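As a hedged illustration of how exit codes can silently go unreported (this is a generic sketch, not the actual workflow step): a CI step reports the status of the *last* command, so piping the test command into something else makes the pipe's final command determine the reported status.

```shell
#!/bin/sh
# Stand-in for the real test command; always fails with status 1.
run_tests() { return 1; }

# BROKEN: the step's status comes from `tee`, which succeeds,
# so CI sees 0 even though the tests failed.
run_tests | tee /dev/null
echo "piped status: $?"

# FIXED: capture the test command's status directly and report it.
run_tests; rc=$?
echo "captured status: $rc"
```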