Fix spark logging and spark tests action workflow #1413

Merged
40 commits merged on May 7, 2021
Commits
All 40 commits are by amCap1712.

784c3c7  Test sentry in test setup (Apr 26, 2021)
dd11f22  Build spark containers before test and use run instead of up (Apr 26, 2021)
ee3d2b1  Copy configuration file during spark test run (Apr 26, 2021)
ba0b805  Remove SparkIntegration (Apr 26, 2021)
c768d5d  Change config.py.sample to hadoop master because it is used in tests (Apr 26, 2021)
2b892e5  Test removing pyspark dependency (Apr 26, 2021)
46e14a6  Configure python logger (Apr 27, 2021)
45593bd  Test a hunch (Apr 27, 2021)
03628bd  Configure base logger (Apr 27, 2021)
5e13034  Add missing logger (Apr 27, 2021)
34a272c  Move configuration to earlier phase (Apr 27, 2021)
b006235  Another attempt at logging configuration (Apr 27, 2021)
910ed62  remove pyspark dep (Apr 27, 2021)
49a91ce  Add stop-request-consumer-container.sh script (Apr 27, 2021)
c324ebe  Add metabrainz-spark-test image for use in tests (Apr 27, 2021)
0349388  Add back deps (Apr 27, 2021)
88c25cb  Copy config file correctly (Apr 27, 2021)
76fbc3b  Fix file path and rearrange (Apr 28, 2021)
dfbd064  Fix copying config file (May 1, 2021)
2fb6eb9  Do not configure sentry in test (May 1, 2021)
75e5f40  Dedup spark Dockerfile (May 2, 2021)
09ae051  Install development dependencies (May 2, 2021)
ca5a5bf  Remove pyspark dep (May 2, 2021)
f476062  Set PYTHONPATH correctly (May 2, 2021)
7b7c160  Add py4j to PYTHONPATH (May 2, 2021)
4c6edbf  reformat file (May 2, 2021)
a27e681  Fix SPARK_HOME (May 2, 2021)
77a198b  Second attempt to fix SPARK_HOME (May 2, 2021)
51de543  third attempt to fix SPARK_HOME (May 2, 2021)
6d7a50a  Rearrange schema fields (May 2, 2021)
65d8115  Rearrange schema fields - 2 (May 2, 2021)
63edc77  Rearrange schema fields - 3 (May 2, 2021)
245ec5a  Rearrange schema fields - 4 (May 2, 2021)
f94e95c  Add labels to Dockerfile.spark (May 5, 2021)
f59c161  Add build-arg to push-request-consumer.sh (May 5, 2021)
0276144  Delete obsolete scripts (May 5, 2021)
036c978  Move remaining spark scripts a level up (May 5, 2021)
5807dd4  Add default label to base (May 6, 2021)
6d0f8e8  Add build arg after FROM as well (May 6, 2021)
ac66b8d  Run spark-request-consumer without docker (May 7, 2021)
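
Commits 46e14a6 through b006235 iterate on getting Python log output from the spark request consumer to appear; the logging code itself is not part of this diff view. As a rough sketch of the general approach (configure the root logger once, early in startup, before other modules grab their loggers):

import logging
import sys

# Illustrative only: the PR's actual logging configuration is not shown
# in this diff. Configuring the root logger before any other module calls
# logging.getLogger() ensures every logger inherits a working handler.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    stream=sys.stdout,
)

logger = logging.getLogger("listenbrainz_spark")
logger.info("spark request consumer starting")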
Files changed
.github/workflows/frontend-tests.yml (1 addition, 1 deletion)
@@ -25,7 +25,7 @@ jobs:
       - uses: satackey/action-docker-layer-caching@v0.0.11
         continue-on-error: true
 
-      - name: Build frontend tests
+      - name: Build frontend containers
         run: ./test.sh fe -b
 
       - name: Run frontend tests
.github/workflows/spark-tests.yml (3 additions)
@@ -28,5 +28,8 @@ jobs:
       - uses: satackey/action-docker-layer-caching@v0.0.11
         continue-on-error: true
 
+      - name: Build spark containers
+        run: ./test.sh spark -b
+
       - name: Run tests
         run: ./test.sh spark
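
As in the frontend workflow, the spark containers are now built in an explicit step (./test.sh spark -b) before the suite is run (./test.sh spark), so image build failures surface separately from test failures.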
Dockerfile.spark (29 additions, 158 deletions)
@@ -1,148 +1,16 @@
-ARG JAVA_VERSION=1.8
-FROM airdock/oraclejdk:$JAVA_VERSION as metabrainz-spark-base
 
-ARG GIT_COMMIT_SHA
+ARG PYTHON_BASE_IMAGE_VERSION=3.8-20210115
+FROM metabrainz/python:$PYTHON_BASE_IMAGE_VERSION
 
+ARG PYTHON_BASE_IMAGE_VERSION
 LABEL org.label-schema.vcs-url="https://github.com/metabrainz/listenbrainz-server.git" \
-      org.label-schema.vcs-ref=$GIT_COMMIT_SHA \
+      org.label-schema.vcs-ref="" \
       org.label-schema.schema-version="1.0.0-rc1" \
       org.label-schema.vendor="MetaBrainz Foundation" \
       org.label-schema.name="ListenBrainz" \
-      org.metabrainz.based-on-image="airdock/oraclejdk:$JAVA_VERSION"
-
-# Compile and install specific version of Python
-# The jdk image comes with jessie which has python 3.4 which
-# is not supported anymore. We install Python 3.6 here because
-# 3.7 needs a version of OpenSSL that is not available in jessie
-# Based on https://github.com/docker-library/python/blob/master/3.6/jessie/Dockerfile
-
-# Ensure that local Python build is preferred over whatever might come with the base image
-ENV PATH /usr/local/bin:$PATH
-
-# http://bugs.python.org/issue19846
-# > At the moment, setting "LANG=C" on a Linux system *fundamentally breaks Python 3*, and that's not OK.
-ENV LANG C.UTF-8
-
-# Runtime dependencies. This includes the core packages for all of the buildDeps listed
-# below. We explicitly install them so that when we `remove --auto-remove` the dev packages,
-# these packages stay installed.
-RUN apt-get update \
-    && apt-get install -y --no-install-recommends \
-        ca-certificates \
-        netbase \
-        git \
-        libbz2-1.0 \
-        libexpat1 \
-        libffi6 \
-        libgdbm3 \
-        liblzma5 \
-        libncursesw5 \
-        libreadline6 \
-        libsqlite3-0 \
-        libssl1.0.0 \
-        libuuid1 \
-        tcl \
-        tk \
-        zlib1g wget \
-    && rm -rf /var/lib/apt/lists/*
-
-ENV GPG_KEY 0D96DF4D4110E5C43FBFB17F2D347EA6AA65421D
-ENV PYTHON_VERSION 3.6.9
-
-# The list of build dependencies comes from the python-docker slim version:
-# https://github.com/docker-library/python/blob/408f7b8130/3.7/stretch/slim/Dockerfile#L29
-RUN set -ex \
-    && buildDeps=' \
-        build-essential \
-        libbz2-dev \
-        libexpat1-dev \
-        libffi-dev \
-        libgdbm-dev \
-        liblzma-dev \
-        libncursesw5-dev \
-        libreadline-dev \
-        libsqlite3-dev \
-        libssl-dev \
-        tk-dev \
-        tcl-dev \
-        uuid-dev \
-        xz-utils \
-        zlib1g-dev \
-    ' \
-    && apt-get update \
-    && apt-get install -y $buildDeps --no-install-recommends \
-    \
-    && wget -O python.tar.xz "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz" \
-    && wget -O python.tar.xz.asc "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz.asc" \
-    && export GNUPGHOME="$(mktemp -d)" \
-    && gpg --batch --keyserver ha.pool.sks-keyservers.net --recv-keys "$GPG_KEY" \
-    && gpg --batch --verify python.tar.xz.asc python.tar.xz \
-    && { command -v gpgconf > /dev/null && gpgconf --kill all || :; } \
-    && rm -rf "$GNUPGHOME" python.tar.xz.asc \
-    && mkdir -p /usr/src/python \
-    && tar -xJC /usr/src/python --strip-components=1 -f python.tar.xz \
-    && rm python.tar.xz \
-    \
-    && cd /usr/src/python \
-    && gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)" \
-    && ./configure \
-        --build="$gnuArch" \
-        --enable-loadable-sqlite-extensions \
-        --enable-shared \
-        --with-system-expat \
-        --with-system-ffi \
-        --without-ensurepip \
-    && make -j "$(nproc)" \
-    && make install \
-    && ldconfig \
-    \
-    && find /usr/local -depth \
-        \( \
-            \( -type d -a \( -name test -o -name tests \) \) \
-            -o \
-            \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \
-        \) -exec rm -rf '{}' + \
-    && rm -rf /usr/src/python \
-    \
-    && apt-get purge -y --auto-remove $buildDeps \
-    && rm -rf /var/lib/apt/lists/* \
-    \
-    && python3 --version
-
-
-# make some useful symlinks that are expected to exist
-RUN cd /usr/local/bin \
-    && ln -s idle3 idle \
-    && ln -s pydoc3 pydoc \
-    && ln -s python3 python \
-    && ln -s python3-config python-config
-
-# Install pip
-ENV PYTHON_PIP_VERSION 21.0.1
-
-RUN set -ex; \
-    \
-    wget -O get-pip.py 'https://bootstrap.pypa.io/get-pip.py'; \
-    \
-    python get-pip.py \
-        --disable-pip-version-check \
-        --no-cache-dir \
-        "pip==$PYTHON_PIP_VERSION" \
-    ; \
-    pip --version; \
-    \
-    find /usr/local -depth \
-        \( \
-            \( -type d -a \( -name test -o -name tests \) \) \
-            -o \
-            \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \
-        \) -exec rm -rf '{}' +; \
-    rm -f get-pip.py
-
+      org.metabrainz.based-on-image="metabrainz/python:$PYTHON_BASE_IMAGE_VERSION"
 
 RUN apt-get update \
     && apt-get install -y --no-install-recommends \
         scala \
         wget \
         net-tools \
         dnsutils \
@@ -152,36 +20,39 @@ RUN apt-get update \
         zip \
     && rm -rf /var/lib/apt/lists/*
 
-RUN pip3 install pip==21.0.1
 
+COPY requirements_spark.txt /requirements_spark.txt
+RUN pip3 install -r /requirements_spark.txt
 
 ENV DOCKERIZE_VERSION v0.6.1
 RUN wget https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz \
     && tar -C /usr/local/bin -xzvf dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz \
     && rm dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz
 
-COPY docker/apache-download.sh /apache-download.sh
-ENV SPARK_VERSION 2.4.1
-ENV HADOOP_VERSION 2.7
-RUN cd /usr/local && \
-    /apache-download.sh spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION.tgz && \
-    tar xzf spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION.tgz && \
-    ln -s spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION spark
 
-RUN mkdir /rec
-WORKDIR /rec
-COPY requirements_spark.txt /rec/requirements_spark.txt
-RUN pip3 install -r requirements_spark.txt
+WORKDIR /usr/local
 
-FROM metabrainz-spark-base as metabrainz-spark-master
-CMD /usr/local/spark/sbin/start-master.sh
+ENV JAVA_VERSION 11.0.11
+ENV JAVA_BUILD_VERSION 9
+RUN wget https://github.com/AdoptOpenJDK/openjdk11-binaries/releases/download/jdk-${JAVA_VERSION}%2B${JAVA_BUILD_VERSION}/OpenJDK11U-jdk_x64_linux_hotspot_${JAVA_VERSION}_${JAVA_BUILD_VERSION}.tar.gz \
+    && tar xzf OpenJDK11U-jdk_x64_linux_hotspot_${JAVA_VERSION}_${JAVA_BUILD_VERSION}.tar.gz
+ENV JAVA_HOME /usr/local/jdk-${JAVA_VERSION}+${JAVA_BUILD_VERSION}
+ENV PATH $JAVA_HOME/bin:$PATH
 
-FROM metabrainz-spark-base as metabrainz-spark-worker
-CMD dockerize -wait tcp://spark-master:7077 -timeout 9999s /usr/local/spark/sbin/start-slave.sh spark://spark-master:7077
+COPY docker/apache-download.sh /apache-download.sh
+ENV SPARK_VERSION 3.1.1
+ENV HADOOP_VERSION 3.2
+RUN /apache-download.sh spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION.tgz \
+    && tar xzf spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION.tgz
+ENV SPARK_HOME /usr/local/spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION
+ENV PATH $SPARK_HOME/bin:$PATH
+ENV PYTHONPATH $SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$SPARK_HOME/python:$PYTHONPATH
 
-FROM metabrainz-spark-base as metabrainz-spark-jobs
-COPY . /rec
+COPY requirements_development.txt /requirements_development.txt
+RUN pip3 install -r /requirements_development.txt
 
-FROM metabrainz-spark-base as metabrainz-spark-dev
-COPY . /rec
+ARG GIT_COMMIT_SHA
+LABEL org.label-schema.vcs-ref=$GIT_COMMIT_SHA
 
-FROM metabrainz-spark-base as metabrainz-spark-request-consumer
+WORKDIR /rec
 COPY . /rec
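
Taken together, the rewrite replaces the hand-built Python 3.6 toolchain on the old oraclejdk/jessie base with the metabrainz/python image, installs OpenJDK 11 and Spark 3.1.1 directly, collapses the master/worker/jobs/dev/request-consumer stages into a single image, and puts Spark's bundled py4j and python libraries on PYTHONPATH, which is what allows the pyspark pip dependency to be dropped (commits 2b892e5, 910ed62, ca5a5bf, 7b7c160). A minimal smoke test of that import path, assuming the ENV values set above:

# Sketch only: checks that pyspark resolves from Spark's own distribution
# (via the PYTHONPATH set in the Dockerfile) rather than from a pip package.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[1]")
    .appName("image-smoke-test")
    .getOrCreate()
)
print(spark.version)  # expected to print 3.1.1 inside this image
spark.stop()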
Dockerfile.spark.newcluster (deleted, 22 lines)

docker/create-cluster.py (deleted, 44 lines)

docker/docker-compose.spark.test.yml (3 additions, 2 deletions)
@@ -21,7 +21,8 @@ services:
     build:
       context: ..
       dockerfile: Dockerfile.spark
-      target: metabrainz-spark-dev
-    command: dockerize -wait tcp://hadoop-master:9000 -timeout 60s bash -c "PYTHONDONTWRITEBYTECODE=1 python -m pytest -c pytest.spark.ini --junitxml=/data/test_report.xml --cov-report xml:/data/coverage.xml"
+      args:
+        GIT_COMMIT_SHA: HEAD
+    command: dockerize -wait tcp://hadoop-master:9000 -timeout 60s bash -c "cp listenbrainz_spark/config.py.sample listenbrainz_spark/config.py; PYTHONDONTWRITEBYTECODE=1 python -m pytest -c pytest.spark.ini"
     volumes:
       - ..:/rec:z
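
Rather than baking configuration into the now-removed metabrainz-spark-dev stage, the test service builds the single-stage image (passing GIT_COMMIT_SHA through as a build arg) and copies listenbrainz_spark/config.py.sample into place when the container starts; per commit 2fb6eb9, sentry is left unconfigured in tests.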
docker/push-jobs-image.sh (deleted, 6 lines)

docker/push-master.sh (deleted, 6 lines)

docker/push-request-consumer.sh (deleted, 6 lines)

docker/push-worker.sh (deleted, 5 lines)

docker/setup-master-node.sh (deleted, 12 lines)

docker/setup-worker-node.sh (deleted, 44 lines)