histogrammar update for spark 3.0 (#87)
mbaak committed Jan 30, 2021
1 parent 6e8d8fb commit 4a59d17
Showing 8 changed files with 24 additions and 7 deletions.
17 changes: 17 additions & 0 deletions README.rst
@@ -19,11 +19,28 @@ using monitoring business rules.

|example|

Announcements
=============

Spark 3.0
---------

With Spark 3.0, based on Scala 2.12, make sure to pick up the correct `histogrammar` jar file:

.. code-block:: python

  spark = SparkSession.builder.config("spark.jars.packages", "io.github.histogrammar:histogrammar-sparksql_2.12:1.0.11").getOrCreate()

For Spark 2.X compiled against scala 2.11, in the string above simply replace 2.12 with 2.11.

`January 29, 2021`

Documentation
=============

The entire `popmon` documentation including tutorials can be found at `read-the-docs <https://popmon.readthedocs.io>`_.


Examples
========

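As a reading aid for the README announcement in this commit: the rule it states (the Scala 2.12 artifact for Spark 3.0, the Scala 2.11 artifact for Spark 2.X) can be sketched as a small helper. The names `histogrammar_package` and `make_session` are hypothetical and not part of popmon or histogrammar. The mapping from Spark major version to Scala version is an assumption taken from the announcement text; a Spark build compiled against a different Scala version would need the artifact for that Scala version instead.

```python
def histogrammar_package(spark_version: str, hg_version: str = "1.0.11") -> str:
    """Build the Maven coordinate for the histogrammar-sparksql jar.

    Hypothetical helper. Assumes Spark 3.x -> Scala 2.12 and
    Spark 2.x -> Scala 2.11, as stated in the README announcement.
    """
    major = int(spark_version.split(".")[0])
    if major == 3:
        scala = "2.12"
    elif major == 2:
        scala = "2.11"
    else:
        raise ValueError(f"no known Scala version for Spark {spark_version}")
    return f"io.github.histogrammar:histogrammar-sparksql_{scala}:{hg_version}"


def make_session():
    """Create a session with the matching jar (requires pyspark to be installed)."""
    import pyspark
    from pyspark.sql import SparkSession

    package = histogrammar_package(pyspark.__version__)
    return SparkSession.builder.config("spark.jars.packages", package).getOrCreate()


# The pure string helper can be checked without Spark installed.
print(histogrammar_package("3.0.1"))
print(histogrammar_package("2.4.7"))
```

Keeping the coordinate construction separate from session creation means the version logic can be tested without starting a JVM.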
8 changes: 4 additions & 4 deletions docs/source/configuration.rst
@@ -198,7 +198,7 @@ Spark usage
from pyspark.sql import SparkSession
# downloads histogrammar jar files if not already installed, used for histogramming of spark dataframe
- spark = SparkSession.builder.config('spark.jars.packages','org.diana-hep:histogrammar-sparksql_2.11:1.0.4').getOrCreate()
+ spark = SparkSession.builder.config('spark.jars.packages','io.github.histogrammar:histogrammar-sparksql_2.12:1.0.11').getOrCreate()
# load a dataframe
spark_df = spark.read.format('csv').options(header='true').load('file.csv')
@@ -216,8 +216,8 @@ This snippet contains the instructions for setting up a minimal environment for
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q https://www-us.apache.org/dist/spark/spark-2.4.7/spark-2.4.7-bin-hadoop2.7.tgz
!tar xf spark-2.4.7-bin-hadoop2.7.tgz
- !wget -P /content/spark-2.4.7-bin-hadoop2.7/jars/ -q https://repo1.maven.org/maven2/org/diana-hep/histogrammar-sparksql_2.11/1.0.4/histogrammar-sparksql_2.11-1.0.4.jar
- !wget -P /content/spark-2.4.7-bin-hadoop2.7/jars/ -q https://repo1.maven.org/maven2/org/diana-hep/histogrammar_2.11/1.0.4/histogrammar_2.11-1.0.4.jar
+ !wget -P /content/spark-2.4.7-bin-hadoop2.7/jars/ -q https://repo1.maven.org/maven2/io/github/histogrammar/histogrammar-sparksql_2.12/1.0.11/histogrammar-sparksql_2.12-1.0.11.jar
+ !wget -P /content/spark-2.4.7-bin-hadoop2.7/jars/ -q https://repo1.maven.org/maven2/io/github/histogrammar/histogrammar_2.12/1.0.11/histogrammar_2.12-1.0.11.jar
!pip install -q findspark popmon
Now that spark is installed, restart the runtime.
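
The `wget` commands in this hunk follow the standard Maven Central layout, in which the dots of the group ID become directory separators. A minimal sketch of that mapping follows; `maven_jar_url` is a hypothetical helper name, not something shipped with popmon, and the base URL is the one already used in the commands above.

```python
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_jar_url(group: str, artifact: str, version: str) -> str:
    """Return the Maven Central download URL for a jar (hypothetical helper)."""
    # Dots in the group ID map to path segments: io.github.x -> io/github/x
    group_path = group.replace(".", "/")
    return f"{MAVEN_CENTRAL}/{group_path}/{artifact}/{version}/{artifact}-{version}.jar"


# The two jars referenced by the updated commands.
for artifact in ("histogrammar-sparksql_2.12", "histogrammar_2.12"):
    print(maven_jar_url("io.github.histogrammar", artifact, "1.0.11"))
```

Deriving the URL from the coordinate keeps the group, Scala suffix, and version in one place, which is where this commit had to touch each command by hand.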
@@ -234,7 +234,7 @@ Now that spark is installed, restart the runtime.
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]") \
- .config("spark.jars", "/content/jars/histogrammar_2.11-1.0.4.jar,/content/jars/histogrammar-sparksql_2.11-1.0.4.jar") \
+ .config("spark.jars", "/content/jars/histogrammar_2.12-1.0.11.jar,/content/jars/histogrammar-sparksql_2.12-1.0.11.jar") \
.config("spark.sql.execution.arrow.enabled", "false") \
.config("spark.sql.session.timeZone", "GMT") \
.getOrCreate()
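
In the lines shown in this diff, the download step writes the jars to `/content/spark-2.4.7-bin-hadoop2.7/jars/`, while the `spark.jars` value above points at `/content/jars/`. A check along the following lines could surface such a mismatch before Spark starts. This is a sketch only: `spark_jars_value` and its `check_exists` parameter are hypothetical names, and the only Spark behaviour assumed is that `spark.jars` takes a comma-separated list of jar paths.

```python
import os
from pathlib import PurePosixPath


def spark_jars_value(jar_dir, jar_names, check_exists=True):
    """Build a comma-separated `spark.jars` value (hypothetical helper).

    With check_exists=True, raise if any jar is missing on disk, so a
    wrong directory fails loudly here rather than later inside Spark.
    """
    paths = [str(PurePosixPath(jar_dir) / name) for name in jar_names]
    if check_exists:
        missing = [p for p in paths if not os.path.isfile(p)]
        if missing:
            raise FileNotFoundError("missing jar(s): " + ", ".join(missing))
    return ",".join(paths)


# check_exists=False only so that this example runs without the jars present.
value = spark_jars_value(
    "/content/jars",
    ["histogrammar_2.12-1.0.11.jar", "histogrammar-sparksql_2.12-1.0.11.jar"],
    check_exists=False,
)
print(value)
```

The resulting string can be passed as `.config("spark.jars", value)` in the builder chain shown above.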
2 changes: 1 addition & 1 deletion popmon/notebooks/popmon_tutorial_advanced.ipynb
@@ -162,7 +162,7 @@
"source": [
"if pyspark_installed:\n",
" spark = SparkSession.builder.config(\n",
- " \"spark.jars.packages\", \"org.diana-hep:histogrammar-sparksql_2.11:1.0.4\"\n",
+ " \"spark.jars.packages\", \"io.github.histogrammar:histogrammar-sparksql_2.12:1.0.11\"\n",
" ).getOrCreate()\n",
"\n",
" sdf = spark.createDataFrame(df)\n",
Binary file not shown.
Binary file not shown.
4 changes: 2 additions & 2 deletions tests/popmon/hist/test_spark_histogrammar.py
@@ -21,8 +21,8 @@ def get_spark():

current_path = dirname(abspath(__file__))

- hist_spark_jar = join(current_path, "jars/histogrammar-sparksql_2.11-1.0.4.jar")
- hist_jar = join(current_path, "jars/histogrammar_2.11-1.0.4.jar")
+ hist_spark_jar = join(current_path, "jars/histogrammar-sparksql_2.11-1.0.11.jar")
+ hist_jar = join(current_path, "jars/histogrammar_2.11-1.0.11.jar")

spark = (
SparkSession.builder.master("local")