![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/scala213/converting_models_from_212.ipynb)

# Converting Spark NLP Scala 2.12 models to Scala 2.13

Most models work out of the box when loaded across Scala versions. However, models of the following annotators require some manual steps:

1. `DependencyParserModel`
2. `TextMatcherModel`

This notebook will guide you step-by-step on how to convert models saved in Scala 2.12 to Scala 2.13.

## Install and Start Spark NLP Scala 2.12

- This feature was introduced in Spark NLP 6.3.2, so make sure you have at least this version.
- Let's install and set up Spark NLP (if you are running this in Google Colab).
- This part is straightforward with our setup script.

In [None]:
! wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash

In [None]:
import sparknlp

spark = sparknlp.start()
print(sparknlp.version())

6.3.2
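Since the conversion depends on the installed version, it can help to check it programmatically rather than by eye. The following is a minimal sketch; the helper names `parse_version` and `supports_scala_213_export` are hypothetical and not part of Spark NLP, and the check assumes the version string starts with three dot-separated numbers.

```python
import re

MIN_VERSION = (6, 3, 2)  # minimum Spark NLP version stated above


def parse_version(version: str) -> tuple:
    """Extract the leading numeric components, e.g. '6.3.2' -> (6, 3, 2)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports_scala_213_export(version: str) -> bool:
    """Return True if the given version is at least MIN_VERSION."""
    return parse_version(version) >= MIN_VERSION


print(supports_scala_213_export("6.3.2"))
```

In the notebook, the same check could be run as `supports_scala_213_export(sparknlp.version())`. Comparing tuples of integers avoids the pitfall of comparing version strings lexicographically, where `"6.10.0"` would sort before `"6.3.2"`.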


We will construct an example pipeline with a `DependencyParserModel` and a `TextMatcherModel`. When we save this pipeline with Spark NLP >= 6.3.2, it is written in a format that is compatible with Scala 2.13.

In [None]:
# TextMatcherModel entities
! echo "Dependencies" > entities.txt

In [None]:
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentenceDetector = (
    SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")
)
tokenizer = Tokenizer().setInputCols(["sentence"]).setOutputCol("token")
posTagger = (
    PerceptronModel.pretrained().setInputCols(["token", "sentence"]).setOutputCol("pos")
)
dependencyParser = (
    DependencyParserModel.pretrained()
    .setInputCols(["sentence", "pos", "token"])
    .setOutputCol("dependency")
)
typedDependencyParser = (
    TypedDependencyParserModel.pretrained()
    .setInputCols(["token", "pos", "dependency"])
    .setOutputCol("labdep")
)

textMatcher = (
    TextMatcher()
    .setInputCols(["sentence", "token"])
    .setEntities("entities.txt", ReadAs.TEXT)
    .setOutputCol("entity")
    .setCaseSensitive(False)
)

pipeline = Pipeline(
    stages=[
        documentAssembler,
        sentenceDetector,
        tokenizer,
        posTagger,
        dependencyParser,
        typedDependencyParser,
        textMatcher,
    ]
)
df = spark.createDataFrame(
    [["Dependencies represents relationships betweens words in a Sentence"]], ["text"]
)

result = pipeline.fit(df).transform(df)
result.select("dependency.result", "entity.result").show(truncate=False)

pos_anc download started this may take some time.
Approximate size to download 3.9 MB
[OK!]
dependency_conllu download started this may take some time.
Approximate size to download 16.7 MB
[OK!]
dependency_typed_conllu download started this may take some time.
Approximate size to download 2.4 MB
[OK!]
+---------------------------------------------------------------------------------+--------------+
|result                                                                           |result        |
+---------------------------------------------------------------------------------+--------------+
|[ROOT, Dependencies, represents, words, relationships, Sentence, Sentence, words]|[Dependencies]|
+---------------------------------------------------------------------------------+--------------+



In [None]:
# Save the pipeline
pipeline.write().overwrite().save("pipe_212")
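To see which stages of a saved pipeline belong to the two annotators listed at the top, one option is to read the metadata that Spark ML writes next to each stage. The sketch below assumes the usual on-disk layout of a saved pipeline (a `stages/` directory with one sub-directory per stage, each holding a `metadata/part-*` file whose first line is a JSON object with a `class` field). The helper names `stage_classes` and `affected_stages` are hypothetical.

```python
import json
from pathlib import Path

# Simple class names of the annotators that need the converted format
NEEDS_CONVERSION = {"DependencyParserModel", "TextMatcherModel"}


def stage_classes(pipeline_path):
    """Return the class name recorded in each stage's metadata, in stage order."""
    classes = []
    stage_dirs = (p for p in Path(pipeline_path, "stages").iterdir() if p.is_dir())
    for stage_dir in sorted(stage_dirs):
        for part in sorted(stage_dir.glob("metadata/part-*")):
            text = part.read_text().strip()
            if text:
                # Assumption: the first line of the part file is the JSON metadata
                classes.append(json.loads(text.splitlines()[0])["class"])
                break
    return classes


def affected_stages(pipeline_path):
    """Return the fully qualified classes whose simple name is in NEEDS_CONVERSION."""
    return [
        name
        for name in stage_classes(pipeline_path)
        if name.rsplit(".", 1)[-1] in NEEDS_CONVERSION
    ]
```

Calling `affected_stages("pipe_212")` would then list the stages to pay attention to. Matching on the simple class name keeps the sketch independent of the exact package each annotator lives in.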

## Loading the pipeline in Scala 2.13
The pipeline is now saved in a compatible format. Now we can set up a PySpark instance with Scala 2.13. 

Note that the session needs to be restarted and that we will need to reinstall PySpark. 

In [None]:
# Download official spark archive and extract
! wget https://archive.apache.org/dist/spark/spark-3.5.8/spark-3.5.8-bin-hadoop3-scala2.13.tgz
! tar xzf spark-3.5.8-bin-hadoop3-scala2.13.tgz

In [None]:
# Install Scala 2.13 PySpark
! pip uninstall -y pyspark
! pip install -e /content/spark-3.5.8-bin-hadoop3-scala2.13/python

Found existing installation: pyspark 3.4.4
Uninstalling pyspark-3.4.4:
  Successfully uninstalled pyspark-3.4.4
Obtaining file:///content/spark-3.5.8-bin-hadoop3-scala2.13/python
  Preparing metadata (setup.py) ... done
Installing collected packages: pyspark
  Running setup.py develop for pyspark
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dataproc-spark-connect 1.0.1 requires pyspark[connect]~=4.0.0, but you have pyspark 3.5.8 which is incompatible.
Successfully installed pyspark-3.5.8


We need to start a custom Spark session that points to the right `spark-nlp` dependency. To do so, we replace the artifact suffix `_2.12` with `_2.13`, so the coordinates become `com.johnsnowlabs.nlp:spark-nlp_2.13:6.3.2`.
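The suffix swap can also be done programmatically, which is handy when the coordinate is kept in a configuration value. This is a minimal sketch that assumes a coordinate of the form `group:artifact_<scala>:version`; the function name `with_scala_binary_version` is hypothetical.

```python
def with_scala_binary_version(coordinate: str, scala_version: str = "2.13") -> str:
    """Swap the Scala suffix of a 'group:artifact_<scala>:version' coordinate."""
    group, artifact, version = coordinate.split(":")
    # Split on the last underscore: 'spark-nlp_2.12' -> ('spark-nlp', '_', '2.12')
    base_name, _, _old_suffix = artifact.rpartition("_")
    if not base_name:
        raise ValueError(f"No Scala suffix found in artifact: {artifact!r}")
    return f"{group}:{base_name}_{scala_version}:{version}"


print(with_scala_binary_version("com.johnsnowlabs.nlp:spark-nlp_2.12:6.3.2"))
# com.johnsnowlabs.nlp:spark-nlp_2.13:6.3.2
```

The result can be passed as the value of `spark.jars.packages` when building the session.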

In [None]:
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("Spark NLP")
    .master("local[*]")
    .config("spark.driver.memory", "16G")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryoserializer.buffer.max", "2000M")
    .config("spark.driver.maxResultSize", "0")
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.13:6.3.2")
    .getOrCreate()
)  # note the artifact name ends with _2.13

spark

In [None]:
# Check the Scala version
spark.sparkContext._jvm.scala.util.Properties.versionString()

'version 2.13.8'
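To guard the following cells against accidentally running on the wrong session, the version string above can be reduced to its binary version and checked. A small sketch, assuming the string has the form `version <major>.<minor>.<patch>` as in the output above; `scala_binary_version` is a hypothetical helper name.

```python
def scala_binary_version(version_string: str) -> str:
    """Reduce a string such as 'version 2.13.8' to the binary version '2.13'."""
    number = version_string.split()[-1]  # drop the leading 'version' word
    major, minor = number.split(".")[:2]
    return f"{major}.{minor}"


print(scala_binary_version("version 2.13.8"))
# 2.13
```

In the notebook, an `assert scala_binary_version(...) == "2.13"` on the value returned by the cell above would stop execution early if the old session is still active.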

In [None]:
pipeline = Pipeline.load("pipe_212")

In [None]:
df = spark.createDataFrame(
    [["Dependencies represents relationships betweens words in a Sentence"]], ["text"]
)

result = pipeline.fit(df).transform(df)
result.select("dependency.result", "entity.result").show(truncate=False)

+---------------------------------------------------------------------------------+--------------+
|result                                                                           |result        |
+---------------------------------------------------------------------------------+--------------+
|[ROOT, Dependencies, represents, words, relationships, Sentence, Sentence, words]|[Dependencies]|
+---------------------------------------------------------------------------------+--------------+
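The output matches what the Scala 2.12 session produced. Because the two sessions cannot run at the same time, one way to automate this comparison is to write the collected results to a file in the first session and compare them in the second. The sketch below uses only the standard library; the helper names and the file name `expected.json` are hypothetical, and the sample rows are the values shown in the table above.

```python
import json
from pathlib import Path


def to_plain_lists(rows):
    """Convert collected rows (tuples of lists) into nested plain lists."""
    return [[list(column) for column in row] for row in rows]


def save_results(rows, path):
    """Store the rows as JSON so that a later session can compare against them."""
    Path(path).write_text(json.dumps(to_plain_lists(rows)))


def results_match(rows, path):
    """Return True if the rows equal the ones stored at the given path."""
    return json.loads(Path(path).read_text()) == to_plain_lists(rows)


# Sample rows taken from the table above: (dependency.result, entity.result)
rows = [
    (
        ["ROOT", "Dependencies", "represents", "words",
         "relationships", "Sentence", "Sentence", "words"],
        ["Dependencies"],
    )
]
save_results(rows, "expected.json")
print(results_match(rows, "expected.json"))
```

In the notebook, `rows` would come from `result.select("dependency.result", "entity.result").collect()`, with `save_results` called in the Scala 2.12 session and `results_match` in the Scala 2.13 session.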



Now you have successfully exported a Scala 2.12 pipeline to Scala 2.13! You can upload it to the [Models Hub](https://nlp.johnsnowlabs.com/models) or use it directly on cloud platforms such as Databricks or Dataproc.