
Add support for other Spark NLP models #24

Open
2 of 6 tasks
tschaffter opened this issue Aug 2, 2021 · 10 comments
Assignees
Labels
Enhancement New feature or request

Comments

@tschaffter
Member

tschaffter commented Aug 2, 2021

The zip archive of each model can be downloaded with the Download button on the model's page.

All the PHI models are listed here and they require a license:
https://nlp.johnsnowlabs.com/models?task=Named+Entity+Recognition&language=en&q=phi

Licensed models

Outputs NER

Outputs Obfuscated/Document

Other

  • ner_deid_augmented_en_3.0.0_2.3_1617208449273

Model sizes

root@133bc94168b6:/opt/app/models# du -h --max-depth=1
16M     ./ner_deid_biobert_en_3.0.0_3.0_1617260631832
15M     ./ner_deidentify_dl_en_2.7.2_2.4_1612178436389
15M     ./ner_deid_sd_en_3.0.0_3.0_1617260827858
15M     ./ner_deid_synthetic_en_2.7.4_2.4_1613746244835
15M     ./ner_deid_large_en_2.5.3_2.4_1595427435246
15M     ./ner_deid_enriched_en_2.5.3_2.4_1594170530497
1.8G    ./embeddings_clinical_en_2.4.0_2.4_1580237286004
1.9G    .
@tschaffter tschaffter added the Enhancement New feature or request label Aug 2, 2021
@tschaffter tschaffter self-assigned this Aug 2, 2021
@tschaffter tschaffter moved this from Incoming to In progress in NLP Sandbox - Sprint 21.4 - Multi-Site Evaluation Aug 3, 2021
@tschaffter
Member Author

About model version 3.0.0_*

I tried three models with a version 3.0.0_*, and loading them always fails with the error java.lang.ClassNotFoundException: com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel. On the other hand, I loaded three models with a version 2.y.z_* and all of them work. I tried to load all models with NerDLModel.load().

According to this source about Spark NLP Healthcare version 3.0.0:

All the licensed clinical and biomedical pre-trained NER models will now run with MedicalNerModel instead of its parent NerDLModel from Spark NLP.
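To pick the right loader without trial and error, the version can be read from the model directory name. The sketch below assumes the naming pattern seen in the du listing above (<name>_<lang>_<Spark NLP version>_<Spark version>_<timestamp>); the helpers parse_model_dir and annotator_for are hypothetical, and the 3.x vs 2.x rule only encodes the observation above, not anything taken from the Spark NLP documentation.

```python
import re

# Assumed naming convention, inferred from the model directories listed above:
# <model name>_<lang>_<spark-nlp version>_<spark version>_<timestamp>
MODEL_DIR_RE = re.compile(
    r"^(?P<name>.+)_(?P<lang>[a-z]{2})_(?P<nlp>\d+\.\d+\.\d+)"
    r"_(?P<spark>\d+\.\d+)_(?P<ts>\d+)$"
)


def parse_model_dir(dirname: str) -> dict:
    """Split a downloaded model directory name into its components (hypothetical helper)."""
    match = MODEL_DIR_RE.match(dirname)
    if match is None:
        raise ValueError(f"unexpected model directory name: {dirname}")
    return match.groupdict()


def annotator_for(dirname: str) -> str:
    """Return the name of the annotator class to call .load() on.

    Encodes the observation in this thread: licensed 3.x NER models need
    MedicalNerModel (sparknlp_jsl), while 2.x models load with NerDLModel.
    """
    major = int(parse_model_dir(dirname)["nlp"].split(".")[0])
    return "MedicalNerModel" if major >= 3 else "NerDLModel"
```

For example, annotator_for("ner_deid_biobert_en_3.0.0_3.0_1617260631832") returns "MedicalNerModel", so that directory would presumably be loaded with MedicalNerModel.load(path) from sparknlp_jsl.annotator, while ner_deid_large_en_2.5.3_2.4_1595427435246 maps to NerDLModel.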

@tschaffter
Member Author

tschaffter commented Aug 3, 2021

Issues

I think that I can import MedicalNerModel with from sparknlp_jsl.annotator import MedicalNerModel. However, I then get the following error: 'JavaPackage' object is not callable.

What is the difference between:

  • from sparknlp.annotator
  • from sparknlp_jsl.annotator

Spark NLP for Healthcare is a commercial extension of Spark NLP for clinical and biomedical text mining. If you don’t have a Spark NLP for Healthcare subscription yet, you can ask for a free trial by clicking on the button below.

I definitely have the JSL Python library installed. Maybe my fat JAR is not the JSL one? Or is it because I don't provide the secret when I start my Spark session, as seen in other tutorials?

@tschaffter
Member Author

Java gateway process exited before sending its port number

I installed the JSL version of the fat JAR. Now the error is:

phi-annotator  | Exception in thread "main" java.net.UnknownHostException: pypi.johnsnowlabs.com
phi-annotator  |        at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220)
phi-annotator  |        at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
phi-annotator  |        at java.base/java.net.Socket.connect(Socket.java:609)
phi-annotator  |        at java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:299)
phi-annotator  |        at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
phi-annotator  |        at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)
phi-annotator  |        at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)
phi-annotator  |        at java.base/sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:266)
phi-annotator  |        at java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:373)
phi-annotator  |        at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:203)
phi-annotator  |        at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
phi-annotator  |        at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)
phi-annotator  |        at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:189)
phi-annotator  |        at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:168)
phi-annotator  |        at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:764)
phi-annotator  |        at org.apache.spark.deploy.DependencyUtils$.downloadFile(DependencyUtils.scala:138)
phi-annotator  |        at org.apache.spark.deploy.DependencyUtils$.$anonfun$downloadFileList$2(DependencyUtils.scala:104)
phi-annotator  |        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
phi-annotator  |        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
phi-annotator  |        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
phi-annotator  |        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
phi-annotator  |        at scala.collection.TraversableLike.map(TraversableLike.scala:238)
phi-annotator  |        at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
phi-annotator  |        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
phi-annotator  |        at org.apache.spark.deploy.DependencyUtils$.downloadFileList(DependencyUtils.scala:104)
phi-annotator  |        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$9(SparkSubmit.scala:379)
phi-annotator  |        at scala.Option.map(Option.scala:230)
phi-annotator  |        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:379)
phi-annotator  |        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
phi-annotator  |        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
phi-annotator  |        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
phi-annotator  |        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
phi-annotator  |        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
phi-annotator  |        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
phi-annotator  |        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
phi-annotator  | Java gateway process exited before sending its port number

According to several comments in this thread, Java 10 and 11 have this issue but Java 8 does not. Java 8 is also the version "needed" according to Spark NLP, although so far I had been fine using Java 11.
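Since the behaviour depends on the Java major version, it can help to check it programmatically inside the container. A minimal sketch; java_major_version is a hypothetical helper that handles both the legacy numbering (e.g. openjdk version "1.8.0_292") and the "11.0.x" style used since Java 9.

```python
import re


def java_major_version(banner: str) -> int:
    """Extract the Java major version from `java -version` output (hypothetical helper).

    Legacy scheme: "1.8.0_292" means Java 8.
    Newer scheme (Java 9+): "11.0.12" means Java 11.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        raise ValueError("no version string found in banner")
    first = int(match.group(1))
    # In the legacy "1.x" scheme the second component is the major version.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first
```

java -version prints its banner to stderr, so the input can be obtained with subprocess.run(["java", "-version"], capture_output=True, text=True).stderr.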

@tschaffter
Member Author

I now have Java 8 installed.

$ docker exec -it phi-annotator bash
root@77612cb11cc7:/# java -version
Picked up JAVA_TOOL_OPTIONS: -Xmx4G -Dorg.bytedeco.javacpp.maxBytes=0
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_292-b10)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.292-b10, mixed mode)

The above error disappeared, and I'm now back to the 'JavaPackage' object is not callable error.

@tschaffter
Member Author

I notice that this page also specifies the option spark.jars.packages. The value uses two version numbers:

  • 2.12, which is the Scala version the package is built for (as in the artifact name spark-nlp_2.12); all releases are listed here.
  • 3.1.3, the JSL version, which is 3.1.1 for me.
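The package value follows the Maven coordinate form groupId:artifactId:version, where the artifact name carries the Scala suffix. A minimal sketch; spark_nlp_package is a hypothetical helper, and the coordinate layout is taken from the module name com.johnsnowlabs.nlp#spark-nlp_2.12;3.1.1 that appears in the Ivy log later in this thread.

```python
def spark_nlp_package(scala_version: str, spark_nlp_version: str) -> str:
    """Build the Maven coordinate used as a spark.jars.packages entry (hypothetical helper).

    Layout: groupId:artifactId_<scala binary version>:<version>
    """
    return f"com.johnsnowlabs.nlp:spark-nlp_{scala_version}:{spark_nlp_version}"
```

The result would then be passed with SparkSession.builder.config("spark.jars.packages", spark_nlp_package("2.12", "3.1.1")). This only covers the public Spark NLP artifact; how the licensed JSL jar is supplied is not shown here.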

@tschaffter
Copy link
Member Author

This was the solution. Moving on, here is the new error:

phi-annotator  | An error occurred while calling None.com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel.
phi-annotator  | : java.lang.ExceptionInInitializerError
phi-annotator  |        at com.johnsnowlabs.license.Licensed.$init$(Licensed.scala:5)
phi-annotator  |        at com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel.<init>(MedicalNerModel.scala:18)
phi-annotator  |        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
phi-annotator  |        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
phi-annotator  |        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
phi-annotator  |        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
phi-annotator  |        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
phi-annotator  |        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
phi-annotator  |        at py4j.Gateway.invoke(Gateway.java:238)
phi-annotator  |        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
phi-annotator  |        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
phi-annotator  |        at py4j.GatewayConnection.run(GatewayConnection.java:238)
phi-annotator  |        at java.lang.Thread.run(Thread.java:748)
phi-annotator  | Caused by: java.lang.IllegalArgumentException: requirement failed: License Key not set please set environment variable JSL_NLP_LICENSE,SPARK_NLP_LICENSE or property jsl.settings.license!
phi-annotator  |        at scala.Predef$.require(Predef.scala:281)
phi-annotator  |        at com.johnsnowlabs.license.LicenseValidator$.<init>(LicenseValidator.scala:29)
phi-annotator  |        at com.johnsnowlabs.license.LicenseValidator$.<clinit>(LicenseValidator.scala)
phi-annotator  |        ... 13 more
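Before starting the session, checking which license variable is actually visible to the process can rule out this error. A sketch that assumes only the two environment variable names from the message above (the jsl.settings.license property is not checked); find_license_var is hypothetical, and its lookup order is arbitrary rather than the library's precedence.

```python
import os
from typing import Mapping, Optional

# Names taken from the error message above; the Java property
# jsl.settings.license is also accepted by the library but not checked here.
LICENSE_ENV_VARS = ("JSL_NLP_LICENSE", "SPARK_NLP_LICENSE")


def find_license_var(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the first non-empty license variable, or None (hypothetical helper)."""
    for name in LICENSE_ENV_VARS:
        if env.get(name, "").strip():
            return name
    return None
```

With the original setup, only SPARK_LICENSE_SECRET was set, so find_license_var() would have returned None.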

@tschaffter
Member Author

Following the hint provided by the above error, I renamed the environment variable SPARK_LICENSE_SECRET to SPARK_NLP_LICENSE. The new error is:

phi-annotator  | 21/08/04 16:47:34 ERROR LicenseValidator$: Wrong symbol in license key: ***********************************.
phi-annotator  | java.lang.IllegalArgumentException: Input byte[] should at least have 2 bytes for base64 bytes
phi-annotator  |        at java.util.Base64$Decoder.outLength(Base64.java:659)
phi-annotator  |        at java.util.Base64$Decoder.decode(Base64.java:525)
phi-annotator  |        at pdi.jwt.JwtBase64$.decode(JwtBase64.scala:9)
phi-annotator  |        at pdi.jwt.JwtBase64$.decodeString(JwtBase64.scala:17)
phi-annotator  |        at pdi.jwt.JwtBase64$.decodeString(JwtBase64.scala:20)
phi-annotator  |        at pdi.jwt.JwtCore.splitToken(Jwt.scala:193)
phi-annotator  |        at pdi.jwt.JwtCore.$anonfun$decodeAll$5(Jwt.scala:451)
phi-annotator  |        at scala.util.Try$.apply(Try.scala:213)
phi-annotator  |        at pdi.jwt.JwtCore.decodeAll(Jwt.scala:450)
phi-annotator  |        at pdi.jwt.JwtCore.decodeAll$(Jwt.scala:450)
phi-annotator  |        at pdi.jwt.Jwt$.decodeAll(JwtPureScala.scala:19)
phi-annotator  |        at pdi.jwt.JwtCore.decode(Jwt.scala:538)
phi-annotator  |        at pdi.jwt.JwtCore.decode$(Jwt.scala:537)
phi-annotator  |        at pdi.jwt.Jwt$.decode(JwtPureScala.scala:19)
phi-annotator  |        at pdi.jwt.JwtCore.decode(Jwt.scala:541)
phi-annotator  |        at pdi.jwt.JwtCore.decode$(Jwt.scala:540)
phi-annotator  |        at pdi.jwt.Jwt$.decode(JwtPureScala.scala:19)
phi-annotator  |        at com.johnsnowlabs.license.LicenseValidator$.$anonfun$isValidLicence$2(LicenseValidator.scala:33)
phi-annotator  |        at scala.util.Try$.apply(Try.scala:213)
phi-annotator  |        at com.johnsnowlabs.license.LicenseValidator$.<init>(LicenseValidator.scala:35)
phi-annotator  |        at com.johnsnowlabs.license.LicenseValidator$.<clinit>(LicenseValidator.scala)
phi-annotator  |        at com.johnsnowlabs.license.Licensed.$init$(Licensed.scala:5)
phi-annotator  |        at com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel.<init>(MedicalNerModel.scala:18)
phi-annotator  |        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
phi-annotator  |        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
phi-annotator  |        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
phi-annotator  |        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
phi-annotator  |        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
phi-annotator  |        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
phi-annotator  |        at py4j.Gateway.invoke(Gateway.java:238)
phi-annotator  |        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
phi-annotator  |        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
phi-annotator  |        at py4j.GatewayConnection.run(GatewayConnection.java:238)
phi-annotator  |        at java.lang.Thread.run(Thread.java:748)
phi-annotator  | An error occurred while calling None.com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel.
phi-annotator  | : java.lang.Exception: Licence not configured or invalid licence key! Please contact support!!
phi-annotator  |        at com.johnsnowlabs.license.Licensed.$init$(Licensed.scala:6)
phi-annotator  |        at com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel.<init>(MedicalNerModel.scala:18)
phi-annotator  |        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
phi-annotator  |        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
phi-annotator  |        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
phi-annotator  |        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
phi-annotator  |        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
phi-annotator  |        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
phi-annotator  |        at py4j.Gateway.invoke(Gateway.java:238)
phi-annotator  |        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
phi-annotator  |        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
phi-annotator  |        at py4j.GatewayConnection.run(GatewayConnection.java:238)
phi-annotator  |        at java.lang.Thread.run(Thread.java:748)

@tschaffter
Member Author

I tried removing the version token from SPARK_NLP_LICENSE, but this token is actually meant to remain part of the license, as indicated by the new error below:

phi-annotator  | 21/08/04 16:59:47 ERROR LicenseValidator$: Wrong length of license key: ***********.
phi-annotator  | pdi.jwt.exceptions.JwtLengthException: Expected token [***********.] to be composed of 2 or 3 parts separated by dots.
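Both license errors come from JWT parsing (pdi.jwt), which expects 2 or 3 dot-separated, base64url-encoded parts. A rough local pre-check, assuming the key is a standard JWT; check_license_shape is hypothetical and only validates the shape, not the signature or whether the license itself is valid.

```python
import re
from typing import List

# Unpadded base64url alphabet used by JWT segments.
B64URL = re.compile(r"[A-Za-z0-9_-]+")


def check_license_shape(key: str) -> List[str]:
    """Return a list of shape problems with a JWT-like key; empty if none found.

    Mirrors the two failures in this thread: the key must have 2 or 3
    dot-separated parts, and the header/payload must be base64url text.
    """
    parts = key.split(".")
    if len(parts) not in (2, 3):
        return [f"expected 2 or 3 dot-separated parts, got {len(parts)}"]
    problems = []
    for index, part in enumerate(parts[:2]):
        if not B64URL.fullmatch(part):
            problems.append(f"part {index} is empty or has non-base64url characters")
        elif len(part) % 4 == 1:
            # No valid base64 encoding has a length of 4n + 1.
            problems.append(f"part {index} has an impossible base64 length ({len(part)})")
    return problems
```

An empty list means the key at least has a plausible shape; stray quotes or whitespace copied into the environment variable would be reported as non-base64url characters.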

@tschaffter
Member Author

Meanwhile, I blocked internet access for the container again and got this error:

phi-annotator  | :::: WARNINGS
phi-annotator  |        Host repo1.maven.org not found. url=https://repo1.maven.org/maven2/com/johnsnowlabs/nlp/spark-nlp_2.12/3.1.1/spark-nlp_2.12-3.1.1.pom
phi-annotator  | 
phi-annotator  |        Host repo1.maven.org not found. url=https://repo1.maven.org/maven2/com/johnsnowlabs/nlp/spark-nlp_2.12/3.1.1/spark-nlp_2.12-3.1.1.jar
phi-annotator  | 
phi-annotator  |        Host dl.bintray.com not found. url=https://dl.bintray.com/spark-packages/maven/com/johnsnowlabs/nlp/spark-nlp_2.12/3.1.1/spark-nlp_2.12-3.1.1.pom
phi-annotator  | 
phi-annotator  |        Host dl.bintray.com not found. url=https://dl.bintray.com/spark-packages/maven/com/johnsnowlabs/nlp/spark-nlp_2.12/3.1.1/spark-nlp_2.12-3.1.1.jar
phi-annotator  | 
phi-annotator  |                module not found: com.johnsnowlabs.nlp#spark-nlp_2.12;3.1.1
phi-annotator  | 
phi-annotator  |        ==== local-m2-cache: tried
phi-annotator  | 
phi-annotator  |          file:/home/nlp/.m2/repository/com/johnsnowlabs/nlp/spark-nlp_2.12/3.1.1/spark-nlp_2.12-3.1.1.pom
phi-annotator  | 
phi-annotator  |          -- artifact com.johnsnowlabs.nlp#spark-nlp_2.12;3.1.1!spark-nlp_2.12.jar:
phi-annotator  | 
phi-annotator  |          file:/home/nlp/.m2/repository/com/johnsnowlabs/nlp/spark-nlp_2.12/3.1.1/spark-nlp_2.12-3.1.1.jar
phi-annotator  | 
phi-annotator  |        ==== local-ivy-cache: tried
phi-annotator  | 
phi-annotator  |          /home/nlp/.ivy2/local/com.johnsnowlabs.nlp/spark-nlp_2.12/3.1.1/ivys/ivy.xml
phi-annotator  | 
phi-annotator  |          -- artifact com.johnsnowlabs.nlp#spark-nlp_2.12;3.1.1!spark-nlp_2.12.jar:
phi-annotator  | 
phi-annotator  |          /home/nlp/.ivy2/local/com.johnsnowlabs.nlp/spark-nlp_2.12/3.1.1/jars/spark-nlp_2.12.jar
phi-annotator  | 
phi-annotator  |        ==== central: tried
phi-annotator  | 
phi-annotator  |          https://repo1.maven.org/maven2/com/johnsnowlabs/nlp/spark-nlp_2.12/3.1.1/spark-nlp_2.12-3.1.1.pom
phi-annotator  | 
phi-annotator  |          -- artifact com.johnsnowlabs.nlp#spark-nlp_2.12;3.1.1!spark-nlp_2.12.jar:
phi-annotator  | 
phi-annotator  |          https://repo1.maven.org/maven2/com/johnsnowlabs/nlp/spark-nlp_2.12/3.1.1/spark-nlp_2.12-3.1.1.jar
phi-annotator  | 
phi-annotator  |        ==== spark-packages: tried
phi-annotator  | 
phi-annotator  |          https://dl.bintray.com/spark-packages/maven/com/johnsnowlabs/nlp/spark-nlp_2.12/3.1.1/spark-nlp_2.12-3.1.1.pom
phi-annotator  | 
phi-annotator  |          -- artifact com.johnsnowlabs.nlp#spark-nlp_2.12;3.1.1!spark-nlp_2.12.jar:
phi-annotator  | 
phi-annotator  |          https://dl.bintray.com/spark-packages/maven/com/johnsnowlabs/nlp/spark-nlp_2.12/3.1.1/spark-nlp_2.12-3.1.1.jar
phi-annotator  | 
phi-annotator  |                ::::::::::::::::::::::::::::::::::::::::::::::
phi-annotator  | 
phi-annotator  |                ::          UNRESOLVED DEPENDENCIES         ::
phi-annotator  | 
phi-annotator  |                ::::::::::::::::::::::::::::::::::::::::::::::
phi-annotator  | 
phi-annotator  |                :: com.johnsnowlabs.nlp#spark-nlp_2.12;3.1.1: not found
phi-annotator  | 
phi-annotator  |                ::::::::::::::::::::::::::::::::::::::::::::::

@tschaffter
Member Author

The above error disappeared when I added the publicly available fat JAR. I'm now back to the "wrong license format" error.
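The Ivy log above shows spark-submit trying to download spark-nlp_2.12;3.1.1 from remote repositories, which is what happens for entries in spark.jars.packages; supplying the fat JAR through spark.jars avoids that network step. A minimal sketch; spark_jar_config and the jar paths in the usage note are hypothetical.

```python
from typing import Dict, Sequence


def spark_jar_config(local_jars: Sequence[str], packages: Sequence[str] = ()) -> Dict[str, str]:
    """Build Spark config entries for jar dependencies (hypothetical helper).

    spark.jars takes a comma-separated list of jar paths that are used as-is;
    spark.jars.packages takes Maven coordinates that spark-submit resolves
    through Ivy at startup, which fails without network access.
    """
    config: Dict[str, str] = {}
    if local_jars:
        config["spark.jars"] = ",".join(local_jars)
    if packages:
        config["spark.jars.packages"] = ",".join(packages)
    return config
```

For an offline container, something like spark_jar_config(["/opt/jars/spark-nlp.jar", "/opt/jars/spark-nlp-jsl.jar"]) (hypothetical paths) yields only a spark.jars entry; each key/value pair would be applied with SparkSession.builder.config(key, value) before getOrCreate().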

Development

No branches or pull requests

1 participant