![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

# Use pretrained `match_pattern` Pipeline

### Spark `2.4` and Spark NLP `2.0.0`

* DocumentAssembler
* SentenceDetector
* Tokenizer
* RegexMatcher (match phone numbers)
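To make the stage order concrete, here is a toy, pure-Python analogue in which each stage reads the columns produced by earlier stages and adds its own. It is not Spark NLP code. The column names `document`, `sentence` and `token`, the naive sentence splitter, per-sentence matching and the simple phone pattern are all illustrative assumptions. Only the `regex` key mirrors what this notebook reads later.

```python
import re

# Toy analogue of DocumentAssembler -> SentenceDetector -> Tokenizer -> RegexMatcher.
# NOT Spark NLP code: column names and per-sentence matching are assumptions.
PHONE = re.compile(r"\+?\d[\d -]{8,}\d")  # deliberately simple, hypothetical pattern


def document_assembler(cols):
    # Wrap the raw text as a single "document"
    cols["document"] = [cols["text"]]
    return cols


def sentence_detector(cols):
    # Naive split after ".", "!" or "?" followed by whitespace
    cols["sentence"] = [
        s for doc in cols["document"] for s in re.split(r"(?<=[.!?])\s+", doc) if s
    ]
    return cols


def tokenizer(cols):
    # Whitespace tokenization of every sentence
    cols["token"] = [t for s in cols["sentence"] for t in s.split()]
    return cols


def regex_matcher(cols):
    # Collect every pattern match found in each sentence
    cols["regex"] = [m.group() for s in cols["sentence"] for m in PHONE.finditer(s)]
    return cols


def annotate(text):
    # Run the stages in order, then drop the raw input column
    cols = {"text": text}
    for stage in (document_assembler, sentence_detector, tokenizer, regex_matcher):
        cols = stage(cols)
    cols.pop("text")
    return cols


print(annotate("Ring me up dude! +1-334-179-1466")["regex"])  # ['+1-334-179-1466']
```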


```python
import sys
# Add the parent directory to the import path (e.g. for a local Spark NLP source checkout)
sys.path.append('../../')

# Spark ML and SQL
from pyspark.ml import Pipeline, PipelineModel
from pyspark.sql.functions import array_contains
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
# Spark NLP
from sparknlp.annotator import *
from sparknlp.common import RegexRule
from sparknlp.base import DocumentAssembler, Finisher
```

### Let's create a Spark Session for our app

```python
# The jar paths below point at a local copy of the Spark NLP jar; adjust them to your setup
spark = SparkSession.builder \
    .appName("Use_Pretrained_Match_Pattern") \
    .master("local[*]") \
    .config("spark.driver.memory", "8G") \
    .config("spark.driver.maxResultSize", "2G") \
    .config("spark.jars", "/tmp/sparknlp.jar") \
    .config("spark.driver.extraClassPath", "/tmp/sparknlp.jar") \
    .config("spark.executor.extraClassPath", "/tmp/sparknlp.jar") \
    .config("spark.kryoserializer.buffer.max", "500m") \
    .getOrCreate()
```

```python
spark.version
```

```
'2.4.0'
```

The pipeline's RegexMatcher stage extracts phone numbers written in any of these formats:
```
0689912549
+33698912549
+33 6 79 91 25 49
+33-6-79-91-25-49
(555)-555-5555
555-555-5555
+1-238 6 79 91 25 49
+1-555-532-3455
+15555323455
+7 06 79 91 25 49
```
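For intuition about what the matcher does, here is a minimal, pure-Python sketch that accepts every format listed above. It is a stand-in, not the regex rules bundled with `match_pattern` (those are not shown in this notebook); `extract_phone_numbers`, `PHONE_RE` and the 10 to 13 digit bounds are hypothetical choices.

```python
import re

# Illustrative pattern, NOT the rule set shipped inside match_pattern:
# an optional "+", then digits or a parenthesised group such as "(555)",
# each optionally separated by a single space or hyphen.
_ELEMENT = r"(?:\(\d{1,4}\)|\d)"
PHONE_RE = re.compile(rf"\+?{_ELEMENT}(?:[ -]?{_ELEMENT})*")


def extract_phone_numbers(text, min_digits=10, max_digits=13):
    """Return candidate phone numbers whose digit count falls in the given range."""
    found = []
    for match in PHONE_RE.finditer(text):
        candidate = match.group()
        n_digits = sum(ch.isdigit() for ch in candidate)
        if min_digits <= n_digits <= max_digits:
            found.append(candidate)
    return found


print(extract_phone_numbers("Call (555)-555-5555 or +33 6 79 91 25 49."))
# ['(555)-555-5555', '+33 6 79 91 25 49']
```

Filtering on digit count after matching, instead of writing one alternative per format, keeps the pattern short. The trade-off is that other long digit runs, such as order numbers, are also accepted.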

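```python
# Load a local copy of the pretrained match_pattern pipeline (adjust the path to where you extracted it)
pipeline = PipelineModel.load("/tmp/match_pattern_en_1.8.0_2.4_1552738240594")
```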

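```python
from sparknlp.base import LightPipeline

# LightPipeline runs the fitted pipeline directly on Python strings, without building a DataFrame
lp = LightPipeline(pipeline)
```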

```python
# annotate() returns a dict mapping each output column to a list of strings
result = lp.annotate("You should call Mr. Jon Doe at +33 1 79 01 22 89")
result['regex']
```

```
['+33 1 79 01 22 89']
```

```python
result = lp.annotate("Ring me up dude! +1-334-179-1466")
result['regex']
```

```
['+1-334-179-1466']
```
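To check several strings at once, a small wrapper can call the annotator once per input and keep only the `regex` column. This is a minimal sketch: `collect_matches` and `fake_annotate` are hypothetical helpers, and the wrapper accepts any callable shaped like `lp.annotate` (a string in, a dict of column name to list of strings out), so it can be exercised without Spark.

```python
from typing import Callable, Dict, Iterable, List

# Anything that maps a string to {column name: [strings]}, e.g. lp.annotate
AnnotateFn = Callable[[str], Dict[str, List[str]]]


def collect_matches(annotate: AnnotateFn, texts: Iterable[str],
                    column: str = "regex") -> Dict[str, List[str]]:
    """Map each text to the strings found in `column` (empty list if the column is missing)."""
    return {text: list(annotate(text).get(column, [])) for text in texts}


# Stand-in for lp.annotate so the sketch runs without Spark
def fake_annotate(text: str) -> Dict[str, List[str]]:
    return {"regex": [t for t in text.split() if t.startswith("+")]}


print(collect_matches(fake_annotate, ["Ring me up dude! +1-334-179-1466", "No number here"]))
# {'Ring me up dude! +1-334-179-1466': ['+1-334-179-1466'], 'No number here': []}
```

With the notebook's `lp`, the call would be `collect_matches(lp.annotate, texts)`. Because the result is a dict keyed by input text, duplicate inputs collapse into a single entry.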