# Splunk App for Data Science and Deep Learning - Example for NLP with spaCy

This notebook contains a barebones example workflow showing how to develop custom containerized code that interfaces with your Splunk platform by using the Splunk App for Data Science and Deep Learning (DSDL), formerly known as the Deep Learning Toolkit for Splunk (DLTK). Find more examples and information in the app and on the [DSDL splunkbase page](https://splunkbase.splunk.com/app/4607/#/details).

## Stage 0 - import libraries
Stage 0 defines all the imports that the subsequent code depends on.

In [1]:
# this definition exposes all python module imports that should be available in all subsequent commands
import json
import datetime
import numpy as np
import pandas as pd
import spacy
import en_core_web_sm
from spacytextblob.spacytextblob import SpacyTextBlob

# global constants
MODEL_DIRECTORY = "/srv/app/model/data/"

In [2]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing purposes
print("numpy version: " + np.__version__)
print("pandas version: " + pd.__version__)
print("spacy version: " + spacy.__version__)

numpy version: 1.26.4
pandas version: 2.2.2
spacy version: 3.7.5


In [None]:
import sys
!{sys.executable} -m spacy download en_core_web_sm

## Stage 1 - get a data sample from Splunk
In Splunk, run a search to pipe a prepared dataset into this environment.

| makeresults
| eval text = "Baroness is an American heavy metal band from Savannah, Georgia whose original members grew up together in Lexington, Virginia.Baroness formed in mid-2003, founded by former members of the punk/metal band Johnny Welfare and the Paychecks. Singer John Dyer Baizley creates the artwork for all Baroness albums, and has done artwork for other bands.;From 2004 to 2007, Baroness recorded and released three EPs, named First, Second and A Grey Sigh in a Flower Husk (aka Third), with the third one being a split album with Unpersons.;Baroness started recording their first full-length album in March 2007. Phillip Cope from Kylesa continued to produce Baroness on this album. The Red Album was released on September 4, 2007, and met positive reception. Heavy metal magazine Revolver named it Album of the Year. On December 1, 2007, Baroness performed at New York City's Bowery Ballroom. On September 20, 2008, the band announced via MySpace Brian Blickle would be parting ways with the band, while also introducing a new guitarist named Peter Adams, also of Virginia-based band Valkyrie.;Throughout 2007–9, Baroness toured and shared the stage with many bands including Converge, The Red Chord, High on Fire, Opeth, Coheed and Cambria, Coliseum, Mastodon, Minsk and Clutch.;On May 18, 2009 Baroness entered The Track Studio in Plano, Texas, to record their second full-length album, Blue Record, produced by John Congleton (The Roots, Explosions in the Sky, Black Mountain, The Polyphonic Spree). It was released via Relapse Records on October 13, 2009.;In February and March 2010 Baroness played in the Australian Soundwave Festival, alongside bands such as Clutch, Isis, Meshuggah, Janes Addiction and Faith No More, and toured Japan in March 2010 with Isis.;Baroness have toured with many other prominent bands, such as supporting Mastodon on their US headlining tour in April–May 2010, Deftones for August–September 2010. 
In addition, Baroness was selected as one of two support acts (the other being Lamb of God) for Metallica on their tour of Australia and New Zealand in late 2010.;Baroness also performed at Coachella and Bonnaroo in 2010.;Blue Record would later be named the 20th Greatest Metal Album in History by 'LA Weekly' in 2013.;On May 23, 2011, the band launched their official website. The first content released on the new page gave hints to work on a new album being produced again by John Congleton. On May 14, 2012, the single 'Take My Bones Away' from the new album was released over YouTube, along with an album teaser.;On August 15, 2012, nine passengers were injured (two seriously) when the German-registered coach in which the band were traveling fell from a viaduct near Bath, England. Emergency services were called to Brassknocker Hill in Monkton Combe after the coach fell 30 ft (9m) from the viaduct. Avon Fire and Rescue Service said the incident happened at 11:30BST; due to heavy rain and reduced visibility it was not possible for the air ambulance to land. Emergency services said two people were transported to Frenchay Hospital in Bristol while seven others went to the Royal United Hospital (RUH) in Bath. As a result of the crash, frontman John Baizley suffered a broken left arm and broken left leg. Allen Blickle and Matt Maggioni each suffered fractured vertebrae. Peter Adams was treated and released from the hospital on August 16, 2012.;During the subsequent months of recovery, Baroness began scheduling tour dates once more. John Baizley performed an acoustic set and artwork exhibition on March 14, 15, and 16, 2013 at SXSW in Austin, Texas. 
In addition, Baroness made plans to perform at festivals such as Chaos in Tejas, Free Press Summer Festival, and Heavy MTL in Montreal, Quebec.;On March 25, 2013, through a statement posted on Baroness' official website, it was announced that both Allen Blickle (drums) and Matt Maggioni (bass guitar) had left Baroness.;On April 1, 2013, the first leg of Baroness' 2013 US Headlining Tour was announced, featuring the debut of bass guitarist Nick Jost, and drummer Sebastian Thomson of Trans Am.;On September 27, 2013 they started their European Tour in Tilburg, Netherlands.;On August 28, 2015 towards the end of a two-week tour in Europe they released the song 'Chlorine & Wine' and announced that their new album Purple would be released December 18, 2015 on their own newly formed label Abraxan Hymns.;Purple was recorded with Dave Fridmann at Tarbox Road Studios in Cassadaga, New York.;On September 24, 2015, Baroness released the official music video for 'Chlorine & Wine' and announced a North American small venue tour for the fall of 2015.;On November 15, 2015, the band released the first official single 'Shock Me' from the forthcoming album Purple, which debuted on BBC Radio 1's Rock Show with Daniel P. Carter.;Purple's track 'Shock Me' was nominated for Grammy Award for Best Metal Performance at the 2017 Grammy Awards.;On April 26, 2017, in an interview in Teamrock, John Baizley stated that they had begun writing material for their fifth studio album. Baizley stated: 'We've started writing a few tunes that we’re working on. The really cool thing now is that Sebastian and Nick have been in the band long enough that they understand what we do.';On June 1, 2017, it was announced that Peter Adams was amicably leaving the band to focus his energy at home, and not on the road. Gina Gleason was announced as his replacement.;On March 9, 2019, the band began teasing the release of a new album, entitled Gold & Grey. 
Three days later on March 12, they released the album art on their social media accounts stating, 'This painting was born from a deeply personal reflection on the past 12 years of this band’s history, and will stand as the 6th and final piece in our chromatically-themed records.';Gold & Grey was released to overwhelmingly positive reviews, achieving a score of 94 on metacritic with 9 reviews. Critics praised the album's artistry, the instrumental musicianship, and the use of vocal harmonies as well as stylist breadth that builds upon elements from the band's past works while also incorporating new stylistic elements."
| makemv text delim=";"
| mvexpand text
| fit MLTKContainer algo=spacy_sentiment mode=stage epochs=100 text into app:spacy_sentiment_model as sentiment

After you run this search, your data sample is available as a CSV file inside the container for developing your model. The name is taken from the into keyword ("spacy_sentiment_model" in the example above) or set to "default" if no into keyword is present. This step is intended to work with a subset of your data to create your custom model.
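
The `makemv text delim=";"` and `mvexpand text` steps in the search above turn the single `text` value into one row per `;`-separated piece. The sketch below imitates that split in plain Python, assuming a single-character delimiter behaves like `str.split`; the variable names are illustrative only. It shows why one row of the staged sample starts with " due to heavy rain": the source text contains a semicolon inside a sentence ("11:30BST; due to ...").

```python
# Two paragraphs from the sample text, joined with the ";" delimiter used in the search.
raw = (
    "Baroness also performed at Coachella and Bonnaroo in 2010.;"
    "Avon Fire and Rescue Service said the incident happened at 11:30BST; "
    "due to heavy rain and reduced visibility it was not possible for the air ambulance to land."
)

# Splitting on every ";" also splits the sentence that contains a semicolon of its own.
rows = raw.split(";")

print(len(rows))      # 3 rows from 2 paragraphs
print(repr(rows[2]))  # the fragment keeps its leading space
```

If such fragments are unwanted, one option is to pick a delimiter that does not occur in the text itself.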

In [3]:
# this cell is not executed from MLTK and should only be used for staging data into the notebook environment
def stage(name):
    with open("data/"+name+".csv", 'r') as f:
        df = pd.read_csv(f)
    with open("data/"+name+".json", 'r') as f:
        param = json.load(f)
    return df, param
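
To experiment with the staging file layout without running a Splunk search first, the two files can be written by hand. The following is a minimal sketch using only the standard library: `stage_stdlib`, the name `demo_model`, and the content of the JSON file are hypothetical. The only parameter key this notebook itself relies on is `feature_variables`; the real parameter file written by DSDL may contain more.

```python
import csv
import json
import os
import tempfile


def stage_stdlib(name, data_dir):
    """Read <data_dir>/<name>.csv and <data_dir>/<name>.json, like stage() but without pandas."""
    with open(os.path.join(data_dir, name + ".csv"), newline="") as f:
        rows = list(csv.DictReader(f))
    with open(os.path.join(data_dir, name + ".json")) as f:
        param = json.load(f)
    return rows, param


with tempfile.TemporaryDirectory() as data_dir:
    # Hand-written stand-ins for the files that the staging search would create.
    with open(os.path.join(data_dir, "demo_model.csv"), "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["text"])
        writer.writeheader()
        writer.writerow({"text": "Baroness also performed at Coachella and Bonnaroo in 2010."})
    with open(os.path.join(data_dir, "demo_model.json"), "w") as f:
        json.dump({"feature_variables": ["text"]}, f)

    rows, param = stage_stdlib("demo_model", data_dir)

print(rows[0]["text"])
print(param["feature_variables"])
```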

In [4]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing purposes
df, param = stage("spacy_sentiment_model")
print(df)
print(df.shape)
print(str(param))

                                                 text
0   Baroness is an American heavy metal band from ...
1   From 2004 to 2007, Baroness recorded and relea...
2   Baroness started recording their first full-le...
3   Throughout 2007–9, Baroness toured and shared ...
4   On May 18, 2009 Baroness entered The Track Stu...
5   In February and March 2010 Baroness played in ...
6   Baroness have toured with many other prominent...
7   Baroness also performed at Coachella and Bonna...
8   Blue Record would later be named the 20th Grea...
9   On May 23, 2011, the band launched their offic...
10  On August 15, 2012, nine passengers were injur...
11   due to heavy rain and reduced visibility it w...
12  During the subsequent months of recovery, Baro...
13  On March 25, 2013, through a statement posted ...
14  On April 1, 2013, the first leg of Baroness' 2...
15  On September 27, 2013 they started their Europ...
16  On August 28, 2015 towards the end of a two-we...
17  Purple was recorded with

## Stage 2 - create and initialize a model

In [7]:
# initialize the model
# params: data and parameters
# returns the model object which will be used as a reference to call fit, apply and summary subsequently
def init(df,param):
    # Load English parser and text blob (for sentiment analysis)
    model = spacy.load('en_core_web_sm')
    #spacy_text_blob = SpacyTextBlob()
    #model.add_pipe(spacy_text_blob)
    model.add_pipe('spacytextblob')
    return model

In [8]:
model = init(df,param)

## Stage 3 - fit the model

Note that for this algorithm the model is pre-trained (en_core_web_sm is a pre-trained pipeline package distributed by spaCy), so this stage is a placeholder only.

In [9]:
# returns a fit info json object
def fit(model,df,param):
    returns = {}
    return returns

## Stage 4 - apply the model

In [14]:
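def apply(model,df,param):
    # one list of feature values per input row
    X = df[param['feature_variables']].values.tolist()
    temp_data = []

    for row in X:
        # the row's value list is converted to a single string before parsing
        doc = model(str(row))
        # sentiment scores provided by the spacytextblob pipeline component
        polarity = doc._.blob.polarity
        subjectivity = doc._.blob.subjectivity
        assessments = doc._.blob.sentiment_assessments.assessments
        temp_data.append([polarity, subjectivity, assessments])

    # one result row per input row, in the same order
    column_names = ["polarity", "subjectivity", "assessments"]
    returns = pd.DataFrame(temp_data, columns=column_names)

    return returns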
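
One detail of `apply` is worth spelling out: `X` holds one list of column values per row, and the model receives the `str()` of that list, not the bare text. The sketch below, with an illustrative one-column row, shows the difference between the list representation and a plain join of the values. The join is a possible alternative, not what the function above does, and whether the extra brackets and quotes change the sentiment scores is not checked here.

```python
# One row as produced by df[feature_variables].values.tolist(): a list of column values.
row = ["Baroness also performed at Coachella and Bonnaroo in 2010."]

as_repr = str(row)                 # list syntax (brackets, quotes) becomes part of the text
as_text = " ".join(map(str, row))  # plain text, multiple columns separated by a space

print(as_repr)
print(as_text)
```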

In [15]:
returns = apply(model,df,param)
print(returns)

    polarity  subjectivity                                        assessments
0   0.010000      0.325000  [([american], 0.0, 0.0, None), ([heavy], -0.2,...
1   0.040000      0.086667  [([first], 0.25, 0.3333333333333333, None), ([...
2   0.083333      0.388745  [([first], 0.25, 0.3333333333333333, None), ([...
3   0.220000      0.346667  [([many], 0.5, 0.5, None), ([red], 0.0, 0.0, N...
4  -0.033333      0.241667  [([second], 0.0, 0.0, None), ([full-length], 0...
5  -0.083333      0.333333  [([australian], 0.0, 0.0, None), ([such], 0.0,...
6   0.104545      0.506818  [([many], 0.5, 0.5, None), ([other], -0.125, 0...
7   0.000000      0.000000                                                 []
8   0.250000      0.275000  [([blue], 0.0, 0.1, None), ([later], 0.0, 0.0,...
9   0.117532      0.382251  [([first], 0.25, 0.3333333333333333, None), ([...
10 -0.116667      0.533333  [([seriously], -0.3333333333333333, 0.66666666...
11 -0.160714      0.382143  [([due], -0.125, 0.375, None), ([hea
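
Each entry in the `assessments` column is a tuple of matched words, polarity, subjectivity, and a label. The sketch below shows one way to relate these tuples to the row-level scores, assuming the overall polarity and subjectivity are the unweighted means of the per-assessment values. That is an assumption about the default TextBlob analyzer and is not stated by the output above, although it is consistent with row 7, where an empty list goes with 0.0 for both scores. The helper `summarize` and the example values are illustrative (the subjectivity for "heavy" is made up).

```python
from statistics import mean

# Illustrative assessments in the (words, polarity, subjectivity, label) shape shown above.
assessments = [
    (["first"], 0.25, 0.3333333333333333, None),
    (["second"], 0.0, 0.0, None),
    (["heavy"], -0.2, 0.5, None),  # subjectivity value is hypothetical
]


def summarize(assessments):
    """Return (polarity, subjectivity) as unweighted means, or (0.0, 0.0) for an empty list."""
    if not assessments:
        return 0.0, 0.0
    polarity = mean(a[1] for a in assessments)
    subjectivity = mean(a[2] for a in assessments)
    return polarity, subjectivity


print(summarize(assessments))
print(summarize([]))
```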

## Stage 5 - save the model

In [16]:
# save model to name in expected convention "<algo_name>_<model_name>.h5"
def save(model,name):
    # model will not be saved or reloaded as it is pre-built
    return model

## Stage 6 - load the model

In [17]:
# load model from name in expected convention "<algo_name>_<model_name>.h5"
def load(name):
    # the pipeline is pre-built and never saved, so rebuild it instead of reading it from disk
    model = init(None, None)
    return model

## Stage 7 - provide a summary of the model

In [18]:
# return model summary
def summary(model=None):
    returns = {"version": {"spacy": spacy.__version__} }
    if model is not None:
        s = []
        returns["summary"] = ''.join(s)
    return returns

## End of Stages
The cells that follow are not tagged and can be used for further freeform code.