In a few past blog posts, we have given some examples of how to build dashboards using panel. These have all been single-stage examples, but you can also use panel to build a pipeline of stages, with information carried over from one stage to the next.

I have recently been learning a bit of Natural Language Processing (NLP) applied to various texts, particularly ways to prepare and clean the data. I thought it would be interesting to create a simple panel app that walks you through some of these steps. If you are interested in learning more about these pre-processing steps, please check out this post (here). In this post, I will focus on how to build the app rather than on the steps included in it.

First, let me show you a simple pipeline to give you an idea of how easy it is to set one up.

You will start by importing panel, then instantiating a panel pipeline.  

In [None]:
import panel as pn

pn.extension()

dag = pn.pipeline.Pipeline(inherit_params=False, debug=True)

Next, you can add stages to the pipeline. To do this, we first need some stages to add.



For a panel pipeline to work, each stage must be a parameterized class, which means it inherits from `param.Parameterized`. You must also include a `panel` method in each stage that determines the layout of the widgets included in your app.

Furthermore, if you plan to pass values from one stage to the next, you will need to define an `output` method that returns the values to pass on. The receiving stage must declare parameters with the same names as those outputs so that it can consume the values.
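To make that naming contract concrete, here is a dependency-free toy model. It is not panel's implementation, and the classes `ToyStage1` and `ToyStage2` are hypothetical: the first stage returns its outputs as a dict, and the next stage is constructed with those names as keyword arguments, so the names on both sides have to match.

```python
# Toy model of a stage-to-stage hand-off (NOT panel's code).
# It only mirrors the contract: output names must match the
# names the next stage declares.

class ToyStage1:
    def __init__(self, typed_text=""):
        # Stand-in for the value of a TextInput widget
        self.typed_text = typed_text

    def output(self):
        # Each key plays the role of a name given to @param.output(...)
        return {"text": self.typed_text}


class ToyStage2:
    def __init__(self, text=""):
        # Declared with the same name as the upstream output
        self.text = text

    def render(self):
        return f"Previously, you typed {self.text}"


first = ToyStage1(typed_text="hello panel")
second = ToyStage2(**first.output())  # outputs become keyword arguments
print(second.render())  # Previously, you typed hello panel
```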

Below, `stage1` displays a text input widget and a Continue button. The text typed in stage 1 will be passed to the next stage. To do that, I have defined a `text` string parameter and an `output` method decorated with `param.output('text')`, which declares `text` as the output of this stage.

I also want to point out that I included a `ready` parameter. It is useful for controlling when a stage is complete and ready to move on to the next one. Later, you will see how the pipeline uses it.
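Here is a minimal, dependency-free sketch of the idea behind a ready flag: clicking Continue flips a Boolean from `False` to `True`, and whatever is watching that flag reacts. It uses a plain Python property and a hypothetical `ToyReadyStage` class in place of a `param.Boolean` and the pipeline's own watcher, so treat it as an illustration of the pattern rather than of panel's internals.

```python
class ToyReadyStage:
    """Hypothetical stand-in for a stage that exposes a `ready` flag."""

    def __init__(self):
        self._ready = False
        self._watchers = []  # callbacks to run when ready becomes True

    def watch_ready(self, callback):
        self._watchers.append(callback)

    @property
    def ready(self):
        return self._ready

    @ready.setter
    def ready(self, value):
        was_ready = self._ready
        self._ready = bool(value)
        # In this toy, watchers fire only on the False -> True transition
        if self._ready and not was_ready:
            for callback in self._watchers:
                callback()

    def on_click_continue(self, event=None):
        # Same role as stage1.on_click_continue: the button flips the flag
        self.ready = True


advanced = []
stage = ToyReadyStage()
stage.watch_ready(lambda: advanced.append("advance to next stage"))
stage.on_click_continue()
print(advanced)  # ['advance to next stage']
```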

In [None]:
import param
class stage1(param.Parameterized):
    
    ready = param.Boolean(
        default=False,
        doc='trigger for moving to the next page',
        )   
    
    text = param.String()
    
    def __init__(self, **params):
        super().__init__(**params)
        self.text_input = pn.widgets.TextInput(name='Text Input', placeholder='Enter a string here...')
        self.continue_button = pn.widgets.Button(name='Continue',button_type='primary')
        self.continue_button.on_click(self.on_click_continue)
        
    def on_click_continue(self, event):
        self.ready=True
    
    @param.output('text')
    def output(self):
        text = self.text_input.value
        return text
        
    def panel(self):
        return pn.Column(self.text_input,
                  self.continue_button
                 )
        

`stage2` displays a single line of static text showing what the user entered in `stage1`. Below, you can see that `text` is declared again as a `param.String`, so that it can receive the output of `stage1`.

In [None]:
class stage2(param.Parameterized):
    
    
    text = param.String()
    
    def __init__(self, **params):
        super().__init__(**params)
        self.text_display = pn.widgets.StaticText(name='Previously, you typed ', value=self.text)
        
    def panel(self):
        return pn.Column(self.text_display)

Now that our classes are defined, we can add the stages to the pipeline. Below, you'll notice that each call to `add_stage` takes a string that serves as the identifier for the stage, followed by the class for that stage. Here I have added `"Stage 1"` and `"Stage 2"`.

Earlier, I mentioned the `ready` parameter. When adding a stage, you can pass it as the `ready_parameter` and set `auto_advance` to `True`, which causes the next stage to appear as soon as that parameter becomes `True`.

After adding the stages, you define the relationship between them by calling the `define_graph` method, which determines their order. Here we start with `Stage 1` and then move to `Stage 2`.
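To see why a dict is enough to fix the order, here is a toy helper (`stage_order`, hypothetical and not part of panel) that walks a linear graph like `{'Stage 1': 'Stage 2'}`: the first stage is the one key that never appears as a target, and each value names the stage that follows.

```python
def stage_order(graph):
    """Return the stages of a linear pipeline graph in the order they run.

    `graph` maps each stage name to the name of the stage after it,
    e.g. {'Stage 1': 'Stage 2'}. Toy helper, not part of panel.
    """
    targets = set(graph.values())
    # The first stage is the only key that never appears as a target
    starts = [name for name in graph if name not in targets]
    if len(starts) != 1:
        raise ValueError("expected exactly one starting stage")

    order = [starts[0]]
    # Follow the edges until we reach a stage with no successor
    while order[-1] in graph:
        nxt = graph[order[-1]]
        if nxt in order:
            raise ValueError("graph contains a cycle")
        order.append(nxt)
    return order


print(stage_order({'Stage 1': 'Stage 2'}))  # ['Stage 1', 'Stage 2']
```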

In [None]:
dag.add_stage(
    'Stage 1',
    stage1,
    ready_parameter='ready',
    auto_advance=True
)

dag.add_stage(
            'Stage 2',
            stage2,
            )

dag.define_graph(
            {'Stage 1': 'Stage 2',
             }
            )


example_app = pn.Column(dag.stage).servable()

Let's view our app and confirm it does what we expect:

In [None]:
example_app

You can build much more elaborate pipelines with more stages, or even branching stages that depend on user input. It's quite flexible.
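As a rough sketch of branching, assume a graph value can be a tuple of possible next stages, with the current stage indicating which branch to take. As far as I know, panel's documentation on non-linear pipelines describes tuple-valued graph entries and a `next_parameter` option on `add_stage` for this; check the docs for the exact usage. The plain-Python helper below and its stage names (`Input`, `Add`, `Multiply`, `Result`) are hypothetical and only model the lookup.

```python
def next_stage(graph, current, choice=None):
    """Return the stage that follows `current` in a toy pipeline graph.

    Graph values are either a single stage name or a tuple of
    alternatives; at a branch, `choice` says which way to go.
    """
    target = graph.get(current)
    if target is None:
        return None  # `current` is a final stage
    if isinstance(target, tuple):
        if choice not in target:
            raise ValueError(f"choose one of {target}")
        return choice
    return target


# Hypothetical stage names: 'Input' branches depending on user input
graph = {
    'Input': ('Add', 'Multiply'),
    'Add': 'Result',
    'Multiply': 'Result',
}
print(next_stage(graph, 'Input', choice='Multiply'))  # Multiply
print(next_stage(graph, 'Multiply'))                  # Result
print(next_stage(graph, 'Result'))                    # None
```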

Now that we have seen this simple example, I will insert some more complicated classes into a pipeline the same way I did above.

This panel app displays several tabs, each offering a different pre-processing option for an NLP task. When the user clicks `Continue`, the pre-processing is run and the next stage displays some options for training and testing a sentiment analysis model. I haven't implemented many choices yet, but the structure is there to build upon.
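Before looking at the full classes, here is a condensed, dependency-free sketch of the per-line pre-processing that the `Continue` button triggers: tokenize with a regex, drop stop words, stem, and re-join each line into one string. It uses `re.findall`, which, to my understanding, is how NLTK's `RegexpTokenizer` matches tokens by default, and a deliberately crude, hypothetical `crude_stem` function in place of the Porter or Snowball stemmers. As in the app, the stop-word check is case-sensitive, so `The` survives.

```python
import re


def preprocess_line(sentence, pattern=r"\w+", stopwords=(), stem=lambda w: w):
    """Tokenize one line, drop stop words, stem, and re-join the result."""
    stop = set(stopwords)                        # O(1) membership checks
    tokens = re.findall(pattern, sentence)       # 1. tokenize (pattern matches tokens)
    kept = [t for t in tokens if t not in stop]  # 2. remove stop words (case-sensitive)
    stemmed = [stem(t) for t in kept]            # 3. stem
    return " ".join(stemmed)                     # 4. one string per line of text


def crude_stem(word):
    # Hypothetical stand-in for a real stemmer: only strips a trailing "s"
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


lines = ["The cats walked into the houses"]
corpus = [preprocess_line(s, stopwords={"the", "into"}, stem=crude_stem) for s in lines]
print(corpus)  # ['The cat walked house']
```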

The two classes are displayed here:

In [None]:
import panel as pn
import param
import pandas as pd

import io

from nltk.stem import (PorterStemmer, SnowballStemmer)
from nltk.tokenize import RegexpTokenizer
from sklearn.feature_extraction.text import CountVectorizer

class PreProcessor(param.Parameterized):
    
    # df will be the variable holding the dataframe of text
    df = param.DataFrame()
    # title to display for each tab
    name_of_page = param.String(default = 'Name of page')
    # dataframe to display.
    display_df = param.DataFrame(default = pd.DataFrame())
    # stopword_df is the dataframe containing the stop words
    stopword_df = param.DataFrame(default = pd.DataFrame())
    
    stopwords = param.List(default = [])
    X = param.Array(default = None)
    
    # ready flips to True when the user clicks Continue
    ready = param.Boolean(
        default=False,
        doc='trigger for moving to the next page',
        )   
    
    def __init__(self, **params):
        super().__init__(**params)
        
        
        
        # button for the pre-processing page
        self.continue_button = pn.widgets.Button(name='Continue',
                                                 width = 100,
                                                 button_type='primary')

        self.continue_button.on_click(self.continue_ready)
        
        # load text widgets 
        self.header_checkbox = pn.widgets.Checkbox(name='Header included in file')
        self.load_file = pn.widgets.FileInput()
        self.load_file.link(self.df, callbacks={'value': self.load_df})
        
        # tokenize widgets
        self.search_pattern_input = pn.widgets.TextInput(name='Search Pattern', value = r'\w+', width = 100)
        
        # remove stop words widgets
        self.load_words_button = pn.widgets.FileInput()
        self.load_words_button.link(self.stopwords, callbacks={'value': self.load_stopwords})
        
        # stem widgets
        self.stem_choice = pn.widgets.Select(name='Select', options=['Porter', 'Snowball'])
        
        # embedding widgets
        
        self.we_model = pn.widgets.Select(name='Select', options=['SKLearn Count Vectorizer'])

        
    @param.output('X', 'display_df')
    def output(self):
        return self.X, self.display_df
    
    
    @param.depends('display_df')
    def df_pane(self):
        return pn.WidgetBox(self.display_df,
                           height = 300,
                           width = 400)
    
    # load text page functions
    #-----------------------------------------------------------------------------------------------------
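    def load_df(self, df, event):
        info = io.BytesIO(self.load_file.value)
        if self.header_checkbox.value:
            self.df = pd.read_csv(info)
        else:
            # Without a header, assume the columns are ordered text, sentiment;
            # later steps need both a 'text' and a 'sentiment' column
            self.df = pd.read_csv(info, header = None, names=['text', 'sentiment'])
        
        self.display_df = self.df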
    
    def load_text_page(self):
        helper_text = (
            "This simple Sentiment Analysis NLP app will allow you to select a few different options " +
            "for some preprocessing steps to prepare your text for testing and training. " +
            "It will then allow you to choose a model to train and the percentage of data to " +
            "reserve for testing; the rest will be used to train the model.  Finally, " +
            "some initial metrics will be displayed showing how well the model predicted " +
            "the testing results." +
            " " +
            "Please choose a csv file that contains lines of text to analyze.  This text should " +
            "have a text column as well as a sentiment column.  If there is a header included in the file, " +
            "make sure to check the header checkbox."
        )
        return pn.Row(
                pn.Column(
                    pn.pane.Markdown('## Load Text:'),
                    pn.Column(
                        helper_text,
                         self.header_checkbox,
                         self.load_file
                        ),
                ),
                pn.Column(
                    pn.Spacer(height=52),
                    self.df_pane,
                    
                )
        
        )

    #-----------------------------------------------------------------------------------------------------
    
    # tokenize page options
    #-----------------------------------------------------------------------------------------------------
    def tokenize_option_page(self):
        
        help_text = ("Tokenization will break your text into a list of individual tokens " +
            "(ex. ['A', 'cat', 'walked', 'into', 'the', 'house']).  Specify a regular " +
            "expression (regex) search pattern that matches the tokens to extract.")
        
        return pn.Column(
                    pn.pane.Markdown('## Tokenize options:'),
                    pn.WidgetBox(help_text, self.search_pattern_input,
                                    height = 300,
                                    width = 300
        
                                )
                )
    
    #-----------------------------------------------------------------------------------------------------
    
    
    # remove stopwords page 
    #-----------------------------------------------------------------------------------------------------
    
    def remove_stopwords_page(self):
        
        help_text = (
            "Stop words are words that do not add any value to the sentiment of the text. " +
            "Removing them may improve your sentiment results.  You may load a list of stop words " +
            "to exclude from your text."
        )
        return pn.Row(
                pn.Column(
                    pn.pane.Markdown('## Load Stopwords:'),
                    pn.WidgetBox(help_text, self.load_words_button,
                                    height = 300,
                                    width = 300
        
                    )
                ),
                pn.Column(
                    pn.Spacer(height=52),
                    pn.WidgetBox(self.stopword_df,
                           height = 300,
                           width = 400)
                    
                )
        )
    
    def load_stopwords(self, stopwords, event):
        info = io.BytesIO(self.load_words_button.value)
        self.stopwords = pd.read_pickle(info)
        self.stopword_df = pd.DataFrame({'stop words': self.stopwords})

    #-----------------------------------------------------------------------------------------------------
    
    # stemming page 
    #-----------------------------------------------------------------------------------------------------
    
    def stemmer_page(self):
        help_text = (
            "Stemming is a normalization step for the words in your text.  Something that is " +
            "plural should probably still be clumped together with a singular version of a word, " +
            "for example.  Stemming will basically remove the ends of words.  Here you can choose " + 
            "between a Porter stemmer and a Snowball stemmer. Porter is a little less aggressive than " +
            "Snowball; however, Snowball is generally considered a slight improvement over Porter."
        )
        return pn.Column(
                    pn.pane.Markdown('## Stemmer options:'),
                    pn.WidgetBox(help_text, self.stem_choice,
                height = 300,
                width = 300)
                )
    
    #-----------------------------------------------------------------------------------------------------
    
    # embedding page 
    #-----------------------------------------------------------------------------------------------------
    
    def word_embedding_page(self):
        
        help_text = ("Embedding is the process of turning words into numerical vectors. " +
                    "Several algorithms have been developed to do this; currently, in this " +
                    "app, the sklearn count vectorizer is available. This algorithm returns a sparse " +
                    "matrix representation of the word counts in your text."
                    )
        
        
        
        return pn.Column(
                    pn.pane.Markdown('## Choose embedding model:'),
                    pn.WidgetBox(help_text, self.we_model,
                            height = 300,
                            width = 300
        
                    )
        
                )
    
    #-----------------------------------------------------------------------------------------------------
          
    def continue_ready(self, event):

        # Set up for tokenization
        tokenizer = RegexpTokenizer(self.search_pattern_input.value)

        # Set up for stemming
        if self.stem_choice.value == 'Porter':
            stemmer = PorterStemmer()
        else:
            # SnowballStemmer requires a language; English text is assumed here
            stemmer = SnowballStemmer('english')

        # Set up for embedding (only one option is implemented so far)
        if self.we_model.value == 'SKLearn Count Vectorizer':
            # Create a vectorizer instance
            vectorizer = CountVectorizer(max_features=1000)
        else:
            raise ValueError(f'Unsupported embedding model: {self.we_model.value}')

        # A set makes each stop word lookup fast
        stopwords = set(self.stopwords)

        corpus = []
        # loop through each line of data
        for sentence in self.display_df['text']:

            # 1. tokenize
            tokens = tokenizer.tokenize(sentence)

            # 2. remove stop words
            tokens_no_sw = [word for word in tokens if word not in stopwords]

            # 3. stem the remaining words
            stem_words = [stemmer.stem(x) for x in tokens_no_sw]

            # Join the words back together as one string and append it to the corpus
            corpus.append(' '.join(stem_words))

        X = vectorizer.fit_transform(corpus).toarray()
        labels = self.display_df['sentiment']

        # Store each row of the embedding matrix as a list for display
        xlist = [list(row) for row in X]
        self.X = X
        self.display_df = pd.DataFrame({'embeddings': xlist, 'sentiment': labels})
        
        self.ready = True
    
    def panel(self):
        
        return pn.Column(
            pn.Tabs(
                ('Load Text', self.load_text_page),
                ('Tokenize', self.tokenize_option_page),
                ('Remove Stopwords', self.remove_stopwords_page),
                ('Stem', self.stemmer_page),
                ('Embed', self.word_embedding_page)
                ),
            self.continue_button
        )
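The `CountVectorizer` step is what turns the cleaned corpus into the matrix `X` handed to the next stage. The toy function below (`count_vectorize`, hypothetical) sketches that bag-of-words idea in plain Python under a few assumptions meant to resemble scikit-learn's defaults: text is lowercased, tokens are runs of two or more word characters, and columns are sorted alphabetically. It does not implement `max_features=1000` and returns a dense list of lists rather than a sparse matrix.

```python
import re
from collections import Counter


def count_vectorize(corpus):
    """Toy bag-of-words: one row per document, one column per word.

    Columns are sorted alphabetically; tokens are lowercased and must
    be at least two word characters long.
    """
    token_re = re.compile(r"\b\w\w+\b")
    docs = [token_re.findall(doc.lower()) for doc in corpus]
    vocabulary = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocabulary)}

    matrix = []
    for doc in docs:
        row = [0] * len(vocabulary)
        # Count each word in this document and place it in its column
        for tok, n in Counter(doc).items():
            row[index[tok]] = n
        matrix.append(row)
    return vocabulary, matrix


vocab, X = count_vectorize(["good movie", "bad movie bad acting"])
print(vocab)  # ['acting', 'bad', 'good', 'movie']
print(X)      # [[0, 0, 1, 1], [1, 2, 0, 1]]
```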
        

In [None]:

import panel as pn
import pandas as pd
import param
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import numpy as np


class trainer(param.Parameterized):
    
    display_df = param.DataFrame(default = pd.DataFrame())
    
    results = param.Boolean(default = False)
    
    X = param.Array(default = None)
    
    result_string = param.String(default = '')
    
    def __init__(self, **params):
        super().__init__(**params)
        self.name_of_page = 'Test and Train'
        
        # train_test_split needs a float test_size strictly between 0 and 1,
        # so the slider stays away from 0% and 100%
        self.test_slider = pn.widgets.IntSlider(name='Test Percentage', start=10, end=90, step=10, value=20)

        self.tt_button = pn.widgets.Button(name='Train and Test', button_type='primary')
        self.tt_button.on_click(self.train_test)
        
        self.tt_model = pn.widgets.Select(name='Select', options=['Random Forest Classifier'])
        
        
    def train_test(self, event):
        
        # convert the sentiment values into integer labels
        self.display_df = convert_sentiment_values(self.display_df)
        
        y = self.display_df['label']
        
        # get train/test sets
        X_train, X_test, y_train, y_test = train_test_split(self.X, y, test_size = self.test_slider.value/100, random_state = 0)
        
        if self.tt_model.value == 'Random Forest Classifier':
            sentiment_classifier = RandomForestClassifier(n_estimators = 1000, random_state = 0)
            sentiment_classifier.fit(X_train, y_train)
            y_pred = sentiment_classifier.predict(X_test)
        else:
            raise ValueError(f'Unsupported model: {self.tt_model.value}')
            
        self.y_test = y_test
        self.y_pred = y_pred
        self.analyze()
        
    def analyze(self):
        self.cm = confusion_matrix(self.y_test, self.y_pred)
        self.cr = classification_report(self.y_test, self.y_pred)
        self.acc_score = accuracy_score(self.y_test, self.y_pred)
        
        # Build the Markdown without leading indentation: lines indented by four
        # or more spaces would render as a code block instead of as headings.
        # Embedding the whole report and matrix also works for any number of classes.
        self.result_string = (
            "### Classification Report\n"
            f"<pre>\n{self.cr}\n</pre>\n\n"
            "### Confusion Matrix\n"
            f"<pre>\n{self.cm}\n</pre>\n\n"
            "### Accuracy Score\n"
            f"<pre>\n{round(self.acc_score, 4)}\n</pre>\n"
        )

        self.results = True 

    def options_page(self, help_text):
        
        return pn.WidgetBox(help_text, self.tt_model,
                            self.test_slider,
                            self.tt_button,
                height = 375,
                width = 300
        
        )
        
    @pn.depends('results')
    def df_pane(self):
        
        if not self.results:
            self.result_pane = self.display_df
            
        else:
            self.result_pane = pn.pane.Markdown(self.result_string, width = 500, height = 350)
        
        return pn.WidgetBox(self.result_pane,
                           height = 375,
                           width = 450)
        


    def panel(self):
        
        help_text = (
            "A model will now be trained and tested on your text.  You may " +
            "choose a percentage of your data to reserve for testing; the rest will be used for " +
            "training.  For example, if you reserve 20%, the remaining 80% will be used for training " +
            "and the 20% will be used to determine how well the trained model assigns a " +
            "sentiment label to the testing text.  Currently, the only model available is the sklearn " +
            "Random Forest Classifier model."
        )
        
        return pn.Row(
                pn.Column(
                    pn.pane.Markdown('## Train and Test'),
                    self.options_page(help_text),
                ),
                pn.Column(
                    pn.Spacer(height=52),
                    self.df_pane,
                    
                )
        
        )
    
    
def convert_sentiment_values(df, col = 'sentiment'):
    # Number each distinct value of `col` in order of first appearance
    # (pd.factorize returns codes in that order) and store them in `label`
    df['label'] = pd.factorize(df[col])[0]
    return df
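Two small pieces of bookkeeping happen in `trainer` before any model is fit: `convert_sentiment_values` numbers each sentiment in order of first appearance, and the slider percentage becomes `test_size`. The plain-Python sketch below reproduces both; `encode_labels` and `split_sizes` are hypothetical helpers, and `split_sizes` assumes scikit-learn rounds the test count up for a float `test_size`, with the remainder going to training.

```python
import math


def encode_labels(values):
    """Map each distinct label to an integer in order of first appearance."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return [mapping[v] for v in values], mapping


def split_sizes(n_samples, test_percentage):
    """Return (n_train, n_test) for a slider value such as 20 (%).

    Assumption: the test count is rounded up and the rest is training data.
    """
    n_test = math.ceil(n_samples * test_percentage / 100)
    return n_samples - n_test, n_test


labels, mapping = encode_labels(["positive", "negative", "positive", "neutral"])
print(labels)               # [0, 1, 0, 2]
print(mapping)              # {'positive': 0, 'negative': 1, 'neutral': 2}
print(split_sizes(10, 20))  # (8, 2)
print(split_sizes(11, 20))  # (8, 3)
```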




    

The pipeline is established just as before:

In [None]:
dag = pn.pipeline.Pipeline(inherit_params=False)

dag.add_stage(
    'Preprocess',
    PreProcessor,
    ready_parameter='ready',
    auto_advance=True
)

dag.add_stage(
            'Testing',
            trainer,
            )

dag.define_graph(
            {'Preprocess': 'Testing',
             }
            )


SentimentApp = pn.Column(dag.stage).servable()

And now we can view our new app:

In [None]:
SentimentApp

For more details on how I built the first page of this app, I wrote up my thought process for developing this stage.