# Assignment 1

In this assignment you will build a language model for the [OHHLA corpus](http://ohhla.com/) we are using in the book. You will train the model on the available training set, and can tune it on the development set. After submission we will run your notebook on a different test set. Your mark will depend on 

* whether your language model is **properly normalized**,
* its **perplexity** on the unseen test set,
* your **description** of your approach. 

To develop your model you have access to:

* The training and development data in `data/ohhla`.
* The code of the lecture, stored in a python module [here](/edit/statnlpbook/lm.py).
* Libraries on the [docker image](https://github.com/uclmr/stat-nlp-book/blob/python/Dockerfile) which contains everything in [this image](https://github.com/jupyter/docker-stacks/tree/master/scipy-notebook), including scikit-learn and tensorflow. 

As we have to run the notebooks of all students, and because writing efficient code is important, **your notebook should run in 5 minutes at most**, on your machine. Further comments:

* We have tested a possible solution on the Azure VMs and it ran in seconds, so it is possible to train a reasonable LM on the data in reasonable time. 

* Try to run your parameter optimisation offline, such that in your answer notebook the best parameters are already set and don't need to be searched.

## Setup Instructions
It is important that this file is placed in the **correct directory**. It will not run otherwise. The correct directory is

    DIRECTORY_OF_YOUR_BOOK/assignments/2016/assignment1/problem/
    
where `DIRECTORY_OF_YOUR_BOOK` is a placeholder for the directory you downloaded the book to. After you have placed it there, **rename the file** to your UCL ID (of the form `ucxxxxx`). 

## General Instructions
This notebook will be used by you to provide your solution, and by us to both assess your solution and enter your marks. It contains three types of sections:

1. **Setup** Sections: these sections set up code and resources for assessment. **Do not edit these**. 
2. **Assessment** Sections: these sections are used for both evaluating the output of your code, and for markers to enter their marks. **Do not edit these**. 
3. **Task** Sections: these sections require your solutions. They may contain stub code, and you are expected to edit this code. For free text answers simply edit the markdown field.  

Note that you are free to **create additional notebook cells** within a task section. 

Please **do not share** this assignment publicly, by uploading it online, emailing it to friends etc. 


## Submission Instructions

To submit your solution:

* Make sure that your solution is fully contained in this notebook. 
* **Rename this notebook to your UCL ID** (of the form "ucxxxxx"), if you have not already done so.
* Download the notebook in Jupyter via *File -> Download as -> Notebook (.ipynb)*.
* Upload the notebook to the Moodle submission site.


## <font color='green'>Setup 1</font>: Load Libraries
This cell loads libraries important for evaluation and assessment of your model. **Do not change it.**

In [27]:
#! SETUP 1
import sys, os
_snlp_book_dir = "../../../../"
sys.path.append(_snlp_book_dir) 
import statnlpbook.lm as lm
import statnlpbook.ohhla as ohhla
import math

## <font color='green'>Setup 2</font>: Load Training Data

This cell loads the training data. We use this data for assessment to define the reference vocabulary: the union of the words of the training and test set. You can use the dataset to train your model, but you are also free to load the data in a different way, or focus on subsets etc. However, when you do this, still **do not edit this setup section**. Instead refer to the variables in your own code, and slice and dice them as you see fit.   

In [28]:
#! SETUP 2
_snlp_train_dir = _snlp_book_dir + "/data/ohhla/train"
_snlp_dev_dir = _snlp_book_dir + "/data/ohhla/dev"
_snlp_train_song_words = ohhla.words(ohhla.load_all_songs(_snlp_train_dir))
_snlp_dev_song_words = ohhla.words(ohhla.load_all_songs(_snlp_dev_dir))
assert(len(_snlp_train_song_words)==1041496)

Could not load ../../../..//data/ohhla/train/www.ohhla.com/anonymous/nas/distant/tribal.nas.txt.html


Due to file encoding issues this code produces one error `Could not load ...`. **Ignore this error**.

## <font color='blue'>Task 1</font>: Develop and Train the Model

This is the core part of the assignment. You are to code up, train and tune a language model. Your language model needs to be a subclass of the `lm.LanguageModel` class. You can use some of the existing language models developed in the lecture, or develop your own extensions. 

Concretely, you need to return a better language model in the `create_lm` function. This function receives a target vocabulary `vocab`, and it needs to return a language model defined over this vocabulary. 

The target vocab will be the union of the training and test set (hidden to you at development time). This vocab will contain words not in the training set. One way to address this issue is to use the `lm.OOVAwareLM` class discussed in the lecture notes.

In [29]:
## You should improve this cell
import collections

class NGramLM(lm.CountLM):
    """
    Extend the n-gram language model to keep track of the various counts
    needed for Kneser-Ney smoothing.
    """
    def __init__(self, train, order):
        
        # Initialisation
        super().__init__(set(train), order)
        self._counts = collections.defaultdict(float)
        self._norm = collections.defaultdict(float)
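        self._nbis = 0 #number of distinct n-grams of this order
        self._nconts = collections.defaultdict(float) #N+ continuations: distinct words following a history
        self._nhists = collections.defaultdict(float) #N+ histories: distinct histories preceding a word
        self._nmids = collections.defaultdict(float) #N+ (*t*)
        self._nends = collections.defaultdict(float) #N+ (*tw)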
        
        for i in range(self.order, len(train)):
            history = tuple(train[i - self.order + 1: i])
            word = train[i]
            if self._counts[(word,) + history] == 0.0:
                self._nbis += 1.0
                self._nconts[history] += 1.0
                self._nhists[(word, )] += 1.0
                if self.order > 2:
                    self._nmids[tuple(history[-(self.order - 2):])] += 1.0
                    self._nends[(word,) + tuple(history[-(self.order - 2):])] += 1.0
            self._counts[(word,) + history] += 1.0
            self._norm[history] += 1.0

    def counts(self, word_and_history):
        return self._counts[word_and_history]

    def norm(self, history):
        return self._norm[history]
    
class KneserNeyNGram(lm.LanguageModel):
    """
    Kneser-Ney N-Gram model - recursively goes from highest order with fixed discount d
    """
    def __init__(self, train, order, d):
        
        # Initialisation
        super().__init__(set(train), order)
        self.train = train
        self.nmodel = self.order #the order of the n-gram model as we recurse down
        self.d = d
        self.ngram = {} #n-gram count models, keyed by order
        
        for i in range(2, order + 1):
            self.ngram[i] = NGramLM(self.train, i)
    
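    def probability(self, word, *history):
        # The recursion depth is tracked in self.nmodel: it is decremented on each
        # recursive call and reset to self.order at the unigram base case.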
        model_hist = tuple(history[-(self.nmodel- 1):])
        
        if self.nmodel == 1:
            self.nmodel = self.order
            return self.ngram[2]._nhists[(word, )] / self.ngram[2]._nbis
        
        elif self.nmodel == self.order:
            norm = self.ngram[self.nmodel].norm(model_hist)
            
            if not(norm): norm = len(self.vocab)
            prob = max(self.ngram[self.nmodel].counts((word, ) + model_hist) - self.d, 0) / norm
        
        elif self.nmodel > 1 and self.nmodel != self.order:
            norm = self.ngram[self.nmodel + 1]._nmids[model_hist]
            
            if not(norm): norm = len(self.vocab)
            prob = max(self.ngram[self.nmodel + 1]._nends[(word, ) + model_hist] - self.d, 0) / norm
        
        #calc lambda
        nconts = self.ngram[self.nmodel]._nconts[model_hist]
        if nconts > 0:
            _lambda = self.d / norm * nconts
        else:
            if norm != len(self.vocab):
                _lambda = self.d / norm * len(self.vocab)
            else:
                _lambda = 1.0
        self.nmodel -= 1
        return prob + (_lambda * self.probability(word, *history))
        
def create_lm(vocab):
    """
    Return an instance of `lm.LanguageModel` defined over the given vocabulary.
    Args:
        vocab: the vocabulary the LM should be defined over. It is the union of the training and test words.
    Returns:
        a language model, instance of `lm.LanguageModel`.
    """    
    oov_train = lm.inject_OOVs(_snlp_train_song_words+_snlp_dev_song_words)
    oov_vocab = set(oov_train)
    missing_words = set([word for word in vocab if word not in oov_vocab])

    #Cross validation and optimisation done below...the optimum parameter comes from this          
    return  lm.OOVAwareLM(KneserNeyNGram(oov_train,6,0.914522743225), missing_words) 

## <font color='green'>Setup 3</font>: Specify Test Data
This cell defines the directory to load the test songs from. When we evaluate your notebook we will point this directory elsewhere and use a **hidden test set**.  

In [30]:
#! SETUP 3
_snlp_test_dir = _snlp_book_dir + "/data/ohhla/dev"

## <font color='green'>Setup 4</font>: Load Test Data and Prepare Language Model
In this section we load the test data, prepare the reference vocabulary and then create your language model based on this vocabulary.

In [31]:
#! SETUP 4
_snlp_test_song_words = ohhla.words(ohhla.load_all_songs(_snlp_test_dir))
_snlp_test_vocab = set(_snlp_test_song_words)
_snlp_dev_vocab = set(_snlp_dev_song_words)
_snlp_train_vocab = set(_snlp_train_song_words)
_snlp_vocab = _snlp_test_vocab | _snlp_train_vocab | _snlp_dev_vocab
_snlp_lm = create_lm(_snlp_vocab)

## <font color='red'>Assessment 1</font>: Test Normalization (20 pts)
Here we test whether the conditional distributions of your language model are properly normalized. If probabilities sum up to $1$ you get full points, you get half of the points if probabilities sum up to be smaller than 1, and 0 points otherwise. Due to floating point issues we will test with respect to a tolerance $\epsilon$ (`_eps`).

Points:
* 10 pts: $\leq 1 + \epsilon$
* 20 pts: $\approx 1$

In [32]:
#! ASSESSMENT 1
_snlp_test_token_indices = [100, 1000, 10000]
_eps = 0.000001
for i in _snlp_test_token_indices:
    result = sum([_snlp_lm.probability(word, *_snlp_test_song_words[i-_snlp_lm.order+1:i]) for word in _snlp_vocab])
    print("Sum: {sum}, ~1: {approx_1}, <=1: {leq_1}".format(sum=result, 
                                                            approx_1=abs(result - 1.0) < _eps, 
                                                            leq_1=result - _eps <= 1.0))

Sum: 0.9999999999998268, ~1: True, <=1: True
Sum: 0.9999999999999071, ~1: True, <=1: True
Sum: 0.9999999999991043, ~1: True, <=1: True


The above solution is marked with **
<!-- ASSESSMENT 2: START_POINTS -->
20
<!-- ASSESSMENT 2: END_POINTS --> 
points **.

## <font color='red'>Assessment 2</font>: Apply to Test Data (50 pts)

We assess how well your LM performs on some unseen test set. Perplexities are mapped to points as follows.

* 0-10 pts: uniform perplexity > perplexity > 550, linear
* 10-30 pts: 550 > perplexity > 140, linear
* 30-50 pts: 140 > perplexity > *Best-Result*, linear

The **linear** mapping maps any perplexity value between the lower and upper bound linearly to a score. For example, if uniform perplexity is $U$ and your model's perplexity is $P$ with $550 \leq P \leq U$, then your score is $10\frac{P-U}{550-U}$. 

The *Best-Result* perplexity is the minimum of the best perplexity the course organiser achieved, and the submitted perplexities.  
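
The piecewise-linear mapping described above can be sketched as follows. This is only an illustration: the function names and the example values for the uniform and best perplexities are hypothetical, while the breakpoints (550, 140) and the point ranges are taken from the list above. It assumes the uniform perplexity is above 550 and the best result is below 140.

```python
def linear_score(p, worse, better, low_pts, high_pts):
    """Map a perplexity p between `better` and `worse` linearly onto [low_pts, high_pts]."""
    return low_pts + (high_pts - low_pts) * (p - worse) / (better - worse)


def perplexity_to_points(p, uniform, best):
    """Piecewise-linear score (hypothetical helper, not part of the marking code)."""
    if p >= uniform:
        return 0.0
    if p > 550:
        return linear_score(p, uniform, 550, 0, 10)
    if p > 140:
        return linear_score(p, 550, 140, 10, 30)
    if p > best:
        return linear_score(p, 140, best, 30, 50)
    return 50.0


# Hypothetical values, for illustration only.
U, BEST = 20000.0, 100.0
print(perplexity_to_points(345.0, U, BEST))  # halfway between 550 and 140
```

A lower perplexity always yields at least as many points, and the score is continuous at the breakpoints 550 and 140.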

In [33]:
lm.perplexity(_snlp_lm, _snlp_test_song_words)

4.384279468112543

The above solution is marked with **
<!-- ASSESSMENT 3: START_POINTS -->
0
<!-- ASSESSMENT 3: END_POINTS --> points**. 

## <font color='blue'>Task 2</font>: Describe your Approach

<p style="font-size:20px">Goal

The goal was to create a correctly normalised language model with as low a perplexity as possible on an unseen test set.
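
To make the two criteria concrete, here is a minimal sketch of a normalisation check and a perplexity computation on a toy uniform model. The class `UniformLM` and both helper functions are illustrative stand-ins written for this description; they are not the `statnlpbook` implementations.

```python
import math


class UniformLM:
    """Toy stand-in for a language model: every word is equally likely."""

    def __init__(self, vocab):
        self.vocab = set(vocab)
        self.order = 1

    def probability(self, word, *history):
        return 1.0 / len(self.vocab) if word in self.vocab else 0.0


def is_normalised(model, history=(), eps=1e-6):
    """Check that the conditional distribution for `history` sums to 1."""
    total = sum(model.probability(w, *history) for w in model.vocab)
    return abs(total - 1.0) < eps


def perplexity(model, data):
    """Exponential of the average negative log-probability per token."""
    log_prob = 0.0
    for i in range(model.order - 1, len(data)):
        history = data[i - model.order + 1:i]
        p = model.probability(data[i], *history)
        if p == 0.0:
            return math.inf  # a single zero probability gives infinite perplexity
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(data) - model.order + 1))


toy = UniformLM(["a", "b", "c", "d"])
print(is_normalised(toy))
print(perplexity(toy, ["a", "b", "a", "d"]))  # about 4, the vocabulary size
```

For a uniform model the perplexity equals the vocabulary size, which is why the uniform perplexity serves as the baseline in the marking scheme.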

<p style="font-size:20px">OOVs

Given that the test set was unseen and I was almost certain to encounter out-of-vocabulary (OOV) words, I first applied the `inject_OOVs` heuristic to the training data. Then, to keep the model correctly normalised at test time, I wrapped it in `OOVAwareLM` to cover words that are in the overall vocabulary but missing from training. 
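
The idea behind the two OOV steps can be sketched on a toy unigram model. This is my own illustration under stated assumptions, not the `statnlpbook` code: the token `[OOV]`, the helper names and the toy data are all hypothetical. The first occurrence of every word is replaced by an OOV token, and the probability mass the model learns for that token is later shared equally among the words that never appeared in training.

```python
from collections import Counter

OOV = "[OOV]"  # hypothetical placeholder token


def inject_oovs(words):
    """Replace the first occurrence of every word type with the OOV token."""
    seen, result = set(), []
    for w in words:
        if w in seen:
            result.append(w)
        else:
            seen.add(w)
            result.append(OOV)
    return result


def oov_aware_probability(base_prob, base_vocab, missing_words, word, *history):
    """Share the OOV mass of `base_prob` uniformly among `missing_words`."""
    if word in base_vocab:
        return base_prob(word, *history)
    if word in missing_words:
        return base_prob(OOV, *history) / len(missing_words)
    return 0.0


train = ["a", "b", "a", "b", "a"]
target_vocab = {"a", "b", "c", "d"}

oov_train = inject_oovs(train)
counts = Counter(oov_train)
total = sum(counts.values())


def unigram(word, *history):
    """Maximum likelihood unigram estimate on the OOV-injected data."""
    return counts[word] / total


missing = {w for w in target_vocab if w not in counts}
probs = {w: oov_aware_probability(unigram, set(counts), missing, w)
         for w in target_vocab}
print(oov_train)
print(sum(probs.values()))  # close to 1 over the target vocabulary
```

Because the OOV mass is split rather than duplicated, the distribution over the target vocabulary still sums to one.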

<p style="font-size:20px">NGrams

I first experimented with n-gram models. To avoid zero probabilities, and thus infinite perplexity, I used Laplace smoothing; I then interpolated between higher- and lower-order models. At first I used the brute-force optimiser provided by scipy. However, as the order of the models grew (I went up to order 8), there were many parameters to tune (both Laplace and interpolation parameters), so at that point I simply tuned by hand to get a rough idea of the perplexity. Even this simple method reduced perplexity on the development set to 171.
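
As an illustration of this first approach, here is a minimal sketch of add-alpha (Laplace) smoothing combined with linear interpolation between a bigram and a unigram model. The function names, the toy corpus and the values of `alpha` and `beta` are hypothetical and chosen only for the example; this is not the code I tuned.

```python
from collections import Counter


def fit(words):
    """Collect the counts needed for the unigram and bigram estimates."""
    return {
        "vocab": set(words),
        "unigrams": Counter(words),
        "bigrams": Counter(zip(words, words[1:])),
        "histories": Counter(words[:-1]),  # how often each word acts as a history
    }


def laplace_unigram(model, word, alpha):
    """Add-alpha smoothed P(word)."""
    total = sum(model["unigrams"].values())
    return (model["unigrams"][word] + alpha) / (total + alpha * len(model["vocab"]))


def laplace_bigram(model, word, prev, alpha):
    """Add-alpha smoothed P(word | prev)."""
    seen = model["histories"][prev]
    return (model["bigrams"][(prev, word)] + alpha) / (seen + alpha * len(model["vocab"]))


def interpolated(model, word, prev, alpha, beta):
    """beta * bigram + (1 - beta) * unigram, both Laplace-smoothed."""
    return (beta * laplace_bigram(model, word, prev, alpha)
            + (1.0 - beta) * laplace_unigram(model, word, alpha))


toy = fit("the cat sat on the mat".split())
total = sum(interpolated(toy, w, "the", alpha=0.1, beta=0.7) for w in toy["vocab"])
print(total)  # close to 1: the mixture of two normalised models is normalised
print(interpolated(toy, "sat", "the", alpha=0.1, beta=0.7) > 0.0)  # unseen bigram, non-zero probability
```

Each additional order adds its own `alpha` and `beta`, which is why the number of parameters to tune grows quickly with the order of the model.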

<p style="font-size:20px">Bar Aware

The data has structure, particularly around the [BAR] and [/BAR] tags. For example, [/BAR] was always followed by [BAR], and the two tags alternate. I also looked at the distribution of distances between [BAR]s. I made the model 'Bar Aware' by coding the most obvious rules. This reduced my perplexity from 171 to 159. 

<p style="font-size:20px">Utilities

For sanity checks I used utility functions provided in the lectures, such as plot_probabilities and, in particular, sampling. These utilities, plus the cross-validation code, are commented out at the end of this notebook. 


<p style="font-size:20px">Kneser-Ney Models

The 1999 paper by Chen and Goodman [1] gives compelling evidence for the efficacy of the Kneser-Ney model, and indeed of the modified Kneser-Ney model. I first implemented the bigram version, then the trigram version, and finally the higher-order recursive version of this model, using a single discounting parameter (the more sophisticated modified version uses several discounts, chosen according to the n-gram count). The intuition is that for lower-order models, instead of using maximum likelihood estimates, we derive a continuation probability: how many distinct contexts a word has been seen to follow. Another advantage is that the model has only one discount parameter to optimise. The graph of perplexity against this parameter appeared convex on the development data, so I used bisection search to optimise it.  
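
The bisection search over the discount can be sketched as follows. The objective `toy_perplexity` is a made-up convex function standing in for the real perplexity curve, which is far more expensive to evaluate; the search itself assumes the objective is convex (or at least unimodal) on the interval.

```python
def bisect_minimum(f, low, high, tol=1e-6):
    """Minimise a convex function f on [low, high] by probing the slope at the midpoint."""
    while high - low > tol:
        mid = (low + high) / 2.0
        if f(mid - tol) < f(mid + tol):
            high = mid  # f is rising at mid, so the minimum lies to the left
        else:
            low = mid   # f is falling (or flat) at mid, so the minimum lies to the right
    return (low + high) / 2.0


def toy_perplexity(d):
    """Hypothetical convex stand-in for perplexity as a function of the discount d."""
    return 100.0 + 50.0 * (d - 0.9) ** 2


best = bisect_minimum(toy_perplexity, 0.0, 1.0)
print(best)  # close to 0.9
```

Each iteration costs two evaluations of the objective and halves the interval, so roughly 20 iterations are enough to locate the discount to within `1e-6` on the interval [0, 1].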


<img src="https://i.imgsafe.org/c9dd46567d.png">


A trigram Kneser-Ney model reduced the perplexity to 134. The chart above shows the perplexities for a model of order 6, which achieved a perplexity of 116. This also rendered the Bar Aware model redundant, mostly because Kneser-Ney also predicts the continuation of [/BAR] with very high probability (see below). 

<img src="https://i.imgsafe.org/b432e386db.png">


The Kneser-Ney trigram model formula:


$P_{KN}\left( w_{3} \mid w_{1}w_{2} \right) = \frac{\max\left\{ c(w_{1}w_{2}w_{3})-D,\, 0 \right\}}{c(w_{1}w_{2})} + D \cdot \frac{\mathrm{N}(w_{1}w_{2}\bullet)}{c(w_{1}w_{2})} \cdot \left( \frac{\max\left\{ \mathrm{N}(\bullet w_{2}w_{3})-D,\, 0 \right\}}{\mathrm{N}(\bullet w_{2}\bullet)} + D \cdot \frac{\mathrm{N}(w_{2}\bullet)}{\mathrm{N}(\bullet w_{2}\bullet)} \cdot \frac{\mathrm{N}(\bullet w_{3})}{\mathrm{N}(\bullet \bullet)} \right)$
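
The bigram case of this formula is small enough to sketch in full. The class below is a simplified illustration written for this description, not the recursive implementation in `create_lm`: the class name, the toy corpus and the discount value are hypothetical, and it assumes a discount between 0 and 1.

```python
from collections import Counter


class KNBigram:
    """Interpolated Kneser-Ney bigram model with a single discount."""

    def __init__(self, words, discount):
        self.d = discount
        self.vocab = set(words)
        self.bigrams = Counter(zip(words, words[1:]))
        self.hist_count = Counter()  # c(v .): how often v occurs as a history
        self.followers = Counter()   # N(v .): distinct words following v
        self.preceders = Counter()   # N(. w): distinct words preceding w
        for (v, w), c in self.bigrams.items():
            self.hist_count[v] += c
            self.followers[v] += 1
            self.preceders[w] += 1
        self.n_bigram_types = len(self.bigrams)  # N(. .)

    def continuation(self, w):
        """Probability of w based on how many distinct contexts it completes."""
        return self.preceders[w] / self.n_bigram_types

    def probability(self, w, v):
        c_v = self.hist_count[v]
        if c_v == 0:  # unseen history: fall back to the continuation distribution
            return self.continuation(w)
        discounted = max(self.bigrams[(v, w)] - self.d, 0.0) / c_v
        backoff_weight = self.d * self.followers[v] / c_v
        return discounted + backoff_weight * self.continuation(w)


corpus = "the cat sat on the mat the cat ate".split()
model = KNBigram(corpus, discount=0.75)
for history in ["the", "cat", "ate"]:
    print(history, sum(model.probability(w, history) for w in model.vocab))
```

The mass removed by the discount, `D * N(v .) / c(v)`, is exactly the weight given to the continuation distribution, so each conditional distribution sums to one.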


<p style="font-size:20px">Generalisation

For the unseen test set I wanted to train the model on all the available data. I performed 5-fold cross-validation to tune the model. The optimal parameter varied little across folds, which gives some confidence that the model should generalise, although the perplexity did vary somewhat between held-out sets. I chose the mean of the per-fold optimal parameters as the discounting parameter of my final Kneser-Ney model, which was then trained on all the data (training and development). As for the order of the model, I was able to get lower perplexities (with diminishing returns) with models up to order 8 (perplexity of 113), but such high-order models seemed like overkill and, more to the point, killed the Jupyter kernel several times. My implementation therefore settled on an order 6 Kneser-Ney model.
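
The cross-validation procedure can be sketched as follows, using contiguous folds and averaging the per-fold optimum. The helper names and the `evaluate` callback are hypothetical; in the real setting `evaluate` would train a model on the first argument and search for the best discount on the second.

```python
def k_fold_splits(data, k=5):
    """Yield (train, held_out) pairs using k contiguous folds."""
    n = len(data)
    for j in range(k):
        start, end = j * n // k, (j + 1) * n // k
        yield data[:start] + data[end:], data[start:end]


def tune_with_cv(data, evaluate, k=5):
    """Average the per-fold results of evaluate(train, held_out) -> (param, perplexity)."""
    results = [evaluate(train, held_out) for train, held_out in k_fold_splits(data, k)]
    mean_param = sum(param for param, _ in results) / k
    mean_perplexity = sum(perp for _, perp in results) / k
    return mean_param, mean_perplexity


def fake_evaluate(train, held_out):
    """Made-up stand-in that only reports the sizes of the two sets."""
    return len(held_out) / 10.0, float(len(train))


tokens = list(range(10))
print(tune_with_cv(tokens, fake_evaluate))
```

Contiguous folds keep whole stretches of text together, so the n-grams inside each held-out block are evaluated in their original order.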

<p style="font-size:20px">Further work

I suspect the results could have been further improved had I implemented the full modified Kneser-Ney. However, I slowed down rapidly with my fiddly KN trigram implementation and also would have had extra parameters to optimise. It would have been interesting to compare with a neural net language model.

<p style="font-size:20px">Reference

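[1] S. F. Chen and J. Goodman (1999). An empirical study of smoothing techniques for language modeling. Computer Speech & Language. http://www2.denizyuret.com/ref/goodman/chen-goodman-99.pdf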







 

## <font color='red'>Assessment 3</font>: Assess Description (30 pts) 

We will mark the description along the following dimensions: 

* Clarity (10pts: very clear, 0pts: we can't figure out what you did)
* Creativity (10pts: we could not have come up with this, 0pts: Use the unigram model from the lecture notes)
* Substance (10pts: implemented complex state-of-the-art LM, 0pts: Use the unigram model from the lecture notes)

The above solution is marked with **
<!-- ASSESSMENT 1: START_POINTS -->
0
<!-- ASSESSMENT 1: END_POINTS --> points**. 

In [34]:
#Code from here is now commented out
#Some utilities for sanity checking

#Given a history, plot_probabilities shows the most likely next words. When [/BAR] is the final word
#of the history, I expect to see a high probability that [BAR] comes next
"""
import statnlpbook.util as util
def plot_probabilities(lm, context = ('sun','[BAR]', 'ain', "'t", 'shining', 'Tupac', '[/BAR]'), how_many = 10):    
    probs = sorted([(word,lm.probability(word,*context)) for word in lm.vocab], key=lambda x:x[1], reverse=True)[:how_many]
    util.plot_bar_graph([prob for _,prob in probs], [word for word, _ in probs])
plot_probabilities(_snlp_lm)
"""


In [35]:
#Sampling utility to see how closely generated text matches text in the development set
"""
hist = ('sun','[BAR]', 'ain', "'t", 'shining', 'Tupac', '[/BAR]')
import numpy as np

def sample(lm, init, amount,test_vocab):
    
    #words = list(lm.vocab)
    words = list(test_vocab)
    result = []
    result += init
    for i in range(0, amount):
        history = result[-(lm.order-1):]
        probs = [lm.probability(word, *history) for word in words]
        sampled = np.random.choice(words,p=probs)
        result.append(sampled)
    return result

sample(_snlp_lm,hist,10, _snlp_vocab)
"""


In [36]:
#View the perplexities on the development set given a varying parameter in KNTri
"""
import matplotlib.pyplot as plt
import numpy as np

oov_train = lm.inject_OOVs(_snlp_train_song_words)
oov_vocab = set(oov_train)
missing_words = set([word for word in _snlp_vocab if word not in oov_vocab])
alphas = np.arange(0.05,1.05,0.05)    


perplexities = [lm.perplexity(lm.OOVAwareLM(KneserNeyNGram(oov_train,6,alpha), missing_words),_snlp_dev_song_words) for alpha in alphas]
fig = plt.figure()
plt.plot(alphas,perplexities)
plt.ylabel("Perplexity")
plt.xlabel("KNGram parameter")
plt.savefig("perplexity.png")
plt.show()
"""


In [37]:
#For optimisation of the KN parameter, do cross validation
"""
train_dev = _snlp_train_song_words+_snlp_dev_song_words
def find_optimal(low, high, oov_train, test_set, epsilon=1e-6):
    #Bisection on the slope at the midpoint; relies on perplexity being convex in the discount
    print(high, low)
    if high - low < epsilon:
        return high, lm.perplexity(lm.OOVAwareLM(KneserNeyNGram(oov_train,6,high), missing_words),test_set)
    else:
        mid = (high+low) / 2.0
        left = lm.perplexity(lm.OOVAwareLM(KneserNeyNGram(oov_train,6,mid-epsilon), missing_words),test_set)
        right = lm.perplexity(lm.OOVAwareLM(KneserNeyNGram(oov_train,6,mid+epsilon), missing_words),test_set)
        if left < right:
            return find_optimal(low, mid, oov_train,test_set, epsilon)
        else:
            return find_optimal(mid, high,oov_train,test_set, epsilon)

alphas_perps = []
for j in range(4,-1,-1):
    
    train_set = train_dev[:j*len(train_dev)//5]+train_dev[(j+1)*len(train_dev)//5:]
    dev_set = train_dev[j*len(train_dev)//5:(j+1)*len(train_dev)//5]
    
    oov_train = lm.inject_OOVs(train_set)
    oov_vocab = set(oov_train)
    missing_words = set([word for word in _snlp_vocab if word not in oov_vocab])
    
    alpha = find_optimal(0.0,1.0,oov_train,dev_set)
    alphas_perps.append(alpha)
    print(alpha)
"""


In [38]:
"""
param = np.mean([x[0] for x in alphas_perps])
print(param)
perp = np.mean([x[1] for x in alphas_perps])
print(perp)
"""


In [39]:
#Redundant Bar Aware Language model - no longer needed due to Kneser-Ney
"""
class BarAwareLM(lm.LanguageModel):
      
    def __init__(self,base_lm):
        
        super().__init__(base_lm.vocab,base_lm.order)
        self.base_lm = base_lm
        
    def probability(self, word, *history):
        if history[-1] == '[/BAR]':
            if word =='[BAR]':
                return 1.0
            else:
                return 0.0
        else:
            #Note that [Bar],[/Bar] alternate so adjust probabilities
            hist = history #[-(self.order - 1):]
            if '[BAR]' in hist and '[/BAR]' in hist:
                last_bar = max(loc for loc, val in enumerate(hist) if val == '[BAR]')
                last_slashbar = max(loc for loc, val in enumerate(hist) if val == '[/BAR]')
                if last_bar>last_slashbar:
                    if word =='[BAR]':
                        return 0.0
                    else:
                        return self.base_lm.probability( word, *history)/(1-self.base_lm.probability('[BAR]', *history))
                else:
                    if word =='[/BAR]':
                        return 0.0
                    else:
                        return self.base_lm.probability( word, *history)/(1-self.base_lm.probability('[/BAR]', *history)) 
            elif '[BAR]' in hist:
                if word =='[BAR]':
                    return 0.0
                else:
                    return self.base_lm.probability( word, *history)/(1-self.base_lm.probability('[BAR]', *history))
            elif '[/BAR]' in hist:
                if word =='[/BAR]':
                    return 0.0
                else:
                    return self.base_lm.probability( word, *history)/(1-self.base_lm.probability('[/BAR]', *history)) 
                    
            return self.base_lm.probability( word, *history)
"""

"\nclass BarAwareLM(lm.LanguageModel):\n      \n    def __init__(self,base_lm):\n        \n        super().__init__(base_lm.vocab,base_lm.order)\n        self.base_lm = base_lm\n        \n    def probability(self, word, *history):\n        if history[-1] == '[/BAR]':\n            if word =='[BAR]':\n                return 1.0\n            else:\n                return 0.0\n        else:\n            #Note that [Bar],[/Bar] alternate so adjust probabilities\n            hist = history #[-(self.order - 1):]\n            if '[BAR]' in hist and '[/BAR]' in hist:\n                last_bar = max(loc for loc, val in enumerate(hist) if val == '[BAR]')\n                last_slashbar = max(loc for loc, val in enumerate(hist) if val == '[/BAR]')\n                if last_bar>last_slashbar:\n                    if word =='[BAR]':\n                        return 0.0\n                    else:\n                        return self.base_lm.probability( word, *history)/(1-self.base_lm.probability('[BAR]