In [1]:
#Training sample set - specify each intent with sample texts, and annotate the entities that need to be extracted
nlu_md = """

## intent:get_aum
- Assets under management as at [30 April 2019](aum_date) exceeded [USD 119.9 billion](aum_value).
- The assets under management (AUM) in the BGF range increased from US144.12 billion to [US145.80 billion](aum_value) over the period, as strong inflows into the group's specialist funds were offset by weakness in some of the European and high yield bond funds.
- It is one of the largest asset managers in the world with [USD 2.46 trillion](aum_value) in assets under management as of [March 2018](aum_date)

## intent:fund_launch
- The [Asian High Yield Bond Fund](fund_name) launched on [1 December 2017](launch_date).
- The [China A-Share Opportunities Fund](fund_name) launched on [26 October 2017](launch_date) to offer specialist investment in the expanding onshore market in China.
- The [China Flexible Equity Fund](fund_name) launched on [31 October 2017](launch_date), giving selective access to Chinese equities.
- The [Dynamic High Income Fund](fund_name) launched on [6 February 2018](launch_date).

"""
%store nlu_md > nlu.md

Writing 'nlu_md' (str) to file 'nlu.md'.
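
A misplaced parenthesis in an annotation silently creates a new entity label, so it can help to count the labels before training. The following is a minimal sketch using only the standard library; the helper names `entity_label_counts` and `unexpected_labels` are hypothetical and not part of Rasa, and the sample text contains a deliberately malformed annotation for illustration.

```python
import re
from collections import Counter

# Matches an annotation of the form [entity text](entity_label)
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

def entity_label_counts(md_text):
    """Count how often each entity label appears in the example lines."""
    counts = Counter()
    for line in md_text.splitlines():
        # Only lines starting with "- " are training examples
        if line.startswith("- "):
            for _text, label in ANNOTATION.findall(line):
                counts[label] += 1
    return counts

def unexpected_labels(md_text, expected):
    """Return the labels found in the text that are not in the expected set."""
    return sorted(set(entity_label_counts(md_text)) - set(expected))

# Illustrative sample; the second example is malformed on purpose
sample_md = """
## intent:fund_launch
- The [Asian High Yield Bond Fund](fund_name) launched on [1 December 2017](launch_date).
- The [Dynamic High Income Fund](fund)name launched on [6 February 2018](launch_date).
"""

print(entity_label_counts(sample_md))
print(unexpected_labels(sample_md, {"fund_name", "launch_date"}))  # ['fund']
```

Any label reported by `unexpected_labels` points at an annotation worth rechecking by hand.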


In [2]:
#Load the dependencies
import json
import os
import warnings

#Silence Python warnings and TensorFlow C++ logging; set the variable before TensorFlow is loaded
warnings.filterwarnings("ignore")
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

from rasa.nlu import config
from rasa.nlu.model import Interpreter, Trainer
from rasa.nlu.training_data import load_data

In [3]:
#Helper to pretty-print a parse result as indented JSON
def intent_print(o):
    print(json.dumps(o, indent=2))

In [4]:
#Specify the model pipeline
tensor_config = """
language: "en"

pipeline:
- name: "WhitespaceTokenizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "CountVectorsFeaturizer"
- name: "EmbeddingIntentClassifier"
  intent_tokenization_flag: true
  intent_split_symbol: "+"
"""
%store tensor_config > tensor_config.yml

Writing 'tensor_config' (str) to file 'tensor_config.yml'.


In [7]:
#Train the model and persist it for further use
training_data = load_data('nlu.md')
trainer = Trainer(config.load('tensor_config.yml'))
trainer.train(training_data)
model_directory = trainer.persist("./models/nlu", fixed_model_name="tensor")

Epochs: 100%|█████████████████████████████████████████████████| 300/300 [00:15<00:00, 19.08it/s, loss=0.213, acc=1.000]


In [8]:
#Load the interpreter to test the model with sample inputs
interpreter = Interpreter.load('./models/nlu/tensor')

Instructions for updating:
Use standard file APIs to check for files with this prefix.


INFO:tensorflow:Restoring parameters from ./models/nlu/tensor\component_5_EmbeddingIntentClassifier.ckpt


In [10]:
intent_print(interpreter.parse('The Asian High Yield Bond Fund launched on 1 December 2017'))

{
  "intent": {
    "name": "fund_launch",
    "confidence": 0.9687952399253845
  },
  "entities": [
    {
      "start": 4,
      "end": 30,
      "value": "Asian High Yield Bond Fund",
      "entity": "fund_name",
      "confidence": 0.8942744324046314,
      "extractor": "CRFEntityExtractor"
    },
    {
      "start": 43,
      "end": 58,
      "value": "1 December 2017",
      "entity": "launch_date",
      "confidence": 0.9033348076043048,
      "extractor": "CRFEntityExtractor"
    }
  ],
  "intent_ranking": [
    {
      "name": "fund_launch",
      "confidence": 0.9687952399253845
    },
    {
      "name": "get_aum",
      "confidence": 0.031204750761389732
    }
  ],
  "text": "The Asian High Yield Bond Fund launched on 1 December 2017"
}
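
The `start` and `end` values in each entity are character offsets into `text`, with `end` exclusive. A small sketch of how that can be checked follows; `spans_match` is a hypothetical helper, and the dictionary is a trimmed copy of the output above rather than a live call to the interpreter.

```python
def spans_match(parsed):
    """Return True if every entity value equals text[start:end]."""
    text = parsed["text"]
    return all(
        text[entity["start"]:entity["end"]] == entity["value"]
        for entity in parsed["entities"]
    )

# Trimmed copy of the parse result shown above
parsed_example = {
    "text": "The Asian High Yield Bond Fund launched on 1 December 2017",
    "entities": [
        {"start": 4, "end": 30, "value": "Asian High Yield Bond Fund", "entity": "fund_name"},
        {"start": 43, "end": 58, "value": "1 December 2017", "entity": "launch_date"},
    ],
}

print(spans_match(parsed_example))  # True
```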


In [11]:
#Store the result under a name that does not shadow the built-in int
parsed = interpreter.parse('The Asian High Yield Bond Fund launched on 1 December 2017')

In [12]:
parsed['intent']

{'name': 'fund_launch', 'confidence': 0.9687952399253845}
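
For downstream use it is often enough to keep the intent name and a flat mapping of entity labels to values. The sketch below is one possible way to do that; `summarize_parse` and its `min_confidence` threshold are hypothetical choices, and the input is a hand-copied subset of the parse result printed earlier.

```python
def summarize_parse(parsed, min_confidence=0.5):
    """Reduce a parse result to the intent name plus an entity-label-to-value mapping."""
    intent = parsed.get("intent") or {}
    entities = {
        entity["entity"]: entity["value"]
        for entity in parsed.get("entities", [])
        # Entities without a confidence score are kept
        if entity.get("confidence", 1.0) >= min_confidence
    }
    return {
        "intent": intent.get("name"),
        "confidence": intent.get("confidence"),
        "entities": entities,
    }

# Subset of the parse result shown earlier in the notebook
parse_result = {
    "intent": {"name": "fund_launch", "confidence": 0.9687952399253845},
    "entities": [
        {"value": "Asian High Yield Bond Fund", "entity": "fund_name",
         "confidence": 0.8942744324046314},
        {"value": "1 December 2017", "entity": "launch_date",
         "confidence": 0.9033348076043048},
    ],
}

summary = summarize_parse(parse_result)
print(summary["intent"], summary["entities"])
```

Note that if the same entity label occurs more than once in a sentence, this flat mapping keeps only the last value; a list per label would be needed to retain all of them.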