Convert timestamp to opt-in feature #209

Open
wants to merge 1 commit into
base: master
5 changes: 5 additions & 0 deletions docs/Test.md
@@ -140,6 +140,11 @@ See [Caching speech-to-text transcriptions](#caching-speech-to-text-transcriptio

This is currently only used for LUIS, see the section on LUIS prebuilt entities in [Configuring prebuilt entities](LuisModelConfiguration.md#configuring-prebuilt-entities).

+ ### `--timestamp`
+ (Optional) Signals whether to add a timestamp to each NLU test result.
+
+ See the documentation on the [`timestamp` property](UtteranceExtensions.md#returning-timestamps-for-each-query) for more details.

### `-i, --include`
(Optional) Path to custom NLU provider DLL. See documentation about [Specifying the include path](https://github.com/microsoft/NLU.DevOps/blob/master/docs/CliExtensions.md#specifying-the-include-path) for more details.

4 changes: 2 additions & 2 deletions docs/UtteranceExtensions.md
@@ -22,7 +22,7 @@ When an NLU provider in NLU.DevOps returns a prediction result, the value will b
```
In this case, the intent confidence score was `0.99` and the text transcription confidence score was `0.95`. This is useful context when debugging false predictions, as a low confidence score may indicate that the model could be improved with more training examples. The recognized `genre` entity also includes a confidence score of `0.80`, although it should be noted that only the LUIS provider currently returns confidence score for entity types trained from examples.

- ## Labeled utterance timestamps
+ ## Returning timestamps for each query

When analyzing results for a set of NLU predictions, it is often important context to understand when the test was run. For example, for Dialogflow `date` and `time` entities, the service only returns a date time string, and no indication of what token(s) triggered that entity to be recognized. For example, the result from a query like `"Call a taxi in 15 minutes"` may look like the following:
```json
@@ -38,7 +38,7 @@ When analyzing results for a set of NLU predictions, it is often important conte
"timestamp": "2020-01-01T00:00:00-04:00"
}
```
- Without the context provided by the `timestamp` property, we wouldn't be able to make any assertion about the correctness of the `entityValue` property for time. Currently, LUIS, Lex, and Dialogflow return a timestamp for each prediction result.
+ Without the context provided by the `timestamp` property, we wouldn't be able to make any assertion about the correctness of the `entityValue` property for time. Currently, you must specify the [`--timestamp`](Test.md#--timestamp) option to ensure a timestamp is assigned to each NLU prediction result.
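
To illustrate why the timestamp matters, here is a minimal sketch (not part of this PR) that resolves a relative expression such as "in 15 minutes" against the `timestamp` value from the example above and checks a returned time against it. The `TimestampCheck` class, its `ExpectedTime` and `IsPlausible` helpers, and the tolerance parameter are hypothetical names introduced for illustration; NLU.DevOps does not ship them.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not part of NLU.DevOps.
public static class TimestampCheck
{
    // Resolves a relative offset against the timestamp recorded for the test result.
    public static DateTimeOffset ExpectedTime(string timestamp, TimeSpan offset)
    {
        var testTime = DateTimeOffset.Parse(timestamp, CultureInfo.InvariantCulture);
        return testTime.Add(offset);
    }

    // Returns true when the entity value lies within the tolerance of the expected time.
    public static bool IsPlausible(string timestamp, TimeSpan offset, string entityValue, TimeSpan tolerance)
    {
        var expected = ExpectedTime(timestamp, offset);
        var actual = DateTimeOffset.Parse(entityValue, CultureInfo.InvariantCulture);
        return (actual - expected).Duration() <= tolerance;
    }
}
```

With the example timestamp `2020-01-01T00:00:00-04:00` and an offset of 15 minutes, `ExpectedTime` yields `2020-01-01T00:15:00-04:00`. Without the timestamp there is no reference point to compare the entity value against.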

## Adjusting entity compare results

34 changes: 33 additions & 1 deletion src/NLU.DevOps.CommandLine/Test/TestCommand.cs
@@ -6,7 +6,9 @@ namespace NLU.DevOps.CommandLine.Test
using System;
using System.Collections.Generic;
using System.IO;
+ using System.Threading;
using System.Threading.Tasks;
+ using Core;
using Models;
using Newtonsoft.Json.Linq;
using static Serializer;
@@ -31,7 +33,8 @@ public override int Main()

protected override INLUTestClient CreateNLUTestClient()
{
- return NLUClientFactory.CreateTestInstance(this.Options, this.Configuration, this.Options.SettingsPath);
+ var client = NLUClientFactory.CreateTestInstance(this.Options, this.Configuration, this.Options.SettingsPath);
+ return this.Options.Timestamp ? new TimestampNLUTestClient(client) : client;
}

private static void EnsureDirectory(string filePath)
@@ -129,5 +132,34 @@ private IEnumerable<(JToken Query, string SpeechFile)> LoadUtterances()
yield return (query, speechFile);
}
}

+ private class TimestampNLUTestClient : INLUTestClient
+ {
+     public TimestampNLUTestClient(INLUTestClient client)
+     {
+         this.Client = client;
+     }
+
+     private INLUTestClient Client { get; }
+
+     public async Task<LabeledUtterance> TestAsync(JToken query, CancellationToken cancellationToken)
+     {
+         var timestamp = DateTimeOffset.Now;
+         var result = await this.Client.TestAsync(query, cancellationToken).ConfigureAwait(false);
+         return result.WithTimestamp(timestamp);
+     }
+
+     public async Task<LabeledUtterance> TestSpeechAsync(string speechFile, JToken query, CancellationToken cancellationToken)
+     {
+         var timestamp = DateTimeOffset.Now;
+         var result = await this.Client.TestSpeechAsync(speechFile, query, cancellationToken).ConfigureAwait(false);
+         return result.WithTimestamp(timestamp);
+     }
+
+     public void Dispose()
+     {
+         this.Client.Dispose();
+     }
+ }
}
}
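
The wrapper above follows the decorator pattern: it reads the clock before awaiting the inner client, so the timestamp reflects when the request started rather than when the response arrived. A standalone sketch of the same idea is shown below. `IQueryClient`, `EchoClient`, and `TimestampingClient` are hypothetical stand-ins for the NLU.DevOps types, and the injectable clock is an assumption added to make the behavior deterministic; the PR itself reads `DateTimeOffset.Now` directly.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for INLUTestClient, reduced to one method for illustration.
public interface IQueryClient
{
    Task<string> QueryAsync(string text);
}

// Trivial inner client that returns the query text unchanged.
public sealed class EchoClient : IQueryClient
{
    public Task<string> QueryAsync(string text) => Task.FromResult(text);
}

// Decorator that pairs each result with the time read just before the inner call.
public sealed class TimestampingClient
{
    private readonly IQueryClient inner;
    private readonly Func<DateTimeOffset> clock;

    // The injectable clock is an assumption for testability, not something the PR does.
    public TimestampingClient(IQueryClient inner, Func<DateTimeOffset> clock)
    {
        this.inner = inner;
        this.clock = clock;
    }

    public async Task<(string Result, DateTimeOffset Timestamp)> QueryAsync(string text)
    {
        // Read the clock first so the timestamp marks the start of the request.
        var timestamp = this.clock();
        var result = await this.inner.QueryAsync(text).ConfigureAwait(false);
        return (result, timestamp);
    }
}
```

Keeping the timestamp logic in a single wrapper, rather than in each provider, is what lets the PR make it opt-in from one place in `CreateNLUTestClient`.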
3 changes: 3 additions & 0 deletions src/NLU.DevOps.CommandLine/Test/TestOptions.cs
@@ -25,5 +25,8 @@ internal class TestOptions : BaseOptions

[Option('p', "parallelism", HelpText = "Numeric value to determine the number of parallel tests. Default value is 3.", Required = false)]
public int Parallelism { get; set; } = 3;

+ [Option("timestamp", HelpText = "Assign a timestamp to each utterance result.", Required = false)]
+ public bool Timestamp { get; set; }
}
}
6 changes: 2 additions & 4 deletions src/NLU.DevOps.Dialogflow/DialogflowNLUTestClient.cs
@@ -78,8 +78,7 @@ protected override async Task<LabeledUtterance> TestAsync(string utterance, Canc
result.QueryResult.Intent.DisplayName,
result.QueryResult.Parameters?.Fields.SelectMany(GetEntities).ToList())
.WithScore(result.QueryResult.IntentDetectionConfidence)
- .WithTextScore(result.QueryResult.SpeechRecognitionConfidence)
- .WithTimestamp(DateTimeOffset.Now);
+ .WithTextScore(result.QueryResult.SpeechRecognitionConfidence);
},
cancellationToken)
.ConfigureAwait(false);
@@ -115,8 +114,7 @@ protected override async Task<LabeledUtterance> TestSpeechAsync(string speechFil
result.QueryResult.Intent.DisplayName,
result.QueryResult.Parameters?.Fields.SelectMany(GetEntities).ToList())
.WithScore(result.QueryResult.IntentDetectionConfidence)
- .WithTextScore(result.QueryResult.SpeechRecognitionConfidence)
- .WithTimestamp(DateTimeOffset.Now);
+ .WithTextScore(result.QueryResult.SpeechRecognitionConfidence);
},
cancellationToken)
.ConfigureAwait(false);
4 changes: 0 additions & 4 deletions src/NLU.DevOps.Lex.Tests/LexNLUTestClientTests.cs
@@ -86,9 +86,6 @@ public static async Task TestsWithSpeech(string slots, string entityType, string
// assert reads content from file (file contents are "hello world")
content.Should().Be("hello world");

- // assert result type
- result.Should().BeOfType<JsonLabeledUtterance>();

// assert intent and text
result.Intent.Should().Be(intent);
result.Text.Should().Be(transcript);
@@ -121,7 +118,6 @@ public static async Task CreatesLabeledUtterances()
using (var lex = new LexNLUTestClient(string.Empty, string.Empty, mockClient.Object))
{
var response = await lex.TestAsync(text).ConfigureAwait(false);
- response.Should().BeOfType<JsonLabeledUtterance>();
response.Text.Should().Be(text);
response.Intent.Should().Be(intent);
response.Entities.Should().BeEmpty();
6 changes: 2 additions & 4 deletions src/NLU.DevOps.Lex/LexNLUTestClient.cs
@@ -87,8 +87,7 @@ protected override async Task<LabeledUtterance> TestAsync(string utterance, Canc
.Select(slot => new Entity(slot.Key, slot.Value, null, 0))
.ToArray();

- return new LabeledUtterance(utterance, postTextResponse.IntentName, entities)
- .WithTimestamp(DateTimeOffset.Now);
+ return new LabeledUtterance(utterance, postTextResponse.IntentName, entities);
}

/// <inheritdoc />
@@ -118,8 +117,7 @@ protected override async Task<LabeledUtterance> TestSpeechAsync(string speechFil
.ToArray()
: null;

- return new JsonLabeledUtterance(postContentResponse.InputTranscript, postContentResponse.IntentName, slots)
- .WithTimestamp(DateTimeOffset.Now);
+ return new LabeledUtterance(postContentResponse.InputTranscript, postContentResponse.IntentName, slots);
}
}

1 change: 0 additions & 1 deletion src/NLU.DevOps.Luis.Tests/LuisNLUTestClientTests.cs
@@ -66,7 +66,6 @@ public static async Task TestModel()
using (var luis = builder.Build())
{
var result = await luis.TestAsync(test).ConfigureAwait(false);
- result.Should().BeOfType<JsonLabeledUtterance>();
result.Text.Should().Be(test);
result.Intent.Should().Be("intent");
result.Entities.Count.Should().Be(1);
3 changes: 1 addition & 2 deletions src/NLU.DevOps.Luis/LuisNLUTestClient.cs
@@ -134,8 +134,7 @@ Entity getEntity(EntityModel entity)
speechLuisResult.LuisResult.Entities?.Select(getEntity).ToList())
.WithProperty("intents", speechLuisResult.LuisResult.Intents)
.WithScore(speechLuisResult.LuisResult.TopScoringIntent?.Score)
- .WithTextScore(speechLuisResult.TextScore)
- .WithTimestamp(DateTimeOffset.Now);
+ .WithTextScore(speechLuisResult.TextScore);
}
}
}
1 change: 0 additions & 1 deletion src/NLU.DevOps.LuisV3.Tests/LuisNLUTestClientTests.cs
@@ -70,7 +70,6 @@ public static async Task TestModel()
using (var luis = builder.Build())
{
var result = await luis.TestAsync(test).ConfigureAwait(false);
- result.Should().BeOfType<JsonLabeledUtterance>();
result.Text.Should().Be(test);
result.Intent.Should().Be("intent");
result.Entities.Count.Should().Be(1);
3 changes: 1 addition & 2 deletions src/NLU.DevOps.LuisV3/LuisNLUTestClient.cs
@@ -208,8 +208,7 @@ private LabeledUtterance LuisResultToLabeledUtterance(SpeechPredictionResponse s
return new LabeledUtterance(query, intent, entities)
.WithProperty("intents", intents)
.WithScore(intentData?.Score)
- .WithTextScore(speechPredictionResponse.TextScore)
- .WithTimestamp(DateTimeOffset.Now);
+ .WithTextScore(speechPredictionResponse.TextScore);
}
}
}