In [2]:
import json
import os
import pandas as pd
import numpy as np

# Fly Me
A travel provider for individuals and professionals. 

## Overview
The chatbot project - Jupyter Notebook
Fly Me, a travel agency, has launched an ambitious project to develop a *chatbot* to *help users choose a travel offer*.

The first phase of this project is to **develop an MVP** that will help Fly Me employees to easily book airline tickets for their holidays.

This first MVP will allow us to test the concept and performance of the chatbot quickly and on a large scale.

As this project is iterative, we have limited the features of the chatbot V1. It must be able to identify the following five elements in the user's request:

* Departure city
* Destination city
* Desired flight departure date
* Desired flight return date
* Maximum budget for the total price of tickets

If one of the elements is missing, the chatbot must be able to ask the user the relevant questions in order to fully understand the request. When the chatbot thinks it has understood all the elements of the user's request, it must be able to reformulate the user's request and ask the user for confirmation.
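
The slot-filling behaviour described above can be illustrated with a minimal sketch. The `BookingRequest` class, its field names, and the prompt texts are hypothetical and only serve to show the idea of asking for the first missing element and then confirming; this is not the bot's actual implementation.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class BookingRequest:
    """The five elements the V1 chatbot must collect (illustrative field names)."""
    departure_city: Optional[str] = None
    destination_city: Optional[str] = None
    departure_date: Optional[str] = None
    return_date: Optional[str] = None
    budget: Optional[str] = None


# Hypothetical questions, one per element that may be missing
PROMPTS = {
    "departure_city": "Which city are you leaving from?",
    "destination_city": "Where would you like to go?",
    "departure_date": "When do you want to depart?",
    "return_date": "When do you want to return?",
    "budget": "What is your maximum budget for the tickets?",
}


def next_prompt(request: BookingRequest) -> str:
    """Return the question for the first missing element, or a confirmation summary."""
    for f in fields(request):
        if getattr(request, f.name) is None:
            return PROMPTS[f.name]
    return (
        f"You want to fly from {request.departure_city} to {request.destination_city}, "
        f"leaving on {request.departure_date} and returning on {request.return_date}, "
        f"with a budget of {request.budget}. Is that correct?"
    )


request = BookingRequest(departure_city="New York", destination_city="Paris")
print(next_prompt(request))  # asks for the departure date, the first missing element
```

Once every field is filled, `next_prompt` returns the reformulated request and asks for confirmation.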

## Tools and Technologies used
To carry out this project, we will need to use the following tools and technologies:

* The Microsoft Bot Framework source code for Python “Microsoft Bot Framework SDK v4 for Python” 
* The Azure LUIS Cognitive Service which allows you to perform a semantic analysis of a message entered by the user and structure it for processing by the bot (it should allow you to identify the five elements requested)
* The Azure Web App service that allows you to run a web application on the Azure Cloud (you won't need to use the Azure Bot service)
* The Bot Framework Emulator which will allow you to test your chatbot locally and in production. It is an interface that allows a user to interact with the chatbot.

## Data
The data used to train the LUIS model comes from a dataset of conversations between users and travel agents. This dataset is in JSON format and contains a total of 1500 conversations. Each conversation consists of a series of messages exchanged between the user and the travel agent. The messages contain information about the user's travel preferences, such as departure city, destination city, travel dates, and budget. For more details about the dataset, please refer to the original source: [Frames Dataset](https://www.microsoft.com/en-us/research/project/frames-dataset/).

For the purpose of this project, the dataset is divided into two parts: a training set and a test set. The training set contains 80% of the conversations and is used to train the LUIS model; the remaining 20% form the test set, which is used to evaluate the performance of the LUIS model.
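
One way to produce such a split is sketched below. The function name, the fixed seed, and the toy conversation list are assumptions made for illustration; the point is that the split happens at the conversation level, so all turns of one conversation stay in the same set.

```python
import random


def split_conversations(conversations, train_fraction=0.8, seed=42):
    """Shuffle a copy of the conversations and split it into (train, test)."""
    shuffled = list(conversations)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy stand-in for the loaded conversations
conversations = [{"id": str(i)} for i in range(10)]
train, test = split_conversations(conversations)
print(len(train), len(test))
```
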

The dataset is available in the `data/frames_dataset` directory of the project. The main file is `frames.json`, which contains all the conversations as a top-level list. Each conversation is structured as follows (abridged, with illustrative values):
```json
[
  {
    "id": "1",
    "user_id": "...",
    "wizard_id": "...",
    "labels": {...},
    "turns": [
      {
        "author": "user",
        "text": "I want to book a flight from New York to Paris.",
        ...
      },
      {
        "author": "wizard",
        "text": "Sure, when do you want to depart?",
        ...
      },
      ...
    ]
  },
  ...
]
```
The `turns` field contains the list of messages exchanged between the user and the travel agent. Each message has an `author` field indicating who sent it (either "user" or "wizard", the travel agent) and a `text` field containing the content of the message. Turns also carry a `timestamp`, annotation `labels`, and, where available, `db` search results.

## Objective
The objective of this notebook is to explore and preprocess the dataset to prepare it for training the LUIS model. This includes loading the JSON file, flattening the nested structure, and extracting relevant information from the conversations. We will use the `pandas` library to manipulate the data and perform basic exploratory data analysis (EDA) to understand the distribution of the different entities and intents in the dataset.

## What is a bot?
Bots provide an experience that feels less like using a computer and more like dealing with a person, or an intelligent robot. Bots can be used to shift simple, repetitive tasks, such as taking a dinner reservation or gathering profile information, onto automated systems that may no longer require direct human intervention. Users converse with a bot using text, interactive cards, and speech. A bot interaction can be a quick answer to a question or an involved conversation that intelligently provides access to services.

In creating a bot, we define how the bot interacts with users, what services it connects to, and how it processes information. We can create bots that run in a variety of environments, including websites, apps, Microsoft Teams, Skype, Slack, Facebook Messenger, and more.
For a detailed overview of bots, see [What are bots?](https://learn.microsoft.com/en-us/azure/bot-service/bot-service-overview-introduction?view=azure-bot-service-4.0). 

In this project we will create a bot that can help users book flights using a Language Understanding Intelligent Service (LUIS) or Conversational Language Understanding (CLU) model from Microsoft. To create the model, we will define intents, entities, and utterances.

1. **Intents**: 

An intent represents a task or action that the user wants to perform. It is the purpose of the user's input. For example, in our flight booking bot, we might have intents such as "BookFlight", "CancelFlight", and "GetFlightStatus".

2. **Entities**: 

Entities are used to extract specific pieces of information from the user's input that are relevant to the intent. For example, in the "BookFlight" intent, we might have entities such as "DepartureCity", "DestinationCity", "DepartureDate", "ReturnDate", and "Budget".

3. **Utterances**: 

Utterances are the actual phrases or sentences that users might say to express their intent. For example, for the "BookFlight" intent, some example utterances might be "I want to book a flight from New York to Paris on June 1st and return on June 10th with a budget of $1000" or "Can you help me find a flight to London next month?".

In summary:
* An **intent** represents the purpose of a user's input: the task or action the user wants to perform. It is the meaning of an utterance. In this case, the intent is to book a flight.
* An **entity** adds specific context to an intent. It represents a piece of information that is relevant to the intent. In this case, the entities are the departure city, destination city, departure date, return date, and budget.
* An **utterance** is a phrase that a user might enter when interacting with the application. It is a specific example of a user's input that expresses the intent. In this case, an utterance could be "I want to book a flight from New York to Paris on June 1st and return on June 10th with a budget of $1000."

We create a model by defining the intents we want it to understand and associating each one with one or more utterances. Every model must have a **None** intent, which explicitly identifies utterances that a user might submit but that require no specific action (for example, conversational greetings like "hello") or that fall outside the scope of the domain for this model. We must also define the entities that are relevant to each intent and annotate the utterances with them. This helps the model learn to recognize the intent and extract the relevant entities from user input.

The model can then recognize the intent to book a flight and extract the relevant entities from the user's input. This model can then be integrated into a chatbot application to help users book flights.
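
To make the relationship between intents, entities, and utterances concrete, here is a small sketch of how labeled examples could be represented and sanity-checked before they are sent to a language service. The class names and the `check_labels` helper are hypothetical and not part of LUIS or CLU.

```python
from dataclasses import dataclass, field


@dataclass
class EntityLabel:
    name: str   # entity name, e.g. "DestinationCity"
    value: str  # the span of the utterance that carries the entity


@dataclass
class LabeledUtterance:
    text: str
    intent: str
    entities: list = field(default_factory=list)


INTENTS = {"BookFlight", "None"}  # the model always includes a None intent

examples = [
    LabeledUtterance(
        "Can you help me find a flight to London next month?",
        "BookFlight",
        [EntityLabel("DestinationCity", "London")],
    ),
    LabeledUtterance("hello", "None"),  # greeting: no action required
]


def check_labels(utterances, intents):
    """Report utterances with an unknown intent or an entity value missing from the text."""
    problems = []
    for u in utterances:
        if u.intent not in intents:
            problems.append(f"unknown intent {u.intent!r} in {u.text!r}")
        for e in u.entities:
            if e.value not in u.text:
                problems.append(f"entity value {e.value!r} not found in {u.text!r}")
    return problems


print(check_labels(examples, INTENTS))
```

An empty list means every example has a declared intent and every annotated entity value actually occurs in its utterance.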



In [13]:
# Flatten nested JSON columns into a flat DataFrame
from pandas import json_normalize

# Load the JSON file
path = '../data/frames_dataset/frames.json'
with open(path, 'r', encoding='utf-8') as file:
    data = json.load(file)

# Flatten nested dicts into dotted column names (e.g. labels.userSurveyRating)
df_flat = json_normalize(data, sep='.')


def is_list_of_dicts(x):
    """True for non-empty lists whose first element is a dict."""
    return isinstance(x, list) and len(x) > 0 and isinstance(x[0], dict)


# Handle columns that are lists of dicts: explode them, then normalize the dicts
for col in df_flat.columns.tolist():
    if df_flat[col].apply(is_list_of_dicts).any():
        # Explode the list so each element becomes its own row
        exploded = df_flat.explode(col).reset_index(drop=True)
        # Normalize only the rows that hold a dict, and keep their index
        # so the join below stays aligned even if some rows are NaN
        rows = exploded.loc[exploded[col].apply(lambda x: isinstance(x, dict)), col]
        expanded = json_normalize(rows.tolist(), sep='.')
        expanded.index = rows.index
        # Prefix new columns with the original column name
        expanded = expanded.add_prefix(col + '.')
        df_flat = exploded.drop(columns=[col]).join(expanded)

# Show results
print('Flattened DataFrame shape:', df_flat.shape)
display(df_flat.head())

Flattened DataFrame shape: (19986, 14)


Unnamed: 0,user_id,wizard_id,id,labels.userSurveyRating,labels.wizardSurveyTaskSuccessful,turns.text,turns.author,turns.timestamp,turns.labels.acts,turns.labels.acts_without_refs,turns.labels.active_frame,turns.labels.frames,turns.db.result,turns.db.search
0,U22HTHYNP,U21DKG18C,e2c0fc6c-2134-4891-8353-ef16d8412c9a,4.0,True,I'd like to book a trip to Atlantis from Capri...,user,1471272000000.0,"[{'args': [{'val': 'book', 'key': 'intent'}], ...","[{'args': [{'val': 'book', 'key': 'intent'}], ...",1,"[{'info': {'intent': [{'val': 'book', 'negated...",,
1,U22HTHYNP,U21DKG18C,e2c0fc6c-2134-4891-8353-ef16d8412c9a,4.0,True,"Hi...I checked a few options for you, and unfo...",wizard,1471272000000.0,"[{'args': [{'val': [{'annotations': [], 'frame...",,1,"[{'info': {'intent': [{'val': 'book', 'negated...",[[{'trip': {'returning': {'duration': {'hours'...,"[{'ORIGIN_CITY': 'Porto Alegre', 'PRICE_MIN': ..."
2,U22HTHYNP,U21DKG18C,e2c0fc6c-2134-4891-8353-ef16d8412c9a,4.0,True,"Yes, how about going to Neverland from Caprica...",user,1471273000000.0,"[{'args': [{'val': 'Neverland', 'key': 'dst_ci...","[{'args': [{'val': 'Neverland', 'key': 'dst_ci...",2,"[{'info': {'intent': [{'val': 'book', 'negated...",,
3,U22HTHYNP,U21DKG18C,e2c0fc6c-2134-4891-8353-ef16d8412c9a,4.0,True,I checked the availability for this date and t...,wizard,1471273000000.0,[{'args': [{'val': [{'annotations': [{'val': N...,,2,"[{'info': {'intent': [{'val': 'book', 'negated...","[[], [], [], [], [], []]","[{'ORIGIN_CITY': 'Caprica', 'PRICE_MIN': '1700..."
4,U22HTHYNP,U21DKG18C,e2c0fc6c-2134-4891-8353-ef16d8412c9a,4.0,True,I have no flexibility for dates... but I can l...,user,1471273000000.0,"[{'args': [{'val': False, 'key': 'flex'}], 'na...","[{'args': [{'val': False, 'key': 'flex'}], 'na...",3,"[{'info': {'intent': [{'val': 'book', 'negated...",,


In [None]:
df_flat['turns.text']

0        I'd like to book a trip to Atlantis from Capri...
1        Hi...I checked a few options for you, and unfo...
2        Yes, how about going to Neverland from Caprica...
3        I checked the availability for this date and t...
4        I have no flexibility for dates... but I can l...
                               ...                        
19981    Yup it's from the 12th to the 25th, and it wil...
19982                                 Ok perfect, book me!
19983    Consider it done! Have a good trip :slightly_s...
19984                                              Thanks!
19985                                         My pleasure!
Name: turns.text, Length: 19986, dtype: object
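
Only the user's messages are useful as training utterances, so a natural next step is to keep the rows whose `turns.author` is `user`. The sketch below uses a tiny hand-made DataFrame with the same column names as the flattened output above, so that it runs on its own; on the real data the same filter would be applied to `df_flat`.

```python
import pandas as pd

# Tiny stand-in for df_flat, reusing the column names of the flattened output
sample = pd.DataFrame({
    "id": ["a", "a", "a"],
    "turns.author": ["user", "wizard", "user"],
    "turns.text": ["Ok perfect, book me!", "Consider it done!", "Thanks!"],
})

# Keep only the turns written by the user
user_turns = sample.loc[sample["turns.author"] == "user", ["id", "turns.text"]]
user_turns = user_turns.reset_index(drop=True)
print(user_turns["turns.text"].tolist())
```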

In [None]:
import platform
print(platform.architecture()[0])

## Create Conversation Analysis Client
To create a Conversation Analysis client, you need to install the `azure-ai-language-conversations` package. You can do this using pip:

```bash
pip install azure-ai-language-conversations
```

For reference, the dataset can also be flattened to one row per turn with a single `json_normalize` call:

```python
import json
import pandas as pd
from pandas import json_normalize
# Load the JSON file
with open('../data/frames_dataset/frames.json', 'r', encoding='utf-8') as file:
    data = json.load(file)
# frames.json is a top-level list of conversations (see the flattened output above),
# so normalize the `turns` records directly and carry the conversation `id` along
df = json_normalize(data, record_path='turns', meta=['id'])
# Display the first few rows of the DataFrame
df.head()
```

#### Generate a LUIS JSON from the frames.json file
The dataset is divided into two parts: a training set and a test set. The training set contains 1200 conversations and is used to train the LUIS model. The test set contains 300 conversations and is used to evaluate the performance of the LUIS model.
The dataset contains conversations in French and English, so the LUIS model must be able to understand both languages.
Each conversation is identified by a unique ID and consists of multiple turns. Each turn records who wrote the message (the user or the travel agent) and the text of the message.

The structure:

* multiple turns per conversation (user and agent)
* each turn has a speaker, a text, and frame annotations
* information about travel preferences (departure city, destination city, travel dates, budget)

The goal is to extract the user utterances from the conversations and format them according to the LUIS JSON structure. This includes identifying the intents and entities in the user's messages and mapping the dataset's slots to entities in a way that LUIS can understand.
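
The slot-to-entity mapping can be sketched as follows. The `acts` layout (a list of acts, each with an `args` list of `key`/`val` pairs) follows the `turns.labels.acts` column shown in the flattened output. Only the `dst_city` key is visible there; the other slot keys in `SLOT_TO_ENTITY` are assumptions that should be checked against the data, and the helper name is hypothetical.

```python
# Assumed mapping from dataset slot keys to the entity names used in this notebook.
# Only "dst_city" appears in the output above; verify the other keys before use.
SLOT_TO_ENTITY = {
    "or_city": "DepartureCity",
    "dst_city": "DestinationCity",
    "str_date": "DepartureDate",
    "end_date": "ReturnDate",
    "budget": "Budget",
}


def slots_from_acts(acts):
    """Collect (entity_name, value) pairs from a turn's `acts` list."""
    if not isinstance(acts, list):
        return []  # e.g. NaN for turns without annotations
    pairs = []
    for act in acts:
        for arg in act.get("args", []):
            entity = SLOT_TO_ENTITY.get(arg.get("key"))
            value = arg.get("val")
            # Skip keys we do not map (such as "intent") and non-text values
            if entity is not None and isinstance(value, str):
                pairs.append((entity, value))
    return pairs


# Sample modeled on the acts shown in the flattened output
acts = [
    {"args": [{"val": "book", "key": "intent"}]},
    {"args": [{"val": "Neverland", "key": "dst_city"}]},
]
print(slots_from_acts(acts))
```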

**LUIS JSON Structure:**
```json
{
  "luis_schema_version": "7.0.0",
  "versionId": "0.1",
  "name": "TravelBooking",
  "desc": "LUIS model for travel booking chatbot",
  "culture": "en-us",
  "intents": [
    {
      "name": "BookFlight"
    }
  ],
  "entities": [
    {
      "name": "DepartureCity"
    },
    {
      "name": "DestinationCity"
    },
    {
      "name": "DepartureDate"
    },
    {
      "name": "ReturnDate"
    },
    {
      "name": "Budget"
    }
  ],
  "composites": [],
  "closedLists": [],
  "patternAnyEntities": [],
  "regex_entities": [],
  "prebuiltEntities": [],
  "model_features": [],
  "regex_features": [],
  "utterances": [
    {
      "text": "I want to book a flight from New York to Paris.",
      "intent": "BookFlight",
      "entities": [
        {
          "entity": "DepartureCity",
          "startPos": 29,
          "endPos": 36
        },
        {
          "entity": "DestinationCity",
          "startPos": 41,
          "endPos": 45
        }
      ]
    }
  ],
  "patterns": []
}
```
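
Character positions are easy to get wrong by hand, so it can help to compute them. The sketch below assumes zero-based positions with an inclusive `endPos`, and the `label_entity` helper is a hypothetical name. It does a simple case-insensitive search for the first occurrence of the value, which is enough for an illustration but not for values that appear several times in one utterance.

```python
def label_entity(text, entity_name, value):
    """Return an entity label with character positions, or None if the value is absent."""
    start = text.lower().find(value.lower())
    if start == -1:
        return None
    # Positions are zero-based and endPos points at the last character of the value
    return {"entity": entity_name, "startPos": start, "endPos": start + len(value) - 1}


text = "I want to book a flight from New York to Paris."
utterance = {
    "text": text,
    "intent": "BookFlight",
    "entities": [
        label_entity(text, "DepartureCity", "New York"),
        label_entity(text, "DestinationCity", "Paris"),
    ],
}

for e in utterance["entities"]:
    # Slicing with endPos + 1 recovers the labeled span
    print(e, "->", text[e["startPos"]:e["endPos"] + 1])
```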

# Authoring the APP

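### Get an API key
List the keys of the resource with the Azure CLI, replacing the placeholders with your resource group and resource name:

`az cognitiveservices account keys list --resource-group <resource-group-name> --name <resource-name>`

For this project, the resource group is `luisapp` and the resource is `fly`:

`az cognitiveservices account keys list --resource-group luisapp --name fly`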

### Create ConversationAnalysisClient

In [None]:
from azure.core.credentials import AzureKeyCredential
from azure.ai.language.conversations import ConversationAnalysisClient

endpoint = "https://fly.cognitiveservices.azure.com/"
# "key1" is a placeholder: substitute one of the keys returned by the `az` command above
credential = AzureKeyCredential("key1")
client = ConversationAnalysisClient(endpoint, credential)

### Create ConversationAuthoringClient

In [None]:
from azure.core.credentials import AzureKeyCredential
from azure.ai.language.conversations.authoring import ConversationAuthoringClient

endpoint = "https://fly.cognitiveservices.azure.com/"
credential = AzureKeyCredential("key1")
client = ConversationAuthoringClient(endpoint, credential)

### Create a client with an Azure Active Directory Credential

In [None]:
from azure.ai.language.conversations import ConversationAnalysisClient
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
client = ConversationAnalysisClient(endpoint="https://fly.cognitiveservices.azure.com/", credential=credential)

### Example using AzureKeyCredential

In [None]:
# import libraries
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.language.conversations import ConversationAnalysisClient

# get secrets from environment variables
clu_endpoint = os.environ["AZURE_CONVERSATIONS_ENDPOINT"]
clu_key = os.environ["AZURE_CONVERSATIONS_KEY"]
# project and deployment names fall back to the ones used in this notebook
project_name = os.environ.get("AZURE_CONVERSATIONS_PROJECT_NAME", "booking_travel")
deployment_name = os.environ.get("AZURE_CONVERSATIONS_DEPLOYMENT_NAME", "luismodel")

# analyze query
client = ConversationAnalysisClient(clu_endpoint, AzureKeyCredential(clu_key))
with client:
    query = "Hi there! So, between September 7 and 27 I would like to see what is available from Curitiba to Mexico City. My budget is around 1900 dollars and I will be traveling with my wife and two kids. Can you help me with that?"
    result = client.analyze_conversation(
        task={
            "kind": "Conversation",
            "analysisInput": {
                "conversationItem": {
                    "participantId": "1",
                    "id": "1",
                    "modality": "text",
                    "language": "en",
                    "text": query
                },
                "isLoggingEnabled": False
            },
            "parameters": {
                "projectName": project_name,
                "deploymentName": deployment_name,
                "verbose": True
            }
        }
    )

# view result
print("query: {}".format(result["result"]["query"]))
print("project kind: {}\n".format(result["result"]["prediction"]["projectKind"]))

print("top intent: {}".format(result["result"]["prediction"]["topIntent"]))
print("category: {}".format(result["result"]["prediction"]["intents"][0]["category"]))
print("confidence score: {}\n".format(result["result"]["prediction"]["intents"][0]["confidenceScore"]))

print("entities:")
for entity in result["result"]["prediction"]["entities"]:
    print("\ncategory: {}".format(entity["category"]))
    print("text: {}".format(entity["text"]))
    print("confidence score: {}".format(entity["confidenceScore"]))
    if "resolutions" in entity:
        print("resolutions")
        for resolution in entity["resolutions"]:
            print("kind: {}".format(resolution["resolutionKind"]))
            print("value: {}".format(resolution["value"]))
    if "extraInformation" in entity:
        print("extra info")
        # `info` avoids shadowing the `data` variable that holds the loaded dataset
        for info in entity["extraInformation"]:
            print("kind: {}".format(info["extraInformationKind"]))
            if info["extraInformationKind"] == "ListKey":
                print("key: {}".format(info["key"]))
            if info["extraInformationKind"] == "EntitySubtype":
                print("value: {}".format(info["value"]))
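
For the bot itself, printing is less useful than collecting the recognized entities and checking which of the five required elements are still missing. The sketch below reads the same keys as the code above (`result`, `prediction`, `entities`, `category`, `text`). The `fake_result` dictionary is a hand-written stand-in, not real service output, and the entity category names are assumed to match the entity names defined earlier in this notebook.

```python
# Entity names assumed to match those defined for the model earlier in this notebook
REQUIRED = ["DepartureCity", "DestinationCity", "DepartureDate", "ReturnDate", "Budget"]


def entities_by_category(result):
    """Map each entity category to the list of matched texts in a prediction."""
    found = {}
    for entity in result["result"]["prediction"]["entities"]:
        found.setdefault(entity["category"], []).append(entity["text"])
    return found


# Hand-written stand-in with the keys read above; not real service output
fake_result = {
    "result": {
        "query": "From Curitiba to Mexico City, budget around 1900 dollars",
        "prediction": {
            "topIntent": "BookFlight",
            "entities": [
                {"category": "DepartureCity", "text": "Curitiba", "confidenceScore": 1.0},
                {"category": "DestinationCity", "text": "Mexico City", "confidenceScore": 1.0},
                {"category": "Budget", "text": "1900 dollars", "confidenceScore": 1.0},
            ],
        },
    }
}

found = entities_by_category(fake_result)
missing = [name for name in REQUIRED if name not in found]
print("found:", found)
print("still to ask:", missing)
```

The `missing` list is what the bot would use to decide which follow-up question to ask next.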