## 0. Set path

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
proj_path = "/content/drive/MyDrive/omdena/chatbot_omdena/chatbot-a-v1/"

## 1. Load dataset

In [3]:
import pandas as pd

# Load the CSV data
df = pd.read_csv('https://raw.githubusercontent.com/nyhDatAIQC/omdena_chatbot/main/data/1_joining_omdena.csv')

# Show the first few rows of the dataframe
df.head()

Unnamed: 0,story_1,Intent/Action name,story_2,Intent/Action name.1,story_3,Intent/Action Name,story_4,Intent/Action Name.1,story_5,Intent/Action Name.2,story_6,Intent/Action Name.3,story_7,Intent/Action Name.4,story_7.1,Intent/Action Name.5,story_7.2,Intent/Action Name.6
0,What is Omdena's main focus?,about_omdena,How do I join Omdena?,join_omdena,"How do Omdena Local Chapters work, and how can...",join_local_chapter,Are the courses paid or free?,access_courses,"Hi, I'd like to know more about Omdena projects.",about_omdena,I'm interested in becoming a collaborator. How...,join_collaborator,Is there any support or guidance provided duri...,support,I represent a company and would like to partne...,join_company,Are there any AI learning resources available ...,resources
1,To provide a platform for a global community o...,utter_about_omdena,You can join Omdena through the Local Chapters...,utter_join_omdena,Local Chapters enable you to work on projects ...,utter_join_local_chapter,"As a collaborator, you could be considered for...",utter_access courses,Sure! Omdena hosts various AI projects that ta...,utter_about_omdena,"Visit our website's ""Projects"" section to see ...",utter_join_collaborator,"Absolutely! Throughout the project, you'll rec...",utter_support,Fantastic! Omdena welcomes collaborations with...,utter_join_company,Absolutely! Omdena values learning and knowled...,utter_resources
2,,,How can I develop my AI portfolio with Omdena?,develop_portfolio,How do I join an AI Innovation challenge?,join_ai_innovation_challenge,,,,,I found a project that interests me. What shou...,about_project,What happens after the project is completed?,project_completion,What are the key elements included in the part...,partnership,Do you host events or webinars related to AI?,events
3,,,Collaborate to solve real-world challenges thr...,utter_develop_portfolio,Explore real world challenges and develop real...,utter_join_ai_innovation_challenge,,,,,"Fantastic! Once you've selected a project, cli...",utter_about_project,"At the end of the project, your team will pres...",utter_project_completion,"If your project is selected for collaboration,...",utter_partnership,"Yes, we do! Omdena regularly organizes webinar...",utter_events
4,,,,,"What is the Omdena School, and what courses do...",join_omdena_school,,,,,I've submitted my application. What's the next...,application_period,This sounds like a fantastic opportunity! Befo...,,Is there anything else we need to know?,additional_info,How can I get in touch with someone from Omdena?,omdena_support


In [4]:

df['story_3'][5]

'Omdena School empowers learners with quality education in Machine Learning, Artificial Intelligence, Data Science and Data Engineering courses. Sign up as a student or instructor https://omdena.com/omdena-school/'

The CSV file contains nine story columns (story_1 through story_7, plus story_7.1 and story_7.2), each paired with a column of intent/action names. In order to generate the domain.yml file, we will:

- Identify all unique intents (names without the "utter_" prefix) and responses (names with the "utter_" prefix) across all the stories.
- Group the responses by their names and collect the associated answer texts to form the text for each response.
- Write the intents and responses to the domain.yml file in the appropriate format.

Let's start by extracting all the unique intents and responses.
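
As a minimal sketch of the rule above, assuming labels are plain strings: a label is a response when it starts with "utter_" and an intent otherwise. The helper name `split_labels` and the sample labels are illustrative and not part of the notebook. Sorting the results keeps the order stable between runs, which a bare `set` does not guarantee, and the sketch also flags names that contain whitespace, such as `utter_access courses` in the CSV.

```python
def split_labels(labels):
    """Split labels into (intents, responses, suspicious) lists.

    Hypothetical helper for illustration: a label is a response when it
    starts with "utter_", otherwise it is treated as an intent.
    """
    intents, responses, suspicious = set(), set(), set()
    for label in labels:
        label = label.strip()
        if any(ch.isspace() for ch in label):
            suspicious.add(label)  # e.g. a space typed instead of "_"
        if label.startswith("utter_"):
            responses.add(label)
        else:
            intents.add(label)
    # sorted() gives a deterministic order; set iteration order is arbitrary
    return sorted(intents), sorted(responses), sorted(suspicious)


sample = ["about_omdena", "utter_about_omdena", "access_courses",
          "utter_access courses", "about_omdena"]
demo_intents, demo_responses, demo_suspicious = split_labels(sample)
print(demo_intents)
print(demo_responses)
print(demo_suspicious)
```

Names reported as suspicious are worth correcting in the CSV before the files are generated, since the same name is reused in every generated file.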

## 1. Create domain.yml

In [5]:

# Initialize lists to store intents and actions
intents = []
actions = []

# Iterate over all columns in the dataframe
for column in df.columns:
    if "Intent/Action" in column:  # Check if the column contains intents/actions
        for value in df[column].dropna().unique():  # Drop NaN values and get unique values
            if value.startswith("utter_"):  # Names with the "utter_" prefix are responses
                actions.append(value)
            else:  # Otherwise, it's an intent
                intents.append(value)

# Get unique intents and actions
unique_intents = list(set(intents))
unique_actions = list(set(actions))

unique_intents, unique_actions


(['additional_info',
  'join_omdena',
  'about_omdena',
  'commitment',
  'about_project',
  'join_ai_innovation_challenge',
  'join_company',
  'join_omdena_school',
  'join_local_chapter',
  'access_courses',
  'application_period',
  'support',
  'join_collaborator',
  'project_completion',
  'resources',
  'technical_help',
  'develop_portfolio',
  'events',
  'partnership',
  'beginner_join',
  'benefits_of_joining_community',
  'omdena_support'],
 ['utter_join_omdena_school',
  'utter_application_period',
  'utter_about_project',
  'utter_additional_info',
  'utter_benefits_of_joining_community',
  'utter_omdena_support',
  'utter_join_omdena',
  'utter_about_omdena',
  'utter_join_local_chapter',
  'utter_support',
  'utter_join_collaborator',
  'utter_resources',
  'utter_develop_portfolio',
  'utter_events',
  'utter_access courses',
  'utter_project_completion',
  'utter_partnership',
  'utter_join_company',
  'utter_technical_help',
  'utter_commitment',
  'utter_join_ai_inn

We have successfully extracted the unique intents and responses from the CSV file. Now, let's create a mapping of responses to their associated answer texts. This mapping will be used to generate the response text in the domain.yml file.

Each response name sits in the same row as its answer text. We'll iterate through each pair of text and intent/action columns, and create a dictionary that maps each response name to a list of answers. If a response appears more than once, its answers are grouped together in the same list.

In [7]:
# Initialize a dictionary to store actions and their associated answers
actions_to_answers = {action: [] for action in unique_actions}

# Iterate over all columns in the dataframe
for i in range(0, len(df.columns), 2):  # Step by 2 to iterate over pairs of answer and intent/action columns
    # Get the answer column and the intent/action column
    answer_column = df.columns[i]
    intent_action_column = df.columns[i+1]

    for index, row in df.iterrows():
        intent_action = row[intent_action_column]
        if isinstance(row[answer_column], str):  # Skip NaN values
            # Double every single quote: the escape used inside YAML single-quoted scalars
            answer = row[answer_column].replace("'", "''")

            # If the label is a response (starts with "utter_"), add the answer to its list
            if isinstance(intent_action, str) and intent_action.startswith("utter_"):
                actions_to_answers[intent_action].append(answer)

# Inspect one entry; .get() returns None instead of raising if the key is missing
print(actions_to_answers.get('utter_join_collaborator'))


['Visit our website\'\'s "Projects" section to see the list of ongoing and upcoming projects. You can \nbrowse through various topics and find one that aligns with your interests and skills. \nIf you need any guidance or have specific questions, feel free to ask!']
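
The doubled apostrophe in the output above is the escape that YAML single-quoted scalars use for a literal `'`. A minimal sketch of that quoting step follows, with a hypothetical helper name `yaml_single_quote`. It also collapses line breaks to single spaces, which is an assumption about the desired output rather than what the cell above does.

```python
def yaml_single_quote(text):
    """Return text as a YAML single-quoted scalar (hypothetical helper)."""
    # Collapse every run of whitespace, including newlines, to one space
    flat = " ".join(text.split())
    # Inside single quotes, the only escape is doubling the quote itself
    return "'" + flat.replace("'", "''") + "'"


print(yaml_single_quote('Visit our website\'s "Projects" section.\nFeel free to ask!'))
```

Double quotes, colons, and hyphens pass through unchanged because they have no special meaning inside a single-quoted scalar.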


In [8]:
intent_action

nan

We have successfully created a mapping of responses to their associated stories.

Now, we will generate the domain.yml file. This file will contain all the intents and responses in the correct format. We will write the version, intents, responses, and session_config to the file. For the responses, we will include each response name along with its associated stories as the text for the response.

The session_config and version values are fixed:

```
version: "3.1"
session_config:
  session_expiration_time: 60
  carry_over_slots_to_new_session: true
```


In [9]:
# Define the session_config and version
session_config = {
    "session_expiration_time": 60,
    "carry_over_slots_to_new_session": True
}
version = "3.1"

# Define the path to the output domain.yml file
output_path = proj_path + "domain.yml"

with open(output_path, "w") as f:
    # Write the version
    f.write(f"version: \"{version}\"\n\n")

    # Write the intents
    f.write("intents:\n")
    for intent in unique_intents:
        f.write(f"  - {intent}\n")
    f.write("\n")

    # Write the responses
    f.write("responses:\n")
    for response, stories in actions_to_answers.items():
        f.write(f"  {response}:\n")
        for story in stories:
            # No further escaping is needed here: single quotes were already doubled
            # when actions_to_answers was built, and ':' or '-' are safe inside quotes.

            # Indent every continuation line by 8 spaces so that a multi-line
            # answer stays inside the quoted scalar
            story = story.replace("\n", "\n" + "  " * 4)

            f.write(f"    - text: '{story}'\n")

        f.write("\n")

    # Write the session_config
    f.write("session_config:\n")
    for key, value in session_config.items():
        f.write(f"  {key}: {value}\n")

# Return the path to the generated domain.yml file
output_path

'/content/drive/MyDrive/omdena/chatbot_omdena/chatbot-a-v1/domain.yml'
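
One way to make this step easier to check is to build the file content as a string first and write it afterwards. The sketch below is an assumption about how that could look, not the notebook's code: `render_domain` is a hypothetical name, the answers are expected to be already escaped for single-quoted YAML, and booleans are written in lowercase, the spelling most YAML tools emit.

```python
def render_domain(intents, responses, session_config, version="3.1"):
    """Build the domain.yml text (hypothetical helper for illustration).

    `responses` maps a response name to a list of answer strings that are
    already escaped for single-quoted YAML scalars.
    """
    lines = [f'version: "{version}"', "", "intents:"]
    lines += [f"  - {intent}" for intent in intents]
    lines += ["", "responses:"]
    for name, answers in responses.items():
        lines.append(f"  {name}:")
        lines += [f"    - text: '{answer}'" for answer in answers]
    lines += ["", "session_config:"]
    for key, value in session_config.items():
        # Write booleans as true/false instead of Python's True/False
        if isinstance(value, bool):
            value = str(value).lower()
        lines.append(f"  {key}: {value}")
    return "\n".join(lines) + "\n"


demo = render_domain(
    intents=["join_omdena"],
    responses={"utter_join_omdena": ["You can join through the website."]},
    session_config={"session_expiration_time": 60,
                    "carry_over_slots_to_new_session": True},
)
print(demo)
```

Because the function returns text, the result can be inspected or compared against an expected string before anything is written to Drive.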

## 2. Generate stories.yml

In [10]:
# Initialize a dictionary to store stories and their associated intents/actions
stories_to_intents_actions = {}

# Iterate over all columns in the dataframe
for i in range(0, len(df.columns), 2):  # Step by 2 to iterate over pairs of story and intent/action columns
    # Get the story column and the intent/action column
    story_column = df.columns[i]
    intent_action_column = df.columns[i+1]

    # Create a list to store the intents/actions associated with the story
    intents_actions = []

    # Iterate over all rows in the story and intent/action columns
    for index, row in df.iterrows():
        story = row[story_column]
        intent_action = row[intent_action_column]

        # If the story and intent/action are not NaN values, add the intent/action to the list
        if pd.notnull(story) and pd.notnull(intent_action):
            intents_actions.append(intent_action)

    # Add the intents/actions list to the dictionary with the story name as the key
    stories_to_intents_actions[story_column] = intents_actions

stories_to_intents_actions


{'story_1': ['about_omdena', 'utter_about_omdena'],
 'story_2': ['join_omdena',
  'utter_join_omdena',
  'develop_portfolio',
  'utter_develop_portfolio'],
 'story_3': ['join_local_chapter',
  'utter_join_local_chapter',
  'join_ai_innovation_challenge',
  'utter_join_ai_innovation_challenge',
  'join_omdena_school',
  'utter_join_omdena_school',
  'beginner_join',
  'utter_beginner_join',
  'benefits_of_joining_community',
  'utter_benefits_of_joining_community'],
 'story_4': ['access_courses', 'utter_access courses'],
 'story_5': ['about_omdena', 'utter_about_omdena'],
 'story_6': ['join_collaborator',
  'utter_join_collaborator',
  'about_project',
  'utter_about_project',
  'application_period',
  'utter_application_period',
  'commitment',
  'utter_commitment'],
 'story_7': ['support',
  'utter_support',
  'project_completion',
  'utter_project_completion'],
 'story_7.1': ['join_company',
  'utter_join_company',
  'partnership',
  'utter_partnership',
  'additional_info',
  'utter

In [11]:
# Define the version
version = "3.1"

# Define the path to the output stories.yml file
output_path = proj_path + "data/stories.yml"

with open(output_path, "w") as f:
    # Write the version
    f.write(f"version: \"{version}\"\n\n")

    # Write the stories
    f.write("stories:\n")
    for story, intents_actions in stories_to_intents_actions.items():
        f.write(f"\n- story: {story}\n")
        f.write("  steps:\n")
        for intent_action in intents_actions:
            if intent_action.startswith("utter_"):  # Names with the "utter_" prefix are actions (responses)
                f.write(f"  - action: {intent_action}\n")
            else:  # Otherwise, it's an intent
                f.write(f"  - intent: {intent_action}\n")

# Return the path to the generated stories.yml file
output_path


'/content/drive/MyDrive/omdena/chatbot_omdena/chatbot-a-v1/data/stories.yml'
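
Since domain.yml and stories.yml are generated in separate steps, it may be worth checking that every step in a story refers to a name the domain declares. A minimal sketch of such a check, using made-up sample data and a hypothetical helper `find_undeclared_steps`:

```python
def find_undeclared_steps(stories, domain_intents, domain_responses):
    """Return (story, label) pairs whose label is missing from the domain."""
    missing = []
    for story, labels in stories.items():
        for label in labels:
            # Responses are looked up among the responses, everything else among the intents
            declared = (domain_responses if label.startswith("utter_")
                        else domain_intents)
            if label not in declared:
                missing.append((story, label))
    return missing


demo_stories = {
    "story_1": ["about_omdena", "utter_about_omdena"],
    "story_2": ["join_omdena", "utter_join_omdena"],
}
problems = find_undeclared_steps(
    demo_stories,
    domain_intents={"about_omdena", "join_omdena"},
    domain_responses={"utter_about_omdena"},  # utter_join_omdena left out on purpose
)
print(problems)
```

In the notebook, the same check could be called with `stories_to_intents_actions`, `set(unique_intents)`, and `set(unique_actions)`; an empty list means the two files agree.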

## 3. Generate nlu.yml

In [12]:
# Initialize a dictionary to store intents and their associated examples
intents_to_examples = {}

# Iterate over all columns in the dataframe
for i in range(0, len(df.columns), 2):  # Step by 2 to iterate over pairs of story and intent/action columns
    # Get the story column and the intent/action column
    story_column = df.columns[i]
    intent_action_column = df.columns[i+1]

    for index, row in df.iterrows():
        story = row[story_column]
        intent_action = row[intent_action_column]

        # If both cells are strings (not NaN) and the label is an intent, add the text to the intent's examples
        if isinstance(story, str) and isinstance(intent_action, str) and not intent_action.startswith("utter_"):
            if intent_action not in intents_to_examples:
                intents_to_examples[intent_action] = []
            intents_to_examples[intent_action].append(story)

intents_to_examples


{'about_omdena': ["What is Omdena's main focus?",
  "Hi, I'd like to know more about Omdena projects."],
 'join_omdena': ['How do I join Omdena?'],
 'develop_portfolio': ['How can I develop my AI portfolio with Omdena?'],
 'join_local_chapter': ['How do Omdena Local Chapters work, and how can I join one?'],
 'join_ai_innovation_challenge': ['How do I join an AI Innovation challenge?'],
 'join_omdena_school': ['What is the Omdena School, and what courses does it offer?'],
 'beginner_join': ['Can I join as a beginner?'],
 'benefits_of_joining_community': ['What are the benefits of joining the Omdena community?'],
 'access_courses': ['Are the courses paid or free?'],
 'join_collaborator': ["I'm interested in becoming a collaborator. How can I join a project?"],
 'about_project': ['I found a project that interests me. What should I do next?'],
 'application_period': ["I've submitted my application. What's the next step in the process?"],
 'commitment': ['What kind of commitment is expected

In order to generate the nlu.yml file, we will:

- Identify each unique intent and its associated examples.
- Group the examples by their respective intents.
- Write the intents and their associated examples to the nlu.yml file in the appropriate format.

An intent corresponds to the goal or purpose of the user's input. Examples are representative phrases that the user might say to express a given intent. For each intent, we take all the questions (stories) that map to that intent as examples.

The cell above already builds this mapping: a dictionary from each intent to a list of its examples, with multiple examples for one intent grouped together in the same list. It is written to nlu.yml below.
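
In Rasa's YAML training-data format, the examples of an intent sit in a literal block scalar (`examples: |`), where every character is taken as written, so characters such as `:` and `-` need no escaping there. A minimal sketch of rendering one intent block follows, with a hypothetical helper name `render_intent_block`. Collapsing whitespace so that each example stays on one line is an added assumption.

```python
def render_intent_block(intent, examples):
    """Render one `- intent:` entry of nlu.yml (hypothetical helper)."""
    lines = [f"- intent: {intent}", "  examples: |"]
    for example in examples:
        # Keep each example on a single line inside the block scalar
        flat = " ".join(example.split())
        lines.append(f"    - {flat}")
    return "\n".join(lines) + "\n"


block = render_intent_block(
    "about_omdena",
    ["What is Omdena's main focus?",
     "Hi, I'd like to know more about Omdena projects."],
)
print(block)
```

Apostrophes are also left untouched here, since quoting rules do not apply inside a literal block scalar.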

In [13]:
# Define the version
version = "3.1"

# Define the path to the output nlu.yml file
output_path = proj_path + "data/nlu.yml"

with open(output_path, "w") as f:
    # Write the version
    f.write(f"version: \"{version}\"\n\n")

    # Write the nlu data
    f.write("nlu:\n")
    for intent, examples in intents_to_examples.items():
        f.write(f"\n- intent: {intent}\n")
        f.write("  examples: |\n")
        for example in examples:
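            # The examples live in a literal block scalar (|), where characters such as
            # ':' and '-' are taken as written, so no escaping is applied here.
            # Collapse line breaks so that each example stays on a single line.
            example = " ".join(example.split())

            f.write(f"    - {example}\n")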

# Return the path to the generated nlu.yml file
output_path


'/content/drive/MyDrive/omdena/chatbot_omdena/chatbot-a-v1/data/nlu.yml'