# Automation of task creation for chat discussions

After a meeting or a chat discussion, tasks have to be created for the developers.
The work breaks down into three steps:
- analyze the discussion and group the messages into threads;
- determine the type of each thread (bug, feature, or just discussion);
- create a task (a Trello card).


I will use Gemini. First, install the library and prepare the notebook.

In [None]:
!pip uninstall -qqy jupyterlab  # Remove unused conflicting packages
!pip install -U -q "google-genai==1.7.0"
from google import genai
from google.genai import types
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")

In [None]:
# Define a retry policy. The model might make multiple consecutive calls automatically
# for a complex query, this ensures the client retries if it hits quota limits.
from google.api_core import retry

is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

if not hasattr(genai.models.Models.generate_content, '__wrapped__'):
  genai.models.Models.generate_content = retry.Retry(
      predicate=is_retriable)(genai.models.Models.generate_content)
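
To make the retry behaviour easier to see in isolation, here is a minimal pure-Python sketch of the same idea: retry only when the error code signals a quota limit (429) or an overloaded service (503). `FakeAPIError` and `call_with_retry` are hypothetical stand-ins written for this illustration; they are not part of `google.api_core` or `google.genai`.

```python
import time


class FakeAPIError(Exception):
    """Hypothetical stand-in for an API error that carries an HTTP-like code."""

    def __init__(self, code: int):
        super().__init__(f"API error {code}")
        self.code = code


def is_retriable(exc: Exception) -> bool:
    # Retry only on quota (429) or service-unavailable (503) errors.
    return isinstance(exc, FakeAPIError) and exc.code in {429, 503}


def call_with_retry(func, max_attempts: int = 5, base_delay: float = 0.0):
    """Call func(); on a retriable error wait, double the delay, and try again."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            # Give up on non-retriable errors or when attempts are exhausted.
            if not is_retriable(exc) or attempt == max_attempts:
                raise
            time.sleep(delay)
            delay *= 2


# Demo: the call fails twice with 429 and succeeds on the third attempt.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeAPIError(429)
    return "ok"


result = call_with_retry(flaky)
```

The demo uses `base_delay=0.0` so it finishes immediately; a real client would likely start with a delay of a second or more.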

In [None]:
client = genai.Client(api_key=GOOGLE_API_KEY)

# Understand text, split it into JSON

I'll use a text document as input, but the source could also be audio or video once the text has been extracted from it.

In [None]:
# Input. The text to be analyzed.
path_to_file = ''
!wget "$path_to_file" -O discussion.pdf
document_file = client.files.upload(file='discussion.pdf')

We need structured text as the result.
Because the output length is limited, a large input file may require splitting the analysis into two steps: 1) find the threads; 2) loop over the threads and extract the full information for each one.
Since this is a test version, we will do it all in one request.
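
For reference, the two-step variant could look roughly like the sketch below. It is not used in this notebook. `analyze_in_two_steps` and `fake_ask` are hypothetical names; `ask` is assumed to be any function that sends a prompt to the model and returns its JSON text, and `fake_ask` only imitates such a call so the loop can be run offline.

```python
import json
from typing import Callable


def analyze_in_two_steps(ask: Callable[[str], str]) -> list[dict]:
    """Step 1: list the thread subjects. Step 2: request details per thread."""
    subjects = json.loads(
        ask("List the subjects of all threads as a JSON array of strings.")
    )
    threads = []
    for subject in subjects:
        details = ask(
            "Return one JSON object with participants, subject, user_story and "
            f"description for the thread: {subject}"
        )
        threads.append(json.loads(details))
    return threads


def fake_ask(prompt: str) -> str:
    """Hypothetical stand-in for a model call, used only to exercise the loop."""
    if prompt.startswith("List the subjects"):
        return json.dumps(["Improve error message", "The table is not displayed"])
    subject = prompt.split("for the thread: ", 1)[1]
    return json.dumps(
        {"participants": [], "subject": subject, "user_story": "", "description": ""}
    )


threads = analyze_in_two_steps(fake_ask)
```

Each per-thread response stays small, which is the point of the split: the output limit then applies to one thread at a time rather than to the whole chat.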

In [None]:
few_shot_prompt = '''The document contains a discussion between programmers and users about the application. Users mostly ask questions, suggest new features, and report bugs.
You need to select the important information from the chat and structure the result as valid JSON.
One issue/suggestion/discussion should be in one thread. Each object can contain only one thread.
The user story must be full and contain all the details.
The subject should be short and reflect the problem or proposal.

The description is a technical summary needed to identify the type of problem (bug, feature request, question) written for developers. This is a text of the task for the developer. So use technical definitions here.
IMPORTANT to understand whether this is a bug, a feature, or just a discussion and reflect this in the description.
Typical words to describe a bug: error, crash, stack trace, OOM, etc.
Typical words to describe a feature: suggest, add, create, would like, improve, etc.

Several issues are discussed in the chat at the same time. You have to collect the threads of discussions yourself.
Ignore irrelevant information (anything that is not related to the discussion of the application) or unclear context messages.

EXAMPLE JSON Response:
```
{
"participants" : ["Jack", "Maria"],
"subject": "Improve error message",
"user_story": "When I press the X button, a message appears with a technical description. I don't understand what to do.",
"description" : "Improve the error message. It must say what happened and what the user has to do. For example, error loading data."
}
```

EXAMPLE JSON Response:
```
{
"participants" : ["Daniel"],
"subject": "The table is not displayed",
"user_story": "I select the Tasks item in the menu. On the page, I select any task and click on the List of editors. A dialog opens in which the loader is displayed, which will never complete.",
"description" : "Uncaught error or no data. We need to figure out what the problem is. And we need to show either the data or the error message."
}
```

'''

This step calls for a creative and careful reading of the chat, so we use a recent model (`gemini-2.0-flash`) with a high temperature (1.5).

In [None]:
# Define a helper to retry when per-minute quota is reached.
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

@retry.Retry(predicate=is_retriable, timeout=300.0)
def summarise_doc(request: str) -> str:
  """Execute the request on the uploaded document."""
  config=types.GenerateContentConfig(
        temperature=1.5,
        # top_p=0.5,
        response_mime_type="application/json"
  )
  response = client.models.generate_content(
      model='gemini-2.0-flash',
      config=config,
      contents=[request, document_file],
  )

  return response.text

In [None]:
summary = summarise_doc(few_shot_prompt)

In [None]:
# let's see the result
import json
print(json.dumps(json.loads(summary), indent=2))
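
With a high temperature the model may occasionally omit a field or return a single object instead of a list, so it can help to check the structure before going further. A minimal sketch, assuming the four keys from the prompt examples; `validate_threads` is a hypothetical helper, not part of the notebook.

```python
import json

# Keys taken from the example responses in the prompt.
REQUIRED_KEYS = {"participants", "subject", "user_story", "description"}


def validate_threads(raw: str) -> list[dict]:
    """Parse the model output and keep only objects that have every expected key."""
    data = json.loads(raw)
    # Normalise a single object to a one-element list.
    if isinstance(data, dict):
        data = [data]
    return [
        item
        for item in data
        if isinstance(item, dict) and REQUIRED_KEYS.issubset(item)
    ]


sample = json.dumps(
    [
        {"participants": ["Jack"], "subject": "S", "user_story": "U", "description": "D"},
        {"subject": "incomplete thread"},
    ]
)
valid = validate_threads(sample)
```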

# Classifier

Typical topics in daily developer discussions are bugs, features, and nice-to-haves. These will be our classes.
Since I don't have any public tasks for training the classifier, we'll generate them from a minimal description. The generation needs high creativity and fairly short content.

In [None]:
import pandas as pd
import json

gen_config = types.GenerateContentConfig(max_output_tokens=250, temperature=2.0, response_mime_type="application/json")

# Define a helper to retry when per-minute quota is reached.
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

@retry.Retry(predicate=is_retriable, timeout=300.0)
def generate_tasks(task_id : int, task_type : str, task_sample : str, max_cnt : int):
  gen_prompt = f"""There is a task board for programmers working on Communication application (like Gmail, Google Chat, ...).
                  IMPORTANT to create ONE task (one object) for the board in the column {task_type}. Think of any name for a person. Use markdown syntax for description. Escape all special symbols.
The description should be SHORT, specific and with technical details. That is a ready-made card for the performer. It can indicate the names of functions, places in the interface, stack trace, etc.
Description example : {task_sample}

EXAMPLE Response:
```
[{{
"author" : "First_Name Last_Name",
"subject": "Title",
"description": "Description of the problem or suggestion."
}}]
```
"""
    
  tasks = []
  for _ in range(max_cnt):
    response = client.models.generate_content(
        model='gemini-2.0-flash',
        config=gen_config,
        contents=gen_prompt
    )
    if response.text:
        try:
            obj_response = json.loads(response.text)
            for item in obj_response:
                item['class'] = task_id
                item['type'] = task_type
                tasks.append(item)
        except json.JSONDecodeError:
             # "close" json if generation stopped due to length
            try:
                normalized_response = response.text.removesuffix(']').removesuffix('"}').removesuffix('"\n}') + '"\n}\n]'
                obj_response = json.loads(normalized_response)
                for item in obj_response:
                    item['class'] = task_id
                    item['type'] = task_type
                    tasks.append(item)
            except json.JSONDecodeError:
                print(normalized_response)

  df_tasks = pd.DataFrame(tasks)
  return df_tasks
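
The "close the JSON" fallback is easier to follow on a concrete string. The sketch below isolates the same suffix heuristic in a hypothetical helper, `close_truncated_array`, and applies it to a response that was cut off inside the last string value. It is only meant for that case; output truncated elsewhere (for example, between two keys) would still fail to parse.

```python
import json


def close_truncated_array(text: str) -> str:
    """Strip closing tokens that are already present, then append a full closing sequence."""
    body = text.removesuffix("]").removesuffix('"}').removesuffix('"\n}')
    return body + '"\n}\n]'


# Generation stopped in the middle of the description.
truncated = '[{"author": "Ann Lee", "subject": "Crash on send", "description": "App crashes when'
repaired = json.loads(close_truncated_array(truncated))

# A complete single-line response survives the same treatment.
complete = json.loads(close_truncated_array('[{"a": "b"}]'))
```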

Generate up to 100 tasks for each type.

In [None]:
bug_sample = """The "Address can not be empty" error is shown on the map screen after selecting an address from google suggestions \n # Steps to reproduce: 1) Tap the suggestion 2) Tap the green button to submit \n Results Actual \n The "Address can not be empty" error is shown on the map screen after selecting an address from google suggestions \n Results Expected \n Google suggestion is saved as the address to the "Address" field of the "Profile info" form after selecting an address from google suggestions

Use words to describe the issue: error, crash, stack trace, OOM, etc.
"""
bugs = generate_tasks(1, 'BUG', bug_sample, 100)

feature_sample = """Description of NEED: User profiles pictures don’t have any personalization options, like the ability to draw mustaches or smiles on one’s profile picture.  \n In the Account Settings page, ADD “Draw on photo” button. Clicking on that opens up a simple line drawing tool and color picker like the one you see in Preview app.

Use words to describe the feature: suggest, add, need, create, want, etc."""
features = generate_tasks(2, 'New Feature', feature_sample, 100)

bad_sample = '''I don't understand where to click to upload my picture. \n I can't find the user in my contact list. \n Is it possible to download my certificate?

Usually there is NO specific suggestion or complaint.
'''
unrecognized = generate_tasks(3, 'Unsorted', bad_sample, 100)

Split the generated data into two datasets: train and test.

In [None]:
df_train = pd.concat([bugs.iloc[:70], features.iloc[:70], unrecognized.iloc[:70]]).reset_index(drop=True)
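# Use the remainder for testing so the two sets never overlap, even if fewer than 100 tasks were generated.
df_test = pd.concat([bugs.iloc[70:], features.iloc[70:], unrecognized.iloc[70:]]).reset_index(drop=True)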
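
Slicing by position works here because the generated tasks are independent of each other. If the order mattered, a shuffled split per class would be safer. A minimal standard-library sketch, assuming rows are plain dictionaries with a `class` key; `stratified_split` is a hypothetical helper, not part of the notebook.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, train_frac=0.7, seed=0):
    """Shuffle rows within each class, then cut each class at train_frac."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = round(len(group) * train_frac)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Toy data: 30 rows, 10 per class.
rows = [{"id": i, "class": 1 + i % 3} for i in range(30)]
train, test = stratified_split(rows, "class")
```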

Create embeddings for the generated datasets.

In [None]:
from google.api_core import retry
import tqdm
from tqdm.rich import tqdm as tqdmr
import warnings

# Add tqdm to Pandas...
tqdmr.pandas()

# ...But suppress the experimental warning.
warnings.filterwarnings("ignore", category=tqdm.TqdmExperimentalWarning)

# Define a helper to retry when per-minute quota is reached.
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

@retry.Retry(predicate=is_retriable, timeout=300.0)
def embed_fn(text: str) -> list[float]:
    # You will be performing classification, so set task_type accordingly.
    response = client.models.embed_content(
        model="models/text-embedding-004",
        contents=text,
        config=types.EmbedContentConfig(
            task_type="classification",
        ),
    )

    return response.embeddings[0].values


def create_embeddings(df):
    df["text"] = df["subject"] + df["description"]
    df["Embeddings"] = df["text"].progress_apply(embed_fn)
    return df

In [None]:
df_e_train = create_embeddings(df_train)
df_e_test = create_embeddings(df_test)

In [None]:
# let's check the result
df_e_train.head()

Build the classifier.

In [None]:
import keras
from keras import layers


def build_classification_model(input_size: int, num_classes: int) -> keras.Model:
    return keras.Sequential(
        [
            layers.Input([input_size], name="embedding_inputs"),
            layers.Dense(input_size, activation="relu", name="hidden"),
            layers.Dense(num_classes, activation="softmax", name="output_probs"),
        ]
    )

In [None]:
# Derive the embedding size from observing the data. The embedding size can also be specified
# with the `output_dimensionality` parameter to `embed_content` if you need to reduce it.
embedding_size = len(df_e_train["Embeddings"].iloc[0])

classifier = build_classification_model(
    embedding_size, len(df_e_train["type"].unique())
)
classifier.summary()

classifier.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(),
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    metrics=["accuracy"],
)

In [None]:
import numpy as np


NUM_EPOCHS = 20
BATCH_SIZE = 32

# Split the x and y components of the train and validation subsets.
y_train = df_e_train["class"].transform(lambda x: x - 1)
x_train = np.stack(df_e_train["Embeddings"])
y_val = df_e_test["class"].transform(lambda x: x - 1)
x_val = np.stack(df_e_test["Embeddings"])

# Specify that it's OK to stop early if accuracy stabilises.
early_stop = keras.callbacks.EarlyStopping(monitor="accuracy", patience=3)

# Train the model for the desired number of epochs.
history = classifier.fit(
    x=x_train,
    y=y_train,
    validation_data=(x_val, y_val),
    callbacks=[early_stop],
    batch_size=BATCH_SIZE,
    epochs=NUM_EPOCHS,
)

In [None]:
classifier.evaluate(x=x_val, y=y_val, return_dict=True)
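
Overall accuracy does not show which classes get confused with each other. A confusion matrix could be computed as in the sketch below. It uses toy labels and a hypothetical `confusion_matrix` helper; with the trained model, the predicted labels would presumably come from `classifier.predict(x_val).argmax(axis=1)`.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [
        [counts[(t, p)] for p in range(num_classes)]
        for t in range(num_classes)
    ]


# Toy labels for the three classes (0 = bug, 1 = feature, 2 = unsorted).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
matrix = confusion_matrix(y_true, y_pred, 3)
for row in matrix:
    print(row)
```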

# Classify initial data

Calculate the probability that each text belongs to each class.

In [None]:
def make_prediction(text: str) -> np.ndarray:
    """Infer class probabilities for the provided text."""
    # Remember that the model takes embeddings as input, so calculate them first.
    embedded = embed_fn(text)

    # And recall that the input must be batched, so here they are wrapped as a
    # list to provide a batch of 1.
    inp = np.array([embedded])

    # And un-batched here. Keep the probabilities in class order (index 0 is
    # class 1, and so on); sorting them would break the mapping to class names.
    [result] = classifier.predict(inp)
    return result

In [None]:
issues_from_chat = json.loads(summary)
df_issues = pd.DataFrame(issues_from_chat)

In [None]:
for index, row in df_issues.iterrows():
    print(index, row['description'])

In [None]:
def add_prediction(df) :
    for index, row in df.iterrows():
        result = make_prediction(row['subject'] + row['description'])
        for idx, category in enumerate(df_test["type"].astype("category").cat.categories):
            df.at[index, category.lower()] = f"{result[idx] * 100:0.2f}"
    return df
    

df_issue_cat = add_prediction(df_issues)

df_issue_cat
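
The probability columns are only meaningful if each value stays paired with the class it belongs to. The sketch below shows that pairing on plain lists. `label_probabilities` and `top_label` are hypothetical helpers, and the label order is assumed to match class ids 0, 1, 2.

```python
def label_probabilities(probs, labels):
    """Pair each probability with its class label, keeping the model's output order."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    return {label: float(p) for label, p in zip(labels, probs)}


def top_label(probs, labels):
    """Return the (label, probability) pair with the highest probability."""
    scored = label_probabilities(probs, labels)
    best = max(scored, key=scored.get)
    return best, scored[best]


labels = ["bug", "new feature", "unsorted"]  # assumed order of class ids 0, 1, 2
best, score = top_label([0.1, 0.7, 0.2], labels)
print(best, score)
```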

# Create cards

All the data has been pre-processed. Now we will use Gemini to create the cards.
To do this, we will write a couple of functions (with HTTP requests inside) and ask the model to work out from the data what the user wants and perform the necessary actions.

In [None]:
# these values should be taken from the environment
TRELLO_API_KEY = ''
TRELLO_TOKEN = ''
board_id = ''
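
One way to take these values from the environment is sketched below. The variable name `TRELLO_BOARD_ID` and the helper `read_setting` are assumptions made for this example; on Kaggle, `UserSecretsClient` (as used for the Gemini key) would be the closer equivalent.

```python
import os


def read_setting(name: str, default: str = "") -> str:
    """Return an environment variable, or the default when it is not set."""
    return os.environ.get(name, default)


TRELLO_API_KEY = read_setting("TRELLO_API_KEY")
TRELLO_TOKEN = read_setting("TRELLO_TOKEN")
board_id = read_setting("TRELLO_BOARD_ID")  # variable name is an assumption
```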

Helpers

In [None]:
import requests
import json

def create_trello_card(text: str, list_id: str) -> str:
    """Create a Trello card from the JSON text in the given list and return its URL.

    Returns an empty string if list_id is empty or the card could not be created.
    """
    if list_id == '':
        return ''

    card = json.loads(text)
    url = "https://api.trello.com/1/cards"
    headers = {"Accept": "application/json"}
    query = {
        'key': TRELLO_API_KEY,
        'token': TRELLO_TOKEN,
        'idList': list_id,
        'name': card['subject'],
        'desc': card['description'] + '\n ___________________ \nParticipants: ' + ' '.join(card['participants']) + '\n' + card['user_story'],
    }
    response = requests.post(url, headers=headers, params=query)
    try:
        # The created card is returned as JSON; its 'url' field is the link.
        return response.json()['url']
    except (ValueError, KeyError):
        # The body was not JSON, or it did not describe a created card.
        return ''
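
Keeping the payload assembly in a pure function makes it possible to check the card text without calling Trello. A minimal sketch, assuming the same card fields as above; `build_card_query` is a hypothetical helper, and the key, token, and list id in the demo are placeholders.

```python
def build_card_query(card: dict, list_id: str, api_key: str, token: str) -> dict:
    """Assemble the query parameters for a card-creation request without sending it."""
    participants = " ".join(card.get("participants", []))
    desc = (
        f"{card['description']}\n ___________________ \n"
        f"Participants: {participants}\n{card.get('user_story', '')}"
    )
    return {
        "key": api_key,
        "token": token,
        "idList": list_id,
        "name": card["subject"],
        "desc": desc,
    }


card = {
    "participants": ["Jack", "Maria"],
    "subject": "Improve error message",
    "user_story": "I don't understand what to do.",
    "description": "Improve the error message.",
}
query = build_card_query(card, "placeholder-list-id", "placeholder-key", "placeholder-token")
print(query["desc"])
```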

In [None]:
import json
import requests

def get_trello_lists() -> list[tuple[str, str]]:
    """Return the lists (columns) of the board as (name, id) tuples."""
    url = f"https://api.trello.com/1/boards/{board_id}/lists"
    headers = {"Accept": "application/json"}
    query = {
        'key': TRELLO_API_KEY,
        'token': TRELLO_TOKEN,
    }

    response = requests.get(url, headers=headers, params=query)

    try:
        # Each item describes one list; keep only its name and id.
        return [(item['name'], item['id']) for item in response.json()]
    except ValueError:
        # The body was not valid JSON (for example, a plain-text error message).
        print(response)
        return []

Define the main function for AI.

In [None]:
def to_card(discussion: str, user_prompt: str) -> types.GenerateContentResponse:
    request_tools = [get_trello_lists, create_trello_card]
    
    instruction = f"""You are a helpful chatbot that creates Trello cards based on discussions of user issues in chat.
    You will analyze the user's request and turn it into commands using the tools available. Once you have the information you need, you will
    answer the user's question using the data returned.
    
    Use get_trello_lists to find the available lists; the response is a list of (name, ID) tuples.
    After that, analyze the provided TEXT to understand in which column the card can be created. If you don't need to create a card, ID should be an empty string. The last columns contain the probability of assigning a problem to a type (list).
    If you decide to create a card, create it in the appropriate column using create_trello_card. You must use the entire TEXT given to you as an argument to the function WITHOUT modification. Return the URL of the created card extracted from the response.

    You must return either a string with a link to the created card, or write why the card was not created.
    
    TEXT : {discussion}
    """
    
    # Start a chat with automatic function calling enabled.
    trello_chat = client.chats.create(
        model="gemini-2.0-flash",
        config=types.GenerateContentConfig(
            system_instruction=instruction,
            tools=request_tools,
        ),
    )
    
    response = trello_chat.send_message(user_prompt)
    return response
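
Before pointing the chat at a real board, the two tools could be swapped for dry-run versions that record calls instead of sending HTTP requests. This is only a sketch: the function names, list names, ids, and the placeholder link are all made up for illustration.

```python
import json

# Cards "created" during a dry run are collected here instead of being sent to Trello.
created_cards: list[dict] = []


def get_trello_lists_dry_run() -> list[tuple[str, str]]:
    """Return a fixed set of (name, id) pairs instead of querying a board."""
    return [("Bugs", "list-bugs"), ("Features", "list-features")]


def create_trello_card_dry_run(text: str, list_id: str) -> str:
    """Record the card locally and return a placeholder link."""
    if list_id == "":
        return ""
    card = json.loads(text)
    created_cards.append({"list_id": list_id, "subject": card["subject"]})
    return f"https://example.invalid/card/{len(created_cards)}"


link = create_trello_card_dry_run(json.dumps({"subject": "Crash on send"}), "list-bugs")
```

These functions could be passed as the `tools` list in place of the real ones, with the instruction text adjusted to mention their names.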

In [None]:
user_prompt = 'Create a card only if there is a CRITICAL bug being discussed.'
for i in df_issue_cat.index:
    description = df_issue_cat.loc[i].to_json()
    link = to_card(description, user_prompt).text
    print(str(i + 1), link)
    print(json.dumps(json.loads(description)['subject'], indent=2))
    print('-----------------')
    