# SlackBot Example

SlackBot keeps you in the loop without disturbing your focus. Its personalized, intelligent AI continuously monitors your Slack workspace, alerting you to important conversations and freeing you to concentrate on what’s most important.

SlackBot reads the full history of your (public) Slack workspace and trains a generative AI model to predict when you need to engage with a conversation. This training process gives the AI a deep understanding of your interests, expertise, and relationships. Using this understanding, SlackBot watches conversations in real time and notifies you when an important conversation is happening without you. With SlackBot you can focus on getting things done without worrying about missing out.

In this notebook, you’ll see how to build and deploy SlackBot in 15 minutes using only OpenAI’s APIs and open-source Python libraries. No data science PhD required.



In [None]:
%pip install openai kaskada

In [2]:
from datetime import timedelta
from slack_sdk.socket_mode import SocketModeClient
from slack_sdk.socket_mode.response import SocketModeResponse
import sparrow_py as kt
import pandas
import openai
import getpass
import pyarrow
import datetime  # the module itself; later cells call datetime.timedelta(...)

# Initialize Kaskada with a local execution context.
kt.init_session()

# Initialize OpenAI
openai.api_key = getpass.getpass('OpenAI: API Key')

OpenAI: API Key ········


## Prepare the training examples

### Read Historical Messages

In [3]:
messages = kt.sources.ArrowSource(
    data = pandas.read_parquet("./messages.parquet"), 
    time_column_name = "ts", 
    key_column_name = "channel",
)

messages.preview(5)

Unnamed: 0,_time,_subsort,_key_hash,_key,subtype,ts,user,text,team,user_team,...,reactions,thread_ts,reply_count,reply_users_count,latest_reply,is_locked,subscribed,last_read,parent_user_id,channel
0,2023-07-25 19:42:13,5,15750806798332339587,general,message,2023-07-25 19:42:13,U05JQJJDJ6P,<@U05JQJJDJ6P> has joined the channel,,,...,,,,,,,,,,general
1,2023-07-25 19:42:14,14,3094307063304068259,random,message,2023-07-25 19:42:14,U05JQJJDJ6P,<@U05JQJJDJ6P> has joined the channel,,,...,,,,,,,,,,random
2,2023-07-25 19:44:27,0,2954779196800164886,demo,message,2023-07-25 19:44:27,U05JQJJDJ6P,<@U05JQJJDJ6P> has joined the channel,,,...,,,,,,,,,,demo
3,2023-07-26 08:29:35,6,15750806798332339587,general,message,2023-07-26 08:29:35,U05JQJJDJ6P,old message 1,T05JA5XCR9D,T05JA5XCR9D,...,,,,,,,,,,general
4,2023-07-26 08:29:37,7,15750806798332339587,general,message,2023-07-26 08:29:37,U05JQJJDJ6P,old message 2,T05JA5XCR9D,T05JA5XCR9D,...,,,,,,,,,,general
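
The cell above assumes `messages.parquet` already exists. As a rough sketch of how such rows might be assembled, the snippet below flattens Slack-style message records into one time-ordered list with the `ts`, `channel`, `user`, and `text` columns seen in the preview. It assumes each record carries `ts` as a string of epoch seconds; the helper name `to_message_rows` and the sample records are hypothetical, not part of the original notebook.

```python
from datetime import datetime, timezone

def to_message_rows(records_by_channel):
    """Flatten {channel: [message dicts]} into one list of rows sorted by time."""
    rows = []
    for channel, records in records_by_channel.items():
        for rec in records:
            rows.append({
                # Assumption: `ts` is a string of epoch seconds, e.g. "1690314133.000000".
                "ts": datetime.fromtimestamp(float(rec["ts"]), tz=timezone.utc),
                "channel": channel,
                "user": rec.get("user"),
                "text": rec.get("text"),
            })
    return sorted(rows, key=lambda row: row["ts"])

# Hypothetical sample records, shaped like the rows previewed above.
sample = {
    "random": [{"ts": "1690314134.000000", "user": "U05JQJJDJ6P", "text": "hi"}],
    "general": [{"ts": "1690314133.000000", "user": "U05JQJJDJ6P", "text": "hello"}],
}
message_rows = to_message_rows(sample)
```

Rows in this shape could then be passed to `pandas.DataFrame(...)` and written out with `to_parquet` to produce the input file.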


### Construct prompts

In [4]:
#messages = messages.with_key(kt.record({
#        "channel": messages.col("channel"),
#        "thread": messages.col("thread_ts"),
#    }))

prompts = messages \
    .select("user", "ts", "text", "reactions") \
    .collect(max=20)

prompts.preview(5)

Unnamed: 0,_time,_subsort,_key_hash,_key,result
0,2023-07-25 19:42:13,5,15750806798332339587,general,"[{'ts': 1690314133000000000, 'user': 'U05JQJJD..."
1,2023-07-25 19:42:14,14,3094307063304068259,random,"[{'ts': 1690314134000000000, 'user': 'U05JQJJD..."
2,2023-07-25 19:44:27,0,2954779196800164886,demo,"[{'ts': 1690314267000000000, 'user': 'U05JQJJD..."
3,2023-07-26 08:29:35,6,15750806798332339587,general,"[{'ts': 1690314133000000000, 'user': 'U05JQJJD..."
4,2023-07-26 08:29:37,7,15750806798332339587,general,"[{'ts': 1690314133000000000, 'user': 'U05JQJJD..."
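
To make the `collect(max=20)` step more concrete, here is a plain-Python sketch of a per-key rolling collection. It assumes `collect` keeps the most recent `max` values seen for each key, which is consistent with the preview above (later `general` rows still begin with the first `general` message), but it is an illustration and not Kaskada's implementation. The function name `rolling_collect` is hypothetical.

```python
from collections import defaultdict, deque

def rolling_collect(events, max_items):
    """For each (key, value) event, emit the last `max_items` values seen for that key."""
    windows = defaultdict(lambda: deque(maxlen=max_items))
    out = []
    for key, value in events:
        windows[key].append(value)  # the oldest value drops off once the deque is full
        out.append((key, list(windows[key])))
    return out

events = [
    ("general", "joined"),
    ("random", "joined"),
    ("general", "old message 1"),
    ("general", "old message 2"),
]
collected = rolling_collect(events, max_items=2)
```

With `max_items=2`, the last `general` row holds only the two most recent messages, which is the same bounding behaviour that keeps each prompt to at most 20 messages.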


### Build examples

In [10]:
duration = datetime.timedelta(minutes=1)

shifted_prompts = prompts.shift_by(duration)
#reaction_users = messages.col("reactions").col("users").flatten().collect(max=100).flatten()
#reaction_users = messages.col("reactions").flatten().col("users").collect(kt.Trailing(duration)).flatten()
#participating_users = conversations.col("user").collect(max=100) #kt.windows.Trailing(duration))
#engaged_users = reaction_users #kt.union(reaction_users, participating_users)
engaged_users = prompts.col("user").collect(max=100).flatten()

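# Pair each shifted prompt with the users collected in `engaged_users`,
# keeping only rows where a prompt is present.
examples = kt.record({"prompt": shifted_prompts, "completion": engaged_users}) \
    .filter(shifted_prompts.is_not_null())
# NOTE: this reassignment replaces the filtered version above, so the preview
# below still includes rows whose prompt is null.
examples = kt.record({"prompt": shifted_prompts, "completion": engaged_users})
examples.preview(100) # NOTE: completion shouldn't be None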
#engaged_users.preview(100)

Unnamed: 0,_time,_subsort,_key_hash,_key,prompt,completion
0,2023-07-25 19:42:13,5,15750806798332339587,general,,[U05JQJJDJ6P]
1,2023-07-25 19:42:14,14,3094307063304068259,random,,[U05JQJJDJ6P]
2,2023-07-25 19:43:13,0,15750806798332339587,general,"[{'ts': 1690314133000000000, 'user': 'U05JQJJD...",
3,2023-07-25 19:43:14,1,3094307063304068259,random,"[{'ts': 1690314134000000000, 'user': 'U05JQJJD...",
4,2023-07-25 19:44:27,0,2954779196800164886,demo,,[U05JQJJDJ6P]
5,2023-07-25 19:45:27,2,2954779196800164886,demo,"[{'ts': 1690314267000000000, 'user': 'U05JQJJD...",
6,2023-07-26 08:29:35,6,15750806798332339587,general,,"[U05JQJJDJ6P, U05JQJJDJ6P, U05JQJJDJ6P]"
7,2023-07-26 08:29:37,7,15750806798332339587,general,,"[U05JQJJDJ6P, U05JQJJDJ6P, U05JQJJDJ6P, U05JQJ..."
8,2023-07-26 08:30:10,1,2954779196800164886,demo,,"[U05JQJJDJ6P, U05JQJJDJ6P, U05JQJJDJ6P]"
9,2023-07-26 08:30:14,15,3094307063304068259,random,,"[U05JQJJDJ6P, U05JQJJDJ6P, U05JQJJDJ6P]"
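
The preview above alternates between rows with only a completion and rows with only a prompt. The sketch below reproduces that pattern in plain Python: prompts are moved forward by one minute and then merged with the unshifted completions on `(time, key)`. It is an illustration of the observed output under that assumption, not Kaskada's actual semantics, and the helper names `shift_events` and `merge_timelines` are hypothetical.

```python
from datetime import datetime, timedelta

def shift_events(events, delta):
    """Move every (time, key, value) event forward in time by `delta`."""
    return [(t + delta, key, value) for t, key, value in events]

def merge_timelines(prompts, completions):
    """Outer-join two event lists on (time, key); a missing side stays None."""
    rows = {}
    for t, key, value in prompts:
        rows.setdefault((t, key), {"prompt": None, "completion": None})["prompt"] = value
    for t, key, value in completions:
        rows.setdefault((t, key), {"prompt": None, "completion": None})["completion"] = value
    return [(t, key, row) for (t, key), row in sorted(rows.items())]

t0 = datetime(2023, 7, 25, 19, 42, 13)
prompts = [(t0, "general", ["<@U05JQJJDJ6P> has joined the channel"])]
completions = [(t0, "general", ["U05JQJJDJ6P"])]

shifted = shift_events(prompts, timedelta(minutes=1))
rows = merge_timelines(shifted, completions)
```

Because the shifted prompt lands at 19:43:13 while the completion stays at 19:42:13, the two never share a row. That is why a null filter on the prompt removes only the completion-only rows, and why the note about null completions remains open.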


## Fine-tune a model

### Create training dataset

In [None]:
from sklearn import preprocessing

examples_df = examples.run().to_pandas().drop(["_time", "_subsort", "_key_hash", "_key"], axis=1)

le = preprocessing.LabelEncoder()
le.fit(examples_df.completion.explode())

# Format for the OpenAI API
def format_prompt(prompt):
    return "start -> " + "\n\n".join([f' {msg["user"]} --> {msg["text"]} ' for msg in prompt]) + "\n\n###\n\n"
examples_df.prompt = examples_df.prompt.apply(format_prompt)

def format_completion(completion):
    return " " + (" ".join(le.transform(completion).astype(str)) if len(completion) > 0 else "nil") + " end"
examples_df.completion = examples_df.completion.apply(format_completion)

# Write examples to file
examples_df.to_json("examples.jsonl", orient='records', lines=True)
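
To show what one line of `examples.jsonl` might look like, here is a self-contained sketch that applies the same prompt and completion formatting to a single example. It swaps scikit-learn's `LabelEncoder` for a plain dictionary that numbers the sorted unique user IDs, so `build_label_map` is a hypothetical stand-in and the sample message is illustrative.

```python
import json

SEPARATOR = "\n\n###\n\n"  # marks the end of the prompt

def build_label_map(user_lists):
    """Assign integer labels to the sorted unique user IDs (stand-in for LabelEncoder)."""
    users = sorted({user for users in user_lists for user in users})
    return {user: index for index, user in enumerate(users)}

def format_prompt(prompt):
    body = "\n\n".join(f' {msg["user"]} --> {msg["text"]} ' for msg in prompt)
    return "start -> " + body + SEPARATOR

def format_completion(completion, label_map):
    labels = " ".join(str(label_map[user]) for user in completion) if completion else "nil"
    return " " + labels + " end"  # leading space, then " end" as the stop sequence

prompt = [{"user": "U05JQJJDJ6P", "text": "old message 1"}]
completion = ["U05JQJJDJ6P"]
label_map = build_label_map([completion])

line = json.dumps({
    "prompt": format_prompt(prompt),
    "completion": format_completion(completion, label_map),
})
```

Encoding users as small integers keeps completions short, and the fixed separator and ` end` suffix give the fine-tuned model unambiguous start and stop markers. To turn a predicted label back into a user ID, invert `label_map`.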

### Upload to OpenAI

In [None]:
from types import SimpleNamespace
from openai import cli

# Verify the data format, then split for training & validation
args = SimpleNamespace(file='./examples.jsonl', quiet=True)
cli.FineTune.prepare_data(args)
training_id = cli.FineTune._get_or_upload('./examples_prepared_train.jsonl', True)
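
Before uploading, it can help to run a quick local sanity check on the JSONL file. The sketch below is a hypothetical helper (`check_examples` is not part of the OpenAI library) that verifies each line parses as JSON, that the prompt ends with the separator, and that the completion starts with a space and ends with the stop sequence used by the formatting functions in this notebook.

```python
import json

def check_examples(lines, separator="\n\n###\n\n", stop=" end"):
    """Return a list of (line_number, problem) tuples; an empty list means no issues found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        prompt = record.get("prompt")
        completion = record.get("completion")
        if not isinstance(prompt, str) or not prompt.endswith(separator):
            problems.append((number, "prompt missing separator"))
        if not isinstance(completion, str) or not (
            completion.startswith(" ") and completion.endswith(stop)
        ):
            problems.append((number, "completion missing leading space or stop sequence"))
    return problems

good = json.dumps({"prompt": "start ->  U1 --> hi \n\n###\n\n", "completion": " 0 end"})
bad = json.dumps({"prompt": "start ->  U1 --> hi ", "completion": "0"})
report = check_examples([good, bad, "{not json"])
```

In practice the lines would come from the file itself, for example `check_examples(open("examples.jsonl"))`.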

### Create the training job

In [None]:
import openai

resp = openai.FineTune.create(
    training_file = training_id,
    model = "davinci",
    n_epochs = 2,
    learning_rate_multiplier = 0.02,
    suffix = "conversation_users"
)
print(f'Fine-tuning model with job ID: "{resp["id"]}"')
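
Fine-tuning runs asynchronously, so the job ID printed above needs to be checked until the job finishes. The sketch below is a small polling loop that takes any status-fetching callable; `wait_for_job` is a hypothetical helper, and the set of terminal status names is an assumption about the legacy fine-tunes API that should be checked against the installed `openai` version.

```python
import time

def wait_for_job(fetch_status, poll_seconds=30.0,
                 terminal=("succeeded", "failed", "cancelled"),  # assumed status names
                 sleep=time.sleep):
    """Call `fetch_status()` repeatedly until it returns a terminal status, then return it."""
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        sleep(poll_seconds)

# Assumed usage with the legacy client from this notebook (not executed here):
#   final = wait_for_job(lambda: openai.FineTune.retrieve(id=resp["id"])["status"])
```

Passing the fetcher in as a callable keeps the loop independent of the client library, which also makes it easy to exercise with a stubbed status sequence.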