Genny - Generate Slack Chatbot

Operations - Data Solutions - Internal Insights

Spring 2025

Chigo Ike, Matthew Li, Kaydence Lin

TODO: Add any resources or links referenced.

The purpose of the chatbot is to streamline the process of gathering organization-wide information to answer internal questions in real time. This documentation is for internal Generate members who may implement or work on the chatbot in the future. Knowledge of Python, the Slack API, LLMs, Hugging Face, and DigitalOcean may be useful for understanding this documentation.

User Guide - Matthew

app.py

Slack

  1. Create a Slack App at https://api.slack.com/apps
    • Disable Socket Mode
    • Add bot token scopes: app_mentions:read, channels:history, channels:read, chat:write, chat:write.public, groups:history, groups:read, im:history, im:read, im:write, incoming-webhook, mpim:history, users:read
    • Install the app to your workspace
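
Once the app is installed, the backend needs to receive Slack events over HTTP (this is why Socket Mode is disabled). Below is a hedged sketch of what the event handling in app.py might look like, assuming the slack_bolt library; the handler logic and environment variable names are illustrative, not the exact contents of app.py.

    import os

    from slack_bolt import App

    # Tokens come from the "OAuth & Permissions" and "Basic Information"
    # pages of the Slack app created above
    app = App(
        token=os.environ["SLACK_BOT_TOKEN"],
        signing_secret=os.environ["SLACK_SIGNING_SECRET"],
    )

    @app.event("app_mention")
    def handle_mention(event, say):
        # Placeholder reply; the real bot would route event["text"] through
        # the RAG pipeline described under Technical Details
        say(f"You asked: {event['text']}")

    if __name__ == "__main__":
        # Starts a local HTTP server that listens on /slack/events
        app.start(port=int(os.environ.get("PORT", 3000)))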

Environment

  • Libraries needed: pip install langchain-text-splitters sentence-transformers numpy faiss-cpu torch, or install everything with pip install -r requirements.txt. (RecursiveCharacterTextSplitter and SentenceTransformer are class names, not pip packages; they are provided by langchain-text-splitters and sentence-transformers.)

DigitalOcean

DigitalOcean is the platform we decided to use for hosting. There are two components that need to be hosted. First, there is the Slack backend, found in the app.py file; you can host this by creating an App Platform app. Once this is done, you will have to update the event subscription URL on the Slack API website to point at the new deployment (a sketch of how app.py could be exposed for App Platform follows below). The second component is hosting the model. We originally planned to run an LLM on top of our RAG model to produce conversational answers from the relevant text the RAG model retrieves. We explored using Droplets, but the resources we chose were not enough to run the LLM at that cost. This still needs to be investigated; GPU Droplets or higher-spec Droplets could work. There is also a GenAI Platform on DigitalOcean that we did not evaluate.
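
This is a hedged sketch, not the actual app.py: it assumes slack_bolt's Flask adapter so the bot can run under a production web server such as gunicorn.

    import os

    from flask import Flask, request
    from slack_bolt import App
    from slack_bolt.adapter.flask import SlackRequestHandler

    bolt_app = App(
        token=os.environ["SLACK_BOT_TOKEN"],
        signing_secret=os.environ["SLACK_SIGNING_SECRET"],
    )

    flask_app = Flask(__name__)
    handler = SlackRequestHandler(bolt_app)

    @flask_app.route("/slack/events", methods=["POST"])
    def slack_events():
        # Slack posts event callbacks here; bolt verifies and dispatches them
        return handler.handle(request)

App Platform would then run something like gunicorn app:flask_app, and https://<your-app>.ondigitalocean.app/slack/events becomes the request URL under Event Subscriptions on the Slack API website.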

Technical Details

Data

The data used was an export of the Generate Notion wiki (folder named "Wiki Export"). The export contained a markdown file for each page. To preprocess this data, we converted all the .md files into .txt files (md_to_txt.py); the text files are stored in a folder called "Wiki_txt". We cleaned the .txt files to remove unnecessary markdown formatting, emojis, and empty lines, then combined all the .txt files into a single .json file (wiki_json.py and gen_wiki.json). gen_wiki.json serves as the knowledge base for the models.
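
The preprocessing flow is roughly the following. This is a minimal sketch that assumes the folder layout above and a simple {"title", "text"} record shape for gen_wiki.json, not the exact md_to_txt.py / wiki_json.py code:

    import json
    import re
    from pathlib import Path

    SRC = Path("Wiki Export")   # Notion markdown export
    DST = Path("Wiki_txt")      # cleaned .txt files
    DST.mkdir(exist_ok=True)

    def clean(text: str) -> str:
        text = re.sub(r"[#*_`>\[\]]", "", text)          # strip md formatting
        text = text.encode("ascii", "ignore").decode()   # drop emojis
        lines = (line.strip() for line in text.splitlines())
        return "\n".join(line for line in lines if line)  # drop empty lines

    pages = []
    for md in SRC.rglob("*.md"):
        cleaned = clean(md.read_text(encoding="utf-8"))
        (DST / (md.stem + ".txt")).write_text(cleaned, encoding="utf-8")
        pages.append({"title": md.stem, "text": cleaned})

    Path("gen_wiki.json").write_text(json.dumps(pages, indent=2), encoding="utf-8")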

Training Data

We manually created training data of question-answer pairs, stored in training.json. This data covers content that can be found in Generate's Notion. We wrote a parser to convert the .json file to a .jsonl file (json_jsonl.py and training.jsonl; a sketch of the conversion appears below). We found that training.jsonl worked better for the DistilBERT model. This data was not used for the RAG pipeline.

  • TODO: fix and clean the training data.
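
A minimal sketch of the .json-to-.jsonl conversion; the field names in training.json are assumed to be "question" and "answer":

    import json

    # training.json is assumed to be a list of {"question": ..., "answer": ...}
    with open("training.json", encoding="utf-8") as f:
        pairs = json.load(f)

    with open("training.jsonl", "w", encoding="utf-8") as out:
        for pair in pairs:
            out.write(json.dumps(pair) + "\n")  # one JSON object per line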

Models

Introduction

  • We initially tried rag.py, which did not work well.
  • rag3.py works well.
  • TODO: document other limitations and the process that led to the final model.

rag3.py - Matthew

  • Does not run on an M1 MacBook Pro with 8 GB of memory.
  • Runs on an M1 MacBook Pro with 16 GB of memory, but very slowly; it may not run at all depending on the memory available on your machine.
  • If you are taking an LLM class, you may have access to a GPU you can run this on; it will be much faster. We successfully ran it on a GPU with 48 GB of memory.
  • Comments on how the code works are in the file. If you have more questions, try running the code through Claude.ai first, or reach out to us. A condensed sketch of the retrieval core follows below.
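
For orientation, here is a condensed sketch of the retrieval core implied by the Environment section (chunking, sentence-transformer embeddings, FAISS search). It is not a copy of rag3.py; the embedding model, chunk sizes, and k are assumptions:

    import json

    import faiss
    import numpy as np
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from sentence_transformers import SentenceTransformer

    # Chunk the knowledge base (field names and sizes are assumptions)
    with open("gen_wiki.json", encoding="utf-8") as f:
        pages = json.load(f)
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = [c for page in pages for c in splitter.split_text(page["text"])]

    # Embed every chunk and build an exact L2 index over the vectors
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(chunks, convert_to_numpy=True).astype(np.float32)
    index = faiss.IndexFlatL2(embeddings.shape[1])
    index.add(embeddings)

    def retrieve(query: str, k: int = 3) -> list[str]:
        # Return the k chunks closest to the query embedding
        q = model.encode([query], convert_to_numpy=True).astype(np.float32)
        _, ids = index.search(q, k)
        return [chunks[i] for i in ids[0]]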

rag.py - Matthew

This file is legacy code. It was the first implementation of our RAG model; we improved it to use JSON in rag2.py.

rag_ollama.py - Kaydence

  • Download Ollama and the required models.
  • Uses RAG retrieval and feeds the retrieved text into Ollama.
  • Ollama produces good responses; however, based on our initial research, Ollama is not meant to be deployed, only run on your local machine.
  • This model is a good example of what the chatbot's responses should look like.
  • To run Ollama, download it from https://ollama.com/ and pull the Mistral model. A sketch of querying Ollama follows below.
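
A minimal sketch of handing retrieved text to a local Ollama server over its default REST API, assuming ollama pull mistral has been run; the prompt wording and helper names are illustrative:

    import requests

    def ask(question: str, context: str) -> str:
        # context would come from the RAG retriever (e.g., the retrieve()
        # sketch above); the prompt wording is illustrative
        prompt = (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        resp = requests.post(
            "http://localhost:11434/api/generate",  # Ollama's default local API
            json={"model": "mistral", "prompt": prompt, "stream": False},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["response"]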

training.py - Chigo

  • Fine-tunes DistilBERT.
  • Uses training.jsonl to train the model; a sketch of loading the data follows below.
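
As a starting point, here is a minimal sketch of loading and tokenizing training.jsonl with the Hugging Face stack (datasets + transformers). The field names and the sequence-pair framing are assumptions; the actual fine-tuning loop lives in training.py:

    from datasets import load_dataset
    from transformers import AutoTokenizer

    # Field names ("question"/"answer") are assumptions about training.jsonl
    dataset = load_dataset("json", data_files="training.jsonl", split="train")
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

    def tokenize(example):
        # Encode question and answer together as a sequence pair
        return tokenizer(example["question"], example["answer"],
                         truncation=True, padding="max_length", max_length=256)

    tokenized = dataset.map(tokenize)
    print(tokenized[0]["input_ids"][:10])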

Final Product

  • Slack bot (app.py)
  • rag3.py?

Maintenance and Updates

  • rag3.py is too resource-intensive to host as-is; future maintainers need to find a less resource-intensive approach or budget for more RAM on the hosting plan.

FAQs

TODO: add frequently asked questions.

Potential Future Work

  1. Create an API with Notion SDK to integrate with the Generate Notion
  2. Integrate with the Notion Calendar and automate reminders for events
  3. Automate reminders for team meetings
  4. Integrate with Slack message history to return more personalized answers

Contact Information

Name          Email Address               Role          Date of Last Edit
Chigo Ike     ike.c@northeastern.edu      Data Analyst  4/11/2025
Matthew Li    li.matt@northeastern.edu    Data Analyst  4/11/2025
Kaydence Lin  lin.kay@northeastern.edu    Data Analyst  4/11/2025
