Slenti is a ready-to-deploy solution for performing sentiment analysis in your Slack channels. It's not just another Slackbot. It's an event-driven solution running on AWS, which means that the sky is the limit in terms of seamless scalability. The sentiment analysis is performed using a Naive Bayes classifier trained on the GoEmotions (link) dataset. All rights to the dataset belong to the respective owner. This version of Slenti supports only English. Feel free to clone this repo and adapt it to your requirements, even by modifying core functionality such as the sentiment analysis logic itself. Do you want to use it as a boilerplate? Sure thing!
The idea behind it is that you deploy the solution directly via the pipeline (GitHub Actions). Whenever a user posts a message in any of the Slack channels the bot has been added to, the message is received by the Slackbot, inserted into a queue and later analyzed. The same happens when a message is edited: the new version is inserted into the queue, analyzed and stored without deleting the analysis results of the original message or of any older versions of it. The results of the analysis are saved in an S3 bucket, and any meaningful information that anyone might be interested in can then be extracted using the AWS Athena service. Regarding the deployment process, it takes place via the Serverless framework - the template is available here (link).
- The `.configuration/` folder includes the configuration of the environments/stages.
- The `.diagrams/` folder includes the diagrams of the whole application/stack.
- The `.github/workflows/` folder includes the definitions of the GitHub Actions that have been configured for this repository.
- The `functions/` folder includes the source code of the Lambda functions that are connected with the stack.
- The `infrastructure/` folder includes the IaC.
- Other files.
The application stores the information in an S3 bucket. I guess that you are now wondering how you are supposed to access the results of the analyzed messages. Well, there's no dashboard for quickly accessing the data. The Slackbot was built to analyze the messages and store them in an S3 bucket for later consumption. If you are interested in running queries directly against the S3 bucket, you can use AWS Athena. However, since the results are stored in an S3 bucket, if you would rather keep this data in a database, you can easily create a Lambda function that gets triggered every time a new object is stored; that Lambda reads the content of the JSON object, manipulates it if needed, and stores it in any kind of database (e.g. DynamoDB, RDS, etc.). A sketch of such a Lambda is shown below.
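For illustration only, here is a minimal TypeScript sketch of such a Lambda (not part of the repository), assuming the AWS SDK v3 and a hypothetical `RESULTS_TABLE` DynamoDB table:

```typescript
// Illustrative sketch only (not part of the repository): a Lambda that is
// triggered on s3:ObjectCreated:*, reads the stored JSON result and writes
// a few of its fields to a hypothetical DynamoDB table.
import { S3Event } from "aws-lambda";
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { DynamoDBClient, PutItemCommand } from "@aws-sdk/client-dynamodb";

const s3 = new S3Client({});
const dynamoDb = new DynamoDBClient({});

export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    // Fetch the newly stored analysis result from S3.
    const { Body } = await s3.send(
      new GetObjectCommand({
        Bucket: record.s3.bucket.name,
        Key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
      })
    );
    const result = JSON.parse((await Body?.transformToString()) ?? "{}");

    // Store the fields we care about; the table name is an assumption.
    await dynamoDb.send(
      new PutItemCommand({
        TableName: process.env.RESULTS_TABLE ?? "slenti-results",
        Item: {
          slackChannelId: { S: result.slackChannelId },
          slackMessageTs: { S: result.slackMessageTs },
          slackUserId: { S: result.slackUserId },
          classifiedAsLabel: { S: result.classifiedAsLabel },
          classifiedAsProbability: { N: result.classifiedAsProbability },
        },
      })
    );
  }
};
```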
Each analyzed message is stored as a JSON object like the one below:

```json
{
  "messageBody": "I like Slenti. It is awesome!",
  "slackChannelId": "C012ABCDE3F",
  "slackMessageEventTs": "1660000000.000000",
  "slackUserId": "U01ABCDEFGH",
  "slackMessageTs": "1660000000.000000",
  "classifiedAsLabel": "positive",
  "classifiedAsProbability": "0.9506572042465574",
  "classifiedAsLogLikelihood": "-29.881178994701806",
  "scores": {
    "positive": {
      "label": "positive",
      "probability": "0.9506572042465574",
      "logLikelihood": "-29.881178994701806"
    },
    "negative": {
      "label": "negative",
      "probability": "0.04428668402234906",
      "logLikelihood": "-32.94764848859803"
    },
    "neutral": {
      "label": "neutral",
      "probability": "0.005056111731093468",
      "logLikelihood": "-35.11773477874752"
    }
  }
}
```
The query below will create a new table in the `default` database in Athena.
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`slenti-test` (
  `messageBody` string,
  `slackChannelId` string,
  `slackMessageEventTs` string,
  `slackUserId` string,
  `slackMessageTs` string,
  `classifiedAsLabel` string,
  `classifiedAsProbability` string,
  `classifiedAsLogLikelihood` string,
  `scores` string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://{nameOfTheS3Bucket}/'
```
Don't forget to:
- Replace the `{nameOfTheS3Bucket}` with the name of the S3 bucket where the results are being saved.
- Change the name of the table from `slenti-test` to something else.
Get all the messages that were classified as positive:

```sql
SELECT
  "default"."slenti-test".*
FROM
  "default"."slenti-test"
WHERE
  "default"."slenti-test".classifiedaslabel = 'positive'
```
Get all the messages whose probability of being positive is greater than 0.7:

```sql
SELECT
  "default"."slenti-test".*
FROM
  "default"."slenti-test"
WHERE
  CAST(JSON_EXTRACT("default"."slenti-test".scores, '$.positive.probability') AS DOUBLE) > 0.7
```
Get the number of positive, negative and neutral messages that each user posted between two given dates:
```sql
SELECT
  "default"."slenti-test".slackuserid,
  SUM(CASE WHEN ("default"."slenti-test".classifiedaslabel = 'positive') THEN 1 ELSE 0 END) AS positiveMessages,
  SUM(CASE WHEN ("default"."slenti-test".classifiedaslabel = 'negative') THEN 1 ELSE 0 END) AS negativeMessages,
  SUM(CASE WHEN ("default"."slenti-test".classifiedaslabel = 'neutral') THEN 1 ELSE 0 END) AS neutralMessages
FROM
  "default"."slenti-test"
WHERE
  FROM_UNIXTIME(CAST(SPLIT_PART("default"."slenti-test".slackmessagets, '.', 1) AS bigint)) BETWEEN DATE '2022-09-04' AND DATE '2022-09-06'
GROUP BY
  slackuserid
```
Based on how you are planning to use the solution, you can apply additional changes to optimize the tables and improve the performance of the queries.
Some helpful resources are:
- https://docs.aws.amazon.com/athena/latest/ug/partitions.html
- https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/
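If you prefer to consume the results programmatically rather than through the Athena console, a minimal sketch along these lines should work (not part of the repository; the output bucket `s3://my-athena-results/` and the table name are assumptions):

```typescript
// Illustrative sketch: run an Athena query with the AWS SDK v3, poll until
// it finishes and print the single result value.
import {
  AthenaClient,
  StartQueryExecutionCommand,
  GetQueryExecutionCommand,
  GetQueryResultsCommand,
} from "@aws-sdk/client-athena";

const athena = new AthenaClient({});

async function countPositiveMessages(): Promise<void> {
  const { QueryExecutionId } = await athena.send(
    new StartQueryExecutionCommand({
      QueryString: `SELECT COUNT(*) FROM "default"."slenti-test" WHERE classifiedaslabel = 'positive'`,
      // Bucket for Athena's query output; this name is an assumption.
      ResultConfiguration: { OutputLocation: "s3://my-athena-results/" },
    })
  );

  // Poll until the query finishes (simplified; no error handling/backoff).
  let state = "RUNNING";
  while (state === "RUNNING" || state === "QUEUED") {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    const { QueryExecution } = await athena.send(
      new GetQueryExecutionCommand({ QueryExecutionId })
    );
    state = QueryExecution?.Status?.State ?? "FAILED";
  }

  const { ResultSet } = await athena.send(
    new GetQueryResultsCommand({ QueryExecutionId })
  );
  // The first row holds the column headers; the second holds the count.
  console.log(ResultSet?.Rows?.[1]?.Data?.[0]?.VarCharValue);
}
```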
In your AWS account, create an IAM user with (ideally) only Programmatic Access. Let's call this user `slenti-github-cicd`.
Since this user acts as the main service account for the pipeline, the `PowerUserAccess` policy will grant sufficient permissions. Hey, I can hear you mumbling about not following the Principle of Least Privilege.
Once the user is created, create a set of Access Keys and keep them somewhere safe; we will need them in a bit.
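If you prefer to script this step instead of clicking through the console, a rough equivalent with the AWS SDK v3 could look like this (an illustration, not part of the repository):

```typescript
// Illustrative only: create the CI/CD user, attach PowerUserAccess and
// generate a set of access keys. Run this with credentials that are
// allowed to manage IAM.
import {
  IAMClient,
  CreateUserCommand,
  AttachUserPolicyCommand,
  CreateAccessKeyCommand,
} from "@aws-sdk/client-iam";

const iam = new IAMClient({});
const userName = "slenti-github-cicd";

await iam.send(new CreateUserCommand({ UserName: userName }));
await iam.send(
  new AttachUserPolicyCommand({
    UserName: userName,
    PolicyArn: "arn:aws:iam::aws:policy/PowerUserAccess",
  })
);

const { AccessKey } = await iam.send(
  new CreateAccessKeyCommand({ UserName: userName })
);
// Keep these two values somewhere safe; they are needed in the next step.
console.log(AccessKey?.AccessKeyId, AccessKey?.SecretAccessKey);
```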
Clone this repository.
More information is available here (link)
In the 1st step, you created a set of Access Keys for the newly created user. Now it's time to make the GitHub Actions use those access keys by storing them on GitHub.
All the steps regarding the creation of the GitHub secrets are described in detail here (link).
In a nutshell, each environment requires 2 secrets:
- The first one stores the AWS Access Key ID. The naming convention is: `{environment}_AWS_ACCESS_KEY_ID`.
- The second one stores the AWS Secret Access Key. The naming convention is: `{environment}_AWS_SECRET_ACCESS_KEY`.
Don't forget:
- The `{environment}` is the environment with which those access keys are associated. In case you want to deploy multiple environments within the same AWS account, feel free to use the same access keys.
Tutorial: https://slack.com/help/articles/115005265703-Create-a-bot-for-your-workspace
The bot needs to have only Bot Token Scopes. The list of the scopes is available below:
- `channels:history`
- `groups:history`
- `im:history`
- `mpim:history`
Once you configure the permissions and install the Slack Application in your desired Slack workspace, the Bot User OAuth Token will be created. This token will be used later on for the connection between our Slackbot and Slack.
Each environment requires 2 parameters:
- The first one stores Slack's Bot User OAuth Token. The naming convention is: `/{serviceName}/{environment}/functions/slackbot/env-variables/SLACK_BOT_TOKEN`.
- The second one stores Slack's Signing Secret. The naming convention is: `/{serviceName}/{environment}/functions/slackbot/env-variables/SLACK_SIGNING_SECRET`.
Don't forget:
- The `{serviceName}` is the `INFRASTRUCTURE.SERVERLESS.SERVICE.NAME` as it is explained in step `3.2`.
- The `{environment}` is the environment.
Why don't we use the Secrets Manager? Well, that's just because we don't need the additional features and the free encrypted parameters are absolutely fine.
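As an illustration (not part of the repository), creating those two encrypted parameters with the AWS SDK v3 could look roughly like this, for a service called `slenti` and an environment called `dev`:

```typescript
// Hypothetical helper that creates the two encrypted SSM parameters; the
// service name and environment below are placeholders, not repo defaults.
import { SSMClient, PutParameterCommand } from "@aws-sdk/client-ssm";

const ssm = new SSMClient({});

async function putSlackParameter(name: string, value: string): Promise<void> {
  await ssm.send(
    new PutParameterCommand({
      Name: name,
      Value: value,
      Type: "SecureString", // encrypted with the default AWS-managed KMS key
      Overwrite: true,
    })
  );
}

await putSlackParameter(
  "/slenti/dev/functions/slackbot/env-variables/SLACK_BOT_TOKEN",
  "xoxb-…" // the Bot User OAuth Token from the Slack Application
);
await putSlackParameter(
  "/slenti/dev/functions/slackbot/env-variables/SLACK_SIGNING_SECRET",
  "…" // the Signing Secret of the Slack Application
);
```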
Theoretically, at this point everything is configured properly and the next step is to run the GitHub action that is responsible for deploying the stack.
To do so, open the `Actions` page and click on the `Test or Deploy or Remove` workflow.
There are 3 jobs available.
This job is responsible for linting and running tests. For the testing part, localstack is used to emulate (in our case) the S3 and SQS services. A minimal example of pointing the AWS SDK at localstack is shown below.
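For anyone unfamiliar with localstack, tests talk to it by overriding the SDK endpoint. A minimal sketch (the endpoint and credentials are localstack defaults, not repo configuration):

```typescript
// Illustrative only: point the AWS SDK v3 S3 client at a locally running
// localstack instance instead of real AWS.
import { S3Client, CreateBucketCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({
  endpoint: "http://localhost:4566", // default localstack edge port
  region: "eu-central-1",
  forcePathStyle: true, // localstack serves buckets in path style
  credentials: { accessKeyId: "test", secretAccessKey: "test" },
});

await s3.send(new CreateBucketCommand({ Bucket: "test-bucket" }));
```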
This job is responsible for deploying the serverless stack for a given environment/stage. However, before the deployment, the `Testing & Linting` job always runs so as to ensure that everything is fine.
This job is responsible for removing the serverless stack for a given environment/stage.
Choose the `deploy-full-serverless-stack` job, write your desired environment in the `Environment` field and click `Run workflow`. The whole process will take a couple of minutes, so relax and enjoy it.
Once the stack is deployed and as shown in the architecture diagram (link), there will be an API Gateway sitting in front of the Slackbot Lambda function. The Slack Application that was created in step 5 needs to be updated so it can start forwarding the messages to the Lambda function.
This can be done by opening the `Event Subscriptions` configuration page (link) of the Slack Application.
On that page, you have to:
- Enable the events.
- Set the `Request URL`. The URL should follow this pattern: `https://{idOfTheApiGateway}.execute-api.{region}.amazonaws.com/slack/events`
  Don't forget:
  - The `{idOfTheApiGateway}` is the ID of the API Gateway that was created by the stack (e.g. `ab123cde4f`).
  - The `{region}` is the region (e.g. `eu-central-1`).
- Subscribe to the following 4 events:
  - `message.channels`
  - `message.groups`
  - `message.im`
  - `message.mpim`
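A note on what happens when you save the Request URL: Slack sends a one-time `url_verification` request and expects the `challenge` value echoed back. The deployed Slackbot already handles this handshake; purely for context, a minimal handler would look roughly like this:

```typescript
// For context only (the repository's Slackbot already does this): echo
// Slack's url_verification challenge so the Request URL gets verified.
import { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";

export const handler = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  const body = JSON.parse(event.body ?? "{}");
  if (body.type === "url_verification") {
    return { statusCode: 200, body: body.challenge };
  }
  // ...normal event handling (signature verification, queueing, etc.)
  return { statusCode: 200, body: "" };
};
```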
Once you reach this step, everything should be running smoothly. The steps to confirm that are:
- Open a Slack channel where you want to perform sentiment analysis.
- Invite/Add the bot to the Slack channel.
- Send a message.
- Wait for a few seconds.
- Open the S3 bucket that was created by the serverless stack. The naming convention is `{environment}-{serviceName}-{s3StoreResultsBucketSuffix}`, where:
  - The `{environment}` is the environment.
  - The `{serviceName}` is the `INFRASTRUCTURE.SERVERLESS.SERVICE.NAME` as it is explained in step `3.2`.
  - The `{s3StoreResultsBucketSuffix}` is the `INFRASTRUCTURE.SERVERLESS.S3.STORE_RESULTS_BUCKET.SUFFIX` as it is explained in step `3.2`.
The messages are stored following a nested folder structure: `/{year}/{month}/{day}/{filename}`, where:
- The `{year}` is the year (2xxx) when the event on Slack took place.
- The `{month}` is the month (01-12) when the event on Slack took place.
- The `{day}` is the day (01-31) of the month when the event on Slack took place.
- The `{filename}` is the MD5 hash of the base64-encoded string of the concatenation of the following values:
  - `slackChannelId`
  - `slackUserId`
  - `slackMessageEventTs`
  - `slackMessageTs`
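For illustration, here is a rough TypeScript sketch of how such an object key could be derived; it assumes the four values are concatenated in the order listed above with no separator, which may differ from the actual implementation:

```typescript
// Hypothetical reconstruction of the object key, under the assumption of
// plain, separator-less concatenation of the four values.
import { createHash } from "node:crypto";

function resultObjectKey(
  slackChannelId: string,
  slackUserId: string,
  slackMessageEventTs: string,
  slackMessageTs: string
): string {
  const concatenated =
    slackChannelId + slackUserId + slackMessageEventTs + slackMessageTs;
  const filename = createHash("md5")
    .update(Buffer.from(concatenated).toString("base64"))
    .digest("hex");

  // Derive the folder structure from the date of the Slack event
  // (the event timestamp is Unix seconds with a fractional part).
  const date = new Date(Number(slackMessageEventTs.split(".")[0]) * 1000);
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  const day = String(date.getUTCDate()).padStart(2, "0");

  return `/${year}/${month}/${day}/${filename}`;
}
```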
All rights to logos, images and datasets belong to their respective owners, and nobody claims any rights over them.
This source code is licensed under the MIT License.
- Dataset: GoEmotions (link).
- Naive Bayes classifier for Node.js: classificator (link).
- Serverless framework (link).
- Localstack (link).
Thodoris Itsios (https://itsios.eu)