To get messages from a Slack channel, you will need a token that has permission to read from that channel, and the ID of the channel.

To get the Slack channel ID:

There is probably an easier way, but the one I use is to right-click the channel name in the channel menu and select "Copy link". The link will look something like <https://your-org.slack.com/archives/C12345>. The channel ID is the last piece: C12345 in the example URL.
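If you are scripting this, the same "take the last piece" rule is easy to apply in code. `channel_id_from_link` is a hypothetical helper, and it assumes a channel link in the form shown above, not a link to an individual message.

```python
from urllib.parse import urlparse

def channel_id_from_link(link: str) -> str:
    """Return the last path segment of a Slack channel link (hypothetical helper)."""
    path = urlparse(link).path  # e.g. "/archives/C12345"
    return path.rstrip("/").split("/")[-1]

print(channel_id_from_link("https://your-org.slack.com/archives/C12345"))  # prints C12345
```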

To get a token:

Set up a bot user by following these instructions: <https://api.slack.com/bot-users>.

Make sure you set the OAuth scope on the bot to allow `channels:history`. Add the bot user to the channel you wish to read messages from. The channel should be public. I have not tested any of this with private channels. 



The next section assumes you have two environment variables:

- `SLACK_SENTIMENT_BOT_TOKEN`: the bot token we set up previously
- `SLACK_SENTIMENT_CHANNEL_ID`: the channel ID we retrieved previously

If you are not familiar with setting up environment variables, feel free to replace those values with hardcoded ones. Just be careful not to accidentally push your token to GitHub or anywhere like that :)
If you're interested, I have a [post on managing secrets](https://dev.to/ruarfff/managing-local-app-secrets-and-sharing-secrets-with-your-team-34m1).
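If a variable is missing, `os.environ.get` quietly returns `None`, and the failure only shows up later as an error from the Slack API. A small helper like the one below (`require_env` is a hypothetical name, not part of any library) fails early with a clearer message.

```python
import os

def require_env(name: str) -> str:
    """Read a required environment variable, failing early if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

You could then write `token = require_env("SLACK_SENTIMENT_BOT_TOKEN")` instead of calling `os.environ.get` directly.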

```python
import os

# Import WebClient from the Python SDK (github.com/slackapi/python-slack-sdk)
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

token = os.environ.get("SLACK_SENTIMENT_BOT_TOKEN")
channel_id = os.environ.get("SLACK_SENTIMENT_CHANNEL_ID")

client = WebClient(token=token)

conversation_history = []

try:
    # Returns the first page of messages in the channel (most recent first)
    result = client.conversations_history(channel=channel_id)
    conversation_history = result["messages"]
except SlackApiError as e:
    print(f"Error getting conversations: {e}")
```
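`conversations_history` returns one page of messages at a time, so for a busy channel the cell above only sees the most recent part of the history. Slack's Web API uses cursor-based pagination: each response carries `response_metadata.next_cursor`, which you pass back as `cursor` until it comes back empty. The function below is a sketch of that loop; `fetch_all_messages` and `page_size` are hypothetical names, and 200 is just a moderate page size.

```python
def fetch_all_messages(client, channel_id, page_size=200):
    """Collect every message in a channel by following Slack's cursor pagination.

    `client` is assumed to be a slack_sdk WebClient (or any object with a
    compatible conversations_history method).
    """
    messages = []
    cursor = None
    while True:
        kwargs = {"channel": channel_id, "limit": page_size}
        if cursor:
            kwargs["cursor"] = cursor
        response = client.conversations_history(**kwargs)
        messages.extend(response["messages"])
        # An empty (or missing) next_cursor means there are no more pages
        cursor = response.get("response_metadata", {}).get("next_cursor")
        if not cursor:
            return messages
```

Usage would be `conversation_history = fetch_all_messages(client, channel_id)`, still wrapped in the same `try`/`except SlackApiError` block.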

We want to train a model to figure out the "sentiment" of this Slack channel, but first let's play around with the data a little.

```python
# Let's take all the text in our Slack channel and tokenize it into sentences
import itertools
from random import shuffle

import matplotlib.pyplot as plt
from nltk import download
from nltk.sentiment import SentimentIntensityAnalyzer
from nltk.tokenize import sent_tokenize

# Tokenizer models used by sent_tokenize/word_tokenize (newer NLTK releases use 'punkt_tab')
download('punkt')
download('punkt_tab')
# Lexicon used by the VADER sentiment analyzer
download('vader_lexicon')

all_text = [message['text'] for message in conversation_history if 'text' in message]

tokenized_text = [sent_tokenize(text) for text in all_text]
sentences = list(itertools.chain.from_iterable(tokenized_text))

sia = SentimentIntensityAnalyzer()

def is_positive(sentence: str) -> bool:
    # VADER's compound score ranges from -1 (most negative) to +1 (most positive).
    # Note: neutral sentences (compound == 0) are counted as "Bad" here.
    return sia.polarity_scores(sentence)["compound"] > 0

num_pos = 0
num_neg = 0

shuffle(sentences)
for s in sentences:
    if is_positive(s):
        num_pos += 1
    else:
        num_neg += 1

labels = 'Good', 'Bad'
sizes = [num_pos, num_neg]

fig1, ax1 = plt.subplots()
ax1.pie(sizes, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that the pie is drawn as a circle.

plt.show()
```
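The pie chart above counts every sentence with a compound score of 0 or below as "Bad", so neutral sentences (for example, ones containing no words from the VADER lexicon) end up in the negative slice. The VADER project's documentation suggests ±0.05 as a typical cutoff for a three-way split. Here is a sketch of that mapping; `vader_label` is a hypothetical helper name.

```python
def vader_label(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to 'positive', 'negative' or 'neutral'."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

With the notebook's variables, `Counter(vader_label(sia.polarity_scores(s)["compound"]) for s in sentences)` (using `collections.Counter`) would give the three counts for a chart.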

The previous example works to some degree but isn't very interesting. Let's look at training our own model, and let's also look at cleaning up the data we're working with.

## Prepare our data

Previously we just fed a list of sentences to an off-the-shelf, lexicon-based model (VADER). We want to train our own model, but first we need to do some work to clean up our data and reduce the noise.

There are a lot of pieces in the text that aren't really useful, punctuation for example, and also things like emojis. It would probably be useful to include emojis in this kind of analysis, but let's keep it simple and exclude them for this example. The following code breaks the text up into words and renders a frequency distribution chart.

```python
import itertools

import matplotlib.pyplot as plt
from nltk import download
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize

# If you did not use a Slack channel, delete or comment out the next 2 lines
tokenized_words = [word_tokenize(sentence) for sentence in sentences]
words = list(itertools.chain.from_iterable(tokenized_words))

# If you do not want to use a Slack channel, you can still use some sample data by uncommenting these lines
# download('state_union')
# from nltk.corpus import state_union
# words = list(state_union.words())

fdist = FreqDist(words)

# Plot the 30 most common words
fdist.plot(30, cumulative=False)
plt.show()
```

Many of the most frequent words are not very useful. Let's do some processing. First, we download some useful data sets provided by NLTK.

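```python
from nltk import download

download('stopwords')  # common English words such as "the" and "and"
download('names')      # a list of first names
```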

Now we remove as much noise as possible from our data set. 

```python
from string import punctuation

from nltk.corpus import names, stopwords

# Lowercased first names and English stop words to filter out of our word list
name_words = {n.lower() for n in names.words()}
stop_words = set(stopwords.words("english"))
```

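```python
# Keep only alphabetic tokens, lowercased
words = [w.lower() for w in words if w.isalpha()]
# Drop stop words and names (isalpha already removed punctuation, so that check is only a safeguard)
words = [w for w in words if w not in stop_words and w not in name_words and w not in punctuation]
```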

```python
fdist = FreqDist(words)
fdist.plot(30, cumulative=False)
plt.show()
```
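Slack message text represents emoji as `:shortcode:` strings such as `:tada:` or `:+1:`. Tokenizing and keeping only `isalpha` tokens strips the colons but can leave names like `tada` behind as ordinary words. To actually exclude emoji as described earlier, one option is to remove the shortcodes from each message before tokenizing. The regular expression below is an assumption that covers common shortcodes (lowercase letters, digits, `_`, `+`, `-`, `'`), including skin-tone modifiers like `:skin-tone-2:`; it is a sketch, not a full parser of Slack's message format, and `strip_emoji` is a hypothetical name.

```python
import re

# Assumed shape of a Slack emoji shortcode, e.g. :tada:, :+1:, :skin-tone-2:
EMOJI_SHORTCODE = re.compile(r":[a-z0-9_+\-']+:")

def strip_emoji(text: str) -> str:
    """Remove emoji shortcodes and collapse the leftover whitespace."""
    without_emoji = EMOJI_SHORTCODE.sub(" ", text)
    return " ".join(without_emoji.split())

print(strip_emoji("great job :tada: :+1:"))  # prints: great job
```

In the notebook, this would be applied right after building `all_text`, e.g. `all_text = [strip_emoji(t) for t in all_text]`, before splitting into sentences.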

## Train a model

Now we are going to train a model on several data sets of text labelled as positive or negative. We will process this data by splitting it into words and attaching the label to each example. This step is called feature extraction: our features will be word counts, each paired with the label of the text they came from.

The model will then classify any text we give it based on how frequently the various words previously appeared in positive or negative examples. This is far from foolproof, but it gives us a model to play around with.
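For reference, NLTK's `NaiveBayesClassifier` scores each label by assuming the features are independent of each other once the label is known, and picks the label with the highest score:

$$
P(\text{label} \mid f_1, \dots, f_n) \;\propto\; P(\text{label}) \prod_{i=1}^{n} P(f_i \mid \text{label})
$$

Here each $f_i$ is one feature, such as "the word `great` appears 2 times". NLTK treats feature values as discrete, so a word appearing once and the same word appearing twice count as different values, and by default it smooths the probability estimates so that unseen combinations do not zero out the whole product.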

We are going to take some datasets from a couple of places and put them all together to create our features.

First, let's read some data stored in this repository, which was downloaded from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences). Each line in these files contains a sentence, a tab, and a label: `0` for negative or `1` for positive.

```python
import pandas as pd

# Each file is tab-separated: sentence<TAB>label (0 = negative, 1 = positive)
columns = ['review', 'sentiment']
amazon = pd.read_csv('sentiment_labelled_sentences/amazon_cells_labelled.txt', names=columns, sep='\t')
imdb = pd.read_csv('sentiment_labelled_sentences/imdb_labelled.txt', names=columns, sep='\t')
yelp = pd.read_csv('sentiment_labelled_sentences/yelp_labelled.txt', names=columns, sep='\t')
```
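Because the format is just `sentence<TAB>label` per line, it is easy to double-check what pandas parsed. pandas treats `"` as a quote character by default, and review text may contain quotation marks, so comparing `len(imdb)` with the number of labelled lines in the file is a cheap sanity check. Below is a minimal standard-library reader for that comparison; `read_labelled_sentences` is a hypothetical helper that splits each line on its last tab, and reading as UTF-8 with `errors="replace"` is an assumption about the files' encoding.

```python
def read_labelled_sentences(path):
    """Read a 'sentence<TAB>label' file into a list of (text, int_label) pairs."""
    pairs = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            text, label = line.rsplit("\t", 1)
            pairs.append((text, int(label)))
    return pairs
```

For example, `len(read_labelled_sentences('sentiment_labelled_sentences/imdb_labelled.txt'))` can be compared with `len(imdb)`.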

NLTK also provides some labelled data in its corpus. Let's grab that too for good measure.

```python
from nltk import download

download('movie_reviews')
```

Let's process all our data into two collections: positive and negative. We will create two helper functions. The first cleans up a list of words, just like we did with our Slack data earlier. The second builds something like a frequency distribution: a dictionary mapping each word to the number of times it occurs, which is the feature format NLTK's classifier expects.

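```python
def clean_words(words):
    """Lowercase alphabetic tokens and drop stop words and names."""
    lowered = [w.lower() for w in words if w.isalpha()]
    return [w for w in lowered
            if w not in stop_words and w not in name_words and w not in punctuation]

def word_counts(words):
    """Map each word to the number of times it occurs, e.g. {'great': 2}."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts
```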
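To make the training data format concrete: `NaiveBayesClassifier.train` takes a list of `(featureset, label)` pairs, where each featureset is a dict. The sketch below builds one such pair with `collections.Counter`, which produces the same counts as `word_counts`. To stay self-contained it splits on whitespace instead of using `word_tokenize`, and it uses a tiny, made-up stop word list rather than NLTK's; `to_featureset` is a hypothetical name. One thing worth knowing about the real list: NLTK's English stop words include negations such as "not", so `clean_words` turns "not good" into just "good".

```python
from collections import Counter

# Hypothetical, tiny stop word list for illustration only (the notebook uses NLTK's)
STOP_WORDS = {"the", "is", "a", "this"}

def to_featureset(text, label):
    """Turn raw text into a (word -> count, label) pair like the ones in positive_reviews."""
    words = [w.lower() for w in text.split() if w.isalpha()]
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return dict(counts), label

print(to_featureset("This phone is great great value", "pos"))
```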

```python
from nltk.corpus import movie_reviews
from nltk.tokenize import word_tokenize

# Combine the three UCI data sets into parallel lists of text and labels
reviews = [*amazon['review'].values, *imdb['review'].values, *yelp['review'].values]
sentiment = [*amazon['sentiment'].values, *imdb['sentiment'].values, *yelp['sentiment'].values]

# Each entry is a (featureset, label) pair: ({word: count, ...}, 'pos' or 'neg')
positive_reviews = []
negative_reviews = []

# NLTK's movie_reviews corpus is already split into 'pos' and 'neg' categories
for f in movie_reviews.fileids('pos'):
    positive_reviews.append((word_counts(clean_words(movie_reviews.words(f))), 'pos'))
for f in movie_reviews.fileids('neg'):
    negative_reviews.append((word_counts(clean_words(movie_reviews.words(f))), 'neg'))

# The UCI data uses 0 for negative and 1 for positive
for review, label in zip(reviews, sentiment):
    features = word_counts(clean_words(word_tokenize(review)))
    if label == 0:
        negative_reviews.append((features, 'neg'))
    else:
        positive_reviews.append((features, 'pos'))
```

Now that we have all that training data, let's train a model.

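```python
import random

split_pct = .80

def split_set(review_set):
    """Split a list into a training part (first 80%) and a test part (the rest)."""
    split = int(len(review_set) * split_pct)
    return (review_set[:split], review_set[split:])

# Shuffle first: the lists are ordered by source (movie reviews, then Amazon, IMDb, Yelp),
# so without shuffling the test set would come mostly from the last source (Yelp)
random.seed(42)  # fixed seed so the split is reproducible
random.shuffle(positive_reviews)
random.shuffle(negative_reviews)

pos_train, pos_test = split_set(positive_reviews)
neg_train, neg_test = split_set(negative_reviews)

train_set = pos_train + neg_train
test_set = pos_test + neg_test
```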

```python
from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy

model = NaiveBayesClassifier.train(train_set)
# Percentage of test examples the model labels correctly
print(100 * accuracy(model, test_set))
```
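To get a feel for what the classifier does with new text, here is a tiny self-contained example trained on hand-made feature sets (the words and labels are made up for illustration). `classify` returns the most likely label, `prob_classify` returns a probability distribution over labels, and `show_most_informative_features` prints the features that best separate the labels.

```python
from nltk.classify import NaiveBayesClassifier

# Hypothetical toy training data in the same (featureset, label) format as train_set
toy_train = [
    ({"great": 1, "phone": 1}, "pos"),
    ({"love": 1, "screen": 1}, "pos"),
    ({"awful": 1, "phone": 1}, "neg"),
    ({"terrible": 1, "battery": 1}, "neg"),
]

toy_model = NaiveBayesClassifier.train(toy_train)

new_text = {"great": 1}
print(toy_model.classify(new_text))                   # most likely label
print(toy_model.prob_classify(new_text).prob("pos"))  # estimated probability of 'pos'
toy_model.show_most_informative_features(3)
```

With the notebook's own model and helpers, classifying a Slack sentence would look like `model.classify(word_counts(clean_words(word_tokenize(sentence))))`.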

```python
import pickle

# Save the trained classifier so it can be reused without retraining
with open('sa_classifier.pickle', 'wb') as model_file:
    pickle.dump(model, model_file)
```
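A natural next step is to load the saved model and run it over the Slack sentences from earlier, in place of VADER. The helpers below (`load_classifier` and `positive_share`) are a hypothetical sketch. Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.

```python
import pickle

def load_classifier(path="sa_classifier.pickle"):
    """Load a classifier previously saved with pickle.dump."""
    with open(path, "rb") as f:
        return pickle.load(f)

def positive_share(classifier, featuresets):
    """Fraction of featuresets the classifier labels 'pos' (0.0 for an empty list)."""
    labels = [classifier.classify(fs) for fs in featuresets]
    if not labels:
        return 0.0
    return labels.count("pos") / len(labels)
```

With the notebook's variables, this could be used as `positive_share(load_classifier(), [word_counts(clean_words(word_tokenize(s))) for s in sentences])`.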