# Writeup and documentation of Sentiment Analysis (MoodSwing)
### This document explains the code behind the Twitter sentiment analysis, which uses the TextBlob module to classify tweets as positive, neutral, or negative
---
## The code:

---
### get_file_contents
```python
def get_file_contents(filename):

    try:
        with open(filename, 'r') as f:
            return f.read().strip()
    except FileNotFoundError:
        print("'%s' file not found" % filename)
```
#### This function takes a file name as input and returns the file's contents with surrounding whitespace stripped. (Here it is used to read the Twitter API keys.)
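One detail worth knowing: when the file is missing, the function prints a message and implicitly returns `None`, so the caller has to be prepared for that value. A minimal sketch of both cases, using a temporary directory (the file names and the key value are made up for illustration):

```python
import os
import tempfile


def get_file_contents(filename):
    # Same logic as above: stripped contents, or None if the file is missing
    try:
        with open(filename, 'r') as f:
            return f.read().strip()
    except FileNotFoundError:
        print("'%s' file not found" % filename)


# Write a fake key (with a trailing newline) into a temporary directory
tmp_dir = tempfile.mkdtemp()
key_path = os.path.join(tmp_dir, "consumerKey.txt")
with open(key_path, "w") as f:
    f.write("not-a-real-key\n")

key = get_file_contents(key_path)  # trailing newline is stripped
missing = get_file_contents(os.path.join(tmp_dir, "missing.txt"))  # prints a message

print(repr(key), repr(missing))
```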
---

---
### the TwitterClient
```python
import tweepy
from tweepy import OAuthHandler


class TwitterClient(object):

    def __init__(self):
        # Read the four Twitter API credentials from local text files
        consumer_key = get_file_contents("consumerKey.txt")
        consumer_secret = get_file_contents("consumerSecret.txt")
        access_token = get_file_contents("accessToken.txt")
        access_token_secret = get_file_contents("accessTokenSecret.txt")

        try:
            # Build the OAuth handler and the API object used for all requests
            self.auth = OAuthHandler(consumer_key, consumer_secret)
            self.auth.set_access_token(access_token, access_token_secret)
            self.api = tweepy.API(self.auth)
        except Exception:
            print("Error: Authentication Failed")
```
#### This class sets up the Twitter API connection and handles all the requests. Its methods take `self`, the client instance, as their first argument.
---

---
### clean_tweet
```python
def clean_tweet(self, tweet):
        # Replace @mentions, non-alphanumeric characters, and URLs with spaces, then collapse whitespace
        return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())
```
#### This method expects the text of a single tweet; `self` is the `TwitterClient` instance. It returns the text after a regular expression has removed @mentions, URLs, and every character that is not a letter, digit, or whitespace, with runs of whitespace collapsed to single spaces.
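To see what the substitution does, here is a standalone sketch that applies the same three alternatives outside the class. The helper name `clean_text` and the sample tweet are made up for illustration.

```python
import re

# The three alternatives: @mentions, any non-alphanumeric character, URLs
PATTERN = r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"


def clean_text(text):
    # Replace every match with a space, then collapse runs of whitespace
    return ' '.join(re.sub(PATTERN, " ", text).split())


sample = "@nasa Great launch today!!! https://example.com/watch #space"
cleaned = clean_text(sample)
print(cleaned)  # Great launch today space
```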
---

---
### get_tweet_sentiment
```python
def get_tweet_sentiment(self, tweet): 
        analysis = TextBlob(self.clean_tweet(tweet)) 
        # set sentiment 
        if analysis.sentiment.polarity > 0: 
            return 'positive', analysis.sentiment.polarity
        elif analysis.sentiment.polarity == 0: 
            return 'neutral', analysis.sentiment.polarity
        else: 
            return 'negative', analysis.sentiment.polarity
```
#### This method expects the text of a single tweet; `self` is the `TwitterClient` instance. It cleans the text and passes it to TextBlob, which tokenizes it and scores its sentiment. A spam filter outputs a class (spam or ham); here the output is a number between -1 and 1 called the polarity. If the polarity is greater than 0, the tweet is labeled "positive"; if it is exactly 0, "neutral"; and if it is less than 0, "negative". Because the code uses the default `TextBlob(...)` constructor, the polarity comes from TextBlob's default lexicon-based analyzer (`PatternAnalyzer`), not from Naive Bayes. TextBlob's `NaiveBayesAnalyzer` has to be requested explicitly and reports class probabilities rather than a polarity.
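The labeling rule itself does not depend on TextBlob, so it can be sketched on plain numbers. The function name `label_from_polarity` and the `neutral_band` parameter are hypothetical additions; with the default band of 0 the behavior matches the method above, while a wider band would treat scores near zero as neutral.

```python
def label_from_polarity(polarity, neutral_band=0.0):
    """Map a polarity in [-1, 1] to a label; neutral_band is a hypothetical tolerance."""
    if polarity > neutral_band:
        return 'positive'
    if polarity < -neutral_band:
        return 'negative'
    return 'neutral'


# Default band: only an exact 0 counts as neutral, as in get_tweet_sentiment
for score in (0.8, 0.0, -0.35):
    print(score, label_from_polarity(score))

# With a band of 0.1, weak scores such as 0.05 are treated as neutral
print(label_from_polarity(0.05, neutral_band=0.1))
```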
---

---
### get_tweets
```python
def get_tweets(self, user_id, count = 10000):
```
#### This method expects the `TwitterClient` instance (`self`), the screen name of the user whose timeline should be fetched, and a count of how many tweets to retrieve. (The body shown below never uses `count`; it collects everything the cursor returns.)

```python
        tweets = [] 
        tweet_dates = []  
        fetched_tweets = []
        try: 
            for status in tweepy.Cursor(self.api.user_timeline, screen_name=user_id, tweet_mode="extended").items():
                fetched_tweets.append(status)
```
#### Empty lists are created for the tweets and their attributes. Then the Twitter API is used to page through the user's timeline (at most roughly 3,200 of their most recent tweets), and each status is added to the `fetched_tweets` list.

```python
            for tweet in fetched_tweets:
                parsed_tweet = {} 
  
                parsed_tweet['text'] = tweet.full_text
                parsed_tweet['sentiment'], parsed_tweet['polarity'] = self.get_tweet_sentiment(tweet.full_text) 
                parsed_tweet['date'] = tweet.created_at
  
                if tweet.retweet_count > 0: 
                    if parsed_tweet not in tweets: 
                        tweets.append(parsed_tweet) 
                else: 
                    tweets.append(parsed_tweet) 
  
            return tweets
```
#### Each fetched tweet is then turned into a dictionary with the following keys:
* 'text': the full text of the tweet
* 'sentiment': the classification of positive, neutral, or negative (from get_tweet_sentiment)
* 'polarity': the numerical result of the sentiment classification, from -1 to 1 (from get_tweet_sentiment)
* 'date': the date the tweet was created
#### A tweet that has been retweeted is only added if an identical entry is not already in the list; all other tweets are always added. Finally, the list of tweets is returned.
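The parsing loop can be tried without network access by feeding it stand-in status objects. In this sketch, `parse_statuses` and `fake_sentiment` are hypothetical helpers, and `SimpleNamespace` objects imitate only the three attributes the loop reads (`full_text`, `created_at`, `retweet_count`).

```python
from datetime import datetime
from types import SimpleNamespace


def parse_statuses(statuses, get_sentiment):
    """Build one dict per status, skipping repeated entries for retweeted tweets."""
    tweets = []
    for status in statuses:
        parsed = {'text': status.full_text, 'date': status.created_at}
        parsed['sentiment'], parsed['polarity'] = get_sentiment(status.full_text)
        if status.retweet_count > 0:
            # Retweeted tweets may show up more than once; keep only the first copy
            if parsed not in tweets:
                tweets.append(parsed)
        else:
            tweets.append(parsed)
    return tweets


def fake_sentiment(text):
    # Stand-in for get_tweet_sentiment so that TextBlob is not required here
    return ('positive', 0.5) if 'good' in text else ('neutral', 0.0)


when = datetime(2020, 1, 1, 12, 0)
statuses = [
    SimpleNamespace(full_text='good news', created_at=when, retweet_count=3),
    SimpleNamespace(full_text='good news', created_at=when, retweet_count=3),  # duplicate
    SimpleNamespace(full_text='hello', created_at=when, retweet_count=0),
]
parsed_tweets = parse_statuses(statuses, fake_sentiment)
print(len(parsed_tweets))  # 2
```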
---


---
### main (putting it all together)
```python
def main(): 
    api = TwitterClient() 
    
    tweets = api.get_tweets(user_id = 'elonmusk', count = 10000) 
    tweet_dates = []
    tweet_polarity_points = []
    print("Tweet count: ", len(tweets))
    for tweet in tweets:
        print("Tweet: ", tweet, "  - Date: ", str(tweet['date']))
        tweet_dates.append(tweet['date'])
        tweet_polarity_points.append(tweet['polarity'])
    
    scatter_plot = plt.figure(figsize=(14, 8), dpi=80)
    plt.scatter(tweet_dates, tweet_polarity_points, s =.25, c = 'green')
    scatter_plot.show()
    scatter_plot.savefig('elon.png', dpi='figure')
    
    
    positive_tweet_sum = sum(status['sentiment'] == "positive" for status in tweets)
    neutral_tweet_sum = sum(status['sentiment'] == "neutral" for status in tweets)
    negative_tweet_sum = sum(status['sentiment'] == "negative" for status in tweets)
    
    
    print("Positive tweets: ", positive_tweet_sum, "(", ((positive_tweet_sum/len(tweets))*100), "%)")
    print("Neutral tweets: ", neutral_tweet_sum, "(", ((neutral_tweet_sum/len(tweets))*100), "%)" )
    print("Negative tweets: ", negative_tweet_sum, "(", ((negative_tweet_sum/len(tweets))*100), "%)")
```
#### The main function starts by creating a `TwitterClient` object. It then calls `get_tweets` with a user's screen name to request the tweets in their timeline (again, up to roughly 3,200). Next, it prints the number of tweets received and prints every tweet for debugging purposes. Then the matplotlib module is used to initialize a scatter plot figure and set its size. The tweet dates become the x coordinates and are paired with the polarity points (the numerical representation of sentiment from the classification process) as the y coordinates. The points are green in our example and have a size of 0.25. The figure is shown and saved as `elon.png`. Finally, the counts and percentages of positive, neutral, and negative tweets are calculated and printed.
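The percentage step divides by `len(tweets)`, which raises an error when no tweets come back. A minimal sketch of the same summary with a guard for the empty case; the helper name `sentiment_percentages` and the sample data are made up for illustration.

```python
from collections import Counter


def sentiment_percentages(tweets):
    """Return {label: (count, percent)} for the three sentiment labels."""
    counts = Counter(t['sentiment'] for t in tweets)
    total = len(tweets)
    result = {}
    for label in ('positive', 'neutral', 'negative'):
        count = counts.get(label, 0)
        # Avoid dividing by zero when the timeline is empty
        percent = (count / total) * 100 if total else 0.0
        result[label] = (count, percent)
    return result


sample = [
    {'sentiment': 'positive'},
    {'sentiment': 'positive'},
    {'sentiment': 'neutral'},
    {'sentiment': 'negative'},
]
summary = sentiment_percentages(sample)
for label, (count, percent) in summary.items():
    print(label, count, "(", percent, "%)")
```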
---