 <table><tr><td><img src="images/dbmi_logo.png" width="75" height="73" alt="Pitt Biomedical Informatics logo"></td><td><img src="images/pitt_logo.png" width="75" height="75" alt="University of Pittsburgh logo"></td></tr></table>
 
 
 # Social Media and Data Science - Part 5
 
 Done by Pengan Li
 
 
Data science modules developed by the University of Pittsburgh Biomedical Informatics Training Program with the support of the National Library of Medicine data science supplement to the University of Pittsburgh (Grant # T15LM007059-30S1). 

Developed by Harry Hochheiser, harryh@pitt.edu. All errors are my responsibility.

Done by Pengan Li, pel85@pitt.edu. Feel free to contact me if you have any questions or suggestions about my answers.

<a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/">Creative Commons Attribution-NonCommercial 4.0 International License</a>.


### Goal: Use social media posts to explore the application of text and natural language processing to see what might be learned from online interactions.

Specifically, we will retrieve, annotate, process, and interpret Twitter data on health-related issues such as smoking.

--- 
References:
* [Mining Twitter Data with Python (Part 1: Collecting data)](https://marcobonzanini.com/2015/03/02/mining-twitter-data-with-python-part-1/)
* The [Tweepy Python API for Twitter](http://www.tweepy.org/)

---

In [1]:
%matplotlib inline

import operator
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import jsonpickle
import json
import random
import tweepy
import spacy
import time
from datetime import datetime
from spacy.symbols import ORTH, LEMMA, POS

# 5.0 Introduction

This final part of our journey through social media data retrieval, annotation, natural language processing, and classification will challenge you to apply these techniques to a new problem. Specifically, you will create, annotate, and process a new data set.

# 5.0.1 Setup

As before, we start with the Tweets class and the configuration for our Twitter API connection.  We may not need this, but we'll load it in any case.

In [9]:
class Tweets:
    
    
    def __init__(self,term="",corpus_size=100):
        self.tweets={}
        if term !="":
            self.searchTwitter(term,corpus_size)
                
    def searchTwitter(self,term,corpus_size):
        searchTime=datetime.now()
        while (self.countTweets() < corpus_size):
            new_tweets = api.search(term,lang="en",tweet_mode='extended',count=corpus_size)
            for nt_json in new_tweets:
                nt = nt_json._json
                if self.getTweet(nt['id_str']) is None and self.countTweets() < corpus_size:
                    self.addTweet(nt,searchTime,term)
            time.sleep(14)
                
    def addTweet(self,tweet,searchTime,term="",count=0):
        id = tweet['id_str']
        if id not in self.tweets.keys():
            self.tweets[id]={}
            self.tweets[id]['tweet']=tweet
            self.tweets[id]['count']=0
            self.tweets[id]['searchTime']=searchTime
            self.tweets[id]['searchTerm']=term
        self.tweets[id]['count'] = self.tweets[id]['count'] +1
        
    def combineTweets(self,other):
        for otherid in other.getIds():
            tweet = other.getTweet(otherid)
            searchTerm = other.getSearchTerm(otherid)
            searchTime = other.getSearchTime(otherid)
            self.addTweet(tweet,searchTime,searchTerm)
        
    def getTweet(self,id):
        if id in self.tweets:
            return self.tweets[id]['tweet']
        else:
            return None
    
    def getTweetCount(self,id):
        return self.tweets[id]['count']
    
    def countTweets(self):
        return len(self.tweets)
    
    # return a sorted list of tuples of the form (id,count), with the occurrence counts sorted in decreasing order
    def mostFrequent(self):
        ps = []
        for t,entry in self.tweets.items():
            count = entry['count']
            ps.append((t,count))  
        ps.sort(key=lambda x: x[1],reverse=True)
        return ps
    
    # returns tweet IDs as a set
    def getIds(self):
        return set(self.tweets.keys())
    
    # save the tweets to a file
    def saveTweets(self,filename):
        json_data =jsonpickle.encode(self.tweets)
        with open(filename,'w') as f:
            json.dump(json_data,f)
    
    # read the tweets from a file 
    def readTweets(self,filename):
        with open(filename,'r') as f:
            json_data = json.load(f)
            incontents = jsonpickle.decode(json_data)   
            self.tweets=incontents
        
    def getSearchTerm(self,id):
        return self.tweets[id]['searchTerm']
    
    def getSearchTime(self,id):
        return self.tweets[id]['searchTime']
    
    def getText(self,id):
        tweet = self.getTweet(id)
        text=tweet['full_text']
        if 'retweeted_status'in tweet:
            original = tweet['retweeted_status']
            text=original['full_text']
        return text
                
    def addCode(self,id,code):
        tweet=self.getTweet(id)
        if 'codes' not in tweet:
            tweet['codes']=set()
        tweet['codes'].add(code)
        
   
    def addCodes(self,id,codes):
        for code in codes:
            self.addCode(id,code)
        
 
    def getCodes(self,id):
        tweet=self.getTweet(id)
        if 'codes' in tweet:
            return tweet['codes']
        else:
            return None
    
    # NEW -ROUTINE TO GET PROFILE
    def getCodeProfile(self):
        summary={}
        for id in self.tweets.keys():
            tweet=self.getTweet(id)
            if 'codes' in tweet:
                for code in tweet['codes']:
                    if code not in summary:
                            summary[code] =0
                    summary[code]=summary[code]+1
        sortedsummary = sorted(summary.items(),key=operator.itemgetter(0),reverse=True)
        return sortedsummary
    
    #new functions
    
    # remove every code attached to a tweet
    def clearcode(self,id):
        tweet=self.getTweet(id)
        if 'codes' in tweet:
            tweet['codes'].clear()
        
    # remove a single code from a tweet, if it is present
    def removecode(self,id,code):
        tweet=self.getTweet(id)
        if 'codes' in tweet and code in tweet['codes']:
            tweet['codes'].remove(code)
        
    # return a list of (code,count) tuples sorted by increasing count
    def freq(self):
        codes={}
        for id in self.tweets.keys():
            code=self.getCodes(id)
            if code is not None:
                for ele in code:
                    if ele not in codes:
                        codes[ele]=1
                    else:
                        codes[ele]=codes[ele]+1
        codes=sorted(codes.items(),key = lambda x:x[1])
        return codes

Put the values of your keys into these variables

In [3]:
consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''
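
As an alternative to pasting keys directly into the notebook, the credentials could be read from environment variables so that they never end up in a saved or shared copy. The following is a minimal sketch under that assumption; the function `load_twitter_keys` and the environment variable names are hypothetical and are not part of the original modules.

```python
import os

# Hypothetical environment variable names; any names would work.
ENV_NAMES = {
    'consumer_key': 'TWITTER_CONSUMER_KEY',
    'consumer_secret': 'TWITTER_CONSUMER_SECRET',
    'access_token': 'TWITTER_ACCESS_TOKEN',
    'access_secret': 'TWITTER_ACCESS_SECRET',
}

def load_twitter_keys(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in ENV_NAMES.items()}

# Demonstration with a stand-in dictionary instead of the real environment
demo_env = {name: 'dummy-' + key for key, name in ENV_NAMES.items()}
keys = load_twitter_keys(demo_env)
print(sorted(keys))
```

With real variables set, `keys = load_twitter_keys()` would return values that can be assigned to `consumer_key`, `consumer_secret`, `access_token`, and `access_secret`.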

In [4]:
from tweepy import OAuthHandler

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth)

We will also load some routines that we defined in [Part 3](SocialMedia - Part 3.ipynb):
    
1. Our routine for creating a customized NLP pipeline
2. Our routine for including tokens
3. The `filterTweetTokens` routine defined in an exercise (Without the inclusion of named entities. It will be easier to leave them out for now).

In [5]:
def getTwitterNLP():
    nlp = spacy.load('en')
    
    for word in nlp.Defaults.stop_words:
        lex = nlp.vocab[word]
        lex.is_stop = True
    
    special_case = [{ORTH: u'e-cigarette', LEMMA: u'e-cigarette', POS: u'NOUN'}]
    nlp.tokenizer.add_special_case(u'e-cigarette', special_case)
    nlp.tokenizer.add_special_case(u'E-cigarette', special_case)
    vape_case = [{ORTH: u'vape',LEMMA:u'vape',POS: u'NOUN'}]
    
    vape_spellings =[u'vap',u'vape',u'vaping',u'vapor',u'Vap',u'Vape',u'Vapor',u'Vapour']
    for v in vape_spellings:
        nlp.tokenizer.add_special_case(v, vape_case)
    def hashtag_pipe(doc):
        merged_hashtag = True
        while merged_hashtag == True:
            merged_hashtag = False
            for token_index,token in enumerate(doc):
                if token.text == '#':
                    try:
                        nbor = token.nbor()
                        start_index = token.idx
                        end_index = start_index + len(token.nbor().text) + 1
                        if doc.merge(start_index, end_index) is not None:
                            merged_hashtag = True
                            break
                    except:
                        pass
        return doc
    nlp.add_pipe(hashtag_pipe,first=True)
    return nlp

def includeToken(tok):
    val =False
    if tok.is_stop == False:
        if tok.is_alpha == True: 
            if tok.text =='RT':
                val = False
            elif tok.pos_=='NOUN' or tok.pos_=='PROPN' or tok.pos_=='VERB':
                val = True
        elif tok.text[0]=='#' or tok.text[0]=='@':
            val = True
    if val== True:
        stripped =tok.lemma_.lower().strip()
        if len(stripped) ==0:
            val = False
        else:
            val = stripped
    return val

def filterTweetTokens(tokens):
    filtered=[]
    for t in tokens:
        inc = includeToken(t)
        if inc != False:
            filtered.append(inc)
    return filtered

Finally, we will include some additional modules from Scikit-Learn:

In [6]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.base import TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.stop_words import ENGLISH_STOP_WORDS
from sklearn.metrics import accuracy_score
import string
import re

Now we're ready for an exercise.

Identifying the source of social media comments might be an important step in the process of interpreting a large corpus. Continuing with our example of smoking and vaping, it might be interesting to compare tweets from users (people who are talking about their own personal use) to tweets from those who might be either promoting vaping (manufacturers, sponsors, etc.) or warning about the dangers of vaping (physicians, researchers, public health agencies, etc.).

A team of researchers at RTI International tackled this problem in a 2017 paper, [Classification of Twitter Users Who Tweet About E-Cigarettes](http://publichealth.jmir.org/2017/3/e63/). Annice Kim and colleagues collected tweets and attributed them to individuals, enthusiasts, informed agencies (news media or health community), marketers, or spammers.

Your goal here is to collect a small data set and to attempt a smaller version of this challenge. Specifically, we will try to collect preliminary data for a classifier capable of identifying tweets from users of e-cigarettes vs. others.  Using any of the code found in Parts 1-4, complete these steps:

1. Run some searches for terms like 'e-cig', 'e-cigarette', 'vape' and 'vaping'. Collect a corpus of 200-300 or more tweets. You might want to save each of these result sets in files.

2. Combine these tweets into one large collection using the 'Tweets' class listed above. Save the results in a file.

3. Annotate 50 of these tweets as pertaining to either 'individual' or 'non-individual'. Be sure that you do at least a few of the tweets from each of the original sets. One way to do this might be to randomize the tweets. Save the annotated results in a file. 

4. Review the distribution. Is it close to even? If not, annotate more.

5. Take your annotated tweets and split them into train (80%) and test (20%) sets.  Process the train data and build a model (based on a TfIdf Vectorizer and an SVM). Evaluate the model on the test data set.

6. Test your model on the remaining tweets. What does your result look like?

7. Review some of the data to identify opportunities for improvement - how might you make these models better?

8. Reflect on the reproducibility and the reusability of the code: what should be done to make these tools easier to apply to other datasets?



----
*ANSWER FOLLOWS - insert answer here*

Step 1-2

In [10]:
tweets1 = Tweets("electronic cigarette",19)

In [11]:
tweets2 = Tweets("e-cigarette",49)

In [13]:
tweets3 = Tweets("vaping",100)

In [16]:
tweets4 = Tweets("vaporizer",14)

In [17]:
tweets5 = Tweets("e-cig",31)

In [19]:
tweets=tweets1;

In [20]:
tweets.combineTweets(tweets2);
tweets.combineTweets(tweets3);
tweets.combineTweets(tweets4);
tweets.combineTweets(tweets5);

In [21]:
tweets.countTweets()

211

In [22]:
tweets.saveTweets('e-cig.json')

Step 3

Note: I discovered that some tweets with different ids can have identical text. To handle such cases, my solution ensures that the 50 tweets all differ in both id and text.

In [32]:
# draw 100 candidate ids, then keep the first 50 whose text has not been seen yet
ids=random.sample(list(tweets.getIds()),100)
working=[]
text=[]
for tid in ids:
    tx=tweets.getText(tid)
    if tx not in text:
        working.append(tid)
        text.append(tx)
    if len(working)==50:
        break
for i in range(len(working)):
    print("Tweet %d: "%(i))
    print(tweets.getText(working[i]))
    print("Annotation: ",tweets.getCodes(working[i]))

Tweet 0: 
The big e-cigarette debate: do they do more harm than good? #LungCancerAwarenessMonth https://t.co/yeMKSYNNzd
Annotation:  None
Tweet 1: 
Today: @US_FDA public hearing: Eliminating Youth Electronic Cigarette and Other Tobacco Product Use: The Role for Drug Therapies: https://t.co/n3NdHIB2Dm
Annotation:  None
Tweet 2: 
I have a lot of strong feelings and questions about this, but I wonder why this is just on e-cigarettes rather than on cigarettes and cigarillos (the significantly more addictive and harmful source of nicotine) @SGottliebFDA @FDATobacco 

https://t.co/063OcJYfqW
Annotation:  None
Tweet 3: 
@TIME They’re also vaping, which drives me nuts
Annotation:  None
Tweet 4: 
New episode is up and running! Jonny M returns! Jim and Jon settle a debate that the @WrestlingDorks had on their show and we talk about local sandwich shops, our #JustSayNOvember challenge, #vaping
and alot more. Enjoy. 

#nophonypodcastnetwork

https://t.co/9F1RnsWKFV
Annotation:  None
Tweet 5: 
http

In [33]:
idv=[2,3,7,8,9,10,11,13,18,19,20,22,26,27,28,32,36,37,38,39,40,42,43,45,47,49];
for i in range(len(working)):
    if i in idv:
        tweets.addCodes(working[i],['individual']);
    else:
        tweets.addCodes(working[i],['non-individual']);

In [34]:
for i in range(len(working)):
    print("Tweet %d: "%(i))
    print(tweets.getText(working[i]))
    print("Annotation: ",tweets.getCodes(working[i]))

Tweet 0: 
The big e-cigarette debate: do they do more harm than good? #LungCancerAwarenessMonth https://t.co/yeMKSYNNzd
Annotation:  {'non-individual'}
Tweet 1: 
Today: @US_FDA public hearing: Eliminating Youth Electronic Cigarette and Other Tobacco Product Use: The Role for Drug Therapies: https://t.co/n3NdHIB2Dm
Annotation:  {'non-individual'}
Tweet 2: 
I have a lot of strong feelings and questions about this, but I wonder why this is just on e-cigarettes rather than on cigarettes and cigarillos (the significantly more addictive and harmful source of nicotine) @SGottliebFDA @FDATobacco 

https://t.co/063OcJYfqW
Annotation:  {'individual'}
Tweet 3: 
@TIME They’re also vaping, which drives me nuts
Annotation:  {'individual'}
Tweet 4: 
New episode is up and running! Jonny M returns! Jim and Jon settle a debate that the @WrestlingDorks had on their show and we talk about local sandwich shops, our #JustSayNOvember challenge, #vaping
and alot more. Enjoy. 

#nophonypodcastnetwork

https://

In [37]:
cidv=0;cnidv=0;
for i in range(len(working)):
    if "non-individual" in tweets.getCodes(working[i]):
        cnidv=cnidv+1;
    else:
        cidv=cidv+1;
print("Numbers of individual tweets in the random set:\n",cidv);
print("Numbers of non-individual tweets in the random set:\n",cnidv);

Numbers of individual tweets in the random set:
 26
Numbers of non-individual tweets in the random set:
 24


Fortunately, the random set has 24 non-individual tweets and 26 individual tweets. This is close to even, so there is no need to annotate more tweets to balance the set.

In [38]:
tweets.saveTweets("idv-annotated.json")

Step 4-1 and 4-2

In [58]:
def flattenTweets(tweets):
    flat=[]
    for i in tweets.getIds():
        if tweets.getCodes(i)!=None:
            txt = tweets.getText(i);
            cat = list(tweets.getCodes(i))[0];
            pair =(txt,cat)
            flat.append(pair)
    return flat

In [59]:
flat=flattenTweets(tweets)
flat

[('The FireFly 2’s Convection Technology Delivers Phenomenal Flavor\n🍁\n🍁\n🍁\n#fireflyvaporizer #herbalvaporizer #dabvaporizer #vaporizer #vape #canadiancannabis #cannabiscommunity #bcbud #weedmodel #abbotsford #bckush #canadianstoners #vancouver #firefly2 #canadianvaporizers https://t.co/qDlzKwFC11',
  'non-individual'),
 ('Dr. Camenga #AAP18. 7 to 10 teens were exposed to e-cig advertisements in 2016',
  'non-individual'),
 ('Get the vape that puts YOU in control! Try one today! Orochi N200 200W Box Mod Sub-Ohm VV/VW Battery w/Temp Control https://t.co/ua74BVIlX2 .@Cuecig  #vape #ejuice #vapesale #sale #vaping #vapelife #vapefam #ecig #today #free https://t.co/kQKrXmWBY1',
  'non-individual'),
 ('S O U R  B L A S T  .  C O M I N G  S O O N 💨💨 💨 #iblissvapor #vapors #vape #vaping💨 #vapingmurah #ibliss#vapingsavedmylife #liquid #liquidmurah#liquids #liquidlokal #vapeonthis #vapesafe#vapecapitol #vapelyfe #vapetorontoo#vapingsaveslives #vapeforever #ecigs https://t.co/RCZ2ye3iRH',
  'no

In [60]:
def getTestTrainSplit(pairs,splitFactor=0.8):
    random.shuffle(pairs)
    split=int(len(pairs)*splitFactor)
    train=pairs[:split]
    test =pairs[split:]
    return train,test

train,test=getTestTrainSplit(flat);
print(str(len(train))+ " "+str(len(test)))

40 10


In [51]:
def performanceCal(testCats,preds,truecat):
    # testCats: true labels; preds: predicted labels; truecat: the positive class
    tp=0;fp=0;tn=0;fn=0
    for actual,pred in zip(testCats,preds):
        if pred==truecat and actual==truecat:
            tp=tp+1
        elif pred==truecat:
            # predicted as truecat, but actually another category
            fp=fp+1
        elif actual==truecat:
            # actually truecat, but predicted as another category
            fn=fn+1
        else:
            tn=tn+1
    # guard against empty denominators
    precision=tp/(tp+fp) if (tp+fp)>0 else 0.0
    recall=tp/(tp+fn) if (tp+fn)>0 else 0.0
    result=(precision,recall)
    print("Precision (predicting category %s): %f"%(truecat,precision))
    print("Recall    (predicting category %s): %f"%(truecat,recall))
    return result

In [54]:
# build the spaCy pipeline once instead of reloading the model for every tweet
twitterNLP=getTwitterNLP()

def tokenizeText(text):
    tokens=twitterNLP(text)
    return filterTweetTokens(tokens)

In [61]:
from sklearn.feature_extraction.text import TfidfVectorizer
trainTexts,trainCats=zip(*train)
testTexts,testCats=zip(*test)
vectorizer= TfidfVectorizer(tokenizer=tokenizeText,preprocessor=lambda x: x)
clf = LinearSVC()
pipe = Pipeline([('vectorizer', vectorizer), ('clf', clf)])
pipe.fit(trainTexts,trainCats)
preds = pipe.predict(testTexts)

In [64]:
performance=performanceCal(testCats,preds,"individual")
acc=accuracy_score(testCats, preds)
print("Accuracy  (predicting category %s): %f"%("individual",acc))

Precision (predicting category individual): 0.750000
Recall    (predicting category individual): 0.600000
Accuracy  (predicting category individual): 0.700000


In [65]:
performance=performanceCal(testCats,preds,"non-individual")
acc=accuracy_score(testCats, preds)
print("Accuracy  (predicting category %s): %f"%("non-individual",acc))

Precision (predicting category non-individual): 0.666667
Recall    (predicting category non-individual): 0.800000
Accuracy  (predicting category non-individual): 0.700000


The accuracy is only 70%. To check whether this result is a fluke of one particular split, I repeat the experiment with fresh random splits:

In [68]:
flat=flattenTweets(tweets)
train,test=getTestTrainSplit(flat)
trainTexts,trainCats=zip(*train)
testTexts,testCats=zip(*test)
vectorizer= TfidfVectorizer(tokenizer=tokenizeText,preprocessor=lambda x: x)
clf = LinearSVC()
pipe = Pipeline([('vectorizer', vectorizer), ('clf', clf)])
pipe.fit(trainTexts,trainCats)
preds = pipe.predict(testTexts)

In [69]:
performance=performanceCal(testCats,preds,"individual")
acc=accuracy_score(testCats, preds)
print("Accuracy  (predicting category %s): %f"%("individual",acc))

Precision (predicting category individual): 0.857143
Recall    (predicting category individual): 0.857143
Accuracy  (predicting category individual): 0.800000


In [71]:
flat=flattenTweets(tweets)
train,test=getTestTrainSplit(flat)
trainTexts,trainCats=zip(*train)
testTexts,testCats=zip(*test)
vectorizer= TfidfVectorizer(tokenizer=tokenizeText,preprocessor=lambda x: x)
clf = LinearSVC()
pipe = Pipeline([('vectorizer', vectorizer), ('clf', clf)])
pipe.fit(trainTexts,trainCats)
preds = pipe.predict(testTexts)
performance=performanceCal(testCats,preds,"individual")
acc=accuracy_score(testCats, preds)
print("Accuracy  (predicting category %s): %f"%("individual",acc))

Precision (predicting category individual): 0.714286
Recall    (predicting category individual): 0.714286
Accuracy  (predicting category individual): 0.600000


The accuracy reaches 80% and 60% in these two extra runs, each with a different random split into train and test sets. With only 10 test tweets, every misclassified tweet moves the accuracy by 10 percentage points, so large swings between runs are expected. After running several times and averaging, the results show that the model does have some classification ability, but it is far from satisfactory: it rarely reaches 90%.

However, I consider this reasonable, since it is sometimes hard even for me to classify a tweet as individual or non-individual from the text alone. When I annotated the tweets, I sometimes had to open the tweet and read the profile of the account to figure out whether it came from an individual, which means relying on extra information that the model never sees.
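
The idea of running several times and averaging can be made explicit with a small helper. This is only a sketch: `repeated_holdout` and `majority_baseline` are hypothetical names, and the toy baseline stands in for the TF-IDF and SVM pipeline so that the example runs without spaCy or Scikit-Learn.

```python
import random
import statistics

def repeated_holdout(pairs, fit_predict, n_runs=20, split_factor=0.8, seed=0):
    """Return mean and standard deviation of accuracy over random splits.

    pairs is a list of (text, label) tuples. fit_predict is a hypothetical
    callable taking (train_pairs, test_texts) and returning predicted labels.
    """
    rng = random.Random(seed)  # fixed seed so the experiment can be repeated
    accuracies = []
    for _ in range(n_runs):
        shuffled = list(pairs)  # copy, so the caller's list keeps its order
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * split_factor)
        train, test = shuffled[:cut], shuffled[cut:]
        test_texts = [text for text, _ in test]
        test_labels = [label for _, label in test]
        preds = fit_predict(train, test_texts)
        correct = sum(1 for p, t in zip(preds, test_labels) if p == t)
        accuracies.append(correct / len(test_labels))
    return statistics.mean(accuracies), statistics.stdev(accuracies)

# Toy stand-in for a real model: always predict the most common training label.
def majority_baseline(train_pairs, test_texts):
    labels = [label for _, label in train_pairs]
    majority = max(sorted(set(labels)), key=labels.count)
    return [majority for _ in test_texts]

toy = [("tweet %d" % i, "individual" if i % 2 else "non-individual")
       for i in range(50)]
mean_acc, sd_acc = repeated_holdout(toy, majority_baseline)
print("mean accuracy %.3f, standard deviation %.3f" % (mean_acc, sd_acc))
```

Reporting the standard deviation next to the mean shows how much a single 40/10 split can mislead.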

Step 4-3

One effective way to improve the model may be to enlarge the dataset (both train and test sets) and to annotate it more carefully, reading the account profile behind each tweet to learn the identity of the author instead of annotating from the text alone. With a large enough train set, a model can pick up complex patterns and features that a human reader would not notice.

Adding a new feature should also help: the profile description of the user who sent each tweet, in addition to the tweet text. In Part 1, we defined a getAuthors function that retrieves the authors of tweets. Both the id and the user description would be useful.
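
A minimal sketch of the profile idea follows. It assumes the tweet dictionaries follow the Twitter API v1.1 layout already used by `getText` (`full_text`, `retweeted_status`) and that the author's profile is available under `tweet['user']['description']`; the function name `text_with_profile` is hypothetical.

```python
def text_with_profile(tweet, separator=" "):
    """Join the tweet text with the posting account's profile description.

    Assumes a Twitter API v1.1 style dictionary. For retweets the original
    text is used, as in getText, but the profile is that of the account
    that posted the tweet we collected.
    """
    original = tweet.get('retweeted_status', tweet)
    text = original.get('full_text', '')
    description = (tweet.get('user') or {}).get('description') or ''
    if not description:
        return text
    return text + separator + description

# Made-up example tweet, for illustration only
example = {
    'full_text': 'RT @shop: Try our new mod!',
    'retweeted_status': {'full_text': 'Try our new mod!'},
    'user': {'description': 'Vape shop in Pittsburgh'},
}
print(text_with_profile(example))
```

The combined string could then be passed to the vectorizer in place of the tweet text alone. Whether this helps would need to be checked on annotated data.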

Step 4-4

How can we make these tools easier to apply to other datasets? The simplest way is to turn every step into a function, and then form a pipeline by defining a new function that calls them in sequence.

I have already written some earlier functions in a more general way, such as the plotData function in exercise 4-4 (the challenge part), which can plot different data with different labels.
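
As a rough illustration of forming a pipeline from functions, the sketch below chains steps so that each one receives the output of the previous one. The helper `chain` and the three toy steps are hypothetical stand-ins, not functions from Parts 1-4.

```python
def chain(*steps):
    """Return a function that feeds its input through each step in order."""
    def run(data):
        for step in steps:
            data = step(data)
        return data
    return run

# Hypothetical stand-ins for notebook stages, working on (text, label) pairs
def drop_duplicate_texts(pairs):
    seen = set()
    unique = []
    for text, label in pairs:
        if text not in seen:
            seen.add(text)
            unique.append((text, label))
    return unique

def lowercase_texts(pairs):
    return [(text.lower(), label) for text, label in pairs]

def count_labels(pairs):
    counts = {}
    for _, label in pairs:
        counts[label] = counts.get(label, 0) + 1
    return counts

profile = chain(drop_duplicate_texts, lowercase_texts, count_labels)
sample = [("Vape on", "individual"),
          ("Vape on", "individual"),
          ("Sale today", "non-individual")]
print(profile(sample))
```

Because each step is an ordinary function, a different dataset only needs a different first step, and the rest of the chain can be reused.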

In [None]:
def plotData(par,data,yaxis):
    plt.rcParams['figure.figsize'] = (14.0, 14.0);
    plt.plot(par, data);
    plt.xlabel("Ratio of training data",fontsize=14);
    plt.ylabel(yaxis,fontsize=14);
    plt.title("%s with different proportions of test and train data"%yaxis,fontsize=15);
    for a, b in zip(par,data):
        if a==0.1:
            plt.text(a, b,"(%.1f, %.4f)"%(a,b),va="bottom",ha="left",fontsize=13);
        else:
            plt.text(a, b,"(%.1f, %.4f)"%(a,b),va="top",ha="right",fontsize=13);
    plt.show()

This is a good example of code reusability. Because yaxis is a function parameter and the labels are built with %s, I can plot all three graphs (accuracy, recall, and precision) with the same function.

The performanceCal function in this part is also a relatively reusable function:

In [None]:
def performanceCal(testCats,preds,truecat):
    # testCats: true labels; preds: predicted labels; truecat: the positive class
    tp=0;fp=0;tn=0;fn=0
    for actual,pred in zip(testCats,preds):
        if pred==truecat and actual==truecat:
            tp=tp+1
        elif pred==truecat:
            # predicted as truecat, but actually another category
            fp=fp+1
        elif actual==truecat:
            # actually truecat, but predicted as another category
            fn=fn+1
        else:
            tn=tn+1
    # guard against empty denominators
    precision=tp/(tp+fp) if (tp+fp)>0 else 0.0
    recall=tp/(tp+fn) if (tp+fn)>0 else 0.0
    result=(precision,recall)
    print("Precision (predicting category %s): %f"%(truecat,precision))
    print("Recall    (predicting category %s): %f"%(truecat,recall))
    return result

As shown above, this function calculates precision and recall for whichever category you choose to treat as the positive class. No extra function has to be defined if you would like to change that category.

These two examples suggest another way to improve the reproducibility and reusability of code: add more parameters to functions, so that users have more freedom to customize the behavior to their own needs. This is what I usually do when solving problems with code. Many functions in the previous parts of my homework are more reusable for this reason.
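
To illustrate the point about parameters, here is a sketch of a more general metric helper that reports precision and recall for every category in one call. The name `per_class_metrics` and its `classes` and `verbose` parameters are hypothetical, and the labels in the example are toy data, not results from this notebook.

```python
def per_class_metrics(true_labels, pred_labels, classes=None, verbose=False):
    """Return {category: (precision, recall)} for each category.

    classes defaults to every label seen in either list; verbose controls
    whether the values are also printed.
    """
    if classes is None:
        classes = sorted(set(true_labels) | set(pred_labels))
    report = {}
    for cls in classes:
        paired = list(zip(true_labels, pred_labels))
        tp = sum(1 for t, p in paired if t == cls and p == cls)
        fp = sum(1 for t, p in paired if t != cls and p == cls)
        fn = sum(1 for t, p in paired if t == cls and p != cls)
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        report[cls] = (precision, recall)
        if verbose:
            print("%s: precision %f, recall %f" % (cls, precision, recall))
    return report

# Toy labels, for illustration only
truth = ['individual'] * 5 + ['non-individual'] * 5
preds = ['individual', 'individual', 'individual',
         'non-individual', 'non-individual',
         'individual', 'non-individual', 'non-individual',
         'non-individual', 'non-individual']
report = per_class_metrics(truth, preds, verbose=True)
```

Returning a dictionary instead of only printing lets later code reuse the numbers, for example to average them across several random splits.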

*END ANSWER*

---