# **Analytical Comparison Between CEO Concerns**


> ## _Author: Ronald Washington III_ 

> > Our society is currently run by some of the world's largest and smartest technology
companies, and each company's vision and impact on the world are shaped by the views of its CEO.
But how do the views of these CEOs differ? Which topics are they consistently concerned with and
voicing opinions about? In this study we collect tweets from selected CEOs, determine whether we
can correctly identify each tweet's originator, and extract the topic of each tweet to build a
collection of the concerns each CEO has. The analysis can then be extended by calculating the
distance between the issues the CEOs talk about, which indicates how closely related their
concerns are.

> > A classification algorithm will support our analysis of the topics that CEOs primarily tweet
about. The algorithm used in this research is the Support Vector Machine (SVM), which is known to
be useful for image classification, bioinformatics, and text categorization. SVMs will be used to
determine which CEO a tweet belongs to, and Latent Dirichlet Allocation (LDA) will be used to
extract the prominent topics among the classified tweets. By combining these two techniques, we
aim to classify tweets and find the common topics discussed among CEOs. By the end of this
research we attempt to characterize each CEO's topics, the similarities and differences between
them, and the number of times a specific topic is tweeted.
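
To make the idea of a "distance between the issues" concrete, here is a minimal sketch that assumes each CEO's tweets have already been summarized as a probability distribution over topics (for example, by LDA). The topic labels, the two example distributions, and the choice of Jensen-Shannon distance are illustrative assumptions, not results or methods taken from this study.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence: 0 for identical, 1 for disjoint distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))


# Hypothetical topic shares for two CEOs over the topics
# ["technology", "philanthropy", "hiring"]; the numbers are made up.
ceo_a = [0.7, 0.2, 0.1]
ceo_b = [0.1, 0.3, 0.6]

print(jensen_shannon_distance(ceo_a, ceo_a))
print(jensen_shannon_distance(ceo_a, ceo_b))
```

The distance is symmetric and bounded between 0 and 1, which makes values comparable across pairs of CEOs.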


### Imported Libraries 

In [1]:
import keys as k
import pandas as pd
import webbrowser
import requests
import bs4
import importlib
import tweepy
import json 
from requests_oauthlib import OAuth1Session
importlib.reload(k)

<module 'keys' from 'C:\\Users\\Bloody Dachi\\Documents\\CS_401\\Final_Project\\CEO-Topic-Classifier-Extractor\\keys.py'>

### Preparing/Gathering Authorization for Twitter Access

In [2]:
# Load the Twitter credentials stored in keys.py
twitter = k.twitter
client_key = twitter["client_key"]
client_secret = twitter["client_secret"]
resource_owner_key = twitter['resource_owner_key']
resource_owner_secret = twitter['resource_owner_secret']
protected_url = 'https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=jlist'

# OAuth 1 session for direct requests to the REST API
session = OAuth1Session(client_key,
                        client_secret=client_secret,
                        resource_owner_key=resource_owner_key,
                        resource_owner_secret=resource_owner_secret)

# tweepy uses the same four credentials under its own names
consumer_key = client_key
consumer_secret = client_secret
access_token = resource_owner_key
access_token_secret = resource_owner_secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# wait_on_rate_limit makes tweepy sleep instead of failing when a rate limit is hit
api = tweepy.API(auth, wait_on_rate_limit=True)
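
The cell above expects `keys.py` to expose a dictionary named `twitter` with four entries. A minimal sketch of that layout, with a small validation helper, might look like the following. The placeholder values and the helper name `missing_credentials` are assumptions for illustration, and real credentials should never be committed to the repository.

```python
# Hypothetical contents of keys.py: placeholder strings, not real credentials.
twitter = {
    "client_key": "YOUR-CONSUMER-KEY",
    "client_secret": "YOUR-CONSUMER-SECRET",
    "resource_owner_key": "YOUR-ACCESS-TOKEN",
    "resource_owner_secret": "YOUR-ACCESS-TOKEN-SECRET",
}

REQUIRED_FIELDS = ("client_key", "client_secret",
                   "resource_owner_key", "resource_owner_secret")


def missing_credentials(credentials):
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not credentials.get(field)]


missing = missing_credentials(twitter)
if missing:
    raise ValueError(f"keys.py is missing: {', '.join(missing)}")
```

Checking the dictionary up front gives a clearer error than a `KeyError` raised halfway through the authorization cell.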

# Developing a list of CEOs

> Within this study we will be utilizing a list of CEOs provided by Valeria Maltoni (@ConversationAge), who created a member group/list of CEOs who use Twitter.
- https://twitter.com/ConversationAge/lists/ceos/members?lang=en

In [3]:

group_name = 'ceos'
owner = 'ConversationAge'


def get_list_members(api, owner, group_name):
    members = []
    # without this you only get the first 20 list members
    for person in tweepy.Cursor(api.list_members, owner, group_name).items():
        number_tweets = person.statuses_count
        if number_tweets>=1000:
            members.append(person)
    # create a list containing all usernames
    return [ m.screen_name for m in members ]
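
The comment about "the first 20 list members" refers to pagination: the list endpoint returns members one page at a time, together with a cursor pointing at the next page, and `tweepy.Cursor` follows those cursors automatically. The sketch below imitates that loop offline. `fetch_page`, the fake member names, and their tweet counts are hypothetical stand-ins, not tweepy or Twitter APIs.

```python
from types import SimpleNamespace

# Fake data: 45 members whose tweet counts alternate around the 1000 threshold.
FAKE_MEMBERS = [
    SimpleNamespace(screen_name=f"ceo_{n}", statuses_count=500 + 1000 * (n % 2))
    for n in range(45)
]
PAGE_SIZE = 20


def fetch_page(cursor):
    """Hypothetical stand-in for one API request: returns (members, next_cursor).

    A next_cursor of 0 signals that there are no more pages.
    """
    start = 0 if cursor == -1 else cursor
    end = start + PAGE_SIZE
    next_cursor = end if end < len(FAKE_MEMBERS) else 0
    return FAKE_MEMBERS[start:end], next_cursor


def iter_members():
    """Follow next_cursor values until the last page, yielding every member."""
    cursor = -1  # -1 requests the first page
    while cursor != 0:
        page, cursor = fetch_page(cursor)
        yield from page


# Same filter as get_list_members: keep accounts with at least 1000 tweets.
active = [m.screen_name for m in iter_members() if m.statuses_count >= 1000]
print(len(list(iter_members())), len(active))
```

Stopping after the first `fetch_page` call would return only 20 of the 45 fake members, which is the situation the original comment warns about.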

In [4]:
len(get_list_members(api,owner,group_name))

655

In [5]:
CEO_list = pd.DataFrame()
CEO_list['CEO_Twitter_Accounts'] = get_list_members(api,owner,group_name)
CEO_list.head(5)

| | CEO_Twitter_Accounts |
|---|---|
| 0 | DrJudyMonroe |
| 1 | JillStorey2020 |
| 2 | co_opcloud |
| 3 | PaulPolman |
| 4 | KenLingad |


In [6]:
len(CEO_list)

655

In [7]:
import collections
import time

PrivateList = []
TweetCountList = []
t0 = time.time()

for screen_name in CEO_list['CEO_Twitter_Accounts']:
    # Fetch each profile once and read both fields from it.
    # get_list_members already kept only accounts with >= 1000 tweets,
    # so every account is recorded and the lists stay aligned with CEO_list.
    x = api.get_user(screen_name)
    TweetCountList.append(x.statuses_count)
    PrivateList.append(x.protected)
    #print(x.screen_name,x.protected)
t1 = time.time()
total = ((t1-t0)/60)
print("Time to Run: ", total)
counter = collections.Counter(PrivateList)
print("Count of Non-Private Versus Private Twitter Accounts: ", counter)

Time to Run:  16.520678122838337
Count of Non-Private Versus Private Twitter Accounts:  Counter({False: 651, True: 4})


In [8]:
se = pd.Series(PrivateList)
te = pd.Series(TweetCountList)
CEO_list['Private_Indicator'] = se.values
CEO_list['Tweet_Count'] = te.values

In [9]:
CEO_list[25:30]

| | CEO_Twitter_Accounts | Private_Indicator | Tweet_Count |
|---|---|---|---|
| 25 | techUKCEO | False | 3274 |
| 26 | morganberman | False | 1553 |
| 27 | jane_knows | False | 1373 |
| 28 | limouris | False | 2192 |
| 29 | Monique_Villa | False | 1532 |


In [10]:
# Keep only the public (non-protected) accounts
CEO_list_cleaned = CEO_list[CEO_list['Private_Indicator'] != True]

In [11]:
len(CEO_list_cleaned)

651

In [12]:
CEO_list_cleaned[24:222]

| | CEO_Twitter_Accounts | Private_Indicator | Tweet_Count |
|---|---|---|---|
| 24 | HaniSFarsi | False | 1109 |
| 25 | techUKCEO | False | 3274 |
| 26 | morganberman | False | 1553 |
| 27 | jane_knows | False | 1373 |
| 28 | limouris | False | 2192 |
| 29 | Monique_Villa | False | 1532 |
| 30 | DelMonteLouis1 | False | 3639 |
| 31 | AbdelQaader | False | 29153 |
| 32 | CarolynAHardy | False | 9181 |
| 33 | heylizelle | False | 19947 |


# Gathering Tweets

In [13]:
import time

import tweepy


def tweetCollector(dataframe):
    """
    Collects the tweets of every CEO in `dataframe` (a Series of screen names)
    and writes them all to a single CSV file.
    """
    function_start = time.time()
    list_tweets = []
    name = []
    user_name = []
    count = 0
    for i in dataframe:
        # Retweets are excluded; 'extended' mode returns the untruncated text
        for page in tweepy.Cursor(api.user_timeline, screen_name='@'+i, include_rts=False, tweet_mode='extended').pages(100):
            for status in page:
                list_tweets.append(status._json['full_text'])
                name.append(status._json['user']['name'])
                user_name.append(status._json['user']['screen_name'])

        count += 1
        print("CEO #: ", count, "CEO Name: ", i)

    # Build the DataFrame once, after all tweets have been collected,
    # instead of rebuilding it for every single tweet
    DataSet = pd.DataFrame({'Tweet': list_tweets,
                            'CEO_Full_Name': name,
                            'CEO_User_Name': user_name})
    # One CSV for all the CEOs in this batch
    DataSet.to_csv("Collection_CEO_TWEETS.csv")

    function_end = time.time()
    total_function_time = ((function_end-function_start)/60)
    print("Time to Complete: ", total_function_time)

    return DataSet
#tweetCollector(CEO_list_cleaned['CEO_Twitter_Accounts'][:50]) 1
#tweetCollector(CEO_list_cleaned['CEO_Twitter_Accounts'][50:99])2
#tweetCollector(CEO_list_cleaned['CEO_Twitter_Accounts'][99:149])3
#tweetCollector(CEO_list_cleaned['CEO_Twitter_Accounts'][149:199])4
#tweetCollector(CEO_list_cleaned['CEO_Twitter_Accounts'][199:249])5
#tweetCollector(CEO_list_cleaned['CEO_Twitter_Accounts'][249:299])6
#tweetCollector(CEO_list_cleaned['CEO_Twitter_Accounts'][299:349])7
#tweetCollector(CEO_list_cleaned['CEO_Twitter_Accounts'][349:399])8
tweetCollector(CEO_list_cleaned['CEO_Twitter_Accounts'][399:449])
#tweetCollector(CEO_list_cleaned['CEO_Twitter_Accounts'][449:499])
#tweetCollector(CEO_list_cleaned['CEO_Twitter_Accounts'][499:549])
#tweetCollector(CEO_list_cleaned['CEO_Twitter_Accounts'][549:599])
#tweetCollector(CEO_list_cleaned['CEO_Twitter_Accounts'][599:])

CEO #:  1 CEO Name:  databrett
CEO #:  2 CEO Name:  ThomRainer
CEO #:  3 CEO Name:  scottharrison
CEO #:  4 CEO Name:  Brand_Crowd
CEO #:  5 CEO Name:  juliahartz
CEO #:  6 CEO Name:  bhargreaves
CEO #:  7 CEO Name:  brianshin
CEO #:  8 CEO Name:  jonmacdonald
CEO #:  9 CEO Name:  richr
CEO #:  10 CEO Name:  davidadler
CEO #:  11 CEO Name:  StandardofTrust
CEO #:  12 CEO Name:  fspeiser
CEO #:  13 CEO Name:  simonharrow
CEO #:  14 CEO Name:  rafat
CEO #:  15 CEO Name:  GumboShowJoe
CEO #:  16 CEO Name:  GayGaddis
CEO #:  17 CEO Name:  RobVandenberg
CEO #:  18 CEO Name:  DrewNeisser
CEO #:  19 CEO Name:  entwistletx
CEO #:  20 CEO Name:  Schwartzie14
CEO #:  21 CEO Name:  jradoff
CEO #:  22 CEO Name:  samdecker
CEO #:  23 CEO Name:  TerezaN
CEO #:  24 CEO Name:  TinaHui
CEO #:  25 CEO Name:  tombed
CEO #:  26 CEO Name:  SueMarks
CEO #:  27 CEO Name:  rsingh68
CEO #:  28 CEO Name:  Jon_Ferrara
CEO #:  29 CEO Name:  r2rothenberg
CEO #:  30 CEO Name:  alarno
CEO #:  31 CEO Name:  rebrivved

| | Tweet | CEO_Full_Name | CEO_User_Name |
|---|---|---|---|
| 0 | I just published The Entrepreneur’s Essentials... | Brett Hurt | databrett |
| 1 | Cool event happening now. Just in time for Ch... | Brett Hurt | databrett |
| 2 | That would include us! Happy Chanukah, friend... | Brett Hurt | databrett |
| 3 | Another real honor for us to help, thanks for ... | Brett Hurt | databrett |
| 4 | A real honor for us to help, truly! https://t.... | Brett Hurt | databrett |
| 5 | This! https://t.co/7gJUEAAI6M | Brett Hurt | databrett |
| 6 | Agreed https://t.co/ihM618BoE8 | Brett Hurt | databrett |
| 7 | So much potential, we loved it too! Thanks fo... | Brett Hurt | databrett |
| 8 | Looking forward to keynoting this on 12/6! htt... | Brett Hurt | databrett |
| 9 | @RealfoodKev @lochhead Thanks for the kind sho... | Brett Hurt | databrett |
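
The commented-out calls in the cell above process the account list in hand-written slices of about 50 accounts. As a sketch, the same boundaries could be generated instead of typed. `batch_bounds` is a hypothetical helper, and 651 is the length of `CEO_list_cleaned` reported earlier.

```python
def batch_bounds(total, size):
    """Return (start, stop) index pairs that cover range(total) without gaps or overlaps."""
    return [(start, min(start + size, total)) for start in range(0, total, size)]


bounds = batch_bounds(651, 50)
print(len(bounds))
print(bounds[0], bounds[-1])

# Each pair could then drive one collection run, for example:
# for start, stop in bounds:
#     tweetCollector(CEO_list_cleaned['CEO_Twitter_Accounts'][start:stop])
```

Because `tweetCollector` always writes to `Collection_CEO_TWEETS.csv`, each batch would need its own output file name (or the file would have to be renamed between runs) to avoid overwriting earlier batches.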


- Figure out how to get all the tweets for each CEO
- Put all tweets together into a large corpus (Each CEO gets ~3200 tweets and here are 902 Tweets so 288k tweets roughly)
- Apply a Support Vector Machine to classify which tweet belongs to which CEO
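
As a rough illustration of the classification step in the last bullet, the sketch below trains a tiny linear SVM (hinge loss, one-vs-rest, plain subgradient descent) on bag-of-words counts using only the standard library. It is a toy stand-in, not the implementation used in this study; in practice a library such as scikit-learn would be the usual choice. The example tweets, author names, and hyperparameters are invented for illustration.

```python
from collections import Counter, defaultdict


def featurize(text):
    """Bag-of-words counts for a lowercased, whitespace-tokenized tweet."""
    return Counter(text.lower().split())


def score(weights, bias, features):
    """Linear decision value; words never seen in training contribute 0."""
    return bias + sum(weights.get(word, 0.0) * n for word, n in features.items())


def train_one_vs_rest(samples, epochs=50, lr=0.1, reg=0.01):
    """Train one linear hinge-loss classifier per author."""
    models = {}
    for author in sorted({label for _, label in samples}):
        weights, bias = defaultdict(float), 0.0
        for _ in range(epochs):
            for features, label in samples:
                y = 1.0 if label == author else -1.0
                violated = y * score(weights, bias, features) < 1.0
                # L2 regularization shrinks every weight a little each step.
                for word in weights:
                    weights[word] *= 1.0 - lr * reg
                # Subgradient step only when the margin is violated.
                if violated:
                    for word, n in features.items():
                        weights[word] += lr * y * n
                    bias += lr * y
        models[author] = (dict(weights), bias)
    return models


def predict(models, text):
    """Return the author whose classifier gives the highest decision value."""
    features = featurize(text)
    return max(models, key=lambda a: score(models[a][0], models[a][1], features))


# Invented training tweets for two hypothetical accounts.
tweets = [
    ("ceo_alice", "cloud data platform launch"),
    ("ceo_alice", "data platform customers"),
    ("ceo_bob", "charity water donation"),
    ("ceo_bob", "clean water charity event"),
]
samples = [(featurize(text), author) for author, text in tweets]
models = train_one_vs_rest(samples)

print(predict(models, "new data platform"))
print(predict(models, "water charity"))
```

One-vs-rest is used here because a single SVM separates only two classes, so each CEO gets a classifier that separates that CEO's tweets from everyone else's.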

In [14]:
# df = pd.read_csv("Collection_1_CEO_TWEETS.csv",encoding = "latin1")
# df.head()

##### I found that I was unable to retrieve the same number of tweets for every account. The rest of this research therefore only considers accounts for which more than 1000 tweets were successfully collected.
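
One likely contributor to the uneven counts is that the v1.1 `user_timeline` endpoint only reaches back through roughly an account's 3,200 most recent tweets, and retweets excluded with `include_rts=False` still count toward that limit, so accounts that retweet often return fewer rows. The filtering rule itself can be sketched as follows; the helper name `users_with_enough_tweets` and the made-up counts are assumptions for illustration.

```python
from collections import Counter


def users_with_enough_tweets(user_names, minimum):
    """Return the users with more than `minimum` collected tweets (one list entry per tweet)."""
    counts = Counter(user_names)
    return {user for user, n in counts.items() if n > minimum}


# Made-up data standing in for the CEO_User_Name column of the collected tweets.
collected = ["ceo_a"] * 1500 + ["ceo_b"] * 400 + ["ceo_c"] * 1001
keep = users_with_enough_tweets(collected, 1000)
filtered = [user for user in collected if user in keep]

print(sorted(keep), len(filtered))
```

With the tweets loaded into a pandas DataFrame, `df.groupby('CEO_User_Name').filter(lambda g: len(g) > 1000)` should give the equivalent result.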