# **Analytical Comparison Between CEO Concerns**


> ## _Author: Ronald Washington III_ 

> > Our society is currently shaped by some of the world's largest and smartest technology
companies, and each company's vision and impact on the world reflect the views of its CEO.
But how do the views of these CEOs differ? Which topics do they repeatedly raise and express
opinions on? In this study we collect tweets from selected CEOs, determine whether we can
correctly identify each tweet's originator, and extract the topic of each tweet to build a
collection of concerns for each CEO. From there, the analysis can be extended by calculating
the distance between the issues the CEOs discuss, giving a sense of how closely related their
concerns are.

> > A classification algorithm will support our research into the topics that CEOs primarily
tweet about. The algorithm used in this research is the Support Vector Machine (SVM), which is
known to be useful for image classification, bioinformatics, and text categorization. SVMs will
be used to determine which CEO a tweet belongs to, and Latent Dirichlet Allocation will be used
to extract prominent topics from the classified tweets. Using both techniques, we aim to
classify tweets and find the common topics discussed among CEOs. By the end of this research we
will attempt to characterize the topics each CEO discusses, the similarities and differences
between them, and the number of times a specific topic is tweeted.


### Imported Libraries 

In [1]:
import keys as k
import pandas as pd
import webbrowser
import requests
import bs4
import importlib
import tweepy
import json 
from requests_oauthlib import OAuth1Session
importlib.reload(k)

<module 'keys' from 'C:\\Users\\Ron\\Desktop\\CS401\\Final_Project\\CEO_Classifier_Extractor\\keys.py'>

### Preparing/Gathering Authorization for Twitter Access

In [2]:
twitter = k.twitter
client_key = twitter["client_key"]
client_secret = twitter["client_secret"]
resource_owner_key = twitter['resource_owner_key']
resource_owner_secret = twitter['resource_owner_secret']
protected_url = 'https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=jlist'

session = OAuth1Session(client_key,
                        client_secret=client_secret,
                        resource_owner_key=resource_owner_key,
                        resource_owner_secret= resource_owner_secret)

consumer_key = twitter["client_key"]
consumer_secret = twitter["client_secret"]
access_token = twitter['resource_owner_key']
access_token_secret =  twitter['resource_owner_secret']
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
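
The `keys` module itself is not shown in this notebook. One possible way to supply the same dictionary without hard-coding secrets is to read them from environment variables. In the sketch below, the environment variable names (`TWITTER_CLIENT_KEY` and so on) and the helper `load_twitter_keys` are hypothetical; only the four dictionary keys come from the cell above.

```python
import os


def load_twitter_keys(env=os.environ):
    """Build the credential dict used above from environment variables.

    The environment variable names are hypothetical; only the dict keys
    match what the notebook reads from `k.twitter`.
    """
    names = {
        "client_key": "TWITTER_CLIENT_KEY",
        "client_secret": "TWITTER_CLIENT_SECRET",
        "resource_owner_key": "TWITTER_RESOURCE_OWNER_KEY",
        "resource_owner_secret": "TWITTER_RESOURCE_OWNER_SECRET",
    }
    # Fail early with a clear message if any credential is absent
    missing = [var for var in names.values() if var not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


# A plain dict stands in for os.environ so the example runs anywhere
fake_env = {
    "TWITTER_CLIENT_KEY": "ck",
    "TWITTER_CLIENT_SECRET": "cs",
    "TWITTER_RESOURCE_OWNER_KEY": "rk",
    "TWITTER_RESOURCE_OWNER_SECRET": "rs",
}
twitter = load_twitter_keys(fake_env)
print(sorted(twitter))
```

Calling `load_twitter_keys()` with no argument would read the real environment instead of the stand-in dict.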

# Developing a list of CEOs

> Within this study we will be utilizing a list of CEOs provided by Valeria Maltoni (@ConversationAge), who created a member group/list of CEOs who use Twitter.
- https://twitter.com/ConversationAge/lists/ceos/members?lang=en

In [3]:

group_name = 'ceos'
owner = 'ConversationAge'


def get_list_members(api, owner, group_name):
    # Cursor pages through the whole list; without it you only get
    # the first 20 list members
    members = tweepy.Cursor(api.list_members, owner, group_name).items()
    # .items() yields one user per member, so collect all usernames
    return [m.screen_name for m in members]

In [5]:
len(get_list_members(api,owner,group_name))

902

In [7]:
CEO_list = pd.DataFrame()
CEO_list['CEO_Twitter_Accounts'] = get_list_members(api,owner,group_name)
CEO_list.head(5)

| | CEO_Twitter_Accounts |
|---|---|
| 0 | timothyldaniel |
| 1 | AmitKBouri |
| 2 | MohnishPabrai |
| 3 | BrettHickeySMC |
| 4 | DianneCalvi |


# Gathering Tweets

In [20]:
import tweepy
import json
def tweetCollector(dataframe):
    """
    Creates a CSV of tweets for every CEO and returns all tweets in one DataFrame
    """
    frames = []
    for i in dataframe:
        # start fresh lists for each CEO so a CSV holds only that CEO's tweets
        list_tweets = []
        results = []
        for status in tweepy.Cursor(api.user_timeline, screen_name='@'+i).items():
            list_tweets.append(status._json['text'])
            results.append(status._json['user']['name'])

        # build and save the frame once per CEO, not once per tweet
        DataSet = pd.DataFrame({'tweet': list_tweets, 'name': results})
        DataSet.to_csv(i+"_TWEETS.csv")
        frames.append(DataSet)

    # combine every CEO's tweets into a single frame
    return pd.concat(frames, ignore_index=True)
tweetCollector(CEO_list['CEO_Twitter_Accounts']).head()

| | tweet | name |
|---|---|---|
| 0 | RT @adamsaperia: Thanks for joining us today M... | Miles S. Nadal |
| 1 | Billionaire Drahi Takes it Down to the Wire at... | Miles S. Nadal |
| 2 | https://t.co/n4txlz1rem | Miles S. Nadal |
| 3 | Toronto real estate market sputters toward hol... | Miles S. Nadal |
| 4 | Why Prem Watsa sees value in Stelco https://t.... | Miles S. Nadal |


- Figure out how to get all the tweets for each CEO
- Put all tweets together into a large corpus (each CEO yields at most ~3,200 tweets and there are 902 CEOs, so roughly 2.9 million tweets at most)
- Perform Support Vector Machine Algorithm to classify which tweet belongs to which CEO
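
The last two steps could be sketched roughly as follows. The notebook names Support Vector Machines but no library, so scikit-learn's `TfidfVectorizer` and `LinearSVC` are assumptions here, as are the hypothetical helpers `build_corpus` and `train_author_classifier`. The small in-memory frames are made-up stand-ins for the per-CEO `*_TWEETS.csv` files, which share the same `tweet` and `name` columns.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_corpus(frames):
    """Stack per-CEO frames (columns: tweet, name) into one corpus."""
    corpus = pd.concat(frames, ignore_index=True)
    # drop rows with a missing tweet or author before training
    return corpus.dropna(subset=["tweet", "name"])


def train_author_classifier(corpus):
    """Fit a TF-IDF + linear SVM pipeline that predicts `name` from `tweet`."""
    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(corpus["tweet"], corpus["name"])
    return model


# Made-up stand-ins for two CEOs' CSV files
# (in practice each frame would come from pd.read_csv on one file)
ceo_a = pd.DataFrame({
    "tweet": [
        "electric cars need better batteries",
        "battery factory opening soon",
        "rockets and electric cars",
    ],
    "name": ["ceo_a", "ceo_a", "ceo_a"],
})
ceo_b = pd.DataFrame({
    "tweet": [
        "cloud software helps business",
        "business customers love our cloud",
        "software platform in the cloud",
    ],
    "name": ["ceo_b", "ceo_b", "ceo_b"],
})

corpus = build_corpus([ceo_a, ceo_b])
model = train_author_classifier(corpus)

# Ask the classifier who most likely wrote two unseen snippets
predictions = model.predict(["electric cars", "cloud software"])
print(list(predictions))
```

On the real corpus, a held-out test split would be needed to judge how well tweets can be attributed to their authors; with 902 classes and short texts, accuracy may be far lower than on this toy example.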