# Assessing Language Used When Tweeting About the NCAA

How do people talk about the NCAA on Twitter? Is the conversation positive or negative? What common words and sentiments are associated with the NCAA?

<b>Kayla Ketring</b>

<b>COMM 313 Final Project</b>

<b>Background</b> 

The National Collegiate Athletic Association, also known as the NCAA, has been a hot topic in recent years. The purpose of the NCAA, as stated on its website, is to "safeguard the well-being of student-athletes and equip them with the skills to succeed on the playing field, in the classroom and throughout life." Since 2015, the NCAA has faced many tough decisions involving college sports and collegiate athletes. Detailed below are a handful of the major events, decisions, and court cases involving the NCAA since 2015.

In 2015, a federal appeals court ruled in O'Bannon v. NCAA, a class-action lawsuit in which Ed O'Bannon, a former UCLA basketball player, claimed that athletes should be compensated for the use of their images and likenesses in television advertisements, video games, apparel, and other materials. The O'Bannon v. NCAA lawsuit sparked an ongoing battle for NCAA athletes to obtain compensation when universities, companies, and organizations use their name, image, and likeness. The battle is still ongoing, and the Supreme Court is set to make a decision in June 2021.

In 2018, the NCAA spent $13.5 million more on the men's March Madness basketball tournament than on the women's.

In 2019, California passed legislation that prohibits schools from punishing athletes who accept endorsement money while in college. The legislation is set to take effect in 2023 and has not been accepted by the NCAA, which declared it an "existential threat" to amateur college sports. The debate continues as 2023 approaches and more states move toward joining California.

In 2021, the men's March Madness "bubble" in Indianapolis was equipped with an oversized weight room full of squat racks, benches, and barbells. However, in San Antonio, where the women's March Madness "bubble" was located, there were only yoga mats and a couple of dumbbells. Title IX, a federal law, states that "No person in the United States shall, on the basis of sex, be excluded from participation in, be denied the benefits of, or be subjected to discrimination under any education program or activity receiving Federal financial assistance." In simple terms, Title IX is meant to ensure that women's and men's athletics receive equal benefits across the board. The inequality in resources provided for the women's teams was not only embarrassing for the NCAA, but also discouraging and downright unlawful. The NCAA apologized and quickly fixed the women's gym, but the incident led many in the collegiate world to ask, "Didn't they know this was a violation of Title IX?"

The NCAA has been, and continues to be, challenged by athletes, universities, and even politicians to adapt and change, but its decisions continue to help athletes in some cases and hurt them in others.

<b>Guiding Question</b> 

Since 2015, has sentiment about the NCAA on Twitter become more negative?

<b>The Corpus</b> 


# Setup and Load Data

In [13]:
import snscrape.modules.twitter as sntwitter

import json
import os
import datetime
from nltk.tokenize import TweetTokenizer, WordPunctTokenizer
from nltk.sentiment import SentimentIntensityAnalyzer

In [14]:
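def load_tweets(tfile):
    """Load tweets from a file with one JSON object per line, skipping bad lines."""
    tweets = []
    with open(tfile) as infile:
        for line in infile:
            try:
                tweets.append(json.loads(line))
            except json.JSONDecodeError:
                # Skip blank or malformed lines
                continue

    return tweets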

In [15]:
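def process_tweet(tweet):
    """Tokenize a tweet and add its mean NRC VAD scores in place.

    Relies on two globals that are not defined in the cells shown here:
    `tt` (a tokenizer such as TweetTokenizer()) and `NRC_VAD` (a dict
    mapping lowercase words to {'V': ..., 'A': ..., 'D': ...}).
    """
    toks = tt.tokenize(tweet['text'])
    tweet['tokens'] = toks

    tweet['Valence'] = 0
    tweet['Dominance'] = 0
    tweet['Arousal'] = 0

    tweet['VAD_toks'] = []

    for t in toks:
        if t.lower() in NRC_VAD:
            # Copy the entry so the shared lexicon is not modified
            scores = dict(NRC_VAD[t.lower()])
            scores['tok'] = t

            tweet['Valence'] += scores['V']
            tweet['Arousal'] += scores['A']
            tweet['Dominance'] += scores['D']

            tweet['VAD_toks'].append(scores)

    # Convert the sums to means over the tokens found in the lexicon
    if tweet['VAD_toks']:
        for dimension in ('Valence', 'Arousal', 'Dominance'):
            tweet[dimension] /= len(tweet['VAD_toks'])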
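
To make the averaging in `process_tweet` concrete, here is a minimal, self-contained sketch. It assumes the lexicon maps lowercase words to `{'V': ..., 'A': ..., 'D': ...}` scores, as the function above implies. The three-word `TOY_VAD` lexicon and its numbers are invented for illustration and are not real NRC VAD values, and `average_vad` is a hypothetical helper, not part of the notebook.

```python
# Toy lexicon: made-up scores for illustration only (not real NRC VAD values).
TOY_VAD = {
    "great": {"V": 0.9, "A": 0.6, "D": 0.8},
    "cancelled": {"V": 0.2, "A": 0.5, "D": 0.3},
    "madness": {"V": 0.3, "A": 0.9, "D": 0.4},
}


def average_vad(tokens, lexicon):
    """Average V/A/D over the tokens found in the lexicon (hypothetical helper)."""
    hits = [lexicon[t.lower()] for t in tokens if t.lower() in lexicon]
    if not hits:
        # No lexicon words in the tweet: leave every dimension at 0
        return {"Valence": 0, "Arousal": 0, "Dominance": 0, "n_hits": 0}
    return {
        "Valence": sum(h["V"] for h in hits) / len(hits),
        "Arousal": sum(h["A"] for h in hits) / len(hits),
        "Dominance": sum(h["D"] for h in hits) / len(hits),
        "n_hits": len(hits),
    }


# Simple whitespace split stands in for the tweet tokenizer here
tokens = "May Madness would be great !".split()
result = average_vad(tokens, TOY_VAD)
print(result)
```

Only "Madness" and "great" are in the toy lexicon, so each dimension is the mean of two scores. Words missing from the lexicon do not pull the average toward zero; they are simply ignored.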

In [16]:
def download_query_tweets(query, date_since, date_until, max_tweets=1000):
    """Scrape up to max_tweets tweets matching query between the two dates."""
    print(f"Downloading tweets for query: '{query}' from {date_since} to {date_until} (max of {max_tweets})")

    tweet_list = []

    # Twitter search operators restrict the results to the date window
    query = f'{query} since:{date_since} until:{date_until}'

    for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
        if i >= max_tweets:
            break

        tweet_dict = {
            'id': tweet.id,
            'created_at': tweet.date.strftime('%Y-%m-%d %H:%M'),
            'text': tweet.content,
            'username': tweet.username
        }

        tweet_list.append(tweet_dict)

    return tweet_list

In [17]:
DATA_DIR = 'data/ncaa'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-03-08"
until = "2020-03-15"
queries = [ 'ncaa', "#ncaa"] 

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('\t retrieved {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        out.write(json.dumps(tweet_list))

Downloading tweets for query: 'ncaa' from 2020-03-08 to 2020-03-15 (max of 1000)
	 retrieved 1000 tweets...

Downloading tweets for query: '#ncaa' from 2020-03-08 to 2020-03-15 (max of 1000)
	 retrieved 1000 tweets...



# I'm assuming I have code in here that does not need to be in here and is taking up unnecessary space. Are you able to highlight what can be deleted? Or just delete it?

In [32]:
ranges = [("2020-03-08",'2020-03-08'),("2020-03-08",'2020-03-08')]

In [None]:
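def Get_query(date_range, queries):
    """Download and save tweets for each query within one (since, until) date range."""
    since, until = date_range
    year = since.split('-')[0]
    if not os.path.exists('data/{}'.format(year)):
        os.makedirs('data/{}'.format(year))

    results = []
    for query in queries:
        tweet_list = download_query_tweets(query, since, until)

        outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)

        print('\t retrieved {} tweets...\n'.format(len(tweet_list)))
        # Save each query's tweets, as in the download cell above
        with open(outfilename, 'w') as out:
            out.write(json.dumps(tweet_list))

        # Keep tweets from every query, not just the last one
        results.extend(tweet_list)

    return {"date-range": date_range, "results": results}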

In [None]:
for date_range in ranges:
    data = Get_query(date_range, ['ncaa', '#ncaa'])
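
Because the guiding question compares years since 2015, `ranges` would presumably need one window per year rather than the repeated placeholder dates. A minimal sketch, assuming a one-week window starting on March 8 of each year (mirroring the 2020 window used earlier); `build_yearly_ranges` and its defaults are hypothetical, not part of the notebook.

```python
import datetime


def build_yearly_ranges(first_year, last_year, month=3, day=8, days=7):
    """Return (since, until) date-string pairs, one per year (hypothetical helper)."""
    ranges = []
    for year in range(first_year, last_year + 1):
        since = datetime.date(year, month, day)
        until = since + datetime.timedelta(days=days)
        # ISO format matches the YYYY-MM-DD strings used in the search query
        ranges.append((since.isoformat(), until.isoformat()))
    return ranges


yearly_ranges = build_yearly_ranges(2015, 2021)
print(yearly_ranges[0], yearly_ranges[-1])
```

Each pair can be passed to `Get_query` in place of the entries in `ranges`. Sampling the same week every year keeps the seasonal context (tournament time) roughly comparable across years, though that is a design choice rather than a requirement.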

In [25]:
with open('data/ncaa/ncaa_2020-03-08_to_2020-03-15.json') as infile:
    ncaa_data = json.load(infile)

In [26]:
ncaa_data

[{'created_at': '2020-03-14 23:59',
  'id': 1238978207669501953,
  'text': '@FootbaIl_Tweets NCAA, the others all fall behind Madden. But you also forgot the Dreamcast NFL game, that was a cool game too.',
  'username': 'StephenMaxO79'},
 {'created_at': '2020-03-14 23:59',
  'id': 1238978159296614400,
  'text': 'NCAA https://t.co/UM8CrK83vc',
  'username': 'D2_GDC'},
 {'created_at': '2020-03-14 23:59',
  'id': 1238978152149291008,
  'text': '@FootbaIl_Tweets NCAA 14, Madden 08, NFL Blitz street v2. Best in they class all the other years suck ass',
  'username': 'kramer_dan'},
 {'created_at': '2020-03-14 23:59',
  'id': 1238978147841789953,
  'text': '@JonRothstein @KentuckySports Sometime in the next two seasons Iona will be put on NCAA probation, bet on it.',
  'username': 'KaboomLip'},
 {'created_at': '2020-03-14 23:59',
  'id': 1238978147258793984,
  'text': 'damn NCAA over tho 😥',
  'username': 'CrocutaMane'},
 {'created_at': '2020-03-14 23:59',
  'id': 1238978106850869248,
  'text

In [27]:
with open('data/ncaa/#ncaa_2020-03-08_to_2020-03-15.json') as infile:
    ncaa_data2 = json.load(infile)

In [28]:
ncaa_data2

[{'created_at': '2020-03-14 23:57',
  'id': 1238977595221454848,
  'text': 'https://t.co/lnXUzhVIGV #NCAA',
  'username': 'raiderway83'},
 {'created_at': '2020-03-14 23:54',
  'id': 1238976931699347467,
  'text': 'Since everyone is quarantined and no #ncaa or #nba...the #ps4 server is dead ass like: https://t.co/2zIHErEGWY',
  'username': 'His_Majesty_J'},
 {'created_at': '2020-03-14 23:50',
  'id': 1238975774197194761,
  'text': 'May Madness would be great! #MarchSadness2020 #NCAA #NCAATournament',
  'username': 'JKarpo20'},
 {'created_at': '2020-03-14 23:45',
  'id': 1238974636911722496,
  'text': 'There is no way the CFP would have been cancelled #omaha #COVIDー19  #ncaa',
  'username': 'KennyLedford21'},
 {'created_at': '2020-03-14 23:42',
  'id': 1238973909732659200,
  'text': 'To hockey to wrestling and all the other winter sports. Competing for a National Championship is an ultimate dream for many. Maybe the NCAA can re-evaluate their decision? April Madness or May Madness would 