# Project Cobra
### Logan Snyder, Iluda Ko, Brett Simmons, Joey Markun
This notebook presents sales statistics for the Ford dealership in Ames and compares them with national vehicle sales data. It also scores the sentiment of recent tweets about certain vehicles and compares that sentiment with how well each vehicle is selling.

In [1]:
import tweepy
import requests
import pandas as pd
from textblob import TextBlob
# API credentials are read from a local keys.py module
from keys import bearer_token, api_key, api_secret_key, access_token, access_token_secret

# Return raw requests.Response objects so the JSON payload can be read directly
client = tweepy.Client(bearer_token=bearer_token,
                       consumer_key=api_key,
                       consumer_secret=api_secret_key,
                       access_token=access_token,
                       access_token_secret=access_token_secret,
                       return_type=requests.Response,
                       wait_on_rate_limit=True)
# Define query
query = '(Ford F150) lang:en'
# Get up to 50 recent tweets, including each tweet's creation time
tweets = client.search_recent_tweets(query=query,
                                     tweet_fields=['created_at'],
                                     max_results=50)
# Save data as dictionary
tweets_dict = tweets.json() 
# Extract "data" value from dictionary
tweets_data = tweets_dict['data'] 
# Transform to pandas Dataframe
twitter_df = pd.json_normalize(tweets_data) 

# Add polarity and subjectiveness columns, using TextBlob to score each tweet
twitter_df['polarity'] = twitter_df['text'].apply(lambda x: float(TextBlob(x).sentiment.polarity))  # -1 (negative) to 1 (positive)
twitter_df['subjectiveness'] = twitter_df['text'].apply(lambda x: float(TextBlob(x).sentiment.subjectivity))  # 0 (objective) to 1 (subjective)


# Add a sentiment column: 'pos' if the polarity is above 0.2, 'neg' if it is below -0.2,
# and 'nt' (neutral) for any polarity from -0.2 to 0.2.
twitter_df.loc[:, 'sentiment'] = 'nt'  # default every row to neutral
twitter_df.loc[twitter_df['polarity'] > 0.2, 'sentiment'] = 'pos'  # rows above the threshold become 'pos'
twitter_df.loc[twitter_df['polarity'] < -0.2, 'sentiment'] = 'neg'  # rows below the negative threshold become 'neg'

# Count the positive, negative, and neutral tweets; the percentages are computed when the results are printed.
pos_count = len(twitter_df.loc[twitter_df['sentiment'] == 'pos'])  # number of rows labeled positive
neg_count = len(twitter_df.loc[twitter_df['sentiment'] == 'neg'])  # number of rows labeled negative
nt_count = len(twitter_df.loc[twitter_df['sentiment'] == 'nt'])  # number of rows labeled neutral

twitter_df.head(5)

Unnamed: 0,created_at,id,text,polarity,subjectiveness,sentiment
0,2022-04-30T16:23:14.000Z,1520438638554800133,RT @InsideEVsForum: If you haven't watched or ...,0.375,0.5,pos
1,2022-04-30T16:16:14.000Z,1520436879044321280,RT @InsideEVs: If you haven't watched or liste...,0.375,0.5,pos
2,2022-04-30T16:15:58.000Z,1520436808689102848,RT @DonotInnovate: @richsignorelli Some exciti...,0.516667,0.733333,pos
3,2022-04-30T16:14:48.000Z,1520436517340168195,If you haven't watched or listened to this wee...,0.375,0.5,pos
4,2022-04-30T16:14:25.000Z,1520436418077728768,If you haven't watched or listened to this wee...,0.375,0.5,pos
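
The thresholding rule used in this cell can also be written as a small standalone function, which makes the boundary behavior easy to check. The following is only a sketch: `label_sentiment` and `sentiment_shares` are hypothetical helper names that do not appear in the notebook, and the sample scores mix the five polarity values shown in the table with three made-up ones.

```python
from collections import Counter


def label_sentiment(polarity: float, threshold: float = 0.2) -> str:
    """Map a polarity score in [-1, 1] to 'pos', 'neg', or 'nt' (neutral)."""
    if polarity > threshold:
        return "pos"
    if polarity < -threshold:
        return "neg"
    return "nt"  # includes scores exactly on either threshold


def sentiment_shares(polarities):
    """Return the percentage of scores that fall under each label."""
    labels = [label_sentiment(p) for p in polarities]
    total = len(labels)
    if total == 0:
        return {"pos": 0.0, "neg": 0.0, "nt": 0.0}
    counts = Counter(labels)
    return {label: 100 * counts[label] / total for label in ("pos", "neg", "nt")}


# First five values come from the table; the last three are invented for illustration.
sample = [0.375, 0.375, 0.516667, 0.375, 0.375, 0.0, -0.5, 0.2]
shares = sentiment_shares(sample)
print(shares)
```

Note that a score of exactly 0.2 is labeled neutral here, matching the strict `>` comparison in the cell.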


In [2]:
national_df = pd.read_csv("https://raw.githubusercontent.com/iludako/final_project/main/2021%20TruckSUV%20Sales.csv") 
# Map each model name to its national sales total so the totals can be looked up later
national_data_dict = {}
for i in national_df.index:
    model = national_df.loc[i]['Model']
    sales = national_df.loc[i]['Total']
    sales = int(sales.replace(",",""))
    national_data_dict[model] = sales
national_df.head(3)

Unnamed: 0,Model,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,Total,Unnamed: 14
0,Chevrolet Colorado,7707,7707,8670,4989,4797,4989,4285,4126,4285,7151,7151,7151,73008,
1,Chevrolet Silverado,40509,40509,45573,55623,53484,55623,41535,39997,41535,38459,38459,38459,529765,
2,Ford F-Series,55276,64478,84043,66302,46260,45672,52314,57321,63164,68259,60418,62496,726003,
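
The cell strips commas from the `Total` column before converting it to an integer, which suggests the raw file stores totals with thousands separators. The sketch below shows the same model-to-total mapping using only the standard library. The two-row CSV excerpt is hypothetical: the model names and totals are copied from the output shown here, the comma formatting is an assumption, and `parse_count` is an illustrative helper name.

```python
import csv
import io


def parse_count(value: str) -> int:
    """Convert a count such as '726,003' or '726003' to an int."""
    return int(value.replace(",", "").strip())


# Hypothetical excerpt; the comma-formatted totals are an assumption about the raw file.
sample_csv = io.StringIO(
    'Model,Total\n'
    'Chevrolet Colorado,"73,008"\n'
    'Ford F-Series,"726,003"\n'
)

# Build the same kind of model -> total dictionary as the loop in the cell.
totals = {row["Model"]: parse_count(row["Total"]) for row in csv.DictReader(sample_csv)}
print(totals)
```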


In [3]:
local_df = pd.read_csv("https://raw.githubusercontent.com/iludako/final_project/main/MIS307%20Final%20Project%20Database.csv") 
local_df.head(3)


Unnamed: 0,StockNo,Customer ID,VehicleType,Model,CarTrim,DateSold,IsHybrid,IsDiesel,IsElectric,MPG,Range,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17
0,16517,7,Car,Mustang,EcoBoost,1/6/21,N,N,N,26.5,-,,,,,,,
1,16385,62,Car,Mustang,EcoBoost,2/18/21,N,N,N,26.5,-,,,,,,,
2,15687,185,Car,Mustang,EcoBoost,5/15/21,N,N,N,26.5,-,,,,,,,


In [4]:
# Count how many times each F-150 trim occurs in the local data
trim_count = {}
for i in local_df.index:
    if local_df.loc[i]['Model'] == "F-150":
        trim = local_df.loc[i]['CarTrim']
        if trim in trim_count: 
            trim_count[trim] +=1
        else: 
            trim_count[trim] = 1
        
print(trim_count)

{'King Ranch': 10, 'Lariat': 18, 'Limited': 5, 'Platinum': 7, 'Raptor': 4, 'XL': 41, 'XLT': 25}
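
The manual if/else counting pattern in this cell can be expressed more compactly with `collections.Counter`. This is a sketch under stated assumptions: the `rows` list is a hypothetical stand-in for the `Model` and `CarTrim` columns of `local_df`, not real dealership data.

```python
from collections import Counter

# Hypothetical (Model, CarTrim) pairs standing in for rows of local_df.
rows = [
    ("F-150", "XL"),
    ("F-150", "XLT"),
    ("Mustang", "EcoBoost"),
    ("F-150", "XL"),
    ("F-150", "Lariat"),
]

# Count trims for F-150 rows only; Counter starts every missing key at zero.
trim_counts = Counter(trim for model, trim in rows if model == "F-150")
print(trim_counts.most_common(1))
```

With pandas, a similar result should be available from `local_df.loc[local_df['Model'] == 'F-150', 'CarTrim'].value_counts()`, although the result is a Series rather than a dictionary.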


In [5]:
# Count how many times each model occurs in the local data
model_count = {}
for i in local_df.index:
    model = local_df.loc[i]['Model']
    if model in model_count: 
        model_count[model] +=1
    else: 
        model_count[model] = 1
        
print(model_count)

{'Mustang': 16, 'F-150': 110, 'F-250': 7, 'F-350': 13, 'F-450': 3, 'Expedition': 6, 'Explorer': 38, 'F-550': 1, 'Fusion': 6, 'Escape': 61, 'EcoSport': 10}


In [9]:
print("Ford F-Series truck sales nationwide compared with the Ford dealership in Ames:\n\n")
print(f"The best-selling Ford vehicle at the Ames dealership is the F-150, which accounts for {model_count['F-150'] / len(local_df) * 100:.2f}% of all sales at the dealership.")
print(f"The Ford F-150 comes in several trim levels, and the most commonly sold trim in Ames is the 'XL'. Approximately {trim_count['XL'] / model_count['F-150'] * 100:.2f}% of the F-150 trucks sold at the Ames dealership were sold in this trim level.")


print(f"\n\nThe Ford F-Series accounted for {national_data_dict['Ford F-Series'] / national_data_dict['All_Vehicle_Sale_Totals'] * 100:.2f}% of sales nationwide.")

print("\n\n50 recent tweets about the Ford F-150 break down by sentiment as follows:")
print(f'% Positive Tweets: {pos_count / len(twitter_df) * 100:.2f}%')
print(f'% Negative Tweets: {neg_count / len(twitter_df) * 100:.2f}%')
print(f'% Neutral Tweets: {nt_count / len(twitter_df) * 100:.2f}%')

Ford F-Series truck sales nationwide compared with the Ford dealership in Ames:


The best-selling Ford vehicle at the Ames dealership is the F-150, which accounts for 40.59% of all sales at the dealership.
The Ford F-150 comes in several trim levels, and the most commonly sold trim in Ames is the 'XL'. Approximately 37.27% of the F-150 trucks sold at the Ames dealership were sold in this trim level.


The Ford F-Series accounted for 12.03% of sales nationwide.


50 recent tweets about the Ford F-150 break down by sentiment as follows:
% Positive Tweets: 46.00%
% Negative Tweets: 10.00%
% Neutral Tweets: 44.00%
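
The two dealership percentages can be reproduced by hand from the dictionaries printed in cells 4 and 5. The sketch below copies those printed counts and assumes that each row of `local_df` is one sale, so the total number of sales equals the sum of the model counts.

```python
# Counts copied from the printed output of cells 5 and 4.
model_count = {'Mustang': 16, 'F-150': 110, 'F-250': 7, 'F-350': 13, 'F-450': 3,
               'Expedition': 6, 'Explorer': 38, 'F-550': 1, 'Fusion': 6,
               'Escape': 61, 'EcoSport': 10}
trim_count = {'King Ranch': 10, 'Lariat': 18, 'Limited': 5, 'Platinum': 7,
              'Raptor': 4, 'XL': 41, 'XLT': 25}

# Assumption: one row per sale, so total sales is the sum of all model counts.
total_sales = sum(model_count.values())

# Share of all dealership sales that were F-150s.
f150_share = 100 * model_count['F-150'] / total_sales

# Share of F-150 sales that were the XL trim.
xl_share = 100 * trim_count['XL'] / model_count['F-150']

print(f"Total sales: {total_sales}")
print(f"F-150 share of all sales: {f150_share:.2f}%")
print(f"XL share of F-150 sales: {xl_share:.2f}%")
```

The trim counts sum to 110, which matches the F-150 entry in `model_count` and serves as a quick consistency check between the two cells.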
