# Project Cobra
### Logan Snyder, Iluda Ko, Brett Simmons, Joey Markun
This notebook summarizes vehicle sales at the Ames Ford dealership and compares them with 2021 national truck and SUV sales data. It also analyzes the sentiment of recent tweets about one model, the Ford F-150, and sets that sentiment against how well the model is selling.

In [31]:
import tweepy
from keys import *
import requests
import pandas as pd
from textblob import TextBlob

# Create the Twitter API v2 client; return raw requests.Response objects so the JSON can be parsed directly
client = tweepy.Client(bearer_token=bearer_token,
                       consumer_key=api_key,
                       consumer_secret=api_secret_key,
                       access_token=access_token,
                       access_token_secret=access_token_secret,
                       return_type=requests.Response,
                       wait_on_rate_limit=True)
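# Define the query: English-language tweets matching "Ford F150"
query = '(Ford F150) lang:en'
# Get up to 50 recent tweets, including each tweet's creation time
tweets = client.search_recent_tweets(query=query,
                                     tweet_fields=['created_at'],
                                     max_results=50)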
# Save data as dictionary
tweets_dict = tweets.json() 
# Extract "data" value from dictionary
tweets_data = tweets_dict['data'] 
# Transform to pandas Dataframe
twitter_df = pd.json_normalize(tweets_data) 

# Add polarity and subjectiveness columns, using TextBlob to score each tweet's text
twitter_df['polarity'] = twitter_df['text'].apply(lambda x: float(TextBlob(x).sentiment.polarity))  # -1 (negative) to 1 (positive)
twitter_df['subjectiveness'] = twitter_df['text'].apply(lambda x: float(TextBlob(x).sentiment.subjectivity))  # 0 (objective) to 1 (subjective)

# Add a 'classification' column using a stricter threshold of 0.3.
twitter_df.loc[:, 'classification'] = 'nt'  # set all rows to neutral first
twitter_df.loc[twitter_df['polarity'] > 0.3, 'classification'] = 'pos'  # polarity above 0.3 is positive
twitter_df.loc[twitter_df['polarity'] < -0.3, 'classification'] = 'neg'  # polarity below -0.3 is negative

# Add a 'sentiment' column: 'pos' if the polarity is above 0.2,
# 'neg' if it is below -0.2, and 'nt' (neutral) for anything from -0.2 to 0.2.
twitter_df.loc[:, 'sentiment'] = 'nt'  # set all rows to neutral first
twitter_df.loc[twitter_df['polarity'] > 0.2, 'sentiment'] = 'pos'  # polarity above 0.2 is positive
twitter_df.loc[twitter_df['polarity'] < -0.2, 'sentiment'] = 'neg'  # polarity below -0.2 is negative

# Count the positive, negative, and neutral tweets (based on the 'sentiment' column)
pos_count = len(twitter_df.loc[twitter_df['sentiment'] == 'pos'])  # number of rows where the sentiment is positive
neg_count = len(twitter_df.loc[twitter_df['sentiment'] == 'neg'])
nt_count = len(twitter_df.loc[twitter_df['sentiment'] == 'nt'])

twitter_df.head(5)

Review Statistics throughout all the data: 

Positive Reviews: 14, % Positive: 28.00% 
Negative Reviews: 2, % Negative: 4.00% 
Neutral Reviews: 34, % Neutral: 68.00% 


| | created_at | id | text | polarity | subjectiveness | classification | sentiment |
|---|---|---|---|---|---|---|---|
| 0 | 2022-04-29T20:57:26.000Z | 1520145254124593153 | 2011 FORD F150 SUPERCREW BLACK Pickup 4 Doors ... | 0.166667 | 0.466667 | nt | nt |
| 1 | 2022-04-29T20:55:14.000Z | 1520144703626350592 | The Ford F-150 Lightning has more power than e... | 0.5 | 0.5 | pos | pos |
| 2 | 2022-04-29T20:54:17.000Z | 1520144463082913792 | RT @Flyin18T: Ford F-150 Lightning Now Makes 5... | 0.0 | 0.0 | nt | nt |
| 3 | 2022-04-29T20:54:05.000Z | 1520144413711691776 | Ford F-150 Lightning Now Makes 580 HP With Ext... | 0.0 | 0.0 | nt | nt |
| 4 | 2022-04-29T20:53:51.000Z | 1520144355499065346 | If a Ford F150 Lightening can tow only 2000 po... | 0.068182 | 0.727273 | nt | nt |
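
The thresholding step can be checked by hand on the five tweets shown above. The following is a minimal sketch in plain Python rather than pandas; the helper name `label_polarity` and the `threshold` parameter are illustrative and do not appear in the notebook. It applies the same 0.2 cutoff used for the `sentiment` column to the five polarity scores in the output.

```python
from collections import Counter


def label_polarity(polarity, threshold=0.2):
    """Return 'pos', 'neg', or 'nt' for a polarity score in [-1, 1].

    Hypothetical helper: mirrors the notebook's 'sentiment' rule.
    """
    if polarity > threshold:
        return "pos"
    if polarity < -threshold:
        return "neg"
    return "nt"


# Polarity scores of the five tweets displayed in the output above.
sample_polarities = [0.166667, 0.5, 0.0, 0.0, 0.068182]

labels = [label_polarity(p) for p in sample_polarities]
counts = Counter(labels)

# Percentage share of each label among the sampled tweets.
shares = {label: counts[label] / len(labels) * 100 for label in ("pos", "neg", "nt")}

for label, share in shares.items():
    print(f"{label}: {counts[label]} tweets, {share:.2f}%")
```

Only the second tweet (polarity 0.5) clears the 0.2 threshold, which agrees with the `sentiment` values in the output. Passing `threshold=0.3` would reproduce the `classification` column instead.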


In [2]:
local_df = pd.read_csv("https://raw.githubusercontent.com/iludako/final_project/main/MIS307%20Final%20Project%20Database.csv") 
local_df.head(3)


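| | StockNo | Customer ID | VehicleType | Model | CarTrim | DateSold | IsHybrid | IsDiesel | IsElectric | MPG | Range | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14 | Unnamed: 15 | Unnamed: 16 | Unnamed: 17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16517 | 7 | Car | Mustang | EcoBoost | 1/6/21 | N | N | N | 26.5 | - | | | | | | | |
| 1 | 16385 | 62 | Car | Mustang | EcoBoost | 2/18/21 | N | N | N | 26.5 | - | | | | | | | |
| 2 | 15687 | 185 | Car | Mustang | EcoBoost | 5/15/21 | N | N | N | 26.5 | - | | | | | | | |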


In [21]:
national_df = pd.read_csv("https://raw.githubusercontent.com/iludako/final_project/main/2021%20TruckSUV%20Sales.csv") 
national_df.head(3)

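| | Model | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | Total | Unnamed: 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Chevrolet Colorado | 7707 | 7707 | 8670 | 4989 | 4797 | 4989 | 4285 | 4126 | 4285 | 7151 | 7151 | 7151 | 73008 | |
| 1 | Chevrolet Silverado | 40509 | 40509 | 45573 | 55623 | 53484 | 55623 | 41535 | 39997 | 41535 | 38459 | 38459 | 38459 | 529765 | |
| 2 | Ford F-Series | 55276 | 64478 | 84043 | 66302 | 46260 | 45672 | 52314 | 57321 | 63164 | 68259 | 60418 | 62496 | 726003 | |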


In [23]:
# Build a dictionary mapping each model to its national sales total, for later lookups
national_data_dict = {}
for model, sales in zip(national_df['Model'], national_df['Total']):
    # Totals are read as strings with thousands separators, so strip the commas before converting
    national_data_dict[model] = int(sales.replace(",", ""))
    
print(national_data_dict)

{'Chevrolet Colorado': 73008, 'Chevrolet Silverado': 529765, 'Ford F-Series': 726003, 'Ford Maverick': 13258, 'Ford Ranger': 94755, 'GMC Canyon': 24125, 'GMC Sierra': 248923, 'Honda Ridgeline': 41355, 'Hyundai Santa Cruz': 9634, 'Jeep Gladiator': 89712, 'Nissan Frontier': 60697, 'Nissan Titan': 27406, 'Ram Pickup': 569389, 'Toyota Tacoma': 252490, 'Toyota Tundra': 81959, 'Chevrolet Suburban': 85159, 'Chevrolet Tahoe': 106019, 'Ford Expedition': 81988, 'GMC Yukon': 84243, 'Jeep Wagoneer': 5349, 'Nissan Armada': 22815, 'Toyota Sequoia': 8070, 'Buick Enclave': 42340, 'Chevrolet Blazer': 70323, 'Chevrolet TrailBlazer': 90163, 'Chevrolet Traverse': 116251, 'Dodge Durango': 65936, 'Ford Bronco': 35023, 'Ford Bronco Sport': 108169, 'Ford Edge': 85225, 'Ford Explorer': 219871, 'GMC Acadia': 59913, 'Honda Passport': 53133, 'Honda Pilot': 143062, 'Hyundai Palisade': 86282, 'Hyundai Santa Fe': 112705, 'Jeep Grand Cherokee': 254445, 'Jeep Wrangler': 204610, 'Kia Sorento': 81785, 'Kia Telluride': 9
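
The conversion above calls `.replace` on each total, which only works while every value is a string. As a hedged sketch, a small helper could accept either form. The name `parse_count` is an assumption for illustration and is not part of the notebook.

```python
def parse_count(value):
    """Convert a sales figure such as '726,003' (or an int like 726003) to an int.

    Hypothetical helper: tolerates thousands separators and stray whitespace.
    """
    if isinstance(value, int):
        return value
    return int(str(value).replace(",", "").strip())


print(parse_count("726,003"))
print(parse_count(73008))
```

A helper like this keeps the loop from failing if pandas infers an integer column for `Total` in a differently formatted CSV.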

In [3]:
# Count how many times each trim occurs in the local sales data
trim_count = {}
for trim in local_df['CarTrim']:
    trim_count[trim] = trim_count.get(trim, 0) + 1
        
print(trim_count)

{'EcoBoost': 4, 'EcoBoost Premium': 3, 'GT': 4, 'GT Premium': 3, 'King Ranch': 13, 'Lariat': 21, 'Limited': 14, 'Platinum': 13, 'Plug-in Hybrid': 4, 'Police Inteceptor': 11, 'Hybrid Police Inteceptor': 10, 'Raptor': 4, 'S': 10, 'SE': 19, 'SE ': 1, 'SE Hybrid': 8, 'SEL': 19, 'Shelby GT350': 1, 'Shelby GT500': 1, 'ST': 3, 'Titanium': 10, 'Titanium Hybrid': 6, 'XL': 54, 'XL ': 2, 'XLT': 33}
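
The output lists `'SE '` and `'XL '` (with a trailing space) separately from `'SE'` and `'XL'`, so the same trim is split across two keys. A minimal sketch of merging them by stripping whitespace follows. It works on a few counts copied from the output rather than on the DataFrame, and the names `raw_counts` and `clean_counts` are illustrative.

```python
from collections import Counter

# A subset of the counts printed above, including the whitespace variants.
raw_counts = {"SE": 19, "SE ": 1, "XL": 54, "XL ": 2, "XLT": 33}

# Merge keys that differ only by leading or trailing whitespace.
clean_counts = Counter()
for trim, n in raw_counts.items():
    clean_counts[trim.strip()] += n

print(dict(clean_counts))
```

On the DataFrame itself, one way to get the same effect would be to normalize the column first with `local_df['CarTrim'].str.strip()` before counting.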


In [15]:
# Count how many times each model occurs in the local sales data
model_count = {}
for model in local_df['Model']:
    model_count[model] = model_count.get(model, 0) + 1
        
print(model_count)

{'Mustang': 16, 'F-150': 110, 'F-250': 7, 'F-350': 13, 'F-450': 3, 'Expedition': 6, 'Explorer': 38, 'F-550': 1, 'Fusion': 6, 'Escape': 61, 'EcoSport': 10}
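
These counts are enough to compute each model's share of dealership sales. The sketch below copies the dictionary from the output above and also sums every model whose name starts with `F-`. That grouping is an assumption about what "F-Series" covers, made here only so the local figure can be set beside the national F-Series total on a like-for-like basis.

```python
# Model counts copied from the output above.
model_count = {'Mustang': 16, 'F-150': 110, 'F-250': 7, 'F-350': 13,
               'F-450': 3, 'Expedition': 6, 'Explorer': 38, 'F-550': 1,
               'Fusion': 6, 'Escape': 61, 'EcoSport': 10}

total_sales = sum(model_count.values())

# Share of dealership sales for the F-150 alone.
f150_share = model_count['F-150'] / total_sales * 100

# Assumption: treat every 'F-<number>' model as part of the F-Series.
f_series_sales = sum(n for model, n in model_count.items() if model.startswith('F-'))
f_series_share = f_series_sales / total_sales * 100

print(f"Total sales: {total_sales}")
print(f"F-150 share: {f150_share:.2f}%")
print(f"F-Series share: {f_series_share:.2f}%")
```

Matching on the `F-` prefix excludes the Fusion, which starts with the same letter but is not a truck.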


'726,003'

In [35]:
print("Ford F-Series truck sales nationwide compared with the Ford dealership in Ames: \n")
print(f"The best-selling Ford vehicle at the Ames dealership is the F-150, and it accounts for {model_count['F-150'] / len(local_df) * 100:.2f}% of all sales at the dealership. ")


print(f"The Ford F-Series accounted for {national_data_dict['Ford F-Series'] / national_data_dict['All_Vehicle_Sale_Totals'] * 100:.2f}% of sales nationwide. ")

print("\n\nReview statistics for 50 recent tweets about the Ford F-150: \n")
print(f'% Positive Tweets: {pos_count / len(twitter_df) * 100:.2f}% ')
print(f'% Negative Tweets: {neg_count / len(twitter_df) * 100:.2f}% ')
print(f'% Neutral Tweets: {nt_count / len(twitter_df) * 100:.2f}% ')

Ford F-Series truck sales nationwide compared with the Ford dealership in Ames: 

The best-selling Ford vehicle at the Ames dealership is the F-150, and it accounts for 40.59% of all sales at the dealership. 
The Ford F-Series accounted for 12.03% of sales nationwide. 


Review statistics for 50 recent tweets about the Ford F-150: 

% Positive Tweets: 28.00% 
% Negative Tweets: 4.00% 
% Neutral Tweets: 68.00% 
