### Script Purpose

Find the top-5 car brands mentioned in the forum messages by counting how often each brand appears.

Each brand is counted at most once per post, no matter how many times that post mentions it.
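A minimal sketch of the once-per-post rule, using made-up posts and a small hypothetical brand set rather than the real data: turning each post's words into a set means a brand repeated inside one post still adds only 1 to its count.

```python
from collections import Counter

# Hypothetical toy data, not taken from the forum dataset
brands = {"bmw", "audi", "honda"}
posts = [
    "bmw bmw bmw is fun",  # bmw appears three times in a single post
    "audi or bmw",
    "i drive a honda",
]

counts = Counter()
for post in posts:
    # set() drops repeated words, so each brand counts once per post
    for word in set(post.lower().split()):
        if word in brands:
            counts[word] += 1

print(counts["bmw"], counts["audi"], counts["honda"])  # 2 1 1
```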

In [1]:
import pandas as pd
import datetime

In [2]:
comments = pd.read_csv('forum_comments_2022.csv')
comments.head()

Unnamed: 0,date,user_id,message
0,"November 30, 2021 10:59AM",stickguy,\nJust watched this. Seems like GTI is better ...
1,"November 30, 2021 11:09AM",explorerx4,"\nStickguy,Did you see the video of TFL testin..."
2,"November 30, 2021 11:14AM",oldfarmer50,\nqbrozen said: kyfdx said: qbrozen said:wonde...
3,"November 30, 2021 11:16AM",tjc78,"\nLooking at the Lexus deal, I wonder it I wou..."
4,"November 30, 2021 11:17AM",oldfarmer50,\nstickguy said:Beautiful example of one of th...


In [79]:
# Parse the date strings, then keep only the comments posted in 2022
comments['date'] = pd.to_datetime(comments['date'])

comments = comments[(comments['date'] >= pd.Timestamp(2022, 1, 1)) &
                    (comments['date'] < pd.Timestamp(2023, 1, 1))]

comments.head()
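The cell above lets pandas infer the date layout. A sketch of a more explicit variant, assuming every raw value looks like "November 30, 2021 10:59AM" (the second sample string below is hypothetical): passing a `format` string avoids inference, and `.dt.year` expresses the 2022 filter directly.

```python
import pandas as pd

# Sample strings in the same text format as the raw 'date' column
df = pd.DataFrame({"date": ["November 30, 2021 10:59AM",
                            "January 1, 2022 3:53AM"]})

# %B = full month name, %I = 12-hour clock, %p = AM/PM
df["date"] = pd.to_datetime(df["date"], format="%B %d, %Y %I:%M%p")

# Keep only rows from calendar year 2022
df_2022 = df[df["date"].dt.year == 2022]
print(df_2022)
```

If any row deviates from the assumed format, `pd.to_datetime` raises an error instead of silently guessing, which makes malformed dates easy to spot.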

Unnamed: 0,date,user_id,message
1380,2022-01-01 03:53:00,benjaminh,\nAlthough the acceleration times in most car ...
1381,2022-01-01 03:57:00,tjc78,\n\n@stickguy said:\nXM only comes on it if yo...
1382,2022-01-01 04:08:00,tjc78,\n\n@qbrozen said:\nI don’t have the room for ...
1383,2022-01-01 04:48:00,graphicguy,\nCongrats @stickguy ……very cool!
1384,2022-01-01 06:03:00,au1994,\nHappy New Year all!Congrats @stickguy. I rea...


In [84]:
len(comments)

13503

In [80]:
# Extract the message column
messages = comments['message'].tolist()
messages[:5]

['\nAlthough the acceleration times in most car magazines are somewhat bogus, they are still useful for comparison. Anyway, Car and Driver has just tested a Maverick 2.0 awd, and they got a 0-60 time of 5.9, which is not that far from a base BMW 330i at 5.6. Anyway, for the money the Maverick is a fast vehicle.https://www.caranddriver.com/reviews/a38516737/2022-ford-maverick-xlt-fx4-by-the-numbers/stickguy: Did you end up going for the 0% financing? If so, those payments of c. 750 a month are steep, but the equity will obviously build up very quickly. Even right now you could probably sell it for several thousand more than what you bought it for. The latest I heard on the chip shortage is that it will only slowly abate in 2022. According to one article I saw, things aren\'t likely to truly get back to "normal," whatever that is, probably until the second half of 2023. If true, this will likely mean that there won\'t be a sudden collapse of car prices. With all the lost production and p

In [3]:
brands = pd.read_csv('car_companies.csv')
brands.head()

Unnamed: 0,Make
0,SNVI
1,Zanella
2,Koller
3,Anasagasti
4,AutoLatina


In [43]:
# Extract the brand names
brands = brands['Make'].tolist()

# Convert to lowercase
brands = list(map(str.lower,brands))

brands[:5]

['snvi', 'zanella', 'koller', 'anasagasti', 'autolatina']

In [None]:
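# Start every brand at zero; a set makes the membership test fast
freqDict = dict.fromkeys(brands, 0)
brand_set = set(brands)

for message in messages:
    # Unique lowercase words of this post, so each brand counts once per post.
    # Note: '\n' is removed without inserting a space, so words on either
    # side of a line break are joined into one token.
    words = set(message.lower().replace('\n', '').split(" "))
    for w in words:
        if w in brand_set:
            freqDict[w] += 1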
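The loop above splits on single spaces only, so a token such as "bmw," or "honda." (punctuation attached) never matches, and multi-word or hyphenated names cannot match a single token. A possible variant, sketched with a hypothetical brand list and made-up posts, uses a regular expression anchored at word boundaries. It is not the method that produced the counts below, and switching to it would change those counts.

```python
import re
from collections import Counter

# Hypothetical brand list, including multi-word and hyphenated names
brands = ["bmw", "honda", "land rover", "mercedes-benz"]

# Longer names first, so a longer name wins over any shorter prefix.
# \b anchors each name at word boundaries: "bmw," matches, "bmwx" does not.
alternation = "|".join(re.escape(b) for b in sorted(brands, key=len, reverse=True))
pattern = re.compile(r"\b(" + alternation + r")\b")

posts = [
    "Test drove a BMW, then a Honda.",
    "The Land Rover and the Mercedes-Benz both impressed me. BMW too!",
]

counts = Counter()
for post in posts:
    # set() keeps the once-per-post rule
    for brand in set(pattern.findall(post.lower())):
        counts[brand] += 1

print(counts["bmw"], counts["honda"], counts["land rover"], counts["mercedes-benz"])  # 2 1 1 1
```

Word boundaries fix punctuation and multi-word names, but they do not solve the ambiguity discussed at the end of this notebook: a common word such as "think" still matches.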

In [82]:
# Keep only the brands mentioned at least once
newDict = {k: v for k, v in freqDict.items() if v > 0}
print(newDict)

{'steyr': 1, 'brooks': 3, 'dennis': 1, 'dynasty': 1, 'passport': 2, 'russell': 8, 'aero': 5, 'alpine': 6, 'bugatti': 3, 'peugeot': 8, 'renault': 1, 'alpina': 1, 'apollo': 1, 'audi': 190, 'bitter': 5, 'bmw': 369, 'fuso': 4, 'man': 64, 'mercedes-benz': 4, 'opel': 33, 'porsche': 33, 'smart': 50, 'volkswagen': 6, 'nag': 2, 'bet': 97, 'force': 17, 'tvs': 7, 'premier': 8, 'standard': 120, 'cts': 4, 'ducati': 3, 'ferrari': 11, 'fiat': 34, 'iso': 1, 'lamborghini': 1, 'maserati': 8, 'zagato': 3, 'bertone': 4, 'fca': 2, 'rapid': 3, 'acura': 156, 'daihatsu': 1, 'dome': 3, 'honda': 240, 'infiniti': 64, 'isuzu': 8, 'lexus': 69, 'mazda': 76, 'nissan': 131, 'subaru': 107, 'suzuki': 12, 'toyota': 279, 'datsun': 2, 'eunos': 1, 'stellantis': 11, 'buddy': 46, 'think': 1927, 'delta': 20, 'star': 16, 'yugo': 4, 'genesis': 27, 'hyundai': 134, 'kia': 81, 'daewoo': 3, 'samsung': 7, 'seat': 199, 'micro': 10, 'polestar': 46, 'saab': 12, 'martini': 1, 'ac': 43, 'bentley': 8, 'jaguar': 3, 'lotus': 4, 'mclaren': 2

In [85]:
sorted_brands = sorted(newDict.items(), key=lambda x:x[1], reverse=True)
sorted_dict = dict(sorted_brands)

print(sorted_dict)
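The filter and sort steps can also be expressed with `collections.Counter`, whose `most_common(n)` returns the `n` largest counts in descending order. A sketch using a few counts copied from the sorted output above:

```python
from collections import Counter

# A subset of the (brand, count) pairs from the sorted output above
freq = Counter({"think": 1927, "ford": 472, "local": 443, "bmw": 369,
                "toyota": 279, "jeep": 274, "white": 243, "honda": 240})

# The five highest counts, highest first
top = freq.most_common(5)
print(top)
# [('think', 1927), ('ford', 472), ('local', 443), ('bmw', 369), ('toyota', 279)]
```

In the notebook this would be `Counter(freqDict).most_common(5)`; as long as at least five brands have a nonzero count, the `v > 0` filter is not needed for a top-5 list.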

{'think': 1927, 'ford': 472, 'local': 443, 'bmw': 369, 'toyota': 279, 'jeep': 274, 'white': 243, 'honda': 240, 'seat': 199, 'audi': 190, 'tesla': 169, 'mini': 159, 'acura': 156, 'hyundai': 134, 'nissan': 131, 'standard': 120, 'subaru': 107, 'bet': 97, 'rivian': 83, 'kia': 81, 'gm': 79, 'ram': 78, 'pilot': 78, 'mazda': 76, 'lexus': 69, 'man': 64, 'infiniti': 64, 'chrysler': 55, 'smart': 50, 'buddy': 46, 'polestar': 46, 'national': 45, 'cadillac': 44, 'google': 44, 'ac': 43, 'lincoln': 39, 'dodge': 36, 'buick': 35, 'fiat': 34, 'opel': 33, 'porsche': 33, 'continental': 33, 'austin': 29, 'genesis': 27, 'saturn': 25, 'delta': 20, 'cutting': 20, 'pontiac': 19, 'force': 17, 'moon': 17, 'star': 16, 'king': 14, 'chevrolet': 13, 'oldsmobile': 13, 'suzuki': 12, 'saab': 12, 'ferrari': 11, 'stellantis': 11, 'rover': 11, 'micro': 10, 'cord': 10, 'russell': 8, 'peugeot': 8, 'premier': 8, 'maserati': 8, 'isuzu': 8, 'bentley': 8, 'tvs': 7, 'samsung': 7, 'gmc': 7, 'brush': 7, 'plymouth': 7, 'alpine': 6,

'think', 'local', and 'white' are common English words that people use in everyday sentences, so these matches are most likely ordinary words that were misclassified as car brands.

Additionally, "TH!NK" filed for bankruptcy in 2011, "LOCAL" shut down its factory at the beginning of 2022, and "WHITE" is a long-defunct brand from the 1980s era.

It is therefore reasonable to conclude that forum users were not referring to the brands "TH!NK", "LOCAL", or "WHITE" when they used these three words.

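So the actual top-5 brands should be <b>Ford, BMW, Toyota, Jeep, and Honda</b>. The next entry, 'seat' (199), is also a common word, but it ranks below Honda (240), so it does not change this list.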
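The exclusion step can be written down explicitly. A minimal sketch, using the top eight entries copied from the sorted output and a hand-picked exclusion set (an assumption based on the reasoning above, not a general stopword list):

```python
# Top eight (brand, count) pairs from the sorted output above
sorted_counts = [("think", 1927), ("ford", 472), ("local", 443), ("bmw", 369),
                 ("toyota", 279), ("jeep", 274), ("white", 243), ("honda", 240)]

# Words judged to be ordinary English rather than brand mentions
excluded = {"think", "local", "white"}

# Drop excluded words, keeping the original descending order, then take five
top5 = [brand for brand, count in sorted_counts if brand not in excluded][:5]
print(top5)  # ['ford', 'bmw', 'toyota', 'jeep', 'honda']
```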