# C-More

### Aggregate data with main.py

In [1]:
import pymongo

In [2]:
client = pymongo.MongoClient('mongodb://localhost:27017/')

In [3]:
db = client['rep_analysis_main']

In [4]:
db.list_collection_names()

['data_test',
 'kw_freq_weight_monthly_test',
 'kw_freq_weight_test',
 'kw_freq_weight_annually_test',
 'client_info',
 'sentiment_monthly_test',
 'kw_freq_weight_daily_test',
 'sentiment_annually_test',
 'sentiment_weekly_test',
 'data_twitter',
 'sentiment_test',
 'sentiment_daily_test',
 'kw_freq_weight_weekly_test']

We will start by running our sentiment_analysis.py script directly in order to **aggregate the sentiment analysis results**.

#### 1. Aggregate sentiment analysis results per day

We will use the sentiment_test collection as our source of data.

After **running the sentiment_analysis.py script** for the **daily aggregation** ("2022-09-29"), we get:

In [5]:
db.list_collection_names()

['data_test',
 'kw_freq_weight_monthly_test',
 'kw_freq_weight_test',
 'kw_freq_weight_annually_test',
 'client_info',
 'sentiment_monthly_test',
 'kw_freq_weight_daily_test',
 'sentiment_annually_test',
 'sentiment_weekly_test',
 'data_twitter',
 'sentiment_test',
 'sentiment_daily_main',
 'sentiment_daily_test',
 'kw_freq_weight_weekly_test']

We have a new collection, **sentiment_daily_main**.

In [8]:
sentiment_daily_main = db['sentiment_daily_main']

In [9]:
for doc in sentiment_daily_main.find():
    print(doc)

{'_id': ObjectId('6353cf57deb6e8aaa03d2042'), 'extracted_at': '2022-09-29', 'year': '2022', 'month': '09', 'week_of_year': '39', 'positive_count': 30, 'negative_count': 20, 'neutral_count': 50}


In [12]:
sentiment_daily_test = db['sentiment_daily_test']

In [13]:
for doc in sentiment_daily_test.find({"extracted_at": {"$eq": "2022-09-29"}}):
    print(doc)

{'_id': ObjectId('635144f3bbc8183bba8b8a1f'), 'extracted_at': '2022-09-29', 'year': '2022', 'month': '09', 'week_of_year': '39', 'positive_count': 30, 'negative_count': 20, 'neutral_count': 50}


Apart from the _id, the document is identical to the one in our test collection.
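The comparison above can also be done programmatically. The following is a minimal sketch, assuming that only the _id field is expected to differ between the two collections; without_id is a hypothetical helper, and the ObjectIds are written as plain strings so that the snippet runs without a database connection.

```python
def without_id(doc):
    """Return a copy of a MongoDB document without its '_id' field."""
    return {key: value for key, value in doc.items() if key != "_id"}


# Values copied from the two outputs above (ObjectIds replaced by plain strings).
main_doc = {
    "_id": "6353cf57deb6e8aaa03d2042",
    "extracted_at": "2022-09-29", "year": "2022", "month": "09",
    "week_of_year": "39",
    "positive_count": 30, "negative_count": 20, "neutral_count": 50,
}
test_doc = {
    "_id": "635144f3bbc8183bba8b8a1f",
    "extracted_at": "2022-09-29", "year": "2022", "month": "09",
    "week_of_year": "39",
    "positive_count": 30, "negative_count": 20, "neutral_count": 50,
}

# The raw documents differ (different _id), the payloads do not.
same_payload = without_id(main_doc) == without_id(test_doc)
print(same_payload)
```

With a live connection, the two dictionaries would come from sentiment_daily_main.find_one(...) and sentiment_daily_test.find_one(...) using the same extracted_at filter.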

After running the script for all the days we have available, we get the following result:

In [19]:
for doc in sentiment_daily_main.find():
    print(doc)

{'_id': ObjectId('6353d5cac877f987b8c9adab'), 'extracted_at': '2022-09-29', 'year': '2022', 'month': '09', 'week_of_year': '39', 'positive_count': 30, 'negative_count': 20, 'neutral_count': 50}
{'_id': ObjectId('6353d5d6dfb5241a922d9301'), 'extracted_at': '2022-09-30', 'year': '2022', 'month': '09', 'week_of_year': '39', 'positive_count': 43, 'negative_count': 21, 'neutral_count': 36}
{'_id': ObjectId('6353d60ab6d703c41b363abd'), 'extracted_at': '2022-10-01', 'year': '2022', 'month': '10', 'week_of_year': '39', 'positive_count': 32, 'negative_count': 25, 'neutral_count': 43}
{'_id': ObjectId('6353d612d63257cf6cc24191'), 'extracted_at': '2022-10-02', 'year': '2022', 'month': '10', 'week_of_year': '40', 'positive_count': 44, 'negative_count': 18, 'neutral_count': 38}
{'_id': ObjectId('6353d61b3fb72a4f4619b4b4'), 'extracted_at': '2022-10-03', 'year': '2022', 'month': '10', 'week_of_year': '40', 'positive_count': 37, 'negative_count': 19, 'neutral_count': 44}
{'_id': ObjectId('6353d62387da
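In the output above, week_of_year changes from 39 to 40 between 2022-10-01 and 2022-10-02, which is a Sunday. This is what Python's %U format code produces (weeks starting on Sunday), while the ISO calendar still places 2022-10-02 in week 39. Whether the script actually uses %U is an assumption; week_of_year below is a hypothetical helper that only reproduces the values shown.

```python
from datetime import date


def week_of_year(day):
    """Week number as a zero-padded string, with weeks starting on Sunday (%U)."""
    return day.strftime("%U")


for day in (date(2022, 9, 29), date(2022, 10, 1), date(2022, 10, 2)):
    # Print the Sunday-based week next to the ISO week for comparison.
    print(day.isoformat(), week_of_year(day), day.isocalendar()[1])
```

This matters for the weekly aggregation: with Sunday-based weeks, week 39 of our data contains three days (2022-09-29 to 2022-10-01).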

#### 2. Aggregate sentiment analysis results per week

After **running the sentiment_analysis.py script** for the **weekly aggregation ("39")**, we get:

In [20]:
db.list_collection_names()

['data_test',
 'kw_freq_weight_monthly_test',
 'kw_freq_weight_test',
 'kw_freq_weight_annually_test',
 'client_info',
 'sentiment_monthly_test',
 'kw_freq_weight_daily_test',
 'sentiment_annually_test',
 'sentiment_weekly_test',
 'sentiment_weekly_main',
 'data_twitter',
 'sentiment_test',
 'sentiment_daily_test',
 'sentiment_daily_main',
 'kw_freq_weight_weekly_test']

We have a new collection, **sentiment_weekly_main**.

In [21]:
sentiment_weekly_main = db['sentiment_weekly_main']

In [22]:
for doc in sentiment_weekly_main.find():
    print(doc)

{'_id': {'year_week': ['2022', '39']}, 'total_positive_count': 105, 'total_negative_count': 66, 'total_neutral_count': 129, 'total_number_of_days': 3}


In [23]:
sentiment_weekly_test = db['sentiment_weekly_test']

In [25]:
for doc in sentiment_weekly_test.find():
    print(doc)

{'_id': {'year_week': ['2022', '39']}, 'total_positive_count': 105, 'total_negative_count': 66, 'total_neutral_count': 129, 'total_number_of_days': 3}
{'_id': {'year_week': ['2022', '40']}, 'total_positive_count': 285, 'total_negative_count': 129, 'total_neutral_count': 286, 'total_number_of_days': 7}
{'_id': {'year_week': ['2022', '41']}, 'total_positive_count': 279, 'total_negative_count': 140, 'total_neutral_count': 281, 'total_number_of_days': 7}
{'_id': {'year_week': ['2022', '42']}, 'total_positive_count': 159, 'total_negative_count': 70, 'total_neutral_count': 142, 'total_number_of_days': 4}


The week 39 document is identical to the week 39 document in our test collection.
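The week 39 totals can be reproduced by hand from the three daily documents of that week (2022-09-29, 2022-09-30 and 2022-10-01). The sketch below does this in plain Python, assuming the weekly aggregation simply sums the daily counts per (year, week_of_year); aggregate_weekly is a hypothetical helper, not the code of sentiment_analysis.py.

```python
# Daily counts copied from the sentiment_daily_main output (week 39 only).
daily_docs = [
    {"extracted_at": "2022-09-29", "year": "2022", "week_of_year": "39",
     "positive_count": 30, "negative_count": 20, "neutral_count": 50},
    {"extracted_at": "2022-09-30", "year": "2022", "week_of_year": "39",
     "positive_count": 43, "negative_count": 21, "neutral_count": 36},
    {"extracted_at": "2022-10-01", "year": "2022", "week_of_year": "39",
     "positive_count": 32, "negative_count": 25, "neutral_count": 43},
]


def aggregate_weekly(docs):
    """Sum the daily counts per (year, week) and count the days in each week."""
    totals = {}
    for doc in docs:
        key = (doc["year"], doc["week_of_year"])
        week = totals.setdefault(key, {
            "total_positive_count": 0,
            "total_negative_count": 0,
            "total_neutral_count": 0,
            "total_number_of_days": 0,
        })
        week["total_positive_count"] += doc["positive_count"]
        week["total_negative_count"] += doc["negative_count"]
        week["total_neutral_count"] += doc["neutral_count"]
        week["total_number_of_days"] += 1
    # Shape the result like the documents stored in the weekly collections.
    return [
        {"_id": {"year_week": [year, week]}, **counts}
        for (year, week), counts in sorted(totals.items())
    ]


weekly_docs = aggregate_weekly(daily_docs)
print(weekly_docs)
```

The result has the same fields and values as the week 39 document stored in sentiment_weekly_main.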

For all the available weeks, we get:

In [26]:
for doc in sentiment_weekly_main.find():
    print(doc)

{'_id': {'year_week': ['2022', '39']}, 'total_positive_count': 105, 'total_negative_count': 66, 'total_neutral_count': 129, 'total_number_of_days': 3}
{'_id': {'year_week': ['2022', '40']}, 'total_positive_count': 285, 'total_negative_count': 129, 'total_neutral_count': 286, 'total_number_of_days': 7}
{'_id': {'year_week': ['2022', '41']}, 'total_positive_count': 279, 'total_negative_count': 140, 'total_neutral_count': 281, 'total_number_of_days': 7}
{'_id': {'year_week': ['2022', '42']}, 'total_positive_count': 159, 'total_negative_count': 70, 'total_neutral_count': 142, 'total_number_of_days': 4}


#### 3. Aggregate sentiment analysis results per month

After **running the sentiment_analysis.py script** for the **monthly aggregation ("09")**, we get:

In [27]:
db.list_collection_names()

['sentiment_monthly_main',
 'data_test',
 'kw_freq_weight_monthly_test',
 'kw_freq_weight_test',
 'kw_freq_weight_annually_test',
 'client_info',
 'sentiment_monthly_test',
 'kw_freq_weight_daily_test',
 'sentiment_annually_test',
 'sentiment_weekly_test',
 'sentiment_weekly_main',
 'data_twitter',
 'sentiment_test',
 'sentiment_daily_test',
 'sentiment_daily_main',
 'kw_freq_weight_weekly_test']

We have a new collection, **sentiment_monthly_main**.

In [28]:
sentiment_monthly_main = db['sentiment_monthly_main']

In [29]:
for doc in sentiment_monthly_main.find():
    print(doc)

{'_id': {'year_month': ['2022', '09']}, 'total_positive_count': 73, 'total_negative_count': 41, 'total_neutral_count': 86, 'total_number_of_days': 2, 'year': '2022'}


In [30]:
sentiment_monthly_test = db['sentiment_monthly_test']

In [31]:
for doc in sentiment_monthly_test.find():
    print(doc)

{'_id': {'year_month': ['2022', '09']}, 'total_positive_count': 73, 'total_negative_count': 41, 'total_neutral_count': 86, 'total_number_of_days': 2, 'year': '2022'}
{'_id': {'year_month': ['2022', '10']}, 'total_positive_count': 755, 'total_negative_count': 364, 'total_neutral_count': 752, 'total_number_of_days': 19, 'year': '2022'}


The month 09 document is identical to the month 09 document in our test collection.
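For reference, a grouping of this shape can be expressed as a MongoDB aggregation pipeline. The following is only a sketch: it assumes the monthly totals are computed from the daily documents (the total_number_of_days field suggests this), and monthly_pipeline is a hypothetical helper, not the pipeline used in sentiment_analysis.py. The snippet only builds the pipeline, so it runs without a database.

```python
def monthly_pipeline(year, month):
    """Build an aggregation pipeline that groups daily documents by month."""
    return [
        # Keep only the daily documents of the requested month.
        {"$match": {"year": year, "month": month}},
        # One output document per (year, month), with summed counts.
        {"$group": {
            "_id": {"year_month": ["$year", "$month"]},
            "total_positive_count": {"$sum": "$positive_count"},
            "total_negative_count": {"$sum": "$negative_count"},
            "total_neutral_count": {"$sum": "$neutral_count"},
            # Each daily document contributes 1 to the day count.
            "total_number_of_days": {"$sum": 1},
            "year": {"$first": "$year"},
        }},
    ]


pipeline = monthly_pipeline("2022", "09")
print(pipeline)
```

With a live connection, the pipeline could be run with db['sentiment_daily_main'].aggregate(pipeline); choosing sentiment_daily_main as the source collection is also an assumption.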

For all the available months, we get:

In [32]:
for doc in sentiment_monthly_main.find():
    print(doc)

{'_id': {'year_month': ['2022', '09']}, 'total_positive_count': 73, 'total_negative_count': 41, 'total_neutral_count': 86, 'total_number_of_days': 2, 'year': '2022'}
{'_id': {'year_month': ['2022', '10']}, 'total_positive_count': 755, 'total_negative_count': 364, 'total_neutral_count': 752, 'total_number_of_days': 19, 'year': '2022'}


#### 4. Aggregate sentiment analysis results per year

Note: So far we have used both "yearly" and "annually" for this aggregation. From now on, we will only use "yearly". The existing test collections keep their "annually" names.

After **running the sentiment_analysis.py script** for the **yearly aggregation ("2022")**, we get:

In [33]:
db.list_collection_names()

['sentiment_monthly_main',
 'data_test',
 'kw_freq_weight_monthly_test',
 'kw_freq_weight_test',
 'kw_freq_weight_annually_test',
 'client_info',
 'sentiment_monthly_test',
 'kw_freq_weight_daily_test',
 'sentiment_annually_test',
 'sentiment_weekly_test',
 'sentiment_weekly_main',
 'data_twitter',
 'sentiment_test',
 'sentiment_daily_test',
 'sentiment_daily_main',
 'kw_freq_weight_weekly_test',
 'sentiment_yearly_main']

We have a new collection, **sentiment_yearly_main**.

In [34]:
sentiment_yearly_main = db['sentiment_yearly_main']

In [35]:
for doc in sentiment_yearly_main.find():
    print(doc)

{'_id': {'year': '2022'}, 'total_positive_count': 828, 'total_negative_count': 405, 'total_neutral_count': 838, 'total_number_of_months': 2}


In [36]:
sentiment_annually_test = db['sentiment_annually_test']

In [37]:
for doc in sentiment_annually_test.find():
    print(doc)

{'_id': {'year': '2022'}, 'total_positive_count': 828, 'total_negative_count': 405, 'total_neutral_count': 838, 'total_number_of_months': 2}


The result we get is consistent with our test collection.
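As an additional sanity check, the yearly totals should equal the sum of the weekly totals and the sum of the monthly totals, since all three cover the same days. The sketch below verifies this with the numbers printed above; column_sums is a hypothetical helper.

```python
# (positive, negative, neutral) totals copied from the outputs above.
weekly_totals = [
    (105, 66, 129),   # week 39
    (285, 129, 286),  # week 40
    (279, 140, 281),  # week 41
    (159, 70, 142),   # week 42
]
monthly_totals = [
    (73, 41, 86),     # month 09
    (755, 364, 752),  # month 10
]
yearly_totals = (828, 405, 838)  # year 2022


def column_sums(rows):
    """Sum each column (positive, negative, neutral) over all rows."""
    return tuple(sum(column) for column in zip(*rows))


print(column_sums(weekly_totals) == yearly_totals)
print(column_sums(monthly_totals) == yearly_totals)
```

Both comparisons hold for the data shown in this notebook.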

We will now run our kw_extraction.py script directly in order to **aggregate the keyword extraction results**.

#### 5. Aggregate keyword extraction results per day

We will use the kw_freq_weight_test collection as our source of data.

After **running the kw_extraction.py script** for the **daily aggregation** ("2022-09-29"), we get:

In [38]:
db.list_collection_names()

['sentiment_monthly_main',
 'data_test',
 'kw_freq_weight_monthly_test',
 'kw_freq_weight_test',
 'kw_freq_weight_annually_test',
 'client_info',
 'sentiment_monthly_test',
 'kw_daily_main',
 'kw_freq_weight_daily_test',
 'sentiment_annually_test',
 'sentiment_weekly_test',
 'sentiment_weekly_main',
 'data_twitter',
 'sentiment_test',
 'sentiment_daily_test',
 'sentiment_daily_main',
 'kw_freq_weight_weekly_test',
 'sentiment_yearly_main']

We have a new collection, **kw_daily_main**.

In [39]:
kw_daily_main = db['kw_daily_main']

In [40]:
for doc in kw_daily_main.find():
    print(doc)

{'_id': ObjectId('6357bd906c7ea2f2057a1baf'), 'extracted_at': '2022-09-29', 'year': '2022', 'month': '09', 'week_of_year': '39', 'kw_weights': {'burger': 1.5335735314067886, 'burger king': 1.0666416760494113, 'fuck burger king': 0.9999999999999998, 'fries': 0.8823000626564113, 'donut lemon burger': 0.8245616022774844, 'mcdonald’s&gt;&gt;&gt;burger king': 0.7432429121622104, 'french fries': 0.5915399806791025, 'curly fries': 0.4999999999999999, 'charge burger': 0.38047270961771007, 'warren burger': 0.3346138658463873, 'chili cheese burger': 0.3230473423692231, 'flavor fries': 0.3091413057713317, 'dairy queen merger': 0.30513293012948206, 'skanky burger': 0.27287617250061713, 'crazy day': 0.26944415125775256, 'fast food restaurants': 0.26316233918467186, 'wet naps': 0.2614568690422643, 'burger king bruh': 0.24102348054102785, 'local street food': 0.23850857463973751, 'smn burger': 0.22712382749938284, 'sad girl era': 0.2223768923052492, 'money': 0.2167107161732355, 'american burger chees

In [41]:
kw_daily_test = db['kw_freq_weight_daily_test']

In [42]:
for doc in kw_daily_test.find({"extracted_at": {"$eq": "2022-09-29"}}):
    print(doc)

{'_id': ObjectId('63526555bbc8183bba8b8a34'), 'extracted_at': '2022-09-29', 'year': '2022', 'month': '09', 'week_of_year': '39', 'kw_weights': {'burger': 1.5335735314067886, 'burger king': 1.0666416760494113, 'fuck burger king': 0.9999999999999998, 'fries': 0.8823000626564113, 'donut lemon burger': 0.8245616022774844, 'mcdonald’s&gt;&gt;&gt;burger king': 0.7432429121622104, 'french fries': 0.5915399806791025, 'curly fries': 0.4999999999999999, 'charge burger': 0.38047270961771007, 'warren burger': 0.3346138658463873, 'chili cheese burger': 0.3230473423692231, 'flavor fries': 0.3091413057713317, 'dairy queen merger': 0.30513293012948206, 'skanky burger': 0.27287617250061713, 'crazy day': 0.26944415125775256, 'fast food restaurants': 0.26316233918467186, 'wet naps': 0.2614568690422643, 'burger king bruh': 0.24102348054102785, 'local street food': 0.23850857463973751, 'smn burger': 0.22712382749938284, 'sad girl era': 0.2223768923052492, 'money': 0.2167107161732355, 'american burger chees

The result we get is consistent with our test collection.
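Because the printed kw_weights dictionaries are long and truncated in the output, comparing them by eye is error-prone. A minimal sketch of a programmatic comparison follows, assuming that two documents match when they contain the same keywords with (nearly) equal weights; weights_match is a hypothetical helper, and only the first three keywords of the output are used as sample data.

```python
import math


def weights_match(first, second, rel_tol=1e-9):
    """True if both dicts have the same keywords and nearly equal weights."""
    if first.keys() != second.keys():
        return False
    return all(
        math.isclose(first[kw], second[kw], rel_tol=rel_tol) for kw in first
    )


# First three entries of kw_weights for 2022-09-29, copied from the outputs above.
main_weights = {
    "burger": 1.5335735314067886,
    "burger king": 1.0666416760494113,
    "fries": 0.8823000626564113,
}
test_weights = dict(main_weights)

print(weights_match(main_weights, test_weights))
```

With a live connection, the two dictionaries would be the kw_weights fields of the documents returned by kw_daily_main and kw_daily_test for the same extracted_at value.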

After running the script for all the days we have available, we get the following result:

In [43]:
for doc in kw_daily_main.find():
    print(doc)

{'_id': ObjectId('6357bd906c7ea2f2057a1baf'), 'extracted_at': '2022-09-29', 'year': '2022', 'month': '09', 'week_of_year': '39', 'kw_weights': {'burger': 1.5335735314067886, 'burger king': 1.0666416760494113, 'fuck burger king': 0.9999999999999998, 'fries': 0.8823000626564113, 'donut lemon burger': 0.8245616022774844, 'mcdonald’s&gt;&gt;&gt;burger king': 0.7432429121622104, 'french fries': 0.5915399806791025, 'curly fries': 0.4999999999999999, 'charge burger': 0.38047270961771007, 'warren burger': 0.3346138658463873, 'chili cheese burger': 0.3230473423692231, 'flavor fries': 0.3091413057713317, 'dairy queen merger': 0.30513293012948206, 'skanky burger': 0.27287617250061713, 'crazy day': 0.26944415125775256, 'fast food restaurants': 0.26316233918467186, 'wet naps': 0.2614568690422643, 'burger king bruh': 0.24102348054102785, 'local street food': 0.23850857463973751, 'smn burger': 0.22712382749938284, 'sad girl era': 0.2223768923052492, 'money': 0.2167107161732355, 'american burger chees

In [44]:
kw_daily_main.count_documents({})

21

#### 6. Aggregate keyword extraction results per week

After **running the kw_extraction.py script** for the **weekly aggregation ("39")**, we get:

In [51]:
db.list_collection_names()

['sentiment_monthly_main',
 'data_test',
 'kw_freq_weight_monthly_test',
 'kw_freq_weight_test',
 'kw_freq_weight_annually_test',
 'client_info',
 'sentiment_monthly_test',
 'kw_daily_main',
 'kw_freq_weight_daily_test',
 'sentiment_annually_test',
 'sentiment_weekly_test',
 'sentiment_weekly_main',
 'data_twitter',
 'sentiment_test',
 'sentiment_daily_test',
 'sentiment_daily_main',
 'kw_freq_weight_weekly_test',
 'kw_weekly_main',
 'sentiment_yearly_main']

We have a new collection, **kw_weekly_main**.

In [52]:
kw_weekly_main = db['kw_weekly_main']

In [53]:
for doc in kw_weekly_main.find():
    print(doc)

{'_id': {'year_week': ['2022', '39']}, 'kw_weights': {'burger': 0.5353516220953951, 'burger king': 1.0666416760494113, 'fuck burger king': 0.9999999999999998, 'fries': 1.2572951951631912, 'donut lemon burger': 0.8245616022774844, 'mcdonald’s&gt;&gt;&gt;burger king': 0.7432429121622104, 'french fries': 0.7215137172114583, 'curly fries': 0.4999999999999999, 'charge burger': 0.38047270961771007, 'warren burger': 0.3346138658463873, 'chili cheese burger': 0.3230473423692231, 'flavor fries': 0.3091413057713317, 'dairy queen merger': 0.30513293012948206, 'skanky burger': 0.27287617250061713, 'crazy day': 0.26944415125775256, 'fast food restaurants': 0.26316233918467186, 'wet naps': 0.2614568690422643, 'burger king bruh': 0.24102348054102785, 'local street food': 0.23850857463973751, 'smn burger': 0.22712382749938284, 'sad girl era': 0.2223768923052492, 'money': 0.2167107161732355, 'american burger cheese': 0.20980830215244695, 'burger ya kfc ke buns': 0.20867010177093204, 'burger ka sakin': 

In [54]:
kw_weekly_test = db['kw_freq_weight_weekly_test']

In [55]:
for doc in kw_weekly_test.find():
    print(doc)

{'_id': {'year_week': ['2022', '39']}, 'kw_weights': {'burger': 0.5353516220953951, 'burger king': 1.0666416760494113, 'fuck burger king': 0.9999999999999998, 'fries': 1.2572951951631912, 'donut lemon burger': 0.8245616022774844, 'mcdonald’s&gt;&gt;&gt;burger king': 0.7432429121622104, 'french fries': 0.7215137172114583, 'curly fries': 0.4999999999999999, 'charge burger': 0.38047270961771007, 'warren burger': 0.3346138658463873, 'chili cheese burger': 0.3230473423692231, 'flavor fries': 0.3091413057713317, 'dairy queen merger': 0.30513293012948206, 'skanky burger': 0.27287617250061713, 'crazy day': 0.26944415125775256, 'fast food restaurants': 0.26316233918467186, 'wet naps': 0.2614568690422643, 'burger king bruh': 0.24102348054102785, 'local street food': 0.23850857463973751, 'smn burger': 0.22712382749938284, 'sad girl era': 0.2223768923052492, 'money': 0.2167107161732355, 'american burger cheese': 0.20980830215244695, 'burger ya kfc ke buns': 0.20867010177093204, 'burger ka sakin': 

The result we get is consistent with our test collection.

For all the available weeks, we get:

In [56]:
for doc in kw_weekly_main.find():
    print(doc)

{'_id': {'year_week': ['2022', '39']}, 'kw_weights': {'burger': 0.5353516220953951, 'burger king': 1.0666416760494113, 'fuck burger king': 0.9999999999999998, 'fries': 1.2572951951631912, 'donut lemon burger': 0.8245616022774844, 'mcdonald’s&gt;&gt;&gt;burger king': 0.7432429121622104, 'french fries': 0.7215137172114583, 'curly fries': 0.4999999999999999, 'charge burger': 0.38047270961771007, 'warren burger': 0.3346138658463873, 'chili cheese burger': 0.3230473423692231, 'flavor fries': 0.3091413057713317, 'dairy queen merger': 0.30513293012948206, 'skanky burger': 0.27287617250061713, 'crazy day': 0.26944415125775256, 'fast food restaurants': 0.26316233918467186, 'wet naps': 0.2614568690422643, 'burger king bruh': 0.24102348054102785, 'local street food': 0.23850857463973751, 'smn burger': 0.22712382749938284, 'sad girl era': 0.2223768923052492, 'money': 0.2167107161732355, 'american burger cheese': 0.20980830215244695, 'burger ya kfc ke buns': 0.20867010177093204, 'burger ka sakin': 

In [57]:
kw_weekly_main.count_documents({})

4

#### 7. Aggregate keyword extraction results per month

After **running the kw_extraction.py script** for the **monthly aggregation ("09")**, we get:

In [58]:
db.list_collection_names()

['sentiment_monthly_main',
 'data_test',
 'kw_freq_weight_monthly_test',
 'kw_freq_weight_test',
 'kw_freq_weight_annually_test',
 'client_info',
 'sentiment_monthly_test',
 'kw_daily_main',
 'kw_freq_weight_daily_test',
 'sentiment_annually_test',
 'sentiment_weekly_test',
 'sentiment_weekly_main',
 'data_twitter',
 'sentiment_test',
 'sentiment_daily_test',
 'sentiment_daily_main',
 'kw_freq_weight_weekly_test',
 'kw_weekly_main',
 'sentiment_yearly_main',
 'kw_monthly_main']

We have a new collection, **kw_monthly_main**.

In [59]:
kw_monthly_main = db['kw_monthly_main']

In [60]:
for doc in kw_monthly_main.find():
    print(doc)

{'_id': {'year_month': ['2022', '09']}, 'kw_weights': {'burger': 0.7907877685576852, 'burger king': 1.0666416760494113, 'fuck burger king': 0.9999999999999998, 'fries': 1.2572951951631912, 'donut lemon burger': 0.8245616022774844, 'mcdonald’s&gt;&gt;&gt;burger king': 0.7432429121622104, 'french fries': 0.7215137172114583, 'curly fries': 0.4999999999999999, 'charge burger': 0.38047270961771007, 'warren burger': 0.3346138658463873, 'chili cheese burger': 0.3230473423692231, 'flavor fries': 0.3091413057713317, 'dairy queen merger': 0.30513293012948206, 'skanky burger': 0.27287617250061713, 'crazy day': 0.26944415125775256, 'fast food restaurants': 0.26316233918467186, 'wet naps': 0.2614568690422643, 'burger king bruh': 0.24102348054102785, 'local street food': 0.23850857463973751, 'smn burger': 0.22712382749938284, 'sad girl era': 0.2223768923052492, 'money': 0.2167107161732355, 'american burger cheese': 0.20980830215244695, 'burger ya kfc ke buns': 0.20867010177093204, 'burger ka sakin':

In [61]:
kw_monthly_test = db['kw_freq_weight_monthly_test']

In [62]:
for doc in kw_monthly_test.find():
    print(doc)

{'_id': {'year_month': ['2022', '09']}, 'kw_weights': {'burger': 0.7907877685576852, 'burger king': 1.0666416760494113, 'fuck burger king': 0.9999999999999998, 'fries': 1.2572951951631912, 'donut lemon burger': 0.8245616022774844, 'mcdonald’s&gt;&gt;&gt;burger king': 0.7432429121622104, 'french fries': 0.7215137172114583, 'curly fries': 0.4999999999999999, 'charge burger': 0.38047270961771007, 'warren burger': 0.3346138658463873, 'chili cheese burger': 0.3230473423692231, 'flavor fries': 0.3091413057713317, 'dairy queen merger': 0.30513293012948206, 'skanky burger': 0.27287617250061713, 'crazy day': 0.26944415125775256, 'fast food restaurants': 0.26316233918467186, 'wet naps': 0.2614568690422643, 'burger king bruh': 0.24102348054102785, 'local street food': 0.23850857463973751, 'smn burger': 0.22712382749938284, 'sad girl era': 0.2223768923052492, 'money': 0.2167107161732355, 'american burger cheese': 0.20980830215244695, 'burger ya kfc ke buns': 0.20867010177093204, 'burger ka sakin':

The result we get is consistent with our test collection.

For all the available months, we get:

In [63]:
for doc in kw_monthly_main.find():
    print(doc)

{'_id': {'year_month': ['2022', '09']}, 'kw_weights': {'burger': 0.7907877685576852, 'burger king': 1.0666416760494113, 'fuck burger king': 0.9999999999999998, 'fries': 1.2572951951631912, 'donut lemon burger': 0.8245616022774844, 'mcdonald’s&gt;&gt;&gt;burger king': 0.7432429121622104, 'french fries': 0.7215137172114583, 'curly fries': 0.4999999999999999, 'charge burger': 0.38047270961771007, 'warren burger': 0.3346138658463873, 'chili cheese burger': 0.3230473423692231, 'flavor fries': 0.3091413057713317, 'dairy queen merger': 0.30513293012948206, 'skanky burger': 0.27287617250061713, 'crazy day': 0.26944415125775256, 'fast food restaurants': 0.26316233918467186, 'wet naps': 0.2614568690422643, 'burger king bruh': 0.24102348054102785, 'local street food': 0.23850857463973751, 'smn burger': 0.22712382749938284, 'sad girl era': 0.2223768923052492, 'money': 0.2167107161732355, 'american burger cheese': 0.20980830215244695, 'burger ya kfc ke buns': 0.20867010177093204, 'burger ka sakin':

In [64]:
kw_monthly_main.count_documents({})

2

#### 8. Aggregate keyword extraction results per year

After **running the kw_extraction.py script** for the **yearly aggregation ("2022")**, we get:

In [65]:
db.list_collection_names()

['sentiment_monthly_main',
 'data_test',
 'kw_freq_weight_monthly_test',
 'kw_freq_weight_test',
 'kw_freq_weight_annually_test',
 'client_info',
 'sentiment_monthly_test',
 'kw_daily_main',
 'kw_freq_weight_daily_test',
 'sentiment_annually_test',
 'sentiment_weekly_test',
 'sentiment_weekly_main',
 'data_twitter',
 'sentiment_test',
 'kw_yearly_main',
 'sentiment_daily_test',
 'sentiment_daily_main',
 'kw_freq_weight_weekly_test',
 'kw_weekly_main',
 'sentiment_yearly_main',
 'kw_monthly_main']

We have a new collection, **kw_yearly_main**.

In [66]:
kw_yearly_main = db['kw_yearly_main']

In [67]:
for doc in kw_yearly_main.find():
    print(doc)

{'_id': {'year': '2022'}, 'kw_weights': {'burger': 0.7385264739990225, 'burger king': 0.9617323402823511, 'fuck burger king': 0.9999999999999998, 'fries': 0.9795852387999127, 'donut lemon burger': 0.8245616022774844, 'mcdonald’s&gt;&gt;&gt;burger king': 0.7432429121622104, 'french fries': 1.1037716798194288, 'curly fries': 0.27585116572656965, 'charge burger': 0.38047270961771007, 'warren burger': 0.3346138658463873, 'chili cheese burger': 0.3230473423692231, 'flavor fries': 0.3091413057713317, 'dairy queen merger': 0.30513293012948206, 'skanky burger': 0.27287617250061713, 'crazy day': 0.26944415125775256, 'fast food restaurants': 0.26316233918467186, 'wet naps': 0.2614568690422643, 'burger king bruh': 0.24102348054102785, 'local street food': 0.23850857463973751, 'smn burger': 0.22712382749938284, 'sad girl era': 0.2223768923052492, 'money': 0.12798420473033972, 'american burger cheese': 0.20980830215244695, 'burger ya kfc ke buns': 0.20867010177093204, 'burger ka sakin': 0.207609429

In [68]:
kw_yearly_test = db['kw_freq_weight_annually_test']

In [69]:
for doc in kw_yearly_test.find():
    print(doc)

{'_id': {'year': '2022'}, 'kw_weights': {'burger': 0.7385264739990225, 'burger king': 0.9617323402823511, 'fuck burger king': 0.9999999999999998, 'fries': 0.9795852387999127, 'donut lemon burger': 0.8245616022774844, 'mcdonald’s&gt;&gt;&gt;burger king': 0.7432429121622104, 'french fries': 1.1037716798194288, 'curly fries': 0.27585116572656965, 'charge burger': 0.38047270961771007, 'warren burger': 0.3346138658463873, 'chili cheese burger': 0.3230473423692231, 'flavor fries': 0.3091413057713317, 'dairy queen merger': 0.30513293012948206, 'skanky burger': 0.27287617250061713, 'crazy day': 0.26944415125775256, 'fast food restaurants': 0.26316233918467186, 'wet naps': 0.2614568690422643, 'burger king bruh': 0.24102348054102785, 'local street food': 0.23850857463973751, 'smn burger': 0.22712382749938284, 'sad girl era': 0.2223768923052492, 'money': 0.12798420473033972, 'american burger cheese': 0.20980830215244695, 'burger ya kfc ke buns': 0.20867010177093204, 'burger ka sakin': 0.207609429

The result we get is consistent with our test collection.
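All eight comparisons in this notebook follow the same pattern, so they could be automated once the main collections are fully populated. The sketch below is one possible way to do it, under these assumptions: an _id that is an ObjectId is ignored, an _id that is a grouping key (a dict) is kept, and floats are compared exactly. COLLECTION_PAIRS uses the collection names from this notebook; canonical and mismatched_pairs are hypothetical helpers. The demo uses in-memory data so that it runs without a database.

```python
import json

# Main collection -> test collection, as named in this notebook.
COLLECTION_PAIRS = {
    "sentiment_daily_main": "sentiment_daily_test",
    "sentiment_weekly_main": "sentiment_weekly_test",
    "sentiment_monthly_main": "sentiment_monthly_test",
    "sentiment_yearly_main": "sentiment_annually_test",
    "kw_daily_main": "kw_freq_weight_daily_test",
    "kw_weekly_main": "kw_freq_weight_weekly_test",
    "kw_monthly_main": "kw_freq_weight_monthly_test",
    "kw_yearly_main": "kw_freq_weight_annually_test",
}


def canonical(doc):
    """Serialize a document to a comparable string, ignoring ObjectId-style _ids."""
    doc = dict(doc)
    if not isinstance(doc.get("_id"), dict):
        doc.pop("_id", None)  # generated ids differ between collections
    return json.dumps(doc, sort_keys=True, default=str)


def mismatched_pairs(fetch, pairs=COLLECTION_PAIRS):
    """Return the main collections whose documents differ from their test pair.

    fetch(name) must return an iterable of documents for a collection name.
    """
    mismatches = []
    for main_name, test_name in pairs.items():
        main_docs = sorted(canonical(doc) for doc in fetch(main_name))
        test_docs = sorted(canonical(doc) for doc in fetch(test_name))
        if main_docs != test_docs:
            mismatches.append(main_name)
    return mismatches


# In-memory stand-in for the database, using the yearly sentiment document.
yearly_doc = {"_id": {"year": "2022"}, "total_positive_count": 828,
              "total_negative_count": 405, "total_neutral_count": 838,
              "total_number_of_months": 2}
fake_db = {"sentiment_yearly_main": [yearly_doc],
           "sentiment_annually_test": [dict(yearly_doc)]}
demo_pairs = {"sentiment_yearly_main": "sentiment_annually_test"}

demo_mismatches = mismatched_pairs(fake_db.__getitem__, demo_pairs)
print(demo_mismatches)
```

With a live connection, the call would be mismatched_pairs(lambda name: db[name].find()), and an empty list would mean that every main collection matches its test collection.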