# C-More

### Aggregate data with main.py

```python
import pymongo

# Connect to the local MongoDB server and open the project database
client = pymongo.MongoClient('mongodb://localhost:27017/')
db = client['rep_analysis_main']
```

```python
db.list_collection_names()
```

```
['data_test',
 'kw_freq_weight_monthly_test',
 'kw_freq_weight_test',
 'kw_freq_weight_annually_test',
 'client_info',
 'sentiment_monthly_test',
 'kw_freq_weight_daily_test',
 'sentiment_annually_test',
 'sentiment_weekly_test',
 'data_twitter',
 'sentiment_test',
 'sentiment_daily_test',
 'kw_freq_weight_weekly_test']
```

We start by running the sentiment_analysis.py script directly to **aggregate the sentiment analysis results**. The aggregated results land in new `_main` collections, which we compare with the matching `_test` collections.

#### 1. Aggregate sentiment analysis results per day

We will use the sentiment_test collection as our source of data.

After **running the sentiment_analysis.py script** for the **daily aggregation** ("2022-09-29"), we get:

```python
db.list_collection_names()
```

```
['data_test',
 'kw_freq_weight_monthly_test',
 'kw_freq_weight_test',
 'kw_freq_weight_annually_test',
 'client_info',
 'sentiment_monthly_test',
 'kw_freq_weight_daily_test',
 'sentiment_annually_test',
 'sentiment_weekly_test',
 'data_twitter',
 'sentiment_test',
 'sentiment_daily_main',
 'sentiment_daily_test',
 'kw_freq_weight_weekly_test']
```

We have a new collection, **sentiment_daily_main**.
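The collection listings in this notebook are not sorted, so spotting a new collection by eye gets harder as the database grows. A set difference between the names captured before and after a script run does the job mechanically. This is a minimal sketch; `new_collections` is a hypothetical helper, not part of the project, and the two input lists are a shortened subset of the listings above.

```python
from typing import Iterable, List


def new_collections(before: Iterable[str], after: Iterable[str]) -> List[str]:
    """Return collection names present in `after` but not in `before`, sorted."""
    return sorted(set(after) - set(before))


# A few of the names from the listings above; in the notebook these would be
# db.list_collection_names() captured before and after running the script.
before_run = ['data_test', 'client_info', 'sentiment_test', 'sentiment_daily_test']
after_run = ['data_test', 'client_info', 'sentiment_test',
             'sentiment_daily_main', 'sentiment_daily_test']

print(new_collections(before_run, after_run))  # ['sentiment_daily_main']
```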

```python
sentiment_daily_main = db['sentiment_daily_main']

for doc in sentiment_daily_main.find():
    print(doc)
```

```
{'_id': ObjectId('6353cf57deb6e8aaa03d2042'), 'extracted_at': '2022-09-29', 'year': '2022', 'month': '09', 'week_of_year': '39', 'positive_count': 30, 'negative_count': 20, 'neutral_count': 50}
```


```python
sentiment_daily_test = db['sentiment_daily_test']

for doc in sentiment_daily_test.find({"extracted_at": {"$eq": "2022-09-29"}}):
    print(doc)
```

```
{'_id': ObjectId('635144f3bbc8183bba8b8a1f'), 'extracted_at': '2022-09-29', 'year': '2022', 'month': '09', 'week_of_year': '39', 'positive_count': 30, 'negative_count': 20, 'neutral_count': 50}
```


The counts match the **sentiment_daily_test** document for the same day; only the `_id` differs.
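The comparison between the two daily documents can be made mechanical. Since each inserted document gets its own auto-generated `_id`, a minimal sketch drops that field before comparing. `strip_id` and `same_result` are hypothetical helper names, and the `_id` values below are plain strings standing in for the ObjectId values printed above.

```python
def strip_id(doc: dict) -> dict:
    """Return a copy of a MongoDB document without its auto-generated _id."""
    return {key: value for key, value in doc.items() if key != "_id"}


def same_result(main_doc: dict, test_doc: dict) -> bool:
    """True when both documents hold the same fields and values, ignoring _id."""
    return strip_id(main_doc) == strip_id(test_doc)


# Documents printed above, with ObjectId values replaced by strings
main_doc = {'_id': '6353cf57deb6e8aaa03d2042', 'extracted_at': '2022-09-29',
            'year': '2022', 'month': '09', 'week_of_year': '39',
            'positive_count': 30, 'negative_count': 20, 'neutral_count': 50}
test_doc = {'_id': '635144f3bbc8183bba8b8a1f', 'extracted_at': '2022-09-29',
            'year': '2022', 'month': '09', 'week_of_year': '39',
            'positive_count': 30, 'negative_count': 20, 'neutral_count': 50}

print(same_result(main_doc, test_doc))  # True
```

Against the live database, each side could be fetched with `find_one({"extracted_at": "2022-09-29"})` on the two collections; passing the projection `{"_id": 0}` as the second argument to `find_one` would drop `_id` on the server instead of in Python.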

After running the script for every available day, **sentiment_daily_main** contains:

```python
for doc in sentiment_daily_main.find():
    print(doc)
```

```
{'_id': ObjectId('6353d5cac877f987b8c9adab'), 'extracted_at': '2022-09-29', 'year': '2022', 'month': '09', 'week_of_year': '39', 'positive_count': 30, 'negative_count': 20, 'neutral_count': 50}
{'_id': ObjectId('6353d5d6dfb5241a922d9301'), 'extracted_at': '2022-09-30', 'year': '2022', 'month': '09', 'week_of_year': '39', 'positive_count': 43, 'negative_count': 21, 'neutral_count': 36}
{'_id': ObjectId('6353d60ab6d703c41b363abd'), 'extracted_at': '2022-10-01', 'year': '2022', 'month': '10', 'week_of_year': '39', 'positive_count': 32, 'negative_count': 25, 'neutral_count': 43}
{'_id': ObjectId('6353d612d63257cf6cc24191'), 'extracted_at': '2022-10-02', 'year': '2022', 'month': '10', 'week_of_year': '40', 'positive_count': 44, 'negative_count': 18, 'neutral_count': 38}
{'_id': ObjectId('6353d61b3fb72a4f4619b4b4'), 'extracted_at': '2022-10-03', 'year': '2022', 'month': '10', 'week_of_year': '40', 'positive_count': 37, 'negative_count': 19, 'neutral_count': 44}
{'_id': ObjectId('6353d62387da
```
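The week_of_year values above follow a Sunday-first numbering: 2022-10-01 (a Saturday) is still week 39, while 2022-10-02 (a Sunday) starts week 40. That matches Python's `%U` strftime directive, which is also the convention used by MongoDB's `$week` operator. The sketch below reproduces the year, month and week_of_year strings from extracted_at under that assumption; it is inferred from the output, not taken from sentiment_analysis.py, and `date_parts` is a hypothetical name.

```python
from datetime import date


def date_parts(extracted_at: str) -> dict:
    """Derive the string fields stored next to each daily count.

    Assumption: week_of_year uses strftime's %U (weeks start on Sunday;
    days before the first Sunday of the year fall in week 00).
    """
    day = date.fromisoformat(extracted_at)  # expects 'YYYY-MM-DD'
    return {
        "year": day.strftime("%Y"),
        "month": day.strftime("%m"),
        "week_of_year": day.strftime("%U"),
    }


for extracted_at in ["2022-09-29", "2022-10-01", "2022-10-02"]:
    print(extracted_at, date_parts(extracted_at))
```

An ISO-8601 week (`isocalendar()`) would put 2022-10-02 in week 39 instead, so it would not match the stored values.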

#### 2. Aggregate sentiment analysis results per week

After **running the sentiment_analysis.py script** for the **weekly aggregation** (week "39"), we get:

```python
db.list_collection_names()
```

```
['data_test',
 'kw_freq_weight_monthly_test',
 'kw_freq_weight_test',
 'kw_freq_weight_annually_test',
 'client_info',
 'sentiment_monthly_test',
 'kw_freq_weight_daily_test',
 'sentiment_annually_test',
 'sentiment_weekly_test',
 'sentiment_weekly_main',
 'data_twitter',
 'sentiment_test',
 'sentiment_daily_test',
 'sentiment_daily_main',
 'kw_freq_weight_weekly_test']
```

We have a new collection, **sentiment_weekly_main**.

```python
sentiment_weekly_main = db['sentiment_weekly_main']

for doc in sentiment_weekly_main.find():
    print(doc)
```

```
{'_id': {'year_week': ['2022', '39']}, 'total_positive_count': 105, 'total_negative_count': 66, 'total_neutral_count': 129, 'total_number_of_days': 3}
```


```python
sentiment_weekly_test = db['sentiment_weekly_test']

for doc in sentiment_weekly_test.find():
    print(doc)
```

```
{'_id': {'year_week': ['2022', '39']}, 'total_positive_count': 105, 'total_negative_count': 66, 'total_neutral_count': 129, 'total_number_of_days': 3}
{'_id': {'year_week': ['2022', '40']}, 'total_positive_count': 285, 'total_negative_count': 129, 'total_neutral_count': 286, 'total_number_of_days': 7}
{'_id': {'year_week': ['2022', '41']}, 'total_positive_count': 279, 'total_negative_count': 140, 'total_neutral_count': 281, 'total_number_of_days': 7}
{'_id': {'year_week': ['2022', '42']}, 'total_positive_count': 159, 'total_negative_count': 70, 'total_neutral_count': 142, 'total_number_of_days': 4}
```


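The week-39 document matches **sentiment_weekly_test** exactly, including the `_id`, because the weekly `_id` is the year/week pair rather than an auto-generated ObjectId.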

After running the script for every available week, **sentiment_weekly_main** contains:

```python
for doc in sentiment_weekly_main.find():
    print(doc)
```

```
{'_id': {'year_week': ['2022', '39']}, 'total_positive_count': 105, 'total_negative_count': 66, 'total_neutral_count': 129, 'total_number_of_days': 3}
{'_id': {'year_week': ['2022', '40']}, 'total_positive_count': 285, 'total_negative_count': 129, 'total_neutral_count': 286, 'total_number_of_days': 7}
{'_id': {'year_week': ['2022', '41']}, 'total_positive_count': 279, 'total_negative_count': 140, 'total_neutral_count': 281, 'total_number_of_days': 7}
{'_id': {'year_week': ['2022', '42']}, 'total_positive_count': 159, 'total_negative_count': 70, 'total_neutral_count': 142, 'total_number_of_days': 4}
```
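Each weekly document is the sum of the daily documents that share a year and week_of_year, plus a count of those days. For example, week 39 covers 2022-09-29 to 2022-10-01, and 30 + 43 + 32 = 105 positive results. A minimal sketch under assumptions: `WEEKLY_PIPELINE` is one `$group` stage that would yield documents of this shape from the daily documents (it is not taken from sentiment_analysis.py), and `group_weekly` is a hypothetical pure-Python equivalent that can be checked without a database.

```python
from collections import defaultdict

# Hypothetical aggregation stage; it could be run with
# list(db['sentiment_daily_main'].aggregate(WEEKLY_PIPELINE)).
WEEKLY_PIPELINE = [
    {"$group": {
        "_id": {"year_week": ["$year", "$week_of_year"]},
        "total_positive_count": {"$sum": "$positive_count"},
        "total_negative_count": {"$sum": "$negative_count"},
        "total_neutral_count": {"$sum": "$neutral_count"},
        "total_number_of_days": {"$sum": 1},
    }},
]


def group_weekly(daily_docs):
    """Pure-Python equivalent of WEEKLY_PIPELINE, sorted by (year, week)."""
    totals = defaultdict(lambda: {"total_positive_count": 0,
                                  "total_negative_count": 0,
                                  "total_neutral_count": 0,
                                  "total_number_of_days": 0})
    for doc in daily_docs:
        bucket = totals[(doc["year"], doc["week_of_year"])]
        bucket["total_positive_count"] += doc["positive_count"]
        bucket["total_negative_count"] += doc["negative_count"]
        bucket["total_neutral_count"] += doc["neutral_count"]
        bucket["total_number_of_days"] += 1
    return [{"_id": {"year_week": list(key)}, **bucket}
            for key, bucket in sorted(totals.items())]


# The three week-39 days printed in the daily section (_id omitted)
week_39_days = [
    {"year": "2022", "week_of_year": "39", "positive_count": 30, "negative_count": 20, "neutral_count": 50},
    {"year": "2022", "week_of_year": "39", "positive_count": 43, "negative_count": 21, "neutral_count": 36},
    {"year": "2022", "week_of_year": "39", "positive_count": 32, "negative_count": 25, "neutral_count": 43},
]

print(group_weekly(week_39_days))
```

MongoDB does not guarantee the order of `$group` output, which is why the Python version sorts explicitly; sorting the keys as text gives chronological order here only because year and week_of_year are zero-padded strings.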


#### 3. Aggregate sentiment analysis results per month

After **running the sentiment_analysis.py script** for the **monthly aggregation** (month "09"), we get:

```python
db.list_collection_names()
```

```
['sentiment_monthly_main',
 'data_test',
 'kw_freq_weight_monthly_test',
 'kw_freq_weight_test',
 'kw_freq_weight_annually_test',
 'client_info',
 'sentiment_monthly_test',
 'kw_freq_weight_daily_test',
 'sentiment_annually_test',
 'sentiment_weekly_test',
 'sentiment_weekly_main',
 'data_twitter',
 'sentiment_test',
 'sentiment_daily_test',
 'sentiment_daily_main',
 'kw_freq_weight_weekly_test']
```

We have a new collection, **sentiment_monthly_main**.

```python
sentiment_monthly_main = db['sentiment_monthly_main']

for doc in sentiment_monthly_main.find():
    print(doc)
```

```
{'_id': {'year_month': ['2022', '09']}, 'total_positive_count': 73, 'total_negative_count': 41, 'total_neutral_count': 86, 'total_number_of_days': 2, 'year': '2022'}
```


```python
sentiment_monthly_test = db['sentiment_monthly_test']

for doc in sentiment_monthly_test.find():
    print(doc)
```

```
{'_id': {'year_month': ['2022', '09']}, 'total_positive_count': 73, 'total_negative_count': 41, 'total_neutral_count': 86, 'total_number_of_days': 2, 'year': '2022'}
{'_id': {'year_month': ['2022', '10']}, 'total_positive_count': 755, 'total_negative_count': 364, 'total_neutral_count': 752, 'total_number_of_days': 19, 'year': '2022'}
```


The September document matches **sentiment_monthly_test** exactly, including the `_id` and the year field.

After running the script for every available month, **sentiment_monthly_main** contains:

```python
for doc in sentiment_monthly_main.find():
    print(doc)
```

```
{'_id': {'year_month': ['2022', '09']}, 'total_positive_count': 73, 'total_negative_count': 41, 'total_neutral_count': 86, 'total_number_of_days': 2, 'year': '2022'}
{'_id': {'year_month': ['2022', '10']}, 'total_positive_count': 755, 'total_negative_count': 364, 'total_neutral_count': 752, 'total_number_of_days': 19, 'year': '2022'}
```
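The monthly documents follow the same pattern, grouped on year and month, and they also carry a top-level year field. September holds two days, so its totals are 30 + 43 = 73 positive, 20 + 21 = 41 negative and 50 + 36 = 86 neutral. As a sketch under assumptions: `MONTHLY_PIPELINE` is one `$group` stage that would produce documents of this shape (copying year with `$first`), not the script's actual code, and `group_monthly` is a hypothetical pure-Python equivalent.

```python
from collections import defaultdict

# Hypothetical aggregation stage; it could be run with
# list(db['sentiment_daily_main'].aggregate(MONTHLY_PIPELINE)).
MONTHLY_PIPELINE = [
    {"$group": {
        "_id": {"year_month": ["$year", "$month"]},
        "total_positive_count": {"$sum": "$positive_count"},
        "total_negative_count": {"$sum": "$negative_count"},
        "total_neutral_count": {"$sum": "$neutral_count"},
        "total_number_of_days": {"$sum": 1},
        "year": {"$first": "$year"},  # every day in a group shares the same year
    }},
]


def group_monthly(daily_docs):
    """Pure-Python equivalent of MONTHLY_PIPELINE, sorted by (year, month)."""
    totals = defaultdict(lambda: {"total_positive_count": 0,
                                  "total_negative_count": 0,
                                  "total_neutral_count": 0,
                                  "total_number_of_days": 0})
    for doc in daily_docs:
        bucket = totals[(doc["year"], doc["month"])]
        bucket["total_positive_count"] += doc["positive_count"]
        bucket["total_negative_count"] += doc["negative_count"]
        bucket["total_neutral_count"] += doc["neutral_count"]
        bucket["total_number_of_days"] += 1
    return [{"_id": {"year_month": list(key)}, **bucket, "year": key[0]}
            for key, bucket in sorted(totals.items())]


# The two September days printed in the daily section (_id omitted)
september_days = [
    {"year": "2022", "month": "09", "positive_count": 30, "negative_count": 20, "neutral_count": 50},
    {"year": "2022", "month": "09", "positive_count": 43, "negative_count": 21, "neutral_count": 36},
]

print(group_monthly(september_days))
```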
