# Telegram Exploratory Data Analysis

### Prerequisites

To ensure the correct usage of this Notebook, please follow the Prerequisites defined in the [project's GitHub](https://github.com/mitryp/SocialScienceFinal) README.

### Questions

Using this Notebook, you can explore the private messages data and chat catalogue exported from Telegram.
This file explores the dataset and provides answers and visualizations for the following questions:

1. What is the most common length of messages in the dataset? What are the mean and extremes?
2. How are the hours of your activity distributed?
3. In which hours do you receive the most messages?
4. How do the numbers of messages received and sent per hour compare?
5. When did anomalies in the messaging pattern occur?
6. On which days is the number of photos sent much higher than usual?
7. Per group, how many private chats with its users do you have?
8. Which chats are you more connected with than with the average chat?
9. How is the data you send distributed between types?
10. How are the chat lengths (numbers of messages) distributed?
11. Which chat length prevails?
12. Who ignores more - you or your companions?
13. Who is more of a chatterbox - you or your companions?
14. How often are messages forwarded in chats?
15. How often are different users or channels quoted in your private dialogues?
16. How do the top 25 most quoted users in all chats compare to the number of messages they sent to you in private chats?
17. Plot the users you are the most connected with (i.e. with the biggest amount of chats in common).
18. Who are the top 25 users that have sent the longest voice messages?
19. Who are the top 10 users whose voice messages make up the largest fraction of their overall message count?
20. How do the numbers of messages sent and received in each month compare?

> Each of the questions will have its own extended description below.

### Initial setup

#### User-specific constants

In [None]:
# Provide valid paths to the merged csv files generated by the telegram-dialogs-analysis-v2/0_merge_data.ipynb
DIALOGS_MERGED_DATA_PATH = "data/merged_data/dialogs_data_all.csv"
DIALOGS_META_MERGED_DATA_PATH = "data/merged_data/dialogs_users_all.csv"

In [None]:
# Provide your Telegram id
owner_id = 0

#### Imports, DataFrames, and essential functions
Below are imports, some functions that will be used throughout this document, and initial dataframes.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import json
import re
from scipy import stats
from math import isnan

def parse(u_json):
    if isinstance(u_json, dict):
        return u_json
    
    try:
        return json.loads(re.sub("'", '"', re.sub('None', 'null', u_json)))
    except (json.JSONDecodeError, TypeError):
        # TypeError covers non-string cells, such as NaN for a missing value
        return {}

def id_to_name(ident: int) -> str | None:
    try:
        return df_meta[df_meta['dialog_id'] == ident]['name'].iloc[0]
    except IndexError:
        return None
    
def fwd_user_id_by_name(name):    
    if not name:
        return None
    
    try:
        return df_meta[df_meta['users'].notnull() & (df_meta['name'] == name)]['dialog_id'].iloc[0]
    except IndexError:
        return None

def normalize_nans(val, replace_with=0):
    return replace_with if isnan(val) else val

df = pd.read_csv(DIALOGS_MERGED_DATA_PATH)
df_meta = pd.read_csv(DIALOGS_META_MERGED_DATA_PATH)
df_meta['users'] = df_meta['users'].apply(parse)
df_meta = (df_meta[(df_meta['users'] != {}) & df_meta['users'].notnull()])
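
The `parse` helper relies on plain text substitution, so swapping quotes can corrupt values that contain an apostrophe or the substring `None`. As a hedged alternative, the sketch below uses `ast.literal_eval` from the standard library. It assumes the `users` column holds Python-literal dict strings, which is an assumption about the export format, and `parse_literal` is a hypothetical name that is not part of this Notebook.

```python
import ast


def parse_literal(raw):
    """Parse a Python-literal dict string; return {} for anything else."""
    if isinstance(raw, dict):
        return raw
    if not isinstance(raw, str):
        # NaN cells read by pandas arrive as floats, not strings
        return {}
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return {}
    return value if isinstance(value, dict) else {}


print(parse_literal("{'user_id': 42, 'last_name': None, 'first_name': \"O'Neil\"}"))
print(parse_literal(float('nan')))
```

If the export turns out to contain real JSON instead, the original `parse` remains the simpler choice.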

#### Classifiers
The functions below classify rows of dataframes and are used later in this Notebook.

In [None]:
# Returns either 'long', 'medium' or 'short' depending on the chat length zscore.
# The threshold parameter defines the range in which chats are considered 'medium', being [-threshold, threshold).
# The values below the negative threshold are classified as 'short', and the values above or equal to the positive threshold
# are classified as 'long'.
def classify_chat_by_lng_zscore(chat_length_zscore, threshold: float):
    if chat_length_zscore < -threshold:
        return 'short'
    if chat_length_zscore < threshold:
        return 'medium'
    return 'long'

# Returns either 'I am a chatterbox', 'My companion is a chatterbox', or 'We\'re equally chatterboxes' depending on the 
# 'from_length' and 'to_length' values in the given row.
def classify_chatterbox(row):
    from_length = row['from_length']
    to_length = row['to_length']
    
    if from_length > to_length:
        return 'I am a chatterbox'
    
    if from_length < to_length:
        return 'My companion is a chatterbox'
    
    return 'We\'re equally chatterboxes'

# Returns either 'I am ignored :(', 'I ignore B-)', or 'All ignore the same :P' depending on the 'from_count' and 'to_count'
# values in the given row.
def classify_igorance(row):
    fcnt = row['from_count']
    tcnt = row['to_count']
    
    if fcnt > tcnt:
        return 'I am ignored :('
    
    if fcnt < tcnt:
        return 'I ignore B-)'
    
    return 'All ignore the same :P'
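
To make the z-score thresholds more concrete, here is a small worked example of the chat length classifier. The five message counts are made up for illustration, and `classify_by_zscore` is a stand-in that repeats the rule of `classify_chat_by_lng_zscore` so the sketch runs on its own.

```python
import numpy as np


def classify_by_zscore(score: float, threshold: float) -> str:
    # Same rule as classify_chat_by_lng_zscore: [-threshold, threshold) is 'medium'.
    if score < -threshold:
        return 'short'
    if score < threshold:
        return 'medium'
    return 'long'


# Hypothetical message counts for five chats.
lengths = np.array([10, 20, 50, 52, 118])
# Population z-score (ddof=0), which is also the default of scipy.stats.zscore.
zscores = (lengths - lengths.mean()) / lengths.std()
labels = [classify_by_zscore(z, threshold=0.1) for z in zscores]

for n, z, label in zip(lengths, zscores, labels):
    print(f'{n:>4} messages  z={z:+.2f}  {label}')
```

With a narrow threshold such as `0.1`, only chats very close to the mean length count as medium, so most chats end up in one of the other two classes.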

### Pre-flight check
Make sure your data is what the Notebook is expecting.

> The output of the next block should look like "(x, 10) (y, 4)" 

In [None]:
print(df.shape, df_meta.shape)
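
The shape alone does not tell you whether the expected columns are present. The sketch below checks for the column names that the cells of this Notebook refer to. The helper `missing_columns` and the two constants are hypothetical additions, and the export may well contain further columns that are not listed here.

```python
import pandas as pd

# Column names taken from the cells of this Notebook; the export may contain more.
REQUIRED_DF_COLUMNS = {'id', 'dialog_id', 'from_id', 'to_id', 'message',
                       'type', 'date', 'fwd_from', 'duration'}
REQUIRED_META_COLUMNS = {'dialog_id', 'name', 'type', 'users'}


def missing_columns(frame: pd.DataFrame, required: set) -> list:
    """Return the required column names that are absent from the frame, sorted."""
    return sorted(required - set(frame.columns))


# Tiny stand-in frame; in the Notebook you would pass df and df_meta instead.
sample = pd.DataFrame(columns=['id', 'dialog_id', 'date'])
print(missing_columns(sample, REQUIRED_DF_COLUMNS))
```

An empty list for both `df` and `df_meta` suggests that the later cells will at least find the columns they need.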

### Exploration begins

Let us start with simpler visualizations, and then proceed to more complex ones.

### 1. Message length distribution
> What is the most common length of messages in the dataset? What are the mean and extremes?

First of all, let's see how the message lengths are distributed. To do this, we plot a histogram of message lengths on a logarithmic frequency scale and print summary statistics alongside it.

In [None]:
my_msg_lengths = ((df[(df['type'] == 'text') & (df['from_id'].notnull())])['message']).dropna().apply(len)
dfc = df.copy()
dfc['msg_len'] = my_msg_lengths

length_counts = (dfc.groupby(dfc['msg_len']))['msg_len'].count()

print('Average message length: %.2f symbols' % my_msg_lengths.mean())
print('The longest message sent by the owner was of length {0:.0f} symbols, and they sent {2:.0f} of those;'
      '\nThe shortest was only {1:.0f}, and they sent {3:.0f} messages of such a length'.format(my_msg_lengths.max(), my_msg_lengths.min(), length_counts[my_msg_lengths.max()], length_counts[my_msg_lengths.min()]))

len_freq_df = pd.DataFrame({'length': length_counts.index, 'frequency': length_counts}).reset_index()
del len_freq_df['msg_len']

expanded_data = len_freq_df.reindex(len_freq_df.index.repeat(len_freq_df['frequency']))
plt.hist(expanded_data['length'], bins=50, log=True, edgecolor='white')

plt.xlabel('Message length')
plt.ylabel('Frequency')
plt.title('Distribution of message lengths')
plt.show()
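
The question also asks for the most common length, which the cell above shows only visually. A minimal sketch of how it could be computed is given below. The sample messages are made up, and in the Notebook the input would be the filtered `message` column instead.

```python
import pandas as pd

# Hypothetical sample; in the Notebook the messages come from df['message'].
messages = pd.Series(['hi', 'ok', 'see you tomorrow', None, 'no', 'thanks!'])

lengths = messages.dropna().str.len()
most_common_length = lengths.mode().iloc[0]  # the smallest value if there is a tie
count_of_most_common = (lengths == most_common_length).sum()

print(f'most common length: {most_common_length} ({count_of_most_common} messages)')
print(f'mean: {lengths.mean():.2f}, min: {lengths.min()}, max: {lengths.max()}')
```

`Series.mode` returns every tied value in ascending order, so taking the first entry is a simplification that picks the shortest of the most common lengths.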

### 2. Messaging hours distribution

> How are the hours of user activity distributed?


Let's now visualize the activity hours of the user. To do so, we will count the messages sent during each hour of the day and then plot the distribution as a bar chart.

In [None]:
# Create a copy of the existing DataFrame and parse datetime data from the 'date' column.
# Parsing the whole column in one vectorized call is much faster than parsing row by row,
# but it may still take a while on large datasets or on less powerful hardware.
dfc = df.copy()
dfc['date'] = pd.to_datetime(dfc['date'], format='%Y-%m-%d %H:%M:%S+00:00')
dfc['hour'] = dfc['date'].dt.hour
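
The format string ends in a literal `+00:00`, which suggests that the exported timestamps are in UTC, so the hours plotted below would be UTC hours and not local ones. If you prefer local hours, a sketch along the following lines may help. The two-hour offset and the name `LOCAL_TZ` are assumptions for illustration only, and a fixed offset ignores daylight saving time.

```python
from datetime import timedelta, timezone

import pandas as pd

# Hypothetical offset of the owner's local time from UTC; adjust to your own.
LOCAL_TZ = timezone(timedelta(hours=2))

dates = pd.Series(['2023-01-01 23:30:00+00:00', '2023-01-02 08:05:00+00:00'])

# Parse the whole column at once as timezone-aware UTC timestamps.
utc_dates = pd.to_datetime(dates, format='%Y-%m-%d %H:%M:%S%z', utc=True)
local_hours = utc_dates.dt.tz_convert(LOCAL_TZ).dt.hour

print(list(utc_dates.dt.hour))
print(list(local_hours))
```

Note that shifting the hours can also move a message to a different calendar day, which affects the per-day analyses later in this Notebook.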

In [None]:
sent_per_hour = ((dfc[dfc['from_id'].notnull()]).groupby('hour')).count()['id']
sent_per_hour.plot(kind='bar', width=0.9)

plt.title('My messaging hours distribution')
plt.xlabel('Hour')
plt.ylabel('Messages')
plt.grid(False)
plt.show()

### 3. Companions messaging hour distribution
Let's now proceed to your companions:
> In which hours do you receive the most messages?

To visualize this, we will repeat the same process as in the previous step.
But worry not, as we will actually need both of these results in the next question.

In [None]:
received_per_hour = ((dfc[dfc['to_id'] == owner_id].groupby('hour')).count()['id'])
received_per_hour.plot(kind='bar', width=0.9, color='orange')

plt.title('My companions messaging hours distribution')
plt.xlabel('Hour')
plt.ylabel('Messages')
plt.grid(False)
plt.show()

### 4. Received/sent per hour comparison
> How do the numbers of messages received and sent per hour compare?

This time, we will reuse the data obtained in the previous blocks to compare the results and find the patterns, if there are any.

We shall plot both series together. Since they share the same hourly index, this is straightforward.

In [None]:
hours = np.arange(24)
x_labels = [str(h) for h in hours]

# Align both series to all 24 hours so that hours without messages get a zero bar.
received = received_per_hour.reindex(hours, fill_value=0)
sent = sent_per_hour.reindex(hours, fill_value=0)

plt.bar(hours - .2, received, .4, label='Received', log=True, color='orange')
plt.bar(hours + .2, sent, .4, label='Sent', log=True, color='#1f77b4')
plt.legend()

plt.xticks(hours, x_labels, rotation=90)

plt.xlabel('Hour')
plt.ylabel('Message count')
plt.title('Comparison of private messages received to private messages sent for each hour')

plt.show()

### 5. Messaging anomalies
> When did anomalies in the messaging pattern occur?

Maybe we will be able to identify some important events?

Let's group messages by date and count them, and then use stats.zscore to generate a series with zscores for each date.
Then we will be able to plot the result.

The deviation threshold is set to 2 by default and displayed with a dashed red line.
The mean is displayed with a light green line.

In [None]:
threshold = 2

_data = pd.DataFrame(dfc[dfc['from_id'].notnull()].groupby(dfc['date'].dt.date)['date'].count().rename('count'))
_data['zscore'] = stats.zscore(_data['count'])
_data['zscore'].rename('_zscore').plot()

plt.axhline(y=threshold, color='red', linestyle='--', label='Threshold of %.0f deviations' % threshold)
plt.axhline(y=0, color='lightgreen', label='Mean')
plt.ylim(min(_data['zscore'])-1, max(_data['zscore']+1))
plt.xlabel('Date')
plt.ylabel('Z-score')
plt.legend()
plt.title('Messages compared to mean by day')
plt.show()
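
The plot shows where the spikes are, but it does not name the days. As a rough sketch, the days above the threshold could be listed as shown below. The daily counts are synthetic, and the z-score is computed by hand here so that the arithmetic is visible.

```python
import pandas as pd

# Hypothetical daily message counts: nine quiet days and one busy day.
days = pd.date_range('2023-03-01', periods=10, freq='D')
daily_counts = pd.Series([10] * 9 + [100], index=days)

# Population z-score, matching the default (ddof=0) of scipy.stats.zscore.
zscores = (daily_counts - daily_counts.mean()) / daily_counts.std(ddof=0)

threshold = 2
anomalous_days = zscores[zscores > threshold]
print(anomalous_days)
```

In this example the mean is 19 and the standard deviation is 27, so the busy day scores (100 - 19) / 27 = 3 and is the only one above the threshold.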

### 6. Days with photoshoots
> On which days is the number of photos sent much higher than usual?

Let's imagine you are a photographer. You may find Telegram convenient for delivering photos to your clients, as it does not limit the overall space of attachments, unlike other services.

We should then be able to find the days when photoshoots may have occurred by looking for deviations in the number of photos sent per day.

> If you are not a photographer but sometimes take photos with your friends when you hang out together, and then send them the photos, you may be able to find the days you stayed with your peers this way.

In [None]:
photos_per_date = ((dfc[dfc['type'] == 'photo']).groupby(dfc['date'].dt.date))['date'].count().rename('_date')
threshold = 2.5
photos_per_day_scores = stats.zscore(photos_per_date)

most_probable_dates = list(photos_per_day_scores[photos_per_day_scores > threshold].sort_values(ascending=False)[:5].index)
print('The candidate days for photoshoots are:\n%s' % '\n'.join(map(lambda dt: dt.strftime('%d %b %Y'), most_probable_dates)))

photos_per_day_scores.plot()
plt.axhline(y=threshold, color='red', label='Threshold of %.1f deviations' % threshold)
plt.axhline(y=0, color='lightgreen', label='Mean')
plt.ylabel('Z-Score')
plt.xlabel('Date')
plt.legend()
plt.show()

### 7. Private chats with group users
> Per group, how many private chats with its users do you have?

Let's check how strongly you are connected with the members of your groups.

> _In this context, I sometimes refer to a private chat as a connection._

To do this, we shall create a new dataset relevant to the task. We will also reuse it later.

In [None]:
# extract values from 'df_meta' into a new dataframe
user_meta = pd.DataFrame({
    'dialog_id': df_meta['dialog_id'],
    'dialog_name': df_meta['name'],
    'type': df_meta['type'],
    'user_id': df_meta['users'].apply(lambda j: j.get('user_id')),
})

Now, let's plot a distribution of connections (people with whom you have a private dialogue) per chat.

To do this, we will use the newly created DataFrame `user_meta` and perform a set intersection, and then group the DataFrame by dialogue and count them.

In [None]:
user_groups = user_meta[user_meta['type'] == 'Group']
private_dialogs = user_meta[user_meta['type'] == 'Private dialog']

# the group members with whom I have private dialogs
connected_group_members = (user_meta[user_meta['user_id'].isin(set(user_groups['user_id']) & set(private_dialogs['user_id']))])
connected_group_members = connected_group_members[connected_group_members['type'] == 'Group']

connections_per_group = (connected_group_members.groupby('dialog_id')['user_id']).count()

connections_per_group.hist(edgecolor='white')
plt.xlabel('Connections per chat')
plt.ylabel('Frequency')
plt.title('Distribution of connections in group chats')
plt.grid(False)
plt.show()
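
The set intersection is easier to follow on a tiny table. The sketch below uses made-up rows in the shape of `user_meta`. It also differs from the cell above in one respect: groups without any connection stay in the result with a count of 0, whereas the filter above drops them before counting.

```python
import pandas as pd

# Hypothetical membership table in the shape of user_meta.
user_meta_sample = pd.DataFrame({
    'dialog_id': [100, 100, 100, 200, 200, 300, 1, 2],
    'type': ['Group'] * 6 + ['Private dialog'] * 2,
    'user_id': [1, 2, 3, 2, 4, 5, 1, 2],
})

groups = user_meta_sample[user_meta_sample['type'] == 'Group']
privates = user_meta_sample[user_meta_sample['type'] == 'Private dialog']

# Users who appear both as group members and as private-dialog partners.
connected_ids = set(groups['user_id']) & set(privates['user_id'])

# Count those users per group; groups without any connection keep a count of 0.
per_group = (groups['user_id'].isin(connected_ids)
             .groupby(groups['dialog_id']).sum())
print(per_group)
```

Whether zero-connection groups belong in the histogram is a judgment call; including them shifts the distribution towards the left.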

### 8. Chat connections anomalies
> Which chats are you more connected with than with the average chat?

In this block, we will find the anomalous chats from the previous block using the z-score method, as we have already done earlier in this Notebook.

In [None]:
dfc_meta = df_meta[df_meta['type'] == 'Group'].copy().drop_duplicates(subset='dialog_id').set_index('dialog_id')

zscores = stats.zscore(connections_per_group).rename('score')
cpg_df = pd.DataFrame({'score': zscores, 'name': dfc_meta['name']})
srtd = cpg_df.sort_values('score', ascending=False)
top_names = list(srtd.head(10)['name'])

print('Chats with which you are connected more than on average:\n%s' % '\n'.join(top_names))
srtd.plot(kind='bar', legend=False)
plt.xticks([])
plt.xlabel('')
plt.title('Z-scores')
plt.show()


### 9. Message data type distribution
> How is the data you send distributed between types?

Let's now change our perspective and examine other data that you send.

First, we will display a pie diagram of the data distribution by type.
We had to make the diagram this large to prevent the data labels from overlapping.

In [None]:
data_frequency_by_type = df.groupby(["type"])["type"].count()
plt.pie(data_frequency_by_type, labels=list(data_frequency_by_type.index), radius=3)
plt.show()

### 10. Chat length distribution
> How are the chat lengths (numbers of messages) distributed?

That was quite a large diagram! Let's step back and do some more general analysis.

Let's see how the dialogue lengths are distributed.

We will group the DataFrame by the dialog and count messages, and then plot them in a histogram.

In [None]:
msgs_per_chat = (df.groupby('dialog_id'))['dialog_id'].count().rename('msg_count')
msgs_per_chat.hist(log=True, bins=100)
plt.xlabel('Message count')
plt.ylabel('Chats frequency')
plt.grid(False)
plt.show()

### 11. Chat length classification
> Which chat length prevails?

It may be interesting to know whether short, medium or long chats are more common. 

We will use zscore to calculate the scores and set the threshold of the medium chats between `-0.1` and `0.1` deviations from the mean, and then classify them with the function which we created in the Initial setup section.

We expect that the short chats will leave the other types far behind.

In [None]:
clts = stats.zscore(msgs_per_chat).apply(lambda score: classify_chat_by_lng_zscore(score, threshold=0.1)).rename('chat_length_type')
_grouped = clts.groupby(clts).count()
plt.pie(_grouped, labels=_grouped.index, autopct='%1.1f%%')

print('Result:\n%s chats are prevailing' % _grouped.sort_values(ascending=False).index[0])
plt.show()

### 12. Disregard analysis
> Who ignores more - you or your companions?

We can find out who writes more messages on average - you or the people you communicate with.

To classify ignorance, we will use a simple model based on a direct comparison of message counts, meaning that only the dialogues with the same number of messages sent and received will have the same "level of ignorance".

To answer this question, we will create a new dataframe with number of messages sent and received per chat, and then classify the rows using the "ignorance classifier" from the Initial setup section.

After this, we will group chats by their ignorance and count members of each group.

The final step is to draw a pie diagram.

In [None]:
msg_count_df = pd.DataFrame({
    'chat_type': clts, 
    'msg_count': msgs_per_chat, 
    'from_count': df[df['to_id'] != owner_id].groupby('dialog_id')['dialog_id'].count(),
    'to_count': df[df['to_id'] == owner_id].groupby('dialog_id')['dialog_id'].count(),
})

msg_count_df['from_count'] = msg_count_df['from_count'].apply(normalize_nans)
msg_count_df['to_count'] = msg_count_df['to_count'].apply(normalize_nans)

msg_count_df['ignorance'] = msg_count_df.apply(classify_igorance, axis=1)
_grouped = msg_count_df.groupby('ignorance')['ignorance'].count()

plt.pie(_grouped, labels=_grouped.index, autopct='%1.1f%%')
plt.show()
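
As a possible alternative to building the two count columns separately and patching the gaps with `normalize_nans`, the counts can be produced in one pass. This is only a sketch on made-up data; the `owner_id` value and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd

owner_id = 7  # hypothetical Telegram id of the account owner

# Hypothetical messages: chat 1 has traffic both ways, chat 2 only incoming.
messages = pd.DataFrame({
    'dialog_id': [1, 1, 1, 2, 2],
    'to_id': [7, 55, 55, 7, 7],
})

direction = np.where(messages['to_id'] == owner_id, 'to_count', 'from_count')
counts = (messages.assign(direction=direction)
          .groupby(['dialog_id', 'direction']).size()
          .unstack(fill_value=0))
print(counts)
```

Here `unstack(fill_value=0)` fills in a zero where a chat has no messages in one direction, so no NaN handling is needed afterwards.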

### 13. Chatterboxes
> Who is more of a chatterbox - you or your companions?

To shed some light on the previous result, we can check whose overall message length is larger on average. This can help us explain the reason for the result of the previous block.

To check this, we will filter only the needed information from the original DataFrame - messages, dialogues and information about the sender. Then we can normalize the input, as some strings in the dataset may be broken, and then group the messages by dialogue and get the overall lengths of both sent and received messages in each chat.

After this is done, we can classify the rows with the "chatterbox classifier", group by the newly-created field and count group entries. Now, we can plot another pie diagram showing the distribution of message lengths.

In [None]:
texts_df = df[df['type'] == 'text'][['id', 'dialog_id', 'from_id', 'to_id', 'message']].copy()
texts_df['message'] = texts_df['message'].apply(lambda m: m if isinstance(m, str) else normalize_nans(m, ''))
texts_df['msg_len'] = texts_df['message'].apply(len)

In [None]:
_length_df = pd.DataFrame({
    'from_length': texts_df[texts_df['to_id'] != owner_id].groupby('dialog_id')['msg_len'].sum(),
    'to_length': texts_df[texts_df['to_id'] == owner_id].groupby('dialog_id')['msg_len'].sum()
}).fillna(0)  # without this, chats with no text in one direction compare as NaN and count as 'equal'

_length_df['chatterbox_type'] = _length_df.apply(classify_chatterbox, axis=1)
_grouped = _length_df.groupby('chatterbox_type')['chatterbox_type'].count()
plt.pie(_grouped, labels=_grouped.index, autopct='%1.1f%%')
plt.show()

### 14. Forwarded messages per chat
> How often are messages forwarded in chats?

It may be good to know how often you forward messages from other users in your chats.

To check it, we need to filter all forwarded messages and put them into a separate DataFrame first.

In [None]:
fwds = df[df['fwd_from'].notnull()].copy()

Now, we can simply group the messages by dialogue, count them and plot a histogram.

In [None]:
_fwds_per_chat = fwds.groupby('dialog_id')['fwd_from'].count()

_fwds_per_chat.hist(log=True, bins=150)
plt.xlabel('Forwarded messages count')
plt.ylabel('Chats')
plt.show()

Let's now print the average fraction of forwarded messages per chat to support the observations:

In [None]:
msgs_per_dialog = df.groupby('dialog_id').count()
fwds_fractions = msgs_per_dialog['fwd_from'] / msgs_per_dialog['id']

fwds_avg_fraction = fwds_fractions.mean()

print('On average, forwarded messages make up about {:.2f}% of overall messages'.format(fwds_avg_fraction * 100))

_largest_chats = msgs_per_dialog.sort_values('id').tail(3)

print('In the three largest chats, this fraction is {:.2f}%'.format((_largest_chats['fwd_from'].mean() / _largest_chats['id'].mean() * 100)))
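
The two figures printed above are computed in different ways: the first is an average of per-chat fractions, while the second pools the messages of several chats before dividing. The made-up example below shows how far apart these two readings can be.

```python
import pandas as pd

# Hypothetical data: chat 1 has 1 forward in 4 messages, chat 2 has 1 in 1.
messages = pd.DataFrame({
    'dialog_id': [1, 1, 1, 1, 2],
    'fwd_from': [None, 'News channel', None, None, 'News channel'],
})

is_forward = messages['fwd_from'].notnull()

# Average of the per-chat fractions: every chat weighs the same.
mean_of_fractions = is_forward.groupby(messages['dialog_id']).mean().mean()
# Pooled fraction: every message weighs the same.
pooled_fraction = is_forward.mean()

print(f'mean of per-chat fractions: {mean_of_fractions:.3f}')
print(f'pooled fraction: {pooled_fraction:.3f}')
```

The per-chat fractions are 0.25 and 1.0, so their mean is 0.625, while the pooled fraction is 2 / 5 = 0.4. Small chats with a few forwards can therefore pull the first figure up noticeably.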

### 15. Quoted users/channels
> How often are different users or channels quoted in your private dialogues?

This will be interesting. Let's see how often _different people_ are quoted in your chats.

To do this, we will group the messages by their 'fwd_from' column, which contains the identifier of the forwarded account or channel, and then count the members of each group.

After this, we will be able to plot their distribution histogram.

In [None]:
plt.hist(fwds.groupby('fwd_from')['fwd_from'].count(), log=True, bins=50, edgecolor='white')
plt.xlabel('Quotations')
plt.ylabel('Users/channels')
plt.title('Distribution of quotations per user or channel')
plt.show()

### 16. 25 top-quoted users
> How do the top 25 most quoted users in all chats compare to the number of messages they sent to you in private chats?

Now, let's explore the previous data more and see how the user quotations correlate with their overall message count in your private chats.
This can lead to interesting discoveries.

> Note that as we used Telegram Desktop export capabilities instead of the proposed scripts, this block may work incorrectly when used with the data obtained with the proposed scripts.

To do this, we first need to find, for each message, the id of the user it was forwarded from, using the utility function 'fwd_user_id_by_name':

In [None]:
fwds['fwd_from_id'] = fwds['fwd_from'].apply(fwd_user_id_by_name)

Now, we shall filter out the users with whom we don't have private dialogues and count forwarded and overall messages per user.

After this, we can sort the rows by quotations and plot the first 25 of them.

In [None]:
prvt_fwds = fwds[fwds['fwd_from_id'].notnull()].copy()

fwd_and_sent = pd.DataFrame({
    'quotations': prvt_fwds.groupby('fwd_from_id')['fwd_from_id'].count(),
    'overall_msgs': df[df['from_id'].isnull()].groupby('dialog_id')['id'].count(),
})
_df = fwd_and_sent[fwd_and_sent['quotations'].notnull() & fwd_and_sent['overall_msgs'].notnull()].sort_values('quotations').tail(25)
_df.index = _df.index.map(id_to_name)

x = list(_df.index)
x_axis = np.arange(len(x))

plt.bar(x_axis - .2, _df['overall_msgs'], .4, label='Overall msgs sent', log=True)
plt.bar(x_axis + .2, _df['quotations'], .4, label='Quotations', log=True)

plt.xticks(x_axis, x, rotation=90)
plt.xlabel('Contact name')
plt.ylabel('Message count')
plt.legend()
plt.title('Comparison of private messages received from users\nand their quotations in all private chats')
plt.show()

### 17. Users with the most groups in common
> Plot the users you are the most connected with.

In this block, we will find the users with whom you have the largest number of groups in common.

To do this, we will use the DataFrame defined in block 7, filter group members, group by user id, and count the common groups.

Then we will sort the series and plot a bar diagram of top-25 users.

In [None]:
connections = user_meta[(user_meta['type'] == 'Group') &
                        (user_meta['user_id'] != owner_id)]\
              .groupby('user_id')['user_id'].count().sort_values()
connections.index = connections.index.map(id_to_name)
connections[connections.index.notnull()].tail(25).plot(kind='bar')
yint = range(0, max(connections+2), 2)
plt.yticks(yint)

plt.title('Users with the biggest amount of chats in common')
plt.xlabel('Contact name')
plt.ylabel('Connections')
plt.show()

### 18. Not concise (real chatterboxes) users 🥰
> Who are the top 25 users that have sent the longest voice messages?

Now let's relax for a bit and find the users who sent the longest voice messages, displayed in order of length.

To do this, we will filter the voice messages that were not sent by the owner, group them by dialog id, and find the maximum duration in each group.

In [None]:
voices = df[(df['type'] == 'voice') & df['duration'].notnull() & df['from_id'].isnull()]
max_voices = voices.groupby('dialog_id')['duration'].max()
max_voices.index = max_voices.index.map(id_to_name)
max_voices[max_voices.index.notnull()].sort_values().tail(25).plot(kind='bar')

plt.title('Top 25 chatterboxes and the duration of their longest voice messages')
plt.xlabel('Contact name')
plt.ylabel('Voice message duration (sec)')
plt.show()

We can now plot a histogram of voice message lengths to support our plot above:

In [None]:
df[(df['type'] == 'voice') &
   df['duration'].notnull() &
   df['from_id'].isnull()]\
 ['duration'].hist(bins=15, log=True, edgecolor='white')

plt.xlabel('Duration')
plt.ylabel('Frequency')
plt.grid(False)
plt.show()

### 19. Disrespectful voice messages correlation
> Who are the top 10 users whose voice messages make up the largest fraction of their overall message count?

Let's visualize the fraction of voice messages per chat.

To do this, we will once again filter voice messages and group them by dialog, but this time we will count the entries of each group instead. 

Then we will find the overall messages sent by each user and divide the series.

After this, we will be able to plot the top-10 users with the largest voice message fractions.

In [None]:
voices_per_user = df[df['from_id'].isnull() & (df['type'] == 'voice')].groupby('dialog_id')['duration'].count()
messages_per_user = df[df['from_id'].isnull()].groupby('dialog_id')['id'].count()

fracts = (voices_per_user / messages_per_user * 100)
fracts = fracts[fracts.notnull()].sort_values()
fracts.index = fracts.index.map(id_to_name)

fracts[fracts.index.notnull()].tail(10).plot(kind='bar')

plt.ylim(0, 100)
plt.xlabel('Contact name')
plt.ylabel('Percentage')
plt.title('Voice messages as a fraction of all messages sent by each user')
plt.show()

### 20. Messages in each month
> How do the numbers of messages sent and received in each month compare?

To finish, let's compare the number of messages sent and received in each month to visualize the historical data.

We will reuse the DataFrame with parsed dates, which we created near the top of this Notebook, group the rows by year and month, and then count sent and received messages separately. 

Then we will plot them on the same canvas together to find possible correlations.

In [None]:
dfc['year'] = dfc['date'].dt.year
dfc['month'] = dfc['date'].dt.month

sent_per_ym = dfc[dfc['from_id'].notnull()].groupby(['year', 'month'])['id'].count()
received_per_ym = dfc[dfc['from_id'].isnull()].groupby(['year', 'month'])['id'].count()

sent_per_ym.plot(label='Sent')
received_per_ym.plot(label='Received')

plt.title('Messages sent and received in each month')
plt.xlabel('Year, month')
plt.ylabel('Messages')
plt.xticks(rotation=90)
plt.legend()

plt.show()
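
As a variation on the grouping by separate year and month columns, the dates can be collapsed into monthly periods so that both counts share one index. The sketch below uses made-up rows and follows the convention of this Notebook that a null `from_id` marks a received message.

```python
import pandas as pd

# Hypothetical messages; a null from_id marks a received message, as in the Notebook.
messages = pd.DataFrame({
    'date': pd.to_datetime(['2023-01-05', '2023-01-20', '2023-02-01', '2023-02-02']),
    'from_id': [7, None, 7, 7],
})

month = messages['date'].dt.to_period('M')
sent = messages['from_id'].notnull()

per_month = pd.DataFrame({
    'sent': sent.groupby(month).sum(),
    'received': (~sent).groupby(month).sum(),
})
print(per_month)
```

Because both columns are counted over the same monthly groups, a month with no received messages shows up as 0 and does not go missing from one of the two lines.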

---

_Thanks for experimenting with this Notebook! We hope this analysis was interesting to conduct on your data!_