<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# Gmail - List most interesting emails
<a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/Gmail/Gmail_Get_emails_stats_by_sender.ipynb" target="_parent"><img src="https://naasai-public.s3.eu-west-3.amazonaws.com/Open_in_Naas_Lab.svg"/></a><br><br><a href="https://github.com/jupyter-naas/awesome-notebooks/issues/new?assignees=&labels=&template=template-request.md&title=Tool+-+Action+of+the+notebook+">Template request</a> | <a href="https://github.com/jupyter-naas/awesome-notebooks/issues/new?assignees=&labels=bug&template=bug_report.md&title=Gmail+-+Get+emails+stats+by+sender:+Error+short+description">Bug report</a> | <a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/Naas/Naas_Start_data_product.ipynb" target="_parent">Generate Data Product</a>

**Tags:** #gmail #productivity #naas_drivers #operations #automation #analytics #plotly

**Author:** [Antonio Georgiev](https://www.linkedin.com/in/antonio-georgiev-b672a325b)

**Description:** This notebook analyses the user's inbox and lists the senders the user is most interested in, based on how often the user opened each sender's emails over the last two weeks.

## Input

### Import libraries

In [2]:
import naas
from naas_drivers import email
import pandas as pd
import numpy as np
import plotly.express as px

### Setup Variables
Create an application password following [this procedure](https://support.google.com/mail/answer/185833?hl=en).
- `username`: The username or email address associated with the email account.
- `password`: The application password or authentication token required to access the email account.
- `smtp_server`: The address of the mail server to connect to. Despite its name, the value used here (`imap.gmail.com`) is Gmail's IMAP host, which is used for reading emails rather than sending them.
- `box`: The name or identifier of the mailbox or folder within the email account that will be accessed.

In [3]:
username = "florent@naas.ai"
password = naas.secret.get("GMAIL_APP_PASSWORD")
smtp_server = "imap.gmail.com"
box = "INBOX"

## Model

### Connect to email box

In [4]:
emails = email.connect(username, password, username, smtp_server)

In [5]:
emails

<naas_drivers.tools.email.Email at 0x7f9e7c0a3190>

### Get email list

In [6]:
df_emails = emails.get()

In [9]:
df_emails.head(1)

Unnamed: 0,uid,subject,from,to,cc,bcc,reply_to,date,text,html,flags,headers,size_rfc822,size,obj,attachments
0,8369,Re: [jupyter-naas/awesome-notebooks] YahooFina...,"{'email': 'notifications@github.com', 'name': ...",[{'email': 'awesome-notebooks@noreply.github.c...,"[{'email': 'florent@naas.ai', 'name': 'Florent...",[],[{'email': 'reply+ALOOVTNLRCS3YARVDWKY7IOCQLTY...,2023-05-30 01:34:13-07:00,json =\r\nmeta: \r\n title: my finance assist...,"<p></p>\r\n<p dir=""auto"">json =<br>\r\nmeta:<b...","(SEEN,)","{'delivered-to': ('florent@naas.ai',), 'receiv...",8129,7981,"[Delivered-To, Received, X-Google-Smtp-Source,...",0


In [14]:
for index, row in df_emails.iterrows():
    uid = row["uid"]
    print(index, uid)
    break

0 8369


In [15]:
for row in df_emails.itertuples():
    uid = row.uid
    print(row.Index, uid)
    break

0 8369
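
The two loops above differ in what they yield, which is easy to miss when both print the same result. The following is a minimal sketch on made-up data (`df_demo` and its values are illustrative, not taken from the mailbox): `iterrows()` yields `(index, Series)` pairs, while `itertuples()` yields namedtuples whose index is available as `row.Index`.

```python
import pandas as pd

# Made-up rows standing in for the real mailbox DataFrame
df_demo = pd.DataFrame({"uid": [8369, 8370], "subject": ["first", "second"]})

# iterrows() yields (index, Series) pairs; columns are read with row["name"]
pairs_from_iterrows = [(index, row["uid"]) for index, row in df_demo.iterrows()]

# itertuples() yields namedtuples; the index is exposed as the `Index` attribute
pairs_from_itertuples = [(row.Index, row.uid) for row in df_demo.itertuples()]

print(pairs_from_iterrows)
print(pairs_from_itertuples)
```

On large DataFrames `itertuples()` is generally the faster of the two, because it does not build a `Series` for every row.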


### Filter emails for the last two weeks

In [11]:
# Email dates are timezone-aware, so convert them to UTC and compare against a UTC cutoff
two_weeks_ago = pd.Timestamp.now(tz="UTC") - pd.DateOffset(weeks=2)
df_emails['date'] = pd.to_datetime(df_emails['date'], utc=True)
filtered_emails = df_emails[df_emails['date'] >= two_weeks_ago]
print("Emails fetched:", len(filtered_emails))
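
Email dates usually carry their own UTC offset, so a date filter is most reliable when both the column and the cutoff are timezone-aware. The sketch below uses made-up dates and a fixed `now` so that it is reproducible (`df_demo`, `now` and `cutoff` are illustrative names). It assumes that normalising everything to UTC is acceptable for a two-week window.

```python
import pandas as pd

# Fixed reference time for reproducibility; a live notebook could use pd.Timestamp.now(tz="UTC")
now = pd.Timestamp("2023-06-01 12:00:00", tz="UTC")
cutoff = now - pd.Timedelta(weeks=2)

# Made-up emails whose dates carry different UTC offsets
df_demo = pd.DataFrame({
    "uid": [1, 2, 3],
    "date": [
        "2023-05-30 01:34:13-07:00",
        "2023-05-10 09:00:00+02:00",
        "2023-05-18 23:30:00-07:00",
    ],
})

# utc=True converts every value to UTC, giving a single tz-aware datetime64 column
df_demo["date"] = pd.to_datetime(df_demo["date"], utc=True)

# Both sides of the comparison are now tz-aware
recent = df_demo[df_demo["date"] >= cutoff]
print(recent["uid"].tolist())
```

Comparing a tz-aware column with a naive timestamp raises an error in pandas, which is why the cutoff is built with an explicit `tz`.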

### Fetch the message IDs for the filtered emails

Each message ID is used to look up the email's state (e.g. "Opened").

In [None]:
# The DataFrame shown above identifies each message by its `uid` column
message_ids = filtered_emails['uid'].tolist()

### Check if each email in the list has been opened

In [None]:
opened_senders = []
for message_id in message_ids:
    # Look up the email by its `uid` (the identifier column shown in the DataFrame above)
    row = filtered_emails.loc[filtered_emails['uid'] == message_id].iloc[0]
    # Assumption: an opened email carries a "SEEN" flag, as in the `flags` value "(SEEN,)" shown above.
    # This replaces the history lookup, which relied on `MessageId`/`From` columns the DataFrame above does not have.
    if "SEEN" in str(row['flags']).upper():
        sender = row['from']
        # `from` is displayed as a dict ({'email': ..., 'name': ...}); keep the address only
        opened_senders.append(sender['email'] if isinstance(sender, dict) else sender)

### Get most viewed senders by counting the occurrences

In [None]:
most_viewed_senders = pd.Series(opened_senders).value_counts().head(10)
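
Counting opened emails favours senders who write often. The description talks about an opening rate, so a possible refinement is to divide opened emails by received emails per sender. This is a minimal sketch on made-up data. It assumes, based on the row displayed earlier, that `from` holds a dict with an `email` key and that a `SEEN` entry in `flags` means the email was opened; `df_demo`, `sender`, `opened` and `stats` are illustrative names.

```python
import pandas as pd

# Made-up inbox mirroring the shape of the `from` and `flags` columns
df_demo = pd.DataFrame({
    "from": [
        {"email": "a@example.com", "name": "A"},
        {"email": "a@example.com", "name": "A"},
        {"email": "b@example.com", "name": "B"},
        {"email": "b@example.com", "name": "B"},
        {"email": "b@example.com", "name": "B"},
        {"email": "c@example.com", "name": "C"},
    ],
    "flags": [("SEEN",), ("SEEN",), ("SEEN",), (), (), ()],
})

# Keep only the sender address, and mark emails as opened when flagged SEEN (assumption)
df_demo["sender"] = df_demo["from"].apply(lambda sender: sender["email"])
df_demo["opened"] = df_demo["flags"].apply(lambda flags: "SEEN" in flags)

# Per sender: emails received, emails opened, and the resulting opening rate
stats = (
    df_demo.groupby("sender")["opened"]
    .agg(received="count", opened="sum")
    .assign(opening_rate=lambda d: d["opened"] / d["received"])
    .sort_values(["opening_rate", "received"], ascending=False)
)
print(stats)
```

Sorting by rate first and volume second keeps a sender with one opened email out of one from outranking a sender with many opened emails at the same rate.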

## Output

### Print the list with the most viewed senders

In [None]:
print("Most viewed senders (based on opened emails) for the last 2 weeks:")
print(most_viewed_senders)