# Sincerity Tool For Email

Sincerity between people takes human effort. Now that bots and Generative AI can write messages, the sincerity of digital communications such as email is open to question.

The Sincerity Tool is a simple email content checker that aims to rate the sincerity of an email's author.

**Method:**
* The program takes the content of a received email as user input
* Based on the email text, Gen AI derives the email's intent and then generates 5 pseudo emails with the same intent
* Each pseudo email is compared with the original received email to compute the sincerity index
* A sincerity index closer to 1 means the email was likely not generated by Gen AI
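
The steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the notebook's actual code: `sincerity_pipeline`, `fake_intent`, `fake_generate`, and `word_overlap` are hypothetical names, and the stand-in functions replace the Gemini calls and the similarity measures used later in the notebook.

```python
from typing import Callable

def sincerity_pipeline(
    email: str,
    derive_intent: Callable[[str], str],
    generate_email: Callable[[str, int], str],
    similarity: Callable[[str, str], float],
    n_pseudo: int = 5,
) -> float:
    """Return 1 minus the mean similarity between the email and its pseudo emails."""
    intent = derive_intent(email)                 # step 2a: derive the intent
    target_words = len(email.split())             # keep pseudo emails a similar length
    pseudo = [generate_email(intent, target_words) for _ in range(n_pseudo)]  # step 2b
    scores = [similarity(email, p) for p in pseudo]                           # step 3
    return 1 - sum(scores) / len(scores)          # step 4: closer to 1 = less AI-like

# Hypothetical stand-ins so the sketch runs without any API key.
def fake_intent(email: str) -> str:
    return "thank the team"

def fake_generate(intent: str, n_words: int) -> str:
    return "thank you team for the great work"

def word_overlap(a: str, b: str) -> float:
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    return len(set_a & set_b) / len(set_a | set_b)

index = sincerity_pipeline(
    "thank you all for the hard work this year",
    fake_intent, fake_generate, word_overlap,
)
print(round(index, 3))
```

In the notebook itself, the two generation steps call Gemini and the similarity step averages five different measures rather than one.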

**Note:**
* This project is still in an experimental phase, so the accuracy of the sincerity index is questionable.

This project is based on Google's Gemini - [Gemini API: Quickstart with Python](https://ai.google.dev/tutorials/python_quickstart) and [Document Similarity with Examples in Python](https://www.linkedin.com/pulse/document-similarity-examples-python-rany-elhousieny-phd%E1%B4%AC%E1%B4%AE%E1%B4%B0-0i5lc/)

## Setup

### Install the Python SDK

The Python SDK for the Gemini API is contained in the [`google-generativeai`](https://pypi.org/project/google-generativeai/) package. Install the dependency using pip:

In [1]:
!pip install -q -U google-generativeai

Install HuggingFace sentence transformer for document similarity comparison.  

In [2]:
!pip install sentence-transformers



## Import packages

Import the necessary packages.

In [3]:
import pathlib
import textwrap

import google.generativeai as genai

from IPython.display import display
from IPython.display import Markdown

In [4]:
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from scipy.spatial.distance import euclidean

from gensim.models import Word2Vec
import numpy as np

In [5]:
def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

Use Google Colab's secrets storage to securely retrieve the API key.

In [6]:
# Used to securely store your API key
from google.colab import userdata

### Setup your API key

Before you can use the Gemini API, you must first obtain an API key. If you don't already have one, create a key with one click in Google AI Studio.

<a class="button button-primary" href="https://makersuite.google.com/app/apikey" target="_blank" rel="noopener noreferrer">Get an API key</a>

In Colab, add the key to the secrets manager under the "🔑" in the left panel. Give it the name `GOOGLE_API_KEY`.

Once you have the API key, pass it to the SDK. You can do this in two ways:

* Put the key in the `GOOGLE_API_KEY` environment variable (the SDK will automatically pick it up from there).
* Pass the key to `genai.configure(api_key=...)`

You will also need a HuggingFace API key, stored in Colab's secrets manager as `HF_TOKEN`.

In [7]:
# Or use `os.getenv('GOOGLE_API_KEY')` to fetch an environment variable.
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

genai.configure(api_key=GOOGLE_API_KEY)

# Email Content inputs
Enter your email content when prompted. You can copy and paste it.

In [8]:
email_content = input("Enter the email body here: ")

Enter the email body here: Subject: Reflecting on Gratitude: A Year of Blessings and Accomplishments  Dear [Recipient],  As the year draws to a close, I find myself reflecting on the journey we've undertaken in 2022. It brings to mind the words from 1 Samuel 7:12, "Thus far the Lord has helped us." These words resonate deeply with me as they encapsulate the essence of our experiences throughout the year. In every triumph, every milestone reached, and every challenge overcome, we have seen the hand of God guiding our steps and granting us success.  In February, we embarked on a journey filled with hope and determination. It was a month marked by the launch of our new program, a testament to our commitment to innovation and excellence. Through God's grace, we were able to navigate the complexities and challenges of implementation, laying a strong foundation for the future.  March brought with it a series of successful events that showcased the talent and dedication of our team. From conf

In [9]:
no_of_words = len(email_content.split())

In [10]:
# Debug
no_of_words

465

## Generate Gen AI emails from the derived intent

In [11]:
model = genai.GenerativeModel('gemini-pro')

In [12]:
prompt = "What is the intent of the following email content: '" + email_content + "'"

In [13]:
response_content = []
messages = [
    {'role':'user',
     'parts': [prompt]}
]
response = model.generate_content(messages)

print('The intent of the email: ')
to_markdown(response.text)

The intent of the email: 


> **Intent:**
> 
> The email's intent is to express gratitude for the past year's achievements and blessings while reflecting on the challenges overcome. It aims to convey a sense of appreciation and acknowledgment for the role of faith and determination in the sender's journey.
> 
> **Key Points:**
> 
> * Acknowledges the past year's journey and the realization of God's guidance.
> * Highlights specific successes and milestones achieved throughout the year.
> * Expresses gratitude for collaborations, student progress, and personal accomplishments.
> * Emphasizes the importance of faith, perseverance, and resilience.
> * Encourages continued gratitude and hope for the future.

In [14]:
prompt = "Generate an email of about " + str(no_of_words) + " words with the following intent: '" + response.text + "'"

In [15]:
response_content = []
messages = [
    {'role':'user',
     'parts': [prompt]}
]
response = model.generate_content(messages)
response_content.append(response.text)

#to_markdown(response.text)

In [16]:
prompt2 = "Okay, write another new email about " + str(no_of_words) + " words based on this intent?"

In [17]:
for i in range(1,5):
  messages.append({'role':'model',
                  'parts':[response.text]})

  messages.append({'role':'user',
                  'parts':[prompt2]})

  response = model.generate_content(messages)
  response_content.append(response.text)

#to_markdown(response.text)

In [18]:
# Debug
response_content

['Dear Friends and Family,\n\nAs the curtain falls on another chapter of our lives, and the glimmering lights of a New Year beckon us forward, I find myself enveloped in a profound sense of gratitude and awe. Reflecting upon the past year\'s odyssey, I am filled with an overwhelming realization of God\'s unwavering presence, guiding me through every step of the way.\n\nThroughout this extraordinary year, I have witnessed firsthand the transformative power of faith and determination. Together, we have achieved milestones that once seemed distant and conquered challenges that threatened to overwhelm us. The journey, though arduous at times, has been an enriching tapestry woven with threads of resilience and triumph.\n\nI extend my heartfelt gratitude to each of you who has been a beacon of support and encouragement along this path. Your unwavering belief in my abilities and your constant prayers have been an invaluable source of strength. I am eternally grateful for the collaborations we

# Document Similarity

Taken from [Document Similarity with Examples in Python](https://www.linkedin.com/pulse/document-similarity-examples-python-rany-elhousieny-phd%E1%B4%AC%E1%B4%AE%E1%B4%B0-0i5lc/)

## Cosine Similarity

Cosine similarity is a popular method for measuring the similarity between two documents. It calculates the cosine of the angle between two vectors, which represent the documents in a multi-dimensional space. In general, the cosine ranges from -1 to 1, where 1 means the vectors point in the same direction and -1 means they point in opposite directions. TF-IDF vectors have no negative components, so for the comparison here the value ranges from 0 (no shared terms) to 1 (identical term weighting).
https://youtu.be/e9U0QAFbfLI?si=tEG1GpqDkFX05Y_I
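
As a small worked example of the calculation, the sketch below computes cosine similarity by hand. It assumes raw term counts rather than the TF-IDF weights used in the notebook, and `cosine` and `count_vectors` are hypothetical helper names.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_u * norm_v)

def count_vectors(doc_a, doc_b):
    """Build raw term-count vectors over the combined vocabulary."""
    tokens_a, tokens_b = doc_a.lower().split(), doc_b.lower().split()
    vocab = sorted(set(tokens_a) | set(tokens_b))
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    return [counts_a[w] for w in vocab], [counts_b[w] for w in vocab]

vec_a, vec_b = count_vectors("thank you for your help",
                             "thank you for your support")
similarity = cosine(vec_a, vec_b)  # 4 of the 5 words in each document are shared
print(round(similarity, 3))        # prints 0.8
```

Because only the angle matters, scaling a vector (for example, repeating a document twice) leaves the score unchanged.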

In [19]:
average_list = []

In [20]:
# Vectorize the documents
vectorizer = TfidfVectorizer()
ave = 0
for txt in response_content:

  tfidf_matrix = vectorizer.fit_transform([email_content, txt])

  # Calculate cosine similarity
  cosine_sim = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix)
  ave += cosine_sim[0][1]
  print(f"Cosine Similarity: {cosine_sim[0][1]}")

ave = ave / len(response_content)
average_list.append(ave)

Cosine Similarity: 0.7249994869934567
Cosine Similarity: 0.7450535095868107
Cosine Similarity: 0.7248498744961469
Cosine Similarity: 0.7248498744961469
Cosine Similarity: 0.7248498744961469


## Jaccard Similarity

Jaccard similarity measures the similarity between two sets by dividing the size of their intersection by the size of their union. In the context of documents, it compares the sets of words present in each document.
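
Because the measure compares sets of words, the way text is split into words matters. The sketch below is a hedged illustration: `normalize` is a hypothetical helper, not part of the notebook. It shows how plain whitespace splitting treats `Thank` and `thank`, or `you,` and `you`, as different words.

```python
import re

def jaccard(tokens_a, tokens_b):
    """Size of the intersection divided by size of the union of two token sets."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

def normalize(text):
    """Lowercase the text and keep only runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

doc1 = "Thank you, team!"
doc2 = "thank you team"

raw_score = jaccard(doc1.split(), doc2.split())             # no token matches exactly
normalized_score = jaccard(normalize(doc1), normalize(doc2))  # all three words match
print(raw_score, normalized_score)  # prints 0.0 1.0
```

Whether to normalize is a design choice; the notebook compares the raw whitespace-split words.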

In [21]:
def jaccard_similarity(doc1, doc2):
    words_doc1 = set(doc1.split())
    words_doc2 = set(doc2.split())

    intersection = words_doc1.intersection(words_doc2)
    union = words_doc1.union(words_doc2)

    return len(intersection) / len(union)

ave = 0
for txt in response_content:
  # Calculate Jaccard similarity
  jaccard_sim = jaccard_similarity(email_content, txt)
  ave += jaccard_sim
  print(f"Jaccard Similarity: {jaccard_sim}")

ave = ave / len(response_content)
average_list.append(ave)

Jaccard Similarity: 0.18598382749326145
Jaccard Similarity: 0.1969309462915601
Jaccard Similarity: 0.19489559164733178
Jaccard Similarity: 0.19489559164733178
Jaccard Similarity: 0.19489559164733178


## Euclidean Distance

Euclidean distance is a measure of the straight-line distance between two points in a multi-dimensional space. In document similarity, it is used to measure the distance between the vector representations of the documents. A smaller distance indicates higher similarity.
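
One property worth seeing in a small example: with raw word counts, Euclidean distance grows with document length even when the wording is the same. The sketch below uses hand-written count vectors, and `distance_to_similarity` is a hypothetical name for the capped conversion this notebook applies to the average distance.

```python
import math

def distance_to_similarity(distance, cap=100.0):
    """Map a distance to [0, 1]: 0 distance gives 1, distances at or above cap give 0."""
    return (cap - min(distance, cap)) / cap

# Word counts over the vocabulary ["for", "grateful", "team", "the"].
short_vec = [1, 1, 1, 1]  # "grateful for the team"
long_vec = [2, 2, 2, 2]   # the same sentence written twice

distance = math.dist(short_vec, long_vec)  # sqrt(1 + 1 + 1 + 1)
print(distance)                            # prints 2.0
print(distance_to_similarity(distance))
```

The cap of 100 is an arbitrary scale, so the resulting similarity is only comparable between emails of roughly similar length.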

In [22]:
# Vectorize the documents
vectorizer = CountVectorizer()
ave = 0

for txt in response_content:
  count_matrix = vectorizer.fit_transform([email_content, txt])

  # Convert to dense matrix
  dense_matrix = count_matrix.toarray()

  # Calculate Euclidean distance
  euclidean_dist = euclidean(dense_matrix[0], dense_matrix[1])

  ave += euclidean_dist
  print(f"Euclidean Distance: {euclidean_dist}")

ave = ave / len(response_content)
# Convert the average Euclidean distance to a similarity in [0, 1];
# distances above 100 are capped so the result never goes negative
ave = (100 - min(ave, 100.0)) / 100
average_list.append(ave)

Euclidean Distance: 34.058772731852805
Euclidean Distance: 33.74907406137241
Euclidean Distance: 44.26059195266146
Euclidean Distance: 44.26059195266146
Euclidean Distance: 44.26059195266146


## Modern and Advanced Methods for Document Similarity

There are more modern and advanced methods for document similarity that have emerged with the advent of deep learning and neural networks. Some of these methods include:

## Word Embeddings (Word2Vec)

Word embeddings are dense vector representations of words, where similar words have similar vectors. Document similarity can be calculated by averaging the embeddings of all words in each document and then computing the cosine similarity between these average vectors.
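
The averaging step can be illustrated with a toy example. The two-dimensional vectors below are invented for illustration only (real Word2Vec vectors are learned and much larger), and `average_vector` and `cosine` are hypothetical helper names.

```python
import math

# Made-up 2-d "embeddings": gratitude words lean one way, billing words the other.
toy_vectors = {
    "thank": [0.9, 0.1],
    "grateful": [0.8, 0.2],
    "invoice": [0.1, 0.9],
    "payment": [0.2, 0.8],
}

def average_vector(tokens, vectors):
    """Average the vectors of all known tokens, component by component."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no known tokens in document")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

doc_thanks = average_vector(["thank", "grateful"], toy_vectors)
doc_thanks2 = average_vector(["grateful", "team"], toy_vectors)  # "team" is unknown, skipped
doc_billing = average_vector(["invoice", "payment"], toy_vectors)

sim_related = cosine(doc_thanks, doc_thanks2)
sim_unrelated = cosine(doc_thanks, doc_billing)
print(sim_related > sim_unrelated)  # prints True
```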

In [23]:
ave = 0
for txt in response_content:
  # Sample documents
  documents = [email_content, txt]

  # Tokenize documents
  tokenized_docs = [doc.split() for doc in documents]

  # Train Word2Vec model (named w2v_model so it does not overwrite the Gemini `model`)
  w2v_model = Word2Vec(sentences=tokenized_docs, vector_size=100, window=5, min_count=1, workers=4)

  # Calculate average vector for each document
  def average_vector(doc):
      return np.mean([w2v_model.wv[word] for word in doc if word in w2v_model.wv], axis=0)

  doc_vectors = [average_vector(doc) for doc in tokenized_docs]

  # Calculate cosine similarity
  similarity = cosine_similarity([doc_vectors[0]], [doc_vectors[1]])
  ave += similarity[0][0]
  print("Cosine similarity (Word2Vec):", similarity[0][0])

ave = ave / len(response_content)
average_list.append(ave)

Cosine similarity (Word2Vec): 0.8982183
Cosine similarity (Word2Vec): 0.902188
Cosine similarity (Word2Vec): 0.9108893
Cosine similarity (Word2Vec): 0.9108893
Cosine similarity (Word2Vec): 0.9108893


## Sentence and Document Embeddings (BERT)

Similar to word embeddings, sentence and document embeddings provide dense vector representations for entire sentences or documents. Pre-trained models like BERT (Bidirectional Encoder Representations from Transformers) can be used to generate these embeddings, which can then be used to calculate document similarity.

In [24]:
# Load the pre-trained all-MiniLM-L6-v2 sentence-embedding model
# (named st_model so it does not overwrite the Gemini `model`)
st_model = SentenceTransformer('all-MiniLM-L6-v2')
ave = 0

for txt in response_content:
  # Sample sentences
  sentences = [email_content, txt]

  # Generate embeddings
  embeddings = st_model.encode(sentences)

  # Calculate cosine similarity
  similarity = cosine_similarity([embeddings[0]], [embeddings[1]])
  ave += similarity[0][0]
  print("Cosine similarity (BERT):", similarity[0][0])

ave = ave / len(response_content)
average_list.append(ave)

Cosine similarity (BERT): 0.67732185
Cosine similarity (BERT): 0.67600596
Cosine similarity (BERT): 0.6802878
Cosine similarity (BERT): 0.6802878
Cosine similarity (BERT): 0.6802878


# Sincerity Index
1 = 100% sincere, likely not generated by Generative AI

Based on the average of the following document similarity comparisons:
1. Cosine Similarity
2. Jaccard Similarity
3. Euclidean Distance
4. Word Embeddings (Word2Vec)
5. Sentence and Document Embeddings (BERT)

In [25]:
average_list

[0.7289205240137416,
 0.1935203097453634,
 0.5988207546975808,
 0.9066148519515991,
 0.6788382291793823]

In [26]:
sincerity_index = 1 - (sum(average_list) / len(average_list))

In [27]:
print('The Sincerity Index: ', sincerity_index)

The Sincerity Index:  0.3786570660824665
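
The calculation above can be wrapped in a small reusable function. This is a minimal sketch: `sincerity_index_from` is a hypothetical name, and the scores are the five averages printed in the output above.

```python
def sincerity_index_from(similarities):
    """Return 1 minus the mean of the per-method similarity averages."""
    if not similarities:
        raise ValueError("at least one similarity score is required")
    return 1 - sum(similarities) / len(similarities)

# Per-method averages copied from the notebook output above.
scores = [
    0.7289205240137416,  # Cosine similarity (TF-IDF)
    0.1935203097453634,  # Jaccard similarity
    0.5988207546975808,  # Euclidean distance, converted to a similarity
    0.9066148519515991,  # Word2Vec
    0.6788382291793823,  # Sentence embeddings
]
print(round(sincerity_index_from(scores), 4))  # prints 0.3787
```

Each method carries equal weight in this average, so the low Jaccard score and the high Word2Vec score pull the index in opposite directions.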
