## Problem statement: 
Connecting users with relevant content gets challenging as the size of data increases.
This notebook demonstrates two search approaches: BM25 and the Universal Sentence Encoder (USE).
BM25 is the ranking algorithm behind search platforms such as Solr. In a nutshell, it is a keyword-driven search. There is a nice Python implementation of BM25, courtesy of https://github.com/dorianbrown/rank_bm25
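
To make the keyword-driven idea concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is not the rank_bm25 implementation: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the three-document toy corpus are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score every document against the query with an Okapi BM25 formula."""
    n_docs = len(tokenized_docs)
    avg_len = sum(len(doc) for doc in tokenized_docs) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))

    scores = []
    for doc in tokenized_docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue  # a term absent from the document contributes nothing
            # Smoothed IDF (the "+ 1" inside the log keeps it non-negative).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Term-frequency saturation with document-length normalisation.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy corpus, tokenized by whitespace.
corpus = [
    "please review the accounting practices memo",
    "lunch on friday to check on the house",
    "accounting will correct the torch sales",
]
tokenized = [doc.split() for doc in corpus]
scores = bm25_scores("accounting practices".split(), tokenized)
best = max(range(len(corpus)), key=scores.__getitem__)
print(corpus[best])
```

Only documents that share at least one token with the query receive a non-zero score, which is why BM25 cannot surface a document that is related in meaning but uses different words.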

The Universal Sentence Encoder provides a more semantic understanding of a given text, so a search can return results that are related in meaning even when they share no keywords with the query.

We will run keyword searches and also search for similar documents.


## Data
The data comes from the Enron email dataset. We will use it to demonstrate searching through a given text and getting relevant search results.
Any data can be used, with the caveat that Universal Sentence Encoder embeddings get diluted as the size of a given text grows.
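
The dilution caveat can be illustrated with a toy example. This is only a sketch: the 3-dimensional vectors are hand-made stand-ins rather than real USE output, and treating a whole-text embedding as the mean of its sentence vectors is an assumption made for illustration, not a statement about how USE works internally.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical 3-d "embeddings": one on-topic sentence among off-topic ones.
query = [1.0, 0.0, 0.0]
sentences = [
    [0.9, 0.1, 0.0],  # on-topic sentence
    [0.0, 1.0, 0.0],  # off-topic sentence
    [0.0, 0.0, 1.0],  # off-topic sentence
]

# Stand-in for a whole-text embedding: the mean of the sentence vectors.
whole_text = [sum(column) / len(sentences) for column in zip(*sentences)]

best_sentence_score = max(cosine(query, s) for s in sentences)
whole_text_score = cosine(query, whole_text)
print(round(best_sentence_score, 3), round(whole_text_score, 3))
```

The single relevant sentence matches the query closely, but once it is blended with unrelated sentences the score for the text as a whole drops. Scoring each sentence separately and keeping the best match avoids this.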

The original dataset can be found at https://www.cs.cmu.edu/~enron/
This code will use Kaggle's CSV version, which has done a great job of consolidating all the emails into one file. This CSV can be found at https://www.kaggle.com/wcukierski/enron-email-dataset?select=emails.csv

In [None]:
#import libraries we may need
import pandas as pd
import os, sys, email
from sklearn.utils import shuffle
import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
import numpy as np
import re
import seaborn as sns
from nltk import sent_tokenize
import nltk
nltk.download('punkt')
from tqdm import tqdm
tqdm.pandas()
from sklearn.metrics.pairwise import cosine_similarity



[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [None]:
!gdown --id 1gY0cz4O7YwiQOMPUMbTw24kC_SCU3iZl --output enron_emails.zip

Downloading...
From: https://drive.google.com/uc?id=1gY0cz4O7YwiQOMPUMbTw24kC_SCU3iZl
To: /content/enron_emails.zip
375MB [00:03, 114MB/s]


In [None]:
#unzip the data
!unzip enron_emails.zip

Archive:  enron_emails.zip
replace emails.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: emails.csv              


In [None]:
emails_df = pd.read_csv('emails.csv')

In [None]:
#due to resource constraints, look at only a sample
#use random_state to ensure the reproducibility of the examples.
#emails_df = shuffle(emails_df).copy()
emails_df = emails_df.sample(n=100000, random_state=1).copy()

In [None]:
## Helper functions - thanks to Zichen Wang for this code
def get_text_from_email(msg):
    '''To get the content from email objects'''
    parts = []
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            parts.append( part.get_payload() )
    return ''.join(parts)

def split_email_addresses(line):
    '''To separate multiple email addresses'''
    if line:
        addrs = line.split(',')
        addrs = frozenset(map(lambda x: x.strip(), addrs))
    else:
        addrs = None
    return addrs
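
As a quick check of what these helpers do, the sketch below applies the same steps inline to a tiny hand-written message, using only the standard-library `email` package. The addresses, subject, and body are made up for illustration.

```python
import email

# Hypothetical raw message in the same "headers, blank line, body" layout
# as the strings in the 'message' column.
raw_message = (
    "From: alice@example.com\n"
    "To: bob@example.com, carol@example.com\n"
    "Subject: Lunch\n"
    "\n"
    "Are you free on Friday?\n"
)

msg = email.message_from_string(raw_message)

# Collect the text/plain parts, as get_text_from_email does.
body = "".join(
    part.get_payload()
    for part in msg.walk()
    if part.get_content_type() == "text/plain"
)

# Split the comma-separated recipients, as split_email_addresses does.
recipients = frozenset(addr.strip() for addr in msg["To"].split(","))

print(msg["Subject"])
print(body)
print(sorted(recipients))
```

A message without an explicit Content-Type header is treated as `text/plain`, so the body of a simple email like this one is picked up by the `walk()` loop.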

In [None]:
# Parse the emails into a list of email objects
messages = list(map(email.message_from_string, emails_df['message']))
emails_df.drop('message', axis=1, inplace=True)
# Get fields from parsed email objects
keys = messages[0].keys()
for key in keys:
    emails_df[key] = [doc[key] for doc in messages]
# Parse content from emails
emails_df['content'] = list(map(get_text_from_email, messages))
# Split multiple email addresses
emails_df['From'] = emails_df['From'].map(split_email_addresses)
emails_df['To'] = emails_df['To'].map(split_email_addresses)

# Extract the root of 'file' as 'user'
emails_df['user'] = emails_df['file'].map(lambda x:x.split('/')[0])
del messages

emails_df.head()

Unnamed: 0,file,Message-ID,Date,From,To,Subject,Mime-Version,Content-Type,Content-Transfer-Encoding,X-From,X-To,X-cc,X-bcc,X-Folder,X-Origin,X-FileName,content,user
186822,jones-t/all_documents/634.,<17820178.1075846925335.JavaMail.evans@thyme>,"Tue, 4 Jan 2000 08:20:00 -0800 (PST)",(tana.jones@enron.com),(alicia.goodrow@enron.com),Re: Dinner,1.0,text/plain; charset=us-ascii,7bit,Tana Jones,Alicia Goodrow,,,\Tanya_Jones_Dec2000\Notes Folders\All documents,JONES-T,tjones.nsf,"It would be nice if you could be at my dinner, since I probably won't know \nanyone else. Anytime you want to go to lunch to check on the house status, \nI'd be glad to go...",jones-t
308790,mann-k/all_documents/5690.,<29110382.1075845717882.JavaMail.evans@thyme>,"Tue, 15 May 2001 11:03:00 -0700 (PDT)",(kay.mann@enron.com),(sheila.tweed@enron.com),Re: Override letter,1.0,text/plain; charset=us-ascii,7bit,Kay Mann,Sheila Tweed,,,\Kay_Mann_June2001_1\Notes Folders\All documents,MANN-K,kmann.nsf,"Absolutely. \n\n\nFrom: Sheila Tweed@ECT on 05/15/2001 06:02 PM\nTo: Kay Mann/Corp/Enron@ENRON\ncc: \n\nSubject: Re: Override letter \n\nGood point! Can Peter start to draft an override letter?\n\n\n\n\tKay Mann@ENRON\n\t05/15/2001 05:55 PM\n\t\t \n\t\t To: pthompson@akllp.com\n\t\t cc: Sheila Tweed/HOU/ECT@ECT, Roseann Engeldorf/Enron@EnronXGate, Scott \nDieball/ENRON_DEVELOPMENT@ENRON_DEVELOPMENt, John G \nRigby/ENRON_DEVELOPMENT@ENRON_DEVELOPMENT\n\t\t Subject: Override letter\n\nAs a reminder to all of us, we will need a form override letter to go with \nthe form turbine contract. \n\nKay\n\n\n\n",mann-k
82383,dasovich-j/sent/423.,<6812040.1075843194135.JavaMail.evans@thyme>,"Thu, 28 Sep 2000 08:59:00 -0700 (PDT)",(jeff.dasovich@enron.com),(christine.piesco@oracle.com),Teams,1.0,text/plain; charset=us-ascii,7bit,Jeff Dasovich,Christine.Piesco@oracle.com,,,\Jeff_Dasovich_Dec2000\Notes Folders\Sent,DASOVICH-J,jdasovic.nsf,"Christine:\n\nMy apologies. My schedule melted down after we talked on Monday. Here's \nwhere folks came out. There's some concern about size. We're supposed to be \nno larger than 3, but we lobbied Aceves and he apparently Ok'd our \n""oversized"" group. The other folks in the group--who talked to him \noriginally--are pretty sure that five will violate the rules. Folks wondered \nif there were other groups that are smaller than ours that you could hook up \nwith. Sorry about that---it's a wrinkle that I didn't think about when we \nspoke. If it gets real ugly trying to find a smaller group, let me know. \nFortunately there's not another team case due for two weeks.\n\nBest,\nJeff",dasovich-j
227299,kaminski-v/var/63.,<21547648.1075856642126.JavaMail.evans@thyme>,"Mon, 9 Oct 2000 01:23:00 -0700 (PDT)",(tanya.tamarchenko@enron.com),(vince.kaminski@enron.com),Re: FYI: UK Var issues,1.0,text/plain; charset=us-ascii,7bit,Tanya Tamarchenko,Vince J Kaminski,,,\Vincent_Kaminski_Jun2001_5\Notes Folders\Var,Kaminski-V,vkamins.nsf,"Vince, \nUK VAR breached the limit last week.\nUK traders asked us to review the correlations across UK gas and power as \nwell as the correlations across EFA slots.\nWe did part of the work last week.\nNow we'll update the correlations based on historical prices.\n\nTanya.\n\n\n\n\nRichard Lewis\n10/08/2000 07:31 AM\nTo: Tanya Tamarchenko/HOU/ECT@ECT\ncc: Oliver Gaylard/LON/ECT@ECT, James New/LON/ECT@ECT, Steven \nLeppard/LON/ECT@ECT, Rudy Dautel/HOU/ECT@ECT, Kirstee Hewitt/LON/ECT@ECT, \nNaveen Andrews/Corp/Enron@ENRON, David Port/Market Risk/Corp/Enron@ENRON, Ted \nMurphy/HOU/ECT@ECT, Simon Hastings/LON/ECT@ECT, Paul D'Arcy/LON/ECT@ECT, Amir \nGhodsian/LON/ECT@ECT \nSubject: Re: VaR correlation scenarios \n\nThanks Tanya, these are interesting results. I am on vacation next week, so \nhere are my current thoughts. I am contactable on my mobile if necessary.\n\nGas to power correlations\n I see your point about gas to power correlation only affecting VAR for the \ncombined gas and power portfolio, and this raises an interesting point: At a \nconservative 30% long term correlation, combined VAR is o1mm less than \npreviously expected - so how does this affect the limit breach? Strictly \nspeaking, we are still over our UK power limit, but the limit was set when we \nwere assuming no gas power correlation and therefore a higher portfolio VAR. 
\n\nA suggested way forward given the importance of the spread options to the UK \nGas and Power books- \ncan we allocate to the gas and power books a share of the reduction in \nportfolio VAR - ie [Reduction = Portfolio VAR - sum(Power VAR + Gas VAR)]?\n\nAlso, if I understand your mail correctly, Matrix 1 implies 55% gas power \ncorrelation is consistent with our correlation curves, and this reduces total \nVAR by o1.8mm.\n\nEFA slot correlations\nThe issue of whether our existing EFA to EFA correlation matrix is correct is \na separate issue. I don't understand where the Matrix 2 EFA to EFA \ncorrelations come from, but I am happy for you to run some historical \ncorrelations from the forward curves (use the first 2 years, I would \nsuggest). Our original matrix was based on historicals, but the analysis is \nworth doing again. Your matrix 2 results certainly indicate how important \nthese correlations are.\n\nClosing thoughts\nFriday's trading left us longer so I would not expect a limit breach on \nMonday. We are still reviewing the shape of the long term curve, and I'd \nlike to wait until both Simon Hastings and I are back in the office (Monday \nweek) before finalising this.\n \nRegards\n\nRichard\n \n\n\n\nTanya Tamarchenko\n06/10/2000 22:59\nTo: Oliver Gaylard/LON/ECT@ECT, Richard Lewis/LON/ECT@ECT, James \nNew/LON/ECT@ECT, Steven Leppard/LON/ECT@ECT, Rudy Dautel/HOU/ECT@ECT, Kirstee \nHewitt/LON/ECT@ECT, Naveen Andrews/Corp/Enron@ENRON, David Port/Market \nRisk/Corp/Enron@ENRON, Ted Murphy/HOU/ECT@ECT\ncc: \n\nSubject: Re: VaR correlation scenarios \n\nEverybody,\nOliver sent us the VAR number for different correlations for UK-Power \nportfolio separately from UK-Gas portfolio.\n\nFirst, if VAR is calculated accurately the correlation between Power and Gas \ncurves should not affect VAR number for Power and VAR number for Gas, only \nthe aggregate number will be affected. 
The changes you see are due to the \nfact that we use Monte-Carlo simulation method,\nwhich accuracy depends on the number of simulations. Even if we don't change \nthe correlations but use different realizations of random numbers,\nwe get slightly different result from the model.\n\nSo: to see the effect of using different correlations between Gas and Power \nwe should look at the aggregate number.\n\nI calculated weighted correlations based on 2 curves I got from Paul. As the \nweights along the term structure I used the product of price, position and \nvolatility for each time bucket for Gas and each of EFA slots. The results \nare shown below:\n\n\nInserting these numbers into the original correlation matrix produced \nnegatively definite correlation matrix, which brakes VAR engine. \nCorrelation matrix for any set of random variables is non-negative by \ndefinition, and remains non-negatively definite if calculated properly based \non any historical data.\nHere, according to our phone discussion, we started experimenting with \ncorrelations, assuming the same correlation for each EFA slot and ET Elec \nversus Gas. I am sending you the spreadsheet which summaries the results. In \naddition to the aggregate VAR numbers for the runs Oliver did, you can see \nthe VAR numbers based on correlation Matrix 1 and Matrix 2. In Matrix 1 the \ncorrelations across EFA slots are identical to these in original matrix.\nI obtained this matrix by trial and error. Matrix 2 is produces by Naveen \nusing Finger's algorithm, it differs from original matrix across EFA slots as \nwell\nas in Power versus Gas correlations and gives higher VAR than matrix 1 does. 
\n\nConcluding: we will look at the historical forward prices and try to \ncalculate historical correlations from them.\n\nTanya.\n\n\n\n\nOliver Gaylard\n10/06/2000 01:50 PM\nTo: Richard Lewis/LON/ECT@ECT, James New/LON/ECT@ECT, Steven \nLeppard/LON/ECT@ECT, Rudy Dautel/HOU/ECT@ECT, Kirstee Hewitt/LON/ECT@ECT, \nNaveen Andrews/Corp/Enron@ENRON, Tanya Tamarchenko/HOU/ECT@ECT, David \nPort/Market Risk/Corp/Enron@ENRON\ncc: \nSubject: VaR correlation scenarios\n\nThe results were as follows when changing the gas/power correlations:\n\nCorrelation VaR-UK Power book VaR- UK Gas book\n 0.0 o10.405MM o3.180MM\n 0.1 o10.134MM o3.197MM\n 0.2 o10.270MM o3.185MM\n 0.3 o10.030MM o3.245MM\n 0.4 Cholesky decomposition failed (Not positive definite)\n 0.5 Cholesky decomposition failed (Not positive definite)\n 0.6 Cholesky decomposition failed (Not positive definite)\n 0.7 Cholesky decomposition failed (Not positive definite)\n 0.8 Cholesky decomposition failed (Not positive definite)\n 0.9 Cholesky decomposition failed (Not positive definite)\n 1.0 Cholesky decomposition failed (Not positive definite)\n \nPeaks and off peaks were treated the same to avoid violating the matrix's \nintegrity. \n\nInteresting to note that for a higher correlation of 0.2 the power VaR \nincreases which is counter to intuition. This implies that we need to look \ninto how the correlations are being applied within the model. Once we can \nderive single correlations from the term structure, is the next action to \nunderstand how they are being applied and whether the model captures the P+L \nvolatility in the spread option deals.\n\nFrom 0.4 onwards the VaR calculation failed.\n\nOliver\n\n \n\n\n\n\n\n\n\n",kaminski-v
301824,mann-k/_sent_mail/3208.,<12684200.1075846107179.JavaMail.evans@thyme>,"Fri, 13 Oct 2000 01:50:00 -0700 (PDT)",(kay.mann@enron.com),"(lisa.bills@enron.com, ben.jacoby@enron.com)",Change Order #5--Pleasanton Transformer,1.0,text/plain; charset=us-ascii,7bit,Kay Mann,"Lisa Bills, Ben Jacoby",,,\Kay_Mann_June2001_4\Notes Folders\'sent mail,MANN-K,kmann.nsf,"Any problems/comments?\n---------------------- Forwarded by Kay Mann/Corp/Enron on 10/13/2000 08:43 \nAM ---------------------------\n\n\nDale Rasmussen@ECT\n10/12/2000 07:17 PM\nTo: Don Hammond/PDX/ECT@ECT, Jody Blackburn/PDX/ECT@ECT, Kay \nMann/Corp/Enron@Enron, Kathleen Clark/ENRON_DEVELOPMENT@ENRON_DEVELOPMENT\ncc: Ed Clark/PDX/ECT@ECT, Alan Larsen/PDX/ECT@ECT \n\nSubject: Change Order #5--Pleasanton Transformer\n\nA redraft of Change Order #5 to the LM 6000 contract to provide for the \npurchase of a GE transformer for the Pleasanton project is attached. \n\nPlease note that I have added some guaranty/LD provisions in Exhibit A, and \nconfirm whether these are appropriate in scope and amount.\n\nWe will attach the documents included here as separate files as Exhibits B \nand C. I understand that GE is to provide a more current copy of their \nstandard specifications for the transformer to be included in the Exhibits.\n\n\nKathleen: Kay tells me that you are the keeper of change orders. \nCongratulations!! The discussion of current amount balances, etc. in the \nattached are clearly wrong if we are up to CO#5--it was all based on this \nbeing the third CO. Can you please let me know what the right numbers should \nbe?\n\nThanks in advance for your input. I understand there is some urgency with \ngetting this out.\n\n\n",mann-k


In [None]:
#check our work so far
print(emails_df.shape)
print(emails_df.describe())

(100000, 18)
                                     file  \
count                              100000   
unique                             100000   
top     stokley-c/chris_stokley/sent/360.   
freq                                    1   

                                           Message-ID  \
count                                          100000   
unique                                         100000   
top     <10635759.1075861580948.JavaMail.evans@thyme>   
freq                                                1   

                                         Date                  From  \
count                                  100000                100000   
unique                                  76699                 10153   
top     Wed, 27 Jun 2001 16:02:00 -0700 (PDT)  (kay.mann@enron.com)   
freq                                      225                  3264   

                            To Subject Mime-Version  \
count                    95699  100000        99992   
unique   

## Universal Sentence Encoder
Paper describing the model: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46808.pdf

In [None]:
#used for display only
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)

In [None]:
module_url = "https://tfhub.dev/google/universal-sentence-encoder/4" #@param ["https://tfhub.dev/google/universal-sentence-encoder/4", "https://tfhub.dev/google/universal-sentence-encoder-large/5"]
model = hub.load(module_url)
print ("module %s loaded" % module_url)

module https://tfhub.dev/google/universal-sentence-encoder/4 loaded


In [None]:
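#helper functions to get the USE embeddings
def embed(sentences):
    #embed a list of sentences; returns one 512-d vector per sentence
    return model(sentences)

def embed_keywords(text):
    #embed a single string; returns a tensor of shape (1, 512)
    return model([text])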

In [None]:
#Let's run a few experiments. Since USE tends to 'water down' the embeddings as text gets longer, we will embed each sentence individually.
#But let's also see what happens if we embed the entire text at once
emails_df['processed_sentence'] = emails_df['content'].apply(sent_tokenize)
print("Running embeddings for each sentence within a given text....")
emails_df['sentence_embeddings'] = emails_df['processed_sentence'].progress_map(embed)
print("Running embeddings for an entire text....")
emails_df['whole_text_embeddings'] = emails_df['content'].progress_map(embed_keywords)

  0%|          | 0/100000 [00:00<?, ?it/s]

Running embeddings for each sentence within a given text....


100%|██████████| 100000/100000 [13:30<00:00, 123.40it/s]
  0%|          | 14/100000 [00:00<12:05, 137.78it/s]

Running embeddings for an entire text....


100%|██████████| 100000/100000 [07:57<00:00, 209.32it/s]


In [None]:
#take a look at one of the document embeddings
print(emails_df['processed_sentence'].iloc[0])
print("Shape of the 1st document embeddings vector: ", emails_df['sentence_embeddings'].iloc[0].shape)
print("Embeddings for each sentence of a text: ", emails_df['sentence_embeddings'].iloc[0])
print("Embeddings for the whole text: ", emails_df['whole_text_embeddings'].iloc[0])

["It would be nice if you could be at my dinner, since I probably won't know \nanyone else.", "Anytime you want to go to lunch to check on the house status, \nI'd be glad to go..."]
Shape of the 1st document embeddings vector:  (2, 512)
Embeddings for each sentence of a text:  tf.Tensor(
[[-2.8037254e-02 -1.0244286e-02  2.9761788e-02 ...  8.4245134e-05
   1.2398518e-02  3.3947281e-03]
 [ 7.2494321e-02 -4.8069630e-02 -3.7942678e-02 ...  1.5248727e-02
   3.7127510e-02  3.7425518e-02]], shape=(2, 512), dtype=float32)
Embeddings for the whole text:  tf.Tensor(
[[ 0.02080704 -0.07532615 -0.03094116 -0.04379633 -0.01084704 -0.00278797
  -0.04071044  0.01106039  0.05870884 -0.08559489 -0.02174503 -0.05142517
   0.04234329 -0.01064559 -0.11033544 -0.03897433 -0.06515872  0.08520376
  -0.01921097  0.00274822  0.03649516 -0.02962157  0.07895856 -0.02449481
   0.00526776  0.01947003  0.00713179 -0.02251233  0.02954272  0.03544835
   0.02535996  0.05734557  0.05706366  0.00299956  0.01652213  0.10

In [None]:
#TODO: optimize, e.g. with batched score comparisons; see https://www.tensorflow.org/hub/tutorials/semantic_similarity_with_tf_hub_universal_encoder

def find_similarities_sentence_level(search_query, search_target=emails_df):
  #score each document by its best-matching sentence (max cosine similarity to the query)
  query_vec = np.array(search_query).reshape(1, -1)
  similarity_scores = np.zeros(len(search_target['sentence_embeddings']))
  for doc_num, document in enumerate(search_target['sentence_embeddings']):
    max_similarity = -1  #stays at -1 for documents with no sentences
    for sentence in document:
      #cosine_similarity returns a 1x1 array; take the scalar
      cosine_sim = cosine_similarity(query_vec, np.array(sentence).reshape(1, -1))[0, 0]
      max_similarity = max(max_similarity, cosine_sim)
    similarity_scores[doc_num] = max_similarity
  return similarity_scores
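
One way to approach the optimization noted above is to score all sentences of a document in a single matrix operation instead of calling `cosine_similarity` once per sentence. The sketch below is a possible direction, not the notebook's method: `max_sentence_similarity` is a hypothetical helper, the 2-d vectors are toy data, and it assumes every sentence vector has a non-zero norm.

```python
import numpy as np

def max_sentence_similarity(query_vec, sentence_matrix):
    """Highest cosine similarity between one query and the rows of a matrix."""
    query_vec = np.asarray(query_vec, dtype=float).reshape(-1)
    sentence_matrix = np.asarray(sentence_matrix, dtype=float)
    if sentence_matrix.size == 0:
        return -1.0  # same sentinel the loop version keeps for empty documents
    # Normalise the query and every row to unit length (assumes non-zero norms).
    query_unit = query_vec / np.linalg.norm(query_vec)
    row_norms = np.linalg.norm(sentence_matrix, axis=1, keepdims=True)
    sentence_units = sentence_matrix / row_norms
    # One matrix-vector product yields the cosine similarity for every sentence.
    return float(np.max(sentence_units @ query_unit))

# Toy data: three "sentence embeddings" and one query.
query = np.array([1.0, 0.0])
doc = np.array([[0.0, 2.0], [3.0, 3.0], [5.0, 0.0]])
print(max_sentence_similarity(query, doc))
```

The same idea extends to the whole corpus by stacking all sentence vectors into one matrix and keeping track of which rows belong to which document.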

In [None]:
def find_similarities_whole_text_level(search_query, search_target=emails_df):
  #get similarity between the search query and each whole-document embedding
  query_vec = np.array(search_query).reshape(1, -1)
  similarity_scores = np.zeros(len(search_target['whole_text_embeddings']))
  for doc_num, document in enumerate(search_target['whole_text_embeddings']):
    #cosine_similarity returns a 1x1 array; take the scalar
    similarity_scores[doc_num] = cosine_similarity(query_vec, np.array(document).reshape(1, -1))[0, 0]
  return similarity_scores

In [None]:
def get_top_n(scores, n=5):
  #indices of the n highest scores, best first
  top_n = np.argsort(scores)[::-1][:n]
  return list(top_n)
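
A full `argsort` orders every score even though only a handful are kept. For large score arrays, a partial sort may be cheaper. The following is a sketch of that alternative; `get_top_n_partial` is a hypothetical name, and whether it is measurably faster on this dataset has not been tested here.

```python
import numpy as np

def get_top_n_partial(scores, n=5):
    """Indices of the n largest scores, best first, without a full sort."""
    scores = np.asarray(scores)
    n = min(n, len(scores))
    if n == 0:
        return []
    # argpartition places the n largest values in the last n slots (unordered).
    candidates = np.argpartition(scores, -n)[-n:]
    # Sort only those n candidates, highest score first.
    ordered = candidates[np.argsort(scores[candidates])[::-1]]
    return ordered.tolist()

print(get_top_n_partial([0.1, 0.9, 0.5, 0.7], n=2))
```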

In [None]:
def search_engine(query_data, encoding_level = find_similarities_sentence_level):
  #query_data: DataFrame whose first column holds the query strings
  results = []

  for query in query_data.iloc[:, 0]:
    query_embeddings = embed_keywords(query)

    #get the similarity scores between the query and every document
    scores = encoding_level(query_embeddings)
    #keep only the documents with the highest relevance scores
    top_n = get_top_n(scores, n=10)
    #result_df = emails_all_df[['Message-ID', 'From', 'To', 'Subject', 'content']].iloc[top_n]
    #copy the slice so that adding columns does not trigger SettingWithCopyWarning
    result_df = emails_df[['content']].iloc[top_n].copy()

    if len(result_df) > 0:
      result_df['query'] = query
      result_df['Relevance Score'] = scores[top_n]
      results.append(result_df.values.tolist())

  flat = [x for sublist in results for x in sublist]
  #column_names = ['Message-ID', 'From', 'To', 'Subject', 'content', 'query', 'Relevance Score']
  column_names = ['content', 'query', 'Relevance Score']
  query_matches = pd.DataFrame(data = flat, columns = column_names)

  return query_matches

In [None]:
data = {'query_text': ['accounting practices']}
sample_query_df = pd.DataFrame(data=data, columns=['query_text'])
sample_query_df

Unnamed: 0,query_text
0,accounting practices


In [None]:
res = search_engine(sample_query_df, encoding_level = find_similarities_sentence_level)

In [None]:
display(res)

Unnamed: 0,content,query,Relevance Score
0,"Gareth: Per your voice mail. Marcus thought this was closing next week. I \nwill be out on Monday and Tuesday and will have to find a backup. Let me \nknow if you have further info. Thanks. Sara\n----- Forwarded by Sara Shackleton/HOU/ECT on 11/14/2000 08:37 AM -----\n\n\tTreasa Kirby\n\t11/14/2000 08:15 AM\n\t\t \n\t\t To: Sara Shackleton/HOU/ECT@ECT\n\t\t cc: Brenda L Funk/HOU/ECT@ECT, Marcus Von Bock Und Polach/LON/ECT@ECT\n\t\t Subject: TD crude prepay\n\nSara,\nIn response to your questions below:\n\n1. Rod Nelson will be handling credit as the deal is now being executed by \nENA.\n\n2. Tax is being handled by Steve Jacobson in the London office ( he is our US \ntax person )\n\n3. Accounting is Matt Landy\n\n4. The confirmations are being handled by Kim Theriot and John Wilson is the \ntrader who will be signing on behalf of ENA. We have a call with them today \nat 3.00pm UK time so if there is any change I will let you know.\n\n5. We are still waiting for the term sheet / confirms from TD and are \nexpecting them this afternoon. I will forward them to you asap.\n\n6. TD have proposed using the 98 ISDA which I believe is already in place \nwith ENA and TD Texas.\n\n7. 
This is just one transaction with 3 individual wti swaps, two of which are \nwith Enron, Enron/TD , TD/Morgan Stanley, Morgan Stanley/ Enron\n\nI will send you the transaction diagram from TD and the ISDA agreement.\n\nIf there is anything else you need please feel free to call me on 0207 783 \n5404\n\nRegards\n\n\n\n---------------------- Forwarded by Treasa Kirby/LON/ECT on 14/11/2000 13:35 \n---------------------------\n\n\nMarcus Von Bock Und Polach\n14/11/2000 11:36\nTo: Treasa Kirby/LON/ECT@ECT\ncc: \n\nSubject: TD crude prepay\n\nas discussed\n---------------------- Forwarded by Marcus Von Bock Und Polach/LON/ECT on \n14/11/2000 11:40 ---------------------------\nFrom: Sara Shackleton on 13/11/2000 11:54 CST\nTo: Marcus Von Bock Und Polach/LON/ECT@ECT\ncc: Gareth Bahlmann/HOU/ECT@ECT \n\nSubject: TD crude prepay\n\nMarcus: Can you please let me know what you need.\nWho is handling credit? tax? accounting?\nWho is booking the confirmations? Have the confirms been drafted or \nnegotiated? Any changes? How many transactions?\n\nSara\n\n\n\n",accounting practices,0.79362
1,"----- Forwarded by Sara Shackleton/HOU/ECT on 12/11/2000 09:46 AM -----\n\n\tTreasa Kirby\n\t11/14/2000 08:15 AM\n\t\t \n\t\t To: Sara Shackleton/HOU/ECT@ECT\n\t\t cc: Brenda L Funk/HOU/ECT@ECT, Marcus Von Bock Und Polach/LON/ECT@ECT\n\t\t Subject: TD crude prepay\n\nSara,\nIn response to your questions below:\n\n1. Rod Nelson will be handling credit as the deal is now being executed by \nENA.\n\n2. Tax is being handled by Steve Jacobson in the London office ( he is our US \ntax person )\n\n3. Accounting is Matt Landy\n\n4. The confirmations are being handled by Kim Theriot and John Wilson is the \ntrader who will be signing on behalf of ENA. We have a call with them today \nat 3.00pm UK time so if there is any change I will let you know.\n\n5. We are still waiting for the term sheet / confirms from TD and are \nexpecting them this afternoon. I will forward them to you asap.\n\n6. TD have proposed using the 98 ISDA which I believe is already in place \nwith ENA and TD Texas.\n\n7. This is just one transaction with 3 individual wti swaps, two of which are \nwith Enron, Enron/TD , TD/Morgan Stanley, Morgan Stanley/ Enron\n\nI will send you the transaction diagram from TD and the ISDA agreement.\n\nIf there is anything else you need please feel free to call me on 0207 783 \n5404\n\nRegards\n\n\n\n---------------------- Forwarded by Treasa Kirby/LON/ECT on 14/11/2000 13:35 \n---------------------------\n\n\nMarcus Von Bock Und Polach\n14/11/2000 11:36\nTo: Treasa Kirby/LON/ECT@ECT\ncc: \n\nSubject: TD crude prepay\n\nas discussed\n---------------------- Forwarded by Marcus Von Bock Und Polach/LON/ECT on \n14/11/2000 11:40 ---------------------------\nFrom: Sara Shackleton on 13/11/2000 11:54 CST\nTo: Marcus Von Bock Und Polach/LON/ECT@ECT\ncc: Gareth Bahlmann/HOU/ECT@ECT \n\nSubject: TD crude prepay\n\nMarcus: Can you please let me know what you need.\nWho is handling credit? tax? accounting?\nWho is booking the confirmations? 
Have the confirms been drafted or \nnegotiated? Any changes? How many transactions?\n\nSara\n\n\n\n",accounting practices,0.79362
2,"Questions:\n\nWhat does being short/long a position mean?\nWhat is curve shift?\nWhat affects curve shift?\nWhat are the basic components of a P&L?\nWhat is a swap?\nWhat are different types of swaps?\nWhat is basis?\nWhat is MTM value?\nWhat is Accrual value?\nWhat do you consider stress in the workplace? How do you handle stress? \nExamples of a stressful situation?\nWhat do you consider long work hours?\n\nSome aren't technical. I don't have any problem solving questions.\n\nDG\n---------------------- Forwarded by Darron C Giron/HOU/ECT on 12/27/2000 \n11:15 AM ---------------------------\n \n\t\n\t\n\tFrom: William Kelly 12/27/2000 11:00 AM\n\t\n\nTo: Jeffrey C Gossett/HOU/ECT@ECT\ncc: David Baumbach/HOU/ECT@ECT, Darron C Giron/HOU/ECT@ECT, Kam \nKeiser/HOU/ECT@ECT, Errol McLaughlin/Corp/Enron@ENRON \nSubject: Technical Questions\n\nI also do not have many tech questions. I look for trainability as well. \n\nMy questions:\n-What do you know about the commodities market?\n-Do you do any online trading, if so what, and why - looking for trading of \noptions, stocks, futs, etc for more difficult follow-up questions like\n why options, not buy/sell the underlying. How do you determine what to \ntrade and when, how long do you hold it. Also looking for how often they \nplay on the internet and whether they do it at work. \n-What do you know about the industry? How many units in a contract? How do \nyou hedge physical gas? Financial positions? \n-What is V@R? Not looking for a technical answer, just if they are familiar \nwith the concept? What other trading controls do they expect to be in place?\n-How well do you handle ridicule? (Always looking for a certain central \ntrader's new lackey)\n-How well versed are you in Excel? What have you done where you are using \nexcel to improve your work product? How did you learn it? Have you shared \nit with anyone else? What other improvements have you made? 
Would you give \nme a name of a person I could verify the impact this has had in your current \norganization? (I don't ever call, I just am looking to see if there answer \nis the truth and what kind of ego they have) \n-Further educational goals? MBA, CFA, - looking for goals and motivators.\n-Are you easily incited to violence?\n\nWK\n\n\n---------------------- Forwarded by William Kelly/HOU/ECT on 12/27/2000 10:42 \nAM ---------------------------\n \n\tEnron Capital Management\n\t\n\tFrom: David Baumbach 12/27/2000 09:49 AM\n\t\n\nTo: Jeffrey C Gossett/HOU/ECT@ECT\ncc: Kam Keiser/HOU/ECT@ECT, William Kelly/HOU/ECT@ECT, Darron C \nGiron/HOU/ECT@ECT, Errol McLaughlin/Corp/Enron@ENRON \nSubject: Technical Questions\n\nI do not have very many technical questions but here a the few that I have \nused:\n\n- Do you understand what curveshift is? If so ... Price today was $10.20 and \nyesterday it was $10.13 ... what was my curveshift?\n- What makes up a P&L statement?\n- What is MTM accounting? accrual accounting?\n- If they have knowledge of options, what is a delta position?\n\nI guess these are more general analytical questions:\n\n- Give me an example of an problem you've had and how you solved it?\n- The ""eight ball"" question (basic problem solving)\n- The ""light switch"" question (advanced problem solving)\n\nWhen I interview I am looking for trainability, better than average problem \nsolving skills and a desire to work in this environment. If the candidate \nhas these skills then I believe we can teach them what they need to know. I \nalso think it is candidates with these basic attributes that are a better fit \nfor us. They know they don't know everything (about Risk) and work hard to \nget up to speed. \n\nDave\n\n\n\n",accounting practices,0.718619
3,Accounting Magic. Am I good or what?,accounting practices,0.703172
4,"Dance partner?\n\n -----Original Message-----\nFrom: \tGaffney, Chris \nSent:\tFriday, December 14, 2001 2:54 PM\nTo:\tDorland, Chris\nSubject:\tRE: Critical Applications, Models and Data\n\nPerfect. Was your brother disappointed that we had to let his dance partner go earlier this week?\n\n -----Original Message-----\nFrom: \tDorland, Chris \nSent:\tFriday, December 14, 2001 2:20 PM\nTo:\tGaffney, Chris\nSubject:\tRE: Critical Applications, Models and Data\n\nAll the trading and marketing stuff is in I/Newco/Trading and I/Newco/Marketing and the master description file in I/Newco has descriptions of their contents.\n\nChris\n\n -----Original Message-----\nFrom: \tGaffney, Chris \nSent:\tFriday, December 14, 2001 11:58 AM\nTo:\tDorland, Chris; Hedstrom, Peggy; Scott, Laura; Zufferli, John; Le Dain, Eric; Davies, Derek; Keohane, Peter; Milnthorp, Rob; Devries, Paul; Gillis, Brian\nCc:\tBrodeur, Stephane; Clark, Chad; Richey, Cooper; Taylor, Michael J.; Taylor, Fabian; Draper, Lon; Cooke, Ian; Torres, Carlos; Lambie, Chris; Cowan, Mike; Watt, Ryan; Drozdiak, Dean; Hrap, Gerry; Oh, Grant; Laporte, Nicole; Dunsmore, Paul; Macphee, Mike; Biever, Jason; Otto, Randy; Johnston, Greg; Cappelletto, Nella; Dorland, Dan; Wilson, Jan; Rondeau, Clayton; Martin, Brad\nSubject:\tRE: Critical Applications, Models and Data\n\nChris - In order that we are able to set up the transfer of files in an orderly fashion and, more importantly, maintain security protections for certain directories (e.g. legal) I believe a better approach is to replicate our existing directory tree structure under a new I:/Newco/ directory. At this level we can then break things out into the required subfolders (e.g. accounting, trading, legal, ...). \n\nIn this regard we have set up the I:/Newco/ directory and started to create subfolders. I ask everyone who has put information in the I:/trading/newco directory to move it to I:/Newco/ under their applicable subdirectory (which may need to be created). 
I apologize that this is a hassle for those who have already transfered files.\n\nIf you have any concerns please let me know.\n\nRegards\nChris\n\n -----Original Message-----\nFrom: \tDorland, Chris \nSent:\tThursday, December 13, 2001 10:58 AM\nTo:\tHedstrom, Peggy; Scott, Laura; Zufferli, John; Le Dain, Eric; Davies, Derek; Keohane, Peter; Milnthorp, Rob; Devries, Paul; Gillis, Brian\nCc:\tBrodeur, Stephane; Clark, Chad; Richey, Cooper; Taylor, Michael J.; Taylor, Fabian; Draper, Lon; Cooke, Ian; Torres, Carlos; Lambie, Chris; Cowan, Mike; Watt, Ryan; Drozdiak, Dean; Hrap, Gerry; Oh, Grant; Laporte, Nicole; Dunsmore, Paul; Macphee, Mike; Biever, Jason; Otto, Randy; Johnston, Greg; Gaffney, Chris; Cappelletto, Nella; Dorland, Dan; Wilson, Jan; Rondeau, Clayton; Martin, Brad\nSubject:\tCritical Applications, Models and Data\n\nHi everyone!\n\nOnce again I find myself being tasked by Geof Storey. Houston has requested that we make lists of all applications that we use and also that we consolidate all of the files (models and data) that are critical to our business (I don't think MP3's count) into one directory on the network. This is so that these items can be included in the proposed deal with newco. The Calgary Trading group has set up a directory on I:/trading/newco and I think if each group did the same that would be great. Each group should probably have one person in charge of the directory so there isn't any overlap. Call me with any ?'s/\n\nThanx \n\nChris",accounting practices,0.642239
5,Hi!\n\n These sales to Torch are valid. Accounting is going to \ncorrect. If you do not see this happen soon - please let me know.\n\n \n \n Thank You!,accounting practices,0.596024
6,"The following position has become available in Volume Management reporting to \nChris Stokley. There are 2 positions available.\n\nEssential Functions:\n\nResponsible for preparing and reconciling estimated and actual service costs \nrelated to doing business in the California market with the ISO (Independent \nSystem Operator). Responsibilities will increase as other markets open.\nResponsible for ensuring all disputes between EPMI and the ISO are \ndocumented, submitted and brought to resolution prior to the close of the ISO \ndispute period.\nResponsible for explaining variance between EPMI estimated costs vs. ISO \nactualized costs.\nResponsible for performing a control function to ensure internal systems are \nkept in sync.\nAnticipate and respond to customer (internal and external) requests, \ninquiries, and investigate and resolve issues in a timely basis.\nIdentify and recommend opportunities to re-engineer processes and procedures.\nProvide a liaison function between Logistics (Scheduling), Power Settlements \nand Accounting.\n\nJob Requirements:\n\nCollege degree or equivalent work experience.\nStrong accounting skills.\nShould possess strong technical knowledge of the Power Industry.\nMust possess strong analytical skills.\nHeavy attention to detail and strong organizational capabilities.\nMust possess excellent oral, written, and interpersonal skills.\nPC proficiency, including Microsoft Work, Advanced Excel, and Access.\n\nSpecial Job Characteristics:\n\nMust be highly motivated. Self-starter with ability to recognize and solve \nproblems.\nOvertime may be required.\nSome travel may be required.\n\nChris/Murray will be holding a brown bag lunch on Tuesday, March 6, 2001 in \nMt. Hood to discuss this opportunity. Interested applicants should advise \nAmy FitzPatrick no later than close of business on Thursday, March 8, 2001.\n\n\n",accounting practices,0.588061
7,"REMINDER - INFO SESSION TODAY AT 11:30 AM in MT HOOD.\n---------------------- Forwarded by Amy FitzPatrick/PDX/ECT on 03/06/2001 \n12:11 PM ---------------------------\n\n\nAmy FitzPatrick\n03/01/2001 02:03 PM\nTo: Portland West Desk\ncc: \nSubject: Volume Management Opening\n\nThe following position has become available in Volume Management reporting to \nChris Stokley. There are 2 positions available.\n\nEssential Functions:\n\nResponsible for preparing and reconciling estimated and actual service costs \nrelated to doing business in the California market with the ISO (Independent \nSystem Operator). Responsibilities will increase as other markets open.\nResponsible for ensuring all disputes between EPMI and the ISO are \ndocumented, submitted and brought to resolution prior to the close of the ISO \ndispute period.\nResponsible for explaining variance between EPMI estimated costs vs. ISO \nactualized costs.\nResponsible for performing a control function to ensure internal systems are \nkept in sync.\nAnticipate and respond to customer (internal and external) requests, \ninquiries, and investigate and resolve issues in a timely basis.\nIdentify and recommend opportunities to re-engineer processes and procedures.\nProvide a liaison function between Logistics (Scheduling), Power Settlements \nand Accounting.\n\nJob Requirements:\n\nCollege degree or equivalent work experience.\nStrong accounting skills.\nShould possess strong technical knowledge of the Power Industry.\nMust possess strong analytical skills.\nHeavy attention to detail and strong organizational capabilities.\nMust possess excellent oral, written, and interpersonal skills.\nPC proficiency, including Microsoft Work, Advanced Excel, and Access.\n\nSpecial Job Characteristics:\n\nMust be highly motivated. 
Self-starter with ability to recognize and solve \nproblems.\nOvertime may be required.\nSome travel may be required.\n\nChris/Murray will be holding a brown bag lunch on Tuesday, March 6, 2001 at \n11:30 am in Mt. Hood to discuss this opportunity. Interested applicants \nshould advise Amy FitzPatrick no later than close of business on Thursday, \nMarch 8, 2001.\n\n\n\n\n",accounting practices,0.588061
8,"As stated in my earlier email, the Company is under an obligation to retain=\n, among other documents identified in the Bankruptcy Court's February 15, 2=\n002 Order, a copy of which you all have received by email, all documents th=\nat relate to a pending or threatened investigation or lawsuit. As I am sure=\n you are aware from reading the newspapers and watching television, because=\n of the number of investigations and lawsuits the universe of relevant docu=\nments in just this one category covered by the Order is very large. You mus=\nt retain all relevant company-related documents until these actual or threa=\ntened lawsuits and investigations are over. No one can then second-guess wh=\nether you destroyed a relevant document. However, many of you have requeste=\nd guidance on what must be retained in this category of documents. To that =\nend, attached are the various subject matters of subpoenas and document req=\nuests that we have received. While you should be sure to take the time to r=\neview the subpoenas and document requests themselves, some of the topics co=\nvered are the following:\n1. All special purpose entities (including, but not limited to, Whitewing,=\n Marlin, Atlantic, Osprey, Braveheart, Yosemite, MEGS, Margaux, Backbone, N=\nahanni, Moose, Fishtail, and Blackhawk)\n2. All LJM entities\n3. Chewco\n4. JEDI I and II\n5. The Raptor structures\n6. Related party transactions\n7. Portland General acquisition\n8. Elektro acquisition\n9. Cuiaba project\n10. Nowa Sarzyna project\n11. Dabhol project\n12. The Dynegy merger\n13. All accounting records\n14. All structured finance documents\n15. Audit records\n16. All records relating to purchases or sales of Enron stock\n17. All records relating to Enron stock options\n18. All records relating to the Enron Savings Plan, Cash Balance Plan, ESOP=\n, and any other employee benefit plans=20\n19. Communications with analysts\n20. Communications with investors\n21. 
Communications with credit rating agencies\n22. All documents relating to California\n23. All documents relating to Rio Piedras\n24. All documents relating to pipeline safety\n25. All corporate tax documents\n26. All structured finance documents\n27. ENA collateralized loan obligations\n28. All periodic reports to management (including, but not limited to, VAR =\nReports, Daily Position Reports, Capital Portfolio Statements, Merchant Por=\ntfolio Statements, and Earnings Flash Reports)\n29. All press releases and records of public statements\n30. All DASHs\n31. All policy manuals\n32. All records relating to political contributions\n33. All documents relating to or reflecting communications with the SEC, CF=\nTC, FERC, or DOL\n34. All documents relating to Enron's dark fiber optic cable=20\n35. Mariner\n36. Matrix\n37. ECT Securities\n38. Enron Online\n39. All documents relating to the Enron PAC\n40. All documents reflecting any communication with any federal agency, Con=\ngress, or the Executive Office of the President\n41. All documents relating to Enron Broadband\n42. Drafts and non-identical duplicates relating to any of the foregoing.\nThough lengthy, this list is not inclusive. Please review the subpoenas and=\n document requests for the precise topics covered. If you have any question=\ns please contact Bob Williams, the Company's Litigation Manager, at (713) 3=\n45-2402 or email him at Robert.C.Williams@enron.com.\nAs always, thank you for your patience during this challenging time.",accounting practices,0.566686
9,"As stated in my earlier email, the Company is under an obligation to retain=\n, among other documents identified in the Bankruptcy Court's February 15, 2=\n002 Order, a copy of which you all have received by email, all documents th=\nat relate to a pending or threatened investigation or lawsuit. As I am sure=\n you are aware from reading the newspapers and watching television, because=\n of the number of investigations and lawsuits the universe of relevant docu=\nments in just this one category covered by the Order is very large. You mus=\nt retain all relevant company-related documents until these actual or threa=\ntened lawsuits and investigations are over. No one can then second-guess wh=\nether you destroyed a relevant document. However, many of you have requeste=\nd guidance on what must be retained in this category of documents. To that =\nend, attached are the various subject matters of subpoenas and document req=\nuests that we have received. While you should be sure to take the time to r=\neview the subpoenas and document requests themselves, some of the topics co=\nvered are the following:\n1. All special purpose entities (including, but not limited to, Whitewing,=\n Marlin, Atlantic, Osprey, Braveheart, Yosemite, MEGS, Margaux, Backbone, N=\nahanni, Moose, Fishtail, and Blackhawk)\n2. All LJM entities\n3. Chewco\n4. JEDI I and II\n5. The Raptor structures\n6. Related party transactions\n7. Portland General acquisition\n8. Elektro acquisition\n9. Cuiaba project\n10. Nowa Sarzyna project\n11. Dabhol project\n12. The Dynegy merger\n13. All accounting records\n14. All structured finance documents\n15. Audit records\n16. All records relating to purchases or sales of Enron stock\n17. All records relating to Enron stock options\n18. All records relating to the Enron Savings Plan, Cash Balance Plan, ESOP=\n, and any other employee benefit plans=20\n19. Communications with analysts\n20. Communications with investors\n21. 
Communications with credit rating agencies\n22. All documents relating to California\n23. All documents relating to Rio Piedras\n24. All documents relating to pipeline safety\n25. All corporate tax documents\n26. All structured finance documents\n27. ENA collateralized loan obligations\n28. All periodic reports to management (including, but not limited to, VAR =\nReports, Daily Position Reports, Capital Portfolio Statements, Merchant Por=\ntfolio Statements, and Earnings Flash Reports)\n29. All press releases and records of public statements\n30. All DASHs\n31. All policy manuals\n32. All records relating to political contributions\n33. All documents relating to or reflecting communications with the SEC, CF=\nTC, FERC, or DOL\n34. All documents relating to Enron's dark fiber optic cable=20\n35. Mariner\n36. Matrix\n37. ECT Securities\n38. Enron Online\n39. All documents relating to the Enron PAC\n40. All documents reflecting any communication with any federal agency, Con=\ngress, or the Executive Office of the President\n41. All documents relating to Enron Broadband\n42. Drafts and non-identical duplicates relating to any of the foregoing.\nThough lengthy, this list is not inclusive. Please review the subpoenas and=\n document requests for the precise topics covered. If you have any question=\ns please contact Bob Williams, the Company's Litigation Manager, at (713) 3=\n45-2402 or email him at Robert.C.Williams@enron.com.\nAs always, thank you for your patience during this challenging time.",accounting practices,0.566686


In [None]:
# Now let's see what results we get when the search is based on whole-text embeddings
res2 = search_engine(sample_query_df, encoding_level = find_similarities_whole_text_level)

In [None]:
display(res2)

Unnamed: 0,content,query,Relevance Score
0,Accounting Magic. Am I good or what?,accounting practices,0.461578
1,\nELCON commercial practices group[,accounting practices,0.419706
2,"Here is the point person for Devon Accounting questions/information:\n\nGary Wade\nDevon Energy Corporation\nRevenue Accounting Supervisor\ntel 405-552-4721\nwadeg@dvn.com\n20 North Broadway, Suite 1500\nOklahoma City, OK 73102-8260\nFax 405-552-4550",accounting practices,0.412697
3,"Shirley,\n\n2 expense checks:\n\n\n$8,937.30\n$4,971.30\n\n\nVince\n",accounting practices,0.388641
4,"\tKarl,\n\tSorry for the delay in responding. I have spoken to Jim Saunders on the Accounting allocation. He is sending 3% of his time for the accounting information flow to and from Co. 1195 as well as the investment accounting, and time spent on the reorganization accounting matters. The remaining dollars of the accounting allocation are from Rod Hayslett for the time he is spending on Co. 1195 issues. I hope this gives you the information you need.\n\tMary\n\n -----Original Message-----\nFrom: \tGeaccone, Tracy \nSent:\tFriday, October 19, 2001 9:16 AM\nTo:\tBotello, Mary\nSubject:\tFW: Allocations\n\n\n\n -----Original Message-----\nFrom: \tJackson, Karl \nSent:\tThursday, October 18, 2001 6:16 PM\nTo:\tGeaccone, Tracy\nSubject:\tAllocations\n\nI'm getting back to something, I left you a message last week. When Barney presented the ETS budget for the pipeline operations, he included amounts allocated from ETS for IT ($170k) and Accounting ($59k). You left me a phone mail message regarding the IT charge and suggested that I contact Caroline Barnes (which I did today). Both of these appear to be new charges for 2002 and we'd like to get information on what they relate to. Can you shed some light on the Accounting charge?\n\nThanks\nKarl",accounting practices,0.385297
5,"Tracy - \nAs discussed.\nSteve\n\n\n\n \n\nRob Brown\nManager, Enron Corp.\nFinancial Accounting & Reporting\nOff. 713.853.9702\nCell 713.303.4497\nrob.brown@enron.com",accounting practices,0.381349
6,"Good afternoon Sandeep:\n\nI sent Krishna's expenses for his recent trip to our accounting dept. this \nmorning and expensed them to the Research Group's Co and RC#. \nHowever, we need to be reimbursed for these expenses and I would like \nto know the best way to do this. Our accounting dept. thought we should \njust send an invoice with the back up material to Dabhol Power Co. asking \nthem to cut a check made payable to Enron Corp. and then it would be \ncredited to our Co# and RC#.\n\nPlease let me know if this is acceptable, and if so, who the invoice should\nbe made out to and where I should send it.\n\nI appreciate your assistance in this matter.\n\nHave a great day!\n\nShirley Crenshaw\n713-853-5290\n",accounting practices,0.375931
7,"Good afternoon Sandeep:\n\nI sent Krishna's expenses for his recent trip to our accounting dept. this \nmorning and expensed them to the Research Group's Co and RC#. \nHowever, we need to be reimbursed for these expenses and I would like \nto know the best way to do this. Our accounting dept. thought we should \njust send an invoice with the back up material to Dabhol Power Co. asking \nthem to cut a check made payable to Enron Corp. and then it would be \ncredited to our Co# and RC#.\n\nPlease let me know if this is acceptable, and if so, who the invoice should\nbe made out to and where I should send it.\n\nI appreciate your assistance in this matter.\n\nHave a great day!\n\nShirley Crenshaw\n713-853-5290\n",accounting practices,0.375931
8,"Greg,\n\nI work in the accounting group for TechQuest Capital and was wondering if\nyou had a contact for me in your accounting department. We haven't received\npayment for our invoice and I wanted to inquiry about its status.\n\nThank you for your help.\n\nDebi Estey",accounting practices,0.375235
9,"Please get together and make sure that the deferred taxes are moved to the correct entities, reflecting where the reserves are.\n\nPatty\n\n -----Original Message-----\nFrom: \tHunter, Todd \nSent:\tFriday, October 12, 2001 5:41 PM\nTo:\tLee, Patricia A.\nCc:\tRay, Sara; Richards, Todd; Seelig, Sally; Husser, Shanna; Fischer, Mary\nSubject:\tRE: Sales Tax Payable accrual\n\nPatty, The $20 MM Sales Tax Accrual was originally recorded to 986, acct 30113000. We transferred $5 MM in January 2001 for the PX Credit Reserve. This month (Sept 2001) we moved $10 MM from this acct to the Bonus Accrual on 985. I don't know the status of the remaining $4.5 MM left in the Taxes Payable -Other acct.\n\nTodd\n \n\n\nFrom:\tPatricia A Lee/ENRON@enronXgate on 10/10/2001 06:49 PM\nTo:\tSara Ray/HOU/EES@EES, Todd Hunter/HOU/EES@EES\ncc:\tTodd Richards/ENRON@enronXgate, Sally Seelig/HOU/EES@EES, Shanna Husser/ENRON@enronXgate, Mary Fischer/ENRON@enronXgate \nSubject:\tRE: Sales Tax Payable accrual\n\nI think we have found the account - but it looks like this has been transferred to 985 from 986. Can you provide us with the entries for both companies and let me know when you plan on taking this to the income statement - if you plan on doing this?\n\nThanks,\n\nPatty\n\n -----Original Message-----\nFrom: \tRay, Sara \nSent:\tWednesday, October 10, 2001 6:08 PM\nTo:\tHunter, Todd\nCc:\tLee, Patricia A.\nSubject:\tSales Tax Payable accrual\n\nTodd,\nCan you please provide Patty with the company and account number where the yearend sales tax accrual is recorded. They are going to start making payments against the accrual. Is the balance still at the original $15MM?\n\n",accounting practices,0.353929


## Results
As expected, searching on sentence-level embeddings tends to return longer texts, while searching on whole-text embeddings tends to return shorter ones.
In both cases we seem to get some relevant results. One interesting result from the whole-text search is this one: "Shirley,\n\n2 expense checks:\n\n\n$8,937.30\n$4,971.30\n\n\nVince\n"
It was returned even though it never mentions 'accounting' per se; that is the semantic boost provided by the Universal Sentence Encoder.
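
The length effect described above can be illustrated with a small sketch. It does not use the Universal Sentence Encoder: the vectors are hand-made toy embeddings, and the two scoring functions (`sentence_level_score`, `whole_text_score`) are hypothetical stand-ins, not the notebook's `find_similarities_sentence_level` or `find_similarities_whole_text_level`. It also assumes that a whole-text embedding behaves roughly like the average of its sentence embeddings, which is a simplification.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings" (hand-made, NOT real encoder output).
query = np.array([1.0, 0.0, 0.0])

# A short email: a single on-topic sentence.
short_email = [np.array([0.9, 0.1, 0.0])]

# A long email: the same on-topic sentence plus three off-topic ones.
long_email = [
    np.array([0.9, 0.1, 0.0]),
    np.array([0.0, 1.0, 0.0]),
    np.array([0.0, 0.0, 1.0]),
    np.array([0.0, 1.0, 1.0]),
]

def sentence_level_score(query_vec, sentence_vecs):
    """Hypothetical sentence-level scoring: keep the best-matching sentence."""
    return max(cosine(query_vec, s) for s in sentence_vecs)

def whole_text_score(query_vec, sentence_vecs):
    """Hypothetical whole-text scoring.

    Assumption: the whole-text embedding is approximated by the mean
    of the sentence vectors, so off-topic sentences pull it away.
    """
    return cosine(query_vec, np.mean(sentence_vecs, axis=0))

short_sentence = sentence_level_score(query, short_email)
long_sentence = sentence_level_score(query, long_email)
short_whole = whole_text_score(query, short_email)
long_whole = whole_text_score(query, long_email)

print(f"sentence level: short={short_sentence:.3f} long={long_sentence:.3f}")
print(f"whole text:     short={short_whole:.3f} long={long_whole:.3f}")
```

Under these assumptions the long email loses nothing at sentence level, because only its best sentence counts, whereas its whole-text score drops as the off-topic sentences dilute the averaged vector. That is consistent with the whole-text search favouring shorter emails, although the real encoder may behave differently.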

In [None]:
#Let's try another search
data = {'query_text': ['upcoming audit']}
sample_query_df = pd.DataFrame(data=data, columns=['query_text'])
sample_query_df

Unnamed: 0,query_text
0,upcoming audit


In [None]:
display(search_engine(sample_query_df, encoding_level = find_similarities_sentence_level))

Unnamed: 0,content,query,Relevance Score
0,"As stated in my earlier email, the Company is under an obligation to retain=\n, among other documents identified in the Bankruptcy Court's February 15, 2=\n002 Order, a copy of which you all have received by email, all documents th=\nat relate to a pending or threatened investigation or lawsuit. As I am sure=\n you are aware from reading the newspapers and watching television, because=\n of the number of investigations and lawsuits the universe of relevant docu=\nments in just this one category covered by the Order is very large. You mus=\nt retain all relevant company-related documents until these actual or threa=\ntened lawsuits and investigations are over. No one can then second-guess wh=\nether you destroyed a relevant document. However, many of you have requeste=\nd guidance on what must be retained in this category of documents. To that =\nend, attached are the various subject matters of subpoenas and document req=\nuests that we have received. While you should be sure to take the time to r=\neview the subpoenas and document requests themselves, some of the topics co=\nvered are the following:\n1. All special purpose entities (including, but not limited to, Whitewing,=\n Marlin, Atlantic, Osprey, Braveheart, Yosemite, MEGS, Margaux, Backbone, N=\nahanni, Moose, Fishtail, and Blackhawk)\n2. All LJM entities\n3. Chewco\n4. JEDI I and II\n5. The Raptor structures\n6. Related party transactions\n7. Portland General acquisition\n8. Elektro acquisition\n9. Cuiaba project\n10. Nowa Sarzyna project\n11. Dabhol project\n12. The Dynegy merger\n13. All accounting records\n14. All structured finance documents\n15. Audit records\n16. All records relating to purchases or sales of Enron stock\n17. All records relating to Enron stock options\n18. All records relating to the Enron Savings Plan, Cash Balance Plan, ESOP=\n, and any other employee benefit plans=20\n19. Communications with analysts\n20. Communications with investors\n21. 
Communications with credit rating agencies\n22. All documents relating to California\n23. All documents relating to Rio Piedras\n24. All documents relating to pipeline safety\n25. All corporate tax documents\n26. All structured finance documents\n27. ENA collateralized loan obligations\n28. All periodic reports to management (including, but not limited to, VAR =\nReports, Daily Position Reports, Capital Portfolio Statements, Merchant Por=\ntfolio Statements, and Earnings Flash Reports)\n29. All press releases and records of public statements\n30. All DASHs\n31. All policy manuals\n32. All records relating to political contributions\n33. All documents relating to or reflecting communications with the SEC, CF=\nTC, FERC, or DOL\n34. All documents relating to Enron's dark fiber optic cable=20\n35. Mariner\n36. Matrix\n37. ECT Securities\n38. Enron Online\n39. All documents relating to the Enron PAC\n40. All documents reflecting any communication with any federal agency, Con=\ngress, or the Executive Office of the President\n41. All documents relating to Enron Broadband\n42. Drafts and non-identical duplicates relating to any of the foregoing.\nThough lengthy, this list is not inclusive. Please review the subpoenas and=\n document requests for the precise topics covered. If you have any question=\ns please contact Bob Williams, the Company's Litigation Manager, at (713) 3=\n45-2402 or email him at Robert.C.Williams@enron.com.\nAs always, thank you for your patience during this challenging time.",upcoming audit,0.703599
1,"As stated in my earlier email, the Company is under an obligation to retain=\n, among other documents identified in the Bankruptcy Court's February 15, 2=\n002 Order, a copy of which you all have received by email, all documents th=\nat relate to a pending or threatened investigation or lawsuit. As I am sure=\n you are aware from reading the newspapers and watching television, because=\n of the number of investigations and lawsuits the universe of relevant docu=\nments in just this one category covered by the Order is very large. You mus=\nt retain all relevant company-related documents until these actual or threa=\ntened lawsuits and investigations are over. No one can then second-guess wh=\nether you destroyed a relevant document. However, many of you have requeste=\nd guidance on what must be retained in this category of documents. To that =\nend, attached are the various subject matters of subpoenas and document req=\nuests that we have received. While you should be sure to take the time to r=\neview the subpoenas and document requests themselves, some of the topics co=\nvered are the following:\n1. All special purpose entities (including, but not limited to, Whitewing,=\n Marlin, Atlantic, Osprey, Braveheart, Yosemite, MEGS, Margaux, Backbone, N=\nahanni, Moose, Fishtail, and Blackhawk)\n2. All LJM entities\n3. Chewco\n4. JEDI I and II\n5. The Raptor structures\n6. Related party transactions\n7. Portland General acquisition\n8. Elektro acquisition\n9. Cuiaba project\n10. Nowa Sarzyna project\n11. Dabhol project\n12. The Dynegy merger\n13. All accounting records\n14. All structured finance documents\n15. Audit records\n16. All records relating to purchases or sales of Enron stock\n17. All records relating to Enron stock options\n18. All records relating to the Enron Savings Plan, Cash Balance Plan, ESOP=\n, and any other employee benefit plans=20\n19. Communications with analysts\n20. Communications with investors\n21. 
Communications with credit rating agencies\n22. All documents relating to California\n23. All documents relating to Rio Piedras\n24. All documents relating to pipeline safety\n25. All corporate tax documents\n26. All structured finance documents\n27. ENA collateralized loan obligations\n28. All periodic reports to management (including, but not limited to, VAR =\nReports, Daily Position Reports, Capital Portfolio Statements, Merchant Por=\ntfolio Statements, and Earnings Flash Reports)\n29. All press releases and records of public statements\n30. All DASHs\n31. All policy manuals\n32. All records relating to political contributions\n33. All documents relating to or reflecting communications with the SEC, CF=\nTC, FERC, or DOL\n34. All documents relating to Enron's dark fiber optic cable=20\n35. Mariner\n36. Matrix\n37. ECT Securities\n38. Enron Online\n39. All documents relating to the Enron PAC\n40. All documents reflecting any communication with any federal agency, Con=\ngress, or the Executive Office of the President\n41. All documents relating to Enron Broadband\n42. Drafts and non-identical duplicates relating to any of the foregoing.\nThough lengthy, this list is not inclusive. Please review the subpoenas and=\n document requests for the precise topics covered. If you have any question=\ns please contact Bob Williams, the Company's Litigation Manager, at (713) 3=\n45-2402 or email him at Robert.C.Williams@enron.com.\nAs always, thank you for your patience during this challenging time.",upcoming audit,0.703599
2,"Here are some bullets briefly describing the FERC audit high points. I think the scope is ""right on the money"" and the timing is reasonably good also. The text of the audit solicitation follows the bullets. I've made contact with Andy Sakallaris, head of FERC procurement, to attend the pre-bid conference next Tuesday (assuming it is held, since expressions of interest thus far have been from CA based firms unwilling to make the trip to DC). Ray\n\nAudit description- FERC seeks to take all appropriate steps for prospective improvements in California markets; accordingly, an independent operational audit of the CAISO will be performed to determine any areas in which the CAISO could enhance its effectiveness in fulfilling its responsibilities in operating the transmission system under its control and administering real-time energy markets.\n\nScope- Provide a comparison of the CAISO's actual operations and its FERC tariff and determine whether the CAISO's operations are adequately transparent to market participants.\n\nAudit period- 10/1/00-10/1/01\n\nDeliverables timing- Draft report due on January 9, 2002; final report due January 16, 2002.\n\nAudit schedule- Begins on November 1, 2001, and ends no later than January 4, 2002.\n\nPre-proposal conference- A conference is scheduled for prospective offerors at FERC next Tuesday; FERC will give a brief explanation of the requirements and clarify any questions asked. \n\n\n\nPART: U.S. GOVERNMENT PROCUREMENTS\nSUBPART: SERVICES\nCLASSCOD: R--Professional, Administrative and Management Support\n Services\nOFFADD: Federal Energy Regulatory Commission (FERC), 888 First\n Street, NE, Washington, DC 20426\nSUBJECT: R--OPERATIONAL AUDIT OF CALIFORNIA INDEPENDENT SYSTEM\n OPERATIOR. INC. 
CAISO\nSOL FERC02RMT22071\nDUE 102601\nPOC Charlotte Handley, 202-219-1156 or Andrew Sakallaris,202-219-1150\nDESC: This is a combined synopsis/solicitation for commercial\n items prepared in accordance with the format in Subpart 12.6,\n as supplemented with additional information included in this\n notice. This announcement constitutes the only solicitation;\n proposals are being requested and a written solicitation will\n not be issued. FERC02RMT22071 is issued as a request for proposal\n and all provisions/clauses are those in effect through FAC\n 97-27. This solicitation is unrestricted under NAICS number\n 561. Statement of Work: I. BACKGROUND. The California electricity\n market has experienced great stress from Summer 2000 to Summer\n 2001, with demand outstripping supply, wholesale prices rising\n dramatically, and key market participants becoming financially\n unstable. As a result, the California Independent System Operator,\n Inc. (CAISO), which operates the electricity transmission grid\n throughout most of California, was forced to adapt to rapidly\n changing circumstances. During all of this, the CAISO worked\n hard to ensure system reliability. Under the Federal Energy\n Regulatory Commission's market mitigation plan, wholesale prices\n stabilized in California in Summer 2001. The Commission seeks\n to take all appropriate steps for prospective improvements\n in California markets, including improvements to help the CAISO\n in effectively performing its increasing responsibilities.\n Accordingly, an independent operational audit of the CAISO\n will be performed to determine the areas, if any, in which\n the CAISO could enhance its effectiveness in fulfilling its\n responsibilities in operating the transmission system under\n its control and administering certain real-time energy markets.\n II. SCOPE OF WORK. 
The Independent Public Accounting (Contractor)\n firm shall perform an operational audit of the CAISO processes,\n practices, and procedures in accordance with Generally Accepted\n Government Auditing Standards (GAGAS). The Contractor shall\n provide a comparison of the CAISO's actual operations and its\n Commission tariff (procedures and tariff can be found at www.caiso.com).\n The Contractor shall ascertain whether the CAISO's operations\n are adequately transparent to market participants. To the extent\n possible, the Contractor shall also ascertain whether the CAISO's\n procedures and operations are consistent with the most effective\n business practice. III. AUDIT PERIOD. The audit of the CAISO\n shall cover the period commencing October 1, 2000, through\n October 1, 2001. IV. DELIVERABLES. The contractor must provide\n the Contracting Officer's Representative (COR) with a draft\n and final report detailing the findings, operating deficiencies\n and/or major observations and recommendations. These reports\n and other deliverables must be provided to the COR by the close\n of business on the following dates: 1) Draft Report - January\n 9, 2002 2) Final Report - January 16, 2002. V. SCHEDULE. The\n operational audit will commence on November 1, 2001, and it\n must be completed no later than January 4, 2002. The contractor\n shall meet with the COR to discuss the audit no later than\n November 6, 2001. The contractor must provide the Commission\n with an operational audit plan no later than November 14, 2001.\n A detailed operational audit plan will be submitted to the\n COR for review, feedback, and approval. The COR or designated\n representative will review the plan. VI. Working Papers. The\n contractor shall prepare and provide a copy of work papers\n in accordance with GAGAS. 
All work papers shall include the\n purpose, sources of information, the procedures performed,\n results thereof, and conclusions as appropriate for the planning,\n internal control, compliance, and substantive testing phases.\n For all findings, work papers shall clearly show the condition\n or problem, criteria, cause, effect, and recommendation for\n improvement. All working papers shall be cross-referenced to\n audit programs, summaries, and final audit report. These documents\n will become the property of the Commission. During the course\n of the audit or upon completion of the audit work, the Contractor's\n audit report and work papers shall be subject to access and\n review by the Commission. A preproposal conference is scheduled\n for prospective offerors at FERC on Tuesday, October 16, 2001,\n at 10:00 am at 888 First Street, NE, Washington, DC 20426,\n Room 3M-3. FERC will give a brief explanation of the requirements\n and clarify any questions asked. Questions submitted in advance\n to A. Sakallaris (andrew.sakallaris@ferc.fed.us) is appreciated\n and preferable. Contractual, technical, and legal representatives\n will be on hand to answer any questions. Bring no more than\n three representatives and E-mail or phone A. Sakallaris, 202-219-1150\n to indicate the number and names of individuals attending.\n If unable to attend, E-Mail A. Sakallaris and provide the point\n of contact's mailing and E-mail address and FERC will arrange\n to deliver any additional information that may become available\n as a result of the pre-proposal conference. All offerors and\n subcontractors will complete and submit with their proposals,\n either the Organizational Conflicts of Interest (OCI) representation\n or the OCI disclosure (not both). Complete and submit the OCI\n Questionnaire. Request OCI information from A. Sakallaris.\n See DOE clauses 952.209-8 OCI-Disclosure and 952.209-72 OCI.\n Same/similar language will be incorporated into the resultant\n contract. 
Period of Performance: From approximately November\n 1, 2001 to January 16, 2002. FAR Provision 52.212-1, Instructions\n to Offerors-Commercial (Oct 2000), applies. Addendum to FAR\n 52.212-1 SUBPARAGRAPH (b) Reply by e- mail the proposal to\n FERC, ATTN: C. Handley, Division of Procurement, FA-12, 888\n First St, NE, Wash DC 20426, e-mail: charlotte.handley@ferc.fed.us.\n It is our intent to evaluate based on initial proposals. Offerors\n shall submit five copies of their price and technical proposal,\n signed by an official authorized to bind the offeror, at the\n above address no later than 2:00 PM local time, Friday, October\n 26, 2001. Price Proposal: Addendum to FAR 52.212-1 SUBPARAGRAPH\n (b)(6): The price proposal shall be completely separate from\n the technical proposal. Offerors must breakout pricing by month\n and provide (1) labor categories (2) corresponding firm-fixed-price,\n fully loaded labor rates (3) proposed number of hours estimated\n for each labor category (4) Other Direct Costs as applicable\n (subcontractor and travel). Supporting details shall also be\n provided as appropriate (e.g., travel destinations, number\n of trips, computer time, discounts offered (excluding prompt\n payment)). Addendum to FAR 52.212-1 SUBPARAGRAPH (c) The offeror\n agrees to hold the prices in its offer firm for 120 calendar\n days. Provision 52.212-2, Evaluation-Commercial Items (Jan\n 1999), applies. Addendum to FAR 52.212-2 SUBPARAGRAPH, (a)\n Technical Proposal: Past Performance (30 points). Ensure that\n the company names, points of contact and phone numbers are\n current. The Offeror and each subcontractor proposed must each\n select three references to complete a ""Contractor Past-Performance\n Evaluation."" All areas of the sheet must be filled in. If an\n answer to a specific question is not provided, the applicable\n area must be annotated with the reason why (e.g. ""Reference\n failed to provide an answer""). 
Higher scores will be assigned\n for contracts that are at least similar in size and complexity.\n FERC may obtain and evaluate information from sources other\n than those provided by the offeror. FAR 15.305(a)(2)(iv) states\n ""In the case of an offeror without a record of relevant past\n performance or for whom information on past performance is\n not available, the offeror may not be evaluated favorably or\n unfavorably on past performance."" For the purposes of this\n evaluation not being evaluated favorably or unfavorably means\n the offeror will receive a rating of ""good or acceptable"" on\n company past performance. To obtain a copy of the past performance\n form to be completed by the Offeror's references, request from\n A. Sakallaris by email. The Offeror will be responsible and\n ensure that the references submit or reply by e-mail their\n completed response to C. Handley, located at the above address\n by 2:00 PM local time, October 26, 2001. Offerors should also\n notify the references that FERC may be contacting them regarding\n the past performance information. Prior Audit Experience (25\n points). The offeror must cover in its proposal the extent\n of prior related experience in electric bulk power markets,\n and the ability to effectively and efficiently conduct this\n operational audit in accordance with GAGAS. The proposal must\n include previous and/or current experience, within the last\n 3 years, where the offeror has performed audits similar in\n terms of size and complexity of this requirement. Higher scores\n will be assigned to those offerors who have demonstrated successful\n experience in electric bulk power markets within the last three\n years. To be acceptable, the services must have been satisfactorily\n performed and at least similar in terms of size and complexity\n to the work required under this contract. Understanding the\n Statement of Work (20 points). 
The proposal must clearly demonstrate\n in sufficient and precise detail the offeror's expertise in\n understanding and analyzing independent system operator tariffs,\n market operations and services, and transmission issues. The\n offeror must describe its ability to perform a comprehensive\n and thorough analysis of the electric bulk power market operated\n by the CAISO. Higher scores will be assigned to those offerors\n who demonstrate an understanding and greater knowledge of independent\n system operator tariffs, market operations and services, and\n transmission issues. Higher scores will also be assigned to\n offerors who demonstrate a comprehensive understanding of bulk\n power markets like those operated by the CAISO. Audit Methodology\n and Quality Control (15 points). The technical proposal must\n include the audit methodology and must address each phase of\n the audit. The proposal must identify specific aspects of each\n phase and explain how each phase/aspect is related. The offeror\n must include estimated completion times for each phase of the\n audit and address critical completion dates. The technical\n proposal must describe the audit approach to documenting systems\n and internal controls, and effective procedures, including\n consideration of risk and materiality, to determine the extent\n of audit testing. Quality control is an important part of the\n technical proposal. The proposal must demonstrate the offeror's\n internal quality control procedures. The proposal must include\n a copy of the firm's latest peer review report, comments, and\n the response to the peer review report. Higher scores will\n be assigned to those offerors who demonstrate and offer a sound,\n valid, effective, and innovative audit methodology that reflects\n an understanding of the statement of work. 
""Valid"" entails\n the use of techniques that are known to be feasible with respect\n to the area addressed; ""Effective"" refers to the workability\n and appropriateness of the methodology. ""Innovative"" means\n the development and application of a novel, yet valid technique\n that will increase the effectiveness of the approach. Higher\n scores will be assigned to offerors who demonstrate sound and\n effective internal quality control procedures. Professional\n Qualifications (10 points). The offeror must describe in its\n proposal: 1) the professional qualifications of the staff that\n will be assigned to this contract and 2) demonstrate that it\n has sufficient resources to perform the audit and qualified\n staff to perform critical tasks necessary to support the audit.\n The written response must also include the following information\n for all staff participating on the audit: Labor Category (i.e.,\n Partner, Manager, etc.); CPA certification date (if applicable);\n Other applicable certification(s); Total years of audit and/or\n other experience; Advance degree(s) obtained; and Total years\n employed by the offeror. Higher scores will be based on offeror's\n educational background, recent work experience, and certifications\n obtain by their staff. In particular, higher scores will be\n given to those who offer recent experience with independent\n system operator tariffs, market operations and services, transmission\n issues, and bulk power markets like those operated by the CAISO.\n Addendum to FAR 52.212-2 SUBPARAGRAPH (a). 
Technical and past\n performance, when combined, are more important than price.\n However, award shall be made to the offeror whose proposal\n is determined to best meet the needs of the Government after\n consideration of all factors i.e., provides the ""best value.""\n ""Best value,"" for the purpose of the contract is defined as\n the procurement process that results in the most advantageous\n acquisition decision for the Government and is performed through\n an integrated assessment and trade-off analysis between technical\n and price factors. Include a completed copy of Provision at\n 52.212.3, Offeror Representations and Certifications-Commercial\n Items (May 2001) with offer. Clause 52.212-4, Contract Terms\n and Conditions-Commercial Items (May 2001), applies. Addendum\n to FAR 52.212-4: 1. 52.252-2 Clauses Incorporated By Reference\n (FEB 1998). This contract incorporates one or more clauses\n by reference, with the same force and effect as if they were\n given in full text. Upon request, the Contracting Officer will\n make their full text available. Clauses are as follows: 52.217-8\n and 52.217-9. Clause 52.212-5, Contract Terms and Conditions\n Required To Implement Statutes or Executive Orders-Commercial\n Items (May 2001) applies. Addendum to FAR 52.212-5: The following\n clauses are incorporated by reference: 52.203-6 with A1; 52.219-4;\n 52.219-8; 52.222-21; 52.222-26; 52.222-35; 52-222-36; 52.222-37;\n 52.225-13; 52.225-15; 52.225-16; 52.232-33; 52.239-1. E-mail\n questions to C. Handley at charlotte.handley@ferc.fed.us. no\n later than October 19, 2001. E-Mail C. Handley of your intent\n to submit a proposal no later than October 19, 2001. 
\nLINKURL: andrew.sakallaris@ferc.fed.us\nLINKDESC: Click here to request copies of OCI questionnaire and <andrew.sakallaris@ferc.fed.us>\n the past performance form\nEMAILADD: charlotte.handley@ferc.fed.us\nEMAILDESC: charlotte.handley@ferc.fed.us <mailto:charlotte.handley@ferc.fed.us>\nCITE: (W-282 SN5102V8)\n",upcoming audit,0.662598
3,"Here are issues which I'm unclear about, all of which impact the drafting of \nthe agreement:\n\n\nHow are we setting the heat rate? Do we have the GADS (?)? Is it a daily \naverage, weekly average? Flat rate? Adjusted? If so, how often?\n\nAre we comfortable that there are no permit restrictions?\n\nWe have determined that we won't deal with fuel oil, right?\n\nWe aren't making money on gas, right?\n\nDo we have a defn of costs that we like yet, and if so, how does it fit in to \nthe picture?\n\nIs there an up-to-date set of exhibits? \n\nIt would help if theh commercial part of the team could send me the \nfollowing, in words and/or formulae:\n\nThe defn and method of establishing the bogey (target production cost) \nformula can be an exhibit, which would be great for the commercial team to \nwork on. Are we determined how we should deal with imbalances (part of cost \nof power)? How are we setting the bogey? Formula? Subject to audit? Two \nbogeys or one (gas and oil)?\n\nWhat is defn of profit? I think I have the general idea, but a sentence or \ntwo would be helpful as a reality check. What costs are included on the buy \nand sell side?\n\nRe stack model: a sentence or two describing what it is and how it is used.\n\nUpdated exhibit on facilities, contracted resources, operating limits.\n\nWhat information does MDEA need for us to provide in order to split \ncosts/profits? Are we clear that Cities buy gas, MDEA buys/sells power?\n\nIt would be great if I could get one set of answers to these \nquestions/issues, which has commercial buy in all around.\n\nThanks,\n\nKay",upcoming audit,0.647438
4,"Here are issues which I'm unclear about, all of which impact the drafting of \nthe agreement:\n\n\nHow are we setting the heat rate? Do we have the GADS (?)? Is it a daily \naverage, weekly average? Flat rate? Adjusted? If so, how often?\n\nAre we comfortable that there are no permit restrictions?\n\nWe have determined that we won't deal with fuel oil, right?\n\nWe aren't making money on gas, right?\n\nDo we have a defn of costs that we like yet, and if so, how does it fit in to \nthe picture?\n\nIs there an up-to-date set of exhibits? \n\nIt would help if theh commercial part of the team could send me the \nfollowing, in words and/or formulae:\n\nThe defn and method of establishing the bogey (target production cost) \nformula can be an exhibit, which would be great for the commercial team to \nwork on. Are we determined how we should deal with imbalances (part of cost \nof power)? How are we setting the bogey? Formula? Subject to audit? Two \nbogeys or one (gas and oil)?\n\nWhat is defn of profit? I think I have the general idea, but a sentence or \ntwo would be helpful as a reality check. What costs are included on the buy \nand sell side?\n\nRe stack model: a sentence or two describing what it is and how it is used.\n\nUpdated exhibit on facilities, contracted resources, operating limits.\n\nWhat information does MDEA need for us to provide in order to split \ncosts/profits? Are we clear that Cities buy gas, MDEA buys/sells power?\n\nIt would be great if I could get one set of answers to these \nquestions/issues, which has commercial buy in all around.\n\nThanks,\n\nKay",upcoming audit,0.647438
5,"More:\n\nWe need internal agreement on the cancellation fee schedule, to be included \nas an exhibit.\n\nWhat is the status of the Yazoo City master gas agreement? Do we still want \na master gas agreement for Clarksdale, also? Do we want it now?\n\nThanks,\n\nKay\n\n\n\n\n\n\nKay Mann\n05/23/2001 05:05 PM\nTo: Reagan Rorschach/Enron@EnronXGate, Heather Kroll/Enron@EnronXGate, David \nFairley/Enron@EnronXGate, Tom May/Enron@EnronXGate\ncc: \n\nSubject: Open items for MDEA\n\nHere are issues which I'm unclear about, all of which impact the drafting of \nthe agreement:\n\n\nHow are we setting the heat rate? Do we have the GADS (?)? Is it a daily \naverage, weekly average? Flat rate? Adjusted? If so, how often?\n\nAre we comfortable that there are no permit restrictions?\n\nWe have determined that we won't deal with fuel oil, right?\n\nWe aren't making money on gas, right?\n\nDo we have a defn of costs that we like yet, and if so, how does it fit in to \nthe picture?\n\nIs there an up-to-date set of exhibits? \n\nIt would help if theh commercial part of the team could send me the \nfollowing, in words and/or formulae:\n\nThe defn and method of establishing the bogey (target production cost) \nformula can be an exhibit, which would be great for the commercial team to \nwork on. Are we determined how we should deal with imbalances (part of cost \nof power)? How are we setting the bogey? Formula? Subject to audit? Two \nbogeys or one (gas and oil)?\n\nWhat is defn of profit? I think I have the general idea, but a sentence or \ntwo would be helpful as a reality check. What costs are included on the buy \nand sell side?\n\nRe stack model: a sentence or two describing what it is and how it is used.\n\nUpdated exhibit on facilities, contracted resources, operating limits.\n\nWhat information does MDEA need for us to provide in order to split \ncosts/profits? 
Are we clear that Cities buy gas, MDEA buys/sells power?\n\nIt would be great if I could get one set of answers to these \nquestions/issues, which has commercial buy in all around.\n\nThanks,\n\nKay\n\n",upcoming audit,0.647438
6,"Schedule reflects amount due for cancellation on or before the stated dates?\n\n\nFrom: Reagan Rorschach/ENRON@enronXgate on 05/24/2001 01:19 PM\nTo: Kay Mann/Corp/Enron@Enron\ncc: Kayne Coulter/ENRON@enronXgate, Tom May/ENRON@enronXgate, Jeffrey \nMiller/ENRON@enronXgate, David Fairley/ENRON@enronXgate \n\nSubject: RE: Open items for MDEA\n\nKay, see attached. The monthly fee should be $13,003. I took the $300k we \nhave all been discussing and backed out the $12,500 they paid under the ILA \nterms.\n\n\n\nReagan C. Rorschach\nEnron North America\n1400 Smith Street\nHouston, Texas 77002\n713.345.3363\n\n -----Original Message-----\nFrom: Mann, Kay \nSent: Thursday, May 24, 2001 11:35 AM\nTo: Mann, Kay\nCc: Rorschach, Reagan; Kroll, Heather; Fairley, David; May, Tom\nSubject: Re: Open items for MDEA\n\nMore:\n\nWe need internal agreement on the cancellation fee schedule, to be included \nas an exhibit.\n\nWhat is the status of the Yazoo City master gas agreement? Do we still want \na master gas agreement for Clarksdale, also? Do we want it now?\n\nThanks,\n\nKay\n\n\n\n\n\n\nKay Mann\n05/23/2001 05:05 PM\nTo: Reagan Rorschach/Enron@EnronXGate, Heather Kroll/Enron@EnronXGate, David \nFairley/Enron@EnronXGate, Tom May/Enron@EnronXGate\ncc: \n\nSubject: Open items for MDEA\n\nHere are issues which I'm unclear about, all of which impact the drafting of \nthe agreement:\n\n\nHow are we setting the heat rate? Do we have the GADS (?)? Is it a daily \naverage, weekly average? Flat rate? Adjusted? If so, how often?\n\nAre we comfortable that there are no permit restrictions?\n\nWe have determined that we won't deal with fuel oil, right?\n\nWe aren't making money on gas, right?\n\nDo we have a defn of costs that we like yet, and if so, how does it fit in to \nthe picture?\n\nIs there an up-to-date set of exhibits? 
\n\nIt would help if theh commercial part of the team could send me the \nfollowing, in words and/or formulae:\n\nThe defn and method of establishing the bogey (target production cost) \nformula can be an exhibit, which would be great for the commercial team to \nwork on. Are we determined how we should deal with imbalances (part of cost \nof power)? How are we setting the bogey? Formula? Subject to audit? Two \nbogeys or one (gas and oil)?\n\nWhat is defn of profit? I think I have the general idea, but a sentence or \ntwo would be helpful as a reality check. What costs are included on the buy \nand sell side?\n\nRe stack model: a sentence or two describing what it is and how it is used.\n\nUpdated exhibit on facilities, contracted resources, operating limits.\n\nWhat information does MDEA need for us to provide in order to split \ncosts/profits? Are we clear that Cities buy gas, MDEA buys/sells power?\n\nIt would be great if I could get one set of answers to these \nquestions/issues, which has commercial buy in all around.\n\nThanks,\n\nKay\n\n\n\n\n",upcoming audit,0.647438
7,"Kay, see attached. The monthly fee should be $13,003. I took the $300k we have all been discussing and backed out the $12,500 they paid under the ILA terms.\n\n \n\nReagan C. Rorschach\nEnron North America\n1400 Smith Street\nHouston, Texas 77002\n713.345.3363\n\n -----Original Message-----\nFrom: \tMann, Kay \nSent:\tThursday, May 24, 2001 11:35 AM\nTo:\tMann, Kay\nCc:\tRorschach, Reagan; Kroll, Heather; Fairley, David; May, Tom\nSubject:\tRe: Open items for MDEA\n\nMore:\n\nWe need internal agreement on the cancellation fee schedule, to be included as an exhibit.\n\nWhat is the status of the Yazoo City master gas agreement? Do we still want a master gas agreement for Clarksdale, also? Do we want it now?\n\nThanks,\n\nKay\n\n\n\n\n\n \nKay Mann\n05/23/2001 05:05 PM\nTo:\tReagan Rorschach/Enron@EnronXGate, Heather Kroll/Enron@EnronXGate, David Fairley/Enron@EnronXGate, Tom May/Enron@EnronXGate\ncc:\t \n\nSubject:\tOpen items for MDEA\n\nHere are issues which I'm unclear about, all of which impact the drafting of the agreement:\n\n\nHow are we setting the heat rate? Do we have the GADS (?)? Is it a daily average, weekly average? Flat rate? Adjusted? If so, how often?\n\nAre we comfortable that there are no permit restrictions?\n\nWe have determined that we won't deal with fuel oil, right?\n\nWe aren't making money on gas, right?\n\nDo we have a defn of costs that we like yet, and if so, how does it fit in to the picture?\n\nIs there an up-to-date set of exhibits? \n\nIt would help if theh commercial part of the team could send me the following, in words and/or formulae:\n\nThe defn and method of establishing the bogey (target production cost) formula can be an exhibit, which would be great for the commercial team to work on. Are we determined how we should deal with imbalances (part of cost of power)? How are we setting the bogey? Formula? Subject to audit? Two bogeys or one (gas and oil)?\n\nWhat is defn of profit? 
I think I have the general idea, but a sentence or two would be helpful as a reality check. What costs are included on the buy and sell side?\n\nRe stack model: a sentence or two describing what it is and how it is used.\n\nUpdated exhibit on facilities, contracted resources, operating limits.\n\nWhat information does MDEA need for us to provide in order to split costs/profits? Are we clear that Cities buy gas, MDEA buys/sells power?\n\nIt would be great if I could get one set of answers to these questions/issues, which has commercial buy in all around.\n\nThanks,\n\nKay\n\n\n\n\n<Embedded Picture (Device Independent Bitmap)>",upcoming audit,0.647438
8,"Notice to Members No. 01-167\nMay 16, 2001\n\n\nTo:\nAll NYMEX Division Members\nAll COMEX Division Members\n\nFrom:\nNeal Wolkoff, Executive Vice President\n\nRe:\n-Changes to Audit Trail Summary Fine Structure-NYMEX & COMEX Divisions\n-Incorporation of Spreads into One Minute Trade Time Submission for NYMEX\n\n\nModification of =01&Clear-out Period=018\n\nThe Board of Directors has approved modifications to the fine structure for=\n=20\none-minute trade time submission for both NYMEX and COMEX divisions.=20\nPresently members falling below 80% timely submission per month are issued=\n=20\nescalating warning letters and fines unless and until they clear out their=\n=20\nrecord by being in compliance for six consecutive months. After six months=\n,=20\nthe member returns to a =01&warning letter=018 level. Effective immediatel=\ny, this=20\n=01&clear-out period=018 is reduced from six months to four months. One-mi=\nnute=20\ntrade timing statistics for the month ending April 30, 2001 will reflect th=\nis=20\nmodification.\n\n\nImplementation of One-Minute Trade-Time Classes\n\nThe Board also accepted the recommendation of the Compliance Review Committ=\nee=20\nand Floor Committee to adopt a training program for members facing one-minu=\nte=20\ntrade time fines on the NYMEX division. The purpose of the program is to=\n=20\nimprove Exchange trade submission timing through member education by=20\nidentifying individual issues, tailoring individual solutions and statistic=\nal=20\nfollow-up. Training will be conducted at dates to be announced during June=\n,=20\nJuly and August 2001 by a team comprised of Committee Members and Complianc=\ne=20\nStaff with the assistance of the Training and Education Committee.\n\nAs incentive, any member attending the class will be allowed to =01&do over=\n=018 up=20\nto two subsequent months in which their trade time rates are between 70% an=\nd=20\n79%. 
The member will still have to complete four violation free months=20\nbefore his record is =01&cleared out=018 but he may take up to six months t=\no do=20\nso. Members may take one class per year.\n\nAudit Trail Classes\n\n? Purpose=01*to reduce number of members falling below the 80% standard for=\n pit=20\ncard submission and to facilitate the return of members to a =01&warning le=\ntter=018=20\nlevel.\n? Members who have received a warning letter or a fine for pit card=20\nviolations may:\n Attend an audit trail class, to be offered in June, July and August of 200=\n1,=20\nwith future dates to be determined by the Floor Committee/Compliance Review=\n=20\nCommittee. The class will be an intensive discussion of pit card problems=\n=20\nand pitfalls. Materials will focus on how the audit trail works and=20\nsuccessful strategies for achieving a passing grade. Members attending cla=\nss=20\nwill be required to analyze their own trading style in light of the materia=\nl=20\npresented and pinpoint areas which they need to improve. Members will also=\n=20\nbe required to outline a plan of attack to address raising their pit card=\n=20\nperformance rates. Each plan will be discussed with the group and members=\n=20\nwill receive feedback from Floor and Compliance Review Committee Members, a=\nnd=20\nCompliance and Floor Department Staff. Members attending classes are=20\nstrongly encouraged to bring their clerks with them!\n Upon completion of the class, members will be allowed to =01&do over=018 u=\np to two=20\nsubsequent months in which they received the warning or fine for substandar=\nd=20\ntrade time rates between 70% and 79%, on their way to achieving four months=\n=20\nclear. No fine or warning letter will be issued for a =01&do over=018 mont=\nh but=20\nthe month will not count as a passing month. 
A member who falls below 70%=\n=20\nwill receive a fine or warning letter.\n The benefit to the member will be that the month that would have counted a=\ns=20\na failing month will not count at all, as long as the member achieved at=20\nleast a 70% rate. Members still need four problem free months to return to=\n a=20\nwarning letter stage from a fine stage. A member who has been problem free=\n=20\nfor 3 months would benefit by =01&doing over=018 a fourth (and possibly a f=\nifth)=20\nfailing month, thus having the opportunity to return to a warning letter=20\nphase despite intervening failing months. A member who falls below 70%=20\nbefore achieving four problem free months will receive a fine or a warning=\n=20\nletter, as appropriate. The member can still use a second =01&do over=018 =\nmonth as=20\nlong as he meets all criteria.\n Staff will follow up with members attending the classes. Members will=20\nreceive a breakdown of their most recent audit trail fine. They will also =\nbe=20\ninvited to discuss their percentages with staff one-on-one. Further, staff=\n=20\nwill supply daily or weekly statistics to any member by e-mail upon request=\n.\n Members are eligible for only one class per year. A member must take a=20\nclass before being given the benefits listed herein. Classes will be offer=\ned=20\nin the third week of the months listed, at a date to be announced, and the=\n=20\nprogram will apply to the next full month of audit trail results. The=20\nCompliance Review Committee and the Floor Committee will evaluate the resul=\nts=20\nof this program before deciding whether to extend this program into 2002.\n\n\n\n\nIncorporation of Spreads into One-Minute\nTrade Time =01&Pit Card=018 Statistics for NYMEX Division\n\nPresent one-minute trade time =01&pit card=018 statistics do not include sp=\nread=20\ntimes. 
During the most recent NYMEX Rule Enforcement Review--the CFTC audi=\nt=20\nof the Exchange=01,s Compliance programs and procedures--the CFTC recommend=\ned=20\nthat the NYMEX Division incorporate spread trades into the one-minute time=\n=20\nstatistics. The CFTC has made this recommendation in the past and the=20\nExchange has committed to respond to it. The COMEX Division one-minute trad=\ne=20\ntime statistics already include spreads.\n\nNYMEX Division Members should note that intra-commodity spread trades will =\nbe=20\nincorporated into pit card one-minute trade time accuracy rates after a=20\n=01&phase in=018 process of approximately 90 days. In order to avoid subst=\nandard=20\npercentages once spreads are incorporated, members must make sure that thei=\nr=20\nspread prints are price reported in order to ensure that the Price Change=\n=20\nRegister for spreads is accurate. If a spread print is not reported, you=\n=20\nwill not be credited with timely submission of pit cards, even if you throw=\n=20\nthe trade into the pit in a timely manner.\n\nPit card statistics for outright trades are presently posted on the NYMEX=\n=20\nfloor on a weekly basis. Compliance Staff will continue to post statistics=\n=20\nfor outright trades and will additionally post the statistics for spread=20\ntrades. These spread trade statistics are for information only so that you=\n=20\ncan assess your pit card submission rate in advance of incorporation of=20\nspreads in the one-minute trade time submission percentages. For informati=\non=20\npurposes, the seller of the spread is considered to be the party who sells=\n=20\nthe premium month. 
In the case where the spread trades flat, the seller is=\n=20\nthe party who sells the nearby month.\n\nIf you have any questions about this information, please call Nancy Minett,=\n=20\nCompliance Counsel, at (212) 299-2940, or Thomas LaSala, Vice President=20\nCompliance, at (212) 299-2897.\n\n\n\n__________________________________________________\nPlease click on the link below to indicate you have received this\nemail.\n\n""http://208.206.41.61/email/email_log.cfm?useremail=3Dsara.shackleton@enron=\n.com&\nrefdoc=3D(01-167)""\n\nNote: If you click on the above line and nothing happens, please copy\nthe text between the quotes, open your internet browser,\npaste it into the web site address and press Return.\n",upcoming audit,0.592623
9,"Notice to Members No. 01-190\nJune 7, 2001\n\nREMINDER =01) THIRD NOTICE\n\nTo:\nAll NYMEX Division Members\nAll COMEX Division Members\n\nFrom:\nNeal Wolkoff, Executive Vice President\n\nRe:\nChanges to Audit Trail Summary Fine Structure-NYMEX & COMEX Divisions\n\n\nModification of =01&Clear-out Period=018\n\nThe Board of Directors has approved modifications to the fine structure for=\n=20\none-minute trade time submission for both NYMEX and COMEX Divisions. =20\nPresently members falling below the 80% timely submission standard are issu=\ned=20\nescalating warning letters and fines unless and until they clear out their=\n=20\nrecord by being in compliance for six consecutive months. After six months=\n,=20\nthe member returns to a =01&warning letter=018 level. Effective immediatel=\ny, this=20\n=01&clear-out period=018 is reduced from six months to four months. One-mi=\nnute=20\ntrade timing statistics for the month ending April 30, 2001 will reflect th=\nis=20\nmodification.\n\n\nImplementation of One-Minute Trade-Time Classes\n\nThe Board also accepted the recommendation of the Compliance Review Committ=\nee=20\nand Floor Committee to adopt a training program for members facing one-minu=\nte=20\ntrade time fines on the NYMEX Division. The purpose of the program is to=\n=20\nimprove Exchange trade submission timing through member education by=20\nidentifying individual issues, tailoring individual solutions and statistic=\nal=20\nfollow-up. Training will be conducted on Thursday, June 14, Tuesday, July=\n=20\n10, and Wednesday, August 15, 2001 by a team comprised of Committee Members=\n=20\nand Compliance Staff with the assistance of the Training and Education=20\nCommittee. You may sign up for these classes at the Corrections Windows on=\n=20\nthe 3rd and 5th floors, or by contacting Lois S. Shapiro, Associate=20\nCompliance Counsel, at (212) 299-2853. The June 14, 2001 class will be hel=\nd=20\nin the Seminar Room on the 10th floor at 3:45 p.m. 
Plan to stay until 6:00=\n=20\np.m. Credit will not be given to anyone who does not attend the entire cla=\nss.\n\nAs incentive, any member attending any one class will be allowed to =01&do =\nover=018=20\nup to two subsequent months in which their trade time rates are between 70%=\n=20\nand 79%. The member will still have to complete four violation-free months=\n=20\nbefore his record is =01&cleared out=018 but he may take up to six months t=\no do=20\nso. Members may take one class per year.\n\nAudit Trail Classes\n\n? Purpose: to improve pit card submission statistics Exchange-wide by=20\nreducing the number of individual members falling below the 80% standard fo=\nr=20\ntimely pit card submission and to facilitate their return to a =01&warning=\n=20\nletter=018 level.\n\n? Members who have received a warning letter or a fine for pit card=20\nviolations may:\n\n_ Attend an audit trail class, to be offered on June 14, July10, and August=\n=20\n15, 2001, with future dates to be determined by the Floor=20\nCommittee/Compliance Review Committee. The class will be an intensive=20\ndiscussion of pit card problems and pitfalls. Materials will focus on how=\n=20\nthe audit trail works and successful strategies for achieving a passing=20\npercentage. Members will be required to analyze their own trading style in=\n=20\nlight of the material presented and pinpoint areas in which they need to=20\nimprove. Members will also be required to outline a plan for raising their=\n=20\npit card performance rates. Each plan will be discussed with the group and=\n=20\nmembers will receive feedback from Floor Committee and Compliance Review=20\nCommittee Members, and Compliance and Floor Department Staff. Members=20\nattending classes are strongly encouraged to bring their clerks with them!\n\n_ Upon completion of the class, members will be allowed to =01&do over=018 =\nup to=20\ntwo subse-quent months in which their timely submission rates fall between=\n=20\n70% and 79%. 
No fine or warning letter will be issued for a =01&do over=01=\n8 month,=20\nbut the month will not count as a passing month. A member who falls below=\n=20\n70% will receive a fine or warning letter.\n\n_ The benefit to the member will be that the month that would have counted =\nas=20\na failing month will not count at all, as long as the member achieved at=20\nleast a 70% rate. Members still need four problem-free months to return to=\n a=20\nwarning letter stage. A member who has been problem-free for 3 months woul=\nd=20\nbenefit by =01&doing over=018 a fourth (and possibly a fifth) failing month=\n, thus=20\nhaving the opportunity to return to the warning letter stage despite=20\nintervening failing months. A member who falls below 70% before achieving=\n=20\nfour problem-free months will receive a fine or a warning letter, as=20\nappropriate. The member can still use a second =01&do over=018 month as lo=\nng as he=20\nmeets all the above criteria.\n\n_ Staff will follow up with members attending the classes. Members will=20\nreceive a breakdown of their most recent audit trail fine. They will also =\nbe=20\ninvited to discuss their percentages with staff on an individual basis. =20\nFurther, staff will supply daily or weekly statistics to any member by e-ma=\nil=20\nor telephone upon request.\n\n_ Members are eligible for only one class per year. A member must take a=\n=20\nclass to take advantage of the benefits listed herein. Classes will be=20\noffered on June 14, July 10 and August 15, 2001, and the program will apply=\n=20\nto the next full month of audit trail statistics. The Compliance Review=20\nCommittee and the Floor Committee will evaluate the results of this program=\n=20\nbefore deciding whether to extend this program into 2002.\n\n\n\n\nIncorporation of Spreads into One-Minute\nTrade Time =01&Pit Card=018 Statistics for NYMEX Division\n\nPresent one-minute trade time =01&pit card=018 statistics do not include sp=\nread=20\ntimes. 
During the most recent NYMEX Rule Enforcement Review--the CFTC audi=\nt=20\nof the Exchange=01,s Compliance programs and procedures--the CFTC recommend=\ned=20\nthat the NYMEX Division incorporate spread trades into the one-minute time=\n=20\nstatistics. The CFTC has made this recommendation in the past and the=20\nExchange has committed to respond to it. The COMEX Division one-minute trad=\ne=20\ntime statistics already include spreads.\n\nNYMEX Division Members should note that intra-commodity spread trades will =\nbe=20\nincorporated into pit card one-minute trade time accuracy rates after a=20\n=01&phase in=018 process of approximately 90 days. In order to avoid subst=\nandard=20\npercentages once spreads are incorporated, members must make sure that thei=\nr=20\nspread prints are price reported in order to ensure that the Price Change=\n=20\nRegister for spreads is accurate. If a spread print is not reported, you=\n=20\nwill not be credited with timely submission of pit cards, even if you throw=\n=20\nthe trade into the pit in a timely manner.\n\nPit card statistics for outright trades are presently posted on the NYMEX=\n=20\nfloor on a weekly basis. Compliance Staff will continue to post statistics=\n=20\nfor outright trades and will additionally post the statistics for spread=20\ntrades. These spread trade statistics are for information only so that you=\n=20\ncan assess your pit card submission rate in advance of incorporation of=20\nspreads in the one-minute trade time submission percentages. For informati=\non=20\npurposes, the seller of the spread is considered to be the party who sells=\n=20\nthe premium month. 
In the case where the spread trades flat, the seller is=\n=20\nthe party who sells the nearby month.\n\nIf you have any questions about this information, please call Nancy Minett,=\n=20\nCompliance Counsel, at (212) 299-2940, or Thomas LaSala, Vice President=20\nCompliance, at (212) 299-2897.\n\n\n\n__________________________________________________\nPlease click on the link below to indicate you have received this\nemail.\n\n""http://208.206.41.61/email/email_log.cfm?useremail=3Dsara.shackleton@enron=\n.com&\nrefdoc=3D(01-190)""\n\nNote: If you click on the above line and nothing happens, please copy\nthe text between the quotes, open your internet browser,\npaste it into the web site address and press Return.\n",upcoming audit,0.592623


In [None]:
display(search_engine(sample_query_df, encoding_level = find_similarities_whole_text_level))

Unnamed: 0,content,query,Relevance Score
0,see!,upcoming audit,0.41072
1,"FYI...........\n---------------------- Forwarded by Jeffrey C Gossett/HOU/ECT on 05/07/2001 \n07:49 AM ---------------------------\nFrom: Mechelle Atwood/ENRON@enronXgate on 05/04/2001 03:24 PM\nTo: Robert L Hall/ENRON@enronXgate, Jeffrey C Gossett/HOU/ECT@ECT, Bryce \nBaxter/HOU/ECT@ECT\ncc: Wes Colwell/ENRON@enronXgate, ""john.j.boudreaux@andersen.com"" \n<'john.j.boudreaux@andersen.com'>@SMTP@enronXgate, \n""kate.e.agnew@us.andersen.com"" \n<'kate.e.agnew@us.andersen.com'>@SMTP@enronXgate, Nicole \nMendez/ENRON@enronXgate \nSubject: Gas Trading Audit Notification\n\n\nThe Gas Trading Audit is scheduled to begin May 11, 2001 and is expected to \ntake approximately two to three months to complete. The audit will be \nexecuted by Arthur Andersen and led by Kate Agnew.\n\nWe will schedule an opening meeting with you the week of May 7, 2001 to \ndiscuss audit scope and any concerns or suggestions you may have.\n\nThe objective of the audit will be to determine whether procedures and \ncontrols exercised over activities are adequate and operating effectively. \n\nResults of the audit will be discussed with you and other appropriate \npersonnel both during the audit and at the audit closing meeting. \n\nWe will endeavor to conduct this audit in a manner that will minimize \ninterruption of your normal business activities and we look forward to \nworking with you and your personnel. Please call me at x5-4554 if have any \nquestions. \n\n\nMechelle Atwood\nDirector, Enron Assurance Services\nEnron Wholesale Services\n713.345.4554; Location: EB2383\nMechelle.Atwood@enron.com\n\n",upcoming audit,0.405648
2,"FYI...........\n---------------------- Forwarded by Jeffrey C Gossett/HOU/ECT on 05/07/2001 \n07:49 AM ---------------------------\nFrom: Mechelle Atwood/ENRON@enronXgate on 05/04/2001 03:24 PM\nTo: Robert L Hall/ENRON@enronXgate, Jeffrey C Gossett/HOU/ECT@ECT, Bryce \nBaxter/HOU/ECT@ECT\ncc: Wes Colwell/ENRON@enronXgate, ""john.j.boudreaux@andersen.com"" \n<'john.j.boudreaux@andersen.com'>@SMTP@enronXgate, \n""kate.e.agnew@us.andersen.com"" \n<'kate.e.agnew@us.andersen.com'>@SMTP@enronXgate, Nicole \nMendez/ENRON@enronXgate \nSubject: Gas Trading Audit Notification\n\n\nThe Gas Trading Audit is scheduled to begin May 11, 2001 and is expected to \ntake approximately two to three months to complete. The audit will be \nexecuted by Arthur Andersen and led by Kate Agnew.\n\nWe will schedule an opening meeting with you the week of May 7, 2001 to \ndiscuss audit scope and any concerns or suggestions you may have.\n\nThe objective of the audit will be to determine whether procedures and \ncontrols exercised over activities are adequate and operating effectively. \n\nResults of the audit will be discussed with you and other appropriate \npersonnel both during the audit and at the audit closing meeting. \n\nWe will endeavor to conduct this audit in a manner that will minimize \ninterruption of your normal business activities and we look forward to \nworking with you and your personnel. Please call me at x5-4554 if have any \nquestions. \n\n\nMechelle Atwood\nDirector, Enron Assurance Services\nEnron Wholesale Services\n713.345.4554; Location: EB2383\nMechelle.Atwood@enron.com\n\n",upcoming audit,0.405648
3,"FYI...........\n---------------------- Forwarded by Jeffrey C Gossett/HOU/ECT on 05/07/2001 \n07:49 AM ---------------------------\nFrom: Mechelle Atwood/ENRON@enronXgate on 05/04/2001 03:24 PM\nTo: Robert L Hall/ENRON@enronXgate, Jeffrey C Gossett/HOU/ECT@ECT, Bryce \nBaxter/HOU/ECT@ECT\ncc: Wes Colwell/ENRON@enronXgate, ""john.j.boudreaux@andersen.com"" \n<'john.j.boudreaux@andersen.com'>@SMTP@enronXgate, \n""kate.e.agnew@us.andersen.com"" \n<'kate.e.agnew@us.andersen.com'>@SMTP@enronXgate, Nicole \nMendez/ENRON@enronXgate \nSubject: Gas Trading Audit Notification\n\n\nThe Gas Trading Audit is scheduled to begin May 11, 2001 and is expected to \ntake approximately two to three months to complete. The audit will be \nexecuted by Arthur Andersen and led by Kate Agnew.\n\nWe will schedule an opening meeting with you the week of May 7, 2001 to \ndiscuss audit scope and any concerns or suggestions you may have.\n\nThe objective of the audit will be to determine whether procedures and \ncontrols exercised over activities are adequate and operating effectively. \n\nResults of the audit will be discussed with you and other appropriate \npersonnel both during the audit and at the audit closing meeting. \n\nWe will endeavor to conduct this audit in a manner that will minimize \ninterruption of your normal business activities and we look forward to \nworking with you and your personnel. Please call me at x5-4554 if have any \nquestions. \n\n\nMechelle Atwood\nDirector, Enron Assurance Services\nEnron Wholesale Services\n713.345.4554; Location: EB2383\nMechelle.Atwood@enron.com\n\n",upcoming audit,0.405648
4,"I will forward each of these per our discussion this morning and will add you to all notifications going forward. I have also asked AA for their most updated timeline for the year (they were going to make some changes to the one originally presented at the Audit Planning meeting a few weeks ago) - I will pass it along when I get it. Thanks.\n\n -----Original Message-----\nFrom: \tAtwood, Mechelle \nSent:\tFriday, May 04, 2001 3:35 PM\nTo:\tWhite, Stacey\nCc:\tColwell, Wes; 'john.j.boudreaux@us.andersen.com'; 'jennifer.stevenson@us.andersen.com'; Mendez, Nicole\nSubject:\tPower Trading Audit Notification\n\n\nThe Power Trading Audit is scheduled to begin May 23, 2001 and is expected to take approximately six to eight weeks to complete. The audit will be executed by Arthur Andersen and led by Jennifer Stevenson.\n\nWe will schedule an opening meeting with you the week of May 14, 2001 to discuss audit scope and any concerns or suggestions you may have.\n\nThe objective of the audit will be to determine whether procedures and controls exercised over activities are adequate and operating effectively. \n\nResults of the audit will be discussed with you and other appropriate personnel both during the audit and at the audit closing meeting. \n\nWe will endeavor to conduct this audit in a manner that will minimize interruption of your normal business activities and we look forward to working with you and your personnel. Please call me at x5-4554 if have any questions.\n\nMechelle Atwood\nDirector, Enron Assurance Services\nEnron Wholesale Services\n713.345.4554; Location: EB2383\nMechelle.Atwood@enron.com",upcoming audit,0.402882
5,Are we going to inspect tomorrow?,upcoming audit,0.399118
6,Are we going to inspect tomorrow?,upcoming audit,0.399118
7,See attached.\n,upcoming audit,0.397572
8,See attached.\n,upcoming audit,0.397572
9,See attached\n,upcoming audit,0.397572


# BM25 Search
BM25 (Okapi BM25) grew out of the probabilistic retrieval framework developed in the 1970s and 1980s. It is fast and returns a different type of search result from the semantic search done above.
It is a bag-of-words ranking function: it scores a document by the query terms that appear in it, regardless of where those terms sit in the document or how close they are to each other.
More info at https://en.wikipedia.org/wiki/Okapi_BM25

Note: BM25 relevance scores are on a different scale from the USE scores above (BM25 scores are not bounded by 1), so the two should not be compared directly.
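
To make the bag-of-words scoring concrete, here is a minimal pure-Python sketch of the Okapi BM25 formula as described on the Wikipedia page linked above. It is an illustration only, not the `rank_bm25` implementation: the function name `bm25_scores`, the toy documents, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for the example, and the library may handle IDF details differently.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score every document against the query with the Okapi BM25 formula."""
    n_docs = len(tokenized_docs)
    avgdl = sum(len(doc) for doc in tokenized_docs) / n_docs

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for doc in tokenized_docs:
        df.update(set(doc))

    scores = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue  # a term that is absent contributes nothing
            # IDF variant that stays positive even for very common terms
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates with k1; b controls length normalization
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus (hypothetical), tokenized the same way as the notebook: split on spaces
toy_docs = [
    "the audit starts next week".split(" "),
    "lunch next week".split(" "),
    "audit scope and audit schedule".split(" "),
]
toy_scores = bm25_scores(["audit"], toy_docs)

# Word order is ignored: reversing a document leaves its score unchanged
reversed_scores = bm25_scores(["audit"], [list(reversed(d)) for d in toy_docs])
```

The second toy document never mentions "audit" and scores zero, while the third mentions it twice and ranks first. Reversing the word order changes nothing, which is the bag-of-words property described above.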


In [None]:
!pip install rank_bm25
from rank_bm25 import BM25Okapi



In [None]:
import sys
# Make the local helper module text_preprocessing importable from Google Drive
sys.path.append('/content/drive/My Drive/Colab Notebooks/utils/')
from text_preprocessing import *

In [None]:
# Preprocess the email bodies; text_prep applies the functions in params one after another
params = [nltk_process_text, regex, replace_invalid_chars, remove_blanks]
emails_df['processed_text'] = text_prep(emails_df['content'], params = params)

  0%|          | 12/100000 [00:00<13:57, 119.45it/s]

applying: nltk_process_text


100%|██████████| 100000/100000 [06:31<00:00, 255.41it/s]
 11%|█         | 10785/100000 [00:00<00:00, 107848.82it/s]

applying: regex


100%|██████████| 100000/100000 [00:00<00:00, 110617.45it/s]
  3%|▎         | 3382/100000 [00:00<00:02, 33807.46it/s]

applying: replace_invalid_chars


100%|██████████| 100000/100000 [00:02<00:00, 33828.67it/s]
  0%|          | 57/100000 [00:00<03:04, 540.83it/s]

applying: remove_blanks


100%|██████████| 100000/100000 [03:43<00:00, 447.48it/s]


In [None]:
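def BM25_search(query):
  # Preprocess the query with text_prep's default steps, then split on spaces like the corpus
  query = text_prep(pd.Series(query))
  tokenized_query = query[0].split(" ")
  # Score every document; relies on the global bm25 index built in a later cell
  scores = bm25.get_scores(tokenized_query)
  # Keep the positions of the 10 highest-scoring documents, best first
  top_n = np.argsort(scores)[::-1][:10]
  # .copy() avoids pandas' SettingWithCopyWarning when the score column is added
  result_df = emails_df[['To', 'From','content']].iloc[top_n].copy()
  if len(result_df) > 0:
    result_df['Relevance Scores'] = scores[top_n]
  return result_df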
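
The top-10 selection in `BM25_search` can be sketched on a toy score list. This is a plain-Python stand-in for the `np.argsort(scores)[::-1][:10]` step, not the notebook's code: the helper name `top_n_indices` and the scores are made up for illustration. It also shows one thing to watch for: when fewer than `n` documents match, the remaining slots are filled with zero-score documents unless they are filtered out.

```python
def top_n_indices(scores, n=10):
    """Return the positions of the n highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n]

# Hypothetical BM25 scores for a five-document corpus
toy_scores = [0.0, 13.2, 4.1, 12.8, 0.0]

best_three = top_n_indices(toy_scores, n=3)

# Asking for more results than there are matches pulls in zero-score documents,
# so an optional filter keeps only documents that share a term with the query
best_five = top_n_indices(toy_scores, n=5)
matching_only = [i for i in best_five if toy_scores[i] > 0]
```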

In [None]:
emails_df[['content', 'processed_text']].head(5)

Unnamed: 0,content,processed_text
186822,"It would be nice if you could be at my dinner, since I probably won't know \nanyone else. Anytime you want to go to lunch to check on the house status, \nI'd be glad to go...",it would nice could dinner since i probably know anyone else. anytime want go lunch check house status i 'd glad go ...
308790,"Absolutely. \n\n\nFrom: Sheila Tweed@ECT on 05/15/2001 06:02 PM\nTo: Kay Mann/Corp/Enron@ENRON\ncc: \n\nSubject: Re: Override letter \n\nGood point! Can Peter start to draft an override letter?\n\n\n\n\tKay Mann@ENRON\n\t05/15/2001 05:55 PM\n\t\t \n\t\t To: pthompson@akllp.com\n\t\t cc: Sheila Tweed/HOU/ECT@ECT, Roseann Engeldorf/Enron@EnronXGate, Scott \nDieball/ENRON_DEVELOPMENT@ENRON_DEVELOPMENt, John G \nRigby/ENRON_DEVELOPMENT@ENRON_DEVELOPMENT\n\t\t Subject: Override letter\n\nAs a reminder to all of us, we will need a form override letter to go with \nthe form turbine contract. \n\nKay\n\n\n\n",absolutely. from sheila tweed ect 05 15 2001 06 02 pm to kay mann corp enron enron cc subject re override letter good point ! can peter start draft override letter ? kay mann enron 05 15 2001 05 55 pm to pthompson akllp.com cc sheila tweed hou ect ect roseann engeldorf enron enronxgate scott dieball enron_development enron_development john g rigby enron_development enron_development subject override letter as reminder u need form override letter go form turbine contract. kay
82383,"Christine:\n\nMy apologies. My schedule melted down after we talked on Monday. Here's \nwhere folks came out. There's some concern about size. We're supposed to be \nno larger than 3, but we lobbied Aceves and he apparently Ok'd our \n""oversized"" group. The other folks in the group--who talked to him \noriginally--are pretty sure that five will violate the rules. Folks wondered \nif there were other groups that are smaller than ours that you could hook up \nwith. Sorry about that---it's a wrinkle that I didn't think about when we \nspoke. If it gets real ugly trying to find a smaller group, let me know. \nFortunately there's not another team case due for two weeks.\n\nBest,\nJeff",christine my apology. my schedule melted talked monday. here 's folk came out. there 's concern size. we 're supposed larger 3 lobbied aceves apparently ok 'd `` oversized '' group. the folk group who talked originally are pretty sure five violate rule. folks wondered group smaller could hook with. sorry that it 's wrinkle i think spoke. if get real ugly trying find smaller group let know. fortunately there 's another team case due two week. best jeff
227299,"Vince, \nUK VAR breached the limit last week.\nUK traders asked us to review the correlations across UK gas and power as \nwell as the correlations across EFA slots.\nWe did part of the work last week.\nNow we'll update the correlations based on historical prices.\n\nTanya.\n\n\n\n\nRichard Lewis\n10/08/2000 07:31 AM\nTo: Tanya Tamarchenko/HOU/ECT@ECT\ncc: Oliver Gaylard/LON/ECT@ECT, James New/LON/ECT@ECT, Steven \nLeppard/LON/ECT@ECT, Rudy Dautel/HOU/ECT@ECT, Kirstee Hewitt/LON/ECT@ECT, \nNaveen Andrews/Corp/Enron@ENRON, David Port/Market Risk/Corp/Enron@ENRON, Ted \nMurphy/HOU/ECT@ECT, Simon Hastings/LON/ECT@ECT, Paul D'Arcy/LON/ECT@ECT, Amir \nGhodsian/LON/ECT@ECT \nSubject: Re: VaR correlation scenarios \n\nThanks Tanya, these are interesting results. I am on vacation next week, so \nhere are my current thoughts. I am contactable on my mobile if necessary.\n\nGas to power correlations\n I see your point about gas to power correlation only affecting VAR for the \ncombined gas and power portfolio, and this raises an interesting point: At a \nconservative 30% long term correlation, combined VAR is o1mm less than \npreviously expected - so how does this affect the limit breach? Strictly \nspeaking, we are still over our UK power limit, but the limit was set when we \nwere assuming no gas power correlation and therefore a higher portfolio VAR. \n\nA suggested way forward given the importance of the spread options to the UK \nGas and Power books- \ncan we allocate to the gas and power books a share of the reduction in \nportfolio VAR - ie [Reduction = Portfolio VAR - sum(Power VAR + Gas VAR)]?\n\nAlso, if I understand your mail correctly, Matrix 1 implies 55% gas power \ncorrelation is consistent with our correlation curves, and this reduces total \nVAR by o1.8mm.\n\nEFA slot correlations\nThe issue of whether our existing EFA to EFA correlation matrix is correct is \na separate issue. 
I don't understand where the Matrix 2 EFA to EFA \ncorrelations come from, but I am happy for you to run some historical \ncorrelations from the forward curves (use the first 2 years, I would \nsuggest). Our original matrix was based on historicals, but the analysis is \nworth doing again. Your matrix 2 results certainly indicate how important \nthese correlations are.\n\nClosing thoughts\nFriday's trading left us longer so I would not expect a limit breach on \nMonday. We are still reviewing the shape of the long term curve, and I'd \nlike to wait until both Simon Hastings and I are back in the office (Monday \nweek) before finalising this.\n \nRegards\n\nRichard\n \n\n\n\nTanya Tamarchenko\n06/10/2000 22:59\nTo: Oliver Gaylard/LON/ECT@ECT, Richard Lewis/LON/ECT@ECT, James \nNew/LON/ECT@ECT, Steven Leppard/LON/ECT@ECT, Rudy Dautel/HOU/ECT@ECT, Kirstee \nHewitt/LON/ECT@ECT, Naveen Andrews/Corp/Enron@ENRON, David Port/Market \nRisk/Corp/Enron@ENRON, Ted Murphy/HOU/ECT@ECT\ncc: \n\nSubject: Re: VaR correlation scenarios \n\nEverybody,\nOliver sent us the VAR number for different correlations for UK-Power \nportfolio separately from UK-Gas portfolio.\n\nFirst, if VAR is calculated accurately the correlation between Power and Gas \ncurves should not affect VAR number for Power and VAR number for Gas, only \nthe aggregate number will be affected. The changes you see are due to the \nfact that we use Monte-Carlo simulation method,\nwhich accuracy depends on the number of simulations. Even if we don't change \nthe correlations but use different realizations of random numbers,\nwe get slightly different result from the model.\n\nSo: to see the effect of using different correlations between Gas and Power \nwe should look at the aggregate number.\n\nI calculated weighted correlations based on 2 curves I got from Paul. As the \nweights along the term structure I used the product of price, position and \nvolatility for each time bucket for Gas and each of EFA slots. 
The results \nare shown below:\n\n\nInserting these numbers into the original correlation matrix produced \nnegatively definite correlation matrix, which brakes VAR engine. \nCorrelation matrix for any set of random variables is non-negative by \ndefinition, and remains non-negatively definite if calculated properly based \non any historical data.\nHere, according to our phone discussion, we started experimenting with \ncorrelations, assuming the same correlation for each EFA slot and ET Elec \nversus Gas. I am sending you the spreadsheet which summaries the results. In \naddition to the aggregate VAR numbers for the runs Oliver did, you can see \nthe VAR numbers based on correlation Matrix 1 and Matrix 2. In Matrix 1 the \ncorrelations across EFA slots are identical to these in original matrix.\nI obtained this matrix by trial and error. Matrix 2 is produces by Naveen \nusing Finger's algorithm, it differs from original matrix across EFA slots as \nwell\nas in Power versus Gas correlations and gives higher VAR than matrix 1 does. 
\n\nConcluding: we will look at the historical forward prices and try to \ncalculate historical correlations from them.\n\nTanya.\n\n\n\n\nOliver Gaylard\n10/06/2000 01:50 PM\nTo: Richard Lewis/LON/ECT@ECT, James New/LON/ECT@ECT, Steven \nLeppard/LON/ECT@ECT, Rudy Dautel/HOU/ECT@ECT, Kirstee Hewitt/LON/ECT@ECT, \nNaveen Andrews/Corp/Enron@ENRON, Tanya Tamarchenko/HOU/ECT@ECT, David \nPort/Market Risk/Corp/Enron@ENRON\ncc: \nSubject: VaR correlation scenarios\n\nThe results were as follows when changing the gas/power correlations:\n\nCorrelation VaR-UK Power book VaR- UK Gas book\n 0.0 o10.405MM o3.180MM\n 0.1 o10.134MM o3.197MM\n 0.2 o10.270MM o3.185MM\n 0.3 o10.030MM o3.245MM\n 0.4 Cholesky decomposition failed (Not positive definite)\n 0.5 Cholesky decomposition failed (Not positive definite)\n 0.6 Cholesky decomposition failed (Not positive definite)\n 0.7 Cholesky decomposition failed (Not positive definite)\n 0.8 Cholesky decomposition failed (Not positive definite)\n 0.9 Cholesky decomposition failed (Not positive definite)\n 1.0 Cholesky decomposition failed (Not positive definite)\n \nPeaks and off peaks were treated the same to avoid violating the matrix's \nintegrity. \n\nInteresting to note that for a higher correlation of 0.2 the power VaR \nincreases which is counter to intuition. This implies that we need to look \ninto how the correlations are being applied within the model. Once we can \nderive single correlations from the term structure, is the next action to \nunderstand how they are being applied and whether the model captures the P+L \nvolatility in the spread option deals.\n\nFrom 0.4 onwards the VaR calculation failed.\n\nOliver\n\n \n\n\n\n\n\n\n\n",vince uk var breached limit last week. uk trader asked u review correlation across uk gas power well correlation across efa slot. we part work last week. now we 'll update correlation based historical price. tanya. 
richard lewis 10 08 2000 07 31 am to tanya tamarchenko hou ect ect cc oliver gaylard lon ect ect james newithlon ect ect steven leppard lon ect ect rudy dautel hou ect ect kirstee hewitt lon ect ect naveen andrews corp enron enron david port market risk corp enron enron ted murphy hou ect ect simon hastings lon ect ect paul d'arcy lon ect ect amir ghodsian lon ect ect subject re var correlation scenario thanks tanya interesting result. i vacation next week current thought. i contactable mobile necessary. gas power correlation i see point gas power correlation affecting var combined gas power portfolio raise interesting point at conservative 30 percent long term correlation combined var o1mm le previously expected affect limit breach ? strictly speaking still uk power limit limit set assuming gas power correlation therefore higher portfolio var. a suggested way forward given importance spread option uk gas power books allocate gas power book share reduction portfolio var ie reduction = portfolio var sum power var + gas var ? also i understand mail correctly matrix 1 implies 55 percent gas power correlation consistent correlation curve reduces total var o1.8mm. efa slot correlation the issue whether existing efa efa correlation matrix correct separate issue. i understand matrix 2 efa efa correlation come from i happy run historical correlation forward curve use first 2 year i would suggest. our original matrix based historicals analysis worth again. your matrix 2 result certainly indicate important correlation are. closing thought friday 's trading left u longer i would expect limit breach monday. we still reviewing shape long term curve i 'd like wait simon hastings i back office monday week finalising this. 
regards richard tanya tamarchenko 06 10 2000 22 59 to oliver gaylard lon ect ect richard lewis lon ect ect james newithlon ect ect steven leppard lon ect ect rudy dautel hou ect ect kirstee hewitt lon ect ect naveen andrews corp enron enron david port market risk corp enron enron ted murphy hou ect ect cc subject re var correlation scenario everybody oliver sent u var number different correlation uk power portfolio separately uk gas portfolio. first var calculated accurately correlation power gas curve affect var number power var number gas aggregate number affected. the change see due fact use monte carlo simulation method accuracy depends number simulation. even change correlation use different realization random number get slightly different result model. so see effect using different correlation gas power look aggregate number. i calculated weighted correlation based 2 curve i got paul. as weight along term structure i used product price position volatility time bucket gas efa slot. the result shown below inserting number original correlation matrix produced negatively definite correlation matrix brake var engine. correlation matrix set random variable non negative definition remains non negatively definite calculated properly based historical data. here according phone discussion started experimenting correlation assuming correlation efa slot et elec versus gas. i sending spreadsheet summary result. in addition aggregate var number run oliver did see var number based correlation matrix 1 matrix 2. in matrix 1 correlation across efa slot identical original matrix. i obtained matrix trial error. matrix 2 produce naveen using finger 's algorithm differs original matrix across efa slot well power versus gas correlation give higher var matrix 1 doe. concluding look historical forward price try calculate historical correlation them. tanya. 
oliver gaylard 10 06 2000 01 50 pm to richard lewis lon ect ect james newithlon ect ect steven leppard lon ect ect rudy dautel hou ect ect kirstee hewitt lon ect ect naveen andrews corp enron enron tanya tamarchenko hou ect ect david port market risk corp enron enron cc subject var correlation scenario the result follows changing gas power correlation correlation var uk power book var uk gas book 0.0 o10.405mm o3.180mm 0.1 o10.134mm o3.197mm 0.2 o10.270mm o3.185mm 0.3 o10.030mm o3.245mm 0.4 cholesky decomposition failed not positive definite 0.5 cholesky decomposition failed not positive definite 0.6 cholesky decomposition failed not positive definite 0.7 cholesky decomposition failed not positive definite 0.8 cholesky decomposition failed not positive definite 0.9 cholesky decomposition failed not positive definite 1.0 cholesky decomposition failed not positive definite peaks peak treated avoid violating matrix 's integrity. interesting note higher correlation 0.2 power var increase counter intuition. this implies need look correlation applied within model. once derive single correlation term structure next action understand applied whether model capture p+l volatility spread option deal. from 0.4 onwards var calculation failed. oliver
301824,"Any problems/comments?\n---------------------- Forwarded by Kay Mann/Corp/Enron on 10/13/2000 08:43 \nAM ---------------------------\n\n\nDale Rasmussen@ECT\n10/12/2000 07:17 PM\nTo: Don Hammond/PDX/ECT@ECT, Jody Blackburn/PDX/ECT@ECT, Kay \nMann/Corp/Enron@Enron, Kathleen Clark/ENRON_DEVELOPMENT@ENRON_DEVELOPMENT\ncc: Ed Clark/PDX/ECT@ECT, Alan Larsen/PDX/ECT@ECT \n\nSubject: Change Order #5--Pleasanton Transformer\n\nA redraft of Change Order #5 to the LM 6000 contract to provide for the \npurchase of a GE transformer for the Pleasanton project is attached. \n\nPlease note that I have added some guaranty/LD provisions in Exhibit A, and \nconfirm whether these are appropriate in scope and amount.\n\nWe will attach the documents included here as separate files as Exhibits B \nand C. I understand that GE is to provide a more current copy of their \nstandard specifications for the transformer to be included in the Exhibits.\n\n\nKathleen: Kay tells me that you are the keeper of change orders. \nCongratulations!! The discussion of current amount balances, etc. in the \nattached are clearly wrong if we are up to CO#5--it was all based on this \nbeing the third CO. Can you please let me know what the right numbers should \nbe?\n\nThanks in advance for your input. I understand there is some urgency with \ngetting this out.\n\n\n",any problems comments ? forwarded kay mann corp enron 10 13 2000 08 43 am dale rasmussen ect 10 12 2000 07 17 pm to don hammond pdx ect ect jody blackburn pdx ect ect kay mann corp enron enron kathleen clark enron_development enron_development cc ed clark pdx ect ect alan larsen pdx ect ect subject change order number 5 pleasanton transformer a redraft change order number 5 lm 6000 contract provide purchase ge transformer pleasanton project attached. please note i added guaranty ld provision exhibit a confirm whether appropriate scope amount. we attach document included separate file exhibits b c. 
i understand ge provide current copy standard specification transformer included exhibits. kathleen kay tell keeper change order. congratulations ! ! the discussion current amount balance etc. attached clearly wrong co number 5 it based third co. can please let know right number be ? thanks advance input. i understand urgency getting out.


In [None]:
corpus = emails_df['processed_text'].copy()

In [None]:
tokenized_corpus = [doc.split(" ") for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)
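
One consequence of tokenizing with `split(" ")`: punctuation stays attached to words. The `processed_text` sample above contains tokens such as `week.`, which a query token `week` will not match exactly. The sketch below illustrates the effect on one sentence from that sample. `strip_punct` is a hypothetical helper, not part of `text_preprocessing` or `rank_bm25`, and whether to apply something like it depends on how the corpus should be matched.

```python
import string

def strip_punct(tokens):
    """Hypothetical helper: trim leading/trailing punctuation and drop empty tokens."""
    cleaned = (tok.strip(string.punctuation) for tok in tokens)
    return [tok for tok in cleaned if tok]

# Fragment of a processed email from the sample output above
doc = "fortunately there 's another team case due two week. best jeff"
raw_tokens = doc.split(" ")
query_tokens = "week".split(" ")

# Exact token matching, as a bag-of-words index does
matches_raw = [t for t in query_tokens if t in raw_tokens]
matches_clean = [t for t in query_tokens if t in strip_punct(raw_tokens)]
```

With the raw tokens the query finds nothing because the document holds `week.` rather than `week`. After trimming edge punctuation the term matches. If such a step is used, it has to be applied to both the corpus and the query so the two stay consistent.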

In [None]:
#query = input("Enter your search:")
query = "accounting practices"
BM25_search(query)

100%|██████████| 1/1 [00:00<00:00, 425.99it/s]
100%|██████████| 1/1 [00:00<00:00, 319.15it/s]
100%|██████████| 1/1 [00:00<00:00, 332.96it/s]

applying: nltk_process_text
applying: regex
applying: remove_blanks





Unnamed: 0,To,From,content,Relevance Scores
270526,,(judy.e.collazo@ssmb.com),"Power & Natural Gas\nMark-to-Market Brouhaha; Buy Energy Marketers\n\n* Recent press has questioned accounting practices of energy\nmarketers. Concerns center on use of ""mark-to-market"" accounting\n* Stocks down 8.7% since Monday\n* View concerns as unfounded\n* Reiterate 1H (Buy, High Risk) ratings on Enron (ENE), Dynegy (DYN)\nand 2M (Outperform, Medium Risk) on Duke Energy (DUK)\n\n <<trc71789.pdf>>\n\nRaymond C. Niles\nPower/Natural Gas Research\nSalomon Smith Barney\n(212) 816-2807\nray.niles@ssmb.com\n\ns\n\n\n\n\n - trc71789.pdf",13.264762
27408,(bob.hall@enron.com),(sally.beck@enron.com),"Here is another one...\n---------------------- Forwarded by Sally Beck/HOU/ECT on 04/11/2001 11:13 AM \n---------------------------\n\n\nMspartalis@aol.com on 04/11/2001 10:51:33 AM\nTo: sally.beck@enron.com\ncc: \nSubject: Accounting Director positions: Robert (Bob) Mickits; Big 5 Energy \nservices\n\n\nSally, \n\nThis is the second of the two emails I am sending you. ? \n\nPlease review the resume of Robert (Bob) Mickits. ?He is currently a \nconsultant with a Big 5 firm, loaded with GAS Energy experience. ?His \nqualifications are: \n* ????CPA; BBA - Accounting \n* ????E&Y \n* ????16 years progressive Accounting/Finance experience and responsibilities \n* ????Developed a business process risk model identifying process \ncharacteristics, business risks and associated best practices for the energy \ntrading and marketing industry. \n* ????Experienced and responsible for front, middle and back office \naccounting, risk management for energy trading and marketing firms. \n* ????Directed all accounting policies & procedures, commodity and \nderivative \naccounting, reviewed and provided analysis on 133 accounting issues. \n* ????Provided clients with infrastructure review and audits for global \ntrading operations. \n* ????Excellent communication skills. \n\nThanks, \nMike \nE. Michael Spartalis, CPA, CPC \nML&R Personnel Solutions \n281-782-3411 (Cell) ?? \nmspartalis@aol.com (temp)\n - Mickits_Robert_0401_rev.doc\n",12.84375
270805,(jpickard@softrax.com),(rosalee.fleming@enron.com),"Julie Pickard <JPickard@softrax.com> on 10/24/2000 05:16:08 PM\nTo: \ncc: SOFTRAX Mkting <Marketing@softrax.com>, Scott Schults \n<SSchults@Softrax.com>, Wes Reuning <WReuning@Softrax.com>, Greta Krupetsky \n<GKrupetsky@softrax.com>, ""Robert O'Connor"" <ROConnor@Softrax.com>, ""Sue \nO'Leary"" <SOLeary@Softrax.com> \nSubject: Revenue Recognition Seminar and Cocktail Reception Invitation fro m \nSOFTRAX\n\n\n If you wish to be removed from this mailing list - please reply and\ntype the word ""remove"" in the subject line.\n\n SOFTRAX Corporation is pleased to invite you to a Financial\nRoundtable discussion on Revenue Recognition. This seminar is free of\ncharge.\n\n Seats are limited, so if you are interested, please contact Julie\nPickard (jpickard@softrax.com) to indicate that you will be attending no\nlater than Monday November 6, 2000. We look forward to seeing you! Please\ninclude your full name, company name, title, email address and phone number.\n\n\n Issues in Accounting for Software and Internet Activities\n Sponsored by: SOFTRAX Corporation\n\n\n Presenter: Ashwinpaul C. Sondhi, PhD\n When: Wednesday, November 15, 2000; 3:00\nPM-5:30 PM\n Cocktail Reception\nto follow 5:30 PM - 7:00 PM\n Where: Omni Austin Hotel, 700 San\nJacinto at 8th Street, Austin, TX 78701. 
512.476.3700\n Seminar is held in\nthe Congress Room\n Reception is held in\nthe Chambers Room\n\n Who Should Attend: Controllers, CFO's, CPA's, VC's, and those\ncharged with any aspect of financial reporting for companies\n\n RSVP: Jpickard@softrax.com (by Monday\nNovember 6, 2000)\n\n\n In this discussion you will learn the latest about:\n\n* EITF 00-3: Application of AICPA SOP 97-2 to Arrangements that\nInclude the Right to Use Software stored on Another Entity's Hardware.\n* EITF 99-17: Accounting for Advertising Barter Transactions\n* EITF 00-2: Accounting for Web Site Development Costs\n* Revenue recognition\n* Recognition of ad revenue with ""hits""\n* ""impression"" guarantees\n* Transparency of deferral\n* Reporting for barter transactions\n* Classification of expenses between operating\n* categories and segment reporting\n* AICPA Technical Practice Aid on rebates and heavily\n* discounted introductory offers, plus models to determine how much\nthis\n* practice really costs\n* Deferred revenue and sales growth\n* Handling cost of service outages\n* An update on pooling\n\n\n\n About the Speaker:\n Ashwinpaul (Tony) C. Sondhi, PhD received his PhD in Accounting and\nEconomics/Management Science in 1985 from New York University. His research\nhas been published in several accounting and finance journals.\n\n Mr. Sondhi is co-author of:\n * The Analysis of Financial Statements, 1998\n * Impairments and Write-offs of Long-Lived Assets\n * CFA Readings in Financial Statement Analysis\n\n He has also edited:\n * Credit Analysis of Nontraditional Debt Securities\n * Off-Balance Sheet Financing Techniques\n\n Sondhi is a member of the Financial Accounting Policy Committee of\nthe AIMR and has served on the committee of the AICPA, the FASB, and the\nInternational Accounting Standards Committee. He was an advisor to the FASB\non its project comparing U.S. and International Financial Reporting\nStandards. 
He taught at New York University, Columbia University and at\nGeorgetown. He is currently a Visiting Professor at Stockholm University,\nSweden and Copenhagen Business School, Denmark.\n Sondhi serves on the board of directors of two mutual funds and is\nan advisor to several US and Foreign companies. His consulting activities\ninclude valuation, comparative analysis of financing and capital structure\nalternatives, creation and operation of finance, securitization,\nintellectual property, and investment subsidiaries, analysis of covenants\nand development of debt agreements.\n\n\n\n\n\n\n\n\n",11.825248
273672,(jpickard@softrax.com),(rosalee.fleming@enron.com),"Julie Pickard <JPickard@softrax.com> on 10/24/2000 05:16:08 PM\nTo: \ncc: SOFTRAX Mkting <Marketing@softrax.com>, Scott Schults \n<SSchults@Softrax.com>, Wes Reuning <WReuning@Softrax.com>, Greta Krupetsky \n<GKrupetsky@softrax.com>, ""Robert O'Connor"" <ROConnor@Softrax.com>, ""Sue \nO'Leary"" <SOLeary@Softrax.com> \nSubject: Revenue Recognition Seminar and Cocktail Reception Invitation fro m \nSOFTRAX\n\n\n If you wish to be removed from this mailing list - please reply and\ntype the word ""remove"" in the subject line.\n\n SOFTRAX Corporation is pleased to invite you to a Financial\nRoundtable discussion on Revenue Recognition. This seminar is free of\ncharge.\n\n Seats are limited, so if you are interested, please contact Julie\nPickard (jpickard@softrax.com) to indicate that you will be attending no\nlater than Monday November 6, 2000. We look forward to seeing you! Please\ninclude your full name, company name, title, email address and phone number.\n\n\n Issues in Accounting for Software and Internet Activities\n Sponsored by: SOFTRAX Corporation\n\n\n Presenter: Ashwinpaul C. Sondhi, PhD\n When: Wednesday, November 15, 2000; 3:00\nPM-5:30 PM\n Cocktail Reception\nto follow 5:30 PM - 7:00 PM\n Where: Omni Austin Hotel, 700 San\nJacinto at 8th Street, Austin, TX 78701. 
512.476.3700\n Seminar is held in\nthe Congress Room\n Reception is held in\nthe Chambers Room\n\n Who Should Attend: Controllers, CFO's, CPA's, VC's, and those\ncharged with any aspect of financial reporting for companies\n\n RSVP: Jpickard@softrax.com (by Monday\nNovember 6, 2000)\n\n\n In this discussion you will learn the latest about:\n\n* EITF 00-3: Application of AICPA SOP 97-2 to Arrangements that\nInclude the Right to Use Software stored on Another Entity's Hardware.\n* EITF 99-17: Accounting for Advertising Barter Transactions\n* EITF 00-2: Accounting for Web Site Development Costs\n* Revenue recognition\n* Recognition of ad revenue with ""hits""\n* ""impression"" guarantees\n* Transparency of deferral\n* Reporting for barter transactions\n* Classification of expenses between operating\n* categories and segment reporting\n* AICPA Technical Practice Aid on rebates and heavily\n* discounted introductory offers, plus models to determine how much\nthis\n* practice really costs\n* Deferred revenue and sales growth\n* Handling cost of service outages\n* An update on pooling\n\n\n\n About the Speaker:\n Ashwinpaul (Tony) C. Sondhi, PhD received his PhD in Accounting and\nEconomics/Management Science in 1985 from New York University. His research\nhas been published in several accounting and finance journals.\n\n Mr. Sondhi is co-author of:\n * The Analysis of Financial Statements, 1998\n * Impairments and Write-offs of Long-Lived Assets\n * CFA Readings in Financial Statement Analysis\n\n He has also edited:\n * Credit Analysis of Nontraditional Debt Securities\n * Off-Balance Sheet Financing Techniques\n\n Sondhi is a member of the Financial Accounting Policy Committee of\nthe AIMR and has served on the committee of the AICPA, the FASB, and the\nInternational Accounting Standards Committee. He was an advisor to the FASB\non its project comparing U.S. and International Financial Reporting\nStandards. 
He taught at New York University, Columbia University and at\nGeorgetown. He is currently a Visiting Professor at Stockholm University,\nSweden and Copenhagen Business School, Denmark.\n Sondhi serves on the board of directors of two mutual funds and is\nan advisor to several US and Foreign companies. His consulting activities\ninclude valuation, comparative analysis of financing and capital structure\nalternatives, creation and operation of finance, securitization,\nintellectual property, and investment subsidiaries, analysis of covenants\nand development of debt agreements.\n\n\n\n\n\n\n\n\n",11.825248
357309,(gregory.schockling@enron.com),(joe.parks@enron.com),"T\nhe first detailed allegations that Enron\nabused mark-to-market accounting to\nhide wholesale trading losses have\nemerged in congressional testimony pro-vided\nby a professor and former deriva-tives\ntrader who reviewed the Enron case\nfor Senate Governmental Affairs Com-mittee.\nThose charges, along with allega-tions\nthat have surfaced that mark-to-market\naccounting was misused by Enron\nEnergy Services, the company's retail\nenergy and services unit, are expected to\nsupport the push by accounting rulemak-ers\nfor much more disclosure on mark-to-market\naccounting practices and the\nimpact of mark-to-market accounting on\nearnings trading firms' earnings.\nIn testimony Thursday before the\nSenate Governmental Affairs Committee\nlast week, Frank Partnoy of the Universi-ty\nof San Diego Law School, said it\nappears that Enron traders mismarked\nforward curves and manipulated the\namount of profits and losses that would\nbe reported in financial statements.\n""In a nutshell, it appears that some\nEnron employees used dummy accounts\nand rigged valuation methodologies to cre-ate\nfalse profit-and-loss entries for the deriv-atives\nEnron traded,"" said Partnoy, who\nreviewed the case over several weeks at the\ncommittee's request. ""It appears that Enron\ntraders selectively mismarked their forward\ncurves, typically in order to hide losses.""\n""Traders are compensated based on\ntheir profits,"" he observed, ""so if a trader\ncan hide losses by mismarking forward\ncurves, he or she is likely to receive a\nlarger bonus.""\nPartnoy said mismarking forward\ncurves was not the only problem.\n""For each trade,"" he testified, ""a\ntrader would report either a profit or loss,\ntypically in spreadsheet format. These\nprofit-and-loss reports were designed to\nreflect economic reality. Frequently they\ndid not. 
Instead of recording the entire\nprofit for a trade in one column, some\ntraders reportedly split the profit from a\ntrade into two columns. The first column\nreflected the portion of the actual profits\nthe trader intended to add to Enron's cur-rent\nfinancial statements. The second col-umn,\nironically labeled the 'prudency'\nreserve, included the remainder.""\nPartnoy told the committee that in\nhis estimation, ""Enron's 'prudency'\nreserves did not depict economic reality,\nnor could they have been intended to do\nso. Instead, 'prudency' was a slush fund\nthat could be used to smooth out profits\nand losses over time. The portion of prof-its\nrecorded as 'prudency' could be used\nto offset any future losses,"" he said.\nHe recommended that investigators\nquestion Enron employees who were\ninvolved in these transactions ""to get a sense\nof whether my summaries are complete.""\nAlleged Enron abuses of mark-to-market accounting emerge",11.115811
390869,"(bill.williams@enron.com, cara.semperger@enron.com, holden.salisbury@enron.com)",(diana.scholtes@enron.com),"FYI\n\n-----Original Message-----\nFrom: Grow, Lisa [mailto:LGrow@idahopower.com]\nSent: Monday, November 19, 2001 4:21 PM\nTo: Interchange Scheduling & Accounting Subcommittee (ISAS)\nSubject: IPC's OASIS and E-tag Servers on PPT\n\n\n\n\nTo our customers and counterparts:\n\nAs of Sunday, November 18 our OASIS and E-tag servers are running on PPT.\nThe only business practices we are changing now, is that our OASIS\ntransmission products are all based upon PPT. Our scheduling system/e-tag\nserver is also running on PPT. However, we can and will continue to accept\ne-tags in any timezone, as long as the profile has a valid OASIS\nreservation. We will also continue to check out with our counterparties in\nthe time zone that we have historically used with each neighboring control\narea. \n\nThe cost? $4.79 for a tall, skinny latte' and a plain bagel with light\ncream cheese for our programmer.\n\nIf you have any questions, please give me a call at 208-388-2243.\n\nThanks,\nLisa\n\n",10.611128
256224,"(bill.rust@enron.com, david.forster@enron.com, j..sturm@enron.com, lloyd.will@enron.com, terri.clynes@enron.com, oscar.dalton@enron.com, doug.sewell@enron.com, dustin.collins@enron.com, larry.valderrama@enron.com, chris.dorland@enron.com, greg.trefz@enron.com, jeff.king@enron.com, m..presto@enron.com, matt.lorenz@enron.com, guy.sharfman@enron.com, john.kinser@enron.com, juan.padron@enron.com, william.abler@enron.com, don.baughman@enron.com, maria.valdes@enron.com, beau.ratliff@enron.com, jeff.merola@enron.com, d..baughman@enron.com, juan.hernandez@enron.com)",(andy.rodriquez@enron.com),"Good news. It sounds like MISO will offer a way to handle losses financially. Based on their previously filed tariff, it appeared that they would not, but it seems they have reconsidered and recently filed a new Attachment M that details how they will handle losses financially. I have not yet seen the new Attachment M, so it may be a complex or costly process, but hopefully, it will be fine.\n\nNow, for the Important Question. MISO does not believe they will be able to process third-party loss schedules (either INT or EXT, in Tag parlance) on Dec 15. I don't think this is an issue for us (I expect we will typically be either accounting for losses financially or through in-kind schedules (dropping off losses along the path).\n\nIf this is not correct, and we do sometimes schedule in losses under separate transactions, please let me know ASAP. Also let me know how prevalent that practice is. MISO is trying to figure out how urgent the need is to accommodate that on Dec 15. While they don't believe they will be able to meet that deadline, they want to know how many resources they should allocate to addressing the problem.\n\nThanks,\n\nAndy Rodriquez\nRegulatory Affairs - Enron Corp.\nandy.rodriquez@enron.com\n713-345-3771",10.23932
162885,"(mary.botello@enron.com, tracy.geaccone@enron.com, jerry.peters@enron.com, dana.jones@enron.com, rod.hayslett@enron.com, ben.humann@enron.com, morris.brassfield@enron.com, steve.gilbert@enron.com, john.keiser@enron.com)",(james.saunders@enron.com),"As mentioned at the ETS FAA '02 Budget mtg. today, below are new acctg. rules we need to be aware of:\n\nSFAS142 - Goodwill and Other Intangibles\nEffective in '02\nMajor point: Goodwill and intangible amortization discontinued\n\nSFAS143 - Asset Retirement Obligations (ARO)\nEffective in '03\nMajor point: Need to recognize obligations (set up a liability) related to planned asset retirements\n<we may have an ""opportunity"" here>\n\nExposure Draft - PP&E Accounting\nDeadline for comments: 10/15/01\nProposed Effective date: '03\nMajor points:\n>PPE must be accounted for according to a standardized model, which includes specifically identified types of costs and stages of construction\n>Overhaul costs> expensed as incurred\n>GA and Overheads> expensed as incurred\n>Composite depr> disallowed\n>Cost of removal> expensed as incurred\nNote: ""general"" regulatory acctg practices that are deemed contrary to GAAP will most likely NOT be allowed, but specific Ordered acctg. MAY (via FAS71 on regulated acctg)\nNote: John Cobb is ""polling"" the industry on this topic",10.038327
8747,"(murray.o'neil@enron.com, jeff.richter@enron.com, valarie.sabo@enron.com, robert.badeer@enron.com, tim.belden@enron.com, carla.hoffman@enron.com, chris.stokley@enron.com)",(christian.yoder@enron.com),"Legal has been assessing the risks of doing block forward trades as financial \nand for now, subject to future changes that may be required as discussions \nwith the CAPX legal experts continue, we can state the basic rules as \nfollows:\n\nIt is okay to do up to 50% of our Block Forward business as financial.\nIt is very important to monitor this 50% level very closely and we should not \nexceed it. We should not rely on the PX to tell us what the level is. We \nshould confirm it ourselves. A skeptical regulator, looking at PX records \nshould never be able to see that we ever did more than half our block forward \nbusiness as financial. \nOne of the legal rules that we must comply with in this area is that there \nmust be a bona fide commercial reason for going financial. This \ndumbfoundingly simple sounding rule is important. Somehow, when we \ncommunicate our decision to the PX to go from physical to financial, we \nshould give our reason. I'm not sure whether our decision is expressed by \nphone, or electronically, but in either case, the person making the change \nwith the PX should get in an expression something like this: ""we would like \nto change these trades to financial because we think the elimination of \nphysical risk will benefit us commercially."" Please be patient with this \nself serving requirement and do it. \n\n I have not worked out with any of the other back office groups how this new \npractice will be handled. Obviously any changes it will require in \nscheduling, settlements and accounting need to be dealt with too. Please call \nme with any questions. ----cgy \n",9.591184
69463,(jeff.dasovich@enron.com),(catherine.mckalip-thompson@enron.com),"FYI -=20\n\n -----Original Message-----\nFrom: =09Iannarone, Lauren =20\nSent:=09Friday, November 02, 2001 9:46 AM\nTo:=09McKalip-Thompson, Catherine\nSubject:=09anderson\n\n06/20/2001 The Globe and Mail Metro B13 ""All material Copyright (c) Bell Gl=\nobemedia Publishing Inc. and its licensors. All rights reserved."" WASHINGTO=\nN -- The U.S. Securities and Exchange Commission, in one of the first fraud=\n cases ever filed against a Big Five accounting firm, fined Arthur Andersen=\n LLP and three partners more than $7-million (U.S.) in connection with audi=\nts of Waste Management Inc.'s annual financial results.=20\nIn a complaint filed in U.S. District Court here, the SEC alleged that Arth=\nur Andersen and its partners allowed Waste Management to continue for sever=\nal years a series of improper accounting practices that inflated the garbag=\ne-hauling concern's earnings. The complaint alleges fraud on the part of th=\nree audit partners assigned to the Waste Management account: Robert Allgyer=\n, 56 years old, of Lake Forest, Ill., who has retired; Walter Cercavschi, 4=\n5, of Harwood Heights, Ill.; and Edward Maier, 54, of Chicago.=20\nArthur Anderson agreed to pay the fine to settle the case, but the firm and=\n the auditors don't admit or deny the allegations. Among the audit partners=\n, Mr. Allgyer agreed to pay $50,000, Mr. Maier, $40,000, and Mr. Cercavschi=\n, $30,000. Arthur Andersen agreed to pay $7-million. As part of a campaign=\n to curb what it sees as growing accounting fraud, the SEC has broadened a =\nnumber of investigations of companies' earnings reports to include audit wo=\nrk done by outside accountants. 
In bringing the first fraud case against a=\nny audit firm since 1985, SEC enforcement chief Richard Walker said the act=\nion ""helps to underscore the importance of auditors as gatekeepers to our c=\napital markets and shows the SEC won't shy away from making auditors comply=\n with their responsibilities."" Under a related administrative proceeding f=\niled yesterday alleging professional misconduct, the SEC also censured Arth=\nur Andersen, the three audit partners and a fourth partner, Robert Kutsenda=\n, who was the regional audit director at the time of the alleged violations=\n. As part of the censure, the four audit partners are barred from doing acc=\nounting work for public companies for a period of one to five years. ""This=\n settlement allows the firm and its partners to close a very difficult chap=\nter and move on,"" said Terry Hatchett, Arthur Andersen's managing partner-N=\north America. ""The SEC has not questioned the underlying quality or effecti=\nveness of our overall audit methodology, nor has the SEC limited our abilit=\ny to conduct audits for other public companies."" An attorney for Mr. Allgy=\ner declined to comment. Attorneys for Mr. Maier, Mr. Cercavschi and Mr. Kut=\nsenda didn't return phone calls. Waste Management said it ""has co-operated=\n fully with the SEC in the investigation, and does not believe that the SEC=\n will seek any action against Waste Management in connection with the event=\ns detailed in the Arthur Andersen settlement."" The Waste Management accoun=\nting scandal stands out for it size and breadth. After a board-led probe tu=\nrned up years of questionable accounting, the company took a $3.5-billion c=\nharge in 1998, and since then the trash hauler and Arthur Andersen agreed t=\no pay $220-million to settle shareholder litigation in the matter. The com=\npany admitted it had overstated its pretax earnings by $1.43-billion in 199=\n2 to 1996 -- the biggest restatement in SEC history. 
Neither Waste Manageme=\nnt nor any of its employees have been disciplined by the SEC. The SEC said =\nyesterday that the investigation continues. Within the SEC, the Arthur And=\nerson investigation became a centrepiece of the commission's aggressive cam=\npaign to demonstrate that conflicts of interest can be caused by consulting=\n and other non-auditing services that numerous accounting firms now routine=\nly offer audit clients. =09\n",9.220367


## Summary
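Each search method presented ranks results by a different notion of relevance: BM25 is keyword-driven, while Universal Sentence Encoder (USE) captures semantic similarity. The results of both are sensitive to factors such as document length and text preprocessing.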
Future work:
1. Optimize the code, especially the USE-related search.
2. Normalize the search relevance scores and create an algorithm to combine search results from different models.
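
As a starting point for the second item, here is a minimal sketch of one possible approach: min-max normalize each model's scores to the range [0, 1], then rank documents by a weighted sum. This is not implemented in the notebook. The function names (`min_max_normalize`, `combine_scores`), the `bm25_weight` parameter, and the sample scores are all hypothetical. Other fusion schemes, such as rank-based ones, may work better in practice.

```python
from typing import Dict, Hashable, List, Tuple


def min_max_normalize(scores: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Rescale raw scores to [0, 1]; if all scores are equal, map them to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def combine_scores(
    bm25_scores: Dict[Hashable, float],
    use_scores: Dict[Hashable, float],
    bm25_weight: float = 0.5,
) -> List[Tuple[Hashable, float]]:
    """Weighted sum of normalized scores, sorted from most to least relevant.

    A document returned by only one model contributes 0 for the other model.
    """
    bm25_norm = min_max_normalize(bm25_scores)
    use_norm = min_max_normalize(use_scores)
    doc_ids = set(bm25_norm) | set(use_norm)
    combined = {
        doc_id: bm25_weight * bm25_norm.get(doc_id, 0.0)
        + (1.0 - bm25_weight) * use_norm.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores keyed by document id (illustrative values, not real results)
bm25_scores = {"a": 12.0, "b": 8.0, "c": 4.0}
use_scores = {"a": 0.25, "b": 0.75, "d": 0.5}
ranked = combine_scores(bm25_scores, use_scores, bm25_weight=0.5)
```

In this sketch, `bm25_weight` controls how much the keyword match counts relative to semantic similarity. It is an assumed tuning knob that would need to be chosen empirically for a given dataset.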