# CS 276 Programming Assignment 2: Spelling Corrector

## I. Overview

In this assignment, we will build a probabilistic spelling corrector to automatically correct errors in queries. More formally, given a (possibly corrupt) raw query $R$, our goal is to find the intended query $Q$ which maximizes the probability $P(Q\mid R)$. That is, we want to guess the query which the user probably meant to submit. By Bayes' Theorem we have
$$
    P(Q\mid R) = \frac{P(R\mid Q)P(Q)}{P(R)}\propto P(R\mid Q)P(Q).
$$
Since our goal is to find the value of $Q$ which maximizes $P(Q\mid R)$, this shows it is sufficient to maximize $P(R\mid Q)P(Q)$. With the above formulation in mind, we will build a probabilistic spelling corrector consisting of 4 parts:
  1. **Language Model.**
      Estimates the prior distribution of unigrams and bigrams, allowing us to estimate $P(Q)$. We will use maximum-likelihood estimation, which counts the occurrences of token unigrams and bigrams in the training corpus in order to determine their prior probabilities.
  2. **Edit Probability Model.**
      Estimates the likelihood of errors that may occur in a query, which allows us to estimate $P(R\mid Q)$. In particular, this component estimates the probability of characters being mistakenly deleted, inserted, substituted, or transposed in a query term.
  3. **Candidate Generator.**
      Takes a raw query $R$ submitted by the user, and generates candidates for $Q$.
  4. **Candidate Scorer.**
      Combines (1), (2), and (3) to compute $Q^{*} = \arg\max_{Q}P(Q\mid R)$. That is, for each $Q$ generated by the candidate generator, the scorer uses the language model to estimate $P(Q)$ and uses the edit probability model to estimate $P(R\mid Q)$, and finally chooses $Q$ which maximizes $P(Q)P(R\mid Q)$.

## II. Assignment Details

The assignment is due at **4:00 PM PST on Tuesday, May 7th, 2019**. We have split the assignment up into the following parts:
  1. [Task 1: Spelling Correction with Uniform Edit Costs](#uniform): **55%** of your total grade for this assignment depends on a correctly implemented solution for task 1. Your solution will be evaluated on a hidden test set, and full credit will be given to models that are within 1% of the staff implementation's test-set accuracy or higher. We do not publish the test set queries or our accuracy on the test set. However, as a guideline for performance, the staff implementation with uniform edit probability model gets **82.42% on the dev set.** We will give partial credit on a non-linear scale (which disproportionately favors models that are closer to our threshold for full credit, as an encouragement to squeeze out more performance improvements).
  2. [Task 2: Spelling Correction with Empirical Edit Costs](#empirical): **25%** of your total grade is based on your implementation of task 2. Full credit will be granted for accuracy levels within 1% of the staff implementation's test-set accuracy or higher. Again, we do not publish our test set accuracy, but the staff implementation with empirical edit probability model gets **87.91% on the dev set.** As with Task 1, we will give partial for lower accuracy levels, we will give partial credit on a non-linear scale, with credit accruing more rapidly as your solution gets closer to the target.
  3. [Written Report](#written): **20%** of your grade is based on the 1-2 page report that you will submit through Gradescope. See [Section VI](#written) for instructions and grading breakdown.
  4. [Extra Credit (Optional)](#extra): **Up to 10%** extra credit will be awarded for implementing extensions, with an explanation in the report. It is not necessary for the extensions to radically improve accuracy to get credit. As described in [Section VII](#extra), you can also get a small amount of extra credit if your system is a top performer in terms of accuracy or running time.

The submission procedure is the same as in PA1, but we repeat the instructions here for your reference:
  - This assignment should be done in teams of two or individually. Assignments are graded the same for one and two person teams.
  - The notebook will automatically generate Python files in `submission` folder. To submit your assignment, **upload the Python files to the PA2-code assignment on Gradescope.** Note that you need to upload all the individual files in the `submission` folder without zipping it.
  - While solving the assignment, do **NOT** change class and method names, otherwise the autograder tests will fail.
  - You'll also have to **upload a PDF version of the notebook (which would be primarily used to grade your report section of the notebook) to PA2-PDF assignment on Gradescope.** Note that directly converting the PDF truncates code cells. To get a usable PDF version, first click on `File > Print Preview`, which will open in a new tab, then print to PDF using your browser's print functionality.
  - After uploading the PDF make sure you tag all the relevant pages to each question. We reserve the right to penalize for mistagged submissions.
  - If you are solving the assignment in a team of two, add the other student as a group member after submitting the assignment. Do **NOT** submit the same assignment twice.

#### A Note on Numerical Stability

Many of the probabilities we will encounter in this assignment are very small. When we multiply many small numbers together, there is a risk of [underﬂow](https://en.wikipedia.org/wiki/Arithmetic_underflow). Therefore, it is common practice to perform this type of probability calculation in log space. Recall that:
  1. The log function is monotonically increasing, therefore $\arg\max p = \arg\max\log p$.
  2. We have $\log(pq) = \log p + \log q$, and by extension $\log\left(\prod_{i} p_i\right) = \sum_{i}\log p_i$.

As a result, if we want to maximize $P(\textbf{x}) = P(x_1)P(x_2)\cdots P(x_n)$, we can equivalently maximize $\log P(\textbf{x}) = \log P(x_1) + \log P(x_2) + \cdots + \log P(x_n)$. **For numerical stability, we recommend that you use this log-space formulation throughout the assignment.**

<a id="dataset"></a>
## III. Dataset

The dataset you will be working with for this assignment is available as a zip file at [this link](http://web.stanford.edu/class/cs276/pa/pa2-data.zip). The unzipped data directory will contain the following subdirectories:
  - **Language Modeling Morpus (`pa2-data/corpus/`).** 99,904 documents crawled from the stanford.edu domain. The corpus is organized in a block structure found at `pa2-data/corpus/`, where you'll find 10 files. Each line in a file represents the text of a single document. You will use the tokens in these documents to build a language model.
  - **Query Training Set (`pa2-data/training_set/`).** 819,722 pairs of misspelled queries and their corresponding corrected versions, with each pair separated by an edit distance of at most one. The two queries are tab-separated in the file `pa2-data/training_set/edit1s.txt`. You will use this data to build a probability model for the "noisy channel" of spelling errors.
  - **Query Dev Set (`pa2-data/dev_set`).** 455 pairs of misspelled and corrected queries, which you will use to measure the performance of your model.  There are three files in `pa2-data/dev_set/`: the (possibly) misspelled queries are in `queries.txt`, corrected versions are in `gold.txt`, and Google's suggested spelling corrections are in `google.txt`.
  
Run the following code blocks to import packages, download, and unzip the data.

In [1]:
%reload_ext autograding_magics

In [2]:
# %%tee submission/imports.py

# Import modules
import math
import os
import urllib.request
import zipfile
from collections import Counter
from tqdm import tqdm
# from numpy import argmax

<a id='uniform'></a>
## IV. Task 1: Spelling Correction with Uniform Edit Costs (55%)

### IV.1. Language Model

We will now build a language model to estimate $P(Q)$ from the training corpus. We will treat $Q$ as a sequence of terms $(w_1, \ldots, w_n)$ whose probability is computed as
$$
P(w_1, \ldots, w_n) = P(w_1)P(w_2\mid w_1)\cdots P(w_n\mid w_{n-1}),
$$
where $P(w_1)$ is the unigram probability of term $w_1$, and $P(w_{i}\mid w_{i-1})$ is the bigram probability of $(w_{i-1}, w_i)$ for $i \in \{2, \ldots, n\}$.

#### IV.1.1. Calculating Unigram and Bigram Probabilities

Our language model will use the maximum likelihood estimates (MLE) for both probabilities, which turn out to be their observed frequencies:
$$
\begin{align*}
    P_{\text{MLE}}(w_i) & = \frac{\texttt{count}(w_i)}{T},
    &
    P_{\text{MLE}}(w_i\mid w_{i-1}) & = \frac{\texttt{count}((w_{i}, w_{i-1}))}{\texttt{count}(w_{i-1})},
\end{align*}
$$
where $T$ is the total number of tokens in our corpus, and where $\texttt{count}$ simply counts occurrences of unigrams or bigrams in the corpus. In summary, computing unigram probabilities $P(w_i)$ and bigram probabilities $P(w_{i}\mid w_{i-1})$ is a simple matter of counting the unigrams and bigrams that appear throughout the corpus.

Fill out the following code block to count the unigrams and bigrams in our corpus.

In [3]:
# %%tee submission/language_model_part1.py

class LanguageModel:
    """Models prior probability of unigrams and bigrams."""

    def __init__(self, corpus_dir='pa2-data/corpus', lambda_=0.1):
        """Iterates over all whitespace-separated tokens in each file in
        `corpus_dir`, and counts the number of occurrences of each unigram and
        bigram. Also keeps track of the total number of tokens in the corpus.

        Args:
            corpus_dir (str): Path to directory containing corpus.
            lambda_ (float): Interpolation factor for smoothing by unigram-bigram
                interpolation. You only need to save `lambda_` as an attribute for now, and
                it will be used later in `LanguageModel.get_bigram_logp`. See Section
                IV.1.2. below for further explanation.
        """
        self.lambda_ = lambda_
        self.total_num_tokens = 0        # Counts total number of tokens in the corpus
        
        self.unigram_counts = {}          # Initialize dictionary to maintain unigram counts
        self.bigram_counts ={}            # Initialize dictionary to maintain bigram counts
        
        for i in range(10):
            file = corpus_dir + '/' + str(i) + '.txt'
            with open(file, 'r') as fp:
                doc = fp.read()
                doc = doc.split()
                self.total_num_tokens += len(doc)
                for tok_id in range(len(doc)):
                    try:
                        self.unigram_counts[doc[tok_id]]+=1
                    except:
                        self.unigram_counts[doc[tok_id]]=1
                    try:
                        self.bigram_counts[doc[tok_id]+ " " + doc[tok_id+1]]+=1
                    except:
                        if(tok_id!=len(doc)-1):
                            self.bigram_counts[doc[tok_id]+ " " + doc[tok_id+1]]=1

Now that we have counted the unigrams and bigrams in our corpus, we will add methods for computing query probabilities. First, however, a note about handling bigrams which never occur in our corpus:

<a id='smoothing'></a>
#### IV.1.2. Smoothing by Interpolation

The unigram probability model will also serve as our vocabulary, since we are making the assumption that our query language is derived from our document corpus. As a result, we do not need to perform [Laplace smoothing](https://en.wikipedia.org/wiki/Additive_smoothing) on our unigram probabilities, since our candidates will be drawn from this very vocabulary. However, even if we have two query terms that are both members of our query language, there is no guarantee that their corresponding *bigram* appears in our training corpus. To handle this data sparsity problem, we will *interpolate* unigram and bigram probabilities to get our ﬁnal conditional probability estimates:
$$
P(w_2\mid w_1) = \lambda P_{\text{MLE}}(w_2) + (1 - \lambda)P_{\text{MLE}}(w_2\mid w_1).
$$
Try setting $\lambda$ to a small value (say, 0.1) in the beginning, and experiment later with varying this parameter to see if you can get better correction accuracies on the development dataset. However, be careful not to overﬁt your development dataset. (You might consider reserving a small portion of your development data to tune the parameters).

Fill out the functions below to complete our `LanguageModel` class.

In [4]:
# %%tee submission/language_model_part2.py

# NOTE: Syntax on the following line just extends the `LanguageModel` class
class LanguageModel(LanguageModel):
    def get_unigram_logp(self, unigram):
        """Computes the log-probability of `unigram` under this `LanguageModel`.

        Args:
            unigram (str): Unigram for which to compute the log-probability.

        Returns:
            log_p (float): Log-probability of `unigram` under this
                `LanguageModel`.
        """
        try:
            return self.unigram_counts[unigram] / self.total_num_tokens
        except:
            return 0.000000000000000001                      

    def get_bigram_logp(self, w_1, w_2):
        """Computes the log-probability of `unigram` under this `LanguageModel`.

        Note:
            Use self.lambda_ for the unigram-bigram interpolation factor.

        Args:
            w_1 (str): First word in bigram.
            w_2 (str): Second word in bigram.

        Returns:
            log_p (float): Log-probability of `bigram` under this
                `LanguageModel`.
        """
        try: 
            return math.log(self.lambda_*self.get_unigram_logp(w_2) + (1 - self.lambda_)*(self.bigram_counts[w_1 + " " + w_2]/self.unigram_counts[w_1]), 10)

        except:
            return -18

    def get_query_logp(self, query):
        """Computes the log-probability of `query` under this `LanguageModel`.

        Args:
            query (str): Whitespace-delimited sequence of terms in the query.

        Returns:
            log_p (float): Log-probability assigned to the query under this
                `LanguageModel`.
        """
        query = query.split()
        
        # Implementing the P(w1,...wn) formula
        probability_product = 0
        for i in range(1,len(query)):
            probability_product = probability_product + self.get_bigram_logp(query[i - 1], query[i])
        probability_product = probability_product + math.log(self.get_unigram_logp(query[0]), 10)            # log(unigram) because get_unigram_logp() does not return log
        return probability_product

In [5]:
# Make sure your implementation passes the following sanity checks
# Note: Constructing the language model could take 30 seconds or longer
# We suggest using `tqdm` to track progress in your `LanguageModel.__init__` function.
lm = LanguageModel()

assert len(lm.unigram_counts) == 347071, 'Invalid num. unigrams: {}'.format(len(lm.unigram_counts))
assert len(lm.bigram_counts) == 4497257, 'Invalid num. bigrams: {}'.format(len(lm.bigram_counts))
assert lm.total_num_tokens == 25498340, 'Invalid num. tokens: {}'.format(lm.total_num_tokens)

# Test a reasonable query with and without typos (you should try your own)!
query_wo_typo = "stanford university"
query_w_typo = "stanfrod universit"

p_wo_typo = math.exp(lm.get_query_logp(query_wo_typo))                           # WHY exp???
p_w_typo = math.exp(lm.get_query_logp(query_w_typo))
print('P("{}") == {}'.format(query_wo_typo, p_wo_typo))
print('P("{}") == {}'.format(query_w_typo, p_w_typo))
if p_wo_typo <= p_w_typo:
    print('\nAre you sure "{}" should be assigned higher probability than "{}"?'
          .format(query_w_typo, query_wo_typo))
print('All tests passed!')

P("stanford university") == 0.08400910983345951
P("stanfrod universit") == 1.2497632217906412e-11
All tests passed!


### IV.2. Edit Probability Model

The edit probability model attempts to estimate $P(R\mid Q)$. That is, for a fixed candidate query $Q$, the edit probability model estimates the probability that a (possibly corrupt) raw query $R$ was submitted. We quantify the distance between the candidate query $Q$ and the actual input $R$ using the [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance). In Damerau-Levenshtein distance, the possible edits are **insertion**, **deletion**, **substitution**, and **transposition**, each involving single characters as operands. We have provided a base class for `EditCostModel`s below.

In [6]:
# %%tee submission/base_edit_probability_model.py

class BaseEditProbabilityModel:
    def get_edit_logp(self, edited, original):
        """Gets the log-probability of editing `original` to arrive at `edited`.
        The `original` and `edited` arguments are both single terms that are at
        most one edit apart.
        
        Note: The order of the arguments is chosen so that it reads like an
        assignment expression:
            > edited := EDIT_FUNCTION(original)
        or, alternatively, you can think of it as a (unnormalized) conditional probability:
            > log P(edited | original)

        Args:
            edited (str): Edited term.
            original (str): Original term.

        Returns:
            logp (float): Log-probability of `edited` given `original`
                under this `EditProbabilityModel`.
                
        """
        raise NotImplementedError  # Force subclass to implement this method

**It is important to understand that `get_edit_logp` will be called with `original` and `edited` each being single terms that are at most 1 edit apart.** Moreover, its outputs need not be normalized probabilities that sum to 1 over all possible edits to `original` (you can think of the return value more as a "likelihood score" than a true probability). We provide an example usage below for clarity:
```python
epm = EditProbabilityModelSubclass(...)  # You will define such a subclass later
original = 'stanford'
edited = 'stanfrod'                      # Edited by transposing 'o' and 'r'
score = epm.get_edit_logp(edited, original)
```

#### IV.2.1. Uniform-Cost Edit Model

As a first pass, we will implement a *uniform-cost edit model.* This model simplifies the computation of the edit probability by assuming that every individual edit in the Damerau-Levenshtein distance has the same probability. You should try a range of values for your uniform edit probability, but in the beginning 0.01 - 0.10 is appropriate. One important thing to remember in building your model is that the user's input query $R$ may indeed be the right one in a majority of cases (*i.e.,* $R = Q$). Thus we typically choose a high ﬁxed probability for `edited == original`; a reasonable range is 0.90 - 0.95.

The edit probability model that you construct here will be used when you rank candidates for query corrections. The candidate generator (described in the next section) will make one edit at a time, and it will call the edit probability model each time it makes a single edit to a term, summing log-probabilities for multi-edit changes. Therefore, all you need to do in this part is to calculate the probability of `edited` given that it is **at most one edit from `original`.** This means that `get_edit_logp` will be very simple in this case.

Fill out the following class to implement a uniform-cost edit model.

In [7]:
# %%tee submission/uniform_edit_probability_model.py

class UniformEditProbabilityModel(BaseEditProbabilityModel):
    def __init__(self, edit_prob=0.05):
        """
        Args:
            edit_prob (float): Probability of a single edit occurring, where
                an edit is an insertion, deletion, substitution, or transposition,
                as defined by the Damerau-Levenshtein distance.
        """
        self.edit_prob = edit_prob

    def get_edit_logp(self, edited, original):
        """Gets the log-probability of editing `original` to arrive at `edited`.
        The `original` and `edited` arguments are both single terms that are at
        most one edit apart.
        
        Note: The order of the arguments is chosen so that it reads like an
        assignment expression:
            > edited := EDIT_FUNCTION(original)
        or, alternatively, you can think of it as a (unnormalized) conditional probability:
            > log P(edited | original)

        Args:
            edited (str): Edited term.
            original (str): Original term.

        Returns:
            logp (float): Log-probability of `edited` given `original`
                under this `EditProbabilityModel`.
        """
        prob = 0.0
        if edited == original:
            prob = 1 - 0.4 # Fixed probablity
        else:
            prob = 0.4                                  
        return math.log(prob, 10)

In [8]:
EDIT_PROB = 0.4
epm = UniformEditProbabilityModel(edit_prob=EDIT_PROB)
edited, original = 'did you go to stanford on university at stranforde', 'did you go to stranford on unversit at stranforde'
epm.get_edit_logp(edited, original)

-0.39794000867203755

In [9]:
math.log(0.8,10)            # -0.0223
#math.log(0.05, 10)           # -1.301

# -12*-0.096
lm.get_bigram_logp("stranford", "unviersity") + math.log(lm.get_unigram_logp("stranford"), 10)

# -2.4*-0.69
lm.get_bigram_logp("stanford", "university") + math.log(lm.get_unigram_logp("stanford"), 10)

#cs.get_score("stranford unviersity", epm.get_edit_logp("stranford unviersity", "stranford unviersity"))

-2.4768300356208153

Make sure you pass the following sanity checks:

In [10]:
EDIT_PROB = 0.4
epm = UniformEditProbabilityModel(edit_prob=EDIT_PROB)

# Test a basic edit
edited, original = 'stanfrod', 'stanford'
assert math.isclose(epm.get_edit_logp(edited, original), math.log(EDIT_PROB, 10))

# Test a non-edit
assert math.isclose(epm.get_edit_logp(original, original), math.log(1. - EDIT_PROB, 10))

print('All tests passed!')

All tests passed!


### IV.3. Candidate Generator

Recall that the candidate generator takes a raw query $R$ submitted by the user, and generates candidates for the intended query $Q$. Since we know that more than 97% of spelling errors are found within an edit distance of 2 from the user's intended query, we encourage you to consider possible query corrections that are within distance 2 of $R$. This is the approach taken by Peter Norvig in [his essay on spelling correction](http://norvig.com/spell-correct.html). However, it is not tractable to use a pure "brute force" generator that produces all possible strings within distance 2 of $R$, because for any $R$ of non-trivial length, the number of candidates would be enormous. Thus we would have to evaluate the language and edit probability models on a huge number of candidates.


#### IV.3.1. Candidate Generator with Restricted Search Space

We can make the naïve approach tractable by aggressively narrowing down the search space while generating candidates. There are many valid approaches to efficient candidate generation, but here are a few basic ideas:
  - Begin by looking at *each individual term* in the query string $R$, and consider all possible edits that are distance 1 from that term.
  - Remember that you might consider hyphens and/or spaces as elements of your character set. This will allow you to consider some relatively common errors, like when a space is accidentally inserted in a word, or two terms in the query were mistakenly separated by a space when they should actually be joined.
  - Each time you generate an edit to a term, make sure that the edited term appears in the dictionary. (Remember that we have assumed that all words in a valid candidate query will be found in our training corpus, as mentioned above in [Section IV.1.2](#smoothing) above).
  - If you have generated possible edits to multiple individual terms, take the Cartesian product over these terms to produce a complete candidate query that includes edits to multiple terms. (But remember that you probably shouldn't go beyond a total edit distance of 2 for the query overall).
  
Again, there are many possible extensions and variations on the strategies mentioned here. We encourage you to explore some diﬀerent options, and then describe in your written report the strategies that you ultimately used, and how you optimized their performance. Note that **solutions that exhaustively generate and score all possible query candidates at edit distances 1 and 2 will run too slowly and will not receive full credit.**

In [15]:
# %%tee submission/candidate_generator.py

# CHANGES IN THIS VERSION - COMMENTS MADE AT LINE #73, #87

class CandidateGenerator:
    # Alphabet to use for insertion and substitution
    alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
                'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
                '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
                ' ', ',', '.', '-']

    def __init__(self, lm, epm):
        """
        Args:
            lm (LanguageModel): Language model to use for prior probabilities, P(Q).
            epm (EditProbabilityModel): Edit probability model to use for P(R|Q).
        """
        self.lm = lm
        self.epm = epm
        self.vocab = set(lm.unigram_counts.keys())

    def get_num_oov(self, query):
        """Get the number of out-of-vocabulary (OOV) words in `query`."""
        return sum(1 for w in query.strip().split()
                   if w not in self.lm.unigram_counts)

    def filter_and_yield(self, query, lp):
        if query.strip() and self.get_num_oov(query) == 0:
            yield query, lp
            
    def in_vocab(self, words):
        return set(word for word in words if word in self.vocab)
    
    def edit_distance_one(self, word):
        splits     = [(word[:i], word[i:])    for i in range(len(word) + 1)]
        deletes    = [L + R[1:]               for L, R in splits if R]
        transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R)>1]
        replaces   = [L + c + R[1:]           for L, R in splits if R for c in self.alphabet]
        inserts    = [L + c + R               for L, R in splits for c in self.alphabet]
        
        return set(deletes + transposes + replaces + inserts)
    
    def edit_distance_two(self, word):
        return set(e2 for e1 in self.edit_distance_one(word) for e2 in self.edit_distance_one(e1))
    
    def get_candidates(self, query):
        """Starts from `query`, and performs EDITS OF DISTANCE <=2 to get new
        candidate queries. To make scoring tractable, only returns/yields
        candidates that satisfy certain criteria (ideas for such criteria are
        described in bullet points above).

        Hint: We suggest you implement a helper function that takes a term and
            generates all possible edits of distance one from that term.
            It should probably only return edits that are in the vocabulary
            (i.e., edits for which `self.get_num_oov(edited) == 0`).

        Args:
            query (str): Starting query.

        Returns:
            Iterable over tuples (cdt, cdt_edit_logp) of candidates and
                their associated edit log-probabilities. Return value could be
                a list or a generator yielding tuples of this form.
        """
        
        terms = query.strip().split()                   # List of terms in the query
        
        distance_one = []                               # Stores one edit distance terms  [[candidate terms,...], [index of word in query]]
        distance_two = []                               # Stores two edit distance terms  [[candidate terms,...], [index of word in query]]
        
        pos = 0
        
#         It's a list. Dictionary -> Repeated words gives errors. There are better ways of doing it, but lists are simple
        terms_dict = []                            
        
        for i in range(len(terms)): 
            terms_dict.append([terms[i], i])
        
        for key, value in terms_dict:
            temp = self.edit_distance_one(key)
                
            distance_one.append([temp, value])
            distance_two.append([self.edit_distance_two(key).difference(temp), value])     # OPTIMIZATION 1: 'difference' to avoid duplicates

            
            
#       very_berry_temp->doesn't matter
#       Basically, the 2nd try and except for both accepted1 and accepted2 makes sure we add the position,
#       even if there are NO edited terms present in the vocab. In such a case, add the query term itself
#       as the value.

        # OPTIMIZATION 2 : Remove one edited terms not in vocab
        accepted1 = {}                                 # Stores accepted 1-edit distance terms. {index:{terms,}}
        for termsEdited, index in distance_one:
            for j in termsEdited:
                if j in self.vocab:
                    try:
                        accepted1[index].add(j)
                    except:
                        accepted1[index] = {j,}
            try: 
                very_berry_temp = accepted1[index]
            except:
                accepted1[index] = {terms[index], }
                        
        accepted2 = {}                                # Stores accepted 2-edit distance terms. {index:{terms,}}
        for termsEdited, index in distance_two:
            sampledSet = itertools.islice(termsEdited, 3)
            for j in termsEdited:
                if j in self.vocab:
                    try:
                        accepted2[index].add(j)
                    except:
                        accepted2[index] = {j,}
            try: 
                very_berry_temp = accepted2[index]
            except:
                accepted2[index] = {terms[index], }
        
        # Generate Candidate Queries with one-edit and zero-edit distance replacements
        terms_indexed = [[k, v] for k,v in terms_dict]                      # {current_word : index_in_query}
        query_terms = terms
        candidate_queries_1 = []                                                    # Final candidate query list of one-edit replacements
        cq = []                                                                     # Temporary list of candidate queries
        candidate = ""                                                              # Temporary candidate
        for i in range(len(terms_indexed)-1):
            i_word, i_index = terms_indexed[i][0], terms_indexed[i][1]
            candidate = ' '.join(query_terms[:i_index])                      # i = consider i'th to_be_edited word to be the first replacement
                                                                             # Candidate includes all words upto the i'th word as-is.
            for edited_word in accepted1[i_index]:
                if candidate:                                                # if to handle correct addition of whitespaces
                    cq.append(candidate + " " + edited_word)
                else:
                    cq.append(edited_word)
                                                                             # cq holds all possible queries with correction on the i'th term (and upto ith)
                                                                             # Next step: Generate all possible queries hereforth
            if candidate:                                                    # Include un-edited current word as well
                cq.append(candidate + " " + i_word)
            else:
                cq.append(i_word)
                
            j = i+1
            candidate = ""
            
            cq2 = []
            for ind in cq: 
                cq2.append(ind + " " + ' '.join(query_terms[i_index + 1:]))
            candidate_queries_1 += cq2                                   # Add completed queries to the final list
            
            while(j<len(terms_indexed)):
                j_word, j_index = terms_indexed[j][0], terms_indexed[j][1]   # Next Incorrect word and Index of the next incorrect word
                candidate = ' '.join(query_terms[i_index+1:j_index])         # All the correct words in between the prev incorrect and current incorrect
                cq2 = []                                                     # Temporary Candidate Query List
                for edited_word in accepted1[j_index]:
                    for ind in range(len(cq)):                               # Append correct words in between + edited terms to the half-candidate queries and complete
                        if candidate:
                            cq2.append(cq[ind] + " " + candidate + " " + edited_word + " " + ' '.join(query_terms[j_index+1:]))
                        else:
                            cq2.append(cq[ind] + " " + edited_word + " " + ' '.join(query_terms[j_index+1:]))
                j+=1                                                         # Next incorrect word
                candidate_queries_1 += cq2                                   # Add completed queries to the final list
                
            cq = []
            
        '''
        print("\n---------------------ONE EDIT DISTANCE--------------------------\n")
        print(candidate_queries_1)
        print("\n---------------------                 --------------------------\n")
        '''
                
        # Generate Candidate Queries with a Single two-edit replacement
        
        pos = 0
        candidate_queries_2 = []
        candidate = ""
        query_terms = terms
        for term, value in terms_dict:
            for i in accepted2[pos]:
                candidate_queries_2.append(candidate + i + " " + ' '.join(query_terms[pos+1:]))
            candidate += term + " "                                     # Exclude correction of current incorrect term and append as-is.
            pos += 1
        '''
        print("\n---------------------TWO EDIT DISTANCE--------------------------\n")
        print(candidate_queries_2)
        print("\n---------------------                 --------------------------\n")
        '''
                
        
        # Adding my doubts here:
        '''
        1. Candidate generation has to be done for each term - but how will the cartesian product work?
        2. How to ensure cartesian product terms have edit distance <= 2? 
        3. Once we get the valid candidate queries, epms for each word have to be summed or multiplied?
        4. All the above steps still seems extremely computationally expensive? How do you optimize it?
        '''
        
        # Yield the unedited query first
        # We provide this line as an example of how to use `self.filter_and_yield`
        candidate = candidate_queries_1 + candidate_queries_2
        res= []
        for edited_query in candidate: 
            #yield from self.filter_and_yield(query, self.epm.get_edit_logp(edited_query, query))
            res.append([edited_query.strip(), self.epm.get_edit_logp(edited_query.strip(), query)])
            
        #res.remove([query, math.log(0.8, 10)]) # TOREMOVE ORIGINAL QUERY FROM LIST. SHOULD NOT BE THERE
        #print(res)
        return res
        
        
model = CandidateGenerator(LanguageModel(), UniformEditProbabilityModel(BaseEditProbabilityModel))
#model.get_candidates("did you go to stranford on unversit at stranforde")
#model.get_candidates('stranford unviersity')

In [16]:
testcg = CandidateGenerator(lm, epm)
testcg.get_candidates("pagge 1 page 2 page")

[['page 1 page 2 page', -0.39794000867203755],
 ['paige 1 page 2 page', -0.39794000867203755],
 ['paggi 1 page 2 page', -0.39794000867203755],
 ['bagge 1 page 2 page', -0.39794000867203755],
 ['hagge 1 page 2 page', -0.39794000867203755],
 ['pogge 1 page 2 page', -0.39794000867203755],
 ['pagge 1 page 2 page', -0.22184874961635637],
 ['page u page 2 page', -0.39794000867203755],
 ['paige u page 2 page', -0.39794000867203755],
 ['paggi u page 2 page', -0.39794000867203755],
 ['bagge u page 2 page', -0.39794000867203755],
 ['hagge u page 2 page', -0.39794000867203755],
 ['pogge u page 2 page', -0.39794000867203755],
 ['pagge u page 2 page', -0.39794000867203755],
 ['page i1 page 2 page', -0.39794000867203755],
 ['paige i1 page 2 page', -0.39794000867203755],
 ['paggi i1 page 2 page', -0.39794000867203755],
 ['bagge i1 page 2 page', -0.39794000867203755],
 ['hagge i1 page 2 page', -0.39794000867203755],
 ['pogge i1 page 2 page', -0.39794000867203755],
 ['pagge i1 page 2 page', -0.39794000

Make sure your candidate generator passes the following sanity checks. Feel free to add more tests here as you see fit.

In [17]:
cg = CandidateGenerator(lm, epm)
query = 'stanford university'
num_candidates = 0
did_generate_original = False
for candidate, candidate_logp in cg.get_candidates(query):
    num_candidates += 1
    if candidate == query:
        did_generate_original = True

    assert cg.get_num_oov(query) == 0, \
        "You should not generate queries with out-of-vocab terms ('{}' has OOV terms)".format(candidate)

assert 1e2 <= num_candidates <= 1e4, \
    "You should generate between 100 and 10,000 terms (generated {})".format(num_candidates)

assert did_generate_original, "You should generate the original query ({})".format(query)

### Begin your code

### End your code

print('All tests passed!')

All tests passed!


### IV.4. Candidate Scorer

The candidate scorer's job is to find the most likely query $Q$ given the raw query $R$. It does this by combining the language model for $P(Q)$, the edit probability model for $P(R\mid Q)$, and the candidate generator (to get candidates for $Q$). Formally, given raw query $R$, the candidate scorer outputs
$$
    Q^{*} = \arg\max_{Q_{i}} P(Q_{i}\mid R) = \arg\max_{Q_{i}} P(R\mid Q_{i}) P(Q_{i}),
$$
where the max is taken over candidate queries $Q_{i}\in\{Q_1, \ldots, Q_{n}\}$ produced by the candidate generator given $R$.

#### IV.4.1. Candidate Scorer with Weighting
When combining probabilities from the language model and the edit probability model, we can use a parameter to weight the two models differently:
$$
    P(Q\mid R)\propto P(R\mid Q)P(Q)^{\mu}.
$$
Start out with $\mu = 1$, and then experiment later with different values of $\mu$ to see which one gives you the best spelling correction accuracy. Again, be careful not to overfit your development dataset. 

Fill out the following class to complete the spelling corrector with uniform edit cost model.

In [18]:
# %%tee submission/candidate_scorer.py

class CandidateScorer:
    """Combines the `LanguageModel`, `EditProbabilityModel`, and
    `CandidateGenerator` to produce the most likely query Q given a raw query R.
    Since the candidate generator already uses the edit probability model, we
    do not need to take the edit probability model as an argument in the constructor.
    """
    def __init__(self, lm, cg, mu=1.):
        """
        Args:
            lm (LanguageModel): Language model for estimating P(Q).
            cg (CandidateGenerator): Candidate generator for generating possible Q.
            mu (float): Weighting factor for the language model (see write-up).
                Remember that our probability computations are done in log-space.
        """
        self.lm = lm
        self.cg = cg
        self.mu = mu
    
    def get_score(self, query, log_edit_prob):
        """Uses the language model and `log_edit_prob` to compute the final
    b    score for a candidate `query`. Uses `mu` as weighting exponent for P(Q).

        Args:
            query (str): Candidate query.
            log_edit_prob (float): Log-probability of candidate query given
                original query (i.e., log(P(R|Q), where R is `query`).

        Returns:
            log_p (float): Final score for the query, i.e., the log-probability
                of the query.
        """
        ### Begin your code
        
        p_q = self.lm.get_query_logp(query)
        try:
            return log_edit_prob*p_q
        except:
            return -100 # Why are we returning 100 here?

        ### End your code

    def correct_spelling(self, r):
        """Corrects spelling of raw query `r` to get the intended query `q`.

        Args:
            r (str): Raw input query from the user.

        Returns:
            q (str): Spell-corrected query. That is, the query that maximizes
                P(R|Q)*P(Q) under the language model and edit probability model,
                restricted to Q's generated by the candidate generator.
        """
        ### Begin your code
        
        # generate candidate queries
        candidates = self.cg.get_candidates(r) # get candidates here using self.cg
        final_scores = [0]*len(candidates)
#         for i in candidates:
#             final_scores.append(0)
            
        min_index = 0
        for i in range(len(final_scores)):
            final_scores[i] = self.get_score(candidates[i][0],candidates[i][1])
            if(final_scores[i]<final_scores[min_index]):
                min_index = i
        '''        
        for i in range(len(final_scores)):
            print(candidates[i][0], candidates[i][1], final_scores[i])
        print("\n#######################################################################################################\n")
        print(candidates[min_index][0], "\t", final_scores[min_index])
        '''
        return candidates[min_index][0]

        ### End your code

In [19]:
# Assumes LanguageModel lm was already built above
print('Building edit probability model...')
epm = UniformEditProbabilityModel()
print('Building candidate generator...')
cg = CandidateGenerator(lm, epm)
print('Building candidate scorer model...')
cs = CandidateScorer(lm, cg, mu=1.0)
print('Running spelling corrector...')

# Add your own queries here to test your spelling corrector
queries = [('stranford unviersity', 'stanford university'),
             ('stanford unviersity', 'stanford university'),
             ('sanford university', 'stanford university')]
for query, expected in queries:
    corrected = cs.correct_spelling(query)
    print("\t'{}' corrected to '{}'".format(query, corrected))
    assert corrected == expected, "Expected '{}', got '{}'".format(expected, corrected)
print('All tests passed!')

Building edit probability model...
Building candidate generator...
Building candidate scorer model...
Running spelling corrector...
	'stranford unviersity' corrected to 'stanford university'
	'stanford unviersity' corrected to 'stanford university'
	'sanford university' corrected to 'stanford university'
All tests passed!


#### IV.4.2. Dev Set Evaluation (Uniform)

Now that we have constructed a basic spelling corrector, we will evaluate its performance on the held-out dev set. Recall that the dev set is stored across the files in `pa2-data/dev_set/`:
  - `queries.txt`: One raw query $R$ per line.
  - `google.txt`: Google's corrected queries $Q$ (one per line, same order as `queries.txt`).
  - `gold.txt`: Ground-truth queries $Q$ (again, one per line, same order).
  
Run the following cells to evaluate your spelling corrector on the dev set using your uniform edit probability model. We will also evaluate your model on a private test set after submission. For full credit, your spelling corrector with uniform edit probability model should achieve accuracy within 1% of the staff implementation *on the test set.* **We do not provide test set queries, but as a guideline for performance, the staff implementation gets 82.42% accuracy on the dev set.**

In [20]:
def dev_eval(candidate_scorer, verbose=False):
    """Evaluate `candidate_scorer` on the dev set."""
    query_num = 1
    yours_correct = 0
    google_correct = 0
    # Read originals, ground-truths, Google's predictions
    dev_dir = 'pa2-data/dev_set/'
    with tqdm(total=455, unit=' queries') as pbar, \
            open(os.path.join(dev_dir, 'queries.txt'), 'r') as query_fh, \
            open(os.path.join(dev_dir, 'gold.txt'), 'r') as gold_fh, \
            open(os.path.join(dev_dir, 'google.txt'), 'r') as google_fh:
        while True:
            # Read one line
            query = query_fh.readline().rstrip('\n')
            print("Query = ", query)
            if not query:
                # Finished all queries
                break
            corrected = candidate_scorer.correct_spelling(query)
            corrected = ' '.join(corrected.split())  # Squash multiple spaces
            gold = gold_fh.readline().rstrip('\n')
            google = google_fh.readline().rstrip('\n')

            # Count whether correct
            if corrected == gold:
                yours_correct += 1
            if google == gold:
                google_correct += 1

            # Print running stats
            yours_accuracy = yours_correct / query_num * 100
            google_accuracy = google_correct / query_num * 100
            if verbose:
                print('QUERY {:03d}'.format(query_num))
                print('---------')
                print('(original):      {}'.format(query))
                print('(corrected):     {}'.format(corrected))
                print('(google):        {}'.format(google))
                print('(gold):          {}'.format(gold))
                print('Google accuracy: {}/{} ({:5.2f}%)\n'
                      .format(google_correct, query_num, google_accuracy))
                print('Your accuracy:   {}/{} ({:5.2f}%)'
                      .format(yours_correct, query_num, yours_accuracy))
            
            pbar.set_postfix(google='{:5.2f}%'.format(google_accuracy),
                             yours='{:5.2f}%'.format(yours_accuracy))
            pbar.update()
            query_num += 1

In [77]:
# Set verbose=True for debugging output
# For reference, our implementation takes ~1 min, 40 sec to run and gets 82.42% accuracy
dev_eval(cs, verbose=False)

  0%|                                                                                                                             | 0/455 [00:00<?, ? queries/s]

Query =  quade quad cache xontroller
Query =  ['quade', 'quad', 'cache', 'xontroller']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


  0%|▏                                                                                     | 1/455 [00:00<06:29,  1.17 queries/s, google=100.00%, yours=100.00%]

Query =  co2 in
Query =  ['co2', 'in']
I =  0
I =  1
dict_keys([0, 1])


  0%|▍                                                                                       | 2/455 [00:01<05:01,  1.50 queries/s, google=50.00%, yours=50.00%]

Query =  powered by blacklight
Query =  ['powered', 'by', 'blacklight']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


  1%|▌                                                                                       | 3/455 [00:01<05:28,  1.38 queries/s, google=66.67%, yours=66.67%]

Query =  mw tth singledays 8 as a result one may
Query =  ['mw', 'tth', 'singledays', '8', 'as', 'a', 'result', 'one', 'may']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8])


  1%|▊                                                                                       | 4/455 [00:06<13:56,  1.85s/ queries, google=50.00%, yours=50.00%]

Query =  when searching databases look for
Query =  ['when', 'searching', 'databases', 'look', 'for']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


  1%|▉                                                                                       | 5/455 [00:07<12:25,  1.66s/ queries, google=60.00%, yours=60.00%]

Query =  incidence x ray absorption spectrooscopy
Query =  ['incidence', 'x', 'ray', 'absorption', 'spectrooscopy']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


  1%|█▏                                                                                      | 6/455 [00:09<13:42,  1.83s/ queries, google=66.67%, yours=66.67%]

Query =  floor conf rm bringin our to content stanford univesity
Query =  ['floor', 'conf', 'rm', 'bringin', 'our', 'to', 'content', 'stanford', 'univesity']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8])


  2%|█▎                                                                                      | 7/455 [00:12<15:33,  2.08s/ queries, google=71.43%, yours=57.14%]

Query =  plung from great heights
Query =  ['plung', 'from', 'great', 'heights']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


  2%|█▌                                                                                      | 8/455 [00:13<12:12,  1.64s/ queries, google=75.00%, yours=50.00%]

Query =  what et is
Query =  ['what', 'et', 'is']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


  2%|█▋                                                                                      | 9/455 [00:13<09:40,  1.30s/ queries, google=77.78%, yours=44.44%]

Query =  case of chained messages theon
Query =  ['case', 'of', 'chained', 'messages', 'theon']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


  2%|█▉                                                                                     | 10/455 [00:14<09:12,  1.24s/ queries, google=70.00%, yours=50.00%]

Query =  school of earth sciences
Query =  ['school', 'of', 'earth', 'sciences']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


  2%|██                                                                                     | 11/455 [00:15<08:11,  1.11s/ queries, google=72.73%, yours=54.55%]

Query =  numbered there is one line
Query =  ['numbered', 'there', 'is', 'one', 'line']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


  3%|██▎                                                                                    | 12/455 [00:16<08:08,  1.10s/ queries, google=75.00%, yours=58.33%]

Query =  artificially created entities
Query =  ['artificially', 'created', 'entities']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


  3%|██▍                                                                                    | 13/455 [00:18<09:06,  1.24s/ queries, google=76.92%, yours=61.54%]

Query =  koret pavilion taube hellel house
Query =  ['koret', 'pavilion', 'taube', 'hellel', 'house']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


  3%|██▋                                                                                    | 14/455 [00:19<08:32,  1.16s/ queries, google=78.57%, yours=64.29%]

Query =  the fast paths
Query =  ['the', 'fast', 'paths']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


  3%|██▊                                                                                    | 15/455 [00:19<06:46,  1.08 queries/s, google=80.00%, yours=66.67%]

Query =  hilton 5 14 03 webmaster recital hall map audience genral
Query =  ['hilton', '5', '14', '03', 'webmaster', 'recital', 'hall', 'map', 'audience', 'genral']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
I =  9
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


  4%|███                                                                                    | 16/455 [00:23<13:06,  1.79s/ queries, google=81.25%, yours=62.50%]

Query =  community partnerships renew & new
Query =  ['community', 'partnerships', 'renew', '&', 'new']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


  4%|███▎                                                                                   | 17/455 [00:24<12:14,  1.68s/ queries, google=76.47%, yours=64.71%]

Query =  pagge 1 page 2 page
Query =  ['pagge', '1', 'page', '2', 'page']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


  4%|███▍                                                                                   | 18/455 [00:28<15:59,  2.20s/ queries, google=77.78%, yours=61.11%]

Query =  medows june 2004 halfway up
Query =  ['medows', 'june', '2004', 'halfway', 'up']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


  4%|███▋                                                                                   | 19/455 [00:29<13:49,  1.90s/ queries, google=78.95%, yours=63.16%]

Query =  senor networks proceedings
Query =  ['senor', 'networks', 'proceedings']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


  4%|███▊                                                                                   | 20/455 [00:30<11:55,  1.64s/ queries, google=80.00%, yours=65.00%]

Query =  forign affairs reporter the age
Query =  ['forign', 'affairs', 'reporter', 'the', 'age']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


  5%|████                                                                                   | 21/455 [00:34<16:09,  2.23s/ queries, google=80.95%, yours=61.90%]

Query =  they have not explictly
Query =  ['they', 'have', 'not', 'explictly']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


  5%|████▏                                                                                  | 22/455 [00:35<15:13,  2.11s/ queries, google=77.27%, yours=63.64%]

Query =  t41 t 42 a43
Query =  ['t41', 't', '42', 'a43']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


  5%|████▍                                                                                  | 23/455 [00:37<14:17,  1.99s/ queries, google=78.26%, yours=60.87%]

Query =  invalueable way to see what
Query =  ['invalueable', 'way', 'to', 'see', 'what']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


  5%|████▌                                                                                  | 24/455 [00:40<15:47,  2.20s/ queries, google=79.17%, yours=62.50%]

Query =  huang qixing huang evangelos kalogerakis
Query =  ['huang', 'qixing', 'huang', 'evangelos', 'kalogerakis']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


  5%|████▊                                                                                  | 25/455 [00:41<14:10,  1.98s/ queries, google=80.00%, yours=64.00%]

Query =  cife summer program2012
Query =  ['cife', 'summer', 'program2012']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


  6%|████▉                                                                                  | 26/455 [00:42<12:03,  1.69s/ queries, google=80.77%, yours=61.54%]

Query =  university's faculty in 1962
Query =  ["university's", 'faculty', 'in', '1962']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


  6%|█████▏                                                                                 | 27/455 [00:45<13:17,  1.86s/ queries, google=81.48%, yours=62.96%]

Query =  serrast stanford ca
Query =  ['serrast', 'stanford', 'ca']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


  6%|█████▎                                                                                 | 28/455 [00:48<15:56,  2.24s/ queries, google=82.14%, yours=60.71%]

Query =  argue that fx purchases
Query =  ['argue', 'that', 'fx', 'purchases']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


  6%|█████▌                                                                                 | 29/455 [00:49<13:53,  1.96s/ queries, google=82.76%, yours=62.07%]

Query =  service contribution pleaze
Query =  ['service', 'contribution', 'pleaze']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


  7%|█████▋                                                                                 | 30/455 [00:50<12:02,  1.70s/ queries, google=80.00%, yours=63.33%]

Query =  european conference on machine
Query =  ['european', 'conference', 'on', 'machine']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


  7%|█████▉                                                                                 | 31/455 [00:52<12:50,  1.82s/ queries, google=80.65%, yours=64.52%]

Query =  son to a
Query =  ['son', 'to', 'a']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


  7%|██████                                                                                 | 32/455 [00:53<11:25,  1.62s/ queries, google=78.12%, yours=62.50%]

Query =  the proposes water
Query =  ['the', 'proposes', 'water']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


  7%|██████▎                                                                                | 33/455 [00:54<09:25,  1.34s/ queries, google=78.79%, yours=63.64%]

Query =  the network desktop hardware and usda 1907 click
Query =  ['the', 'network', 'desktop', 'hardware', 'and', 'usda', '1907', 'click']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
dict_keys([0, 1, 2, 3, 4, 5, 6, 7])


  7%|██████▌                                                                                | 34/455 [00:56<11:22,  1.62s/ queries, google=79.41%, yours=64.71%]

Query =  a person services health
Query =  ['a', 'person', 'services', 'health']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


  8%|██████▋                                                                                | 35/455 [00:57<10:16,  1.47s/ queries, google=80.00%, yours=65.71%]

Query =  institute for international
Query =  ['institute', 'for', 'international']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


  8%|██████▉                                                                                | 36/455 [01:00<11:43,  1.68s/ queries, google=80.56%, yours=66.67%]

Query =  of the university registrar
Query =  ['of', 'the', 'university', 'registrar']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


  8%|███████                                                                                | 37/455 [01:02<12:31,  1.80s/ queries, google=81.08%, yours=67.57%]

Query =  ddlm 2004 as you can
Query =  ['ddlm', '2004', 'as', 'you', 'can']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


  8%|███████▎                                                                               | 38/455 [01:03<12:31,  1.80s/ queries, google=78.95%, yours=68.42%]

Query =  been argues that the transformation
Query =  ['been', 'argues', 'that', 'the', 'transformation']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


  9%|███████▍                                                                               | 39/455 [01:06<14:04,  2.03s/ queries, google=79.49%, yours=69.23%]

Query =  urls of a posting and
Query =  ['urls', 'of', 'a', 'posting', 'and']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


  9%|███████▋                                                                               | 40/455 [01:08<13:11,  1.91s/ queries, google=80.00%, yours=70.00%]

Query =  with geant4 i
Query =  ['with', 'geant4', 'i']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


  9%|███████▊                                                                               | 41/455 [01:08<10:31,  1.52s/ queries, google=80.49%, yours=70.73%]

Query =  2012 stanford university system requirements
Query =  ['2012', 'stanford', 'university', 'system', 'requirements']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


  9%|████████                                                                               | 42/455 [01:10<12:02,  1.75s/ queries, google=80.95%, yours=71.43%]

Query =  to visit the froze
Query =  ['to', 'visit', 'the', 'froze']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


  9%|████████▏                                                                              | 43/455 [01:11<10:13,  1.49s/ queries, google=79.07%, yours=69.77%]

Query =  channel podcasts panel discussion kqed's
Query =  ['channel', 'podcasts', 'panel', 'discussion', "kqed's"]
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 10%|████████▍                                                                              | 44/455 [01:13<09:56,  1.45s/ queries, google=79.55%, yours=70.45%]

Query =  courses dfj etl lectures mayfield
Query =  ['courses', 'dfj', 'etl', 'lectures', 'mayfield']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 10%|████████▌                                                                              | 45/455 [01:14<09:25,  1.38s/ queries, google=80.00%, yours=71.11%]

Query =  address is there an easy
Query =  ['address', 'is', 'there', 'an', 'easy']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 10%|████████▊                                                                              | 46/455 [01:16<10:30,  1.54s/ queries, google=80.43%, yours=71.74%]

Query =  theend of an
Query =  ['theend', 'of', 'an']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 10%|████████▉                                                                              | 47/455 [01:16<08:16,  1.22s/ queries, google=80.85%, yours=70.21%]

Query =  effort comercial human
Query =  ['effort', 'comercial', 'human']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 11%|█████████▏                                                                             | 48/455 [01:17<07:15,  1.07s/ queries, google=81.25%, yours=70.83%]

Query =  symposium detector development
Query =  ['symposium', 'detector', 'development']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 11%|█████████▎                                                                             | 49/455 [01:18<07:46,  1.15s/ queries, google=81.63%, yours=71.43%]

Query =  students academic programs student activiies guide lines slac i 730 0a21t
Query =  ['students', 'academic', 'programs', 'student', 'activiies', 'guide', 'lines', 'slac', 'i', '730', '0a21t']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
I =  9
I =  10
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])


 11%|█████████▌                                                                             | 50/455 [01:22<12:18,  1.82s/ queries, google=82.00%, yours=70.00%]

Query =  students faculty & staff
Query =  ['students', 'faculty', '&', 'staff']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 11%|█████████▊                                                                             | 51/455 [01:23<10:38,  1.58s/ queries, google=82.35%, yours=70.59%]

Query =  for descovering and confirming in
Query =  ['for', 'descovering', 'and', 'confirming', 'in']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 11%|█████████▉                                                                             | 52/455 [01:25<10:52,  1.62s/ queries, google=82.69%, yours=69.23%]

Query =  culure parameters and the
Query =  ['culure', 'parameters', 'and', 'the']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 12%|██████████▏                                                                            | 53/455 [01:26<10:40,  1.59s/ queries, google=83.02%, yours=69.81%]

Query =  no text full text
Query =  ['no', 'text', 'full', 'text']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 12%|██████████▎                                                                            | 54/455 [01:27<08:58,  1.34s/ queries, google=83.33%, yours=70.37%]

Query =  by modern millitary forces
Query =  ['by', 'modern', 'millitary', 'forces']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 12%|██████████▌                                                                            | 55/455 [01:28<08:30,  1.28s/ queries, google=83.64%, yours=69.09%]

Query =  information in the
Query =  ['information', 'in', 'the']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 12%|██████████▋                                                                            | 56/455 [01:30<09:34,  1.44s/ queries, google=83.93%, yours=69.64%]

Query =  services available througha off campus
Query =  ['services', 'available', 'througha', 'off', 'campus']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 13%|██████████▉                                                                            | 57/455 [01:31<09:42,  1.46s/ queries, google=84.21%, yours=70.18%]

Query =  of pension fundsaving
Query =  ['of', 'pension', 'fundsaving']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 13%|███████████                                                                            | 58/455 [01:32<08:29,  1.28s/ queries, google=84.48%, yours=68.97%]

Query =  j biol chem 1999
Query =  ['j', 'biol', 'chem', '1999']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 13%|███████████▎                                                                           | 59/455 [01:33<06:54,  1.05s/ queries, google=84.75%, yours=69.49%]

Query =  blog cs 193p iphone
Query =  ['blog', 'cs', '193p', 'iphone']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 13%|███████████▍                                                                           | 60/455 [01:33<05:48,  1.13 queries/s, google=85.00%, yours=70.00%]

Query =  3 technology 4 performance
Query =  ['3', 'technology', '4', 'performance']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 13%|███████████▋                                                                           | 61/455 [01:35<07:15,  1.11s/ queries, google=85.25%, yours=70.49%]

Query =  to creating your first ontology
Query =  ['to', 'creating', 'your', 'first', 'ontology']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 14%|███████████▊                                                                           | 62/455 [01:36<07:10,  1.10s/ queries, google=85.48%, yours=70.97%]

Query =  10 ubv 2
Query =  ['10', 'ubv', '2']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 14%|████████████                                                                           | 63/455 [01:36<05:30,  1.19 queries/s, google=84.13%, yours=69.84%]

Query =  for sevial many abandoned
Query =  ['for', 'sevial', 'many', 'abandoned']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 14%|████████████▏                                                                          | 64/455 [01:37<05:28,  1.19 queries/s, google=82.81%, yours=68.75%]

Query =  are being investigated
Query =  ['are', 'being', 'investigated']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 14%|████████████▍                                                                          | 65/455 [01:38<05:49,  1.12 queries/s, google=83.08%, yours=69.23%]

Query =  study of india 2008 much
Query =  ['study', 'of', 'india', '2008', 'much']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 15%|████████████▌                                                                          | 66/455 [01:39<06:04,  1.07 queries/s, google=83.33%, yours=69.70%]

Query =  read more no subscription requied
Query =  ['read', 'more', 'no', 'subscription', 'requied']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 15%|████████████▊                                                                          | 67/455 [01:41<07:53,  1.22s/ queries, google=83.58%, yours=68.66%]

Query =  the software development community at
Query =  ['the', 'software', 'development', 'community', 'at']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 15%|█████████████                                                                          | 68/455 [01:43<09:02,  1.40s/ queries, google=83.82%, yours=69.12%]

Query =  of acual projects
Query =  ['of', 'acual', 'projects']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 15%|█████████████▏                                                                         | 69/455 [01:43<07:20,  1.14s/ queries, google=84.06%, yours=69.57%]

Query =  continued to attrect
Query =  ['continued', 'to', 'attrect']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 15%|█████████████▍                                                                         | 70/455 [01:44<06:38,  1.03s/ queries, google=84.29%, yours=68.57%]

Query =  conference lina khatib larry dimon assoc prof sean
Query =  ['conference', 'lina', 'khatib', 'larry', 'dimon', 'assoc', 'prof', 'sean']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
dict_keys([0, 1, 2, 3, 4, 5, 6, 7])


 16%|█████████████▌                                                                         | 71/455 [01:46<07:41,  1.20s/ queries, google=84.51%, yours=67.61%]

Query =  nathan abbott way
Query =  ['nathan', 'abbott', 'way']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 16%|█████████████▊                                                                         | 72/455 [01:46<06:22,  1.00 queries/s, google=84.72%, yours=68.06%]

Query =  humanities and sciences
Query =  ['humanities', 'and', 'sciences']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 16%|█████████████▉                                                                         | 73/455 [01:47<06:19,  1.01 queries/s, google=84.93%, yours=68.49%]

Query =  pert1is the panalytical x pert
Query =  ['pert1is', 'the', 'panalytical', 'x', 'pert']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 16%|██████████████▏                                                                        | 74/455 [01:49<07:14,  1.14s/ queries, google=83.78%, yours=68.92%]

Query =  applied to blood flow
Query =  ['applied', 'to', 'blood', 'flow']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 16%|██████████████▎                                                                        | 75/455 [01:49<06:26,  1.02s/ queries, google=84.00%, yours=69.33%]

Query =  union paces but we
Query =  ['union', 'paces', 'but', 'we']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 17%|██████████████▌                                                                        | 76/455 [01:50<05:27,  1.16 queries/s, google=82.89%, yours=68.42%]

Query =  data from brovser opera then
Query =  ['data', 'from', 'brovser', 'opera', 'then']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 17%|██████████████▋                                                                        | 77/455 [01:51<05:20,  1.18 queries/s, google=83.12%, yours=68.83%]

Query =  proceedings topocs publications academic writing
Query =  ['proceedings', 'topocs', 'publications', 'academic', 'writing']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 17%|██████████████▉                                                                        | 78/455 [01:53<08:42,  1.39s/ queries, google=83.33%, yours=69.23%]

Query =  sulait home su
Query =  ['sulait', 'home', 'su']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 17%|███████████████                                                                        | 79/455 [01:54<07:06,  1.13s/ queries, google=82.28%, yours=69.62%]

Query =  series searchworks strat
Query =  ['series', 'searchworks', 'strat']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 18%|███████████████▎                                                                       | 80/455 [01:55<06:45,  1.08s/ queries, google=81.25%, yours=70.00%]

Query =  cardwith at
Query =  ['cardwith', 'at']
I =  0
I =  1
dict_keys([0, 1])


 18%|███████████████▍                                                                       | 81/455 [01:55<05:28,  1.14 queries/s, google=81.48%, yours=69.14%]

Query =  the houseof
Query =  ['the', 'houseof']
I =  0
I =  1
dict_keys([0, 1])


 18%|███████████████▋                                                                       | 82/455 [01:56<04:29,  1.38 queries/s, google=81.71%, yours=68.29%]

Query =  more free wheeling said roberts a
Query =  ['more', 'free', 'wheeling', 'said', 'roberts', 'a']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
dict_keys([0, 1, 2, 3, 4, 5])


 18%|███████████████▊                                                                       | 83/455 [01:57<06:32,  1.06s/ queries, google=80.72%, yours=67.47%]

Query =  the portrait page format postscript athlete if yes please
Query =  ['the', 'portrait', 'page', 'format', 'postscript', 'athlete', 'if', 'yes', 'please']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8])


 18%|████████████████                                                                       | 84/455 [02:00<10:16,  1.66s/ queries, google=80.95%, yours=67.86%]

Query =  california 94305 4121 650.725 1575
Query =  ['california', '94305', '4121', '650.725', '1575']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 19%|████████████████▎                                                                      | 85/455 [02:02<09:51,  1.60s/ queries, google=81.18%, yours=68.24%]

Query =  facilities bechtel confernce
Query =  ['facilities', 'bechtel', 'confernce']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 19%|████████████████▍                                                                      | 86/455 [02:03<09:02,  1.47s/ queries, google=81.40%, yours=68.60%]

Query =  the atmosphere and renwable energy
Query =  ['the', 'atmosphere', 'and', 'renwable', 'energy']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 19%|████████████████▋                                                                      | 87/455 [02:04<08:39,  1.41s/ queries, google=81.61%, yours=68.97%]

Query =  results are adirect
Query =  ['results', 'are', 'adirect']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 19%|████████████████▊                                                                      | 88/455 [02:05<07:16,  1.19s/ queries, google=81.82%, yours=68.18%]

Query =  the frist paper i discuss
Query =  ['the', 'frist', 'paper', 'i', 'discuss']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 20%|█████████████████                                                                      | 89/455 [02:06<06:52,  1.13s/ queries, google=82.02%, yours=68.54%]

Query =  winter _____ spring _____ summer
Query =  ['winter', '_____', 'spring', '_____', 'summer']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 20%|█████████████████▏                                                                     | 90/455 [02:07<06:10,  1.01s/ queries, google=82.22%, yours=68.89%]

Query =  wire mesh to hold
Query =  ['wire', 'mesh', 'to', 'hold']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 20%|█████████████████▍                                                                     | 91/455 [02:07<05:14,  1.16 queries/s, google=82.42%, yours=69.23%]

Query =  and the program
Query =  ['and', 'the', 'program']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 20%|█████████████████▌                                                                     | 92/455 [02:08<04:57,  1.22 queries/s, google=82.61%, yours=69.57%]

Query =  california boating safety
Query =  ['california', 'boating', 'safety']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 20%|█████████████████▊                                                                     | 93/455 [02:09<05:04,  1.19 queries/s, google=82.80%, yours=69.89%]

Query =  operations manager mary
Query =  ['operations', 'manager', 'mary']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 21%|█████████████████▉                                                                     | 94/455 [02:10<05:11,  1.16 queries/s, google=82.98%, yours=70.21%]

Query =  the interaction greatly influences
Query =  ['the', 'interaction', 'greatly', 'influences']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 21%|██████████████████▏                                                                    | 95/455 [02:11<06:21,  1.06s/ queries, google=83.16%, yours=70.53%]

Query =  models underestimate the
Query =  ['models', 'underestimate', 'the']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 21%|██████████████████▎                                                                    | 96/455 [02:12<06:21,  1.06s/ queries, google=83.33%, yours=70.83%]

Query =  navigation contract support computer resource
Query =  ['navigation', 'contract', 'support', 'computer', 'resource']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 21%|██████████████████▌                                                                    | 97/455 [02:14<07:27,  1.25s/ queries, google=83.51%, yours=71.13%]

Query =  tocquevilles democracy in america related
Query =  ['tocquevilles', 'democracy', 'in', 'america', 'related']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 22%|██████████████████▋                                                                    | 98/455 [02:16<08:47,  1.48s/ queries, google=83.67%, yours=70.41%]

Query =  established in1994 to
Query =  ['established', 'in1994', 'to']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 22%|██████████████████▉                                                                    | 99/455 [02:17<07:44,  1.31s/ queries, google=83.84%, yours=69.70%]

Query =  suitedin purpose programmes bring faculty members
Query =  ['suitedin', 'purpose', 'programmes', 'bring', 'faculty', 'members']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
dict_keys([0, 1, 2, 3, 4, 5])


 22%|██████████████████▉                                                                   | 100/455 [02:19<08:53,  1.50s/ queries, google=83.00%, yours=69.00%]

Query =  foreign language standards
Query =  ['foreign', 'language', 'standards']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 22%|███████████████████                                                                   | 101/455 [02:20<08:16,  1.40s/ queries, google=83.17%, yours=69.31%]

Query =  optical science amo in
Query =  ['optical', 'science', 'amo', 'in']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 22%|███████████████████▎                                                                  | 102/455 [02:21<07:18,  1.24s/ queries, google=83.33%, yours=69.61%]

Query =  199708041649 laa10477 havarti cs
Query =  ['199708041649', 'laa10477', 'havarti', 'cs']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 23%|███████████████████▍                                                                  | 103/455 [02:22<07:19,  1.25s/ queries, google=83.50%, yours=69.90%]

Query =  prograns program on
Query =  ['prograns', 'program', 'on']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 23%|███████████████████▋                                                                  | 104/455 [02:23<06:53,  1.18s/ queries, google=83.65%, yours=69.23%]

Query =  training axes oracle financials reportmart
Query =  ['training', 'axes', 'oracle', 'financials', 'reportmart']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 23%|███████████████████▊                                                                  | 105/455 [02:26<08:55,  1.53s/ queries, google=82.86%, yours=69.52%]

Query =  by catagery forums by time stanford the standford office
Query =  ['by', 'catagery', 'forums', 'by', 'time', 'stanford', 'the', 'standford', 'office']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8])


 23%|████████████████████                                                                  | 106/455 [02:30<13:28,  2.32s/ queries, google=83.02%, yours=68.87%]

Query =  in car use
Query =  ['in', 'car', 'use']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 24%|████████████████████▏                                                                 | 107/455 [02:30<10:20,  1.78s/ queries, google=83.18%, yours=69.16%]

Query =  using clack network eds people publications resaerch other
Query =  ['using', 'clack', 'network', 'eds', 'people', 'publications', 'resaerch', 'other']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
dict_keys([0, 1, 2, 3, 4, 5, 6, 7])


 24%|████████████████████▍                                                                 | 108/455 [02:33<11:05,  1.92s/ queries, google=82.41%, yours=69.44%]

Query =  author guide fgst author dog factors that contribute to
Query =  ['author', 'guide', 'fgst', 'author', 'dog', 'factors', 'that', 'contribute', 'to']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8])


 24%|████████████████████▌                                                                 | 109/455 [02:35<11:34,  2.01s/ queries, google=82.57%, yours=68.81%]

Query =  section 7.5 ft
Query =  ['section', '7.5', 'ft']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 24%|████████████████████▊                                                                 | 110/455 [02:35<09:21,  1.63s/ queries, google=82.73%, yours=69.09%]

Query =  t f
Query =  ['t', 'f']
I =  0
I =  1
dict_keys([0, 1])


 24%|████████████████████▉                                                                 | 111/455 [02:36<06:47,  1.19s/ queries, google=81.98%, yours=68.47%]

Query =  poon balaji prabhakar electrical
Query =  ['poon', 'balaji', 'prabhakar', 'electrical']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 25%|█████████████████████▏                                                                | 112/455 [02:37<06:42,  1.17s/ queries, google=82.14%, yours=68.75%]

Query =  abstract a crucial lemma in
Query =  ['abstract', 'a', 'crucial', 'lemma', 'in']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 25%|█████████████████████▎                                                                | 113/455 [02:38<06:52,  1.21s/ queries, google=82.30%, yours=69.03%]

Query =  highalnd refer the relationship
Query =  ['highalnd', 'refer', 'the', 'relationship']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 25%|█████████████████████▌                                                                | 114/455 [02:39<07:02,  1.24s/ queries, google=82.46%, yours=69.30%]

Query =  useful copyright charts and tools
Query =  ['useful', 'copyright', 'charts', 'and', 'tools']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 25%|█████████████████████▋                                                                | 115/455 [02:41<07:20,  1.30s/ queries, google=82.61%, yours=69.57%]

Query =  of a wide on how we
Query =  ['of', 'a', 'wide', 'on', 'how', 'we']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
dict_keys([0, 1, 2, 3, 4, 5])


 25%|█████████████████████▉                                                                | 116/455 [02:43<09:11,  1.63s/ queries, google=82.76%, yours=69.83%]

Query =  speakers to say smething one
Query =  ['speakers', 'to', 'say', 'smething', 'one']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 26%|██████████████████████                                                                | 117/455 [02:45<09:28,  1.68s/ queries, google=82.91%, yours=69.23%]

Query =  stsm at ssrl under
Query =  ['stsm', 'at', 'ssrl', 'under']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 26%|██████████████████████▎                                                               | 118/455 [02:46<07:32,  1.34s/ queries, google=82.20%, yours=68.64%]

Query =  chicken tenders the heisman
Query =  ['chicken', 'tenders', 'the', 'heisman']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 26%|██████████████████████▍                                                               | 119/455 [02:46<06:44,  1.20s/ queries, google=82.35%, yours=68.91%]

Query =  rports by author
Query =  ['rports', 'by', 'author']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 26%|██████████████████████▋                                                               | 120/455 [02:47<05:50,  1.05s/ queries, google=82.50%, yours=69.17%]

Query =  regional opinions blogs
Query =  ['regional', 'opinions', 'blogs']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 27%|██████████████████████▊                                                               | 121/455 [02:48<05:32,  1.00 queries/s, google=82.64%, yours=69.42%]

Query =  rss increas text size
Query =  ['rss', 'increas', 'text', 'size']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 27%|███████████████████████                                                               | 122/455 [02:49<05:08,  1.08 queries/s, google=81.97%, yours=69.67%]

Query =  the costs and benifits of
Query =  ['the', 'costs', 'and', 'benifits', 'of']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 27%|███████████████████████▏                                                              | 123/455 [02:50<06:18,  1.14s/ queries, google=82.11%, yours=69.11%]

Query =  impacts of global warming q&a
Query =  ['impacts', 'of', 'global', 'warming', 'q&a']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 27%|███████████████████████▍                                                              | 124/455 [02:51<06:04,  1.10s/ queries, google=82.26%, yours=69.35%]

Query =  on serra turn right on
Query =  ['on', 'serra', 'turn', 'right', 'on']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 27%|███████████████████████▋                                                              | 125/455 [02:53<06:26,  1.17s/ queries, google=81.60%, yours=69.60%]

Query =  contnt of this frame at kenji haertel edward krumboltz john
Query =  ['contnt', 'of', 'this', 'frame', 'at', 'kenji', 'haertel', 'edward', 'krumboltz', 'john']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
I =  9
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


 28%|███████████████████████▊                                                              | 126/455 [02:55<08:43,  1.59s/ queries, google=81.75%, yours=69.84%]

Query =  from the salon slides
Query =  ['from', 'the', 'salon', 'slides']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 28%|████████████████████████                                                              | 127/455 [02:56<07:36,  1.39s/ queries, google=81.89%, yours=70.08%]

Query =  the ring and on the
Query =  ['the', 'ring', 'and', 'on', 'the']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 28%|████████████████████████▏                                                             | 128/455 [02:58<08:24,  1.54s/ queries, google=82.03%, yours=70.31%]

Query =  provides onlymild security
Query =  ['provides', 'onlymild', 'security']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 28%|████████████████████████▍                                                             | 129/455 [02:59<07:20,  1.35s/ queries, google=82.17%, yours=69.77%]

Query =  ksb search the research opportunities usefull
Query =  ['ksb', 'search', 'the', 'research', 'opportunities', 'usefull']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
dict_keys([0, 1, 2, 3, 4, 5])


 29%|████████████████████████▌                                                             | 130/455 [03:02<09:39,  1.78s/ queries, google=81.54%, yours=70.00%]

Query =  tim don ph
Query =  ['tim', 'don', 'ph']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 29%|████████████████████████▊                                                             | 131/455 [03:02<07:44,  1.43s/ queries, google=80.92%, yours=70.23%]

Query =  stanford gsb skip to nontent
Query =  ['stanford', 'gsb', 'skip', 'to', 'nontent']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 29%|████████████████████████▉                                                             | 132/455 [03:04<07:13,  1.34s/ queries, google=81.06%, yours=70.45%]

Query =  dispatch of physiciannurse
Query =  ['dispatch', 'of', 'physiciannurse']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 29%|█████████████████████████▏                                                            | 133/455 [03:05<07:25,  1.38s/ queries, google=81.20%, yours=69.92%]

Query =  food vs energy he
Query =  ['food', 'vs', 'energy', 'he']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 29%|█████████████████████████▎                                                            | 134/455 [03:06<06:37,  1.24s/ queries, google=81.34%, yours=70.15%]

Query =  aegean sea in this well
Query =  ['aegean', 'sea', 'in', 'this', 'well']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 30%|█████████████████████████▌                                                            | 135/455 [03:07<06:38,  1.24s/ queries, google=81.48%, yours=70.37%]

Query =  linguistic information plays
Query =  ['linguistic', 'information', 'plays']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 30%|█████████████████████████▋                                                            | 136/455 [03:08<06:34,  1.24s/ queries, google=81.62%, yours=70.59%]

Query =  on theaper
Query =  ['on', 'theaper']
I =  0
I =  1
dict_keys([0, 1])


 30%|█████████████████████████▉                                                            | 137/455 [03:09<05:04,  1.04 queries/s, google=81.02%, yours=70.07%]

Query =  content related content stanford university
Query =  ['content', 'related', 'content', 'stanford', 'university']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 30%|██████████████████████████                                                            | 138/455 [03:10<06:13,  1.18s/ queries, google=81.16%, yours=70.29%]

Query =  opportunties for motivated grad
Query =  ['opportunties', 'for', 'motivated', 'grad']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 31%|██████████████████████████▎                                                           | 139/455 [03:13<07:45,  1.47s/ queries, google=81.29%, yours=70.50%]

Query =  nhow for our four from
Query =  ['nhow', 'for', 'our', 'four', 'from']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 31%|██████████████████████████▍                                                           | 140/455 [03:13<06:48,  1.30s/ queries, google=80.71%, yours=70.00%]

Query =  thepper arm
Query =  ['thepper', 'arm']
I =  0
I =  1
dict_keys([0, 1])


 31%|██████████████████████████▋                                                           | 141/455 [03:14<05:15,  1.01s/ queries, google=80.14%, yours=69.50%]

Query =  center on food security
Query =  ['center', 'on', 'food', 'security']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 31%|██████████████████████████▊                                                           | 142/455 [03:16<06:37,  1.27s/ queries, google=80.28%, yours=69.72%]

Query =  up messeges are the xerox mouse
Query =  ['up', 'messeges', 'are', 'the', 'xerox', 'mouse']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
dict_keys([0, 1, 2, 3, 4, 5])


 31%|███████████████████████████                                                           | 143/455 [03:18<08:20,  1.60s/ queries, google=80.42%, yours=69.93%]

Query =  many nothave permission to
Query =  ['many', 'nothave', 'permission', 'to']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 32%|███████████████████████████▏                                                          | 144/455 [03:19<07:31,  1.45s/ queries, google=80.56%, yours=69.44%]

Query =  where she manged
Query =  ['where', 'she', 'manged']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 32%|███████████████████████████▍                                                          | 145/455 [03:20<06:11,  1.20s/ queries, google=80.69%, yours=68.97%]

Query =  304669 101719 4063882026 75360
Query =  ['304669', '101719', '4063882026', '75360']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 32%|███████████████████████████▌                                                          | 146/455 [03:21<05:53,  1.15s/ queries, google=80.14%, yours=68.49%]

Query =  football rollerblading tennis program see also
Query =  ['football', 'rollerblading', 'tennis', 'program', 'see', 'also']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
dict_keys([0, 1, 2, 3, 4, 5])


 32%|███████████████████████████▊                                                          | 147/455 [03:23<07:13,  1.41s/ queries, google=80.27%, yours=68.71%]

Query =  data from browser
Query =  ['data', 'from', 'browser']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 33%|███████████████████████████▉                                                          | 148/455 [03:23<05:42,  1.12s/ queries, google=80.41%, yours=68.92%]

Query =  from shaw university in 1927
Query =  ['from', 'shaw', 'university', 'in', '1927']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 33%|████████████████████████████▏                                                         | 149/455 [03:25<06:40,  1.31s/ queries, google=80.54%, yours=69.13%]

Query =  schlors as the
Query =  ['schlors', 'as', 'the']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 33%|████████████████████████████▎                                                         | 150/455 [03:26<05:31,  1.09s/ queries, google=80.67%, yours=69.33%]

Query =  officers join alumni
Query =  ['officers', 'join', 'alumni']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 33%|████████████████████████████▌                                                         | 151/455 [03:26<04:50,  1.05 queries/s, google=80.79%, yours=69.54%]

Query =  cassman pa mattson jin shun
Query =  ['cassman', 'pa', 'mattson', 'jin', 'shun']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 33%|████████████████████████████▋                                                         | 152/455 [03:27<05:11,  1.03s/ queries, google=80.26%, yours=69.74%]

Query =  does not support the
Query =  ['does', 'not', 'support', 'the']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 34%|████████████████████████████▉                                                         | 153/455 [03:28<04:53,  1.03 queries/s, google=80.39%, yours=69.93%]

Query =  group supri d alternative website the body whuch is low
Query =  ['group', 'supri', 'd', 'alternative', 'website', 'the', 'body', 'whuch', 'is', 'low']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
I =  9
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


 34%|█████████████████████████████                                                         | 154/455 [03:32<08:33,  1.71s/ queries, google=80.52%, yours=69.48%]

Query =  ice ph d ice ph
Query =  ['ice', 'ph', 'd', 'ice', 'ph']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 34%|█████████████████████████████▎                                                        | 155/455 [03:33<07:56,  1.59s/ queries, google=80.00%, yours=69.68%]

Query =  as that is the
Query =  ['as', 'that', 'is', 'the']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 34%|█████████████████████████████▍                                                        | 156/455 [03:36<09:18,  1.87s/ queries, google=80.13%, yours=69.87%]

Query =  david l jaffee ms and
Query =  ['david', 'l', 'jaffee', 'ms', 'and']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 35%|█████████████████████████████▋                                                        | 157/455 [03:37<08:05,  1.63s/ queries, google=80.25%, yours=70.06%]

Query =  privilege on the column grantable
Query =  ['privilege', 'on', 'the', 'column', 'grantable']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 35%|█████████████████████████████▊                                                        | 158/455 [03:38<07:45,  1.57s/ queries, google=80.38%, yours=70.25%]

Query =  gamma exposure constant is
Query =  ['gamma', 'exposure', 'constant', 'is']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 35%|██████████████████████████████                                                        | 159/455 [03:39<06:56,  1.41s/ queries, google=80.50%, yours=70.44%]

Query =  market gardans as a
Query =  ['market', 'gardans', 'as', 'a']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 35%|██████████████████████████████▏                                                       | 160/455 [03:40<05:51,  1.19s/ queries, google=80.62%, yours=70.62%]

Query =  may also be of intrest
Query =  ['may', 'also', 'be', 'of', 'intrest']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 35%|██████████████████████████████▍                                                       | 161/455 [03:41<06:06,  1.25s/ queries, google=80.75%, yours=70.81%]

Query =  request form staff directorys
Query =  ['request', 'form', 'staff', 'directorys']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 36%|██████████████████████████████▌                                                       | 162/455 [03:42<05:49,  1.19s/ queries, google=80.25%, yours=70.99%]

Query =  come to more recent university economics departlment stanford center
Query =  ['come', 'to', 'more', 'recent', 'university', 'economics', 'departlment', 'stanford', 'center']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8])


 36%|██████████████████████████████▊                                                       | 163/455 [03:46<09:17,  1.91s/ queries, google=80.37%, yours=71.17%]

Query =  1 academic interview handout
Query =  ['1', 'academic', 'interview', 'handout']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 36%|██████████████████████████████▉                                                       | 164/455 [03:47<08:17,  1.71s/ queries, google=80.49%, yours=71.34%]

Query =  process message re transportation
Query =  ['process', 'message', 're', 'transportation']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 36%|███████████████████████████████▏                                                      | 165/455 [03:49<08:30,  1.76s/ queries, google=80.61%, yours=71.52%]

Query =  aims to provllde users with swrl unified theories+
Query =  ['aims', 'to', 'provllde', 'users', 'with', 'swrl', 'unified', 'theories+']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
dict_keys([0, 1, 2, 3, 4, 5, 6, 7])


 36%|███████████████████████████████▍                                                      | 166/455 [03:51<08:37,  1.79s/ queries, google=80.72%, yours=71.08%]

Query =  the john m olin postings and threads click
Query =  ['the', 'john', 'm', 'olin', 'postings', 'and', 'threads', 'click']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
dict_keys([0, 1, 2, 3, 4, 5, 6, 7])


 37%|███████████████████████████████▌                                                      | 167/455 [03:53<09:29,  1.98s/ queries, google=80.84%, yours=71.26%]

Query =  events tadsahi fukami historical contingency
Query =  ['events', 'tadsahi', 'fukami', 'historical', 'contingency']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 37%|███████████████████████████████▊                                                      | 168/455 [03:55<08:55,  1.87s/ queries, google=80.95%, yours=71.43%]

Query =  list an d index society cd1040 file the verisions with green
Query =  ['list', 'an', 'd', 'index', 'society', 'cd1040', 'file', 'the', 'verisions', 'with', 'green']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
I =  9
I =  10
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])


 37%|███████████████████████████████▉                                                      | 169/455 [04:00<13:53,  2.92s/ queries, google=81.07%, yours=71.01%]

Query =  ish a great tool
Query =  ['ish', 'a', 'great', 'tool']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 37%|████████████████████████████████▏                                                     | 170/455 [04:01<10:26,  2.20s/ queries, google=81.18%, yours=71.18%]

Query =  for ubuntu 11.04 proveding an oppertunity
Query =  ['for', 'ubuntu', '11.04', 'proveding', 'an', 'oppertunity']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
dict_keys([0, 1, 2, 3, 4, 5])


 38%|████████████████████████████████▎                                                     | 171/455 [04:02<09:34,  2.02s/ queries, google=81.29%, yours=71.35%]

Query =  the cdd a social
Query =  ['the', 'cdd', 'a', 'social']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 38%|████████████████████████████████▌                                                     | 172/455 [04:03<07:54,  1.68s/ queries, google=81.40%, yours=71.51%]

Query =  4581 fad 650 725 2592
Query =  ['4581', 'fad', '650', '725', '2592']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 38%|████████████████████████████████▋                                                     | 173/455 [04:04<06:58,  1.48s/ queries, google=80.92%, yours=71.68%]

Query =  of newpor and
Query =  ['of', 'newpor', 'and']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 38%|████████████████████████████████▉                                                     | 174/455 [04:05<05:26,  1.16s/ queries, google=81.03%, yours=71.26%]

Query =  morabito australian unions the
Query =  ['morabito', 'australian', 'unions', 'the']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 38%|█████████████████████████████████                                                     | 175/455 [04:06<05:20,  1.14s/ queries, google=81.14%, yours=71.43%]

Query =  members all pertinent information that
Query =  ['members', 'all', 'pertinent', 'information', 'that']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 39%|█████████████████████████████████▎                                                    | 176/455 [04:09<08:59,  1.93s/ queries, google=81.25%, yours=71.59%]

Query =  on call rooms graduate medical
Query =  ['on', 'call', 'rooms', 'graduate', 'medical']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 39%|█████████████████████████████████▍                                                    | 177/455 [04:11<08:10,  1.76s/ queries, google=81.36%, yours=71.75%]

Query =  to run the
Query =  ['to', 'run', 'the']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 39%|█████████████████████████████████▋                                                    | 178/455 [04:11<06:30,  1.41s/ queries, google=81.46%, yours=71.91%]

Query =  data from the browser's
Query =  ['data', 'from', 'the', "browser's"]
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 39%|█████████████████████████████████▊                                                    | 179/455 [04:12<05:33,  1.21s/ queries, google=81.56%, yours=72.07%]

Query =  the wind of fredoom
Query =  ['the', 'wind', 'of', 'fredoom']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 40%|██████████████████████████████████                                                    | 180/455 [04:13<05:00,  1.09s/ queries, google=81.67%, yours=72.22%]

Query =  provided throughout this article to
Query =  ['provided', 'throughout', 'this', 'article', 'to']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 40%|██████████████████████████████████▏                                                   | 181/455 [04:15<05:49,  1.28s/ queries, google=81.77%, yours=72.38%]

Query =  579 sorra mall stanfor ca
Query =  ['579', 'sorra', 'mall', 'stanfor', 'ca']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 40%|██████████████████████████████████▍                                                   | 182/455 [04:16<05:42,  1.25s/ queries, google=81.87%, yours=72.53%]

Query =  often the exit angle is
Query =  ['often', 'the', 'exit', 'angle', 'is']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 40%|██████████████████████████████████▌                                                   | 183/455 [04:17<05:26,  1.20s/ queries, google=81.97%, yours=72.68%]

Query =  all postings outline chose
Query =  ['all', 'postings', 'outline', 'chose']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 40%|██████████████████████████████████▊                                                   | 184/455 [04:18<04:55,  1.09s/ queries, google=81.52%, yours=72.83%]

Query =  aperson contact us
Query =  ['aperson', 'contact', 'us']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 41%|██████████████████████████████████▉                                                   | 185/455 [04:20<05:53,  1.31s/ queries, google=81.62%, yours=72.43%]

Query =  navigational testdirectory news center
Query =  ['navigational', 'testdirectory', 'news', 'center']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 41%|███████████████████████████████████▏                                                  | 186/455 [04:22<07:12,  1.61s/ queries, google=81.72%, yours=72.04%]

Query =  failure of viral capsids 2
Query =  ['failure', 'of', 'viral', 'capsids', '2']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 41%|███████████████████████████████████▎                                                  | 187/455 [04:24<07:49,  1.75s/ queries, google=81.82%, yours=72.19%]

Query =  stanford graduate school of business
Query =  ['stanford', 'graduate', 'school', 'of', 'business']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 41%|███████████████████████████████████▌                                                  | 188/455 [04:26<07:27,  1.67s/ queries, google=81.91%, yours=72.34%]

Query =  douglsas k owens
Query =  ['douglsas', 'k', 'owens']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 42%|███████████████████████████████████▋                                                  | 189/455 [04:26<06:00,  1.35s/ queries, google=82.01%, yours=72.49%]

Query =  1 recent comments
Query =  ['1', 'recent', 'comments']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 42%|███████████████████████████████████▉                                                  | 190/455 [04:27<05:07,  1.16s/ queries, google=82.11%, yours=72.63%]

Query =  won t talk to them
Query =  ['won', 't', 'talk', 'to', 'them']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 42%|████████████████████████████████████                                                  | 191/455 [04:28<04:44,  1.08s/ queries, google=81.68%, yours=72.77%]

Query =  data simulated data are
Query =  ['data', 'simulated', 'data', 'are']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 42%|████████████████████████████████████▎                                                 | 192/455 [04:29<04:29,  1.03s/ queries, google=81.77%, yours=72.92%]

Query =  cover letters interviewing strategies on
Query =  ['cover', 'letters', 'interviewing', 'strategies', 'on']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 42%|████████████████████████████████████▍                                                 | 193/455 [04:30<05:27,  1.25s/ queries, google=81.35%, yours=72.54%]

Query =  like for you
Query =  ['like', 'for', 'you']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 43%|████████████████████████████████████▋                                                 | 194/455 [04:31<04:29,  1.03s/ queries, google=81.44%, yours=72.68%]

Query =  is due novenber typeset every book on buddism
Query =  ['is', 'due', 'novenber', 'typeset', 'every', 'book', 'on', 'buddism']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
dict_keys([0, 1, 2, 3, 4, 5, 6, 7])


 43%|████████████████████████████████████▊                                                 | 195/455 [04:33<05:59,  1.38s/ queries, google=81.54%, yours=72.82%]

Query =  cm2 g total 0.16498 cm2
Query =  ['cm2', 'g', 'total', '0.16498', 'cm2']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 43%|█████████████████████████████████████                                                 | 196/455 [04:34<05:15,  1.22s/ queries, google=81.63%, yours=72.96%]

Query =  technological inovation social
Query =  ['technological', 'inovation', 'social']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 43%|█████████████████████████████████████▏                                                | 197/455 [04:35<05:25,  1.26s/ queries, google=81.73%, yours=73.10%]

Query =  2003 director human bilolgy program
Query =  ['2003', 'director', 'human', 'bilolgy', 'program']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 44%|█████████████████████████████████████▍                                                | 198/455 [04:36<05:10,  1.21s/ queries, google=81.82%, yours=73.23%]

Query =  mus sic links suggest a purchase
Query =  ['mus', 'sic', 'links', 'suggest', 'a', 'purchase']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
dict_keys([0, 1, 2, 3, 4, 5])


 44%|█████████████████████████████████████▌                                                | 199/455 [04:38<05:29,  1.29s/ queries, google=81.91%, yours=72.86%]

Query =  cite this send
Query =  ['cite', 'this', 'send']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 44%|█████████████████████████████████████▊                                                | 200/455 [04:38<04:18,  1.01s/ queries, google=82.00%, yours=73.00%]

Query =  editing hints using
Query =  ['editing', 'hints', 'using']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 44%|█████████████████████████████████████▉                                                | 201/455 [04:39<03:39,  1.16 queries/s, google=82.09%, yours=73.13%]

Query =  subject simin aneshvar
Query =  ['subject', 'simin', 'aneshvar']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 44%|██████████████████████████████████████▏                                               | 202/455 [04:39<03:24,  1.24 queries/s, google=82.18%, yours=73.27%]

Query =  and image date
Query =  ['and', 'image', 'date']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 45%|██████████████████████████████████████▎                                               | 203/455 [04:40<02:58,  1.41 queries/s, google=81.77%, yours=72.91%]

Query =  of classics standford univeristy logo
Query =  ['of', 'classics', 'standford', 'univeristy', 'logo']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 45%|██████████████████████████████████████▌                                               | 204/455 [04:41<04:02,  1.04 queries/s, google=81.86%, yours=72.55%]

Query =  programs grants & fellowships people
Query =  ['programs', 'grants', '&', 'fellowships', 'people']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 45%|██████████████████████████████████████▋                                               | 205/455 [04:43<04:33,  1.09s/ queries, google=81.95%, yours=72.68%]

Query =  guiseppe nardulli hep ph 0111178
Query =  ['guiseppe', 'nardulli', 'hep', 'ph', '0111178']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 45%|██████████████████████████████████████▉                                               | 206/455 [04:44<04:35,  1.11s/ queries, google=82.04%, yours=72.82%]

Query =  fsi centers & programme the text of the postings
Query =  ['fsi', 'centers', '&', 'programme', 'the', 'text', 'of', 'the', 'postings']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8])


 45%|███████████████████████████████████████▏                                              | 207/455 [04:47<06:59,  1.69s/ queries, google=82.13%, yours=72.95%]

Query =  21 201204 15
Query =  ['21', '201204', '15']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 46%|███████████████████████████████████████▎                                              | 208/455 [04:48<05:28,  1.33s/ queries, google=81.73%, yours=72.60%]

Query =  from febuary 4 2012
Query =  ['from', 'febuary', '4', '2012']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 46%|███████████████████████████████████████▌                                              | 209/455 [04:49<05:05,  1.24s/ queries, google=81.82%, yours=72.25%]

Query =  also taught nuclear energy
Query =  ['also', 'taught', 'nuclear', 'energy']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 46%|███████████████████████████████████████▋                                              | 210/455 [04:49<04:40,  1.14s/ queries, google=81.90%, yours=72.38%]

Query =  for distribution at
Query =  ['for', 'distribution', 'at']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 46%|███████████████████████████████████████▉                                              | 211/455 [04:51<05:17,  1.30s/ queries, google=81.99%, yours=72.51%]

Query =  2 2x x
Query =  ['2', '2x', 'x']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 47%|████████████████████████████████████████                                              | 212/455 [04:52<04:07,  1.02s/ queries, google=82.08%, yours=72.17%]

Query =  account s will
Query =  ['account', 's', 'will']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 47%|████████████████████████████████████████▎                                             | 213/455 [04:52<03:53,  1.04 queries/s, google=82.16%, yours=72.30%]

Query =  unfortunately while lay users can
Query =  ['unfortunately', 'while', 'lay', 'users', 'can']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 47%|████████████████████████████████████████▍                                             | 214/455 [04:54<05:00,  1.25s/ queries, google=82.24%, yours=72.43%]

Query =  on facebppk share on twitter
Query =  ['on', 'facebppk', 'share', 'on', 'twitter']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 47%|████████████████████████████████████████▋                                             | 215/455 [04:56<05:11,  1.30s/ queries, google=82.33%, yours=72.56%]

Query =  ca 94305 650 329 8566
Query =  ['ca', '94305', '650', '329', '8566']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 47%|████████████████████████████████████████▊                                             | 216/455 [04:57<05:32,  1.39s/ queries, google=82.41%, yours=72.69%]

Query =  the numbwe to
Query =  ['the', 'numbwe', 'to']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 48%|█████████████████████████████████████████                                             | 217/455 [04:58<04:17,  1.08s/ queries, google=82.49%, yours=72.81%]

Query =  very interested in worknig with
Query =  ['very', 'interested', 'in', 'worknig', 'with']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 48%|█████████████████████████████████████████▏                                            | 218/455 [04:59<04:36,  1.17s/ queries, google=82.57%, yours=72.94%]

Query =  onsomewhat cooincidentally for
Query =  ['onsomewhat', 'cooincidentally', 'for']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 48%|█████████████████████████████████████████▍                                            | 219/455 [05:01<05:16,  1.34s/ queries, google=82.19%, yours=72.60%]

Query =  mail code phone fax e r staf list maps
Query =  ['mail', 'code', 'phone', 'fax', 'e', 'r', 'staf', 'list', 'maps']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8])


 48%|█████████████████████████████████████████▌                                            | 220/455 [05:04<07:03,  1.80s/ queries, google=81.82%, yours=72.27%]

Query =  my wacom graphire
Query =  ['my', 'wacom', 'graphire']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 49%|█████████████████████████████████████████▊                                            | 221/455 [05:04<05:34,  1.43s/ queries, google=81.90%, yours=72.40%]

Query =  which are abstract
Query =  ['which', 'are', 'abstract']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 49%|█████████████████████████████████████████▉                                            | 222/455 [05:05<04:46,  1.23s/ queries, google=81.98%, yours=72.52%]

Query =  & institutes professor health research science the vast majority of
Query =  ['&', 'institutes', 'professor', 'health', 'research', 'science', 'the', 'vast', 'majority', 'of']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
I =  9
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


 49%|██████████████████████████████████████████▏                                           | 223/455 [05:10<08:44,  2.26s/ queries, google=82.06%, yours=72.65%]

Query =  guides presentations recommendations and reports
Query =  ['guides', 'presentations', 'recommendations', 'and', 'reports']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 49%|██████████████████████████████████████████▎                                           | 224/455 [05:13<10:19,  2.68s/ queries, google=82.14%, yours=72.77%]

Query =  for bflb hypernews
Query =  ['for', 'bflb', 'hypernews']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 49%|██████████████████████████████████████████▌                                           | 225/455 [05:14<07:59,  2.09s/ queries, google=82.22%, yours=72.89%]

Query =  for one thiing
Query =  ['for', 'one', 'thiing']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 50%|██████████████████████████████████████████▋                                           | 226/455 [05:15<06:23,  1.68s/ queries, google=82.30%, yours=73.01%]

Query =  cccrma stadford edu tue sept
Query =  ['cccrma', 'stadford', 'edu', 'tue', 'sept']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 50%|██████████████████████████████████████████▉                                           | 227/455 [05:16<05:35,  1.47s/ queries, google=82.38%, yours=72.69%]

Query =  david a reis fisherds
Query =  ['david', 'a', 'reis', 'fisherds']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 50%|███████████████████████████████████████████                                           | 228/455 [05:16<04:45,  1.26s/ queries, google=82.46%, yours=72.81%]

Query =  managment group name email address
Query =  ['managment', 'group', 'name', 'email', 'address']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 50%|███████████████████████████████████████████▎                                          | 229/455 [05:18<05:06,  1.35s/ queries, google=82.53%, yours=72.49%]

Query =  thanks manju sudakar inline depth
Query =  ['thanks', 'manju', 'sudakar', 'inline', 'depth']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 51%|███████████████████████████████████████████▍                                          | 230/455 [05:19<04:41,  1.25s/ queries, google=82.61%, yours=72.61%]

Query =  and services that focus standford univestiy all
Query =  ['and', 'services', 'that', 'focus', 'standford', 'univestiy', 'all']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
dict_keys([0, 1, 2, 3, 4, 5, 6])


 51%|███████████████████████████████████████████▋                                          | 231/455 [05:21<05:50,  1.56s/ queries, google=82.68%, yours=72.29%]

Query =  the london school
Query =  ['the', 'london', 'school']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 51%|███████████████████████████████████████████▊                                          | 232/455 [05:22<04:44,  1.27s/ queries, google=82.76%, yours=72.41%]

Query =  chen ph d staff
Query =  ['chen', 'ph', 'd', 'staff']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 51%|████████████████████████████████████████████                                          | 233/455 [05:23<04:08,  1.12s/ queries, google=82.83%, yours=72.53%]

Query =  11 the hound
Query =  ['11', 'the', 'hound']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 51%|████████████████████████████████████████████▏                                         | 234/455 [05:24<04:10,  1.14s/ queries, google=82.91%, yours=72.65%]

Query =  service eating contest given
Query =  ['service', 'eating', 'contest', 'given']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 52%|████████████████████████████████████████████▍                                         | 235/455 [05:25<04:01,  1.10s/ queries, google=82.98%, yours=72.77%]

Query =  2008 standford local programming contest
Query =  ['2008', 'standford', 'local', 'programming', 'contest']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 52%|████████████████████████████████████████████▌                                         | 236/455 [05:26<04:32,  1.24s/ queries, google=83.05%, yours=72.46%]

Query =  intellectual property enforcement coordinator on
Query =  ['intellectual', 'property', 'enforcement', 'coordinator', 'on']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 52%|████████████████████████████████████████████▊                                         | 237/455 [05:29<06:22,  1.75s/ queries, google=83.12%, yours=72.57%]

Query =  your account has benn randomly
Query =  ['your', 'account', 'has', 'benn', 'randomly']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 52%|████████████████████████████████████████████▉                                         | 238/455 [05:32<07:02,  1.95s/ queries, google=83.19%, yours=72.69%]

Query =  1.00 0.00 1.00
Query =  ['1.00', '0.00', '1.00']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 53%|█████████████████████████████████████████████▏                                        | 239/455 [05:32<05:30,  1.53s/ queries, google=83.26%, yours=72.80%]

Query =  interfaces user and admin users address book add names
Query =  ['interfaces', 'user', 'and', 'admin', 'users', 'address', 'book', 'add', 'names']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8])


 53%|█████████████████████████████████████████████▎                                        | 240/455 [05:35<07:08,  1.99s/ queries, google=83.33%, yours=72.92%]

Query =  same webside before that edu stanford university 425
Query =  ['same', 'webside', 'before', 'that', 'edu', 'stanford', 'university', '425']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
dict_keys([0, 1, 2, 3, 4, 5, 6, 7])


 53%|█████████████████████████████████████████████▌                                        | 241/455 [05:39<08:55,  2.50s/ queries, google=83.40%, yours=72.61%]

Query =  13 ho el as
Query =  ['13', 'ho', 'el', 'as']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 53%|█████████████████████████████████████████████▋                                        | 242/455 [05:40<07:27,  2.10s/ queries, google=83.06%, yours=72.31%]

Query =  the posting thread
Query =  ['the', 'posting', 'thread']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 53%|█████████████████████████████████████████████▉                                        | 243/455 [05:41<06:19,  1.79s/ queries, google=83.13%, yours=72.43%]

Query =  publications send by
Query =  ['publications', 'send', 'by']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 54%|██████████████████████████████████████████████                                        | 244/455 [05:43<06:24,  1.82s/ queries, google=83.20%, yours=72.54%]

Query =  http you could try
Query =  ['http', 'you', 'could', 'try']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 54%|██████████████████████████████████████████████▎                                       | 245/455 [05:44<05:14,  1.50s/ queries, google=83.27%, yours=72.65%]

Query =  facilty profile content provider
Query =  ['facilty', 'profile', 'content', 'provider']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 54%|██████████████████████████████████████████████▍                                       | 246/455 [05:45<05:11,  1.49s/ queries, google=83.33%, yours=72.76%]

Query =  36 bit 18
Query =  ['36', 'bit', '18']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 54%|██████████████████████████████████████████████▋                                       | 247/455 [05:46<04:13,  1.22s/ queries, google=83.40%, yours=72.87%]

Query =  he has wroked on
Query =  ['he', 'has', 'wroked', 'on']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 55%|██████████████████████████████████████████████▊                                       | 248/455 [05:48<04:30,  1.31s/ queries, google=83.47%, yours=72.98%]

Query =  academic calendar masters
Query =  ['academic', 'calendar', 'masters']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 55%|███████████████████████████████████████████████                                       | 249/455 [05:49<04:23,  1.28s/ queries, google=83.13%, yours=72.69%]

Query =  3 downloaded 23 feb
Query =  ['3', 'downloaded', '23', 'feb']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 55%|███████████████████████████████████████████████▎                                      | 250/455 [05:51<05:02,  1.48s/ queries, google=83.20%, yours=72.80%]

Query =  g4system gmk were can i
Query =  ['g4system', 'gmk', 'were', 'can', 'i']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 55%|███████████████████████████████████████████████▍                                      | 251/455 [05:52<05:14,  1.54s/ queries, google=82.87%, yours=72.91%]

Query =  page which contains only the
Query =  ['page', 'which', 'contains', 'only', 'the']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 55%|███████████████████████████████████████████████▋                                      | 252/455 [05:55<06:06,  1.80s/ queries, google=82.94%, yours=73.02%]

Query =  none unselect all of ibn sina a critical
Query =  ['none', 'unselect', 'all', 'of', 'ibn', 'sina', 'a', 'critical']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
dict_keys([0, 1, 2, 3, 4, 5, 6, 7])


 56%|███████████████████████████████████████████████▊                                      | 253/455 [05:57<06:52,  2.04s/ queries, google=83.00%, yours=73.12%]

Query =  machenery and intelligence
Query =  ['machenery', 'and', 'intelligence']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 56%|████████████████████████████████████████████████                                      | 254/455 [05:59<06:08,  1.83s/ queries, google=83.07%, yours=73.23%]

Query =  archive colophon admin logon
Query =  ['archive', 'colophon', 'admin', 'logon']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 56%|████████████████████████████████████████████████▏                                     | 255/455 [06:00<05:28,  1.64s/ queries, google=82.75%, yours=73.33%]

Query =  sulair home su home suspect stanford stanford university
Query =  ['sulair', 'home', 'su', 'home', 'suspect', 'stanford', 'stanford', 'university']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
dict_keys([0, 1, 2, 3, 4, 5, 6, 7])


 56%|████████████████████████████████████████████████▍                                     | 256/455 [06:04<07:31,  2.27s/ queries, google=82.81%, yours=73.05%]

Query =  spam and virus filtering software
Query =  ['spam', 'and', 'virus', 'filtering', 'software']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 56%|████████████████████████████████████████████████▌                                     | 257/455 [06:06<07:14,  2.19s/ queries, google=82.88%, yours=73.15%]

Query =  process note 1 fr students
Query =  ['process', 'note', '1', 'fr', 'students']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 57%|████████████████████████████████████████████████▊                                     | 258/455 [06:08<07:39,  2.33s/ queries, google=82.95%, yours=72.87%]

Query =  research overview school
Query =  ['research', 'overview', 'school']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 57%|████████████████████████████████████████████████▉                                     | 259/455 [06:09<06:10,  1.89s/ queries, google=83.01%, yours=72.97%]

Query =  deep belowe the
Query =  ['deep', 'belowe', 'the']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 57%|█████████████████████████████████████████████████▏                                    | 260/455 [06:10<04:51,  1.49s/ queries, google=83.08%, yours=73.08%]

Query =  i can change things for
Query =  ['i', 'can', 'change', 'things', 'for']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 57%|█████████████████████████████████████████████████▎                                    | 261/455 [06:11<04:41,  1.45s/ queries, google=83.14%, yours=73.18%]

Query =  similuation our long
Query =  ['similuation', 'our', 'long']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 58%|█████████████████████████████████████████████████▌                                    | 262/455 [06:13<05:10,  1.61s/ queries, google=83.21%, yours=73.28%]

Query =  give raise to severe emittance babar database who's
Query =  ['give', 'raise', 'to', 'severe', 'emittance', 'babar', 'database', "who's"]
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
dict_keys([0, 1, 2, 3, 4, 5, 6, 7])


 58%|█████████████████████████████████████████████████▋                                    | 263/455 [06:16<05:57,  1.86s/ queries, google=82.89%, yours=73.00%]

Query =  page 1 moran bercovici advisorzluan
Query =  ['page', '1', 'moran', 'bercovici', 'advisorzluan']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 58%|█████████████████████████████████████████████████▉                                    | 264/455 [06:18<06:18,  1.98s/ queries, google=82.58%, yours=73.11%]

Query =  record lenght the sited together with
Query =  ['record', 'lenght', 'the', 'sited', 'together', 'with']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
dict_keys([0, 1, 2, 3, 4, 5])


 58%|██████████████████████████████████████████████████                                    | 265/455 [06:20<06:22,  2.01s/ queries, google=82.26%, yours=73.21%]

Query =  abstracts xx international linac
Query =  ['abstracts', 'xx', 'international', 'linac']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 58%|██████████████████████████████████████████████████▎                                   | 266/455 [06:22<06:29,  2.06s/ queries, google=82.33%, yours=73.31%]

Query =  the physics department crimefighting organization
Query =  ['the', 'physics', 'department', 'crimefighting', 'organization']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 59%|██████████████████████████████████████████████████▍                                   | 267/455 [06:25<07:11,  2.29s/ queries, google=82.40%, yours=73.03%]

Query =  health improvement progrma stanford medicine
Query =  ['health', 'improvement', 'progrma', 'stanford', 'medicine']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 59%|██████████████████████████████████████████████████▋                                   | 268/455 [06:27<06:43,  2.16s/ queries, google=82.46%, yours=73.13%]

Query =  x eido design
Query =  ['x', 'eido', 'design']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 59%|██████████████████████████████████████████████████▊                                   | 269/455 [06:27<05:17,  1.71s/ queries, google=82.16%, yours=73.23%]

Query =  xi violinist jiaotung university
Query =  ['xi', 'violinist', 'jiaotung', 'university']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 59%|███████████████████████████████████████████████████                                   | 270/455 [06:29<05:07,  1.66s/ queries, google=82.22%, yours=73.33%]

Query =  please mailchecks made out
Query =  ['please', 'mailchecks', 'made', 'out']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 60%|███████████████████████████████████████████████████▏                                  | 271/455 [06:30<04:47,  1.56s/ queries, google=82.29%, yours=73.06%]

Query =  chalenges than last year
Query =  ['chalenges', 'than', 'last', 'year']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 60%|███████████████████████████████████████████████████▍                                  | 272/455 [06:32<04:26,  1.46s/ queries, google=82.35%, yours=72.79%]

Query =  safe rosamond l naylor george
Query =  ['safe', 'rosamond', 'l', 'naylor', 'george']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 60%|███████████████████████████████████████████████████▌                                  | 273/455 [06:33<04:27,  1.47s/ queries, google=82.42%, yours=72.89%]

Query =  7252592 mail code contact us to
Query =  ['7252592', 'mail', 'code', 'contact', 'us', 'to']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
dict_keys([0, 1, 2, 3, 4, 5])


 60%|███████████████████████████████████████████████████▊                                  | 274/455 [06:36<05:37,  1.87s/ queries, google=82.12%, yours=72.63%]

Query =  clase sroom for instructors
Query =  ['clase', 'sroom', 'for', 'instructors']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 60%|███████████████████████████████████████████████████▉                                  | 275/455 [06:37<05:01,  1.68s/ queries, google=82.18%, yours=72.36%]

Query =  via the kerr affect and
Query =  ['via', 'the', 'kerr', 'affect', 'and']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 61%|████████████████████████████████████████████████████▏                                 | 276/455 [06:38<04:43,  1.58s/ queries, google=82.25%, yours=72.46%]

Query =  plasmid & puhe24
Query =  ['plasmid', '&', 'puhe24']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 61%|████████████████████████████████████████████████████▎                                 | 277/455 [06:39<04:01,  1.36s/ queries, google=82.31%, yours=72.56%]

Query =  students graduate students undergraduates
Query =  ['students', 'graduate', 'students', 'undergraduates']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 61%|████████████████████████████████████████████████████▌                                 | 278/455 [06:41<04:30,  1.53s/ queries, google=82.37%, yours=72.66%]

Query =  items all day
Query =  ['items', 'all', 'day']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 61%|████████████████████████████████████████████████████▋                                 | 279/455 [06:42<03:34,  1.22s/ queries, google=82.44%, yours=72.76%]

Query =  completing a post doctoral
Query =  ['completing', 'a', 'post', 'doctoral']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 62%|████████████████████████████████████████████████████▉                                 | 280/455 [06:43<04:01,  1.38s/ queries, google=82.50%, yours=72.50%]

Query =  his her particular aread of
Query =  ['his', 'her', 'particular', 'aread', 'of']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 62%|█████████████████████████████████████████████████████                                 | 281/455 [06:46<04:47,  1.65s/ queries, google=82.56%, yours=72.24%]

Query =  the abliity to
Query =  ['the', 'abliity', 'to']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 62%|█████████████████████████████████████████████████████▎                                | 282/455 [06:46<03:44,  1.30s/ queries, google=82.62%, yours=72.34%]

Query =  politicans officials and academics
Query =  ['politicans', 'officials', 'and', 'academics']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 62%|█████████████████████████████████████████████████████▍                                | 283/455 [06:48<04:02,  1.41s/ queries, google=82.69%, yours=72.08%]

Query =  expressions library lip synch
Query =  ['expressions', 'library', 'lip', 'synch']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 62%|█████████████████████████████████████████████████████▋                                | 284/455 [06:50<04:20,  1.52s/ queries, google=82.75%, yours=71.83%]

Query =  the work was available
Query =  ['the', 'work', 'was', 'available']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 63%|█████████████████████████████████████████████████████▊                                | 285/455 [06:52<04:59,  1.76s/ queries, google=82.81%, yours=71.93%]

Query =  d ivoir croatia cuba
Query =  ['d', 'ivoir', 'croatia', 'cuba']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 63%|██████████████████████████████████████████████████████                                | 286/455 [06:52<03:54,  1.39s/ queries, google=82.52%, yours=72.03%]

Query =  the origiinal spirit
Query =  ['the', 'origiinal', 'spirit']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 63%|██████████████████████████████████████████████████████▏                               | 287/455 [06:54<03:41,  1.32s/ queries, google=82.58%, yours=72.13%]

Query =  first page preveous 2009 02 01 author
Query =  ['first', 'page', 'preveous', '2009', '02', '01', 'author']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
dict_keys([0, 1, 2, 3, 4, 5, 6])


 63%|██████████████████████████████████████████████████████▍                               | 288/455 [06:56<04:50,  1.74s/ queries, google=82.64%, yours=71.88%]

Query =  the reefcheck california monitoring
Query =  ['the', 'reefcheck', 'california', 'monitoring']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 64%|██████████████████████████████████████████████████████▌                               | 289/455 [06:58<04:40,  1.69s/ queries, google=82.70%, yours=71.63%]

Query =  tressider summer activities fair
Query =  ['tressider', 'summer', 'activities', 'fair']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 64%|██████████████████████████████████████████████████████▊                               | 290/455 [07:00<04:36,  1.68s/ queries, google=82.41%, yours=71.72%]

Query =  is so fars cs379c computation models
Query =  ['is', 'so', 'fars', 'cs379c', 'computation', 'models']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
dict_keys([0, 1, 2, 3, 4, 5])


 64%|███████████████████████████████████████████████████████                               | 291/455 [07:02<05:00,  1.83s/ queries, google=82.13%, yours=71.48%]

Query =  and sustainble develpment 2010 pdf+
Query =  ['and', 'sustainble', 'develpment', '2010', 'pdf+']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 64%|███████████████████████████████████████████████████████▏                              | 292/455 [07:05<05:48,  2.14s/ queries, google=81.85%, yours=71.58%]

Query =  health scholars program
Query =  ['health', 'scholars', 'program']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 64%|███████████████████████████████████████████████████████▍                              | 293/455 [07:05<04:38,  1.72s/ queries, google=81.91%, yours=71.67%]

Query =  we help maps
Query =  ['we', 'help', 'maps']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 65%|███████████████████████████████████████████████████████▌                              | 294/455 [07:06<03:37,  1.35s/ queries, google=81.97%, yours=71.77%]

Query =  public evens page on this
Query =  ['public', 'evens', 'page', 'on', 'this']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 65%|███████████████████████████████████████████████████████▊                              | 295/455 [07:07<03:25,  1.29s/ queries, google=82.03%, yours=71.53%]

Query =  graduate school of business news
Query =  ['graduate', 'school', 'of', 'business', 'news']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 65%|███████████████████████████████████████████████████████▉                              | 296/455 [07:08<03:29,  1.32s/ queries, google=82.09%, yours=71.62%]

Query =  the receptiors might rev up
Query =  ['the', 'receptiors', 'might', 'rev', 'up']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 65%|████████████████████████████████████████████████████████▏                             | 297/455 [07:09<03:18,  1.25s/ queries, google=82.15%, yours=71.72%]

Query =  of education freeman spogli institute
Query =  ['of', 'education', 'freeman', 'spogli', 'institute']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 65%|████████████████████████████████████████████████████████▎                             | 298/455 [07:11<03:48,  1.46s/ queries, google=82.21%, yours=71.81%]

Query =  the enviroment fsi hasked to define
Query =  ['the', 'enviroment', 'fsi', 'hasked', 'to', 'define']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
dict_keys([0, 1, 2, 3, 4, 5])


 66%|████████████████████████████████████████████████████████▌                             | 299/455 [07:13<03:57,  1.52s/ queries, google=82.27%, yours=71.91%]

Query =  stanford califorina 94305
Query =  ['stanford', 'califorina', '94305']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 66%|████████████████████████████████████████████████████████▋                             | 300/455 [07:14<03:26,  1.33s/ queries, google=82.33%, yours=72.00%]

Query =  webinars will be
Query =  ['webinars', 'will', 'be']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 66%|████████████████████████████████████████████████████████▉                             | 301/455 [07:15<02:55,  1.14s/ queries, google=82.39%, yours=72.09%]

Query =  pro vost and director of
Query =  ['pro', 'vost', 'and', 'director', 'of']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 66%|█████████████████████████████████████████████████████████                             | 302/455 [07:17<03:41,  1.45s/ queries, google=82.45%, yours=71.85%]

Query =  the sun's heartbeat to
Query =  ['the', "sun's", 'heartbeat', 'to']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 67%|█████████████████████████████████████████████████████████▎                            | 303/455 [07:19<04:03,  1.60s/ queries, google=82.51%, yours=71.95%]

Query =  mikiphone pocket phonogtaph
Query =  ['mikiphone', 'pocket', 'phonogtaph']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 67%|█████████████████████████████████████████████████████████▍                            | 304/455 [07:20<03:35,  1.43s/ queries, google=82.57%, yours=72.04%]

Query =  1152 email pacrc
Query =  ['1152', 'email', 'pacrc']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 67%|█████████████████████████████████████████████████████████▋                            | 305/455 [07:20<02:55,  1.17s/ queries, google=82.62%, yours=72.13%]

Query =  catapulted both king and
Query =  ['catapulted', 'both', 'king', 'and']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 67%|█████████████████████████████████████████████████████████▊                            | 306/455 [07:21<02:50,  1.14s/ queries, google=82.68%, yours=72.22%]

Query =  east europe &
Query =  ['east', 'europe', '&']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 67%|██████████████████████████████████████████████████████████                            | 307/455 [07:22<02:27,  1.00 queries/s, google=82.41%, yours=71.99%]

Query =  admissions continueing medical education
Query =  ['admissions', 'continueing', 'medical', 'education']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 68%|██████████████████████████████████████████████████████████▏                           | 308/455 [07:24<02:58,  1.21s/ queries, google=82.47%, yours=72.08%]

Query =  record for tiney
Query =  ['record', 'for', 'tiney']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 68%|██████████████████████████████████████████████████████████▍                           | 309/455 [07:25<03:07,  1.29s/ queries, google=82.52%, yours=71.84%]

Query =  the specified value you can
Query =  ['the', 'specified', 'value', 'you', 'can']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 68%|██████████████████████████████████████████████████████████▌                           | 310/455 [07:26<03:01,  1.25s/ queries, google=82.58%, yours=71.94%]

Query =  with sonar sensors in populate
Query =  ['with', 'sonar', 'sensors', 'in', 'populate']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 68%|██████████████████████████████████████████████████████████▊                           | 311/455 [07:27<02:49,  1.18s/ queries, google=82.32%, yours=71.70%]

Query =  of the yeard
Query =  ['of', 'the', 'yeard']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 69%|██████████████████████████████████████████████████████████▉                           | 312/455 [07:29<03:05,  1.29s/ queries, google=82.37%, yours=71.47%]

Query =  have already entered
Query =  ['have', 'already', 'entered']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 69%|███████████████████████████████████████████████████████████▏                          | 313/455 [07:30<02:37,  1.11s/ queries, google=82.43%, yours=71.57%]

Query =  translation of these new
Query =  ['translation', 'of', 'these', 'new']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 69%|███████████████████████████████████████████████████████████▎                          | 314/455 [07:32<03:22,  1.44s/ queries, google=82.48%, yours=71.66%]

Query =  models and conditional estimtion without
Query =  ['models', 'and', 'conditional', 'estimtion', 'without']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 69%|███████████████████████████████████████████████████████████▌                          | 315/455 [07:34<03:35,  1.54s/ queries, google=82.54%, yours=71.75%]

Query =  with a glance cast at
Query =  ['with', 'a', 'glance', 'cast', 'at']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 69%|███████████████████████████████████████████████████████████▋                          | 316/455 [07:35<03:38,  1.57s/ queries, google=82.59%, yours=71.84%]

Query =  links to some menus versa in this
Query =  ['links', 'to', 'some', 'menus', 'versa', 'in', 'this']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
dict_keys([0, 1, 2, 3, 4, 5, 6])


 70%|███████████████████████████████████████████████████████████▉                          | 317/455 [07:38<04:05,  1.78s/ queries, google=82.65%, yours=71.92%]

Query =  pta for wich classified collated compared
Query =  ['pta', 'for', 'wich', 'classified', 'collated', 'compared']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
dict_keys([0, 1, 2, 3, 4, 5])


 70%|████████████████████████████████████████████████████████████                          | 318/455 [07:39<04:00,  1.76s/ queries, google=82.70%, yours=71.70%]

Query =  drive at musuem way
Query =  ['drive', 'at', 'musuem', 'way']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 70%|████████████████████████████████████████████████████████████▎                         | 319/455 [07:40<03:17,  1.45s/ queries, google=82.76%, yours=71.79%]

Query =  president's day monday february 20
Query =  ["president's", 'day', 'monday', 'february', '20']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 70%|████████████████████████████████████████████████████████████▍                         | 320/455 [07:41<03:15,  1.45s/ queries, google=82.81%, yours=71.56%]

Query =  791 institute stanford
Query =  ['791', 'institute', 'stanford']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 71%|████████████████████████████████████████████████████████████▋                         | 321/455 [07:42<02:49,  1.27s/ queries, google=82.55%, yours=71.34%]

Query =  or anyof
Query =  ['or', 'anyof']
I =  0
I =  1
dict_keys([0, 1])


 71%|████████████████████████████████████████████████████████████▊                         | 322/455 [07:43<02:04,  1.07 queries/s, google=82.61%, yours=71.12%]

Query =  voleentering public service & community
Query =  ['voleentering', 'public', 'service', '&', 'community']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 71%|█████████████████████████████████████████████████████████████                         | 323/455 [07:44<02:41,  1.23s/ queries, google=82.66%, yours=70.90%]

Query =  possible we can
Query =  ['possible', 'we', 'can']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 71%|█████████████████████████████████████████████████████████████▏                        | 324/455 [07:45<02:18,  1.06s/ queries, google=82.72%, yours=70.99%]

Query =  only ctext for
Query =  ['only', 'ctext', 'for']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 71%|█████████████████████████████████████████████████████████████▍                        | 325/455 [07:45<01:47,  1.21 queries/s, google=82.77%, yours=71.08%]

Query =  biodesignnews12 03 html jul
Query =  ['biodesignnews12', '03', 'html', 'jul']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 72%|█████████████████████████████████████████████████████████████▌                        | 326/455 [07:47<02:15,  1.05s/ queries, google=82.52%, yours=71.17%]

Query =  aeronautics and stronautics
Query =  ['aeronautics', 'and', 'stronautics']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 72%|█████████████████████████████████████████████████████████████▊                        | 327/455 [07:48<02:26,  1.14s/ queries, google=82.57%, yours=71.25%]

Query =  and information technologies design
Query =  ['and', 'information', 'technologies', 'design']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 72%|█████████████████████████████████████████████████████████████▉                        | 328/455 [07:50<02:51,  1.35s/ queries, google=82.62%, yours=71.34%]

Query =  shown with without
Query =  ['shown', 'with', 'without']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 72%|██████████████████████████████████████████████████████████████▏                       | 329/455 [07:51<02:25,  1.16s/ queries, google=82.67%, yours=71.43%]

Query =  posting thread successive
Query =  ['posting', 'thread', 'successive']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 73%|██████████████████████████████████████████████████████████████▎                       | 330/455 [07:52<02:16,  1.09s/ queries, google=82.73%, yours=71.52%]

Query =  potential safety or envrionmental consequences
Query =  ['potential', 'safety', 'or', 'envrionmental', 'consequences']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 73%|██████████████████████████████████████████████████████████████▌                       | 331/455 [07:54<03:06,  1.50s/ queries, google=82.78%, yours=71.30%]

Query =  visa master card american
Query =  ['visa', 'master', 'card', 'american']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 73%|██████████████████████████████████████████████████████████████▊                       | 332/455 [07:55<02:41,  1.32s/ queries, google=82.83%, yours=71.08%]

Query =  stay connected itunes
Query =  ['stay', 'connected', 'itunes']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 73%|██████████████████████████████████████████████████████████████▉                       | 333/455 [07:56<02:17,  1.13s/ queries, google=82.88%, yours=71.17%]

Query =  web site econf home options
Query =  ['web', 'site', 'econf', 'home', 'options']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 73%|███████████████████████████████████████████████████████████████▏                      | 334/455 [07:57<02:12,  1.09s/ queries, google=82.93%, yours=70.96%]

Query =  law crown map collections
Query =  ['law', 'crown', 'map', 'collections']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 74%|███████████████████████████████████████████████████████████████▎                      | 335/455 [07:58<02:12,  1.11s/ queries, google=82.99%, yours=71.04%]

Query =  corpus linguistics aronld zwickys blog of use copy right
Query =  ['corpus', 'linguistics', 'aronld', 'zwickys', 'blog', 'of', 'use', 'copy', 'right']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8])


 74%|███████████████████████████████████████████████████████████████▌                      | 336/455 [08:01<03:20,  1.68s/ queries, google=82.74%, yours=70.83%]

Query =  135 units of doctoral residency december 10 2010 by judith
Query =  ['135', 'units', 'of', 'doctoral', 'residency', 'december', '10', '2010', 'by', 'judith']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
I =  9
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


 74%|███████████████████████████████████████████████████████████████▋                      | 337/455 [08:07<05:49,  2.96s/ queries, google=82.79%, yours=70.92%]

Query =  the postin threads
Query =  ['the', 'postin', 'threads']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 74%|███████████████████████████████████████████████████████████████▉                      | 338/455 [08:07<04:22,  2.24s/ queries, google=82.54%, yours=70.71%]

Query =  heart center nursing
Query =  ['heart', 'center', 'nursing']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 75%|████████████████████████████████████████████████████████████████                      | 339/455 [08:08<03:22,  1.75s/ queries, google=82.60%, yours=70.80%]

Query =  coaches and program
Query =  ['coaches', 'and', 'program']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 75%|████████████████████████████████████████████████████████████████▎                     | 340/455 [08:09<02:49,  1.47s/ queries, google=82.65%, yours=70.88%]

Query =  cag can be
Query =  ['cag', 'can', 'be']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 75%|████████████████████████████████████████████████████████████████▍                     | 341/455 [08:09<02:12,  1.17s/ queries, google=82.40%, yours=70.67%]

Query =  is being used
Query =  ['is', 'being', 'used']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 75%|████████████████████████████████████████████████████████████████▋                     | 342/455 [08:10<01:46,  1.06 queries/s, google=82.46%, yours=70.76%]

Query =  then reflesh the
Query =  ['then', 'reflesh', 'the']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 75%|████████████████████████████████████████████████████████████████▊                     | 343/455 [08:10<01:30,  1.23 queries/s, google=82.22%, yours=70.85%]

Query =  geant4 discussions hypernews geant4
Query =  ['geant4', 'discussions', 'hypernews', 'geant4']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 76%|█████████████████████████████████████████████████████████████████                     | 344/455 [08:12<01:50,  1.00 queries/s, google=82.27%, yours=70.93%]

Query =  chemistry department news
Query =  ['chemistry', 'department', 'news']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 76%|█████████████████████████████████████████████████████████████████▏                    | 345/455 [08:13<01:52,  1.02s/ queries, google=82.32%, yours=71.01%]

Query =  about developments changes
Query =  ['about', 'developments', 'changes']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 76%|█████████████████████████████████████████████████████████████████▍                    | 346/455 [08:14<01:52,  1.03s/ queries, google=82.37%, yours=71.10%]

Query =  importantfor us
Query =  ['importantfor', 'us']
I =  0
I =  1
dict_keys([0, 1])


 76%|█████████████████████████████████████████████████████████████████▌                    | 347/455 [08:15<01:43,  1.04 queries/s, google=82.42%, yours=70.89%]

Query =  quesytions should file a without the text of the
Query =  ['quesytions', 'should', 'file', 'a', 'without', 'the', 'text', 'of', 'the']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8])


 76%|█████████████████████████████████████████████████████████████████▊                    | 348/455 [08:18<02:57,  1.66s/ queries, google=82.47%, yours=70.98%]

Query =  teh made up dramas of
Query =  ['teh', 'made', 'up', 'dramas', 'of']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 77%|█████████████████████████████████████████████████████████████████▉                    | 349/455 [08:19<02:28,  1.41s/ queries, google=82.52%, yours=70.77%]

Query =  directory gallery alumni ms japan and japanese
Query =  ['directory', 'gallery', 'alumni', 'ms', 'japan', 'and', 'japanese']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
dict_keys([0, 1, 2, 3, 4, 5, 6])


 77%|██████████████████████████████████████████████████████████████████▏                   | 350/455 [08:21<02:59,  1.71s/ queries, google=82.57%, yours=70.57%]

Query =  support graduate students appling doctor
Query =  ['support', 'graduate', 'students', 'appling', 'doctor']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 77%|██████████████████████████████████████████████████████████████████▎                   | 351/455 [08:23<02:46,  1.60s/ queries, google=82.62%, yours=70.66%]

Query =  consider journal pricing in addition
Query =  ['consider', 'journal', 'pricing', 'in', 'addition']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 77%|██████████████████████████████████████████████████████████████████▌                   | 352/455 [08:24<02:36,  1.51s/ queries, google=82.67%, yours=70.74%]

Query =  resurection is the man who
Query =  ['resurection', 'is', 'the', 'man', 'who']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 78%|██████████████████████████████████████████████████████████████████▋                   | 353/455 [08:26<02:45,  1.62s/ queries, google=82.72%, yours=70.54%]

Query =  on campus disruptions
Query =  ['on', 'campus', 'disruptions']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 78%|██████████████████████████████████████████████████████████████████▉                   | 354/455 [08:27<02:25,  1.44s/ queries, google=82.77%, yours=70.62%]

Query =  rpofessor emeritus terry
Query =  ['rpofessor', 'emeritus', 'terry']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 78%|███████████████████████████████████████████████████████████████████                   | 355/455 [08:28<02:05,  1.26s/ queries, google=82.82%, yours=70.70%]

Query =  center cerebrovascular neurosurgery epilepsy functional via palou david packard electrical
Query =  ['center', 'cerebrovascular', 'neurosurgery', 'epilepsy', 'functional', 'via', 'palou', 'david', 'packard', 'electrical']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
I =  9
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


 78%|███████████████████████████████████████████████████████████████████▎                  | 356/455 [08:32<03:34,  2.16s/ queries, google=82.87%, yours=70.79%]

Query =  and auditorum the
Query =  ['and', 'auditorum', 'the']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 78%|███████████████████████████████████████████████████████████████████▍                  | 357/455 [08:32<02:47,  1.71s/ queries, google=82.91%, yours=70.87%]

Query =  all subscribers who receive
Query =  ['all', 'subscribers', 'who', 'receive']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 79%|███████████████████████████████████████████████████████████████████▋                  | 358/455 [08:34<02:32,  1.58s/ queries, google=82.96%, yours=70.95%]

Query =  edit box set
Query =  ['edit', 'box', 'set']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 79%|███████████████████████████████████████████████████████████████████▊                  | 359/455 [08:34<01:56,  1.22s/ queries, google=83.01%, yours=71.03%]

Query =  stanford university webmaster cva
Query =  ['stanford', 'university', 'webmaster', 'cva']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 79%|████████████████████████████████████████████████████████████████████                  | 360/455 [08:36<02:02,  1.29s/ queries, google=83.06%, yours=71.11%]

Query =  2009 art nx 620
Query =  ['2009', 'art', 'nx', '620']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 79%|████████████████████████████████████████████████████████████████████▏                 | 361/455 [08:37<01:54,  1.22s/ queries, google=82.83%, yours=70.91%]

Query =  linear accelerator center
Query =  ['linear', 'accelerator', 'center']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 80%|████████████████████████████████████████████████████████████████████▍                 | 362/455 [08:38<01:49,  1.17s/ queries, google=82.87%, yours=70.99%]

Query =  page and signins
Query =  ['page', 'and', 'signins']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 80%|████████████████████████████████████████████████████████████████████▌                 | 363/455 [08:39<01:44,  1.13s/ queries, google=82.64%, yours=71.07%]

Query =  board for the
Query =  ['board', 'for', 'the']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 80%|████████████████████████████████████████████████████████████████████▊                 | 364/455 [08:39<01:27,  1.05 queries/s, google=82.69%, yours=71.15%]

Query =  thoes free swimming tadpole like
Query =  ['thoes', 'free', 'swimming', 'tadpole', 'like']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 80%|████████████████████████████████████████████████████████████████████▉                 | 365/455 [08:40<01:26,  1.05 queries/s, google=82.47%, yours=70.96%]

Query =  if they have
Query =  ['if', 'they', 'have']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 80%|█████████████████████████████████████████████████████████████████████▏                | 366/455 [08:41<01:08,  1.30 queries/s, google=82.51%, yours=71.04%]

Query =  associate director of korean studies
Query =  ['associate', 'director', 'of', 'korean', 'studies']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 81%|█████████████████████████████████████████████████████████████████████▎                | 367/455 [08:42<01:23,  1.05 queries/s, google=82.56%, yours=71.12%]

Query =  upper right corner if format kif and
Query =  ['upper', 'right', 'corner', 'if', 'format', 'kif', 'and']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
dict_keys([0, 1, 2, 3, 4, 5, 6])


 81%|█████████████████████████████████████████████████████████████████████▌                | 368/455 [08:45<02:28,  1.70s/ queries, google=82.34%, yours=70.92%]

Query =  danyluk williams college john denero
Query =  ['danyluk', 'williams', 'college', 'john', 'denero']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 81%|█████████████████████████████████████████████████████████████████████▋                | 369/455 [08:47<02:12,  1.54s/ queries, google=82.38%, yours=71.00%]

Query =  at stanford welcome
Query =  ['at', 'stanford', 'welcome']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 81%|█████████████████████████████████████████████████████████████████████▉                | 370/455 [08:48<02:16,  1.60s/ queries, google=82.43%, yours=71.08%]

Query =  watson 98 trained civil
Query =  ['watson', '98', 'trained', 'civil']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 82%|██████████████████████████████████████████████████████████████████████                | 371/455 [08:49<01:53,  1.35s/ queries, google=82.48%, yours=71.16%]

Query =  terms call number series serchworks
Query =  ['terms', 'call', 'number', 'series', 'serchworks']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 82%|██████████████████████████████████████████████████████████████████████▎               | 372/455 [08:50<01:49,  1.32s/ queries, google=82.53%, yours=71.24%]

Query =  moedling for genome
Query =  ['moedling', 'for', 'genome']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 82%|██████████████████████████████████████████████████████████████████████▌               | 373/455 [08:51<01:32,  1.13s/ queries, google=82.57%, yours=71.31%]

Query =  & art hostory
Query =  ['&', 'art', 'hostory']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 82%|██████████████████████████████████████████████████████████████████████▋               | 374/455 [08:52<01:17,  1.05 queries/s, google=82.62%, yours=71.39%]

Query =  to see more program
Query =  ['to', 'see', 'more', 'program']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 82%|██████████████████████████████████████████████████████████████████████▉               | 375/455 [08:53<01:17,  1.04 queries/s, google=82.67%, yours=71.47%]

Query =  timing configuration the need to skip to main content home
Query =  ['timing', 'configuration', 'the', 'need', 'to', 'skip', 'to', 'main', 'content', 'home']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
I =  9
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


 83%|███████████████████████████████████████████████████████████████████████               | 376/455 [08:57<02:44,  2.08s/ queries, google=82.71%, yours=71.54%]

Query =  school learing sgsi 12 12
Query =  ['school', 'learing', 'sgsi', '12', '12']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 83%|███████████████████████████████████████████████████████████████████████▎              | 377/455 [08:58<02:20,  1.81s/ queries, google=82.76%, yours=71.35%]

Query =  the posting if
Query =  ['the', 'posting', 'if']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 83%|███████████████████████████████████████████████████████████████████████▍              | 378/455 [08:59<01:49,  1.43s/ queries, google=82.80%, yours=71.43%]

Query =  choose the file you would
Query =  ['choose', 'the', 'file', 'you', 'would']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 83%|███████████████████████████████████████████████████████████████████████▋              | 379/455 [09:00<01:39,  1.31s/ queries, google=82.85%, yours=71.50%]

Query =  scholar publishing sustainability suse open coordinator juilie green
Query =  ['scholar', 'publishing', 'sustainability', 'suse', 'open', 'coordinator', 'juilie', 'green']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
dict_keys([0, 1, 2, 3, 4, 5, 6, 7])


 84%|███████████████████████████████████████████████████████████████████████▊              | 380/455 [09:03<02:19,  1.86s/ queries, google=82.63%, yours=71.32%]

Query =  standford university photo by fred
Query =  ['standford', 'university', 'photo', 'by', 'fred']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 84%|████████████████████████████████████████████████████████████████████████              | 381/455 [09:05<02:09,  1.76s/ queries, google=82.68%, yours=71.13%]

Query =  here without your permission
Query =  ['here', 'without', 'your', 'permission']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 84%|████████████████████████████████████████████████████████████████████████▏             | 382/455 [09:06<01:50,  1.52s/ queries, google=82.72%, yours=71.20%]

Query =  of nand into solidstate
Query =  ['of', 'nand', 'into', 'solidstate']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 84%|████████████████████████████████████████████████████████████████████████▍             | 383/455 [09:07<01:36,  1.34s/ queries, google=82.77%, yours=71.02%]

Query =  drive suite 6 stanford califnornia
Query =  ['drive', 'suite', '6', 'stanford', 'califnornia']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 84%|████████████████████████████████████████████████████████████████████████▌             | 384/455 [09:08<01:48,  1.53s/ queries, google=82.81%, yours=70.83%]

Query =  petersen milind purohit
Query =  ['petersen', 'milind', 'purohit']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 85%|████████████████████████████████████████████████████████████████████████▊             | 385/455 [09:09<01:35,  1.36s/ queries, google=82.86%, yours=70.91%]

Query =  information center lane readingroom
Query =  ['information', 'center', 'lane', 'readingroom']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 85%|████████████████████████████████████████████████████████████████████████▉             | 386/455 [09:12<01:49,  1.59s/ queries, google=82.90%, yours=70.73%]

Query =  education action program
Query =  ['education', 'action', 'program']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 85%|█████████████████████████████████████████████████████████████████████████▏            | 387/455 [09:12<01:33,  1.38s/ queries, google=82.95%, yours=70.80%]

Query =  left and you should
Query =  ['left', 'and', 'you', 'should']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 85%|█████████████████████████████████████████████████████████████████████████▎            | 388/455 [09:13<01:19,  1.19s/ queries, google=82.99%, yours=70.88%]

Query =  to storeup
Query =  ['to', 'storeup']
I =  0
I =  1
dict_keys([0, 1])


 85%|█████████████████████████████████████████████████████████████████████████▌            | 389/455 [09:14<01:02,  1.06 queries/s, google=83.03%, yours=70.69%]

Query =  pulpation obesity &
Query =  ['pulpation', 'obesity', '&']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 86%|█████████████████████████████████████████████████████████████████████████▋            | 390/455 [09:14<00:59,  1.09 queries/s, google=83.08%, yours=70.77%]

Query =  is availble from many sources
Query =  ['is', 'availble', 'from', 'many', 'sources']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 86%|█████████████████████████████████████████████████████████████████████████▉            | 391/455 [09:16<01:03,  1.01 queries/s, google=83.12%, yours=70.84%]

Query =  of energy last update
Query =  ['of', 'energy', 'last', 'update']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 86%|██████████████████████████████████████████████████████████████████████████            | 392/455 [09:16<00:57,  1.10 queries/s, google=83.16%, yours=70.92%]

Query =  involved in neurotransmision by using
Query =  ['involved', 'in', 'neurotransmision', 'by', 'using']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 86%|██████████████████████████████████████████████████████████████████████████▎           | 393/455 [09:19<01:32,  1.49s/ queries, google=83.21%, yours=70.99%]

Query =  public lecures seminars and courses
Query =  ['public', 'lecures', 'seminars', 'and', 'courses']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 87%|██████████████████████████████████████████████████████████████████████████▍           | 394/455 [09:21<01:28,  1.45s/ queries, google=83.25%, yours=70.81%]

Query =  isn t configured to accomodate that were installed
Query =  ['isn', 't', 'configured', 'to', 'accomodate', 'that', 'were', 'installed']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
dict_keys([0, 1, 2, 3, 4, 5, 6, 7])


 87%|██████████████████████████████████████████████████████████████████████████▋           | 395/455 [09:24<01:56,  1.93s/ queries, google=83.04%, yours=70.63%]

Query =  despite the cosmetic problems
Query =  ['despite', 'the', 'cosmetic', 'problems']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 87%|██████████████████████████████████████████████████████████████████████████▊           | 396/455 [09:25<01:39,  1.69s/ queries, google=83.08%, yours=70.71%]

Query =  does not provd such simulation policy the europecenter walter
Query =  ['does', 'not', 'provd', 'such', 'simulation', 'policy', 'the', 'europecenter', 'walter']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
I =  8
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8])


 87%|███████████████████████████████████████████████████████████████████████████           | 397/455 [09:27<01:54,  1.97s/ queries, google=83.12%, yours=70.53%]

Query =  systems usacycling clifbar &
Query =  ['systems', 'usacycling', 'clifbar', '&']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 87%|███████████████████████████████████████████████████████████████████████████▏          | 398/455 [09:29<01:39,  1.74s/ queries, google=82.91%, yours=70.60%]

Query =  cedical record date of birth
Query =  ['cedical', 'record', 'date', 'of', 'birth']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 88%|███████████████████████████████████████████████████████████████████████████▍          | 399/455 [09:30<01:29,  1.61s/ queries, google=82.96%, yours=70.43%]

Query =  byhtmlme pl
Query =  ['byhtmlme', 'pl']
I =  0
I =  1
dict_keys([0, 1])


 88%|███████████████████████████████████████████████████████████████████████████▌          | 400/455 [09:30<01:08,  1.24s/ queries, google=82.75%, yours=70.25%]

Query =  over 1300 stanford
Query =  ['over', '1300', 'stanford']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 88%|███████████████████████████████████████████████████████████████████████████▊          | 401/455 [09:31<01:00,  1.12s/ queries, google=82.79%, yours=70.32%]

Query =  id passwirds page for more
Query =  ['id', 'passwirds', 'page', 'for', 'more']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 88%|███████████████████████████████████████████████████████████████████████████▉          | 402/455 [09:32<01:03,  1.20s/ queries, google=82.84%, yours=70.15%]

Query =  peru by studying
Query =  ['peru', 'by', 'studying']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 89%|████████████████████████████████████████████████████████████████████████████▏         | 403/455 [09:33<00:51,  1.00 queries/s, google=82.88%, yours=70.22%]

Query =  moovies read more photo robert
Query =  ['moovies', 'read', 'more', 'photo', 'robert']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 89%|████████████████████████████████████████████████████████████████████████████▎         | 404/455 [09:34<00:49,  1.03 queries/s, google=82.92%, yours=70.05%]

Query =  care overveiw community south on el camino
Query =  ['care', 'overveiw', 'community', 'south', 'on', 'el', 'camino']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
dict_keys([0, 1, 2, 3, 4, 5, 6])


 89%|████████████████████████████████████████████████████████████████████████████▌         | 405/455 [09:36<01:03,  1.28s/ queries, google=82.96%, yours=69.88%]

Query =  no 2 june 2005
Query =  ['no', '2', 'june', '2005']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 89%|████████████████████████████████████████████████████████████████████████████▋         | 406/455 [09:36<00:53,  1.08s/ queries, google=83.00%, yours=69.95%]

Query =  university department of history 450 please suggest a
Query =  ['university', 'department', 'of', 'history', '450', 'please', 'suggest', 'a']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
dict_keys([0, 1, 2, 3, 4, 5, 6, 7])


 89%|████████████████████████████████████████████████████████████████████████████▉         | 407/455 [09:39<01:19,  1.66s/ queries, google=83.05%, yours=69.78%]

Query =  ipsoforum uncatergorized ipsofacto is an
Query =  ['ipsoforum', 'uncatergorized', 'ipsofacto', 'is', 'an']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 90%|█████████████████████████████████████████████████████████████████████████████         | 408/455 [09:42<01:24,  1.80s/ queries, google=82.84%, yours=69.61%]

Query =  care for about 135
Query =  ['care', 'for', 'about', '135']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 90%|█████████████████████████████████████████████████████████████████████████████▎        | 409/455 [09:43<01:10,  1.53s/ queries, google=82.89%, yours=69.68%]

Query =  by slac to members
Query =  ['by', 'slac', 'to', 'members']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 90%|█████████████████████████████████████████████████████████████████████████████▍        | 410/455 [09:44<01:05,  1.46s/ queries, google=82.93%, yours=69.76%]

Query =  1979 113114 if
Query =  ['1979', '113114', 'if']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 90%|█████████████████████████████████████████████████████████████████████████████▋        | 411/455 [09:44<00:50,  1.15s/ queries, google=82.73%, yours=69.59%]

Query =  those operations that the complier
Query =  ['those', 'operations', 'that', 'the', 'complier']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 91%|█████████████████████████████████████████████████████████████████████████████▊        | 412/455 [09:46<00:52,  1.21s/ queries, google=82.77%, yours=69.42%]

Query =  archaeological prehistory and events events east
Query =  ['archaeological', 'prehistory', 'and', 'events', 'events', 'east']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
dict_keys([0, 1, 2, 3, 4, 5])


 91%|██████████████████████████████████████████████████████████████████████████████        | 413/455 [09:48<01:04,  1.53s/ queries, google=82.81%, yours=69.49%]

Query =  york deffer lp plaintiff
Query =  ['york', 'deffer', 'lp', 'plaintiff']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 91%|██████████████████████████████████████████████████████████████████████████████▎       | 414/455 [09:49<01:00,  1.48s/ queries, google=82.85%, yours=69.57%]

Query =  years qalys results
Query =  ['years', 'qalys', 'results']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 91%|██████████████████████████████████████████████████████████████████████████████▍       | 415/455 [09:50<00:50,  1.27s/ queries, google=82.89%, yours=69.64%]

Query =  on facebook conncet the cloud bio slides
Query =  ['on', 'facebook', 'conncet', 'the', 'cloud', 'bio', 'slides']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
dict_keys([0, 1, 2, 3, 4, 5, 6])


 91%|██████████████████████████████████████████████████████████████████████████████▋       | 416/455 [09:52<01:00,  1.54s/ queries, google=82.93%, yours=69.71%]

Query =  transformations old buildings new
Query =  ['transformations', 'old', 'buildings', 'new']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 92%|██████████████████████████████████████████████████████████████████████████████▊       | 417/455 [09:54<01:03,  1.66s/ queries, google=82.97%, yours=69.78%]

Query =  723 1450 650 564
Query =  ['723', '1450', '650', '564']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 92%|███████████████████████████████████████████████████████████████████████████████       | 418/455 [09:56<01:00,  1.64s/ queries, google=83.01%, yours=69.86%]

Query =  navigational home education
Query =  ['navigational', 'home', 'education']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 92%|███████████████████████████████████████████████████████████████████████████████▏      | 419/455 [09:57<00:58,  1.62s/ queries, google=82.82%, yours=69.93%]

Query =  a 20 discount
Query =  ['a', '20', 'discount']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 92%|███████████████████████████████████████████████████████████████████████████████▍      | 420/455 [09:58<00:47,  1.35s/ queries, google=82.86%, yours=70.00%]

Query =  environment 2011 stanford
Query =  ['environment', '2011', 'stanford']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 93%|███████████████████████████████████████████████████████████████████████████████▌      | 421/455 [09:59<00:47,  1.39s/ queries, google=82.90%, yours=70.07%]

Query =  dsc_7433 dsc_7434 dsc_7435 dsc_7454 dsc_7461
Query =  ['dsc_7433', 'dsc_7434', 'dsc_7435', 'dsc_7454', 'dsc_7461']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 93%|███████████████████████████████████████████████████████████████████████████████▊      | 422/455 [10:01<00:47,  1.44s/ queries, google=82.94%, yours=70.14%]

Query =  column datatype null description
Query =  ['column', 'datatype', 'null', 'description']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 93%|███████████████████████████████████████████████████████████████████████████████▉      | 423/455 [10:02<00:44,  1.38s/ queries, google=82.98%, yours=70.21%]

Query =  undergraduate educater at stanford for
Query =  ['undergraduate', 'educater', 'at', 'stanford', 'for']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 93%|████████████████████████████████████████████████████████████████████████████████▏     | 424/455 [10:04<00:48,  1.57s/ queries, google=83.02%, yours=70.05%]

Query =  gift stanford home search this
Query =  ['gift', 'stanford', 'home', 'search', 'this']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 93%|████████████████████████████████████████████████████████████████████████████████▎     | 425/455 [10:05<00:43,  1.45s/ queries, google=83.06%, yours=70.12%]

Query =  the low enegry
Query =  ['the', 'low', 'enegry']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 94%|████████████████████████████████████████████████████████████████████████████████▌     | 426/455 [10:06<00:34,  1.20s/ queries, google=83.10%, yours=70.19%]

Query =  sliders lecture topic
Query =  ['sliders', 'lecture', 'topic']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 94%|████████████████████████████████████████████████████████████████████████████████▋     | 427/455 [10:07<00:28,  1.04s/ queries, google=82.90%, yours=70.26%]

Query =  520 galvez mall parkinf
Query =  ['520', 'galvez', 'mall', 'parkinf']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 94%|████████████████████████████████████████████████████████████████████████████████▉     | 428/455 [10:08<00:26,  1.01 queries/s, google=82.94%, yours=70.33%]

Query =  outlin choices are switched
Query =  ['outlin', 'choices', 'are', 'switched']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 94%|█████████████████████████████████████████████████████████████████████████████████     | 429/455 [10:09<00:25,  1.02 queries/s, google=82.98%, yours=70.40%]

Query =  fraternity ae phi sorority chabad
Query =  ['fraternity', 'ae', 'phi', 'sorority', 'chabad']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 95%|█████████████████████████████████████████████████████████████████████████████████▎    | 430/455 [10:10<00:26,  1.08s/ queries, google=82.79%, yours=70.23%]

Query =  ao recruiment talk
Query =  ['ao', 'recruiment', 'talk']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 95%|█████████████████████████████████████████████████████████████████████████████████▍    | 431/455 [10:11<00:23,  1.04 queries/s, google=82.83%, yours=70.07%]

Query =  1 1 htlm etc in
Query =  ['1', '1', 'htlm', 'etc', 'in']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 95%|█████████████████████████████████████████████████████████████████████████████████▋    | 432/455 [10:12<00:23,  1.01s/ queries, google=82.87%, yours=70.14%]

Query =  & masculinities race &
Query =  ['&', 'masculinities', 'race', '&']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 95%|█████████████████████████████████████████████████████████████████████████████████▊    | 433/455 [10:13<00:25,  1.16s/ queries, google=82.91%, yours=70.21%]

Query =  reserve material for current
Query =  ['reserve', 'material', 'for', 'current']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 95%|██████████████████████████████████████████████████████████████████████████████████    | 434/455 [10:15<00:25,  1.23s/ queries, google=82.95%, yours=70.28%]

Query =  atcc misc commesnts
Query =  ['atcc', 'misc', 'commesnts']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 96%|██████████████████████████████████████████████████████████████████████████████████▏   | 435/455 [10:15<00:21,  1.09s/ queries, google=82.99%, yours=70.34%]

Query =  posting which takes you
Query =  ['posting', 'which', 'takes', 'you']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 96%|██████████████████████████████████████████████████████████████████████████████████▍   | 436/455 [10:16<00:20,  1.06s/ queries, google=83.03%, yours=70.41%]

Query =  resources stanford university student affairs
Query =  ['resources', 'stanford', 'university', 'student', 'affairs']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 96%|██████████████████████████████████████████████████████████████████████████████████▌   | 437/455 [10:18<00:23,  1.33s/ queries, google=83.07%, yours=70.48%]

Query =  insitute for gender
Query =  ['insitute', 'for', 'gender']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 96%|██████████████████████████████████████████████████████████████████████████████████▊   | 438/455 [10:19<00:19,  1.15s/ queries, google=83.11%, yours=70.32%]

Query =  estate maps and records
Query =  ['estate', 'maps', 'and', 'records']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 96%|██████████████████████████████████████████████████████████████████████████████████▉   | 439/455 [10:20<00:18,  1.16s/ queries, google=83.14%, yours=70.39%]

Query =  role of adding value in
Query =  ['role', 'of', 'adding', 'value', 'in']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 97%|███████████████████████████████████████████████████████████████████████████████████▏  | 440/455 [10:21<00:17,  1.17s/ queries, google=83.18%, yours=70.45%]

Query =  yim and kuhan papa
Query =  ['yim', 'and', 'kuhan', 'papa']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 97%|███████████████████████████████████████████████████████████████████████████████████▎  | 441/455 [10:22<00:14,  1.03s/ queries, google=82.99%, yours=70.29%]

Query =  febuary 8 2012 5 30
Query =  ['febuary', '8', '2012', '5', '30']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 97%|███████████████████████████████████████████████████████████████████████████████████▌  | 442/455 [10:23<00:14,  1.08s/ queries, google=83.03%, yours=70.14%]

Query =  slac stanford eud hypernews user
Query =  ['slac', 'stanford', 'eud', 'hypernews', 'user']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 97%|███████████████████████████████████████████████████████████████████████████████████▋  | 443/455 [10:25<00:15,  1.26s/ queries, google=83.07%, yours=70.20%]

Query =  makes an application
Query =  ['makes', 'an', 'application']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 98%|███████████████████████████████████████████████████████████████████████████████████▉  | 444/455 [10:26<00:14,  1.30s/ queries, google=83.11%, yours=70.27%]

Query =  center for opportunity policy in
Query =  ['center', 'for', 'opportunity', 'policy', 'in']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 98%|████████████████████████████████████████████████████████████████████████████████████  | 445/455 [10:28<00:13,  1.39s/ queries, google=83.15%, yours=70.34%]

Query =  college of veterinary nutrition
Query =  ['college', 'of', 'veterinary', 'nutrition']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 98%|████████████████████████████████████████████████████████████████████████████████████▎ | 446/455 [10:29<00:12,  1.39s/ queries, google=83.18%, yours=70.40%]

Query =  developed the notion of a
Query =  ['developed', 'the', 'notion', 'of', 'a']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 98%|████████████████████████████████████████████████████████████████████████████████████▍ | 447/455 [10:31<00:11,  1.43s/ queries, google=83.22%, yours=70.47%]

Query =  alternativecertification for students stanferd
Query =  ['alternativecertification', 'for', 'students', 'stanferd']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 98%|████████████████████████████████████████████████████████████████████████████████████▋ | 448/455 [10:35<00:14,  2.08s/ queries, google=83.26%, yours=70.31%]

Query =  vcimage generate on
Query =  ['vcimage', 'generate', 'on']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


 99%|████████████████████████████████████████████████████████████████████████████████████▊ | 449/455 [10:35<00:10,  1.69s/ queries, google=83.30%, yours=70.38%]

Query =  hp support website api doxygen documantation p04 06
Query =  ['hp', 'support', 'website', 'api', 'doxygen', 'documantation', 'p04', '06']
I =  0
I =  1
I =  2
I =  3
I =  4
I =  5
I =  6
I =  7
dict_keys([0, 1, 2, 3, 4, 5, 6, 7])


 99%|█████████████████████████████████████████████████████████████████████████████████████ | 450/455 [10:38<00:10,  2.10s/ queries, google=83.11%, yours=70.22%]

Query =  student research applications for
Query =  ['student', 'research', 'applications', 'for']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


 99%|█████████████████████████████████████████████████████████████████████████████████████▏| 451/455 [10:40<00:07,  1.91s/ queries, google=83.15%, yours=70.29%]

Query =  lower roughness surfaces curved pins
Query =  ['lower', 'roughness', 'surfaces', 'curved', 'pins']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


 99%|█████████████████████████████████████████████████████████████████████████████████████▍| 452/455 [10:41<00:05,  1.69s/ queries, google=83.19%, yours=70.35%]

Query =  room students and faculty search
Query =  ['room', 'students', 'and', 'faculty', 'search']
I =  0
I =  1
I =  2
I =  3
I =  4
dict_keys([0, 1, 2, 3, 4])


100%|█████████████████████████████████████████████████████████████████████████████████████▌| 453/455 [10:42<00:03,  1.58s/ queries, google=83.22%, yours=70.42%]

Query =  jim pleased our
Query =  ['jim', 'pleased', 'our']
I =  0
I =  1
I =  2
dict_keys([0, 1, 2])


100%|█████████████████████████████████████████████████████████████████████████████████████▊| 454/455 [10:43<00:01,  1.24s/ queries, google=83.04%, yours=70.26%]

Query =  to skin strain changes
Query =  ['to', 'skin', 'strain', 'changes']
I =  0
I =  1
I =  2
I =  3
dict_keys([0, 1, 2, 3])


100%|██████████████████████████████████████████████████████████████████████████████████████| 455/455 [10:43<00:00,  1.06s/ queries, google=83.08%, yours=70.33%]

Query =  





<a id='empirical'></a>
## V. Task 2: Spelling Correction with Empirical Edit Costs (25%)


### V.1. Improved Edit Probability Model

Now that our spelling corrector is working correctly with a basic edit probability model, we will turn our attention to a somewhat more realistic approach to edit probabilities. In this task, we will learn these edit probabilities from the empirical error data provided in `data/training_set/edit1s.txt`.

#### V.1.1. Empirical Edit Costs

As outlined in [Section III](#dataset) above, you have been given a list of query pairs that are precisely edit distance 1 from each other. The ﬁrst step for this task is to devise a simple algorithm to determine which speciﬁc edit exists between the two queries in each pair. By aggregating the counts of all such edits over all queries, you can estimate the probability of each individual edit. The edit probability calculation is described in more detail in the [lecture handout on spelling correction](http://web.stanford.edu/class/cs276/handouts/spell_correction.pdf). As an example, if you need to determine the probability of the letter 'e' being (mistakenly) replaced by the letter 'a' in a query, you should calculate:
$$
    P(\texttt{sub}[a, e]) = \frac{\texttt{count}(\texttt{sub}[a, e])}{\texttt{count}(e)}.
$$
Note that the insertion and deletion operator probabilities are conditioned on the character before the character being operated on &mdash; which also means that you should devise an appropriate solution to handle the special case of insertions or deletions occurring at the beginning of a word. Finally, to account for the inevitable problem of data sparsity in our edit training ﬁle, you should apply Laplace add-one smoothing to the edit probabilities, as described in the lecture handout (linked above).

In [None]:
%%tee submission/empirical_edit_probability_model.py

class Edit:
    """Represents a single edit in Damerau-Levenshtein distance.
    We use this class to count occurrences of different edits in the training data.
    """
    INSERTION = 1
    DELETION = 2
    TRANSPOSITION = 3
    SUBSTITUTION = 4

    def __init__(self, edit_type, c1=None, c2=None):
        """
        Members:
            edit_type (int): One of Edit.{NO_EDIT,INSERTION,DELETION,
                TRANSPOSITION,SUBSTITUTION}.
            c1 (str): First (in original) char involved in the edit.
            c2 (str): Second (in original) char involved in the edit.
        """
        self.edit_type = edit_type
        self.c1 = c1
        self.c2 = c2


class EmpiricalEditProbabilityModel(BaseEditProbabilityModel):

    START_CHAR = ''      # Used to indicate start-of-query
    NO_EDIT_PROB = 0.92  # Hyperparameter for probability assigned to no-edit

    def __init__(self, training_set_path='pa2-data/training_set/edit1s.txt'):
        """Builds the necessary data structures to compute log-probabilities of
        distance-1 edits in constant time. In particular, counts the unigrams
        (single characters), bigrams (of 2 characters), alphabet size, and
        edit count for insertions, deletions, substitutions, and transpositions.

        Hint: Use the `Edit` class above. It may be easier to write the `get_edit`
        function first, since you should call that function here.

        Note: We suggest using tqdm with the size of the training set (819722) to track
        the initializers progress when parsing the training set file.

        Args:
            training_set_path (str): Path to training set of empirical error data.
        """
        # Your code needs to initialize all four of these data structures
        self.unigram_counts = Counter()  # Maps chars c1 -> count(c1)
        self.bigram_counts = Counter()   # Maps tuples (c1, c2) -> count((c1, c2))
        self.alphabet_size = 0           # Counts all possible characters

        # Maps edit-types -> dict mapping tuples (c1, c2) -> count(edit[c1, c2])
        # Example usage: 
        #   > e = Edit(Edit.SUBSTITUTION, 'a', 'b')
        #   > edit_count = self.edit_counts[e.edit_type][(e.c1, e.c2)]
        self.edit_counts = {edit_type: Counter()
                            for edit_type in (Edit.INSERTION, Edit.DELETION,
                                              Edit.SUBSTITUTION, Edit.TRANSPOSITION)}

        with open(training_set_path, 'r') as training_set:
            for example in tqdm(training_set, total=819722):
                edited, original = example.strip().split('\t')

                ### Begin your code

                ### End your code

    def get_edit(self, edited, original):
        """Gets an `Edit` object describing the type of edit performed on `original`
        to produce `edited`.

        Note: Only edits with an edit distance of at most 1 are valid inputs.

        Args:
            edited (str): Raw query, which contains exactly one edit from `original`.
            original (str): True query. Want to find the edit which turns this into `edited`.

        Returns:
            edit (Edit): `Edit` object representing the edit to apply to `original` to get `edited`.
                If `edited == original`, returns None.
        """
        ### Begin your code

        ### End your code

    def get_edit_logp(self, edited, original):
        """Gets the log-probability of editing `original` to arrive at `edited`.
        The `original` and `edited` arguments are both single terms that are at
        most one edit apart.
        
        Note: The order of the arguments is chosen so that it reads like an
        assignment expression:
            > edited := EDIT_FUNCTION(original)
        or, alternatively, you can think of it as a (unnormalized) conditional probability:
            > log P(edited | original)

        Args:
            edited (str): Edited term.
            original (str): Original term.

        Returns:
            logp (float): Log-probability of `edited` given `original`
                under this `EditProbabilityModel`.
        """
        ### Begin your code

        ### End your code

Run the following cells to evaluate your spelling corrector on the dev set using your empirical edit probability model. We will also evaluate your model on a private test set after submission. For full credit, your spelling corrector with uniform edit probability model should achieve accuracy within 1% of the staff implementation *on the test set.* **We do not provide test set queries, but as a guideline for performance, the staff implementation gets 87.91% accuracy on the dev set.**

In [None]:
# Build spelling corrector for evaluation on the dev set
# For reference, our initialization times are 25 sec for lm, and 1 min, 40 sec for epm
lm = LanguageModel()
epm = EmpiricalEditProbabilityModel()
cg = CandidateGenerator(lm, epm)
cs = CandidateScorer(lm, cg, mu=1.0)

In [None]:
# Set verbose=True for debugging output
# For reference our implementation takes ~2 min, 30 sec to run and gets 87.91% accuracy
dev_eval(cs, verbose=False)

<a id='written'></a>
## VI. Written Report (20%)

Be sure to document any design decisions you made, and give some brief rationale for them. Please keep your report concise.

#### VI.1. Overall System Design (5%)

Provide a concise (at most 5 sentences) description of the overall system design.

  > Your Answer Here

#### VI.2. Smoothing and Related Techniques (5%)

Give a short analysis of smoothing techniques used in this assignment. For example, you might produce a plot comparing different values for $\lambda$ in unigram-bigram interpolation.

  > Your Answer Here

#### VI.3. Optimizations for Candidate Generation (5%)

Provide a brief description of the techniques you used for optimizing candidate generation. Be sure to include an analysis of the amount by which each optimization sped up the overall spelling correction system, as well as any changes in accuracy you were able to measure.

  > Your Answer Here

#### VI.4. Tuning Parameters (5%)
Provide at least two plots showing how accuracy varies as you change parameter values (*e.g.,* $\mu$ and $\lambda$). Comment briefly (1-2 sentences) on each plot.

  > Your Answer Here

<a id='extra'></a>
## VII. Extra Credit (Optional, up to 10%)


We have listed a few ideas here, but really any extensions that go above and beyond the scope of tasks 1 and 2 will be considered.

1. **Expanded edit model.** We saw (or will see) in lecture that there are sometimes spelling errors that may not be within a "naive" edit distance 2 of the correct phrase, but that may have a conceptual basis that makes them very common and understandable. (Substituting 'ph' for 'f', or vice versa, is one such example.) Can you incorporate these types of errors into the edit probabilities of your edit probability model?
2. **Empirical edit costs using Wikipedia.** In task 2, you used the dataset of queries 1 edit distance apart to learn edit probabilities. If you look at the queries in this dataset, you will observe that most of these queries are related to the Stanford corpus, the same corpus used to build the language model. It would be interesting to explore what happens if the channel model and language model are learned from diﬀerent datasets (and hence diﬀerent distributions of the underlying data). To this end, you can use a dataset of spelling errors collected from Wikipedia and available on Peter Norvig’s website (http://norvig.com/ngrams/spell-errors.txt).
3. **Alternate Smoothing.** Try other smoothing algorithms (such as Kneser-Ney smoothing) to better capture probabilities in the training corpus.
4. **K-gram index.** To deal with unseen words, it is possible to develop a measure for the probability of that word being spelled correctly by developing a character k-gram index over your corpus. For example, a q not followed by a u should lead to a low probability. This index can also help you generate candidate corrections much more eﬃciently.
5. **Levenshtein Automata.** You can do even faster candidate generation using a Levenshtein transducer (http://en.wikipedia.org/wiki/Levenshtein_transducer), which uses a ﬁnite state automata for fuzzy matching of words. There is an experimental implementation in Python at https://gist.github.com/491973, but it needs to be generalized to perform the transposition operation too. This tutorial might be helpful: http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levenshtein-Automata.

Finally, we will give a small amount of extra credit to the best spell correction systems, measured in terms of both accuracy and running time (as computed on our hidden test data). The top 5 systems according to either metric will receive 5% each, while the next 15 systems will receive 2.5% each.

**If you decide to tackle an extra credit option, give a brief description of your approach and results below.**

  > Your Answer Here