# Information Retrieval 1#
## Assignment 2: Retrieval models [100 points] ##

In this assignment you will get familiar with basic and advanced information retrieval concepts. You will implement different information retrieval ranking models and evaluate their performance.

We provide you with a Indri index. To query the index, you'll use a Python package ([pyndri](https://github.com/cvangysel/pyndri)) that allows easy access to the underlying document statistics.

For evaluation you'll use the [TREC Eval](https://github.com/usnistgov/trec_eval) utility, provided by the National Institute of Standards and Technology of the United States. TREC Eval is the de facto standard way to compute Information Retrieval measures and is frequently referenced in scientific papers.

This is a **groups-of-three assignment**, the deadline is **Wednesday, January 31st**. Code quality, informative comments and convincing analysis of the results will be considered when grading. Submission should be done through blackboard, questions can be asked on the course [Piazza](piazza.com/university_of_amsterdam/spring2018/52041inr6y/home).

### Technicalities (must-read!) ###

The assignment directory is organized as follows:
   * `./assignment.ipynb` (this file): the description of the assignment.
   * `./index/`: the index we prepared for you.
   * `./ap_88_90/`: directory with ground-truth and evaluation sets:
      * `qrel_test`: test query relevance collection (**test set**).
      * `qrel_validation`: validation query relevance collection (**validation set**).
      * `topics_title`: semicolon-separated file with query identifiers and terms.

You will need the following software packages (tested with Python 3.5 inside [Anaconda](https://conda.io/docs/user-guide/install/index.html)):
   * Python 3.5 and Jupyter
   * Indri + Pyndri (Follow the installation instructions [here](https://github.com/nickvosk/pyndri/blob/master/README.md))
   * gensim [link](https://radimrehurek.com/gensim/install.html)
   * TREC Eval [link](https://github.com/usnistgov/trec_eval)

### TREC Eval primer ###
The TREC Eval utility can be downloaded and compiled as follows:

    git clone https://github.com/usnistgov/trec_eval.git
    cd trec_eval
    make

TREC Eval computes evaluation scores given two files: ground-truth information regarding relevant documents, named *query relevance* or *qrel*, and a ranking of documents for a set of queries, referred to as a *run*. The *qrel* will be supplied by us and should not be changed. For every retrieval model (or combinations thereof) you will generate a run of the top-1000 documents for every query. The format of the *run* file is as follows:

    $query_identifier Q0 $document_identifier $rank_of_document_for_query $query_document_similarity $run_identifier
    
where
   * `$query_identifier` is the unique identifier corresponding to a query (usually this follows a sequential numbering).
   * `Q0` is a legacy field that you can ignore.
   * `$document_identifier` corresponds to the unique identifier of a document (e.g., APXXXXXXX where AP denotes the collection and the Xs correspond to a unique numerical identifier).
   * `$rank_of_document_for_query` denotes the rank of the document for the particular query. This field is ignored by TREC Eval and is only maintained for legacy support. The ranks are computed by TREC Eval itself using the `$query_document_similarity` field (see next). However, it remains good practice to correctly compute this field.
   * `$query_document_similarity` is a score indicating the similarity between query and document where a higher score denotes greater similarity.
   * `$run_identifier` is an identifier of the run. This field is for your own convenience and has no purpose beyond bookkeeping.
   
For example, say we have two queries: `Q1` and `Q2` and we rank three documents (`DOC1`, `DOC2`, `DOC3`). For query `Q1`, we find the following similarity scores `score(Q1, DOC1) = 1.0`, `score(Q1, DOC2) = 0.5`, `score(Q1, DOC3) = 0.75`; and for `Q2`: `score(Q2, DOC1) = -0.1`, `score(Q2, DOC2) = 1.25`, `score(Q1, DOC3) = 0.0`. We can generate run using the following snippet:

In [221]:
import logging
import sys
import os

def write_run(model_name, data, out_f,
              max_objects_per_query=sys.maxsize,
              skip_sorting=False):
    """
    Write a run to an output file.
    Parameters:
        - model_name: identifier of run.
        - data: dictionary mapping topic_id to object_assesments;
            object_assesments is an iterable (list or tuple) of
            (relevance, object_id) pairs.
            The object_assesments iterable is sorted by decreasing order.
        - out_f: output file stream.
        - max_objects_per_query: cut-off for number of objects per query.
    """
    for subject_id, object_assesments in data.items():
        if not object_assesments:
            logging.warning('Received empty ranking for %s; ignoring.',
                            subject_id)

            continue

        #Probe types, to make sure everything goes alright.
        assert isinstance(object_assesments[0][0], float) or \
             isinstance(object_assesments[0][0], np.float32)
        assert isinstance(object_assesments[0][1], str) or \
            isinstance(object_assesments[0][1], bytes)

        if not skip_sorting:
            object_assesments = sorted(object_assesments, reverse=True)

        if max_objects_per_query < sys.maxsize:
            object_assesments = object_assesments[:max_objects_per_query]

        if isinstance(subject_id, bytes):
            subject_id = subject_id.decode('utf8')

        for rank, (relevance, object_id) in enumerate(object_assesments):
            if isinstance(object_id, bytes):
                object_id = object_id.decode('utf8')

            out_f.write(
                '{subject} Q0 {object} {rank} {relevance} '
                '{model_name}\n'.format(
                    subject=subject_id,
                    object=object_id,
                    rank=rank + 1,
                    relevance=relevance,
                    model_name=model_name))
            
# The following writes the run to standard output.
# In your code, you should write the runs to local
# storage in order to pass them to trec_eval.
write_run(
    model_name='example',
    data={
        'Q1': ((1.0, 'DOC1'), (0.5, 'DOC2'), (0.75, 'DOC3')),
        'Q2': ((-0.1, 'DOC1'), (1.25, 'DOC2'), (0.0, 'DOC3')),
    },
    out_f=sys.stdout,
    max_objects_per_query=1000)

Q1 Q0 DOC1 1 1.0 example
Q1 Q0 DOC3 2 0.75 example
Q1 Q0 DOC2 3 0.5 example
Q2 Q0 DOC2 1 1.25 example
Q2 Q0 DOC3 2 0.0 example
Q2 Q0 DOC1 3 -0.1 example


Now, imagine that we know that `DOC1` is relevant and `DOC3` is non-relevant for `Q1`. In addition, for `Q2` we only know of the relevance of `DOC3`. The query relevance file looks like:

    Q1 0 DOC1 1
    Q1 0 DOC3 0
    Q2 0 DOC3 1
    
We store the run and qrel in files `example.run` and `example.qrel` respectively on disk. We can now use TREC Eval to compute evaluation measures. In this example, we're only interested in Mean Average Precision and we'll only show this below for brevity. However, TREC Eval outputs much more information such as NDCG, recall, precision, etc.

    $ trec_eval -m all_trec -q example.qrel example.run | grep -E "^map\s"
    > map                   	Q1	1.0000
    > map                   	Q2	0.5000
    > map                   	all	0.7500
    
Now that we've discussed the output format of rankings and how you can compute evaluation measures from these rankings, we'll now proceed with an overview of the indexing framework you'll use.

### Pyndri primer ###
For this assignment you will use [Pyndri](https://github.com/cvangysel/pyndri) [[1](https://arxiv.org/abs/1701.00749)], a python interface for [Indri](https://www.lemurproject.org/indri.php). We have indexed the document collection and you can query the index using Pyndri. We will start by giving you some examples of what Pyndri can do:

First we read the document collection index with Pyndri:

In [202]:
import pyndri

index = pyndri.Index('index/')

The loaded index can be used to access a collection of documents in an easy manner. We'll give you some examples to get some idea of what it can do, it is up to you to figure out how to use it for the remainder of the assignment.

First let's look at the number of documents, since Pyndri indexes the documents using incremental identifiers we can simply take the lowest index and the maximum document and consider the difference:

In [192]:
print("There are %d documents in this collection." % (index.maximum_document() - index.document_base()))

There are 164597 documents in this collection.


Let's take the first document out of the collection and take a look at it:

In [193]:
example_document = index.document(index.document_base())
print(example_document)

('AP890425-0001', (1360, 192, 363, 0, 880, 0, 200, 0, 894, 412, 92160, 3, 192, 0, 363, 34, 1441, 0, 174134, 0, 200, 0, 894, 412, 2652, 0, 810, 107, 49, 4903, 420, 0, 1, 48, 35, 489, 0, 35, 687, 192, 243, 0, 249311, 1877, 0, 1651, 1174, 0, 2701, 117, 412, 0, 810, 391, 245233, 1225, 5838, 16, 0, 233156, 3496, 0, 393, 17, 0, 2435, 4819, 930, 0, 0, 200, 0, 894, 0, 22, 398, 145, 0, 3, 271, 115, 0, 1176, 2777, 292, 0, 725, 192, 0, 0, 50046, 0, 1901, 1130, 0, 192, 0, 408, 0, 243779, 0, 0, 553, 192, 0, 363, 0, 3747, 0, 0, 0, 0, 1176, 0, 1239, 0, 0, 1115, 17, 0, 0, 585, 192, 1963, 0, 0, 412, 54356, 0, 773, 0, 0, 0, 192, 0, 0, 1130, 0, 363, 0, 545, 192, 0, 1174, 1901, 1130, 0, 4, 398, 145, 39, 0, 577, 0, 355, 0, 491, 0, 6025, 0, 0, 193156, 88, 34, 437, 0, 0, 1852, 0, 828, 0, 1588, 0, 0, 0, 2615, 0, 0, 107, 49, 420, 0, 0, 190, 7, 714, 2701, 0, 237, 192, 157, 0, 412, 34, 437, 0, 0, 200, 6025, 26, 0, 0, 0, 0, 363, 0, 22, 398, 145, 0, 200, 638, 126222, 6018, 0, 880, 0, 0, 161, 0, 0, 319, 894, 2701, 

Here we see a document consists of two things, a string representing the external document identifier and an integer list representing the identifiers of words that make up the document. Pyndri uses integer representations for words or terms, thus a token_id is an integer that represents a word whereas the token is the actual text of the word/term. Every id has a unique token and vice versa with the exception of stop words: words so common that there are uninformative, all of these receive the zero id.

To see what some ids and their matching tokens we take a look at the dictionary of the index:

In [17]:
token2id, id2token, _ = index.get_dictionary()
print(list(id2token.items())[:15])

[(1, 'new'), (2, 'percent'), (3, 'two'), (4, '1'), (5, 'people'), (6, 'million'), (7, '000'), (8, 'government'), (9, 'president'), (10, 'years'), (11, 'state'), (12, '2'), (13, 'states'), (14, 'three'), (15, 'time')]


Using this dictionary we can see the tokens for the (non-stop) words in our example document:

In [18]:
print([id2token[word_id] for word_id in example_document[1] if word_id > 0])

['52', 'students', 'arrested', 'takeover', 'university', 'massachusetts', 'building', 'fifty', 'two', 'students', 'arrested', 'tuesday', 'evening', 'occupying', 'university', 'massachusetts', 'building', 'overnight', 'protest', 'defense', 'department', 'funded', 'research', 'new', 'york', 'city', 'thousands', 'city', 'college', 'students', 'got', 'unscheduled', 'holiday', 'demonstrators', 'occupied', 'campus', 'administration', 'building', 'protest', 'possible', 'tuition', 'increases', 'prompting', 'officials', 'suspend', 'classes', '60', 'police', 'riot', 'gear', 'arrived', 'university', 'massachusetts', '5', 'p', 'm', 'two', 'hours', 'later', 'bus', 'drove', 'away', '29', 'students', 'camped', 'memorial', 'hall', 'students', 'charged', 'trespassing', '23', 'students', 'arrested', 'lying', 'bus', 'prevent', 'leaving', 'police', '300', 'students', 'stood', 'building', 'chanting', 'looking', 'students', 'hall', 'arrested', '35', 'students', 'occupied', 'memorial', 'hall', '1', 'p', 'm',

The reverse can also be done, say we want to look for news about the "University of Massachusetts", the tokens of that query can be converted to ids using the reverse dictionary:

In [19]:
query_tokens = index.tokenize("University of Massachusetts")
print("Query by tokens:", query_tokens)
query_id_tokens = [token2id.get(query_token,0) for query_token in query_tokens]
print("Query by ids with stopwords:", query_id_tokens)
query_id_tokens = [word_id for word_id in query_id_tokens if word_id > 0]
print("Query by ids without stopwords:", query_id_tokens)

Query by tokens: ['university', '', 'massachusetts']
Query by ids with stopwords: [200, 0, 894]
Query by ids without stopwords: [200, 894]


Naturally we can now match the document and query in the id space, let's see how often a word from the query occurs in our example document:

In [20]:
matching_words = sum([True for word_id in example_document[1] if word_id in query_id_tokens])
print("Document %s has %d word matches with query: \"%s\"." % (example_document[0], matching_words, ' '.join(query_tokens)))
print("Document %s and query \"%s\" have a %.01f%% overlap." % (example_document[0], ' '.join(query_tokens),matching_words/float(len(example_document[1]))*100))

Document AP890425-0001 has 13 word matches with query: "university  massachusetts".
Document AP890425-0001 and query "university  massachusetts" have a 2.5% overlap.


While this is certainly not everything Pyndri can do, it should give you an idea of how to use it. Please take a look at the [examples](https://github.com/cvangysel/pyndri) as it will help you a lot with this assignment.

**CAUTION**: Avoid printing out the whole index in this Notebook as it will generate a lot of output and is likely to corrupt the Notebook.

### Parsing the query file
You can parse the query file (`ap_88_89/topics_title`) using the following snippet:

In [21]:
import collections
import io
import logging
import sys

def parse_topics(file_or_files,
                 max_topics=sys.maxsize, delimiter=';'):
    assert max_topics >= 0 or max_topics is None

    topics = collections.OrderedDict()

    if not isinstance(file_or_files, list) and \
            not isinstance(file_or_files, tuple):
        if hasattr(file_or_files, '__iter__'):
            file_or_files = list(file_or_files)
        else:
            file_or_files = [file_or_files]

    for f in file_or_files:
        assert isinstance(f, io.IOBase)

        for line in f:
            assert(isinstance(line, str))

            line = line.strip()

            if not line:
                continue

            topic_id, terms = line.split(delimiter, 1)

            if topic_id in topics and (topics[topic_id] != terms):
                    logging.error('Duplicate topic "%s" (%s vs. %s).',
                                  topic_id,
                                  topics[topic_id],
                                  terms)

            topics[topic_id] = terms

            if max_topics > 0 and len(topics) >= max_topics:
                break

    return topics

with open('./ap_88_89/topics_title', 'r') as f_topics:
    print(parse_topics([f_topics]))

OrderedDict([('51', 'Airbus Subsidies'), ('52', 'South African Sanctions'), ('53', 'Leveraged Buyouts'), ('54', 'Satellite Launch Contracts'), ('55', 'Insider Trading'), ('56', 'Prime (Lending) Rate Moves, Predictions'), ('57', 'MCI'), ('58', 'Rail Strikes'), ('59', 'Weather Related Fatalities'), ('60', 'Merit-Pay vs. Seniority'), ('61', 'Israeli Role in Iran-Contra Affair'), ('62', "Military Coups D'etat"), ('63', 'Machine Translation'), ('64', 'Hostage-Taking'), ('65', 'Information Retrieval Systems'), ('66', 'Natural Language Processing'), ('67', 'Politically Motivated Civil Disturbances'), ('68', 'Health Hazards from Fine-Diameter Fibers'), ('69', 'Attempts to Revive the SALT II Treaty'), ('70', 'Surrogate Motherhood'), ('71', 'Border Incursions'), ('72', 'Demographic Shifts in the U.S.'), ('73', 'Demographic Shifts across National Boundaries'), ('74', 'Conflicting Policy'), ('75', 'Automation'), ('76', 'U.S. Constitution - Original Intent'), ('77', 'Poaching'), ('78', 'Greenpeace'

### Task 1: Implement and compare lexical IR methods [35 points] ### 

In this task you will implement a number of lexical methods for IR using the **Pyndri** framework. Then you will evaluate these methods on the dataset we have provided using **TREC Eval**.

Use the **Pyndri** framework to get statistics of the documents (term frequency, document frequency, collection frequency; **you are not allowed to use the query functionality of Pyndri**) and implement the following scoring methods in **Python**:

- [TF-IDF](http://nlp.stanford.edu/IR-book/html/htmledition/tf-idf-weighting-1.html) and 
- [BM25](http://nlp.stanford.edu/IR-book/html/htmledition/okapi-bm25-a-non-binary-model-1.html) with k1=1.2 and b=0.75. **[5 points]**
- Language models ([survey](https://drive.google.com/file/d/0B-zklbckv9CHc0c3b245UW90NE0/view))
    - Jelinek-Mercer (explore different values of 𝛌 in the range [0.1, 0.5, 0.9]). **[5 points]**
    - Dirichlet Prior (explore different values of 𝛍 [500, 1000, 1500]). **[5 points]**
    - Absolute discounting (explore different values of 𝛅 in the range [0.1, 0.5, 0.9]). **[5 points]**
    - [Positional Language Models](http://sifaka.cs.uiuc.edu/~ylv2/pub/sigir09-plm.pdf) define a language model for each position of a document, and score a document based on the scores of its PLMs. The PLM is estimated based on propagated counts of words within a document through a proximity-based density function, which both captures proximity heuristics and achieves an effect of “soft” passage retrieval. Implement the PLM, all five kernels, but only the Best position strategy to score documents. Use 𝛔 equal to 50, and Dirichlet smoothing with 𝛍 optimized on the validation set (decide how to optimize this value yourself and motivate your decision in the report). **[10 points]**
    
Implement the above methods and report evaluation measures (on the test set) using the hyper parameter values you optimized on the validation set (also report the values of the hyper parameters). Use TREC Eval to obtain the results and report on `NDCG@10`, Mean Average Precision (`MAP@1000`), `Precision@5` and `Recall@1000`.

For the language models, create plots showing `NDCG@10` with varying values of the parameters. You can do this by chaining small scripts using shell scripting (preferred) or execute trec_eval using Python's `subprocess`.

Compute significance of the results using a [two-tailed paired Student t-test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html) **[5 points]**. Be wary of false rejection of the null hypothesis caused by the [multiple comparisons problem](https://en.wikipedia.org/wiki/Multiple_comparisons_problem). There are multiple ways to mitigate this problem and it is up to you to choose one.

Analyse the results by identifying specific queries where different methods succeed or fail and discuss possible reasons that cause these differences. This is *very important* in order to understand who the different retrieval functions behave.

**NOTE**: Don’t forget to use log computations in your calculations to avoid underflows. 

**IMPORTANT**: You should structure your code around the helper functions we provide below.

In [228]:
import time
start_time = time.time()
with open('./ap_88_89/topics_title', 'r') as f_topics:
    queries = parse_topics([f_topics])

index = pyndri.Index('index/')

num_documents = index.maximum_document() - index.document_base()

dictionary = pyndri.extract_dictionary(index)

tokenized_queries = {
    query_id: [dictionary.translate_token(token)
               for token in index.tokenize(query_string)
               if dictionary.has_token(token)]
    for query_id, query_string in queries.items()}

query_term_ids = set(
    query_term_id
    for query_term_ids in tokenized_queries.values()
    for query_term_id in query_term_ids)

print('Gathering statistics about', len(query_term_ids), 'terms.')

# inverted index creation.

document_lengths = {}
unique_terms_per_document = {}

inverted_index = collections.defaultdict(dict)
collection_frequencies = collections.defaultdict(int)

total_terms = 0

for int_doc_id in range(index.document_base(), index.maximum_document()):
    ext_doc_id, doc_token_ids = index.document(int_doc_id)

    document_bow = collections.Counter(
        token_id for token_id in doc_token_ids
        if token_id > 0)
    document_length = sum(document_bow.values())

    document_lengths[int_doc_id] = document_length
    total_terms += document_length

    unique_terms_per_document[int_doc_id] = len(document_bow)

    for query_term_id in query_term_ids:
        assert query_term_id is not None

        document_term_frequency = document_bow.get(query_term_id, 0)

        if document_term_frequency == 0:
            continue

        collection_frequencies[query_term_id] += document_term_frequency
        inverted_index[query_term_id][int_doc_id] = document_term_frequency


avg_doc_length = total_terms / num_documents
print('Inverted index creation took', time.time() - start_time, 'seconds.')

Gathering statistics about 456 terms.
Inverted index creation took 46.06136512756348 seconds.


In [321]:
def run_retrieval(model_name, score_fn):
    """
    Runs a retrieval method for all the queries and writes the TREC-friendly results in a file.
    
    :param model_name: the name of the model (a string)
    :param score_fn: the scoring function (a function - see below for an example) 
    """
    run_out_path = '{}.run'.format(model_name)

    if os.path.exists(run_out_path):
        print("File already exists")
        return

    retrieval_start_time = time.time()

    print('Retrieving using', model_name)
    data = {}
    count = 0
    # TODO: fill the data dictionary. 
    # The dictionary data should have the form: query_id --> (document_score, external_doc_id)
    if score_fn == tfidf:
        for key1 in inverted_index.keys():
            for key2 in inverted_index[key1].keys():
                external_document_id,_ = index.document(key2) #Get the external doc id for this document
                if key1 in data.keys():
                    data[key1].append(tuple([score_fn(key2,key1,collection_frequencies[key1]),str(external_document_id)]))
                else:
                    data[key1] = [tuple([score_fn(key2,key1,collection_frequencies[key1]),str(external_document_id)])]
      
    if score_fn == BM25:
        print("BM25 runs")
        for query_id, query_string in queries.items():
            print(query_id)
            word_list = query_string.split()
            for word in word_list:
                word = word.lower()
                if word in token2id.keys():
                    query_term_id = token2id[word]
                    for key1 in inverted_index.keys():
                        print(key1,query_id)
                        if key1 == query_id:
                            print(query_id,word_list,query_term_id)
                            
                            
    with open(run_out_path, 'w') as f_out:
        write_run(
            model_name=model_name,
            data=data,
            out_f=f_out,
            max_objects_per_query=1000)

In [322]:
# The queries dict is an ordered dict with id's as keys and queris as values
print(queries['52'])

#Query term ids give for every unique term in the queries an id
print(list(query_term_ids)[:5])

# Inverted index gives for every query_term_id and doc_id the term frequency of that term in the document
print(inverted_index[13][54006])

# Collection frequencies gives for a collection of documents for every query term id the number of occurences of that term
print(collection_frequencies[8])

South African Sanctions
[1, 2053, 8, 4104, 6153]
1
119771


In [323]:
import numpy as np
def tfidf(int_document_id, query_term_id, document_term_freq):
    """
    Scoring function for a document and a query term
    
    :param int_document_id: the document id
    :param query_token_id: the query term id (assuming you have split the query to tokens)
    :param document_term_freq: the document term frequency of the query term 
    """
    
    score = float(np.log(1+inverted_index[query_term_id][int_document_id]) * np.log(num_documents/document_term_freq))

    return score


# combining the two functions above: 
a = str(np.random.randint(0,100))
#run_retrieval('tfidf'+a, tfidf)

# TODO implement the rest of the retrieval functions
# print(query_term_ids)
def BM25(int_document_id, query_term_ids, document_term_freq): #here we pass a list of all query term ids instead of one query at a 
    score = 0
    k1 = 1.5
    b = 0.75
    # We ignore the sum including k3 since our queries are typically very short and therefore this part is not needed
    
    for query_term_id in query_term_ids:
        print(query_term_id)
    
    return score

run_retrieval('BM25'+a, BM25)
#for query_id, query_string in queries.items():
    #print(query_id,query_string)
# TODO implement tools to help you with the analysis of the results.

Retrieving using BM2561
BM25 runs
51
1 51
3076 51
2053 51
8 51
1033 51
11 51
13 51
1042 51
19 51
20 51
21 51
66582 51
1056 51
33 51
35 51
38 51
39 51
2088 51
45097 51
690 51
1070 51
178525 51
160909 51
2097 51
54 51
55 51
57 51
68 51
69 51
72 51
2121 51
1098 51
75 51
5197 51
78 51
2127 51
80 51
2129 51
82 51
1891 51
85 51
163926 51
65624 51
97318 51
92 51
94 51
97 51
1122 51
235195 51
237310 51
3173 51
119910 51
140391 51
2152 51
3177 51
107 51
4204 51
110 51
153960 51
114 51
118 51
119 51
218232 51
1145 51
231615 51
128 51
2177 51
2178 51
2181 51
163974 51
2183 51
1160 51
1161 51
138 51
140 51
4072 51
142 51
227473 51
254101 51
159894 51
2199 51
152 51
1178 51
156 51
5279 51
2208 51
163 51
1189 51
166 51
168 51
170 51
2563 51
180 51
2230 51
2231 51
186 51
71869 51
191 51
192 51
29891 51
196 51
3269 51
86216 51
203 51
205 51
20692 51
213 51
214 51
1240 51
219 51
36062 51
229 51
380 51
235 51
236 51
238 51
116975 51
241 51
1099 51
1268 51
382 51
2294 51
248 51
1275 51
252 51
82174 51
25

3269 52
86216 52
203 52
205 52
20692 52
213 52
214 52
1240 52
219 52
36062 52
229 52
380 52
235 52
236 52
238 52
116975 52
241 52
1099 52
1268 52
382 52
2294 52
248 52
1275 52
252 52
82174 52
255 52
258 52
2313 52
1510 52
269 52
20750 52
274 52
190739 52
276 52
277 52
1302 52
140799 52
281 52
1987 52
389 52
2336 52
71970 52
3291 52
3367 52
122153 52
1322 52
2439 52
116012 52
40238 52
303 52
4400 52
2353 52
2354 52
307 52
122164 52
121054 52
310 52
3383 52
313 52
735 52
5442 52
2371 52
325 52
26337 52
328 52
249163 52
104782 52
335 52
2387 52
341 52
5465 52
1373 52
351 52
4451 52
358 52
360 52
2409 52
154987 52
90480 52
128371 52
373 52
4104 52
237289 52
377 52
1404 52
3454 52
1339 52
5506 52
2587 52
388 52
1138 52
390 52
391 52
29064 52
74122 52
395 52
397 52
1424 52
1434 52
2460 52
3485 52
5189 52
416 52
156066 52
200099 52
420 52
128358 52
61863 52
53672 52
1449 52
3655 52
1266 52
431 52
5552 52
1459 52
436 52
2485 52
203193 52
1466 52
3516 52
447 52
448 52
451 52
74181 52
88519 52
2

91954 53
78639 53
1842 53
819 53
4916 53
1847 53
824 53
91961 53
829 53
832 53
76611 53
1861 53
4934 53
250383 53
120650 53
3895 53
846 53
5967 53
1848 53
851 53
3924 53
86670 53
862 53
2911 53
51043 53
243556 53
1893 53
881 53
1906 53
4755 53
5609 53
1912 53
100217 53
191898 53
1920 53
1856 53
52100 53
4193 53
2540 53
1931 53
212881 53
83861 53
917 53
922 53
220439 53
254885 53
5030 53
1959 53
942 53
5448 53
946 53
955 53
76733 53
1982 53
2549 53
171612 53
121675 53
151493 53
4040 53
43984 53
201208 53
3027 53
982 53
985 53
192484 53
2897 53
1000 53
5098 53
53245 53
4081 53
121844 53
3063 53
3066 53
3067 53
3069 53
64853 53
1 53
3076 53
2053 53
8 53
1033 53
11 53
13 53
1042 53
19 53
20 53
21 53
66582 53
1056 53
33 53
35 53
38 53
39 53
2088 53
45097 53
690 53
1070 53
178525 53
160909 53
2097 53
54 53
55 53
57 53
68 53
69 53
72 53
2121 53
1098 53
75 53
5197 53
78 53
2127 53
80 53
2129 53
82 53
1891 53
85 53
163926 53
65624 53
97318 53
92 53
94 53
97 53
1122 53
235195 53
237310 53
3173 5

163974 54
2183 54
1160 54
1161 54
138 54
140 54
4072 54
142 54
227473 54
254101 54
159894 54
2199 54
152 54
1178 54
156 54
5279 54
2208 54
163 54
1189 54
166 54
168 54
170 54
2563 54
180 54
2230 54
2231 54
186 54
71869 54
191 54
192 54
29891 54
196 54
3269 54
86216 54
203 54
205 54
20692 54
213 54
214 54
1240 54
219 54
36062 54
229 54
380 54
235 54
236 54
238 54
116975 54
241 54
1099 54
1268 54
382 54
2294 54
248 54
1275 54
252 54
82174 54
255 54
258 54
2313 54
1510 54
269 54
20750 54
274 54
190739 54
276 54
277 54
1302 54
140799 54
281 54
1987 54
389 54
2336 54
71970 54
3291 54
3367 54
122153 54
1322 54
2439 54
116012 54
40238 54
303 54
4400 54
2353 54
2354 54
307 54
122164 54
121054 54
310 54
3383 54
313 54
735 54
5442 54
2371 54
325 54
26337 54
328 54
249163 54
104782 54
335 54
2387 54
341 54
5465 54
1373 54
351 54
4451 54
358 54
360 54
2409 54
154987 54
90480 54
128371 54
373 54
4104 54
237289 54
377 54
1404 54
3454 54
1339 54
5506 54
2587 54
388 54
1138 54
390 54
391 54
29064 54
7

1329 56
2856 56
144171 56
812 56
91954 56
78639 56
1842 56
819 56
4916 56
1847 56
824 56
91961 56
829 56
832 56
76611 56
1861 56
4934 56
250383 56
120650 56
3895 56
846 56
5967 56
1848 56
851 56
3924 56
86670 56
862 56
2911 56
51043 56
243556 56
1893 56
881 56
1906 56
4755 56
5609 56
1912 56
100217 56
191898 56
1920 56
1856 56
52100 56
4193 56
2540 56
1931 56
212881 56
83861 56
917 56
922 56
220439 56
254885 56
5030 56
1959 56
942 56
5448 56
946 56
955 56
76733 56
1982 56
2549 56
171612 56
121675 56
151493 56
4040 56
43984 56
201208 56
3027 56
982 56
985 56
192484 56
2897 56
1000 56
5098 56
53245 56
4081 56
121844 56
3063 56
3066 56
3067 56
3069 56
64853 56
1 56
3076 56
2053 56
8 56
1033 56
11 56
13 56
1042 56
19 56
20 56
21 56
66582 56
1056 56
33 56
35 56
38 56
39 56
2088 56
45097 56
690 56
1070 56
178525 56
160909 56
2097 56
54 56
55 56
57 56
68 56
69 56
72 56
2121 56
1098 56
75 56
5197 56
78 56
2127 56
80 56
2129 56
82 56
1891 56
85 56
163926 56
65624 56
97318 56
92 56
94 56
97 56
1

61863 57
53672 57
1449 57
3655 57
1266 57
431 57
5552 57
1459 57
436 57
2485 57
203193 57
1466 57
3516 57
447 57
448 57
451 57
74181 57
88519 57
2636 57
39370 57
1483 57
2509 57
2510 57
77 57
243152 57
89553 57
1490 57
84387 57
6153 57
472 57
198109 57
2527 57
483 57
484 57
485 57
3558 57
178428 57
3564 57
496 57
1525 57
48553 57
3576 57
3579 57
1533 57
5630 57
1535 57
392 57
515 57
518 57
519 57
520 57
2569 57
3594 57
523 57
5814 57
192943 57
63065 57
539 57
3614 57
2598 57
1116 57
3628 57
1583 57
99893 57
30473 57
2619 57
2621 57
23989 57
3649 57
4675 57
580 57
1605 57
1633 57
1608 57
233033 57
117322 57
588 57
590 57
2642 57
2643 57
238862 57
1622 57
215481 57
3672 57
601 57
213404 57
5723 57
3676 57
609 57
172647 57
111208 57
54952 57
1644 57
238866 57
625 57
3698 57
629 57
631 57
250488 57
2324 57
4735 57
1664 57
141966 57
642 57
2697 57
176407 57
654 57
96348 57
659 57
73364 57
155285 57
73379 57
2673 57
1704 57
5802 57
2731 57
684 57
1709 57
2845 57
5810 57
692 57
2846 57
694 57

1906 59
4755 59
5609 59
1912 59
100217 59
191898 59
1920 59
1856 59
52100 59
4193 59
2540 59
1931 59
212881 59
83861 59
917 59
922 59
220439 59
254885 59
5030 59
1959 59
942 59
5448 59
946 59
955 59
76733 59
1982 59
2549 59
171612 59
121675 59
151493 59
4040 59
43984 59
201208 59
3027 59
982 59
985 59
192484 59
2897 59
1000 59
5098 59
53245 59
4081 59
121844 59
3063 59
3066 59
3067 59
3069 59
64853 59
1 59
3076 59
2053 59
8 59
1033 59
11 59
13 59
1042 59
19 59
20 59
21 59
66582 59
1056 59
33 59
35 59
38 59
39 59
2088 59
45097 59
690 59
1070 59
178525 59
160909 59
2097 59
54 59
55 59
57 59
68 59
69 59
72 59
2121 59
1098 59
75 59
5197 59
78 59
2127 59
80 59
2129 59
82 59
1891 59
85 59
163926 59
65624 59
97318 59
92 59
94 59
97 59
1122 59
235195 59
237310 59
3173 59
119910 59
140391 59
2152 59
3177 59
107 59
4204 59
110 59
153960 59
114 59
118 59
119 59
218232 59
1145 59
231615 59
128 59
2177 59
2178 59
2181 59
163974 59
2183 59
1160 59
1161 59
138 59
140 59
4072 59
142 59
227473 59
25410

2643 61
238862 61
1622 61
215481 61
3672 61
601 61
213404 61
5723 61
3676 61
609 61
172647 61
111208 61
54952 61
1644 61
238866 61
625 61
3698 61
629 61
631 61
250488 61
2324 61
4735 61
1664 61
141966 61
642 61
2697 61
176407 61
654 61
96348 61
659 61
73364 61
155285 61
73379 61
2673 61
1704 61
5802 61
2731 61
684 61
1709 61
2845 61
5810 61
692 61
2846 61
694 61
2743 61
5581 61
1723 61
226762 61
1312 61
188098 61
2760 61
202871 61
719 61
2296 61
63508 61
463 61
2783 61
1760 61
237281 61
738 61
187515 61
67300 61
2791 61
28392 61
2793 61
746 61
157422 61
5872 61
1781 61
1783 61
226043 61
231164 61
228094 61
811 61
105220 61
774 61
776 61
1801 61
781 61
5902 61
2832 61
100114 61
1812 61
789 61
4886 61
238359 61
1821 61
1822 61
804 61
1329 61
2856 61
144171 61
812 61
91954 61
78639 61
1842 61
819 61
4916 61
1847 61
824 61
91961 61
829 61
832 61
76611 61
1861 61
4934 61
250383 61
120650 61
3895 61
846 61
5967 61
1848 61
851 61
3924 61
86670 61
862 61
2911 61
51043 61
243556 61
1893 61
881 

235 62
236 62
238 62
116975 62
241 62
1099 62
1268 62
382 62
2294 62
248 62
1275 62
252 62
82174 62
255 62
258 62
2313 62
1510 62
269 62
20750 62
274 62
190739 62
276 62
277 62
1302 62
140799 62
281 62
1987 62
389 62
2336 62
71970 62
3291 62
3367 62
122153 62
1322 62
2439 62
116012 62
40238 62
303 62
4400 62
2353 62
2354 62
307 62
122164 62
121054 62
310 62
3383 62
313 62
735 62
5442 62
2371 62
325 62
26337 62
328 62
249163 62
104782 62
335 62
2387 62
341 62
5465 62
1373 62
351 62
4451 62
358 62
360 62
2409 62
154987 62
90480 62
128371 62
373 62
4104 62
237289 62
377 62
1404 62
3454 62
1339 62
5506 62
2587 62
388 62
1138 62
390 62
391 62
29064 62
74122 62
395 62
397 62
1424 62
1434 62
2460 62
3485 62
5189 62
416 62
156066 62
200099 62
420 62
128358 62
61863 62
53672 62
1449 62
3655 62
1266 62
431 62
5552 62
1459 62
436 62
2485 62
203193 62
1466 62
3516 62
447 62
448 62
451 62
74181 62
88519 62
2636 62
39370 62
1483 62
2509 62
2510 62
77 62
243152 62
89553 62
1490 62
84387 62
6153 62
47

819 63
4916 63
1847 63
824 63
91961 63
829 63
832 63
76611 63
1861 63
4934 63
250383 63
120650 63
3895 63
846 63
5967 63
1848 63
851 63
3924 63
86670 63
862 63
2911 63
51043 63
243556 63
1893 63
881 63
1906 63
4755 63
5609 63
1912 63
100217 63
191898 63
1920 63
1856 63
52100 63
4193 63
2540 63
1931 63
212881 63
83861 63
917 63
922 63
220439 63
254885 63
5030 63
1959 63
942 63
5448 63
946 63
955 63
76733 63
1982 63
2549 63
171612 63
121675 63
151493 63
4040 63
43984 63
201208 63
3027 63
982 63
985 63
192484 63
2897 63
1000 63
5098 63
53245 63
4081 63
121844 63
3063 63
3066 63
3067 63
3069 63
64853 63
64
65
1 65
3076 65
2053 65
8 65
1033 65
11 65
13 65
1042 65
19 65
20 65
21 65
66582 65
1056 65
33 65
35 65
38 65
39 65
2088 65
45097 65
690 65
1070 65
178525 65
160909 65
2097 65
54 65
55 65
57 65
68 65
69 65
72 65
2121 65
1098 65
75 65
5197 65
78 65
2127 65
80 65
2129 65
82 65
1891 65
85 65
163926 65
65624 65
97318 65
92 65
94 65
97 65
1122 65
235195 65
237310 65
3173 65
119910 65
140391 6

198109 65
2527 65
483 65
484 65
485 65
3558 65
178428 65
3564 65
496 65
1525 65
48553 65
3576 65
3579 65
1533 65
5630 65
1535 65
392 65
515 65
518 65
519 65
520 65
2569 65
3594 65
523 65
5814 65
192943 65
63065 65
539 65
3614 65
2598 65
1116 65
3628 65
1583 65
99893 65
30473 65
2619 65
2621 65
23989 65
3649 65
4675 65
580 65
1605 65
1633 65
1608 65
233033 65
117322 65
588 65
590 65
2642 65
2643 65
238862 65
1622 65
215481 65
3672 65
601 65
213404 65
5723 65
3676 65
609 65
172647 65
111208 65
54952 65
1644 65
238866 65
625 65
3698 65
629 65
631 65
250488 65
2324 65
4735 65
1664 65
141966 65
642 65
2697 65
176407 65
654 65
96348 65
659 65
73364 65
155285 65
73379 65
2673 65
1704 65
5802 65
2731 65
684 65
1709 65
2845 65
5810 65
692 65
2846 65
694 65
2743 65
5581 65
1723 65
226762 65
1312 65
188098 65
2760 65
202871 65
719 65
2296 65
63508 65
463 65
2783 65
1760 65
237281 65
738 65
187515 65
67300 65
2791 65
28392 65
2793 65
746 65
157422 65
5872 65
1781 65
1783 65
226043 65
231164 65
228

4755 67
5609 67
1912 67
100217 67
191898 67
1920 67
1856 67
52100 67
4193 67
2540 67
1931 67
212881 67
83861 67
917 67
922 67
220439 67
254885 67
5030 67
1959 67
942 67
5448 67
946 67
955 67
76733 67
1982 67
2549 67
171612 67
121675 67
151493 67
4040 67
43984 67
201208 67
3027 67
982 67
985 67
192484 67
2897 67
1000 67
5098 67
53245 67
4081 67
121844 67
3063 67
3066 67
3067 67
3069 67
64853 67
1 67
3076 67
2053 67
8 67
1033 67
11 67
13 67
1042 67
19 67
20 67
21 67
66582 67
1056 67
33 67
35 67
38 67
39 67
2088 67
45097 67
690 67
1070 67
178525 67
160909 67
2097 67
54 67
55 67
57 67
68 67
69 67
72 67
2121 67
1098 67
75 67
5197 67
78 67
2127 67
80 67
2129 67
82 67
1891 67
85 67
163926 67
65624 67
97318 67
92 67
94 67
97 67
1122 67
235195 67
237310 67
3173 67
119910 67
140391 67
2152 67
3177 67
107 67
4204 67
110 67
153960 67
114 67
118 67
119 67
218232 67
1145 67
231615 67
128 67
2177 67
2178 67
2181 67
163974 67
2183 67
1160 67
1161 67
138 67
140 67
4072 67
142 67
227473 67
254101 67
159

447 67
448 67
451 67
74181 67
88519 67
2636 67
39370 67
1483 67
2509 67
2510 67
77 67
243152 67
89553 67
1490 67
84387 67
6153 67
472 67
198109 67
2527 67
483 67
484 67
485 67
3558 67
178428 67
3564 67
496 67
1525 67
48553 67
3576 67
3579 67
1533 67
5630 67
1535 67
392 67
515 67
518 67
519 67
520 67
2569 67
3594 67
523 67
5814 67
192943 67
63065 67
539 67
3614 67
2598 67
1116 67
3628 67
1583 67
99893 67
30473 67
2619 67
2621 67
23989 67
3649 67
4675 67
580 67
1605 67
1633 67
1608 67
233033 67
117322 67
588 67
590 67
2642 67
2643 67
238862 67
1622 67
215481 67
3672 67
601 67
213404 67
5723 67
3676 67
609 67
172647 67
111208 67
54952 67
1644 67
238866 67
625 67
3698 67
629 67
631 67
250488 67
2324 67
4735 67
1664 67
141966 67
642 67
2697 67
176407 67
654 67
96348 67
659 67
73364 67
155285 67
73379 67
2673 67
1704 67
5802 67
2731 67
684 67
1709 67
2845 67
5810 67
692 67
2846 67
694 67
2743 67
5581 67
1723 67
226762 67
1312 67
188098 67
2760 67
202871 67
719 67
2296 67
63508 67
463 67
2783

2152 68
3177 68
107 68
4204 68
110 68
153960 68
114 68
118 68
119 68
218232 68
1145 68
231615 68
128 68
2177 68
2178 68
2181 68
163974 68
2183 68
1160 68
1161 68
138 68
140 68
4072 68
142 68
227473 68
254101 68
159894 68
2199 68
152 68
1178 68
156 68
5279 68
2208 68
163 68
1189 68
166 68
168 68
170 68
2563 68
180 68
2230 68
2231 68
186 68
71869 68
191 68
192 68
29891 68
196 68
3269 68
86216 68
203 68
205 68
20692 68
213 68
214 68
1240 68
219 68
36062 68
229 68
380 68
235 68
236 68
238 68
116975 68
241 68
1099 68
1268 68
382 68
2294 68
248 68
1275 68
252 68
82174 68
255 68
258 68
2313 68
1510 68
269 68
20750 68
274 68
190739 68
276 68
277 68
1302 68
140799 68
281 68
1987 68
389 68
2336 68
71970 68
3291 68
3367 68
122153 68
1322 68
2439 68
116012 68
40238 68
303 68
4400 68
2353 68
2354 68
307 68
122164 68
121054 68
310 68
3383 68
313 68
735 68
5442 68
2371 68
325 68
26337 68
328 68
249163 68
104782 68
335 68
2387 68
341 68
5465 68
1373 68
351 68
4451 68
358 68
360 68
2409 68
154987 68
90

178525 69
160909 69
2097 69
54 69
55 69
57 69
68 69
69 69
72 69
2121 69
1098 69
75 69
5197 69
78 69
2127 69
80 69
2129 69
82 69
1891 69
85 69
163926 69
65624 69
97318 69
92 69
94 69
97 69
1122 69
235195 69
237310 69
3173 69
119910 69
140391 69
2152 69
3177 69
107 69
4204 69
110 69
153960 69
114 69
118 69
119 69
218232 69
1145 69
231615 69
128 69
2177 69
2178 69
2181 69
163974 69
2183 69
1160 69
1161 69
138 69
140 69
4072 69
142 69
227473 69
254101 69
159894 69
2199 69
152 69
1178 69
156 69
5279 69
2208 69
163 69
1189 69
166 69
168 69
170 69
2563 69
180 69
2230 69
2231 69
186 69
71869 69
191 69
192 69
29891 69
196 69
3269 69
86216 69
203 69
205 69
20692 69
213 69
214 69
1240 69
219 69
36062 69
229 69
380 69
235 69
236 69
238 69
116975 69
241 69
1099 69
1268 69
382 69
2294 69
248 69
1275 69
252 69
82174 69
255 69
258 69
2313 69
1510 69
269 69
20750 69
274 69
190739 69
276 69
277 69
1302 69
140799 69
281 69
1987 69
389 69
2336 69
71970 69
3291 69
3367 69
122153 69
1322 69
2439 69
116012 6

67300 70
2791 70
28392 70
2793 70
746 70
157422 70
5872 70
1781 70
1783 70
226043 70
231164 70
228094 70
811 70
105220 70
774 70
776 70
1801 70
781 70
5902 70
2832 70
100114 70
1812 70
789 70
4886 70
238359 70
1821 70
1822 70
804 70
1329 70
2856 70
144171 70
812 70
91954 70
78639 70
1842 70
819 70
4916 70
1847 70
824 70
91961 70
829 70
832 70
76611 70
1861 70
4934 70
250383 70
120650 70
3895 70
846 70
5967 70
1848 70
851 70
3924 70
86670 70
862 70
2911 70
51043 70
243556 70
1893 70
881 70
1906 70
4755 70
5609 70
1912 70
100217 70
191898 70
1920 70
1856 70
52100 70
4193 70
2540 70
1931 70
212881 70
83861 70
917 70
922 70
220439 70
254885 70
5030 70
1959 70
942 70
5448 70
946 70
955 70
76733 70
1982 70
2549 70
171612 70
121675 70
151493 70
4040 70
43984 70
201208 70
3027 70
982 70
985 70
192484 70
2897 70
1000 70
5098 70
53245 70
4081 70
121844 70
3063 70
3066 70
3067 70
3069 70
64853 70
1 70
3076 70
2053 70
8 70
1033 70
11 70
13 70
1042 70
19 70
20 70
21 70
66582 70
1056 70
33 70
35 70


1525 72
48553 72
3576 72
3579 72
1533 72
5630 72
1535 72
392 72
515 72
518 72
519 72
520 72
2569 72
3594 72
523 72
5814 72
192943 72
63065 72
539 72
3614 72
2598 72
1116 72
3628 72
1583 72
99893 72
30473 72
2619 72
2621 72
23989 72
3649 72
4675 72
580 72
1605 72
1633 72
1608 72
233033 72
117322 72
588 72
590 72
2642 72
2643 72
238862 72
1622 72
215481 72
3672 72
601 72
213404 72
5723 72
3676 72
609 72
172647 72
111208 72
54952 72
1644 72
238866 72
625 72
3698 72
629 72
631 72
250488 72
2324 72
4735 72
1664 72
141966 72
642 72
2697 72
176407 72
654 72
96348 72
659 72
73364 72
155285 72
73379 72
2673 72
1704 72
5802 72
2731 72
684 72
1709 72
2845 72
5810 72
692 72
2846 72
694 72
2743 72
5581 72
1723 72
226762 72
1312 72
188098 72
2760 72
202871 72
719 72
2296 72
63508 72
463 72
2783 72
1760 72
237281 72
738 72
187515 72
67300 72
2791 72
28392 72
2793 72
746 72
157422 72
5872 72
1781 72
1783 72
226043 72
231164 72
228094 72
811 72
105220 72
774 72
776 72
1801 72
781 72
5902 72
2832 72
100

1424 74
1434 74
2460 74
3485 74
5189 74
416 74
156066 74
200099 74
420 74
128358 74
61863 74
53672 74
1449 74
3655 74
1266 74
431 74
5552 74
1459 74
436 74
2485 74
203193 74
1466 74
3516 74
447 74
448 74
451 74
74181 74
88519 74
2636 74
39370 74
1483 74
2509 74
2510 74
77 74
243152 74
89553 74
1490 74
84387 74
6153 74
472 74
198109 74
2527 74
483 74
484 74
485 74
3558 74
178428 74
3564 74
496 74
1525 74
48553 74
3576 74
3579 74
1533 74
5630 74
1535 74
392 74
515 74
518 74
519 74
520 74
2569 74
3594 74
523 74
5814 74
192943 74
63065 74
539 74
3614 74
2598 74
1116 74
3628 74
1583 74
99893 74
30473 74
2619 74
2621 74
23989 74
3649 74
4675 74
580 74
1605 74
1633 74
1608 74
233033 74
117322 74
588 74
590 74
2642 74
2643 74
238862 74
1622 74
215481 74
3672 74
601 74
213404 74
5723 74
3676 74
609 74
172647 74
111208 74
54952 74
1644 74
238866 74
625 74
3698 74
629 74
631 74
250488 74
2324 74
4735 74
1664 74
141966 74
642 74
2697 74
176407 74
654 74
96348 74
659 74
73364 74
155285 74
73379 74


76733 75
1982 75
2549 75
171612 75
121675 75
151493 75
4040 75
43984 75
201208 75
3027 75
982 75
985 75
192484 75
2897 75
1000 75
5098 75
53245 75
4081 75
121844 75
3063 75
3066 75
3067 75
3069 75
64853 75
76
1 76
3076 76
2053 76
8 76
1033 76
11 76
13 76
1042 76
19 76
20 76
21 76
66582 76
1056 76
33 76
35 76
38 76
39 76
2088 76
45097 76
690 76
1070 76
178525 76
160909 76
2097 76
54 76
55 76
57 76
68 76
69 76
72 76
2121 76
1098 76
75 76
5197 76
78 76
2127 76
80 76
2129 76
82 76
1891 76
85 76
163926 76
65624 76
97318 76
92 76
94 76
97 76
1122 76
235195 76
237310 76
3173 76
119910 76
140391 76
2152 76
3177 76
107 76
4204 76
110 76
153960 76
114 76
118 76
119 76
218232 76
1145 76
231615 76
128 76
2177 76
2178 76
2181 76
163974 76
2183 76
1160 76
1161 76
138 76
140 76
4072 76
142 76
227473 76
254101 76
159894 76
2199 76
152 76
1178 76
156 76
5279 76
2208 76
163 76
1189 76
166 76
168 76
170 76
2563 76
180 76
2230 76
2231 76
186 76
71869 76
191 76
192 76
29891 76
196 76
3269 76
86216 76
203 7

3076 77
2053 77
8 77
1033 77
11 77
13 77
1042 77
19 77
20 77
21 77
66582 77
1056 77
33 77
35 77
38 77
39 77
2088 77
45097 77
690 77
1070 77
178525 77
160909 77
2097 77
54 77
55 77
57 77
68 77
69 77
72 77
2121 77
1098 77
75 77
5197 77
78 77
2127 77
80 77
2129 77
82 77
1891 77
85 77
163926 77
65624 77
97318 77
92 77
94 77
97 77
1122 77
235195 77
237310 77
3173 77
119910 77
140391 77
2152 77
3177 77
107 77
4204 77
110 77
153960 77
114 77
118 77
119 77
218232 77
1145 77
231615 77
128 77
2177 77
2178 77
2181 77
163974 77
2183 77
1160 77
1161 77
138 77
140 77
4072 77
142 77
227473 77
254101 77
159894 77
2199 77
152 77
1178 77
156 77
5279 77
2208 77
163 77
1189 77
166 77
168 77
170 77
2563 77
180 77
2230 77
2231 77
186 77
71869 77
191 77
192 77
29891 77
196 77
3269 77
86216 77
203 77
205 77
20692 77
213 77
214 77
1240 77
219 77
36062 77
229 77
380 77
235 77
236 77
238 77
116975 77
241 77
1099 77
1268 77
382 77
2294 77
248 77
1275 77
252 77
82174 77
255 77
258 77
2313 77
1510 77
269 77
20750 7

2791 79
28392 79
2793 79
746 79
157422 79
5872 79
1781 79
1783 79
226043 79
231164 79
228094 79
811 79
105220 79
774 79
776 79
1801 79
781 79
5902 79
2832 79
100114 79
1812 79
789 79
4886 79
238359 79
1821 79
1822 79
804 79
1329 79
2856 79
144171 79
812 79
91954 79
78639 79
1842 79
819 79
4916 79
1847 79
824 79
91961 79
829 79
832 79
76611 79
1861 79
4934 79
250383 79
120650 79
3895 79
846 79
5967 79
1848 79
851 79
3924 79
86670 79
862 79
2911 79
51043 79
243556 79
1893 79
881 79
1906 79
4755 79
5609 79
1912 79
100217 79
191898 79
1920 79
1856 79
52100 79
4193 79
2540 79
1931 79
212881 79
83861 79
917 79
922 79
220439 79
254885 79
5030 79
1959 79
942 79
5448 79
946 79
955 79
76733 79
1982 79
2549 79
171612 79
121675 79
151493 79
4040 79
43984 79
201208 79
3027 79
982 79
985 79
192484 79
2897 79
1000 79
5098 79
53245 79
4081 79
121844 79
3063 79
3066 79
3067 79
3069 79
64853 79
1 79
3076 79
2053 79
8 79
1033 79
11 79
13 79
1042 79
19 79
20 79
21 79
66582 79
1056 79
33 79
35 79
38 79
39 

76611 80
1861 80
4934 80
250383 80
120650 80
3895 80
846 80
5967 80
1848 80
851 80
3924 80
86670 80
862 80
2911 80
51043 80
243556 80
1893 80
881 80
1906 80
4755 80
5609 80
1912 80
100217 80
191898 80
1920 80
1856 80
52100 80
4193 80
2540 80
1931 80
212881 80
83861 80
917 80
922 80
220439 80
254885 80
5030 80
1959 80
942 80
5448 80
946 80
955 80
76733 80
1982 80
2549 80
171612 80
121675 80
151493 80
4040 80
43984 80
201208 80
3027 80
982 80
985 80
192484 80
2897 80
1000 80
5098 80
53245 80
4081 80
121844 80
3063 80
3066 80
3067 80
3069 80
64853 80
1 80
3076 80
2053 80
8 80
1033 80
11 80
13 80
1042 80
19 80
20 80
21 80
66582 80
1056 80
33 80
35 80
38 80
39 80
2088 80
45097 80
690 80
1070 80
178525 80
160909 80
2097 80
54 80
55 80
57 80
68 80
69 80
72 80
2121 80
1098 80
75 80
5197 80
78 80
2127 80
80 80
2129 80
82 80
1891 80
85 80
163926 80
65624 80
97318 80
92 80
94 80
97 80
1122 80
235195 80
237310 80
3173 80
119910 80
140391 80
2152 80
3177 80
107 80
4204 80
110 80
153960 80
114 80
11

2731 81
684 81
1709 81
2845 81
5810 81
692 81
2846 81
694 81
2743 81
5581 81
1723 81
226762 81
1312 81
188098 81
2760 81
202871 81
719 81
2296 81
63508 81
463 81
2783 81
1760 81
237281 81
738 81
187515 81
67300 81
2791 81
28392 81
2793 81
746 81
157422 81
5872 81
1781 81
1783 81
226043 81
231164 81
228094 81
811 81
105220 81
774 81
776 81
1801 81
781 81
5902 81
2832 81
100114 81
1812 81
789 81
4886 81
238359 81
1821 81
1822 81
804 81
1329 81
2856 81
144171 81
812 81
91954 81
78639 81
1842 81
819 81
4916 81
1847 81
824 81
91961 81
829 81
832 81
76611 81
1861 81
4934 81
250383 81
120650 81
3895 81
846 81
5967 81
1848 81
851 81
3924 81
86670 81
862 81
2911 81
51043 81
243556 81
1893 81
881 81
1906 81
4755 81
5609 81
1912 81
100217 81
191898 81
1920 81
1856 81
52100 81
4193 81
2540 81
1931 81
212881 81
83861 81
917 81
922 81
220439 81
254885 81
5030 81
1959 81
942 81
5448 81
946 81
955 81
76733 81
1982 81
2549 81
171612 81
121675 81
151493 81
4040 81
43984 81
201208 81
3027 81
982 81
985 8

198109 81
2527 81
483 81
484 81
485 81
3558 81
178428 81
3564 81
496 81
1525 81
48553 81
3576 81
3579 81
1533 81
5630 81
1535 81
392 81
515 81
518 81
519 81
520 81
2569 81
3594 81
523 81
5814 81
192943 81
63065 81
539 81
3614 81
2598 81
1116 81
3628 81
1583 81
99893 81
30473 81
2619 81
2621 81
23989 81
3649 81
4675 81
580 81
1605 81
1633 81
1608 81
233033 81
117322 81
588 81
590 81
2642 81
2643 81
238862 81
1622 81
215481 81
3672 81
601 81
213404 81
5723 81
3676 81
609 81
172647 81
111208 81
54952 81
1644 81
238866 81
625 81
3698 81
629 81
631 81
250488 81
2324 81
4735 81
1664 81
141966 81
642 81
2697 81
176407 81
654 81
96348 81
659 81
73364 81
155285 81
73379 81
2673 81
1704 81
5802 81
2731 81
684 81
1709 81
2845 81
5810 81
692 81
2846 81
694 81
2743 81
5581 81
1723 81
226762 81
1312 81
188098 81
2760 81
202871 81
719 81
2296 81
63508 81
463 81
2783 81
1760 81
237281 81
738 81
187515 81
67300 81
2791 81
28392 81
2793 81
746 81
157422 81
5872 81
1781 81
1783 81
226043 81
231164 81
228

192 82
29891 82
196 82
3269 82
86216 82
203 82
205 82
20692 82
213 82
214 82
1240 82
219 82
36062 82
229 82
380 82
235 82
236 82
238 82
116975 82
241 82
1099 82
1268 82
382 82
2294 82
248 82
1275 82
252 82
82174 82
255 82
258 82
2313 82
1510 82
269 82
20750 82
274 82
190739 82
276 82
277 82
1302 82
140799 82
281 82
1987 82
389 82
2336 82
71970 82
3291 82
3367 82
122153 82
1322 82
2439 82
116012 82
40238 82
303 82
4400 82
2353 82
2354 82
307 82
122164 82
121054 82
310 82
3383 82
313 82
735 82
5442 82
2371 82
325 82
26337 82
328 82
249163 82
104782 82
335 82
2387 82
341 82
5465 82
1373 82
351 82
4451 82
358 82
360 82
2409 82
154987 82
90480 82
128371 82
373 82
4104 82
237289 82
377 82
1404 82
3454 82
1339 82
5506 82
2587 82
388 82
1138 82
390 82
391 82
29064 82
74122 82
395 82
397 82
1424 82
1434 82
2460 82
3485 82
5189 82
416 82
156066 82
200099 82
420 82
128358 82
61863 82
53672 82
1449 82
3655 82
1266 82
431 82
5552 82
1459 82
436 82
2485 82
203193 82
1466 82
3516 82
447 82
448 82
451

1098 84
75 84
5197 84
78 84
2127 84
80 84
2129 84
82 84
1891 84
85 84
163926 84
65624 84
97318 84
92 84
94 84
97 84
1122 84
235195 84
237310 84
3173 84
119910 84
140391 84
2152 84
3177 84
107 84
4204 84
110 84
153960 84
114 84
118 84
119 84
218232 84
1145 84
231615 84
128 84
2177 84
2178 84
2181 84
163974 84
2183 84
1160 84
1161 84
138 84
140 84
4072 84
142 84
227473 84
254101 84
159894 84
2199 84
152 84
1178 84
156 84
5279 84
2208 84
163 84
1189 84
166 84
168 84
170 84
2563 84
180 84
2230 84
2231 84
186 84
71869 84
191 84
192 84
29891 84
196 84
3269 84
86216 84
203 84
205 84
20692 84
213 84
214 84
1240 84
219 84
36062 84
229 84
380 84
235 84
236 84
238 84
116975 84
241 84
1099 84
1268 84
382 84
2294 84
248 84
1275 84
252 84
82174 84
255 84
258 84
2313 84
1510 84
269 84
20750 84
274 84
190739 84
276 84
277 84
1302 84
140799 84
281 84
1987 84
389 84
2336 84
71970 84
3291 84
3367 84
122153 84
1322 84
2439 84
116012 84
40238 84
303 84
4400 84
2353 84
2354 84
307 84
122164 84
121054 84
310

28392 84
2793 84
746 84
157422 84
5872 84
1781 84
1783 84
226043 84
231164 84
228094 84
811 84
105220 84
774 84
776 84
1801 84
781 84
5902 84
2832 84
100114 84
1812 84
789 84
4886 84
238359 84
1821 84
1822 84
804 84
1329 84
2856 84
144171 84
812 84
91954 84
78639 84
1842 84
819 84
4916 84
1847 84
824 84
91961 84
829 84
832 84
76611 84
1861 84
4934 84
250383 84
120650 84
3895 84
846 84
5967 84
1848 84
851 84
3924 84
86670 84
862 84
2911 84
51043 84
243556 84
1893 84
881 84
1906 84
4755 84
5609 84
1912 84
100217 84
191898 84
1920 84
1856 84
52100 84
4193 84
2540 84
1931 84
212881 84
83861 84
917 84
922 84
220439 84
254885 84
5030 84
1959 84
942 84
5448 84
946 84
955 84
76733 84
1982 84
2549 84
171612 84
121675 84
151493 84
4040 84
43984 84
201208 84
3027 84
982 84
985 84
192484 84
2897 84
1000 84
5098 84
53245 84
4081 84
121844 84
3063 84
3066 84
3067 84
3069 84
64853 84
1 84
3076 84
2053 84
8 84
1033 84
11 84
13 84
1042 84
19 84
20 84
21 84
66582 84
1056 84
33 84
35 84
38 84
39 84
2088 

1322 86
2439 86
116012 86
40238 86
303 86
4400 86
2353 86
2354 86
307 86
122164 86
121054 86
310 86
3383 86
313 86
735 86
5442 86
2371 86
325 86
26337 86
328 86
249163 86
104782 86
335 86
2387 86
341 86
5465 86
1373 86
351 86
4451 86
358 86
360 86
2409 86
154987 86
90480 86
128371 86
373 86
4104 86
237289 86
377 86
1404 86
3454 86
1339 86
5506 86
2587 86
388 86
1138 86
390 86
391 86
29064 86
74122 86
395 86
397 86
1424 86
1434 86
2460 86
3485 86
5189 86
416 86
156066 86
200099 86
420 86
128358 86
61863 86
53672 86
1449 86
3655 86
1266 86
431 86
5552 86
1459 86
436 86
2485 86
203193 86
1466 86
3516 86
447 86
448 86
451 86
74181 86
88519 86
2636 86
39370 86
1483 86
2509 86
2510 86
77 86
243152 86
89553 86
1490 86
84387 86
6153 86
472 86
198109 86
2527 86
483 86
484 86
485 86
3558 86
178428 86
3564 86
496 86
1525 86
48553 86
3576 86
3579 86
1533 86
5630 86
1535 86
392 86
515 86
518 86
519 86
520 86
2569 86
3594 86
523 86
5814 86
192943 86
63065 86
539 86
3614 86
2598 86
1116 86
3628 86
15

2527 87
483 87
484 87
485 87
3558 87
178428 87
3564 87
496 87
1525 87
48553 87
3576 87
3579 87
1533 87
5630 87
1535 87
392 87
515 87
518 87
519 87
520 87
2569 87
3594 87
523 87
5814 87
192943 87
63065 87
539 87
3614 87
2598 87
1116 87
3628 87
1583 87
99893 87
30473 87
2619 87
2621 87
23989 87
3649 87
4675 87
580 87
1605 87
1633 87
1608 87
233033 87
117322 87
588 87
590 87
2642 87
2643 87
238862 87
1622 87
215481 87
3672 87
601 87
213404 87
5723 87
3676 87
609 87
172647 87
111208 87
54952 87
1644 87
238866 87
625 87
3698 87
629 87
631 87
250488 87
2324 87
4735 87
1664 87
141966 87
642 87
2697 87
176407 87
654 87
96348 87
659 87
73364 87
155285 87
73379 87
2673 87
1704 87
5802 87
2731 87
684 87
1709 87
2845 87
5810 87
692 87
2846 87
694 87
2743 87
5581 87
1723 87
226762 87
1312 87
188098 87
2760 87
202871 87
719 87
2296 87
63508 87
463 87
2783 87
1760 87
237281 87
738 87
187515 87
67300 87
2791 87
28392 87
2793 87
746 87
157422 87
5872 87
1781 87
1783 87
226043 87
231164 87
228094 87
811

212881 87
83861 87
917 87
922 87
220439 87
254885 87
5030 87
1959 87
942 87
5448 87
946 87
955 87
76733 87
1982 87
2549 87
171612 87
121675 87
151493 87
4040 87
43984 87
201208 87
3027 87
982 87
985 87
192484 87
2897 87
1000 87
5098 87
53245 87
4081 87
121844 87
3063 87
3066 87
3067 87
3069 87
64853 87
88
1 88
3076 88
2053 88
8 88
1033 88
11 88
13 88
1042 88
19 88
20 88
21 88
66582 88
1056 88
33 88
35 88
38 88
39 88
2088 88
45097 88
690 88
1070 88
178525 88
160909 88
2097 88
54 88
55 88
57 88
68 88
69 88
72 88
2121 88
1098 88
75 88
5197 88
78 88
2127 88
80 88
2129 88
82 88
1891 88
85 88
163926 88
65624 88
97318 88
92 88
94 88
97 88
1122 88
235195 88
237310 88
3173 88
119910 88
140391 88
2152 88
3177 88
107 88
4204 88
110 88
153960 88
114 88
118 88
119 88
218232 88
1145 88
231615 88
128 88
2177 88
2178 88
2181 88
163974 88
2183 88
1160 88
1161 88
138 88
140 88
4072 88
142 88
227473 88
254101 88
159894 88
2199 88
152 88
1178 88
156 88
5279 88
2208 88
163 88
1189 88
166 88
168 88
170 88
2

2231 88
186 88
71869 88
191 88
192 88
29891 88
196 88
3269 88
86216 88
203 88
205 88
20692 88
213 88
214 88
1240 88
219 88
36062 88
229 88
380 88
235 88
236 88
238 88
116975 88
241 88
1099 88
1268 88
382 88
2294 88
248 88
1275 88
252 88
82174 88
255 88
258 88
2313 88
1510 88
269 88
20750 88
274 88
190739 88
276 88
277 88
1302 88
140799 88
281 88
1987 88
389 88
2336 88
71970 88
3291 88
3367 88
122153 88
1322 88
2439 88
116012 88
40238 88
303 88
4400 88
2353 88
2354 88
307 88
122164 88
121054 88
310 88
3383 88
313 88
735 88
5442 88
2371 88
325 88
26337 88
328 88
249163 88
104782 88
335 88
2387 88
341 88
5465 88
1373 88
351 88
4451 88
358 88
360 88
2409 88
154987 88
90480 88
128371 88
373 88
4104 88
237289 88
377 88
1404 88
3454 88
1339 88
5506 88
2587 88
388 88
1138 88
390 88
391 88
29064 88
74122 88
395 88
397 88
1424 88
1434 88
2460 88
3485 88
5189 88
416 88
156066 88
200099 88
420 88
128358 88
61863 88
53672 88
1449 88
3655 88
1266 88
431 88
5552 88
1459 88
436 88
2485 88
203193 88
14

KeyboardInterrupt: 

### Task 2: Latent Semantic Models (LSMs) [15 points] ###

In this task you will experiment with applying distributional semantics methods ([LSI](http://lsa3.colorado.edu/papers/JASIS.lsi.90.pdf) **[5 points]** and [LDA](https://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf) **[5 points]**) for retrieval.

You do not need to implement LSI or LDA on your own. Instead, you can use [gensim](http://radimrehurek.com/gensim/index.html). An example on how to integrate Pyndri with Gensim for word2vec can be found [here](https://github.com/cvangysel/pyndri/blob/master/examples/word2vec.py). For the remaining latent vector space models, you will need to implement connector classes (such as `IndriSentences`) by yourself.

In order to use a latent semantic model for retrieval, you need to:
   * build a representation of the query **q**,
   * build a representation of the document **d**,
   * calculate the similarity between **q** and **d** (e.g., cosine similarity, KL-divergence).
     
The exact implementation here depends on the latent semantic model you are using. 
   
Each of these LSMs come with various hyperparameters to tune. Make a choice on the parameters, and explicitly mention the reasons that led you to these decisions. You can use the validation set to optimize hyper parameters you see fit; motivate your decisions. In addition, mention clearly how the query/document representations were constructed for each LSM and explain your choices.

In this experiment, you will first obtain an initial top-1000 ranking for each query using TF-IDF in **Task 1**, and then re-rank the documents using the LSMs. Use TREC Eval to obtain the results and report on `NDCG@10`, Mean Average Precision (`MAP@1000`), `Precision@5` and `Recall@1000`.

Perform significance testing **[5 points]** (similar as in Task 1) in the class of semantic matching methods.

### Task 3:  Word embeddings for ranking [20 points] (open-ended) ###

First create word embeddings on the corpus we provided using [word2vec](http://arxiv.org/abs/1411.2738) -- [gensim implementation](https://radimrehurek.com/gensim/models/word2vec.html). You should extract the indexed documents using pyndri and provide them to gensim for training a model (see example [here](https://github.com/nickvosk/pyndri/blob/master/examples/word2vec.py)).
   
This is an open-ended task. It is left up you to decide how you will combine word embeddings to derive query and document representations. Note that since we provide the implementation for training word2vec, you will be graded based on your creativity on combining word embeddings for building query and document representations.

Note: If you want to experiment with pre-trained word embeddings on a different corpus, you can use the word embeddings we provide alongside the assignment (./data/reduced_vectors_google.txt.tar.gz). These are the [google word2vec word embeddings](https://code.google.com/archive/p/word2vec/), reduced to only the words that appear in the document collection we use in this assignment.

### Task 4: Learning to rank (LTR) [15 points] (open-ended) ###

In this task you will get an introduction into learning to rank for information retrieval.

You can explore different ways for devising features for the model. Obviously, you can use the retrieval methods you implemented in Task 1, Task 2 and Task 3 as features. Think about other features you can use (e.g. query/document length). Creativity on devising new features and providing motivation for them will be taken into account when grading.

For every query, first create a document candidate set using the top-1000 documents using TF-IDF, and subsequently compute features given a query and a document. Note that the feature values of different retrieval methods are likely to be distributed differently.

You are adviced to start some pointwise learning to rank algorithm e.g. logistic regression, implemented in [scikit-learn](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).
Train your LTR model using 10-fold cross validation on the test set. More advanced learning to rank algorithms will be appreciated when grading.

### Task 4: Write a report [15 points; instant FAIL if not provided] ###

The report should be a PDF file created using the [sigconf ACM template](https://www.acm.org/publications/proceedings-template) and will determine a significant part of your grade.

   * It should explain what you have implemented, motivate your experiments and detail what you expect to learn from them. **[10 points]**
   * Lastly, provide a convincing analysis of your results and conclude the report accordingly. **[10 points]**
      * Do all methods perform similarly on all queries? Why?
      * Is there a single retrieval model that outperforms all other retrieval models (i.e., silver bullet)?
      * ...

**Hand in the report and your self-contained implementation source files.** Only send us the files that matter, organized in a well-documented zip/tgz file with clear instructions on how to reproduce your results. That is, we want to be able to regenerate all your results with minimal effort. You can assume that the index and ground-truth information is present in the same file structure as the one we have provided.
