# Requirements

In [1]:
import textacy
corpus = textacy.Corpus.load("en_core_web_md", "./data/enwikinews/textacy_corpus.bin.gz")

# Assignment

This hands-on session aims to improve Wikinews' existing "Related articles" widget by incorporating a two-step methodology: candidate pre-retrieval followed by re-ranking. More specifically, we expect you to create:

### 1. Candidate retrieval

A method that takes a `textacy.Corpus` document as input and returns a (ranked) list of candidate 'related articles'. A textacy-based reimplementation of a simple `candidate_retriever` could be:


In [65]:
def candidate_retriever(doc, corpus, limit=10):
    """
    Retrieve candidate articles "related" to doc in corpus.

    Fetches documents that have one or more categories in common with doc.
    """
    
    doc_categories = doc._.meta["categories"]
    
    # `or []` guards against articles whose metadata lacks a "categories" entry
    match_func = lambda x: any(cat in doc_categories 
                               for cat in (x._.meta.get("categories") or []))
    
    candidates = corpus.get(match_func, limit=limit)
    
    return candidates
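
The category-overlap idea can be tried out without loading the corpus. The following is a minimal sketch on plain dictionaries, not the textacy API: the `articles` list, its titles and categories, and the helper names `shares_category` and `retrieve` are all invented for illustration. Unlike the function above, this sketch also skips the query article itself, which is a design choice rather than something the original does.

```python
import itertools

# Toy stand-in for the corpus: each article is a dict with a "categories" list.
# Titles and categories are invented for illustration only.
articles = [
    {"title": "A", "categories": ["Health", "Haiti"]},
    {"title": "B", "categories": ["Politics"]},
    {"title": "C", "categories": ["Haiti", "Disasters"]},
    {"title": "D"},  # no "categories" key at all
]


def shares_category(query, other):
    """Return True if the two articles have at least one category in common."""
    query_categories = set(query.get("categories") or [])
    return any(cat in query_categories for cat in other.get("categories") or [])


def retrieve(query, articles, limit=10):
    """Return up to `limit` articles sharing a category with `query`, excluding `query`."""
    matches = (a for a in articles if a is not query and shares_category(query, a))
    return list(itertools.islice(matches, limit))


related = retrieve(articles[0], articles)
print([a["title"] for a in related])
```

Because the matches are produced lazily and cut off with `islice`, the scan stops as soon as `limit` candidates are found, which mirrors the role of the `limit` argument in the cell above.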

In [66]:
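import datetime

dt_format = "%Y-%m-%dT%H:%M:%SZ"  # assumed from the dt_created values, e.g. 2010-10-31T12:08:17Z

def re_ranker(candidates):
    """Sort candidates by creation date, newest first."""
    
    return sorted(candidates, key=lambda x: datetime.datetime.strptime(x._.meta['dt_created'], dt_format),
                  reverse=True)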
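
The date-based re-ranking step can be checked in isolation. The sketch below assumes that `dt_created` values follow the pattern `2010-10-31T12:08:17Z`; the format string `DT_FORMAT`, the function name `rank_by_date`, and the toy candidate titles are assumptions made for this example.

```python
import datetime

# Assumed timestamp format, matching values such as "2010-10-31T12:08:17Z".
DT_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# Toy candidates; the titles are placeholders.
candidates = [
    {"title": "older", "dt_created": "2005-02-11T20:58:49Z"},
    {"title": "newest", "dt_created": "2010-10-31T12:08:17Z"},
    {"title": "oldest", "dt_created": "2004-11-18T06:35:42Z"},
]


def rank_by_date(candidates):
    """Return the candidates sorted by creation date, newest first."""
    return sorted(
        candidates,
        key=lambda c: datetime.datetime.strptime(c["dt_created"], DT_FORMAT),
        reverse=True,
    )


ranked = rank_by_date(candidates)
for c in ranked:
    print(c["dt_created"], c["title"])
```

Parsing the timestamps before sorting makes the intent explicit and fails loudly on malformed values, although strings in this particular fixed-width format would also sort correctly as plain text.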

In [67]:
candidates = candidate_retriever(corpus[2344], corpus)

In [68]:
ranked_candidates = re_ranker(candidates)

In [69]:
for c in ranked_candidates:
    print(c._.meta['dt_created'], c._.meta['title'])

2010-10-31T12:08:17Z Cholera spreads to Port-au-Prince, five cases reported
2005-06-04T04:00:35Z First winter snowfall in New Zealand
2005-05-06T06:17:56Z Surprise win for RESPECT Party in UK 2005 General Election
2005-05-02T03:56:22Z Charity haircuts and collaborative art at spring festival in Cambridge, Massachusetts
2005-04-04T04:53:15Z UK Prime Minister sets 2005 General Election date
2005-02-18T13:55:08Z Crosswords/2005/February/19
2005-02-11T20:58:49Z Arthur Miller dies, aged 89
2005-02-02T05:15:08Z New drugs listed on Australian Pharmaceutical Benefits Scheme
2004-12-09T03:42:22Z Blair wishes to cooperate with the United States to reduce climatic change
2004-11-18T06:35:42Z Speculation arises as North Korean media drops Kim Jong Il's title
