# Assignment 1: Information Retrieval (10 marks)

This assignment is based on the Assignment 1 Demo tutorials.

In this assignment, your task is to index a new document collection into *Elasticsearch*, and then measure search performance based on predefined queries.

A new document collection containing descriptions of more than 10,000 government sites, together with a set of predefined queries, is provided for this assignment.

Throughout this assignment: 
1. You will develop a better understanding of indexing, including the tokeniser, parser, and normaliser components, and how to improve the search performance given a predefined evaluation metric, 
2. You will develop a better understanding of search algorithms, and how to obtain better search results, and 
3. You will find the best way to combine an indexer and search algorithm to maximise your performance.

Below, you will solve five programming questions (Q1 - Q5) and four written questions (Q6 - Q9). 

We will check the correctness of your code and the overall performance score.

- Write your code after `### Your code here`, and remove `raise NotImplementedError` after implementation.
- Write your answers in the designated cells of this notebook, and upload the file to the Wattle submission site. **Please rename the Jupyter notebook file (`Assignment1.ipynb`) to `your_uid.ipynb` (e.g. `u1234567.ipynb`) and submit it with your written answers therein - FAILURE TO DO SO WILL RESULT IN -0.5 POINTS**. 

*Hint*: After you finish coding your notebook, select Kernel -> Restart & Run All from the Jupyter Notebook menu. Once every block has finished executing, inspect the output, save the notebook, and shut down the kernel. Only then can you safely manipulate the .ipynb file, which contains the code, explanations and output.

## Coding component (Q1 - Q5), 4 marks

### Q1: Index Gov dataset (0.5 marks)

For this assignment we will be working with a corpus of government documents, located in the gov folder. 

The gov folder contains three sub-folders: documents, qrels and topics. The documents folder consists of sub-folders, each of which contains multiple documents. Topics and qrels contain search queries and corresponding ground truth relevant documents, respectively.

Your first job is to index the documents as we have done in the tutorial exercises (Assignment 1 Demo).

Note that, depending on your machine, indexing may take anywhere from several minutes to a few hours. You may implement a multi-threaded version of indexing to mitigate this problem.

The basic configuration for indexing is provided below:

In [1]:
# basic configuration for indexing
basic_settings = {
  "mappings": {
    "doc": {
      "properties": {
        "filename": {
          "type": "keyword",
          "index": False,
        },
        "path": {
          "type": "keyword",
          "index": False,
        },
        "text": {
          "type": "text",
          "similarity": "boolean",
          "analyzer": "my_analyzer",
          "search_analyzer": "my_analyzer"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "filter": [
            "stop"
          ],
          "char_filter": [
            "html_strip"
          ],
          "type": "custom",
          "tokenizer": "whitespace"
        }
      }
    }
  }   
}

You need to implement the below function `build_gov_index`. Don't forget to remove `raise NotImplementedError` after implementation.

In [2]:
import os
from collections import namedtuple

#  A document class with the following attributes:
#   filename: document filename
#   text: body of document
#   path: path of document

Doc = namedtuple('Doc', 'filename path text')

def read_doc(doc_path, encoding):
    '''
        reads a document from path
        input:
            - doc_path : path of document
            - encoding: encoding
        output: =>
            - doc: instance of Doc namedtuple
    '''
    filename = os.path.basename(doc_path)
    # the context manager closes the file even if reading fails
    with open(doc_path, 'r', encoding = encoding) as fp:
        text = fp.read().strip()
    return Doc(filename = filename, text = text, path = doc_path)

def read_dataset(path, encoding = "ISO-8859-1"):
    '''
        reads multiple documents from path
        input:
            - path : root folder of the documents (sub-folders are included)
            - encoding: encoding
        output: =>
            - docs: instances of Doc namedtuple returned as generator
    '''
    for root, dirs, files in os.walk(path):
        for doc_name in files:
            yield read_doc(os.path.join(root, doc_name), encoding)

In [3]:
from elasticsearch import Elasticsearch
from elasticsearch import helpers

ES_HOSTS = ['http://localhost:9200']
DOCS_PATH = 'gov/documents'
INDEX_NAME = 'gov'
DOC_TYPE = 'doc'

def build_gov_index(es_conn, index_name, doc_path, settings):
    # TODO implement function that:
    #  1. Create an index with `index_name`. If `index_name` already exists, remove the index first.
    #  2. Index the documents under doc_path, including subfolders, into elasticsearch (hint: read demo carefully)
    # Note that this function will be used throughout this assignment    
    
    # delete the index if it already exists, then create it from scratch
    if es_conn.indices.exists(index = index_name):
        es_conn.indices.delete(index = index_name)
        print('index `{}` deleted'.format(index_name))
    es_conn.indices.create(index = index_name, ignore = 400, body = settings)
    print('index `{}` created'.format(index_name))     

    # retrieve documents lazily from doc_path and wrap each one in a bulk action
    def gendata():
        for doc in read_dataset(doc_path):
            yield {
                "_index": index_name,
                "_type": DOC_TYPE,
                "_id" : doc.filename,
                "_source": doc._asdict(),
            }

    # helpers.bulk returns (number of successful actions, list of errors)
    res = helpers.bulk(es_conn, gendata())
    print ("Indexed {} docs to index '{}'".format(res[0], index_name))
    
    # refresh after indexing
    es_conn.indices.refresh(index=index_name)  

es_conn = Elasticsearch(ES_HOSTS)

In [4]:
build_gov_index(es_conn, INDEX_NAME, DOCS_PATH, basic_settings)

index `gov` deleted
index `gov` created
Indexed 33848 docs to index 'gov'


In [5]:
es_conn.count()

{'count': 33848,
 '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}}
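
The multi-threaded indexing suggested at the start of this question could be sketched as follows. This is a minimal sketch and not part of the provided starter code: `make_actions` and `parallel_index` are hypothetical helper names, the `thread_count` and `chunk_size` values are arbitrary, and it assumes the `helpers.parallel_bulk` function of the `elasticsearch` Python client.

```python
from collections import namedtuple

# Same shape as the Doc namedtuple defined in the cell above.
Doc = namedtuple('Doc', 'filename path text')

def make_actions(docs, index_name, doc_type='doc'):
    """Wrap each document in a bulk action dict (hypothetical helper)."""
    for doc in docs:
        yield {
            "_index": index_name,
            "_type": doc_type,
            "_id": doc.filename,
            "_source": doc._asdict(),
        }

def parallel_index(es_conn, actions, thread_count=4, chunk_size=500):
    """Send actions to Elasticsearch from several threads; return the success count."""
    # Imported here so make_actions can be used without the client installed.
    from elasticsearch import helpers
    indexed = 0
    # parallel_bulk is a generator: it only does work while it is being consumed.
    for ok, _ in helpers.parallel_bulk(es_conn, actions,
                                       thread_count=thread_count,
                                       chunk_size=chunk_size):
        if ok:
            indexed += 1
    return indexed

# Building actions needs no running cluster, so it can be checked on its own.
sample_docs = [Doc(filename='a.txt', path='gov/documents/a.txt', text='hello')]
sample_actions = list(make_actions(sample_docs, 'gov'))
```

Inside `build_gov_index`, the call `helpers.bulk(es_conn, gendata())` could then be swapped for `parallel_index(es_conn, gendata())`; whether this is faster depends on the machine and the cluster.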

### Q2: Search and performance measure (0.5 marks)

For this second task, you will first need to read the topics/gov.topics file. 

As in the demo tutorial, each line of the file is formatted as `query_id query_terms`, where
query_id is a number, and query_terms consists of one or more keywords used as search terms. 
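
Splitting one such line into its two parts can be sketched as below. `parse_topic_line` is a hypothetical helper name, and the query ID `1` in the example is an assumption made for illustration.

```python
def parse_topic_line(line):
    """Split one `query_id query_terms` line into (query_id, query_terms)."""
    # maxsplit=1 keeps all the keywords together in the second part
    query_id, query_terms = line.strip().split(None, 1)
    return query_id, query_terms

print(parse_topic_line("1 mining gold silver coal\n"))
# prints ('1', 'mining gold silver coal')
```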

Your job is to read the query file and search using the provided search function. You will need to write the output of the search results to an output.txt file in the trec-eval standard format used in the demo tutorial. 

As a reminder, this means that each result of each query should be put in a line in the output.txt like this:

`01 Q0 email09 0 1.23 my_IR_system1`

`01 Q0 email09 1 1.11 my_IR_system1`

`02 Q0 email07 0 1.08 my_IR_system1`

where '01' is the query ID; ignore 'Q0'; 'emailxx' is the name of the file; '0' (or '1' or some other integer number) is the rank of this result; '1.23' (or '1.08' or some other number) is the score of this result; and 'my_IR_system1' is the name for your retrieval system. 

Note that you may write at most 10 documents for each query. If your output file contains more than 10 documents for any query, you will receive a score of 0 for this question.
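
Before submitting, the 10-document limit can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes that every result line has exactly the six whitespace-separated fields shown above.

```python
from collections import Counter

def count_results_per_query(lines):
    """Count how many result lines each query ID has in a trec-eval run."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 6:
            raise ValueError('expected 6 fields, got: {!r}'.format(line))
        counts[fields[0]] += 1
    return counts

def queries_over_limit(lines, limit=10):
    """Return the query IDs that have more than `limit` result lines."""
    counts = count_results_per_query(lines)
    return sorted(q for q, n in counts.items() if n > limit)

sample = [
    "01 Q0 email09 0 1.23 my_IR_system1",
    "01 Q0 email09 1 1.11 my_IR_system1",
    "02 Q0 email07 0 1.08 my_IR_system1",
]
print(queries_over_limit(sample))  # prints []
```

For a real run, an open file object can be passed as `lines`, since iterating over a file yields its lines.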

**Please rename your output_q2.txt file to `YourUID_output_q2.txt` eg `u1234567_output_q2.txt`, before submitting to Wattle - FAILURE TO DO SO WILL RESULT IN -0.5 POINTS**.

Below is some code to get you started and for you to complete:

In [6]:
def search(query_string, es_conn, index_name):
    '''
        searches for query_string with default search algorithm
        input:
            - query_string: a query
            - es_conn: elasticsearch connection
            - index_name: name of index
        output:
            - a list of hits; each hit is a dict with '_score' and '_source' (containing 'filename')

    '''
    res = es_conn.search(index = index_name,
        body = {
            "_source": [ "filename"],
            "query": {
                "query_string": {
                    "query": query_string,
                }
            }
        }
    )
    return res['hits']['hits']

# TODO: 
#       Read query file from `query_path`, search using `search_fn`, and 
#       Write top 10 outputs per query to `output_file`
#       Note that the function takes a search function as an argument. You can directly call the search function
#       as `result = search_fn(query_string, es_conn, index_name)` within the function.
#       This function will be used throughout this assignment
def read_search_write_output(search_fn, query_path, output_file):
    with open(query_path, 'r') as f:
        query_strings = f.readlines()
    
    with open(output_file, 'w') as output:
        for query_string in query_strings:
            # skip blank lines, which have no `query_id query_terms` pair
            if not query_string.strip():
                continue
            query_id, query_terms = query_string.split(' ', 1)
            query_terms = query_terms.rstrip()
            # map the logical symbols \/ and /\ to query_string operators
            query_terms = query_terms.replace("\\/", " OR ")
            query_terms = query_terms.replace("/\\", " AND ")

            matches = search_fn(query_terms, es_conn, INDEX_NAME)

            if matches is not None:
                matches = sorted(matches, key = lambda x: -x['_score'])
                # write at most 10 results per query, with ranks starting at 0
                for i, match in enumerate(matches[:10]):
                    output.write('{} Q0 {} {} {} my_IR_system1\n'.format(
                        query_id,
                        match['_source']['filename'], # filename
                        i,
                        match['_score'], # score
                    ))

In [7]:
query_path = 'gov/topics/gov.topics'
output_file = 'u6381103_output_q2.txt'
read_search_write_output(search, query_path, output_file)

Once you have written the results of your queries to an output file, you can run trec-eval on your output file and the provided gov.qrels file to evaluate your system. Trec-eval provides many different measures of quality, but for the purposes of this assignment you will use precision@10 (P_10 in the trec-eval output) to measure the performance of your systems.
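
To make the measure concrete, precision@10 can be sketched in plain Python. This is an illustration, not a replacement for trec-eval: the function names and the toy data are hypothetical, and the averaging here runs over every query in `qrels`, which may differ from how trec-eval treats queries that returned no results.

```python
def precision_at_k(ranked_docs, relevant_docs, k=10):
    """Fraction of the top-k positions that hold a relevant document."""
    hits = sum(1 for doc in ranked_docs[:k] if doc in relevant_docs)
    # divide by k even when fewer than k documents were returned
    return hits / k

def mean_precision_at_k(run, qrels, k=10):
    """Average precision@k over the queries in qrels.

    run:   dict mapping query_id -> ranked list of document names
    qrels: dict mapping query_id -> set of relevant document names
    """
    scores = [precision_at_k(run.get(q, []), rel, k) for q, rel in qrels.items()]
    return sum(scores) / len(scores)

# Toy data: query "1" retrieves two relevant documents, query "2" none.
run = {"1": ["d1", "d2", "d3"], "2": ["d9"]}
qrels = {"1": {"d1", "d3"}, "2": {"d4"}}
print(mean_precision_at_k(run, qrels))
```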

In [8]:
!./trec_eval/trec_eval ./gov/qrels/gov.qrels u6381103_output_q2.txt

runid                 	all	my_IR_system1
num_q                 	all	31
num_ret               	all	310
num_rel               	all	190
num_rel_ret           	all	22
map                   	all	0.0484
gm_map                	all	0.0004
Rprec                 	all	0.0874
bpref                 	all	0.1200
recip_rank            	all	0.1936
iprec_at_recall_0.00  	all	0.1969
iprec_at_recall_0.10  	all	0.1646
iprec_at_recall_0.20  	all	0.1184
iprec_at_recall_0.30  	all	0.0703
iprec_at_recall_0.40  	all	0.0419
iprec_at_recall_0.50  	all	0.0323
iprec_at_recall_0.60  	all	0.0000
iprec_at_recall_0.70  	all	0.0000
iprec_at_recall_0.80  	all	0.0000
iprec_at_recall_0.90  	all	0.0000
iprec_at_recall_1.00  	all	0.0000
P_5                   	all	0.0774
P_10                  	all	0.0710
P_15                  	all	0.0473
P_20                  	all	0.0355
P_30                  	all	0.0237
P_100                 	all	0.0071
P_200                 	all	0.0035
P_500                 	all	0.0014
P_1000               

### Q3: Improving the search algorithm: compare similarity algorithms (1 mark)

*Elasticsearch* also provides multiple configurable scoring algorithms. 

For this task, you will be asked to find a better similarity module to improve the search performance. Please refer to the documentation [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/similarity.html) for a better understanding of the configurable Elasticsearch similarity modules.
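
One way to compare several similarity modules is to derive one settings dictionary per candidate from a common base. The sketch below assumes this approach; `with_similarity` is a hypothetical helper, the trimmed `base` dictionary stands in for `basic_settings`, and the parameter values are the documented defaults rather than tuned choices.

```python
import copy

def with_similarity(base_settings, similarity):
    """Return a copy of base_settings whose text field uses `similarity`."""
    settings = copy.deepcopy(base_settings)  # never modify the caller's dict
    index_settings = settings.setdefault("settings", {}).setdefault("index", {})
    index_settings["similarity"] = {"my_similarity": similarity}
    settings["mappings"]["doc"]["properties"]["text"]["similarity"] = "my_similarity"
    return settings

# Trimmed stand-in for basic_settings, kept small for the example.
base = {
    "mappings": {"doc": {"properties": {
        "text": {"type": "text", "similarity": "boolean"}}}},
    "settings": {},
}

# Candidate similarity modules to try one after another.
candidates = {
    "bm25": {"type": "BM25", "k1": 1.2, "b": 0.75},
    "lm_dirichlet": {"type": "LMDirichlet", "mu": 2000},
    "lm_jelinek_mercer": {"type": "LMJelinekMercer", "lambda": 0.1},
}

all_settings = {name: with_similarity(base, sim) for name, sim in candidates.items()}
```

Each entry of `all_settings` could then be passed to `build_gov_index`, followed by a search run and a trec-eval call, to see which module gives the best P_10.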

Here's some code to get you started and for you to complete:

In [9]:
# TODO: define your own analyzer for indexing and searching
q3_settings = {
  "mappings": {
    "doc": {
      "properties": {
        "filename": {
          "type": "keyword",
          "index": False,
        },
        "path": {
          "type": "keyword",
          "index": False,
        },
        "text": {
            "type": "text",
            "similarity": "my_similarity",
            "analyzer": "my_analyzer",
            "search_analyzer": "my_analyzer"
        }
      }
    }
  },
  "settings": {
    "index":{
        "similarity" : {
            "my_similarity" : {
                "type" : "BM25",
                "k1" : "2.0",
                "b" : "0.75",
                "discount_overlaps" : "true"
            }
        }
    },
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "filter": [
            "stop"
          ],
          "char_filter": [
            "html_strip"
          ],
          "type": "custom",
          "tokenizer": "whitespace"
        }
      }
    }
  }
}


In [10]:
es_conn = Elasticsearch(ES_HOSTS, maxsize=25)
dataset = read_dataset(DOCS_PATH)
output_file = 'u6381103_output_q3.txt'

build_gov_index(es_conn, INDEX_NAME, DOCS_PATH, q3_settings)
read_search_write_output(search, query_path, output_file)

index `gov` deleted
index `gov` created
Indexed 33848 docs to index 'gov'


In [11]:
!./trec_eval/trec_eval ./gov/qrels/gov.qrels u6381103_output_q3.txt

runid                 	all	my_IR_system1
num_q                 	all	31
num_ret               	all	310
num_rel               	all	190
num_rel_ret           	all	50
map                   	all	0.1953
gm_map                	all	0.0099
Rprec                 	all	0.2226
bpref                 	all	0.2922
recip_rank            	all	0.4729
iprec_at_recall_0.00  	all	0.4831
iprec_at_recall_0.10  	all	0.4562
iprec_at_recall_0.20  	all	0.4000
iprec_at_recall_0.30  	all	0.3185
iprec_at_recall_0.40  	all	0.2373
iprec_at_recall_0.50  	all	0.1738
iprec_at_recall_0.60  	all	0.1137
iprec_at_recall_0.70  	all	0.0935
iprec_at_recall_0.80  	all	0.0438
iprec_at_recall_0.90  	all	0.0161
iprec_at_recall_1.00  	all	0.0161
P_5                   	all	0.2581
P_10                  	all	0.1613
P_15                  	all	0.1075
P_20                  	all	0.0806
P_30                  	all	0.0538
P_100                 	all	0.0161
P_200                 	all	0.0081
P_500                 	all	0.0032
P_1000               

In [12]:
es_conn.indices.get_settings()

{'gov': {'settings': {'index': {'number_of_shards': '5',
    'provided_name': 'gov',
    'similarity': {'my_similarity': {'discount_overlaps': 'true',
      'b': '0.75',
      'type': 'BM25',
      'k1': '2.0'}},
    'creation_date': '1566052602097',
    'analysis': {'analyzer': {'my_analyzer': {'filter': ['stop'],
       'char_filter': ['html_strip'],
       'type': 'custom',
       'tokenizer': 'whitespace'}}},
    'number_of_replicas': '1',
    'uuid': 'CQ3ktJHcQpuL5pRO_f9ITQ',
    'version': {'created': '6030099'}}}}}

Upload the final output to Wattle, but **please first rename output_file to YourUID_output_q3.txt eg u1234567_output_q3.txt - FAILURE TO DO SO WILL RESULT IN -0.5 POINTS**.

### Q4: Improving the indexer: compare different ways of indexing (1 mark)

For this part, you will be asked to change the configuration of the indexer (`basic_settings`) to improve the search performance.

Please look at the official Elasticsearch documentation [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis.html) for a better understanding of the configuration and other options.

Note that you can check how your tokeniser tokenises your input string via the `analyze_query` function provided in the demo code.

In [13]:
# TODO: configure settings to define your own analyzer for indexing
q4_settings = {
  "mappings": {
    "doc": {
      "properties": {
        "filename": {
          "type": "keyword",
          "index": False,
        },
        "path": {
          "type": "keyword",
          "index": False,
        },
        "text": {
            "type": "text",
            "similarity" : "my_similarity",
            "analyzer": "my_analyzer",
            "search_analyzer": "my_analyzer",            
        }
      }
    }
  },
  "settings": {
    "index":{
        "similarity" : {
            "my_similarity" : {
                "type" : "BM25",
                "k1" : "2.0",
                "b" : "0.75",
                "discount_overlaps" : "true"
            }
        }
    },
    "analysis": {
        "filter": {
            "english_stop": {
              "type":       "stop",
              "stopwords":  "_english_" 
            },
            "english_stemmer": {
              "type":       "stemmer",
              "language":   "english"
            },
            "english_possessive_stemmer": {
              "type":       "stemmer",
              "language":   "possessive_english"
            },
        },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
              "asciifolding",
              "english_possessive_stemmer",
              "lowercase",
              "english_stop",
              "english_stemmer",
              "porter_stem"
          ],
          "char_filter": [
              "html_strip"
          ]
        }
      }
    }
  }
}

In [14]:
es_conn = Elasticsearch(ES_HOSTS, maxsize=25)
dataset = read_dataset(DOCS_PATH)
output_file = 'u6381103_output_q4.txt'

build_gov_index(es_conn, INDEX_NAME, DOCS_PATH, q4_settings)
read_search_write_output(search, query_path, output_file)

index `gov` deleted
index `gov` created
Indexed 33848 docs to index 'gov'


In [15]:
!./trec_eval/trec_eval ./gov/qrels/gov.qrels u6381103_output_q4.txt

runid                 	all	my_IR_system1
num_q                 	all	31
num_ret               	all	310
num_rel               	all	190
num_rel_ret           	all	108
map                   	all	0.5052
gm_map                	all	0.0890
Rprec                 	all	0.5046
bpref                 	all	0.5963
recip_rank            	all	0.6530
iprec_at_recall_0.00  	all	0.6818
iprec_at_recall_0.10  	all	0.6778
iprec_at_recall_0.20  	all	0.6630
iprec_at_recall_0.30  	all	0.6401
iprec_at_recall_0.40  	all	0.6146
iprec_at_recall_0.50  	all	0.5895
iprec_at_recall_0.60  	all	0.5111
iprec_at_recall_0.70  	all	0.4616
iprec_at_recall_0.80  	all	0.3814
iprec_at_recall_0.90  	all	0.2427
iprec_at_recall_1.00  	all	0.2427
P_5                   	all	0.5097
P_10                  	all	0.3484
P_15                  	all	0.2323
P_20                  	all	0.1742
P_30                  	all	0.1161
P_100                 	all	0.0348
P_200                 	all	0.0174
P_500                 	all	0.0070
P_1000              

In [16]:
def analyze_query(text, es_conn, index_name):
    '''
        analyzes any text with my_analyzer defined in es_settings.json
        input:
            - text: a query text
            - es_conn: elasticsearch connection
            - index_name: name of index
        output:
            - a list of tokens
    '''

    tokens = es_conn.indices.analyze(
        index = index_name,
        body = {"text": text, "analyzer": "my_analyzer"})['tokens']

    return [token_row["token"].encode('utf-8') for token_row in tokens]

print(analyze_query(
    "Let's see. how the analyzer analyse analyze tokenize tokenise. this sentence.!\
     Universal university in the universe is universal universal univalent univalved univalves universes univocals univalve", es_conn, INDEX_NAME))

[b'let', b'see', b'how', b'analyz', b'anali', b'analyz', b'token', b'tokeni', b'sentenc', b'univ', b'univ', b'univ', b'univ', b'univ', b'univ', b'univalv', b'univalv', b'univ', b'univoc', b'univalv']


Upload the final output to Wattle, but **please first rename output_file to YourUID_output_q4.txt eg u1234567_output_q4.txt - FAILURE TO DO SO WILL RESULT IN -0.5 POINTS**.

### Q5: Tolerant retrieval: wildcard queries (1 mark)

*Elasticsearch* provides wildcard query search. You can use wildcard expressions consisting of '*' and '?' to search.  

For this task, you can reuse the previous index, i.e., *q4_settings*. Refer to the [link](https://www.elastic.co/guide/en/elasticsearch/reference/6.3/query-dsl-wildcard-query.html) to see how to search with wildcard queries. 

For each query term from 'gov.topics', replace the last two characters with a wildcard expression. For example, the first topic from 'gov.topics' is 'mining gold silver coal'. Instead of that query, you would search for `mini* go?? silv* co??`. 
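
The term rewriting described above can be sketched as a small helper. The names `to_wildcard` and `wildcard_query` are hypothetical, and leaving boolean operators and terms of one or two characters unchanged is a design choice of this sketch, not something the assignment requires.

```python
OPERATORS = {"AND", "OR", "NOT"}

def to_wildcard(term, single_char=False):
    """Replace the last two characters of a term with a wildcard expression."""
    if term in OPERATORS or len(term) <= 2:
        return term  # keep boolean operators and very short terms as they are
    suffix = "??" if single_char else "*"
    return term[:-2] + suffix

def wildcard_query(query_string, single_char=False):
    """Apply to_wildcard to every whitespace-separated term of a query."""
    return " ".join(to_wildcard(t, single_char) for t in query_string.split())

print(wildcard_query("mining gold silver coal"))        # mini* go* silv* co*
print(wildcard_query("mining gold silver coal", True))  # mini?? go?? silv?? co??
```

The `?` form matches exactly one character per `?`, while `*` matches any number of characters, so the two variants can return different result sets.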

In [17]:
def my_search(query_string, es_conn, index_name):
    # replace the last two characters of every term with a `*` wildcard
    wildcard_query = [term[:-2] + "*" for term in query_string.split(' ')]
    query_string = " ".join(wildcard_query)
    
    res = es_conn.search(index = index_name,
        body = {
            "_source": ["filename"],
            "query": {
                "query_string": {
                    "query": query_string
                }
            }
        }
    )
    return res['hits']['hits']

In [18]:
es_conn = Elasticsearch(ES_HOSTS, maxsize=25)
dataset = read_dataset(DOCS_PATH)
output_file = 'u6381103_output_q5.txt'
read_search_write_output(my_search, query_path, output_file)

In [19]:
!./trec_eval/trec_eval ./gov/qrels/gov.qrels u6381103_output_q5.txt

runid                 	all	my_IR_system1
num_q                 	all	30
num_ret               	all	300
num_rel               	all	186
num_rel_ret           	all	19
map                   	all	0.0357
gm_map                	all	0.0001
Rprec                 	all	0.0644
bpref                 	all	0.0892
recip_rank            	all	0.1056
iprec_at_recall_0.00  	all	0.1111
iprec_at_recall_0.10  	all	0.0963
iprec_at_recall_0.20  	all	0.0630
iprec_at_recall_0.30  	all	0.0607
iprec_at_recall_0.40  	all	0.0552
iprec_at_recall_0.50  	all	0.0167
iprec_at_recall_0.60  	all	0.0167
iprec_at_recall_0.70  	all	0.0167
iprec_at_recall_0.80  	all	0.0000
iprec_at_recall_0.90  	all	0.0000
iprec_at_recall_1.00  	all	0.0000
P_5                   	all	0.0667
P_10                  	all	0.0633
P_15                  	all	0.0422
P_20                  	all	0.0317
P_30                  	all	0.0211
P_100                 	all	0.0063
P_200                 	all	0.0032
P_500                 	all	0.0013
P_1000               

## Written component (Q6 - Q9), 6 marks

Answer the following questions based on your implementation of Questions 1-5:

### Q6 (1.5 marks): What changes did you make to the search similarity to improve the performance of the system? Why do you think it improved the performance?

(provide answers below using bullet points with 2~3 items)

* At first, our system used the basic settings, which use boolean similarity. It only checks whether the query terms match: a matching term contributes a constant score (its query boost, 1 by default) and a non-matching term contributes 0. It does not consider anything else, so documents that match the same terms cannot be told apart.

* We changed the similarity type to BM25, which is considered to be a state-of-the-art ranking function. It uses term frequency, inverse document frequency, and field-length normalisation, all of which greatly improve the performance of the system compared to boolean similarity. Term frequency counts the number of times a word appears in a document, because a document is usually more relevant if it contains many occurrences of that word. Inverse document frequency gives a higher value to uncommon words, because words that occur in many documents are usually less significant. Field-length normalisation gives a match in a shorter field more weight than the same match in a longer field.

* BM25 also differs from the classic TF/IDF algorithm in Elasticsearch in a few ways that make it perform better. It has nonlinear term-frequency saturation, which means a term that appears 10 times in a document has almost the same impact as a term that appears 1000 times. We changed k1 from its default of 1.2 to 2.0, which results in slower saturation. In addition, its field-length normalisation takes the average length of the field into account. 


Source: https://www.elastic.co/guide/en/elasticsearch/guide/2.x/pluggable-similarites.html
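
The saturation effect and the role of k1 can be illustrated numerically. The sketch below uses the classic textbook form of the BM25 term-frequency component; the exact scoring inside a given Elasticsearch version may differ in constant factors, and the document lengths are made-up values.

```python
def bm25_tf(tf, k1=1.2, b=0.75, doc_len=100.0, avg_len=100.0):
    """Term-frequency component of the classic BM25 formula."""
    # length normalisation: 1.0 when the document has average length
    norm = 1.0 - b + b * (doc_len / avg_len)
    return tf * (k1 + 1.0) / (tf + k1 * norm)

for k1 in (1.2, 2.0):
    ceiling = k1 + 1.0  # value approached as tf grows without bound
    for tf in (1, 10, 1000):
        share = bm25_tf(tf, k1=k1) / ceiling
        print("k1={} tf={:>4} -> {:.0%} of the ceiling".format(k1, tf, share))
```

With the larger k1, a given term frequency reaches a smaller share of the ceiling, which is what slower saturation means here. Passing a `doc_len` below `avg_len` raises the value, which reflects the field-length normalisation.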

### Q7 (1.5 marks): What changes did you make to the indexer to improve the performance of the system? Why do you think it improved the performance?

(provide answer below using bullet points with 2~3 items (Check [this](https://sourceforge.net/p/jupiter/wiki/markdown_syntax/#md_ex_lists) if you are not familiar with markdown syntax))

* The tokeniser is changed from whitespace to the standard one. The original tokeniser simply splits text whenever there is a whitespace character, so punctuation stays attached to words. The standard tokeniser improves performance by splitting based on grammar, using the word boundaries defined by the Unicode Text Segmentation algorithm.

* Then, we added a number of token filters that modify the tokens to improve the performance. "asciifolding" tries to convert characters that are not in the first 127 ASCII characters into their ASCII equivalents, for example accented letters such as "é", which becomes "e". We also convert all text into lower case.

* In addition, we remove stop words and apply a few different stemmers. Stop words are the most common words, such as "the" and "is", which are of very little importance. Stemming reduces words to their "stem" form, so words of different forms that have similar meanings are reduced to the same "stem" form. This improves the performance because we want to search for documents that contain related terms as well, not just the exact terms.

Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis.html
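
The effect of these changes can be imitated with a toy analysis chain in plain Python. This is a rough approximation for illustration only and not what Elasticsearch runs: the regular expression, the tiny stop-word list and the function names are assumptions of this sketch, and stemming is left out.

```python
import re
import unicodedata

# A deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "in", "to"}

def fold_ascii(token):
    """Drop accents by decomposing characters and removing combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def toy_analyze(text):
    """Tokenise on word characters, then fold, strip 's, lowercase, drop stop words."""
    tokens = re.findall(r"\w+(?:'\w+)?", text)
    result = []
    for token in tokens:
        token = fold_ascii(token)
        token = re.sub(r"'s$", "", token)  # possessive "stemming"
        token = token.lower()
        if token not in STOP_WORDS:
            result.append(token)
    return result

text = "The Café's menu, is HERE."
print(text.split())       # whitespace tokeniser keeps case and punctuation
print(toy_analyze(text))  # prints ['cafe', 'menu', 'here']
```

A query for `cafe` would miss the whitespace token `Café's` but match the analysed token `cafe`, which is the kind of mismatch the new analyzer removes.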

### Q8 (1.5 marks): Apart from Precision@10, what other metrics can be used to measure the performance of the developed IR system for the government document collection? Provide two metrics and explain why they would be suited for this particular government IR system.

(provide answers below using bullet points with 2~3 items)

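* Mean average precision (map): Precision is the fraction of retrieved documents that are relevant. Average precision averages the precision measured at the rank of each relevant document for one query, and mean average precision is the mean of that value over all queries. This gives us a measure of how accurate the whole ranking is, which suits this collection because a topic can have several relevant documents.

* Reciprocal rank (recip_rank): The reciprocal rank of a query response is the multiplicative inverse of the rank of the first relevant document. Since our output is sorted by score, the documents that are supposed to be the most relevant are at the top. Thus, a ranking whose top positions are taken by documents that are not actually relevant gets a higher "penalty".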
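
Both metrics can be computed for a single query with a few lines of Python. This is a sketch with hypothetical function names and toy document names; trec-eval reports the mean of these values over all queries.

```python
def average_precision(ranked_docs, relevant_docs):
    """Mean of the precision values at the rank of each relevant document."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant_docs:
            hits += 1
            precision_sum += hits / rank
    # relevant documents that were never retrieved contribute zero
    return precision_sum / len(relevant_docs) if relevant_docs else 0.0

def reciprocal_rank(ranked_docs, relevant_docs):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant_docs:
            return 1.0 / rank
    return 0.0

ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(average_precision(ranked, relevant))  # prints 0.5
print(reciprocal_rank(ranked, relevant))    # prints 0.5
```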

### Q9 (1.5 marks): How do wildcard queries affect performance of the retrieval in terms of measures you answered for Q8? Also provide some situations when wildcard queries are useful. 

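In our runs, wildcard queries lowered both measures from Q8: mean average precision fell from 0.5052 (Q4) to 0.0357, and reciprocal rank fell from 0.6530 to 0.1056. A likely reason is that wildcard terms are not analysed by default, so the truncated query terms are compared as raw prefixes against the stemmed index terms, and a trailing `*` also matches many unrelated terms.

We can use wildcard queries when we are uncertain of the spelling of a query term. For example, we may not be certain how to spell unfamiliar words from another language. In addition, wildcard queries are useful when the term has multiple variants, for example "analyse", "analyze" and "analysis", or words that end with -ise/-ize. Wildcard queries are also useful on words that are stemmed. We could just search for the stemmed form of a word and add a wildcard at the end to get the result we want, which will work regardless of whether stemming is actually performed.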

**Academic Misconduct Policy**: All submitted written work and code must be your own (except for any provided starter code, of course). Submitting work other than your own will lead to both a failure on the assignment and a referral of the case to the ANU academic misconduct review procedures: ANU Academic Misconduct Procedures