Commit

add description
Hamed Zamani committed Dec 14, 2019
1 parent 91dace6 commit 85ccf06
Showing 6 changed files with 39 additions and 8 deletions.
32 changes: 28 additions & 4 deletions README.md
@@ -10,7 +10,33 @@ an interactive mode, where the back end can be *fully algorithmic* or a *wizard
Macaw could be of interest to the researchers and practitioners working on information retrieval, natural language
processing, and dialogue systems.

-## Features
+## Macaw Architecture
+Macaw has a modular architecture, which allows further development and extension. The high-level architecture of Macaw
+is presented below:
+
+![The high-level architecture of Macaw](macaw-arch.jpg)
+
+For more information on each module in Macaw, refer to this paper.
+
+#### Interfaces
+Macaw supports the following interfaces:
++ Standard IO: For *development* purposes
++ File IO: For *batch experiments*
++ Telegram bot: For interaction with real users
+
+Here is an example of the Telegram interface for Macaw. It supports multi-modal interactions (text, speech, click, etc).
+![Telegram interface for Macaw](macaw-example-tax.jpg)
+
+
+#### Retrieval
+Macaw features the following search engines:
++ [Indri](http://lemurproject.org/indri.php): an open-source search engine that can be used for any arbitrary text
+collection.
++ Bing web search API: sending a request to the Bing API and getting the results.
+
+#### Answer Selection / Generation
+For question answering, Macaw only features [the DrQA model](https://github.com/facebookresearch/DrQA) in its current
+version.


## Installation
@@ -96,7 +122,7 @@ need a speech support from Macaw, you can skip this step. To install FFmpeg, run
sudo apt-get install
```

-#### Step 5: Installing Macaw
+#### Step 6: Installing Macaw
After cloning Macaw, use the following commands for installation:
```
cd macaw
@@ -117,8 +143,6 @@ When the MongoDB server runs, open another terminal and run one of the macaw app
python3 live_main.py
```

-## Development and Extension in Macaw
-

## Bug Report and Feature Request
For bug report and feature request, you can open an issue in github, or send an email to
Binary file added macaw-arch.jpg
Binary file added macaw-example-tax.jpg
6 changes: 4 additions & 2 deletions macaw/core/retrieval/bing_api.py
@@ -49,10 +49,10 @@ def retrieve(self, query):
for i in range(min(len(search_results['webPages']['value']), self.results_requested)):
id = search_results['webPages']['value'][i]['url']
title = search_results['webPages']['value'][i]['name']
-text = search_results['webPages']['value'][i]['snippet']
+snippet = search_results['webPages']['value'][i]['snippet']
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0'}
text = html_to_clean_text(requests.get(id, headers=headers).content)
-score = 10 - i # this is not a score returned by Bing
+score = 10 - i # this is not a score returned by Bing (just 10 - document rank)
results.append(Document(id, title, text, score))
return results

@@ -67,5 +67,7 @@ def get_doc_from_index(self, doc_id):
A Document from the collection whose ID is equal to the given doc_id. For some reasons, the method returns
a list of Documents with a length of 1.
"""
+# Telegram has a nice interface for loading websites. Therefore, we decided to only pass the doc_id (URL). This
+# can be simply enhanced by the title and the content of the document.
doc = Document(doc_id, doc_id, doc_id, -1)
return [doc]
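
The `retrieve` hunk above downloads each result URL for its text and assigns a rank-based pseudo-score (`10 - i`). The following is a minimal sketch of that pattern, not Macaw's implementation: the `Document` dataclass, the `build_documents` helper, and the injectable `fetch_text` callable are hypothetical names introduced for illustration. The fallback to the Bing snippet when a page cannot be fetched is an assumption as well; the commit stores `snippet` but does not use it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for Macaw's Document (id, title, text, score)."""
    id: str
    title: str
    text: str
    score: float


def build_documents(search_results: Dict, results_requested: int,
                    fetch_text: Callable[[str], str]) -> List[Document]:
    """Turn a Bing-style response into Documents with rank-based scores."""
    pages = search_results.get('webPages', {}).get('value', [])
    documents = []
    for rank, page in enumerate(pages[:results_requested]):
        try:
            # fetch_text is assumed to download the page and strip its HTML.
            text = fetch_text(page['url'])
        except Exception:
            # Assumption: fall back to the snippet if the page is unreachable.
            text = page['snippet']
        # Not a score returned by Bing: just 10 minus the zero-based rank.
        documents.append(Document(page['url'], page['name'], text, 10 - rank))
    return documents
```

Passing the fetcher as an argument keeps the ranking logic testable without network access; in a live setting it could wrap `requests.get` followed by an HTML cleaner.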
4 changes: 2 additions & 2 deletions macaw/live_main.py
@@ -78,8 +78,8 @@ def run(self):
# These are parameters used by the retrieval model.
retrieval_params = {'query_generation': 'simple', # the model that generates a query from a conversation history.
'use_coref': True, # True, if query generator can use coreference resolution, otherwise False.
-'search_engine': 'indri', # the search engine. It can be either 'indri' or 'bing'.
-'bing_key': '7a9b8a186d414184abecb3ac6ef7d296', # Bing API key
+'search_engine': 'bing', # the search engine. It can be either 'indri' or 'bing'.
+'bing_key': '008c3b49dd9d401eb42fe3d6c5e3527a', # Bing API key
'search_engine_path': '/mnt/e/indri-5.11/', # The path to the indri toolkit.
'col_index': '/mnt/e/indri-index/robust_indri/indri_index', # The path to the indri index.
'col_text_format': 'trectext', # collection text format. Standard 'trectext' is only supported.
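
The `bing_key` value in this hunk is hard-coded in the parameter dictionary. One possible alternative, sketched here under stated assumptions, reads the key from the environment instead. The variable name `BING_API_KEY` and the helper `load_retrieval_params` are hypothetical and do not exist in Macaw; only the dictionary keys are taken from the diff.

```python
import os


def load_retrieval_params(env=None):
    """Build retrieval parameters, reading the Bing key from the environment."""
    # Allow a plain dict to be injected for testing; default to os.environ.
    env = os.environ if env is None else env
    params = {
        'query_generation': 'simple',
        'use_coref': True,
        'search_engine': 'bing',  # either 'indri' or 'bing'
        # Hypothetical variable name; None when it is not set.
        'bing_key': env.get('BING_API_KEY'),
    }
    if params['search_engine'] == 'bing' and not params['bing_key']:
        raise ValueError('BING_API_KEY must be set when search_engine is "bing"')
    return params
```

Failing early on a missing key surfaces the configuration problem at start-up rather than on the first search request.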
5 changes: 5 additions & 0 deletions macaw/util/__init__.py
@@ -31,6 +31,10 @@ def __init__(self, params):
self.params = params
self.corenlp = StanfordCoreNLP(self.params['corenlp_path'], quiet=False)

+# Pre-fetching the required models.
+props = {'annotators': 'coref', 'pipelineLanguage': 'en', 'ner.useSUTime': False}
+self.corenlp.annotate('', properties=props)
+
def get_coref(self, text):
"""
Run co-reference resolution on the input text.
@@ -42,4 +46,5 @@ def get_coref(self, text):
"""
props = {'annotators': 'coref', 'pipelineLanguage': 'en', 'ner.useSUTime': False}
result = json.loads(self.corenlp.annotate(text, properties=props))
+
return result
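
The constructor change above sends an empty string through the coreference annotator, presumably so that the models load at start-up rather than on the first user query. The warm-up pattern can be sketched with a stub client; `LazyClient` and `CorefResolver` are hypothetical stand-ins written for this example and are not the `stanfordcorenlp` API.

```python
import json


class LazyClient:
    """Stub annotator that loads its models on first use."""

    def __init__(self):
        self.models_loaded = False

    def annotate(self, text, properties=None):
        if not self.models_loaded:
            # In a real annotation server this is the slow, one-time step.
            self.models_loaded = True
        return json.dumps({'corefs': {}, 'text': text})


class CorefResolver:
    PROPS = {'annotators': 'coref', 'pipelineLanguage': 'en'}

    def __init__(self, client):
        self.client = client
        # Warm-up call: annotate an empty string so the models are loaded now.
        self.client.annotate('', properties=self.PROPS)

    def get_coref(self, text):
        """Run the (stub) coreference annotator and parse its JSON output."""
        return json.loads(self.client.annotate(text, properties=self.PROPS))
```

With this arrangement the loading cost is paid once in the constructor, so the first call to `get_coref` behaves like every later call.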
