Fair search DELTR for ElasticSearch
This library requires:
- ElasticSearch with the Learning to Rank (LTR) plugin
- Python (dependencies are listed in the repository's requirements file)

To get set up:
- Start a supported version of ElasticSearch and follow the LTR plugin installation steps
There are several steps you need to take. In the following, we describe how to use the adapter to search a collection of W3C e-mails, included in /data/, which is one of the examples used in the paper (see the bibliography below).
Index the training corpus
We have a sample data set in /data/candidates. Make sure to unzip the files first. Then, you can index them with:
python deltr.py --index --document-dir ./data/candidates --index-name resumes
This will (re)index the JSON files under the folder /data/candidates into an index named resumes.
Later, at any point, you can add the real documents over which you want to search using the trained ranking model. Those documents do not need to be in the same index; most commonly they will live in a different one.
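Under the hood, indexing a folder of JSON documents boils down to an ElasticSearch bulk request. A minimal sketch of building such a payload with the standard library (the document ids and fields below are made up for illustration; they are not the repository's actual candidate documents):

```python
import json

def build_bulk_payload(docs, index_name):
    """Build an ElasticSearch _bulk request body (NDJSON) from a list of
    (doc_id, document) pairs. Each document is preceded by an action line."""
    lines = []
    for doc_id, doc in docs:
        # Action line: tells ES which index and id the next document goes to.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API requires a trailing newline.
    return "\n".join(lines) + "\n"

# Hypothetical documents, just to show the shape of the payload.
docs = [
    ("1", {"body": "Experienced HTML and CSS developer"}),
    ("2", {"body": "Backend engineer, Python and Java"}),
]
payload = build_bulk_payload(docs, "resumes")
print(payload)
```

A client would POST this body to the `_bulk` endpoint; the adapter's `--index` mode handles this for you.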
Set up the features
Create the features you want to use in LTR. We have created sample features in /data/features.json.
Next, we need to upload these features to ElasticSearch.
python deltr.py --prepare --feature-set-file ./data/features.json --feature-set-name w3c
This will upload the features defined in /data/features.json to ElasticSearch under the name w3c.
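For reference, an LTR feature set is a list of parameterized ElasticSearch query templates; each feature's score becomes one input to the ranking model. A sketch of what a definition like /data/features.json might contain, built as a Python dict (the feature names and fields here are assumptions, not the repository's actual features):

```python
import json

# Hypothetical LTR feature set: each feature is a templated ES query.
# "keywords" is a parameter filled in at training and query time.
feature_set = {
    "featureset": {
        "name": "w3c",
        "features": [
            {
                "name": "title_match",   # hypothetical feature name
                "params": ["keywords"],
                "template_language": "mustache",
                "template": {"match": {"title": "{{keywords}}"}},
            },
            {
                "name": "body_match",    # hypothetical feature name
                "params": ["keywords"],
                "template_language": "mustache",
                "template": {"match": {"body": "{{keywords}}"}},
            },
        ],
    }
}

# The adapter uploads a document of this shape to the LTR plugin's
# featureset endpoint (e.g. _ltr/_featureset/w3c).
print(json.dumps(feature_set, indent=2))
```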
Train the model
Now that we have defined and uploaded the features and indexed the data, we can create a model to use for retrieval.
In order to build a DELTR model, we need to provide it with some training data. We have created a sample training set contained in two files: /data/queries.csv and /data/judgements.csv. You can train a model with:
python deltr.py --train --queries ./data/queries.csv --judgements ./data/judgements.csv --model deltr_vanilla --feature-set-name w3c
This is going to train a DELTR model (with default parameters) named deltr_vanilla, using the queries in /data/queries.csv and the judgements in /data/judgements.csv, with the features defined in the feature set named w3c.
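Conceptually, the two files relate per-document relevance judgements to queries via a query id. A stdlib-only sketch of that relationship (the column layout below is hypothetical, for illustration only; the real columns in /data/queries.csv and /data/judgements.csv may differ):

```python
import csv
import io

# Hypothetical miniature versions of the two training files.
queries_csv = """q_id,query
1,html
2,css
"""
judgements_csv = """q_id,doc_id,judgement
1,doc_a,1
1,doc_b,0
2,doc_a,0
"""

# Map query ids to query text.
queries = {row["q_id"]: row["query"] for row in csv.DictReader(io.StringIO(queries_csv))}
judgements = list(csv.DictReader(io.StringIO(judgements_csv)))

# Join the graded judgements with the query text they belong to.
training_pairs = [
    (queries[j["q_id"]], j["doc_id"], int(j["judgement"])) for j in judgements
]
print(training_pairs)
```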
Debugging the model by observing the feature values
The library uses the features we defined in LTR to train the model. For debugging purposes, it writes a features.csv file in the folder where the command is executed. There you can see which feature values were generated for each document.
It also creates a model.txt file where you can see the final model that was uploaded to LTR.
Note: You can also specify tuning parameters on the command line, e.g.:
python deltr.py --train --queries ./data/queries.csv --judgements ./data/judgements.csv --model deltr_not_vanilla --feature-set-name w3c --gamma 0.8
This will create a new model from the same files, except that the gamma parameter is set to 0.8. Run python deltr.py --help to see all available options.
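The gamma parameter weights DELTR's disparate-exposure term against the ordinary ranking loss: with gamma at 0 you get a plain listwise model, while larger values push the exposure of the protected group toward that of the non-protected group. A small sketch of the exposure idea using the usual logarithmic position discount (the ranks and group split below are made up):

```python
import math

def exposure(positions):
    """Average position-based exposure: a document at 1-based rank i
    receives the standard logarithmic discount 1 / log2(1 + i)."""
    return sum(1 / math.log2(1 + i) for i in positions) / len(positions)

# Hypothetical ranking of 6 documents, split by group membership.
non_protected_ranks = [1, 2, 3]   # non-protected group at the top
protected_ranks = [4, 5, 6]       # protected group stuck at the bottom

gap = exposure(non_protected_ranks) - exposure(protected_ranks)
# DELTR's unfairness term penalizes this kind of exposure gap, scaled by
# gamma, so a larger gamma trades some ranking loss for a smaller gap.
print(f"exposure gap: {gap:.3f}")
```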
Search with the model
Once we have the model, we can start using it to run searches.
python3 deltr.py --search --query html --model deltr_vanilla --index-name resumes
This will run a query with the keyword html using the model deltr_vanilla on the index resumes.
Note: You can also request verbose output, which will contain the features calculated for each returned document:
python3 deltr.py --search --query html --model deltr_vanilla --index-name resumes --verbose
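At the ElasticSearch level, searching with an uploaded LTR model is typically done with the plugin's sltr query inside a rescore block. A sketch of the kind of request body involved (the base match field and window size are assumptions, not necessarily what the adapter sends):

```python
import json

def build_search_body(keywords, model_name, rescore_window=100):
    """Build an ES search body that retrieves candidates with a plain
    match query, then rescores the top hits with the named LTR model."""
    return {
        "query": {"match": {"body": keywords}},  # hypothetical base field
        "rescore": {
            "window_size": rescore_window,
            "query": {
                "rescore_query": {
                    "sltr": {
                        # Parameters are substituted into the feature
                        # templates of the model's feature set.
                        "params": {"keywords": keywords},
                        "model": model_name,
                    }
                }
            },
        },
    }

body = build_search_body("html", "deltr_vanilla")
print(json.dumps(body, indent=2))
```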
Run the following command to get the full list of options:
python deltr.py --help
- Clone this repository
git clone https://github.com/fair-search/fairsearch-deltr-for-elasticsearch
- Change into the directory where you cloned the repository
- Use any IDE to work with the code
The DELTR algorithm is described in this paper:
- Meike Zehlike, Gina-Theresa Diehn, Carlos Castillo. "Reducing Disparate Exposure in Ranking: A Learning to Rank Approach." arXiv preprint arXiv:1805.08716 (2018).
For any questions contact Meike Zehlike.