The task is to rank replies by relevance to the dialogue context. A context usually consists of 3 utterances (replicas) and comes with several candidate replies. Each reply is labeled with relevance and confidence scores; I use their product as the target variable.
A pretrained fastText model represents each example as a concatenation of its utterance embeddings, which is then matched against the target. I run KFold cross-validation with 10 folds, train a separate simple LightGBM regressor on each fold, and average the predictions of all models to make the result more "stable".
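A minimal sketch of that scheme is below; the file names and hyperparameters are illustrative, not the exact contents of fasttext_lgbm.py:

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import KFold

X = np.load("X.npy")            # concatenated utterance embeddings, one row per example
y = np.load("y.npy")            # target: relevance * confidence
X_test = np.load("X_test.npy")  # embeddings of the examples to rank

kf = KFold(n_splits=10, shuffle=True, random_state=42)
test_preds = []

for train_idx, val_idx in kf.split(X):
    # A separate simple regressor per fold, as described above.
    model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
    model.fit(X[train_idx], y[train_idx],
              eval_set=[(X[val_idx], y[val_idx])])
    test_preds.append(model.predict(X_test))

# Averaging the 10 fold models stabilizes the final scores used for ranking.
final_pred = np.mean(test_preds, axis=0)
```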
- Install the dependencies from requirements.txt
pip3 install -r requirements.txt
- Change the paths in config.py (a hypothetical example follows this step) and then run
python3 prep.py
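config.py holds the dataset paths; a purely hypothetical example, with variable names invented for illustration:

```python
# Hypothetical config.py; the real variable names may differ.
TRAIN_PATH = "/path/to/train.tsv"   # raw training data
TEST_PATH = "/path/to/test.tsv"     # raw test data
OUT_DIR = "/path/to/processed"      # where prep.py writes its output
```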
- Download fastText and build it using make.
- Download a fastText model trained on Wikipedia and Common Crawl (example commands for both download steps below).
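For reference, these two steps typically look like the following; cc.en.300.bin is the official English model trained on Common Crawl and Wikipedia, so substitute the language you need:

git clone https://github.com/facebookresearch/fastText.git
cd fastText && make
wget https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz
gunzip cc.en.300.bin.gz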
- Fill in get_vectors.txt with the needed paths and copy it to the fastText folder.
- Run
bash get_vectors.txt
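Since get_vectors.txt is executed with bash, it presumably wraps the standard fastText CLI; a hypothetical one-liner, with illustrative file names:

./fasttext print-sentence-vectors cc.en.300.bin < texts.txt > vectors.txt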
- Make a NumPy array from the processed dataset
python3 prep_fasttext_data.py
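As an illustration of this step (prep_fasttext_data.py itself may differ, and the file names here are hypothetical), the fastText text output can be packed into a NumPy array like this:

```python
import numpy as np

# Each line of vectors.txt holds one space-separated sentence vector
# printed by fastText; file names are illustrative.
vectors = np.loadtxt("vectors.txt", dtype=np.float32)
np.save("X.npy", vectors)
```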
- Train the model
python3 fasttext_lgbm.py