Skip to content

Latest commit

 

History

History
80 lines (56 loc) · 2.84 KB

README.md

File metadata and controls

80 lines (56 loc) · 2.84 KB

example workflow

research-keywords

Project for working with text descriptions of tests in Markdown format. It contains tools for various ways of comparing texts and searching for keywords. They combine classic NLP approaches with the power of transformers.

What is a keyword in this context? Generally, "keyword" is an approach in testing when some logical block is designated by a key phrase and need not be deciphered in the text of the test.

For example: we use the keyword "register" and mean by this a set of actions that need to be performed in the application in order to register and gain access. We record this set of actions in the keyword description, and in the test we simply use "register".

This project code is fully written in Python 3.9 and convenient to use as a console utility. Usage examples can be easily derived from tests, feel free to look through them. Synthetic examples of real test cases with respect to original design (can be found here) are also friends of yours.

Features

  • Text comparison

There are various methods of test case comparison available. Once text is preprocessed and prepared with either ngrams, random sentence split or RAKE algorithm, it then can be vectorized with Tfidf or BERT.

  • Keyword detection

Both format-specific keyword detection methods and generalized search methods are available.

Requirements

See here.

Best way to get all required packages at once is running the following line:

pip install -r requirements.txt

After installation of nltk you might also need to execute the following in python:

import nltk

nltk.download('punkt')
nltk.download('stopwords')

Installation

  • Clone this repo
  • Make sure you've satisfied the requirements
  • For text comparison run the line like:
python main.py path_to_cases_folder path/new_case.md [silent | print | log]

The last argument is optional and set silent by default.

References

  • "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation" by N. Reimers, I. Gurevych (source)
  • "Automatic Keyword Extraction from Individual Documents" by S. Rose, D. Engel, N. Cramer W. Cowley (source)
  • "Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK" by Vishwas B. Sharma (source)

License

MIT License