diff --git a/patchwork/steps/FilterBySimilarity/README.md b/patchwork/steps/FilterBySimilarity/README.md new file mode 100644 index 000000000..308d2ba67 --- /dev/null +++ b/patchwork/steps/FilterBySimilarity/README.md @@ -0,0 +1,20 @@ +# Code Documentation + +## Inputs +### File: patchwork/steps/FilterBySimilarity/typed.py +- Two typed dictionary classes are defined: `FilterBySimilarityRequiredInputs` and `FilterBySimilarityInputs`. +- `FilterBySimilarityRequiredInputs` contains keys `list` and `keywords`. +- `FilterBySimilarityInputs` extends `FilterBySimilarityRequiredInputs` with additional keys `keys` and `top_k`. + +## Outputs +### File: patchwork/steps/FilterBySimilarity/FilterBySimilarity.py +- A class `FilterBySimilarity` is defined as a subclass of `Step` with specific input and output classes. +- The class takes a list of items, keywords, optional keys, and top-k parameter as inputs. +- The `run` method uses TF-IDF vectorization and cosine similarity to filter items based on similarity to keywords. +- The output is a dictionary with a list of items sorted by their average similarity score to the keywords, limited by `top_k`. + +## Description +- The code consists of a step in a system designed to filter items by their similarity to given keywords. +- It utilizes TF-IDF vectorization and cosine similarity for this task. +- The main logic is encapsulated in the `FilterBySimilarity` class. +- Input parameters such as list of items, keywords, keys, and top-k are crucial for determining the filtering process. \ No newline at end of file