Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions patchwork/steps/FilterBySimilarity/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# Code Documentation

## Inputs
### File: patchwork/steps/FilterBySimilarity/typed.py
- Two typed dictionary classes are defined: `FilterBySimilarityRequiredInputs` and `FilterBySimilarityInputs`.
- `FilterBySimilarityRequiredInputs` contains keys `list` and `keywords`.
- `FilterBySimilarityInputs` extends `FilterBySimilarityRequiredInputs` with additional keys `keys` and `top_k`.

## Outputs
### File: patchwork/steps/FilterBySimilarity/FilterBySimilarity.py
- A class `FilterBySimilarity` is defined as a subclass of `Step` with specific input and output classes.
- The class takes a list of items, keywords, optional keys, and top-k parameter as inputs.
- The `run` method uses TF-IDF vectorization and cosine similarity to filter items based on similarity to keywords.
- The output is a dictionary with a list of items sorted by their average similarity score to the keywords, limited by `top_k`.

## Description
- The code consists of a step in a system designed to filter items by their similarity to given keywords.
- It utilizes TF-IDF vectorization and cosine similarity for this task.
- The main logic is encapsulated in the `FilterBySimilarity` class.
- Input parameters such as list of items, keywords, keys, and top-k are crucial for determining the filtering process.