Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 24 additions & 0 deletions patchwork/steps/FilterBySimilarity/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
# Documentation for Patchwork FilterBySimilarity Module

## Inputs
- `list`: A list of dictionaries.
- `keywords`: An annotated string indicating keywords for similarity comparison.
- `keys`: An annotated string for configuration.
- `top_k`: An annotated integer for configuration.

## Outputs
- `result_list`: A list of dictionaries sorted by similarity scores.

### `typed.py`
- Defines input and output data types for FilterBySimilarity step in Patchwork.
- Contains TypedDict subclasses for input and output data structures.

### `FilterBySimilarity.py`
- Implements the FilterBySimilarity step in Patchwork.
- Utilizes TF-IDF vectorizer and cosine similarity for similarity calculation.
- Skips the step if the input list is empty.
- Parses keys for text extraction if provided.
- Sorts and returns items from the input list based on similarity scores.

### `__init__.py`
- Placeholder file with no code.