Replaced default argument min_df from 20 to 0 on TFIDF (#213)
* Remove default min_df=20 argument from TFIDF

In sklearn, min_df defaults to 1, so it should default to 1 in fklearn as well. This argument is not documented in the docstring, and it was hurting the performance of my sentiment classifier: I lost 30 points of recall because of it and spent hours figuring out that it was the problem. Could we change this argument to be consistent with sklearn?
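To see why a hard-coded min_df=20 can hurt a small or sparse corpus, here is a minimal pure-Python sketch of the document-frequency filter that min_df applies, covering integer thresholds only. filter_vocabulary is a hypothetical helper. It splits on whitespace, whereas sklearn's TfidfVectorizer uses its own token pattern and also accepts float proportions for min_df.

```python
from collections import Counter


def document_frequency(docs):
    """Count in how many documents each token appears (simplified whitespace tokenization)."""
    df = Counter()
    for doc in docs:
        # set() so a token repeated inside one document is only counted once
        df.update(set(doc.lower().split()))
    return df


def filter_vocabulary(docs, min_df=1):
    """Keep tokens whose document frequency is at least min_df (integer semantics only)."""
    df = document_frequency(docs)
    return sorted(term for term, count in df.items() if count >= min_df)


docs = [
    "great product love it",
    "terrible product hate it",
    "love the fast shipping",
    "hate the slow shipping",
]

print(filter_vocabulary(docs, min_df=1))
# ['fast', 'great', 'hate', 'it', 'love', 'product', 'shipping', 'slow', 'terrible', 'the']
print(filter_vocabulary(docs, min_df=2))
# ['hate', 'it', 'love', 'product', 'shipping', 'the']
print(filter_vocabulary(docs, min_df=20))
# []
```

With min_df=2, strongly polarized words such as "great" and "terrible" are dropped because each appears in only one document, and with min_df=20 nothing survives at all. This is the kind of silent feature loss the commit message describes.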

* Update src/fklearn/training/classification.py

Co-authored-by: Hellen Lima <hellen.lima@nubank.com.br>

raphaeldayan-nubank and hellenlima committed Oct 21, 2022
1 parent df1f3c1 commit 8a40e86
Showing 1 changed file with 1 addition and 1 deletion: src/fklearn/training/classification.py
```diff
@@ -448,7 +448,7 @@ def nlp_logistic_classification_learner(df: pd.DataFrame,
     """
 
     # set default params
-    default_vect_params = {"strip_accents": "unicode", "min_df": 20}
+    default_vect_params = {"strip_accents": "unicode", "min_df": 1}
     merged_vect_params = default_vect_params if not vectorizer_params else merge(default_vect_params, vectorizer_params)
 
     default_clf_params = {"C": 0.1, "multi_class": "ovr", "solver": "liblinear"}
```
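User-supplied vectorizer_params are merged over the defaults, so anyone who relied on the old threshold can still request it explicitly. The following is a minimal sketch of that merge step. It assumes merge behaves like toolz.merge, where later dictionaries win on key conflicts. merge_params is a hypothetical helper and is not part of fklearn.

```python
def merge_params(defaults, overrides=None):
    """Return defaults updated with overrides; keys in overrides win (like toolz.merge)."""
    # Mirrors the diff: use the defaults as-is when no overrides are given
    return dict(defaults) if not overrides else {**defaults, **overrides}


default_vect_params = {"strip_accents": "unicode", "min_df": 1}

print(merge_params(default_vect_params))
# {'strip_accents': 'unicode', 'min_df': 1}
print(merge_params(default_vect_params, {"min_df": 20}))
# {'strip_accents': 'unicode', 'min_df': 20}
```

Building a new dictionary instead of updating default_vect_params in place keeps the module-level defaults unchanged between calls.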
