[LabelModels] The `LabelModel.predict` should optionally return a numpy array #1247

dcfidalgo · 2022-03-10T16:23:19Z

Right now, the `LabelModel.predict` method returns a list of records that include the predictions. This is fine if you want to log these records directly to Rubrix for inspection, but it is inconvenient if you want to keep using the predictions to train a downstream model.
Example from our tutorial:

label_model = Snorkel(weak_labels)

# get records with the predictions from the label model
records = label_model.predict()

# build a simple dataframe with text and the prediction with the highest score
df_train = pd.DataFrame([
    {"text": record.inputs["text"], "label": weak_labels.label2int[record.prediction[0][0]]}
    for record in records
])

# fit the classifier
classifier.fit(
    X=df_train.text,
    y=df_train.label
)

I would like to simplify this to:

label_model = Snorkel(weak_labels)

# get the records together with the predictions as a numpy array
records, y = label_model.predict(return_y_array=True)

# fit the classifier
classifier.fit(
    X=records.to_pandas().text,
    y=y
)
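For reference, the conversion that the proposed `return_y_array=True` option would hide can be sketched on its own. This is only a minimal illustration, not the Rubrix implementation: `predictions_to_int_labels`, the `missing` sentinel, and the sample data are hypothetical, and each prediction is assumed to be a list of `(label, score)` pairs like `record.prediction` in the tutorial snippet. The result is a plain list that `numpy.asarray` could wrap.

```python
from typing import Dict, List, Optional, Tuple

# Assumed shape of `record.prediction`: a list of (label, score) pairs.
Prediction = List[Tuple[str, float]]


def predictions_to_int_labels(
    predictions: List[Optional[Prediction]],
    label2int: Dict[str, int],
    missing: int = -1,
) -> List[int]:
    """Map each record's top-scoring label to its integer id.

    Records without a prediction get the `missing` sentinel
    (a hypothetical convention chosen for this sketch).
    """
    y = []
    for prediction in predictions:
        if not prediction:
            y.append(missing)
            continue
        # Pick the label with the highest score instead of relying on ordering.
        top_label = max(prediction, key=lambda pair: pair[1])[0]
        y.append(label2int[top_label])
    return y


# Made-up sample data for illustration only.
label2int = {"negative": 0, "positive": 1}
predictions = [
    [("positive", 0.9), ("negative", 0.1)],
    [("negative", 0.7), ("positive", 0.3)],
    None,  # a record the label model abstained on
]

y = predictions_to_int_labels(predictions, label2int)
print(y)
```

Taking the maximum score rather than indexing `[0][0]` avoids depending on the predictions being sorted, which is an assumption the tutorial snippet makes implicitly.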


dcfidalgo · 2022-04-28T10:56:55Z

For multi-label predictions you would need a threshold parameter, and I don't think the increase in complexity is worth this feature.
#1442 addresses part of the solution described above by returning a DatasetForTextClassification instead of a list of records.
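To illustrate why multi-label output needs a threshold: with several labels allowed per record, there is no single "top" label, so a cutoff must decide which scores count as positive. The sketch below is a hypothetical helper (`multi_hot` and the sample labels are not part of Rubrix) that turns `(label, score)` pairs into a multi-hot row; note how the result changes with the threshold.

```python
from typing import Dict, List, Tuple


def multi_hot(
    prediction: List[Tuple[str, float]],
    label2int: Dict[str, int],
    threshold: float = 0.5,
) -> List[int]:
    """Return a multi-hot row: 1 for every label whose score reaches the threshold."""
    row = [0] * len(label2int)
    for label, score in prediction:
        if score >= threshold:
            row[label2int[label]] = 1
    return row


# Made-up sample data for illustration only.
label2int = {"sports": 0, "politics": 1, "tech": 2}
prediction = [("sports", 0.8), ("tech", 0.55), ("politics", 0.1)]

strict = multi_hot(prediction, label2int, threshold=0.6)
lenient = multi_hot(prediction, label2int, threshold=0.5)
print(strict, lenient)
```

The same record yields different training targets depending on the threshold, which is the extra parameter, and the extra complexity, that the comment above refers to.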

…fication (#1442) * feat: return DatasetForTextClassification instead of list of records * docs: update/improve weak label tutorials (cherry picked from commit 559593a)

dcfidalgo self-assigned this Mar 10, 2022

dcfidalgo added the labeling label Mar 10, 2022

dcfidalgo mentioned this issue Apr 28, 2022

feat(#1247): label models predict method returns DatasetForTextClassification #1442

Merged

dcfidalgo closed this as completed in #1442 Apr 29, 2022

dcfidalgo pushed a commit that referenced this issue Apr 29, 2022

feat(#1247): label models predict method returns DatasetForTextClassi…

559593a

…fication (#1442) * feat: return DatasetForTextClassification instead of list of records * docs: update/improve weak label tutorials

frascuchon added this to Backlog in Release via automation May 4, 2022

frascuchon moved this from Backlog to Waiting Release in Release May 4, 2022

frascuchon moved this from Waiting Release to Ready to Release QA in Release May 4, 2022

dcfidalgo moved this from Ready to Release QA to Approved Release QA in Release May 5, 2022
