[LabelModels] The LabelModel.predict should optionally return a numpy array #1247

Closed
dcfidalgo opened this issue Mar 10, 2022 · 1 comment · Fixed by #1442
Comments

@dcfidalgo
Contributor

Right now the LabelModel.predict method returns a list of records that include the predictions. This is fine if you want to log these records to Rubrix again for inspection, but it is inconvenient if you want to use the predictions for training, because you first have to extract the labels from the records yourself.
Example from our tutorial:

import pandas as pd
from rubrix.labeling.text_classification import Snorkel

# weak_labels and classifier are assumed to be defined earlier in the tutorial
label_model = Snorkel(weak_labels)

# get records with the predictions from the label model
records = label_model.predict()

# build a simple dataframe with text and the prediction with the highest score
df_train = pd.DataFrame([
    {"text": record.inputs["text"], "label": weak_labels.label2int[record.prediction[0][0]]}
    for record in records
])

# fit the classifier
classifier.fit(
    X=df_train.text,
    y=df_train.label
)

I would like to simplify this to:

label_model = Snorkel(weak_labels)

# get the records together with the predictions as an integer numpy array
records, y = label_model.predict(return_y_array=True)

# fit the classifier
classifier.fit(
    X=records.to_pandas().text,
    y=y
)
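
To make the request concrete, here is a minimal sketch of what the proposed `return_y_array` option could compute in the single-label case. It assumes each record's prediction is a list of (label, score) pairs, as in the tutorial snippet above. The helper `predictions_to_y` and the sample data are hypothetical and not part of Rubrix.

```python
import numpy as np


def predictions_to_y(predictions, label2int):
    """Return an integer array holding the top-scoring label of each record.

    Hypothetical helper for illustration only; not a Rubrix API.
    """
    return np.array(
        [
            # pick the (label, score) pair with the highest score, then map the label to its int
            label2int[max(prediction, key=lambda pair: pair[1])[0]]
            for prediction in predictions
        ],
        dtype=int,
    )


# made-up predictions in the (label, score) format used by the records
label2int = {"negative": 0, "positive": 1}
predictions = [
    [("positive", 0.8), ("negative", 0.2)],
    [("negative", 0.6), ("positive", 0.4)],
]

y = predictions_to_y(predictions, label2int)
```

The resulting `y` could then be passed directly as the `y` argument of `classifier.fit`.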
@dcfidalgo
Contributor Author

For multi-label predictions you would need a threshold parameter, and I don't think this feature is worth the added complexity.
#1442 addresses part of the solution described above by returning a DatasetForTextClassification instead of a list of records.
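
To illustrate the extra complexity, here is a rough sketch of what the multi-label case might look like. It assumes predictions are lists of (label, score) pairs and that a label counts as predicted when its score reaches the threshold. The function name, the `threshold` default, and the sample data are assumptions for illustration, not Rubrix behavior.

```python
import numpy as np


def multilabel_predictions_to_y(predictions, label2int, threshold=0.5):
    """Return a (n_records, n_labels) 0/1 matrix from scored predictions.

    Hypothetical helper for illustration only; not a Rubrix API.
    """
    y = np.zeros((len(predictions), len(label2int)), dtype=int)
    for row, prediction in enumerate(predictions):
        for label, score in prediction:
            # a label is switched on only if its score reaches the threshold
            if score >= threshold:
                y[row, label2int[label]] = 1
    return y


# made-up multi-label predictions
label2int = {"sports": 0, "politics": 1, "tech": 2}
predictions = [
    [("sports", 0.9), ("tech", 0.7), ("politics", 0.1)],
    [("politics", 0.4), ("sports", 0.3), ("tech", 0.2)],
]

y_multi = multilabel_predictions_to_y(predictions, label2int, threshold=0.5)
```

Note that the second record ends up with no labels at all, which is the kind of edge case the caller would have to handle once a threshold is involved.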

dcfidalgo pushed a commit that referenced this issue Apr 29, 2022
…fication (#1442)

* feat: return DatasetForTextClassification instead of list of records

* docs: update/improve weak label tutorials
@frascuchon frascuchon added this to Backlog in Release via automation May 4, 2022
@frascuchon frascuchon moved this from Backlog to Waiting Release in Release May 4, 2022
@frascuchon frascuchon moved this from Waiting Release to Ready to Release QA in Release May 4, 2022
frascuchon pushed a commit that referenced this issue May 4, 2022
…fication (#1442)

* feat: return DatasetForTextClassification instead of list of records

* docs: update/improve weak label tutorials

(cherry picked from commit 559593a)
@dcfidalgo dcfidalgo moved this from Ready to Release QA to Approved Release QA in Release May 5, 2022
frascuchon pushed a commit that referenced this issue May 10, 2022
…fication (#1442)

* feat: return DatasetForTextClassification instead of list of records

* docs: update/improve weak label tutorials

(cherry picked from commit 559593a)