@osanseviero
Contributor

Third part of 4 (or 3?) for #83

This PR adds all tasks from tasks.json in datasets, including the latest update from @lhoestq in huggingface/datasets#4066

Internal change needed: hide the new types based on hideInModels

Screenshot from 2022-04-12 15-41-00

Some changes along with the PR

  • multimodal and time series modalities are added, although for time series we don't yet have any tasks for which we want to show models
  • we moved some CV and Audio tasks to multimodal
  • added a bunch of new tasks with hideInModels
  • changed TASKS_DATA and TASKS_MODEL_LIBRARIES to have the same order as PIPELINE_TAGS_DISPLAY_ORDER, since that really helps with diffs.
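
To make the last two bullets concrete, here is a minimal sketch of how a `hideInModels` flag and a shared key order could work. It is not the repo's actual code: the `PipelineTagInfo` shape, the two helper functions, the `"time-series-forecasting"` tag and the placeholder contents of `TASKS_DATA` are all assumptions made up for illustration.

```typescript
// Hypothetical, simplified shape; the real type in the repo may carry more fields.
interface PipelineTagInfo {
  name: string;
  hideInModels?: boolean; // when true, the tag is not offered on the models side
}

// Illustrative entries only; "time-series-forecasting" is a made-up example tag.
const PIPELINE_DATA: Record<string, PipelineTagInfo> = {
  "text-classification": { name: "Text Classification" },
  "tabular-classification": { name: "Tabular Classification" },
  "time-series-forecasting": { name: "Time Series Forecasting", hideInModels: true },
};

// Assumed here to be derived from the data; the PR may define it by hand.
const PIPELINE_TAGS_DISPLAY_ORDER: string[] = Object.keys(PIPELINE_DATA);

// Stand-in for TASKS_DATA, written in the same key order as the display order.
const TASKS_DATA: Record<string, string> = {
  "text-classification": "placeholder",
  "tabular-classification": "placeholder",
  "time-series-forecasting": "placeholder",
};

// Keep only the tags that should be shown for models.
function visibleModelTags(data: Record<string, PipelineTagInfo>, order: string[]): string[] {
  return order.filter((tag) => !data[tag]?.hideInModels);
}

// True when the object's keys appear in exactly the given order.
function hasSameOrder(order: string[], obj: Record<string, unknown>): boolean {
  const keys = Object.keys(obj);
  return keys.length === order.length && keys.every((key, i) => key === order[i]);
}
```

A check like `hasSameOrder(PIPELINE_TAGS_DISPLAY_ORDER, TASKS_DATA)` could run in CI so the maps do not drift apart, which is what keeps diffs between them easy to read.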

cc @lhoestq

audio: "Audio",
cv: "Computer Vision",
rl: "Reinforcement Learning",
time_series: "Time Series",
Member

Suggested change
- time_series: "Time Series",
+ time_series: "Time Series",
+ structured: "Structured Data",

Member

And actually, I would advocate for time series to be inside structured.

Member

The number of Time Series datasets/models is probably so small that it's overkill to have a dedicated modality for now.

However, in the long run I expect it to become separate from the other structured datasets we'll have. We may want a separate modality just for time series at some point anyway, for classification, forecasting, anomaly detection, etc. Like the vision, audio and text modalities, time series require very specific preprocessing and model architectures.

Though I agree such datasets often come with structured metadata to help predictions.

Member

Individual tasks will still be time-series related, so I like using structured as an umbrella for a few different things, personally. Feels like a better level of generality.

Contributor Author

What about table tasks? Technically they're also structured, no? But we've kept them under NLP.

Contributor Author

Since modality is not heavily used anywhere, I feel it's safe to change afterward if needed once we see more adoption. Let's go with structured and have everything together, so we don't have sections just pointing to a couple of models/datasets. Later on we can always split pragmatically.
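
For illustration, a rough sketch of what "structured as an umbrella" could look like, with time series tasks grouped alongside tabular ones. The labels for `audio`, `cv` and `rl` and the `structured` entry come from the snippet and suggestion above; the `TASK_MODALITY` map, the `groupByModality` helper and the `"time-series-forecasting"` tag are assumptions, not the merged code.

```typescript
// Labels taken from the reviewed snippet, plus the suggested `structured` entry.
const MODALITY_LABELS = {
  audio: "Audio",
  cv: "Computer Vision",
  rl: "Reinforcement Learning",
  structured: "Structured Data",
} as const;

type Modality = keyof typeof MODALITY_LABELS;

// Hypothetical task-to-modality assignment; "time-series-forecasting" is a made-up tag.
const TASK_MODALITY: Record<string, Modality> = {
  "image-classification": "cv",
  "tabular-classification": "structured",
  "time-series-forecasting": "structured",
};

// Collect task ids per modality, preserving the order in which tasks are declared.
function groupByModality(tasks: Record<string, Modality>): Map<Modality, string[]> {
  const groups = new Map<Modality, string[]>();
  for (const task of Object.keys(tasks)) {
    const modality = tasks[task];
    const bucket = groups.get(modality) ?? [];
    bucket.push(task);
    groups.set(modality, bucket);
  }
  return groups;
}
```

With this layout, splitting time series out later would only mean adding a key to `MODALITY_LABELS` and reassigning the affected tasks, which matches the "split pragmatically later" plan.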

osanseviero and others added 6 commits April 12, 2022 22:05
* This is not a Python repo anymore

* Update README.md

* rabbit hole: revert to package-lock.json lockfileVersion=1

(we use npm 6 stable for now)

* rabbit hole: let's try this?

* CI: Actually we should also build widgets in that case (they're broken currently)

cc @mishig25

* Fix for new `tabular-classification`

* `export-data.ts` endpoint

* ci: trigger JS Interfaces CI run

* Revert "ci: trigger JS Interfaces CI run"

This reverts commit 34ac3e9.

* move export-tasks to a simple script and run using `tsm`
@julien-c merged commit 7f94d83 into main Apr 13, 2022
@julien-c deleted the add-more-tasks branch April 13, 2022 11:49
4 participants