Add all tasks from datasets #95

osanseviero · 2022-04-12T14:34:50Z

Third part of 4 (or 3?) for #83

This PR adds all tasks from tasks.json in datasets, with the latest update from @lhoestq in huggingface/datasets#4066.

Internal change needed: hide the new types based on hideInModels
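A minimal sketch of what that internal filtering could look like. Only the `hideInModels` flag comes from this PR; the `PipelineTypeInfo` shape, the `PIPELINE_DATA_SKETCH` table, the `example-hidden-task` id and the `visibleInModels` helper are hypothetical names used for illustration.

```typescript
// Hypothetical shape: only `hideInModels` is named in this PR, the rest is assumed.
interface PipelineTypeInfo {
	name: string;
	modality: string;
	hideInModels?: boolean;
}

// Illustrative entries, not the real table.
const PIPELINE_DATA_SKETCH: Record<string, PipelineTypeInfo> = {
	"text-classification": { name: "Text Classification", modality: "nlp" },
	"example-hidden-task": { name: "Example Hidden Task", modality: "structured", hideInModels: true },
};

// Keep only the task ids that should be listed on the models side.
function visibleInModels(data: Record<string, PipelineTypeInfo>): string[] {
	return Object.entries(data)
		.filter(([, info]) => !info.hideInModels)
		.map(([id]) => id);
}

const visibleIds = visibleInModels(PIPELINE_DATA_SKETCH);
```

Treating a missing flag as "visible" keeps existing entries unchanged, so only the newly added tasks need to opt out.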

Some changes that come along with the PR:

* Multimodal and time series modalities are added, although for time series we don't yet have any tasks for which we want to show models.
* We moved some CV and Audio tasks to multimodal.
* Added a bunch of new tasks with hideInModels.
* Changed TASKS_DATA and TASKS_MODEL_LIBRARIES to have the same order as PIPELINE_TAGS_DISPLAY_ORDER, since that really helps with diffs.
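The ordering convention from the last point could be checked mechanically. This is a sketch under assumptions: it treats PIPELINE_TAGS_DISPLAY_ORDER as an array of task ids and TASKS_DATA as an object keyed by the same ids. The `*_SKETCH` constants and the `followsOrder` helper are hypothetical and not part of the PR.

```typescript
// Assumed stand-in for PIPELINE_TAGS_DISPLAY_ORDER: an ordered list of task ids.
const DISPLAY_ORDER_SKETCH: readonly string[] = [
	"text-classification",
	"token-classification",
	"image-classification",
];

// Assumed stand-ins for TASKS_DATA: objects keyed by task id.
const ALIGNED_SKETCH: Record<string, unknown> = {
	"text-classification": {},
	"image-classification": {},
};
const MISALIGNED_SKETCH: Record<string, unknown> = {
	"image-classification": {},
	"text-classification": {},
};

// True when the keys of `obj` appear in the same relative order as in `order`.
// Keys that are not listed in `order` are ignored.
function followsOrder(order: readonly string[], obj: Record<string, unknown>): boolean {
	const rank = new Map(order.map((id, i) => [id, i] as const));
	let last = -1;
	for (const key of Object.keys(obj)) {
		const r = rank.get(key);
		if (r === undefined) continue;
		if (r < last) return false;
		last = r;
	}
	return true;
}

const alignedOk = followsOrder(DISPLAY_ORDER_SKETCH, ALIGNED_SKETCH);
const misalignedOk = followsOrder(DISPLAY_ORDER_SKETCH, MISALIGNED_SKETCH);
```

The check relies on `Object.keys` returning non-numeric string keys in insertion order, which is why the declaration order in the source file matters here.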

cc @lhoestq

js/src/lib/interfaces/Types.ts

julien-c · 2022-04-12T14:59:59Z

+	audio:       "Audio",
+	cv:          "Computer Vision",
+	rl:          "Reinforcement Learning",
+	time_series: "Time Series",


Suggested change

-	time_series: "Time Series",
+	time_series: "Time Series",
+	structured:  "Structured Data",

And actually, I would advocate for time series to be inside structured.

The number of Time Series datasets/models may be so small that a dedicated modality is overkill for now.

However, in the long run I expect it to become separate from the other structured datasets we'll have. We may want a separate modality just for time series at some point anyway, for classification, forecasting, anomaly detection, etc. Like the vision, audio and text modalities, time series require very specific preprocessing and model architectures.

Though I agree such datasets often come with structured metadata to help predictions

Individual tasks will still be time-series related, so I like using structured as an umbrella for a few different things, personally. It feels like a better level of generality.

What about table tasks? Technically they're also structured, no? But we've kept them under NLP.

Since modality is not heavily used anywhere, I feel it's safe to change afterward if needed once we see more adoption. Let's go with structured and have everything together, so we don't have sections just pointing to a couple of models/datasets. Later on we can always split pragmatically.
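As a rough illustration of the outcome, time-series tasks would sit under a single `structured` modality and could be split out later. The labels for `audio`, `cv`, `rl` and `structured` come from the diff and suggestion above; `TASK_MODALITY_SKETCH`, its task-to-modality assignments, the `example-*` task ids and `groupByModality` are hypothetical.

```typescript
// Labels taken from the diff and suggestion in this thread (time_series folded into structured).
const MODALITY_LABELS_SKETCH = {
	audio: "Audio",
	cv: "Computer Vision",
	rl: "Reinforcement Learning",
	structured: "Structured Data",
} as const;

type Modality = keyof typeof MODALITY_LABELS_SKETCH;

// Hypothetical assignments, for illustration only.
const TASK_MODALITY_SKETCH: Record<string, Modality> = {
	"tabular-classification": "structured",
	"example-time-series-task": "structured",
	"example-audio-task": "audio",
};

// Group task ids under their modality; modalities without tasks are simply absent.
function groupByModality(tasks: Record<string, Modality>): Partial<Record<Modality, string[]>> {
	const groups: Partial<Record<Modality, string[]>> = {};
	for (const [id, modality] of Object.entries(tasks)) {
		const bucket = groups[modality] ?? [];
		bucket.push(id);
		groups[modality] = bucket;
	}
	return groups;
}

const grouped = groupByModality(TASK_MODALITY_SKETCH);
```

Because each task only stores a modality key, splitting time series out later would mean adding one label and changing the key on the affected tasks.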

tasks/src/const.ts

tasks/src/tasksData.ts

@mishig25

* This is not a Python repo anymore
* Update README.md
* rabbit hole: revert to package-lock.json lockfileVersion=1 (we use npm 6 stable for now)
* rabbit hole: let's try this?
* CI: Actually we should also build widgets in that case (they're broken currently) cc @mishig25
* Fix for new `tabular-classification`
* `export-data.ts` endpoint
* ci: trigger JS Interfaces CI run
* Revert "ci: trigger JS Interfaces CI run" This reverts commit 34ac3e9.
* move export-tasks to a simple script and run using `tsm`

osanseviero added 2 commits April 12, 2022 15:55

Sync some of the tasks from datasets

ac076aa

Align order

ca2d465

osanseviero requested review from beurkinger and julien-c April 12, 2022 14:34