Add /first-rows endpoint #431

Merged
merged 38 commits into from
Jul 19, 2022

Collaborator

@severo severo commented Jun 30, 2022

No description provided.

It is a lot simpler in the sense that we simply store HTTP responses,
with a lot less logic than in cache.
Also: fix an issue to ensure that datetimes always carry the timezone
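
To illustrate the two points above (storing whole HTTP responses, and keeping datetimes timezone-aware), here is a minimal sketch. It is not the libcache code: `CachedResponse`, `ResponseCache`, `ensure_timezone`, and all field names are hypothetical, and the sketch assumes that naive datetimes should be interpreted as UTC.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Tuple


def ensure_timezone(dt: datetime) -> datetime:
    """Return dt unchanged if it is timezone-aware, else assume it is UTC."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt


@dataclass
class CachedResponse:
    """One stored HTTP response: status code, JSON body, and an aware timestamp."""

    http_status: int
    content: Dict[str, Any]
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        # Guard against naive datetimes passed in by callers.
        self.updated_at = ensure_timezone(self.updated_at)


class ResponseCache:
    """In-memory stand-in for a response store keyed by (dataset, config, split)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str, str], CachedResponse] = {}

    def upsert(
        self, dataset: str, config: str, split: str, http_status: int, content: Dict[str, Any]
    ) -> None:
        self._entries[(dataset, config, split)] = CachedResponse(http_status, content)

    def get(self, dataset: str, config: str, split: str) -> Optional[CachedResponse]:
        return self._entries.get((dataset, config, split))
```

The endpoint can then return the stored status and body as-is, which is what keeps the logic small in this sketch.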
@severo severo changed the title Create first rows Add /first-rows endpoint Jun 30, 2022

HuggingFaceDocBuilderDev commented Jun 30, 2022

The documentation is not available anymore as the PR was closed or merged.

severo added 24 commits June 30, 2022 15:56
using the new version of libcache and libqueue
also make the logs format a bit more coherent
compared to 2.3.2, it provides: a) timestamp cast to datetime, b)
features with inference in streaming mode:
IterableDataset._resolve_features()
thanks to _resolve_features(). Props to @lhoestq
/splits-next will replace /splits (new implementation) while /first-rows
will replace /rows (new implementation and new route)
also add e2e test for these two endpoints
Collaborator Author

severo commented Jul 19, 2022

I deployed it on the ephemeral cluster:

slesage@aws-dev-sylvain-lesage ~/hf/datasets-server create-first-rows $ kubectx
Switched to context "arn:aws:eks:us-east-1:707930574880:cluster/hub-ephemeral".
slesage@aws-dev-sylvain-lesage ~/hf/datasets-server create-first-rows $ kubens
Context "arn:aws:eks:us-east-1:707930574880:cluster/hub-ephemeral" modified.
Active namespace is "hub".
slesage@aws-dev-sylvain-lesage ~/hf/datasets-server create-first-rows $ cd infra/charts/datasets-server
slesage@aws-dev-sylvain-lesage ~/hf/datasets-server/infra/charts/datasets-server create-first-rows $ make init
helm dependency update .
Getting updates for unmanaged Helm repositories...
...Successfully got an update from the "https://charts.bitnami.com/bitnami" chart repository
Saving 1 charts
Deleting outdated charts
✘ slesage@aws-dev-sylvain-lesage ~/hf/datasets-server/infra/charts/datasets-server create-first-rows $ make upgrade-dev
make[1]: Entering directory '/home/slesage/hf/datasets-server/infra/charts/datasets-server'
helm upgrade --install datasets-server-dev . --values docker-images.yaml --values env/dev.yaml -n hub
Release "datasets-server-dev" does not exist. Installing it now.
NAME: datasets-server-dev
LAST DEPLOYED: Tue Jul 19 20:30:29 2022
NAMESPACE: hub
STATUS: deployed
REVISION: 1
TEST SUITE: None
make[1]: Leaving directory '/home/slesage/hf/datasets-server/infra/charts/datasets-server'
slesage@aws-dev-sylvain-lesage ~/hf/datasets-server/infra/charts/datasets-server create-first-rows $ k get pod
NAME                                                              READY   STATUS              RESTARTS          AGE
datasets-server-dev-admin-b7b899b4c-n7f28                         0/1     Init:0/1            0                 8s
datasets-server-dev-api-6746fdf75c-wws2m                          0/1     Init:0/1            0                 8s
datasets-server-dev-mongodb-0                                     0/1     ContainerCreating   0                 8s
datasets-server-dev-reverse-proxy-54c447f67d-x5vb8                0/1     Init:0/1            0                 8s
datasets-server-dev-worker-datasets-54cf84cd8d-8hvl4              0/1     Init:2/3            0                 8s
datasets-server-dev-worker-datasets-54cf84cd8d-gvnsl              0/1     Init:0/3            0                 8s
datasets-server-dev-worker-first-rows-77d9f578bd-gcbz6            0/1     Init:0/3            0                 8s
datasets-server-dev-worker-first-rows-77d9f578bd-gvrjd            0/1     Init:2/3            0                 8s
datasets-server-dev-worker-first-rows-77d9f578bd-w2hqf            0/1     Init:0/3            0                 8s
datasets-server-dev-worker-first-rows-77d9f578bd-wlxjf            0/1     Init:0/3            0                 8s
datasets-server-dev-worker-first-rows-77d9f578bd-z25c9            0/1     Init:2/3            0                 8s
datasets-server-dev-worker-splits-7b4cbd848d-lgc2r                0/1     PodInitializing     0                 8s
datasets-server-dev-worker-splits-7b4cbd848d-r4gq2                0/1     Pending             0                 8s
datasets-server-dev-worker-splits-7b4cbd848d-sjwbc                0/1     Init:0/3            0                 8s
datasets-server-dev-worker-splits-7b4cbd848d-t8k7c                0/1     Init:2/3            0                 8s
datasets-server-dev-worker-splits-7b4cbd848d-wv8hq                0/1     Init:1/3            0                 8s
datasets-server-dev-worker-splits-next-67c6c5d995-9xdvf           0/1     Init:2/3            0                 8s
datasets-server-dev-worker-splits-next-67c6c5d995-kt94g           0/1     Init:0/3            0                 8s
hub-pr-2658-rename-long-repos-6645bb749c-fnxv8                    1/1     Running             0                 29m
hub-pr-2983-make-a-pr-on-mongodbs-driver-to-update-u-6455crfwmg   1/1     Running             0                 10h
hub-pr-3488-notifications-emails-add-a-link-to-setti-849f7lr7vq   1/1     Running             0                 5h53m
hub-pr-only-show-non-empty-eval-results-784f56b997-pqf5d          1/1     Running             0                 10m
hub-pr-shorter-reconnect-f666f964-vwpkv                           1/1     Running             140 (6m19s ago)   35h
hub-pr-spaces-card-overflow-7d847956-7n5kw                        1/1     Running             0                 3h29m
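
For anyone repeating this check, the pod listing above can be filtered programmatically instead of read by eye. The following is a rough sketch, not part of the repository: `not_ready_pods` is a hypothetical helper, and it assumes the default column layout of `kubectl get pod` (NAME, READY, STATUS first).

```python
from typing import List, Tuple

# Three rows copied from the listing above, plus the header.
SAMPLE = """\
NAME                                                              READY   STATUS              RESTARTS          AGE
datasets-server-dev-api-6746fdf75c-wws2m                          0/1     Init:0/1            0                 8s
datasets-server-dev-mongodb-0                                     0/1     ContainerCreating   0                 8s
hub-pr-shorter-reconnect-f666f964-vwpkv                           1/1     Running             140 (6m19s ago)   35h
"""


def not_ready_pods(listing: str, prefix: str = "") -> List[Tuple[str, str]]:
    """Return (name, status) for pods that are not fully ready and running."""
    pending = []
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        if not name.startswith(prefix):
            continue
        ready_count, _, total = ready.partition("/")
        if ready_count != total or status != "Running":
            pending.append((name, status))
    return pending


print(not_ready_pods(SAMPLE, prefix="datasets-server-dev"))
```

An empty result for the `datasets-server-dev` prefix would mean every pod of the release reports all containers ready.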

Then, once the pods were initialized, I checked that the reverse proxy and the API were responsive:

slesage@aws-dev-sylvain-lesage ~/hf/datasets-server/infra/charts/datasets-server create-first-rows $ curl https://datasets-server.us.dev.moon.huggingface.tech/healthcheck
ok

Then I sent a webhook for the dataset https://huggingface.co/datasets/wikimedia/wit_base:

slesage@aws-dev-sylvain-lesage ~/hf/datasets-server/infra/charts/datasets-server create-first-rows $ curl -X POST https://datasets-server.us.dev.moon.huggingface.tech/webhook -H 'Content-Type: application/json' -d '{"update": "datasets/wikimedia/wit_base"}'
{"status":"ok"}
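
The same webhook call can be prepared from Python. This is only a sketch mirroring the curl command above: `build_webhook_request` is a hypothetical helper, and the payload shape is taken from that one command, not from any API specification.

```python
import json
from urllib.request import Request


def build_webhook_request(base_url: str, dataset: str) -> Request:
    """Build (but do not send) the POST request that announces a dataset update."""
    payload = json.dumps({"update": f"datasets/{dataset}"}).encode("utf-8")
    return Request(
        url=f"{base_url}/webhook",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_webhook_request(
    "https://datasets-server.us.dev.moon.huggingface.tech", "wikimedia/wit_base"
)
# Sending is left to the caller, for example with urllib.request.urlopen(request).
```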

After some time, the new /splits-next endpoint works as expected:

slesage@aws-dev-sylvain-lesage ~/hf/datasets-server/infra/charts/datasets-server create-first-rows $ curl https://datasets-server.us.dev.moon.huggingface.tech/splits-next\?dataset\=wikimedia/wit_base
{"splits":[{"dataset_name":"wikimedia/wit_base","config_name":"wikimedia--wit_base","split_name":"train","num_bytes":329313891794,"num_examples":6477255}]}
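
A client can turn that response into the (dataset, config, split) triples needed to query /first-rows. A small sketch, using the response body above verbatim; `split_keys` is a hypothetical helper and only relies on the fields visible in this one response.

```python
import json
from typing import Dict, List

RESPONSE = (
    '{"splits":[{"dataset_name":"wikimedia/wit_base",'
    '"config_name":"wikimedia--wit_base","split_name":"train",'
    '"num_bytes":329313891794,"num_examples":6477255}]}'
)


def split_keys(body: str) -> List[Dict[str, str]]:
    """Extract the dataset, config and split names from a /splits-next body."""
    return [
        {
            "dataset": item["dataset_name"],
            "config": item["config_name"],
            "split": item["split_name"],
        }
        for item in json.loads(body)["splits"]
    ]


print(split_keys(RESPONSE))
```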

And also the new /first-rows endpoint:

https://datasets-server.us.dev.moon.huggingface.tech/first-rows?dataset=wikimedia/wit_base&config=wikimedia--wit_base&split=train

[Screenshot 2022-07-19 at 16 49 44]

The response has a features field, instead of the columns field of the former /rows endpoint (which keeps working until we migrate moonlanding).
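
For reference, the /first-rows URL used above takes three query parameters. A minimal sketch of building it, where `first_rows_url` is a hypothetical helper and the parameter names are the ones visible in that URL:

```python
from urllib.parse import urlencode


def first_rows_url(base_url: str, dataset: str, config: str, split: str) -> str:
    """Build the /first-rows URL for one split, keeping '/' in dataset names readable."""
    query = urlencode({"dataset": dataset, "config": config, "split": split}, safe="/")
    return f"{base_url}/first-rows?{query}"


print(
    first_rows_url(
        "https://datasets-server.us.dev.moon.huggingface.tech",
        "wikimedia/wit_base",
        "wikimedia--wit_base",
        "train",
    )
)
```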

Also note that the image URLs are no longer relative but absolute, e.g. https://datasets-server.us.dev.moon.huggingface.tech/assets/wikimedia/wit_base/--/wikimedia--wit_base/train/0/image/image.jpg, which resolves to the image served by the same API:

[Screenshot 2022-07-19 at 16 51 04]
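
The absolute asset URL seems to follow a regular layout. The sketch below rebuilds the example URL from its parts; the meaning of each path segment (row index, column name, file name) is my reading of that single URL and should be treated as an assumption, and `asset_url` is a hypothetical helper.

```python
def asset_url(
    base_url: str,
    dataset: str,
    config: str,
    split: str,
    row_idx: int,
    column: str,
    filename: str,
) -> str:
    """Compose an absolute asset URL; segment meanings are assumed, not documented."""
    return f"{base_url}/assets/{dataset}/--/{config}/{split}/{row_idx}/{column}/{filename}"


print(
    asset_url(
        "https://datasets-server.us.dev.moon.huggingface.tech",
        "wikimedia/wit_base",
        "wikimedia--wit_base",
        "train",
        0,
        "image",
        "image.jpg",
    )
)
```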

The former endpoints, still working for backwards compatibility:

@severo severo marked this pull request as ready for review July 19, 2022 20:51
Collaborator Author

severo commented Jul 19, 2022

This PR is far too big to ask for a review, so I'm merging it (the CI is green, and my manual tests on the ephemeral cluster are OK).
