## Load a dataset
Load a dataset from Hugging Face or from local files and convert it into a Ray Dataset. If no Ray cluster is running, one is initialized automatically, either locally or on the Anyscale platform. You can also call **ray.init()** to explicitly create a new Ray cluster or connect to an existing one.

https://docs.ray.io/en/latest/ray-core/api/doc/ray.init.html#ray.init

```python
from datasets import load_dataset
import ray

# Load the "sentiment" configuration of tweet_eval from the Hugging Face Hub
hf_dataset = load_dataset("cardiffnlp/tweet_eval", "sentiment", split="train")
# Convert to a Ray Dataset and repartition into 2 blocks for parallel processing;
# this may be unnecessary if the dataset is already split into several blocks.
ds = ray.data.from_huggingface(hf_dataset).repartition(2)
```

2025-07-11 06:47:51,776	INFO worker.py:1908 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265


```python
# Print the dataset metadata (logical plan, row count, and schema)
print(ds)
```

```
Repartition
+- Dataset(num_rows=45615, schema={text: string, label: int64})
```


```python
# Show the first 10 rows; each row has a "text" and a "label" column
ds.show(10)
```

2025-07-11 06:49:24,021	INFO dataset.py:3046 -- Tip: Use `take_batch()` instead of `take() / show()` to return records in pandas or numpy batch format.
2025-07-11 06:49:24,031	INFO logging.py:295 -- Registered dataset logger for dataset dataset_2_0
2025-07-11 06:49:24,056	INFO streaming_executor.py:117 -- Starting execution of Dataset dataset_2_0. Full logs are in /tmp/ray/session_2025-07-11_06-47-50_390429_98374/logs/ray-data
2025-07-11 06:49:24,056	INFO streaming_executor.py:118 -- Execution plan of Dataset dataset_2_0: InputDataBuffer[Input] -> AllToAllOperator[Repartition] -> LimitOperator[limit=10]


2025-07-11 06:49:24,378	INFO streaming_executor.py:227 -- ✔️  Dataset dataset_2_0 execution finished in 0.32 seconds


{'text': '"QT @user In the original draft of the 7th book, Remus Lupin survived the Battle of Hogwarts. #HappyBirthdayRemusLupin"', 'label': 2}
{'text': '"Ben Smith / Smith (concussion) remains out of the lineup Thursday, Curtis #NHL #SJ"', 'label': 1}
{'text': 'Sorry bout the stream last night I crashed out but will be on tonight for sure. Then back to Minecraft in pc tomorrow night.', 'label': 1}
{'text': "Chase Headley's RBI double in the 8th inning off David Price snapped a Yankees streak of 33 consecutive scoreless innings against Blue Jays", 'label': 1}
{'text': '@user Alciato: Bee will invest 150 million in January, another 200 in the Summer and plans to bring Messi by 2017"', 'label': 2}
{'text': "@user LIT MY MUM 'Kerry the louboutins I wonder how many Willam owns!!! Look Kerry Warner Wednesday!'", 'label': 2}
{'text': '"\\"""" SOUL TRAIN\\"""" OCT 27 HALLOWEEN SPECIAL ft T.dot FINEST rocking the mic...CRAZY CACTUS NIGHT CLUB ..ADV ticket $10 wt out costume $15..."', 'label': 