# How to load datasets from Hugging Face Datasets

- toc: true 
- badges: true
- categories: [Hugging Face, Data]
- permalink: /how-to-load-datasets-from-hugging-face-datasets/


<br><br>
[Hugging Face Datasets](https://github.com/huggingface/datasets) is a library that makes thousands of datasets available, all of which can be browsed on the [Hub](https://huggingface.co/datasets). Check whether there's a dataset you would like to try out!
<br><br>
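In this tutorial, we will load the [agnews](https://huggingface.co/datasets/ag_news#data-fields) dataset, a news-topic classification dataset drawn from AG's corpus of more than 1 million news articles. The version on the Hub contains 120,000 training and 7,600 test articles in four categories: world, sports, business, and sci/tech.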


## 1. Install the datasets package
- See the [installation guide](https://huggingface.co/docs/datasets/installation) for more information.

In [1]:
#collapse-output
!pip install datasets

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
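
To confirm that the install worked, one option is to ask Python which version of the package it can see. The snippet below is a minimal sketch that uses only the standard library; `installed_version` is a hypothetical helper name introduced here for illustration, not part of the datasets package.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if `datasets` is installed, otherwise None.
print(installed_version("datasets"))
```

If this prints `None`, the package is not visible to the current Python environment, and rerunning the install cell in the same environment is a reasonable first step.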


## 2. Load the dataset
The agnews dataset has two data fields:
- text: a string feature holding the article text.
- label: a classification label stored as an integer: World (0), Sports (1), Business (2), Sci/Tech (3).
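
Because the labels are stored as integers, it helps to have a way to translate them back to category names. The sketch below hard-codes the four names from the list above; `label_name` and `label_id` are hypothetical helpers written for this post. On a loaded dataset, the library's `ClassLabel` feature offers `int2str` and `str2int` for the same purpose.

```python
# Category names in label-ID order, as listed in the data fields above.
LABEL_NAMES = ["World", "Sports", "Business", "Sci/Tech"]


def label_name(label_id: int) -> str:
    """Return the category name for an integer label ID."""
    if not 0 <= label_id < len(LABEL_NAMES):
        raise ValueError(f"unknown label id: {label_id}")
    return LABEL_NAMES[label_id]


def label_id(name: str) -> int:
    """Return the integer label ID for a category name."""
    return LABEL_NAMES.index(name)


print(label_name(2))
print(label_id("Sci/Tech"))
```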

In [2]:
#collapse-output
from datasets import load_dataset
agnews = load_dataset('ag_news')
agnews

Using custom data configuration default
Reusing dataset ag_news (/root/.cache/huggingface/datasets/ag_news/default/0.0.0/bc2bcb40336ace1a0374767fc29bb0296cdaf8a6da7298436239c54d79180548)



DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 120000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 7600
    })
})

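## 3. (Optional) Convert a Dataset object to a pandas DataFrame
- When exploring data, it's often more convenient to work with a DataFrame. Calling `set_format(type="pandas")` changes the output format of every split, so slicing a split then returns a DataFrame. Call `agnews.reset_format()` to switch back.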

In [3]:
# from Datasets to Pandas DataFrames
agnews.set_format(type="pandas")
train_df = agnews['train'][:]
train_df.head()

| | text | label |
|---|---|---|
| 0 | Wall St. Bears Claw Back Into the Black (Reute... | 2 |
| 1 | Carlyle Looks Toward Commercial Aerospace (Reu... | 2 |
| 2 | Oil and Economy Cloud Stocks' Outlook (Reuters... | 2 |
| 3 | Iraq Halts Oil Exports from Main Southern Pipe... | 2 |
| 4 | Oil prices soar to all-time record, posing new... | 2 |
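
With the training split in a DataFrame, ordinary pandas operations apply. The sketch below counts rows per class and filters to a single class. It runs on `sample_df`, a hypothetical stand-in with the same `text` and `label` columns and placeholder rows (not real AG News articles), so it works without downloading anything; swapping in `train_df` should behave the same way.

```python
import pandas as pd

# Hypothetical stand-in for train_df: same columns, placeholder rows.
sample_df = pd.DataFrame(
    {
        "text": [
            "placeholder world story",
            "placeholder sports story",
            "placeholder business story",
            "another placeholder business story",
        ],
        "label": [0, 1, 2, 2],
    }
)

# Number of rows per class, ordered by label ID.
class_counts = sample_df["label"].value_counts().sort_index()
print(class_counts)

# Keep only the Business rows (label 2 in the dataset's label scheme).
business_df = sample_df[sample_df["label"] == 2]
print(len(business_df))
```

Checking class counts first is a cheap way to see whether the splits are balanced before training anything on them.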


<br><br>
That's it! Now you have the dataset at your disposal.
Check out the official [documentation](https://huggingface.co/docs/datasets/v1.2.1/loading_datasets.html) for more information!