**Download a dataset using [Hugging Face Datasets](https://github.com/huggingface/datasets)**  
<img src=https://raw.githubusercontent.com/huggingface/datasets/master/docs/source/imgs/datasets_logo_name.jpg width=500>

**Documentation:**  
* loading datasets: https://huggingface.co/docs/datasets/loading_datasets.html

**Install project requirements**

In [1]:
# !pip install -r requirements.txt

**Import libraries**

In [2]:
import os

import pandas as pd
from datasets import list_datasets, load_dataset

**Get the list of available emotion classification datasets**

In [3]:
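# List every dataset ID on the Hub, then keep the IDs that contain "emotion".
# Note: recent `datasets` releases deprecate `list_datasets` in favor of `huggingface_hub.list_datasets`.
datasets_list = list_datasets()
emotion_datasets = [name for name in datasets_list if "emotion" in name]
emotion_datasets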

['emotion',
 'go_emotions',
 'Mansooreh/sharif-emotional-speech-dataset',
 'Pyjay/emotion_nl',
 'SetFit/go_emotions',
 'SetFit/emotion',
 'jakeazcona/short-text-labeled-emotion-classification',
 'jakeazcona/short-text-multi-labeled-emotion-classification',
 'mrm8488/goemotions',
 'pariajm/sharif_emotional_speech_dataset']
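
The filter above is a case-sensitive substring test on the dataset IDs, so an ID spelled with a capital "Emotion" would be missed. As a minimal sketch, assuming a case-insensitive match is wanted, the same idea can be tried offline on a hard-coded sample. The helper name `filter_dataset_ids` is hypothetical, and the last two sample IDs were added for illustration and do not come from the output above.

```python
from typing import Iterable, List


def filter_dataset_ids(dataset_ids: Iterable[str], keyword: str) -> List[str]:
    """Return the IDs that contain `keyword`, ignoring case (hypothetical helper)."""
    needle = keyword.lower()
    return [dataset_id for dataset_id in dataset_ids if needle in dataset_id.lower()]


# First three IDs are copied from the output above; the last two are illustrative.
sample_ids = [
    "emotion",
    "go_emotions",
    "SetFit/emotion",
    "Example/Emotion-Demo",  # made-up mixed-case ID
    "imdb",                  # an ID that should not match
]

matches = filter_dataset_ids(sample_ids, "emotion")
print(matches)
```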

**Download the `SetFit/emotion` dataset**

In [4]:
dataset = load_dataset(path="SetFit/emotion")
dataset

Using custom data configuration SetFit--emotion-aa9fc3cfdd638933
Reusing dataset json (C:\Users\caded\.cache\huggingface\datasets\json\SetFit--emotion-aa9fc3cfdd638933\0.0.0\ac0ca5f5289a6cf108e706efcf040422dbbfa8e658dee6a819f20d76bb84d26b)


  0%|          | 0/3 [00:00<?, ?it/s]

DatasetDict({
    train: Dataset({
        features: ['text', 'label', 'label_text'],
        num_rows: 16000
    })
    test: Dataset({
        features: ['text', 'label', 'label_text'],
        num_rows: 2000
    })
    validation: Dataset({
        features: ['text', 'label', 'label_text'],
        num_rows: 2000
    })
})

**Define a custom function to download the complete dataset from Hugging Face Datasets as a pandas DataFrame**

In [5]:
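def download_dataset(path: str) -> pd.DataFrame:
    """
    Download every split of a Hugging Face dataset into one pandas DataFrame.
    
    Parameters
    ----------
    path : str
        Dataset path or Hub ID, e.g. "SetFit/emotion".
    
    Returns
    -------
    pd.DataFrame
        All splits concatenated, with a "split" column naming each row's source split.
    """
    dfs = []
    datasets = load_dataset(path=path)
    for split, dataset in datasets.items():
        # Tag each row with its split so the original partition can be recovered.
        df = dataset.to_pandas().assign(split=split)
        dfs.append(df)
    # ignore_index=True renumbers the rows 0..n-1 across all splits.
    return pd.concat(dfs, ignore_index=True)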
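
To see the shape of the result without any download, the tag-and-concatenate step of the function above can be sketched on two small hand-made DataFrames. The name `toy_splits` and its rows are assumptions made for illustration; they stand in for the splits that `load_dataset` would return.

```python
import pandas as pd

# Hand-made stand-ins for the splits of a DatasetDict (illustrative data only).
toy_splits = {
    "train": pd.DataFrame({"text": ["a", "b"], "label": [0, 1]}),
    "test": pd.DataFrame({"text": ["c"], "label": [1]}),
}

# Add a "split" column to each frame, then stack them with a fresh 0..n-1 index.
frames = [frame.assign(split=name) for name, frame in toy_splits.items()]
combined = pd.concat(frames, ignore_index=True)

print(combined)
```

Filtering on the new column, for example `combined[combined["split"] == "train"]`, recovers a single split from the combined frame.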

**Call the custom function to get the dataset**

In [6]:
df = download_dataset(path="SetFit/emotion")
df

Using custom data configuration SetFit--emotion-aa9fc3cfdd638933
Reusing dataset json (C:\Users\caded\.cache\huggingface\datasets\json\SetFit--emotion-aa9fc3cfdd638933\0.0.0\ac0ca5f5289a6cf108e706efcf040422dbbfa8e658dee6a819f20d76bb84d26b)


  0%|          | 0/3 [00:00<?, ?it/s]

| | text | label | label_text | split |
|---|---|---|---|---|
| 0 | i didnt feel humiliated | 0 | sadness | train |
| 1 | i can go from feeling so hopeless to so damned... | 0 | sadness | train |
| 2 | im grabbing a minute to post i feel greedy wrong | 3 | anger | train |
| 3 | i am ever feeling nostalgic about the fireplac... | 2 | love | train |
| 4 | i am feeling grouchy | 3 | anger | train |
| ... | ... | ... | ... | ... |
| 19995 | im having ssa examination tomorrow in the morn... | 0 | sadness | validation |
| 19996 | i constantly worry about their fight against n... | 1 | joy | validation |
| 19997 | i feel its important to share this info for th... | 1 | joy | validation |
| 19998 | i truly feel that if you are passionate enough... | 1 | joy | validation |
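
The `label` and `label_text` columns in the output above carry the same information in two encodings. A minimal sketch of deriving the id-to-name mapping, and of checking that it is consistent, is given here; the helper `build_label_map` is hypothetical, and the pairs are copied from the displayed rows only, so the mapping may be incomplete for the full dataset.

```python
from typing import Dict, Iterable, Tuple


def build_label_map(pairs: Iterable[Tuple[int, str]]) -> Dict[int, str]:
    """Collect label id -> label name, failing on conflicting names (hypothetical helper)."""
    mapping: Dict[int, str] = {}
    for label, label_text in pairs:
        # setdefault stores the first name seen for a label and returns the stored one.
        if mapping.setdefault(label, label_text) != label_text:
            raise ValueError(f"label {label} maps to more than one name")
    return mapping


# (label, label_text) pairs copied from the rows displayed above.
preview_pairs = [
    (0, "sadness"), (0, "sadness"), (3, "anger"), (2, "love"), (3, "anger"),
    (0, "sadness"), (1, "joy"), (1, "joy"), (1, "joy"),
]

label_map = build_label_map(preview_pairs)
print(sorted(label_map.items()))
```

With the real DataFrame, the same pairs could be supplied as `zip(df["label"], df["label_text"])`.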


**Save the DataFrame to the output path in JSON Lines format**

In [7]:
output_path = os.path.join("data", "raw")
os.makedirs(name=output_path, exist_ok=True)
output_file = os.path.join(output_path, "SetFit_emotion.json")
df.to_json(path_or_buf=output_file, orient="records", lines=True)
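
With `orient="records"` and `lines=True`, pandas writes one JSON object per line (the JSON Lines layout) rather than a single JSON array. The sketch below reproduces that layout with the standard library only and reads it back line by line; the temporary file name and the two records are illustrative, not the notebook's actual output file.

```python
import json
import os
import tempfile

# Two sample records in the same column layout as the DataFrame above.
records = [
    {"text": "i didnt feel humiliated", "label": 0, "label_text": "sadness", "split": "train"},
    {"text": "i am feeling grouchy", "label": 3, "label_text": "anger", "split": "train"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    jsonl_path = os.path.join(tmp_dir, "sample.json")

    # Write one JSON object per line.
    with open(jsonl_path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")

    # Read the file back, parsing each non-empty line independently.
    with open(jsonl_path, encoding="utf-8") as handle:
        loaded = [json.loads(line) for line in handle if line.strip()]

print(loaded == records)
```

The file saved by the cell above can be loaded back into pandas with `pd.read_json(output_file, orient="records", lines=True)`.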