# HuggingNERSCDataset API Use Example

This notebook shows how to use the HuggingNERSCDataset API to catalog datasets under the NERSC Hugging Face organization and link them to their locations on the NERSC Community File System (CFS).

In [1]:
from huggingnersc_dataset import HuggingNERSCDataset

## 1. Make a HuggingNERSCDataset object

Supply an official name for the dataset (displayed on webpages, in notebooks, etc.) and a nickname (used in directory names, etc.).

In [2]:
official_name = 'Iris'
nickname = 'iris'

hn_iris = HuggingNERSCDataset(official_name, nickname)

In [3]:
hn_iris

NERSC HuggingFace Dataset Object:
 -official_name: Iris
 -nickname: iris
 -huggingface location: https://huggingface.co/datasets/NERSC/iris/
 -nersc location: /global/cfs/cdirs/dasrepo/ai_ready_datasets/iris/

*Note: the locations above do not exist until the directories are created on the NERSC CFS and in the NERSC Hugging Face organization (step 2).*
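The two locations printed above appear to follow a fixed pattern built from the nickname. The sketch below reproduces that pattern so the expected paths can be checked before any repos are created. The helper `expected_locations` and the two constants are hypothetical names introduced for illustration; they are not part of the HuggingNERSCDataset API, and the assumption that every dataset follows the same pattern is inferred only from the `iris` output.

```python
# Illustrative only: mirrors the path pattern visible in the printed object.
# These names are hypothetical and not part of the HuggingNERSCDataset API.
HF_ORG_URL = "https://huggingface.co/datasets/NERSC"
CFS_ROOT = "/global/cfs/cdirs/dasrepo/ai_ready_datasets"


def expected_locations(nickname: str) -> dict:
    """Return the two locations a dataset with this nickname would presumably use."""
    return {
        "huggingface": f"{HF_ORG_URL}/{nickname}/",
        "nersc": f"{CFS_ROOT}/{nickname}/",
    }


locations = expected_locations("iris")
print(locations["huggingface"])
print(locations["nersc"])
```

Because the nickname ends up in both a URL and a directory path, a short lowercase name without spaces is the safest choice.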

## 2. Make the directories in the NERSC CFS and Hugging Face repos

In [4]:
hn_iris.construct_repos()


Both repos now exist at the locations shown above, but they are empty. Next, we add a Jupyter notebook that loads the data.
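To confirm that the CFS directory exists and is still empty, a quick check with the standard library is enough. This is a minimal sketch: `describe_dir` is a hypothetical helper, not part of the HuggingNERSCDataset API, and it only inspects the local filesystem, not the Hugging Face side.

```python
from pathlib import Path


def describe_dir(path: str) -> str:
    """Report whether a directory is missing, empty, or non-empty."""
    p = Path(path)
    if not p.is_dir():
        return "missing"
    # any() stops at the first entry, so large directories are not fully listed.
    return "non-empty" if any(p.iterdir()) else "empty"


# Path taken from the object printed in section 1.
print(describe_dir("/global/cfs/cdirs/dasrepo/ai_ready_datasets/iris/"))
```

Run outside of NERSC, where the CFS is not mounted, the call reports the directory as missing.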

## 3. Construct loader notebooks

In [5]:
hn_iris.construct_notebook('iris.csv')

You should now have a loader notebook in the CFS repository for your dataset.
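The contents of the generated loader notebook are not shown here. As a rough idea of what such a notebook could look like, the sketch below builds a minimal notebook (nbformat 4) with a single cell that reads a CSV file with pandas. Everything in it is an assumption: `build_loader_notebook` is a hypothetical helper, the choice of `pd.read_csv` is a guess based on the file extension, and the data path simply joins the CFS location from section 1 with the file name.

```python
import json


def build_loader_notebook(data_path: str) -> dict:
    """Build a minimal nbformat-4 notebook with one cell that loads a CSV file."""
    source = [
        "import pandas as pd\n",
        f"df = pd.read_csv({data_path!r})\n",
        "df.head()",
    ]
    code_cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }
    return {
        "cells": [code_cell],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# Assumed data path: CFS dataset directory plus the file name.
notebook = build_loader_notebook(
    "/global/cfs/cdirs/dasrepo/ai_ready_datasets/iris/iris.csv"
)
notebook_json = json.dumps(notebook, indent=1)
print(notebook_json[:80])
```

Writing `notebook_json` to a file with an `.ipynb` extension would give a notebook that Jupyter can open; the notebook actually produced by `construct_notebook` may differ.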

## 4. Fill the Hugging Face README

Lastly, we populate the Hugging Face README with the desired metadata, which makes the dataset easier to find within the organization. This step also adds the loader code to the README, along with a link to the loader notebook we just created on the CFS.

In [6]:
metadata_readme = {
    'language': 'en',
    'filename': 'iris.csv',
    'tags': ['pandas', 'csv', 'tabular'],
    'official_name': 'Iris',
    'nickname': 'iris',
    'size_bucket': 'n<1K',
}

In [7]:
hn_iris.upload_readme(metadata_readme)
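Hugging Face dataset cards store searchable metadata as a YAML block at the top of `README.md`. How `upload_readme` writes that block is not shown here, so the following is only a sketch of what the result might look like for the metadata dictionary above. The helper `render_front_matter` is hypothetical, and the mapping of `official_name` to `pretty_name` and `size_bucket` to `size_categories` is an assumption.

```python
def render_front_matter(meta: dict) -> str:
    """Render a few metadata fields as a YAML front-matter block.

    Hand-rolled for simple values only; values containing YAML special
    characters would need quoting or a proper YAML library.
    """
    lines = ["---", "language:", f"- {meta['language']}"]
    lines.append(f"pretty_name: {meta['official_name']}")  # assumed mapping
    lines.append("size_categories:")  # assumed mapping from 'size_bucket'
    lines.append(f"- {meta['size_bucket']}")
    lines.append("tags:")
    lines.extend(f"- {tag}" for tag in meta["tags"])
    lines.append("---")
    return "\n".join(lines)


example_meta = {
    "language": "en",
    "tags": ["pandas", "csv", "tabular"],
    "official_name": "Iris",
    "size_bucket": "n<1K",
}
front_matter = render_front_matter(example_meta)
print(front_matter)
```

Keeping tags and size buckets consistent across datasets is what makes filtering within the organization useful.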