
The data download script for the-stack-v2, the training data of StarCoder2.


huangyangyu/starcoder_data


Download Script of the-stack-v2 Dataset

Introduction

the-stack-v2 is the training dataset of StarCoder2. However, the StarCoder2 release provides only the metadata of this dataset rather than the files themselves. To make the dataset more convenient to use, I am sharing this script so you can download the dataset directly.
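Because only metadata is published, a downloader has to fetch and decode each file's contents itself. The sketch below illustrates that step under loudly stated assumptions: that each file is stored gzip-compressed in the Software Heritage S3 bucket under the blob_id from its metadata row, and that the row's src_encoding names the text encoding (check the bigcode/the-stack-v2 dataset card to confirm). The bucket prefix and the helpers content_url and decode_content are hypothetical illustrations, not code taken from download_the_stack_v2.py.

```python
import gzip

# ASSUMPTION: file contents live in the Software Heritage S3 bucket,
# keyed by the blob_id column of the-stack-v2 metadata, gzip-compressed.
SWH_CONTENT_PREFIX = "s3://softwareheritage/content/"


def content_url(blob_id: str) -> str:
    """Build the (assumed) S3 location of one file from its blob_id."""
    return SWH_CONTENT_PREFIX + blob_id


def decode_content(raw: bytes, src_encoding: str) -> str:
    """Gunzip a downloaded blob and decode it with the row's src_encoding."""
    return gzip.decompress(raw).decode(src_encoding)


if __name__ == "__main__":
    # Offline demo: fake a downloaded blob instead of touching the network.
    row = {"blob_id": "0123abcd", "src_encoding": "utf-8"}
    fake_blob = gzip.compress("print('hello')\n".encode("utf-8"))
    print(content_url(row["blob_id"]))
    print(decode_content(fake_blob, row["src_encoding"]), end="")
```

The actual network fetch (for example with an S3 client holding valid credentials) is deliberately left out so the sketch runs offline; only the URL construction and decoding steps are shown.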

Usage

Run the command below to download the dataset directly. The only thing you need to do is pass your Hugging Face access token through the --hug_access_token parameter:

python download_the_stack_v2.py --hug_access_token {your_huggingface_access_token}
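The internals of download_the_stack_v2.py are not described here, so the following is only a hedged sketch of how a downloader might accept the documented --hug_access_token flag and turn the token into the HTTP bearer-token header that Hugging Face Hub requests accept. The function names parse_args and auth_headers are hypothetical and are not taken from the script.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser mirroring the documented --hug_access_token flag."""
    parser = argparse.ArgumentParser(description="Download the-stack-v2")
    parser.add_argument(
        "--hug_access_token",
        required=True,
        help="Hugging Face user access token",
    )
    # argv=None makes argparse read sys.argv; pass a list for testing.
    return parser.parse_args(argv)


def auth_headers(token: str) -> dict:
    """Hugging Face Hub HTTP requests accept the token as a bearer token."""
    return {"Authorization": f"Bearer {token}"}
```

For example, parse_args(["--hug_access_token", "hf_xxx"]).hug_access_token returns "hf_xxx", and auth_headers("hf_xxx") returns {"Authorization": "Bearer hf_xxx"}. Making the flag required means the script fails fast with a usage message instead of failing later on an unauthenticated request.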
