richarddwang committed Sep 1, 2020
2 parents 91ce40f + 6bd6a89 commit 6f9aea6
Showing 2 changed files with 15 additions and 11 deletions.
23 changes: 14 additions & 9 deletions docs/source/start.rst
@@ -1,10 +1,12 @@
==================
Get Started
==================

-----------------
Base use case
-----------------

-.. code-block::
+::

>>> from nlp import load_dataset
>>> from hugdatafast import *
@@ -14,7 +16,7 @@ Base use case

Can you turn your data pipeline into only 3 lines?

-.. code-block::
+::

>>> dataset = load_dataset('glue', 'cola')
-> {'train': nlp.Dataset, 'validation': nlp.Dataset, 'test': nlp.Dataset}
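
(The rest of this block is collapsed in the diff view. As a rough sketch, the remaining lines could look like the following; ``hf_tokenizer`` is assumed to be a pretrained Hugging Face tokenizer, and ``HF_Datasets`` with its ``cols``/``hf_toker`` parameters and ``dataloaders`` call is an assumption for illustration, not a confirmed API.)

::

>>> def tokenize(example):
...     example['text_idxs'] = hf_tokenizer.encode(example['sentence'])
...     return example
>>> tokenized_cola = HF_Transform(dataset, tokenize).map()
>>> dls = HF_Datasets(tokenized_cola, cols=['text_idxs', 'label'], hf_toker=hf_tokenizer).dataloaders(bs=64)
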
@@ -26,7 +28,7 @@ Now you can enjoy
1. :func:`show_batch` of fastai
Even if you don't use fastai to train, you can still use it as a normal DataLoader

-.. code-block::
+::

>>> dls.show_batch(max_n=2)
text_idxs label
@@ -39,7 +41,7 @@

2. Train a model on the data using fastai, and also show the predictions

-.. code-block::
+::

>>> learn = Learner(dls, your_model, loss_func=CrossEntropyLossFlat())
>>> learn.fit(3)
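
(The collapsed lines show the training log and the predictions. Displaying the predictions would presumably use fastai's standard call; the ``max_n`` value here is arbitrary.)

::

>>> learn.show_results(max_n=2)
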
@@ -54,13 +56,14 @@ Even you don't use fastai to train, you can still use as a normal DataLoader
3. Use them as normal DataLoaders if you don't use fastai.

-.. code-block::
+::

>>> train_dataloader, val_dataloader, test_dataloader = dls[0], dls[1], dls[2]
>>> for b in train_dataloader: break
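
Each batch ``b`` is a tuple of batched tensors, so the loaders can drop straight into a plain PyTorch training loop. A small sketch, assuming the cola setup above:

::

>>> x, y = b          # token ids and labels
>>> x.shape, y.shape  # roughly (batch size, sequence length) and (batch size,)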

------------------
Other use cases
-----------------
+------------------

1. Use your own dataset?
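
(The answer is collapsed in this diff view. One approach that should work is to load your own files with the ``nlp`` library's generic loading scripts and run them through the same pipeline; the file names below are made up.)

::

>>> my_dsets = nlp.load_dataset('csv', data_files={'train': 'my_train.csv', 'validation': 'my_valid.csv'})
>>> # then tokenize and create dataloaders as in the base use case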

@@ -71,16 +74,18 @@ Other use cases
2. Use custom tokenization or a custom processing function?
Use :class:`HF_Transform`.

-.. code-block::
+::

>>> def custom_tokenize(example):
... example['tok_ids'] = hf_tokenizer.encode(example['sentence1'], example['sentence2'])
... return example
>>> tokenized_rte = HF_Transform(rte, custom_tokenize).map()
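
As a quick sanity check of the transform (the exact record layout here is an assumption):

::

>>> tokenized_rte['train'][0]['tok_ids'][:3]  # first few token ids of the first example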

----------------------------
``hugdatafast`` in practice
--------------------------
-You can see how to use ``hugdatafast`` in real situations. Also, we're welcome you to share how you use
+----------------------------
+
+You can see how to use ``hugdatafast`` in real situations. Also, you are welcome to share how you use
``hugdatafast`` in your project; contact me via GitHub or Twitter to put your project link here.

* `electra_pytorch <https://github.com/richarddwang/hugdatafast>`_ : Pretrain ELECTRA and fine-tune it on the GLUE benchmark
3 changes: 1 addition & 2 deletions hugdatafast/fastai.py
@@ -434,8 +434,7 @@ def hf_merge_datasets(*datasets_s):
Dict[ str : :class:`HF_MergedDataset` ]
Example:
-# Just for example, you may not concates rte and wnli datasets in real.
->>> rte, wnli = nlp.load_dataset('glue', 'rte'), nlp.load_dataset('glue', 'wnli')
+>>> rte, wnli = nlp.load_dataset('glue', 'rte'), nlp.load_dataset('glue', 'wnli') # just an example; you probably wouldn't concatenate rte and wnli in practice
# rte: {'train':Dataset(schema:{...,'sentence1':...,'sentence2':...}),'validation':...}, wnli: {'train':Dataset(schema:{...,'sentence1':...,'sentence2':...}),'validation':...
>>> merge_dsets = hf_merge_datasets(rte, wnli)
{'train': HF_MergedDataset, 'validation': HF_MergedDataset, 'test': HF_MergedDataset}
