<a href="https://colab.research.google.com/github/shake/colab-Llama-2-ipynb/blob/main/phi-2/setup_and_distribute_annotation_work_for_dpo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Set up and distribute annotation work for DPO

Install the Argilla client and the required third-party libraries using pip:


In [None]:
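# the FeedbackDataset workflow below uses the Argilla 1.x client API (rg.init, FeedbackDataset)
!pip install "argilla<2.0" datasets -qqq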

Let’s make the necessary imports:

In [None]:
import random
import uuid

from google.colab import userdata

import argilla as rg
from argilla.client.feedback.utils import assign_records, assign_workspaces
from datasets import load_dataset

If you are running Argilla using the [Docker quickstart](https://docs.argilla.io/en/latest/getting_started/installation/deployments/docker-quickstart.html) image or [Hugging Face Spaces](https://docs.argilla.io/en/latest/getting_started/installation/deployments/huggingface-spaces.html), you need to initialize the Argilla client with `ARGILLA_API_URL` and `ARGILLA_API_KEY` (read here from Colab secrets):

In [None]:
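# Replace the defaults with your HF Spaces URL (if using Spaces) and your API key (if you configured a custom one)
def get_secret(name, default):
    # userdata.get raises if the secret is missing or notebook access was not granted
    try:
        return userdata.get(name) or default
    except Exception:
        return default

rg.init(
    api_url=get_secret("ARGILLA_API_URL", "http://localhost:6900"),
    api_key=get_secret("ARGILLA_API_KEY", "owner.apikey"),
)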

If the client and server versions differ, `rg.init` warns that this may lead to compatibility issues. To ensure a seamless connection, align your client version with the server version.
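The cell above relies on `google.colab.userdata`, which only exists inside Colab. When running the notebook elsewhere, a minimal sketch like the following could read the same two names from environment variables instead, falling back to Docker quickstart defaults (assumed here to be `http://localhost:6900` and `owner.apikey`). The helper name `argilla_credentials_from_env` is hypothetical and not part of Argilla.

```python
import os

# Assumed Docker quickstart defaults, used when the environment variables are unset or empty
DEFAULT_API_URL = "http://localhost:6900"
DEFAULT_API_KEY = "owner.apikey"


def argilla_credentials_from_env(env=None):
    """Return (api_url, api_key) from ARGILLA_API_URL / ARGILLA_API_KEY, with fallbacks."""
    env = os.environ if env is None else env
    api_url = env.get("ARGILLA_API_URL") or DEFAULT_API_URL
    api_key = env.get("ARGILLA_API_KEY") or DEFAULT_API_KEY
    return api_url, api_key


# Hypothetical usage outside Colab (requires a running Argilla 1.x server):
# api_url, api_key = argilla_credentials_from_env()
# rg.init(api_url=api_url, api_key=api_key)
```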


# Create Dataset

This section shows how to [create a FeedbackDataset](https://docs.argilla.io/en/latest/guides/llms/practical_guides/create_dataset.html) with a custom configuration for collecting feedback. The work happens in two sequential phases, but most of the code is reused because both phases follow the same process with slightly different parameters:

- SFT dataset for collecting completions.
- DPO/Preference dataset for completion ranking.

First, the SFT dataset for collecting completions.

In [None]:
ds_completion = rg.FeedbackDataset.for_supervised_fine_tuning(
    context=False,
    use_markdown=True,
    guidelines=None,
    metadata_properties=None,
    vectors_settings=None,
)
ds_completion

FeedbackDataset(
   fields=[TextField(name='prompt', title='Prompt', required=True, type='text', use_markdown=True)]
   questions=[TextQuestion(name='response', title='Response', description='Write the response to the instruction.', required=True, type='text', use_markdown=True)]
   guidelines=This is a supervised fine-tuning dataset that contains instructions. Please write the response to the instruction in the response field.)
   metadata_properties=[])
   vectors_settings=[])
)

Next, the DPO/Preference dataset for completion ranking.

In [None]:
ds_preference = rg.FeedbackDataset.for_direct_preference_optimization(
    number_of_responses=2,
    context=False,
    use_markdown=True,
    guidelines=None,
    metadata_properties=None,
    vectors_settings=None,
)
ds_preference

FeedbackDataset(
   fields=[TextField(name='prompt', title='Prompt', required=True, type='text', use_markdown=True), TextField(name='response1', title='Response 1', required=True, type='text', use_markdown=True), TextField(name='response2', title='Response 2', required=False, type='text', use_markdown=True)]
   questions=[RankingQuestion(name='preference', title='Order responses based on your preference', description='1 = Best, 2 = Worst. Ties are allowed.', required=True, type='ranking', values={'response1': 'Response 1', 'response2': 'Response 2'})]
   guidelines=This is a direct preference optimization dataset that contains contexts and options. Please rank the options that you would prefer in the given context.)
   metadata_properties=[])
   vectors_settings=[])
)
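Once preferences have been collected, each ranking has to be turned into the chosen/rejected pair that DPO training expects. The sketch below assumes a simplified representation (one `{"response1": rank, "response2": rank}` dict per annotator, with 1 = best as in the ranking question above) rather than Argilla's exported response format, and it averages ranks across annotators; the function name is hypothetical. Ties produce no pair, since they carry no preference signal.

```python
from statistics import mean


def to_preference_pair(prompt, responses, rankings):
    """Turn one or more annotator rankings into a DPO-style example.

    responses: {"response1": text, "response2": text}
    rankings:  list of {"response1": rank, "response2": rank} dicts, one per
               annotator, where 1 = best and ties are allowed.
    Returns {"prompt", "chosen", "rejected"}, or None when the averaged ranks tie.
    """
    if not rankings:
        return None
    # average each response's rank across annotators (lower is better)
    avg_rank = {name: mean(r[name] for r in rankings) for name in responses}
    best = min(avg_rank, key=avg_rank.get)
    worst = max(avg_rank, key=avg_rank.get)
    if avg_rank[best] == avg_rank[worst]:
        return None  # a tie carries no preference signal
    return {"prompt": prompt, "chosen": responses[best], "rejected": responses[worst]}


# Usage: two annotators agree that response1 is better
example = to_preference_pair(
    "Example prompt",
    {"response1": "Talk to the confidant first.", "response2": "Tell everyone."},
    [{"response1": 1, "response2": 2}, {"response1": 1, "response2": 2}],
)
```

Averaging ranks is only one aggregation choice; a majority vote over pairwise preferences would be an equally reasonable alternative.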

Load any dataset that contains at least some instructions. You could also create a synthetic one with [distilabel](https://github.com/argilla-io/distilabel). Here, we use the [Prolific social reasoning preference dataset](https://huggingface.co/datasets/ProlificAI/social-reasoning-rlhf), which contains instructions and completions that Prolific gathered while exploring an [integration with Argilla](https://researcher-help.prolific.com/hc/en-gb/articles/10295474752668-Argilla-integration-guide#h_01HJ6JXY46RB953X63QQXRRHGD).

In [None]:
dataset = load_dataset("ProlificAI/social-reasoning-rlhf", split="train")
dataset

Dataset({
    features: ['question', 'chosen', 'rejected'],
    num_rows: 3820
})

Now let's [create some Argilla records](https://docs.argilla.io/en/latest/practical_guides/create_update_dataset/records.html) that match the dataset formats and can later be annotated by the annotators on your team. Normally, this data would not have been annotated yet, but for the sake of the example we fill the preference records with the dataset's pre-defined completions. Additionally, you can help your annotators by adding [vectors for semantic search](https://docs.argilla.io/en/latest/practical_guides/create_update_dataset/vectors.html), [metadata for filters](https://docs.argilla.io/en/latest/practical_guides/create_update_dataset/metadata.html), and [initial model suggestions](https://docs.argilla.io/en/latest/practical_guides/create_update_dataset/suggestions_and_responses.html).
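As an illustration of metadata for filters, the helper below computes a few word counts per Prolific row that could be attached as record `metadata`. The metadata names are hypothetical, and making them filterable in the UI likely also requires configuring matching metadata properties on the dataset (the templates above set `metadata_properties=None`).

```python
def prompt_metadata(entry):
    """Compute simple word counts for one row with "question", "chosen" and "rejected" keys.

    The metadata names are hypothetical; str.split() counts whitespace-separated words.
    """
    return {
        "prompt_words": len(entry["question"].split()),
        "chosen_words": len(entry["chosen"].split()),
        "rejected_words": len(entry["rejected"].split()),
    }


# Hypothetical usage when building the records below:
# rg.FeedbackRecord(fields={"prompt": entry["question"]}, metadata=prompt_metadata(entry))
```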

In [None]:
records_completion = []
records_preference = []
for entry in dataset:
    # one shared ID per source row, so the completion and preference records
    # can both be traced back to the same original entry and merged later
    entry_id = str(uuid.uuid4())
    records_completion.append(
        rg.FeedbackRecord(
            fields={"prompt": entry["question"]},
            external_id=entry_id,
        )
    )

    # normally you would not have these responses yet; for this example
    # we reuse the dataset's pre-defined chosen/rejected completions
    records_preference.append(
        rg.FeedbackRecord(
            fields={
                "prompt": entry["question"],
                "response1": entry["chosen"],
                "response2": entry["rejected"],
            },
            external_id=entry_id,
        )
    )
records_completion[0]

FeedbackRecord(fields={'prompt': 'A close confidant shares personal information with you under the promise not to reveal it to anyone. Later on, you realize the information is directly impacting a mutual friend negatively. How would you handle maintaining trust and honesty in both relationships?'}, metadata={}, vectors={}, responses=[], suggestions=(), external_id='f44e691e-8adb-4022-844d-662814a1c440')

In [None]:
records_preference[0]

FeedbackRecord(fields={'prompt': 'A close confidant shares personal information with you under the promise not to reveal it to anyone. Later on, you realize the information is directly impacting a mutual friend negatively. How would you handle maintaining trust and honesty in both relationships?', 'response1': "\r\nI would talk to my close confidant and explain the situation, expressing my concern for our mutual friend. I would ask if it's possible to find a way to help our friend without breaking the promise of confidentiality, seeking their guidance on how to balance trust and honesty in this situation.\n", 'response2': "I would tell the confidant that it is impacting our mutual friend, and see what they would do about it. Although it's unclear without more information about how or why the information is impacting them."}, metadata={}, vectors={}, responses=[], suggestions=(), external_id='cc9f00a0-c249-4937-801d-571bb8b6cb5d')

Add the records to their respective datasets and push both datasets to Argilla. These will later be used to combine the annotators' responses and merge the information.
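Once annotation is done, merging the results means grouping the submitted responses by the record's `external_id`. Because each completion record will be assigned to two annotators (see the distribution step below), prompts that received two completions can be turned into `response1`/`response2` fields for the preference phase. A minimal sketch, assuming a simplified, hypothetical export of `{"external_id", "prompt", "response"}` rows; the helper names are illustrative only.

```python
from collections import defaultdict


def group_completions(rows):
    """Group annotator completions by the record's external_id.

    rows: iterable of {"external_id", "prompt", "response"} dicts
    (a simplified, hypothetical export of the SFT dataset's submitted responses).
    """
    grouped = defaultdict(lambda: {"prompt": None, "responses": []})
    for row in rows:
        item = grouped[row["external_id"]]
        item["prompt"] = row["prompt"]
        item["responses"].append(row["response"])
    return dict(grouped)


def to_preference_fields(grouped):
    """Build preference-dataset fields for prompts that received exactly two completions."""
    fields = []
    for external_id, item in grouped.items():
        if len(item["responses"]) == 2:
            fields.append({
                "external_id": external_id,
                "prompt": item["prompt"],
                "response1": item["responses"][0],
                "response2": item["responses"][1],
            })
    return fields
```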

In [None]:
ds_completion.add_records(records_completion)
ds_preference.add_records(records_preference)

In [None]:
workspace = "admin"

# delete any dataset left over from a previous run, then push a fresh copy
try:
    rg.FeedbackDataset.from_argilla(name="ds_completion", workspace=workspace).delete()
except Exception:  # the dataset does not exist yet
    pass
ds_completion_remote = ds_completion.push_to_argilla(name="ds_completion", workspace=workspace)

try:
    rg.FeedbackDataset.from_argilla(name="ds_preference", workspace=workspace).delete()
except Exception:  # the dataset does not exist yet
    pass
ds_preference_remote = ds_preference.push_to_argilla(name="ds_preference", workspace=workspace)

# Create users and distribute records

We can now [create users](https://docs.argilla.io/en/latest/getting_started/installation/configurations/user_management.html) for our annotation team and [distribute the records](https://docs.argilla.io/en/latest/practical_guides/assign_records.html) from the original dataset to these users.


In [None]:
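user_names = [f"user_{idx}" for idx in range(10)]
users = []
for username in user_names:
    try:
        # create a new annotator account (demo password; use a stronger one in practice)
        user = rg.User.create(
            username=username,
            password="12345678",
            role="annotator",
        )
    except Exception:
        # the user already exists, e.g. when re-running this cell
        user = rg.User.from_name(username)
    users.append(user)
users[0]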



User(id=36f3a9fc-b628-43b6-bda9-0a8e2fc73765, username=user_0, role=annotator, api_key=IKieXcHSqJIIkyDurPmMiK87LG_E0lSaacEeDnH5tYNB0bCEkLHIx2nguauKOT8vmQQHZd12MEUwVJLhia9JQK_bf7NhOWckDJSEBak4eZg, first_name=user_0, last_name=None, inserted_at=2024-01-17 15:50:00.399384, updated_at=2024-01-17 15:50:00.399384)
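The cell above gives every annotator the same short password, which is fine for a throwaway demo. For a real team, a minimal sketch like this one generates a distinct random password per user with Python's `secrets` module; the helper name is hypothetical, and the passwords still need to reach annotators through a secure channel.

```python
import secrets


def make_user_credentials(n_users, prefix="user_", n_bytes=12):
    """Return {username: password} with a distinct random password per annotator.

    secrets.token_urlsafe(12) encodes 12 random bytes as a 16-character URL-safe string.
    """
    return {f"{prefix}{idx}": secrets.token_urlsafe(n_bytes) for idx in range(n_users)}


credentials = make_user_credentials(10)
# Hypothetical usage with the Argilla 1.x client from the cell above:
# for username, password in credentials.items():
#     rg.User.create(username=username, password=password, role="annotator")
```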

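We can now distribute the records over the group of users. For the completion dataset, we want 2 responses per prompt (which later become the chosen/rejected pair), so each record goes to 2 annotators (`overlap=2`). For the preference dataset, we only need 1 preference per record (`overlap=1`), although collecting more would help average out individual annotators' opinions. Note that we take the records from the remote datasets pushed above (`ds_completion_remote.records` and `ds_preference_remote.records`) so that they can be traced back to their record IDs.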
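`assign_records` takes care of the distribution, but it can help to see what `overlap` means in terms of workload. The following is a simplified round-robin sketch, not Argilla's implementation: each record goes to `overlap` distinct users in turn. With the 3,820 rows loaded above and 10 users, this sketch gives every user 764 records for `overlap=2` and 382 for `overlap=1`.

```python
def round_robin_assign(usernames, record_ids, overlap=1):
    """Assign each record to `overlap` distinct users, cycling through the user list.

    A simplified illustration of overlap, not Argilla's assign_records implementation.
    """
    if not 1 <= overlap <= len(usernames):
        raise ValueError("overlap must be between 1 and the number of users")
    assignments = {name: [] for name in usernames}
    slot = 0
    for record_id in record_ids:
        # consecutive slots map to distinct users because overlap <= len(usernames)
        for _ in range(overlap):
            assignments[usernames[slot % len(usernames)]].append(record_id)
            slot += 1
    return assignments


usernames = [f"user_{idx}" for idx in range(10)]
per_user_completion = {u: len(r) for u, r in round_robin_assign(usernames, range(3820), overlap=2).items()}
per_user_preference = {u: len(r) for u, r in round_robin_assign(usernames, range(3820), overlap=1).items()}
```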

In [None]:
assignments_completion = assign_records(
    users=users,
    records=ds_completion_remote.records,
    overlap=2,
    shuffle=False
)
assignments_completion.keys()

dict_keys(['user_0', 'user_1', 'user_2', 'user_3', 'user_4', 'user_5', 'user_6', 'user_7', 'user_8', 'user_9'])

In [None]:
assignments_preference = assign_records(
    users=users,
    records=ds_preference_remote.records,
    overlap=1,
    shuffle=False
)
assignments_preference.keys(), assignments_preference["user_0"][0]

(dict_keys(['user_0', 'user_1', 'user_2', 'user_3', 'user_4', 'user_5', 'user_6', 'user_7', 'user_8', 'user_9']),
 RemoteFeedbackRecord(id=UUID('6adfb739-5146-44a7-8655-d5a4114f97a4'), client=<httpx.Client object at 0x7d43a0902f20>, fields={'prompt': 'A close confidant shares personal information with you under the promise not to reveal it to anyone. Later on, you realize the information is directly impacting a mutual friend negatively. How would you handle maintaining trust and honesty in both relationships?', 'response1': "\r\nI would talk to my close confidant and explain the situation, expressing my concern for our mutual friend. I would ask if it's possible to find a way to help our friend without breaking the promise of confidentiality, seeking their guidance on how to balance trust and honesty in this situation.\n", 'response2': "I would tell the confidant that it is impacting our mutual friend, and see what they would do about it. Although it's unclear without more information 
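Before pushing per-user datasets, a quick sanity check can confirm that every record was assigned the expected number of times and that no user received the same record twice. The function below is a hypothetical helper working on plain record IDs; with the notebook's objects, the IDs could be taken from each record's `id` attribute, as in the commented usage.

```python
from collections import Counter


def check_assignments(assignments, expected_overlap, all_record_ids=None):
    """Return the record IDs not assigned exactly `expected_overlap` times.

    assignments: {username: [record_id, ...]}
    all_record_ids: optional full list of IDs, so unassigned records are reported too.
    Raises ValueError if a user received the same record more than once.
    """
    counts = Counter()
    for username, record_ids in assignments.items():
        if len(set(record_ids)) != len(record_ids):
            raise ValueError(f"{username} received the same record more than once")
        counts.update(record_ids)
    ids = list(counts) if all_record_ids is None else all_record_ids
    return [rid for rid in ids if counts[rid] != expected_overlap]


# Hypothetical usage with the assignments above:
# ids = {u: [r.id for r in recs] for u, recs in assignments_completion.items()}
# assert check_assignments(ids, expected_overlap=2) == []
```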

Now we split the workload into a separate workspace for each user and, in each workspace, create a dataset with the records that user is supposed to annotate.

In [None]:
for username, records in assignments_completion.items():
    # remove a dataset left over from a previous run in this user's workspace
    # (not the shared "admin" one, which holds the original records)
    try:
        rg.FeedbackDataset.from_argilla(name="ds_completion", workspace=username).delete()
    except Exception:
        pass
    user_dataset = rg.FeedbackDataset.for_supervised_fine_tuning(
        context=False,
        use_markdown=True,
        guidelines=None,
        metadata_properties=None,
        vectors_settings=None,
    )
    user_dataset.add_records(records)
    remote_dataset = user_dataset.push_to_argilla(name="ds_completion", workspace=username)

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=11d1d833-37c3-49fa-a2b4-d6ed3db39a15
   name=ds_completion
   workspace=Workspace(id=b8698c15-47e9-4170-a984-7037523e1ad0, name=user_0, inserted_at=2024-01-17 16:21:38.551906, updated_at=2024-01-17 16:21:38.551906)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/11d1d833-37c3-49fa-a2b4-d6ed3db39a15/annotation-mode
   fields=[RemoteTextField(id=UUID('c4bb39d1-822b-4c76-ab4a-82ada3b38481'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True)]
   questions=[RemoteTextQuestion(id=UUID('8616beee-3c8b-4569-bac4-473b214f9367'), client=None, name='response', title='Response', description=None, required=True, type='text', use_markdown=True)]
   guidelines=This is a supervised fine-tuning dataset that contains instructions. Please write the response to the instruction in 

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=67807a52-304f-4e95-ad4c-a8152e9d7775
   name=ds_completion
   workspace=Workspace(id=b18dab38-b933-4858-9d21-4c2a7f5bf491, name=user_1, inserted_at=2024-01-17 16:21:39.615749, updated_at=2024-01-17 16:21:39.615749)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/67807a52-304f-4e95-ad4c-a8152e9d7775/annotation-mode
   fields=[RemoteTextField(id=UUID('7d713123-001f-464f-a8e3-9f6d8da20c90'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True)]
   questions=[RemoteTextQuestion(id=UUID('1427360c-baab-4e31-9c8b-ecc4666dffcb'), client=None, name='response', title='Response', description=None, required=True, type='text', use_markdown=True)]
   guidelines=This is a supervised fine-tuning dataset that contains instructions. Please write the response to the instruction in 

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=c5a39a10-2a28-4ed7-9f6b-59c17b792488
   name=ds_completion
   workspace=Workspace(id=1e75dc1c-639d-44bd-a35f-38bf34800cbe, name=user_2, inserted_at=2024-01-17 16:21:40.773495, updated_at=2024-01-17 16:21:40.773495)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/c5a39a10-2a28-4ed7-9f6b-59c17b792488/annotation-mode
   fields=[RemoteTextField(id=UUID('aba63807-f248-4cdb-9184-117e89dbd747'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True)]
   questions=[RemoteTextQuestion(id=UUID('b5b50224-2963-4192-ad53-8c445e18f4c4'), client=None, name='response', title='Response', description=None, required=True, type='text', use_markdown=True)]
   guidelines=This is a supervised fine-tuning dataset that contains instructions. Please write the response to the instruction in 

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=8699e3b4-c9ca-429b-a664-5d4187189ef7
   name=ds_completion
   workspace=Workspace(id=05fd306a-2bd1-423c-954d-8f55f1e83a84, name=user_3, inserted_at=2024-01-17 16:21:41.790132, updated_at=2024-01-17 16:21:41.790132)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/8699e3b4-c9ca-429b-a664-5d4187189ef7/annotation-mode
   fields=[RemoteTextField(id=UUID('20c44873-193d-49ba-a3f9-883a79bb7712'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True)]
   questions=[RemoteTextQuestion(id=UUID('7218a781-e1e8-42f2-9e00-21febf23b0f4'), client=None, name='response', title='Response', description=None, required=True, type='text', use_markdown=True)]
   guidelines=This is a supervised fine-tuning dataset that contains instructions. Please write the response to the instruction in 

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=7bfcde5e-1dbc-44fc-ab14-2c674e894719
   name=ds_completion
   workspace=Workspace(id=52aa8bc6-7ef7-435d-9b0e-ea6a185b23ce, name=user_4, inserted_at=2024-01-17 16:21:43.009138, updated_at=2024-01-17 16:21:43.009138)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/7bfcde5e-1dbc-44fc-ab14-2c674e894719/annotation-mode
   fields=[RemoteTextField(id=UUID('8b11d2d8-8f22-4d94-a32f-4c7228b15723'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True)]
   questions=[RemoteTextQuestion(id=UUID('a6a42992-767c-4245-9152-64469877010f'), client=None, name='response', title='Response', description=None, required=True, type='text', use_markdown=True)]
   guidelines=This is a supervised fine-tuning dataset that contains instructions. Please write the response to the instruction in 

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=9dd04e6f-79da-4efa-949b-4598665a9aa0
   name=ds_completion
   workspace=Workspace(id=5010a6fc-cc48-4aa7-b1ad-55bba790f850, name=user_5, inserted_at=2024-01-17 16:21:44.100188, updated_at=2024-01-17 16:21:44.100188)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/9dd04e6f-79da-4efa-949b-4598665a9aa0/annotation-mode
   fields=[RemoteTextField(id=UUID('f0739e3a-ace2-4ce5-ad2c-438986439334'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True)]
   questions=[RemoteTextQuestion(id=UUID('6eac67f2-4f65-4498-834f-2425206b3a27'), client=None, name='response', title='Response', description=None, required=True, type='text', use_markdown=True)]
   guidelines=This is a supervised fine-tuning dataset that contains instructions. Please write the response to the instruction in 

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=84f48889-4393-4a8e-8760-27cc7f836763
   name=ds_completion
   workspace=Workspace(id=2468d93d-db1c-47a5-9cc7-eb1f83fae405, name=user_6, inserted_at=2024-01-17 16:21:45.145377, updated_at=2024-01-17 16:21:45.145377)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/84f48889-4393-4a8e-8760-27cc7f836763/annotation-mode
   fields=[RemoteTextField(id=UUID('182bb160-1e74-47c9-a307-f609f2842edb'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True)]
   questions=[RemoteTextQuestion(id=UUID('c6418d3b-8279-47fa-91e1-4acd74616e56'), client=None, name='response', title='Response', description=None, required=True, type='text', use_markdown=True)]
   guidelines=This is a supervised fine-tuning dataset that contains instructions. Please write the response to the instruction in 

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=ef345b0c-e582-40b7-8f48-3b9ccce7e215
   name=ds_completion
   workspace=Workspace(id=44311207-6a1e-4fe1-a114-3ee1c831a8b6, name=user_7, inserted_at=2024-01-17 16:21:46.236750, updated_at=2024-01-17 16:21:46.236750)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/ef345b0c-e582-40b7-8f48-3b9ccce7e215/annotation-mode
   fields=[RemoteTextField(id=UUID('6a293bb7-936b-47ee-aebc-047632d4ddcc'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True)]
   questions=[RemoteTextQuestion(id=UUID('12c30b77-6ce4-480f-a8b8-8d314cfe2d36'), client=None, name='response', title='Response', description=None, required=True, type='text', use_markdown=True)]
   guidelines=This is a supervised fine-tuning dataset that contains instructions. Please write the response to the instruction in 

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=7ce097f7-3dee-448a-9ef4-0961017f6867
   name=ds_completion
   workspace=Workspace(id=a550b4bf-f95f-4cab-bb12-0e1f1b0df757, name=user_8, inserted_at=2024-01-17 16:21:47.271249, updated_at=2024-01-17 16:21:47.271249)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/7ce097f7-3dee-448a-9ef4-0961017f6867/annotation-mode
   fields=[RemoteTextField(id=UUID('d5f37e2a-3bd8-4fa0-a117-9f1dad87b919'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True)]
   questions=[RemoteTextQuestion(id=UUID('a217f3d1-ad21-496b-ac20-a7f5cfc76e84'), client=None, name='response', title='Response', description=None, required=True, type='text', use_markdown=True)]
   guidelines=This is a supervised fine-tuning dataset that contains instructions. Please write the response to the instruction in 

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=49f0a983-6ce1-4fad-bcde-335b378a423c
   name=ds_completion
   workspace=Workspace(id=c8fea970-52b1-492d-83e7-f15c8304e1df, name=user_9, inserted_at=2024-01-17 16:21:48.309045, updated_at=2024-01-17 16:21:48.309045)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/49f0a983-6ce1-4fad-bcde-335b378a423c/annotation-mode
   fields=[RemoteTextField(id=UUID('caa413ba-089d-4996-9b56-55e02121e367'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True)]
   questions=[RemoteTextQuestion(id=UUID('ee2e4338-987d-45fc-a74f-d659b8d983e9'), client=None, name='response', title='Response', description=None, required=True, type='text', use_markdown=True)]
   guidelines=This is a supervised fine-tuning dataset that contains instructions. Please write the response to the instruction in 

In [None]:
for username, records in assignments_preference.items():
    # remove a dataset left over from a previous run in this user's workspace
    # (not the shared "admin" one, which holds the original records)
    try:
        rg.FeedbackDataset.from_argilla(name="ds_preference", workspace=username).delete()
    except Exception:
        pass
    user_dataset = rg.FeedbackDataset.for_direct_preference_optimization(
        number_of_responses=2,
        context=False,
        use_markdown=True,
        guidelines=None,
        metadata_properties=None,
        vectors_settings=None,
    )
    user_dataset.add_records(records)
    remote_dataset = user_dataset.push_to_argilla(name="ds_preference", workspace=username)

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=efa22a30-cbd0-4d77-af32-7b66696a0b40
   name=ds_preference
   workspace=Workspace(id=b8698c15-47e9-4170-a984-7037523e1ad0, name=user_0, inserted_at=2024-01-17 16:21:38.551906, updated_at=2024-01-17 16:21:38.551906)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/efa22a30-cbd0-4d77-af32-7b66696a0b40/annotation-mode
   fields=[RemoteTextField(id=UUID('6a4a873c-c232-42fe-b908-eaa6862ab590'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('cd9b2294-3ee5-439f-a945-02a79a5778d2'), client=None, name='response1', title='Response 1', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('adad971d-fd25-4fb2-8e50-a523c6cbb067'), client=None, name='response2', title='Response 2', required=False, type='text', use_markdown=True)

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=543b2e60-6e9d-4829-a10a-d9b117980d3e
   name=ds_preference
   workspace=Workspace(id=b18dab38-b933-4858-9d21-4c2a7f5bf491, name=user_1, inserted_at=2024-01-17 16:21:39.615749, updated_at=2024-01-17 16:21:39.615749)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/543b2e60-6e9d-4829-a10a-d9b117980d3e/annotation-mode
   fields=[RemoteTextField(id=UUID('14e7609e-591c-4dcd-95c9-17a7f1332697'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('349ac298-0026-437f-bc3f-ff132a6ed4f4'), client=None, name='response1', title='Response 1', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('a8e9234b-bb18-4a07-bf72-640930de0363'), client=None, name='response2', title='Response 2', required=False, type='text', use_markdown=True)

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=f10003a4-7a10-4cb2-939b-07a094ba21ed
   name=ds_preference
   workspace=Workspace(id=1e75dc1c-639d-44bd-a35f-38bf34800cbe, name=user_2, inserted_at=2024-01-17 16:21:40.773495, updated_at=2024-01-17 16:21:40.773495)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/f10003a4-7a10-4cb2-939b-07a094ba21ed/annotation-mode
   fields=[RemoteTextField(id=UUID('d30b97ab-e20a-44e7-a6a0-aa3c0b0b3072'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('14c3f1f5-4257-4fff-b4ce-fc1a18cb5a31'), client=None, name='response1', title='Response 1', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('b5a3a83f-dadd-47a8-94a7-6925fc133066'), client=None, name='response2', title='Response 2', required=False, type='text', use_markdown=True)

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=41b38a97-5376-467e-aed2-7f6d988d35dc
   name=ds_preference
   workspace=Workspace(id=05fd306a-2bd1-423c-954d-8f55f1e83a84, name=user_3, inserted_at=2024-01-17 16:21:41.790132, updated_at=2024-01-17 16:21:41.790132)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/41b38a97-5376-467e-aed2-7f6d988d35dc/annotation-mode
   fields=[RemoteTextField(id=UUID('538a95f6-cd54-4cd4-8fa4-eec9a940f870'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('ea877728-54f7-44c8-827c-c47120d68b16'), client=None, name='response1', title='Response 1', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('969f58ca-8207-4124-b92a-681c9070f202'), client=None, name='response2', title='Response 2', required=False, type='text', use_markdown=True)

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=1a21b929-e9d7-41d1-b592-5aa0061dec9a
   name=ds_preference
   workspace=Workspace(id=52aa8bc6-7ef7-435d-9b0e-ea6a185b23ce, name=user_4, inserted_at=2024-01-17 16:21:43.009138, updated_at=2024-01-17 16:21:43.009138)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/1a21b929-e9d7-41d1-b592-5aa0061dec9a/annotation-mode
   fields=[RemoteTextField(id=UUID('4bbfacf2-7f71-4ade-97d5-3894a074f98e'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('4d448b74-b277-4e06-b4c9-8c4a18d205f7'), client=None, name='response1', title='Response 1', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('d97bbcb7-14dd-4726-abee-08fc6bd1f86e'), client=None, name='response2', title='Response 2', required=False, type='text', use_markdown=True)

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=c2fcbc1c-25c5-44e0-978d-b9c95257efc6
   name=ds_preference
   workspace=Workspace(id=5010a6fc-cc48-4aa7-b1ad-55bba790f850, name=user_5, inserted_at=2024-01-17 16:21:44.100188, updated_at=2024-01-17 16:21:44.100188)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/c2fcbc1c-25c5-44e0-978d-b9c95257efc6/annotation-mode
   fields=[RemoteTextField(id=UUID('9c4e31c5-90f3-42f5-8421-172db5d1f74a'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('e91ed0cc-fa56-452e-8066-7be3ba0e24c8'), client=None, name='response1', title='Response 1', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('073af530-b361-403d-8982-06327f0ec1a0'), client=None, name='response2', title='Response 2', required=False, type='text', use_markdown=True)

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=1e89bb95-e4e0-434d-bc17-e1ae8af3404b
   name=ds_preference
   workspace=Workspace(id=2468d93d-db1c-47a5-9cc7-eb1f83fae405, name=user_6, inserted_at=2024-01-17 16:21:45.145377, updated_at=2024-01-17 16:21:45.145377)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/1e89bb95-e4e0-434d-bc17-e1ae8af3404b/annotation-mode
   fields=[RemoteTextField(id=UUID('5b7031b9-1e04-4118-aff3-980eeefaf5e3'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('303e3f13-e502-4c01-9156-0b666fb232bf'), client=None, name='response1', title='Response 1', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('f6dd7e8c-2303-49c8-8475-3c4370cc7602'), client=None, name='response2', title='Response 2', required=False, type='text', use_markdown=True)

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=ac5e10d4-115d-4953-8b0d-886098aa9546
   name=ds_preference
   workspace=Workspace(id=44311207-6a1e-4fe1-a114-3ee1c831a8b6, name=user_7, inserted_at=2024-01-17 16:21:46.236750, updated_at=2024-01-17 16:21:46.236750)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/ac5e10d4-115d-4953-8b0d-886098aa9546/annotation-mode
   fields=[RemoteTextField(id=UUID('931e68f0-ac5e-40c8-bd9d-055d3b0ad10e'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('df522760-4f88-4765-8e3a-33bab03d1079'), client=None, name='response1', title='Response 1', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('74255557-3caf-477b-9c4e-4e204b02ce03'), client=None, name='response2', title='Response 2', required=False, type='text', use_markdown=True)

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=e812697e-7c5f-40c5-af7e-91d05c6cd75b
   name=ds_preference
   workspace=Workspace(id=a550b4bf-f95f-4cab-bb12-0e1f1b0df757, name=user_8, inserted_at=2024-01-17 16:21:47.271249, updated_at=2024-01-17 16:21:47.271249)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/e812697e-7c5f-40c5-af7e-91d05c6cd75b/annotation-mode
   fields=[RemoteTextField(id=UUID('83a2a782-81e3-46f9-8509-28b1ac02f43f'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('f16637ce-7868-4d90-9635-5c210fb25c70'), client=None, name='response1', title='Response 1', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('b6773d3a-507c-4cb6-8e22-de3b83598d92'), client=None, name='response2', title='Response 2', required=False, type='text', use_markdown=True)

INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully pushed to Argilla
INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset(
   id=ef0c3ba8-0673-4f6d-ae5f-bd83886a4ce6
   name=ds_preference
   workspace=Workspace(id=c8fea970-52b1-492d-83e7-f15c8304e1df, name=user_9, inserted_at=2024-01-17 16:21:48.309045, updated_at=2024-01-17 16:21:48.309045)
   url=https://davidberenstein1957-deeplearning-ai.hf.space/dataset/ef0c3ba8-0673-4f6d-ae5f-bd83886a4ce6/annotation-mode
   fields=[RemoteTextField(id=UUID('9e69c5b0-ade7-4cff-b12d-ff4c6b65aefe'), client=None, name='prompt', title='Prompt', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('a106ec68-42f7-4b5e-a058-2455a8dd5554'), client=None, name='response1', title='Response 1', required=True, type='text', use_markdown=True), RemoteTextField(id=UUID('33c736ac-bd40-4184-9480-31bf3d97f002'), client=None, name='response2', title='Response 2', required=False, type='text', use_markdown=True)

Now you can share the datasets with your annotators and, as an `owner`, manage the process and results by opening any of the dataset URLs listed above.

# Next steps

## Interesting resources

- [Ollama](https://ollama.ai/) to get up and running with large language models locally. Don't forget to check out our [Notus blog post](https://argilla.io/blog/notus7b/) and [model](https://ollama.ai/argilla/notus) on Ollama.
- [TRL](https://github.com/lvwerra/trl) is a full-stack library that provides a set of tools to train transformer language models.
- [bitsandbytes](https://www.google.com/search?client=firefox-b-d&q=eli5+bits+and+bytes) allows users to run models in 4-bit precision.
- [LoRA](https://www.reddit.com/r/MachineLearning/comments/13m78u6/d_an_eli5_explanation_for_lora_lowrank_adaptation/) reduces the computational burden and memory requirements by fine-tuning a small set of additional parameters.
- [TheBloke](https://huggingface.co/TheBloke) for wonderful LLM quantisation and fine-tuning.

## Shameless marketing

### Personal

- [LinkedIn](https://www.linkedin.com/in/david-berenstein-1bab11105/)
- [Twitter](https://twitter.com/davidbstein1957)
- [GitHub](https://github.com/davidberenstein1957)

### Company

- [Argilla Github](https://github.com/argilla-io/argilla)
- [Distilabel Github](https://github.com/argilla-io/distilabel)
- [Argilla Slack Community](https://join.slack.com/t/rubrixworkspace/shared_invite/zt-whigkyjn-a3IUJLD7gDbTZ0rKlvcJ5g)
- [Bi-weekly NLP community meetup](https://lu.ma/d720wy9f)
- [Prolific](https://www.prolific.com/)