Official Python SDK for the onetask API. Leverage your labeling with Weak Supervision and Active Learning.
You can use pip to install the library:
$ pip install onetask
Alternatively, you can just clone the repository and run the setup.py script:
$ python setup.py install
Before making requests to the API, you need to create an instance of the onetask client. You will need your account's API token and the ID of your project:
import onetask
from onetask import Client
# Instantiate the client using your API token and project ID
api_token = '<YOUR API TOKEN HERE>'
project_id = '<YOUR PROJECT ID HERE>'
client = Client(api_token=api_token, project_id=project_id)
# if you print the client, you will receive some further instructions
print(client)
There are several ways to start using our SDK. We'll show them in the following order:
- Fetching Sample Records
- Writing Labeling Functions
- Testing Labeling Functions Locally
- Registering Labeling Functions
- Fetching Labeling Functions
max_number_samples = 100 # default value
record_list = client.get_sample_records(max_number_samples=max_number_samples)
Once you have correctly instantiated your Client, you can access the record and labeling function endpoints. Please always ensure that your labeling functions:
- return label names that also exist in your project definition
- have exactly one parameter, since we execute labeling functions on a per-record basis
- only import libraries that are on our whitelist. If you need a library that we have not yet whitelisted, feel free to reach out to us.
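The first two rules can be checked locally before anything is sent to the API. The following is a minimal sketch using only the Python standard library. The helper name check_lf, the example labeling function and the label set are hypothetical, and the record layout (record['attributes']['headline']) is assumed from the examples in this README. Because the second example labeling function below can return None when its rule does not fire, the sketch accepts None as "no label" (an assumption about how the platform treats it).

```python
import inspect
from typing import Any, Callable, Dict, Iterable, List, Optional

Record = Dict[str, Any]


def check_lf(
    lf: Callable[[Record], Optional[str]],
    allowed_labels: Iterable[str],
    sample_records: Iterable[Record],
) -> List[str]:
    """Hypothetical helper: return rule violations found locally (empty list if none)."""
    problems: List[str] = []
    labels = set(allowed_labels)

    # Rule: exactly one parameter, because labeling functions run per record.
    n_params = len(inspect.signature(lf).parameters)
    if n_params != 1:
        problems.append(f"{lf.__name__} takes {n_params} parameters, expected exactly 1")
        return problems  # calling it with a single record would fail anyway

    # Rule: every returned label must exist in the project definition.
    # None is accepted as "no label" (an assumption, see above).
    for record in sample_records:
        label = lf(record)
        if label is not None and label not in labels:
            problems.append(f"{lf.__name__} returned unknown label {label!r}")
    return problems


# Usage with a hypothetical labeling function and hand-written sample records.
def headline_is_question(record):
    if record['attributes']['headline'].endswith('?'):
        return 'Clickbait'
    return 'Regular'


samples = [
    {'attributes': {'headline': 'Is this the best phone ever?'}},
    {'attributes': {'headline': 'Markets close higher'}},
]
print(check_lf(headline_is_question, ['Clickbait', 'Regular'], samples))  # → []
```

You could also pass the records returned by client.get_sample_records, provided they have the dictionary shape assumed here. The whitelist rule is not checked, since the list of whitelisted libraries is not part of this README.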
The most straightforward way to create a labeling function is to write a plain Python function that takes a single record and wrap it with onetask.build_lf:
def my_first_lf(record):
    """
    Checks whether a headline contains clickbait-like terms.
    """
    clickbait_terms = [
        'should', 'reasons', 'you', #...
    ]
    headline = record['attributes']['headline'].lower()
    for clickbait_term in clickbait_terms:
        if clickbait_term in headline:
            return 'Clickbait'
    return 'Regular'
lf_first = onetask.build_lf(my_first_lf)
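Because a labeling function is an ordinary Python function of one record, you can also call it directly on a hand-written record. The sketch below assumes the record['attributes']['headline'] layout used in these examples and only uses the three terms shown above. It illustrates a pitfall of substring matching: 'you' in headline is also true for "young". The word-level variant my_first_lf_words and the helper contains_term_substring are hypothetical names, not part of the SDK; matching whole words with re.findall is one possible alternative, not the method used above.

```python
import re

CLICKBAIT_TERMS = ['should', 'reasons', 'you']  # the terms shown above; extend as needed


def contains_term_substring(headline):
    # Same check as my_first_lf: plain substring matching.
    headline = headline.lower()
    return any(term in headline for term in CLICKBAIT_TERMS)


def my_first_lf_words(record):
    """
    Hypothetical variant of my_first_lf that only matches whole words.
    """
    headline = record['attributes']['headline'].lower()
    words = set(re.findall(r"\w+", headline))  # split into word tokens
    if words.intersection(CLICKBAIT_TERMS):
        return 'Clickbait'
    return 'Regular'


record = {'attributes': {'headline': 'Young founders raise new funding'}}
print(contains_term_substring(record['attributes']['headline']))  # True: 'you' is inside 'young'
print(my_first_lf_words(record))  # Regular
```

The second labeling function follows the same pattern, this time using a regular expression: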
def my_second_lf(record):
    """
    Checks whether a headline starts with a number of at least two digits.
    """
    import re # standard regular expressions library
    pattern = r"^[1-9][0-9]" # two digits at the start of the string, the first one non-zero
    headline = record['attributes']['headline'].lower()
    if re.match(pattern, headline):
        return 'Clickbait'
    # no match: the function implicitly returns None
lf_second = onetask.build_lf(my_second_lf)
Before you register your labeling functions, you can run them on your local machine, e.g. to ensure syntactic correctness.
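You can also sanity-check the pattern logic in plain Python without the SDK. The sketch below re-creates my_second_lf and applies it to a few hand-written records. Note that ^[1-9][0-9] requires a non-zero digit followed by another digit at the very start, so "10 ways" matches while "7 ways" and "05 tips" do not, and a year such as "2024" also matches. The helper run_locally and its (index, label) output are hypothetical and are not the output format of the SDK's execute method.

```python
import re
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]


def my_second_lf(record: Record) -> Optional[str]:
    """Same logic as above: headline starts with a two-digit number (no leading zero)."""
    pattern = r"^[1-9][0-9]"
    headline = record['attributes']['headline'].lower()
    if re.match(pattern, headline):
        return 'Clickbait'
    return None  # no match: no label


def run_locally(lf: Callable[[Record], Optional[str]], records: List[Record]) -> List[Tuple[int, str]]:
    """Hypothetical stand-in: (index, label) for every record the function labels."""
    hits = []
    for index, record in enumerate(records):
        label = lf(record)
        if label is not None:
            hits.append((index, label))
    return hits


records = [
    {'attributes': {'headline': '10 ways to save money'}},
    {'attributes': {'headline': '7 ways to save money'}},
    {'attributes': {'headline': '05 tips for beginners'}},
    {'attributes': {'headline': '2024 budget approved'}},
]
print(run_locally(my_second_lf, records))  # [(0, 'Clickbait'), (3, 'Clickbait')]
```

With the SDK, local execution of the wrapped labeling functions looks like this: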
record_hit_list_first = lf_first.execute(record_list) # this runs locally
print(record_hit_list_first)
record_hit_list_second = lf_second.execute(record_list) # this runs locally
print(record_hit_list_second)
Once you have defined (and optionally tested) your labeling functions, you can register them with your project. After registration, each labeling function receives an internal ID, which can be used to fetch it back.
client.register_lf(lf_first)
print(lf_first.internal_id)
client.register_lf(lf_second)
print(lf_second.internal_id)
You can always fetch your registered labeling functions:
lf_list = client.get_all_lfs()
If you have any further questions that are not covered by this README or our documentation, please do not hesitate to contact us directly.