# Creating HAWC bioassay data via the HAWC client

This notebook demonstrates using the HAWC client to programmatically create animal bioassay endpoints.

Make sure the `hawc_client` is installed. If you need to install it:

```bash
pip install -U hawc_client
```

Then, we'll create all the components for an assay, from experiment to individual endpoint. This assumes you have write permission and that a HAWC study ready for extraction has been created. We also show all possible options for fields in HAWC; in many cases fields are optional with reasonable defaults, so you may not need to specify all the options as we do in these examples.

In [1]:
from getpass import getpass

import pandas as pd

from hawc_client import HawcClient

First, set up a HAWC client instance and authenticate with your email and password:

In [2]:
client = HawcClient('https://hawcproject.org')
client.authenticate(email='webmaster@hawcproject.org', password=getpass())
assessment_id = 100500210

········


## Importing a reference and getting ready for extraction

1. Import a reference from HERO
2. Map the HERO reference ID to the HAWC reference ID
3. Create a "Study" in HAWC so it's ready for extraction

If you already have studies in HAWC ready for extraction, you can skip this step.

In [3]:
# add a new reference
hero_id = 4322522

response = client.lit.import_hero(
    assessment_id,
    title="import title",
    description="import description",
    ids=[hero_id]
)
response

{'assessment': 100500210,
 'search_type': 'i',
 'source': 2,
 'title': 'import title',
 'slug': 'import-title',
 'description': 'import description',
 'search_string': '4322522',
 'created': '2020-07-17T14:33:31.830202-05:00',
 'last_updated': '2020-07-17T14:33:31.830230-05:00'}

In [4]:
# get HAWC reference ID mapping
references = client.lit.reference_ids(assessment_id)
hawc_reference_id = int(references.query(f'`hero_id` == {hero_id}').reference_id.iloc[0])
hawc_reference_id

100798297

## HAWC metadata

There's a large amount of metadata required in HAWC to build all the required components of a bioassay experiment, and without reading the code it can be daunting to determine which enumerated term to put where. Therefore, we request metadata for all elements required to create a bioassay experiment with a single request.

In [5]:
metadata = client.animal.metadata()
metadata.keys()

dict_keys(['study', 'experiment', 'animal_group', 'dosing_regime', 'dose_group', 'endpoint'])

For example, to get the possible values for the COI (conflict of interest) field in a study:

In [6]:
pd.Series(metadata['study']['coi_reported'], name='coi_item').to_frame().reset_index()

| | index | coi_item |
|---|---|---|
| 0 | 4 | --- |
| 1 | 0 | Authors report they have no COI |
| 2 | 1 | Authors disclosed COI |
| 3 | 5 | Not reported; no COI is inferred based on auth... |
| 4 | 6 | Not reported; a COI is inferred based on autho... |
| 5 | 3 | Not reported |
| 6 | 2 | Unknown |


Or exploring values for an experiment type:

In [7]:
metadata['experiment']['type']

{'Ac': 'Acute (<24 hr)',
 'St': 'Short-term (1-30 days)',
 'Sb': 'Subchronic (30-90 days)',
 'Ch': 'Chronic (>90 days)',
 'Ca': 'Cancer',
 'Me': 'Mechanistic',
 'Rp': 'Reproductive',
 '1r': '1-generation reproductive',
 '2r': '2-generation reproductive',
 'Dv': 'Developmental',
 'Ot': 'Other',
 'NR': 'Not-reported'}
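
If your extraction spreadsheet stores the human-readable label rather than the two-letter code, a reverse lookup over this mapping can help. The sketch below is not part of `hawc_client`: `code_for_label` is a hypothetical helper, and `EXPERIMENT_TYPES` copies a few entries from the output above so the example runs without a HAWC connection.

```python
# A few entries copied from metadata['experiment']['type'] above so the
# example runs offline; in practice, pass the live metadata mapping instead.
EXPERIMENT_TYPES = {
    "Ac": "Acute (<24 hr)",
    "St": "Short-term (1-30 days)",
    "Sb": "Subchronic (30-90 days)",
    "Ch": "Chronic (>90 days)",
}


def code_for_label(choices, text):
    """Return the single code whose label contains `text` (case-insensitive).

    Hypothetical helper, not part of hawc_client.
    """
    needle = text.lower()
    matches = [code for code, label in choices.items() if needle in label.lower()]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match for {text!r}, found {matches}")
    return matches[0]


experiment_type = code_for_label(EXPERIMENT_TYPES, "short-term")
print(experiment_type)
```

Requiring exactly one match is a deliberate choice: an ambiguous search such as `"chronic"` (which matches both "Subchronic" and "Chronic") raises an error instead of silently picking one code.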

Or get the species ID for a given name:

In [8]:
species = pd.DataFrame(data=metadata['animal_group']['species']).set_index('id')
strain = pd.DataFrame(data=metadata['animal_group']['strains']).set_index('id')

species.query('name.str.contains("Mouse")', engine='python')

| id | name |
|---|---|
| 2 | Mouse |


Thus, we know to use ID 2 for a mouse experiment.

In [9]:
strain.query('species_id==2').head(20)

| id | species_id | name |
|---|---|---|
| 4 | 2 | B6C3F1 |
| 6 | 2 | C57BL/6 |
| 7 | 2 | BALB/c |
| 12 | 2 | Other |
| 13 | 2 | OF-1 |
| 15 | 2 | Crj:BDF1 |
| 16 | 2 | ERβ -/- (C57) |
| 17 | 2 | ERβ WT (C57) |
| 18 | 2 | C57 |
| 19 | 2 | Swiss albino OF1 |


The query above lists the available mouse strains; for example, C57BL/6 has ID 6.
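
The two lookups can also be combined into one step that resolves a species name and a strain name to their IDs. This is a minimal sketch, assuming the species and strain metadata are lists of records with `id`, `name`, and (for strains) `species_id` keys, as the tables above suggest. `find_strain_id` is a hypothetical helper, not part of `hawc_client`, and the sample records are copied from the output above.

```python
# Sample records shaped like the tables above (assumed to be lists of dicts);
# in practice, use metadata['animal_group']['species'] and ['strains'].
SPECIES = [{"id": 2, "name": "Mouse"}]
STRAINS = [
    {"id": 4, "species_id": 2, "name": "B6C3F1"},
    {"id": 6, "species_id": 2, "name": "C57BL/6"},
    {"id": 7, "species_id": 2, "name": "BALB/c"},
]


def find_strain_id(species, strains, species_name, strain_name):
    """Return (species_id, strain_id) for exact, case-insensitive names.

    Hypothetical helper, not part of hawc_client.
    """
    species_ids = [s["id"] for s in species if s["name"].lower() == species_name.lower()]
    if len(species_ids) != 1:
        raise LookupError(f"species {species_name!r} matched {len(species_ids)} records")
    species_id = species_ids[0]
    strain_ids = [
        s["id"]
        for s in strains
        if s["species_id"] == species_id and s["name"].lower() == strain_name.lower()
    ]
    if len(strain_ids) != 1:
        raise LookupError(f"strain {strain_name!r} matched {len(strain_ids)} records")
    return species_id, strain_ids[0]


species_id, strain_id = find_strain_id(SPECIES, STRAINS, "mouse", "c57bl/6")
print(species_id, strain_id)
```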

## Converting a HAWC reference to a Study for data extraction

In [5]:
# create a new study from that reference
data = dict(
    bioassay=True,
    epi=False,
    epi_meta=False,
    in_vitro=False,
    coi_reported=3,
    coi_details="",
    funding_source='Acme industries',
    study_identifier="4322522",
    contact_author=False,
    ask_author="",
    published=True,
    summary="",
    editable=True,
)
study = client.study.create(
    reference_id=hawc_reference_id,
    short_citation="York, 2003, 4322522",
    full_citation="York RG. 2003. Oral (gavage) dosage-range developmental toxicity study of potassium perfluorobutane sulfonate (PFBS) in rats.",
    data=data    
)
study['id']

100798297

## Creating experiments

Next, we'll create an experiment. There are a number of metadata options available, and at this point the only way to determine which options are available is by reading the HAWC model source code. Hopefully we'll have useful utilities in the future that provide more detail on the metadata.

Feel free to [contact us](https://hawcproject.org/contact/) if you get stuck.

We'll create an experiment:


In [6]:
data = dict(
    study_id=study['id'],     
    name="30 day oral",     
    type="St",
    has_multiple_generations=False,
    chemical="2,3,7,8-Tetrachlorodibenzo-P-dioxin",
    cas="1746-01-6",
    dtxsid="DTXSID6026296",
    chemical_source="ABC Inc.",
    purity_available=True,
    purity_qualifier="≥",
    purity=99.9,
    vehicle="DMSO",
    guideline_compliance="not reported",
    description="Details here."
)
experiment = client.animal.create_experiment(data)
experiment['id']

100500580

## Creating animal groups and dosing regimes

Now that we've created an experiment, we can associate an animal-group and dosing regime with the experiment.

In many cases, the animal-groups being observed will have an associated dosing regime applied.

In [7]:
data = dict(
    experiment_id=experiment['id'],     
    name="Female C57BL/6 Mice",
    species=2,
    strain=6,
    sex="F",
    animal_source="Charles River",
    lifestage_exposed="Adult",
    lifestage_assessed="Adult",
    generation="",
    comments="Detailed comments here",
    diet="...",
    dosing_regime=dict(
        route_of_exposure= "OR",
        duration_exposure=30,
        duration_exposure_text="30 days",
        duration_observation=180,
        num_dose_groups=3,
        positive_control=True,
        negative_control="VT",
        description="...",
        doses = [
            {"dose_group_id": 0, "dose": 0, "dose_units_id": 1},
            {"dose_group_id": 1, "dose": 50, "dose_units_id": 1},
            {"dose_group_id": 2, "dose": 100, "dose_units_id": 1},
            {"dose_group_id": 0, "dose": 0, "dose_units_id": 2},
            {"dose_group_id": 1, "dose": 3.7, "dose_units_id": 2},
            {"dose_group_id": 2, "dose": 11.4, "dose_units_id": 2},
        ],
    )
)
animal_group1 = client.animal.create_animal_group(data)
animal_group_id = animal_group1['id']
dosing_regime_id = animal_group1['dosing_regime']['id']
animal_group_id

100501313
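
The `doses` list above repeats every dose group once per dose unit, which is easy to get wrong by hand. As a sketch, the flat list could be generated from a mapping of `dose_units_id` to the doses in that unit. `build_doses` is a hypothetical helper, not part of `hawc_client`, and it assumes (as in the example above) that each dose group has one value in every unit.

```python
def build_doses(doses_by_units):
    """Expand {dose_units_id: [dose, ...]} into the flat `doses` list.

    Hypothetical helper, not part of hawc_client. Assumes every unit supplies
    the same number of doses, one per dose group, as in the example above.
    """
    lengths = {len(values) for values in doses_by_units.values()}
    if len(lengths) != 1:
        raise ValueError(f"dose lists differ in length: {sorted(lengths)}")
    return [
        {"dose_group_id": group_id, "dose": dose, "dose_units_id": units_id}
        for units_id, values in doses_by_units.items()
        for group_id, dose in enumerate(values)
    ]


# Same values as the dosing regime above: units 1 and 2, three dose groups.
doses_by_units = {1: [0, 50, 100], 2: [0, 3.7, 11.4]}
doses = build_doses(doses_by_units)
num_dose_groups = len(doses) // len(doses_by_units)
print(num_dose_groups, doses[4])
```

The result can be passed as `doses` in the `dosing_regime` dictionary, with `num_dose_groups` derived from the same input so the two stay consistent.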

However, for developmental/reproductive studies, you can also specify a dosing regime which was applied to another group: 

In [9]:
data = dict(
    experiment_id=experiment['id'],     
    name="F1 Male/Female C57BL/6 Mice",
    species=2,
    strain=6,
    sex="C",
    parent_ids=[animal_group_id],
    siblings_id=animal_group_id,
    animal_source="Charles River",
    lifestage_exposed="Adult",
    lifestage_assessed="Adult",
    dosing_regime_id=dosing_regime_id,
    generation="F1",
    comments="Detailed comments here",
    diet="...",
)
animal_group2 = client.animal.create_animal_group(data)
animal_group2['id']

100501314

## Creating endpoints

There are many options for creating an endpoint:

In [10]:
data = dict(
    animal_group_id=animal_group_id,
    name='Relative liver weight', # or name_term:int from controlled vocabulary
    system='Hepatic', # or system_term:int from controlled vocabulary
    organ="Liver", # or organ_term:int from controlled vocabulary
    effect="Organ weight", # or effect_term:int from controlled vocabulary
    effect_subtype="Relative weight", # or effect_subtype_term:int from controlled vocabulary
    litter_effects="NA",
    litter_effect_notes="",
    observation_time=104,
    observation_time_units=5,
    observation_time_text="104 weeks",
    data_location="Figure 2B",
    expected_adversity_direction=3,
    response_units="g/100g BW",
    data_type="C",
    variance_type=1,
    confidence_interval=0.95,
    NOEL=1,  # should be the corresponding dose_group_id below or -999
    LOEL=2,  # should be the corresponding dose_group_id below or -999
    FEL=-999,  # should be the corresponding dose_group_id below or -999
    data_reported=True,
    data_extracted=True,
    values_estimated=False,
    monotonicity=8,
    statistical_test="ANOVA + Dunnett's test",
    trend_value=0.0123,
    trend_result=2,
    diagnostic="...",
    power_notes="...",
    results_notes="...",
    endpoint_notes="...",
    groups=[
        dict(
            dose_group_id=0,
            n=10,
            incidence=None,
            response=4.35,
            variance=0.29,
            significant=False,
            significance_level=None,
        ),
        dict(
            dose_group_id=1,
            n=10,
            incidence=None,
            response=5.81,
            variance=0.47,
            significant=False,
            significance_level=None,
        ),
        dict(
            dose_group_id=2,
            n=10,
            incidence=None,
            response=7.72,
            variance=0.63,
            significant=True,
            significance_level=0.035,
        )
    ],
)
endpoint = client.animal.create_endpoint(data)
endpoint["id"]

100513489
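
Because `NOEL`, `LOEL`, and `FEL` must each be either a `dose_group_id` from `groups` or -999, a quick local check before calling `create_endpoint` can catch mismatches early. The sketch below is a hypothetical pre-flight check, not part of `hawc_client` and not a substitute for HAWC's own validation; `check_endpoint_groups` and the trimmed `payload` are illustrative only.

```python
NOT_APPLICABLE = -999  # sentinel used in the example above


def check_endpoint_groups(data):
    """Return a list of problems found in an endpoint payload (empty if none).

    Hypothetical pre-flight check, not part of hawc_client.
    """
    group_ids = [group["dose_group_id"] for group in data.get("groups", [])]
    problems = []
    if len(group_ids) != len(set(group_ids)):
        problems.append("duplicate dose_group_id in groups")
    for field in ("NOEL", "LOEL", "FEL"):
        value = data.get(field, NOT_APPLICABLE)
        if value != NOT_APPLICABLE and value not in group_ids:
            problems.append(f"{field}={value} is not a dose_group_id in groups")
    return problems


# Trimmed payload with a deliberately wrong LOEL to show the check firing.
payload = {
    "NOEL": 1,
    "LOEL": 5,
    "FEL": -999,
    "groups": [{"dose_group_id": 0}, {"dose_group_id": 1}, {"dose_group_id": 2}],
}
problems = check_endpoint_groups(payload)
print(problems)
```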

To create an endpoint with no response data:

In [12]:
data = dict(
    animal_group_id=animal_group_id,
    name='Relative liver weight',
    system='Hepatic',
    organ="Liver",
    effect="Organ weight",
    effect_subtype="Relative weight",
    litter_effects="NA",
    litter_effect_notes="",
    observation_time=104,
    observation_time_units=5,
    observation_time_text="104 weeks",
    data_location="Figure 2B",
    expected_adversity_direction=3,
    response_units="g/100g BW",
    data_type="NR",
    variance_type=3,
    data_reported=True,
    data_extracted=False,
    values_estimated=False,
    diagnostic="...",
    power_notes="...",
    results_notes="...",
    endpoint_notes="...",    
)
endpoint = client.animal.create_endpoint(data)
endpoint["id"]

100513491