# Literature - tagging references

This notebook demonstrates how to use the HAWC client to tag references in HAWC.

Make sure `hawc_client` is installed. If it is not, install it with:

```bash
pip install -U hawc_client
```


In [1]:
from getpass import getpass

from hawc_client import HawcClient
import pandas as pd

First, set up a HAWC client instance and authenticate with your email and password:

In [2]:
client = HawcClient('https://hawcproject.org')
client.authenticate(email='webmaster@hawcproject.org', password=getpass())
assessment_id = 100500210

········


## Setup

### Adding some references

Let's import some new references from HERO to work with:

In [3]:
response = client.lit.import_hero(
    assessment_id,
    title="reference import",
    description="description of what was imported ...",
    ids=[1037869, 1040933, 1054799]
)
response

{'assessment': 100500210,
 'search_type': 'i',
 'source': 2,
 'title': 'reference import',
 'slug': 'reference-import',
 'description': 'description of what was imported ...',
 'search_string': '1037869,1040933,1054799',
 'created': '2020-07-17T14:12:58.489053-05:00',
 'last_updated': '2020-07-17T14:12:58.489076-05:00'}

### Setting up the literature tags

Unfortunately, no API exists for this (yet). You'll need to create the tags using the HAWC UI. Once that's done, continue with the next steps.

## Fetching HAWC IDs for references and tags

Next, we'll query HAWC to get IDs for our references and tags.

Let's fetch our references:

In [4]:
references_df = client.lit.references(assessment_id)
references_df.head()

Unnamed: 0,HAWC ID,HERO ID,PubMed ID,Citation,Full Citation,Title,Authors,Authors short,Year,Journal,...,Created,Last updated,Inclusion,Inclusion|Human Study,Inclusion|Animal Study,Inclusion|Mechanistic Study,Exclusion,Exclusion|Tier I,Exclusion|Tier II,Exclusion|Tier III
0,100798293,1037869,,Duraipandiyan V et al.,"Duraipandiyan V, Al-Harbi NA, Ignacimuthu S, M...",Antimicrobial activity of sesquiterpene lacton...,"Duraipandiyan V, Al-Harbi NA, Ignacimuthu S, M...",Duraipandiyan V et al.,,BMC Complementary and Alternative Medicine.,...,1595013178582,1595013178582,False,False,False,False,False,False,False,False
1,100798294,1040933,,Ponnampalam EN et al.,"Ponnampalam EN, Hopkins DL, Butler KL, Dunshea...","Polyunsaturated fats in meat from Merino, firs...","Ponnampalam EN, Hopkins DL, Butler KL, Dunshea...",Ponnampalam EN et al.,,Meat Science.,...,1595013178707,1595013178707,False,False,False,False,False,False,False,False
2,100798295,1054799,,Feng S et al.,"Feng S, Song L, Lee YK, Huang D. The Effects o...",The Effects of Fungal Stress on the Antioxidan...,"Feng S, Song L, Lee YK, Huang D",Feng S et al.,,Journal of Agricultural and Food Chemistry.,...,1595013178831,1595013178831,False,False,False,False,False,False,False,False


And our tags:

In [5]:
tags_df = client.lit.tags(assessment_id)
tags_df.head(15)

Unnamed: 0,id,depth,name,nested_name
0,100505134,2,Inclusion,Inclusion
1,100505135,3,Human Study,Inclusion|Human Study
2,100505136,3,Animal Study,Inclusion|Animal Study
3,100505137,3,Mechanistic Study,Inclusion|Mechanistic Study
4,100505138,2,Exclusion,Exclusion
5,100505139,3,Tier I,Exclusion|Tier I
6,100505140,3,Tier II,Exclusion|Tier II
7,100505141,3,Tier III,Exclusion|Tier III
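
Tag IDs are easier to work with when looked up by name rather than by row position. The following is a minimal sketch, assuming a small stand-in for `tags_df` with the same `id` and `nested_name` columns shown above; the `tag_ids` dictionary is an illustrative helper, not part of `hawc_client`.

```python
import pandas as pd

# Stand-in for the `tags_df` returned by client.lit.tags(); values copied from the output above.
tags_df = pd.DataFrame({
    "id": [100505134, 100505135, 100505136, 100505138, 100505139],
    "depth": [2, 3, 3, 2, 3],
    "name": ["Inclusion", "Human Study", "Animal Study", "Exclusion", "Tier I"],
    "nested_name": [
        "Inclusion",
        "Inclusion|Human Study",
        "Inclusion|Animal Study",
        "Exclusion",
        "Exclusion|Tier I",
    ],
})

# Map each full tag path to its HAWC tag ID (hypothetical helper for this sketch).
tag_ids = tags_df.set_index("nested_name")["id"].to_dict()

human_study_id = tag_ids["Inclusion|Human Study"]
print(human_study_id)
```

Using `nested_name` as the key avoids ambiguity if two branches of the tag tree ever share the same short `name`.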


It's also a good idea to grab the current mapping of references to tags for an assessment.

**Note**: Save this mapping as a backup in case you need to restore it later.

In [6]:
ref_tags_backup = client.lit.reference_tags(assessment_id)
ref_tags_backup.to_csv('~/Desktop/ref-tags-backup.csv', index=False)
ref_tags_backup
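
Before relying on a backup, it may be worth confirming that the saved file reloads to the same data. This is a sketch only: the stand-in frame below assumes the backup has `reference_id` and `tag_id` columns (the actual output of `reference_tags` is not shown here), and it writes to a temporary directory rather than the Desktop path used above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for `ref_tags_backup`; the column names are an assumption.
ref_tags_backup = pd.DataFrame({
    "reference_id": [100798293, 100798294],
    "tag_id": [100505134, 100505139],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ref-tags-backup.csv"
    ref_tags_backup.to_csv(path, index=False)
    restored = pd.read_csv(path)

# Raises AssertionError if the reloaded data differs from what was saved.
pd.testing.assert_frame_equal(restored, ref_tags_backup)

# CSV text in the same shape produced by `to_csv(index=False)` elsewhere in this notebook.
csv_payload = restored.to_csv(index=False)
print(csv_payload)
```

A payload like `csv_payload` is the kind of CSV string passed to `import_reference_tags` in the steps below, so a verified backup could presumably be resubmitted the same way.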

## Applying tags to references

Now that we have references, tags, and a backup of our old mappings, we can create some new ones.

To add a single tag to a single reference, create a dataframe:

In [7]:
new_tags_df = pd.DataFrame(data=dict(
    reference_id=[references_df.iloc[0]["HAWC ID"]],
    tag_id=[tags_df.iloc[0].id]
))
new_tags_df.head()

Unnamed: 0,reference_id,tag_id
0,100798293,100505134


Now you can submit the new reference/tag combination to HAWC, using `append` to add it to the current list of tags:

In [8]:
result = client.lit.import_reference_tags(assessment_id, new_tags_df.to_csv(index=False), 'append')
result.head()

Unnamed: 0,reference_id,tag_id
0,100798293,100505134


Generally, you'll load thousands of reference-tag combinations at once. Here we create a random sample of reference/tag pairs to apply (in practice, you would not want the assignments to be random):

In [9]:
# sample 10 random reference/tags w/ replacement
new_tags_df = pd.DataFrame(data=dict(
    reference_id=references_df['HAWC ID'].sample(10, replace=True).values, 
    tag_id=tags_df['id'].sample(10, replace=True).values,
)).drop_duplicates()
new_tags_df.head(10)

Unnamed: 0,reference_id,tag_id
0,100798293,100505137
1,100798293,100505138
2,100798294,100505139
3,100798294,100505134
5,100798295,100505136
6,100798293,100505134
7,100798293,100505135
8,100798293,100505139
9,100798295,100505139
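
In practice, the pairs usually come from an external source keyed by HERO ID and tag name rather than from random sampling. The following is a minimal sketch, assuming a hypothetical `decisions` table (illustrative values, not real screening results) and small stand-ins for `references_df` and `tags_df` using the column names shown earlier.

```python
import pandas as pd

# Stand-ins for `references_df` and `tags_df`; IDs copied from the outputs above.
references_df = pd.DataFrame({
    "HAWC ID": [100798293, 100798294, 100798295],
    "HERO ID": [1037869, 1040933, 1054799],
})
tags_df = pd.DataFrame({
    "id": [100505135, 100505136, 100505139],
    "nested_name": [
        "Inclusion|Human Study",
        "Inclusion|Animal Study",
        "Exclusion|Tier I",
    ],
})

# Hypothetical decisions table: one row per (HERO ID, tag path) pair.
decisions = pd.DataFrame({
    "HERO ID": [1037869, 1040933, 1054799],
    "nested_name": [
        "Exclusion|Tier I",
        "Inclusion|Animal Study",
        "Inclusion|Human Study",
    ],
})

# Translate HERO IDs and tag paths into HAWC reference and tag IDs.
mapped = (
    decisions
    .merge(references_df, on="HERO ID", how="left")
    .merge(tags_df, on="nested_name", how="left")
)

# Stop early if any row failed to match a reference or a tag.
unmatched = mapped[mapped["HAWC ID"].isna() | mapped["id"].isna()]
if not unmatched.empty:
    raise ValueError(f"{len(unmatched)} decision rows could not be matched")

new_tags_df = (
    mapped.rename(columns={"HAWC ID": "reference_id", "id": "tag_id"})
    [["reference_id", "tag_id"]]
    .astype("int64")
    .drop_duplicates()
)
print(new_tags_df)
```

The left merges keep every decision row, so a misspelled tag path or unknown HERO ID surfaces as an error instead of being dropped silently.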


Now, instead of appending to the list of relationships, we will `replace` it (this deletes the old relations):

In [10]:
result = client.lit.import_reference_tags(assessment_id, new_tags_df.to_csv(index=False), 'replace')
result.head(10)

Unnamed: 0,reference_id,tag_id
0,100798293,100505134
1,100798293,100505135
2,100798293,100505137
3,100798293,100505138
4,100798293,100505139
5,100798294,100505134
6,100798294,100505139
7,100798295,100505136
8,100798295,100505139
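
Because `replace` deletes existing relations, comparing a backup against the new mappings shows exactly which pairs were removed, added, or kept. This is a sketch under the assumption that both frames have `reference_id` and `tag_id` columns; the `backup` and `new` frames below are small illustrative stand-ins.

```python
import pandas as pd

# Illustrative stand-ins for the backup and the newly submitted mappings.
backup = pd.DataFrame({
    "reference_id": [100798293, 100798294],
    "tag_id": [100505134, 100505139],
})
new = pd.DataFrame({
    "reference_id": [100798294, 100798295],
    "tag_id": [100505139, 100505136],
})

# An outer merge with indicator=True labels each pair by where it appears.
diff = backup.merge(new, on=["reference_id", "tag_id"], how="outer", indicator=True)

removed = diff[diff["_merge"] == "left_only"]   # only in the backup: deleted by replace
added = diff[diff["_merge"] == "right_only"]    # only in the new mappings
kept = diff[diff["_merge"] == "both"]           # present in both

print(len(removed), len(added), len(kept))
```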


That should do it!