# Before You Start

1. You will need credentials for Silverpond's PyPI server. Contact your Customer Success team member if you don't have them.
2. You will need a Highlighter API token. If you don't already have one, do the following:
  - Log in to Highlighter
  - Click the User Icon 👤, then click your name in the dropdown menu
  - Click Request Access Token (at the bottom). The token remains valid until it is deleted
  - Save the token somewhere safe
3. This notebook should work on any Colab Runtime
4. If you are running in Google Colab, the Install Packages cell may ask you to restart the runtime when it completes. Click the restart button and **do not** re-run the cell.
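
One way to keep the token from step 2 out of the saved notebook is to read it from an environment variable and fall back to a hidden prompt. The sketch below is a suggestion only: the helper `load_api_token` is hypothetical, and using `HL_WEB_GRAPHQL_API_TOKEN` as an environment variable name is an assumption borrowed from the variable name used later in this notebook.

```python
import os
from getpass import getpass


def load_api_token(env_var="HL_WEB_GRAPHQL_API_TOKEN", prompt=getpass):
    """Return the token from the environment, prompting only if it is unset.

    Hypothetical helper: it is not part of highlighter_client.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # getpass hides the typed value, so it is not echoed into cell output
        token = prompt(f"Enter {env_var}: ").strip()
    if not token:
        raise ValueError(f"No API token provided for {env_var}")
    return token
```

Calling `load_api_token()` in the token cell, instead of pasting the token as a string literal, means the token is never written into the notebook file itself.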

# This notebook

- Installs packages
- Exports data from Highlighter
- Inspects exported data


In [None]:
def i_am_running_in_colab():
    try:
        import google.colab
        return True
    except ImportError:
        return False
    
if i_am_running_in_colab():
    # Placeholder values: replace with your own Silverpond PyPI credentials
    %env PYPI_USERNAME=rick_sanchez
    %env PYPI_PASSWORD=WubbaLubbaDubDub
    !git clone https://github.com/silverpond/highlighter-client-v2-notebooks.git
    !bash highlighter-client-v2-notebooks/colab-scripts/setup-export-submissions.sh


# Input Your API Token

In [None]:
HL_WEB_GRAPHQL_API_TOKEN="<HIGHLIGHTER_API_TOKEN>"
HL_WEB_GRAPHQL_ENDPOINT="https://<ACCOUNT_NAME>.highlighter.ai/graphql"

In [None]:
from highlighter_client.gql_client import HLClient

# Small helper function for displaying the DataFrames in the highlighter client
# dataset object
def display_ds(ds, count=10):
    display(ds.annotations_df.head(count))
    display(ds.images_df.head(count))



# Create an HLClient object from credentials

This client will be used when we need to communicate with Highlighter via GraphQL.

# House Keeping

In [None]:
client = HLClient.from_credential(api_token=HL_WEB_GRAPHQL_API_TOKEN, endpoint_url=HL_WEB_GRAPHQL_ENDPOINT)

# Read Dataset from Highlighter

`HighlighterClient` represents datasets as two Pandas DataFrames, `annotations_df` and `images_df`. We can populate a `HighlighterClient.Dataset` in several ways using `Readers`. You can list the available `Readers` and load one by name. In this case we'll load the `HighlighterSubmissionsReader` so we can pull submissions down from Highlighter.
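
The name-based lookup described above is a common registry pattern, and it explains the trailing `()` in the next cell: the lookup returns something callable (presumably the reader class), which is then instantiated. The sketch below illustrates the pattern only. `EXAMPLE_READERS`, `get_example_reader`, and both reader classes are hypothetical stand-ins, not the actual `highlighter_client` implementation.

```python
class CsvReader:
    """Hypothetical reader; stands in for any registered reader class."""


class SubmissionsReader:
    """Hypothetical reader; stands in for a submissions reader."""


# The registry maps a short name to a reader class (not an instance)
EXAMPLE_READERS = {
    "csv": CsvReader,
    "submissions": SubmissionsReader,
}


def get_example_reader(name):
    """Return the reader class registered under `name`."""
    try:
        return EXAMPLE_READERS[name]
    except KeyError:
        available = sorted(EXAMPLE_READERS)
        raise KeyError(f"Unknown reader {name!r}; available: {available}") from None


# List the registered names, look up a class, then instantiate it
print(f"READERS: {list(EXAMPLE_READERS.keys())}")
reader_cls = get_example_reader("submissions")
reader = reader_cls()
```

Returning the class rather than an instance leaves the caller free to pass constructor arguments, which is one plausible reason for the two-step call.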

In [None]:
from highlighter_client.datasets import get_reader, READERS

print(f"READERS: {list(READERS.keys())}")

reader = get_reader("highlighter_submissions")()

In [None]:
# View the doc string and function signature
# Note it expects a submissions generator
# We will create one in a moment.
?reader

Once we have a `Reader`, we can initialize a `highlighter_client.Dataset` object with that `Reader`.

In [None]:
from highlighter_client.datasets.dataset import Dataset
ds = Dataset(reader=reader)

Now that we have a `highlighter_client.Dataset` with a `HighlighterSubmissionsReader`, we can populate our `DataFrames`.

To understand this, we need to know two things.

1. `highlighter_client` uses Pydantic `BaseModel`s to tell GraphQL what values to return from a query. Some common `BaseModel`s are defined in `highlighter_client.base_models`, but if you want more fine-grained control you can define your own.

2. Some GraphQL queries may return many results. These types of queries are called `Connections` and are named accordingly in the code. There is a `paginate` function that takes a `Connection` query and returns a Python generator.

For more information on the BaseModels see `highlighter_client/base_models.py`
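
To make the second point concrete, here is a minimal sketch of cursor-based pagination wrapped in a generator. It assumes the common GraphQL Connection shape (`nodes` plus `pageInfo` with `hasNextPage` and `endCursor`). Both `fake_connection_query` and `paginate_sketch` are hypothetical: the real `paginate` function also takes a `BaseModel` and may work differently internally.

```python
def fake_connection_query(after=None, first=2):
    """Hypothetical stand-in for a Connection query; returns one page per call."""
    items = ["sub-1", "sub-2", "sub-3", "sub-4", "sub-5"]
    start = 0 if after is None else int(after)
    end = min(start + first, len(items))
    return {
        "nodes": items[start:end],
        "pageInfo": {"hasNextPage": end < len(items), "endCursor": str(end)},
    }


def paginate_sketch(query, **kwargs):
    """Yield nodes one at a time, fetching the next page only when needed."""
    cursor = None
    while True:
        page = query(after=cursor, **kwargs)
        yield from page["nodes"]
        if not page["pageInfo"]["hasNextPage"]:
            return
        # Resume the next request from where this page ended
        cursor = page["pageInfo"]["endCursor"]


# Consuming the generator walks every page transparently
submissions = list(paginate_sketch(fake_connection_query, first=2))
```

Because the result is a generator, no request is made until the first item is consumed, and large result sets never need to be held in memory all at once.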

In [None]:
from highlighter_client.base_models import DatasetSubmissionTypeConnection
from highlighter_client.paginate import paginate

# Replace None with the ID of the Highlighter dataset you want to export
dataset_id = None

submissions_gen = paginate(
    client.datasetSubmissionConnection,
    DatasetSubmissionTypeConnection,
    datasetId=dataset_id,
)

ds.read(submissions_gen=submissions_gen)
display_ds(ds)