26 changes: 26 additions & 0 deletions README.md
@@ -34,6 +34,32 @@ $ make generate
## Testing
Most tests need an API endpoint to run.

### Getting the tests to use your current code.

You basically want the equivalent of `pip install -e .`, but it's not clear how to do that with poetry. The ugly workaround is this...

Find the directory where `groundlight` is installed:

```
$ python
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import groundlight
>>> groundlight
<module 'groundlight' from '/home/leo/anaconda3/lib/python3.7/site-packages/groundlight/__init__.py'>
```

Then blow this away and set up a symlink from that directory to your source.

```
cd /home/leo/anaconda3/lib/python3.7/site-packages/
rm -rf groundlight
ln -s ~/ptdev/groundlight-python-sdk/src/groundlight groundlight
```

TODO: something better.

### Local API endpoint

1. Set up a local [janzu API
42 changes: 35 additions & 7 deletions UserGuide.md
@@ -2,23 +2,49 @@

Groundlight makes it simple to understand images. You can easily create computer vision detectors just by describing what you want to know using natural language.

## Computer vision made simple

How to build a working computer vision system in just 5 lines of python code:

```Python
from groundlight import Groundlight
gl = Groundlight()
d = gl.create_detector("door", query="Is the door open?") # define with natural language
# Reviewer note (Contributor): technically it's just one line of code to build the model :)
# the other line is to use the model you already built.
image_query = gl.submit_image_query(detector=d, image=jpeg_img) # send in an image
print(f"The answer is {image_query.result}") # get the result
```

**How does it work?** Your images are first analyzed by machine learning (ML) models which are automatically trained on your data. If those models have high enough confidence, that's your answer. But if the models are unsure, then the images are progressively escalated to more resource-intensive analysis methods up to real-time human review. So what you get is a computer vision system that starts working right away without even needing to first gather and label a dataset. At first it will operate with high latency, because people need to review the image queries. But over time, the ML systems will learn and improve so queries come back faster with higher confidence.

*Note: The SDK is currently in "beta" phase. Interfaces are subject to change in future versions.*


## Simple Example
## Managing confidence levels and latency

How to build a computer vision system in 5 lines of python code:
Groundlight gives you a simple way to control the trade-off of latency against accuracy. The longer you can wait for an answer to your image query, the better accuracy you can get. In particular, if the ML models are unsure of the best response, they will escalate the image query to more intensive analysis with more complex models and real-time human monitors as needed. Your code can easily wait for this delayed response. Either way, these new results are automatically trained into your models so your next queries will get better results faster.

The desired confidence level is set as the escalation threshold on your detector: it is the minimum confidence score the ML system must provide, and if the ML confidence falls below it, the image query is escalated for further analysis.

For example, say you want to set your desired confidence level to 0.95, and you're willing to wait up to 60 seconds to get a confident response.

```Python
from groundlight import Groundlight
gl = Groundlight()
d = gl.create_detector("door", query="Is the door open?") # define with natural language
image_query = gl.submit_image_query(detector=d, image="path/filename.jpeg") # send an image
print(f"The answer is {image_query.result}") # get the result
d = gl.create_detector("trash", query="Is the trash can full?", confidence=0.95)
image_query = gl.submit_image_query(detector=d, image=jpeg_img, wait=60)
# This will wait until either 60 seconds have passed or the confidence reaches 0.95
print(f"The answer is {image_query.result}")
```

Or if you want to run as fast as possible, set `wait=0`. This way you will only get the ML results, without waiting for escalation. Image queries which are below the desired confidence level will still be escalated for further analysis, and the results are incorporated as training data to improve your ML model, but your code will not wait for that to happen.

```Python
image_query = gl.submit_image_query(detector=d, image=jpeg_img, wait=0)
```
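
If you later want the escalated answer for a query you submitted with `wait=0`, one option is to re-fetch the image query by its id. This is just a sketch of the pattern, not a prescribed usage; the sleep interval is arbitrary:

```Python
import time

# Give the escalation pipeline (or a human reviewer) some time to respond.
time.sleep(60)  # arbitrary; do other work here instead if you prefer

# Re-fetch the same image query and read the (possibly updated) result.
updated_query = gl.get_image_query(image_query.id)
print(f"The answer is now {updated_query.result}")
```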

You can see the confidence score returned for the image query:

```Python
print(f"The confidence is {image_query.result.confidence}")
# Reviewer note (Contributor): the None result here will be confusing...
```
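
Note that the confidence can come back as `None`: in the current implementation (see the polling code in `client.py` below), a `None` confidence means the answer came from a human label rather than the ML model. A minimal sketch of handling both cases:

```Python
confidence = image_query.result.confidence
if confidence is None:
    # A None confidence currently implies the answer came from human review.
    print("The answer was provided by human review")
else:
    print(f"The ML confidence is {confidence:.3f}")
```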

## Getting Started

@@ -45,6 +71,7 @@ $ python3 glapp.py
```



## Prerequisites

### Using Groundlight SDK on Ubuntu 18.04
@@ -125,6 +152,7 @@ gl = Groundlight()
try:
detectors = gl.list_detectors()
except ApiException as e:
# Many fields available to describe the error
print(e)
print(e.args)
print(e.body)
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "groundlight"
version = "0.5.4"
version = "0.6.0"
license = "MIT"
readme = "UserGuide.md"
homepage = "https://groundlight.ai"
2 changes: 2 additions & 0 deletions samples/README.md
@@ -0,0 +1,2 @@
Code samples

15 changes: 15 additions & 0 deletions samples/blocking_submit.py
@@ -0,0 +1,15 @@
"""Example of how to wait for a confident result
"""
import logging

logging.basicConfig(level=logging.DEBUG)

from groundlight import Groundlight

gl = Groundlight()

d = gl.get_or_create_detector(name="dog", query="is there a dog in the picture?")

print(f"Submitting image query")
iq = gl.submit_image_query(d, image="../test/assets/dog.jpeg", wait=30)
print(iq)
5 changes: 3 additions & 2 deletions spec/public-api.yaml
@@ -1,8 +1,8 @@
openapi: 3.0.3
info:
title: Groundlight API
version: 0.1.0
description: Ask visual queries.
version: 0.6.0
description: Easy Computer Vision powered by Natural Language
contact:
name: Questions?
email: support@groundlight.ai
@@ -273,6 +273,7 @@ components:
like to use.
maxLength: 100
required:
# TODO: make name optional - that's how the web version is going.
- name
- query
x-internal: true
42 changes: 38 additions & 4 deletions src/groundlight/client.py
@@ -1,5 +1,7 @@
import os
from io import BufferedReader, BytesIO
import logging
import os
import time
from typing import Optional, Union

from model import Detector, ImageQuery, PaginatedDetectorList, PaginatedImageQueryList
@@ -15,6 +17,8 @@

GROUNDLIGHT_ENDPOINT = os.environ.get("GROUNDLIGHT_ENDPOINT", "https://api.groundlight.ai/device-api")

logger = logging.getLogger("groundlight")


class ApiTokenError(Exception):
pass
@@ -57,7 +61,10 @@ def __init__(self, endpoint: str = GROUNDLIGHT_ENDPOINT, api_token: str = None):
self.detectors_api = DetectorsApi(ApiClient(configuration))
self.image_queries_api = ImageQueriesApi(ApiClient(configuration))

def get_detector(self, id: str) -> Detector:
def get_detector(self, id: Union[str, Detector]) -> Detector:
if isinstance(id, Detector):
# Short-circuit
return id
obj = self.detectors_api.get_detector(id=id)
return Detector.parse_obj(obj.to_dict())

@@ -107,19 +114,22 @@ def submit_image_query(
self,
detector: Union[Detector, str],
image: Union[str, bytes, BytesIO, BufferedReader],
wait: float = 0,
) -> ImageQuery:
"""Evaluates an image with Groundlight.
:param detector: the Detector object, or string id of a detector like `det_12345`
:param image: The image, in several possible formats:
- a filename (string) of a jpeg file
- a byte array or BytesIO with jpeg bytes
- a numpy array in the 0-255 range (gets converted to jpeg)
:param wait: How long to wait (in seconds) for a confident answer
"""
if isinstance(detector, Detector):
detector_id = detector.id
else:
detector_id = detector
image_bytesio: Union[BytesIO, BufferedReader]
# TODO: support PIL Images
if isinstance(image, str):
# Assume it is a filename
image_bytesio = buffer_from_jpeg_file(image)
@@ -134,5 +144,29 @@ def submit_image_query(
"Unsupported type for image. We only support JPEG images specified through a filename, bytes, BytesIO, or BufferedReader object."
)

obj = self.image_queries_api.submit_image_query(detector_id=detector_id, body=image_bytesio)
return ImageQuery.parse_obj(obj.to_dict())
raw_img_query = self.image_queries_api.submit_image_query(detector_id=detector_id, body=image_bytesio)
img_query = ImageQuery.parse_obj(raw_img_query.to_dict())
if wait:
threshold = self.get_detector(detector).confidence_threshold
img_query = self._poll_for_confident_result(img_query, wait, threshold)
return img_query

def _poll_for_confident_result(self, img_query: ImageQuery, wait: float, threshold: float) -> ImageQuery:
"""Polls on an image query waiting for the result to reach the specified confidence."""
start_time = time.time()
delay = 0.1
while time.time() - start_time < wait:
current_confidence = img_query.result.confidence
if current_confidence is None:
logging.debug(f"Image query with None confidence implies human label (for now)")
# Reviewer note (Contributor): this is really confusing and needs to be documented further up as well.
# Author reply (Member): Completely agree. In fact, we should write a wrapper which hides this.
break
if current_confidence >= threshold:
logging.debug(f"Image query confidence {current_confidence:.3f} above {threshold:.3f}")
break
logger.debug(
f"Polling for updated image_query because confidence {current_confidence:.3f} < {threshold:.3f}"
)
time.sleep(delay)
delay *= 1.4 # slow exponential backoff
img_query = self.get_image_query(img_query.id)
return img_query
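
For reference, a short usage sketch of the image formats `submit_image_query` accepts, a file path or raw JPEG bytes, mirroring the integration tests (the sample path comes from the test assets and assumes you run from the repo root):

```Python
from groundlight import Groundlight

gl = Groundlight()
d = gl.get_or_create_detector(name="dog", query="is there a dog in the picture?")

# Pass a filename of a JPEG...
iq_from_file = gl.submit_image_query(detector=d, image="test/assets/dog.jpeg")

# ...or raw JPEG bytes, optionally waiting for a confident answer.
with open("test/assets/dog.jpeg", "rb") as f:
    iq_from_bytes = gl.submit_image_query(detector=d, image=f.read(), wait=30)

print(iq_from_file.result, iq_from_bytes.result)
```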
2 changes: 2 additions & 0 deletions src/groundlight/images.py
@@ -9,6 +9,8 @@ def buffer_from_jpeg_file(image_filename: str) -> io.BufferedReader:
For now, we only support JPEG files, and raise a ValueError otherwise.
"""
if imghdr.what(image_filename) == "jpeg":
# Note this will get fooled by truncated binaries since it only reads the header.
# That's okay - the server will catch it.
return open(image_filename, "rb")
else:
raise ValueError("We only support JPEG files, for now.")
Empty file added test/assets/blankfile.jpeg
47 changes: 41 additions & 6 deletions test/integration/test_groundlight.py
@@ -1,19 +1,23 @@
import os
from datetime import datetime

import openapi_client
import pytest

from groundlight import Groundlight
from model import Detector, ImageQuery, PaginatedDetectorList, PaginatedImageQueryList


@pytest.fixture
def gl() -> Groundlight:
"""Creates a Groundlight client object for testing."""
endpoint = os.environ.get("GROUNDLIGHT_TEST_API_ENDPOINT", "http://localhost:8000/device-api")
return Groundlight(endpoint=endpoint)


@pytest.fixture
def detector(gl: Groundlight) -> Detector:
"""Creates a new Test detector."""
name = f"Test {datetime.utcnow()}" # Need a unique name
query = "Test query?"
return gl.create_detector(name=name, query=query)
@@ -24,7 +28,6 @@ def image_query(gl: Groundlight, detector: Detector) -> ImageQuery:
return gl.submit_image_query(detector=detector.id, image="test/assets/dog.jpeg")


# @pytest.mark.skip(reason="We don't want to create a million detectors")
def test_create_detector(gl: Groundlight):
name = f"Test {datetime.utcnow()}" # Need a unique name
query = "Test query?"
@@ -33,7 +36,6 @@ def test_create_detector(gl: Groundlight):
assert isinstance(_detector, Detector)


# @pytest.mark.skip(reason="We don't want to create a million detectors")
def test_create_detector_with_config_name(gl: Groundlight):
name = f"Test b4mu11-mlp {datetime.utcnow()}" # Need a unique name
query = "Test query with b4mu11-mlp?"
@@ -49,27 +51,60 @@ def test_list_detectors(gl: Groundlight):
assert isinstance(detectors, PaginatedDetectorList)


# @pytest.mark.skip(reason="We don't want to create a million detectors")
def test_get_detector(gl: Groundlight, detector: Detector):
_detector = gl.get_detector(id=detector.id)
assert str(_detector)
assert isinstance(_detector, Detector)


# @pytest.mark.skip(reason="We don't want to create a million detectors and image_queries")
def test_submit_image_query(gl: Groundlight, detector: Detector):
def test_submit_image_query_blocking(gl: Groundlight, detector: Detector):
# Ask for a trivially small wait so it never has time to update, but uses the code path
_image_query = gl.submit_image_query(detector=detector.id, image="test/assets/dog.jpeg", wait=5)
assert str(_image_query)
assert isinstance(_image_query, ImageQuery)


def test_submit_image_query_filename(gl: Groundlight, detector: Detector):
_image_query = gl.submit_image_query(detector=detector.id, image="test/assets/dog.jpeg")
assert str(_image_query)
assert isinstance(_image_query, ImageQuery)


def test_submit_image_query_jpeg_bytes(gl: Groundlight, detector: Detector):
jpeg = open("test/assets/dog.jpeg", "rb").read()
_image_query = gl.submit_image_query(detector=detector.id, image=jpeg)
assert str(_image_query)
assert isinstance(_image_query, ImageQuery)


def test_submit_image_query_jpeg_truncated(gl: Groundlight, detector: Detector):
jpeg = open("test/assets/dog.jpeg", "rb").read()
jpeg_truncated = jpeg[:-500] # Cut off the last 500 bytes
# This is an extra difficult test because the header is valid.
# So a casual check of the image will appear valid.
with pytest.raises(openapi_client.exceptions.ApiException) as exc_info:
_image_query = gl.submit_image_query(detector=detector.id, image=jpeg_truncated)
e = exc_info.value
assert e.status == 400


def test_submit_image_query_bad_filename(gl: Groundlight, detector: Detector):
with pytest.raises(FileNotFoundError):
_image_query = gl.submit_image_query(detector=detector.id, image="missing-file.jpeg")


def test_submit_image_query_bad_jpeg_file(gl: Groundlight, detector: Detector):
with pytest.raises(ValueError) as exc_info:
_image_query = gl.submit_image_query(detector=detector.id, image="test/assets/blankfile.jpeg")
assert "jpeg" in str(exc_info).lower()


def test_list_image_queries(gl: Groundlight):
image_queries = gl.list_image_queries()
assert str(image_queries)
assert isinstance(image_queries, PaginatedImageQueryList)


# @pytest.mark.skip(reason="We don't want to create a million detectors and image_queries")
def test_get_image_query(gl: Groundlight, image_query: ImageQuery):
_image_query = gl.get_image_query(id=image_query.id)
assert str(_image_query)