# Unitary Interview Stage 2: Clean Code

## Background

This assessment evaluates your wider software skills, in particular your ability to write code that is:
- clean
- easily maintained
- well-tested
- extensible

We are more interested in how you approach the design and structure of your implementation than in whether your code actually runs.

## Case Study

For one of Unitary's products, we need a pipeline that automatically classifies content into 3 different categories, each with one of 3 risk levels, based on a customer's policy. The backend service should implement the following pipeline:

1. Accept an input request (e.g. over HTTP) containing the content that needs to be moderated. The content can be a video, an image or text.

  - Example request for text:

  ```json
    {
      "text": "Example text",
      "customer": "test",
      "modality": "text",
      "prediction_type": "policy"
    }
  ```

  - Example request for images or videos (`modality` is `"image"` or `"video"`):

  ```json
    {
      "url": "Media file url",
      "customer": "test",
      "modality": "image",
      "prediction_type": "policy"
    }
  ```

2. Pre-process the content to convert it into a numerical format to be consumed by our AI models.
  - The actual pre-processing is mocked out by the dummy functions in the Appendix. We are more interested in how you structure your code to handle different modalities.

3. Send the pre-processed input into different AI models to get detailed categories of harmful content.
  - We have given you 3 mock functions in the Appendix section below which you can use. There is one function for each of the 3 different models we use: hate speech, violence, and sexual.
  - Example response for hate speech:

  ```json
    {
        "toxicity": 0.6,
        "severe_toxicity": 0.2,
        "obscene": 0.4,
        "insult": 0.7,
        "identity_attack": 0.6,
        "threat": 0.5
    }
  ```
4. Aggregate the detailed responses into three categories (`hate_speech`, `sexual` and `violence`), each with a risk level. The possible risk levels are `low`, `medium` and `high`.

The pipeline should combine the model responses and produce a final policy classification such as:

```json
{
  "hate_speech": "low",
  "violence": "medium",
  "sexual": "high"
}
```

The aggregation can be a simple average per model. For the `hate_speech` example above, you can average all the sub-category scores and determine the risk level with something as simple as:

```python
if avg < 0.3:
    risk_level = "low"
elif avg < 0.6:
    risk_level = "medium"
else:
    risk_level = "high"
```
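
The thresholding above could be wrapped in a small helper so that every model response goes through the same path. This is a minimal sketch, not a reference solution: the function name `risk_level` and the two threshold constants are hypothetical, and rejecting an empty response is an assumption the brief does not state.

```python
from statistics import mean

# Thresholds mirror the snippet above: < 0.3 is low, < 0.6 is medium, otherwise high.
LOW_THRESHOLD = 0.3
MEDIUM_THRESHOLD = 0.6


def risk_level(scores: dict[str, float]) -> str:
    """Average the sub-category scores of one model and map the mean to a risk level."""
    if not scores:
        # Assumption: an empty model response is treated as an error.
        raise ValueError("cannot aggregate an empty model response")
    avg = mean(scores.values())
    if avg < LOW_THRESHOLD:
        return "low"
    if avg < MEDIUM_THRESHOLD:
        return "medium"
    return "high"


# The example hate speech response from step 3 averages to about 0.5.
hate_speech_scores = {
    "toxicity": 0.6,
    "severe_toxicity": 0.2,
    "obscene": 0.4,
    "insult": 0.7,
    "identity_attack": 0.6,
    "threat": 0.5,
}
print(risk_level(hate_speech_scores))  # prints "medium"
```

Keeping the thresholds as named constants leaves room to make them configurable per customer policy later, if that turns out to be needed.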

### Questions

1. How would you structure your code in your backend service to a) make it work with different modalities and b) make it easily extensible in order to add new AI models and different categories?

  - *Note:* If you don't have time to implement the code for all the different modalities you can select one to write the implementation for as long as your code is easily extensible and reusable.

2. How would you test it? Write unit tests to cover different cases for the code you wrote in the previous part.

  - *Note:* You don't need to write a fully functional test, only the test definition to explain what functionalities you would test.

3. How would you expose this functionality to customers? Write an example API that uses your backend service to produce a policy classification result.
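
As a starting point for question 1, one possible shape is a pair of registries, one mapping each modality to its pre-processor and one mapping each category to its model. This is only a sketch under stated assumptions, not the expected answer: `PREPROCESSORS`, `MODELS`, `classify`, `to_risk_level` and the `fake_*` stand-ins are all hypothetical names, and the stand-in models return fixed scores so the example is deterministic.

```python
import asyncio
from statistics import mean
from typing import Any, Awaitable, Callable

Features = Any  # stand-in for whatever numerical format the models consume
Preprocessor = Callable[[dict[str, Any]], Features]
Model = Callable[[Features], Awaitable[dict[str, float]]]

# Hypothetical registries: a new modality or category is added by adding an entry.
PREPROCESSORS: dict[str, Preprocessor] = {}
MODELS: dict[str, Model] = {}


def to_risk_level(scores: dict[str, float]) -> str:
    """Map the mean of one model's scores to low / medium / high."""
    avg = mean(scores.values())
    return "low" if avg < 0.3 else "medium" if avg < 0.6 else "high"


async def classify(request: dict[str, Any]) -> dict[str, str]:
    """Pre-process by modality, run every registered model, aggregate per category."""
    modality = request["modality"]
    if modality not in PREPROCESSORS:
        raise ValueError(f"unsupported modality: {modality!r}")
    features = PREPROCESSORS[modality](request)
    names = list(MODELS)
    # The models are independent of each other, so they can be awaited concurrently.
    responses = await asyncio.gather(*(MODELS[name](features) for name in names))
    return {name: to_risk_level(scores) for name, scores in zip(names, responses)}


# Stand-ins so the sketch runs on its own; the Appendix mocks would replace them.
async def fake_hate_speech(features: Features) -> dict[str, float]:
    return {"toxicity": 0.1, "insult": 0.2}


async def fake_violence(features: Features) -> dict[str, float]:
    return {"violence": 0.9, "firearm": 0.8}


PREPROCESSORS["text"] = lambda request: [len(request["text"])] * 16
MODELS["hate_speech"] = fake_hate_speech
MODELS["violence"] = fake_violence

result = asyncio.run(classify({
    "text": "Example text",
    "customer": "test",
    "modality": "text",
    "prediction_type": "policy",
}))
print(result)  # prints {'hate_speech': 'low', 'violence': 'high'}
```

Because `classify` only talks to the registries, unit tests can register fake pre-processors and models with known outputs, which bears on question 2 as well.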

## Exercise

Please implement the backend service as described in the Case Study.

```python
# Your code here
```

## Appendix
### Assumptions & helper functions

1. You can download the image from the URL specified below and use it as input to your backend service. For simplicity, you can treat a video as a list of frames, so you can provide a list containing the bytes of the same image repeated.

```shell
pip install requests
```

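```python
import requests

URL = "https://t3.ftcdn.net/jpg/03/21/62/56/360_F_321625657_rauGwvaYjtbETuwxn9kpBWKDYrVUMdB4.jpg"
IMAGE_PATH = "./gun.jpg"

# Download the sample image and fail loudly on an HTTP error.
resp = requests.get(URL, timeout=30)
resp.raise_for_status()

# Save the raw bytes so load_image / load_video can read them back.
with open(IMAGE_PATH, "wb") as f:
    f.write(resp.content)
```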

2. We have provided some functions that mock the pre-processing and prediction steps of the pipeline. You can use them in your code to avoid actually pre-processing the image/video and sending it for classification.

```python

# Dummy functions
import random
from pathlib import Path

def load_image(path: str | Path) -> list[bytes]:
  with open(path, "rb") as f:
    image_bytes = f.read()
  return [image_bytes]

def load_video(path: str | Path) -> list[bytes]:
  img_bytes = load_image(path)
  return img_bytes * 10

def preprocess_text(input_data: str) -> list[int]:
  return [random.randint(1, 100)] * 16

def preprocess_image(input_data: list[bytes]) -> list[int]:
  return [random.randint(1,100)] * 16

def preprocess_video(input_data: list[bytes]) -> list[list[int]]:
  return [[random.randint(1,100)] * 16] * len(input_data)

async def predict_hate_speech(input_data: list[int]) -> dict[str, float]:
  return {
      "toxicity": random.random(),
      "severe_toxicity": random.random(),
      "obscene": random.random(),
      "insult": random.random(),
      "identity_attack": random.random(),
      "threat": random.random()
  }

async def predict_sexual(input_data: list[int]) -> dict[str, float]:
  return {
      "sexual_explicit": random.random(),
      "adult_content": random.random(),
      "adult_toys": random.random()
  }


async def predict_violence(input_data: list[int]) -> dict[str, float]:
  return {
      "violence": random.random(),
      "firearm": random.random(),
      "knife": random.random()
  }
```
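
Two details of the mocks above affect how they are called: the `predict_*` functions are coroutines and must be awaited, and `preprocess_video` returns one feature list per frame while each model takes a single feature list. One way to reconcile the two is sketched below. Averaging each score across frames is an assumption, not something the exercise prescribes, and `predict_video` is a hypothetical helper name. The mock is repeated only so that the snippet runs on its own.

```python
import asyncio
import random
from statistics import mean


# Same signature as the Appendix mock, repeated so this snippet is self-contained.
async def predict_violence(input_data: list[int]) -> dict[str, float]:
    return {
        "violence": random.random(),
        "firearm": random.random(),
        "knife": random.random(),
    }


async def predict_video(frames: list[list[int]]) -> dict[str, float]:
    """Run the mock on every frame concurrently, then average each score across frames.

    Averaging across frames is an assumption; taking the maximum is another option.
    """
    if not frames:
        raise ValueError("a video needs at least one frame")
    per_frame = await asyncio.gather(*(predict_violence(frame) for frame in frames))
    return {key: mean(scores[key] for scores in per_frame) for key in per_frame[0]}


frames = [[7] * 16 for _ in range(10)]  # shaped like the output of preprocess_video
video_scores = asyncio.run(predict_video(frames))
print(sorted(video_scores))  # prints ['firearm', 'knife', 'violence']
```

The result has the same shape as a single-frame response, so the same per-model aggregation can be applied to text, images and videos.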