# The MLSE Technical Interview

Task 1: JSON parse

The JSON parse task takes the Labelbox export and retrieves specific parts of it.

This question replicates a very common task for an MLSE.

Task 2: Labelbox SDK functions
Upload 2 dataRows by creating a JSON list, then pull the data back out using the Labelbox SDK. 


### Let's set up the libraries needed

In [26]:
"""
Install and import all the necessary tools
"""
!pip3 install labelbox
import json
from labelbox import Client, Project, Dataset

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [27]:
"""
We will be logging in to Labelbox to get this API Key

https://docs.labelbox.com/docs/create-an-api-key 

"""
api_key = ""
client = Client(api_key)
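
Pasting the key directly into the cell makes it easy to leak when the notebook is shared. One alternative is to read it from the environment. The following is a minimal sketch: the helper `load_api_key` and the variable name `LABELBOX_API_KEY` are assumptions made for this example, not something the notebook defines.

```python
import os


def load_api_key(env_var="LABELBOX_API_KEY"):
    """Return the API key stored in an environment variable.

    Both this helper and the default variable name are assumptions
    made for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the notebook."
        )
    return api_key
```

With the variable set, the client line above could become `client = Client(load_api_key())`.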

## Task 1

In [36]:
"""
For this task, please export the project Mapiallary Project (cl8t2zdka0ouo073fhtwpg9n4) and parse out the following values.
Once they are parsed out, please create a dict holding each value and its count.

Questions:
The task is to get the following counts out of the JSON document:
1. Get a count of the total number of regulatory objects
2. From the sub-classes of regulatory, get a count of yield and no-parking
3. Get a count of the total number of warning objects
4. From the sub-classes of warning, get a count of pedestrians-crossing
5. Add these values to a dict and print it.

"""


# Select Project 
project = client.get_project('cl8t2zdka0ouo073fhtwpg9n4')

# Export labels as a json:
labels = project.export_labels(download = True)

print(json.dumps(labels[4:8], indent = 4))


[
    {
        "ID": "cl8t34ks000zh0f2osxwn7j09",
        "DataRow ID": "cl8t27iag230707afd9jlf4du",
        "Labeled Data": "https://labelbox.s3-us-west-2.amazonaws.com/datasets/mapillary_traffic/images/Lz65DutF7nGY3MApfx5IyQ.jpg",
        "Label": {
            "objects": [
                {
                    "featureId": "cl8t34kpc00rq0f2o8s2x8twp",
                    "schemaId": "cl8osdtdo0pm30733c5hz001b",
                    "color": "#ff0000",
                    "title": "regulatory",
                    "value": "regulatory",
                    "bbox": {
                        "top": 1455,
                        "left": 1981,
                        "height": 32,
                        "width": 34
                    },
                    "instanceURI": "https://api.labelbox.com/masks/feature/cl8t34kpc00rq0f2o8s2x8twp?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VySWQiOiJja3J3azQ4dm0wMG9pMHkweGdkMHY1NmUxIiwib3JnYW5pemF0aW9uSWQiOiJja3J3azQ4djIwMG9oMHkweDN4MWhkaDY1I

In [35]:
regulatory_count, yield_parking_count, warning_count, pedestrians_count = 0, 0, 0, 0
for label in labels:
  for obj in label['Label']['objects']:
    if obj['value'] == 'regulatory':
      regulatory_count += 1
      # The sub-class is the answer to the object's first nested classification
      if obj['classifications'][0]['answer']['title'] in ('yield', 'no-parking'):
        yield_parking_count += 1
    elif obj['value'] == 'warning':
      warning_count += 1
      if obj['classifications'][0]['answer']['title'] == 'pedestrians-crossing':
        pedestrians_count += 1
print(regulatory_count, yield_parking_count, warning_count, pedestrians_count)


4254 396 2550 189
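
Task 1 asks for the counts in a dict, while the cell above prints them as bare numbers. The sketch below shows one way to collect the same kind of counts into a dict with `collections.Counter`. It runs on a tiny made-up export (`sample_labels`) that only mimics the nesting used in the cell above, and the helper name `count_signs` and the `class/sub-class` key format are assumptions for illustration.

```python
from collections import Counter

# Made-up miniature export with the same nesting as the Labelbox JSON.
sample_labels = [
    {"Label": {"objects": [
        {"value": "regulatory", "classifications": [{"answer": {"title": "yield"}}]},
        {"value": "regulatory", "classifications": [{"answer": {"title": "no-parking"}}]},
        {"value": "warning", "classifications": [{"answer": {"title": "pedestrians-crossing"}}]},
    ]}},
    {"Label": {"objects": [
        {"value": "regulatory", "classifications": [{"answer": {"title": "stop"}}]},
        {"value": "warning", "classifications": []},
    ]}},
]


def count_signs(labels):
    """Count classes and selected sub-classes, returning a plain dict."""
    counts = Counter()
    for label in labels:
        for obj in label.get("Label", {}).get("objects", []):
            sign_class = obj.get("value")
            counts[sign_class] += 1
            # Objects without a classification only count toward the class total.
            classifications = obj.get("classifications") or []
            if classifications:
                subclass = classifications[0].get("answer", {}).get("title")
                counts[f"{sign_class}/{subclass}"] += 1
    wanted = [
        "regulatory", "regulatory/yield", "regulatory/no-parking",
        "warning", "warning/pedestrians-crossing",
    ]
    # A Counter returns 0 for keys it never saw, so every wanted key is present.
    return {key: counts[key] for key in wanted}


result = count_signs(sample_labels)
print(result)
```

Unlike the cell above, this version keeps yield and no-parking as separate entries and tolerates objects that have no classification.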


## Task 2

*You do not need to read or modify this cell.* Please run it **before** testing Task 2.

In [37]:
"""
Please run this cell; it will be used to test your code.
"""
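def save_data_as_json(payload):
  # Write the payload to saved_json.json in the current working directory
  with open("saved_json.json", "w") as f:
    json.dump(payload, f)

def test_uploading_file(filepath_to_upload):
  # Relies on the global `data` defined in the test cell further down
  save_data_as_json(data)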

  client = Client(api_key)

  uploaded_url = client.upload_file(filepath_to_upload)
  dataset = client.get_dataset("cky4j7e0zl0n40z9ta3qhddau")

  org = client.get_organization()
  user = client.get_user()
  parameters = f"{{\"jsonUrl\":\"{uploaded_url}\",\"datasetId\":\"{dataset.uid}\"}}"

  res = client.execute(
      """
  mutation CreateTask($name: String!, $userId: ID!, $organizationId: ID!, $functionName: String!, $parameters: String!, $notifyOnCompletion: Boolean!) {
    createTask(data: {name: $name, status: IN_PROGRESS, createdBy: {connect: {id: $userId}}, completionPercentage: 0, organization: {connect: {id: $organizationId}}, notifyOnCompletion: $notifyOnCompletion, assigned: {create: {function: {connect: {name: $functionName}}, createdBy: {connect: {id: $userId}}, organization: {connect: {id: $organizationId}}, parameters: $parameters}}}) {
      id
      assigned {
        id
        __typename
      }
      __typename
    }
  }
  """, {
          "functionName": "JSON Processor",
          "name": "JSON Import",
          "notifyOnCompletion": False,
          "organizationId": org.uid,
          "parameters": parameters,
          "userId": user.uid
      })

  print(f"File '{filepath_to_upload}' saved to Labelbox.") 
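
The `parameters` string in the cell above is JSON assembled by hand inside an f-string. The sketch below builds the same kind of payload with `json.dumps` and checks that both forms decode to the same object. The URL and dataset ID are placeholders for illustration, not values from the notebook.

```python
import json

# Placeholder values; in the notebook they come from the SDK calls above.
uploaded_url = "https://example.com/saved_json.json"
dataset_id = "example-dataset-id"

# Hand-built string, in the same style as the cell above.
manual = f"{{\"jsonUrl\":\"{uploaded_url}\",\"datasetId\":\"{dataset_id}\"}}"

# The same payload built by json.dumps, which also escapes quotes and backslashes.
generated = json.dumps({"jsonUrl": uploaded_url, "datasetId": dataset_id})

print(json.loads(manual) == json.loads(generated))
```

The two strings differ only in whitespace, but the `json.dumps` form stays valid even if a value contains a quote character.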

In [39]:
"""
Objective:
  Given the two lists:
    list_of_externalId
    list_of_imageUrl

  Create a function that takes both lists and returns the values in JSON format.

  Here, JSON format means a list of dictionaries, as in the example below:
  [
    {
      "key":"value"
    }
  ]

  Each dictionary will have two keys: "externalId" and "imageUrl".
  The items in the respective lists will be the values.

  These dictionaries will be used to create new dataRows for a dataset in Labelbox.
"""

list_of_externalId = ["2017-Tesla-Model-S-P90D-102.jpg","2017-Tesla-Model-3-top-view.jpg"]
list_of_imageUrl = ["https://storage.googleapis.com/labelbox-example-datasets/tesla/2017-Tesla-Model-S-P90D-102.jpg", "https://storage.googleapis.com/labelbox-example-datasets/tesla/2017-Tesla-Model-3-top-view.jpg"]

def transform_to_json_format(list_of_externalId, list_of_imageUrl):
  # Each external ID must pair with exactly one image URL; fail loudly
  # instead of silently returning an empty list
  if len(list_of_externalId) != len(list_of_imageUrl):
    raise ValueError("list_of_externalId and list_of_imageUrl must have the same length")
  return [
      {'externalId': external_id, 'imageUrl': image_url}
      for external_id, image_url in zip(list_of_externalId, list_of_imageUrl)
  ]
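
The length check in the function above matters because `zip` stops at the shorter of its inputs without raising an error. The sketch below demonstrates that behavior with made-up values. `pair_checked` is a hypothetical helper written for this example only.

```python
def pair_checked(external_ids, image_urls):
    """Pair two lists element by element, refusing lists of different lengths.

    Hypothetical helper for this sketch; it is not part of the Labelbox SDK.
    """
    if len(external_ids) != len(image_urls):
        raise ValueError(
            f"length mismatch: {len(external_ids)} IDs vs {len(image_urls)} URLs"
        )
    return list(zip(external_ids, image_urls))


# Made-up values: three IDs but only two URLs.
external_ids = ["a.jpg", "b.jpg", "c.jpg"]
image_urls = ["https://example.com/a.jpg", "https://example.com/b.jpg"]

# Plain zip stops at the shorter list, so "c.jpg" disappears without any error.
truncated = list(zip(external_ids, image_urls))
print(len(truncated))

try:
    pair_checked(external_ids, image_urls)
except ValueError as err:
    print(err)
```

On Python 3.10 and later, `zip(external_ids, image_urls, strict=True)` raises a `ValueError` on mismatched lengths, which may be enough on its own.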

In [40]:
"""
Run the below to test your solution for Task 2
"""
data = transform_to_json_format(list_of_externalId, list_of_imageUrl)
save_data_as_json(data)
test_uploading_file(filepath_to_upload="/content/saved_json.json")

File '/content/saved_json.json' saved to Labelbox.


In [None]:
"""
We can now check the Organization's Datasets to see whether it worked!
"""

In [45]:
"""
Once you have visually confirmed that the dataset has the assets (AKA data rows)
that you have created, then it's time for the second part of Task 2!

Objective:
    - Get the dataset that was created earlier.
    - Sample a few data rows from the ones you have previously created.

Requirements:
    - Get the dataset with the Labelbox Python SDK, using only the dataset ID.
    - (Optional) The function can accept a dataset ID.
    - Return a list of the data row external IDs.

"""
"""
Definition of the function that we will be using to test.
It is only for checking the end result, and can be ignored.
"""   
def sample_3_data_row_external_ids(dataset_id="cky4j7e0zl0n40z9ta3qhddau"):
  client = Client(api_key)
  dataset = client.get_dataset(dataset_id)
  # Collect the external ID of every data row, then keep the first three
  external_ids = [data_row.external_id for data_row in dataset.data_rows()]
  return external_ids[:3]


In [46]:
"""
To test your sampling function (sample_3_data_row_external_ids)"""

sample_3_data_row_external_ids()


['2017-Tesla-Model-3-top-view.jpg',
 '2017-Tesla-Model-S-P90D-102.jpg',
 '2017-Tesla-Model-3-top-view.jpg']
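
The output above is the first three rows of the dataset, and it contains the same external ID twice. If a random sample of distinct IDs is wanted instead, one option is to deduplicate first and then use `random.sample`. This is a minimal sketch; `sample_external_ids` and its `seed` parameter are hypothetical names introduced for this example.

```python
import random


def sample_external_ids(external_ids, k=3, seed=None):
    """Return up to k distinct external IDs chosen at random.

    Hypothetical helper for this sketch; pass a seed for repeatable results.
    """
    # Sorting the deduplicated IDs makes a seeded sample repeatable.
    unique_ids = sorted(set(external_ids))
    rng = random.Random(seed)
    # random.sample raises if k exceeds the population, so cap it.
    return rng.sample(unique_ids, min(k, len(unique_ids)))


ids = [
    "2017-Tesla-Model-3-top-view.jpg",
    "2017-Tesla-Model-S-P90D-102.jpg",
    "2017-Tesla-Model-3-top-view.jpg",
]
picked = sample_external_ids(ids, k=3, seed=0)
print(picked)
```

Because only two distinct IDs exist in this list, the sample contains two items even though three were requested.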