# Process SDII Data (Basic)

**This notebook demonstrates how to read and perform basic exploratory analysis on archived and streaming SDII data using the Python SDK. SDII stands for Sensor Data Ingestion Interface. [Learn more about the SDII format specification](https://developer.here.com/documentation/sdii-data-spec/topics/introduction.html "SDII Documentation")**


### Dependencies
* Catalogs: [SDII Sample Berlin](https://platform.here.com/data/hrn:here:data::olp-here:olp-sdii-sample-berlin-2)  
* Layers: [Sample Versioned Layer](https://platform.here.com/data/hrn:here:data::olp-here:olp-sdii-sample-berlin-2/sample-versioned-layer)
* Languages: Python


### Test data sample description
The data sample contains simulated vehicle travel paths and corresponding simulated traffic sign observations. Although the schema represents the actual SDII data structure, not all fields have been populated. The simulated signs are for demonstration purposes only and do not represent actual sign placements.

### Workflow
- Import packages
- Import Python SDK Libraries
- Specify data catalog and layer
- Specify a bounding box and get tiles
- Read SDII data from a versioned layer and deserialize it using the Python SDK utilities
- Extract information of interest into Python lists and dictionaries for manipulation and visualization
- Perform simple exploratory analysis
- Read SDII data from a streaming layer and print some trace information

## Import Python Packages

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import operator

This configuration is used for the bar charts and other plots in this notebook.

In [None]:
plt.rcParams['figure.figsize'] = (20, 6)
plt.rcParams['axes.linewidth'] = 3
plt.rcParams['xtick.labelsize'] = 14
plt.rcParams['ytick.labelsize'] = 14
plt.rcParams['axes.titlesize'] = 20
plt.rcParams['axes.labelsize'] = 18

## Import the Python SDK libraries

In [None]:
from here.platform import Platform
from here.inspector import inspect, Color
import here.geotiles.heretile as ht

from shapely.geometry import Point

## Initialize the platform object

In [None]:
platform = Platform()

## Specify data catalog and layer

In [None]:
CATALOG_HRN = 'hrn:here:data::olp-here:olp-sdii-sample-berlin-2'
LAYER_ID = 'sample-versioned-layer'

catalog = platform.get_catalog(CATALOG_HRN)
layer = catalog.get_layer(LAYER_ID)

## Specify a bounding box for the data and get the corresponding tiles

In [None]:
tile_ids = list(ht.between_points(Point(13.3, 52.5), Point(13.5, 52.6), level=14, fully_contained=False))
tile_ids[:10]
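For intuition about what these tile IDs encode, here is a minimal sketch of deriving a tile ID from a coordinate. It assumes the quadtree scheme in which level `n` splits the 360 degree longitude range into `2^n` columns, rows use the same tile size in degrees, and the ID is the bit-interleaved (Morton) code of column and row with a leading `1` bit. The function name `here_tile_id` is hypothetical and this is not the SDK's implementation; `here.geotiles.heretile` remains the tool to use in practice.

```python
def here_tile_id(lng: float, lat: float, level: int) -> int:
    """Illustrative sketch: tile ID containing (lng, lat) at the given level.

    Assumes -180 <= lng < 180 and -90 <= lat < 90.
    """
    tile_size_deg = 360.0 / (1 << level)
    # Column (x) and row (y) indices; both use the same tile size in degrees.
    x = int((lng + 180.0) / tile_size_deg)
    y = int((lat + 90.0) / tile_size_deg)

    # Interleave the bits: x bits go to even positions, y bits to odd positions.
    morton = 0
    for bit in range(level):
        morton |= ((x >> bit) & 1) << (2 * bit)
        morton |= ((y >> bit) & 1) << (2 * bit + 1)

    # The leading 1 bit encodes the level, so IDs of different levels never collide.
    return (1 << (2 * level)) | morton


print(here_tile_id(13.4, 52.5, 12))  # 23618402
```

Under these assumptions, a point in central Berlin maps to tile `23618402` at level 12; the notebook itself works at level 14, where each of those tiles is subdivided into 16 smaller ones.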

## Read and parse the SDII data

In [None]:
blobs = layer.read_partitions(tile_ids)

# We select only the decoded messages, ignoring partition information
parsed_blobs = [ blob for _, blob in blobs ]
parsed_blobs[:10]

## Calculate the number of occurrences of RoadSignData

In [None]:
def get_freq(arr):
    elements_count = {}
    for element in arr:
        if element in elements_count:
            elements_count[element] += 1
        else:
            if element is not None:
                elements_count[element] = 1
    return elements_count.items()
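The same counting can be sketched with the standard library's `collections.Counter`. The helper name `count_non_null` and the sample values are placeholders for illustration, not part of the notebook; the result contains the same counts that `get_freq` produces, returned as a plain dictionary.

```python
from collections import Counter


def count_non_null(values):
    """Count how often each value occurs, skipping None entries."""
    return dict(Counter(v for v in values if v is not None))


# Placeholder values standing in for sign types
print(count_non_null(["STOP", None, "YIELD", "STOP"]))  # {'STOP': 2, 'YIELD': 1}
```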

### Assign Data to RoadSignFields

In [None]:
roadSignType_arr2 = []
roadSignPermanency_arr = []
roadSignValue_arr = []
messageId_arr = []
messageId_count_arr = []
messageId_count_dict = {}    # message ID -> number of GPS points
messageId_lng_lat_dict = {}  # message ID -> list of position estimates

for blob in parsed_blobs:
    for item in blob.messages:
        messageId_arr.append(item.messageId)
        # GPS points (position estimates) recorded along this drive
        messageId_count_arr = list(item.message.path.positionEstimate)
        messageId_count_dict[item.messageId] = len(messageId_count_arr)
        messageId_lng_lat_dict[item.messageId] = messageId_count_arr
        # Sign observations attached to this drive
        for item2 in item.message.pathEvents.signRecognition:
            roadSignType_arr2.append(item2.roadSignType)
            roadSignPermanency_arr.append(item2.roadSignPermanency)
            # roadSignValue is optional; keep it only when it is set
            if item2.HasField('roadSignValue'):
                roadSignValue_arr.append(item2.roadSignValue)

### Fetch RoadSignFields values and their occurrences

In [None]:
roadSignType_count_dict = dict(get_freq(roadSignType_arr2))
roadSignType_count_dict = dict(sorted(roadSignType_count_dict.items()))
roadSignValue_count_dict = dict(get_freq(roadSignValue_arr))
messageId_arr = dict(get_freq(messageId_arr))

### Number of drives per archiving tile - drives are associated with tiles based on the first GPS point in the trace

In [None]:
def getMsgCountGroupByTileId(blobs):
    data = dict()
    for blob in blobs:
        data[blob.tileId] = len(blob.messages)
    return data

In [None]:
parsed_data = getMsgCountGroupByTileId(parsed_blobs)
df = parsed_data
plt.bar(range(len(df)), list(df.values()), align='center')
plt.xticks(range(len(df)), list(df.keys()))
plt.xlabel('Tile ID')
plt.ylabel('Number of messages')
plt.grid(True)

plt.show()

## Traffic sign information

### Statistics for different sign types in the sample

In [None]:
cmap = plt.get_cmap("tab20c")

labels_signs = [str(h) for h in roadSignType_count_dict.keys()]

pie = plt.pie(roadSignType_count_dict.values(), shadow=False, startangle=140, pctdistance=0.85, colors=cmap.colors)

plt.legend(pie[0],labels_signs, bbox_to_anchor=(1,0.5), loc="center right", fontsize=12,
           bbox_transform=plt.gcf().transFigure)

plt.tight_layout()

### Number of signs by permanency

In [None]:
# Count the observations for each permanency value
permanency_counts = get_freq(roadSignPermanency_arr)

fig, ax = plt.subplots(1, 1)
data = [[permanency, count] for permanency, count in permanency_counts]
column_labels = ["roadSignPermanency", "count"]
ax.axis('tight')
ax.axis('off')
ax.table(cellText=data, colLabels=column_labels, loc="center")

plt.show()

### Number of observations of speed limit signs by value

In [None]:
plt.bar(range(len(roadSignValue_count_dict)), list(roadSignValue_count_dict.values()), align='center')
plt.xticks(range(len(roadSignValue_count_dict)), list(roadSignValue_count_dict.keys()))
plt.xlabel('Speed sign value')
plt.ylabel('Number of signs')
plt.grid(True)

plt.show()

### Distribution of the number of GPS points recorded in each drive

In [None]:
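msgs_by_count = messageId_count_dict
max_obs = max(messageId_count_dict.values())
# Bin edges of width 5, starting at 1; the last edge lies above max_obs so no trace is dropped
bins = list(range(1, max_obs + 6, 5))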
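To see how fixed-width bin edges partition the point counts, here is a small standard-library sketch. It assumes half-open bins `[lo, hi)` with the last bin closed on the right, which is the convention `numpy.histogram` documents. The helper names `fixed_width_edges` and `bin_counts` and the sample values are hypothetical; the edges are built so that the last one always lies above the largest value.

```python
from bisect import bisect_right


def fixed_width_edges(max_value, width=5, start=1):
    """Return edges start, start + width, ... whose last edge exceeds max_value."""
    return list(range(start, max_value + width + 1, width))


def bin_counts(values, edges):
    """Count values per bin; bins are [lo, hi) except the last, which is [lo, hi]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            # A value equal to the final edge belongs to the last bin.
            idx = min(bisect_right(edges, v) - 1, len(counts) - 1)
            counts[idx] += 1
    return counts


points_per_trace = [1, 5, 6, 12]  # placeholder GPS point counts
edges = fixed_width_edges(max(points_per_trace))
print(edges)                                # [1, 6, 11, 16]
print(bin_counts(points_per_trace, edges))  # [2, 1, 1]
```

If the last edge were below the maximum, the longest traces would fall outside every bin and silently disappear from the histogram.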

In [None]:
# Histogram of GPS points per message, with a reference normal-density curve.
# mu and sigma are fixed, illustrative parameters; they are not fitted to the data.
mu = 1
sigma = 21
x = list(messageId_count_dict.values())

n, bin_edges, patches = plt.hist(x, bins)

# Normal probability density evaluated at the bin edges
y = ((1 / (np.sqrt(2 * np.pi) * sigma)) *
     np.exp(-0.5 * ((bin_edges - mu) / sigma)**2))

plt.plot(bin_edges, y)

plt.title('Number of GPS points per message')
plt.xlabel('Number of GPS points')
plt.ylabel('Number of messages')
plt.grid(True)

plt.show()

### Get the longest drive (based on number of GPS points)

In [None]:
msgs_by_count_sorted = dict( sorted(msgs_by_count.items(), key=operator.itemgetter(1),reverse=True))
frq_msg = list(msgs_by_count_sorted.keys())[0]
lat_lon_list = list(messageId_lng_lat_dict[frq_msg])

points = [ Point(item.longitude_deg, item.latitude_deg) for item in lat_lon_list ]

inspect(points, style=Color.BLUE)
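Selecting the longest drive does not require sorting the whole dictionary. As a minimal sketch, `max` with a key function returns the same message ID in a single pass; the helper name `longest_trace_id` and the sample IDs are placeholders for illustration.

```python
def longest_trace_id(points_per_message):
    """Return the message ID with the most GPS points (the first one wins on ties)."""
    return max(points_per_message, key=points_per_message.get)


sample_counts = {"msg-a": 3, "msg-b": 7, "msg-c": 7}  # placeholder data
print(longest_trace_id(sample_counts))  # msg-b
```

Sorting is still the better fit when more than the single longest drive is needed, for example to list the top few traces.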

## Loading SDII data from streaming layers

In [None]:
LAYER_ID = 'sample-streaming-layer'

layer = catalog.get_layer(LAYER_ID)
subscription = layer.subscribe()

# get data formatted similar to a partition object
parsed_msgs = layer.read_stream(subscription=subscription)

parsed_data_list = [ msg.path.positionEstimate for _, msg in parsed_msgs ]

# Collect the first five position estimates of the first message
result_data = [
    [p.timeStampUTC_ms, p.latitude_deg, p.longitude_deg, p.heading_deg, p.speed_mps]
    for p in list(parsed_data_list[0])[:5]
]

fig, ax = plt.subplots(1,1)
column_labels=["timeStampUTC_ms", "latitude_deg", "longitude_deg", "heading_deg", "speed_mps"]
ax.axis('tight')
ax.axis('off')
ax.table(cellText=result_data,colLabels=column_labels,loc="center")

plt.show()
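The raw `timeStampUTC_ms` values in the table are hard to read. Assuming, as the field name suggests, that they are milliseconds since the Unix epoch in UTC, a small helper can turn them into readable timestamps. The function name `ms_to_utc` and the sample value are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(timestamp_ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    # Adding a timedelta avoids the rounding that float division by 1000 can introduce.
    return EPOCH_UTC + timedelta(milliseconds=timestamp_ms)


print(ms_to_utc(1_600_000_000_000).isoformat())  # 2020-09-13T12:26:40+00:00
```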

### Cleaning up

When done, unsubscribe from the stream.

In [None]:
subscription.unsubscribe()

<span style="float:right; width:90%;"><sub><b>Copyright (c) 2020-2025 HERE Global B.V. and its affiliate(s). All rights reserved.</b>
This software, including documentation, is protected by copyright controlled by HERE. All rights are reserved. Copying, including reproducing, 
storing, adapting or translating, any or all of this material requires the prior written consent of HERE. This material also contains confidential 
information which may not be disclosed to others without the prior written consent of HERE.</sub></span>