# Accessing segmentation errors with NeurErrors

This tutorial gives a high-level overview of how to access FlyWire's error dataset with NeurErrors, which is built on top of CAVEclient, the Python client for the [Connectome Annotation Versioning Engine](https://www.biorxiv.org/content/10.1101/2023.07.26.550598v1.abstract) (CAVE).

CAVE supports proofreading of datasets and their analysis even while proofreading is ongoing.

NeurErrors is a Python package for visualizing and analyzing the errors that proofreaders found in the segmented connectome. One part handles data acquisition: it retrieves the errors found for any segment ID over time. The other part builds a graph dataset at the L2 resolution of the chunked graph and associates error coordinates with each node of the graph for graph machine learning.


To install NeurErrors, you can use the following command:

`pip install git+https://github.com/raphael-levisse/neurerrors.git`

Inside a notebook, you can install NeurErrors with the following command:

In [None]:
!pip install *****

and import it like this:

In [1]:
import neurerrors

Make sure you have a CAVEclient account: CAVEclient is core to the NeurErrors package and lets you access and query the chunked graph. See the [CAVEclient tutorial](https://github.com/seung-lab/FlyConnectome/blob/main/CAVE%20tutorial.ipynb) for setup instructions.

Let's start by importing the necessary libraries:

In [2]:
import caveclient
import numpy as np
import torch
import torch_geometric
import tqdm
import pandas

We then start a session with CAVEclient and use the `Dataset_Client` class to build the error dataset and the associated graph dataset. The two can be built independently.

In [5]:
### Caveclient initialization ###
datastack_name = "flywire_fafb_public"
voxel_resolution = np.array([16,16,40])

client = caveclient.CAVEclient(datastack_name)
#client.materialize.version = 783 # FlyWire public version
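
The `voxel_resolution` array defined above is used later to relate voxel coordinates to physical space. As a minimal sketch, assuming the values are nanometers per voxel along (x, y, z), which is consistent with the 1.6e-08 m and 4e-08 m dimensions in the visualization URL printed at the end of this tutorial, the conversion is an element-wise multiplication. The coordinate `point_voxels` is hypothetical and chosen only for illustration.

```python
import numpy as np

# Voxel size along (x, y, z), assumed to be in nanometers per voxel.
voxel_resolution = np.array([16, 16, 40])

# Hypothetical voxel coordinate, used only for illustration.
point_voxels = np.array([1000, 2000, 30])

# Element-wise multiplication converts voxel indices to nanometers.
point_nm = point_voxels * voxel_resolution

# Integer division maps nanometers back to voxel indices.
back_to_voxels = point_nm // voxel_resolution

print(point_nm.tolist())        # [16000, 32000, 1200]
print(back_to_voxels.tolist())  # [1000, 2000, 30]
```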


NeurErrors `Dataset_Client` initialization (the `client` argument is optional if you only want to use the other functions of the `Dataset_Client` class):

In [6]:
dataset_client = neurerrors.Dataset_Client(client)

Optional:
Generally, one has a .txt file of the segment IDs they want to verify. One function of the `Dataset_Client` class retrieves the old segment IDs from this file.
It returns a dictionary that maps each root segment ID to the list of old segment IDs associated with it, together with the old segment IDs themselves. The minimum segment size to be considered is set to 10 L2 leaves (argument `min_l2_size`).

In [None]:
input_path = 'your/path/to/seg_ids.txt'
root_to_old_seg_ids, old_seg_ids = dataset_client.get_old_seg_ids_from_txt_file(input_path, min_l2_size=10, show_progress=True)
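
The layout of the .txt file is not specified here. As a hedged sketch, assuming one integer segment ID per line, the helper below (`read_seg_ids`, a hypothetical name that is not part of NeurErrors) shows how such a file can be read and checked before passing its path to `get_old_seg_ids_from_txt_file`.

```python
import tempfile
from pathlib import Path


def read_seg_ids(path):
    """Read one integer segment ID per line, skipping blank lines.

    The one-ID-per-line layout is an assumption made for this sketch.
    """
    ids = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:
            ids.append(int(line))
    return ids


# Write a small example file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    example_path = Path(tmp) / "seg_ids.txt"
    example_path.write_text("720575940624605639\n720575940621549138\n\n")
    seg_ids = read_seg_ids(example_path)

print(seg_ids)  # [720575940624605639, 720575940621549138]
```

Parsing the IDs as Python integers matters: they exceed 2**53, so converting them to floats would silently change their values.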

To build the graph dataset as a list of PyTorch Geometric `Data` objects, pass the segment IDs and the list of attributes to use as node features to the following function.

In [7]:
# Example old segment IDs corresponding to the current root segment ID 720575940612108494
old_seg_ids = [720575940624605639, 720575940621549138, 720575940630876760, 720575940612108494]

attributes_list = ['rep_coord_nm', 'size_nm3', 'area_nm2', 'max_dt_nm', 'mean_dt_nm', 'pca_val']
graph_dataset = dataset_client.build_graph_dataset(old_seg_ids, attributes_list, show_progress=True, verbose=False)

Building L2 graphs: 100%|██████████| 4/4 [00:02<00:00,  1.34ID/s]


This graph dataset is simply a list of `Data` objects, each representing a neuron as a graph of L2 nodes. We can add the error features to the dataset by making a forward pass through the graph of operations.
`associate_error_to_graph_dataset` finds all proofreading actions on each neuron, computes their amplitude, and assigns each one to the closest L2 node in the graph.
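
To make the "closest L2 node" step concrete, here is a minimal sketch of nearest-node assignment with NumPy. It is not the NeurErrors implementation: the coordinates are hypothetical, Euclidean distance in nanometers is an assumption, and the per-node count merely stands in for whatever weighting the package actually applies.

```python
import numpy as np

# Hypothetical representative coordinates (nm) of three L2 nodes.
node_coords_nm = np.array([
    [0.0, 0.0, 0.0],
    [1000.0, 0.0, 0.0],
    [0.0, 2000.0, 0.0],
])

# Hypothetical error coordinates (nm).
error_coords_nm = np.array([
    [900.0, 100.0, 0.0],
    [50.0, 1900.0, 0.0],
    [10.0, 10.0, 10.0],
    [1100.0, -50.0, 0.0],
])

# Pairwise Euclidean distances, shape (num_errors, num_nodes).
diffs = error_coords_nm[:, None, :] - node_coords_nm[None, :, :]
distances = np.linalg.norm(diffs, axis=2)

# Index of the closest node for each error.
nearest_node = distances.argmin(axis=1)

# Number of errors assigned to each node.
error_counts = np.bincount(nearest_node, minlength=len(node_coords_nm))

print(nearest_node.tolist())  # [1, 2, 0, 1]
print(error_counts.tolist())  # [1, 2, 1]
```

The brute-force distance matrix is fine for graphs with a few hundred nodes, like the ones in this tutorial; a spatial index such as a k-d tree would be the usual choice for much larger graphs.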

In [8]:
dataset_client.associate_error_to_graph_dataset(graph_dataset, voxel_resolution=voxel_resolution, show_progress=True)
# Optional: normalize data.x (features) to a mean of 0 and a std of 1 for more stable training.
# dataset_client.normalize_features(graph_dataset)

Finding errors and associating them to graphs: 100%|██████████| 4/4 [00:16<00:00,  4.15s/graph, seg_id=7.21e+17]


[Data(
   x=[218, 10],
   edge_index=[2, 510],
   l2_nodes=[218],
   metadata={ seg_id=720575940624605639 },
   l2_error_weights=[218, 1],
   error_features=[50, 6]
 ),
 Data(
   x=[392, 10],
   edge_index=[2, 930],
   l2_nodes=[392],
   metadata={ seg_id=720575940621549138 },
   l2_error_weights=[392, 1],
   error_features=[49, 6]
 ),
 Data(
   x=[31, 10],
   edge_index=[2, 66],
   l2_nodes=[31],
   metadata={ seg_id=720575940630876760 },
   l2_error_weights=[31, 1],
   error_features=[16, 6]
 ),
 Data(
   x=[43, 10],
   edge_index=[2, 98],
   l2_nodes=[43],
   metadata={ seg_id=720575940612108494 },
   l2_error_weights=[43, 1],
   error_features=[0]
 )]
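
The commented-out `normalize_features` call in the cell above is described as rescaling `data.x` to a mean of 0 and a standard deviation of 1. As a rough sketch of that idea, and not the package's own code, the example below standardizes each feature column of a hypothetical feature matrix. Whether NeurErrors computes the statistics per graph or across the whole dataset is not stated here, so the per-matrix choice is an assumption.

```python
import numpy as np

# Hypothetical feature matrix: 4 nodes (rows) with 2 features (columns).
x = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
])

# Per-feature statistics, computed over the nodes.
mean = x.mean(axis=0)
std = x.std(axis=0)

# Avoid division by zero for constant features.
std = np.where(std == 0, 1.0, std)

# Standardize each feature to zero mean and unit standard deviation.
x_normalized = (x - mean) / std

print(np.allclose(x_normalized.mean(axis=0), 0.0))  # True
print(np.allclose(x_normalized.std(axis=0), 1.0))   # True
```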

We can now visualize an element of the dataset with the following function, which returns a URL that shows the segment together with its error annotations.

In [12]:
# FlyWire
data_point = graph_dataset[0]
url = dataset_client.get_url_for_visualization(seg_id=data_point.metadata['seg_id'], error_features=data_point.error_features, voxel_resolution=voxel_resolution, local_host=True)
# CA3
# url = dataset_client.get_url_for_visualization(seg_id=data_point.metadata['seg_id'], em_data_url=em_data_url_ca3, segmentation_url=segmentation_url_ca3, error_features=data_point.error_features, voxel_resolution=voxel_resolution, local_host=True)
print(url)

http://localhost:8000/client/#!%7B%22dimensions%22%3A%20%7B%22x%22%3A%20%5B1.6e-08%2C%20%22m%22%5D%2C%20%22y%22%3A%20%5B1.6e-08%2C%20%22m%22%5D%2C%20%22z%22%3A%20%5B4e-08%2C%20%22m%22%5D%7D%2C%20%22layers%22%3A%20%5B%7B%22source%22%3A%20%22precomputed%3A//https%3A//bossdb-open-data.s3.amazonaws.com/flywire/fafbv14%22%2C%20%22type%22%3A%20%22image%22%2C%20%22tab%22%3A%20%22source%22%2C%20%22name%22%3A%20%22EM-image%22%7D%2C%20%7B%22tab%22%3A%20%22segments%22%2C%20%22source%22%3A%20%22graphene%3A//middleauth%2Bhttps%3A//prodv1.flywire-daf.com/segmentation/1.0/flywire_public%22%2C%20%22type%22%3A%20%22segmentation%22%2C%20%22segments%22%3A%20%5B%22720575940624605639%22%5D%2C%20%22colorSeed%22%3A%20883605311%2C%20%22name%22%3A%20%22Segmentation%22%7D%2C%20%7B%22tool%22%3A%20%22annotatePoint%22%2C%20%22type%22%3A%20%22annotation%22%2C%20%22transform%22%3A%20%7B%22outputDimensions%22%3A%20%7B%22x%22%3A%20%5B1.8e-08%2C%20%22m%22%5D%2C%20%22y%22%3A%20%5B1.8e-08%2C%20%22m%22%5D%2C%20%22z%22%3A%
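
The printed URL carries its viewer state as percent-encoded JSON after the `#!` marker, which makes it hard to read. The sketch below shows one way to decode such a URL with the standard library so that the layers and segment IDs can be inspected. The `state` dictionary is a hypothetical, heavily shortened stand-in for the real state, and the encoding step only mirrors how the printed URL appears to be built.

```python
import json
from urllib.parse import quote, unquote

# Hypothetical, heavily shortened viewer state, used only for illustration.
state = {
    "dimensions": {"x": [1.6e-08, "m"], "y": [1.6e-08, "m"], "z": [4e-08, "m"]},
    "layers": [{"type": "segmentation", "segments": ["720575940624605639"]}],
}

# Encode the state the way the printed URL appears to be encoded.
url = "http://localhost:8000/client/#!" + quote(json.dumps(state))

# Everything after "#!" is the percent-encoded JSON state.
encoded_state = url.split("#!", 1)[1]
decoded_state = json.loads(unquote(encoded_state))

segments = decoded_state["layers"][0]["segments"]
print(segments)  # ['720575940624605639']
```

The same two decoding lines can be applied to the `url` variable returned by `get_url_for_visualization`, assuming the full, untruncated string is used.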