# NeurErrors: Accessing segmentation errors and predicting error nodes at the L2 level

This tutorial gives a high-level overview of how to access FlyWire's error dataset through CAVEclient, the Python client for the [Connectome Annotation Versioning Engine (CAVE)](https://www.biorxiv.org/content/10.1101/2023.07.26.550598v1.abstract).

CAVE supports proofreading of datasets and their analysis even while proofreading is ongoing.

NeurErrors is a Python package for visualizing and analyzing the errors that proofreaders found in the segmented connectome. It has two parts. The first acquires the errors found for any segment ID over time. The second builds a graph dataset at the L2 resolution of the chunked graph and associates error coordinates with each node of the graph for graph machine learning.


## Installing NeurErrors

To install NeurErrors, you can use the following command:

`pip install git+https://github.com/raphaellevisse/NeurErrors.git`

Inside a notebook, you can install NeurErrors with the following command:

In [None]:
!pip install git+https://github.com/raphaellevisse/NeurErrors.git

## Imports

Make sure you have an account for CAVEclient, which is core to the NeurErrors package and lets you access and query the chunked graph. Here is the link to the [CAVEclient tutorial](https://github.com/seung-lab/FlyConnectome/blob/main/CAVE%20tutorial.ipynb).

Let's start by importing the necessary libraries:

In [1]:
import neurerrors
import caveclient
import numpy as np
import torch
import torch_geometric
import tqdm
import pandas

We then start a session with CAVEclient and use the `Dataset_Client` class to build the error dataset and the associated graph dataset. The two can be built independently.

In [2]:
seg_id = [720575940638929825]


In [3]:
### Caveclient initialization ###
datastack_name = "flywire_fafb_public"
voxel_resolution = np.array([16,16,40])

client = caveclient.CAVEclient(datastack_name)
#client.materialize.version = 783 # FlyWire public version


Next, initialize the NeurErrors `Dataset_Client`. The `client` argument is optional if you only want to use the other functions of the `Dataset_Client` class.

In [4]:
dataset_client = neurerrors.Dataset_Client(client)

Optional:
Generally, you will have a .txt file of the segment IDs you want to verify. One function of the `Dataset_Client` class builds the old segment IDs from this file.
It returns a dictionary that maps each root segment ID to the list of old segment IDs associated with it. The minimum size for a segment to be considered is set to 10 L2 leaves (argument `min_l2_size`).

In [None]:
input_path = 'neurerrors/data/seg_ids/ID_tests.txt'
root_to_old_seg_ids, old_seg_ids = dataset_client.get_old_seg_ids_from_txt_file(input_path, min_l2_size=10, show_progress=True)
print(root_to_old_seg_ids)
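
If you need to prepare or check such a file yourself, the snippet below is a minimal sketch of reading segment IDs from a text file. It assumes one integer ID per line, with blank lines and `#` comments ignored; the actual format expected by `get_old_seg_ids_from_txt_file` may differ. The helper `read_segment_ids` is hypothetical and not part of NeurErrors.

```python
from pathlib import Path
import tempfile


def read_segment_ids(path):
    """Read one integer segment ID per line (assumed format).

    Blank lines and '#' comments are skipped; duplicates are dropped
    while keeping the order of first appearance.
    """
    ids = []
    seen = set()
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        seg_id = int(line)
        if seg_id not in seen:
            seen.add(seg_id)
            ids.append(seg_id)
    return ids


# Usage with a throwaway file; the IDs are the ones used in this tutorial.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "ids.txt"
    demo_path.write_text(
        "720575940638929825\n\n# a comment\n720575940620826450\n720575940638929825\n"
    )
    demo_ids = read_segment_ids(demo_path)
print(demo_ids)
```

Segment IDs are 64-bit integers, so they are parsed with `int` rather than `float` to avoid losing precision.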

To build the graph dataset as a list of PyTorch Geometric Data objects, use the following function and pass it the list of node attributes to include as features.

In [None]:
#old_seg_ids = [720575940620826450] # Example with one old segment ID
attributes_list = ['rep_coord_nm', 'size_nm3', 'area_nm2', 'max_dt_nm', 'mean_dt_nm', 'pca_val']
graph_dataset = dataset_client.build_graph_dataset(old_seg_ids, attributes_list, show_progress=True, verbose=False)
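
To give an intuition for what a graph of L2 nodes looks like as data, here is a minimal sketch in plain Python. It flattens per-node attributes into one feature row per node and stores undirected edges in the `2 x num_edges` COO layout that PyTorch Geometric uses for `edge_index`. The helpers `build_feature_rows` and `to_coo`, the attribute values, and the way attributes are concatenated are all assumptions for illustration; how `build_graph_dataset` lays out its features internally may differ.

```python
def build_feature_rows(nodes, attribute_names):
    """Concatenate the requested attributes of each node into one flat row."""
    rows = []
    for node in nodes:
        row = []
        for name in attribute_names:
            value = node[name]
            if isinstance(value, (list, tuple)):
                row.extend(float(v) for v in value)  # vector attribute
            else:
                row.append(float(value))  # scalar attribute
        rows.append(row)
    return rows


def to_coo(edges):
    """Turn undirected (u, v) pairs into a [sources, targets] edge index.

    Each undirected edge is stored in both directions.
    """
    sources, targets = [], []
    for u, v in edges:
        sources += [u, v]
        targets += [v, u]
    return [sources, targets]


# Made-up nodes with two of the attributes named in this tutorial.
demo_graph_nodes = [
    {"rep_coord_nm": (0.0, 0.0, 0.0), "size_nm3": 5.0},
    {"rep_coord_nm": (16.0, 32.0, 40.0), "size_nm3": 7.5},
    {"rep_coord_nm": (32.0, 32.0, 80.0), "size_nm3": 2.0},
]
demo_features = build_feature_rows(demo_graph_nodes, ["rep_coord_nm", "size_nm3"])
demo_edge_index = to_coo([(0, 1), (1, 2)])
print(demo_features)
print(demo_edge_index)
```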

This graph dataset is simply a list of Data objects, one per neuron, each representing the neuron as a graph of L2 nodes. We can add the error features to the dataset by making a forward pass through the graph of operations.
`associate_error_to_graph_dataset` finds all proofreading actions on the neuron, computes their amplitude, and assigns each one to the closest L2 node in the graph.

In [None]:
# Attach the error features to every graph in the dataset.
# Add find_operation_weights=True to also find the weights of the operations; this adds a big time cost.
dataset_client.associate_error_to_graph_dataset(graph_dataset, voxel_resolution=voxel_resolution, show_progress=True)
# Optional: normalize data.x (the features) to a mean of 0 and a std of 1 for more stable training.
# The FlyWire normalization is a standard normalization over 20,000 neurons of the FlyWire public dataset.
#dataset_client.normalize_features(graph_dataset, flywire_normalization=True)
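
The idea of assigning each error to its closest L2 node can be sketched as follows. This is an illustration under stated assumptions, not the NeurErrors implementation: error coordinates are taken to be in voxels and node coordinates in nanometres, distances are Euclidean, and the operation amplitude is ignored. The helper `assign_errors_to_nodes` and all coordinates are made up.

```python
import math


def assign_errors_to_nodes(node_coords_nm, error_coords_voxel, voxel_resolution):
    """Count, for each node, the errors whose nearest node it is."""
    counts = [0] * len(node_coords_nm)
    for error in error_coords_voxel:
        # Convert voxel indices to nanometres so both sides share units.
        error_nm = [c * r for c, r in zip(error, voxel_resolution)]
        nearest = min(
            range(len(node_coords_nm)),
            key=lambda i: math.dist(node_coords_nm[i], error_nm),
        )
        counts[nearest] += 1
    return counts


demo_nodes_nm = [(0.0, 0.0, 0.0), (1600.0, 1600.0, 4000.0), (3200.0, 0.0, 0.0)]
demo_errors_voxel = [(95, 102, 99), (1, 2, 0), (190, 5, 1), (105, 98, 101)]
demo_error_counts = assign_errors_to_nodes(
    demo_nodes_nm, demo_errors_voxel, voxel_resolution=(16, 16, 40)
)
print(demo_error_counts)
```

The brute-force search is quadratic in the worst case; for large neurons a spatial index such as a k-d tree would be the usual choice.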

We can now visualize the dataset with the following functions.

In [None]:
# FlyWire
data_point = graph_dataset[1]
print(data_point)
url = dataset_client.get_url_for_visualization(seg_id=data_point.metadata['seg_id'], error_features=data_point.error_features, voxel_resolution=voxel_resolution, local_host=True)
# CA3
# url = dataset_client.get_url_for_visualization(seg_id=data_point.metadata['seg_id'], em_data_url=em_data_url_ca3, segmentation_url=segmentation_url_ca3, error_features=data_point.error_features, voxel_resolution=voxel_resolution, local_host=True)
print(url)

## Making decisions on the dataset

With these graph datasets, you can now train a model to decide which nodes contain errors. A model pretrained on the FlyWire public dataset is already available and can be used directly.

In [None]:
node_decisions = dataset_client.model_inference([graph_dataset[0]], threshold=0.5, pretrained_weights_path="neurerrors/models/training/weights/19000-256-best-3_5.pt")
#print(node_decisions)
predicted_l2_nodes = graph_dataset[0].l2_nodes[node_decisions]
print(predicted_l2_nodes)
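
The indexing in the cell above suggests that `node_decisions` acts as a per-node mask over `l2_nodes`. The sketch below shows that pattern in plain Python: per-node scores are thresholded into a boolean mask, and the mask selects the flagged node IDs. The helper `select_flagged_nodes`, the scores, and the node IDs are made up, and whether NeurErrors compares with a strict or an inclusive inequality is an assumption.

```python
def select_flagged_nodes(l2_node_ids, probabilities, threshold=0.5):
    """Return a boolean mask and the node IDs whose score exceeds the threshold."""
    if len(l2_node_ids) != len(probabilities):
        raise ValueError("one probability per L2 node is required")
    # Strict comparison is an assumption; a score equal to the threshold is not flagged.
    mask = [p > threshold for p in probabilities]
    flagged = [node for node, keep in zip(l2_node_ids, mask) if keep]
    return mask, flagged


demo_l2_nodes = [101, 102, 103, 104]  # made-up node IDs
demo_probs = [0.10, 0.72, 0.50, 0.91]  # made-up model scores
demo_mask, demo_flagged = select_flagged_nodes(demo_l2_nodes, demo_probs, threshold=0.5)
print(demo_flagged)
```

Raising the threshold flags fewer nodes and lowers the number of false positives, at the cost of missing more true errors.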


In [None]:
url = dataset_client.get_url_for_visualization(seg_id=graph_dataset[0].metadata['seg_id'], error_features=graph_dataset[0].error_features, local_host=True, l2_nodes=predicted_l2_nodes)
print(url)