<a href="https://colab.research.google.com/github/magland/spikeforest_batch_run/blob/master/notebooks/assemble_website_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Assemble website data

This notebook assembles the data for the website.


This is the info sent to Liz on 11/16/18:


Here's the data for the website:
```
kb.loadObject(
    key=dict(
        target='spikeforest_website',
        name='studies'
    )
)

kb.loadObject(
    key=dict(
        target='spikeforest_website',
        name='recordings'
    )
)

kb.loadObject(
    key=dict(
        target='spikeforest_website',
        name='true_units'
    )
)

kb.loadObject(
    key=dict(
        target='spikeforest_website',
        name='sorters'
    )
)
```
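
The four calls differ only in the `name` field of the key, so they can be written as a single loop. The following is a minimal sketch: `load_website_tables` and `WEBSITE_TABLE_NAMES` are illustrative names (not part of kbucket), and the loader is passed in as an argument so the example does not depend on a kbucket connection.

```python
# Names of the four website tables, as used in the keys above.
WEBSITE_TABLE_NAMES = ['studies', 'recordings', 'true_units', 'sorters']


def load_website_tables(load_object, names=WEBSITE_TABLE_NAMES):
    """Load each website table with the given loader (hypothetical helper).

    load_object is expected to accept key=... in the same way as kb.loadObject.
    """
    tables = {}
    for name in names:
        key = dict(target='spikeforest_website', name=name)
        tables[name] = load_object(key=key)
    return tables


# Intended use in this notebook (assumes kb is the kbucket client):
#   tables = load_website_tables(kb.loadObject)
#   studies = tables['studies']['studies']
```

Each loaded object wraps its list in a field with the same name as the table, which is why the last comment indexes `'studies'` twice.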

Temporary locations for convenience (WARNING: don't use these because they are not up to date):

* studies: http://132.249.245.245:24351/7317cea8265b/download/3/84/3840c06ec9f048e29b0f8cfdab5eb9208a2df8a9
* recordings: http://132.249.245.245:24351/7317cea8265b/download/d/81/d81a8498fa880d1e6f7671f0d5b23255ec86ec52
* units: http://132.249.245.245:24351/7317cea8265b/download/3/1c/31cbe8f7b423301665f4c48f11a689a7567f33c7
* sorters: http://132.249.245.245:24351/7317cea8265b/download/8/e5/8e5e7cb801475b2a171228ad50fc4d6ec9bb2689


Each of these is a table. The records look like this:

```
**** Study:
{
    "name": "bionet_drift",
    "study_set": "bionet",
    "directory": "kbucket://15734439d8cf/groundtruth/bionet/bionet_drift",
    "description": "...",
    "sorters": [
        "MountainSort4-thr3",
        "IronClust-drift"
    ]
}

**** Recording:
{
    "name": "001_synth",
    "study": "magland_synth_noise10_K10_C4",
    "directory": "kbucket://15734439d8cf/groundtruth/magland_synth/datasets_noise10_K10_C4/001_synth",
    "description": "One of the recordings in the magland_synth_noise10_K10_C4 study",
    "computed_info": {
        "samplerate": 30000.0,
        "num_channels": 4,
        "duration_sec": 600.0
    },
    "plots": {
        "timeseries": "sha1://cd0958b1f58faed6764e703deb3b70e59b2f1d27/timeseries.jpg",
        "waveforms_true": "sha1://70925b4a481ba6119623a6a487429fe898e18e54/waveforms.jpg"
    },
    "firings_true": "kbucket://15734439d8cf/groundtruth/magland_synth/datasets_noise10_K10_C4/001_synth/firings_true.mda",
    "true_units_info": "sha1://eec30f7c5e1d97ff6078110427e2f397c117790c/true_units_info.json"
}

**** Unit:
{
    "unit_id": 1,
    "snr": 25.396783859187707,
    "peak_channel": 0,
    "num_events": 1398,
    "firing_rate": 2.33,
    "study": "magland_synth_noise10_K10_C4",
    "recording": "001_synth",
    "sorting_results": {
        "MountainSort4-thr3": {
            "Unit ID": 2,
            "Accuracy": "0.99",
            "Best unit": 6,
            "Matched unit": 6,
            "f.n.": "0.00",
            "f.p.": "0.01",
            "# matches": 1394
        },
        "IronClust-tetrode": {
            "Unit ID": 2,
            "Accuracy": "0.98",
            "Best unit": 7,
            "Matched unit": 7,
            "f.n.": "0.00",
            "f.p.": "0.01",
            "# matches": 1381
        },
        "SpykingCircus": {
            "Unit ID": 2,
            "Accuracy": "0.99",
            "Best unit": 8,
            "Matched unit": 8,
            "f.n.": "0.00",
            "f.p.": "0.01",
            "# matches": 1384
        }
    }
}

**** Sorter:
{
    "name": "MountainSort4-thr3",
    "processor_name": "MountainSort4",
    "params": {
        "detect_sign": -1,
        "adjacency_radius": 100,
        "detect_threshold": 3
    }
}
```
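
The records refer to each other by name: a unit's `study` and `recording` fields match a recording's `study` and `name`, and the keys of `sorting_results` match sorter names. The sketch below shows one way to follow the unit-to-recording reference. The helper names are hypothetical, and it assumes that a recording name is unique within its study, which is why the index is keyed on the pair.

```python
def index_recordings(recordings):
    """Map (study name, recording name) -> recording record."""
    return {(rec['study'], rec['name']): rec for rec in recordings}


def recording_for_unit(unit, recordings_index):
    """Return the recording that a unit record refers to, or None if absent."""
    return recordings_index.get((unit['study'], unit['recording']))


# Toy records containing only the fields needed for the join.
recordings = [
    {'name': '001_synth', 'study': 'magland_synth_noise10_K10_C4',
     'computed_info': {'samplerate': 30000.0, 'num_channels': 4,
                       'duration_sec': 600.0}},
]
unit = {'unit_id': 1, 'study': 'magland_synth_noise10_K10_C4',
        'recording': '001_synth'}

rec = recording_for_unit(unit, index_recordings(recordings))
print(rec['computed_info']['duration_sec'])
```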

Right now there are:
```
3 sorters
8 studies
80 recordings
1200 true units
3360 sorted units
```

Note that num_true_units*num_sorters should equal num_sorted_units.

However, 1200*3 = 3600, not 3360. That means 240 (unit, sorter) pairs have no sorting result and therefore no accuracy. For those, assign an accuracy of 0.
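
One way to apply the assign-0 rule when reading the `true_units` table is sketched below. The function names are illustrative, not part of any library. The sketch relies on the record layout shown above, in which `Accuracy` is stored as a string and a missing sorter simply has no entry in `sorting_results`.

```python
def accuracy_or_zero(unit, sorter_name):
    """Return the unit's accuracy for a sorter, or 0.0 if there is no result."""
    result = unit['sorting_results'].get(sorter_name)
    if result is None:
        return 0.0
    # Accuracy is stored as a string such as "0.99" in the unit records.
    return float(result['Accuracy'])


def count_missing_results(true_units, sorter_names):
    """Count the (unit, sorter) pairs that have no sorting result."""
    return sum(
        1
        for unit in true_units
        for sorter_name in sorter_names
        if sorter_name not in unit['sorting_results']
    )


# Toy data: the second unit has no SpykingCircus result.
sorter_names = ['MountainSort4-thr3', 'SpykingCircus']
true_units = [
    {'unit_id': 1, 'sorting_results': {
        'MountainSort4-thr3': {'Accuracy': '0.99'},
        'SpykingCircus': {'Accuracy': '0.99'}}},
    {'unit_id': 2, 'sorting_results': {
        'MountainSort4-thr3': {'Accuracy': '0.98'}}},
]

print(count_missing_results(true_units, sorter_names))
print(accuracy_or_zero(true_units[1], 'SpykingCircus'))
```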



In [0]:
%%capture
# Only run this cell if you are running this on a hosted runtime that does not have these packages installed
# Consider connecting to a local runtime
# Note: the %%capture cell magic must be the first line of the cell
!pip install spikeforest

In [0]:
from kbucket import client as kb
import spikeforest as sf
import spikeinterface as si
import json

In [0]:
## Configure read/write access to kbucket
sf.kbucketConfigRemote(name='spikeforest1-readwrite',ask_password=True)

Enter password: ··········
Pairio user set to spikeforest. Test succeeded.


In [0]:
batch_names=[
    'summarize_recordings',
    'ms4_magland_synth_dev4',
    'irc_magland_synth_dev4',
    'sc_magland_synth_dev4',
    'ms4_bionet',
    'irc_bionet'
]

In [0]:
# Split the job results of every batch into sorting results and recording summaries
all_sorting_results=[]
all_summarize_recording_results=[]
for bname in batch_names:
  print('Loading '+bname)
  obj=kb.loadObject(key=dict(batch_name=bname,name='job_results'))
  job_results=obj['job_results']
  for res in job_results:
    if res['job']['command']=='sort_recording':
      all_sorting_results.append(res)
    elif res['job']['command']=='summarize_recording':
      # Keep only the summaries that have a true_units_info field
      if 'true_units_info' in res['result']:
        all_summarize_recording_results.append(res)
      else:
        print('WARNING: no field, true_units_info, skipping.')
        display(res)

Loading summarize_recordings


{'job': {'command': 'summarize_recording',
  'label': 'M5_2018-03-06_15-34-44-tetrode0',
  'recording': {'name': 'M5_2018-03-06_15-34-44-tetrode0',
   'study': 'testing',
   'directory': 'kbucket://15734439d8cf/testing/M5_2018-03-06_15-34-44',
   'channels': [3, 4, 5, 6],
   'description': 'One of the recordings in the testing study (tetrode=0)'}},
 'result': {'name': 'M5_2018-03-06_15-34-44-tetrode0',
  'study': 'testing',
  'directory': 'kbucket://15734439d8cf/testing/M5_2018-03-06_15-34-44',
  'channels': [3, 4, 5, 6],
  'description': 'One of the recordings in the testing study (tetrode=0)',
  'computed_info': {'samplerate': 30000.0,
   'num_channels': 4,
   'duration_sec': 1928.3968},
  'plots': {'timeseries': 'sha1://3a4f0316fd0c19452c0e3ce767ecec6adc77790c/timeseries.jpg'}}}



{'job': {'command': 'summarize_recording',
  'label': 'M5_2018-03-06_15-34-44-tetrode1',
  'recording': {'name': 'M5_2018-03-06_15-34-44-tetrode1',
   'study': 'testing',
   'directory': 'kbucket://15734439d8cf/testing/M5_2018-03-06_15-34-44',
   'channels': [7, 8, 9, 10],
   'description': 'One of the recordings in the testing study (tetrode=1)'}},
 'result': {'name': 'M5_2018-03-06_15-34-44-tetrode1',
  'study': 'testing',
  'directory': 'kbucket://15734439d8cf/testing/M5_2018-03-06_15-34-44',
  'channels': [7, 8, 9, 10],
  'description': 'One of the recordings in the testing study (tetrode=1)',
  'computed_info': {'samplerate': 30000.0,
   'num_channels': 4,
   'duration_sec': 1928.3968},
  'plots': {'timeseries': 'sha1://9069e1ce8c1059f327169f7532e3018ae58c5789/timeseries.jpg'}}}



{'job': {'command': 'summarize_recording',
  'label': 'M5_2018-03-06_15-34-44-tetrode2',
  'recording': {'name': 'M5_2018-03-06_15-34-44-tetrode2',
   'study': 'testing',
   'directory': 'kbucket://15734439d8cf/testing/M5_2018-03-06_15-34-44',
   'channels': [11, 12, 13, 14],
   'description': 'One of the recordings in the testing study (tetrode=2)'}},
 'result': {'name': 'M5_2018-03-06_15-34-44-tetrode2',
  'study': 'testing',
  'directory': 'kbucket://15734439d8cf/testing/M5_2018-03-06_15-34-44',
  'channels': [11, 12, 13, 14],
  'description': 'One of the recordings in the testing study (tetrode=2)',
  'computed_info': {'samplerate': 30000.0,
   'num_channels': 4,
   'duration_sec': 1928.3968},
  'plots': {'timeseries': 'sha1://752a1e34d49841e4699abbcbf96fdd4edfdcbe7a/timeseries.jpg'}}}

Loading ms4_magland_synth_dev4
Loading irc_magland_synth_dev4
Loading sc_magland_synth_dev4
Loading ms4_bionet
Loading irc_bionet


In [0]:
import json
def load_json(fname):
  fname=kb.realizeFile(fname)
  with open(fname) as f:
    return json.load(f)

def use_study(name):
  # Only studies whose names start with 'magland' or 'bionet' are included
  return name.startswith(('magland','bionet'))

## Load the studies
print('Loading studies')
all_studies=[]
studies_by_name=dict()
obj=kb.loadObject(key=dict(name='spikeforest_recordings'))
studies=obj['studies']
for study in studies:
  study['sorters']=[] # initialize
  studies_by_name[study['name']]=study
  if use_study(study['name']):
    all_studies.append(study)

## Load the recordings
print('Loading recordings')
all_recordings=[]
for res in all_summarize_recording_results:
  recording=res['result']
  if use_study(recording['study']):
    all_recordings.append(recording)
    
## Load the units
all_true_units=[]
unit_lookup=dict()
print('Loading summarize recording results')
for res in all_summarize_recording_results:
  study=res['job']['recording']['study']
  recording=res['job']['recording']['name']
  if use_study(study):
    obj=load_json(res['result']['true_units_info'])
    for unit in obj:
      unit['study']=study
      unit['recording']=recording
      unit['sorting_results']=dict()
      all_true_units.append(unit)
      code=study+'---'+recording+'---'+str(unit['unit_id'])
      unit_lookup[code]=unit
  #res['result']['true_units_info_data']=obj
print('Found {} true units'.format(len(all_true_units)))

## Load the sorting results
print('Loading sorting results')
count=0
sorters_by_name=dict()
for res in all_sorting_results:
  study=res['job']['recording']['study']
  recording=res['job']['recording']['name']
  sorter=res['job']['sorter']['name']
  if sorter not in studies_by_name[study]['sorters']:
    studies_by_name[study]['sorters'].append(sorter)
  if use_study(study):
    obj=load_json(res['result']['comparison_with_truth']['json'])
    sorters_by_name[sorter]=res['job']['sorter']
    for unit_id in obj:
      unit=obj[unit_id]
      code=study+'---'+recording+'---'+str(unit_id)
      if code in unit_lookup:
        unit_lookup[code]['sorting_results'][sorter]=unit
        count=count+1
print('Loaded {} sorted units'.format(count))

all_sorters=[]
for sname in sorters_by_name:
  all_sorters.append(sorters_by_name[sname])
print('Found {} sorters'.format(len(all_sorters)))

print('Saving {} studies'.format(len(all_studies)))
kb.saveObject(
    key=dict(
        target='spikeforest_website',
        name='studies'
    ),
    object=dict(
        studies=all_studies
    )
)
    
print('Saving {} recordings'.format(len(all_recordings)))
kb.saveObject(
    key=dict(
        target='spikeforest_website',
        name='recordings'
    ),
    object=dict(
        recordings=all_recordings
    )
)


print('Saving units')
kb.saveObject(
    key=dict(
        target='spikeforest_website',
        name='true_units'
    ),
    object=dict(
        true_units=all_true_units
    )
)

print('Saving sorters')
kb.saveObject(
    key=dict(
        target='spikeforest_website',
        name='sorters'
    ),
    object=dict(
        sorters=all_sorters
    )
)

  
#print('Loading summarize recording results')
#for res in all_summarize_recording_results:
#  obj=load_json(res['result']['true_units_info'])
#  res['result']['true_units_info_data']=obj

Loading studies
Loading recordings
Loading summarize recording results
Found 26688 true units
Loading sorting results
Loaded 9530 sorted units
Found 4 sorters
Saving 11 studies
Saving 116 recordings
Already on server.
Saving units
Already on server.
Saving sorters
Already on server.
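
The cell above matches sorting results to true units through a composite string key of the form `study---recording---unit_id`. A self-contained sketch of that join on toy data follows. The function names and the shape of the `comparisons` argument are assumptions made for illustration. Passing the unit ID through `str()` matters because the true-unit records store it as an integer, while the keys of a JSON object are always strings.

```python
def unit_code(study, recording, unit_id):
    """Build the composite lookup key used to match units across tables."""
    return study + '---' + recording + '---' + str(unit_id)


def attach_sorting_results(true_units, comparisons):
    """Attach per-sorter results to the matching true units.

    comparisons is a list of (study, recording, sorter_name, results_by_unit_id)
    tuples (a hypothetical shape chosen for this sketch).
    Returns the number of results that were attached.
    """
    lookup = {
        unit_code(u['study'], u['recording'], u['unit_id']): u
        for u in true_units
    }
    count = 0
    for study, recording, sorter_name, results_by_unit_id in comparisons:
        for unit_id, result in results_by_unit_id.items():
            code = unit_code(study, recording, unit_id)
            if code in lookup:
                lookup[code].setdefault('sorting_results', {})[sorter_name] = result
                count += 1
    return count


# Toy data: unit IDs are ints in the unit records but strings in the comparison.
true_units = [{'unit_id': 1, 'study': 's1', 'recording': 'r1'}]
comparisons = [
    ('s1', 'r1', 'SorterA', {'1': {'Accuracy': '0.99'}, '7': {'Accuracy': '0.10'}}),
]
print(attach_sorting_results(true_units, comparisons))
```

Results whose key has no matching true unit (unit `7` in the toy data) are silently skipped, as in the cell above.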


In [0]:
print('Study:')
obj=kb.loadObject(
    key=dict(
        target='spikeforest_website',
        name='studies'
    )
)
print(json.dumps(obj['studies'][0],indent=4))

print('Recording:')
obj=kb.loadObject(
    key=dict(
        target='spikeforest_website',
        name='recordings'
    )
)
print(json.dumps(obj['recordings'][0],indent=4))


print('Unit:')
obj=kb.loadObject(
    key=dict(
        target='spikeforest_website',
        name='true_units'
    )
)
print(json.dumps(obj['true_units'][0],indent=4))

print('Sorter:')
obj=kb.loadObject(
    key=dict(
        target='spikeforest_website',
        name='sorters'
    )
)
print(json.dumps(obj['sorters'][0],indent=4))

Study:
{
    "name": "bionet_drift",
    "study_set": "bionet",
    "directory": "kbucket://15734439d8cf/groundtruth/bionet/bionet_drift",
    "description": "\nThe dataset is collected by Brian D. Allen from Ed Boyden's lab. The intracellular and extracellular voltages were recorded simultaneously.\nExtracellular voltages were recorded using 128 or 256 site silicon probes custom made at Boyden lab (9x9 um site dimension, 11x11 um site pitch).\n128-channel probe has 64x2 site grid pattern and 256-channel probe has 64x4 site grid pattern.\nBad or shorted sites are excluded based on the experimenter's criteria. The number of channels exported excludes the channels from bad sites. \nBursting spikes (ISI<20ms) are kept up to three successive spikes using the experimeter's burst creteria.\n\nFor more info, visit the publication website:\nhttps://www.physiology.org/doi/10.1152/jn.00650.2017\nAutomated in vivo patch clamp evaluation of extracellular multielectrode array spike recording capabi

In [0]:
print('studies: '+kb.findFile(key=dict(
    target='spikeforest_website',
    name='studies'
),local=False,remote=True))

print('recordings: '+kb.findFile(key=dict(
    target='spikeforest_website',
    name='recordings'
),local=False,remote=True))

print('units: '+kb.findFile(key=dict(
    target='spikeforest_website',
    name='true_units'
),local=False,remote=True))

print('sorters: '+kb.findFile(key=dict(
    target='spikeforest_website',
    name='sorters'
),local=False,remote=True))

studies: http://132.249.245.245:24351/7317cea8265b/download/8/50/850bdb6be7707150160c361efafcc0c7d39a4752
recordings: http://132.249.245.245:24351/7317cea8265b/download/b/eb/beb63dd5b4dda4ba49eb08c3af7c0319a07b8f38
units: http://132.249.245.245:24351/7317cea8265b/download/6/e5/6e5838eadebcc6c8a490ab919f3685f1c6e389e0
sorters: http://132.249.245.245:24351/7317cea8265b/download/f/b0/fb04a740198198d8f6fe25e64e73870c55558546
