In [None]:
library(catmaid)
library(dplyr)
library(tidyr)
library(elmr)

dplyr is a widely used R package for manipulating data frames. For installation see: https://www.r-project.org/nosvn/pandoc/dplyr.html

The cheat sheet has some good information: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf

The example problem below demonstrates some uses of dplyr, focussing on generating and manipulating neuron data.

## Read skids and neurons (in this case a selection of 10 excitatory and 10 inhibitory mPNs)

In [2]:
excitatory.mPNs.sk = catmaid_skids("Rclub_1807_Ex")
inhibitory.mPNs.sk = catmaid_skids("Rclub_1807_In")
excitatory.mPNs = read.neurons.catmaid(excitatory.mPNs.sk) #not used here
inhibitory.mPNs = read.neurons.catmaid(inhibitory.mPNs.sk) #not used here

## Construct data frames with information about each neuron

In [4]:
#excitatory
excitatory.df = data.frame(skid = excitatory.mPNs.sk, 
                           name = catmaid_get_neuronnames(excitatory.mPNs.sk), 
                           row.names = NULL)
#inhibitory
inhibitory.df = data.frame(skid = inhibitory.mPNs.sk, 
                           name = catmaid_get_neuronnames(inhibitory.mPNs.sk),
                           row.names = NULL)

excitatory.df
inhibitory.df

skid,name
<int>,<fct>
32793,Multiglomerular mALT PN 32794 Flywalkies JMR
37212,Multiglomerular PN unknown 37213 JMR
57019,Multiglomerular mALT PN DA1 Type 1b R 57020 GA
57039,Multiglomerular mALT PN 57040 IJA
57076,Multiglomerular mALT PN 57077 ML
57126,Multiglomerular PN mALT 57127 LK
57179,Multiglomerular PN mALT DA1 Type 2 R 57180 LK
57204,Multiglomerular PN mALT 57205 IJA
57208,Multiglomerular PN mALT 57209 IJA
57216,Multiglomerular PN mALT 57217 GA


skid,name
<int>,<fct>
2766186,mPN mlALT VP1+VA1v+DL2v+DA1+33 LTS 0.40 2766187 Nibelung RJVR
3003322,mPN mlALT VP1+DP1m+33 LTS 0.45 3003323 Twin of Nibelung RJVR
3813335,mPN mlALT DM2+5 LTS 0.97 3813336 ASB
3813403,mPN mlALT DC4+4 LTS 0.99 3813404 RJVR
3813424,mPN mlALT VP1+5 LTS 0.99 3813425 RJVR
3813434,mPN mlALT VM5d+4 LTS 0.98 3813435 RJVR
3813442,mPN mlALT VP1+2 LTS 0.99 3813443 RJVR
3813483,mPN mlALT VP1+5 LTS 0.98 3813484 ASB
3842493,mPN mlALT D+DL1+VP1+4 LTS 0.93 3842494 Midas SD
3903440,mPN mlALT VC1+DC3+5 LTS 0.94 3903441 Frieda SD
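
The `name` column in the tables above is stored as a factor (`<fct>`), which is how older versions of R treat strings in `data.frame()` by default. A minimal sketch of keeping names as plain character strings with `stringsAsFactors = FALSE`; the skids and names here are made-up toy values, not the result of a CATMAID query:

```r
# Toy values standing in for catmaid_skids() / catmaid_get_neuronnames() output
toy.sk = c(101L, 102L, 103L)
toy.names = c("toy PN A", "toy PN B", "toy PN C")

toy.df = data.frame(skid = toy.sk,
                    name = toy.names,
                    row.names = NULL,
                    stringsAsFactors = FALSE) # keep names as character, not factor
str(toy.df)
```

Character columns avoid coercion warnings when data frames with different sets of names are combined later on.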


## Add some additional information about tract and type

In [5]:
#the pipe operator %>% takes the output of one function and uses it as the input for another
excitatory.df %>% 
  mutate(type = "excitatory") %>% 
  mutate(tract = "mALT") -> excitatory.df
inhibitory.df %>% 
  mutate(type = "inhibitory") %>%
  mutate(tract = "mlALT") -> inhibitory.df

excitatory.df
inhibitory.df

skid,name,type,tract
<int>,<fct>,<chr>,<chr>
32793,Multiglomerular mALT PN 32794 Flywalkies JMR,excitatory,mALT
37212,Multiglomerular PN unknown 37213 JMR,excitatory,mALT
57019,Multiglomerular mALT PN DA1 Type 1b R 57020 GA,excitatory,mALT
57039,Multiglomerular mALT PN 57040 IJA,excitatory,mALT
57076,Multiglomerular mALT PN 57077 ML,excitatory,mALT
57126,Multiglomerular PN mALT 57127 LK,excitatory,mALT
57179,Multiglomerular PN mALT DA1 Type 2 R 57180 LK,excitatory,mALT
57204,Multiglomerular PN mALT 57205 IJA,excitatory,mALT
57208,Multiglomerular PN mALT 57209 IJA,excitatory,mALT
57216,Multiglomerular PN mALT 57217 GA,excitatory,mALT


skid,name,type,tract
<int>,<fct>,<chr>,<chr>
2766186,mPN mlALT VP1+VA1v+DL2v+DA1+33 LTS 0.40 2766187 Nibelung RJVR,inhibitory,mlALT
3003322,mPN mlALT VP1+DP1m+33 LTS 0.45 3003323 Twin of Nibelung RJVR,inhibitory,mlALT
3813335,mPN mlALT DM2+5 LTS 0.97 3813336 ASB,inhibitory,mlALT
3813403,mPN mlALT DC4+4 LTS 0.99 3813404 RJVR,inhibitory,mlALT
3813424,mPN mlALT VP1+5 LTS 0.99 3813425 RJVR,inhibitory,mlALT
3813434,mPN mlALT VM5d+4 LTS 0.98 3813435 RJVR,inhibitory,mlALT
3813442,mPN mlALT VP1+2 LTS 0.99 3813443 RJVR,inhibitory,mlALT
3813483,mPN mlALT VP1+5 LTS 0.98 3813484 ASB,inhibitory,mlALT
3842493,mPN mlALT D+DL1+VP1+4 LTS 0.93 3842494 Midas SD,inhibitory,mlALT
3903440,mPN mlALT VC1+DC3+5 LTS 0.94 3903441 Frieda SD,inhibitory,mlALT


These functions can also be called on their own, but then the data frame has to be passed as the first argument, e.g.

mutate(df, type = "excitatory")

is equivalent to

df %>%
  mutate(type = "excitatory")
  
The latter just allows you to pipe long chains of commands together without the need to make a load of intermediary variables.
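
As a rough illustration of that equivalence, the sketch below builds the same result three ways on a small made-up data frame (`df` and its values are hypothetical, not taken from CATMAID): nested calls, a piped chain, and a single `mutate()` call that adds both columns at once.

```r
library(dplyr)

# Hypothetical toy data frame, not taken from CATMAID
df = data.frame(skid = c(1L, 2L), name = c("a", "b"))

# Nested calls: read from the inside out
nested = mutate(mutate(df, type = "excitatory"), tract = "mALT")

# Piped calls: read from top to bottom
piped = df %>%
  mutate(type = "excitatory") %>%
  mutate(tract = "mALT")

# A single mutate() call can also add several columns at once
single = df %>% mutate(type = "excitatory", tract = "mALT")

all.equal(nested, piped)
all.equal(piped, single)
```

Which form to use is mostly a matter of readability; the piped version tends to scale better as more steps are added.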


## Merge data frames and filter by some variable

In [7]:
bind_rows(excitatory.df, inhibitory.df) -> merge.df #bind_rows() stacks the rows of both data frames, matching columns by name
#filter by type
merge.df %>%
  filter(type == "excitatory")
#filter by tract
merge.df %>%
  filter(tract == "mlALT")
#filter by names containing "VP1"
merge.df %>%
  filter(grepl("VP1", name))

“binding character and factor vector, coercing into character vector”

skid,name,type,tract
<int>,<chr>,<chr>,<chr>
32793,Multiglomerular mALT PN 32794 Flywalkies JMR,excitatory,mALT
37212,Multiglomerular PN unknown 37213 JMR,excitatory,mALT
57019,Multiglomerular mALT PN DA1 Type 1b R 57020 GA,excitatory,mALT
57039,Multiglomerular mALT PN 57040 IJA,excitatory,mALT
57076,Multiglomerular mALT PN 57077 ML,excitatory,mALT
57126,Multiglomerular PN mALT 57127 LK,excitatory,mALT
57179,Multiglomerular PN mALT DA1 Type 2 R 57180 LK,excitatory,mALT
57204,Multiglomerular PN mALT 57205 IJA,excitatory,mALT
57208,Multiglomerular PN mALT 57209 IJA,excitatory,mALT
57216,Multiglomerular PN mALT 57217 GA,excitatory,mALT


skid,name,type,tract
<int>,<chr>,<chr>,<chr>
2766186,mPN mlALT VP1+VA1v+DL2v+DA1+33 LTS 0.40 2766187 Nibelung RJVR,inhibitory,mlALT
3003322,mPN mlALT VP1+DP1m+33 LTS 0.45 3003323 Twin of Nibelung RJVR,inhibitory,mlALT
3813335,mPN mlALT DM2+5 LTS 0.97 3813336 ASB,inhibitory,mlALT
3813403,mPN mlALT DC4+4 LTS 0.99 3813404 RJVR,inhibitory,mlALT
3813424,mPN mlALT VP1+5 LTS 0.99 3813425 RJVR,inhibitory,mlALT
3813434,mPN mlALT VM5d+4 LTS 0.98 3813435 RJVR,inhibitory,mlALT
3813442,mPN mlALT VP1+2 LTS 0.99 3813443 RJVR,inhibitory,mlALT
3813483,mPN mlALT VP1+5 LTS 0.98 3813484 ASB,inhibitory,mlALT
3842493,mPN mlALT D+DL1+VP1+4 LTS 0.93 3842494 Midas SD,inhibitory,mlALT
3903440,mPN mlALT VC1+DC3+5 LTS 0.94 3903441 Frieda SD,inhibitory,mlALT


skid,name,type,tract
<int>,<chr>,<chr>,<chr>
2766186,mPN mlALT VP1+VA1v+DL2v+DA1+33 LTS 0.40 2766187 Nibelung RJVR,inhibitory,mlALT
3003322,mPN mlALT VP1+DP1m+33 LTS 0.45 3003323 Twin of Nibelung RJVR,inhibitory,mlALT
3813424,mPN mlALT VP1+5 LTS 0.99 3813425 RJVR,inhibitory,mlALT
3813442,mPN mlALT VP1+2 LTS 0.99 3813443 RJVR,inhibitory,mlALT
3813483,mPN mlALT VP1+5 LTS 0.98 3813484 ASB,inhibitory,mlALT
3842493,mPN mlALT D+DL1+VP1+4 LTS 0.93 3842494 Midas SD,inhibitory,mlALT
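
The two steps above, stacking data frames and then filtering, can be sketched on toy data. The data frames below are hypothetical and their column order differs on purpose, to show that `bind_rows()` lines columns up by name. The sketch also assumes you want to combine two conditions, which `filter()` does with a logical AND when they are given as separate arguments.

```r
library(dplyr)

# Hypothetical toy data frames; the column order differs on purpose
a.df = data.frame(skid = 1:2, name = c("PN VP1+5", "PN DM2"), type = "excitatory",
                  stringsAsFactors = FALSE)
b.df = data.frame(type = "inhibitory", name = c("PN VP1+2", "PN DC4"), skid = 3:4,
                  stringsAsFactors = FALSE)

# bind_rows() stacks rows and matches columns by name, not by position
both.df = bind_rows(a.df, b.df)

# Several conditions in one filter() call are combined with AND
vp1.inhibitory = both.df %>%
  filter(type == "inhibitory", grepl("VP1", name))
vp1.inhibitory
```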


In [None]:
## We can do more complicated transformations with relative ease in dplyr.
Here we would like to generate a connectivity matrix showing numbers of synapses between the 20 mPNs and all their downstream partners.

In [12]:
#all connectors, incoming and outgoing, for the mPNs
connectors = catmaid_get_connector_table(skids = c(excitatory.mPNs.sk, inhibitory.mPNs.sk))
connectors %>%
  filter(direction == "outgoing") %>%   #downstream targets only
  na.omit() %>%                         #removes unconnected pre-synapses
  group_by(skid) %>%                    #group by skid so that the summary is computed per neuron
  count(partner_skid) %>%               #counts rows per partner_skid within each group (skid)
  spread(partner_skid, n, fill = 0) -> connectivity.matrix #from the package "tidyr": each partner_skid becomes a column holding n (0 where there is no connection). Long -> wide.
connectivity.matrix

skid,430,8218,8770,9654,11218,12002,15306,23005,23569,⋯,11545948,11547665,11586712,11617348,11649625,11847433,11878257,11900959,11959451,12070989
<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
32793,0,0,2,1,1,1,0,1,0,⋯,0,0,0,1,0,0,0,0,0,0
37212,0,1,0,0,0,0,1,4,0,⋯,0,0,1,0,0,1,0,0,0,0
57019,2,0,0,0,0,0,0,0,0,⋯,0,0,0,0,0,0,0,0,0,0
57039,0,0,0,0,0,0,0,0,0,⋯,0,0,0,0,1,0,2,0,0,0
57076,1,0,0,0,0,0,0,0,0,⋯,0,0,0,0,0,0,0,0,0,0
57126,0,0,0,0,0,0,0,0,0,⋯,0,0,0,0,0,0,0,0,0,0
57179,6,0,0,0,0,0,0,0,0,⋯,0,0,0,0,0,0,0,0,0,0
57204,0,0,0,0,0,0,0,0,0,⋯,0,0,0,0,0,0,0,0,0,0
57208,0,0,0,0,0,0,0,1,0,⋯,0,0,0,0,0,0,0,2,0,0
57216,0,0,0,0,0,0,0,0,0,⋯,0,0,0,0,0,0,0,0,0,0
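
To see what each step of the chain contributes, here is a small sketch on a made-up connector table. The values are hypothetical; only the column names `skid`, `partner_skid` and `direction` are borrowed from the code above, and a real connector table has more columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical connector table; column names follow the ones used above
toy.connectors = data.frame(
  skid         = c(1, 1, 1, 2, 2, 2),
  partner_skid = c(10, 10, 11, 11, NA, 10),
  direction    = c("outgoing", "outgoing", "outgoing",
                   "outgoing", "outgoing", "incoming"),
  stringsAsFactors = FALSE
)

toy.matrix = toy.connectors %>%
  filter(direction == "outgoing") %>%  # keep downstream connections only
  na.omit() %>%                         # drop rows without a partner
  group_by(skid) %>%
  count(partner_skid) %>%               # long format: skid, partner_skid, n
  spread(partner_skid, n, fill = 0)     # wide format: one column per partner
toy.matrix
```

Newer versions of tidyr recommend `pivot_wider()` for the same long-to-wide step; `spread()` still works.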


At any stage we can pipe to the View() function, which opens a temporary window showing the output. This is very useful when working with a long chain of commands. N.B. View() is not yet supported in the Jupyter R kernel, only in desktop R.
