# Classifying NeuroSynth Tags Based on Word Embeddings

Here we build a word2vec model from the entire NeuroSynth corpus, then train and test classifiers to predict individual NeuroSynth terms.

In [3]:
import pandas
pandas.set_option('display.max_rows', 4000)
classifiers = pandas.read_csv("classifiers_neurosynth_neurosynth.tsv",sep="\t",index_col=0)
# DataFrame.sort() was removed from pandas; sort_values() is the current equivalent
classifiers = classifiers.sort_values(by=["accuracy","TP","TN"],ascending=False)
classifiers

Unnamed: 0,accuracy,N,term,N_train,N_test,testNPos,testNNeg,trainNPos,trainNNeg,TP,FP,TN,FN,kernel
words presented_rbf,0.998152,2702,words presented,2216,541,1,540,4,2212,0,0,540,1,rbf
words presented_linear,0.998152,2702,words presented,2216,541,1,540,4,2212,0,0,540,1,linear
words presented_poly,0.998152,2702,words presented,2216,541,1,540,4,2212,0,0,540,1,poly
words presented_sigmoid,0.998152,2702,words presented,2216,541,1,540,4,2212,0,0,540,1,sigmoid
computations_rbf,0.998152,2702,computations,2211,541,1,540,4,2207,0,0,540,1,rbf
computations_linear,0.998152,2702,computations,2211,541,1,540,4,2207,0,0,540,1,linear
computations_poly,0.998152,2702,computations,2211,541,1,540,4,2207,0,0,540,1,poly
computations_sigmoid,0.998152,2702,computations,2211,541,1,540,4,2207,0,0,540,1,sigmoid
living_rbf,0.998152,2702,living,2212,541,1,540,3,2209,0,0,540,1,rbf
living_linear,0.998152,2702,living,2212,541,1,540,3,2209,0,0,540,1,linear
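
One way to read the accuracies in the table above is to compare them with what a classifier would score if it always predicted the larger class. The snippet below is a minimal sketch of that comparison, not part of the original analysis: the three rows are copied by hand from the table, and `majority_baseline_accuracy` is a hypothetical helper name introduced for illustration.

```python
# Rows copied from the table above: (term, testNPos, testNNeg, reported accuracy).
rows = [
    ("words presented", 1, 540, 0.998152),
    ("computations", 1, 540, 0.998152),
    ("living", 1, 540, 0.998152),
]


def majority_baseline_accuracy(n_pos, n_neg):
    """Accuracy of a classifier that always predicts the larger class."""
    return max(n_pos, n_neg) / (n_pos + n_neg)


# Hypothetical helper applied to each term's test-set class counts.
baselines = {term: majority_baseline_accuracy(p, n) for term, p, n, _ in rows}

for term, p, n, reported in rows:
    print(f"{term}: reported={reported:.6f} baseline={baselines[term]:.6f}")
```

If the reported and baseline values agree to the printed precision, the classifier is doing no better on that term than always answering "negative".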


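This is an interesting set, because many of the terms are rather sparse (meaning the positive test set is small): each term shown above has a single positive test case, and it is missed (TP = 0, FN = 1). We can try looking at the few terms with a moderately sized positive test set (more than 100 cases):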

In [4]:
classifiers = classifiers[classifiers.testNPos > 100]
classifiers

Unnamed: 0,accuracy,N,term,N_train,N_test,testNPos,testNNeg,trainNPos,trainNNeg,TP,FP,TN,FN,kernel
connectivity_linear,0.844732,2702,connectivity,2215,541,103,438,419,1796,24,5,433,79,linear
functional magnetic_linear,0.829945,2702,functional magnetic,2214,541,211,330,862,1352,148,29,301,63,linear
resonance_linear,0.828096,2702,resonance,2218,541,244,297,1003,1215,177,26,271,67,linear
magnetic resonance_linear,0.826568,2702,magnetic resonance,2200,542,243,299,990,1210,188,39,260,55,linear
magnetic_linear,0.817006,2702,magnetic,2212,541,247,294,1011,1201,181,33,261,66,linear
information_rbf,0.813309,2702,information,2213,541,101,440,411,1802,0,0,440,101,rbf
information_linear,0.813309,2702,information,2213,541,101,440,411,1802,0,0,440,101,linear
information_poly,0.813309,2702,information,2213,541,101,440,411,1802,0,0,440,101,poly
information_sigmoid,0.813309,2702,information,2213,541,101,440,411,1802,0,0,440,101,sigmoid
cortical_rbf,0.813309,2702,cortical,2210,541,101,440,407,1803,0,0,440,101,rbf


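This shows more interesting results. For many of these terms we didn't do very well at all: for `information` and `cortical`, every positive case is a false negative (TP = 0), so the classifier is predicting the negative class for every abstract. In the rows shown, the linear kernel on the more frequent terms (for example `functional magnetic` and `resonance`) does recover a share of the positives. What this doesn't take into account is that each abstract actually has multiple labels, so we may need a model that allows for that.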
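
Because accuracy hides the missed positives, it can help to look at recall and precision next to it. The following is a rough sketch that recomputes these from the TP/FP/TN/FN columns; the three rows are copied by hand from the table above, and `summarize` is a hypothetical helper, not something from the original notebook.

```python
def summarize(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 (an assumption made
    here for convenience, not a universal convention).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# (TP, FP, TN, FN) copied from the filtered table above.
counts = {
    "connectivity_linear": (24, 5, 433, 79),
    "functional magnetic_linear": (148, 29, 301, 63),
    "information_rbf": (0, 0, 440, 101),
}

metrics = {name: summarize(*c) for name, c in counts.items()}

for name, m in metrics.items():
    print(f"{name}: acc={m['accuracy']:.3f} prec={m['precision']:.3f} "
          f"rec={m['recall']:.3f} f1={m['f1']:.3f}")
```

Sorting the full table by recall or F1 instead of accuracy would probably give a more informative ranking of the terms, though that is a suggestion rather than something tested here.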