## ERP SCANR

This is the overview notebook for the ERP-SCANR project (erpsc).

ERPSC uses automated web scraping and text mining to summarize research on event-related potentials (ERPs). The aim is for the project to serve as a kind of automated meta-analysis, and as a way to pull out patterns in ERP research.

Logic:
- At its core, the erpsc code provides a base to set terms (both ERP terms and other terms, such as cognitive and/or disease terms) and to mine papers for these terms using the NCBI e-utils. You can also specify a list of 'exclusion words' that will be specifically avoided in ERP searches.
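
To make the logic above concrete, here is a minimal sketch of how an ERP, a term, and a set of exclusion words might be combined into a single search string. The helper names (`join_synonyms`, `build_query`) and the exact boolean layout are assumptions made for illustration; they are not taken from the erpsc module, which may assemble its queries differently.

```python
def join_synonyms(words, field="TIAB"):
    """OR together quoted synonyms, each restricted to one search field."""
    return " OR ".join(f'"{w}"[{field}]' for w in words)


def build_query(erp, term, exclusions=(), field="TIAB"):
    """Combine ERP synonyms, term synonyms, and exclusion words into one query."""
    query = f"({join_synonyms(erp, field)}) AND ({join_synonyms(term, field)})"
    if exclusions:
        # Hypothetical layout: exclusion words are appended as a NOT clause
        query += f" NOT ({join_synonyms(exclusions, field)})"
    return query


query = build_query(
    ["MMN", "mismatch negativity"],
    ["auditory", "audition"],
    ["micronutrients"],
)
print(query)
```

Treating entries on the same line as synonyms joined by OR means a paper counts as a match if it uses any one of the spellings.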

Current analysis takes two forms:
- Co-occurrence analysis: searches for co-occurrences of terms (ERPs and others), and simply analyses how commonly these terms occur together, in both absolute and relative terms.
- Word analysis:
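
As a rough illustration of the co-occurrence idea, the sketch below turns raw paper counts into absolute and relative values. The function name `cooccurrence_table` is hypothetical, the counts are made up purely for the example, and normalizing by the number of papers found for the ERP alone is only one possible definition of "relative"; the erpsc code may normalize differently.

```python
def cooccurrence_table(pair_counts, erp_counts):
    """Return {(erp, term): (absolute, relative)} from raw paper counts.

    'relative' here is the share of the ERP's papers that also mention
    the term (an assumed normalization, chosen for illustration).
    """
    table = {}
    for (erp, term), n_both in pair_counts.items():
        n_erp = erp_counts[erp]
        relative = n_both / n_erp if n_erp else 0.0
        table[(erp, term)] = (n_both, relative)
    return table


# Made-up counts, for illustration only (not real search results)
erp_counts = {"P300": 200, "N400": 100}
pair_counts = {("P300", "attention"): 50, ("N400", "language"): 40}

table = cooccurrence_table(pair_counts, erp_counts)
for (erp, term), (absolute, relative) in table.items():
    print(f"{erp} & {term}: {absolute} papers ({relative:.0%} of {erp} papers)")
```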

Settings:
- Several settings can be set and are explorable. For example:
    - Database: E-utils can search different databases, most relevantly 'pubmed' and 'pmc', which differ in some respects.
    - Search Area: although somewhat contingent upon the database, the search area can be set to, for example, 'TIAB' (title and abstract), 'ARTI' (all available article text), or 'WORD' (all searchable words).
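
The sketch below shows how these two settings might map onto an E-utilities ESearch request. It only builds the URL and does not contact NCBI. The function name `build_esearch_url` and its defaults are assumptions for illustration; `db`, `term`, `field`, and `retmax` are ESearch parameters, but whether erpsc passes the search area this way or embeds it in the term is not shown in this notebook.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch endpoint
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="pubmed", field="TIAB", retmax=0):
    """Build (but do not send) an ESearch URL for one search term."""
    params = {"db": db, "term": term, "field": field, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


url = build_esearch_url("P300", db="pmc")
print(url)
```

Keeping the database and search area as plain keyword arguments makes it easy to rerun the same term under different settings and compare the results.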

The code in this notebook simply uses the base class to load and check the terms.

These notebooks just run the code and display results. The actual code is in the 'erpsc' custom module. 

In [3]:
# Import custom code
from erpsc.base import Base

In [4]:
# Load a test object to test all terms
check = Base()

In [5]:
# Load erps and cognitive terms from file
check.set_erps_file()
check.set_terms_file('cognitive')

In [6]:
# Check the ERPs that are being used. Entries on the same line are used as synonyms.
check.check_erps()

List of ERPs used: 

P100	 : P100
P150	 : P150
P180	 : P180
P200	 : P200
P220	 : P220
P240	 : P240
P250	 : P250
P270	 : P270
P300	 : P300
P3a	 : P3a
P3b	 : P3b
P340	 : P340
P350	 : P350
P400	 : P400
P500	 : P500
P550	 : P550
P600	 : P600
N75	 : N75
N80	 : N80
N90	 : N90
N100	 : N100
N110	 : N110
N120	 : N120
N130	 : N130
N140	 : N140
N150	 : N150
N160	 : N160
N170	 : N170
N180	 : N180
N190	 : N190
N200	 : N200
MMN	 : MMN, mismatch negativity
N2a	 : N2a
N2b	 : N2b
N2c	 : N2c
N240	 : N240
N250	 : N250
N270	 : N270
N280	 : N280
N2pc	 : N2pc
N300	 : N300
N320	 : N320
N350	 : N350
N400	 : N400
N450	 : N450
N550	 : N550
N600	 : N600
N700	 : N700
LPC	 : late positive component,  late positive complex
LPP	 : late positive potential,  late positive potentials
NSW	 : negative slow wave
PSW	 : positive slow wave
VPP	 : vertex positive potential
CNV	 : contingent negative variation
PINV	 : post imperative negative variation
ELAN	 : early left anterior negativity
CPS	 : closure positive shift
LRP	 

In [8]:
# Check the cognitive terms used
check.check_terms()

List of terms used: 

attention
arousal
auditory, audition
awareness
categorization
conflict
decision making
emotion, emotional
error
executive functions
expectation
face, facial
grammar
language
learning
memory
motor
movement
number
pain, nociception
phonology, phonological
reading
reasoning
representation
reward
semantic, semantics
sleep
spatial
speech
social
somatosensory
tactile
valence
vision, visual
working memory


In [9]:
# Load the disease terms to check
check.set_terms_file('disease')

Unloading previous terms words.


In [10]:
# Check the disease terms
check.check_terms()

List of terms used: 

alcoholism
addiction
attention deficit hyperactivity disorder, ADHD
alzheimer
anorexia
anxiety
aphasia
autism
bipolar
dementia
depression
down syndrome
dyslexia
epilepsy, seizure
insomnia
migraine
mild cognitive impairment
multiple sclerosis
obsessive compulsive disorder
parkinson
personality disorder
post traumatic stress disorder, PTSD
psychosis
schizophrenia
stroke


In [11]:
# Add exclusion words
check.set_exclusions_file()

In [12]:
# Check the exclusion terms used
check.check_exclusions()

List of exclusion words used: 

P100	 : gene, virus, protein, cancer, acid, skin
P150	 : protein, cell, dna, dynein, adhesion
P180	 : protein, serum, plasma, rat, mice, feline
P200	 : gene, protein, antibody, phosphate
P220	 : protein, dna, postnatal
P240	 : gene, protein, muscle, rat
P250	 : protein, cell, postnatal
P270	 : protein, promoter, rna
P300	 : gene, protein, transcription, antibody, tumor, battery
P3a	 : protein
P3b	 : protein
P340	 : molecular
P350	 : protein
P400	 : protein
P500	 : protein, index, financial, cell
P550	 : cell, peptide, protein, oil, bond
P600	 : protein, gene
N75	 : protein, cell, nitrogen, bacteria
N80	 : protein, glycan, gene
N90	 : protein, nitrogen, gene
N100	 : gene, ketac
N110	 : protein, quantum, hydrogen
N120	 : nitrogen, protein, fertilizer
N130	 : virus, amino, cancer
N140	 : protein
N150	 : nitrogen
N160	 : protein
N170	 : protein
N180	 : fertilizer, protein, nitrogen
N190	 : protein, amino, cell
N200	 : protein, nitrogen
MMN	 : micronutrients


### Fixing File Numbering

In [1]:
from erpsc.core.utils import erp_file_numbers

path = '/Users/thomasdonoghue/Documents/GitCode/ERP_SCANR/erpsc/terms/'
f_names = ['erps.txt', 'erp_labels.txt', 'erps_exclude.txt', 'latencies.txt']

for f_name in f_names:
    erp_file_numbers(path + f_name)

FileNotFoundError: [Errno 2] No such file or directory: '/Users/thomasdonoghue/Documents/GitCode/ERP_SCANR/erpsc/terms/erps.txt' -> 'temp.txt'
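
One possible cause of the error above is that the hard-coded directory or one of the files does not exist on the machine running the notebook. A minimal sketch of a pre-check is shown here; `split_existing` is a hypothetical helper, not part of erpsc, and the demonstration runs against a temporary directory rather than the real terms folder.

```python
import tempfile
from pathlib import Path


def split_existing(base_dir, f_names):
    """Separate file names into paths that exist and names that are missing."""
    base = Path(base_dir)
    found = [base / name for name in f_names if (base / name).is_file()]
    missing = [name for name in f_names if not (base / name).is_file()]
    return found, missing


# Demonstration in a throwaway directory containing only one of the two files
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "erps.txt").write_text("P300\n")
    found, missing = split_existing(tmp, ["erps.txt", "latencies.txt"])

print("found:", [p.name for p in found])
print("missing:", missing)
```

With a check like this, only the paths in `found` would be passed on to `erp_file_numbers`, and the names in `missing` can be reported up front instead of failing partway through the loop.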