# RDoC Expert Survey Author Finder
Chris Iyer
Updated 3/23/2023

This notebook finds authors' email addresses so that we can send them our RDoC Expert Survey Screener. Using functions from `author_finder_functions.py`, it does the following for each of the tasks we are using:
1. Search PubMed Central (PMC) for open-access articles with task keywords in the abstract, published within the date range set in the query (2008 to 2023 in the runs logged below).
2. Obtain correspondence/author emails for as many of these articles as possible.
3. Retrieve the number of PMC articles that cite each PMC article.
4. Select the top `n` most-cited papers (`n=100` in the step-by-step run), whose authors will receive our expert screener.
5. Write these papers and their emails to a CSV.


In [1]:
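import os  # used below for os.path.join; imported explicitly rather than relying on the star import

from author_finder_functions import *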

In [2]:
ROOT_PATH = '/Users/sunjaeshim/Documents/GitHub/author_finder/pubget_data_5_18_2008-2023/' # change this to match your local desired path

# tasks: 'spatial_cueing', 'visual_search', 'cued_ts', 'ax_cpt', 'flanker', 'stroop', 'stop_signal', 'go_nogo', 'span', 'change_detection', 'n_back'
all_tasks = ['spatial_cueing', 'visual_search', 'cued_ts', 'ax_cpt', 'flanker', 'stroop', 'stop_signal', 'go_nogo', 'span', 'change_detection', 'n_back']
tasks_to_run = all_tasks

# To override the search keywords for a task, edit its entry here, e.g.:
# task_keywords['stop_signal'] = ['stop-signal task', 'stop signal task']

### Option 1: Run all-in-one

In [3]:
run_author_finder(tasks_to_run, ROOT_PATH, output='csv')  # or output='txt'

STARTING TASK: spatial_cueing
Executing command: pubget run /Users/sunjaeshim/Documents/GitHub/author_finder/pubget_data_5_16_2008-2023/spatial_cueing -q '"(spatial cueing task[Abstract]) OR (Posner cueing task[Abstract]) OR (Posner paradigm[Abstract]) OR (spatial cueing paradigm[Abstract]) AND ("2008"[Publication Date] : "2023"[Publication Date])"'
Files in /Users/sunjaeshim/Documents/GitHub/author_finder/pubget_data_5_16_2008-2023/spatial_cueing: ['query_90ee408ab8593bac7220dd05998a2cab']
article set path:  /Users/sunjaeshim/Documents/GitHub/author_finder/pubget_data_5_16_2008-2023/spatial_cueing/query_90ee408ab8593bac7220dd05998a2cab/articlesets
Article set #1:
Out of 79 papers, 78 had emails and 1 did not.
STARTING TASK: visual_search
Executing command: pubget run /Users/sunjaeshim/Documents/GitHub/author_finder/pubget_data_5_16_2008-2023/visual_search -q '"(visual search task[Abstract]) OR (visual search paradigm[Abstract]) AND ("2008"[Publication Date] : "2023"[Publication Date])

'Done!'

### Option 2: Run step-by-step

In [3]:
task_to_run = tasks_to_run[6]  # index 6 is 'stop_signal'; CHANGE THIS to pick another task

outpath = os.path.join(ROOT_PATH, task_to_run)
print(f'Output path: {outpath}')

Output path: /Users/sunjaeshim/Documents/GitHub/author_finder/pubget_data_5_18_2008-2023/stop_signal


In [4]:
# 1. PubMed Central search
do_pubget_query(task_to_run, outpath) # writes directory with search results

Executing command: pubget run /Users/sunjaeshim/Documents/GitHub/author_finder/pubget_data_5_18_2008-2023/stop_signal -q '"(stop-signal task[Abstract]) OR (stop signal task[Abstract]) OR (stop-signal paradigm[Abstract]) OR (stop signal paradigm[Abstract]) AND ("2008"[Publication Date] : "2023"[Publication Date])"'
Files in /Users/sunjaeshim/Documents/GitHub/author_finder/pubget_data_5_18_2008-2023/stop_signal: ['query_e229f3be77a3ed39b27119b07030cc24']
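
The query string in the logged command follows a regular pattern: each keyword phrase is restricted to the abstract, the phrases are OR-ed together, and a publication-date range is appended with AND. A minimal sketch of how such a string could be assembled is below. `build_query` is a hypothetical helper written for illustration; it is not taken from `author_finder_functions.py`, which may build the query differently.

```python
def build_query(keywords, start_year=2008, end_year=2023):
    """Join keyword phrases into a PubMed-style query string (hypothetical helper)."""
    # Restrict each phrase to the abstract field and OR the phrases together.
    terms = " OR ".join(f"({kw}[Abstract])" for kw in keywords)
    # Append the publication-date filter with AND, mirroring the logged command.
    date_range = f'("{start_year}"[Publication Date] : "{end_year}"[Publication Date])'
    return f"{terms} AND {date_range}"


query = build_query(["stop-signal task", "stop signal task"])
print(query)
```

Wrapping the OR-ed terms in one extra pair of parentheses would make the intended grouping explicit; the sketch leaves them out only to match the logged format.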


In [5]:
# 2. Pull emails and PMCIDs
papers = get_all_emails(outpath) # PMCIDs and emails

article set path:  /Users/sunjaeshim/Documents/GitHub/author_finder/pubget_data_5_18_2008-2023/stop_signal/query_e229f3be77a3ed39b27119b07030cc24/articlesets
Article set #1:
Out of 404 papers, 396 had emails and 8 did not.
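
For intuition about the summary line above, here is a rough sketch of pulling emails out of an article set, assuming the downloaded files are JATS-style XML in which addresses appear in `<email>` elements and the PMCID in `<article-id pub-id-type="pmc">`. The sample XML, the placeholder addresses, and the function `emails_by_pmcid` are all illustrative assumptions; `get_all_emails` may work differently.

```python
import xml.etree.ElementTree as ET

# Tiny made-up article set: one article with an email, one without.
SAMPLE = """<pmc-articleset>
  <article>
    <front><article-meta>
      <article-id pub-id-type="pmc">111</article-id>
      <author-notes><corresp>Contact: <email>a.author@example.org</email></corresp></author-notes>
    </article-meta></front>
  </article>
  <article>
    <front><article-meta>
      <article-id pub-id-type="pmc">222</article-id>
    </article-meta></front>
  </article>
</pmc-articleset>"""


def emails_by_pmcid(xml_text):
    """Map each article's PMCID to a sorted list of its unique emails (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    result = {}
    for article in root.iter("article"):
        pmcid_el = article.find(".//article-id[@pub-id-type='pmc']")
        pmcid = pmcid_el.text if pmcid_el is not None else None
        # Collect every <email> element anywhere inside this article.
        emails = sorted({e.text.strip() for e in article.iter("email") if e.text})
        result[pmcid] = emails
    return result


found = emails_by_pmcid(SAMPLE)
with_email = sum(1 for emails in found.values() if emails)
print(f"Out of {len(found)} papers, {with_email} had emails and {len(found) - with_email} did not.")
```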


In [6]:
# 3. Top 100 most cited
papers_top = get_most_cited(papers, n=100)
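
The selection step amounts to sorting by citation count and keeping the first `n` entries. A minimal sketch, assuming each paper is a dict with a `citation_count` key (the column name in the output table below); `top_cited` is a hypothetical stand-in and not the actual `get_most_cited` implementation. The sample values are a few pmcid/count pairs from that table.

```python
def top_cited(papers, n=100):
    """Return the n papers with the highest citation_count (hypothetical stand-in)."""
    # sorted() is stable, so papers with equal counts keep their original relative order.
    return sorted(papers, key=lambda p: p["citation_count"], reverse=True)[:n]


sample = [
    {"pmcid": "3733500", "citation_count": 176},
    {"pmcid": "2845804", "citation_count": 502},
    {"pmcid": "6190063", "citation_count": 15},
    {"pmcid": "6533084", "citation_count": 197},
    {"pmcid": "4594014", "citation_count": 15},
]
top4 = top_cited(sample, n=4)
print([p["pmcid"] for p in top4])
```

Because many papers share the same count near the cutoff (several rows have 15 citations), which of the tied papers make the top `n` depends on their input order.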

In [7]:
# 4. write output
write_papers_csv(papers_top, outpath, task_to_run) 
# OR
#write_email_txt(papers_top, outpath)

Unnamed: 0,pmcid,doi,title,journal,publication_year,emails,citation_count
0,2845804,10.1016/j.neuroimage.2009.12.109,The role of the right inferior frontal gyrus: ...,Neuroimage,2010,[adam.hampshire@mrc-cbu.cam.ac.uk],502
1,6533084,10.7554/eLife.46323,A consensus guide to capturing the ability to ...,eLife,2019,[frederick.verbruggen@ugent.be],197
2,3733500,10.1038/nn.3456,Canceling actions involves a race between basa...,Nat Neurosci,2013,[],176
3,3724271,10.1177/0956797612457390,Fictitious Inhibitory Differences,Psychol Sci,2013,[f.l.j.verbruggen@exeter.ac.uk],139
4,2973972,10.1371/journal.pone.0013848,On the Role of the Striatum in Response Inhibi...,PLoS One,2010,[b.b.zandbelt@umcutrecht.nl],138
...,...,...,...,...,...,...,...
95,6190063,10.1017/S0033291718000107,Children with ADHD symptoms show deficits in r...,Psychol Med,2018,[b.vanhulst@umcutrecht.nl],15
96,4594014,10.3389/fnhum.2015.00529,The effects of impulsivity and proactive inhib...,Front Hum Neurosci,2015,[leidy-janeth.castro-meneses@students.mq.edu.au],15
97,4469608,10.1371/journal.pone.0129139,Barratt Impulsivity and Neural Regulation of P...,PLoS One,2015,"[chiang-shan.li@yale.edu, sheng.zhang@yale.edu]",15
98,3949195,10.3389/fnbeh.2014.00049,Chronic exercise keeps working memory and inhi...,Front Behav Neurosci,2014,[c.padilla@uib.es],15
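
Some rows in the table carry several emails and at least one has an empty list, so turning the table into a mailing list needs a flattening step. A minimal sketch, assuming each row exposes its emails as a Python list; `unique_emails` and the placeholder addresses are illustrative and not part of `author_finder_functions.py`.

```python
def unique_emails(rows):
    """Flatten per-paper email lists into one de-duplicated list (hypothetical helper)."""
    seen = set()
    ordered = []
    for row in rows:
        for email in row["emails"]:
            # Assumption: addresses are compared case-insensitively.
            key = email.strip().lower()
            if key and key not in seen:
                seen.add(key)
                ordered.append(key)
    return ordered


rows = [
    {"pmcid": "1", "emails": ["first.author@example.org"]},
    {"pmcid": "2", "emails": []},  # papers without emails contribute nothing
    {"pmcid": "3", "emails": ["Second.Author@example.org", "first.author@example.org"]},
]
recipients = unique_emails(rows)
print(recipients)
```

Since papers without emails still occupy a slot in the top `n`, the final recipient list can contain fewer than `n` papers' worth of contacts.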


### CURRENT CAVEATS

1. The keywords are imperfect. Requiring "task" in the phrase excludes many relevant papers, but leaving it out returns articles like [this article](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4359377/), which is about a stop codon acting as a "stop signal" and has nothing to do with the stop-signal task.

2. A few emails are lost: roughly 1 to 2% of papers in the runs above (1 of 79 and 8 of 404). I'm not worried.

3. Citation counts include only citing papers that are themselves in PMC, so we are sorting by PMC citations rather than by each paper's total citation count.

4. This search returns only open-access papers, which are a subset of all possible search results.
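
To make caveat 3 concrete: one way to count PMC-internal citations is the NCBI E-utilities ELink endpoint with the `pmc_pmc_citedby` link name, which lists the PMC articles citing a given PMC article. The sketch below only builds the request URL and counts IDs in a response; it makes no network call, the sample response is a hand-written illustration with made-up citing IDs, and whether `author_finder_functions.py` uses this endpoint is an assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def citedby_url(pmcid):
    """Build an ELink URL asking which PMC articles cite the given PMC article."""
    params = {"dbfrom": "pmc", "linkname": "pmc_pmc_citedby", "id": str(pmcid)}
    return f"{ELINK_URL}?{urlencode(params)}"


def count_citing(xml_text):
    """Count the citing-article IDs listed in an ELink XML response."""
    root = ET.fromstring(xml_text)
    return len(root.findall(".//LinkSetDb/Link/Id"))


# Hand-written illustration of the response shape; the citing IDs are made up.
SAMPLE_RESPONSE = """<eLinkResult><LinkSet>
  <DbFrom>pmc</DbFrom>
  <IdList><Id>2845804</Id></IdList>
  <LinkSetDb>
    <DbTo>pmc</DbTo>
    <LinkName>pmc_pmc_citedby</LinkName>
    <Link><Id>1000001</Id></Link>
    <Link><Id>1000002</Id></Link>
  </LinkSetDb>
</LinkSet></eLinkResult>"""

url = citedby_url(2845804)
n_citing = count_citing(SAMPLE_RESPONSE)
print(url)
print(n_citing)
```

A count obtained this way reflects only citing articles that are themselves deposited in PMC, which is why it can be lower than the citation totals reported elsewhere.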