# Clean, Merge and Export to a bibliographic tool

### This notebook excludes non-paper entries from the database search exports and merges the results from PubMed and Web of Science. It then condenses the information into a Zotero-readable format (likely readable by other bibliographic tools as well) for screening.

Please keep in mind that some articles might not be peer-reviewed, depending on the journal they come from. Most non-peer-reviewed entries (proceedings, preprints, etc.) have nonetheless been excluded.

Required packages:
Python == 3.12.0
pandas == 2.1.2
jupyter == 1.0.0
xlrd == 2.0.1

In [1]:
import pandas as pd     # pandas is used for dataframe handling
import numpy as np
PM_path = "Z:/Literature Review/Queries exports/raw/27-10-2023/PM_1_271023.csv"    # PubMed results have to be exported in csv format.
WoS_path = "Z:/Literature Review/Queries exports/raw/27-10-2023/WoS_1_271023.xls"  # Web of Science results must be exported in Excel format (.xls is read through xlrd; .xlsx would additionally require openpyxl).
PM_1 = pd.read_csv(PM_path)
WoS_1 = pd.read_excel(WoS_path)


PM_1    # Inspect PM dataframe
#WoS_1   # Inspect Web Of Science dataframe

Unnamed: 0,PMID,Title,Authors,Citation,First Author,Journal/Book,Publication Year,Create Date,PMCID,NIHMS ID,DOI
0,37885127,Prediction of individual brain age using movie...,"Bi S, Guan Y, Tian L.",Cereb Cortex. 2023 Oct 26:bhad407. doi: 10.109...,Bi S,Cereb Cortex,2023,2023/10/27,,,10.1093/cercor/bhad407
1,37806102,Neural envelope tracking predicts speech intel...,"Van Hirtum T, Somers B, Dieudonné B, Verschuer...",Hear Res. 2023 Oct 4;439:108893. doi: 10.1016/...,Van Hirtum T,Hear Res,2023,2023/10/08,,,10.1016/j.heares.2023.108893
2,37752389,Theta EEG neurofeedback promotes early consoli...,"Rozengurt R, Kuznietsov I, Kachynska T, Kozach...",Cogn Affect Behav Neurosci. 2023 Sep 26. doi: ...,Rozengurt R,Cogn Affect Behav Neurosci,2023,2023/09/26,,,10.3758/s13415-023-01125-0
3,37562718,Arousal modulates the amygdala-insula reciproc...,"Wang L, Hu X, Ren Y, Lv J, Zhao S, Guo L, Liu ...",Neuroimage. 2023 Oct 1;279:120316. doi: 10.101...,Wang L,Neuroimage,2023,2023/08/10,,,10.1016/j.neuroimage.2023.120316
4,37480715,Individual differences in time-varying and sta...,"Di X, Xu T, Uddin LQ, Biswal BB.",Dev Cogn Neurosci. 2023 Oct;63:101280. doi: 10...,Di X,Dev Cogn Neurosci,2023,2023/07/22,PMC10393546,,10.1016/j.dcn.2023.101280
...,...,...,...,...,...,...,...,...,...,...,...
168,15110035,The chronoarchitecture of the human brain--nat...,"Bartels A, Zeki S.",Neuroimage. 2004 May;22(1):419-33. doi: 10.101...,Bartels A,Neuroimage,2004,2004/04/28,,,10.1016/j.neuroimage.2004.01.007
169,14755595,Functional brain mapping during free viewing o...,"Bartels A, Zeki S.",Hum Brain Mapp. 2004 Feb;21(2):75-85. doi: 10....,Bartels A,Hum Brain Mapp,2004,2004/02/03,PMC6872023,,10.1002/hbm.10153
170,11446582,Manipulation of frontal EEG asymmetry through ...,"Allen JJ, Harmon-Jones E, Cavender JH.",Psychophysiology. 2001 Jul;38(4):685-93.,Allen JJ,Psychophysiology,2001,2001/07/12,,,
171,11352609,Spatiotemporal brain imaging of visual-evoked ...,"Bonmassar G, Schwartz DP, Liu AK, Kwong KK, Da...",Neuroimage. 2001 Jun;13(6 Pt 1):1035-43. doi: ...,Bonmassar G,Neuroimage,2001,2001/05/16,,,10.1006/nimg.2001.0754


### Excluding non-article entries

In [2]:
# "str.contains" checks each row of the "Document Type" column (treated as strings) for the given pattern.
# case=False makes the match case-insensitive.
# regex=True treats the pattern as a regular expression rather than a literal string, so "|" acts as a logical OR: rows containing either "article" or "review" are kept, since both interest us.
# na=False treats missing ("NaN") values as non-matching, so they are excluded.
# "~" negates the mask, selecting the rows that do not match.

WoS_1_exclude = WoS_1[~WoS_1["Document Type"].str.contains("article|review", case=False, na=False, regex=True)] # excluded papers extracted in another dataframe
WoS_1_include = WoS_1[WoS_1["Document Type"].str.contains("article|review", case=False, na=False, regex=True)]  # kept papers extracted in another dataframe
#WoS_1_include   # Visualizing the included dataframe
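
To see what the mask does before applying it to the real export, the same filter can be tried on a small hand-made table. This is only an illustrative sketch: the `toy` frame and its values are invented, and only the `Document Type` column name comes from the Web of Science export.

```python
import pandas as pd

# Invented stand-in for the Web of Science export (values are made up).
toy = pd.DataFrame({
    "Article Title": ["A", "B", "C", "D"],
    "Document Type": ["Article", "Proceedings Paper", "Review; Early Access", None],
})

# True where "article" or "review" occurs anywhere in the cell, ignoring case.
# na=False makes missing document types count as "no match".
is_paper = toy["Document Type"].str.contains("article|review", case=False, na=False, regex=True)

kept = toy[is_paper]        # rows kept for screening
dropped = toy[~is_paper]    # rows set aside

print(kept["Article Title"].tolist())
print(dropped["Article Title"].tolist())
```

Because this is a substring match, a combined value such as "Article; Proceedings Paper" would also be kept, so the excluded table is worth a quick look.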


### Format the Authors section in WoS to match PubMed format
This matters later, when the output file is created in a format readable by Zotero. I decided to use the PubMed export format because it is very easy to create from scratch and is readable by most bibliographic tools. It is a simple text file with a formalized structure.

In [3]:
WoS_1_include = WoS_1_include.copy()    # Explicit copy, so the assignment below does not trigger a SettingWithCopyWarning

# WoS writes authors as "Last, First; Last, First", whereas PubMed uses "Last First, Last First".
WoS_1_include["Authors"] = (
    WoS_1_include["Authors"]
    .str.replace(",", "", regex=False)      # Remove the commas inside each name first
    .str.replace(";", ",", regex=False)     # Then turn the separators between names into commas
)
WoS_1_include


Unnamed: 0,Publication Type,Authors,Book Authors,Group Authors,Book Group Authors,Researcher Ids,ORCIDs,Book Editors,Author - Arabic,Article Title,...,Copyright,Degree Name,Institution Address,Institution,Dissertation and Thesis Subjects,Author Keywords,Indexed Date,UT (Unique ID),Pubmed Id,Unnamed: 73
0,J,"Lankinen Kaisu, Saari Jukka, Hlushchuk Yevhen,...",,,,"Parkkonen, Lauri/G-6755-2012; Lankinen, Kaisu ...","Parkkonen, Lauri/0000-0002-0130-0801; Lankinen...",,,Consistency and similarity of MEG- and fMRI-si...,...,,,,,,,2018-12-28,WOS:000430366000030,29486325.0,
1,J,"Liu Xingyu, Dai Yuxuan, Xie Hailun, Zhen Zonglei",,,,"Zhen, Zonglei/GPG-1239-2022","Liu, Xingyu/0000-0002-4386-2140",,,"A studyforrest extension, MEG recordings while...",...,,,,,,,2022-05-25,WOS:000795513500002,35562378.0,
2,J,"Bonmassar G, Schwartz DP, Liu AK, Kwong KK, Da...",,,,"Liu, Alan King Lun/A-2210-2015; anand, amit/A-...","Liu, Alan King Lun/0000-0001-6109-1338; Kwong,...",,,Spatiotemporal brain imaging of visual-evoked ...,...,,,,,,,2001-06-01,WOS:000169056500009,11352609.0,
3,J,"Eickhoff Simon B., Milham Michael, Vanderwal T...",,,,"Vanderwal, Tamara/AAS-4214-2021; Eickhoff, Sim...","Eickhoff, Simon B./0000-0001-6363-2759; Vander...",,,Towards clinical applications of movie fMRI,...,,,,,,,2020-08-15,WOS:000542369500007,32376301.0,
5,J,"Stroman Patrick W., Coe Brian C., Munoz Doug P.",,,,,"Coe, Brian/0000-0002-3985-0163; Stroman, Patri...",,,Influence of attention focus on neural activit...,...,,,,,,,2011-01-01,WOS:000285570100002,20850240.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
313,J,"Tan Chenhao, Liu Xin, Zhang Gaoyan",,,,,"Zhang, Gaoyan/0000-0001-5189-9229",,,Inferring Brain State Dynamics Underlying Natu...,...,,,,,,,2022-03-21,WOS:000764450400001,35244856.0,
314,J,"Symonds Renee M., Zhou Juin W., Cole Sally L.,...",,,,,"Cole, Sally/0000-0002-7660-2732; Sussman, Elys...",,,Cognitive resources are distributed among the ...,...,,,,,,,2020-02-03,WOS:000509324600002,31578762.0,
315,J,"Franssen Sieske, Jansen Anita, van den Hurk Jo...",,,,,"Roefs, Anne/0000-0002-9935-1075; Jansen, Anita...",,,"Effects of mindset on hormonal responding, neu...",...,,,,,,,2022-07-15,WOS:000820529200006,35182553.0,
316,J,"Alain C, Schuler BM, McDonald KL",,,,,,,,Neural activity associated with distinguishing...,...,,,,,,,2002-02-01,WOS:000173784900036,11863201.0,
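
The author conversion boils down to two string replacements whose order matters. The sketch below shows it on a single author string taken from the Web of Science export; the helper name `wos_authors_to_pubmed` is hypothetical and only serves the illustration.

```python
def wos_authors_to_pubmed(authors: str) -> str:
    """Turn 'Last, First; Last, First' into 'Last First, Last First'."""
    # Order matters: remove the commas inside each name first,
    # then turn the semicolons between names into commas.
    return authors.replace(",", "").replace(";", ",")


example = "Nag, Sayan; Uludag, Kamil"
print(wos_authors_to_pubmed(example))
```

Swapping the two replacements would first create new commas between names and then delete every comma, merging all authors into one string.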


## Merging both databases to a single dataframe
Note that the PubMed csv export does not include the abstracts. PubMed can, however, export the abstracts as a separate file, which can then be parsed in Python and matched back to the entries of this table.
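
One possible way to recover the abstracts is sketched below. It assumes the second export is saved in PubMed's tagged text format (lines such as `PMID- ` and `AB  - `, with continuation lines indented by six spaces) rather than as plain abstract text. The function name `abstracts_by_pmid` and the sample records are invented for the illustration.

```python
def abstracts_by_pmid(pubmed_text: str) -> dict[str, str]:
    """Collect the AB field of each record, keyed by its PMID."""
    abstracts = {}
    pmid, in_abstract = None, False
    for line in pubmed_text.splitlines():
        if line.startswith("PMID- "):
            pmid, in_abstract = line[6:].strip(), False
        elif line.startswith("AB  - ") and pmid is not None:
            abstracts[pmid] = line[6:].strip()
            in_abstract = True
        elif in_abstract and line.startswith("      "):
            # Continuation lines of a field are indented with six spaces.
            abstracts[pmid] += " " + line.strip()
        else:
            in_abstract = False
    return abstracts


# Invented sample: one record with a two-line abstract, one without abstract.
sample = (
    "PMID- 111\n"
    "TI  - An invented title\n"
    "AB  - First line of the abstract\n"
    "      continued on a second line.\n"
    "FAU - Doe, Jane\n"
    "\n"
    "PMID- 222\n"
    "TI  - A record without abstract\n"
)
print(abstracts_by_pmid(sample))
```

The resulting dictionary could then be mapped onto the PMID column of the PubMed dataframe (converted to strings first) to fill an abstract column.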

### Included data points

In [4]:
# The "concat" function stacks vertically, so the PubMed results are placed underneath the WoS ones. "ignore_index=True" creates a new index specific to this dataframe. Otherwise, the indices of the source dataframes would be kept and could clash.

inc_merged = pd.DataFrame()         # Empty dataframe

inc_merged["Title"] = pd.concat([WoS_1_include["Article Title"], PM_1["Title"]], ignore_index=True) # Merging title columns

empty_abs = pd.DataFrame("", index=np.arange(len(PM_1.index)), columns=["Abstract"])    # PubMed exported results do not contain the abstract and will have to be retrieved manually.
inc_merged["Abstract"] = pd.concat([WoS_1_include["Abstract"], empty_abs["Abstract"]],ignore_index=True)

inc_merged["Authors"] = pd.concat([WoS_1_include["Authors"], PM_1["Authors"]], ignore_index=True)   # Authors

inc_merged["PMID"] = pd.concat([WoS_1_include["Pubmed Id"], PM_1["PMID"]], ignore_index=True)       # PubMed Id

inc_merged["DOI"] = pd.concat([WoS_1_include["DOI"], PM_1["DOI"]], ignore_index=True)               # DOI

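# Dropping the duplicates in terms of DOI. Missing DOIs count as equal to one another in pandas, so entries without a DOI are explicitly kept here.
inc_merged = inc_merged[inc_merged["DOI"].isna() | ~inc_merged["DOI"].duplicated()]
inc_merged = inc_merged.drop_duplicates(subset="Title")   # Dropping the remaining duplicates in terms of title
inc_merged  # Inspecting resulting dataframe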

Unnamed: 0,Title,Abstract,Authors,PMID,DOI
0,Consistency and similarity of MEG- and fMRI-si...,Movie viewing allows human perception and cogn...,"Lankinen Kaisu, Saari Jukka, Hlushchuk Yevhen,...",29486325.0,10.1016/j.neuroimage.2018.02.045
1,"A studyforrest extension, MEG recordings while...","Naturalistic stimuli, such as movies, are bein...","Liu Xingyu, Dai Yuxuan, Xie Hailun, Zhen Zonglei",35562378.0,10.1038/s41597-022-01299-1
2,Spatiotemporal brain imaging of visual-evoked ...,Combined analysis of electroencephalography (E...,"Bonmassar G, Schwartz DP, Liu AK, Kwong KK, Da...",11352609.0,10.1006/nimg.2001.0754
3,Towards clinical applications of movie fMRI,,"Eickhoff Simon B., Milham Michael, Vanderwal T...",32376301.0,10.1016/j.neuroimage.2020.116860
4,Influence of attention focus on neural activit...,Perceptions of sensation and pain in healthy p...,"Stroman Patrick W., Coe Brian C., Munoz Doug P.",20850240.0,10.1016/j.mri.2010.07.012
...,...,...,...,...,...
269,Prediction of individual brain age using movie...,,"Bi S, Guan Y, Tian L.",37885127.0,10.1093/cercor/bhad407
281,Individual differences in time-varying and sta...,,"Di X, Xu T, Uddin LQ, Biswal BB.",36778481.0,10.1101/2023.01.30.526311
305,Exploring﻿ electroencephalography with a model...,,"Popiel NJM, Metrow C, Laforge G, Owen AM, Stoj...",34611185.0,10.1038/s41598-021-97960-7
325,Inter-subject correlations during natural view...,,"Gait A, Duisenbinov V, Lee MH, Biesmann F, Faz...",33017964.0,10.1109/EMBC44109.2020.9176083
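
A pitfall worth knowing when de-duplicating on DOI: pandas treats missing values as equal to one another, so a plain `drop_duplicates(subset="DOI")` keeps only the first entry that has no DOI. The toy frame below (invented records) illustrates the difference between the plain call and a variant that only drops rows which actually carry a repeated DOI.

```python
import pandas as pd

# Invented records: two share a DOI, two have no DOI at all.
toy = pd.DataFrame({
    "Title": ["Paper A", "Paper A (duplicate)", "Paper B", "Paper C"],
    "DOI": ["10.1000/a", "10.1000/a", None, None],
})

# Plain call: the second DOI-less row is dropped as a "duplicate".
naive = toy.drop_duplicates(subset="DOI")
print(naive["Title"].tolist())

# Only rows that actually have a DOI may be dropped as DOI duplicates.
is_dup = toy["DOI"].notna() & toy["DOI"].duplicated()
safe = toy[~is_dup]
print(safe["Title"].tolist())
```

Entries without a DOI can still be caught by the title-based pass, which is why both passes are useful together.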


### Excluded data points

In [6]:
ex_merged = pd.DataFrame()         # Empty dataframe
ex_merged["Title"] = WoS_1_exclude["Article Title"] # Titles (only WoS entries were excluded, so there is nothing to merge)
ex_merged["Abstract"] = WoS_1_exclude["Abstract"]
ex_merged["Authors"] = WoS_1_exclude["Authors"]     # Authors
ex_merged["PMID"] = WoS_1_exclude["Pubmed Id"]  # PubMed Id
ex_merged["DOI"] = WoS_1_exclude["DOI"]   # DOI

ex_merged

Unnamed: 0,Title,Abstract,Authors,PMID,DOI
4,Brain Activity Movie functional MRI with ultra...,Increased signal changes in blood oxygen depen...,"Windischberger, C.; Gerstl, F.; Fischmeister, ...",,
10,TRACES OF HUMAN FUNCTIONAL ACTIVITY: MOMENT-TO...,Dynamic functional connectivity (dFC) measured...,"Dodero, Luca; Sona, Diego; Meskaldji, Djalel E...",,10.1109/ISBI.2016.7493507
17,Data_Sheet_1_Dynamic Effective Connectivity us...,Functional MRI (fMRI) is an indirect reflectio...,"Nag, Sayan; Uludag, Kamil",,0
27,Face Prediction from fMRI Data during Movie St...,We investigate the suitability of the multi-vo...,"Kauppi, Jukka-Pekka; Huttunen, Heikki; Korkala...",,
76,PREDICTION OF COGNITIVE SCORES BY MOVIE-WATCHI...,Brain functional connectivity has been demonst...,"Gao, Jiaxing; Li, Changhe; He, Zhibin; Wei, Ya...",,10.1109/ISBI52829.2022.9761565
77,Triangulating Multimodal Representations of Af...,,"Chang, Luke",,
78,Inter Subject Correlation of Brain Activity du...,Brain imaging using functional MRI allows us t...,"Miyapuram, Krishna Prasad; Pamnani, Ujjval; Do...",,
93,Effect of visual image features on neural acti...,"There are various elements in visual images, s...","Kato, K.; Miura, O.; Shikoda, A.; Sugawara, K....",,
109,Movie_S1.avi.,Animated movements of simple geometric shapes ...,"Osaka, Mariko; Osaka, Naoyuki; Ikeda, Takashi",,0
110,Movie_S4.avi.,Animated movements of simple geometric shapes ...,"Osaka, Mariko; Osaka, Naoyuki; Ikeda, Takashi",,0


## Exporting to PubMed txt format, readable by Zotero
To import the citations into Zotero (which can retrieve PDFs from metadata alone), the dataframe has to be converted into a format that Zotero can read and that is easy to create with Python.

Note that only the first author is exported.

In [8]:
template = ["TI  - ", "AB  - ", "FAU - ", "PMID- ", "AID - "]      # Field tags, in the same order as the columns of inc_merged


with open("cleaned_merged_results.txt", "w", encoding="utf-8") as f:
    for _, row in inc_merged.iterrows():
        values = ["" if pd.isna(v) else str(v) for v in row]   # Missing values become empty strings instead of "nan"
        values[3] = values[3].removesuffix(".0")    # PMIDs were read as floats (e.g. 29486325.0)
        if values[4]:
            values[4] += " [doi]"    # IMPORTANT: the DOI must end with " [doi]" for Zotero to recognize it as such.
        for tag, value in zip(template, values):
            f.write(tag + value + "\n")
        f.write("\n")
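
To make the target layout concrete, the sketch below renders a single record as text. The helper `format_record` is hypothetical and all values are invented; only the tag order and the `[doi]` suffix mirror the export cell of this notebook.

```python
def format_record(title, abstract, authors, pmid, doi):
    """Return one record as tagged lines, followed by a blank separator line."""
    fields = [
        ("TI  - ", title),
        ("AB  - ", abstract),
        ("FAU - ", authors),
        ("PMID- ", pmid),
        ("AID - ", f"{doi} [doi]"),   # the " [doi]" suffix marks the identifier as a DOI
    ]
    return "".join(f"{tag}{value}\n" for tag, value in fields) + "\n"


# Invented example values.
record = format_record(
    "An invented title",
    "An invented abstract.",
    "Doe J, Roe R",
    "12345678",
    "10.1000/example",
)
print(record)
```

Each tag is four characters wide (padded with spaces) and followed by "- ", and records are separated by one empty line.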
