##### This notebook can be used to load the abstract screen results (extracted from abstrackr) and filter out the excluded publications. The publications included by consensus (after conflict resolution) are formatted into a PUBMED ID list for import to SRDR+.

##### The abstract screen results are exported in CSV format.
##### We will drop the excluded publications and save the PUBMED IDs of the remaining ones to a TXT file.
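
The filtering step described above can be sketched on a small in-memory table. This is only an illustration: the column names `consensus` and `pubmed id` follow the abstrackr export columns printed later in this notebook, while the sample rows and the helper name `included_pubmed_ids` are hypothetical. Treating a PUBMED ID of 0 (or a missing value) as unavailable is an assumption based on how the later cells handle such records.

```python
import io
import pandas as pd

# Hypothetical stand-in for an abstrackr consensus export
sample_csv = io.StringIO(
    "(internal) id,pubmed id,consensus\n"
    "1,111,1\n"
    "2,222,-1\n"
    "3,0,1\n"
    "4,333,x\n"
)

def included_pubmed_ids(df):
    """Return PUBMED IDs (as strings) of rows with consensus == 1, skipping unavailable IDs."""
    # Compare as strings so the filter works whether the column was parsed as int or as text
    included = df[df["consensus"].astype(str).str.strip() == "1"]
    ids = pd.to_numeric(included["pubmed id"], errors="coerce")
    # Assumption: 0 or NaN means that no PUBMED ID is available
    ids = ids[ids.notna() & (ids > 0)]
    return [str(int(v)) for v in ids]

df = pd.read_csv(sample_csv)
pubmed_ids = included_pubmed_ids(df)
print("\n".join(pubmed_ids))
```

Writing `"\n".join(pubmed_ids)` to a file would give one PUBMED ID per line; whether SRDR+ expects exactly that layout should be checked against its import dialog.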

Convert CSV to XLSX:

In [14]:
# import pandas as pd

# # Define the file path
# csv_file =  "/home/kivi/Downloads/abstrackr_consensus_090424.csv"
# df = pd.read_csv(csv_file)

# # Save the DataFrame to an XLSX file
# xlsx_file = "/home/kivi/Downloads/epirev_extracted_data_090424.xlsx"
# df.to_excel(xlsx_file, index=False)

Convert XLSX to CSV:

In [15]:
# import pandas as pd

# # Load the Excel file into a DataFrame
# xlsx_file = "/home/kivi/Downloads/epirev_extracted_data_090424.xlsx"
# df = pd.read_excel(xlsx_file)

# # Save the DataFrame to a CSV file
# csv_file =  "/home/kivi/Downloads/abstrackr_consensus_090424.csv"
# df.to_csv(csv_file, index=False)

In [47]:
import pandas as pd

# Step 1: Read the CSV file
csv_consensus_file_path = "/home/kivi/Downloads/abstrackr_consensus_090424.csv"
df_consensus = pd.read_csv(csv_consensus_file_path)

# Step 2: Keep only the publications included by consensus (consensus == "1", compared as a string)
filtered_df = df_consensus[df_consensus['consensus'] == "1"]
print(len(filtered_df))

# Print the columns of the DataFrame
print("Column names:")
for column in df_consensus.columns: print(column)

# Step 3: Save "(internal id,pubmed id)" pairs of the included publications to a TXT file
txt_file_path = "/home/kivi/Downloads/abstrackr_internalid_consensus_090424.txt"
with open(txt_file_path, 'w') as txt_file:
    for idx, row in filtered_df.iterrows():
        txt_file.write(f"({row['(internal) id']},{row['pubmed id']})\n")
# The with block closes the file, so no explicit close() is needed

163
Column names:
(internal) id
(source) id
pubmed id
abstract
title
journal
authors
consensus
labeled_at
kivankovic
labeled_at.1
alessandro.principe
labeled_at.2


In [45]:
# Step 1: Read the CSV file
csv_karla_file_path = "/home/kivi/Downloads/abstrackr_karla_110124.csv"
df_karla = pd.read_csv(csv_karla_file_path)

# Step 2: Keep only the publications that the reviewer (column "kivankovic") labeled as included (1.0)
filtered_df = df_karla[df_karla['kivankovic'] == 1.0]
print(len(filtered_df))

# Print the columns of the DataFrame
print("Column names:")
for column in df_karla.columns: print(column)

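# Step 3: Save "(internal id,pubmed id)" pairs of the included publications to a TXT file
txt_file_path = "/home/kivi/Downloads/abstrackr_internalid_karla_110124.txt"
with open(txt_file_path, 'w') as txt_file:
    # Iterate over the filtered rows so that only included publications are written
    for idx, row in filtered_df.iterrows():
        txt_file.write(f"({row['(internal) id']},{int(row['pubmed id'])})\n")
# The with block closes the file, so no explicit close() is needed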

182
Column names:
(internal) id
(source) id
pubmed id
keywords
abstract
title
journal
authors
tags
consensus
labeled_at
kivankovic
labeled_at.1


##### Now we load the list of already reviewed publications (saved as internal ID and PUBMED ID pairs) and compare it with the consensus list.

In [43]:
# Step 1: Read the TXT files into lists
karla_file_path = '/home/kivi/Downloads/abstrackr_internalid_karla_110124.txt'
consensus_file_path = '/home/kivi/Downloads/abstrackr_internalid_consensus_090424.txt'

with open(karla_file_path, 'r') as karla_file:
    karla_ids = karla_file.read().splitlines()

with open(consensus_file_path, 'r') as consensus_file:
    consensus_ids = consensus_file.read().splitlines()

print("Karla's screen:", len(karla_ids))
print("Included publication after conflict resolution:", len(consensus_ids))
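# Step 2: Identify entries present in the consensus list but not in Karla's list
karla_id_set = set(karla_ids)  # set membership avoids rescanning the list for every entry
unique_ids = [entry for entry in consensus_ids if entry not in karla_id_set]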

print("Total conflicts:", len(unique_ids))

print("Studies in conflicts with unavailable PUBMED ID:\n", [p.split(",")[0][1::] for p in unique_ids if p.split(",")[1][:-1]=="0"])

# Step 3: Save the unique PubMed IDs to a new TXT file
# output_file_path = '/home/kivi/Downloads/unreviewed_pubmed_ids.txt'
# with open(output_file_path, 'w') as output_file:
#     for pubmed_id in unique_pubmed_ids:
#         if len(pubmed_id)<=8: # make sure that it is a PUBMED ID and not other text
#             output_file.write(pubmed_id + '\n')

# with open(output_file_path, 'r') as output_file:
#     unreviewed_pubmed_ids = list(output_file.read().splitlines())
#     print("Unreviewed, included from Alessandro's screen:", len(unreviewed_pubmed_ids))
#     print("The PUBMED IDs to import to SRDR+:\n", unreviewed_pubmed_ids)

# The files opened above are closed automatically by their with blocks,
# so no explicit close() calls are needed.


Karla's screen: 182
Included publication after conflict resolution: 163
Total conflicts: 41
Studies with unavailable PUBMED ID:
 ['39471769', '39471863', '39472027', '39472076', '39472090', '39472093', '39472141', '39472212', '39472229', '39472240']
['(39470572,30166056)', '(39470576,37555141)', '(39470592,31785422)', '(39470596,29067832)', '(39470605,29523391)', '(39470611,35260657)', '(39470638,32589284)', '(39470668,30508033)', '(39470669,37728414)', '(39470691,37652703)', '(39470760,28166392)', '(39470799,34817446)', '(39470826,36088217)', '(39470870,35240426)', '(39470924,37480785)', '(39470938,33960712)', '(39471099,34991017)', '(39471112,37546108)', '(39471171,31491812)', '(39471244,36381989)', '(39471267,35774185)', '(39471285,28782373)', '(39471286,34191730)', '(39471308,31756595)', '(39471333,37064531)', '(39471334,36672052)', '(39471526,36696482)', '(39471577,37002979)', '(39471631,33972159)', '(39471655,34891320)', '(39471690,31783358)', '(39471769,0)', '(39471863,0)', '(39

##### We have 41 new, unreviewed publications. However, some publications from Karla's list must now be excluded.
##### Those publications will be removed from the extracted data XLSX before the data are analyzed.
##### We will import the new publications into the SRDR+ project manually, based on PUBMED ID.
##### Publications without a known PUBMED ID will be located by title.
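
The split described above (new publications with a PUBMED ID versus those that have to be found by title) can be sketched with plain Python. The line format `(internal id,pubmed id)` and the use of `0` for an unavailable PUBMED ID follow the TXT files written earlier in this notebook; the sample lines and the helper name `parse_pair` are hypothetical.

```python
def parse_pair(line):
    """Parse a line such as '(39470572,30166056)' into (internal_id, pubmed_id) strings."""
    internal_id, pubmed_id = line.strip().strip("()").split(",")
    return internal_id, pubmed_id

# Hypothetical sample lines in the same format as the TXT files
consensus_lines = ["(101,5001)", "(102,0)", "(103,5003)"]
karla_lines = ["(101,5001)"]

already_reviewed = {parse_pair(line) for line in karla_lines}
new_pairs = [parse_pair(line) for line in consensus_lines
             if parse_pair(line) not in already_reviewed]

# A PUBMED ID of "0" marks a record without a PUBMED ID in this format
with_pmid = [pmid for _, pmid in new_pairs if pmid != "0"]
without_pmid = [internal for internal, pmid in new_pairs if pmid == "0"]

print("Import by PUBMED ID:", with_pmid)
print("Look up by title (internal IDs):", without_pmid)
```

Parsing each line into a tuple once keeps the later filters readable, compared with repeating `split(",")` and slicing in every list comprehension.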

Below, we fetch the titles from the original CSV file exported from abstrackr, based on the internal IDs of the publications with unknown PUBMED ID.

In [53]:

# Define the list of IDs
id_list = [p.split(",")[0][1::] for p in unique_ids if p.split(",")[1][:-1]=="0"]

# Filter the DataFrame based on the IDs in the "internal id" column
filtered_df_consensus = df_consensus[df_consensus['(internal) id'].astype(str).isin(id_list)]

# Extract the values from the "title" column corresponding to the filtered IDs
title_values = filtered_df_consensus['title'].tolist()

# Print the list of title values
for title in title_values: print(title, "\n")


Source localization of epileptic spikes using Multiple Sparse Priors 

Quantitative electrocorticographic biomarkers of clinical outcomes in mesial temporal lobe epileptic patients treated with the RNS® system 

Correlations between interictal extratemporal spikes and clinical features, imaging characteristics, and surgical outcomes in patients with mesial temporal lobe epilepsy 

Metabolic Brain Network and Surgical Outcome in Temporal Lobe Epilepsy: A Graph Theoretical Study Based on 18F-fluorodeoxyglucose PET 

Ictal onset patterns of subdural intracranial electroencephalogram in children: How helpful for predicting epilepsy surgery outcome? 

Betweenness centrality of intracranial electroencephalography networks and surgical epilepsy outcome 

The delta between postoperative seizure freedom and persistence: Automatically detected focal slow waves after epilepsy surgery 

Associated factors with stimulation induced seizures and the relevance with surgical outcomes 

Detection of pat

Check duplicates:

In [17]:
# Read the CSV file
duplicates_file_path = "/home/kivi/Downloads/duplicated.csv"
df_dups = pd.read_csv(duplicates_file_path)

# Split the rows by tag: confirmed duplicates and possible duplicates
dup = df_dups[df_dups['tags'] == "duplicate"]
possible_dup = df_dups[df_dups['tags'] == "possible duplicate"]

print(len(possible_dup))
print(len(dup))

print("Total duplicates:", len(possible_dup)+len(dup))

5
142
Total duplicates: 147
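
As an alternative sketch of the same count, `isin` and `value_counts` can tally both tag values in one pass. The `tags` column name and the two tag values come from the cell above; the sample rows are hypothetical, and real exports may contain other tag values or combined tags that this exact-match filter would not catch.

```python
import pandas as pd

# Hypothetical sample; a missing tag is represented by None
df_tags = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "tags": ["duplicate", "possible duplicate", None, "duplicate"],
})

duplicate_tags = ["duplicate", "possible duplicate"]
flagged = df_tags[df_tags["tags"].isin(duplicate_tags)]

# Count the rows per tag value
counts = flagged["tags"].value_counts()
print(counts.to_dict())
print("Total duplicates:", len(flagged))
```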
