### This notebook searches for scientific articles on "p-graph" or "process graph theory" using the Scopus API. Alternatively, it imports RIS files from a specified folder into a Pandas dataframe containing information about the articles.
1. The ScopusSearch class from the pybliometrics.scopus module performs the search. The search query is passed as a string, and the 'view' parameter sets how many fields (headers) are retrieved for each article.
2. The RIS import sets the folder path, lists the RIS files in that folder, and creates an empty list to hold one dictionary per article. It then parses each file and appends its entries to that list. A Pandas dataframe is created from the resulting list of dictionaries.
3. Once the SLR results dataframe exists, its dimensions and list of headers can be read from the shape attribute and the columns.tolist() method, respectively. The data can also be written to an Excel file for easier readability using the dataframe's to_excel method.
4. An optional step filters the dataframe by keywords defined from previous knowledge. A new, filtered dataframe is created by using str.contains to search for the keywords in the title, abstract, and keywords fields. The filtered dataframe is also written to an Excel file for easier readability.

In [1]:
import os
import pandas as pd
import pybliometrics
import rispy

In [11]:
# Scopus search -> Performs a query to search for articles
# Any search that works in the Scopus website's Advanced Search function can be recreated.
# There are two exceptions: (1) LIMIT-TO(), which only affects the display of search results, and (2) INDEXTERMS().

from pybliometrics.scopus import ScopusSearch
search = ScopusSearch('TITLE ("p-graph" OR "p graph" OR "process graph theory") OR KEY("p-graph")', view="COMPLETE")
# The "view" parameter sets how many headers (fields) are retrieved per article (COMPLETE contains 36 variables). These must match our own database, or at least the imported data.
# Attention: Scopus API functions may require a registered licence, including a generated API key (https://dev.elsevier.com/). In the free version only 5000 articles can be queried; paying users have no limit.
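
The search cell stops at the query itself. To compare its headers with the imported data, the results first have to become a dataframe. The following is a minimal sketch, assuming that `search.results` is a list of namedtuples (or `None` when nothing is found), which is how pybliometrics is understood to return them. The helper name `results_to_frame` and the stand-in records are hypothetical, so the block runs without an API key.

```python
from collections import namedtuple

import pandas as pd


def results_to_frame(results):
    """Turn a list of namedtuples (or None) into a dataframe."""
    # Assumption: an empty search yields None, so fall back to an empty list.
    return pd.DataFrame(results or [])


# Hypothetical stand-in for `search.results`; real records carry many more fields.
Document = namedtuple("Document", ["eid", "doi", "title"])
fake_results = [
    Document("2-s2.0-0000000001", "10.0000/example.1", "Made-up article one"),
    Document("2-s2.0-0000000002", "10.0000/example.2", "Made-up article two"),
]

scopus_df = results_to_frame(fake_results)
print(scopus_df.shape)             # (2, 3)
print(scopus_df.columns.tolist())  # ['eid', 'doi', 'title']
```

With a working API key, the same helper could be called as `results_to_frame(search.results)`. Recent pybliometrics releases may also require an initialisation call before the first query; check the installed version's documentation.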

In [3]:
# Alternatively, import .RIS files exported from the scientific database (e.g. Scopus, WoS) from a specified folder (file path) to a Pandas dataframe (df)
# set the path to the folder containing the RIS files
folder_path = 'C:/Users/<filepath>'

# get a list of all the RIS files in the folder (the extension is matched case-insensitively, so .ris and .RIS both count)
ris_files = [f for f in os.listdir(folder_path) if f.lower().endswith('.ris')]

# create an empty list to hold the dictionaries for each article
articles = []

# loop through each RIS file, import it into a dictionary, and append it to the list of dictionaries
for file in ris_files:
    file_path = os.path.join(folder_path, file)
    with open(file_path, "r", encoding="UTF-8") as f:
        entries = rispy.load(f)
        for entry in entries:
            articles.append(entry)

# create a pandas dataframe from the list of dictionaries
slr_results = pd.DataFrame(articles)

print("Successfully imported all RIS files!")

Successfully imported all RIS files!
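
Exported files do not always use the same case for their extension (`.ris` or `.RIS`). The following is a small sketch of case-insensitive file discovery with `pathlib`. The helper name `find_ris_files` and the file names in the demonstration are hypothetical.

```python
import tempfile
from pathlib import Path


def find_ris_files(folder):
    """Return the RIS files in `folder`, matching the extension case-insensitively."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".ris"
    )


# Demonstration on a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("scopus.ris", "wos.RIS", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [p.name for p in find_ris_files(tmp)]

print(found)  # ['scopus.ris', 'wos.RIS']
```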


In [4]:
# Print the dataframe; pandas shows only the first and last rows, which is enough for illustrative purposes
print(slr_results)

    type_of_reference                                              title  \
0                JOUR  Addressing supply uncertainties using multi-pe...   
1                JOUR  Synthesis of multiperiod heat exchanger networ...   
2                JOUR  Optimisation of heat distribution system by us...   
3                JOUR  General formulation of resilience for designin...   
4                JOUR  P-graph optimization of energy crisis response...   
..                ...                                                ...   
289              JOUR  Applications of P-graph to Carbon Management: ...   
290              JOUR  P-graph Attainable Region Technique (PART) for...   
291              JOUR  Framework to embed machine learning algorithms...   
292              JOUR  Enabling technology models with nonlinearities...   
293              JOUR  Synthesis of mass exchange network using proce...   

                                               authors  \
0    [Lo, S.L.Y., Lim, C.H., 

In [9]:
# Extract information about the dataframe: (1) its dimensions; (2) its list of headers
print(slr_results.shape)
print(slr_results.columns.tolist())

(294, 32)
['type_of_reference', 'title', 'authors', 'secondary_title', 'abstract', 'date', 'year', 'doi', 'volume', 'alternate_title1', 'language', 'issn', 'url', 'name_of_database', 'notes', 'keywords', 'number', 'start_page', 'end_page', 'publisher', 'secondary_authors', 'custom3', 'access_date', 'subsidiary_authors', 'tertiary_title', 'tertiary_authors', 'custom1', 'accession_number', 'database_provider', 'place_published', 'unknown_tag', 'short_title']


In [10]:
# Write to Excel for easier readability
# ExcelWriter.save() was removed in pandas 2.0; the context manager saves and closes the file
with pd.ExcelWriter('SLR_Library.xlsx') as excel_writer:
    slr_results.to_excel(excel_writer)
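
Fields such as authors and keywords arrive from rispy as Python lists, and these end up in the spreadsheet with brackets and quotes. The following is a minimal sketch of joining such lists into plain strings before export. The helper name `flatten_list_columns`, the separator, and the sample records are assumptions made for illustration.

```python
import pandas as pd


def flatten_list_columns(df, sep="; "):
    """Return a copy of `df` in which list-valued cells are joined into strings."""
    out = df.copy()
    for col in out.columns:
        # Only touch columns that contain at least one list.
        if out[col].map(lambda v: isinstance(v, list)).any():
            out[col] = out[col].map(
                lambda v: sep.join(map(str, v)) if isinstance(v, list) else v
            )
    return out


# Made-up records in the shape rispy produces (lists for authors and keywords).
demo = pd.DataFrame(
    {
        "title": ["Example article A", "Example article B"],
        "authors": [["Author, A.", "Author, B."], ["Author, C."]],
        "keywords": [["P-graph", "optimisation"], None],
    }
)

flat = flatten_list_columns(demo)
print(flat.loc[0, "authors"])  # Author, A.; Author, B.
```

Missing values are left untouched, so an article without keywords still shows an empty cell in the exported file.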

Optional step: Filtering the SLR_results dataframe by keywords defined from previous knowledge. The "|" character separates several keywords, any of which may match, in the title, abstract, and keywords (or other) fields.

In [5]:
# Keywords to look for; "|" separates the alternatives
keyword_pattern = 'keyword1|keyword2|keyword3'

# rispy returns the keywords of each article as a list, so join them into one string before searching
keywords_text = slr_results['keywords'].str.join('; ')

slr_results_filter1 = slr_results[
    slr_results['title'].str.contains(keyword_pattern, case=False, na=False) |
    slr_results['abstract'].str.contains(keyword_pattern, case=False, na=False) |
    keywords_text.str.contains(keyword_pattern, case=False, na=False)
]

print(slr_results_filter1)

# Writing to Excel for easier readability (the context manager saves and closes the file)
with pd.ExcelWriter('slr_Library_filter1.xlsx') as excel_writer:
    slr_results_filter1.to_excel(excel_writer)

Empty DataFrame
Columns: [type_of_reference, title, authors, secondary_title, abstract, date, year, doi, volume, alternate_title1, language, issn, url, name_of_database, notes, keywords, number, start_page, end_page, publisher, secondary_authors, custom3, access_date, subsidiary_authors, tertiary_title, tertiary_authors, custom1, accession_number, database_provider, place_published, unknown_tag, short_title]
Index: []

[0 rows x 32 columns]
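
The placeholder keywords match nothing, so the output above is empty. The following is a minimal sketch of the same filter on a small made-up dataframe, assuming list-valued keywords as produced by rispy. The helper name `filter_by_keywords` and the sample rows are hypothetical.

```python
import re

import pandas as pd


def filter_by_keywords(df, terms, columns=("title", "abstract", "keywords")):
    """Keep rows where any of `terms` occurs in any of `columns` (case-insensitive)."""
    # Escape each term so characters such as "+" or "(" are matched literally.
    pattern = "|".join(re.escape(t) for t in terms)
    mask = pd.Series(False, index=df.index)
    for col in columns:
        # Join list-valued cells so they can be searched as plain text.
        text = df[col].map(
            lambda v: "; ".join(map(str, v)) if isinstance(v, list) else v
        )
        matches = text.astype("string").str.contains(pattern, case=False, na=False)
        mask |= matches.astype(bool)
    return df[mask]


# Made-up rows; the first one matches only through its keywords list.
demo = pd.DataFrame(
    {
        "title": ["P-graph synthesis of heat networks", "Unrelated study"],
        "abstract": ["Uses process graph theory.", None],
        "keywords": [["P-graph", "heat exchanger"], ["Miscellaneous"]],
    }
)

hits = filter_by_keywords(demo, ["heat exchanger", "resilience"])
print(hits["title"].tolist())  # ['P-graph synthesis of heat networks']
```

Escaping the terms is a design choice: it treats every keyword as literal text, so regular-expression syntax inside a keyword is not available.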
