# Literature Review: Article Fetching Process

This notebook carries out the article-fetching step of the literature review, using search queries that were prepared and frequently updated during the review. The first part of the code fetches articles from [Google Scholar](https://scholar.google.com) using the [scholarly](https://pypi.org/project/scholarly/) library. The second part prepares Excel files for manual assessment.

```python
# Import libraries
import pandas as pd
from scholarly import scholarly
```

## First Part: Fetching Articles

In this part, the code performs the following operations:
1. Define the `fetch_articles()` function, which takes a search query, searches Google Scholar for publications, and collects the results in a list.
2. Define the search queries.
3. Run `fetch_articles()` with the active query.
4. Convert the result into a pandas `DataFrame`.
5. Save the output as an Excel file.

In short, this part extracts the articles and prepares them for the second part.

```python
# Function to fetch articles using the scholarly package
def fetch_articles(search_query):
    search_results = scholarly.search_pubs(search_query)
    articles = []
    for result in search_results:
        # Extract relevant information from each result
        bib_info = result.get('bib', {})
        article_info = {
            "title": bib_info.get('title'),
            "authors": bib_info.get('author'),
            "published_year": bib_info.get('pub_year'),
            "citations": result.get('num_citations', 0),
            "journal": bib_info.get('journal'),
            "volume": bib_info.get('volume'),
            "issue": bib_info.get('issue'),
            "pages": bib_info.get('pages'),
            "abstract": bib_info.get('abstract'),
            "doi": bib_info.get('doi'),
            "url": result.get('pub_url'),
            "publisher": bib_info.get('publisher'),
            "keywords": bib_info.get('keywords', []),
            "bib": bib_info
        }
        articles.append(article_info)
    return articles
```
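`scholarly.search_pubs()` returns an iterator over the search results, so `fetch_articles()` consumes every result the search yields, which for a broad query can mean many requests to Google Scholar. The sketch below is a hypothetical variant that is not part of the original notebook: it caps the number of results with `itertools.islice`. The names `fetch_articles_capped()`, `extract_article()` and the `max_results` parameter are assumptions, and only a subset of the notebook's fields is extracted. Because it accepts any iterable of result dictionaries, it can be tried offline with stand-in data.

```python
from itertools import islice
from typing import Iterable


def extract_article(result: dict) -> dict:
    """Pick a subset of the fields that fetch_articles() stores for one search result."""
    bib = result.get("bib", {})
    return {
        "title": bib.get("title"),
        "authors": bib.get("author"),
        "published_year": bib.get("pub_year"),
        "citations": result.get("num_citations", 0),
        "abstract": bib.get("abstract"),
        "url": result.get("pub_url"),
    }


def fetch_articles_capped(search_results: Iterable[dict], max_results: int = 100) -> list:
    """Hypothetical variant of fetch_articles(): stop after max_results results.

    search_results can be the iterator returned by scholarly.search_pubs(query)
    or any other iterable of result dictionaries.
    """
    return [extract_article(result) for result in islice(search_results, max_results)]


# Stand-in results so the example runs without network access
fake_results = ({"bib": {"title": f"Paper {i}", "pub_year": "2021"}, "num_citations": i} for i in range(500))
articles = fetch_articles_capped(fake_results, max_results=3)
print([article["title"] for article in articles])  # ['Paper 0', 'Paper 1', 'Paper 2']
```

With the library itself, the call would be `fetch_articles_capped(scholarly.search_pubs(search_query), max_results=100)`; only the first 100 results would then be requested and processed.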

### The search queries

```python
# Search queries

# OpenSSF Scorecard Search Query
# search_query = '"OpenSSF Scorecard" (history OR "theoretical background" OR "community involvement" OR purpose OR GitHub OR usage OR maintenance OR "maintenance score" OR reliability OR "security assessment" OR "best practices" OR impact OR application OR "case studies" OR "industry applications" OR benefits OR challenges)'

# Prediction Methods Search Query
search_query = '"univariate forecasting" (machine learning OR "deep learning") (prediction OR forecasting) (parameters OR features) ("time series" OR dataset) "supervised learning"'
```
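Because the queries were revised frequently, composing them from term lists can make each revision less error-prone. The helper below is a hypothetical sketch, not the notebook's method: a plain string becomes a required term, a list becomes an `OR` group in parentheses, and multi-word terms are wrapped in double quotes. `build_query()` and `quote()` are assumed names. Note one difference from the hand-written query above: the helper quotes `machine learning` as a phrase, whereas the original leaves those two words unquoted.

```python
from typing import Sequence, Union

# A query part is either one required term or a group of alternatives
Part = Union[str, Sequence[str]]


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are matched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(*parts: Part) -> str:
    """Join required terms (str) and OR groups (list of str) with spaces."""
    rendered = []
    for part in parts:
        if isinstance(part, str):
            rendered.append(quote(part))
        else:
            rendered.append("(" + " OR ".join(quote(t) for t in part) + ")")
    return " ".join(rendered)


composed_query = build_query(
    "univariate forecasting",
    ["machine learning", "deep learning"],
    ["prediction", "forecasting"],
    ["parameters", "features"],
    ["time series", "dataset"],
    "supervised learning",
)
print(composed_query)
```

Editing a term list and re-running the cell then regenerates the full query string, instead of editing nested quotes and parentheses by hand.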

```python
# Fetch articles based on the search query
articles_data = fetch_articles(search_query)
```
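Since the query was updated repeatedly during the review, running successive versions can return the same article more than once. A minimal sketch, assuming that deduplicating by case-insensitive title is acceptable: `fetch_all()` and `normalize_title()` are hypothetical helpers, and `fetch` stands in for the notebook's `fetch_articles()` (a stub is used so the example runs offline). Each kept article is tagged with the query that first returned it.

```python
from typing import Callable, Iterable, Optional


def normalize_title(title: Optional[str]) -> str:
    """Lower-case and collapse whitespace so that near-identical titles compare equal."""
    return " ".join((title or "").lower().split())


def fetch_all(queries: Iterable[str], fetch: Callable[[str], list]) -> list:
    """Hypothetical helper: run every query, tag each article with its query,
    and keep only the first article seen for each normalized title."""
    seen = set()
    merged = []
    for query in queries:
        for article in fetch(query):
            key = normalize_title(article.get("title"))
            if key and key in seen:
                continue  # duplicate of an article returned by an earlier query
            seen.add(key)
            merged.append({**article, "query": query})
    return merged


# Stand-in for fetch_articles() so the example runs without network access
def fake_fetch(query: str) -> list:
    results = {
        "q1": [{"title": "LSTM for Streamflow"}, {"title": "SVM baseline"}],
        "q2": [{"title": "lstm  for streamflow"}, {"title": "GRU study"}],
    }
    return results[query]


merged = fetch_all(["q1", "q2"], fake_fetch)
print([(a["title"], a["query"]) for a in merged])
# [('LSTM for Streamflow', 'q1'), ('SVM baseline', 'q1'), ('GRU study', 'q2')]
```

On a pandas `DataFrame`, the same idea could be expressed by adding a normalized-title column and calling `drop_duplicates(subset=...)` on it.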

### Data manipulation and saving the output to Excel

```python
# Create a pandas DataFrame
df = pd.DataFrame(articles_data)
df.head()
```

| | title | authors | published_year | citations | journal | volume | issue | pages | abstract | doi | url | publisher | keywords | bib |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Univariate model for hour ahead multi-step sol... | [P Gupta, R Singh] | 2021 | 11 | | | | | Further, the problem is converted into a super... | | https://ieeexplore.ieee.org/abstract/document/... | | [] | {'title': 'Univariate model for hour ahead mul... |
| 1 | Short-term daily univariate streamflow forecas... | [EB Wegayehu, FB Muluneh] | 2022 | 34 | | | | | As a result, machine learning models have beco... | | https://www.hindawi.com/journals/amete/2022/18... | | [] | {'title': 'Short-term daily univariate streamf... |
| 2 | Enhanced neural network-based univariate time-... | [S Namasudra, S Dhamodharavadhani, R Rathipriya] | 2024 | 5 | | | | | accuracy because of the supervised learning me... | | https://www.liebertpub.com/doi/abs/10.1089/big... | | [] | {'title': 'Enhanced neural network-based univa... |
| 3 | Forecasting from Physiological Time Series Thr... | [S Masum] | 2019 | 3 | | | | | -independent medical time series data as stati... | | https://www.researchgate.net/profile/Shamsul-M... | | [] | {'title': 'Forecasting from Physiological Time... |
| 4 | Eeg forecasting with univariate and multivaria... | [DK Thara, BG Premasudha, TV Murthy] | 2022 | 3 | | | | | using machine learning algorithms like support... | | https://www.igi-global.com/article/eeg-forecas... | | [] | {'title': 'Eeg forecasting with univariate and... |


```python
# Save the DataFrame to an Excel file
df.to_excel("../03_lit_review/scholarly_articles_prediction.xlsx", index=False, engine='openpyxl')
```

## Second Part: Excel File Manipulation

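The second part of the code takes the Excel file produced in the first part as input and prepares it in a readable format for manual assessment:
1. Add an empty `Notes` column and save an intermediate copy.
2. Set each column width to the length of its longest value (capped at 50 characters), enable text wrapping, and format the sheet as an Excel table (`TableStyleMedium9`).
3. Once the `Technique`, `Architecture`, `Included`, and `Univariate?` columns have been filled in by hand, generate a citation key, a `\protect\cite{}` command, and a BibTeX entry for each article.
4. Split the `Architecture` column into individual entries and count the articles per technique and architecture.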

```python
import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import Alignment
from openpyxl.worksheet.table import Table, TableStyleInfo

# Load the Excel file produced in the first part
df = pd.read_excel('../03_lit_review/scholarly_articles_prediction.xlsx')

# Add an empty "Notes" column for the manual assessment
df['Notes'] = None

# Save the DataFrame to an intermediate Excel file
df.to_excel('../03_lit_review/scholarly_articles_prediction_x.xlsx', index=False, engine='openpyxl')

# Reopen the intermediate file with openpyxl to format it
wb = load_workbook('../03_lit_review/scholarly_articles_prediction_x.xlsx')
ws = wb.active

# Set each column width to its longest value (at most 50 characters) and wrap the text.
# str() is applied before len() so that numeric cells are measured too; empty cells are skipped.
for column in ws.columns:
    max_length = max((len(str(cell.value)) for cell in column if cell.value is not None), default=0)
    ws.column_dimensions[column[0].column_letter].width = min(max_length, 50)
    for cell in column:
        cell.alignment = Alignment(wrap_text=True)

# Format the used range as an Excel table
tab = Table(ref=ws.dimensions, displayName="Table1", tableStyleInfo=TableStyleInfo(
    name="TableStyleMedium9", showFirstColumn=False,
    showLastColumn=False, showRowStripes=True, showColumnStripes=False))

# Add the table to the worksheet
ws.add_table(tab)

# Save the formatted workbook as a new file
wb.save('../03_lit_review/scholarly_articles_prediction_final.xlsx')
```
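The column-width rule in the cell above can be isolated as a pure function, which makes it easy to check without opening a workbook. This is an illustrative sketch: `fit_widths()` is a hypothetical helper that takes the rows (header first) as plain Python lists, ignores empty cells, and caps each width at 50 characters as the notebook does. The sample title is a placeholder, not an entry from the review.

```python
from itertools import zip_longest
from typing import Any, List, Sequence


def fit_widths(rows: Sequence[Sequence[Any]], cap: int = 50) -> List[int]:
    """Return one width per column: the longest non-empty value, capped at `cap`."""
    widths = []
    # zip_longest transposes rows into columns, padding short rows with None
    for column in zip_longest(*rows):
        lengths = [len(str(value)) for value in column if value is not None]
        widths.append(min(max(lengths, default=0), cap))
    return widths


rows = [
    ["title", "published_year", "Notes"],    # header row
    ["Some article title " * 5, 2021, None],  # 95-character placeholder title
]
print(fit_widths(rows))  # [50, 14, 5]
```

In openpyxl terms, the width for column index `i` would be assigned to `ws.column_dimensions[get_column_letter(i + 1)].width`, with `get_column_letter` imported from `openpyxl.utils`.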

```python
# Append a running count per value so that every entry becomes unique,
# e.g. a, b, a -> a_1, b_1, a_2 (the DataFrame is modified in place and returned)
def make_unique(df, column_name):
    counts = df.groupby(column_name).cumcount() + 1
    df[column_name] = df[column_name] + '_' + counts.astype(str)
    return df

# Turn a string such as "['P Gupta', 'R Singh']" into "P Gupta and R Singh"
def join_names_from_string(names_string):
    # Remove the square brackets
    names_string = names_string.strip("[]")
    # Split the string by commas into a list of names and strip extra spaces and quotes
    names_list = [name.strip().strip("'") for name in names_string.split(',') if name.strip().strip("'")]
    # Join the names with " and "
    return ' and '.join(names_list)
```
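After the round trip through Excel, the `authors` column holds the text form of a Python list, such as `['P Gupta', 'R Singh']`, which `join_names_from_string()` parses by stripping brackets and quotes. As an alternative sketch (not the notebook's method), `ast.literal_eval` from the standard library parses the list literal directly. This also copes with a name that Python writes in double quotes because it contains an apostrophe, where stripping single quotes would leave the double quotes in place, and it returns an empty string for empty cells, which pandas reads as NaN. `join_names_literal()` is a hypothetical name, and the second sample uses made-up names.

```python
import ast
import math


def join_names_literal(names_string) -> str:
    """Hypothetical alternative to join_names_from_string():
    parse "['P Gupta', 'R Singh']" with ast.literal_eval and join with ' and '."""
    if names_string is None or (isinstance(names_string, float) and math.isnan(names_string)):
        return ""  # empty Excel cells are read by pandas as NaN
    names = ast.literal_eval(names_string)  # evaluates Python literals only, no code
    return " and ".join(name.strip() for name in names if name.strip())


print(join_names_literal("['EM Riba', 'KM Malan', 'E Mudimu']"))
# EM Riba and KM Malan and E Mudimu
print(join_names_literal(str(["A O'Neil", "B Smith"])))  # hypothetical names
# A O'Neil and B Smith
```

It could replace the existing helper in `data_df['authors'].apply(...)` without other changes, since it takes the same string input.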

```python
data = pd.ExcelFile('../03_lit_review/scholarly_articles_prediction_final.xlsx')
data_df = pd.read_excel(data, 'Tabelle1')
data_df
```

| | title | authors | published_year | citations | journal | volume | issue | pages | abstract | doi | ... | bib | Technique | Architecture | Included | Univariate? | Notes | Citation | author with and | Format | Cite |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Univariate model for hour ahead multi-step sol... | ['P Gupta', 'R Singh'] | 2021.0 | 11 | | | | | Further, the problem is converted into a super... | | ... | {'title': 'Univariate model for hour ahead mul... | ML | Random Forest, SVM | Y | Y | | ml_random_forest_svm_1 | P Gupta and R Singh | \protect\cite{ml_random_forest_svm_1} | |
| 1 | Short-term daily univariate streamflow forecas... | ['EB Wegayehu', 'FB Muluneh'] | 2022.0 | 34 | | | | | As a result, machine learning models have beco... | | ... | {'title': 'Short-term daily univariate streamf... | DL | LSTM | Y | Y | | dl_lstm_1 | EB Wegayehu and FB Muluneh | \protect\cite{dl_lstm_1} | |
| 2 | Forecasting from Physiological Time Series Thr... | ['S Masum'] | 2019.0 | 3 | | | | | -independent medical time series data as stati... | | ... | {'title': 'Forecasting from Physiological Time... | ML | Supervised | Y | Y | Contains multiple usable methods. | ml_supervised_1 | S Masum | \protect\cite{ml_supervised_1} | |
| 3 | A dynamic factor machine learning method for m... | ['G Bontempi', 'YA Le Borgne'] | 2017.0 | 11 | | | | | Multivariate time series forecasting involves ... | | ... | {'title': 'A dynamic factor machine learning m... | ML | Other | Y | Y | Multi-variate multi-step ahead. | ml_other_1 | G Bontempi and YA Le Borgne | \protect\cite{ml_other_1} | |
| 4 | Machine learning methods do have a place in un... | ['EM Riba', 'KM Malan', 'E Mudimu'] | | 0 | | | | | of new and established univariate forecasting ... | | ... | {'title': 'Machine learning methods do have a ... | ML | Multiple | Y | Y | Univariate but multiple method. | ml_multiple_1 | EM Riba and KM Malan and E Mudimu | \protect\cite{ml_multiple_1} | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 65 | Analysis factors affecting Egyptian inflation ... | ['MF Abd El-Aal'] | 2023.0 | 4 | | | | | algorithms outperforms univariate forecasting ... | | ... | {'title': 'Analysis factors affecting Egyptian... | ML | Multiple | Y | Y | Include the graph. | ml_multiple_6 | MF Abd El-Aal | \protect\cite{ml_multiple_6} | |
| 66 | Forecasting Retail Client Flow with LSTMs on I... | ['P Gusmão', 'J Moreira', 'A Tomé'] | 2021.0 | 0 | | | | | Because of this, the problem of forecasting is... | | ... | {'title': 'Forecasting Retail Client Flow with... | DL | LSTM | Y | Y | Include the graph. | dl_lstm_18 | P Gusmão and J Moreira and A Tomé | \protect\cite{dl_lstm_18} | |
| 67 | Employment forecasting using data from the Swe... | ['J Wikström'] | 2018.0 | 0 | | | | | statistical model used for univariate forecast... | | ... | {'title': 'Employment forecasting using data f... | DL | RNN, LSTM | Y | Y | | dl_rnn_lstm_2 | J Wikström | \protect\cite{dl_rnn_lstm_2} | |
| 68 | Charging Scheduling of Hybrid Energy Storage S... | ['G Erdogan', 'W Fekih Hassen'] | 2023.0 | 2 | | | | | Two types of univariate forecasting were used ... | | ... | {'title': 'Charging Scheduling of Hybrid Energ... | DL | LSTM, GRU, RNN | Y | Y | Include the graph. | dl_lstm_gru_rnn_1 | G Erdogan and W Fekih Hassen | \protect\cite{dl_lstm_gru_rnn_1} | |


```python
# Citation key "<technique>_<architecture>", lower-case, with ", " and spaces replaced by "_"
data_df['Citation'] = data_df['Technique'].str.replace(', ', '_').str.lower() + '_' + data_df['Architecture'].str.replace(', ', '_').str.lower()
data_df['Citation'] = data_df['Citation'].str.replace(' ', '_')
data_df = make_unique(data_df, 'Citation')  # e.g. dl_lstm_1, dl_lstm_2
data_df['author with and'] = data_df['authors'].apply(join_names_from_string)
# Raw string: keeps the backslashes without invalid-escape warnings
data_df['Format'] = r'\protect\cite{' + data_df['Citation'] + '}'
# BibTeX entry with the placeholder journal "Undefined"
data_df['Cite'] = ('@article{' + data_df['Citation'] + ',\n\t title = {' + data_df['title']
                   + '},\n\tauthor={' + data_df['author with and']
                   + '},\n\tyear={' + data_df['published_year'].astype(str).str[:4]
                   + '},\n\tjournal={' + 'Undefined' + '},\n}')
data_df
```

| | title | authors | published_year | citations | journal | volume | issue | pages | abstract | doi | ... | bib | Technique | Architecture | Included | Univariate? | Notes | Citation | author with and | Format | Cite |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Univariate model for hour ahead multi-step sol... | ['P Gupta', 'R Singh'] | 2021.0 | 11 | | | | | Further, the problem is converted into a super... | | ... | {'title': 'Univariate model for hour ahead mul... | ML | Random Forest, SVM | Y | Y | | ml_random_forest_svm_1 | P Gupta and R Singh | \protect\cite{ml_random_forest_svm_1} | @article{ml_random_forest_svm_1,\n\t title = {... |
| 1 | Short-term daily univariate streamflow forecas... | ['EB Wegayehu', 'FB Muluneh'] | 2022.0 | 34 | | | | | As a result, machine learning models have beco... | | ... | {'title': 'Short-term daily univariate streamf... | DL | LSTM | Y | Y | | dl_lstm_1 | EB Wegayehu and FB Muluneh | \protect\cite{dl_lstm_1} | @article{dl_lstm_1,\n\t title = {Short-term da... |
| 2 | Forecasting from Physiological Time Series Thr... | ['S Masum'] | 2019.0 | 3 | | | | | -independent medical time series data as stati... | | ... | {'title': 'Forecasting from Physiological Time... | ML | Supervised | Y | Y | Contains multiple usable methods. | ml_supervised_1 | S Masum | \protect\cite{ml_supervised_1} | @article{ml_supervised_1,\n\t title = {Forecas... |
| 3 | A dynamic factor machine learning method for m... | ['G Bontempi', 'YA Le Borgne'] | 2017.0 | 11 | | | | | Multivariate time series forecasting involves ... | | ... | {'title': 'A dynamic factor machine learning m... | ML | Other | Y | Y | Multi-variate multi-step ahead. | ml_other_1 | G Bontempi and YA Le Borgne | \protect\cite{ml_other_1} | @article{ml_other_1,\n\t title = {A dynamic fa... |
| 4 | Machine learning methods do have a place in un... | ['EM Riba', 'KM Malan', 'E Mudimu'] | | 0 | | | | | of new and established univariate forecasting ... | | ... | {'title': 'Machine learning methods do have a ... | ML | Multiple | Y | Y | Univariate but multiple method. | ml_multiple_1 | EM Riba and KM Malan and E Mudimu | \protect\cite{ml_multiple_1} | @article{ml_multiple_1,\n\t title = {Machine l... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 65 | Analysis factors affecting Egyptian inflation ... | ['MF Abd El-Aal'] | 2023.0 | 4 | | | | | algorithms outperforms univariate forecasting ... | | ... | {'title': 'Analysis factors affecting Egyptian... | ML | Multiple | Y | Y | Include the graph. | ml_multiple_6 | MF Abd El-Aal | \protect\cite{ml_multiple_6} | @article{ml_multiple_6,\n\t title = {Analysis ... |
| 66 | Forecasting Retail Client Flow with LSTMs on I... | ['P Gusmão', 'J Moreira', 'A Tomé'] | 2021.0 | 0 | | | | | Because of this, the problem of forecasting is... | | ... | {'title': 'Forecasting Retail Client Flow with... | DL | LSTM | Y | Y | Include the graph. | dl_lstm_18 | P Gusmão and J Moreira and A Tomé | \protect\cite{dl_lstm_18} | @article{dl_lstm_18,\n\t title = {Forecasting ... |
| 67 | Employment forecasting using data from the Swe... | ['J Wikström'] | 2018.0 | 0 | | | | | statistical model used for univariate forecast... | | ... | {'title': 'Employment forecasting using data f... | DL | RNN, LSTM | Y | Y | | dl_rnn_lstm_2 | J Wikström | \protect\cite{dl_rnn_lstm_2} | @article{dl_rnn_lstm_2,\n\t title = {Employmen... |
| 68 | Charging Scheduling of Hybrid Energy Storage S... | ['G Erdogan', 'W Fekih Hassen'] | 2023.0 | 2 | | | | | Two types of univariate forecasting were used ... | | ... | {'title': 'Charging Scheduling of Hybrid Energ... | DL | LSTM, GRU, RNN | Y | Y | Include the graph. | dl_lstm_gru_rnn_1 | G Erdogan and W Fekih Hassen | \protect\cite{dl_lstm_gru_rnn_1} | @article{dl_lstm_gru_rnn_1,\n\t title = {Charg... |
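The key and entry construction above can be followed step by step in plain Python. This sketch is a hypothetical re-expression of the notebook's logic for plain values rather than a `DataFrame`: `citation_key_base()` mirrors the `Technique`/`Architecture` concatenation, `make_unique_keys()` mirrors `make_unique()` (a running number per key in order of appearance), and `bibtex_entry()` builds an `@article` entry with the placeholder journal `Undefined` used in the notebook. Unlike the notebook, which turns a missing `published_year` (as in row 4) into the text `nan`, it writes an empty year. All three function names and the sample title are assumptions.

```python
import math
from collections import defaultdict


def citation_key_base(technique: str, architecture: str) -> str:
    """Lower-case and join with '_', e.g. ('ML', 'Random Forest, SVM') -> 'ml_random_forest_svm'."""
    def part(text: str) -> str:
        return text.replace(", ", "_").lower().replace(" ", "_")
    return part(technique) + "_" + part(architecture)


def make_unique_keys(keys: list) -> list:
    """Append a running count per key in order of appearance: a, b, a -> a_1, b_1, a_2."""
    counts = defaultdict(int)
    unique = []
    for key in keys:
        counts[key] += 1
        unique.append(f"{key}_{counts[key]}")
    return unique


def bibtex_entry(key: str, title: str, authors: list, year) -> str:
    """Build an @article entry; 'Undefined' is the placeholder journal used in the notebook."""
    missing = year is None or (isinstance(year, float) and math.isnan(year))
    year_text = "" if missing else str(int(year))  # 2021.0 -> "2021"
    author_text = " and ".join(authors)
    return (
        f"@article{{{key},\n"
        f"\ttitle = {{{title}}},\n"
        f"\tauthor = {{{author_text}}},\n"
        f"\tyear = {{{year_text}}},\n"
        "\tjournal = {Undefined},\n"
        "}"
    )


pairs = [("ML", "Random Forest, SVM"), ("DL", "LSTM"), ("DL", "LSTM")]
keys = make_unique_keys([citation_key_base(t, a) for t, a in pairs])
print(keys)  # ['ml_random_forest_svm_1', 'dl_lstm_1', 'dl_lstm_2']
print(bibtex_entry(keys[0], "Example title", ["P Gupta", "R Singh"], 2021.0))
```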


```python
data_df.to_excel('../03_lit_review/scholarly_articles_prediction_final.xlsx', sheet_name='Tabelle1', index=False)
```

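```python
# Split multi-valued architectures (e.g. "LSTM, GRU, RNN") into one row each.
# assign() returns a new DataFrame, so data_df keeps its original strings and
# re-running this cell does not try to split values that were already split.
data_df_individual = data_df.assign(Architecture=data_df['Architecture'].str.split(', ')).explode('Architecture')
data_df_individual
```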

| | title | authors | published_year | citations | journal | volume | issue | pages | abstract | doi | ... | keywords | bib | Technique | Architecture | Included | Univariate? | Notes | Citation | Format | Cite |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Univariate model for hour ahead multi-step sol... | ['P Gupta', 'R Singh'] | 2021.0 | 11 | | | | | Further, the problem is converted into a super... | | ... | [] | {'title': 'Univariate model for hour ahead mul... | ML | | Y | Y | | ml_random forest_svm_1 | \protect\cite{ml_random forest_svm_1} | @article{ml_random forest_svm_1, title = {Univ... |
| 1 | Short-term daily univariate streamflow forecas... | ['EB Wegayehu', 'FB Muluneh'] | 2022.0 | 34 | | | | | As a result, machine learning models have beco... | | ... | [] | {'title': 'Short-term daily univariate streamf... | DL | | Y | Y | | dl_lstm_1 | \protect\cite{dl_lstm_1} | @article{dl_lstm_1, title = {Short-term daily ... |
| 2 | Forecasting from Physiological Time Series Thr... | ['S Masum'] | 2019.0 | 3 | | | | | -independent medical time series data as stati... | | ... | [] | {'title': 'Forecasting from Physiological Time... | ML | | Y | Y | Contains multiple usable methods. | ml_supervised_1 | \protect\cite{ml_supervised_1} | @article{ml_supervised_1, title = {Forecasting... |
| 3 | A dynamic factor machine learning method for m... | ['G Bontempi', 'YA Le Borgne'] | 2017.0 | 11 | | | | | Multivariate time series forecasting involves ... | | ... | [] | {'title': 'A dynamic factor machine learning m... | ML | | Y | Y | Multi-variate multi-step ahead. | ml_other_1 | \protect\cite{ml_other_1} | @article{ml_other_1, title = {A dynamic factor... |
| 4 | Machine learning methods do have a place in un... | ['EM Riba', 'KM Malan', 'E Mudimu'] | | 0 | | | | | of new and established univariate forecasting ... | | ... | [] | {'title': 'Machine learning methods do have a ... | ML | | Y | Y | Univariate but multiple method. | ml_multiple_1 | \protect\cite{ml_multiple_1} | @article{ml_multiple_1, title = {Machine learn... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 65 | Analysis factors affecting Egyptian inflation ... | ['MF Abd El-Aal'] | 2023.0 | 4 | | | | | algorithms outperforms univariate forecasting ... | | ... | [] | {'title': 'Analysis factors affecting Egyptian... | ML | | Y | Y | Include the graph. | ml_multiple_6 | \protect\cite{ml_multiple_6} | @article{ml_multiple_6, title = {Analysis fact... |
| 66 | Forecasting Retail Client Flow with LSTMs on I... | ['P Gusmão', 'J Moreira', 'A Tomé'] | 2021.0 | 0 | | | | | Because of this, the problem of forecasting is... | | ... | [] | {'title': 'Forecasting Retail Client Flow with... | DL | | Y | Y | Include the graph. | dl_lstm_18 | \protect\cite{dl_lstm_18} | @article{dl_lstm_18, title = {Forecasting Reta... |
| 67 | Employment forecasting using data from the Swe... | ['J Wikström'] | 2018.0 | 0 | | | | | statistical model used for univariate forecast... | | ... | [] | {'title': 'Employment forecasting using data f... | DL | | Y | Y | | dl_rnn_lstm_2 | \protect\cite{dl_rnn_lstm_2} | @article{dl_rnn_lstm_2, title = {Employment fo... |
| 68 | Charging Scheduling of Hybrid Energy Storage S... | ['G Erdogan', 'W Fekih Hassen'] | 2023.0 | 2 | | | | | Two types of univariate forecasting were used ... | | ... | [] | {'title': 'Charging Scheduling of Hybrid Energ... | DL | | Y | Y | Include the graph. | dl_lstm_gru_rnn_1 | \protect\cite{dl_lstm_gru_rnn_1} | @article{dl_lstm_gru_rnn_1, title = {Charging ... |


```python
# Group the individual entries
data_df_grouped = data_df_individual.groupby(['Technique', 'Architecture']).size().reset_index(name='count')
data_df_grouped
```


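| | Technique | Architecture | count |
|---|---|---|---|
| 0 | - | - | 1 |
| 1 | DL | CNN | 2 |
| 2 | DL | GRU | 4 |
| 3 | DL | LSTM | 26 |
| 4 | DL | MLP | 1 |
| 5 | DL | Multiple | 4 |
| 6 | DL | Other | 8 |
| 7 | DL | RNN | 7 |
| 8 | DL | RNU | 1 |
| 9 | DL | TCL | 1 |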
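The split, explode, and group steps amount to counting (technique, architecture) pairs. A standard-library sketch with `collections.Counter`, assuming the comma-and-space separated `Architecture` strings used in the spreadsheet: `count_architectures()` is a hypothetical helper, and the sample rows are illustrative, not the review's data. Empty cells are skipped, in line with `groupby` dropping missing keys by default.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def count_architectures(rows: Iterable[Tuple[str, Optional[str]]]) -> Counter:
    """Count (technique, architecture) pairs, splitting 'LSTM, GRU' into separate entries."""
    counts = Counter()
    for technique, architectures in rows:
        if not architectures:
            continue  # skip empty cells (None or "")
        for architecture in architectures.split(", "):
            counts[(technique, architecture)] += 1
    return counts


# Illustrative rows, not the review's actual data
sample = [("ML", "Random Forest, SVM"), ("DL", "LSTM"), ("DL", "LSTM, GRU, RNN")]
for (technique, architecture), count in sorted(count_architectures(sample).items()):
    print(technique, architecture, count)
```

`Counter.most_common()` would list the most frequent architectures first, which is the ordering usually wanted when summarising the review.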


```python
data_df_individual
```

| | title | authors | published_year | citations | journal | volume | issue | pages | abstract | doi | url | publisher | keywords | bib | Technique | Architecture | Included | Univariate? | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Univariate model for hour ahead multi-step sol... | ['P Gupta', 'R Singh'] | 2021.0 | 11 | | | | | Further, the problem is converted into a super... | | https://ieeexplore.ieee.org/abstract/document/... | | [] | {'title': 'Univariate model for hour ahead mul... | ML | Random Forest | Y | Y | |
| 0 | Univariate model for hour ahead multi-step sol... | ['P Gupta', 'R Singh'] | 2021.0 | 11 | | | | | Further, the problem is converted into a super... | | https://ieeexplore.ieee.org/abstract/document/... | | [] | {'title': 'Univariate model for hour ahead mul... | ML | SVM | Y | Y | |
| 1 | Short-term daily univariate streamflow forecas... | ['EB Wegayehu', 'FB Muluneh'] | 2022.0 | 34 | | | | | As a result, machine learning models have beco... | | https://www.hindawi.com/journals/amete/2022/18... | | [] | {'title': 'Short-term daily univariate streamf... | DL | LSTM | Y | Y | |
| 2 | Forecasting from Physiological Time Series Thr... | ['S Masum'] | 2019.0 | 3 | | | | | -independent medical time series data as stati... | | https://www.researchgate.net/profile/Shamsul-M... | | [] | {'title': 'Forecasting from Physiological Time... | ML | Supervised | Y | Y | Contains multiple usable methods. |
| 3 | A dynamic factor machine learning method for m... | ['G Bontempi', 'YA Le Borgne'] | 2017.0 | 11 | | | | | Multivariate time series forecasting involves ... | | https://ieeexplore.ieee.org/abstract/document/... | | [] | {'title': 'A dynamic factor machine learning m... | ML | Other | Y | Y | Multi-variate multi-step ahead. |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 67 | Employment forecasting using data from the Swe... | ['J Wikström'] | 2018.0 | 0 | | | | | statistical model used for univariate forecast... | | https://www.diva-portal.org/smash/record.jsf?p... | | [] | {'title': 'Employment forecasting using data f... | DL | LSTM | Y | Y | |
| 68 | Charging Scheduling of Hybrid Energy Storage S... | ['G Erdogan', 'W Fekih Hassen'] | 2023.0 | 2 | | | | | Two types of univariate forecasting were used ... | | https://www.mdpi.com/1996-1073/16/18/6656 | | [] | {'title': 'Charging Scheduling of Hybrid Energ... | DL | LSTM | Y | Y | Include the graph. |
| 68 | Charging Scheduling of Hybrid Energy Storage S... | ['G Erdogan', 'W Fekih Hassen'] | 2023.0 | 2 | | | | | Two types of univariate forecasting were used ... | | https://www.mdpi.com/1996-1073/16/18/6656 | | [] | {'title': 'Charging Scheduling of Hybrid Energ... | DL | GRU | Y | Y | Include the graph. |
| 68 | Charging Scheduling of Hybrid Energy Storage S... | ['G Erdogan', 'W Fekih Hassen'] | 2023.0 | 2 | | | | | Two types of univariate forecasting were used ... | | https://www.mdpi.com/1996-1073/16/18/6656 | | [] | {'title': 'Charging Scheduling of Hybrid Energ... | DL | RNN | Y | Y | Include the graph. |
