# Sort Google Scholar
This is a Jupyter environment where you can try the repository's code without installing anything. The only limitation is Google Scholar's robot check: getting past it requires Selenium and solving the CAPTCHAs manually. For trying out a few keywords, however, it should work fine.

> **INSTRUCTIONS:** If this is your first time using a Jupyter environment, simply run each code cell with the keyboard shortcut `SHIFT` + `ENTER`. Update the search parameters (query, years, number of results) where indicated.

SortGS was recently added to PyPI, so setup is now simpler. First, install the package together with pandas and plotly, which this notebook also uses:

In [12]:
!pip install sortgs pandas plotly --quiet

In [13]:
import sortgs
import pandas as pd
import os
import plotly.express as px

Example `search_query` values:

- `Large Language Models` → General search
- `"Large Language Models"` → Exact phrase search
- `Large Language Models -transformer` → Exclude specific term
- `Large Language Models author:"Geoffrey Hinton"` → Search by author
- `Large Language Models source:Nature` → Search within a specific publication
- `("Large Language Models" OR "Transformer Models") AND (GPT OR BERT)` → Boolean search
- `intitle:"Large Language Models"` → Search in the title only
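Long boolean queries like the one in the next cell are easy to mistype by hand. The sketch below assembles them from plain Python lists. `or_group` and `and_query` are hypothetical helpers, not part of sortgs. They wrap every term in double quotes, which is harmless for single words such as `GPT`, and join the groups with the AND/OR syntax shown in the examples above.

```python
def or_group(terms):
    """Join terms into a parenthesised OR group, quoting each term as an exact phrase."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def and_query(*groups):
    """Combine several OR groups with AND, as in the boolean-search example above."""
    return " AND ".join(or_group(g) for g in groups)


query = and_query(
    ["software as a service", "software-as-a-service"],
    ["latin america", "latam", "south america", "central america"],
)
print(query)
```

The resulting string can be assigned directly to `search_query`.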

In [37]:
# Main query (only the last assignment takes effect, so keep exactly one active)
search_query = 'software as a service'  # @param {type:"string"}
# Alternative queries:
# search_query = 'software-as-a-service'
# search_query = '("software as a service" OR "software-as-a-service") AND ("latin america" OR "latam" OR "south america" OR "central america")'

# Expanded form with extra parameters
sortby = "cit/year"  # @param ["Citations", "cit/year"] {type:"string"}
nresults = 100  # @param {type:"number"}
startyear = '2021'  # @param {type:"string"}
endyear = None  # @param {type:"string"}
langfilter = None  # @param ["None", "zh-CN", "zh-TW", "nl", "en", "fr", "de", "it", "ja", "ko", "pl", "pt", "es", "tr"] {type:"string"}

# Convert the langfilter to a list if it's not None
if langfilter and langfilter != "None":
    langfilter = [langfilter]
else:
    langfilter = None  # No language filter applied if "None" is selected

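# Constructing the base command (shlex.quote keeps queries that contain
# apostrophes or other shell metacharacters intact)
import shlex
cmd = f"sortgs {shlex.quote(search_query)} --sortby '{sortby}' --nresults {nresults}"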

if startyear:
    cmd += f" --startyear {startyear}"

if endyear:
    cmd += f" --endyear {endyear}"

if langfilter:
    lang_str = ' '.join(langfilter)
    cmd += f" --langfilter {lang_str}"

# Output the constructed command for review
print("Constructed command:", cmd)


Constructed command: sortgs 'software as a service' --sortby 'cit/year' --nresults 100 --startyear 2021
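The cell above builds a single shell string and runs it with `!{cmd}`. An alternative is to pass the same arguments as a list, which avoids shell quoting altogether. Treat this as a sketch: `build_sortgs_args` is a hypothetical helper, and it only uses the flags already used above (`--sortby`, `--nresults`, `--startyear`, `--endyear`, `--langfilter`).

```python
import shlex


def build_sortgs_args(search_query, sortby="cit/year", nresults=100,
                      startyear=None, endyear=None, langfilter=None):
    """Build the sortgs command line as a list of arguments (hypothetical helper)."""
    args = ["sortgs", search_query, "--sortby", sortby, "--nresults", str(nresults)]
    if startyear:
        args += ["--startyear", str(startyear)]
    if endyear:
        args += ["--endyear", str(endyear)]
    if langfilter:
        # One argument per language code, matching the space-joined string above
        args += ["--langfilter", *langfilter]
    return args


args = build_sortgs_args("software as a service", startyear="2021")
# shlex.join shows the equivalent shell command, quoting only where needed
print(shlex.join(args))

# To run it without going through a shell:
# import subprocess
# subprocess.run(args, check=True)
```

With a list, the query reaches sortgs as one argument no matter which quotes or parentheses it contains.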


In [38]:
!{cmd}

Running with the following parameters:
Keyword: software as a service, Number of results: 100, Save database: True, Path: /Users/sotapanna/Sync/Software/sort-google-scholar/jupyter, Sort by: cit/year, Permitted Languages: All, Plot results: False, Start year: 2021, End year: 2025, Debug: False
Loading next 10 results
Loading next 20 results
Loading next 30 results
Loading next 40 results
Loading next 50 results
Loading next 60 results
Loading next 70 results
Loading next 80 results
Loading next 90 results
Loading next 100 results
                                                Author  ... cit/year
Rank                                                    ...         
62                       F Saputra, B Cut, F Nilamsari  ...       71
42                           CM Mohammed, SRM Zeebaree  ...       39
51        M Kohtamäki, R Rabetino, S Einola, V Parida…  ...       27
46    T Huikkola, M Kohtamäki, R Rabetino, H Makkonen…  ...       17
37                                    CA Cruz, F Matos  ...       15

> _**NOTE:** Some warnings are normal, for example that a year or an author was not found. If you get the robot-check warning, however, Google Scholar is probably blocking requests from your current IP address. In Google Colab, you can try 'Runtime' > 'Disconnect and delete runtime' to get a new IP. If the problem persists, you will have to run the tool locally with Selenium and solve the CAPTCHAs manually. To reduce the chance of being flagged, do not run this code too often._
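One way to avoid repeating Scholar requests is to check whether results for a query are already on disk before running sortgs again. The sketch below assumes the CSV lives in the working directory and is named the way this notebook reads it back later: the query with spaces replaced by underscores, plus `.csv`. `csv_path_for` and `needs_new_search` are hypothetical helpers.

```python
import os


def csv_path_for(search_query, directory="."):
    """Path of the CSV this notebook reads back: spaces replaced by underscores."""
    return os.path.join(directory, search_query.replace(" ", "_") + ".csv")


def needs_new_search(search_query, directory="."):
    """True if no CSV for this query exists yet, i.e. a new Scholar request is needed."""
    return not os.path.exists(csv_path_for(search_query, directory))


if needs_new_search("software as a service"):
    print("No saved results found; run the sortgs cell.")
else:
    print("Saved CSV found; skip the sortgs cell and load it directly.")
```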

Next, load the CSV file that was created. It is named after the search query, with spaces replaced by underscores.

In [40]:
csv_filename = search_query.replace(' ', '_')+'.csv'
df = pd.read_csv(csv_filename)
pd.set_option('display.max_colwidth', None)  # Set to None for full width
df.head(20)

| Unnamed: 0 | Rank | Author | Title | Citations | Year | Publisher | Venue | Source | cit/year |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 62 | F Saputra, B Cut, F Nilamsari | Analisis Perbandingan Tiga Software Terhadap Pengukuran Quality Of service (QoS) Pada Pengukuran Jaringan Wireless Internet | 212 | 2023 | jurnal.utu.ac.id | Jurnal Teknologi Informasi | http://jurnal.utu.ac.id/JTI/article/view/7275 | 71 |
| 1 | 42 | CM Mohammed, SRM Zeebaree | Sufficient comparison among cloud computing services: IaaS, PaaS, and SaaS: A review | 197 | 2021 | ideas.repec.org | International Journal of Science … | https://ideas.repec.org/a/aif/journl/v5y2021i2p17-30.html | 39 |
| 2 | 51 | M Kohtamäki, R Rabetino, S Einola, V Parida… | Unfolding the digital servitization path from products to product-service-software systems: Practicing change through intentional narratives | 134 | 2021 | Elsevier | Journal of Business … | https://www.sciencedirect.com/science/article/pii/S0148296321005865 | 27 |
| 3 | 46 | T Huikkola, M Kohtamäki, R Rabetino, H Makkonen… | Overcoming the challenges of smart solution development: Co-alignment of processes, routines, and practices to manage product, service, and software … | 67 | 2022 | Elsevier | Technovation | https://www.sciencedirect.com/science/article/pii/S0166497221001632 | 17 |
| 4 | 37 | CA Cruz, F Matos | ESG maturity: A software framework for the challenges of ESG data in investment | 46 | 2023 | mdpi.com | Sustainability | https://www.mdpi.com/2071-1050/15/3/2610 | 15 |
| 5 | 40 | DMV Rao, SS Vellela, K Basha Sk… | Systematic Review on Software Application Under-distributed Denial of Service Attacks for Group Websites | 43 | 2023 | papers.ssrn.com | Dogo Rangsang … | https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4502464 | 14 |
| 6 | 39 | WG Gadallah, HM Ibrahim, NM Omar | A deep learning technique to detect distributed denial of service attacks in software-defined networks | 28 | 2024 | Elsevier | Computers & Security | https://www.sciencedirect.com/science/article/pii/S0167404823004984 | 14 |
| 7 | 31 | J Mero, M Leinonen, H Makkonen… | Agile logic for SaaS implementation: Capitalizing on marketing automation software in a start-up | 56 | 2022 | Elsevier | Journal of business … | https://www.sciencedirect.com/science/article/pii/S0148296322002545 | 14 |
| 8 | 22 | M Abdellatif, A Shatnawi, H Mili, N Moha… | A taxonomy of service identification approaches for legacy software systems modernization | 71 | 2021 | Elsevier | … of Systems and Software | https://www.sciencedirect.com/science/article/pii/S0164121220302582 | 14 |
| 9 | 48 | NS Musa, NM Mirza, SH Rafique, AM Abdallah… | machine learning and deep learning techniques for distributed denial of service anomaly detection in software defined networks—current research solutions | 23 | 2024 | ieeexplore.ieee.org | IEEE … | https://ieeexplore.ieee.org/abstract/document/10418146/ | 12 |
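The `cit/year` column in the table above can be reproduced from `Citations` and `Year`. The formula is: citations divided by the number of calendar years from the publication year through the end year (2025 in the run log), counting both ends and rounded to an integer. A paper from 2023 therefore counts as three years old. This is a reconstruction from the numbers shown; it is an assumption that sortgs computes the column exactly this way. Python's `round` rounds halves to the nearest even integer, which gives 12 for 23/2.

```python
# (Citations, Year, reported cit/year) copied from the table above
rows = [
    (212, 2023, 71), (197, 2021, 39), (134, 2021, 27), (67, 2022, 17),
    (46, 2023, 15), (43, 2023, 14), (28, 2024, 14), (56, 2022, 14),
    (71, 2021, 14), (23, 2024, 12),
]

END_YEAR = 2025  # "End year: 2025" in the sortgs run log


def cit_per_year(citations, year, end_year=END_YEAR):
    """Citations per calendar year, counting both the publication year and end_year."""
    return round(citations / (end_year - year + 1))


for citations, year, reported in rows:
    print(citations, year, reported, cit_per_year(citations, year))
```

For the first row, for example, 212 / (2025 − 2023 + 1) = 212 / 3 ≈ 70.7, which rounds to 71.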


In [11]:
# @title Rank vs Citations
view = df.reset_index().copy()

# Wrap long titles onto several lines joined by <br> (used in the hover labels)
def shorten_title(title, max_length=60):
    if not isinstance(title, str):  # missing titles are read as NaN
        return ''
    words = title.split()
    shortened_lines = []
    current_line = []

    # Add words to the current line until max_length is exceeded
    for word in words:
        if len(' '.join(current_line + [word])) <= max_length:
            current_line.append(word)
        else:
            # Skip the empty line that would appear if the first word alone exceeds max_length
            if current_line:
                shortened_lines.append(' '.join(current_line))
            current_line = [word]

    # Add the last line
    if current_line:
        shortened_lines.append(' '.join(current_line))

    return '<br>'.join(shortened_lines)


# Apply this function to the 'Title' column and create a new column for the shortened titles
view['Short_Title'] = view['Title'].apply(shorten_title)

# Now use 'Short_Title' for hover_name
fig = px.scatter(view,
                 x='Rank',
                 y='Citations',
                 title='Number of Citations vs Google Scholar Rank',
                 hover_name='Short_Title',
                 hover_data=['Rank', 'Author', 'Citations', 'Year', 'Publisher', 'Venue', 'cit/year']
)
fig.show()
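The manual word loop used for the hover titles can also be written with the standard library's `textwrap.wrap`. `shorten_title_tw` is a hypothetical name for this sketch. `break_long_words=False` and `break_on_hyphens=False` keep long words and hyphenated words such as "software-defined" in one piece, so the result matches the loop's behaviour of breaking only at spaces.

```python
import textwrap


def shorten_title_tw(title, max_length=60):
    """Wrap a title into lines of at most max_length characters, joined by <br>."""
    if not isinstance(title, str):  # e.g. NaN for a missing title
        return ""
    lines = textwrap.wrap(title, width=max_length,
                          break_long_words=False, break_on_hyphens=False)
    return "<br>".join(lines)


print(shorten_title_tw(
    "A taxonomy of service identification approaches for legacy software systems modernization"
))
```

It can be applied in the same way: `view['Short_Title'] = view['Title'].apply(shorten_title_tw)`.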

In [15]:
# Generate .bib filename based on the CSV filename
bib_filename = os.path.splitext(csv_filename)[0] + ".bib"

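# Escape characters that LaTeX treats as special inside BibTeX text fields
def bib_escape(value):
    return str(value).replace("&", r"\&").replace("%", r"\%").replace("#", r"\#")

# Function to convert DataFrame to BibTeX
def df_to_bib(df, filename):
    used_keys = set()
    with open(filename, "w", encoding="utf-8") as f:
        for _, row in df.iterrows():
            entry_type = "article"  # Assuming all are journal articles
            raw = str(row['Author'])
            # Google Scholar truncates long author lists with "…"; BibTeX spells that "and others"
            truncated = raw.rstrip().endswith('…')
            authors = [a.strip() for a in raw.rstrip('… ').split(',') if a.strip()]
            if truncated:
                authors.append('others')
            # Key = first author's surname (last token, not the initials) + year,
            # with a letter suffix when the key is already taken
            surname = authors[0].split()[-1] if authors else 'anon'
            base_key = citation_key = f"{surname}{row['Year']}"
            suffix = ord('a')
            while citation_key in used_keys:
                citation_key = f"{base_key}{chr(suffix)}"
                suffix += 1
            used_keys.add(citation_key)
            entry = f"""@{entry_type}{{{citation_key},
    author = {{{bib_escape(' and '.join(authors))}}},
    title = {{{bib_escape(row['Title'])}}},
    year = {{{row['Year']}}},
    publisher = {{{bib_escape(row['Publisher'])}}},
    journal = {{{bib_escape(row['Venue'])}}},
    url = {{{row['Source']}}}
}}\n\n"""
            f.write(entry)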

# Export DataFrame as .bib with the same name as the CSV
df_to_bib(df.head(10), bib_filename)

print(f"BibTeX file exported successfully as {bib_filename}.")

BibTeX file exported successfully as software_as_a_service.bib.
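Citation keys built from a short name fragment plus a year can collide. Two 2021 papers whose first author's string starts with the same initial, for instance, would get the same key, and BibTeX then reports a repeated entry. The sketch below is a quick sanity check for an exported file. `duplicate_bib_keys` is a hypothetical helper, and the regular expression assumes entry headers of the form `@article{Key,` at the start of a line, which is how the cell above writes them.

```python
import re
from collections import Counter

# Matches entry headers such as "@article{Saputra2023," at the start of a line
KEY_PATTERN = re.compile(r"^@\w+\{([^,\s]+),", re.MULTILINE)


def duplicate_bib_keys(bib_text):
    """Return the citation keys that appear more than once in a BibTeX string."""
    counts = Counter(KEY_PATTERN.findall(bib_text))
    return sorted(key for key, n in counts.items() if n > 1)


sample = """@article{M2021,
    title = {A}
}

@article{M2021,
    title = {B}
}

@article{F2023,
    title = {C}
}
"""
print(duplicate_bib_keys(sample))

# To check the exported file:
# with open(bib_filename, encoding="utf-8") as f:
#     print(duplicate_bib_keys(f.read()))
```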
