# Scopus: Data collection

### Useful links:
- Scopus [[Source](https://www.scopus.com/)]
- Scopus API (Application Programming Interface) Documentation [[Source 1](https://dev.elsevier.com/technical_documentation.html); [Source 2](https://dev.elsevier.com/api_docs.html)]
- The "pybliometrics" package in the Python programming language [[Source 1](https://github.com/pybliometrics-dev/pybliometrics); [Source 2](https://pybliometrics.readthedocs.io/en/stable/index.html)]

### Packages, classes and functions used:
**pandas package**
- .DataFrame()
- .to_excel()
- .iterrows()
- .at[]

**pybliometrics package**
- .ScopusSearch()
- .get_results_size()
- .AbstractRetrieval()
- .CitationOverview()

In [1]:
## 0 ## Installing and importing modules/libraries
#!pip install pandas # To work with dataframes
#!pip install openpyxl # Needed by pandas to write .xlsx files
#!pip install pybliometrics # To work with api.elsevier.com

import pandas
from pybliometrics.scopus import ScopusSearch
#from pybliometrics.scopus import AbstractRetrieval
#from pybliometrics.scopus import CitationOverview

# API keys (one is needed for the package import to succeed)
# "a6b49bc00cce366026d4cfd9396ac572" - As of 19/01/24, the quota for this key is exhausted
# "c4b35f1579a33db64d94f97c723a60d8"

#help()
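
The keys above are written directly into the notebook. As an alternative, a minimal sketch of reading the key from an environment variable is shown here. The variable name `SCOPUS_API_KEY` and the helper `load_api_key` are hypothetical names chosen for illustration, not part of pybliometrics.

```python
import os


def load_api_key(variable_name="SCOPUS_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(variable_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {variable_name} is not set")
    return key
```

With the variable set in the shell before starting the notebook, `api_key = load_api_key()` could replace the hard-coded string in the next cell.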

In [2]:
## 1 ## Building the request with all the necessary parameters

# Specifying the API key
api_key = "c4b35f1579a33db64d94f97c723a60d8"

# Specifying the query
query = '( TITLE-ABS-KEY ( "environment* practice*" OR "ecolog* practice*" OR "eco-practice*" OR "environment* behav*" OR "ecolog* behav*" OR "eco-behav*" ) AND PUBYEAR > 2012 AND PUBYEAR < 2024 ) AND ( sociology ) AND DOCTYPE("ar") AND SUBJAREA("SOCI") AND LANGUAGE("English")'

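The query string above differs slightly from the one used in the web version of Scopus: the class being used (ScopusSearch) accepts every field except “LIMIT-TO()”, so each LIMIT-TO filter is rewritten as the corresponding field code (DOCTYPE, SUBJAREA, LANGUAGE).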

**The search query from the web version of Scopus:** `( TITLE-ABS-KEY ( "environment* practice*" OR "ecolog* practice*" OR "eco-practice*" OR "environment* behav*" OR "ecolog* behav*" OR "eco-behav*" ) AND PUBYEAR > 2012 AND PUBYEAR < 2024 ) AND ( sociology ) AND ( LIMIT-TO ( DOCTYPE , "ar" ) ) AND ( LIMIT-TO ( SUBJAREA , "SOCI" ) ) AND ( LIMIT-TO ( LANGUAGE , "English" ) )`
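
The rewriting of LIMIT-TO filters into plain field codes can also be done mechanically. The sketch below uses a regular expression for that; the helper name `limit_to_field_codes` is hypothetical. Unlike the hand-written query, it keeps the outer parentheses around each filter, and whether the API treats that form identically is an assumption.

```python
import re

# Matches one web-style filter such as: LIMIT-TO ( DOCTYPE , "ar" )
LIMIT_TO = re.compile(r'LIMIT-TO\s*\(\s*([A-Z-]+)\s*,\s*"([^"]*)"\s*\)')


def limit_to_field_codes(web_query):
    """Rewrite every LIMIT-TO ( FIELD , "value" ) as FIELD("value")."""
    return LIMIT_TO.sub(r'\1("\2")', web_query)


web_filters = (
    '( sociology ) AND ( LIMIT-TO ( DOCTYPE , "ar" ) ) '
    'AND ( LIMIT-TO ( SUBJAREA , "SOCI" ) ) '
    'AND ( LIMIT-TO ( LANGUAGE , "English" ) )'
)
api_filters = limit_to_field_codes(web_filters)
print(api_filters)
```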

**Explanation for the query parameters used (for more information, see [here](https://www.scopus.com/search/form.uri?display=advanced)):**
1. Field codes:
    - TITLE-ABS-KEY - A combined field that searches abstracts, keywords, and document titles.
    - PUBYEAR - A numeric field indicating the year of publication.
    - DOCTYPE - Limits the search to a document type: article (ar), review (re), book chapter (ch), etc.
    - LANGUAGE - The language in which the original document was written.
    - SUBJAREA - A search field which returns documents related to a specific field of science.
2. Operators:
    - AND - Use AND when you want your results to include all terms and the terms may be far apart.
    - OR - Use OR when your results must include one or more of the terms (such as synonyms, alternate spellings, or abbreviations). Documents that contain any of the words will be found.
3. Wildcards:
    - Asterisk (*) - Replaces multiple characters anywhere in a word. The asterisk stands for 0 or more characters, so it can match any number of characters or a character that may or may not be present.
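
The field codes, operators and wildcards described above can be combined programmatically. The following is a minimal sketch, not part of the original notebook: the function `build_query` and its parameter names are hypothetical, and it only reproduces the structure of the query used here.

```python
def build_query(terms, after_year, before_year, keyword, doctype, subjarea, language):
    """Assemble a Scopus advanced-search string from its parts."""
    # Quote every phrase and join the alternatives with OR.
    phrases = " OR ".join(f'"{term}"' for term in terms)
    return (
        f"( TITLE-ABS-KEY ( {phrases} ) "
        f"AND PUBYEAR > {after_year} AND PUBYEAR < {before_year} ) "
        f"AND ( {keyword} ) "
        f'AND DOCTYPE("{doctype}") AND SUBJAREA("{subjarea}") AND LANGUAGE("{language}")'
    )


terms = [
    "environment* practice*", "ecolog* practice*", "eco-practice*",
    "environment* behav*", "ecolog* behav*", "eco-behav*",
]
built_query = build_query(terms, 2012, 2024, "sociology", "ar", "SOCI", "English")
print(built_query)
```

Building the string this way makes it easier to change the year range or the list of phrases without retyping the whole query.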

In [3]:
## 2 ## Running the search and storing everything it returns in the "response" object
response = ScopusSearch(
    api_key=api_key,
    query=query,
    view="STANDARD",
    verbose=True,
    subscriber=False,
)

In [4]:
## 3 ## Determining the number of publications found by the query
response.get_results_size()

678

In [5]:
## 4 ## Creating a dataframe in which all information about the collected publications will be saved
all_publications = pandas.DataFrame(response.results)
all_publications

| | eid | doi | pii | pubmed_id | title | subtype | subtypeDescription | creator | afid | affilname | ... | pageRange | description | authkeywords | citedby_count | openaccess | freetoread | freetoreadLabel | fund_acr | fund_no | fund_sponsor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2-s2.0-85180566359 | 10.3390/bs13120966 | | | Pro-Environmental Behavior and Climate Change ... | ar | Article | Leite Â. | | Universidade Católica Portuguesa | ... | | | | 0 | 1 | repositoryam | Green | | | |
| 1 | 2-s2.0-85178364143 | 10.1093/jcr/ucad016 | | | Cyclical Time Is Greener: The Impact of Tempor... | ar | Article | Xu L. | | Wuhan University | ... | 722-741 | | | 1 | 0 | | | | | |
| 2 | 2-s2.0-85168088015 | 10.1007/s13412-023-00850-9 | | | Using the social identity model of pro-environ... | ar | Article | Johnson N. | | Purdue University | ... | 587-601 | | | 0 | 0 | | | | | |
| 3 | 2-s2.0-85158156830 | 10.1057/s41599-023-01682-2 | | | The role of peers in promoting energy conserva... | ar | Article | Lin B. | | Xiamen University | ... | | | | 0 | 1 | publisherfullgold | Gold | | | |
| 4 | 2-s2.0-85153196714 | 10.1002/bse.3428 | | | The impact of a proactive environmental strate... | ar | Article | Galbreath J. | | The Faculty of Business and Law | ... | 5420-5434 | | | 3 | 1 | publisherhybridgold | Hybrid Gold | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 673 | 2-s2.0-84884678671 | 10.1177/0162243913495924 | | | Unheeded Science: Taking Precaution out of Tox... | ar | Article | Hoffman K. | | Universidad de Puerto Rico | ... | 829-850 | | | 6 | 0 | | | | | |
| 674 | 2-s2.0-84884519014 | 10.1108/JEA-04-2012-0049 | | | The relationship between transformational lead... | ar | Article | Keung E.K. | | | ... | 836-854 | | | 41 | 0 | | | | | |
| 675 | 2-s2.0-84879053585 | 10.1080/13504622.2012.695013 | | | Use of self-determination theory to support ba... | ar | Article | Karaarslan G. | | Aǧrı İbrahim Çeçen Üniversitesi;Middle East Te... | ... | 342-369 | | | 10 | 0 | | | | | |
| 676 | 2-s2.0-84874506315 | 10.1177/0162243912470726 | | | Justice as Measure of Nongovernmental Organiza... | ar | Article | Allen B. | | Virginia Polytechnic Institute and State Unive... | ... | 224-249 | | | 14 | 0 | | | | | |
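
A quick sanity check of the collected records can be done without any further API calls. The sketch below tallies open-access labels and finds the most cited record, using only the nine rows displayed above. The `Document` tuple is a hypothetical stand-in that keeps four of the columns, and treating the counts as integers is an assumption about how the values are stored.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for one search result (only four fields are kept).
Document = namedtuple("Document", "eid citedby_count openaccess freetoreadLabel")

# Values copied from the rows displayed above; None marks an empty cell.
sample = [
    Document("2-s2.0-85180566359", 0, 1, "Green"),
    Document("2-s2.0-85178364143", 1, 0, None),
    Document("2-s2.0-85168088015", 0, 0, None),
    Document("2-s2.0-85158156830", 0, 1, "Gold"),
    Document("2-s2.0-85153196714", 3, 1, "Hybrid Gold"),
    Document("2-s2.0-84884678671", 6, 0, None),
    Document("2-s2.0-84884519014", 41, 0, None),
    Document("2-s2.0-84879053585", 10, 0, None),
    Document("2-s2.0-84874506315", 14, 0, None),
]

# Count how often each open-access label occurs among the sample rows.
label_counts = Counter(doc.freetoreadLabel or "not labelled" for doc in sample)

# Share of sample rows flagged as open access.
open_access_share = sum(int(doc.openaccess) for doc in sample) / len(sample)

# Record with the highest citation count in the sample.
most_cited = max(sample, key=lambda doc: int(doc.citedby_count))

print(label_counts, round(open_access_share, 2), most_cited.eid)
```

These figures describe only the displayed sample, not the full set of 678 publications.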


In [6]:
## 5 ## Saving a backup of the dataframe in its original form to an Excel file named "All_publications_Scopus.xlsx"
all_publications.to_excel("All_publications_Scopus.xlsx")

In [7]:
## 6 ## Additional collection of abstracts, keywords and all authors for the found publications
#for index, row in all_publications.iterrows():
    #scopus_id = row["eid"]
    #try:
        #publication_info = AbstractRetrieval(scopus_id, view = "FULL")
        #all_publications.at[index, "Abstract"] = publication_info.abstract
        #all_publications.at[index, "Keywords"] = "; ".join(publication_info.authkeywords or [])
    #except Exception as e:
        #print(f"Error retrieving information for Scopus ID {scopus_id}: {str(e)}")

#all_publications

In [8]:
## 7 ## Additional collection of citation information for the found publications
#for index, row in all_publications.iterrows():
    #scopus_id = row["eid"]
    #try:
        #citation_info = CitationOverview(scopus_id)
        #all_publications.at[index, "Citations"] = citation_info.total_citations
        #all_publications.at[index, "Citing_Articles"] = citation_info.citing_articles
    #except Exception as e:
        #print(f"Error retrieving information for Scopus ID {scopus_id}: {str(e)}")

#all_publications

The code in steps 6-7 is commented out because it does not work for us: it requires full authorization, which we, as students of the University of Bologna, do not have. The remaining necessary characteristics will therefore be collected manually.
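
Because some records have to be collected manually, it may help to keep a list of the identifiers for which retrieval failed. The sketch below generalizes the try/except loops from steps 6-7. The names `collect_with_fallback` and `fake_fetch` are hypothetical, and `fake_fetch` only simulates a retrieval call so that the example runs without API access.

```python
def collect_with_fallback(eids, fetch):
    """Call fetch(eid) for every identifier and separate successes from failures."""
    collected = {}
    failed = []
    for eid in eids:
        try:
            collected[eid] = fetch(eid)
        except Exception as error:
            # Keep the identifier and the reason so the record can be collected manually.
            failed.append((eid, str(error)))
    return collected, failed


def fake_fetch(eid):
    """Simulated retrieval: fails for one identifier to imitate a refused request."""
    if eid == "2-s2.0-85178364143":
        raise PermissionError("not authorized")
    return f"abstract of {eid}"


collected, failed = collect_with_fallback(
    ["2-s2.0-85180566359", "2-s2.0-85178364143"], fake_fetch
)
print(len(collected), "retrieved;", len(failed), "left for manual collection")
```

In the real notebook, `fetch` would wrap the retrieval class being used, and the `failed` list would give the identifiers to look up by hand.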