# What impact has OpenAlex had so far?

## Citation analysis

Let's start by looking at the paper that OpenAlex asks researchers to cite:

> Priem, J., Piwowar, H., & Orr, R. (2022). OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. ArXiv. https://arxiv.org/abs/2205.01833

In [3]:
import requests

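doi = '10.48550/arXiv.2205.01833'

# Look up the paper in OpenAlex by filtering works on its DOI
url = f'https://api.openalex.org/works?filter=doi:{doi}'
r = requests.get(url)
r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
response_data = r.json()
# The filter returns a list of matching works; take the first match
openalex_article = response_data['results'][0]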

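The lookup above indexes `['results'][0]` directly, so it raises an `IndexError` if OpenAlex has no record for the DOI. Below is a minimal sketch of a more defensive lookup. `find_openalex_work` and its `fetch_json` parameter are hypothetical names, and the fetcher is passed in so the function can be tried without network access.

```python
from typing import Callable, Optional
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def find_openalex_work(doi: str, fetch_json: Callable[[str], dict]) -> Optional[dict]:
    """Return the first OpenAlex work matching `doi`, or None if nothing matches.

    `fetch_json` is any function that takes a URL and returns decoded JSON,
    e.g. lambda url: requests.get(url, timeout=30).json()
    """
    # urlencode escapes the ':' and '/' in the filter value
    url = f"{OPENALEX_WORKS}?{urlencode({'filter': f'doi:{doi}'})}"
    results = fetch_json(url).get("results", [])
    return results[0] if results else None
```

In the notebook this could be called as `find_openalex_work(doi, lambda url: requests.get(url, timeout=30).json())`.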
In [4]:
print(f"Within the OpenAlex data, the OpenAlex paper has {openalex_article['cited_by_count']} (incoming) citations.")

Within the OpenAlex data, the OpenAlex paper has 3 (incoming) citations.


Only three citing papers seems low for this article, so let's compare with Semantic Scholar's data for the same paper.

In [8]:
s2_api_endpoint = "https://api.semanticscholar.org/graph/v1/paper"
fields = ['citationCount', 'citations.title', 'citations.year', 'citations.publicationDate', 'citations.citationCount', 'citations.externalIds']
params = {
    'fields': ",".join(fields)
}
r = requests.get(f"{s2_api_endpoint}/DOI:{doi}", params=params)
s2_article = r.json()

In [9]:
s2_article

{'error': 'Paper with id DOI:10.48550/arXiv.2205.01833 not found'}
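
Semantic Scholar doesn't find the paper under its arXiv DOI, so let's look it up by its arXiv ID instead.
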
In [12]:
s2_api_endpoint = "https://api.semanticscholar.org/graph/v1/paper"
fields = ['citationCount', 'citations.title', 'citations.year', 'citations.publicationDate', 'citations.citationCount', 'citations.externalIds']
params = {
    'fields': ",".join(fields)
}
arxiv_id = '2205.01833'
r = requests.get(f"{s2_api_endpoint}/ARXIV:{arxiv_id}", params=params)
s2_article = r.json()

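Since the DOI lookup failed but the arXiv lookup worked, one option is to try several Semantic Scholar identifiers in turn. The sketch below assumes, as the outputs above suggest, that a failed lookup returns a JSON object with an `error` key (other failures, such as rate limiting, may look different). `first_s2_match` and `fetch_json` are hypothetical names.

```python
from typing import Callable, Iterable, Optional, Tuple

S2_PAPER = "https://api.semanticscholar.org/graph/v1/paper"


def first_s2_match(
    paper_ids: Iterable[str],
    fields: Iterable[str],
    fetch_json: Callable[[str, dict], dict],
) -> Optional[Tuple[str, dict]]:
    """Try each prefixed ID (e.g. 'DOI:...', 'ARXIV:...') until one resolves.

    Returns (paper_id, response) for the first response without an 'error'
    key, or None if every lookup fails.
    """
    params = {"fields": ",".join(fields)}
    for paper_id in paper_ids:
        data = fetch_json(f"{S2_PAPER}/{paper_id}", params)
        if "error" not in data:
            return paper_id, data
    return None
```

With the notebook's variables, a call might look like `first_s2_match([f"DOI:{doi}", f"ARXIV:{arxiv_id}"], fields, lambda url, p: requests.get(url, params=p, timeout=30).json())`.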
In [14]:
print(f"Within the Semantic Scholar data, the OpenAlex paper has {s2_article['citationCount']} (incoming) citations.")

Within the Semantic Scholar data, the OpenAlex paper has 16 (incoming) citations.


In [16]:
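# Collect the DOI of each citing paper, skipping papers without one
# (externalIds can be missing or null for some citing papers)
citations_dois = [
    citing_article['externalIds']['DOI']
    for citing_article in s2_article['citations']
    if (citing_article.get('externalIds') or {}).get('DOI')
]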

In [17]:
citations_dois

['10.48550/arXiv.2302.02231',
 '10.48550/arXiv.2301.01502',
 '10.48550/arXiv.2210.14871',
 '10.1109/TVCG.2022.3209422',
 '10.1016/j.cosrev.2022.100531',
 '10.3389/frma.2022.1010504',
 '10.48550/arXiv.2211.04429',
 '10.48550/arXiv.2210.00356',
 '10.1108/jd-04-2022-0083',
 '10.48550/arXiv.2209.09246',
 '10.1007/978-3-031-16802-4_52',
 '10.48550/arXiv.2208.11065',
 '10.1007/s11192-022-04446-y',
 '10.5281/zenodo.6975102',
 '10.1162/qss_a_00222',
 '10.1162/qss_a_00200']

In [21]:
url = 'https://api.openalex.org/works'
# OR the DOIs together with "|" so a single request fetches all citing works
citing_dois_str = "|".join(citations_dois)
params = {
    'filter': f"doi:{citing_dois_str}"
}
r = requests.get(url, params=params)
response_data = r.json()

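Joining every DOI into one filter works here because there are only 16 of them and the default page size (25, per the `meta` output further down) covers them all. For a longer list, a sketch that splits the DOIs into batches could look like this. The batch size of 50 is an assumed cap on values per OR filter (check the current OpenAlex documentation), and `doi_filter_batches` is a hypothetical helper.

```python
from typing import Iterator, List, Optional


def doi_filter_batches(dois: List[Optional[str]], batch_size: int = 50) -> Iterator[dict]:
    """Yield OpenAlex `params` dicts, each OR-ing at most `batch_size` DOIs.

    batch_size=50 is an assumption about the API's OR-filter limit.
    Empty or missing DOIs are skipped.
    """
    clean = [d for d in dois if d]
    for start in range(0, len(clean), batch_size):
        batch = clean[start:start + batch_size]
        yield {
            "filter": "doi:" + "|".join(batch),
            # Ask for enough results per page to cover the whole batch
            "per-page": len(batch),
        }
```

Each yielded dict can be passed as `params=` to `requests.get(url, ...)`, and the `results` lists concatenated.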
In [25]:
for result in response_data['results']:
    print(result['doi'], result['publication_date'], len(result['referenced_works']))

https://doi.org/10.1162/qss_a_00222 2022-11-07 0
https://doi.org/10.1162/qss_a_00200 2021-09-01 20
https://doi.org/10.5281/zenodo.6975102 2022-06-28 0
https://doi.org/10.1007/s11192-022-04446-y 2022-07-15 19
https://doi.org/10.48550/arxiv.2208.11065 2022-08-23 0
https://doi.org/10.1007/978-3-031-16802-4_52 2022-01-01 5
https://doi.org/10.48550/arxiv.2209.09246 2022-09-19 0
https://doi.org/10.1109/tvcg.2022.3209422 2022-01-01 21
https://doi.org/10.1108/jd-04-2022-0083 2022-09-21 27
https://doi.org/10.48550/arxiv.2210.00356 2022-10-01 0
https://doi.org/10.48550/arxiv.2210.14871 2022-10-26 0
https://doi.org/10.48550/arxiv.2211.04429 2022-11-08 0
https://doi.org/10.3389/frma.2022.1010504 2022-11-10 17
https://doi.org/10.1016/j.cosrev.2022.100531 2023-02-01 155
https://doi.org/10.48550/arxiv.2301.01502 2023-01-04 0
https://doi.org/10.48550/arxiv.2302.02231 2023-02-04 0
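
All 16 citing DOIs are in OpenAlex, but nine of them (the seven arXiv preprints, the Zenodo record and `10.1162/qss_a_00222`) have no referenced works recorded. OpenAlex can only count a citation when the citing work's reference list includes the cited work, so these missing reference lists likely account for much of the gap between the two counts.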

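To run this comparison programmatically rather than by eye, the sketch below matches the Semantic Scholar DOIs against the OpenAlex results. OpenAlex returns DOIs as lowercase `https://doi.org/` URLs while Semantic Scholar returns bare DOIs with mixed case, so both are normalized first. `normalize_doi` and `crosscheck` are hypothetical helper names.

```python
from typing import Dict, List, Optional

DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip any resolver URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def crosscheck(s2_dois: List[Optional[str]], openalex_results: List[dict]) -> Dict[str, List[str]]:
    """Split citing DOIs into: missing from OpenAlex, and present but with no references."""
    found = {normalize_doi(w["doi"]): w for w in openalex_results if w.get("doi")}
    wanted = [normalize_doi(d) for d in s2_dois if d]
    return {
        "missing_from_openalex": [d for d in wanted if d not in found],
        "no_references_recorded": [
            d for d in wanted if d in found and not found[d].get("referenced_works")
        ],
    }
```

Based on the listing above, `crosscheck(citations_dois, response_data['results'])` would list nothing under `missing_from_openalex` and nine DOIs under `no_references_recorded`.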
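Google Scholar has no official API, so let's try the third-party `scholarly` package, which queries Google Scholar through a proxy to avoid being blocked:

In [6]: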
from scholarly import scholarly, ProxyGenerator

# Set up a ProxyGenerator object to use free proxies
# This needs to be done only once per session
pg = ProxyGenerator()
pg.FreeProxies()
scholarly.use_proxy(pg)

# Now search Google Scholar from behind a proxy
search_query = scholarly.search_pubs(f'doi:{doi}')
article = next(search_query)

MaxTriesExceededException: Cannot Fetch from Google Scholar.
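
The free proxies failed on this run, so Google Scholar returned nothing. Back in OpenAlex, a full-text search for "openalex" may turn up other records of the same paper:
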
In [1]:
# specify endpoint
endpoint = 'works'

search_query = 'openalex'

# put the URL together
url = f'https://api.openalex.org/{endpoint}?search={search_query}'
print(f'complete URL with search query:\n{url}')

complete URL with search query:
https://api.openalex.org/works?search=openalex
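Building the query string by hand works for a single word, but a search term containing spaces or `&` would need escaping. Here is a small sketch using the standard library's `urllib.parse.urlencode`; `openalex_url` is a hypothetical helper, and passing `params=` to `requests.get`, as an earlier cell does, achieves the same thing.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"


def openalex_url(endpoint: str, **params: str) -> str:
    """Build an OpenAlex API URL with properly escaped query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}/{endpoint}?{query}" if query else f"{BASE_URL}/{endpoint}"
```

For example, `openalex_url('works', search='openalex')` reproduces the URL printed above, while a multi-word query has its spaces encoded as `+`.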
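
OpenAlex's `cites:` filter lists the works that cite a given work ID, so its count should agree with the `cited_by_count` seen earlier:
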
In [3]:
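import requests

# OpenAlex work IDs for the OpenAlex paper (the second is presumably another record of the same paper)
openalex_arxiv_paper = 'W4229010617'
openalex_arxiv_paper_2 = 'W4288680697'
# Ask OpenAlex for the works that cite the first record
url = f'https://api.openalex.org/{endpoint}?filter=cites:{openalex_arxiv_paper}'
result = requests.get(url).json()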

In [5]:
result['meta']

{'count': 3, 'db_response_time_ms': 59, 'page': 1, 'per_page': 25}
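The cell that queries `cites:` also defines `openalex_arxiv_paper_2` but never uses it. If both IDs are records of the same paper, OR-ing them in a single `cites:` filter counts each citing work once, even if it cites both records. This sketch assumes the `cites` filter accepts `|`-separated values like the `doi` filter used earlier; `cites_filter_url` is a hypothetical helper.

```python
from typing import Iterable


def cites_filter_url(work_ids: Iterable[str], endpoint: str = "works") -> str:
    """Build an OpenAlex URL listing works that cite any of `work_ids`."""
    ids = "|".join(dict.fromkeys(work_ids))  # drop duplicate IDs, keep order
    return f"https://api.openalex.org/{endpoint}?filter=cites:{ids}"
```

The combined count would then be `requests.get(cites_filter_url([openalex_arxiv_paper, openalex_arxiv_paper_2])).json()['meta']['count']`.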
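
In a session where the `scholarly` search above succeeded, the returned Google Scholar record looks like this:
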
In [9]:
article.keys()

dict_keys(['container_type', 'source', 'bib', 'filled', 'gsrank', 'pub_url', 'author_id', 'url_scholarbib', 'url_add_sclib', 'num_citations', 'citedby_url', 'url_related_articles', 'eprint_url'])

In [10]:
article['num_citations']

19
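Putting the three counts side by side makes the gap explicit. The numbers below are copied from the outputs above (3 from OpenAlex, 16 from Semantic Scholar, 19 from Google Scholar); they are snapshots, possibly taken in different sessions, not live values.

```python
# Citation counts copied from the outputs above (snapshots, not live values)
counts = {
    "OpenAlex": 3,
    "Semantic Scholar": 16,
    "Google Scholar": 19,
}

highest = max(counts.values())
for source, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    # Express each count relative to the highest count among the sources
    print(f"{source:<17} {n:>3}  ({n / highest:.0%} of the highest)")
```

On these snapshots OpenAlex reports roughly a sixth (16%) of the citations Google Scholar finds, and Semantic Scholar about 84%.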