# OJS Indexing

This notebook explores how many OJS journals are indexed by [Dimensions](https://www.dimensions.ai/).

## Load the Data

We can find the latest OJS data from:

> Khanna, Saurabh; Raoni, Jonas; Smecher, Alec; Alperin, Juan Pablo; Ball, Jon; Willinsky, John, 2024, "Details of publications using software by the Public Knowledge Project", https://doi.org/10.7910/DVN/OCZNVY, Harvard Dataverse, V4, UNF:6:gZEARPVQ7u+ewxiX1pWVBQ==

This was downloaded and saved as `data/beacon.csv`, which we can read with pandas.

In [72]:
import pandas

beacon = pandas.read_csv('data/beacon.csv')
beacon

Unnamed: 0,oai_url,application,version,earliest_datestamp,repository_name,set_spec,context_name,stats_id,total_record_count,issn,...,last_completed_update,first_beacon,last_beacon,last_oai_response,unresponsive_endpoint,unresponsive_context,record_count_2020,record_count_2021,record_count_2022,record_count_2023
0,https://iimmun.ru/index/oai,ojs,,,,iimm,Russian Journal of Infection and Immunity,60252fb03e031,987,2220-7619\n2313-7398,...,2024-01-08 04:41:50,2022-04-12 03:52:25,2024-01-10 13:45:07,,0,0,117,78,113,92
1,https://wisdomperiodical.miopap.aspu.am/index....,ojs,,,,wisdom,WISDOM,5853f7a1edec3,399,1829-3824\n2738-2753,...,2022-05-26 04:02:05,2022-05-25 01:01:26,2022-06-22 02:35:09,,0,0,53,140,21,0
2,https://journals.rcsi.science/index/oai,ojs,,,,2221-7185,CardioSomatics,54b7a80a220b2,63,2221-7185\n2658-5707,...,2023-12-27 07:19:41,2023-04-25 08:37:54,2024-01-10 12:59:55,,0,0,0,0,8,38
3,https://journals.rcsi.science/index/oai,ojs,,,,2712-7672,Consortium Psychiatricum,54b7a80a220b2,34,2712-7672\n2713-2919,...,2023-12-27 07:19:41,2023-04-25 08:37:54,2024-01-10 12:59:55,,0,0,0,0,8,25
4,https://journals.rcsi.science/index/oai,ojs,,,,DD,Digital Diagnostics,54b7a80a220b2,89,2712-8490\n2712-8962,...,2023-12-27 07:19:41,2023-04-25 08:37:54,2024-01-10 12:59:55,,0,0,0,1,21,66
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63869,https://jxiv.jst.go.jp/index.php/index/oai,ops,3.3.0.8,2022-03-24 01:21:10,Jxivサイト,jxiv,"Jxiv, JSTプレプリントサーバ",6212d9f2b316d,179,,...,2023-10-18 04:11:15,2022-02-20 16:17:02,2023-12-21 06:49:39,2023-10-18 04:11:13,0,0,0,0,81,98
63870,https://ub31.uni-tuebingen.de/ops-3.3.0-9/inde...,ops,3.3.0.9,2021-01-25 12:48:44,OPS1,dkj,Die Kirchen und das Judentum - Dokumente,5fbf8789b86a2,549,,...,2024-01-06 04:37:26,2022-05-08 23:49:30,2022-06-07 00:07:47,2024-01-06 04:37:15,0,0,28,31,48,6
63871,https://curriculum-theologiae.de/ops/index.php...,ops,3.3.0.9,2021-01-25 12:48:44,OPS1,ct,Curriculum Theologiae,5fbf8789b86a2,84,,...,2024-01-06 05:49:41,2022-06-09 01:20:02,2023-07-14 08:32:43,2024-01-06 05:49:30,0,0,0,4,9,37
63872,https://ojs.uap.bo/index.php/index/oai,ops,3.4.0.3,2023-08-30 04:53:18,Revista Cientifica UAP,rdceyf,Revista DICyT Ciencias Economicas y Financieras,,11,2957-689X,...,2023-09-13 04:45:40,2023-08-02 22:55:09,2023-12-15 15:57:21,2023-09-04 04:44:02,0,0,0,0,0,11
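The `Unnamed: 0` column suggests that `beacon.csv` was written with its row index as an unlabelled first column. Assuming that is what it is, passing `index_col=0` to `read_csv` would use it as the index instead of carrying it along as a data column. A minimal sketch on a made-up two-row CSV:

```python
import io

import pandas

# A tiny stand-in for data/beacon.csv (made-up rows) whose first column
# has an empty header, like the "Unnamed: 0" column above.
sample = io.StringIO(
    ",oai_url,issn\n"
    "0,https://example.org/index/oai,1234-5678\n"
    "1,https://example.net/index/oai,\n"
)

# index_col=0 uses the unlabelled first column as the DataFrame index
# rather than keeping it as an extra "Unnamed: 0" data column.
df = pandas.read_csv(sample, index_col=0)
print(df.columns.tolist())  # ['oai_url', 'issn']
```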


We're going to be querying Dimensions by ISSN, so we'll have to remove any titles that lack an ISSN. This drops 12,565 of the 63,873 titles (51,308 remain) and significantly shapes the results.

In [73]:
beacon = beacon[beacon.issn.notna()].copy()
beacon

Unnamed: 0,oai_url,application,version,earliest_datestamp,repository_name,set_spec,context_name,stats_id,total_record_count,issn,...,last_completed_update,first_beacon,last_beacon,last_oai_response,unresponsive_endpoint,unresponsive_context,record_count_2020,record_count_2021,record_count_2022,record_count_2023
0,https://iimmun.ru/index/oai,ojs,,,,iimm,Russian Journal of Infection and Immunity,60252fb03e031,987,2220-7619\n2313-7398,...,2024-01-08 04:41:50,2022-04-12 03:52:25,2024-01-10 13:45:07,,0,0,117,78,113,92
1,https://wisdomperiodical.miopap.aspu.am/index....,ojs,,,,wisdom,WISDOM,5853f7a1edec3,399,1829-3824\n2738-2753,...,2022-05-26 04:02:05,2022-05-25 01:01:26,2022-06-22 02:35:09,,0,0,53,140,21,0
2,https://journals.rcsi.science/index/oai,ojs,,,,2221-7185,CardioSomatics,54b7a80a220b2,63,2221-7185\n2658-5707,...,2023-12-27 07:19:41,2023-04-25 08:37:54,2024-01-10 12:59:55,,0,0,0,0,8,38
3,https://journals.rcsi.science/index/oai,ojs,,,,2712-7672,Consortium Psychiatricum,54b7a80a220b2,34,2712-7672\n2713-2919,...,2023-12-27 07:19:41,2023-04-25 08:37:54,2024-01-10 12:59:55,,0,0,0,0,8,25
4,https://journals.rcsi.science/index/oai,ojs,,,,DD,Digital Diagnostics,54b7a80a220b2,89,2712-8490\n2712-8962,...,2023-12-27 07:19:41,2023-04-25 08:37:54,2024-01-10 12:59:55,,0,0,0,1,21,66
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63401,https://jecs.pl/index.php/index/oai,ojs,3.4.0.4,2017-03-15 20:41:45,E-journals,jecs,Journal of Education Culture and Society,56b6964ee194e,703,2081-1640,...,2024-01-04 06:44:07,2020-05-21 00:29:17,2023-12-31 08:18:58,2024-01-04 06:44:04,0,0,298,80,75,79
63402,https://periodicos.ces.ufcg.edu.br/periodicos/...,ojs,5.4.2.0,2014-09-10 22:44:25,Periódicos do Centro de Educação e Saúde da UFCG,99cienciaeducacaosaude25,"Educação, Ciência e Saúde",6548dc7a7029a,203,2358-7504,...,2024-01-05 04:31:48,2023-11-06 20:36:01,2023-11-08 15:03:32,2024-01-05 04:31:45,0,0,49,27,15,47
63635,https://www.proconference.org/index.php/index/oai,omp,3.3.0.14,2022-04-07 13:06:35,ProConference,usc,Sworld-Us Conference proceedings,617d69b8ab23b,109,2709-2267,...,2024-01-06 05:04:48,2021-10-30 08:50:40,2024-01-08 12:25:08,2024-01-06 05:04:43,0,0,0,0,109,0
63636,https://www.proconference.org/index.php/index/oai,omp,3.3.0.14,2022-04-07 13:06:35,ProConference,gec,SWorld-Ger Conference proceedings,617d69b8ab23b,14,2709-1783,...,2024-01-06 05:04:48,2021-10-30 08:50:40,2024-01-08 12:25:08,2024-01-06 05:04:43,0,0,0,0,14,0
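Rather than hard-coding the number of titles dropped, we could compute it from the data before and after filtering. A sketch on made-up rows standing in for the unfiltered `beacon` (we haven't re-run it on `data/beacon.csv`):

```python
import pandas

# Made-up rows standing in for the unfiltered beacon data.
sample = pandas.DataFrame({
    "context_name": ["A", "B", "C", "D"],
    "issn": ["1234-5678", None, "2345-6789\n3456-7890", None],
})

# Count the missing ISSNs before filtering, so the figure in the text
# is derived from the data rather than typed in by hand.
missing = int(sample.issn.isna().sum())
with_issn = sample[sample.issn.notna()].copy()

print(f"dropped {missing} of {len(sample)} titles; {len(with_issn)} remain")
```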


24,594 rows appear to have multiple ISSNs in the `issn` column, separated by a newline.

In [74]:
beacon[beacon['issn'].str.match('.+\n.+')]

Unnamed: 0,oai_url,application,version,earliest_datestamp,repository_name,set_spec,context_name,stats_id,total_record_count,issn,...,last_completed_update,first_beacon,last_beacon,last_oai_response,unresponsive_endpoint,unresponsive_context,record_count_2020,record_count_2021,record_count_2022,record_count_2023
0,https://iimmun.ru/index/oai,ojs,,,,iimm,Russian Journal of Infection and Immunity,60252fb03e031,987,2220-7619\n2313-7398,...,2024-01-08 04:41:50,2022-04-12 03:52:25,2024-01-10 13:45:07,,0,0,117,78,113,92
1,https://wisdomperiodical.miopap.aspu.am/index....,ojs,,,,wisdom,WISDOM,5853f7a1edec3,399,1829-3824\n2738-2753,...,2022-05-26 04:02:05,2022-05-25 01:01:26,2022-06-22 02:35:09,,0,0,53,140,21,0
2,https://journals.rcsi.science/index/oai,ojs,,,,2221-7185,CardioSomatics,54b7a80a220b2,63,2221-7185\n2658-5707,...,2023-12-27 07:19:41,2023-04-25 08:37:54,2024-01-10 12:59:55,,0,0,0,0,8,38
3,https://journals.rcsi.science/index/oai,ojs,,,,2712-7672,Consortium Psychiatricum,54b7a80a220b2,34,2712-7672\n2713-2919,...,2023-12-27 07:19:41,2023-04-25 08:37:54,2024-01-10 12:59:55,,0,0,0,0,8,25
4,https://journals.rcsi.science/index/oai,ojs,,,,DD,Digital Diagnostics,54b7a80a220b2,89,2712-8490\n2712-8962,...,2023-12-27 07:19:41,2023-04-25 08:37:54,2024-01-10 12:59:55,,0,0,0,1,21,66
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63384,https://ojs3.berghahnjournals.com/index.php/in...,ojs,3.4.0.4,2017-06-16 13:28:45,Berghahn Journals,sa,Social Analysis,5db2addf6477c,8,0155-977X\n1558-5727,...,2024-01-11 04:01:57,2020-05-20 09:37:46,2021-12-08 08:48:12,2024-01-11 04:01:16,0,0,0,0,8,0
63386,https://ojs.berghahnjournals.com/index.php/ind...,ojs,3.4.0.4,2017-06-16 13:28:45,Berghahn Journals,aia,Anthropology in Action,5db2addf6477c,32,0967-201X\n1752-2285,...,2024-01-07 04:57:41,2021-12-09 19:24:07,2024-01-10 12:00:14,2024-01-07 04:57:05,0,0,0,0,14,0
63387,https://ojs.berghahnjournals.com/index.php/ind...,ojs,3.4.0.4,2017-06-16 13:28:45,Berghahn Journals,contention,Contention: The Multidisciplinary Journal of S...,5db2addf6477c,21,2330-1392\n2572-7184,...,2024-01-07 04:57:41,2021-12-09 19:24:07,2024-01-10 12:00:14,2024-01-07 04:57:05,0,0,7,5,5,0
63390,https://beam.pmu.edu.my/index.php/index/oai,ojs,3.4.0.4,2023-04-29 19:03:19,,beam,Borneo Engineering & Advanced Multidisciplinar...,6255295591f27,78,2948-4049\n2948-4057,...,2024-01-11 04:07:05,2022-05-07 22:37:03,2024-01-11 02:09:03,2024-01-11 04:07:02,0,0,0,0,17,61


These appear to be titles that have both a print and an online ISSN. Since we are interested in OJS, ideally we would know which ISSN identifies the online journal. For the purposes of this analysis we count a title as indexed if any of its ISSNs matches. This could inflate the statistics, since Dimensions may index the print ISSN even when the ISSN of the OJS publication is not indexed.
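Another way to handle the multi-valued `issn` column is to split it into one ISSN per row with pandas' `str.split` and `explode`. This is a sketch using two titles from the output above; the `str.strip` call guards against stray whitespace around ISSNs, which is an assumption we haven't checked against the data.

```python
import pandas

sample = pandas.DataFrame({
    "context_name": [
        "Russian Journal of Infection and Immunity",
        "Revista DICyT Ciencias Economicas y Financieras",
    ],
    "issn": ["2220-7619\n2313-7398", "2957-689X"],
})

# Split each cell on newlines, then give every ISSN its own row.
# explode() repeats the original index label, so rows can be regrouped per title.
issns = sample.issn.str.split("\n").explode().str.strip()
print(issns)

# Number of ISSNs listed for each title
print(issns.groupby(level=0).size())
```

Grouping by the repeated index (`level=0`) is what lets per-ISSN results be folded back into a per-title answer.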

## Query Dimensions

To use the Dimensions API we first need to log in using the credentials in the `.env` file.

In [20]:
import os

import dimcli
import dotenv

# read DIMENSIONS_USER and DIMENSIONS_PASSWORD from .env into the environment
dotenv.load_dotenv()

dimcli.login(
    os.environ.get("DIMENSIONS_USER"),
    os.environ.get("DIMENSIONS_PASSWORD"),
    "https://app.dimensions.ai",
)

Dimcli - Dimensions API Client (v1.3)
Connected to: <https://app.dimensions.ai/api/dsl> - DSL v2.10
Method: manual login


Next we instantiate the DSL for querying the API.

In [21]:
dim = dimcli.Dsl(verbose=False)

Dimensions has its own [SQL-like query language](https://docs.dimensions.ai/dsl/tour.html). To see whether it indexes a particular title, we can search the `source_titles` source for an ISSN.

In [36]:
q = """
    search source_titles where issn in ["2957-689X"]
    return source_titles
    limit 1000
    """

results = dim.query(q, retry=5)
print(results.source_titles)

[{'id': 'jour.1443633', 'issn': ['2957-689X'], 'publisher': '"Facultad de Ciencias Económicas y Financieras, Universidad Amazonica de Pando"', 'start_year': 2021, 'title': 'Revista Científica ciencias Económicas y Financieras', 'type': 'journal'}]
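The query above hard-codes a single ISSN. When building the list from data, one option is to let `json.dumps` do the quoting. The helper `build_issn_query` below is hypothetical, and it assumes the DSL's list syntax and string escaping are compatible with JSON's, which we haven't verified (ISSNs never contain quotes, so it rarely matters). The compact `separators` produce the same `["a","b"]` shape as the query above.

```python
import json


def build_issn_query(issns, limit=1000):
    """Hypothetical helper: a DSL query for source titles matching any ISSN."""
    # separators=(",", ":") gives ["a","b"] with no spaces, like the query above
    issn_list = json.dumps(list(issns), separators=(",", ":"))
    return (
        f"search source_titles where issn in {issn_list}\n"
        "return source_titles\n"
        f"limit {limit}"
    )


print(build_issn_query(["2712-8962", "2957-689X"]))
```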


## Batch Lookup

Each query can look up multiple ISSNs, which speeds up the analysis. Let's write a function that does that for a list of ISSNs.

In [35]:
def lookup_issns(issns):
    # quote each ISSN and join them into a DSL list: "a","b",...
    issns_str = ','.join(f'"{issn}"' for issn in issns)
    q = f"""
    search source_titles where issn in [{issns_str}]
    return source_titles
    limit 1000
    """

    # run the query once; retry=5 asks dimcli to retry failed requests
    results = dim.query(q, retry=5)

    return results.source_titles

lookup_issns(['2712-8962', '2957-689X'])

[{'id': 'jour.1443633',
  'issn': ['2957-689X'],
  'publisher': '"Facultad de Ciencias Económicas y Financieras, Universidad Amazonica de Pando"',
  'start_year': 2021,
  'title': 'Revista Científica ciencias Económicas y Financieras',
  'type': 'journal'},
 {'id': 'jour.1406703',
  'issn': ['2712-8490', '2712-8962'],
  'publisher': 'ECO-Vector',
  'start_year': 2020,
  'title': 'Digital Diagnostics',
  'type': 'journal'}]

Now we can use this function with the ISSNs to build a Python *set* of OJS title ISSNs that are indexed by Dimensions.

In [66]:
import math
import time
from itertools import chain

from more_itertools import batched
from tqdm import tqdm

dimensions_issns = set()
batch_size = 250

# read the ISSN column in chunks of 250 rows
for issns in tqdm(batched(beacon.issn, batch_size), total=math.ceil(len(beacon) / batch_size)):

    # split the newline-separated values so each ISSN is queried on its own
    issns = list(chain.from_iterable(val.split("\n") for val in issns))

    # look up the ISSNs and add every ISSN of each matching title to the set
    for result in lookup_issns(issns):
        for issn in result['issn']:
            dimensions_issns.add(issn)

    # pause between requests to be gentle on the API
    time.sleep(1)

206it [21:52,  6.37s/it]                                                        
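The batching and splitting in the loop above can also be written as a standalone generator that doesn't depend on `more_itertools` (Python 3.12 adds an equivalent `itertools.batched`). A sketch using only the standard library; `issn_batches` is a hypothetical name. Note that a batch of 250 cells can hold more than 250 ISSNs, since many cells contain two.

```python
from itertools import chain, islice


def issn_batches(values, size=250):
    """Hypothetical helper: yield lists of single ISSNs, `size` cells at a time."""
    it = iter(values)
    # take up to `size` cells per batch until the iterator is exhausted
    while chunk := list(islice(it, size)):
        # each cell may hold several newline-separated ISSNs; flatten them
        yield list(chain.from_iterable(cell.split("\n") for cell in chunk))


cells = ["2220-7619\n2313-7398", "2957-689X", "2712-8490\n2712-8962"]
for batch in issn_batches(cells, size=2):
    print(batch)
```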


That took about 22 minutes, so let's save the set as a CSV file in case it's useful to look at again.

In [82]:
pandas.DataFrame({"issn": list(dimensions_issns)}).to_csv('data/dimensions-indexed-issns.csv', index=False)
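To skip the lookup on a later run, the saved file could be read back into a set. A sketch that writes to a temporary directory so it runs standalone, using a few ISSNs from the output above; sorting before writing is our addition, since sets have no fixed order and a sorted file is stable between runs. `dtype=str` makes sure ISSNs are read as text.

```python
import tempfile
from pathlib import Path

import pandas

dimensions_issns = {"2957-689X", "2712-8490", "2712-8962"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dimensions-indexed-issns.csv"
    # sorted() gives a deterministic row order in the saved file
    pandas.DataFrame({"issn": sorted(dimensions_issns)}).to_csv(path, index=False)

    # read the column back as strings and rebuild the set
    reloaded = set(pandas.read_csv(path, dtype=str).issn)

print(reloaded == dimensions_issns)
```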

## Annotate the Dataset

Let's add a column, `indexed_by_dimensions`, to the dataset to indicate whether each title is indexed. Since the `issn` column can contain multiple ISSNs, it's easiest to write a function that looks up each one, and then apply it to the DataFrame.

In [67]:
def indexed_by_dimensions(s):
    """Return True if any of the newline-separated ISSNs in s is in dimensions_issns."""
    return any(issn in dimensions_issns for issn in s.split("\n"))

This will work with individual ISSNs:

In [68]:
indexed_by_dimensions("2220-7619")

True

And multiple ISSNs:

In [69]:
indexed_by_dimensions("2220-7619\n2313-7398")

True

And when they aren't indexed:

In [70]:
indexed_by_dimensions("2220-8619")

False

Now we just need to use this function to create the column!

In [75]:
beacon['indexed_by_dimensions'] = beacon.issn.apply(indexed_by_dimensions)
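The same column could be computed without a Python-level function by exploding the ISSNs, testing membership with `isin`, and collapsing back to one value per title with `groupby(level=0).any()`. A sketch on made-up rows (the set contents and the unindexed `2220-8619` mirror the examples above); we haven't timed it against `apply` on the full dataset.

```python
import pandas

dimensions_issns = {"2220-7619", "2313-7398", "2957-689X"}

sample = pandas.DataFrame({
    "issn": ["2220-7619\n2313-7398", "2220-8619", "2957-689X"],
})

# One row per ISSN (index label repeated), test membership, then collapse
# back to one boolean per title: True if any of its ISSNs matched.
sample["indexed_by_dimensions"] = (
    sample.issn.str.split("\n")
    .explode()
    .isin(dimensions_issns)
    .groupby(level=0)
    .any()
)
print(sample)
```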

## Analyze the Results

Now we can use the column to see how many of the OJS titles with ISSNs are indexed by Dimensions!

In [78]:
beacon.indexed_by_dimensions.value_counts()

indexed_by_dimensions
True     37100
False    14208
Name: count, dtype: int64

In [79]:
beacon.indexed_by_dimensions.mean()

0.7230841194355656

72.3% of the titles that have an ISSN (37,100 of 51,308) are indexed by Dimensions.
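The `application` column shows that the beacon data includes more than OJS: the outputs above also contain `ops` (Open Preprint Systems) and `omp` (Open Monograph Press) rows. Since this notebook is about OJS journals, it may be worth looking at the share per application, or restricting to `ojs`. A sketch on made-up values; the same expressions would apply to `beacon`, but we haven't run them, so no figures are claimed here.

```python
import pandas

# Made-up rows; only the two columns used below matter.
sample = pandas.DataFrame({
    "application": ["ojs", "ojs", "ojs", "ops", "omp"],
    "indexed_by_dimensions": [True, True, False, False, True],
})

# Share of indexed titles per PKP application
by_app = sample.groupby("application").indexed_by_dimensions.mean()
print(by_app)

# Share among OJS titles only
ojs_only = sample[sample.application == "ojs"].indexed_by_dimensions.mean()
print(f"{ojs_only:.1%} of OJS titles indexed")
```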

In [84]:
beacon.to_csv('data/beacon-issns-dimensions.csv', index=False)