# Are UGC's 'preferred' journals still predatory?

In January 2017, the University Grants Commission (UGC) of India released [a list of 38,635 journals](http://www.ugc.ac.in/ugc_notices.aspx?id=1604) that it considers 'genuine'. Only articles published in these journals will count towards academic performance evaluation.

The January list was spread across five scanned PDF documents, which made it almost impossible to search.
[The Wire](https://thewire.in/102950/predatory-journals-ugc-research/) published an analysis of these journals and found at least 35 of them to be [predatory](https://en.wikipedia.org/wiki/Predatory_open_access_publishing).


I did an [independent analysis](http://nbviewer.jupyter.org/gist/saketkc/19b3c85d2d6ffe17fda8350256c3c64a) and found only 25. The numbers differ because those PDFs are hard to parse reliably.

The good news is that UGC re-released the list in [text format](http://ugc.ac.in/journallist/) on April 14th, 2017. The bad news is that as many as 82 of its journals now overlap with Jeffrey Beall's list.
There could well be more, since I only check for exact matches here (case-sensitive, identical whitespace, etc.).
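
Exact matching misses titles that differ only in letter case, accents, punctuation or spacing. The sketch below shows one looser comparison, under the assumption that two titles refer to the same journal when they agree after Unicode normalization, case-folding, reading '&' as 'and', and collapsing everything that is not an ASCII letter or digit into single spaces. That assumption can also merge genuinely different journals, so any extra hits need checking by hand. The helpers `normalize_title` and `normalized_overlap` and the sample titles are hypothetical and not part of the original analysis.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Return a loose, comparison-only form of a journal title (assumed rules)."""
    # Split accented letters into base letter + combining mark, then drop the marks.
    title = unicodedata.normalize("NFKD", title)
    title = "".join(ch for ch in title if not unicodedata.combining(ch))
    # Case-fold and treat '&' as the word 'and'.
    title = title.casefold().replace("&", " and ")
    # Keep only ASCII letters and digits; every other run becomes one space.
    title = re.sub(r"[^0-9a-z]+", " ", title)
    return title.strip()


def normalized_overlap(titles_a, titles_b):
    """Map each normalized title found in both lists to its original spellings."""
    index_a, index_b = {}, {}
    for title in titles_a:
        index_a.setdefault(normalize_title(title), set()).add(title)
    for title in titles_b:
        index_b.setdefault(normalize_title(title), set()).add(title)
    shared = index_a.keys() & index_b.keys()
    return {key: (index_a[key], index_b[key]) for key in sorted(shared)}


if __name__ == "__main__":
    # Hypothetical titles, for illustration only.
    ugc = ["Journal of Example Studies", "Example Letters & Reviews ", "Unrelated Journal"]
    beall = ["JOURNAL OF EXAMPLE STUDIES", "Example Letters and  Reviews", "Another Journal"]
    for key, (ugc_titles, beall_titles) in normalized_overlap(ugc, beall).items():
        print(key, sorted(ugc_titles), sorted(beall_titles))
```

With the notebook's data the call would look like `normalized_overlap(df.Title.dropna(), beall_list_dec)`; the `dropna()` matters because `normalize_title` expects strings, not missing values.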


All the data and scripts are hosted [on GitHub](https://github.com/saketkc/ugc-predatory-journal-analysis).

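```python
import pandas as pd
from fuzzywuzzy import fuzz  # only needed for the commented-out fuzzy match below
from fuzzywuzzy import process

# Show full journal titles and every row/column in the notebook output.
pd.set_option('display.max_colwidth', None)  # -1 is deprecated in newer pandas
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
```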

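```python
# Journal titles from Beall's list (Dec 2016 snapshot), one per line.
with open('Beall_list_dec2016.txt') as f:
    beall_list_dec = [x.strip() for x in f.readlines()]
# UGC's 2017 journal list, with 'Title' and 'Publisher' columns.
df = pd.read_csv('UGC_Journal_list_2017.csv')
```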

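```python
# Exact, case-sensitive title matches between the UGC list and Beall's Dec 2016 list.
exact_matches_dec2016 = pd.DataFrame({'Journals': list(set(df.Title).intersection(beall_list_dec))})
# Fuzzy alternative; it compares every UGC title with every Beall entry,
# and each hit would need manual checking:
#fuzzy_matches = [process.extractOne(x, beall_list_dec) for x in set(df.Title)]
```

```python
exact_matches_dec2016
```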

| | Journals |
|---|---|
| 0 | Aging |
| 1 | International Journal of Pharmaceutical Sciences and Drug Research |
| 2 | Journal of Electrical Engineering |
| 3 | Sport Science |
| 4 | Global Media Journal |
| 5 | International Journal of Health Research |
| 6 | Romanian Biotechnological Letters |
| 7 | International Journal of Current Pharmaceutical Review and Research |
| 8 | Pharmacie Globale: International Journal of Comprehensive Pharmacy |
| 9 | Turkish Online Journal of Educational Technology |
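
The commented-out `process.extractOne` line above points at fuzzy matching with fuzzywuzzy. As a dependency-free sketch of the same idea, the standard library's `difflib.get_close_matches` can pair each title with its most similar Beall entry. This swaps in a different similarity score (`difflib.SequenceMatcher.ratio`) for fuzzywuzzy's default scorer, and the 0.9 cutoff is an arbitrary assumption. Fuzzy hits are candidates only, since distinct journals often have near-identical names. The function name `fuzzy_candidates` and the sample titles are hypothetical.

```python
import difflib


def fuzzy_candidates(titles, reference, cutoff=0.9):
    """Return (title, closest reference title, similarity) for close pairs.

    Comparison is case-insensitive; similarity is SequenceMatcher.ratio()
    on the case-folded strings, between 0 and 1.
    """
    # Case-folded reference title -> original spelling (later duplicates win).
    folded = {ref.casefold(): ref for ref in reference}
    choices = list(folded)
    pairs = []
    for title in sorted(set(titles)):
        key = title.casefold()
        match = difflib.get_close_matches(key, choices, n=1, cutoff=cutoff)
        if match:
            score = difflib.SequenceMatcher(None, key, match[0]).ratio()
            pairs.append((title, folded[match[0]], round(score, 3)))
    return pairs


if __name__ == "__main__":
    # Hypothetical titles: one typo, one exact match, one with no close match.
    ugc = ["Journal of Example Studeis", "Example Letters", "Unrelated Title"]
    beall = ["Journal of Example Studies", "Example Letters", "Another Journal"]
    for pair in fuzzy_candidates(ugc, beall):
        print(pair)
```

On the full lists this still compares every UGC title with every Beall entry; `get_close_matches` skips candidates whose cheap upper bounds (`real_quick_ratio`, `quick_ratio`) fall below the cutoff, but expect it to take a while.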


## Update

Repeating the comparison against the [Jan 2017 version](https://web.archive.org/web/20170111172309/https://scholarlyoa.com/individual-journals/) of Beall's list: for reasons I have not pinned down, the total count drops to 43.


```python
# Same exact-match comparison against the Jan 2017 snapshot of Beall's list.
with open('Beall_list_Jan2017.txt') as f:
    beall_list_jan = [x.strip() for x in f.readlines()]
exact_matches_jan2017 = pd.DataFrame({'Journals': list(set(df.Title).intersection(beall_list_jan))})
exact_matches_jan2017
```


| | Journals |
|---|---|
| 0 | Aging |
| 1 | International Journal of Pharmaceutical Sciences and Drug Research |
| 2 | Technics Technologies Education Management |
| 3 | Sport Science |
| 4 | Global Media Journal |
| 5 | International Journal of Health Research |
| 6 | Romanian Biotechnological Letters |
| 7 | International Journal of Current Pharmaceutical Review and Research |
| 8 | European Journal of Science and Theology |
| 9 | Clinics in Oncology |
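
One way to start explaining the drop is to split the titles that matched only the Dec 2016 list into two groups: those still present in the Jan 2017 list under a slightly different spelling, and those with no such counterpart. The sketch below does that for any two versions of a list, assuming 'slightly different' means differing only in case or whitespace. The function names and toy journal names are hypothetical; with the notebook's data the call would be `diff_versions(df.Title.dropna(), beall_list_dec, beall_list_jan)`.

```python
def loose_key(title):
    """Case- and whitespace-insensitive key (an assumed notion of 'same title')."""
    return " ".join(title.split()).casefold()


def diff_versions(ugc_titles, old_list, new_list):
    """Compare exact matches of ugc_titles against two versions of a list."""
    ugc = set(ugc_titles)
    old_hits = ugc & set(old_list)
    new_hits = ugc & set(new_list)
    new_keys = {loose_key(t) for t in new_list}
    only_old = sorted(old_hits - new_hits)
    return {
        "both": sorted(old_hits & new_hits),
        "only_new": sorted(new_hits - old_hits),
        # Lost the exact match, but a case/whitespace variant is still listed.
        "respelled": [t for t in only_old if loose_key(t) in new_keys],
        # Lost the exact match and no loose variant is listed either.
        "gone": [t for t in only_old if loose_key(t) not in new_keys],
    }


if __name__ == "__main__":
    # Toy data, not taken from either version of Beall's list.
    ugc = ["Journal A", "Journal B", "Journal C", "Journal D"]
    dec = ["Journal A", "Journal B", "Journal C"]
    jan = ["Journal A", "JOURNAL  B", "Journal D"]
    for group, titles in diff_versions(ugc, dec, jan).items():
        print(group, titles)
```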


## Union of the Dec 2016 and Jan 2017 overlaps with UGC's list

```python
# Union of both overlaps; drop_duplicates removes titles matched by both versions.
overlap_jan_dec = pd.concat([exact_matches_dec2016, exact_matches_jan2017]).drop_duplicates()
overlap_jan_dec
```

| | Journals |
|---|---|
| 0 | Aging |
| 1 | International Journal of Pharmaceutical Sciences and Drug Research |
| 2 | Journal of Electrical Engineering |
| 3 | Sport Science |
| 4 | Global Media Journal |
| 5 | International Journal of Health Research |
| 6 | Romanian Biotechnological Letters |
| 7 | International Journal of Current Pharmaceutical Review and Research |
| 8 | Pharmacie Globale: International Journal of Comprehensive Pharmacy |
| 9 | Turkish Online Journal of Educational Technology |
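
`pd.concat(...).drop_duplicates()` gives the union but forgets which version of Beall's list each title came from. A small sketch that keeps that provenance in plain Python; the function name `union_with_source` and the toy inputs are hypothetical. In pandas, an outer `merge` with `indicator=True` records the same information in a `_merge` column (`left_only`, `right_only`, `both`).

```python
def union_with_source(matches_by_version):
    """Union of matched titles, each paired with the versions that matched it.

    matches_by_version maps a version label to an iterable of matched titles.
    """
    sources = {}
    for version, titles in matches_by_version.items():
        for title in set(titles):  # ignore duplicates within one version
            sources.setdefault(title, []).append(version)
    return sorted(sources.items())


if __name__ == "__main__":
    # Toy input, not the real match lists.
    union = union_with_source({
        "Dec 2016": ["Journal A", "Journal B"],
        "Jan 2017": ["Journal B", "Journal C"],
    })
    for title, versions in union:
        print(title, versions)
```

With the notebook's data this would be called as `union_with_source({'Dec 2016': exact_matches_dec2016.Journals, 'Jan 2017': exact_matches_jan2017.Journals})`.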


## Total Journals in Beall's list:

## Publisher list overlap


### Dec 2016 version

```python
# Exact matches between UGC publisher names and Beall's Dec 2016 publisher list.
with open('Beall_publisher_list_Dec2016.txt') as f:
    beall_publisher_list_dec = [x.strip() for x in f.readlines()]
publisher_matches_dec2016 = pd.DataFrame({'Publisher': list(set(df.Publisher).intersection(beall_publisher_list_dec))})
publisher_matches_dec2016
```

| | Publisher |
|---|---|
| 0 | CESER Publications |
| 1 | Engineering and Technology Publishing |
| 2 | Business Perspectives |
| 3 | Baishideng Publishing Group |
| 4 | Natural Sciences Publishing Corporation |
| 5 | Econjournals |
| 6 | Marsland Press |
| 7 | Internet Scientific Publications |
| 8 | Cardiology Academic Press |
| 9 | Canadian Center of Science and Education |


### Jan 2017 version

```python
# Same publisher comparison against the Jan 2017 snapshot of Beall's publisher list.
with open('Beall_publisher_list_Jan2017.txt') as f:
    beall_publisher_list_jan = [x.strip() for x in f.readlines()]
publisher_matches_jan2017 = pd.DataFrame({'Publisher': list(set(df.Publisher).intersection(beall_publisher_list_jan))})
publisher_matches_jan2017
```

| | Publisher |
|---|---|
| 0 | CESER Publications |
| 1 | Engineering and Technology Publishing |
| 2 | Business Perspectives |
| 3 | Baishideng Publishing Group |
| 4 | Natural Sciences Publishing Corporation |
| 5 | Econjournals |
| 6 | Marsland Press |
| 7 | Internet Scientific Publications |
| 8 | Cardiology Academic Press |
| 9 | Canadian Center of Science and Education |
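
The publisher overlap lists publisher names, not journals. A sketch for going from those names back to the UGC-listed journals they publish, assuming the same exact, case-sensitive matching on the publisher name; the function name and toy rows are hypothetical. With the notebook's DataFrame a comparable filter is `df[df.Publisher.isin(publishers)]`, where `publishers` is one of the Beall publisher lists loaded above. A journal showing up here only means its publisher appears on Beall's publisher list; each one still needs individual review.

```python
from collections import defaultdict


def journals_by_flagged_publisher(rows, flagged_publishers):
    """Group titles by publisher, keeping only publishers in flagged_publishers.

    rows is an iterable of (title, publisher) pairs; the publisher match is
    exact and case-sensitive, like the set intersection used in the notebook.
    """
    flagged = set(flagged_publishers)
    grouped = defaultdict(list)
    for title, publisher in rows:
        if publisher in flagged:
            grouped[publisher].append(title)
    # Sort publishers and titles so the output is stable between runs.
    return {pub: sorted(titles) for pub, titles in sorted(grouped.items())}


if __name__ == "__main__":
    # Toy rows, not taken from the UGC list.
    rows = [
        ("Journal A", "Publisher X"),
        ("Journal B", "Publisher Y"),
        ("Journal C", "Publisher X"),
    ]
    print(journals_by_flagged_publisher(rows, ["Publisher X"]))
```

On the real data, rows could come from `df[['Title', 'Publisher']].dropna().itertuples(index=False)`, which yields (Title, Publisher) tuples and skips entries with a missing title or publisher.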
