# Check publication year of JSTOR articles

@author: Jaren Haber, PhD, Georgetown University<br>
@coauthors: Prof. Heather Haveman, UC Berkeley; Yoon Sung Hong, Wayfair<br>
@contact: Jaren.Haber@georgetown.edu<br>
@project: Computational Literature Review of Organizational Scholarship<br>
@date: March 2021<br>

@description: '''Checks the publication year of articles in the JSTOR Data for Research corpus. Uses the original (merged) metadata as a baseline and compares it with hand-checked years and years retrieved from the Web of Science API. The goal is to validate the publication years in the core JSTOR metadata we are working with. Hand-checking revealed many articles with incorrect publication years (13 incorrect out of 55 checked, an error rate of about 24%).'''

## Initialize

In [30]:
!pip install scholarly



In [1]:
#!pip install openpyxl
#!pip install spacy
#!pip install fuzzywuzzy
#!pip install python-Levenshtein
#import nltk; nltk.download('words')

# import packages
import importlib # For working with modules (`imp` is deprecated and was removed in Python 3.12)
import pandas as pd # for working with dataframes
import numpy as np # for working with numbers
import pickle # For working with .pkl files
import re # for regex magic
from tqdm import tqdm # Shows progress over iterations, including in pandas via "progress_apply"
import sys # For terminal tricks
import _pickle as cPickle # Optimized version of pickle
import gc # For managing garbage collector
import timeit # For counting time taken for a process
import datetime # For working with dates & times
from datetime import date
import openpyxl # for saving in excel format
from scholarly import scholarly # For checking article pub year
import tables
from fuzzywuzzy import fuzz, process
import random
from time import sleep
import os; from os import listdir; from os.path import isfile, join
from quickpickle import quickpickle_dump, quickpickle_load # custom scripts for quick saving & loading to pickle format

In [2]:
# define filepaths

thisday = date.today().strftime("%m%d%y")

cwd = os.getcwd()
root = str.replace(cwd, 'classification/preprocess', '')

ocr_fp = root + 'jstor_data/ocr/' # text files
data_fp = root + 'classification/data/' # public data files (no raw JSTOR data)
storage_fp = root + 'models_storage/'# private repo for output files (contains JSTOR data)

# Current article lists
article_list_fp = data_fp + 'filtered_length_index.csv' # Filtered index of research articles
article_paths_fp = data_fp + 'filtered_length_article_paths.csv' # List of article file paths
article_names_fp = data_fp + 'filtered_length_article_names.xlsx' # Filtered list of article names and general data, sorted by journal then article name

# dictionary counts (using core dictionaries) and matched subjects 
counts_fp = root + 'dictionary_methods/counts_and_subject.csv'

# per-article metadata with URLs
meta_fp = storage_fp + 'metadata/metadata_combined_030921.h5' 

# per-article info on cosine scores using each dictionary (core or 100-term dictionaries??)
cosines_fp = storage_fp + 'word_embeddings_data/text_with_cosine_scores_wdg_2020_oct27.csv'

# sample of 55 articles with hand-checked years
checked_fp = data_fp + 'coded_year_checked_02_2021.xlsx'

## Read in & merge data

In [6]:
# Read in metadata file
df_meta = pd.read_hdf(meta_fp)
df_meta.reset_index(drop=False, inplace=True) # extract file name from index

# For merging purposes, get ID alone from file name, e.g. 'journal-article-10.2307_2065002' -> '10.2307_2065002'
df_meta['edited_filename'] = df_meta['file_name'].apply(lambda x: x[16:]) 
df_meta = df_meta[["edited_filename", "article_name", "jstor_url", "abstract", "journal_title", "given_names", "primary_subject", "year", "type"]] # keep only relevant columns
df_meta.rename(columns={'year':'year_jstor_new'}, inplace = True)

df_meta.head()

Unnamed: 0,edited_filename,article_name,jstor_url,abstract,journal_title,given_names,primary_subject,year_jstor_new,type
0,10.2307_4167860,Cross-Dialectal Variation in Arabic: Competing...,https://www.jstor.org/stable/4167860,Most researchers of Arabic sociolinguistics as...,Language in Society,,Other,1979,research-article
1,10.2307_2578336,,https://www.jstor.org/stable/2578336,,Social Forces,"[Sidney, Hyman P., Riv-Ellen, Stephen, Thomas,...",Sociology,1983,book-review
2,10.2307_2654760,,https://www.jstor.org/stable/2654760,,Contemporary Sociology,"[Sidney, Hyman P., Riv-Ellen, Stephen, Thomas,...",Sociology,1998,book-review
3,10.2307_43242281,editor's note: A KNIGHT'S TALE,https://www.jstor.org/stable/43242281,,Corporate Knights,"[Sidney, Hyman P., Riv-Ellen, Stephen, Thomas,...",Other,2005,misc
4,10.2307_42862018,,https://www.jstor.org/stable/42862018,,Social Science Quarterly,"[Sidney, Hyman P., Riv-Ellen, Stephen, Thomas,...",Sociology,1985,book-review
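The fixed slice `x[16:]` above assumes every file name starts with the 16-character prefix `journal-article-`. A minimal sketch of a more defensive variant follows; the helper name `strip_prefix` and the sample values are illustrative assumptions, not part of the original pipeline.

```python
import pandas as pd

PREFIX = "journal-article-"  # prefix taken from the example in the comment above

def strip_prefix(file_name: str, prefix: str = PREFIX) -> str:
    """Return the article ID, dropping the prefix only when it is actually present."""
    return file_name[len(prefix):] if file_name.startswith(prefix) else file_name

# Illustrative input: one prefixed name and one that is already a bare ID
names = pd.Series(["journal-article-10.2307_2065002", "10.2307_4167860"])
ids = names.apply(strip_prefix)
print(ids.tolist())
```

Unlike the slice, this leaves a name untouched if it arrives without the prefix, so a malformed row cannot silently lose the first 16 characters of its ID.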


In [7]:
# Read in filtered index, counts
df = pd.read_csv(article_list_fp, low_memory=False, header=None, names=["file_name"])
df['edited_filename'] = df['file_name'].apply(lambda x: x[16:]) # New col with only article ID

df_counts = pd.read_csv(counts_fp, low_memory=False)
df_counts['edited_filename'] = df_counts['article_id'].apply(lambda x: x[16:]) # New col with only article ID
df_counts = df_counts[['edited_filename', 'word_count']]

# Merge meta data, counts into articles list DF
df = pd.merge(df, df_meta, how='left', on='edited_filename') # meta data
df = pd.merge(df, df_counts, how='left', on='edited_filename') # counts

# Eliminate empty rows
df = df[df['article_name'].notnull()]

In [8]:
# Read hand-checked article sample
checked_df = pd.read_excel(checked_fp)
checked_df = checked_df[['edited_filename', 'year_hand_checked', 'year_jstor_dfr']]
checked_df.rename(columns={'year_jstor_dfr':'year_jstor_old'}, inplace = True)

# Limit to just this sample
df = pd.merge(df, checked_df, how='right', on='edited_filename')
              
# Show all columns in resulting DF
print("All columns:\n", list(df))
print()

print("Rows, cols in data:", df.shape)

df.head()

All columns:
 ['file_name', 'edited_filename', 'article_name', 'jstor_url', 'abstract', 'journal_title', 'given_names', 'primary_subject', 'year_jstor_new', 'type', 'word_count', 'year_hand_checked', 'year_jstor_old']

Rows, cols in data: (55, 13)


Unnamed: 0,file_name,edited_filename,article_name,jstor_url,abstract,journal_title,given_names,primary_subject,year_jstor_new,type,word_count,year_hand_checked,year_jstor_old
0,journal-article-10.2307_41275157,10.2307_41275157,Foucault Reads Freud: The Dialogue with Unreas...,https://www.jstor.org/stable/41275157,The title of the essay refers to the famous st...,Polish Sociological Review,"[Loren C., Jeffrey, Stephen, Gerrie ter, Mathi...",Sociology,2010,research-article,6943,2010,2010
1,journal-article-10.2307_29770169,10.2307_29770169,A Socio-instutionalist Critique of the 1990s' ...,https://www.jstor.org/stable/29770169,This paper argues that the on-going reforms to...,Review of Social Economy,"[Jack W., A. Harvey, Christopher, Michael G., ...",Sociology,2002,research-article,8969,2002,2002
2,journal-article-10.2307_20460016,10.2307_20460016,Changes in the Practice of Eating: A Comparati...,https://www.jstor.org/stable/20460016,This article examines changes in aspects of th...,Acta Sociologica,"[Sandra H., David E., Christoph, Martine, Arno...",Sociology,2007,research-article,9074,2007,2007
3,journal-article-10.2307_2112560,10.2307_2112560,Collective Bargaining and Faculty Compensation...,https://www.jstor.org/stable/2112560,This article assesses the impact of the unioni...,Sociology of Education,"[Mabel, David, Jorge A., Benito E., Danièle, L...",Sociology,1980,research-article,6366,1987,1980
4,journal-article-10.2307_2635073,10.2307_2635073,Windows of Opportunity: Temporal Patterns of T...,https://www.jstor.org/stable/2635073,This paper examines the introduction and adapt...,Organization Science,"[Arthur, Alan R., Georg, K., E., Yaniv, Stephe...",Management & Organizational Behavior,1986,research-article,10514,1994,1986
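Because the merge above uses `how='right'`, any hand-checked article missing from the filtered index still appears in the result, but with empty metadata. One possible way to surface such rows is the `indicator` and `validate` options of `pd.merge`. The sketch below uses made-up toy frames (`articles`, `checked`) rather than the real data.

```python
import pandas as pd

# Toy stand-ins for the article metadata and the hand-checked sample
articles = pd.DataFrame({"edited_filename": ["a", "b", "c"],
                         "year_jstor_new": [2010, 2002, 1980]})
checked = pd.DataFrame({"edited_filename": ["b", "c", "d"],
                        "year_hand_checked": [2002, 1987, 1994]})

# indicator=True adds a `_merge` column; validate raises if IDs are duplicated
merged = pd.merge(articles, checked, how="right", on="edited_filename",
                  indicator=True, validate="one_to_one")

# Rows found only in the hand-checked sample have no metadata to compare against
unmatched = merged[merged["_merge"] == "right_only"]
print(f"{len(unmatched)} of {len(merged)} hand-checked articles lack metadata")
```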


## Correct publication year using APIs

### Scholarly API (breaks)

In [9]:
def get_year_scholarly(title):
    '''
    Gets publication year for article using title, using the Scholarly API (which uses Google Scholar). 
    
    Args:
        title (str): full title, e.g., 'The Collective Strategy Framework: An Application to Competing Predictions of Isomorphism'
        
    Returns:
        pub_year (int): year article was published, in four digits (i.e., `19xx` or `20xx`)
    '''
    
    search_query = scholarly.search_pubs(title)
    
    pub_year = next(search_query)['bib']['pub_year']
    
    sleep(random.randint(200,500)/100) # random pause to avoid getting blocked by Google
    
    return pub_year

In [10]:
# Test Scholarly API (gets blocked by Google via `MaxTriesExceededException`)
title_year_tups = [
    ("A Socio-instutionalist Critique of the 1990s' Reforms of the United Kingdom's National Health Service", 2002, 2002), 
    ("Deepening Our Commitment, Hitting the Streets: A Call to Action", 2001, 2001), 
    ("Linking Organizational Values to Relationships with External Constituents: A Study of Nonprofit Professional Theatres", 2000, 1995), 
    ("On the Foundations of Athenian Democracy: Marx's Paradox and Weber's Solution", 2000, 2000), 
    ("Consumption Caught in the 'Cash Nexus'", 2000, 2000), 
    ("Culture and Charisma: Outline of a Theory", 2000, 1971), 
    ("The Collective Strategy Framework: An Application to Competing Predictions of Isomorphism", 1988, 1984), 
    ("Windows of Opportunity: Temporal Patterns of Technological Adaptation in Organizations", 1994, 1986), 
    ("The Gentle Leviathan: Welfare and the Indian State", 1994, 1987)]

for title, true_year, jstor_year in title_year_tups:
    retrieved_year = get_year_scholarly(title) # query only once per title to limit requests to Google
    print("Title: \t\t\t\t{}".format(title))
    print("Verified pub year: \t\t{}".format(true_year))
    print("Pub year from Scholarly: \t{}".format(retrieved_year))
    #print("Pub year from Web of Science: \t{}".format(str(get_year_webofscience(title))))
    print("Pub year from JSTOR: \t\t{}".format(jstor_year))
    print()

Title: 				A Socio-instutionalist Critique of the 1990s' Reforms of the United Kingdom's National Health Service
Verified pub year: 		2002
Pub year from Scholarly: 	2002
Pub year from JSTOR: 		2002

Title: 				Deepening Our Commitment, Hitting the Streets: A Call to Action
Verified pub year: 		2001
Pub year from Scholarly: 	2001
Pub year from JSTOR: 		2001



MaxTriesExceededException: Cannot Fetch from Google Scholar.
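The exception above aborts the whole loop. A minimal sketch of one way to keep going is a generic retry wrapper with exponential backoff. This does not get around Google's blocking; it only lets the remaining titles be attempted. The names `call_with_retries` and `fake_lookup` are hypothetical, and the stub stands in for `get_year_scholarly` so the example runs without network access.

```python
import random
import time

def call_with_retries(func, *args, max_tries=3, base_delay=2.0, sleep=time.sleep):
    """Call func(*args); on failure wait with exponential backoff, then retry.

    Returns None if every attempt fails, so one blocked lookup does not abort the loop.
    """
    for attempt in range(max_tries):
        try:
            return func(*args)
        except Exception as e:  # e.g. scholarly's MaxTriesExceededException
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < max_tries - 1:
                # Delay doubles each attempt, plus jitter of up to one second
                sleep(base_delay * 2 ** attempt + random.random())
    return None

# Hypothetical stub standing in for get_year_scholarly: fails once, then succeeds
attempts = {"n": 0}
def fake_lookup(title):
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("Cannot Fetch from Google Scholar.")
    return 2002

# sleep is stubbed out here so the demo returns immediately
year = call_with_retries(fake_lookup, "Some title", sleep=lambda seconds: None)
print(year)
```

In the test loop above, the call would become `call_with_retries(get_year_scholarly, title)`, with a `None` result recorded as a missing year.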

### Web of Science API

In [11]:
# Setup API
import woslite_client
from woslite_client.rest import ApiException
from pprint import pprint

def load_api_key(path):
    '''
    Loads text from file containing API key.
    '''
    with open(path, 'r') as f:
        for line in f:
            return line.strip()
        
# Configure API key authorization
configuration = woslite_client.Configuration()
configuration.api_key['X-ApiKey'] = load_api_key('wos_api_key.txt')

# Create an instance of the API class
integration_api_instance = woslite_client.IntegrationApi(woslite_client.ApiClient(configuration))
search_api_instance = woslite_client.SearchApi(woslite_client.ApiClient(configuration))
database_id = 'WOK'  # str | Database to search. Must be a valid database ID, one of the following: BCI/BIOABS/BIOSIS/CCC/DCI/DIIDW/MEDLINE/WOK/WOS/ZOOREC. WOK represents all databases.

In [15]:
def get_year_wos(row): 

    '''
    Gets publication year for an article from its title and journal, using the Web of Science Lite API. 
    
    Docs: https://github.com/Clarivate-SAR/woslite_py_client
    
    Args:
        row (Series): first element is full title, e.g., 'The Collective Strategy Framework: An Application to Competing Predictions of Isomorphism'; second element is journal title
        
    Returns:
        pd.Series: publication year from WOS, article title from WOS, and similarity score (0-100) between the JSTOR and WOS titles; all NaN if the API call fails or returns no record
    '''
    
    sleeptime = 1 #random.randint(5000,7000)/1000  # set pause for politeness/to avoid getting blocked by API
    title = row.iloc[0] # get title by position #title_col
    journal = row.iloc[1] # get journal by position #journal_col
    
    # Configure query
    title = title.replace("'", "") # remove apostrophes (confuses parser)
    usr_query = f"TI=({title}) AND SO=({journal})" # str | User query for requesting data, ex: TS=(cadmium). The query parser will return errors for invalid queries.
    count = 1  # int | Number of records returned in the request
    first_record = 1  # int | Specific record, if any within the result set to return. Cannot be less than 1 and greater than 100000.
    lang = 'en'  # str | Language of search. This element can take only one value: en for English. If no language is specified, English is passed by default. (optional)
    sort_field = 'PY+D'  # str | Order by field(s). Field name and order by clause separated by '+', use A for ASC and D for DESC, ex: PY+D. Multiple values are separated by comma. (optional)
    
    try:
        # Find record(s) by user query
        api_response = search_api_instance.root_get(database_id, usr_query, count, first_record, lang=lang,
                                                                 sort_field=sort_field)
        
        # Get fields of interest from API response, assign to row
        pub_year = api_response.data[0].source.published_biblio_year[0]
        pub_title = api_response.data[0].title.title[0]
        similarity = fuzz.ratio(title.lower(), pub_title.lower()) # compare titles
        
        #print(f'API record found for: \t"{pub_title}"') # show results
        #print(f'JSTOR Title for above:\t"{title}"')
        sleep(sleeptime) # pause
        
        return pd.Series([pub_year, pub_title, similarity])
        
    except Exception as e:
        print("API failed with error: \t%s" % e)
        sleep(sleeptime) # pause
        
        return pd.Series([np.nan, np.nan, np.nan]) # `np.NaN` alias was removed in NumPy 2.0

In [16]:
# Execute WOS method
tqdm.pandas(desc='API -> year...')

try:
    df[['year_wos', 'article_name_wos', 'similarity_wos_title']] = df[['article_name', 'journal_title']].progress_apply(get_year_wos, axis=1)
except ValueError as e: 
    print(f'Encountered error: {e}')

API -> year...:   0%|          | 0/55 [00:00<?, ?it/s]

API record found for: 	"Foucault Reads Freud: The Dialogue with Unreason and Enlightenment"
JSTOR Title for above:	"Foucault Reads Freud: The Dialogue with Unreason and Enlightenment"


API -> year...:   4%|▎         | 2/55 [00:01<00:30,  1.72it/s]

API record found for: 	"Foucault Reads Freud: The Dialogue with Unreason and Enlightenment"
JSTOR Title for above:	"Foucault Reads Freud: The Dialogue with Unreason and Enlightenment"


API -> year...:   5%|▌         | 3/55 [00:02<00:42,  1.23it/s]

API failed with error: 	list index out of range


API -> year...:   7%|▋         | 4/55 [00:03<00:47,  1.07it/s]

API record found for: 	"Changes in the practice of eating - A comparative analysis of time-use"
JSTOR Title for above:	"Changes in the Practice of Eating: A Comparative Analysis of Time-Use"


API -> year...:   9%|▉         | 5/55 [00:04<00:51,  1.02s/it]

API record found for: 	"COLLECTIVE-BARGAINING AND FACULTY COMPENSATION - FACULTY AS A NEW WORKING-CLASS"
JSTOR Title for above:	"Collective Bargaining and Faculty Compensation: Faculty as a New Working Class"


API -> year...:  11%|█         | 6/55 [00:05<00:53,  1.09s/it]

API record found for: 	"WINDOWS OF OPPORTUNITY - TEMPORAL PATTERNS OF TECHNOLOGICAL ADAPTATION IN ORGANIZATIONS"
JSTOR Title for above:	"Windows of Opportunity: Temporal Patterns of Technological Adaptation in Organizations"


API -> year...:  13%|█▎        | 7/55 [00:07<00:54,  1.14s/it]

API record found for: 	"DRAMATURGY AND POLITICAL MYSTIFICATION - POLITICAL LIFE IN THE UNITED-STATES"
JSTOR Title for above:	"DRAMATURGY AND POLITICAL MYSTIFICATION: POLITICAL LIFE IN THE UNITED STATES"


API -> year...:  15%|█▍        | 8/55 [00:08<00:55,  1.17s/it]

API record found for: 	"Commanding materials: (Re)legitimating authority in the context of multi-disciplinary work"
JSTOR Title for above:	"Commanding Materials: (Re)legitimating Authority in the Context of Multi-disciplinary Work"


API -> year...:  16%|█▋        | 9/55 [00:09<00:53,  1.16s/it]

API record found for: 	"ANONYMITY AND THE RISE OF UNIVERSAL OCCASIONS FOR RELIGIOUS RITUAL - AN EXTENSION OF THE DURKHEIMIAN THEORY"
JSTOR Title for above:	"Anonymity and the Rise of Universal Occasions for Religious Ritual: An Extension of the Durkheimian Theory"


API -> year...:  18%|█▊        | 10/55 [00:10<00:52,  1.16s/it]

API record found for: 	"Constructing Clean Dreams: Accounts, Future Selves, and Social and Structural Support as Desistance Work"
JSTOR Title for above:	"Constructing Clean Dreams: Accounts, Future Selves, and Social and Structural Support as Desistance Work"


API -> year...:  20%|██        | 11/55 [00:11<00:50,  1.16s/it]

API record found for: 	"Being Special, Becoming Indigenous: Dilemmas of Special Adat Rights in Indonesia"
JSTOR Title for above:	"Being Special, Becoming Indigenous: Dilemmas of Special Adat Rights in Indonesia"


API -> year...:  22%|██▏       | 12/55 [00:12<00:49,  1.15s/it]

API record found for: 	"Culture and charisma: Outline of a theory"
JSTOR Title for above:	"Culture and Charisma: Outline of a Theory"


API -> year...:  24%|██▎       | 13/55 [00:14<00:48,  1.15s/it]

API failed with error: 	list index out of range


API -> year...:  25%|██▌       | 14/55 [00:15<00:46,  1.14s/it]

API record found for: 	"Consumption caught in the 'cash nexus'"
JSTOR Title for above:	"Consumption Caught in the Cash Nexus"


API -> year...:  27%|██▋       | 15/55 [00:16<00:45,  1.14s/it]

API failed with error: 	list index out of range


API -> year...:  29%|██▉       | 16/55 [00:17<00:45,  1.18s/it]

API record found for: 	"Civil society and agricultural sustainability"
JSTOR Title for above:	"Civil Society and Agricultural Sustainability"


API -> year...:  31%|███       | 17/55 [00:18<00:46,  1.21s/it]

API failed with error: 	list index out of range


API -> year...:  33%|███▎      | 18/55 [00:20<00:44,  1.22s/it]

API record found for: 	""Brainwashing" theories in European parliamentary and administrative reports on "cults" and "sects""
JSTOR Title for above:	""Brainwashing" Theories in European Parliamentary and Administrative Reports on "Cults" and "Sects""


API -> year...:  35%|███▍      | 19/55 [00:21<00:43,  1.19s/it]

API record found for: 	"CONCEPTUALIZING RACISMS - SOCIAL-THEORY, POLITICS AND RESEARCH"
JSTOR Title for above:	"CONCEPTUALISING RACISMS: SOCIAL THEORY, POLITICS AND RESEARCH"


API -> year...:  36%|███▋      | 20/55 [00:22<00:41,  1.18s/it]

API record found for: 	"Deepening our commitment, hitting the streets: A call to action"
JSTOR Title for above:	"Deepening Our Commitment, Hitting the Streets: A Call to Action"


API -> year...:  38%|███▊      | 21/55 [00:23<00:39,  1.17s/it]

API failed with error: 	list index out of range


API -> year...:  40%|████      | 22/55 [00:24<00:38,  1.16s/it]

API failed with error: 	list index out of range


API -> year...:  42%|████▏     | 23/55 [00:25<00:36,  1.15s/it]

API failed with error: 	list index out of range


API -> year...:  44%|████▎     | 24/55 [00:26<00:35,  1.14s/it]

API failed with error: 	list index out of range


API -> year...:  45%|████▌     | 25/55 [00:28<00:34,  1.14s/it]

API record found for: 	"How the civil rights movement revitalized labor militancy. (vol 67, pg 723, 2002)"
JSTOR Title for above:	"How the Civil Rights Movement Revitalized Labor Militancy"


API -> year...:  47%|████▋     | 26/55 [00:29<00:33,  1.15s/it]

API record found for: 	"The Production and Transmission of Knowledge in Colonial Malaya"
JSTOR Title for above:	"The Production and Transmission of Knowledge in Colonial Malaya"


API -> year...:  49%|████▉     | 27/55 [00:30<00:32,  1.14s/it]

API failed with error: 	list index out of range


API -> year...:  51%|█████     | 28/55 [00:31<00:31,  1.17s/it]

API record found for: 	"Board-Management Relationships: Resources and Internal Dynamics"
JSTOR Title for above:	"Board-Management Relationships: Resources and Internal Dynamics"


API -> year...:  53%|█████▎    | 29/55 [00:32<00:30,  1.16s/it]

API failed with error: 	list index out of range


API -> year...:  55%|█████▍    | 30/55 [00:33<00:28,  1.15s/it]

API record found for: 	"Health and quality of life of older people, a replication after six years"
JSTOR Title for above:	"Health and quality of life of older people, a replication after six years"


API -> year...:  56%|█████▋    | 31/55 [00:35<00:27,  1.15s/it]

API failed with error: 	list index out of range


API -> year...:  58%|█████▊    | 32/55 [00:36<00:26,  1.15s/it]

API failed with error: 	list index out of range


API -> year...:  60%|██████    | 33/55 [00:37<00:25,  1.14s/it]

API record found for: 	"Race and theory: Culture, poverty, and adaptation to discrimination in Wilson and Ogbu"
JSTOR Title for above:	"Race and Theory: Culture, Poverty, and Adaptation to Discrimination in Wilson and Ogbu"


API -> year...:  62%|██████▏   | 34/55 [00:38<00:24,  1.14s/it]

API failed with error: 	list index out of range


API -> year...:  64%|██████▎   | 35/55 [00:39<00:22,  1.14s/it]

API record found for: 	"An analysis of Zimbabwean hotel managers' perspectives on workforce diversity"
JSTOR Title for above:	"An analysis of Zimbabwean hotel managers perspectives on workforce diversity"


API -> year...:  65%|██████▌   | 36/55 [00:40<00:21,  1.14s/it]

API record found for: 	"Organizational foundings in community context: Instruments manufacturers and their interrelationship with other organizations"
JSTOR Title for above:	"Organizational Foundings in Community Context: Instruments Manufacturers and Their Interrelationship with Other Organizations"


API -> year...:  67%|██████▋   | 37/55 [00:41<00:20,  1.14s/it]

API record found for: 	"From Experience to Experiential Learning: Cultural Intelligence as a Learning Capability for Global Leader Development"
JSTOR Title for above:	"From Experience to Experiential Learning: Cultural Intelligence as a Learning Capability for Global Leader Development"


API -> year...:  69%|██████▉   | 38/55 [00:43<00:19,  1.14s/it]

API record found for: 	"Change and complementarities in the new competitive landscape: A European panel study, 1992-1996"
JSTOR Title for above:	"Change and Complementarities in the New Competitive Landscape: A European Panel Study, 1992-1996"


API -> year...:  71%|███████   | 39/55 [00:44<00:18,  1.15s/it]

API record found for: 	"Bad apples or bad barrels? A former CEO discusses the interplay of person and situation with implications for business education"
JSTOR Title for above:	"Bad Apples or Bad Barrels? A Former CEO Discusses the Interplay of Person and Situation with Implications for Business Education"


API -> year...:  73%|███████▎  | 40/55 [00:45<00:17,  1.14s/it]

API record found for: 	"Nazism, nationalism, and the sociology of emotions: Escape from Freedom revisited"
JSTOR Title for above:	"Nazism, Nationalism, and the Sociology of Emotions: Escape from Freedom Revisited"


API -> year...:  75%|███████▍  | 41/55 [00:46<00:16,  1.14s/it]

API record found for: 	"Sports in civil society: Networks, social capital and influence"
JSTOR Title for above:	"Sports in Civil Society: Networks, Social Capital and Influence"


API -> year...:  76%|███████▋  | 42/55 [00:47<00:14,  1.14s/it]

API failed with error: 	list index out of range


API -> year...:  78%|███████▊  | 43/55 [00:48<00:13,  1.14s/it]

API failed with error: 	list index out of range


API -> year...:  80%|████████  | 44/55 [00:49<00:12,  1.14s/it]

API record found for: 	"Family Complexity and Children's Behavior Problems over Two US Cohorts"
JSTOR Title for above:	"The Impact of the First Birth: Married and Single Women Preferring Childlessness, One Child, or Two Children"


API -> year...:  82%|████████▏ | 45/55 [00:50<00:11,  1.14s/it]

API record found for: 	"Linking organizational values to relationships with external constituents: A study of nonprofit professional theatres"
JSTOR Title for above:	"Linking Organizational Values to Relationships with External Constituents: A Study of Nonprofit Professional Theatres"


API -> year...:  84%|████████▎ | 46/55 [00:52<00:10,  1.14s/it]

API failed with error: 	list index out of range


API -> year...:  85%|████████▌ | 47/55 [00:53<00:09,  1.14s/it]

API record found for: 	"Disparities in Academic Achievement: Assessing the Role of Habitus and Practice"
JSTOR Title for above:	"Disparities in Academic Achievement: Assessing the Role of Habitus and Practice"


API -> year...:  87%|████████▋ | 48/55 [00:54<00:07,  1.14s/it]

API failed with error: 	list index out of range


API -> year...:  89%|████████▉ | 49/55 [00:55<00:06,  1.14s/it]

API record found for: 	"WHEN EXPERIENCE MEETS NATIONAL INSTITUTIONAL ENVIRONMENTAL CHANGE: FOREIGN ENTRY ATTEMPTS OF US FIRMS IN THE CENTRAL AND EASTERN EUROPEAN REGION"
JSTOR Title for above:	"When Experience Meets National Institutional Environmental Change: Foreign Entry Attempts of U.S. Firms in the Central and Eastern European Region"


API -> year...:  91%|█████████ | 50/55 [00:56<00:05,  1.14s/it]

API record found for: 	"Knowledge management and organisational culture in higher education institutions"
JSTOR Title for above:	"Knowledge management and organisational culture in higher education institutions"


API -> year...:  93%|█████████▎| 51/55 [00:57<00:04,  1.14s/it]

API failed with error: 	list index out of range


API -> year...:  95%|█████████▍| 52/55 [00:59<00:03,  1.17s/it]

API record found for: 	"Non-profit foundations in four countries of central and eastern Europe"
JSTOR Title for above:	"Non-Profit Foundations in Four Countries of Central and Eastern Europe"


API -> year...:  96%|█████████▋| 53/55 [01:00<00:02,  1.19s/it]

API record found for: 	"SHAS as a struggle to create a new field: A Bourdieuan perspective of an Israeli phenomenon"
JSTOR Title for above:	"SHAS as a Struggle to Create a New Field: A Bourdieuan Perspective of an Israeli Phenomenon"


API -> year...:  98%|█████████▊| 54/55 [01:01<00:01,  1.18s/it]

API failed with error: 	list index out of range


API -> year...: 100%|██████████| 55/55 [01:02<00:00,  1.17s/it]

API failed with error: 	list index out of range
API record found for: 	"THE COLLECTIVE STRATEGY FRAMEWORK - AN APPLICATION TO COMPETING PREDICTIONS OF ISOMORPHISM"
JSTOR Title for above:	"The Collective Strategy Framework: An Application to Competing Predictions of Isomorphism"


API -> year...: 100%|██████████| 55/55 [01:04<00:00,  1.18s/it]
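The log above shows that WOS often returns the same title in a different form (all caps, ` - ` in place of `:`), and at least once returns an unrelated article. A minimal sketch of normalizing titles before scoring, and of flagging weak matches, is given below. It uses the standard library's `difflib` as a stand-in for `fuzz.ratio`, and the cutoff `MIN_SIMILARITY` is a hypothetical value that would need tuning against the hand-checked sample.

```python
import re
from difflib import SequenceMatcher

def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())

def title_similarity(a: str, b: str) -> int:
    """Similarity from 0 to 100 on normalized titles (difflib stand-in for fuzz.ratio)."""
    matcher = SequenceMatcher(None, normalize_title(a), normalize_title(b))
    return round(100 * matcher.ratio())

MIN_SIMILARITY = 90  # hypothetical cutoff, not taken from the original analysis

# Same article, different formatting (pair copied from the log above)
same = title_similarity(
    "WINDOWS OF OPPORTUNITY - TEMPORAL PATTERNS OF TECHNOLOGICAL ADAPTATION IN ORGANIZATIONS",
    "Windows of Opportunity: Temporal Patterns of Technological Adaptation in Organizations",
)

# Unrelated article returned by the API (pair copied from the log above)
different = title_similarity(
    "Family Complexity and Children's Behavior Problems over Two US Cohorts",
    "The Impact of the First Birth: Married and Single Women Preferring Childlessness, One Child, or Two Children",
)

print(same >= MIN_SIMILARITY, different >= MIN_SIMILARITY)
```

Rows that fall below the cutoff could have `year_wos` set to missing before any comparison with the hand-checked years.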


In [17]:
df

Unnamed: 0,file_name,edited_filename,article_name,jstor_url,abstract,journal_title,given_names,primary_subject,year_jstor_new,type,word_count,year_hand_checked,year_jstor_old,year_wos,article_name_wos,similarity_wos_title
0,journal-article-10.2307_41275157,10.2307_41275157,Foucault Reads Freud: The Dialogue with Unreas...,https://www.jstor.org/stable/41275157,The title of the essay refers to the famous st...,Polish Sociological Review,"[Loren C., Jeffrey, Stephen, Gerrie ter, Mathi...",Sociology,2010,research-article,6943,2010,2010,2010.0,Foucault Reads Freud: The Dialogue with Unreas...,100.0
1,journal-article-10.2307_29770169,10.2307_29770169,A Socio-instutionalist Critique of the 1990s' ...,https://www.jstor.org/stable/29770169,This paper argues that the on-going reforms to...,Review of Social Economy,"[Jack W., A. Harvey, Christopher, Michael G., ...",Sociology,2002,research-article,8969,2002,2002,,,
2,journal-article-10.2307_20460016,10.2307_20460016,Changes in the Practice of Eating: A Comparati...,https://www.jstor.org/stable/20460016,This article examines changes in aspects of th...,Acta Sociologica,"[Sandra H., David E., Christoph, Martine, Arno...",Sociology,2007,research-article,9074,2007,2007,2007.0,Changes in the practice of eating - A comparat...,98.0
3,journal-article-10.2307_2112560,10.2307_2112560,Collective Bargaining and Faculty Compensation...,https://www.jstor.org/stable/2112560,This article assesses the impact of the unioni...,Sociology of Education,"[Mabel, David, Jorge A., Benito E., Danièle, L...",Sociology,1980,research-article,6366,1987,1980,1987.0,COLLECTIVE-BARGAINING AND FACULTY COMPENSATION...,96.0
4,journal-article-10.2307_2635073,10.2307_2635073,Windows of Opportunity: Temporal Patterns of T...,https://www.jstor.org/stable/2635073,This paper examines the introduction and adapt...,Organization Science,"[Arthur, Alan R., Georg, K., E., Yaniv, Stephe...",Management & Organizational Behavior,1986,research-article,10514,1994,1986,1994.0,WINDOWS OF OPPORTUNITY - TEMPORAL PATTERNS OF ...,98.0
5,journal-article-10.2307_23252765,10.2307_23252765,DRAMATURGY AND POLITICAL MYSTIFICATION: POLITI...,https://www.jstor.org/stable/23252765,,Mid-American Review of Sociology,"[Hans-Jürgen, Conny Herbert, M. L., Herbert Sp...",Sociology,1985,research-article,5131,1985,1985,1985.0,DRAMATURGY AND POLITICAL MYSTIFICATION - POLIT...,97.0
6,journal-article-10.2307_42858192,10.2307_42858192,Commanding Materials: (Re)legitimating Authori...,https://www.jstor.org/stable/42858192,This article explores some specific consequenc...,Sociology,"[J. L., L. M., Elizabeth G., Robert Y., Ronald...",Sociology,2004,research-article,5913,2004,2004,2004.0,Commanding materials: (Re)legitimating authori...,100.0
7,journal-article-10.2307_1387003,10.2307_1387003,Anonymity and the Rise of Universal Occasions ...,https://www.jstor.org/stable/1387003,"In this research, Durkheim's theory of the uni...",Journal for the Scientific Study of Religion,"[Michal, Martin, Zora, Miroslav, Grigorij, Joh...",Sociology,1981,research-article,6311,1992,1981,1992.0,ANONYMITY AND THE RISE OF UNIVERSAL OCCASIONS ...,99.0
8,journal-article-10.1525_si.2011.34.1.63,10.1525_si.2011.34.1.63,"Constructing Clean Dreams: Accounts, Future Se...",https://www.jstor.org/stable/10.1525/si.2011.3...,This article investigates the discourse indivi...,Symbolic Interaction,"[Ernest S., Michael, Don, Henry S., Régine, Do...",Sociology,2011,research-article,8670,2011,2011,2011.0,"Constructing Clean Dreams: Accounts, Future Se...",100.0
9,journal-article-10.2307_43497847,10.2307_43497847,"Being Special, Becoming Indigenous: Dilemmas o...",https://www.jstor.org/stable/43497847,From 1998 onwards Indonesias reform era (refor...,Asian Journal of Social Science,"[Ernest S., Michael, Don, Henry S., Régine, Do...",Sociology,2011,research-article,7008,2011,2011,2011.0,"Being Special, Becoming Indigenous: Dilemmas o...",100.0


**Take-aways:**

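- The old and new JSTOR publication years are identical --> Metadata processing may be the source of the error
- Where WOS returns a matching record, its year agrees with the hand-checked year --> Matched WOS publication years can be trusted (check `similarity_wos_title` to confirm the match)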
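These take-aways can be checked numerically rather than by eye. The sketch below computes the JSTOR error rate, the WOS coverage, and the WOS agreement rate on a toy frame holding the first five rows of the table above. The cutoff `MIN_SIMILARITY` is a hypothetical choice, and on the real data `sample` would be replaced by `df`.

```python
import pandas as pd

# First five rows of the result table above (None marks a failed WOS lookup)
sample = pd.DataFrame({
    "year_jstor_new":       [2010, 2002, 2007, 1980, 1986],
    "year_hand_checked":    [2010, 2002, 2007, 1987, 1994],
    "year_wos":             [2010.0, None, 2007.0, 1987.0, 1994.0],
    "similarity_wos_title": [100.0, None, 98.0, 96.0, 98.0],
})

MIN_SIMILARITY = 90  # hypothetical cutoff for accepting a WOS title match

# Share of articles whose JSTOR year differs from the hand-checked year
jstor_error_rate = (sample["year_jstor_new"] != sample["year_hand_checked"]).mean()

# Keep only rows where WOS returned a sufficiently similar title
matched = sample[sample["similarity_wos_title"] >= MIN_SIMILARITY]
wos_coverage = len(matched) / len(sample)
wos_agreement = (matched["year_wos"] == matched["year_hand_checked"]).mean()

print(f"JSTOR error rate: {jstor_error_rate:.0%}")
print(f"WOS coverage:     {wos_coverage:.0%}")
print(f"WOS agreement:    {wos_agreement:.0%}")
```

Reporting coverage alongside agreement matters here, since a sizeable share of the WOS lookups in the log above returned no record.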