# Journal Analysis of the LOC-DB Project


This notebook analyzes the journals covered by the LOC-DB project. The first part concentrates on the social science journals licensed by the Mannheim University Library and on their publication year 2011.

## Requirements

You need Jupyter and Python, plus the Python package crossrefapi, which can be installed, for example, like this:
```sh
pip install crossrefapi
```

In [82]:
from crossref.restful import Works, Journals
#works = Works()
#journals = Journals()

We have 100 journals with their ISSNs. We save them here into a list:

In [114]:
issnList =  ['0342-300X', '0033-5177', '0340-0425', '0023-2653', '0049-089x', '1550-3585', '0335-5322', '0065-2601', '0342-2275', '0092-6566', '0340-613x', '1869-8980', '0097-9740', '1864-9335', '0514-2776', '0038-6073', '0003-1224', '0018-7267', '0038-0164', '0027-3171', '0037-783X', '0035-2969', '0033-362X', '0038-0261', '0037-7732', '0022-2445', '0022-1031', '0038-0296', '0197-6664', '0894-3257', '0067-5830', '0146-1672', '0165-4896', '0278-016X', '0002-9602', '0007-1315', '0020-7152', '0011-3204', '0018-7259', '0037-8046', '0019-8676', '0022-3506', '0163-786X', '0378-8733', '0539-0184', '0038-0385', '0038-0407', '0066-6505', '0171-5860', '0933-9361', '0735-2751', '0266-7215', '0276-5624', '0749-5978', '0174-0202', '0048-8046', '0343-4109', '0001-6993', '0197-3533', '0360-0572', '0162-895x', '0002-7642', '1231-1413', '0046-2772', '0022-250x', '0340-1804', '0021-8308', '0094-3061', '0049-1241', '0048-3931', '1536-867X', '0730-8884', '0948-423X', '1749-5679', '0340-918X', '1043-4631', '1469-5405', '0891-2432', '0950-0170', '1477-996X', '0141-9889', '0098-7921', '0304-2421', '0011-3921', '0032-3292', '0044-118x', '0863-1808', '0263-2764', '0002-7162', '0038-609x', '0037-7791', '0012-155X', '0958-9287', '0951-6328', '1438-5627', '1864-3361', '1360-7804', '1435-9871', '0959-6801', '1468-0181', '0019-8692']
# For testing, uncomment the following line to use only two ISSNs; keep it commented out for the final results.
#issnList =  ['0342-300X', '0033-5177']
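
Some ISSNs in the list end in a lower-case `x`, others in an upper-case `X`. Before querying Crossref it can help to normalize the list and check it for typos. The following is a minimal sketch, assuming Python 3; the helper names `normalize_issn` and `is_valid_issn` are hypothetical and not part of crossrefapi. It uses the standard ISSN check digit rule (weights 8 down to 2, modulo 11, with `X` standing for 10).

```python
def normalize_issn(issn):
    """Return the ISSN upper-cased, so that a check digit 'x' becomes 'X'."""
    return issn.strip().upper()


def is_valid_issn(issn):
    """Check the NNNN-NNNC shape and the ISSN check digit."""
    issn = normalize_issn(issn)
    if len(issn) != 9 or issn[4] != '-':
        return False
    digits = issn[:4] + issn[5:8]  # the seven digits before the check digit
    if not digits.isdigit():
        return False
    # Weights 8, 7, ..., 2 for the seven digits
    weighted = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    check = (11 - weighted % 11) % 11
    expected = 'X' if check == 10 else str(check)
    return issn[8] == expected


# Three entries taken from the list above
sample = ['0342-300X', '0033-5177', '0049-089x']
invalid = [issn for issn in sample if not is_valid_issn(issn)]
print(invalid)
```

Applied to the full `issnList`, the same list comprehension would report entries that need a second look.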

## Number of Articles Published in 2011

In [93]:
inCrossref = {}

In [94]:
k = 0      # journals with DOIs issued in 2011
total = 0  # DOIs issued in 2011 over all these journals
for issn in issnList:
    journal = Journals().journal(issn)
    if journal:
        inCrossref[issn] = True
        # 'dois-by-issued-year' is a list of [year, count] pairs
        dois = (journal.get('breakdowns') or {}).get('dois-by-issued-year') or []
        for year, count in dois:
            if year == 2011:
                k += 1
                total += count
    else:
        inCrossref[issn] = False
# Integer division, matching the output shown below
print(k, total, total // k)

(86, 5737, 66)


Thus, 86 of the journals have DOIs issued in 2011 registered in Crossref. Together they registered 5737 DOIs for that year, an average of about **67 DOIs per journal** (5737/86 ≈ 66.7; the 66 printed above comes from integer division). Assuming every journal article has a DOI, while some DOIs may belong to other items, we can take this as an upper bound for the number of articles a journal publishes in a year.
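
The lookup in the `dois-by-issued-year` breakdown can be tried without network access. The following is a minimal sketch, assuming Python 3 and the `[year, count]` pair layout used in the cell above; the helper `dois_in_year` and the sample records are hypothetical and contain made-up numbers, not Crossref data.

```python
def dois_in_year(journal, year):
    """Return the number of DOIs issued in `year`, or None if the year is not listed."""
    breakdowns = journal.get('breakdowns') or {}
    for issued_year, count in breakdowns.get('dois-by-issued-year') or []:
        if issued_year == year:
            return count
    return None


# Hand-made journal records (hypothetical numbers)
sample_journals = [
    {'breakdowns': {'dois-by-issued-year': [[2010, 50], [2011, 70]]}},
    {'breakdowns': {'dois-by-issued-year': [[2011, 30]]}},
    {'breakdowns': {'dois-by-issued-year': [[2012, 10]]}},  # nothing issued in 2011
    {},                                                     # no breakdowns at all
]

counts = [c for c in (dois_in_year(j, 2011) for j in sample_journals) if c is not None]
average = sum(counts) / len(counts)  # true division in Python 3
print(len(counts), sum(counts), average)

# The same division with the figures reported in the text
reported_average = 5737 / 86
print(round(reported_average, 1))
```

Journals without an entry for the year are left out of the average rather than counted as zero, which mirrors how `k` is incremented in the cell above.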

## References in one Journal Article

Next, we count the number of references per journal article. This takes quite some time for the whole list of ISSNs. Therefore, we skip all journals that were not found in Crossref, using the variable `inCrossref` created above. In the same run we also count how many of the references are structured, i.e. contain a DOI, a title, or a year.

In [119]:
k = 0       # articles with references
z = 0       # journals with at least one such article
total = 0   # references over all articles
errors = 0  # articles without a deposited reference list
withDOI = 0
withTitle = 0
withYear = 0
onlyUnstructured = 0
for issn in issnList:
    # Skip journals that were not found in Crossref
    if inCrossref.get(issn):
        works = Journals().works(issn).filter(has_references="true").filter(from_pub_date=2011).filter(until_pub_date=2011)
        for article in works:
            k += 1
            total += article['reference-count']
            if 'reference' in article:
                for reference in article['reference']:
                    if 'DOI' in reference:
                        withDOI += 1
                    if 'title' in reference or 'volume-title' in reference or 'journal-title' in reference:
                        withTitle += 1
                    if 'year' in reference:
                        withYear += 1
                    # Only the keys 'key' and 'unstructured' are present
                    if 'key' in reference and 'unstructured' in reference and len(reference) == 2:
                        onlyUnstructured += 1
            else:
                errors += 1
        if works.count() > 0:
            z += 1
# Integer division, matching the output shown below
print(k, z, total, total // k)
print(withDOI, withTitle, withYear)
print(onlyUnstructured)

(3835, 80, 169361, 44)
(59726, 73852, 73205)
7302
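
The per-reference checks from the cell above can be tried offline on hand-made reference dicts. The following is a minimal sketch, assuming Python 3; the helper `classify_reference` and the sample references are hypothetical, and only the key names (`DOI`, `title`, `volume-title`, `journal-title`, `year`, `key`, `unstructured`) are taken from the cell above.

```python
from collections import Counter


def classify_reference(reference):
    """Return which kinds of structured information a reference dict carries."""
    return {
        'doi': 'DOI' in reference,
        'title': any(k in reference for k in ('title', 'volume-title', 'journal-title')),
        'year': 'year' in reference,
        # Same test as above: exactly the two keys 'key' and 'unstructured'
        'only_unstructured': set(reference) == {'key', 'unstructured'},
    }


# Hand-made references (hypothetical values)
sample_references = [
    {'key': 'ref1', 'DOI': '10.1000/example', 'journal-title': 'Some Journal', 'year': '2005'},
    {'key': 'ref2', 'volume-title': 'Some Book', 'year': '1999'},
    {'key': 'ref3', 'unstructured': 'Author, A. (2001). A title. A publisher.'},
]

tally = Counter()
for reference in sample_references:
    for feature, present in classify_reference(reference).items():
        if present:
            tally[feature] += 1
print(dict(tally))
```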


Crossref has references for 3835 journal articles from 80 journals, summing up to 169,361 single references, i.e. **44 references on average per article**. Journals not in Crossref, or without references in Crossref, are left out of this average: their articles also have lists of references, which are just not available in Crossref, so counting them as zero would distort the average.

Out of these 169,361 references, 7,302 (4%) have only unstructured information. The remaining references have some structured information, e.g.:
* 59,726 (35%) have a DOI
* 73,852 (44%) have a title (including journal title or volume title)
* 73,205 (43%) have a year
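
The average and the percentages above follow directly from the printed counts. The following is a minimal sketch of the arithmetic, assuming Python 3; the variable names are chosen here for illustration, and the numbers are the ones reported in the output above.

```python
total_articles = 3835
total_references = 169361

counts = {
    'only unstructured': 7302,
    'with DOI': 59726,
    'with title': 73852,
    'with year': 73205,
}

# Average number of references per article (true division in Python 3)
average = total_references / total_articles
print('references per article: {:.1f}'.format(average))

# Share of each group among all references, rounded to whole percent
shares = {label: round(100 * count / total_references) for label, count in counts.items()}
for label, share in shares.items():
    print('{}: {}%'.format(label, share))
```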