# `Python for academics`: Managing your bibliography

by **Kamila Zdybał**

[`https://kamilazdybal.github.io`](https://kamilazdybal.github.io)

In this notebook, we explore various ways in which Python can help us manage bibliographic files, citations, and literature reviews.

<a id=top-page></a>
***

## Table of contents

- [**Operations on `.bib` files**](#bib)
    - [Exercise 1](#bib-ex-1)
    - [Exercise 2](#bib-ex-2)
    - [Exercise 3](#bib-ex-3)
    - [Exercise 4](#bib-ex-4)
    - [Exercise 5](#bib-ex-5)

<a id=bib></a>
***

## Operations on `.bib` files

[**Go to the top ↑**](#top-page)

<a id=bib-ex-1></a>
***
### Exercise 1

[**Go to the top ↑**](#top-page)

<a href="https://youtu.be/6S-o_TRQMn4">
  <img src="https://img.shields.io/badge/youtube-firebrick?style=for-the-badge&logo=youtube&logoColor=white" alt="YouTube Badge"/>
</a>

We want to count how many bibliography items the `.bib` file contains and get a breakdown of how many items there are of each entry type.

Given the `bibliography.bib` file, we want to produce the following output:

```text
Total bibliography items: 8
- - - - - - - - - - - - - - - - - - - - 
           article: 4
              book: 2
           booklet: 0
            inbook: 0
      incollection: 0
     inproceedings: 1
            manual: 0
     mastersthesis: 0
              misc: 0
         phdthesis: 1
       proceedings: 0
        techreport: 0
       unpublished: 0
- - - - - - - - - - - - - - - - - - - - 
```

In [1]:
entry_types = ['article',
               'book',
               'booklet',
               'inbook',
               'incollection',
               'inproceedings',
               'manual',
               'mastersthesis',
               'misc',
               'phdthesis',
               'proceedings',
               'techreport',
               'unpublished']

In [2]:
import re

In [3]:
directory = './'

In [4]:
filename = 'bibliography.bib'

In [5]:
file = open(directory + filename, 'r')

In [6]:
file_content = file.read()

In [9]:
file_list = file_content.split('\n')

In [14]:
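# Start every standard entry type at zero so that absent types are still reported
count_entry_types = {entry_type: 0 for entry_type in entry_types}

for item in file_list:
    # An entry starts with '@', followed by the entry type and an opening brace
    match = re.match(r'\s*@(\w+)\s*\{', item)
    if match is not None:
        # BibTeX entry types are case-insensitive, e.g. '@Article' and '@article'
        entry_type = match.group(1).lower()
        # Skip non-standard types such as '@string' instead of raising a KeyError
        if entry_type in count_entry_types:
            count_entry_types[entry_type] += 1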

In [16]:
n_items = sum(count_entry_types.values())

In [18]:
file.close()

In [23]:
print('Total bibliography items: ' + str(n_items))
print('- '*20)
for entry_type, value in count_entry_types.items():
    print('%20s%i' % (entry_type + ': ', value) )
print('- '*20)

Total bibliography items: 8
- - - - - - - - - - - - - - - - - - - - 
           article: 4
              book: 2
           booklet: 0
            inbook: 0
      incollection: 0
     inproceedings: 1
            manual: 0
     mastersthesis: 0
              misc: 0
         phdthesis: 1
       proceedings: 0
        techreport: 0
       unpublished: 0
- - - - - - - - - - - - - - - - - - - - 
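
The same count can also be sketched as a single function that scans the whole file content at once. This is a minimal sketch, not the notebook's own solution: the function name `count_entry_types`, the `NON_ENTRIES` set, and the `sample_bib` string (with placeholder titles) are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample standing in for the contents of `bibliography.bib`
sample_bib = """
@article{kobak2019art,
  title = {Placeholder title},
  year = {2019}
}

@Article{lusch2018deep,
  title = {Placeholder title},
  year = {2018}
}

@book{goodfellow2016deep,
  title = {Placeholder title},
  year = {2016}
}
"""

# Blocks that start with '@' but are not bibliography items
NON_ENTRIES = {'comment', 'preamble', 'string'}

def count_entry_types(bib_text):
    """Count entries per (lowercased) entry type in the text of a .bib file."""
    # Each entry opens a line with '@', the entry type, and an opening brace
    found = re.findall(r'^\s*@(\w+)\s*\{', bib_text, flags=re.MULTILINE)
    return Counter(t.lower() for t in found if t.lower() not in NON_ENTRIES)

counts = count_entry_types(sample_bib)

print('Total bibliography items: ' + str(sum(counts.values())))
for entry_type in sorted(counts):
    print('%20s%i' % (entry_type + ': ', counts[entry_type]))
```

A `Counter` returns zero for entry types it has never seen, so looking up `counts['misc']` is safe even when the file has no such entry. To run this on a real file, the text can be read inside a `with open(...)` block, which closes the file automatically.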


<a id=bib-ex-2></a>
***
### Exercise 2

[**Go to the top ↑**](#top-page)

<a href="">
  <img src="https://img.shields.io/badge/youtube-firebrick?style=for-the-badge&logo=youtube&logoColor=white" alt="YouTube Badge"/>
</a>

We want to list the tags (citation keys) of all items in the `.bib` file.

Given the `bibliography.bib` file, we want to produce the following output:

```text
nilsson2007regression
goodfellow2016deep
lusch2018deep
kobak2019art
```

In [None]:
import re

In [None]:
directory = './'

In [None]:
filename = 'bibliography.bib'

In [None]:
# Read-only mode is enough here because nothing is written back to the file
file = open(directory + filename, 'r')
file_content = file.read()
file_list = file_content.split('\n')

In [None]:
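for item in file_list:
    # The tag sits between the opening brace and the first comma;
    # [^,\s]+ stops at that comma even if more commas follow on the same line
    match = re.match(r'\s*@\w+\s*\{\s*([^,\s]+)\s*,', item)
    if match is not None:
        tag = match.group(1)
        print(tag)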
        
file.close()
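
As a compact alternative, the tags can be collected with one `re.findall` call over the whole file content. The sketch below is only an illustration: the helper name `extract_tags` and the `sample_bib` string are assumptions, and the second sample entry is deliberately written on one line to show that commas later in the line do not leak into the tag.

```python
import re

# Hypothetical sample standing in for the contents of `bibliography.bib`
sample_bib = """
@article{nilsson2007regression,
  title = {Placeholder title},
  year = {2007}
}

@book{goodfellow2016deep, title = {Placeholder, with a comma}, year = {2016}}
"""

def extract_tags(bib_text):
    """Return the tags of all entries, in the order they appear."""
    # Capture everything between '{' and the first comma of each entry header
    pattern = r'^\s*@\w+\s*\{\s*([^,\s]+)\s*,'
    return re.findall(pattern, bib_text, flags=re.MULTILINE)

tags = extract_tags(sample_bib)

for tag in tags:
    print(tag)
```

This sketch assumes that every entry starts on its own line and that tags contain neither commas nor whitespace.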

<a id=bib-ex-3></a>
***
### Exercise 3

[**Go to the top ↑**](#top-page)

<a href="">
  <img src="https://img.shields.io/badge/youtube-firebrick?style=for-the-badge&logo=youtube&logoColor=white" alt="YouTube Badge"/>
</a>

We want to check whether any tag appears more than once in the `.bib` file.

In [None]:
import re

In [None]:
directory = './'

In [None]:
filename = 'bibliography.bib'

In [None]:
# Read-only mode is enough here because nothing is written back to the file
file = open(directory + filename, 'r')
file_content = file.read()
file_list = file_content.split('\n')

In [None]:
tags_list = []

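for item in file_list:
    # The tag sits between the opening brace and the first comma;
    # [^,\s]+ stops at that comma even if more commas follow on the same line
    match = re.match(r'\s*@\w+\s*\{\s*([^,\s]+)\s*,', item)
    if match is not None:
        tag = match.group(1)
        tags_list.append(tag)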
        
file.close()

In [None]:
if len(tags_list) != len(set(tags_list)):

    duplicates = set([i for i in tags_list if tags_list.count(i) > 1])

    print('Duplicate tags found: ' + str(len(duplicates)))
    print('- '*20)
    for item in sorted(duplicates):
        print(item)
    print('- '*20)

else:

    print('- '*20)
    print('No duplicate tags found.')
    print('- '*20)
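
Calling `tags_list.count(i)` inside a loop rescans the list for every tag, which gets slow for large bibliographies. A possible alternative, sketched here with `collections.Counter`, counts all tags in one pass. The helper name `find_duplicate_tags` and the example list are assumptions made for illustration.

```python
from collections import Counter

def find_duplicate_tags(tags):
    """Return the sorted list of tags that occur more than once."""
    counts = Counter(tags)
    return sorted(tag for tag, n in counts.items() if n > 1)

# Hypothetical tag list in which one tag is repeated
tags_list = ['kobak2019art', 'lusch2018deep', 'kobak2019art', 'goodfellow2016deep']

duplicates = find_duplicate_tags(tags_list)

if duplicates:
    print('Duplicate tags found: ' + str(len(duplicates)))
    for item in duplicates:
        print(item)
else:
    print('No duplicate tags found.')
```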

<a id=bib-ex-4></a>
***
### Exercise 4

[**Go to the top ↑**](#top-page)

<a href="">
  <img src="https://img.shields.io/badge/youtube-firebrick?style=for-the-badge&logo=youtube&logoColor=white" alt="YouTube Badge"/>
</a>

We want to order items in the `.bib` file according to the year of publishing.
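
One possible approach is sketched below; it is not the notebook's own solution. It assumes that every entry begins with `@` at the start of a line and that the year is written as a four-digit number in a `year` field. The helper names `split_entries`, `entry_year`, and `sort_by_year`, as well as the `sample_bib` string, are hypothetical.

```python
import re

# Hypothetical sample standing in for the contents of `bibliography.bib`
sample_bib = """@article{kobak2019art,
  year = {2019}
}

@book{goodfellow2016deep,
  year = {2016}
}

@article{nilsson2007regression,
  year = 2007
}
"""

def split_entries(bib_text):
    """Split the text into entries, each starting with '@' at a line start."""
    starts = [m.start() for m in re.finditer(r'^@', bib_text, flags=re.MULTILINE)]
    ends = starts[1:] + [len(bib_text)]
    return [bib_text[s:e].strip() for s, e in zip(starts, ends)]

def entry_year(entry):
    """Return the year of an entry; entries without a year sort last."""
    match = re.search(r'\byear\s*=\s*[{"]?\s*(\d{4})', entry, flags=re.IGNORECASE)
    return int(match.group(1)) if match else float('inf')

def sort_by_year(bib_text):
    """Return the .bib text with entries ordered from oldest to newest."""
    entries = split_entries(bib_text)
    return '\n\n'.join(sorted(entries, key=entry_year)) + '\n'

sorted_bib = sort_by_year(sample_bib)
print(sorted_bib)
```

Because `sorted` is stable, entries from the same year keep their original relative order. Note that any text before the first entry is dropped by this sketch, so the result is best written to a new file and not over the original.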

<a id=bib-ex-5></a>
***
### Exercise 5

[**Go to the top ↑**](#top-page)

<a href="">
  <img src="https://img.shields.io/badge/youtube-firebrick?style=for-the-badge&logo=youtube&logoColor=white" alt="YouTube Badge"/>
</a>

In [None]:
field_types = ['address',
               'annote',
               'author',
               'booktitle',
               'chapter',
               'crossref',
               'edition',
               'editor',
               'howpublished',
               'institution',
               'journal',
               'key',
               'month',
               'note',
               'number',
               'organization',
               'pages',
               'publisher',
               'school',
               'series',
               'title',
               'type',
               'volume',
               'year']

***