# Reconstruct the Dewey classification at the British Library
We start as usual by preparing everything to work with our csv file.
 * we import the module 'csv'
 * we create an empty dictionary, *dictDewey*, that will include DDC numbers, types of topics and specific topics
 * we open the csv file and we iterate over its rows
 
In order to fill the dictionary, we group pairs of 'type of topic'/'topic' by each DDC number. Therefore the new dictionary, *dictDewey*, will store DDC numbers as keys, and a list of tuples - in the form *(type of topic , topic)* - as values. 

To fill the dictionary with only unique keys, we define an *if/else* statement while iterating over the rows of the csv file. 

 * If the DDC number has already been included as a key of the dictionary (and therefore it has a list as value, including at least a tuple *(type of topic , topic)*), append only the new tuples to the end of the list. 
 * Otherwise, if the DDC number has not been included yet as a key, create a key/value pair, where the value is a list including the tuple *(type of topic , topic)*. 
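
As a minimal sketch of this if/else logic, the snippet below runs it on a few made-up rows held in memory rather than on *topics.csv*. The rows and the name `sample_dewey` are assumptions for illustration only; only the column names come from the csv file used here.

```python
import csv
import io

# Hypothetical sample rows: the column names match the ones used in this
# notebook, but the values are made up for illustration.
sample_csv = io.StringIO(
    'Dewey classification,Type of topic,Topic\n'
    '823.914,person,"Rowling, J. K."\n'
    '823.914,general term,Wizards in literature\n'
    '794.1,general term,Chess sets\n'
    '823.914,person,"Rowling, J. K."\n'
)

sample_dewey = {}
for row in csv.DictReader(sample_csv):
    ddc = row['Dewey classification']
    pair = (row['Type of topic'], row['Topic'])
    if ddc in sample_dewey:
        # the DDC number is already a key: append the tuple to its list
        sample_dewey[ddc].append(pair)
    else:
        # first time we see this DDC number: start a new list
        sample_dewey[ddc] = [pair]

print(sample_dewey)
```

Note that the repeated row is kept at this stage: the keys are unique, but the lists can still contain duplicate tuples.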



In [23]:
import csv 

dictDewey = {}

with open('topics.csv', 'r', errors='ignore') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:

        # 1. dewey classification: dewey numbers are unique keys; values are lists. 
        # if the dewey number already exists, append new tuples to the list, otherwise the list is just that value
        if row['Dewey classification'] in dictDewey:
            dictDewey[row['Dewey classification']].append((row['Type of topic'], row['Topic']))
        else:
            dictDewey[row['Dewey classification']] = [(row['Type of topic'], row['Topic'])]
dictDewey

{'': [('chronological term', '1997-2011'),
  ('general term', 'Adventure stories--History and criticism'),
  ('general term', 'Apocalyptic literature--History and criticism'),
  ('general term', 'Authors--Great Britain--Biography'),
  ('title', 'Bible--Study and teaching'),
  ('general term', 'Books and reading'),
  ('person', 'Brown, Dan, 1964-'),
  ('person', 'Bulgakov, Mikhail, 1891-1940'),
  ('person', 'Bulgakov, Mikhail, 1891-1940--Philosophy'),
  ('general term', 'Characters and characteristics'),
  ('general term', "Children's literature"),
  ('general term', "Children's literature, English"),
  ('general term',
   "Children's literature, English--Germany--History and criticism"),
  ('general term', "Children's literature, English--History and criticism"),
  ('general term', "Children's literature, English--History and criticism"),
  ('general term', "Children's literature--History and criticism"),
  ('general term', "Children's literature--Marketing"),
  ('general term', "Child

In this way we get a dictionary where the keys are unique, but their values still contain repetitions (i.e. many tuples are repeated in the same list). 

We clean the lists of tuples by iterating over the key/value pairs of the dictionary (by using *dictDewey.items()*). Each value (i.e. each list) is converted to a set and back to a list, so duplicate tuples are removed.
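
A small sketch of this deduplication step, on made-up tuples, is shown below. Converting to a set does not preserve the original order of the tuples; the `dict.fromkeys` variant is not used in this notebook and is shown only as an option for when first-seen order matters (it relies on dictionaries keeping insertion order, as they do from Python 3.7).

```python
# Hypothetical (type of topic, topic) tuples, made up for illustration.
pairs = [
    ('person', 'Rowling, J. K.'),
    ('general term', 'Chess sets'),
    ('person', 'Rowling, J. K.'),
]

# duplicates dropped, resulting order is arbitrary
unique_unordered = list(set(pairs))

# duplicates dropped, first-seen order kept
unique_ordered = list(dict.fromkeys(pairs))

print(len(unique_unordered))
print(unique_ordered)
```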

Duplicate tuples are removed, but the first item of the tuple (the type of topic) is still repeated for every topic of that type. We therefore group topics by type of topic, restructuring each list as a **defaultdict**. 
 * we define an empty defaultdict for each DDC number/list pair of *dictDewey*
 * we iterate over the tuples (*k*, *v*) of the list
 * the defaultdict creates an empty list for each new key *k* (type of topic), and we append each topic *v* to the list stored under its key.

This process groups all the topics by a common type of topic, i.e. all the values are grouped by a common key.
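
The grouping step can be sketched in isolation as follows. The tuples and the names `unique_pairs` and `grouped` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical, already deduplicated (type of topic, topic) tuples.
unique_pairs = [
    ('person', 'Rowling, J. K.'),
    ('general term', 'Chess sets'),
    ('person', 'Tolkien, J. R. R.'),
]

# a missing key is created on first access with an empty list as its value
grouped = defaultdict(list)
for topic_type, topic in unique_pairs:
    grouped[topic_type].append(topic)

print(dict(grouped))
```

Each type of topic now appears once, as a key, with all its topics collected in a single list.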

Lastly, we replace the original value of each key in *dictDewey* with this new structure. The result is a cleaned dictionary.


In [27]:
import csv 
from collections import defaultdict

dictDewey = {}

with open('topics.csv', 'r', errors='ignore') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:

        # 1. dewey classification: dewey numbers are unique keys; values are lists. 
        # if the dewey number already exists, append new tuples to the list, otherwise the list is just that value
        if row['Dewey classification'] in dictDewey:
            dictDewey[row['Dewey classification']].append((row['Type of topic'], row['Topic']))
        else:
            dictDewey[row['Dewey classification']] = [(row['Type of topic'], row['Topic'])]

# clean lists in the dictionary of Dewey numbers
for dewey, values_list in sorted(dictDewey.items()):
    values_list = list(set(values_list)) # remove duplicate tuples: (type of topic, topic)

    d = defaultdict(list)
    for k, v in values_list: # group tuples by type of topic
        d[k].append(v)
    dictDewey[dewey] = d.items() # replace the list once, after the grouping is complete

dictDewey

{'': dict_items([('organisation', ['National Library of Medicine (U.S.)--Exhibitions']), ('geographical term', ['England--Juvenile fiction', 'Sâarospatak', 'Magyarorszâag', 'England']), ('person', ['Potter, Harry, (Fictitious character)--Religious aspects--Christianity', 'Rowling, J. K.--Study and teaching', 'Rowling, J. K.--Characters--Miscellanea--Juvenile literature', 'Potter, Harry, (Fictitious character)--Religious aspects', 'Rowling, J. K.--Influence', 'Potter, Harry, (Fictitious character)--Criticism and interpretation', 'Булгаков, Михаил Афанасьевич, 1891-1940', 'Tolkien, J. R. R. (John Ronald Reuel), 1892-1973', 'Ködöböcz, József', 'Newman, John Henry, 1801-1890', 'Rowling, J. K.--Criticism and interpretation--Congresses', 'Potter, Harry, (Fictitious character)--Dictionaries--English', 'Bulgakov, Mikhail, 1891-1940', 'Potter, Harry, (Fictitious character)--Juvenile fiction', 'Rowling, J. K.--Criticism and interpretation', 'Rowling, J. K.--Characters', 'Rowling, J. K.--Literary

We print our data in a more readable way by iterating over the new structure and building a nested bullet list.

In [28]:
# print a list of dewey numbers, types of topics and topics
for dewey, grouped in sorted(dictDewey.items()): # sort by dewey number
    print(dewey, ':')
    for topic_type, topics in grouped:
        print('\t-', topic_type + ':')
        for topic in topics:
            print('\t\t-', topic)

 :
	- organisation:
		- National Library of Medicine (U.S.)--Exhibitions
	- geographical term:
		- England--Juvenile fiction
		- Sâarospatak
		- Magyarorszâag
		- England
	- person:
		- Potter, Harry, (Fictitious character)--Religious aspects--Christianity
		- Rowling, J. K.--Study and teaching
		- Rowling, J. K.--Characters--Miscellanea--Juvenile literature
		- Potter, Harry, (Fictitious character)--Religious aspects
		- Rowling, J. K.--Influence
		- Potter, Harry, (Fictitious character)--Criticism and interpretation
		- Булгаков, Михаил Афанасьевич, 1891-1940
		- Tolkien, J. R. R. (John Ronald Reuel), 1892-1973
		- Ködöböcz, József
		- Newman, John Henry, 1801-1890
		- Rowling, J. K.--Criticism and interpretation--Congresses
		- Potter, Harry, (Fictitious character)--Dictionaries--English
		- Bulgakov, Mikhail, 1891-1940
		- Potter, Harry, (Fictitious character)--Juvenile fiction
		- Rowling, J. K.--Criticism and interpretation
		- Rowling, J. K.--Characters
		- Rowling, J. K.--Liter

		- Weasley, Ron, (Fictitious character)--Juvenile literature
		- Dumbledore, Albus, (Fictitious character)--Juvenile literature
		- Rowling, J. K.--Characters--Hermione Granger--Juvenile literature
		- Rowling, J. K.--Characters--Harry Potter--Juvenile literature
		- Rowling, J. K.--Characters--Juvenile literature
		- Granger, Hermione, (Fictitious character)--Juvenile literature
		- Potter, Harry (Fictitious character)--Miscellanea
791.4575 :
	- general term:
		- Harry Potter films--Juvenile literature
	- person:
		- Rowling, J. K.--Characters--Harry Potter--Juvenile literature
		- Rowling, J. K.--Characters--Ron Weasley--Juvenile literature
		- Rowling, J. K.--Characters--Albus Dumbledore--Juvenile literature
		- Rowling, J. K.--Characters--Hermione Granger--Juvenile literature
794.1 :
	- general term:
		- Chess for children--Periodicals
		- Chess sets--Periodicals
	- person:
		- Potter, Harry, (Fictitious character)--Collectibles--Periodicals
794.14 :
	- general term:
		- Chess for

# Reproduce the British Library organization
In order to construct our three sections (notated music, texts and online resources):
 * we create a new empty list, *listTypes*, that will include tuples in the form *('Content type','BL record ID')* 
 * we iterate over the rows of the csv file to extract data from 'Content type' and 'BL record ID' fields 
 * we store such values as items of tuples in *listTypes*
 
To group BL IDs by type of content, we first define a new **defaultdict** called *types* whose values default to empty lists. Then we iterate over the tuples (*k*, *v*) included in *listTypes*: each content type *k* becomes a unique key, and each BL ID *v* is appended to the list stored under that key.
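
As a variation on this grouping, the sketch below collects the IDs in sets instead of lists, so that duplicate and empty IDs are dropped while grouping rather than at print time. This is an alternative design, not the approach used in this notebook, and the sample pairs and the name `ids_by_type` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (content type, BL record ID) pairs, made up for illustration.
sample_types = [
    ('Text', '0001'),
    ('Notated music', '0002'),
    ('Text', '0001'),  # duplicate ID
    ('Text', ''),      # missing ID
]

ids_by_type = defaultdict(set)
for content_type, record_id in sample_types:
    if record_id:  # skip empty IDs
        ids_by_type[content_type].add(record_id)

for content_type in sorted(ids_by_type):
    print(content_type + ':', sorted(ids_by_type[content_type]))
```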

In [None]:
import csv 
from collections import defaultdict

listTypes = []

with open('topics.csv', 'r', errors='ignore') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # 2. type classification: create a list of tuples including the content type and the BL record ID
        listTypes.append( (row['Content type'], row['BL record ID'] ) )

# clean the list of types and BL IDs
types = defaultdict(list)
for k, v in listTypes:
    types[k].append(v)

We print our results in a prettier way by directly iterating over the new defaultdict.

In [None]:
# print the list of types and BL IDs
for key, blnumbers in types.items():
    print(key+': ')
    for blnumber in set(blnumbers):
        if blnumber != '': # compare by value; 'is not' tests identity, not equality
            print('- ', blnumber)