# Removing punctuation and accents from the Catalogue Raisonne Dataset

## 1. Import the required libraries

In [35]:
import pandas as pd
import numpy as np
from unidecode import unidecode

If you do not have `unidecode` installed, run the following command to install it. On a Mac you may need to use `pip3` instead of `pip` for the command to work.

In [32]:
!pip install unidecode

Collecting unidecode
  Downloading https://files.pythonhosted.org/packages/d0/42/d9edfed04228bacea2d824904cae367ee9efd05e6cce7ceaaedd0b0ad964/Unidecode-1.1.1-py2.py3-none-any.whl (238kB)
Installing collected packages: unidecode
Successfully installed unidecode-1.1.1
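
To get a feel for what `unidecode` does before applying it to the whole dataset, here is a minimal sketch that runs it on two strings taken from the dataset. The variable names `samples` and `cleaned` are only for illustration.

```python
from unidecode import unidecode

# Two values that appear in the dataset's "imprint" and "artist" columns
samples = ["Paris : Éditions Louis Carré, 1985.", "Schönebeck, Eugen"]

# unidecode replaces each non-ASCII character with its closest ASCII equivalent
cleaned = [unidecode(s) for s in samples]

for before, after in zip(samples, cleaned):
    print(f"{before!r} -> {after!r}")
```

Note that ordinary ASCII punctuation such as commas, colons and periods is left untouched; only characters outside the ASCII range are replaced.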


## 2. Read the data and clean it

Read the dataset and assign it to the variable `df` (short for DataFrame).

In [15]:
# The Excel spreadsheet provided stores the data in "Sheet2"; update the path and sheet name as necessary
df = pd.read_excel('Your/Path/catalogue_raisonne_data.xlsx', sheet_name="Sheet2")

The following lines of code use `.apply` with a `lambda` function to clean each pandas Series. They follow the steps below.
1. Take the pandas Series we would like to modify.
2. Convert each element to a string, because missing values (`np.nan`) are floats and `unidecode` expects strings.
3. Replace accented characters and other non-ASCII punctuation in each element with their closest ASCII equivalents.
4. Convert the elements that became the string `'nan'` in step 2 back to `np.nan`. The `artist` column skips this step in the code below.
5. Store the result in a new dataframe column named "new_...".
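
The steps above can also be wrapped in a small helper so the same logic is not repeated for every column. This is only a sketch of one possible variation, not the code used in this notebook: the function name `strip_accents` and the toy dataframe `toy` are made up for illustration, and the helper checks for missing values before converting to a string instead of converting `'nan'` back afterwards.

```python
import numpy as np
import pandas as pd
from unidecode import unidecode


def strip_accents(series: pd.Series) -> pd.Series:
    """Transliterate non-missing values to ASCII; keep missing values as NaN."""
    return series.apply(lambda x: unidecode(str(x)) if pd.notna(x) else np.nan)


# Toy data standing in for the real spreadsheet (hypothetical, two rows only)
toy = pd.DataFrame({
    "artist": ["Schönebeck, Eugen", "Dufy, Raoul"],
    "imprint": ["Paris : Éditions Louis Carré, 1985.", np.nan],
})

# Create a "new_..." column for every column we want to clean
for col in ["artist", "imprint"]:
    toy["new_" + col] = strip_accents(toy[col])

print(toy)
```

Checking for missing values first means a cell that genuinely contains the text "nan" would be kept as text, which the two-step version would turn into a missing value.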

In [65]:
df['new_artist'] = df['artist'].apply(lambda x: unidecode(str(x)))
df['new_author'] = df['author'].apply(lambda x: unidecode(str(x))).apply(lambda x: np.nan if x == 'nan' else x)
df['new_author_s'] = df['author_s'].apply(lambda x: unidecode(str(x))).apply(lambda x: np.nan if x == 'nan' else x)
df['new_imprint'] = df['imprint'].apply(lambda x: unidecode(str(x))).apply(lambda x: np.nan if x == 'nan' else x)
df['new_public_note'] = df['public_note'].apply(lambda x: unidecode(str(x))).apply(lambda x: np.nan if x == 'nan' else x)
df.head()  # inspect the data to make sure the new columns were created successfully

Unnamed: 0,artist,pub_or_prep,item_t,author,author_s,isbn,imprint,language,size,pages,...,index,exhibition_list,cronology,ind_entr_cont,public_note,new_artist,new_author,new_author_s,new_imprint,new_public_note
0,"Diziani, Gaspare",1,Gaspare Diziani.,"Zugni-Tauro, Anna Paola",,,"Venice : Alfieri, 1971.",Italian,28 cm,363,...,No,No,No,Bibliog.; Comments,"Introductory material includes a biography, a ...","Diziani, Gaspare","Zugni-Tauro, Anna Paola",,"Venice : Alfieri, 1971.","Introductory material includes a biography, a ..."
1,"Dufy, Raoul",1,Raoul Dufy : catalogue raisonné de l'oeuvre pe...,,"['Guillon-Laffaille, Fanny (bibliog.)', 'Laffa...",2865740056,"Paris : Éditions Louis Carré, 1985.",French,29 cm,212,...,Yes,No,No,Exhib. Hist; Bibliog.; Comments,This is considered the fifth volume of the cat...,"Dufy, Raoul",,"['Guillon-Laffaille, Fanny (bibliog.)', 'Laffa...","Paris : Editions Louis Carre, 1985.",This is considered the fifth volume of the cat...
2,"Guidobono, Bartolomeo",1,Bartolomeo e Domenico Guidobono.,,"['Newcome Schleier, Mary', 'Cameirana, Arrigo'...",8880520156,"Turin : Artema : Compagnia di belle arti, 2002.",,30 cm,"xxvii, 217",...,Yes,No,No,Provenance; Bibliog.; Comments,This study covers painted works found in churc...,"Guidobono, Bartolomeo",,"['Newcome Schleier, Mary', 'Cameirana, Arrigo'...","Turin : Artema : Compagnia di belle arti, 2002.",This study covers painted works found in churc...
3,"Heath, Frederick",1,"The Heath Family Engravers, 1779-1878.","Heath, John",,085967908X,"Aldershot : Scolar Press, 1993.",English,26 cm,v. 1: 242; v. 2: 351,...,Yes,No,No,,Each volume begins with sections on the histor...,"Heath, Frederick","Heath, John",,"Aldershot : Scolar Press, 1993.",Each volume begins with sections on the histor...
4,"Jones, John Llewelyn",1,John Llewelyn Jones : Australia's Forgotten Pa...,,"['Aufy, Gile', 'Corbally Stourton, Patrick']",0646348868,"Edgecliff, N.S.W. : Corbally Stourton Contempo...",English,30 cm,279,...,No,No,No,,,"Jones, John Llewelyn",,"['Aufy, Gile', 'Corbally Stourton, Patrick']","Edgecliff, N.S.W. : Corbally Stourton Contempo...",
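
Since `df.head()` only shows the first five rows, it can help to check the whole column programmatically. The sketch below is one way to do that, assuming Python 3.7 or later for `str.isascii`; the helper name `non_ascii_rows` and the toy dataframe are hypothetical and stand in for the real `df`.

```python
import numpy as np
import pandas as pd


def non_ascii_rows(series: pd.Series) -> pd.Series:
    """Return the non-missing values that still contain non-ASCII characters."""
    values = series.dropna().astype(str)
    is_ascii = values.map(str.isascii).astype(bool)
    return values[~is_ascii]


# Toy data standing in for the cleaned dataframe (hypothetical)
toy = pd.DataFrame({
    "artist": ["Schönebeck, Eugen", "Dufy, Raoul", np.nan],
    "new_artist": ["Schonebeck, Eugen", "Dufy, Raoul", np.nan],
})

leftover_old = non_ascii_rows(toy["artist"])      # values the original column would flag
leftover_new = non_ascii_rows(toy["new_artist"])  # should be empty after cleaning

print(len(leftover_old), len(leftover_new))
```

On the real data, calling the helper on each "new_..." column and getting an empty result would suggest that every value was converted.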


## 3. Save the data

In [66]:
df.to_csv('new_catalogue_raisonne.csv', index=False, encoding='utf-8')

In [67]:
pd.read_csv('new_catalogue_raisonne.csv')

Unnamed: 0,artist,pub_or_prep,item_t,author,author_s,isbn,imprint,language,size,pages,...,index,exhibition_list,cronology,ind_entr_cont,public_note,new_artist,new_author,new_author_s,new_imprint,new_public_note
0,"Diziani, Gaspare",1,Gaspare Diziani.,"Zugni-Tauro, Anna Paola",,,"Venice : Alfieri, 1971.",Italian,28 cm,363,...,No,No,No,Bibliog.; Comments,"Introductory material includes a biography, a ...","Diziani, Gaspare","Zugni-Tauro, Anna Paola",,"Venice : Alfieri, 1971.","Introductory material includes a biography, a ..."
1,"Dufy, Raoul",1,Raoul Dufy : catalogue raisonné de l'oeuvre pe...,,"['Guillon-Laffaille, Fanny (bibliog.)', 'Laffa...",2865740056,"Paris : Éditions Louis Carré, 1985.",French,29 cm,212,...,Yes,No,No,Exhib. Hist; Bibliog.; Comments,This is considered the fifth volume of the cat...,"Dufy, Raoul",,"['Guillon-Laffaille, Fanny (bibliog.)', 'Laffa...","Paris : Editions Louis Carre, 1985.",This is considered the fifth volume of the cat...
2,"Guidobono, Bartolomeo",1,Bartolomeo e Domenico Guidobono.,,"['Newcome Schleier, Mary', 'Cameirana, Arrigo'...",8880520156,"Turin : Artema : Compagnia di belle arti, 2002.",,30 cm,"xxvii, 217",...,Yes,No,No,Provenance; Bibliog.; Comments,This study covers painted works found in churc...,"Guidobono, Bartolomeo",,"['Newcome Schleier, Mary', 'Cameirana, Arrigo'...","Turin : Artema : Compagnia di belle arti, 2002.",This study covers painted works found in churc...
3,"Heath, Frederick",1,"The Heath Family Engravers, 1779-1878.","Heath, John",,085967908X,"Aldershot : Scolar Press, 1993.",English,26 cm,v. 1: 242; v. 2: 351,...,Yes,No,No,,Each volume begins with sections on the histor...,"Heath, Frederick","Heath, John",,"Aldershot : Scolar Press, 1993.",Each volume begins with sections on the histor...
4,"Jones, John Llewelyn",1,John Llewelyn Jones : Australia's Forgotten Pa...,,"['Aufy, Gile', 'Corbally Stourton, Patrick']",0646348868,"Edgecliff, N.S.W. : Corbally Stourton Contempo...",English,30 cm,279,...,No,No,No,,,"Jones, John Llewelyn",,"['Aufy, Gile', 'Corbally Stourton, Patrick']","Edgecliff, N.S.W. : Corbally Stourton Contempo...",
5,"Moy, Seong",2,,"Iacono, Domenic",,,,,,,...,,,,,,"Moy, Seong","Iacono, Domenic",,,
6,"Schönebeck, Eugen",2,,"Judin, Juerg",,,,,,,...,,,,,,"Schonebeck, Eugen","Judin, Juerg",,,
7,"Paresce, Renato",1,René Paresce : catalogo ragionato delle opere.,"Ferrario, Rachele",,9788857216232,"Milan : Skira, 2012.",,30 cm,351,...,No,Yes,No,Exhib. Hist; Bibliog.,"Following the introductory essay, the catalogu...","Paresce, Renato","Ferrario, Rachele",,"Milan : Skira, 2012.","Following the introductory essay, the catalogu..."
8,"Patel, Jacques",1,Les Patel : Pierre Patel (1605-1676) et ses fi...,,"['Coural, Natalie', 'Thuillier, Jacques (prefa...",2903239282,"Paris : Arthena, 2001.",French,29 cm,447,...,Yes,No,No,,,"Patel, Jacques",,"['Coural, Natalie', 'Thuillier, Jacques (prefa...","Paris : Arthena, 2001.",
9,"Schönebeck, Eugen",1,Eugen Schönebeck.,"Hirsch, Thomas",,9783943616095,"Munich : Klinkhardt &amp; Biermann, 2014.",German,21 cm,71,...,No,No,Yes,,"The publication has the alternate title ""Eugen...","Schonebeck, Eugen","Hirsch, Thomas",,"Munich : Klinkhardt &amp; Biermann, 2014.","The publication has the alternate title ""Eugen..."
