**Consider the dataset BL-Flickr-Images-Book.csv from Kaggle, which contains information about books. Write a program to demonstrate the following:**

*Import the data into a DataFrame*

*Find and drop the columns that are irrelevant to the book information*

*Change the index of the DataFrame*

*Tidy up fields in the data, such as the date of publication, with the help of a simple regular expression*

*Combine str methods with NumPy to clean columns*

**Importing Dependencies :**

In [52]:
import numpy as np
import pandas as pd

**Importing data :**

In [53]:
dataset = pd.read_csv("datasets/BL-Flickr-Images-Book.csv")
dataset.shape

(8287, 15)

In [54]:
dataset.head()

,Identifier,Edition Statement,Place of Publication,Date of Publication,Publisher,Title,Author,Contributors,Corporate Author,Corporate Contributors,Former owner,Engraver,Issuance type,Flickr URL,Shelfmarks
0,206,,London,1879 [1878],S. Tinsley & Co.,Walter Forbes. [A novel.] By A. A,A. A.,"FORBES, Walter.",,,,,monographic,http://www.flickr.com/photos/britishlibrary/ta...,British Library HMNTS 12641.b.30.
1,216,,London; Virtue & Yorston,1868,Virtue & Co.,All for Greed. [A novel. The dedication signed...,"A., A. A.","BLAZE DE BURY, Marie Pauline Rose - Baroness",,,,,monographic,http://www.flickr.com/photos/britishlibrary/ta...,British Library HMNTS 12626.cc.2.
2,218,,London,1869,"Bradbury, Evans & Co.",Love the Avenger. By the author of “All for Gr...,"A., A. A.","BLAZE DE BURY, Marie Pauline Rose - Baroness",,,,,monographic,http://www.flickr.com/photos/britishlibrary/ta...,British Library HMNTS 12625.dd.1.
3,472,,London,1851,James Darling,"Welsh Sketches, chiefly ecclesiastical, to the...","A., E. S.","Appleyard, Ernest Silvanus.",,,,,monographic,http://www.flickr.com/photos/britishlibrary/ta...,British Library HMNTS 10369.bbb.15.
4,480,"A new edition, revised, etc.",London,1857,Wertheim & Macintosh,"[The World in which I live, and my place in it...","A., E. S.","BROOME, John Henry.",,,,,monographic,http://www.flickr.com/photos/britishlibrary/ta...,British Library HMNTS 9007.d.28.


In [55]:
list(dataset)

['Identifier',
 'Edition Statement',
 'Place of Publication',
 'Date of Publication',
 'Publisher',
 'Title',
 'Author',
 'Contributors',
 'Corporate Author',
 'Corporate Contributors',
 'Former owner',
 'Engraver',
 'Issuance type',
 'Flickr URL',
 'Shelfmarks']

**Dropping irrelevant columns :**

In [56]:
droppedCol = ['Edition Statement','Corporate Author','Corporate Contributors','Former owner','Engraver','Contributors','Issuance type','Shelfmarks']
dataset = dataset.drop(columns=droppedCol)
dataset.shape

(8287, 7)

In [57]:
list(dataset)

['Identifier',
 'Place of Publication',
 'Date of Publication',
 'Publisher',
 'Title',
 'Author',
 'Flickr URL']
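
The task asks to *find* the irrelevant columns before dropping them. One possible way to support that choice is to look at the share of missing values per column. The following is a minimal sketch on a tiny stand-in frame with made-up values; the name `sample` and the 0.5 cutoff are assumptions for illustration only, not part of the notebook above.

```python
import numpy as np
import pandas as pd

# Tiny stand-in frame (hypothetical values) mimicking sparse columns
sample = pd.DataFrame({
    "Identifier": [206, 216, 218, 472],
    "Place of Publication": ["London", "London", "London", "London"],
    "Engraver": [np.nan, np.nan, np.nan, np.nan],
    "Edition Statement": [np.nan, np.nan, np.nan, "A new edition"],
})

# Fraction of missing values per column, highest first
missing_share = sample.isna().mean().sort_values(ascending=False)
print(missing_share)

# Hypothetical cutoff: flag columns that are more than half empty
mostly_empty = missing_share[missing_share > 0.5].index.tolist()
print(mostly_empty)
```

Sparsity is only one signal. A column can be fully populated and still be irrelevant to the analysis, so the final list remains a judgment call.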

**Changing index of DataFrame :**

In [58]:
dataset.set_index('Identifier',inplace=True)

In [59]:
dataset.head()

Identifier,Place of Publication,Date of Publication,Publisher,Title,Author,Flickr URL
206,London,1879 [1878],S. Tinsley & Co.,Walter Forbes. [A novel.] By A. A,A. A.,http://www.flickr.com/photos/britishlibrary/ta...
216,London; Virtue & Yorston,1868,Virtue & Co.,All for Greed. [A novel. The dedication signed...,"A., A. A.",http://www.flickr.com/photos/britishlibrary/ta...
218,London,1869,"Bradbury, Evans & Co.",Love the Avenger. By the author of “All for Gr...,"A., A. A.",http://www.flickr.com/photos/britishlibrary/ta...
472,London,1851,James Darling,"Welsh Sketches, chiefly ecclesiastical, to the...","A., E. S.",http://www.flickr.com/photos/britishlibrary/ta...
480,London,1857,Wertheim & Macintosh,"[The World in which I live, and my place in it...","A., E. S.",http://www.flickr.com/photos/britishlibrary/ta...
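
An index is most useful when its labels are unique. The sketch below, on a small stand-in frame (the name `sample` is an assumption), shows one way to check uniqueness before setting the index, and how label-based lookup with `.loc` works afterwards. It uses the non-mutating form of `set_index` rather than `inplace=True`.

```python
import pandas as pd

# Small stand-in frame using values from the first rows shown above
sample = pd.DataFrame({
    "Identifier": [206, 216, 218],
    "Date of Publication": ["1879 [1878]", "1868", "1869"],
})

# Confirm the candidate column has no duplicates
print(sample["Identifier"].is_unique)

# Returns a new frame; verify_integrity raises if duplicates are present
indexed = sample.set_index("Identifier", verify_integrity=True)

# Look up a record by its identifier label rather than by position
print(indexed.loc[206, "Date of Publication"])
```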


**Cleaning up the date of publication :**

In [60]:
dataset['Date of Publication']

Identifier
206        1879 [1878]
216               1868
218               1869
472               1851
480               1857
              ...     
4158088           1838
4158128       1831, 32
4159563      [1806]-22
4159587           1834
4160339        1834-43
Name: Date of Publication, Length: 8287, dtype: object

In [61]:
# Keep the first run of four digits in each entry as the year; entries without one become NaN
dataset['Date of Publication'] = dataset['Date of Publication'].str.extract(r'(\d{4})', expand=False)

In [62]:
dataset['Date of Publication']

Identifier
206        1879
216        1868
218        1869
472        1851
480        1857
           ... 
4158088    1838
4158128    1831
4159563    1806
4159587    1834
4160339    1834
Name: Date of Publication, Length: 8287, dtype: object
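
The column above still has dtype `object`, so the years are strings. As a minimal sketch, the same pattern can be tried on the irregular values shown earlier and then converted with `pd.to_numeric`. The names `raw`, `years`, and `numeric` are illustrative, and the trailing `None` is an assumed missing entry added to show how gaps behave.

```python
import pandas as pd

# Irregular date strings taken from the output above, plus one assumed missing value
raw = pd.Series(["1879 [1878]", "1868", "1831, 32", "[1806]-22", "1834-43", None])

# Capture the first run of exactly four consecutive digits
years = raw.str.extract(r"(\d{4})", expand=False)
print(years.tolist())

# Convert the extracted strings to numbers; missing entries stay NaN
numeric = pd.to_numeric(years)
print(numeric)
```

Because the pattern is not anchored with `^`, it also matches bracketed years such as `[1806]-22`. For a range like `1834-43` only the first year is kept.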

**Cleaning the place of publication :**

In [63]:
dataset['Place of Publication']

Identifier
206                          London
216        London; Virtue & Yorston
218                          London
472                          London
480                          London
                     ...           
4158088                      London
4158128                       Derby
4159563                      London
4159587         Newcastle upon Tyne
4160339                      London
Name: Place of Publication, Length: 8287, dtype: object

In [64]:
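place = dataset['Place of Publication']
# Collapse every entry that mentions London to 'London'; elsewhere replace hyphens with spaces
dataset['Place of Publication'] = np.where(place.str.contains('London', na=False), 'London', place.str.replace('-', ' ', regex=False))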

In [65]:
dataset['Place of Publication']

Identifier
206                     London
216                     London
218                     London
472                     London
480                     London
                  ...         
4158088                 London
4158128                  Derby
4159563                 London
4159587    Newcastle upon Tyne
4160339                 London
Name: Place of Publication, Length: 8287, dtype: object
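
The pattern used here is `np.where(condition, value_if_true, value_if_false)`, with the condition built by a vectorized `str` method. A minimal sketch on a few sample values follows. The hyphenated `Newcastle-upon-Tyne` entry and the `None` entry are assumed inputs chosen to exercise both branches, and the names `place`, `is_london`, and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample values; the hyphenated and missing entries are assumptions for illustration
place = pd.Series(
    ["London", "London; Virtue & Yorston", "Derby", "Newcastle-upon-Tyne", None]
)

# Boolean mask; na=False makes missing entries count as "not London"
is_london = place.str.contains("London", na=False)

# Where the mask is True use 'London', otherwise the value with hyphens spaced out
cleaned = np.where(is_london, "London", place.str.replace("-", " ", regex=False))

# np.where returns a NumPy array, so restore the original index
result = pd.Series(cleaned, index=place.index)
print(result)
```

Without `na=False`, a missing value in the mask would be treated as truthy by `np.where` and silently relabeled as `London`.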

In [66]:
dataset.shape

(8287, 6)

**Cleaned Dataset :**

In [67]:
dataset.head()

Identifier,Place of Publication,Date of Publication,Publisher,Title,Author,Flickr URL
206,London,1879,S. Tinsley & Co.,Walter Forbes. [A novel.] By A. A,A. A.,http://www.flickr.com/photos/britishlibrary/ta...
216,London,1868,Virtue & Co.,All for Greed. [A novel. The dedication signed...,"A., A. A.",http://www.flickr.com/photos/britishlibrary/ta...
218,London,1869,"Bradbury, Evans & Co.",Love the Avenger. By the author of “All for Gr...,"A., A. A.",http://www.flickr.com/photos/britishlibrary/ta...
472,London,1851,James Darling,"Welsh Sketches, chiefly ecclesiastical, to the...","A., E. S.",http://www.flickr.com/photos/britishlibrary/ta...
480,London,1857,Wertheim & Macintosh,"[The World in which I live, and my place in it...","A., E. S.",http://www.flickr.com/photos/britishlibrary/ta...
