### Problem Statement:

> The so-called paradoxes of an author, to which a reader takes exception, often exist not in the author's book at all, but rather in the reader's head. - Friedrich Nietzsche

Books are open doors to unimagined worlds that are unique to every person. For many, reading is more than just a hobby, and some of us prefer to spend more time with books than with anything else.

Here we explore a large database of books of different genres from thousands of authors. In this challenge, participants are required to use the dataset to build a machine learning model that predicts the price of a book from a given set of features.

- Size of training set: 6237 records
- Size of test set: 1560 records

FEATURES:

- Title: The title of the book
- Author: The author(s) of the book
- Edition: The edition of the book, e.g. (Paperback,– Import, 26 Apr 2018)
- Reviews: The customer reviews about the book
- Ratings: The customer ratings of the book
- Synopsis: The synopsis of the book
- Genre: The genre the book belongs to
- BookCategory: The department the book is usually available at
- Price: The price of the book (target variable)



In [74]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

In [71]:
# Raw strings keep the backslashes in Windows paths from being read as escape sequences.
train_data = pd.read_excel(r'D:\Datascience\MachineHacks\Book Prediction\Participants_Data\Data_Train.xlsx')
test_data = pd.read_excel(r'D:\Datascience\MachineHacks\Book Prediction\Participants_Data\Data_Test.xlsx')
sample_data = pd.read_excel(r'D:\Datascience\MachineHacks\Book Prediction\Participants_Data\Sample_Submission.xlsx')

In [5]:
train_data.head(10)

Unnamed: 0,Title,Author,Edition,Reviews,Ratings,Synopsis,Genre,BookCategory,Price
0,The Prisoner's Gold (The Hunters 3),Chris Kuzneski,"Paperback,– 10 Mar 2016",4.0 out of 5 stars,8 customer reviews,THE HUNTERS return in their third brilliant no...,Action & Adventure (Books),Action & Adventure,220.0
1,Guru Dutt: A Tragedy in Three Acts,Arun Khopkar,"Paperback,– 7 Nov 2012",3.9 out of 5 stars,14 customer reviews,A layered portrait of a troubled genius for wh...,Cinema & Broadcast (Books),"Biographies, Diaries & True Accounts",202.93
2,Leviathan (Penguin Classics),Thomas Hobbes,"Paperback,– 25 Feb 1982",4.8 out of 5 stars,6 customer reviews,"""During the time men live without a common Pow...",International Relations,Humour,299.0
3,A Pocket Full of Rye (Miss Marple),Agatha Christie,"Paperback,– 5 Oct 2017",4.1 out of 5 stars,13 customer reviews,A handful of grain is found in the pocket of a...,Contemporary Fiction (Books),"Crime, Thriller & Mystery",180.0
4,LIFE 70 Years of Extraordinary Photography,Editors of Life,"Hardcover,– 10 Oct 2006",5.0 out of 5 stars,1 customer review,"For seven decades, ""Life"" has been thrilling t...",Photography Textbooks,"Arts, Film & Photography",965.62
5,ChiRunning: A Revolutionary Approach to Effort...,Danny Dreyer,"Paperback,– 5 May 2009",4.5 out of 5 stars,8 customer reviews,The revised edition of the bestselling ChiRunn...,Healthy Living & Wellness (Books),Sports,900.0
6,Death on the Nile (Poirot),Agatha Christie,"Paperback,– 5 Oct 2017",4.4 out of 5 stars,72 customer reviews,Agatha Christie’s most exotic murder mystery\n...,"Crime, Thriller & Mystery (Books)","Crime, Thriller & Mystery",224.0
7,Yoga Your Home Practice Companion: A Complete ...,Sivananda Yoga Vedanta Centre,"Hardcover,– Import, 1 Mar 2018",4.7 out of 5 stars,16 customer reviews,"Achieve a healthy body, mental alertness, and ...",Sports Training & Coaching (Books),Sports,836.0
8,Karmayogi: A Biography of E. Sreedharan,M S Ashokan,"Paperback,– 15 Dec 2015",4.2 out of 5 stars,111 customer reviews,Karmayogi is the dramatic and inspiring story ...,Biographies & Autobiographies (Books),"Biographies, Diaries & True Accounts",130.0
9,"The Iron King (The Accursed Kings, Book 1)",Maurice Druon,"Paperback,– 26 Mar 2013",4.0 out of 5 stars,1 customer review,‘This is the original game of thrones’ George ...,Action & Adventure (Books),Action & Adventure,695.0
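The `Edition` values in the preview above pack a binding type and a publication date into one string. As a minimal sketch (not part of the original workflow), the two pieces could be pulled apart with pandas string methods. The names `editions`, `binding`, and `year` are illustrative, and the sketch assumes every string starts with the binding and ends with a four-digit year, which the full dataset may not guarantee.

```python
import pandas as pd

# A few Edition strings copied from the preview above.
editions = pd.Series([
    "Paperback,– 10 Mar 2016",
    "Hardcover,– Import, 1 Mar 2018",
    "Paperback,– 25 Feb 1982",
])

# Binding type: everything before the first comma.
binding = editions.str.split(",", n=1).str[0].str.strip()

# Publication year: a four-digit number at the end of the string.
# Kept as float so rows without a match can hold NaN.
year = editions.str.extract(r"(\d{4})\s*$", expand=False).astype(float)

print(pd.DataFrame({"binding": binding, "year": year}))
```

Rows that do not match the assumed pattern would get `NaN` for `year` and would need separate handling.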


In [7]:
train_data.shape

(6237, 9)

In [9]:
train_data.describe(include='all').T

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
Title,6237.0,5568.0,Casino Royale: James Bond 007 (Vintage),4.0,,,,,,,
Author,6237.0,3679.0,Agatha Christie,69.0,,,,,,,
Edition,6237.0,3370.0,"Paperback,– 5 Oct 2017",48.0,,,,,,,
Reviews,6237.0,36.0,5.0 out of 5 stars,1375.0,,,,,,,
Ratings,6237.0,342.0,1 customer review,1040.0,,,,,,,
Synopsis,6237.0,5549.0,A Tinkle Double Digest is two Tinkle Digests i...,8.0,,,,,,,
Genre,6237.0,345.0,Action & Adventure (Books),947.0,,,,,,,
BookCategory,6237.0,11.0,Action & Adventure,818.0,,,,,,,
Price,6237.0,,,,560.707516,690.110657,25.0,249.18,373.0,599.0,14100.0
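The summary above shows a mean `Price` of about 560.7 against a median of 373 and a maximum of 14100, which suggests a right-skewed target. One common option for such a target, sketched here on synthetic data rather than the competition files, is to fit the model on `log1p(Price)` and invert the transform at prediction time. `X_demo` and `y_demo` are made-up stand-ins, and whether this helps on the real data is untested here.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Synthetic, right-skewed target standing in for Price (assumption, not real data).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 100, size=(300, 1))
y_demo = np.exp(3.0 + 0.03 * X_demo[:, 0] + rng.normal(0, 0.3, size=300))

# The regressor is trained on log1p(y); predictions are mapped back with expm1.
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log1p,
    inverse_func=np.expm1,
)
model.fit(X_demo, y_demo)

preds = model.predict(X_demo[:5])
print(preds)
```

The same wrapper accepts any scikit-learn regressor, so it could also be tried around a random forest.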


In [11]:
test_data.describe(include='all').T

Unnamed: 0,count,unique,top,freq
Title,1560,1521,The Five Greatest Warriors (Jack West Series),3
Author,1560,1224,Agatha Christie,18
Edition,1560,1259,"Paperback,– 5 Oct 2017",12
Reviews,1560,30,5.0 out of 5 stars,376
Ratings,1560,163,1 customer review,288
Synopsis,1560,1519,The end is approaching ... Can Jack West unrav...,3
Genre,1560,225,Action & Adventure (Books),236
BookCategory,1560,11,Action & Adventure,218


In [18]:
data = pd.concat([train_data,test_data],ignore_index=True)

In [19]:
data

Unnamed: 0,Title,Author,Edition,Reviews,Ratings,Synopsis,Genre,BookCategory,Price
0,The Prisoner's Gold (The Hunters 3),Chris Kuzneski,"Paperback,– 10 Mar 2016",4.0 out of 5 stars,8 customer reviews,THE HUNTERS return in their third brilliant no...,Action & Adventure (Books),Action & Adventure,220.00
1,Guru Dutt: A Tragedy in Three Acts,Arun Khopkar,"Paperback,– 7 Nov 2012",3.9 out of 5 stars,14 customer reviews,A layered portrait of a troubled genius for wh...,Cinema & Broadcast (Books),"Biographies, Diaries & True Accounts",202.93
2,Leviathan (Penguin Classics),Thomas Hobbes,"Paperback,– 25 Feb 1982",4.8 out of 5 stars,6 customer reviews,"""During the time men live without a common Pow...",International Relations,Humour,299.00
3,A Pocket Full of Rye (Miss Marple),Agatha Christie,"Paperback,– 5 Oct 2017",4.1 out of 5 stars,13 customer reviews,A handful of grain is found in the pocket of a...,Contemporary Fiction (Books),"Crime, Thriller & Mystery",180.00
4,LIFE 70 Years of Extraordinary Photography,Editors of Life,"Hardcover,– 10 Oct 2006",5.0 out of 5 stars,1 customer review,"For seven decades, ""Life"" has been thrilling t...",Photography Textbooks,"Arts, Film & Photography",965.62
...,...,...,...,...,...,...,...,...,...
7792,100 Things Every Designer Needs to Know About ...,Susan Weinschenk,"Paperback,– 14 Apr 2011",5.0 out of 5 stars,4 customer reviews,We design to elicit responses from people. We ...,Design,"Computing, Internet & Digital Media",
7793,"Modern Letter Writing Course: Personal, Busine...",ARUN SAGAR,"Paperback,– 8 May 2013",3.6 out of 5 stars,13 customer reviews,"A 30-day course to write simple, sharp and att...",Children's Reference (Books),"Biographies, Diaries & True Accounts",
7794,The Kite Runner Graphic Novel,Khaled Hosseini,"Paperback,– 6 Sep 2011",4.0 out of 5 stars,5 customer reviews,The perennial bestseller-now available as a se...,Humour (Books),Humour,
7795,Panzer Leader (Penguin World War II Collection),Heinz Guderian,"Paperback,– 22 Sep 2009",3.5 out of 5 stars,3 customer reviews,Heinz Guderian - master of the Blitzkrieg and ...,United States History,"Biographies, Diaries & True Accounts",


In [23]:
LE = LabelEncoder()
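scikit-learn documents `LabelEncoder` as a transformer for target values; for feature columns, `OrdinalEncoder` can encode several columns in one call and can map categories unseen during fitting to a sentinel value. The following is a sketch of that alternative on two tiny, made-up frames (`train_demo`, `test_demo`); it is not what this notebook does, which instead fits on the concatenated train and test data.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Tiny illustrative frames; the values are borrowed from the preview, the split is made up.
train_demo = pd.DataFrame({
    "Author": ["Agatha Christie", "Thomas Hobbes", "Agatha Christie"],
    "BookCategory": ["Crime, Thriller & Mystery", "Humour", "Crime, Thriller & Mystery"],
})
test_demo = pd.DataFrame({
    "Author": ["Maurice Druon", "Thomas Hobbes"],
    "BookCategory": ["Action & Adventure", "Humour"],
})

# Categories not seen while fitting are encoded as -1 instead of raising an error.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_codes = encoder.fit_transform(train_demo)
test_codes = encoder.transform(test_demo)

print(train_codes)
print(test_codes)
```

Fitting only on the training rows keeps the fitted mapping available for later data, at the cost of an explicit "unknown" code.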


In [27]:
# Encode each text column as integer codes. The encoder is refit for every column,
# so only the codes are kept, not the fitted mappings.
text_columns = ['Title', 'Author', 'Edition', 'Reviews', 'Ratings', 'Synopsis', 'Genre', 'BookCategory']
for col in text_columns:
    data[col + '_code'] = LE.fit_transform(data[col])
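Label encoding assigns codes by sorted string order, so for `Reviews` and `Ratings` the codes do not follow the underlying numbers (in the output of this notebook, "8 customer reviews" gets code 324 while "14 customer reviews" gets 57). A possible alternative, sketched here on a few values copied from the preview, is to extract the numbers directly. The column names `Reviews_num` and `Ratings_num` are illustrative, and the comma removal assumes that large counts may be written with thousands separators, which has not been checked against the full data.

```python
import pandas as pd

# Sample values copied from the preview of the training data.
sample = pd.DataFrame({
    "Reviews": ["4.0 out of 5 stars", "3.9 out of 5 stars", "5.0 out of 5 stars"],
    "Ratings": ["8 customer reviews", "14 customer reviews", "1 customer review"],
})

# Star rating: the leading decimal number in "4.0 out of 5 stars".
sample["Reviews_num"] = (
    sample["Reviews"].str.extract(r"^(\d+(?:\.\d+)?)", expand=False).astype(float)
)

# Review count: the leading integer in "8 customer reviews".
sample["Ratings_num"] = (
    sample["Ratings"]
    .str.replace(",", "", regex=False)  # assumed thousands separator
    .str.extract(r"^(\d+)", expand=False)
    .astype(int)
)

print(sample[["Reviews_num", "Ratings_num"]])
```

Unlike the label codes, these values preserve the numeric order, which matters to a linear model in particular.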

In [29]:
data

Unnamed: 0,Title,Author,Edition,Reviews,Ratings,Synopsis,Genre,BookCategory,Price,Title_code,Author_code,Edition_code,Reviews_code,Ratings_code,Synopsis_code,Genre_code,BookCategory_code
0,The Prisoner's Gold (The Hunters 3),Chris Kuzneski,"Paperback,– 10 Mar 2016",4.0 out of 5 stars,8 customer reviews,THE HUNTERS return in their third brilliant no...,Action & Adventure (Books),Action & Adventure,220.00,5803,748,1231,25,324,4580,1,0
1,Guru Dutt: A Tragedy in Three Acts,Arun Khopkar,"Paperback,– 7 Nov 2012",3.9 out of 5 stars,14 customer reviews,A layered portrait of a troubled genius for wh...,Cinema & Broadcast (Books),"Biographies, Diaries & True Accounts",202.93,2120,370,3164,24,57,711,78,2
2,Leviathan (Penguin Classics),Thomas Hobbes,"Paperback,– 25 Feb 1982",4.8 out of 5 stars,6 customer reviews,"""During the time men live without a common Pow...",International Relations,Humour,299.00,2982,4045,2272,33,286,37,202,6
3,A Pocket Full of Rye (Miss Marple),Agatha Christie,"Paperback,– 5 Oct 2017",4.1 out of 5 stars,13 customer reviews,A handful of grain is found in the pocket of a...,Contemporary Fiction (Books),"Crime, Thriller & Mystery",180.00,189,79,3000,26,47,678,96,5
4,LIFE 70 Years of Extraordinary Photography,Editors of Life,"Hardcover,– 10 Oct 2006",5.0 out of 5 stars,1 customer review,"For seven decades, ""Life"" has been thrilling t...",Photography Textbooks,"Arts, Film & Photography",965.62,2853,1138,99,35,0,2228,264,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7792,100 Things Every Designer Needs to Know About ...,Susan Weinschenk,"Paperback,– 14 Apr 2011",5.0 out of 5 stars,4 customer reviews,We design to elicit responses from people. We ...,Design,"Computing, Internet & Digital Media",,16,3935,1438,35,231,6248,103,4
7793,"Modern Letter Writing Course: Personal, Busine...",ARUN SAGAR,"Paperback,– 8 May 2013",3.6 out of 5 stars,13 customer reviews,"A 30-day course to write simple, sharp and att...",Children's Reference (Books),"Biographies, Diaries & True Accounts",,3313,27,3222,21,47,420,72,2
7794,The Kite Runner Graphic Novel,Khaled Hosseini,"Paperback,– 6 Sep 2011",4.0 out of 5 stars,5 customer reviews,The perennial bestseller-now available as a se...,Humour (Books),Humour,,5483,2159,3086,25,264,5366,186,6
7795,Panzer Leader (Penguin World War II Collection),Heinz Guderian,"Paperback,– 22 Sep 2009",3.5 out of 5 stars,3 customer reviews,Heinz Guderian - master of the Blitzkrieg and ...,United States History,"Biographies, Diaries & True Accounts",,3810,1517,2135,20,185,2621,347,2


In [31]:
pp_train = data[data['Price'].notna()]
pp_test = data[data['Price'].isna()]
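Splitting on a missing `Price` works here because every training row has a price (the training summary reports a `Price` count of 6237). If that assumption did not hold, training rows without a price would end up in the test part. A sketch of a split that does not depend on the target, using hypothetical toy frames and the `keys` argument of `pd.concat`:

```python
import pandas as pd

# Toy frames standing in for train_data and test_data (illustrative only).
train_demo = pd.DataFrame({"Title": ["A", "B"], "Price": [220.0, 299.0]})
test_demo = pd.DataFrame({"Title": ["C"]})

# keys= adds an outer index level that records where each row came from.
combined = pd.concat([train_demo, test_demo], keys=["train", "test"])

# Recover the two parts by their origin rather than by missing prices.
train_part = combined.loc["train"]
test_part = combined.loc["test"].drop(columns="Price")

print(train_part)
print(test_part)
```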

In [33]:
pp_train.columns

Index(['Title', 'Author', 'Edition', 'Reviews', 'Ratings', 'Synopsis', 'Genre',
       'BookCategory', 'Price', 'Title_code', 'Author_code', 'Edition_code',
       'Reviews_code', 'Ratings_code', 'Synopsis_code', 'Genre_code',
       'BookCategory_code'],
      dtype='object')

In [55]:
X = pp_train.drop(columns=['Title', 'Author', 'Edition', 'Reviews', 'Ratings', 'Synopsis', 'Genre',
       'BookCategory', 'Price'])
y = pp_train['Price']

In [60]:
X

Unnamed: 0,Title_code,Author_code,Edition_code,Reviews_code,Ratings_code,Synopsis_code,Genre_code,BookCategory_code
0,5803,748,1231,25,324,4580,1,0
1,2120,370,3164,24,57,711,78,2
2,2982,4045,2272,33,286,37,202,6
3,189,79,3000,26,47,678,96,5
4,2853,1138,99,35,0,2228,264,1
...,...,...,...,...,...,...,...,...
6232,2390,4102,3185,35,111,201,17,6
6233,5039,3843,2076,18,350,3024,96,5
6234,5195,2013,3208,23,185,279,294,9
6235,1910,177,1538,20,231,2424,1,0


In [59]:
y

0       220.00
1       202.93
2       299.00
3       180.00
4       965.62
         ...  
6232    322.00
6233    421.00
6234    399.00
6235    319.00
6236    452.00
Name: Price, Length: 6237, dtype: float64

In [75]:
randomForest = RandomForestRegressor(n_estimators = 100, random_state = 0)
linearReg = LinearRegression()

In [76]:
randomForest.fit(X, y) 
linearReg.fit(X,y)


LinearRegression()
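Both models are fit on the full training set, so the notebook has no estimate of how they perform on unseen books before submitting. A minimal sketch of k-fold cross-validation for the same two estimators follows. It runs on synthetic data (`X_demo`, `y_demo` are made up) so that it is self-contained, and it reports RMSE as an assumed metric; the competition's actual scoring metric is not stated in this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the encoded feature matrix and prices (assumption).
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 100, size=(200, 8)).astype(float)
y_demo = 25 + 3.0 * X_demo[:, 0] + rng.normal(0, 5, size=200)

models = {
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
    "LinearRegression": LinearRegression(),
}

# Shuffled 5-fold split; each model is scored on the folds it was not trained on.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
rmse = {}
for name, model in models.items():
    scores = cross_val_score(
        model, X_demo, y_demo, cv=cv, scoring="neg_root_mean_squared_error"
    )
    rmse[name] = -scores.mean()  # scores are negated RMSE values
    print(f"{name}: mean RMSE = {rmse[name]:.2f}")
```

In the notebook itself, `X` and `y` would take the place of `X_demo` and `y_demo`.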

In [77]:
pptest_X = pp_test.drop(columns=['Title', 'Author', 'Edition', 'Reviews', 'Ratings', 'Synopsis', 'Genre',
       'BookCategory', 'Price'])

In [78]:
y_unpred_RF = randomForest.predict(pptest_X)

In [79]:
y_unpred_LR = linearReg.predict(pptest_X)

In [80]:
np.unique(y_unpred_RF, return_counts=True)

(array([  63.4   ,  116.2956,  122.6884, ..., 3317.1796, 3597.9607,
        4227.9294]),
 array([1, 1, 1, ..., 1, 1, 1], dtype=int64))

In [81]:
np.unique(y_unpred_LR, return_counts=True)

(array([142.88731551, 194.8171362 , 203.80000812, ..., 936.13039677,
        943.96112446, 956.86589747]),
 array([1, 1, 1, ..., 1, 1, 1], dtype=int64))

In [84]:
# Write the random forest predictions into the submission template.
sample_data['Price'] = y_unpred_RF
sample_data.to_csv('submission_Bookprice_RF1.csv',index=False)

In [85]:
# Write the linear regression predictions into the submission template.
sample_data['Price'] = y_unpred_LR
sample_data.to_csv('submission_Bookprice_LR1.csv',index=False)