# I Love Computer Science

![](https://thumbs.dreamstime.com/b/computer-science-man-working-holographic-interface-visual-screen-high-quality-hologram-computer-science-man-working-99460738.jpg)

Computer science is the study of algorithmic processes and computational machines. As a discipline, computer science spans a range of topics from theoretical studies of algorithms, computation and information to the practical issues of implementing computing systems in hardware and software.

There are many books on the market that teach computer science, but I would like to learn from the best :)

This dataset holds a list of 270 books on computer science and programming-related topics.
The list was built from many popular websites that provide book ratings, and of all the books on those websites the 270 most popular were selected.

# Importing Packages

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly
import plotly.express as px
import matplotlib.patches as mpatches
print('Seaborn version', sns.__version__)
from wordcloud import WordCloud
from math import pi
plotly.offline.init_notebook_mode (connected = True)

# Having a look at the data

In [None]:
data=pd.read_csv('../input/top-270-rated-computer-science-programing-books/prog_book.csv')
data.head()

# Checking For Null Values

In [None]:
data.isna().sum()

#### From this we can see that our data doesn't have any null values, and that's great :)
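If some values were missing, a per-column summary with percentages would make it easier to judge how serious the gaps are. Below is a minimal sketch of that idea; the helper name `missing_report` and the toy frame are made up for illustration and are not part of the dataset.

```python
import pandas as pd

def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    counts = df.isna().sum()
    percent = (counts / len(df) * 100).round(1)
    return pd.DataFrame({"missing": counts, "percent": percent})

# Toy frame standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "Rating": [4.1, None, 3.9, 4.4],
    "Price": [25.0, 40.0, 31.5, 18.0],
})
report = missing_report(toy)
print(report)
```

On the real data the same call would be `missing_report(data)`.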

# Visualizations

In [None]:
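# Remove the thousands separators from Reviews so the column can be converted to integers
data['Reviews']=data['Reviews'].map(lambda x: x.replace(',',''))
data['Reviews']=pd.to_numeric(data['Reviews'])
# numeric_only=True skips text columns such as Book_title, which newer pandas versions refuse to average
group=data.groupby('Type').mean(numeric_only=True)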
group

In [None]:
# Three subplots sharing a common y axis (the book Type), each with a different x axis:
# Rating, Reviews and Price


fig, ax = plt.subplots(1, 3, figsize=(11,10), sharey=True)

color = sns.color_palette("hls", 6)
# Making plot 1 which is a horizontal bar plot
ax[0].barh(y=group.index, width=group['Rating'].values, color=color, edgecolor='black', height=0.7)
ax[0].set_xlabel('Mean of Ratings')
ax[0].set_yticklabels(group.index, fontweight='semibold')
ax[0].set_title('Ratings')

# Making plot 2 which is a horizontal line plot
ax[1].hlines(y=group.index , xmin=0, xmax=group['Reviews'].values, color=color, linestyles='dashed')
ax[1].plot(group['Reviews'].values, group.index, 'go', markersize=9)
ax[1].set_xlabel('Mean of Reviews')
ax[1].set_yticklabels(group.index, fontweight='semibold')
ax[1].set_title('Reviews')

# Making plot 3 which is a horizontal bar plot
ax[2].barh(y=group.index, width=group['Price'].values, color=color, edgecolor='black', height=0.7)
ax[2].set_xlabel('Mean of Price')
ax[2].set_yticklabels(group.index, fontweight='semibold')
ax[2].set_title('Price')


plt.show()



#### From this we can see that Boxed Set - Hardcover is the category with the highest mean rating, the lowest mean number of reviews, and the highest mean price.

#### Is it worth it?

#### Hardcover, on the other hand, has a slightly lower mean rating but the largest mean number of reviews, which makes its rating easier to rely on, and a lower mean price. I think Hardcover is better than Boxed Set - Hardcover :)
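One way to make "a rating backed by more reviews is more reliable" concrete is a weighted (Bayesian-style) average that pulls ratings with few reviews toward an overall mean. This is a different technique from the plain means used in this notebook, shown only as a sketch: the function name `weighted_rating`, the `prior_weight` of 50, and the toy numbers are all assumptions.

```python
import pandas as pd

def weighted_rating(rating, reviews, prior_mean, prior_weight):
    """Shrink a rating toward prior_mean; more reviews means less shrinkage."""
    return (reviews * rating + prior_weight * prior_mean) / (reviews + prior_weight)

# Made-up numbers, only to show the mechanics
toy = pd.DataFrame(
    {"Rating": [4.5, 4.1], "Reviews": [5.0, 250.0]},
    index=["Few reviews", "Many reviews"],
)
prior_mean = toy["Rating"].mean()   # overall mean rating of the toy data
prior_weight = 50                   # hypothetical: how many reviews the prior counts for
toy["Weighted"] = weighted_rating(toy["Rating"], toy["Reviews"], prior_mean, prior_weight)
print(toy)
```

The row with few reviews moves much further toward the overall mean than the row with many reviews, which matches the intuition in the comparison above.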

# Spider Plot For Price And Reviews For Different Types

In [None]:
# ------- PART 1: Define a function that draws the spider plot for one column of the grouped data
# (the figure itself is created in PART 2, just before the plotting loop)
def make_spider(count, row, title, color):
 
    # number of variable
    categories=group.index
    N = len(categories)

    # What will be the angle of each axis in the plot? (we divide the plot / number of variable)
    angles = [n / float(N) * 2 * pi for n in range(N)]
    angles += angles[:1]

    # Initialise the spider plot
    ax = plt.subplot(2,2,count+1, polar=True, )

    # If you want the first axis to be on top:
    ax.set_theta_offset(pi / 2)
    ax.set_theta_direction(-1)

    # Draw one axis per variable and add the category labels
    plt.xticks(angles[:-1], categories, color='grey', size=8)

    # Draw ylabels
    ax.set_rlabel_position(0)
    p=np.arange(0,group[row].max(),100,int)
    plt.yticks(p, [str(i) for i in p], color="grey", size=7)
    plt.ylim(0,400)

    # Ind1
    values=group[row].values.flatten().tolist()
    values += values[:1]
    ax.plot(angles, values, color=color, linewidth=2, linestyle='solid')
    ax.fill(angles, values, color=color, alpha=0.4)

    # Add a title
    plt.title(title, size=11, color=color, y=1.1)

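# ------- PART 2: Apply to all individuals
# Initialize the figure (at module level, so every subplot lands on the same figure)
my_dpi=96
plt.figure(figsize=(1000/my_dpi, 1000/my_dpi), dpi=my_dpi)

# Create a color palette (plt.get_cmap replaces plt.cm.get_cmap, which newer matplotlib versions removed):
my_palette = plt.get_cmap("Set2", len(group.index))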

# Loop to plot
count=0
for row in group.columns[1:]:
    make_spider( count,row=row, title='The Plot is for '+row, color=my_palette(count))
    count+=1
plt.show()


This graph shows the same per-type means as the bar plots above, this time as spider plots, so we can see which book types each measure leans towards.

# Relationship between number of pages and price

In [None]:
# Jointplot of the number of pages against the price of the book, with a regression fit
# xlim and ylim are tuples: sets are unordered, so they cannot reliably express (min, max)
sns.jointplot(data=data,x='Number_Of_Pages',y='Price',kind='reg',color='orange',xlim=(200,1000),ylim=(0,150),joint_kws={'line_kws':{'color':'green'}})
green_patch=mpatches.Patch(color='green',label='Reg Line')
plt.legend(handles=[green_patch])
plt.show()

From this we can see a roughly linear dependence between price and number of pages: as the number of pages in a book increases, its price tends to increase as well :)
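A regression line on a plot does not say how strong the relationship is, so it can help to put numbers on it. The sketch below computes the Pearson correlation and the least-squares slope with NumPy; the helper name `linear_summary` and the page and price values are made up for illustration.

```python
import numpy as np

def linear_summary(x, y):
    """Return (pearson_r, slope, intercept) for the least-squares line y = slope * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    slope, intercept = np.polyfit(x, y, deg=1)
    return r, slope, intercept

# Made-up page counts and prices, not taken from the dataset
pages = [200, 400, 600, 800, 1000]
prices = [20, 35, 55, 70, 90]
r, slope, intercept = linear_summary(pages, prices)
print(f"r = {r:.3f}, slope = {slope:.4f} per page, intercept = {intercept:.2f}")
```

On the real data the call would be `linear_summary(data['Number_Of_Pages'], data['Price'])`, and the same helper could be reused for pages against ratings.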

# Relationship between number of pages and Ratings

In [None]:
#Lmplot between number of pages and Ratings 
sns.lmplot(data=data,x='Number_Of_Pages',y='Rating')

This graph shows the fitted linear relationship between the number of pages and the rating.

# Common Findings in the name of the book :)

In [None]:
# Making a new data column with the length of the name of the books
data['Length']=data['Book_title'].str.len()

In [None]:
data['Length']=pd.to_numeric(data['Length'])

In [None]:
sns.kdeplot(data['Length'])

We can see that most of the titles are between 25 and 50 characters long.
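Reading a range off a density plot is approximate, so a quick check of the share of titles inside that range can back it up. This is a minimal sketch; the helper name `share_in_range` and the lengths below are made up and do not come from the dataset.

```python
import pandas as pd

def share_in_range(values: pd.Series, low: int, high: int) -> float:
    """Fraction of values with low <= value <= high (both ends inclusive)."""
    return float(values.between(low, high).mean())

# Made-up title lengths, only to show the calculation
lengths = pd.Series([12, 28, 33, 47, 61])
share = share_in_range(lengths, 25, 50)
print(f"{share:.0%} of the titles are 25 to 50 characters long")
```

On the real data the call would be `share_in_range(data['Length'], 25, 50)`.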

# Word Cloud For Books With Higher Than 4 Rating

In [None]:
fig,ax= plt.subplots(1, 2, figsize=(18,8), sharey=True)

text =''
for i in data[data['Rating']>=4.0]['Description'].values:
    text+=i + ' '

wordcloud = WordCloud().generate(text)

# Display the generated image:
ax[0].imshow(wordcloud, interpolation='bilinear')
ax[0].axis("off")
ax[0].set_title('Word Cloud for ratings of 4 or more',fontsize=15)
text2 =''
for i in data[data['Rating']<4.0]['Description'].values:
    text2+=i + ' '

wordcloud2 = WordCloud().generate(text2)
ax[1].imshow(wordcloud2, interpolation='bilinear')
ax[1].axis("off")
ax[1].set_title('Word Cloud for ratings less than 4',fontsize=15)
plt.show()

From this we can see the difference between the two! Descriptions of books rated 4 or above are dominated by words like "C" and "programming", while descriptions of books rated below 4 are dominated by words like "new", "book" and "language".
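Word clouds are easy to read but hard to compare precisely, so counting the most frequent words in each group gives a more direct comparison. The sketch below uses only the standard library; the helper name `top_words`, the small stopword list, and the sample descriptions are assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real analysis would use a fuller one
STOPWORDS = frozenset({"the", "a", "of", "and", "to", "in", "for", "this"})

def top_words(texts, n=3):
    """Return the n most common non-stopword tokens across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep letters plus '+' and '#' so tokens like "c++" or "c#" survive
        words = re.findall(r"[a-z+#]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

# Made-up descriptions, not taken from the dataset
high = ["The C programming language", "Programming in C for systems"]
low = ["A new book for this language", "This new book"]
high_top = top_words(high, 2)
low_top = top_words(low, 2)
print(high_top)
print(low_top)
```

On the real data the inputs would be `data[data['Rating']>=4.0]['Description']` and `data[data['Rating']<4.0]['Description']`.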