# A Study of College Students' Extracurricular Reading Preferences Based on Library Borrowing Data

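## Abstract
Library data contain the reading records of many students, and these records reflect how students acquire general knowledge. The aim of this study is to mine library borrowing data in depth, examining different book categories and attributes, in order to predict students' extracurricular interests.
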
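## 1. Introduction
Extracurricular learning is an important part of how college students acquire knowledge, and it strongly influences their graduation, job search, and future development. Beyond coursework in their majors, students' extracurricular learning behavior has increasingly become a concern of university education management. Understanding students' extracurricular learning preferences helps predict their reading habits and status early, and thus helps prevent the adverse effects of one-sided learning.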

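As a place where students study, the campus library is the main source of data for research on students' extracurricular reading behavior. Typically, every university library runs a book circulation system, a dedicated module that completely records students' personal information and borrowing history. Analyzing these records makes it possible to investigate students' study and reading behavior. Students from many different majors visit the library to consult a wide variety of academic and extracurricular reading materials, so the library accumulates a large volume of borrowing and return data, and staff need to perform descriptive statistical analysis on these data regularly.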
## 2. Import the Required Libraries

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statistics as st

## Import the Dataset

In [2]:
df=pd.read_csv('Reading Habbit Of Students.csv')

## About the Dataset
### Data Preview

In [3]:
df.head()

Unnamed: 0.1,Unnamed: 0,gender,faculty,Enter Your Location,kind of books preffered for study,How Frequently do you visit library,For what Purposes do yo visit library,Average Time spent in collage,What is general Purposes,Which one is your Prefered location,...,Dose Covid 19 Pandemic Affected Your Reading Habits,Do you purchase Books from store,Average Expenditure on books,Occupation Of Father,Parents Education,Select your Faculty,Enter your Location,Preferred Language for Learning,Do you Using National dig,Occupation of Father
0,0,Female,Arts,Urban,Text Books,Once in a week,For Reading Novels and,s 2-4 hours,To while away time,Home,...,No - Not Affected,Yes,Less Than Rs. 500,Farmer,Educated,Science,Rural,English,No,Farmer
1,1,Female,Commerce,Urban,Lecture Videos,Once in a month,For Reading Novels and,s Less than an hour,To pass the examination,Class Room,...,Yes - Positively Affected,Yes,Rs. 500 to Rs. 2000,Job,Educated,Science,Rural,English,No,Farmer
2,2,Female,Science,Rural,Reference Books,Rarely,For Reading Novels and,s Less than an hour,To get the knowledge and,Home,...,No - Not Affected,No,Less Than Rs. 500,Business,Educated,Science,Rural,English,No,Farmer
3,3,Male,other,Urban,Text Books,Once in a week,For Reading Novels and,s Less than an hour,To get the knowledge and,Central Library,...,No - Not Affected,Yes,Rs. 500 to Rs. 2000,Business,Educated,Science,Rural,English,No,Farmer
4,4,Male,Science,Rural,Lecture Videos,2-3 times in a week,To Complete Assignment,Less than an hour,To get the knowledge and,Home,...,No - Not Affected,Yes,Rs. 500 to Rs. 2000,Farmer,Educated,Science,Rural,English,No,Farmer


### Data Information

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 228 entries, 0 to 227
Data columns (total 30 columns):
Unnamed: 0                                             228 non-null int64
gender                                                 228 non-null object
faculty                                                228 non-null object
Enter Your Location                                    228 non-null object
kind of books preffered for study                      228 non-null object
How Frequently do you visit library                    228 non-null object
For what Purposes do yo visit library                  228 non-null object
Average Time spent in collage                          228 non-null object
What is general Purposes                               228 non-null object
Which one is your Prefered location                    228 non-null object
What is your preferred time?                           228 non-null object
Preferred language for Learning                        228 non-null object

### Drop Unneeded Columns

In [5]:
# Drop the leftover index column and the extra survey columns not used in this analysis
df = df.drop(['Unnamed: 0','Select your Faculty','Occupation of Father',"Enter your Location","Preferred Language for Learning","Do you Using National dig"], axis=1)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 228 entries, 0 to 227
Data columns (total 24 columns):
gender                                                 228 non-null object
faculty                                                228 non-null object
Enter Your Location                                    228 non-null object
kind of books preffered for study                      228 non-null object
How Frequently do you visit library                    228 non-null object
For what Purposes do yo visit library                  228 non-null object
Average Time spent in collage                          228 non-null object
What is general Purposes                               228 non-null object
Which one is your Prefered location                    228 non-null object
What is your preferred time?                           228 non-null object
Preferred language for Learning                        228 non-null object
Preferred type for reading                             228 non-null object

### Rename Columns

In [7]:
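# Rename the faculty column.
# Note: 'Select your Faculty' was already dropped above and a 'faculty' column
# already exists; the key below also differs in case, so it matches nothing.
# rename() ignores unknown keys by default, so this line leaves df unchanged.
df.rename(columns={'Select your faculty': 'faculty'}, inplace=True)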
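`DataFrame.rename` ignores mapping keys that are not present in the frame (its default is `errors='ignore'`), so a misspelled or already-dropped column name fails silently. A minimal sketch, using a small hypothetical frame `toy` rather than the survey data, showing how passing `errors='raise'` turns that silent no-op into a `KeyError`:

```python
import pandas as pd

# Hypothetical toy frame standing in for the survey data
toy = pd.DataFrame({"faculty": ["Arts", "Science"], "gender": ["Female", "Male"]})

# Default behaviour: the unknown key is ignored and the columns are unchanged
unchanged = toy.rename(columns={"Select your faculty": "faculty"})
print(list(unchanged.columns))  # → ['faculty', 'gender']

# errors="raise" reports the missing label instead of ignoring it
try:
    toy.rename(columns={"Select your faculty": "faculty"}, errors="raise")
except KeyError as exc:
    print("rename failed:", exc)
```

Using `errors="raise"` while cleaning column names is a cheap way to catch typos such as the capitalisation mismatch in the cell above.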

### Dataset Overview Plot

In [8]:
# Count the answers to the library-visit question per faculty (one answer per student)
grouped = df.groupby("faculty")[["How Frequently do you visit library"]].count()

# Plot the bar chart
ax = grouped.plot(kind="bar", legend=False, figsize=(10, 5))
plt.xlabel("Faculty")
plt.ylabel("Number of students")
plt.title("Faculty-wise library visits")

# Add the actual counts on top of the bars
for idx, value in enumerate(grouped["How Frequently do you visit library"]):
    ax.text(idx, value, str(value), ha='center', va='bottom')

plt.show()

# Faculty-wise Reading Habits

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Number of survey respondents in each faculty
sns.countplot(x='faculty', data=df)
plt.title('Number of Respondents per Faculty')
plt.xlabel('Faculty')
plt.ylabel('Number of students')

plt.show()


# Preview the Dataset

In [None]:
df.head()

# All Columns

In [None]:
df.columns

# Dataset Information

In [None]:
df.info()

# OBJECTIVES

# Objective 1
To find the reading habits of different faculties

In [None]:
"""To find the reading habits of different faculties, you can use the "groupby" 
function in pandas to group the data by faculty and then analyze the reading 
habits for each group."""

In [None]:
import pandas as pd


# Group the data by faculty
grouped = df.groupby('faculty')

# Get the count of each value for the "Do you enjoy the Reading" column for each faculty
reading_habits = grouped['Do you enjoy the Reading'].value_counts()

# Print the results
print(reading_habits)
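Raw counts are hard to compare across faculties that have different numbers of respondents. A minimal sketch that converts the counts into within-faculty shares with `value_counts(normalize=True)`; the column names come from the notebook, but the toy rows below are hypothetical and only stand in for `df`:

```python
import pandas as pd

# Hypothetical toy rows using the notebook's column names
toy = pd.DataFrame({
    "faculty": ["Arts", "Arts", "Arts", "Science", "Science"],
    "Do you enjoy the Reading": ["Yes", "Yes", "No", "Yes", "No"],
})

# Share of each answer within each faculty; every row sums to 1
shares = (
    toy.groupby("faculty")["Do you enjoy the Reading"]
    .value_counts(normalize=True)
    .unstack(fill_value=0)
)
print(shares.round(2))
```

Replacing `toy` with `df` gives, for each faculty, the proportion of students who answered each option, which can be compared directly even when one faculty is much larger than another.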


In [None]:
import pandas as pd
import seaborn as sns



# Group the data by faculty
grouped = df.groupby('faculty')

# Get the count of each value for the "Do you enjoy the Reading" column for each faculty
reading_habits = grouped['Do you enjoy the Reading'].value_counts().reset_index(name='count')

# Create a bar plot using Seaborn
sns.catplot(x='faculty', y='count', hue='Do you enjoy the Reading', data=reading_habits, kind='bar')

# Display the plot
plt.show()


# Objective 2
To find the association between parents' education and students' frequency of reading (measured here by how often students visit the library).

In [None]:
"""To find the association between parents' education and the frequency of 
reading of students, we can create a contingency table and perform 
a chi-square test of independence."""

In [None]:
import pandas as pd
from scipy.stats import chi2_contingency


# Create a contingency table for parents education and frequency of reading
contingency_table = pd.crosstab(df['Parents Education'], df['How Frequently do you visit library'])

# Perform chi-square test of independence
chi2, p, dof, expected = chi2_contingency(contingency_table)

# Print the results
print("Chi-square value:", chi2)
print("p-value:", p)
print("Degrees of freedom:", dof)
print("Expected values:", expected)
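Two caveats apply to the test above. A common rule of thumb is that the chi-square approximation is reliable only when most expected cell counts are at least 5, which can fail when 228 respondents are spread over many answer categories; and the p-value says nothing about how strong the association is. A minimal sketch that reports the share of cells with expected count below 5 together with Cramér's V, V = sqrt(chi2 / (n * (min(r, c) - 1))), where n is the total count and r, c are the table dimensions. The helper name `cramers_v_report` is hypothetical and the toy table is illustrative, not survey data:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v_report(table: pd.DataFrame) -> dict:
    """Chi-square test plus an expected-count check and Cramér's V (hypothetical helper)."""
    # correction=False so V is computed from the uncorrected statistic
    # (Yates' correction would only change 2x2 tables anyway)
    chi2, p, dof, expected = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    r, c = table.shape
    return {
        "chi2": chi2,
        "p": p,
        "dof": dof,
        # Fraction of cells whose expected count is below 5
        "share_expected_below_5": float((expected < 5).mean()),
        # Cramér's V: 0 means no association, 1 means perfect association
        "cramers_v": float(np.sqrt(chi2 / (n * (min(r, c) - 1)))),
    }


# Toy 2x3 table with illustrative counts only
toy_table = pd.DataFrame(
    [[20, 10, 5], [5, 10, 20]],
    index=["Educated", "Uneducated"],
    columns=["Rarely", "Monthly", "Weekly"],
)
print(cramers_v_report(toy_table))
```

Passing the `contingency_table` built above instead of `toy_table` would show whether the survey table meets the expected-count guideline and how large the association is.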


# Objective 3
To find the association of reading frequency between rural and urban areas.

In [None]:
"""To find the association of frequency of reading between rural and urban areas, 
we can first group the data by location (rural or urban) and then compare the
frequency of reading between the two groups.

"""

In [None]:
import pandas as pd
import seaborn as sns



# Group the data by location (rural or urban)
grouped = df.groupby('Enter Your Location')

# Create a new dataframe with the frequency of reading for each location
reading_freq = grouped['How Frequently do you visit library'].value_counts().unstack().fillna(0)

# Create a stacked bar chart to compare the frequency of reading between rural and urban areas
sns.set_style("whitegrid")
ax = reading_freq.plot(kind='bar', stacked=True)
ax.set_xlabel("Location")
ax.set_ylabel("Number of students")
ax.set_title("Association between Frequency of Reading and Location")
ax.legend(title="Library visit frequency")
plt.show()
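Because the rural and urban groups need not be the same size, stacked absolute counts can hide differences in how visit frequencies are distributed. A minimal sketch of a 100% stacked bar chart built from `pd.crosstab(..., normalize='index')`, so each location's bar sums to 1; the toy rows are hypothetical stand-ins for `df`:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy rows using the notebook's column names
toy = pd.DataFrame({
    "Enter Your Location": ["Rural", "Rural", "Rural", "Urban", "Urban"],
    "How Frequently do you visit library": [
        "Rarely", "Rarely", "Once in a week", "Once in a week", "Rarely",
    ],
})

# Row-normalised table: share of each visit frequency within each location
props = pd.crosstab(
    toy["Enter Your Location"],
    toy["How Frequently do you visit library"],
    normalize="index",
)

ax = props.plot(kind="bar", stacked=True)
ax.set_xlabel("Location")
ax.set_ylabel("Share of students")
ax.set_title("Library visit frequency by location (row-normalised)")
plt.tight_layout()
```

In a notebook the figure is displayed automatically; in a plain script, call `plt.show()` afterwards.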


# Objective 4

To find the association of reading frequency between male and female students.


In [None]:
"""To find the association between reading frequency and gender, 
we can use a chi-square test of independence"""

In [None]:
import pandas as pd
from scipy.stats import chi2_contingency



# Create a contingency table of reading frequency and gender
cont_table = pd.crosstab(df['How Frequently do you visit library'], df['gender'])


# Perform chi-square test of independence
chi2, p_value, dof, expected = chi2_contingency(cont_table)

# Print the results
print("Chi-square test statistic: ", chi2)
print("P-value: ", p_value)
print("Degrees of freedom: ", dof)
print("Expected frequencies: ")
print(expected)


In [None]:
"""The null hypothesis for the chi-square test of independence is that there is
no association between the two variables (reading frequency and gender). 
If the p-value is less than the significance level (usually 0.05), 
we can reject the null hypothesis and conclude that there is a 
significant association between the variables."""
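The decision rule described above can be wrapped in a small reusable function so the same test is applied consistently to each pair of survey questions. A minimal sketch; the function name `reject_independence` and the default `alpha=0.05` are assumptions, and the two toy frames only illustrate the two possible outcomes:

```python
import pandas as pd
from scipy.stats import chi2_contingency


def reject_independence(data: pd.DataFrame, row: str, col: str, alpha: float = 0.05) -> bool:
    """Return True if H0 (no association between `row` and `col`) is rejected at `alpha`."""
    table = pd.crosstab(data[row], data[col])
    _, p_value, _, _ = chi2_contingency(table)
    return bool(p_value < alpha)


# Perfectly associated toy data: every A is x, every B is y
associated = pd.DataFrame({"g": ["A"] * 50 + ["B"] * 50, "f": ["x"] * 50 + ["y"] * 50})
# Balanced toy data: both groups split 50/50, so observed equals expected
balanced = pd.DataFrame({"g": ["A"] * 50 + ["B"] * 50, "f": (["x"] * 25 + ["y"] * 25) * 2})

print(reject_independence(associated, "g", "f"))  # → True
print(reject_independence(balanced, "g", "f"))  # → False
```

For example, `reject_independence(df, "Enter Your Location", "How Frequently do you visit library")` would apply the same test to Objective 3.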

# Pie charts

In [None]:

for col in df.columns:
    plt.figure(figsize=(5,5))
    df[col].value_counts().plot.pie(autopct='%1.1f%%')
    plt.title(col + " Distribution")
    plt.show()

# Bar plots

In [None]:

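for col in df.columns:
    plt.figure(figsize=(5,5))
    sns.countplot(x=col, data=df)
    plt.title(col + " Distribution")
    # Rotate the category labels so long survey answers do not overlap
    plt.xticks(rotation=45, ha='right')
    plt.show()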

# Histogram

In [None]:
for col in df.columns:
    plt.figure(figsize=(5,5))
    sns.histplot(x=col, data=df)
    plt.title(col + " Distribution")
    plt.show()

# Displot

In [None]:

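for col in df.columns:
    # displot is a figure-level function that creates its own figure,
    # so size it with `height` instead of calling plt.figure() first
    g = sns.displot(x=col, data=df, height=5)
    g.set(title=col + " Distribution")
    plt.show()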
