# Student Marks Prediction with Machine Learning

Student marks prediction is a popular data science case study and a regression problem. It suits data science beginners because it is easy to solve and understand. In this article, I will take you through the task of student marks prediction with machine learning using Python.

#### Student Marks Prediction (Case Study)

The dataset contains the following information about students:

1. the number of courses they have opted for
2. the average time studied per day by students
3. marks obtained by students

By using this information, you need to predict the marks of other students.

#### Let’s start with this task by importing the necessary Python libraries and the dataset:

In [3]:
import numpy as np
import pandas as pd
import plotly.express as px
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

In [7]:
df = pd.read_csv('Student_Marks.csv')
df.head(10)

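|   | number_courses | time_study | Marks |
|---|---|---|---|
| 0 | 3 | 4.508 | 19.202 |
| 1 | 4 | 0.096 | 7.734 |
| 2 | 4 | 3.133 | 13.811 |
| 3 | 6 | 7.909 | 53.018 |
| 4 | 8 | 7.811 | 55.299 |
| 5 | 6 | 3.211 | 17.822 |
| 6 | 3 | 6.063 | 29.889 |
| 7 | 5 | 3.413 | 17.264 |
| 8 | 4 | 4.41 | 20.348 |
| 9 | 3 | 6.173 | 30.862 |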


In [9]:
df.shape

(100, 3)

In [10]:
df.index

RangeIndex(start=0, stop=100, step=1)

So the dataset has 100 rows and only three columns. The Marks column is the target column, as we have to predict the marks of a student.

In [11]:
df.columns

Index(['number_courses', 'time_study', 'Marks'], dtype='object')

In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   number_courses  100 non-null    int64  
 1   time_study      100 non-null    float64
 2   Marks           100 non-null    float64
dtypes: float64(2), int64(1)
memory usage: 2.5 KB


In [13]:
df.memory_usage()

Index             128
number_courses    800
time_study        800
Marks             800
dtype: int64

#### Now before moving forward, let’s have a look at whether this dataset contains any null values or not:

In [8]:
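# Count of missing values per column, and the same count as a percentage of all rows
pd.DataFrame({'Count': df.isna().sum(),
              'Percentage': df.isna().sum() / len(df) * 100})

|   | Count | Percentage |
|---|---|---|
| number_courses | 0 | 0.0 |
| time_study | 0 | 0.0 |
| Marks | 0 | 0.0 |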


The dataset is ready to use because there are no null values in the data. One column records the number of courses each student has chosen. Let’s look at how often each value occurs in this column:

In [20]:
df['number_courses'].value_counts()

3    22
4    21
6    16
8    16
7    15
5    10
Name: number_courses, dtype: int64

So students have chosen a minimum of three and a maximum of eight courses. Let’s have a look at a scatter plot to see whether the number of courses affects the marks of a student:

In [22]:
figure = px.scatter(data_frame = df, x = 'number_courses',
                    y = 'Marks', size = 'time_study',
                    title = "Number of Courses and Marks Scored")
figure.show()

According to the above data visualization, we can say that the number of courses may not affect the marks of a student if the student is studying for more time daily. So let’s have a look at the relationship between the time a student studies daily and the marks scored by the student:

In [26]:
figure = px.scatter(data_frame = df, x = 'time_study',
                    y = 'Marks', size = 'number_courses', trendline = 'ols',
                    title = "Time Studied and Marks Scored")
figure.show()

You can see that there is a linear relationship between the time studied and the marks obtained. This means the more time students spend studying, the better they can score.
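To make the idea of a linear trend concrete, here is a minimal sketch that fits an ordinary least-squares line with NumPy. It uses only the ten rows shown in the `df.head(10)` output above, so the slope and intercept it prints are illustrative and will not match a fit on all 100 rows.

```python
import numpy as np

# First ten rows of the dataset, copied from the df.head(10) output above.
time_study = np.array([4.508, 0.096, 3.133, 7.909, 7.811,
                       3.211, 6.063, 3.413, 4.410, 6.173])
marks = np.array([19.202, 7.734, 13.811, 53.018, 55.299,
                  17.822, 29.889, 17.264, 20.348, 30.862])

# Fit marks = slope * time_study + intercept by ordinary least squares.
# np.polyfit returns coefficients from the highest power down.
slope, intercept = np.polyfit(time_study, marks, deg=1)

# Residuals measure how far each observed mark lies from the fitted line.
fitted = slope * time_study + intercept
residuals = marks - fitted

print(f"slope: {slope:.3f} marks per extra hour studied per day")
print(f"intercept: {intercept:.3f}")
print(f"largest absolute residual: {np.abs(residuals).max():.3f}")
```

A positive slope matches what the plot suggests: on this sample, more daily study time goes with higher marks.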

#### Now let’s have a look at the correlation between the marks scored by the students and the other two columns in the data:

In [27]:
correlation = df.corr()
print(correlation['Marks'].sort_values(ascending = False))

Marks             1.000000
time_study        0.942254
number_courses    0.417335
Name: Marks, dtype: float64
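By default, `df.corr()` computes the Pearson correlation coefficient. As a rough sketch of what that number means, the snippet below computes it by hand for `time_study` and `Marks` and compares it with `np.corrcoef`. It uses only the ten rows from the `df.head(10)` output, so the value will differ from the 0.942254 reported for the full dataset. The helper name `pearson` is my own, not part of any library.

```python
import numpy as np

# First ten rows of the dataset, copied from the df.head(10) output above.
time_study = np.array([4.508, 0.096, 3.133, 7.909, 7.811,
                       3.211, 6.063, 3.413, 4.410, 6.173])
marks = np.array([19.202, 7.734, 13.811, 53.018, 55.299,
                  17.822, 29.889, 17.264, 20.348, 30.862])


def pearson(a, b):
    """Pearson correlation: co-variation of a and b, scaled by their spreads."""
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    numerator = (a_centered * b_centered).sum()
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return numerator / denominator


r_manual = pearson(time_study, marks)
r_numpy = np.corrcoef(time_study, marks)[0, 1]

print(f"manual: {r_manual:.6f}, numpy: {r_numpy:.6f}")
```

A value close to 1 means the two columns rise together almost along a straight line, which is why `time_study` is the stronger predictor of the two features.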


### Student Marks Prediction Model

Now let’s move to the task of training a machine learning model for predicting the marks of a student. Here, I will first start by splitting the data into training and test sets:

In [28]:
x = np.array(df[['time_study','number_courses']])
y = np.array(df['Marks'])

x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.2,random_state = 42)
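If the arguments to `train_test_split` are unfamiliar, the following sketch may help. It uses placeholder arrays with the same shape as the data here (100 rows, two features) rather than the real dataset, and shows that `test_size = 0.2` holds out 20 rows while `random_state` makes the shuffle repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data shaped like the article's features and target:
# 100 rows, two feature columns. The values are arbitrary, not the real dataset.
x = np.arange(200, dtype=float).reshape(100, 2)
y = np.arange(100, dtype=float)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)
print(x_train.shape, x_test.shape)  # (80, 2) (20, 2)

# Repeating the call with the same random_state gives the same split.
_, x_test_again, _, _ = train_test_split(x, y, test_size=0.2, random_state=42)
same_split = np.array_equal(x_test, x_test_again)

# Features and targets are shuffled together, so rows stay paired.
rows_aligned = np.array_equal(x_test[:, 0], 2 * y_test)
print(same_split, rows_aligned)
```

Fixing `random_state` matters when you want the score in the next step to be reproducible from run to run.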


#### Now I will train a machine learning model using the linear regression algorithm:

In [29]:
model = LinearRegression()
model.fit(x_train, y_train)
model.score(x_test, y_test)

0.9459936100591211
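For a regressor, `model.score` returns the coefficient of determination (R²), so the value above means the model explains roughly 95% of the variance in the test marks. Here is a minimal sketch of that calculation done by hand. It assumes only the ten rows from the `df.head(10)` output and scores the model on the same rows it was trained on, so it illustrates the formula and is not a substitute for the held-out test score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# First ten rows of the dataset, copied from the df.head(10) output above.
time_study = np.array([4.508, 0.096, 3.133, 7.909, 7.811,
                       3.211, 6.063, 3.413, 4.410, 6.173])
number_courses = np.array([3, 4, 4, 6, 8, 6, 3, 5, 4, 3])
marks = np.array([19.202, 7.734, 13.811, 53.018, 55.299,
                  17.822, 29.889, 17.264, 20.348, 30.862])

# Same feature order as the article: [time_study, number_courses].
x = np.column_stack([time_study, number_courses])
y = marks

model = LinearRegression().fit(x, y)
predictions = model.predict(x)

# R^2 = 1 - (residual sum of squares) / (total sum of squares).
ss_res = ((y - predictions) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r2_manual = 1 - ss_res / ss_tot

r2_sklearn = model.score(x, y)
print(f"manual R^2: {r2_manual:.6f}, model.score: {r2_sklearn:.6f}")
```

An R² of 1 would mean perfect predictions, while 0 would mean the model does no better than always predicting the mean mark.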

Now let’s try this machine learning model by giving it inputs for the two features used to train it and predicting the marks of a student:

In [30]:
# Input order must match training: [time_study, number_courses]
features = np.array([[4.508, 3]])
model.predict(features)

array([22.30738483])

In [34]:
# Input order must match training: [time_study, number_courses]
features = np.array([[7.909, 6]])
model.predict(features)

array([45.50476836])

In [36]:
# Input order must match training: [time_study, number_courses]
features = np.array([[10.607, 1]])
model.predict(features)

array([50.09533296])
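A linear regression prediction is just the intercept plus a weighted sum of the inputs. The sketch below reproduces `model.predict` by hand from `model.intercept_` and `model.coef_`. It fits on only the ten rows from the `df.head(10)` output, so its coefficients and predictions are illustrative and will differ from the ones above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# First ten rows of the dataset, copied from the df.head(10) output above.
time_study = np.array([4.508, 0.096, 3.133, 7.909, 7.811,
                       3.211, 6.063, 3.413, 4.410, 6.173])
number_courses = np.array([3, 4, 4, 6, 8, 6, 3, 5, 4, 3])
marks = np.array([19.202, 7.734, 13.811, 53.018, 55.299,
                  17.822, 29.889, 17.264, 20.348, 30.862])

# Same feature order as the article: [time_study, number_courses].
x = np.column_stack([time_study, number_courses])
model = LinearRegression().fit(x, marks)

features = np.array([[4.508, 3]])

# Prediction as computed by scikit-learn.
by_model = model.predict(features)[0]

# The same prediction written out: intercept + coef . features.
by_hand = model.intercept_ + features[0] @ model.coef_

print(f"coefficients [time_study, number_courses]: {model.coef_}")
print(f"intercept: {model.intercept_:.3f}")
print(f"predict: {by_model:.3f}, by hand: {by_hand:.3f}")
```

Because the model is a straight-line formula, it will return a number for any input, including ones unlike the training data. The last example above asks about a student with one course, which lies outside the three-to-eight range seen in the dataset, so that prediction is an extrapolation.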

#### Summary
So this is how you can solve the problem of student marks prediction with machine learning using Python. It is a good regression problem for data science beginners as it is easy to solve and understand.