### Cell 1: Importing all the necessary Libraries

1. **Tkinter**: Tkinter is the standard GUI (Graphical User Interface) toolkit for Python.
2. **Matplotlib**: Matplotlib is a comprehensive plotting library for Python, providing a wide range of high-quality 2D plotting capabilities.
3. **Pandas**: Pandas is a data manipulation and analysis library for Python, offering data structures and functions for handling structured data.
4. **Scikit-learn**: Scikit-learn is a machine learning library for Python, providing tools for building and evaluating machine learning models. Only its `LabelEncoder` is used here.
5. **NumPy**: NumPy provides the array type and numerical routines used in the machine learning section.

In [44]:
import tkinter as tk
from tkinter import ttk, messagebox
import matplotlib.pyplot as plt
from matplotlib.figure import Figure
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
import pandas as pd
from sklearn.preprocessing import LabelEncoder
import numpy as np

### Cell 2: Data Preprocessing
This cell defines a series of classes to handle data preprocessing and analysis tasks for a dataset containing student information. The classes are structured using object-oriented programming (OOP) principles to facilitate code organization, reusability, and modularity.

#### Type of OOP Concepts Used:
- **Inheritance**: The code uses inheritance to create specialized classes (`BasicDataProcessor` and `DataWrangling`) that inherit attributes and methods from more general classes (`DataProcessor` and `BasicDataProcessor`, respectively). This promotes code reuse and models the "is-a" relationship between classes.

- **Multilevel Inheritance**: The classes form a chain. `DataProcessor` is extended by `BasicDataProcessor`, which is in turn extended by `DataWrangling`.

- **Encapsulation**: Each class encapsulates specific functionality related to data processing, such as loading data (`loadData()`), handling missing values (`handleMissingValues()`), and generating visualizations (`pieChart()`). Encapsulation organizes related code into cohesive units and hides internal implementation details from external users.
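
A minimal sketch of the multilevel inheritance chain described above. The classes `Loader`, `Cleaner`, and `Wrangler` are hypothetical stand-ins for the notebook's classes, chosen so the example runs without a CSV file:

```python
# Hypothetical stand-ins for DataProcessor -> BasicDataProcessor -> DataWrangling.
class Loader:
    def __init__(self, rows):
        self.rows = list(rows)   # state owned by the base class

    def load(self):
        return self.rows


class Cleaner(Loader):           # a Cleaner "is a" Loader
    def drop_missing(self):
        self.rows = [r for r in self.rows if r is not None]
        return self.rows


class Wrangler(Cleaner):         # a Wrangler "is a" Cleaner, and therefore a Loader
    def __init__(self, rows):
        super().__init__(rows)   # reaches Loader.__init__ through Cleaner
        self.drop_missing()      # run the inherited step on construction


w = Wrangler(["CSE", None, "ECE"])
print(w.load())                                 # ['CSE', 'ECE']
print([c.__name__ for c in Wrangler.__mro__])   # ['Wrangler', 'Cleaner', 'Loader', 'object']
```

The method resolution order printed at the end shows the chain Python walks when looking up an attribute on the most derived class.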

In [45]:
#This class loads the dataset from the file path into a DataFrame and drops the insignificant columns
#from the DataFrame.
class DataProcessor:
    #Constructor
    def __init__(self, file_path):
        self.file_path = file_path
        self.data = None

    #Method to load data from a given file path
    def loadData(self):
        try:
            self.data = pd.read_csv(self.file_path)
            return self.data
        except FileNotFoundError:
            messagebox.showerror("Error", "File not found!")
            return None
    
    #Method to drop the columns from the dataframe
    def dropColumns(self):
        columnsToDrop =  [0, 1, 2, 10, 12, 14]
        self.data.drop(columns=self.data.columns[columnsToDrop], inplace=True)
        return self.data

  
#This class handles missing values in the 'Gender' column by replacing them with the most frequent
#category, and label-encodes the categorical columns.
class BasicDataProcessor(DataProcessor):
    #Invokes the constructor of the parent class
    def __init__(self, file_path):
        super().__init__(file_path)

    #Replacing the missing values with the most frequent category
    def handleMissingValues(self):
        most_frequent_category = self.data['Gender'].mode()[0]
        self.data['Gender'] = self.data['Gender'].fillna(most_frequent_category)
        return self.data

    #Converting categorical variables into numerical using label encoding
    def labelEncoder(self):
        label_encoder = LabelEncoder()
        self.data['Gender_numeric'] = label_encoder.fit_transform(self.data['Gender'])
        self.data['Current_Year_of_Study_Numeric'] = label_encoder.fit_transform(self.data['Current Year of Study:'])
        self.data['Branch_of_Study_Numeric'] = label_encoder.fit_transform(self.data['Branch of Study: '])
        self.data['Average_Hours_Numeric'] = label_encoder.fit_transform(self.data['Average Hours of Daily Study;'])
        self.data['Extra-curricular_Numeric'] = label_encoder.fit_transform(self.data['Participation in Extra-curricular Activities: '])
        self.data['Study_Ambiance_Preferences_Numeric'] = label_encoder.fit_transform(self.data['Studying Ambiance Preferences'])

        return self.data
    

#This class runs the full preprocessing pipeline on construction and provides the overall
#branch pie chart and the per-branch subsets used for segmentation.
class DataWrangling(BasicDataProcessor):
    #Invoking the constructor of the parent class and calling the methods which are performing data-preprocessing
    def __init__(self, file_path):
        super().__init__(file_path)
        self.loadData()
        self.dropColumns()
        self.handleMissingValues()
        self.labelEncoder()
        
    #Plotting the pie chart to show the distribution of the values present in the dataframe
    def pieChart(self, frame):
        branch_counts = self.data['Branch of Study: '].value_counts()

        fig = Figure(figsize=(12, 6))
        ax = fig.add_subplot(111)
        
        ax.pie(branch_counts, labels=branch_counts.index, autopct='%1.1f%%', startangle=140)
        ax.set_title('Distribution of Branch of Study')
        ax.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

        # Display the pie chart on the GUI frame
        canvas = FigureCanvasTkAgg(fig, master=frame)
        canvas.draw()
        canvas.get_tk_widget().pack(side=tk.TOP, fill=tk.BOTH, expand=1)

    #Method for creating the subset lists with respect to the branch of study
    def branchSubsets(self):
        self.unique_branches = self.data['Branch of Study: '].unique()

        self.branch_subsets = {}

        for branch in self.unique_branches:
            self.branch_subsets[branch] = self.data[self.data['Branch of Study: '] == branch]
    
    def printData(self):
        print(self.data.head())
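
The two preprocessing steps used above (filling missing values with the most frequent category, then label encoding) can be sketched on a small invented table. The column names and values here are illustrative assumptions, not the survey data. Unlike the class above, this sketch keeps one `LabelEncoder` per column, since a single encoder that is refitted only remembers the labels of the last column it saw:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data; the values are invented for illustration.
df = pd.DataFrame({
    "Gender": ["Female", None, "Male", "Female"],
    "Branch": ["ECE", "CSE", "CSE", "MECH"],
})

# Mode imputation: fill gaps with the most frequent category.
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])

# One encoder per column, so each mapping can be inverted later.
encoders = {}
for col in ["Gender", "Branch"]:
    encoders[col] = LabelEncoder()
    df[col + "_Numeric"] = encoders[col].fit_transform(df[col])

print(df)
print(list(encoders["Branch"].classes_))   # codes follow sorted label order
```

Because the codes follow the sorted order of the labels, they carry no numeric meaning of their own. Averaging them, as is done later with `Average_Hours_Numeric`, is only meaningful if that sorted order happens to match the intended ranking of the categories.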

### Cell 3: Segmentation
- The `Segmentation` class specializes in performing segmentation and analysis on data related to student information, such as their branch of study, current year of study, average study hours, and current CGPA.
- The `Segmentation` class inherits from `DataWrangling`, so the preprocessing pipeline runs when a `Segmentation` object is created.
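
A minimal sketch of the segmentation idea on invented data, assuming shortened column names (`Branch`, `Year`, `CGPA`) in place of the survey's. It builds one sub-frame per branch, as `branchSubsets()` does, and also shows that pandas' `groupby` gives the same per-branch averages in one step:

```python
import pandas as pd

# Invented toy data; column names are shortened stand-ins for the survey's.
df = pd.DataFrame({
    "Branch": ["CSE", "ECE", "CSE", "ECE", "MECH"],
    "Year": ["First Year", "First Year", "Second Year", "First Year", "Final Year"],
    "CGPA": [8.0, 7.0, 9.0, 8.0, 6.5],
})

# Segment the frame: one sub-frame per distinct branch.
subsets = {branch: df[df["Branch"] == branch] for branch in df["Branch"].unique()}

# Per-segment statistics computed from the subsets...
mean_cgpa = {branch: sub["CGPA"].mean() for branch, sub in subsets.items()}

# ...or in one step with groupby, which yields the same numbers.
mean_cgpa_grouped = df.groupby("Branch")["CGPA"].mean()

print(mean_cgpa)
print(subsets["ECE"]["Year"].value_counts())
```

The dictionary of subsets is convenient when each segment needs its own plot, while `groupby` is shorter when only an aggregate is needed.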

In [46]:
class Segmentation(DataWrangling):
    #Invokes the constructor of the parent class and initializes the attributes used for
    #segmentation
    def __init__(self, file_path):
        super().__init__(file_path)
        self.unique_years = None
        self.years_subsets = None
        self.average_hours_of_study_by_branch = None

    #Plots the pie chart for the segments based on the branch of study
    def pieChartForBranches(self, frame):
        fig = Figure(figsize=(12, 8))
        num_branches = len(self.branch_subsets)
        num_cols = 2
        num_rows = (num_branches + 1) // 2  # Calculate the number of rows for subplots
        legend_labels = ['First Year', 'Second Year', 'Third Year', 'Final Year']  # Collect legend labels for all branches

        for i, (branch, subset) in enumerate(self.branch_subsets.items(), 1):
            year_counts = subset['Current Year of Study:'].value_counts()

            ax = fig.add_subplot(num_rows, num_cols, i)

            wedges, texts, autotexts = ax.pie(year_counts, labels=None, autopct=lambda pct: f'{pct:.0f}%', startangle=140)
            ax.set_title(f'{branch}', fontsize=10)
            ax.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

            # Set font size for the autopct labels of every subplot, not only the last one
            for autotext in autotexts:
                autotext.set_fontsize(8)

        # The legend reuses the wedges of the last subplot and assumes every branch lists the years in this order
        fig.legend(wedges, legend_labels, loc='center right')
        fig.subplots_adjust(right=0.85)

        # Display the subplots on the GUI frame
        canvas = FigureCanvasTkAgg(fig, master=frame)
        canvas.draw()
        canvas.get_tk_widget().pack(side=tk.TOP, fill=tk.BOTH, expand=1)

        plt.close(fig)  

    #Segments the dataframe with respect to the values present in the year of study column
    def uniqueYears(self):
        self.unique_years = self.data['Current Year of Study:'].unique()
        self.years_subsets = {}

        for year in self.unique_years:
            self.years_subsets[year] = self.data[self.data['Current Year of Study:'] == year]
    
    #Plotting the pie chart for the segments created in the above method.
    def pieChartForYears(self, frame):
        fig = Figure(figsize=(12, 8))
        num_years = len(self.years_subsets)
        num_cols = 2
        num_rows = (num_years + 1) // 2
        legend_labels = []

        for i, (year, subset) in enumerate(self.years_subsets.items(), 1):
            branch_counts = subset['Branch of Study: '].value_counts()
            ax = fig.add_subplot(num_rows, num_cols, i)
            
            wedges, texts, autotexts = ax.pie(branch_counts, labels=None, autopct='%1.1f%%', startangle=140)
            ax.set_title(f'Distribution of Branch of Study for {year}', fontsize=10)
            ax.axis('equal')

            # Use positional indexing (.iloc) because branch_counts is indexed by branch name
            legend_labels.extend([f'{label.get_text()} ({branch_counts.iloc[j]:.0f})' for j, label in enumerate(autotexts)])
            
        fig.subplots_adjust(right=0.85)

        canvas = FigureCanvasTkAgg(fig, master=frame)
        canvas.draw()
        canvas.get_tk_widget().pack(side=tk.TOP, fill=tk.BOTH, expand=1)
        plt.close(fig)

    #Creating the segments based on the values in the average hours of study columns.
    def averageStudyHoursByBranch(self, frame):
        self.average_hours_of_study_by_branch = {}

        for branch, data in self.branch_subsets.items():
            self.average_hours_of_study_by_branch[branch] = data['Average_Hours_Numeric'].mean()
        
        branches = list(self.average_hours_of_study_by_branch.keys())
        average_hours = list(self.average_hours_of_study_by_branch.values())

        fig = Figure(figsize=(12, 6))
        ax = fig.add_subplot(111)

        # Create bar plot
        ax.bar(branches, average_hours, color='orange')
        ax.set_xlabel('Branch of Study')
        ax.set_ylabel('Average Hours of Study')
        ax.set_title('Average Hours of Study by Branch')
        ax.set_xticks(range(len(branches)))  # Fix the tick positions before assigning labels
        ax.set_xticklabels(branches, rotation=45, ha='right')  # Rotate x-axis labels for better readability

        for i in range(len(branches)):
            ax.text(i, average_hours[i] + 0.05, f'{average_hours[i]:.2f}', ha='center', va='bottom')

        canvas = FigureCanvasTkAgg(fig, master=frame)
        canvas.draw()
        canvas.get_tk_widget().pack(side=tk.TOP, fill=tk.BOTH, expand=1)

        plt.close(fig)

    # Creating the segments based on the values present in the Current CGPA column.
    def averageCGPAByBranch(self, frame):
        average_cgpa_by_branch = {}

        for branch, data in self.branch_subsets.items():
            average_cgpa_by_branch[branch] = data['Current CGPA (0.0 - 10.0): '].mean()

        branches = list(average_cgpa_by_branch.keys())
        average_cgpa = list(average_cgpa_by_branch.values())

        fig = Figure(figsize=(10, 8))
        ax = fig.add_subplot(111)

        # Create bar plot
        ax.bar(branches, average_cgpa, color='green')
        ax.set_xlabel('Branch of Study')
        ax.set_ylabel('Average CGPA')
        ax.set_title('Average CGPA by Branch')
        ax.set_xticks(range(len(branches)))  # Fix the tick positions before assigning labels
        ax.set_xticklabels(branches, rotation=45, ha='right')  # Rotate x-axis labels for better readability

        for i in range(len(branches)):
            ax.text(i, average_cgpa[i] + 0.05, f'{average_cgpa[i]:.2f}', ha='center', va='bottom')

        canvas = FigureCanvasTkAgg(fig, master=frame)
        canvas.draw()
        canvas.get_tk_widget().pack(side=tk.TOP, fill=tk.BOTH, expand=1)

        plt.close(fig)

### Machine Learning
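
The class in this section appears to train a logistic regression classifier by batch gradient descent, with features stored column-wise (shape `(n_features, m_samples)`). The following is a minimal NumPy sketch of that technique on invented toy data. The function names `sigmoid` and `train` are illustrative and are not part of the notebook:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, learning_rate, iterations):
    """X: (n, m) features, Y: (1, m) labels in {0, 1}."""
    n, m = X.shape
    W = np.zeros((n, 1))
    B = 0.0
    costs = []
    for _ in range(iterations):
        A = sigmoid(W.T @ X + B)                 # predictions, shape (1, m)
        cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
        dW = (X @ (A - Y).T) / m                 # gradient w.r.t. W, shape (n, 1)
        dB = np.sum(A - Y) / m                   # gradient w.r.t. B
        W -= learning_rate * dW
        B -= learning_rate * dB
        costs.append(cost)
    return W, B, costs

# Toy, linearly separable data: the label is 1 when the single feature is positive.
X = np.array([[-2.0, -1.0, 1.0, 2.0]])
Y = np.array([[0, 0, 1, 1]])

W, B, costs = train(X, Y, learning_rate=0.1, iterations=500)
pred = (sigmoid(W.T @ X + B) > 0.5).astype(int)
accuracy = 100 * np.mean(pred == Y)
print(costs[0], costs[-1], accuracy)
```

With all-zero initial weights every prediction starts at 0.5, so the first recorded cost is ln 2 (about 0.693), and it should fall from there as training proceeds.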


In [47]:
class MachineLearning(Segmentation):
    def __init__(self, filePath):
        super().__init__(filePath)
        self.X_train = pd.read_csv("F:\\SECOND YEAR\\4. SEM NOTES\\DS\\Assignments\\train_X.csv")
        self.Y_train = pd.read_csv("F:\\SECOND YEAR\\4. SEM NOTES\\DS\\Assignments\\train_Y.csv")
        self.X_test = pd.read_csv("F:\\SECOND YEAR\\4. SEM NOTES\\DS\\Assignments\\test_X.csv")
        self.Y_test = pd.read_csv("F:\\SECOND YEAR\\4. SEM NOTES\\DS\\Assignments\\test_Y.csv")

        #return self.X_train, self.Y_train, self.X_test, self.Y_test

    def dropID(self):
        self.X_train = self.X_train.drop("Id", axis = 1)
        self.Y_train = self.Y_train.drop("Id", axis = 1)
        self.X_test = self.X_test.drop("Id", axis = 1)
        self.Y_test = self.Y_test.drop("Id", axis = 1)

        self.X_train = self.X_train.values
        self.Y_train = self.Y_train.values
        self.X_test = self.X_test.values
        self.Y_test = self.Y_test.values

        self.X_train = self.X_train.T
        self.Y_train = self.Y_train.reshape(1, self.X_train.shape[1])

        self.X_test = self.X_test.T
        self.Y_test = self.Y_test.reshape(1, self.X_test.shape[1])

        return self.X_train, self.Y_train, self.X_test, self.Y_test

    #Sigmoid activation used by the logistic regression model
    @staticmethod
    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    #Trains logistic regression with batch gradient descent on the training set
    def model(self, learning_rate, iterations):
        m = self.X_train.shape[1]  # number of training samples
        n = self.X_train.shape[0]  # number of features
        self.W = np.zeros((n,1))
        self.B = 0
        self.cost_list = []

        for i in range(iterations):
            Z = np.dot(self.W.T, self.X_train) + self.B # T-transpose of matrix W
            A = self.sigmoid(Z)
            # cost function
            cost = -(1/m)*np.sum( self.Y_train*np.log(A) + (1-self.Y_train)*np.log(1-A))
            # Gradient Descent
            dW = (1/m)*np.dot(A-self.Y_train, self.X_train.T)
            dB = (1/m)*np.sum(A - self.Y_train)
            self.W = self.W - learning_rate*dW.T
            self.B = self.B - learning_rate*dB
            # Keeping track of our cost function value
            self.cost_list.append(cost)
            if(i % max(1, iterations//10) == 0):
                print("cost after ", i, "iteration is : ", cost)

        return self.W, self.B, self.cost_list

    #Plots the cost recorded at each training iteration
    def curve(self):
        plt.plot(np.arange(len(self.cost_list)), self.cost_list)
        plt.show()

    #Computes the classification accuracy on the test set
    def accuracy(self):
        Z = np.dot(self.W.T, self.X_test) + self.B
        A = self.sigmoid(Z)

        A = A > 0.5
        A = np.array(A, dtype = 'int64')
        acc = (1 - np.sum(np.absolute(A - self.Y_test))/self.Y_test.shape[1])*100
        print("Accuracy of the model is : ", round(acc, 2), "%")


### GUI Section:
- This block of code defines six classes (`HomePage1` to `HomePage6`), each representing a separate window of the student performance analysis application.
- Each class inherits from `tk.Tk`, tkinter's main-window class, so each one is a standalone tkinter GUI application.
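
A minimal sketch of how a Matplotlib figure ends up inside a tkinter window, assuming invented branch counts. The helper names `make_pie_figure` and `show_in_window` are hypothetical. Building the figure is kept separate from embedding it, so the plotting part can be run and checked without a display:

```python
from matplotlib.figure import Figure

def make_pie_figure(counts):
    """Build a Matplotlib Figure from a {label: count} mapping (no GUI needed)."""
    fig = Figure(figsize=(6, 4))
    ax = fig.add_subplot(111)
    ax.pie(list(counts.values()), labels=list(counts.keys()), autopct="%1.1f%%")
    ax.set_title("Distribution of Branch of Study")
    return fig

def show_in_window(fig, title="Student Performance Analysis"):
    """Embed the figure in a Tk window; needs a display, so imports are local."""
    import tkinter as tk
    from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg

    root = tk.Tk()
    root.title(title)
    canvas = FigureCanvasTkAgg(fig, master=root)   # bind the figure to the window
    canvas.draw()
    canvas.get_tk_widget().pack(side=tk.TOP, fill=tk.BOTH, expand=True)
    root.mainloop()

fig = make_pie_figure({"CSE": 12, "ECE": 8, "MECH": 5})   # invented counts
# show_in_window(fig)   # uncomment when running on a machine with a display
```

The same split could let the six window classes share one embedding routine and differ only in which figure they build.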

In [48]:
class HomePage1(tk.Tk):
    def __init__(self, filePath):
        super().__init__()
        self.title("Student Performance Analysis")
        self.geometry("1300x800")

        self.label = ttk.Label(self, text="Welcome to Student Performance Analysis", font=("Helvetica", 20))
        self.label.pack(pady=20)

        # Initialize a frame to contain the pie chart
        self.pie_chart_frame = ttk.Frame(self)
        self.pie_chart_frame.pack(pady=10)

        self.show_pie_chart(filePath)

    def show_pie_chart(self, filePath):
        data_wrangling = MachineLearning(filePath)
        data_wrangling.pieChart(self.pie_chart_frame)


class HomePage2(tk.Tk):
    def __init__(self, filePath):
        super().__init__()
        self.title("Student Performance Analysis")
        self.geometry("1300x800")

        self.label = ttk.Label(self, text="Welcome to Student Performance Analysis", font=("Helvetica", 16))
        self.label.pack(pady=20)

        # Initialize frames for pie charts
        self.pie_chart_branch_frame = ttk.Frame(self)
        self.pie_chart_branch_frame.pack(pady=10)

        self.pie_chart_year_frame = ttk.Frame(self)
        self.pie_chart_year_frame.pack(pady=10)

        self.show_pie_charts(filePath)

    def show_pie_charts(self, filePath):
        data_wrangling = MachineLearning(filePath)
        data_wrangling.branchSubsets()
        data_wrangling.pieChartForBranches(self.pie_chart_branch_frame)

class HomePage3(tk.Tk):
    def __init__(self, filePath):
        super().__init__()
        self.title("Student Performance Analysis")
        self.geometry("1300x800")

        self.label = ttk.Label(self, text="Welcome to Student Performance Analysis", font=("Helvetica", 16))
        self.label.pack(pady=20)

        # Initialize frames for pie charts
        self.pie_chart_year_frame = ttk.Frame(self)
        self.pie_chart_year_frame.pack(pady=10)

        self.show_pie_charts(filePath)

    def show_pie_charts(self, filePath):
        data_wrangling = MachineLearning(filePath)
        data_wrangling.uniqueYears()
        data_wrangling.pieChartForYears(self.pie_chart_year_frame)

class HomePage4(tk.Tk):
    def __init__(self, filePath):
        super().__init__()
        self.title("Student Performance Analysis")
        self.geometry("1300x800")

        self.label = ttk.Label(self, text="Welcome to Student Performance Analysis", font=("Helvetica", 16))
        self.label.pack(pady=20)

        # Initialize frames for pie charts
        self.pie_chart_year_frame = ttk.Frame(self)
        self.pie_chart_year_frame.pack(pady=10)

        # Initialize frames for bar plot
        self.bar_plot_frame = ttk.Frame(self)
        self.bar_plot_frame.pack(pady=10)

        self.show_bar_plot(filePath)

    def show_bar_plot(self, filePath):
        data_wrangling = MachineLearning(filePath)
        data_wrangling.branchSubsets()
        data_wrangling.averageStudyHoursByBranch(self.bar_plot_frame)

class HomePage5(tk.Tk):
    def __init__(self, filePath):
        super().__init__()
        self.title("Student Performance Analysis")
        self.geometry("1300x800")

        self.label = ttk.Label(self, text="Welcome to Student Performance Analysis", font=("Helvetica", 16))
        self.label.pack(pady=20)

        # Initialize frames for pie charts
        self.pie_chart_year_frame = ttk.Frame(self)
        self.pie_chart_year_frame.pack(pady=10)

        # Initialize frames for bar plot
        self.bar_plot_frame = ttk.Frame(self)
        self.bar_plot_frame.pack(pady=10)

        self.show_bar_plot(filePath)

    def show_bar_plot(self, filePath):
        data_wrangling = MachineLearning(filePath)
        data_wrangling.branchSubsets()
        data_wrangling.averageCGPAByBranch(self.bar_plot_frame)

class HomePage6(tk.Tk):
    def __init__(self, filePath):
        super().__init__()
        self.title("Student Performance Analysis")
        self.geometry("1300x800")

        self.label = ttk.Label(self, text="Welcome to Student Performance Analysis", font=("Helvetica", 20))
        self.label.pack(pady=20)

        # Initialize a frame to contain the pie chart
        self.pie_chart_frame = ttk.Frame(self)
        self.pie_chart_frame.pack(pady=10)

        self.show_pie_chart(filePath)

    def show_pie_chart(self, filePath):
        iterations = 100000
        learning_rate = 0.0015
        model = MachineLearning(filePath)
        model.dropID()
        model.model(learning_rate, iterations)
        model.curve()
        model.accuracy()


In [50]:
filePath = "F:\\SECOND YEAR\\4. SEM NOTES\\OOP\\Course Project\\CP\\Student Responses.csv"
    
app = HomePage1(filePath)
app.mainloop()

In [None]:
app = HomePage2(filePath)
app.mainloop()

In [None]:
app = HomePage3(filePath)
app.mainloop()



In [None]:
app = HomePage4(filePath)
app.mainloop()



In [None]:
app = HomePage5(filePath)
app.mainloop()




In [None]:
app = HomePage6(filePath)
app.mainloop()
