#**Setting up Google Drive in Google Colab**
The code below authenticates the Colab session and mounts Google Drive at /content/gdrive. Once the drive is mounted, the notebook can read files from Google Drive and save files to it, which the later cells rely on for the dataset and the saved model.

In [2]:
# Authenticate the Colab session and mount Google Drive
from google.colab import auth
auth.authenticate_user()

from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).
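
Because the later cells read from and write to paths under /content/gdrive, it can help to confirm that a file is reachable before loading it. The following is a minimal sketch, not part of the original notebook; `require_file` is a hypothetical helper name, and the dataset path is the one used in the later cells.

```python
from pathlib import Path


def require_file(path):
    """Return `path` as a Path if it is an existing file, else raise FileNotFoundError."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(
            f"{p} not found. Is Google Drive mounted and is the path spelled correctly?"
        )
    return p


# Path used by the later cells of this notebook
dataset_path = '/content/gdrive/My Drive/Colab Notebooks/dataset.xlsx'
try:
    require_file(dataset_path)
    print("Dataset found")
except FileNotFoundError as err:
    # Outside Colab, or before the drive is mounted, the problem is reported here
    print(err)
```

Failing early with a clear message is usually easier to debug than an error raised deep inside pd.read_excel.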


#**Installing Necessary Libraries and Packages**
The commands below install the Python libraries the project depends on: numpy and pandas for data manipulation, scikit-learn for machine learning modeling, nltk for natural language processing, openpyxl and xlrd for reading Excel files, and fpdf and pdfkit for generating PDF documents. wkhtmltopdf is installed with apt-get because it is the command-line tool that pdfkit calls to convert HTML to PDF.

In [4]:
!pip install numpy pandas scikit-learn openpyxl xlrd
!pip install nltk
!pip install fpdf
# wkhtmltopdf is the system binary that pdfkit calls to render HTML as PDF
!apt-get install -y wkhtmltopdf
!pip install pdfkit
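
After the installs finish, a quick check can confirm that the packages are importable and that the wkhtmltopdf binary is on the PATH. This is a minimal sketch using only the standard library; `missing_modules` and `has_executable` are hypothetical helper names, not part of the original notebook.

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_executable(name):
    """Return True if an executable called `name` is on the PATH."""
    return shutil.which(name) is not None


# Import names can differ from pip package names: scikit-learn is imported as sklearn
modules = ["numpy", "pandas", "sklearn", "openpyxl", "xlrd", "nltk", "fpdf", "pdfkit"]
print("Missing modules:", missing_modules(modules))
print("wkhtmltopdf on PATH:", has_executable("wkhtmltopdf"))
```

An empty list and `True` suggest the environment is ready for the cells that follow.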



#**Creation and Saving of Pre-Trained Model**
This cell trains and saves the model that classifies post-approval variations. It imports pandas for data manipulation, TfidfVectorizer for text feature extraction, MultinomialNB for the Naive Bayes classifier, make_pipeline and train_test_split for building the pipeline and splitting the data, and joblib for saving the model. The dataset is loaded from an Excel file and split into training (80%) and testing (20%) sets, with the Change column as the input text and the Category column as the label. A pipeline that combines the TF-IDF vectorizer with Multinomial Naive Bayes is fitted on the training data. The trained pipeline is then saved to Google Drive, so it can be loaded and used later without retraining. Note that the testing set is created here but not used in this cell.

In [48]:
# Pre-Trained Model for Post Approval Variations Classifier

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
import joblib

# Load your dataset
df = pd.read_excel('/content/gdrive/My Drive/Colab Notebooks/dataset.xlsx')

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df['Change'], df['Category'], test_size=0.2, random_state=42)

# Create a model pipeline with TF-IDF Vectorizer and Multinomial Naive Bayes
model = make_pipeline(TfidfVectorizer(), MultinomialNB())

# Train the model
model.fit(X_train, y_train)

# Save the model to the specified path
model_filename = '/content/gdrive/My Drive/Colab Notebooks/Regclassifyx/regclassifyx_model.joblib'
joblib.dump(model, model_filename)
print(f"Model saved to {model_filename}")

Model saved to /content/gdrive/My Drive/Colab Notebooks/Regclassifyx/regclassifyx_model.joblib
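
The cell above creates a testing set but does not score the model on it. The following is a minimal sketch of how the held-out data could be used to estimate accuracy, assuming the same pipeline. The texts and the labels "A" and "B" are made-up placeholders so the example runs on its own; in the notebook, `df['Change']` and `df['Category']` would take their place.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder data (hypothetical), standing in for df['Change'] and df['Category']
texts = [
    "change in the name of the manufacturer",
    "change in the address of the manufacturer",
    "change in the name of the supplier",
    "change in the address of the supplier",
    "change in the batch size of the finished product",
    "change in the shelf life of the finished product",
    "change in the storage conditions of the finished product",
    "change in the pack size of the finished product",
]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

# stratify keeps both labels represented in the small test split
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

# Score the model on data it did not see during training
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, zero_division=0)
print(f"Accuracy on held-out data: {accuracy:.2f}")
print(report)
```

With a dataset this small the numbers are not meaningful; the point is only the shape of the evaluation step.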


#**Implementation of RegClassifyX Tool**
This cell is the final implementation of the RegClassifyX tool, which classifies post-approval variations. It imports pandas for data manipulation, joblib for loading the saved model, pdfkit for converting HTML to PDF, files from google.colab for downloading the result, and datetime for the timestamp. After the model and the dataset are loaded, the user is prompted to enter a Change Description. The model predicts the Variation Category for that text, and the Change Type is looked up in the dataset by exact match on the Change column; if no row matches, a fallback message is shown instead. The tool prints the results together with the Reference and Author details, records the time of generation, and builds a styled HTML report that is converted to PDF and offered for download. The report ends with a note asking users to apply due diligence, since the classification comes from an AI/ML model.
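
One detail of the HTML step is worth illustrating. The Change Description is typed by the user and placed directly into the HTML, so characters such as `<` or `&` in the input could distort the report. A minimal sketch of escaping such values with the standard library follows; `render_field` is a hypothetical helper and the sample text is invented for illustration.

```python
import html


def render_field(label, value):
    """Return one <p> line of the report with label and value HTML-escaped."""
    return f"<p><b>{html.escape(label)}:</b> {html.escape(str(value))}</p>"


# Hypothetical input containing characters that are special in HTML
print(render_field("Change Description", "Batch size <10-fold & new supplier"))
# prints: <p><b>Change Description:</b> Batch size &lt;10-fold &amp; new supplier</p>
```

The same call could wrap `user_input`, `change_type`, and `predicted_category` before they are interpolated into the template.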

In [60]:
#Final Tool- RegClassifyX

import pandas as pd
import joblib
import pdfkit
from google.colab import files
from datetime import datetime

# Load the model
model_filename = '/content/gdrive/My Drive/Colab Notebooks/Regclassifyx/regclassifyx_model.joblib'
model = joblib.load(model_filename)
print("Model loaded successfully!")

# Load your dataset
df = pd.read_excel('/content/gdrive/My Drive/Colab Notebooks/dataset.xlsx')

# User Input
user_input = input("Enter the Change Description: ")

# Predict Category
predicted_category = model.predict([user_input])[0]

# Find the corresponding 'Change Type' from the DataFrame
matching_rows = df.loc[df['Change'] == user_input, 'Change Type']

if not matching_rows.empty:
    change_type = matching_rows.iloc[0]
else:
    change_type = "No exact match found in the dataset for the entered Change Description"

# Displaying the Output
print("\nRegClassifyX (Version: 1.0.0)- Post Approval Variations Classifier")
print(f"Change Description- {user_input}")
print(f"Change Type: {change_type}")
print(f"Variation Category: {predicted_category}")
print("Reference- https://www.sukl.sk/buxus/docs/Registracie/Tlaciva/classification_guideline_adopted.pdf")
print("Author- Ganesh Waghule- (CMC Scientist- Post Approval Changes), Ashwini Kumawat- (CMC Scientist- Post Approval Changes)")
print("Note: This tool exemplifies the application of AI and ML models. Please use this resource with due diligence and consideration")

# Get Current Date and Time
now = datetime.now()
current_time = now.strftime("%Y-%m-%d %H:%M:%S")

# Prepare HTML Output
html_output = f"""
<html>
<head>
    <style>
        body {{
            font-family: 'Arial', sans-serif;
            background-color: #f4f4f4;
            margin: 0;
            padding: 0;
        }}
        .container {{
            width: 50%;
            margin: auto;
        }}
        header {{
            background: #50b3a2;
            color: white;
            text-align: center;
            padding: 1em;
        }}
        p {{
            font-size: 18px;
            line-height: 1.6em;
            color: #666;
        }}
        a {{
            color: #50b3a2;
        }}
        .timestamp {{
            font-size: 14px;
            color: #888;
        }}
        .note {{
            font-size: 16px;
            color: #888;
            margin-top: 2em;
            border-top: 1px solid #eee;
            padding-top: 1em;
        }}
    </style>
</head>
<body>
    <div class='container'>
        <header>
            <h1>RegClassifyX (Version: 1.0.0)- Post Approval Variations Classifier</h1>
        </header>
        <p><b>Change Description:</b> {user_input}</p>
        <p><b>Change Type:</b> {change_type}</p>
        <p><b>Variation Category:</b> {predicted_category}</p>
        <p><b>Reference:</b> <a href='https://www.sukl.sk/buxus/docs/Registracie/Tlaciva/classification_guideline_adopted.pdf' target='_blank'>Classification Guideline</a></p>
        <p><b>Author:</b> Ganesh Waghule- (CMC Scientist- Post Approval Changes), Ashwini Kumawat- (CMC Scientist- Post Approval Changes)</p>
        <p class='timestamp'><i>Generated on: {current_time}</i></p>
        <p class='note'><i>Note: This tool exemplifies the application of AI and ML models. Please use this resource with due diligence and consideration.</i></p>
    </div>
</body>
</html>
"""

# Convert HTML to PDF
pdfkit.from_string(html_output, 'output.pdf')

# Download PDF
files.download('output.pdf')

Model loaded successfully!
Enter the Change Description: Deletion of the solvent / diluent container from the pack

RegClassifyX (Version: 1.0.0)- Post Approval Variations Classifier
Change Description- Deletion of the solvent / diluent container from the pack
Change Type: RCX007
Variation Category: IB
Reference- https://www.sukl.sk/buxus/docs/Registracie/Tlaciva/classification_guideline_adopted.pdf
Author- Ganesh Waghule- (CMC Scientist- Post Approval Changes), Ashwini Kumawat- (CMC Scientist- Post Approval Changes)
Note: This tool exemplifies the application of AI and ML models. Please use this resource with due diligence and consideration
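
The Change Type lookup above succeeds only when the entered text is identical to a dataset entry, so a difference in letter case or spacing produces the "No exact match" message. A minimal sketch of a more forgiving lookup follows, assuming that ignoring case and extra whitespace is acceptable for this dataset. `normalize` and `find_change_type` are hypothetical helpers; the sample row is the one shown in the output above.

```python
import re


def normalize(text):
    """Collapse runs of whitespace, trim the ends, and ignore letter case."""
    return re.sub(r"\s+", " ", str(text)).strip().casefold()


def find_change_type(description, rows):
    """Return the Change Type of the first row whose Change text matches, else None."""
    target = normalize(description)
    for change, change_type in rows:
        if normalize(change) == target:
            return change_type
    return None


# In the notebook, rows could be built with zip(df['Change'], df['Change Type'])
rows = [("Deletion of the solvent / diluent container from the pack", "RCX007")]

typed = "  deletion of the solvent / diluent   container from the pack "
print(find_change_type(typed, rows))  # prints: RCX007
```

Returning `None` for a miss leaves it to the caller to decide which fallback message to show.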

