<a href="https://colab.research.google.com/github/sivasurasani/research_paper_recommendation_system/blob/main/Rec_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Importing pandas and extracting the desired dataset

In [12]:
import pandas as pd

try:
    df = pd.read_csv("DM_dataset.csv")
except pd.errors.ParserError as e:
    print(f"Error reading CSV: {e}")
    print("Retrying with on_bad_lines='skip' and quoting=3 (csv.QUOTE_NONE)")
    # Malformed rows are dropped; with quoting=3, quote characters are kept as literal text.
    df = pd.read_csv("DM_dataset.csv", on_bad_lines='skip', quoting=3)
    print("Warning: Malformed rows may have been skipped. Please check the data for inconsistencies.")
    if df.isnull().values.any():
        print("Warning: The loaded data contains missing values.")
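
Before falling back to skipping bad lines, it can help to see which rows are malformed. The following is a minimal sketch using only the standard `csv` module; the helper name `find_malformed_rows` and the sample data are hypothetical and stand in for `DM_dataset.csv`. It flags every row whose field count differs from the header's, which is a broader check than the one pandas applies.

```python
import csv
import io

def find_malformed_rows(lines):
    """Return (line_number, field_count) for rows whose field count differs from the header's."""
    reader = csv.reader(lines)
    header = next(reader)
    expected = len(header)
    return [(reader.line_num, len(row)) for row in reader if len(row) != expected]

# Hypothetical sample standing in for DM_dataset.csv
sample = io.StringIO(
    "titles,abstracts,terms\n"
    "Paper A,Abstract A,cs.LG\n"
    "Paper B,Abstract B,cs.CV,unexpected extra field\n"
    "Paper C,Abstract C,stat.ML\n"
)
malformed = find_malformed_rows(sample)
print(malformed)
```

For a real file, the same helper could be called as `find_malformed_rows(open("DM_dataset.csv", newline=""))` inside a `with` block.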

Installing sentence transformers and torch

In [13]:
%pip install -U -q sentence-transformers
%pip install torch



Creating a Models directory to store the saved models

In [14]:
import os
directory = "Models/"
os.makedirs(directory, exist_ok=True)

Data preprocessing

In [15]:
df.drop(columns=["terms", "abstracts"], inplace=True)
df.drop_duplicates(inplace=True)
df.reset_index(drop=True, inplace=True)
pd.set_option('display.max_colwidth', None)
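
Because the `terms` and `abstracts` columns are dropped first, `drop_duplicates` compares only the columns that remain. The sketch below shows this on a hypothetical three-row frame. The data is invented, and it assumes `titles` is the only remaining column, which the dataset itself may not guarantee.

```python
import pandas as pd

# Hypothetical toy frame reusing the dataset's column names
toy = pd.DataFrame({
    "titles": ["Paper A", "Paper A", "Paper B"],
    "abstracts": ["first abstract", "second abstract", "third abstract"],
    "terms": ["cs.LG", "cs.CV", "cs.LG"],
})

# Same three steps as above, chained without inplace=True
cleaned = (
    toy.drop(columns=["terms", "abstracts"])
       .drop_duplicates()
       .reset_index(drop=True)
)
print(cleaned)
```

The two "Paper A" rows collapse into one even though their abstracts differ, and the index is renumbered from zero so that positions line up with the embeddings computed later.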

Creating sentence embeddings

In [16]:
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = df['titles']
embeddings = model.encode(sentences)

# Preview the first six titles with the length of their embeddings.
for sentence, embedding in zip(sentences[:6], embeddings[:6]):
    print("Sentence:", sentence)
    print("Embedding length:", len(embedding))


Sentence: Multi-Level Attention Pooling for Graph Neural Networks: Unifying Graph Representations with Multiple Localities
Embedding length: 384
Sentence: Decision Forests vs. Deep Networks: Conceptual Similarities and Empirical Differences at Small Sample Sizes
Embedding length: 384
Sentence: Power up! Robust Graph Convolutional Network via Graph Powering
Embedding length: 384
Sentence: Releasing Graph Neural Networks with Differential Privacy Guarantees
Embedding length: 384
Sentence: Recurrence-Aware Long-Term Cognitive Network for Explainable Pattern Classification
Embedding length: 384
Sentence: Lifelong Graph Learning
Embedding length: 384
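
Each title is mapped to a 384-dimensional vector, and the recommendation step compares such vectors by cosine similarity. Here is a minimal sketch of that measure on hand-made 3-dimensional vectors. The vectors and the helper name `cosine_similarity` are illustrative assumptions, not output from the model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings"; the real ones have 384 dimensions.
graph_paper = [1.0, 0.0, 1.0]
related_paper = [2.0, 0.0, 2.0]    # same direction, different length
unrelated_paper = [0.0, 1.0, 0.0]  # orthogonal direction

print(cosine_similarity(graph_paper, related_paper))
print(cosine_similarity(graph_paper, unrelated_paper))
```

Vectors pointing in the same direction score close to 1 regardless of their length, while orthogonal vectors score 0.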


Saving the embeddings, model, and titles using pickle, then loading them back

In [17]:
import pickle
with open('Models/embeddings.pkl', 'wb') as emb:
    pickle.dump(embeddings, emb)

with open('Models/rec_model.pkl', 'wb') as rec_mod:
    pickle.dump(model, rec_mod)

with open('Models/sentences.pkl', 'wb') as sent:
    pickle.dump(sentences, sent)


with open('Models/embeddings.pkl', 'rb') as emb:
    sentences_embeddings = pickle.load(emb)
with open('Models/sentences.pkl', 'rb') as sent:
    all_sentences = pickle.load(sent)
with open('Models/rec_model.pkl', 'rb') as rec_mod:
    rec_model = pickle.load(rec_mod)
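
The save and load steps can be wrapped in two small helpers so that every file is closed promptly. This is a minimal sketch that assumes the hypothetical helper names `save_pickle` and `load_pickle` and uses a temporary directory in place of `Models/`. Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path

def save_pickle(obj, path):
    """Serialize obj to path, closing the file afterwards."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_pickle(path):
    """Load and return the object stored at path."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Round trip in a temporary directory (stand-in for Models/)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sentences.pkl"
    titles = ["Lifelong Graph Learning", "Machine learning with limited data"]
    save_pickle(titles, path)
    restored = load_pickle(path)

print(restored == titles)
```

For the model itself, sentence-transformers also offers `model.save(path)` with reloading via `SentenceTransformer(path)`, which may be more portable across library versions than pickling the model object.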


Recommending top five research papers

In [18]:
import torch
def recommendation_definition(input_paper):
    """Return the five stored titles most similar to the input title."""
    # Cosine similarity between every stored embedding and the query; shape (num_titles, 1).
    scores_similarities_cosine = util.cos_sim(sentences_embeddings, rec_model.encode(input_paper))

    # Indices of the five highest scores, sorted from most to least similar.
    top_five_recommendations = torch.topk(scores_similarities_cosine, dim=0, k=5, sorted=True)

    return [all_sentences[r.item()] for r in top_five_recommendations.indices]

input_paper = input("Enter the title for recommendations: ")
recommend_papers = recommendation_definition(input_paper)
for paper in recommend_papers:
    print("\n")
    print(paper)

Enter the title for recommendations: Machine learning applications


Reinforcement Learning Applications


Techniques for Automated Machine Learning


Applications of Deep Neural Networks


Applications of Machine Learning in Document Digitisation


Machine learning with limited data
