Project Name: **Classification of Abstracts from arXiv publications into their most relevant category**

Course: **CIS 545**

Project Members: **Arvind Balaji Narayan, Bharathrushab Manthripragada, Gopik Anand**

**Models Used: Naive Bayes & SVM**

To begin with, we trained two classical machine learning baselines, a Multinomial Naive Bayes classifier and a linear-kernel SVM, on TF-IDF features and recorded their 10-fold cross-validation accuracy. Although both models are much simpler than deep architectures, neither performed especially well (mean accuracy of about 60% for Naive Bayes and 76% for SVM). They are nonetheless useful baselines and starting points for more complex ensemble models.

Package Installations

In [None]:
!pip install transformers



In [None]:
!pip install kaggle



Loading the arXiv Dataset 

In [None]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json


In [None]:
!kaggle datasets download -d Cornell-University/arxiv

arxiv.zip: Skipping, found more recently modified local copy (use --force to force download)


In [None]:
!ls

arxiv-metadata-oai-snapshot.json  arxiv.zip  kaggle.json  sample_data


In [None]:
!unzip -o /content/arxiv.zip

Archive:  /content/arxiv.zip
  inflating: arxiv-metadata-oai-snapshot.json


In [None]:
import numpy as np
import pandas as pd
import os, json, gc, re, random
from tqdm.notebook import tqdm
from sklearn.model_selection import train_test_split

In [None]:
import tensorflow as tf
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification, get_linear_schedule_with_warmup
from torch.optim import AdamW  # transformers.AdamW is deprecated and removed in recent releases
from keras.preprocessing.sequence import pad_sequences
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from sklearn.preprocessing import LabelEncoder

In [None]:
data_file = '/content/arxiv-metadata-oai-snapshot.json'

In [None]:
def get_metadata():
    with open(data_file, 'r') as f:
        for line in f:
            yield line
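The snapshot is stored as JSON Lines (one JSON object per line), which is why `get_metadata` can stream it line by line instead of loading the whole file into memory. A small self-contained sketch of the same streaming pattern, assuming an in-memory buffer in place of the real file and a hypothetical helper name `iter_records`:

```python
import io
import json


def iter_records(lines):
    """Yield one parsed dict per non-empty JSON line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines defensively
            yield json.loads(line)


# Two toy records standing in for lines of arxiv-metadata-oai-snapshot.json
sample = io.StringIO(
    '{"title": "Paper A", "categories": "quant-ph", "journal-ref": "Phys.Rev.D76:013009,2007"}\n'
    '{"title": "Paper B", "categories": "hep-ph hep-th", "journal-ref": null}\n'
)

records = list(iter_records(sample))
for rec in records:
    print(rec["title"], "|", rec["categories"], "|", rec.get("journal-ref"))
```

With the real file, the helper would be used inside `with open(data_file) as f:` as `for paper_dict in iter_records(f): ...`, which would replace the separate `json.loads` call in the wrangling loop.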

Listing all Categories in cat_map

In [None]:
cat_map =      {'astro-ph': 'Astrophysics',
                'astro-ph.CO': 'Cosmology and Nongalactic Astrophysics',
                'astro-ph.EP': 'Earth and Planetary Astrophysics',
                'astro-ph.GA': 'Astrophysics of Galaxies',
                'astro-ph.HE': 'High Energy Astrophysical Phenomena',
                'astro-ph.IM': 'Instrumentation and Methods for Astrophysics',
                'astro-ph.SR': 'Solar and Stellar Astrophysics',
                'cond-mat.dis-nn': 'Disordered Systems and Neural Networks',
                'cond-mat.mes-hall': 'Mesoscale and Nanoscale Physics',
                'cond-mat.mtrl-sci': 'Materials Science',
                'cond-mat.other': 'Other Condensed Matter',
                'cond-mat.quant-gas': 'Quantum Gases',
                'cond-mat.soft': 'Soft Condensed Matter',
                'cond-mat.stat-mech': 'Statistical Mechanics',
                'cond-mat.str-el': 'Strongly Correlated Electrons',
                'cond-mat.supr-con': 'Superconductivity',
                'cs.AI': 'Artificial Intelligence',
                'cs.AR': 'Hardware Architecture',
                'cs.CC': 'Computational Complexity',
                'cs.CE': 'Computational Engineering, Finance, and Science',
                'cs.CG': 'Computational Geometry',
                'cs.CL': 'Computation and Language',
                'cs.CR': 'Cryptography and Security',
                'cs.CV': 'Computer Vision and Pattern Recognition',
                'cs.CY': 'Computers and Society',
                'cs.DB': 'Databases',
                'cs.DC': 'Distributed, Parallel, and Cluster Computing',
                'cs.DL': 'Digital Libraries',
                'cs.DM': 'Discrete Mathematics',
                'cs.DS': 'Data Structures and Algorithms',
                'cs.ET': 'Emerging Technologies',
                'cs.FL': 'Formal Languages and Automata Theory',
                'cs.GL': 'General Literature',
                'cs.GR': 'Graphics',
                'cs.GT': 'Computer Science and Game Theory',
                'cs.HC': 'Human-Computer Interaction',
                'cs.IR': 'Information Retrieval',
                'cs.IT': 'Information Theory',
                'cs.LG': 'Machine Learning',
                'cs.LO': 'Logic in Computer Science',
                'cs.MA': 'Multiagent Systems',
                'cs.MM': 'Multimedia',
                'cs.MS': 'Mathematical Software',
                'cs.NA': 'Numerical Analysis',
                'cs.NE': 'Neural and Evolutionary Computing',
                'cs.NI': 'Networking and Internet Architecture',
                'cs.OH': 'Other Computer Science',
                'cs.OS': 'Operating Systems',
                'cs.PF': 'Performance',
                'cs.PL': 'Programming Languages',
                'cs.RO': 'Robotics',
                'cs.SC': 'Symbolic Computation',
                'cs.SD': 'Sound',
                'cs.SE': 'Software Engineering',
                'cs.SI': 'Social and Information Networks',
                'cs.SY': 'Systems and Control',
                'econ.EM': 'Econometrics',
                'eess.AS': 'Audio and Speech Processing',
                'eess.IV': 'Image and Video Processing',
                'eess.SP': 'Signal Processing',
                'gr-qc': 'General Relativity and Quantum Cosmology',
                'hep-ex': 'High Energy Physics - Experiment',
                'hep-lat': 'High Energy Physics - Lattice',
                'hep-ph': 'High Energy Physics - Phenomenology',
                'hep-th': 'High Energy Physics - Theory',
                'math.AC': 'Commutative Algebra',
                'math.AG': 'Algebraic Geometry',
                'math.AP': 'Analysis of PDEs',
                'math.AT': 'Algebraic Topology',
                'math.CA': 'Classical Analysis and ODEs',
                'math.CO': 'Combinatorics',
                'math.CT': 'Category Theory',
                'math.CV': 'Complex Variables',
                'math.DG': 'Differential Geometry',
                'math.DS': 'Dynamical Systems',
                'math.FA': 'Functional Analysis',
                'math.GM': 'General Mathematics',
                'math.GN': 'General Topology',
                'math.GR': 'Group Theory',
                'math.GT': 'Geometric Topology',
                'math.HO': 'History and Overview',
                'math.IT': 'Information Theory',
                'math.KT': 'K-Theory and Homology',
                'math.LO': 'Logic',
                'math.MG': 'Metric Geometry',
                'math.MP': 'Mathematical Physics',
                'math.NA': 'Numerical Analysis',
                'math.NT': 'Number Theory',
                'math.OA': 'Operator Algebras',
                'math.OC': 'Optimization and Control',
                'math.PR': 'Probability',
                'math.QA': 'Quantum Algebra',
                'math.RA': 'Rings and Algebras',
                'math.RT': 'Representation Theory',
                'math.SG': 'Symplectic Geometry',
                'math.SP': 'Spectral Theory',
                'math.ST': 'Statistics Theory',
                'math-ph': 'Mathematical Physics',
                'nlin.AO': 'Adaptation and Self-Organizing Systems',
                'nlin.CD': 'Chaotic Dynamics',
                'nlin.CG': 'Cellular Automata and Lattice Gases',
                'nlin.PS': 'Pattern Formation and Solitons',
                'nlin.SI': 'Exactly Solvable and Integrable Systems',
                'nucl-ex': 'Nuclear Experiment',
                'nucl-th': 'Nuclear Theory',
                'physics.acc-ph': 'Accelerator Physics',
                'physics.ao-ph': 'Atmospheric and Oceanic Physics',
                'physics.app-ph': 'Applied Physics',
                'physics.atm-clus': 'Atomic and Molecular Clusters',
                'physics.atom-ph': 'Atomic Physics',
                'physics.bio-ph': 'Biological Physics',
                'physics.chem-ph': 'Chemical Physics',
                'physics.class-ph': 'Classical Physics',
                'physics.comp-ph': 'Computational Physics',
                'physics.data-an': 'Data Analysis, Statistics and Probability',
                'physics.ed-ph': 'Physics Education',
                'physics.flu-dyn': 'Fluid Dynamics',
                'physics.gen-ph': 'General Physics',
                'physics.geo-ph': 'Geophysics',
                'physics.hist-ph': 'History and Philosophy of Physics',
                'physics.ins-det': 'Instrumentation and Detectors',
                'physics.med-ph': 'Medical Physics',
                'physics.optics': 'Optics',
                'physics.plasm-ph': 'Plasma Physics',
                'physics.pop-ph': 'Popular Physics',
                'physics.soc-ph': 'Physics and Society',
                'physics.space-ph': 'Space Physics',
                'q-bio.BM': 'Biomolecules',
                'q-bio.CB': 'Cell Behavior',
                'q-bio.GN': 'Genomics',
                'q-bio.MN': 'Molecular Networks',
                'q-bio.NC': 'Neurons and Cognition',
                'q-bio.OT': 'Other Quantitative Biology',
                'q-bio.PE': 'Populations and Evolution',
                'q-bio.QM': 'Quantitative Methods',
                'q-bio.SC': 'Subcellular Processes',
                'q-bio.TO': 'Tissues and Organs',
                'q-fin.CP': 'Computational Finance',
                'q-fin.EC': 'Economics',
                'q-fin.GN': 'General Finance',
                'q-fin.MF': 'Mathematical Finance',
                'q-fin.PM': 'Portfolio Management',
                'q-fin.PR': 'Pricing of Securities',
                'q-fin.RM': 'Risk Management',
                'q-fin.ST': 'Statistical Finance',
                'q-fin.TR': 'Trading and Market Microstructure',
                'quant-ph': 'Quantum Physics',
                'stat.AP': 'Applications',
                'stat.CO': 'Computation',
                'stat.ME': 'Methodology',
                'stat.ML': 'Machine Learning',
                'stat.OT': 'Other Statistics',
                'stat.TH': 'Statistics Theory'}

Data Wrangling and Preprocessing

In [None]:
titles = []
abstracts = []
categories = []

# Keep only single-label papers: the `categories` string must be exactly one key of `cat_map`
# (cross-listed papers store several space-separated codes and are skipped)
paper_categories = set(cat_map)

metadata = get_metadata()
for paper in tqdm(metadata):
    paper_dict = json.loads(paper)
    category = paper_dict.get('categories')
    journal_ref = paper_dict.get('journal-ref')
    try:
        try:
            year = int(journal_ref[-4:])     ### Example Format: "Phys.Rev.D76:013009,2007"
        except (TypeError, ValueError):
            year = int(journal_ref[-5:-1])   ### Example Format: "Phys.Rev.D76:013009,(2007)"
    except (TypeError, ValueError):
        continue  # no journal-ref, or no year in either expected position

    if category in paper_categories and 2018 <= year <= 2022:
        titles.append(paper_dict.get('title'))
        abstracts.append(paper_dict.get('abstract'))
        categories.append(category)

len(titles), len(abstracts), len(categories)


(41027, 41027, 41027)
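The loop keeps a paper only if its journal-ref year falls between 2018 and 2022 and its `categories` field is exactly one code from `cat_map`. Because arXiv stores cross-listed papers as one space-separated string (for example `"hep-ph hep-th"`), those papers are skipped rather than assigned to one of their codes. A sketch of the two checks as standalone functions, with hypothetical names `parse_year` and `is_single_label`, assuming only the two journal-ref formats noted in the code comments:

```python
def parse_year(journal_ref):
    """Return the year at the end of a journal-ref string, or None (hypothetical helper).

    Handles the two formats from the notebook's comments:
    "Phys.Rev.D76:013009,2007" and "Phys.Rev.D76:013009,(2007)".
    """
    if not journal_ref:
        return None
    for candidate in (journal_ref[-4:], journal_ref[-5:-1]):
        try:
            return int(candidate)
        except ValueError:
            continue
    return None


def is_single_label(categories, known_codes):
    """True only when the whole categories string is one known code (hypothetical helper)."""
    return categories in known_codes


known_codes = {'hep-ph', 'hep-th', 'quant-ph'}  # small excerpt of cat_map's keys
print(parse_year("Phys.Rev.D76:013009,2007"))
print(parse_year("Phys.Rev.D76:013009,(2007)"))
print(is_single_label("quant-ph", known_codes), is_single_label("hep-ph hep-th", known_codes))
```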

In [None]:
papers = pd.DataFrame({
    'title': titles,
    'abstract': abstracts,
    'categories': categories
})
papers.head(5)

index,title,abstract,categories
0,Bohmian Mechanics at Space-Time Singularities....,We develop an extension of Bohmian mechanics...,quant-ph
1,On the derivation of exact eigenstates of the ...,We construct the states that are invariant u...,quant-ph
2,Weight Reduction for Mod l Bianchi Modular Forms,Let K be an imaginary quadratic field with c...,math.NT
3,Lawson Method for Obtaining Wave Functions and...,Lawson has shown that one can obtain sensibl...,nucl-th
4,Exact results for the Wigner transform phase s...,Closed form analytical expressions are obtai...,physics.atom-ph


In [None]:
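# Raw titles and abstracts are hard-wrapped; turn every whitespace run (including "\n") into one space
# so that the words on either side of a line break are not glued together
for col in ['title', 'abstract']:
    papers[col] = papers[col].str.replace(r'\s+', ' ', regex=True).str.strip()
papers['text'] = papers['title'] + '. ' + papers['abstract']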
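The raw abstracts (and long titles) in the snapshot are hard-wrapped with `\n`, so the way line breaks are removed matters: deleting them joins the words on either side, while collapsing all whitespace to single spaces keeps them apart. A toy illustration with pandas string methods (the sample string is made up):

```python
import pandas as pd

# Made-up abstract fragment with leading spaces and hard line breaks
raw = pd.Series(["  We develop an extension\nof Bohmian mechanics\n"])

# Option 1: delete newlines outright (words across the break get merged)
deleted = raw.str.replace("\n", "", regex=False).str.strip()

# Option 2: collapse every whitespace run to a single space
collapsed = raw.str.replace(r"\s+", " ", regex=True).str.strip()

print(deleted[0])
print(collapsed[0])
```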

In [None]:
papers.head(5)

index,title,abstract,categories,text
0,Bohmian Mechanics at Space-Time Singularities....,We develop an extension of Bohmian mechanics t...,quant-ph,Bohmian Mechanics at Space-Time Singularities....
1,On the derivation of exact eigenstates of the ...,We construct the states that are invariant und...,quant-ph,On the derivation of exact eigenstates of the ...
2,Weight Reduction for Mod l Bianchi Modular Forms,Let K be an imaginary quadratic field with cla...,math.NT,Weight Reduction for Mod l Bianchi Modular For...
3,Lawson Method for Obtaining Wave Functions and...,Lawson has shown that one can obtain sensible ...,nucl-th,Lawson Method for Obtaining Wave Functions and...
4,Exact results for the Wigner transform phase s...,Closed form analytical expressions are obtaine...,physics.atom-ph,Exact results for the Wigner transform phase s...


In [None]:
df = papers[["text","categories"]].copy()
df

index,text,categories
0,Bohmian Mechanics at Space-Time Singularities....,quant-ph
1,On the derivation of exact eigenstates of the ...,quant-ph
2,Weight Reduction for Mod l Bianchi Modular For...,math.NT
3,Lawson Method for Obtaining Wave Functions and...,nucl-th
4,Exact results for the Wigner transform phase s...,physics.atom-ph
...,...,...
41022,Constant of Motion for several one-dimensional...,physics.class-ph
41023,Activity ageing in growing networks. We presen...,physics.soc-ph
41024,Simple computer model for the quantum Zeno eff...,quant-ph
41025,Alternative Derivation of the Hu-Paz-Zhang Mas...,quant-ph


In [None]:
label_encoder = LabelEncoder()
label_encoder.fit(df['categories'])

LabelEncoder()

In [None]:
df['categories_encoded'] = label_encoder.transform(df['categories'])
df

index,text,categories,categories_encoded
0,Bohmian Mechanics at Space-Time Singularities....,quant-ph,140
1,On the derivation of exact eigenstates of the ...,quant-ph,140
2,Weight Reduction for Mod l Bianchi Modular For...,math.NT,83
3,Lawson Method for Obtaining Wave Functions and...,nucl-th,98
4,Exact results for the Wigner transform phase s...,physics.atom-ph,103
...,...,...,...
41022,Constant of Motion for several one-dimensional...,physics.class-ph,106
41023,Activity ageing in growing networks. We presen...,physics.soc-ph,119
41024,Simple computer model for the quantum Zeno eff...,quant-ph,140
41025,Alternative Derivation of the Hu-Paz-Zhang Mas...,quant-ph,140
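`LabelEncoder` assigns integer ids in sorted order of the category codes, and `inverse_transform` maps model outputs back to those codes, which can then be looked up in `cat_map` for readable names. A sketch using a three-entry excerpt of `cat_map` and made-up predictions:

```python
from sklearn.preprocessing import LabelEncoder

# Three entries copied from cat_map, enough for the illustration
names = {'math.NT': 'Number Theory', 'nucl-th': 'Nuclear Theory', 'quant-ph': 'Quantum Physics'}

encoder = LabelEncoder()
codes = encoder.fit_transform(['quant-ph', 'math.NT', 'nucl-th', 'quant-ph'])
print(list(encoder.classes_))  # classes are stored in sorted order
print(codes)

# Decode made-up integer predictions back to arXiv codes, then to readable names
predicted = [2, 0]
readable = [names[code] for code in encoder.inverse_transform(predicted)]
print(readable)
```

Because `transform` accepts a whole array, a column can be encoded in one call rather than once per row.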


In [None]:
df['x'] = df['text']
df['y'] = df['categories_encoded']
df = df.drop(columns = ['text', 'categories', 'categories_encoded'])
df

index,x,y
0,Bohmian Mechanics at Space-Time Singularities....,140
1,On the derivation of exact eigenstates of the ...,140
2,Weight Reduction for Mod l Bianchi Modular For...,83
3,Lawson Method for Obtaining Wave Functions and...,98
4,Exact results for the Wigner transform phase s...,103
...,...,...
41022,Constant of Motion for several one-dimensional...,106
41023,Activity ageing in growing networks. We presen...,119
41024,Simple computer model for the quantum Zeno eff...,140
41025,Alternative Derivation of the Hu-Paz-Zhang Mas...,140


In [None]:
df = df.drop_duplicates().reset_index(drop=True)  # keep the index a gap-free 0..n-1 range
df

index,x,y
0,Bohmian Mechanics at Space-Time Singularities....,140
1,On the derivation of exact eigenstates of the ...,140
2,Weight Reduction for Mod l Bianchi Modular For...,83
3,Lawson Method for Obtaining Wave Functions and...,98
4,Exact results for the Wigner transform phase s...,103
...,...,...
41022,Constant of Motion for several one-dimensional...,106
41023,Activity ageing in growing networks. We presen...,119
41024,Simple computer model for the quantum Zeno eff...,140
41025,Alternative Derivation of the Hu-Paz-Zhang Mas...,140
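Later cells use row positions (`enumerate` over the rows, and the positional arrays returned by `KFold.split`) to look rows up by label, which is only safe while the index is a gap-free `0..n-1` range. `drop_duplicates` keeps the original labels, so every removed row leaves a hole. A toy illustration:

```python
import pandas as pd

toy = pd.DataFrame({'x': ['a', 'a', 'b'], 'y': [1, 1, 2]})

deduped = toy.drop_duplicates()
print(list(deduped.index))   # label 1 is gone

reset = deduped.reset_index(drop=True)
print(list(reset.index))     # labels match positions again

# Label 1 no longer exists, but position 1 still does
print(1 in deduped.index)
print(deduped['x'].iloc[1])
```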


In [None]:
import random
import copy
import time
import pandas as pd
import numpy as np
import gc
import re
import torch as t

#import spacy
from tqdm import tqdm_notebook, tnrange
from tqdm.auto import tqdm

tqdm.pandas(desc='Progress')
from collections import Counter

from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.preprocessing import LabelEncoder
from collections import defaultdict
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score

import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
from torch.autograd import Variable
import os 

# cross validation and metrics
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score

from sklearn.preprocessing import StandardScaler
from multiprocessing import  Pool
from functools import partial
from sklearn.decomposition import PCA

import matplotlib.pyplot as plt

In [None]:
import tensorflow as tf
import torch
import pandas as pd
import numpy as np
import random
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from collections import defaultdict
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from tqdm.notebook import tqdm
from keras.preprocessing.sequence import pad_sequences
from sklearn.preprocessing import LabelEncoder
import transformers
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification
from torch.optim import AdamW  # transformers.AdamW is deprecated and removed in recent releases
from transformers import get_linear_schedule_with_warmup
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler

In [None]:
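import nltk
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('stopwords')
# Recent NLTK releases look up these renamed resources for word_tokenize and pos_tag
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')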

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [None]:
# Convert text to lowercase
df['x'] = [text.lower() for text in df['x']]

# Tokenization
df['x'] = [word_tokenize(text) for text in df['x']]

# Map the first letter of each Penn Treebank POS tag to a WordNet POS for WordNetLemmatizer (nouns by default)
tag_map = defaultdict(lambda : wn.NOUN)
tag_map['J'] = wn.ADJ
tag_map['V'] = wn.VERB
tag_map['R'] = wn.ADV
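`WordNetLemmatizer.lemmatize` treats every word as a noun unless told otherwise, so `tag_map` converts the first letter of each Penn Treebank tag from `nltk.pos_tag` into a WordNet part of speech; this is how "growing" becomes "grow" in the `finalText` output further down. A dependency-free sketch of the lookup, assuming WordNet's single-letter POS codes (`'n'`, `'a'`, `'v'`, `'r'`, the values of `wn.NOUN`, `wn.ADJ`, `wn.VERB`, `wn.ADV`):

```python
from collections import defaultdict

# WordNet's single-letter POS codes (same values as wn.NOUN, wn.ADJ, wn.VERB, wn.ADV)
NOUN, ADJ, VERB, ADV = 'n', 'a', 'v', 'r'

tag_map = defaultdict(lambda: NOUN)  # anything unmapped falls back to noun
tag_map['J'] = ADJ
tag_map['V'] = VERB
tag_map['R'] = ADV

# Penn Treebank tags as produced by nltk.pos_tag; only the first letter is looked up
mapped = {penn_tag: tag_map[penn_tag[0]] for penn_tag in ['NNS', 'JJ', 'VBG', 'RB', 'IN']}
for penn_tag, wordnet_pos in mapped.items():
    print(penn_tag, '->', wordnet_pos)
```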

In [None]:
# Create the lemmatizer and stop-word set once instead of once per document
word_net_lemmatizer = WordNetLemmatizer()
set_stop = set(stopwords.words('english'))

def lemmatize_tokens(tokens):
    # POS-tag the tokens, drop stop words and non-alphabetic tokens,
    # then lemmatize each remaining word with its mapped WordNet POS
    return [word_net_lemmatizer.lemmatize(word, tag_map[tag[0]])
            for word, tag in pos_tag(tokens)
            if word not in set_stop and word.isalpha()]

# Build the column in one pass so every result stays aligned with its row
# (the tqdm bar replaces the per-row progress print)
df['finalText'] = [str(lemmatize_tokens(tokens)) for tokens in tqdm(df['x'])]
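Besides removing stop words, the `word.isalpha()` condition drops every token containing a non-letter, including punctuation and hyphenated terms such as "space-time" or "one-dimensional", which is visible in the `finalText` column further down. A minimal sketch of just the filtering step, using a hypothetical four-word stop list instead of NLTK's English list:

```python
# Hypothetical miniature stop-word set; the notebook uses NLTK's English list
stop_words = {'at', 'the', 'of', 'for'}

tokens = ['bohmian', 'mechanics', 'at', 'space-time', 'singularities', '.']

# Same condition as the notebook: not a stop word and letters only
kept = [tok for tok in tokens if tok not in stop_words and tok.isalpha()]
print(kept)
```

If hyphenated terms matter for classification, one option (not used in the notebook) would be to split tokens on hyphens before the `isalpha()` check.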

In [None]:
df = df.dropna().reset_index(drop=True)  # positional fold indices below assume a 0..n-1 index

In [None]:
df

index,x,y,finalText
0,"[bohmian, mechanics, at, space-time, singulari...",140,"['bohmian', 'mechanic', 'singularity', 'timeli..."
1,"[on, the, derivation, of, exact, eigenstates, ...",140,"['derivation', 'exact', 'eigenstates', 'genera..."
2,"[weight, reduction, for, mod, l, bianchi, modu...",83,"['weight', 'reduction', 'mod', 'l', 'bianchi',..."
3,"[lawson, method, for, obtaining, wave, functio...",98,"['lawson', 'method', 'obtain', 'wave', 'functi..."
4,"[exact, results, for, the, wigner, transform, ...",103,"['exact', 'result', 'wigner', 'transform', 'ph..."
...,...,...,...
41022,"[constant, of, motion, for, several, one-dimen...",106,"['constant', 'motion', 'several', 'system', 'o..."
41023,"[activity, ageing, in, growing, networks, ., w...",119,"['activity', 'age', 'grow', 'network', 'presen..."
41024,"[simple, computer, model, for, the, quantum, z...",140,"['simple', 'computer', 'model', 'quantum', 'ze..."
41025,"[alternative, derivation, of, the, hu-paz-zhan...",140,"['alternative', 'derivation', 'master', 'equat..."


In [None]:
from sklearn.model_selection import KFold

In [None]:
kf = KFold(n_splits=10)
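`KFold(n_splits=10)` without `shuffle=True` splits the rows into ten contiguous blocks in the order they were read from the snapshot, so each test fold is one slice of that order rather than a random sample. The per-fold scores further down are uneven (fold 0 is the lowest for both models), which is consistent with, though not proof of, the folds having different category mixes. One alternative, not used in the notebook, is `StratifiedKFold` with shuffling, which keeps each class's share roughly equal across folds. A toy comparison:

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Toy labels grouped by class, mimicking data that is ordered rather than shuffled
y_toy = np.array([0] * 6 + [1] * 6)
X_toy = np.zeros((len(y_toy), 1))  # the features do not affect how folds are drawn

splitters = {
    'KFold': KFold(n_splits=3),
    'StratifiedKFold': StratifiedKFold(n_splits=3, shuffle=True, random_state=42),
}

# Count how many examples of each class land in every test fold
counts = {}
for name, splitter in splitters.items():
    counts[name] = [np.bincount(y_toy[test], minlength=2).tolist()
                    for _, test in splitter.split(X_toy, y_toy)]
    print(name, counts[name])
```

On the real data, `StratifiedKFold.split` needs the labels as its second argument, and scikit-learn warns when some category has fewer papers than `n_splits`.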

In [None]:
# pandas, NumPy, NLTK and TfidfVectorizer are already imported in the cells above
from sklearn import naive_bayes, svm
from sklearn.metrics import accuracy_score

Model Definition - Naive Bayes & SVM

Training and Testing

In [None]:
X = df['finalText']
y = df['y']

In [None]:
acc_ls = []
for i, (train_index, test_index) in enumerate(kf.split(X)):
  # KFold yields positions, so select rows with .iloc
  X_train, X_test = X.iloc[train_index], X.iloc[test_index]
  y_train, y_test = y.iloc[train_index], y.iloc[test_index]
  # Fit the TF-IDF vocabulary on the training fold only, then reuse it for the test fold
  TFIDF_vect = TfidfVectorizer(max_features=5000)
  x_train_tfidf = TFIDF_vect.fit_transform(X_train)
  x_test_tfidf = TFIDF_vect.transform(X_test)
  Naive = naive_bayes.MultinomialNB()
  Naive.fit(x_train_tfidf, y_train)
  predictions_NB = Naive.predict(x_test_tfidf)
  acc = accuracy_score(y_test, predictions_NB)  # signature is (y_true, y_pred)
  acc_ls.append(acc)
  print("Naive Bayes Accuracy Score " + str(i) + " -> ", acc*100)
print("Mean Accuracy : ", sum(acc_ls)*100/len(acc_ls))

Naive Bayes Accuracy Score 0 ->  46.453814282232514
Naive Bayes Accuracy Score 1 ->  60.955398488910554
Naive Bayes Accuracy Score 2 ->  62.41774311479406
Naive Bayes Accuracy Score 3 ->  62.466487935656836
Naive Bayes Accuracy Score 4 ->  62.10090177918596
Naive Bayes Accuracy Score 5 ->  63.027053375578845
Naive Bayes Accuracy Score 6 ->  63.83134291981477
Naive Bayes Accuracy Score 7 ->  62.28668941979522
Naive Bayes Accuracy Score 8 ->  61.79912237932715
Naive Bayes Accuracy Score 9 ->  58.75182837640176
Mean Accuracy :  60.40903820716976
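The loop above fits `TfidfVectorizer` only on each training fold, so the test fold's vocabulary never leaks into the features. The same procedure can be written more compactly with a scikit-learn `Pipeline` and `cross_val_score`; a sketch on a made-up toy corpus standing in for `df['finalText']` and `df['y']`:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy corpus with two classes
texts = [
    'quantum state entanglement qubit', 'qubit decoherence quantum channel',
    'quantum measurement qubit state', 'entanglement quantum qubit channel',
    'prime number modular form', 'modular form prime field',
    'number field prime ideal', 'modular prime number ideal',
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# The vectorizer is refit inside every training fold, as in the manual loop
model = make_pipeline(TfidfVectorizer(max_features=5000), MultinomialNB())
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
scores = cross_val_score(model, texts, labels, cv=cv, scoring='accuracy')
print(scores, scores.mean())
```

Swapping `MultinomialNB()` for `svm.SVC(kernel='linear')` would give the SVM variant evaluated in the next cell.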


In [None]:
acc_ls = []
for i, (train_index, test_index) in enumerate(kf.split(X)):
  # KFold yields positions, so select rows with .iloc
  X_train, X_test = X.iloc[train_index], X.iloc[test_index]
  y_train, y_test = y.iloc[train_index], y.iloc[test_index]
  # Fit the TF-IDF vocabulary on the training fold only, then reuse it for the test fold
  TFIDF_vect = TfidfVectorizer(max_features=5000)
  x_train_tfidf = TFIDF_vect.fit_transform(X_train)
  x_test_tfidf = TFIDF_vect.transform(X_test)
  # degree and gamma only affect non-linear kernels, so they are omitted here
  SVM = svm.SVC(C=1.0, kernel='linear')
  SVM.fit(x_train_tfidf, y_train)
  predictions_SVM = SVM.predict(x_test_tfidf)
  acc = accuracy_score(y_test, predictions_SVM)  # signature is (y_true, y_pred)
  acc_ls.append(acc)
  print("SVM Accuracy Score " + str(i) + " -> ", acc*100)
print("Mean Accuracy : ", sum(acc_ls)*100/len(acc_ls))

SVM Accuracy Score 0 ->  66.92663904460152
SVM Accuracy Score 1 ->  76.55374116500123
SVM Accuracy Score 2 ->  75.40823787472581
SVM Accuracy Score 3 ->  78.01608579088472
SVM Accuracy Score 4 ->  78.06483061174751
SVM Accuracy Score 5 ->  78.23543748476725
SVM Accuracy Score 6 ->  79.28345113331709
SVM Accuracy Score 7 ->  75.54851292052656
SVM Accuracy Score 8 ->  76.20672842515846
SVM Accuracy Score 9 ->  73.98829839102876
Mean Accuracy :  75.82319628417588
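A short summary of the per-fold accuracies printed above: mean, sample standard deviation, lowest fold, and the fold-by-fold gap between the two models. The numbers are copied from the outputs; the variable names are illustrative.

```python
from statistics import mean, stdev

# Per-fold accuracies (%) copied from the Naive Bayes and SVM outputs
nb = [46.453814282232514, 60.955398488910554, 62.41774311479406, 62.466487935656836,
      62.10090177918596, 63.027053375578845, 63.83134291981477, 62.28668941979522,
      61.79912237932715, 58.75182837640176]
svm_acc = [66.92663904460152, 76.55374116500123, 75.40823787472581, 78.01608579088472,
           78.06483061174751, 78.23543748476725, 79.28345113331709, 75.54851292052656,
           76.20672842515846, 73.98829839102876]

for name, scores in [('Naive Bayes', nb), ('SVM', svm_acc)]:
    worst = min(range(len(scores)), key=scores.__getitem__)
    print(f'{name}: {mean(scores):.2f} +/- {stdev(scores):.2f} (lowest fold: {worst})')

# Fold-by-fold difference between the two models (percentage points)
gaps = [s - n for s, n in zip(svm_acc, nb)]
print(f'SVM ahead in {sum(g > 0 for g in gaps)}/{len(gaps)} folds, mean gap {mean(gaps):.2f} points')
```

SVM leads in every fold, so the roughly 15-point difference in mean accuracy is not driven by a single split.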
