# HackerRank - Artificial Intelligence - Machine Learning and Statistics

## StackExchange Question Classifier



Stack Exchange is an information powerhouse, built on the power of crowdsourcing. It has 105 different topics, and each topic has a library of questions that have been asked and answered by knowledgeable members of the Stack Exchange community. The topics are as diverse as travel, cooking, programming, engineering and photography.

We have hand-picked ten different topics (such as Electronics, Mathematics, Photography etc.) from Stack Exchange, and we provide you with a set of questions from these topics.

**Given a question and an excerpt, your task is to identify which among the 10 topics it belongs to.**


**Input Format**

The first line contains an integer N. Each of the next N lines is a valid JSON object. The raw data has the following fields:

* question (string) : The text in the title of the question.
* excerpt (string) : Excerpt of the question body.
* topic (string) : The topic under which the question was posted.

The input to the program contains every field except topic, which you must predict as the answer.
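
To make the input format concrete, here is a minimal sketch of parsing it. The two sample records are invented for illustration and are not taken from the challenge data, and `parse_records` is a hypothetical helper name.

```python
import json


def parse_records(lines):
    """Parse a count line followed by one JSON object per line."""
    n = int(lines[0])
    return [json.loads(line) for line in lines[1:n + 1]]


# Hypothetical sample input, made up for illustration only.
# Test-time records carry no "topic" field.
sample = [
    "2",
    '{"question": "How do I choose a resistor for an LED?", '
    '"excerpt": "I have a 5V supply and a red LED."}',
    '{"question": "Which lens works best for portraits?", '
    '"excerpt": "I shoot with a crop-sensor camera."}',
]

records = parse_records(sample)

# Join title and excerpt into one text per question, ready for vectorizing.
texts = [r["question"] + "\n" + r["excerpt"] for r in records]
print(len(texts))
```

Joining the title and excerpt into a single string keeps the feature extraction simple, at the cost of treating both fields as equally informative.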

```python
import json

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Read training data: the first line holds the record count,
# followed by one JSON object per line.
text_data = []
labels = []
with open('training.json') as tr:
    for _ in range(int(tr.readline())):
        record = json.loads(tr.readline())
        text_data.append(record["question"] + "\n" + record["excerpt"])
        labels.append(record["topic"])

# Build bag-of-words features and fit a linear SVM on them
vect = CountVectorizer(lowercase=True, stop_words="english")
vec_train = vect.fit_transform(text_data)
svm = LinearSVC()
svm.fit(X=vec_train, y=labels)

# Read the N unlabeled questions from standard input
N = int(input())
new_text = []
for _ in range(N):
    record = json.loads(input())
    new_text.append(record["question"] + "\n" + record["excerpt"])

# Vectorize with the vocabulary learned during training, then predict
vec_test = vect.transform(new_text)
predictions = svm.predict(vec_test)

# Print one predicted topic per line
for pred in predictions:
    print(pred)
```
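
The solution above trains on the full training file and never measures accuracy. A minimal sketch of a hold-out check follows, assuming a stratified split is acceptable for this data. The toy corpus and the names `toy_texts`, `toy_labels`, `val_vect`, and `val_svm` are invented for illustration; in practice `text_data` and `labels` would take the place of the toy lists.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Toy corpus, invented for illustration; not part of the challenge data.
toy_texts = [
    "How do I pick a resistor for an LED circuit?",
    "What capacitor value smooths a 5V power supply?",
    "Why does my transistor amplifier circuit overheat?",
    "How can I measure voltage across a resistor?",
    "Which lens aperture gives a blurry background?",
    "How do I set shutter speed for night photos?",
    "What camera lens is best for portrait photos?",
    "Why are my photos overexposed at low shutter speed?",
]
toy_labels = ["electronics"] * 4 + ["photography"] * 4

# Hold out a quarter of the questions, keeping the class balance.
X_train, X_val, y_train, y_val = train_test_split(
    toy_texts, toy_labels, test_size=0.25, random_state=0, stratify=toy_labels
)

# Fit the vectorizer on the training split only, so the validation
# questions do not leak into the vocabulary.
val_vect = CountVectorizer(lowercase=True, stop_words="english")
val_svm = LinearSVC()
val_svm.fit(val_vect.fit_transform(X_train), y_train)

# Score the held-out questions.
val_accuracy = accuracy_score(y_val, val_svm.predict(val_vect.transform(X_val)))
print(f"validation accuracy: {val_accuracy:.2f}")
```

With only eight toy questions the score itself means little; the point is the shape of the check, which can guide choices such as the vectorizer settings before submitting.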

