DUPLICATE-QUESTION-PREDICTION-IN-QUORA-DATASET

This is a binary classification problem: for a given pair of questions, we need to predict whether they are duplicates.

OBSERVATIONS:

Procedure:

1] First, we combine all the features engineered earlier.
2] Before building a model, we combine question1 and question2 into one dataframe and then add all the engineered features to it.
3] We then split this dataframe into train and test sets (70:30).
4] Next, we apply TfidfVectorizer to the text data, i.e. the combination of question1 and question2.
5] We then hstack the TF-IDF features with the engineered features.
6] On these features we build Logistic Regression, Linear SVM and XGBoost models. The XGBoost model gave a test log loss of 0.316.
7] Finally, we apply TF-IDF weighted Word2Vec (W2V) to the text data (the combination of question1 and question2); an XGBoost model built on these features gives a test log loss of 0.3157.
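Steps 2 to 5 can be sketched as follows. This is a minimal, standard-library-only illustration, not the code from the notebooks: the helper names (`tokenize`, `fit_idf`, `tfidf_vector`, `build_features`) and the whitespace tokenizer are hypothetical. The IDF formula mirrors scikit-learn's default `TfidfVectorizer` settings (smoothed IDF, L2-normalised rows), and the vocabulary is fitted on the training split only, consistent with splitting before vectorising.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Hypothetical lowercase/whitespace tokenizer; the notebooks' preprocessing may differ.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed IDF, same formula as scikit-learn's TfidfVectorizer default:
    # idf(t) = ln((1 + n) / (1 + df(t))) + 1
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf_vector(doc, idf, vocab):
    # Raw term counts times IDF, then L2-normalised; unseen words are ignored.
    counts = Counter(tokenize(doc))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_features(pairs, engineered, train_frac=0.7, seed=42):
    """pairs: list of (question1, question2); engineered: one feature list per pair."""
    idx = list(range(len(pairs)))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * len(idx))  # 70:30 split by default
    train_idx, test_idx = idx[:cut], idx[cut:]

    # Combine question1 and question2 into a single text per pair.
    texts = [q1 + " " + q2 for q1, q2 in pairs]

    # Fit vocabulary and IDF on the training split only.
    idf = fit_idf([texts[i] for i in train_idx])
    vocab = sorted(idf)

    def row(i):
        # "hstack": TF-IDF columns followed by the engineered feature columns.
        return tfidf_vector(texts[i], idf, vocab) + list(engineered[i])

    return [row(i) for i in train_idx], [row(i) for i in test_idx], vocab
```

In practice the TF-IDF matrix is large and sparse, so a real pipeline would keep it as a sparse matrix (e.g. `scipy.sparse.hstack`) rather than dense Python lists; the lists here only keep the sketch dependency-free.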
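The test scores reported above (0.316 and 0.3157) are binary log loss values, where lower is better. Assuming the standard natural-log definition (the one used by scikit-learn's `sklearn.metrics.log_loss`), the metric can be sketched as below; the `eps` clipping value is an illustrative choice. For reference, a model that always predicts 0.5 scores ln 2, about 0.693.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    # Mean binary cross-entropy: -(1/N) * sum(y*ln(p) + (1-y)*ln(1-p)).
    # Probabilities are clipped away from 0 and 1 so ln() stays finite.
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```

Because log loss penalises confident wrong predictions heavily, it rewards well-calibrated probabilities rather than just correct labels.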
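Step 7 replaces the sparse TF-IDF columns with one dense vector per text: the average of its word vectors, each weighted by that word's TF-IDF score. The sketch below assumes hypothetical inputs `word_vectors` (a word to vector mapping from some pretrained embedding; the README does not say which) and `idf` (IDF weights fitted on the training text). It can be applied to the combined question text or to each question separately.

```python
from collections import Counter


def tfidf_weighted_w2v(tokens, word_vectors, idf, dim):
    """TF-IDF weighted average of word vectors for one tokenised text.

    tokens: list of words; word_vectors: dict word -> list[float] of length dim;
    idf: dict word -> IDF weight. Words missing from either dict are skipped.
    """
    counts = Counter(tokens)
    acc = [0.0] * dim
    weight_sum = 0.0
    for word, c in counts.items():
        if word in word_vectors and word in idf:
            # TF-IDF weight = count * idf (dividing by the text length would
            # cancel out in the normalisation below).
            w = c * idf[word]
            vec = word_vectors[word]
            for k in range(dim):
                acc[k] += w * vec[k]
            weight_sum += w
    # Normalise by the total weight; all-unknown texts fall back to a zero vector.
    return [a / weight_sum for a in acc] if weight_sum else acc
```

For example, with `{"cat": [1, 0], "dog": [0, 1]}` as vectors and IDF weights 1.0 and 3.0, the tokens `["cat", "dog", "unk"]` map to `[0.25, 0.75]`: the rarer word "dog" dominates, and the unknown token is ignored.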

Notebooks:

1.EDA.ipynb
2.Quora_Preprocessing.ipynb
3.Q_Mean_W2V.ipynb
4.ML_models_.ipynb
Duplicate question prediction in Quora Dataset.ipynb

Repository: sahildigikar15/Duplicate-Question-Detection