**Quora Question Pairs**

Where else but Quora can a physicist help a chef with a math problem and get cooking tips in return? Quora is a place to gain and share knowledge—about anything. It’s a platform to ask questions and connect with people who contribute unique insights and quality answers. This empowers people to learn from each other and to better understand the world.

Over 100 million people visit Quora every month, so it's no surprise that many people ask similarly worded questions. Multiple questions with the same intent can cause seekers to spend more time finding the best answer to their question, and make writers feel they need to answer multiple versions of the same question. Quora values canonical questions because they provide a better experience to active seekers and writers, and offer more value to both of these groups in the long term.

Currently, Quora uses a Random Forest model to identify duplicate questions. In this competition, Kagglers are challenged to tackle this natural language processing problem by applying advanced techniques to classify whether question pairs are duplicates or not. Doing so will make it easier to find high-quality answers to questions, resulting in an improved experience for Quora writers, seekers, and readers.

Acknowledgements: Data and descriptions modified from https://www.kaggle.com/c/quora-question-pairs

In [1]:
import numpy as np
import pandas as pd
import sklearn as sl

In [45]:
# import datasets
df_train = pd.read_csv('data/train_data.csv')
df_test = pd.read_csv('data/test_data.csv')
train_label = pd.read_csv('data/train_labels.csv')

In [46]:
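# Drop the is_duplicate column shipped in train_data.csv so the one from train_labels.csv replaces it
# (pandas 2.x no longer accepts axis as a positional argument, so use the columns= keyword)
df_train = df_train.drop(columns='is_duplicate')
df_train = df_train.merge(train_label, on="id")

# Give every question its own id: odd numbers for question1, even numbers for question2
df_train['id1'] = range(1, 2*len(df_train), 2)
df_train['id2'] = range(2, 2*len(df_train)+2, 2)
df_train = df_train[['id1', 'question1', 'id2', 'question2', 'is_duplicate']]
df_train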

|  | id1 | question1 | id2 | question2 | is_duplicate |
|---|---|---|---|---|---|
| 0 | 1 | What is the step by step guide to invest in sh... | 2 | What is the step by step guide to invest in sh... | 0 |
| 1 | 3 | What is the story of Kohinoor (Koh-i-Noor) Dia... | 4 | What would happen if the Indian government sto... | 0 |
| 2 | 5 | How can I increase the speed of my internet co... | 6 | How can Internet speed be increased by hacking... | 0 |
| 3 | 7 | Why am I mentally very lonely? How can I solve... | 8 | Find the remainder when [math]23^{24}[/math] i... | 0 |
| 4 | 9 | Which one dissolve in water quikly sugar, salt... | 10 | Which fish would survive in salt water? | 0 |
| 5 | 11 | Astrology: I am a Capricorn Sun Cap moon and c... | 12 | I'm a triple Capricorn (Sun, Moon and ascendan... | 1 |
| 6 | 13 | Should I buy tiago? | 14 | What keeps childern active and far from phone ... | 0 |
| 7 | 15 | How can I be a good geologist? | 16 | What should I do to be a great geologist? | 1 |
| 8 | 17 | When do you use シ instead of し? | 18 | When do you use "&" instead of "and"? | 0 |
| 9 | 19 | Motorola (company): Can I hack my Charter Moto... | 20 | How do I hack Motorola DCX3400 for free internet? | 0 |
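The relabeling and id scheme in the cell above can be checked on a small frame. This is a minimal sketch on hypothetical toy data, assuming (as the `drop` call implies) that `train_data.csv` carries an `is_duplicate` column that `train_labels.csv` supersedes; apart from the geologist pair, the question strings are placeholders.

```python
import pandas as pd

# Hypothetical stand-ins for data/train_data.csv and data/train_labels.csv
df_train = pd.DataFrame({
    "id": [0, 1, 2],
    "question1": ["How can I be a good geologist?", "Q1 placeholder", "Q1 placeholder"],
    "question2": ["What should I do to be a great geologist?", "Q2 placeholder", "Q2 placeholder"],
    "is_duplicate": [None, None, None],  # assumed stale column, replaced below
})
train_label = pd.DataFrame({"id": [0, 1, 2], "is_duplicate": [1, 0, 0]})

# Swap in the labels from the separate file, joined on the shared id column
df_train = df_train.drop(columns="is_duplicate").merge(train_label, on="id")

# Pair k (0-based) gets question ids 2k+1 and 2k+2, so every question id is unique
n = len(df_train)
df_train["id1"] = range(1, 2 * n, 2)
df_train["id2"] = range(2, 2 * n + 2, 2)
df_train = df_train[["id1", "question1", "id2", "question2", "is_duplicate"]]
print(df_train[["id1", "id2", "is_duplicate"]])
```

Because `merge` defaults to an inner join, any row of `train_data.csv` whose `id` is missing from `train_labels.csv` would be dropped silently; passing `validate="one_to_one"` to `merge` makes pandas raise if either file repeats an id.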


In [42]:
df_train.groupby('is_duplicate').count()

| is_duplicate | id1 | question1 | id2 | question2 |
|---|---|---|---|---|
| 0 | 203971 | 203970 | 203971 | 203968 |
| 1 | 119193 | 119193 | 119193 | 119193 |
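Because `groupby(...).count()` counts only non-null values, the gaps between `id1`/`id2` and the question columns in this output reveal missing questions, and the class sizes give the duplicate rate. The sketch below first shows the counting behaviour on hypothetical toy data, then does the arithmetic on the counts copied from the table. On the real frame, `df_train.isna().sum()` and `df_train['is_duplicate'].value_counts(normalize=True)` would report the same information directly.

```python
import pandas as pd

# Toy frame (hypothetical): one missing question1 in class 0
toy = pd.DataFrame({
    "id1": [1, 3, 5],
    "question1": ["a", None, "c"],
    "is_duplicate": [0, 0, 1],
})
toy_counts = toy.groupby("is_duplicate").count()
print(toy_counts)  # class 0: id1 counted twice, question1 only once (None is skipped)

# Counts copied from the groupby output above
counts = pd.DataFrame(
    {"id1": [203971, 119193], "question1": [203970, 119193],
     "id2": [203971, 119193], "question2": [203968, 119193]},
    index=pd.Index([0, 1], name="is_duplicate"),
)
missing_q1 = counts["id1"] - counts["question1"]  # missing question1 per class
missing_q2 = counts["id2"] - counts["question2"]  # missing question2 per class
dup_rate = counts.loc[1, "id1"] / counts["id1"].sum()  # share of duplicate pairs

print(missing_q1.tolist(), missing_q2.tolist())
print(f"duplicate rate: {dup_rate:.3f}")
```

From these counts, one `question1` and three `question2` values are missing, all in non-duplicate pairs, and about 36.9% of the 323,164 pairs are labelled duplicates, so the classes are moderately imbalanced.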


In [48]:
df_train.loc[df_train['is_duplicate'] == 1]

|  | id1 | question1 | id2 | question2 | is_duplicate |
|---|---|---|---|---|---|
| 5 | 11 | Astrology: I am a Capricorn Sun Cap moon and c... | 12 | I'm a triple Capricorn (Sun, Moon and ascendan... | 1 |
| 7 | 15 | How can I be a good geologist? | 16 | What should I do to be a great geologist? | 1 |
| 11 | 23 | How do I read and find my YouTube comments? | 24 | How can I see all my Youtube comments? | 1 |
| 12 | 25 | What can make Physics easy to learn? | 26 | How can you make physics easy to learn? | 1 |
| 13 | 27 | What was your first sexual experience like? | 28 | What was your first sexual experience? | 1 |
| 15 | 31 | What does manipulation mean? | 32 | What does manipulation means? | 1 |
| 17 | 35 | Why are so many Quora users posting questions ... | 36 | Why do people ask Quora questions which can be... | 1 |
| 25 | 51 | How should I prepare for CA final law? | 52 | How one should know that he/she completely pre... | 1 |
| 27 | 55 | What are some special cares for someone with a... | 56 | How can I keep my nose from getting stuffy at ... | 1 |
| 28 | 57 | What Game of Thrones villain would be the most... | 58 | What Game of Thrones villain would you most li... | 1 |
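Many duplicate pairs above share most of their words, which suggests a simple hand-crafted feature. The sketch below computes the Jaccard overlap of lowercase word sets for three complete pairs copied from the tables above. It is an illustrative baseline feature, not the method used by Quora or by this notebook, and the regex tokenizer (`\w+`, Unicode-aware) is an assumption.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a deliberately simple tokenizer (assumption)."""
    return set(re.findall(r"\w+", text.lower()))


def word_share(q1: str, q2: str) -> float:
    """Jaccard overlap |A & B| / |A | B| of the two word sets (0.0 if both are empty)."""
    a, b = tokens(q1), tokens(q2)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# (question1, question2, is_duplicate) taken from the outputs above
pairs = [
    ("How can I be a good geologist?", "What should I do to be a great geologist?", 1),
    ("What can make Physics easy to learn?", "How can you make physics easy to learn?", 1),
    ("When do you use シ instead of し?", 'When do you use "&" instead of "and"?', 0),
]
for q1, q2, label in pairs:
    print(f"{word_share(q1, q2):.2f}  is_duplicate={label}")
```

The geologist duplicate scores about 0.33 while the シ/し non-duplicate scores about 0.67, so word overlap alone separates these pairs poorly; it would more plausibly serve as one feature among several fed to a classifier such as the Random Forest mentioned above.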
