# Quick and Easy spaCy Lemmatizer

This is my first competition and I learned a lot from the kernels and discussions. In this notebook and the next I will try to give back a little bit to the community by sharing some examples.

spaCy is a great tool and there are already some pretty detailed kernels on it. Yet I didn't find them easy to understand for someone new to spaCy. My intent here is to make this as clear and direct as possible.

I don't use stemmers because I feel we lose too much information.
My initial goal was actually to find something that would make the questions more similar to Chinese. From the basics I know of Chinese, one would say: I eat, he eat, yesterday he eat, tomorrow they eat, etc. The verb does not change, and yet we perfectly understand the meaning of the sentence. The lemmatizer that comes with spaCy gave me a very similar result and added some perks (e.g. removing plurals).
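
To make the idea concrete, here is a toy sketch (not spaCy) that maps inflected forms to a base form with a small hand-written lookup table, then compares two questions by word overlap (Jaccard similarity). `TOY_LEMMAS`, `toy_lemmatize` and `jaccard` are hypothetical helpers for illustration only; spaCy's real lemmatizer relies on much larger lookup tables and, depending on the version, part-of-speech rules.

```python
# Hypothetical toy lookup table: inflected form -> base form.
TOY_LEMMAS = {
    "eats": "eat", "ate": "eat", "eating": "eat", "eaten": "eat",
    "is": "be", "are": "be", "was": "be", "were": "be",
    "apples": "apple", "girls": "girl", "friends": "friend",
}

def toy_lemmatize(text):
    """Lowercase, split on whitespace and map each word to its base form."""
    return [TOY_LEMMAS.get(word, word) for word in text.lower().split()]

def jaccard(a, b):
    """Share of distinct words that two token lists have in common."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

q1 = "He eats apples"
q2 = "Yesterday he ate apples"

raw = jaccard(q1.lower().split(), q2.lower().split())
lemmatized = jaccard(toy_lemmatize(q1), toy_lemmatize(q2))
print(f"raw overlap: {raw:.2f}, lemmatized overlap: {lemmatized:.2f}")
# raw overlap: 0.40, lemmatized overlap: 0.75
```

Once "eats" and "ate" both become "eat" and "apples" becomes "apple", the two questions share more words, which is the kind of normalization that helps when comparing question pairs.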

And as Anokas says: "**if this helped you, some upvotes would be very much appreciated - that's where I get my motivation! :D**"

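```python
# Uncomment to install spaCy and an English model if they are missing.
# !pip install spacy
# !python -m spacy download en_core_web_sm
```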

```python
import os

import pandas as pd
import spacy

# List the competition files (Kaggle mounts them under ../input).
print(os.listdir("../input"))

# 'en' was a spaCy v2 shortcut link; spaCy v3 removed shortcuts, so load the
# model package by name. The parser and NER are not needed for lemmas.
# Note: spaCy v2 lemmatizes pronouns to '-PRON-' (as in the outputs below);
# spaCy v3 keeps the pronoun itself.
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

def lemmatizer(text):
    """Return the text with every token replaced by its spaCy lemma."""
    doc = nlp(text)
    return " ".join(token.lemma_ for token in doc)

TEST_DATA_FILE = "../input/test.csv"

dataTest = pd.read_csv(TEST_DATA_FILE, sep=",", encoding="utf-8")

# Replace missing questions with a placeholder, then lemmatize each column in place.
for col in ["question1", "question2"]:
    print("lemma", col)
    dataTest[col] = dataTest[col].fillna("empty").astype(str).apply(lemmatizer)
```
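
Calling `nlp` once per row is the simplest approach, but on a dataset this size it is slow. spaCy's `nlp.pipe` processes a stream of texts in batches, which is usually noticeably faster. A minimal sketch, assuming an already-loaded pipeline is passed in as `nlp`; `lemmatize_texts` and the batch size of 1000 are illustrative choices, not part of the original notebook.

```python
def lemmatize_texts(texts, nlp, batch_size=1000):
    """Lemmatize an iterable of strings with nlp.pipe, one output string per input."""
    # nlp.pipe yields one Doc per input text, in the same order as the inputs.
    return [
        " ".join(token.lemma_ for token in doc)
        for doc in nlp.pipe(texts, batch_size=batch_size)
    ]
```

With the notebook's `dataTest` and `nlp`, usage would look like `dataTest["question1"] = lemmatize_texts(dataTest["question1"].fillna("empty").astype(str), nlp)`. Passing the pipeline as an argument also makes the function easy to try out with a small stand-in object.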

Since spaCy isn't available in this notebook's environment, I can't display the results directly, but here is some output from this code on question1 of the training set.

## Outputs

    what be the step by step guide to invest in share market in india ?
    what be the story of kohinoor ( koh - i - noor ) diamond ?
    how can -PRON- increase the speed of -PRON- internet connection while use a vpn ?
    why be -PRON- mentally very lonely ? how can -PRON- solve -PRON- ?
    which one dissolve in water quikly sugar , salt , methane and carbon di oxide ?
    astrology : -PRON- be a capricorn sun cap moon and cap rise ... what do that say about -PRON- ?
    should -PRON- buy tiago ?
    how can -PRON- be a good geologist ?
    when do -PRON- use シ instead of し ?
    motorola ( company ) : can -PRON- hack -PRON- charter motorolla dcx3400 ?
    method to find separation of slit use fresnel biprism ?
    how do -PRON- read and find -PRON- youtube comment ?
    what can make physics easy to learn ?
    what be -PRON- first sexual experience like ?
    what be the law to change -PRON- status from a student visa to a green card in the us , how do -PRON- compare to the immigration law in canada ?
    what would a trump presidency mean for current international master ’s student on an f1 visa ?
    what do manipulation mean ?
    why do girl want to be friend with the guy -PRON- reject ?
    why be so many quora user post question that be readily answer on google ?
    which be the good digital marketing institution in banglore ?
    why do rocket look white ?
    what be cause someone to be jealous ?
    what be the question should not ask on quora ?
    how much be 30 kv in hp ?
    what do -PRON- mean that every time -PRON- look at the clock the number be the same ?
    what be some tip on make -PRON- through the job interview process at medicines ?
    what be web application ?
    do society place too much importance on sport ?
    what be good way to make money online ?
    how should -PRON- prepare for ca final law ?
    what be one thing -PRON- would like to do good ?
    what be some special care for someone with a nose that get stuffy during the night ?
    what game of thrones villain would be the most likely to give -PRON- mercy ?
    do the united states government still blacklist ( employment , etc . ) some united states citizen because -PRON- political view ?
    what be the good travel website in spain ?
    why do some people think obama will try to take -PRON- gun away ?
    -PRON- be a 19-year - old . how can -PRON- improve -PRON- skill or what should -PRON- do to become an entrepreneur in the next few year ?
    when a girlfriend ask -PRON- boyfriend " why do -PRON- choose -PRON- ? what make -PRON- want to be with -PRON- ? " , what should one reply to -PRON- ?
    how do -PRON- prepare for upsc ?
    what be the stall speed and aoa of an f-14 with wing fully sweep back ?
    why do slavs squat ?
    when can -PRON- expect -PRON- cognizant confirmation mail ?
    can -PRON- make 50,000 a month by day trading ?
    be be a good kid and not be a rebel worth -PRON- in the long run ?
    what university do rexnord recruit new grad from ? what major be -PRON- look for ?
    what be the quick way to increase instagram follower ?
    how do darth vader fight darth maul in star wars legends ?
    what be the stage of break up between couple ? -PRON- mean , what happen after the breaking up emotionally whether -PRON- a male or female ?
    what be some example of product that can be make from crude oil ?
    how do -PRON- make friend .
    be career launcher good for rbi grade b preparation ?
    will a blu ray play on a regular dvd player ? if so , how ?
    nd -PRON- be always sad ?
    what be the good / most memorable thing -PRON- have ever eat and why ?
    how gst affect the ca and tax officer ?
    how difficult be -PRON- get into rsi ?
    who be israil friend ?
    what be some good rap song to dance to ?
    -PRON- be suddenly log off gmail . -PRON- can not remember -PRON- gmail password and just realize the recovery email be no longer alive . what can -PRON- do ?
    what be the good way to learn french ?
    how do -PRON- download content from a kickass torrent without registration ?
    be -PRON- normal to have a dark ring around the iris of -PRON- eye ?
    how be the new harry potter book ' harry potter and the cursed child ' ?
    why do -PRON- always get depressed ?
    where can -PRON- find a european family office database ?
    what be java programming ? how to learn java programming language ?
    what be the good book ever make ?
    can -PRON- ever store energy produce in lightning ?
    what be -PRON- review of performance testing ?
    at what cost do so much privacy as in germany come ? what else be lose to gain so much privacy ?
    what be the type of immunity ?
    what be a narcissistic personality disorder ?
    how -PRON- can speak english fluently ?
    how helpful be quickbooks ' auto datum recovery support phone number to recover -PRON- corrupt data file ?
    who be the rich gambler of all time and how can -PRON- reach -PRON- level ?
    if -PRON- fire a bullet backward from an aircraft go faster than the bullet ; will the bullet be go backwards ?
    how do -PRON- prevent breast cancer ?
    how do -PRON- log out of -PRON- gmail account on -PRON- friend 's phone ?
    how can -PRON- make money through the internet ?
    what be purpose of life ?
    when will the bjp government strip all the muslims and the christians of the indian citizenship and put -PRON- on boat like the rohingya 's of burma ?
    what be the right etiquette for wish a jehovah witness happy birthday ?
    if someone want to open a commercial fm radio station in any city of india , how much do -PRON- cost and what be the procedure ?
    why do swiss despise asians ?
    what be some of the high salary income job in the field of biotechnology ?
    how can -PRON- increase -PRON- height after 21 also ?
    what be the major effect of the cambodia earthquake , and how do these effect compare to the kamchatca earthquake in 1952 ?
    what be the difference between sincerity and fairness ?
    which be the good gaming laptop under 60k inr ?
    what be -PRON- review of the next warrior : prove grounds - part 9 ?
    what be the good reference book for physics class 11th ?
    national institute of technology , kurukshetra : how be the social life at nitk , surathkal ?
    what be some of the good romantic movie in english ?
    what cause a nightmare ?
    what be abstract expressionism in painting ?
    how do 3d print work ?
    what be -PRON- like to attend caltech with jeremy ehrhardt ?
    why do harry become a horcrux ?
    what be the good associate product manager ( apm ) program that someone in -PRON- early 20 can join to learn product management and have a rewarding career in the company ?
    why be the number for skype at 1 - 855 - 425 - 3768 always busy ?
    will there really be any war between india and pakistan over the uri attack ? what will be -PRON- effect ?
    do ronald reagan have a mannerism in -PRON- speech ?
    what be the war strategy of the union and the confederate during the civil war ?
    which be the good fiction novel of 2016 ?
    can -PRON- recover -PRON- email if -PRON- forget the password ?
    will the recent demonetisation result in high gdp ? if so how much ?
    have -PRON- ever hear of travel hacking ?
    what be the difference between love and pity ?
    how competitive be the hiring process at republic bank ?
    how google help in spam rank adjustment of the search result ?
    where can -PRON- watch gonulcelen with english subtitle ?
    be usa the most powerful country of the world ?
    how do -PRON- obtain an instant ulcer pain relief ?
    what do -PRON- think china food ?
    what do take advantage of someone mean ?
    why do -PRON- cry when -PRON- be happy and when -PRON- be sad ?
    why do some girl like to stick -PRON- -PRON- tongue out when take picture ?
    do -PRON- find the ending of the novel " 1984 " depress ?
    what be some mind - blow computer tool that exist that most people do not know about ?
    should the toothbrush be wet or dry before apply the toothpaste ?
    why -PRON- question be mark as need imrovement ?
    what be the difference between a neutral state and a buffer state ?
    what mineral hold the high electrical charge ?
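
Every pronoun in the output above is collapsed to the placeholder `-PRON-`, which is how spaCy v2 lemmatizes pronouns (spaCy v3 dropped the placeholder). For duplicate detection that may be exactly what you want, but if you would rather keep the pronoun, one option is to fall back to the lowercased token text whenever the lemma is `-PRON-`. A minimal sketch that works on (text, lemma) pairs so it runs without spaCy; `pick_lemma` and `join_lemmas` are hypothetical helper names, and inside the notebook's `lemmatizer` the pairs would come from `(token.text, token.lemma_)` for each token in `doc`.

```python
PRON_PLACEHOLDER = "-PRON-"  # spaCy v2's lemma for every pronoun

def pick_lemma(text, lemma):
    """Keep the lemma unless it is the pronoun placeholder; then keep the word itself."""
    return text.lower() if lemma == PRON_PLACEHOLDER else lemma

def join_lemmas(pairs):
    """Join (token text, lemma) pairs into one lemmatized string."""
    return " ".join(pick_lemma(text, lemma) for text, lemma in pairs)

# Illustrative pairs modeled on the spaCy v2 outputs above.
example = [("How", "how"), ("can", "can"), ("I", "-PRON-"),
           ("increase", "increase"), ("my", "-PRON-"), ("speed", "speed"), ("?", "?")]
print(join_lemmas(example))  # how can i increase my speed ?
```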

