# NLP Questions 

### Part of Speech Tagging

- Part-of-speech (POS) tagging is also called grammatical tagging or word-category disambiguation.
- It is the process of marking up each word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context, i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph.
- A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.

Identifying part-of-speech tags is much more complicated than mapping each word to a fixed tag, because the same word can take different tags in different sentences depending on its context. A static word-to-tag lookup table is therefore not enough on its own.

Part-of-speech tagging in itself may not be the solution to any particular NLP problem. It is, however, commonly done as a prerequisite that simplifies many other problems. Let us consider a few applications of POS tagging in various NLP tasks.

1. Text to Speech Conversion

Let us look at the following sentence:

`They refuse to permit us to obtain the refuse permit.`

The word refuse appears twice in this sentence with two different meanings. refUSE (/rəˈfyo͞oz/) is a verb meaning “deny,” while REFuse (/ˈrefˌyo͞os/) is a noun meaning “trash” (that is, they are not homophones). Thus, we need to know which word is being used in order to pronounce the text correctly. (For this reason, text-to-speech systems usually perform POS tagging.)
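
As a minimal sketch of why the tag matters for pronunciation, the lookup below is keyed by both the word and its POS tag. The lexicon, the tag names (`VERB`, `NOUN`, and so on), and the hand-assigned tags are assumptions made for illustration; the transcriptions are the ones given above.

```python
# Hypothetical pronunciation lexicon keyed by (word, POS tag).
# Transcriptions come from the paragraph above; the tag names are assumptions.
PRONUNCIATIONS = {
    ("refuse", "VERB"): "rəˈfyo͞oz",
    ("refuse", "NOUN"): "ˈrefˌyo͞os",
}


def pronounce(tagged_tokens):
    """Return a pronunciation for each (word, tag) pair, or the word itself if unlisted."""
    return [PRONUNCIATIONS.get((word.lower(), tag), word) for word, tag in tagged_tokens]


# The example sentence, tagged by hand (not by a real tagger).
sentence = [
    ("They", "PRON"), ("refuse", "VERB"), ("to", "PART"), ("permit", "VERB"),
    ("us", "PRON"), ("to", "PART"), ("obtain", "VERB"), ("the", "DET"),
    ("refuse", "NOUN"), ("permit", "NOUN"),
]
result = pronounce(sentence)
print(result[1], result[8])  # the two occurrences of "refuse" are pronounced differently
```

A lookup keyed on the word alone could not distinguish the two occurrences of "refuse".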

2. Word Sense Disambiguation

Words often occur in different senses as different parts of speech. For example:

- She saw a `bear`.
- Your efforts will `bear` fruit.

The word bear in the above sentences has completely different senses, but more importantly, one is a noun and the other is a verb. Rudimentary word sense disambiguation is possible if you can tag words with their POS tags.

Word-sense disambiguation (WSD) is identifying which sense of a word (that is, which meaning) is used in a sentence, when the word has multiple meanings.

Try to think of the multiple meanings for this sentence:

`Time flies like an arrow`

Here are the various interpretations of the given sentence. The meaning and hence the part-of-speech might vary for each word.

![image.png](attachment:image.png)

The above example shows that a single sentence can have three different POS tag sequences that are all plausible. To interpret the sentence correctly, we need to know which meaning is intended wherever it appears. This is a form of word sense disambiguation, since we are trying to find the one intended sequence.
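
To make the ambiguity concrete, here is a small sketch that enumerates every tag sequence a toy lexicon allows for this sentence. The lexicon and tag names are hypothetical, and most of the enumerated sequences are not grammatical readings; the point is only that per-word ambiguity multiplies, so a tagger has to choose among many candidates.

```python
from itertools import product

# Hypothetical toy lexicon: the tags each word could take in isolation.
LEXICON = {
    "time": ["NOUN", "VERB"],
    "flies": ["NOUN", "VERB"],
    "like": ["ADP", "VERB"],
    "an": ["DET"],
    "arrow": ["NOUN"],
}


def candidate_tag_sequences(words):
    """Enumerate every tag sequence the lexicon allows for the given words."""
    options = [LEXICON[word.lower()] for word in words]
    return [tuple(sequence) for sequence in product(*options)]


words = "Time flies like an arrow".split()
candidates = candidate_tag_sequences(words)
print(len(candidates))  # 2 * 2 * 2 * 1 * 1 = 8
```

The three readings usually given for this sentence (time passes quickly, an instruction to time the flies, and "time flies" being fond of an arrow) all appear among these eight candidates.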

There are other applications as well which require POS tagging, like Question Answering, Speech Recognition, Machine Translation, and so on.

__POS-tagging algorithms fall into two distinct groups:__

1. __Rule-Based POS Taggers__

    Automatic part of speech tagging is an area of natural language processing where statistical techniques have been more successful than rule-based methods.

    Typical rule-based approaches use contextual information to assign tags to unknown or ambiguous words. Disambiguation is done by analyzing the linguistic features of the word, its preceding word, its following word, and other aspects.

    For example, if the preceding word is an article, then the word in question is likely to be a noun or a noun modifier such as an adjective. This information is coded in the form of rules.

    Example of a rule: `If an ambiguous/unknown word X is preceded by a determiner and followed by a noun, tag it as an adjective.`

    Defining a set of rules manually is an extremely cumbersome process and is not scalable at all. So we need some automatic way of doing this.

    Brill’s tagger is a rule-based tagger that goes through the training data and finds the set of tagging rules that best describes the data and minimizes POS tagging errors. The most important point about Brill’s tagger is that the rules are not hand-crafted but are learned from the provided corpus. The only feature engineering required is a set of rule templates that the model uses to generate candidate rules.


2. __Stochastic POS Taggers__

    The term ‘stochastic tagger’ can refer to any number of different approaches to the problem of POS tagging. Any model which somehow incorporates frequency or probability may be properly labelled stochastic.

    The simplest stochastic taggers disambiguate words based solely on the probability that a word occurs with a particular tag. In other words, the tag encountered most frequently in the training set with the word is the one assigned to an ambiguous instance of that word. The problem with this approach is that while it may yield a valid tag for a given word, it can also yield inadmissible sequences of tags.

    An alternative to the word frequency approach is to calculate the probability of a given sequence of tags occurring. This is sometimes referred to as the n-gram approach, referring to the fact that the best tag for a given word is determined by the probability that it occurs with the n previous tags. This approach makes much more sense than the one defined before, because it considers the tags for individual words based on context.
    
    The next level of complexity that can be introduced into a stochastic tagger combines the previous two approaches, using both tag sequence probabilities and word frequency measurements. This is known as the __Hidden Markov Model (HMM).__
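
The simplest stochastic tagger described above (assign each word its most frequent training tag) and the example rule quoted under rule-based taggers can be combined in one short sketch. The toy corpus, the tag names, and the `NOUN` fallback for unknown words are assumptions made for illustration, not part of any particular library.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(words, model):
    """Tag known words from the model, then apply one contextual rule to unknown words."""
    tags = [model.get(word.lower()) for word in words]  # None marks an unknown word
    for i, current in enumerate(tags):
        if current is None:
            prev_tag = tags[i - 1] if i > 0 else None
            next_tag = tags[i + 1] if i + 1 < len(tags) else None
            # Rule from the text: determiner + X + noun => tag X as an adjective.
            if prev_tag == "DET" and next_tag == "NOUN":
                tags[i] = "ADJ"
            else:
                tags[i] = "NOUN"  # assumed fallback for any other unknown word
    return list(zip(words, tags))


# Hypothetical hand-tagged training corpus.
train = [
    [("the", "DET"), ("dog", "NOUN"), ("can", "AUX"), ("run", "VERB")],
    [("the", "DET"), ("can", "NOUN"), ("fell", "VERB")],
    [("she", "PRON"), ("can", "AUX"), ("swim", "VERB")],
]
model = train_most_frequent_tag(train)
print(tag(["the", "shiny", "dog", "can", "run"], model))
print(tag(["the", "can", "fell"], model))
```

In the first sentence the unknown word "shiny" is tagged `ADJ` by the rule. In the second, "can" is tagged `AUX` even though it is a noun there, because the frequency baseline ignores context; this is the weakness that the tag-sequence approaches address.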
    

#### Different Tagging Methods

1. HMM Tagger
2. SVM Tagger
3. TnT tagger
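
For the HMM tagger listed above, the following is a minimal sketch of a bigram HMM decoded with the Viterbi algorithm. It combines tag-sequence probabilities (transitions) with word-given-tag probabilities (emissions), as described earlier. The toy corpus, the `<s>` start marker, and the add-one smoothing (used here so that unseen words and transitions keep a small probability) are assumptions; production taggers use larger corpora and more careful smoothing.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions from tagged data."""
    transitions = defaultdict(Counter)  # previous tag -> counts of the next tag
    emissions = defaultdict(Counter)    # tag -> counts of the words it emits
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # assumed start-of-sentence marker
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word.lower()] += 1
            vocab.add(word.lower())
            prev = tag
    return transitions, emissions, vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability, so unseen events are unlikely but possible."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, transitions, emissions, vocab):
    """Return the most probable tag sequence for a non-empty list of words."""
    tags = list(emissions)
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 reserves mass for unknown words
    best = [{}]  # best[i][tag] = (log score of best path ending in tag, previous tag)
    for t in tags:
        start = log_prob(transitions["<s>"], t, n_tags)
        emit = log_prob(emissions[t], words[0].lower(), n_words)
        best[0][t] = (start + emit, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            emit = log_prob(emissions[t], words[i].lower(), n_words)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(transitions[p], t, n_tags) + emit, p)
                for p in tags
            )
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))


# Hypothetical hand-tagged training corpus.
train = [
    [("the", "DET"), ("dog", "NOUN"), ("can", "AUX"), ("run", "VERB")],
    [("the", "DET"), ("can", "NOUN"), ("fell", "VERB")],
    [("she", "PRON"), ("can", "AUX"), ("swim", "VERB")],
]
transitions, emissions, vocab = train_hmm(train)
print(viterbi(["the", "can", "fell"], transitions, emissions, vocab))
print(viterbi(["she", "can", "swim"], transitions, emissions, vocab))
```

On this toy corpus, "can" is tagged as a noun after "the" and as an auxiliary after "she", even though it occurs more often as an auxiliary overall, because the transition probabilities bring in the context.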

### Questions

__1. What is part of speech (POS) tagging? What is the simplest approach to building a POS tagger that you can imagine?__

__2. Why is part-of-speech tagging needed? Name a few applications where it is used.__

__3. How would you build a POS tagger from scratch given a corpus of annotated sentences? How would you deal with unknown words?__  
 
__4. Which is a better algorithm for POS tagging: SVM or hidden Markov models? Why?__ 

### N-Gram Analysis

What are n-grams, unigrams, bigrams, and trigrams?  
Which is better to use while extracting features: character n-grams or word n-grams? Why?  

### Sentence Parsing

What is dependency parsing?  
What is semantic parsing?  
What is constituency parsing?  
What is the difference between shallow parsing and dependency parsing?  

### Text Filtering

What are stop words? Describe an application in which stop words should be removed.  
What is punctuation? How can you remove it?  
What is noise removal?  

### Embeddings

Are you familiar with WordNet or other related linguistic resources?  
What is a word embedding?  
What are some word embedding libraries?  
    - Word2vec  
    - GloVe  
    - fastText  
    - Gensim  
What is word2vec?  
What is GloVe?  
What is fastText?  
What is Gensim?  

### Document Summarization

What is the TF-IDF score of a word and in what context is this useful?  
What is the significance of TF-IDF?  

### Text Classification

What is latent semantic indexing and where can it be applied?  
What is latent semantic analysis?  

### Graph Analysis

How does the PageRank algorithm work?  


### Sentiment Analysis

How would you design a model to predict whether a movie review was positive or negative?   
 

### Language Analysis

How would you approach an NLP problem that is easily solved for English (where you have abundant resources like WordNet, dictionaries, and sense-tagged and parallel corpora) in resource-poor languages like Hindi, Marathi, etc.?  
How would you build a system to translate English text to Greek and vice-versa?   
What are the linguistic properties that are invariant across languages?  

### Applications

What is text mining?  
What is information extraction?  
What is object standardization? When is it used?  
What is text generation? When is it used?  
What is text summarization? When is it used?  
What is topic modeling? When is it used?  
What is sentiment analysis? When is it used?  

### General

How would you train a model that identifies whether the word “Apple” in a sentence belongs to the fruit or the company?  
How would you find all the occurrences of quoted text in a news article?  
How would you build a system that auto corrects text that has been generated by a speech recognition system?  
How would you build a system that automatically groups news articles by subject?  
What is entropy? How would you estimate the entropy of the English language?
What is a regular grammar? Does it differ in expressive power from a regular expression, and if so, in what way?
What are the difficulties in building and using an annotated corpus of text such as the Brown Corpus and what can be done to mitigate them?
What tools for training NLP models (nltk, Apache OpenNLP, GATE, MALLET etc…) have you used?
Do you have any experience in building ontologies?
Do you speak any foreign languages?
What is dimensionality reduction?
Explain how the SVM, NN, and MaxEnt algorithms work.
What packages are you aware of in Python that are used in NLP and ML?
What are conditional random fields?
When can you use the Naive Bayes algorithm for training, and what are its advantages and disadvantages?
How would you cluster short texts such as tweets, and what problems do you expect to face during this process?
What are the trade-offs between statistical and rule-based machine translation?
What are the trade-offs between supervised and unsupervised methods for NLP problems such as word sense disambiguation or machine translation?

What is NLP (natural language processing)?
What are the applications of NLP?
What is tokenization?
What is stemming?
What is lemmatization?
What is normalization?

What is NER (named entity recognition)?

What are some NLP libraries and tools?  
    - CoreNLP, from the Stanford NLP Group.  
    - NLTK, the most widely mentioned NLP library for Python.
    - TextBlob, a user-friendly and intuitive NLTK interface.
    - Gensim, a library for document similarity analysis.
    - spaCy, an industrial-strength NLP library built for performance.


What is WordNet?
How can you find synonyms and antonyms for a word?
What is NLG (natural language generation)?
What is NLU (natural language understanding)?
What is a corpus?

What is language modeling?



What is term frequency (TF)?
What is inverse document frequency (IDF)?
What is the difference between NLTK and spaCy?
What is the difference between OpenNLP and NLTK?
What is sequence modeling? How is it helpful in NLP?

What is the difference between a regular grammar and a regular expression?
How will you estimate the entropy of the English language?
What is the bag-of-words model?
What is cosine distance?
What is the doc2vec model?
What is CBOW (continuous bag of words)?
What is skip-gram?
What are some models to reduce the dimensionality of data in NLP?  
    - Latent Dirichlet Allocation  
    - Latent Semantic Indexing  
    - Keyword Normalization  
What is a document-term matrix?  
A document-term matrix or term-document matrix is a mathematical matrix that describes the frequency of terms that occur in a collection of documents.
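
As a small illustration of the definition above, the sketch below builds a document-term matrix of raw counts with one row per document and one column per term. The whitespace tokenizer and the two example documents are assumptions made for brevity; libraries such as scikit-learn provide fuller implementations.

```python
from collections import Counter


def document_term_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in documents]  # naive whitespace tokenizer
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


docs = ["the cat sat", "the cat saw the dog"]  # hypothetical example documents
vocabulary, matrix = document_term_matrix(docs)
print(vocabulary)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(matrix)      # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```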
What is pragmatic analysis in NLP?
How can you find word similarity in NLP?
How can you find sentence similarity in NLP?
How can you find document similarity in NLP?
How is NLP used in recommendation engines?
What are hidden Markov fields?
What is the Naive Bayes algorithm? When can it be used in NLP?
What are some text matching and similarity techniques?  
    - Levenshtein Distance  
    - Phonetic Matching  
    - Flexible String Matching  
    - Cosine Similarity  
What is coreference resolution?
What is ambiguity in NLP?
Explain one NLP project you have done from start to finish.