# Recurrent Neural Network

- From [v1] Lecture 51

## RNN use cases in NLP

- Using Word2Vec, we were able to represent a word and capture its neighbourhood relationships
- Once the word is understood, next we want to
  - Understand the Sentence
  - Understand the Paragraph
  - Understand the Document
  - Paraphrase / Summarize
  - Translate the paragraph from one language to the other

## NLP Applications in Action

1. Keyboard prediction - What could be the next word
  - NLP is used in predicting the next word
2. Searching for a topic in a search engine (e.g., Google Search) - Many suggestions drop down relating to the typed query
  - NLP is behind it, predicting the most likely queries matching the words that you have typed
3. Option to complete the sentence - E.g., the Gmail option to complete the sentence while typing an email
  - Note that it suggests a possible _sentence_, not just words as in case 1
  
- In all of these applications, the size of the __*input is not limited to a fixed number of words*__
  - You can have two characters typed, or 3 words typed, or 5 words typed, and so on
  - Once you start typing characters, it keeps suggesting the likely word for you
  - Once you complete the first word, it starts suggesting the next possible word for you
  - Once you have the second word, it starts suggesting a complete sentence for you
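
The behaviour described above (complete the current word from a few characters, then suggest the next word) can be sketched with simple counts. This is only a toy illustration under stated assumptions: the corpus and the helper names `complete_word` and `suggest_next` are made up for the example, and it uses bigram counts rather than a neural model.

```python
from collections import Counter, defaultdict

# Toy corpus (an assumption for this sketch); a real keyboard learns from far more text.
corpus = [
    "how are you doing today",
    "how are you feeling",
    "how is the weather today",
    "where are you going",
]

word_counts = Counter()
next_word_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    word_counts.update(words)
    # Count how often each word is followed by each other word (bigrams).
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1

def complete_word(prefix, k=3):
    """Suggest up to k known words that start with the typed characters."""
    matches = [(w, c) for w, c in word_counts.items() if w.startswith(prefix)]
    matches.sort(key=lambda wc: (-wc[1], wc[0]))  # most frequent first
    return [w for w, _ in matches[:k]]

def suggest_next(word, k=3):
    """Suggest up to k words that most often followed `word` in the corpus."""
    followers = next_word_counts.get(word, Counter())
    ranked = sorted(followers.items(), key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in ranked[:k]]

print(complete_word("wh"))   # words starting with "wh"
print(suggest_next("how"))   # words seen after "how"
```

Note that the bigram table only looks one word back, so it cannot suggest a whole sentence that depends on everything typed so far.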

<a id='NLP_Applications_Problems'></a>

- Looking at these applications, is it possible to solve these problems with the following?
  - Vector Space Model
  - Probabilistic Language Model
  - ANN

![RNN_NLP_Applications_In_Action](images/RNN_NLP_Applications_In_Action.jpg)

## ANN for LM

- Study Link
  - [An Introduction to Recurrent Neural Networks](https://medium.com/explore-artificial-intelligence/an-introduction-to-recurrent-neural-networks-72c97bf0912) gives more details on the limitations of traditional neural networks

- Using __*Traditional Neural Network*__ (Fixed Input Neural Networks)
  - Is it possible to use the NN for solving the [NLP Applications problem](#NLP_Applications_Problems)?
  - It is possible; however,
    - The window size is fixed (in the case of Word2Vec)
    - But the problems described above are not limited to any input size
    - If we use an ANN (referring to a Word2Vec-style traditional NN), we need to change the model each time the number of words in those problems increases
      - It is not easy to reuse the model we have trained for longer sequences of words
  - In traditional neural networks, there is a restriction that the input layer size is fixed
  - A traditional NN does not take the order of words into account. Meaning, time-series (sequence) data cannot be used as a sequence in a traditional NN
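
The fixed-input restriction can be seen directly in the shapes of the weights. The following is a minimal NumPy sketch, not a trained model: the sizes `EMBED_DIM`, `WINDOW`, `HIDDEN` and the function `fixed_window_forward` are assumptions chosen for illustration, and the word vectors are random.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 4   # size of each word vector (arbitrary for this sketch)
WINDOW = 3      # the network is built for exactly 3 words
HIDDEN = 5      # number of hidden units

# First-layer weights: the shape is tied to WINDOW * EMBED_DIM.
W_hidden = rng.normal(size=(HIDDEN, WINDOW * EMBED_DIM))

def fixed_window_forward(word_vectors):
    """Concatenate the word vectors and apply one dense layer."""
    x = np.concatenate(word_vectors)   # shape: (num_words * EMBED_DIM,)
    return np.tanh(W_hidden @ x)

three_words = [rng.normal(size=EMBED_DIM) for _ in range(3)]
five_words = [rng.normal(size=EMBED_DIM) for _ in range(5)]

print(fixed_window_forward(three_words).shape)   # (5,)

try:
    fixed_window_forward(five_words)
except ValueError as err:
    # The 20-dimensional input does not fit the 12-column weight matrix.
    print("5-word input rejected:", err)
```

Accepting five words would require a new `W_hidden` with a different shape, which is the "change the model for each increase in the number of words" problem noted above.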

![RNN_ANN_For_LM](images/RNN_ANN_For_LM.jpg)

## Limitations of Fixed Input Neural Networks

- Embeddings are learned based on a _small local window_ surrounding words
  - $\textrm{good}$ and $\textrm{bad}$ share almost the same embedding
    - E.g., In this sentence $\textrm{good or bad take it}$
    - In this, $\textrm{good}$ appears as context for $\textrm{bad}$, but the two words are opposites
      - The two words are not similar, yet the embedding learns them as similar. This is not a problem of the neural network.
      - It comes from the way the sentence (and hence the context window) is constructed
- Does not address __*polysemy*__
  - $\textrm{The boys play cricket on the banks of a river}$
  - $\textrm{The boys play cricket near a national bank}$
  - In the above sentences, $\textrm{bank}$ refers to different places: the side of a river and a financial institution
  - Small window size is not enough to understand the meaning of the entire sentence. Hence traditional NN has limitations for this.
- Does not use frequencies of _term co-occurrences_
  - For e.g., in CBOW and Skip-Gram models, the frequency of the word is completely ignored
- Word embedding provide _distributed vectors for words_
  - How about phrases? $\textrm{India Today}$, $\textrm{Indian Express}$, $\textrm{The Sun News}$
    - Word embeddings represent only single words as vectors, not phrases like the ones listed above, which are made of adjoining words
    - A traditional NN can't understand phrases like the above
  - Can we encode a sentence as a distributed vector - _Sentence Vectors_?
  - How about paragraphs? _Paragraph Vectors_?
- Memoryless: does not keep track of where the words and context came from
- Not able to handle variable length text
- Some NLP tasks require semantic modeling over the whole sentence
  - Machine Translation
  - Question answering, Chat-bots
  - Text Summarization
- The data is considered as static - does not depend on a sequence of time
- They are location invariant
- Some important tasks depend on the sequence of data
  - $y(t+1) = f(x(t), x(t-1), x(t-2), \ldots, x(t-n))$
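
The dependency $y(t+1) = f(x(t), x(t-1), \ldots, x(t-n))$ in the last item can be made concrete by pairing each target with its preceding inputs. This is a small sketch; the helper name `make_windows` and the choice of $n = 2$ are assumptions for illustration, and the function $f$ itself is left unspecified.

```python
def make_windows(sequence, n):
    """Pair each target y(t+1) with the inputs x(t-n), ..., x(t)."""
    pairs = []
    for t in range(n, len(sequence) - 1):
        history = sequence[t - n : t + 1]   # x(t-n) ... x(t), oldest first
        target = sequence[t + 1]            # y(t+1)
        pairs.append((history, target))
    return pairs

tokens = "the boys play cricket near a national bank".split()
for history, target in make_windows(tokens, n=2):
    print(history, "->", target)
```

Each pair fixes the history length at $n + 1$ items, so anything earlier than $x(t-n)$ is invisible to $f$; this is the limitation that sequence models aim to relax.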

# Sequence Learning and its Applications

## Sequence Learning

- Sequence Learning is the study of machine learning algorithms designed for applications that require sequential data or temporal data
  - Example 1
    - Translation of a speech in English to Chinese speech
  - Example 2
    - A professional is speaking at a conference. Later in the session, he might refer back to points he made at the start of the session
    - The system should be able to understand the sentences and their contexts, and when such a reference occurs, it should be able to relate it to the earlier part of the speech
    - This requires sequential learning, as the sequence of sentences (speech) needs to be understood in order to keep the context information
- We model the speech as a time series and feed it to the NN as input, so that the network can learn the sequence for the required task
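
One common way to feed a time series to a network is a simple recurrent cell that reuses the same weights at every time step. The sketch below assumes the standard update $h(t) = \tanh(W_{xh}\,x(t) + W_{hh}\,h(t-1) + b)$; the names `W_xh`, `W_hh`, `b_h`, `rnn_forward` and all sizes are illustrative choices, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

INPUT_DIM = 4    # size of each input vector x(t)
HIDDEN_DIM = 3   # size of the hidden state h(t)

# One set of weights, reused at every time step.
W_xh = rng.normal(scale=0.5, size=(HIDDEN_DIM, INPUT_DIM))
W_hh = rng.normal(scale=0.5, size=(HIDDEN_DIM, HIDDEN_DIM))
b_h = np.zeros(HIDDEN_DIM)

def rnn_forward(inputs):
    """Run h(t) = tanh(W_xh x(t) + W_hh h(t-1) + b) over a sequence."""
    h = np.zeros(HIDDEN_DIM)        # initial state: nothing seen yet
    for x_t in inputs:              # one step per element, in order
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h                        # final state after the whole sequence

short_seq = rng.normal(size=(2, INPUT_DIM))   # 2 time steps
long_seq = rng.normal(size=(7, INPUT_DIM))    # 7 time steps

print(rnn_forward(short_seq).shape)   # (3,)
print(rnn_forward(long_seq).shape)    # (3,)

# Compare the final state for the original and the reversed sequence.
print(np.allclose(rnn_forward(long_seq), rnn_forward(long_seq[::-1])))
```

The same weights handle both the 2-step and the 7-step input, and because each state depends on the previous one, the order of the inputs influences the result.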

## Applications (Using RNN over Time-Series Data)

- Named Entity Recognition (NER)
  - Example: "Mr. John is the CEO of the Company. And he had done great things for the company"
    - What does "he" refer to in the second sentence?
    - The system should be able to say that "he" refers to the CEO of that company
    - Recognizing "Mr. John" as a person is NER; linking "he" back to him is the closely related task of coreference resolution.
  - How many times is the CEO referred to in the document? Is it possible to find that?
    - An NER model should be used, which can recognize that person as the CEO wherever he is mentioned in the document.
- Paraphrase detection - identifying semantically equivalent questions
  - A question can be asked in different ways
  - All those questions are semantically equivalent
  - Paraphrase detection is finding semantically equivalent sentences
  - Example: IT Call Center
    - They receive calls having various queries, but semantically equivalent
    - The company needs to find the most frequently asked questions (FAQ), so that new joiners can answer those calls
    - Here paraphrase detection is required
- Language Generation
  - 'Given a photograph, you are asked to write a line about the photograph'
    - You look at the content of the photograph, say objects and you give some title
    - We can use Language Generation Model
      - Input is going to be different
        - Meaning, the photograph may have 3 objects, or 5 objects, or any number of objects in it
        - We should be able to process those without changing or adjusting our neural network size
- Machine Translation
  - Given a parallel corpus, we should be able to translate from one language to the other
  - We have to translate sentence by sentence to get a correct translation (word-by-word translation won't give a correct result)
- Speech Recognition
  - Should be able to translate a speech audio file from one language to the other
  - Based on the speech, the system should be able to recognize what the speaker is saying
    - Example: "Wreck a nice beach" or "recognize speech"
      - Based on the context, it should be able to understand that the speaker said "recognize speech"
    - This can be achieved in NLP when the words are treated as a time series
- Automatically generating subtitles for a video
- Spell Checking
  - As you type, the system should work out the distance between the characters typed so far and the words in the dictionary, and start suggesting the right word.
- Predictive Typing
- Chat-bots/ Dialog Understanding
  - An application which should understand the sentences and provide a response to the user
    - E.g., a Customer Service chat bot. Based on the user input, it should provide a response to the user
- Generate/ Correct Hand-written text
  - OCR cannot always be used for generating text from hand-written text
    - E.g., ![SL_Hard_Written_Text](images/SL_Hard_Written_Text.jpg)
    - In this, it is unclear whether the text reads "the quick __brown__ fo" or "the quick __frown__ fo"
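
The "distance between the typed characters and the dictionary words" mentioned under Spell Checking can be sketched with the Levenshtein edit distance. This is one possible choice of distance, not necessarily the one the lecture has in mind; the tiny `dictionary` and the helper `suggest` are assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))   # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]                    # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]

# Hypothetical dictionary, kept tiny for the sketch.
dictionary = ["brown", "frown", "fox", "quick", "the"]

def suggest(typed, k=2):
    """Return the k dictionary words closest to what was typed."""
    ranked = sorted(dictionary, key=lambda w: (edit_distance(typed, w), w))
    return ranked[:k]

print(suggest("brwn"))
print(suggest("fo"))
```

A character-level distance alone cannot decide between "brown" and "frown" when the handwriting is ambiguous; that choice needs the surrounding words, which is where a sequence model helps.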