# 1. [Introduction](https://entelecheia.github.io/ekorpkit-book/docs/lectures/deep_nlp/lecture01.html)


## What is NLP?

- NLP, or natural language processing, is a branch of artificial intelligence that deals with the interpretation and manipulation of human language. 
- NL ∈ {Korean, English, Spanish, Mandarin, Hindi, Arabic, … Inuktitut}
- Automation of NLs, where $R$ is a representation of the language (for example, of its structure or meaning):
  - analysis (NL → $R$)
  - generation ($R$ → NL)
  - acquisition of $R$ from knowledge and data
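
As a rough illustration of the two directions, the sketch below treats $R$ as a small Python dictionary. This is an assumption made purely for illustration: the dictionary keys, the function names `analyze` and `generate`, and the fixed "subject verb object" pattern are all hypothetical, and real systems use far richer representations.

```python
def analyze(sentence):
    """Analysis (NL -> R): map a three-word 'subject verb object' sentence to a dict."""
    words = sentence.rstrip(".").split()
    if len(words) != 3:
        raise ValueError("this toy analyzer only handles three-word sentences")
    subject, verb, obj = words
    return {"agent": subject, "action": verb, "patient": obj}


def generate(representation):
    """Generation (R -> NL): render the dict back as an English sentence."""
    return "{agent} {action} {patient}.".format(**representation)


r = analyze("Watson answers questions.")
print(r)
print(generate(r))
```

The third item, acquisition of $R$, corresponds to learning such a mapping from data rather than writing it by hand as done here.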

## IBM’s Watson wins at Jeopardy!

- Watson is a computer system built by IBM that is designed to answer questions posed in natural language. 
- The system made its public debut on the game show Jeopardy! in 2011, where it competed against human contestants and won. Since then, Watson has been applied to a number of different tasks, including healthcare, finance, and customer service.

```{image} ./figs/entelecheia_IBMs_Watson_wins_at_Jeopardy_72e86643-7115-4d9b-874e-ee2e4c9c0fb0.png
:alt: watson
:width: 600px
:align: center
```


## What are the related areas of NLP?

-  Computational Linguistics
-  Artificial Intelligence
-  Machine Learning
-  Data Mining
-  Information Retrieval
-  Natural Language Understanding
-  Robotics
-  Speech Recognition
-  Text Mining

## Why study or research NLP?

- NLP is an interesting and important field with a wide range of applications. 
- It is constantly evolving, which means there are always new challenges to tackle.

- NLP is also an interdisciplinary field, drawing from linguistics, computer science, and psychology. This means that there are many different perspectives and approaches to NLP, making it a rich and fascinating area to study.

- Language is the key to understanding knowledge and human intelligence.  
- It is also the key to communication and social interaction. 
- NLP is a powerful tool that can help us to unlock the secrets of language and human cognition, and to build better systems for communication and social interaction.


## What can you do with NLP?

- **Natural language (and speech) interfaces**

  - Search/IR, database access, image search, image description
  - Dialog systems (e.g., customer service, robots, cars, tutoring), chatbots

- **Information extraction, summarization, translation:**

  - Process (large amounts of) text automatically to obtain meaning/knowledge contained in the text
  - Identify/analyze trends, opinions, etc. (e.g., in social media)
  - Translate text automatically from one language to another

- **Convenience**:

  - Grammar/style checking, automated email filing, autograding



- **Natural language understanding**

  - Extract information (e.g., about entities, events or relations between them) from text
  - Translate raw text into a meaning representation
  - Reason about information given in text
  - Execute NL instructions
  
- **Natural language generation and summarization**

  - Translate database entries or meaning representations to raw natural language text
  - Produce (appropriate) utterances/responses in a dialog
  - Summarize (newspaper or scientific) articles, describe images

- **Natural language translation**

  - Translate one natural language to another

### Some of the more popular applications of NLP include:

- Virtual assistants, such as Siri, Alexa, and Google Assistant
- Chatbots
- Text analytics
- Content moderation
- Speech recognition

### There are also interesting techniques that let you go between text and images, such as:

- Image captioning
- Text-to-image synthesis
- Image-to-text translation

### NLP Application - Machine Translation



## What are the desiderata, or goals, for NLP?

-  Generality across different languages, genres, styles, and modalities
-  Sensitivity to a wide range of the phenomena and constraints in human language  
-  Ability to learn from very small amounts of data
-  Ease of use and interpretability
-  Robustness in the face of errors and noise
-  Explainable to humans
-  Account for the complexities of real-world language use
-  Generate outputs that are natural and appropriate for the context

## Why is NLP hard?

NLP is hard because language is ambiguous and constantly changing. Language is also a complex system with many different levels of structure, from the phonetic to the pragmatic.


### The main challenges in NLP are:

-  Ambiguity: Language is ambiguous, which makes it difficult for NLP systems to interpret a text.
-  Change: Language is constantly changing, which makes it difficult for NLP systems to keep up with the latest changes.
-  Complexity: Language is a complex system with many different levels of structure, from the phonetic to the pragmatic.
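
To make the ambiguity challenge concrete, the sketch below counts how many part-of-speech readings a short sentence admits when each word is looked up independently. The tag dictionary `POSSIBLE_TAGS` is hand-written and deliberately incomplete (an assumption for illustration, not a real lexicon), and `tag_sequences` is a hypothetical helper name.

```python
from itertools import product

# Hand-written, incomplete tag dictionary (illustrative assumption).
POSSIBLE_TAGS = {
    "time": ["NOUN", "VERB"],
    "flies": ["NOUN", "VERB"],
    "like": ["VERB", "ADP"],
    "an": ["DET"],
    "arrow": ["NOUN"],
}


def tag_sequences(words):
    """Enumerate every combination of per-word tags."""
    options = [POSSIBLE_TAGS[w.lower()] for w in words]
    return list(product(*options))


sentence = "Time flies like an arrow".split()
candidates = tag_sequences(sentence)
print(len(candidates))  # 2 * 2 * 2 * 1 * 1 = 8 candidate readings
for tags in candidates:
    print(list(zip(sentence, tags)))
```

Even with only two tags per ambiguous word, a five-word sentence yields eight candidate taggings; a system has to use context to choose among them.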
  

## Neural Nets for NLP

A neural net is a type of machine learning model that is loosely inspired by the structure of the brain.

Neural nets are composed of a series of interconnected processing nodes, or neurons, that can learn to recognize patterns of input data.
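
A minimal sketch of this idea in plain Python is shown below: each node computes a weighted sum of its inputs and passes it through a nonlinearity, and a layer is several such nodes reading the same inputs. The sigmoid activation, the function names, and the hand-picked weights are assumptions for illustration; in practice the weights are learned from data and a numerical library would be used.

```python
import math


def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """One processing node: weighted sum of the inputs, then a nonlinearity."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(total)


def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical hand-picked parameters; real networks learn these from data.
x = [1.0, 0.0, 1.0]
hidden = layer(x, [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1])
output = neuron(hidden, [1.0, -1.0], 0.0)
print(hidden, output)
```

Connecting the outputs of one layer to the inputs of the next, as `output` does with `hidden`, is what makes the nodes "interconnected".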


### Some popular neural net models for NLP include:

-  Long Short-Term Memory (LSTM)
-  Gated recurrent unit (GRU)
-  Transformers
-  Bidirectional Encoder Representations from Transformers (BERT)
-  Generative Adversarial Networks (GANs)


### What are the benefits of neural nets for NLP?

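-  Pretrained neural nets can be adapted to new tasks with relatively small amounts of labeled data, although training from scratch usually requires large datasets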
-  Neural nets are good at capturing the statistical properties of language
-  Neural nets can be trained on a variety of NLP tasks

### What are the limitations of neural nets for NLP?

-  Neural nets require a lot of computational power
-  Neural nets can be difficult to interpret
-  Neural nets can overfit the training data

## Ethics issues in NLP

NLP is a powerful tool that can be used for a wide range of tasks, including content moderation, text analytics, and speech recognition. However, NLP also raises a number of ethical concerns, including:

-  Privacy: NLP can be used to process large amounts of personal data, which raises privacy concerns.

-  Bias: NLP models can be biased against certain groups of people.

-  Manipulation: NLP can be used to manipulate people by presenting them with false or misleading information.

## What does it take to understand the text?

Linguistics offers many theories of how language is structured. A common simplification, used here, describes language at three levels:

1. Phonology: the study of the sound system of a language

2. Morphology: the study of the structure of words

3. Syntax: the study of the structure of sentences

In order to understand text, you need to be able to understand all three levels.

## What does an NLP system need to “know”?

Humans fluently integrate all of these levels in producing and understanding language. NLP systems, however, need to be explicitly told how to do this. This is because NLP systems are designed to work with digital text, which is just a sequence of characters.

In order to work with this sequence of characters, an NLP system needs to be told how the sequence corresponds to the three levels of linguistic structure.

1. Phonology: how do the characters in the text correspond to the sounds of the language?

2. Morphology: how do the characters in the text correspond to the structure of words?

3. Syntax: how do the characters in the text correspond to the structure of sentences?

In order to answer these questions, an NLP system needs a set of rules that define the correspondence between characters and the three levels of linguistic structure. These rules are called linguistic resources.
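
As a small sketch of what "mapping a sequence of characters to structure" can look like, the code below uses two hand-written regular-expression rules to recover sentence and word boundaries from raw text. The rules and the function names `split_sentences` and `tokenize` are illustrative assumptions; they are crude and will fail on abbreviations, numbers, and many other cases.

```python
import re


def split_sentences(text):
    """Sentence boundaries: split at whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Word boundaries: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "NLP systems read characters. They must find words and sentences!"
for sentence in split_sentences(text):
    print(tokenize(sentence))
```

Nothing in the raw string marks where a word or sentence ends; the boundaries exist only because the rules say so, which is the sense in which the system has to be told how characters correspond to linguistic structure.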

## What are linguistic resources?

Linguistic resources are a set of rules that define the correspondence between characters and the three levels of linguistic structure.

The most important linguistic resources for English are:

1. A phonetic alphabet: a set of symbols that represent the sounds of English

2. A set of morphological rules: rules that define how English words are formed

3. A set of syntactic rules: rules that define how English sentences are formed

4. A set of lexical rules: rules that define the meaning of English words
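
The sketch below shows how three of these resources might be encoded as data and rules in a program: a tiny lexicon (lexical rules), a plural rule (morphology), and a noun-phrase rule (syntax). Everything here is a toy assumption made for illustration: the entries in `LEXICON`, the tag names, and the functions `analyze_word` and `is_noun_phrase` are hypothetical, and real resources are far larger and more nuanced. The phonetic alphabet is omitted for brevity.

```python
# Toy lexical resource: part of speech and a gloss for each known word.
LEXICON = {
    "the": {"pos": "DET", "gloss": "definite article"},
    "cat": {"pos": "NOUN", "gloss": "a small domesticated feline"},
    "dog": {"pos": "NOUN", "gloss": "a domesticated canine"},
}


def analyze_word(word):
    """Morphological rule: a known noun plus the suffix '-s' is a plural noun."""
    word = word.lower()
    if word in LEXICON:
        return {"stem": word, "pos": LEXICON[word]["pos"], "plural": False}
    if word.endswith("s") and LEXICON.get(word[:-1], {}).get("pos") == "NOUN":
        return {"stem": word[:-1], "pos": "NOUN", "plural": True}
    return None  # unknown word: the resources do not cover it


def is_noun_phrase(words):
    """Syntactic rule: a noun phrase is a determiner followed by a noun."""
    analyses = [analyze_word(w) for w in words]
    if any(a is None for a in analyses):
        return False
    return [a["pos"] for a in analyses] == ["DET", "NOUN"]


print(analyze_word("cats"))
print(is_noun_phrase(["the", "dogs"]))
print(is_noun_phrase(["cat", "the"]))
```

Any word outside the lexicon returns `None`, which illustrates how quickly hand-written resources run into coverage limits.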

To summarize, understanding text requires an NLP system to map the characters in the text to each of the three levels of linguistic structure:

1. Phonology: the system needs to be able to map the characters in the text to the sounds of the language.

2. Morphology: the system needs to be able to map the characters in the text to the structure of words.

3. Syntax: the system needs to be able to map the characters in the text to the structure of sentences.
