In [35]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer
from nltk.corpus import stopwords

nltk.download('punkt') # tokenization
nltk.download('stopwords') # stopwords
nltk.download('wordnet') # lemmatization
nltk.download('omw-1.4') # lemmatization
nltk.download('averaged_perceptron_tagger') # tagging
nltk.download('tagsets') # tagging

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\pawst\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\pawst\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\pawst\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\pawst\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\pawst\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package tagsets to
[nltk_data]     C:\Users\pawst\AppData\Roaming\nltk_data...
[nltk_data]   Package tagsets is already up-to-date!


True

## 2. Please familiarize yourself with the methods of text analysis (NLP: Natural Language Processing).
## The following English terms will be helpful in finding information:
- normalization
- cleaning
- tokenization
- stop word removal
- stemming
- lemmatization
- tagging
- N-grams

and the following representations:
- frequency encoding
- one-hot encoding
- TF-IDF
- continuous word representation
- word embeddings (Word2Vec, GloVe, fastText).

Find the above text analysis methods in any Python API.

### Tokenization
#### Tokenization means splitting a string of text into smaller pieces (tokens), such as words and punctuation marks.

In [17]:
text = "These are short, famous texts in English from classic sources like the Bible or Shakespeare. Some texts have word definitions and explanations to help you. Some of these texts are written in an old style of English. Try to understand them, because the English that we speak today is based on what our great, great, great, great grandparents spoke before! Of course, not all these texts were originally written in English. The Bible, for example, is a translation. But they are all well known in English today, and many of them express beautiful thoughts."
print("Sample text: \n{}".format(text))
tokenized_text = word_tokenize(text, 'english')
print("Tokenized text: \n{}".format(tokenized_text))
print("Split text: \n{}".format(text.split()))

Sample text: 
These are short, famous texts in English from classic sources like the Bible or Shakespeare. Some texts have word definitions and explanations to help you. Some of these texts are written in an old style of English. Try to understand them, because the English that we speak today is based on what our great, great, great, great grandparents spoke before! Of course, not all these texts were originally written in English. The Bible, for example, is a translation. But they are all well known in English today, and many of them express beautiful thoughts.
Tokenized text: 
['These', 'are', 'short', ',', 'famous', 'texts', 'in', 'English', 'from', 'classic', 'sources', 'like', 'the', 'Bible', 'or', 'Shakespeare', '.', 'Some', 'texts', 'have', 'word', 'definitions', 'and', 'explanations', 'to', 'help', 'you', '.', 'Some', 'of', 'these', 'texts', 'are', 'written', 'in', 'an', 'old', 'style', 'of', 'English', '.', 'Try', 'to', 'understand', 'them', ',', 'because', 'the', 'English', '

### Stop word removal
#### Stop words are commonly occurring words in a language, such as 'the' and 'a'. They can usually be removed from the text because they carry little information for downstream analysis.

In [18]:
text = "These are short, famous texts in English from classic sources like the Bible or Shakespeare. Some texts have word definitions and explanations to help you. Some of these texts are written in an old style of English. Try to understand them, because the English that we speak today is based on what our great, great, great, great grandparents spoke before! Of course, not all these texts were originally written in English. The Bible, for example, is a translation. But they are all well known in English today, and many of them express beautiful thoughts."
STOPWORDS = set(stopwords.words('english'))
def remove_stopwords(text: str) -> str:
    return " ".join([word for word in str(text).split() if word not in STOPWORDS])
print("Text with stopwords:\n{}".format(text))
print("Text without stopwords:\n{}".format(remove_stopwords(text)))

Text with stopwords:
These are short, famous texts in English from classic sources like the Bible or Shakespeare. Some texts have word definitions and explanations to help you. Some of these texts are written in an old style of English. Try to understand them, because the English that we speak today is based on what our great, great, great, great grandparents spoke before! Of course, not all these texts were originally written in English. The Bible, for example, is a translation. But they are all well known in English today, and many of them express beautiful thoughts.
Text without stopwords:
These short, famous texts English classic sources like Bible Shakespeare. Some texts word definitions explanations help you. Some texts written old style English. Try understand them, English speak today based great, great, great, great grandparents spoke before! Of course, texts originally written English. The Bible, example, translation. But well known English today, many express beautiful thoug
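
Note that the output above still contains words such as "These", "Some" and "The", and tokens such as "you." and "them,". The comparison in `remove_stopwords` is case-sensitive, while the NLTK stop word list is lower-case, and `split()` leaves punctuation attached to words. A minimal sketch of a case-insensitive variant follows. The function name `remove_stopwords_ci`, the regex tokenizer and the tiny `DEMO_STOPWORDS` set are assumptions made for illustration and are not part of the original notebook.

```python
import re

# Hypothetical, deliberately tiny stop word list for illustration only;
# in the notebook this would be set(stopwords.words('english')).
DEMO_STOPWORDS = {"these", "are", "in", "the", "or", "some", "of", "to", "an"}

def remove_stopwords_ci(text: str, stop_words: set) -> str:
    # Extract word tokens and drop punctuation, so "you." is seen as "you".
    tokens = re.findall(r"[A-Za-z']+", text)
    # Compare in lower case because the stop word list is lower-cased.
    return " ".join(t for t in tokens if t.lower() not in stop_words)

print(remove_stopwords_ci("These are short, famous texts in English.", DEMO_STOPWORDS))
```

The trade-off is that punctuation is discarded entirely, which may or may not be desirable depending on the downstream task.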

### Stemming
#### Stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form.
##### For example, if the corpus contains the words walks and walking, stemming strips the suffixes and reduces both to walk. The resulting stem need not be a valid word.
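
The walks/walking example can be illustrated with a toy suffix stripper. This is a deliberately naive sketch and not the Porter algorithm used in the notebook cell; the function name `naive_stem` and its suffix list are assumptions chosen only to show the idea.

```python
def naive_stem(word: str) -> str:
    # Toy suffix stripper (NOT the Porter algorithm): remove one common
    # suffix, provided at least three characters of the word remain.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([naive_stem(w) for w in ["walks", "walking", "walked", "walk"]])
```

A real stemmer such as `PorterStemmer` applies many more rules and conditions, which is why it should be preferred in practice.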

In [20]:
text = "These are short, famous texts in English from classic sources like the Bible or Shakespeare. Some texts have word definitions and explanations to help you. Some of these texts are written in an old style of English. Try to understand them, because the English that we speak today is based on what our great, great, great, great grandparents spoke before! Of course, not all these texts were originally written in English. The Bible, for example, is a translation. But they are all well known in English today, and many of them express beautiful thoughts."
stemmer = PorterStemmer()
def stemming(text: str) -> str:
    return " ".join([stemmer.stem(word) for word in text.split()])
print("Text before stemming:\n{}".format(text))
print("Text after stemming:\n{}".format(stemming(text)))

Text before stemming:
These are short, famous texts in English from classic sources like the Bible or Shakespeare. Some texts have word definitions and explanations to help you. Some of these texts are written in an old style of English. Try to understand them, because the English that we speak today is based on what our great, great, great, great grandparents spoke before! Of course, not all these texts were originally written in English. The Bible, for example, is a translation. But they are all well known in English today, and many of them express beautiful thoughts.
Text after stemming:
these are short, famou text in english from classic sourc like the bibl or shakespeare. some text have word definit and explan to help you. some of these text are written in an old style of english. tri to understand them, becaus the english that we speak today is base on what our great, great, great, great grandpar spoke before! of course, not all these text were origin written in english. the bibl

### Lemmatization
#### Lemmatization is similar to stemming in that it reduces inflected words to a base form, but it differs in ensuring that the result (called the lemma) is a valid word of the language.

In [28]:
text = "These are short, famous texts in English from classic sources like the Bible or Shakespeare. Some texts have word definitions and explanations to help you. Some of these texts are written in an old style of English. Try to understand them, because the English that we speak today is based on what our great, great, great, great grandparents spoke before! Of course, not all these texts were originally written in English. The Bible, for example, is a translation. But they are all well known in English today, and many of them express beautiful thoughts."
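lemmatizer = WordNetLemmatizer()
def lemmatization(text: str) -> str:
    # lemmatize() defaults to pos='n', so every word is treated as a noun;
    # verb forms such as "are" and "written" are therefore left unchanged.
    return " ".join([lemmatizer.lemmatize(word) for word in text.split()])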
print("Text before lemmatization:\n{}".format(text))
print("Text after lemmatization:\n{}".format(lemmatization(text)))

Text before lemmatization:
These are short, famous texts in English from classic sources like the Bible or Shakespeare. Some texts have word definitions and explanations to help you. Some of these texts are written in an old style of English. Try to understand them, because the English that we speak today is based on what our great, great, great, great grandparents spoke before! Of course, not all these texts were originally written in English. The Bible, for example, is a translation. But they are all well known in English today, and many of them express beautiful thoughts.
Text after lemmatization:
These are short, famous text in English from classic source like the Bible or Shakespeare. Some text have word definition and explanation to help you. Some of these text are written in an old style of English. Try to understand them, because the English that we speak today is based on what our great, great, great, great grandparent spoke before! Of course, not all these text were originall

### Tagging
#### Tagging (part-of-speech tagging) assigns a grammatical category to each word. We tokenize the text and pass the tokens to the pos_tag function to obtain a tag for each token.

In [32]:
text = "These are short, famous texts in English from classic sources like the Bible or Shakespeare. Some texts have word definitions and explanations to help you. Some of these texts are written in an old style of English. Try to understand them, because the English that we speak today is based on what our great, great, great, great grandparents spoke before! Of course, not all these texts were originally written in English. The Bible, for example, is a translation. But they are all well known in English today, and many of them express beautiful thoughts."
def tagging(text: str) -> list:
    # pos_tag returns a list of (token, tag) tuples, not a string
    return nltk.pos_tag(nltk.word_tokenize(text))

print("Text before tagging:\n{}".format(text))
print("Text after tagging:\n{}".format(tagging(text)))

Text before tagging:
These are short, famous texts in English from classic sources like the Bible or Shakespeare. Some texts have word definitions and explanations to help you. Some of these texts are written in an old style of English. Try to understand them, because the English that we speak today is based on what our great, great, great, great grandparents spoke before! Of course, not all these texts were originally written in English. The Bible, for example, is a translation. But they are all well known in English today, and many of them express beautiful thoughts.
Text after tagging:
[('These', 'DT'), ('are', 'VBP'), ('short', 'JJ'), (',', ','), ('famous', 'JJ'), ('texts', 'NN'), ('in', 'IN'), ('English', 'NNP'), ('from', 'IN'), ('classic', 'JJ'), ('sources', 'NNS'), ('like', 'IN'), ('the', 'DT'), ('Bible', 'NNP'), ('or', 'CC'), ('Shakespeare', 'NNP'), ('.', '.'), ('Some', 'DT'), ('texts', 'NNS'), ('have', 'VBP'), ('word', 'NN'), ('definitions', 'NNS'), ('and', 'CC'), ('explanatio

The meaning of each tag can be looked up with nltk.help.upenn_tagset, which prints the built-in description and example words for a given tag.

In [36]:
nltk.help.upenn_tagset('NN')
nltk.help.upenn_tagset('IN')
nltk.help.upenn_tagset('DT')

NN: noun, common, singular or mass
    common-carrier cabbage knuckle-duster Casino afghan shed thermostat
    investment slide humour falloff slick wind hyena override subhumanity
    machinist ...
IN: preposition or conjunction, subordinating
    astride among uppon whether out inside pro despite on by throughout
    below within for towards near behind atop around if like until below
    next into if beside ...
DT: determiner
    all an another any both del each either every half la many much nary
    neither no some such that the them these this those


### N-grams
#### N-grams are contiguous sequences of n items (words, symbols, or tokens) in a document.
N-grams are named according to the value that ‘n’ takes.

| n | term    |
|---|---------|
| 1 | unigram |
| 2 | bigram  |
| 3 | trigram |
| n | n-gram  |

In [None]:
# https://www.analyticsvidhya.com/blog/2021/09/what-are-n-grams-and-how-to-implement-them-in-python/
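
The cell above is empty, so here is a minimal sketch of n-gram extraction in plain Python. The helper name `ngrams` and the short token list are assumptions for illustration; NLTK ships a comparable helper in `nltk.util`.

```python
def ngrams(tokens: list, n: int) -> list:
    # Slide a window of length n over the token list; if n exceeds the
    # number of tokens, the range is empty and no n-grams are returned.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "these are short famous texts".split()
print("Unigrams:", ngrams(tokens, 1))
print("Bigrams: ", ngrams(tokens, 2))
print("Trigrams:", ngrams(tokens, 3))
```

Five tokens yield five unigrams, four bigrams and three trigrams, since each window must fit entirely inside the sequence.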

### Text Normalization
Text normalization includes:
- converting all letters to lower or upper case
- converting numbers into words or removing numbers
- removing punctuation, accent marks and other diacritics
- removing extra whitespace
- expanding abbreviations
- removing stop words, sparse terms, and particular words
- text canonicalization
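
Several of the steps listed above can be sketched with the standard library alone. The function name `normalize` and the choice to remove numbers rather than spell them out are assumptions; abbreviation expansion and canonicalization are omitted because they need task-specific dictionaries.

```python
import re
import string
import unicodedata

def normalize(text: str) -> str:
    # Convert all letters to lower case.
    text = text.lower()
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove numbers and ASCII punctuation.
    text = re.sub(r"\d+", "", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()

print(normalize("  Caf\u00e9 No. 42:   The  BIBLE, for example!  "))
```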

### Text cleaning
#### Preparing raw text for NLP so that machines can understand human language.
Steps:
- remove extra spaces
- remove punctuation
- case normalization
- tokenization
- remove stop words
- lemmatization / stemming
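
The steps above can be chained in the listed order. The sketch below is one possible arrangement, not a definitive pipeline: the function name `clean` and the small `DEMO_STOPWORDS` set are hypothetical, and the final lemmatization / stemming step is left out because it depends on the NLTK objects defined earlier in the notebook.

```python
import re

# Hypothetical, tiny stop word list used only for this illustration.
DEMO_STOPWORDS = {"some", "of", "these", "are", "in", "an"}

def clean(text: str, stop_words: set) -> list:
    text = re.sub(r"\s+", " ", text).strip()   # remove extra spaces
    text = re.sub(r"[^\w\s]", "", text)        # remove punctuation
    text = text.lower()                        # case normalization
    tokens = text.split()                      # tokenization
    # Remove stop words; a stemmer or lemmatizer could be applied afterwards.
    return [t for t in tokens if t not in stop_words]

print(clean("  Some of these   texts are written in an OLD style!  ", DEMO_STOPWORDS))
```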

### Frequency encoding
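
A minimal sketch, assuming that frequency encoding here means count-based (bag-of-words) vectors: each document becomes a vector holding the number of occurrences of every vocabulary word. The function name `frequency_encode` and the toy documents are assumptions for illustration.

```python
from collections import Counter

def frequency_encode(tokens: list, vocabulary: list) -> list:
    # Count each token, then read the counts out in vocabulary order;
    # words absent from the document get a count of 0.
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

docs = [["great", "great", "texts"], ["old", "texts"]]
# Build a fixed, sorted vocabulary so every vector has the same layout.
vocabulary = sorted({word for doc in docs for word in doc})
print(vocabulary)
for doc in docs:
    print(frequency_encode(doc, vocabulary))
```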

### One-hot encoding
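
In one-hot encoding each word is represented by a vector as long as the vocabulary, with a single 1 at the position of that word. The sketch below uses an assumed helper `one_hot` and a toy vocabulary; mapping unknown words to an all-zero vector is one possible convention, not the only one.

```python
def one_hot(word: str, vocabulary: list) -> list:
    # Map each vocabulary word to its position.
    index = {w: i for i, w in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    # Unknown words keep the all-zero vector (an assumed convention).
    if word in index:
        vector[index[word]] = 1
    return vector

vocabulary = ["bible", "english", "texts"]
for word in ["english", "texts", "shakespeare"]:
    print(word, one_hot(word, vocabulary))
```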

### TF-IDF
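
TF-IDF weights a term by how often it occurs in a document (term frequency) and how rare it is across documents (inverse document frequency). The sketch below assumes the plain textbook variant, tf = count / document length and idf = ln(N / df); libraries such as scikit-learn use smoothed variants, so their numbers will differ. The function name `tf_idf` and the toy documents are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs: list) -> list:
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [["old", "english", "texts"], ["english", "bible"], ["english", "texts", "texts"]]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

In this toy corpus "english" appears in every document, so its weight is zero, which is the same intuition that motivates stop word removal.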

### Continuous word representation

### Word embeddings (Word2Vec, GloVe, fastText)
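
Word embeddings represent words as dense vectors, and related words are expected to lie close together, commonly measured by cosine similarity. The sketch below only illustrates that measure: the three-dimensional vectors are hand-made toy values, NOT taken from Word2Vec, GloVe, fastText or any trained model, and the names `cosine_similarity` and `toy_vectors` are assumptions.

```python
import math

def cosine_similarity(u: list, v: list) -> float:
    # Dot product divided by the product of the vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hand-made toy vectors for illustration only (not from a trained model).
toy_vectors = {
    "texts": [0.9, 0.8, 0.1],
    "words": [0.9, 0.7, 0.2],
    "today": [0.1, 0.2, 0.9],
}

print(cosine_similarity(toy_vectors["texts"], toy_vectors["words"]))
print(cosine_similarity(toy_vectors["texts"], toy_vectors["today"]))
```

With real embeddings the vectors would be loaded from a trained model and would typically have tens to hundreds of dimensions.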

## 3. Please provide an example of text classification using a neural network (this may be so-called sentiment analysis).

## 4. Please provide an example of learning a time series using a neural network. You may generate the data yourself as the values of some function.

## 5. Please provide an example of text generation using a neural network.

## 6. Please provide an example of text translation using a neural network.