Advanced Natural Language Processing

Tasks for the Advanced Natural Language Processing course at ITMO University.

Index

Data cleaning and duplicates detection
Quora duplicates detection
Attention mechanism and BERT
Fine tuning and dividing documents into 10 topic categories
Topic Modeling

Data cleaning and duplicates detection

This file contains:

Data Cleaning:

Remove non-English words
Remove HTML tags (try to do it with a regular expression, or play with the BeautifulSoup library)
Apply lemmatization / stemming
Remove stop-words

Duplicates detection using LSH
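
As a rough illustration of the steps listed above, the sketch below chains a minimal cleaning function into MinHash-based LSH. It is not the notebook's code: the stop-word list, the suffix-stripping `naive_stem`, the ASCII-only token filter used as a crude stand-in for dropping non-English words, and all function names and parameters (`k`, `num_perm`, `bands`) are assumptions made for the example. A real run would more likely use a proper stemmer or lemmatizer and a dedicated LSH library.

```python
import hashlib
import html
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in"}
TAG_RE = re.compile(r"<[^>]+>")
TOKEN_RE = re.compile(r"[a-z]+")


def naive_stem(token):
    """Toy suffix stripper standing in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean_text(raw):
    """Strip HTML tags, keep ASCII-letter tokens, drop stop words, stem."""
    text = html.unescape(TAG_RE.sub(" ", raw))
    tokens = TOKEN_RE.findall(text.lower())  # crude proxy for "English only"
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def shingles(tokens, k=2):
    """Set of token k-grams; short inputs fall back to a single shingle."""
    return {" ".join(tokens[i:i + k]) for i in range(max(1, len(tokens) - k + 1))}


def minhash_signature(shingle_set, num_perm=32):
    """One minimum per seeded hash function (MD5 is used only for determinism)."""
    def seeded_hash(seed, item):
        digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    return [min(seeded_hash(seed, s) for s in shingle_set) for seed in range(num_perm)]


def lsh_candidate_pairs(docs, num_perm=32, bands=8):
    """Return pairs of document ids that share at least one LSH band bucket."""
    rows = num_perm // bands
    buckets = defaultdict(set)
    for doc_id, raw in docs.items():
        signature = minhash_signature(shingles(clean_text(raw)), num_perm)
        for band in range(bands):
            key = (band, tuple(signature[band * rows:(band + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

Texts that are identical after cleaning always land in the same buckets. Near-duplicates do so with a probability that grows with their Jaccard similarity and depends on the number of bands and rows per band.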

Quora duplicates detection

The task in this file is to build an LSTM-based Siamese network and search for duplicates in the Quora Question Pairs dataset.
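
To make the Siamese idea concrete, here is a forward-only NumPy sketch: one LSTM encoder with a single set of weights encodes both questions, and the two final hidden states are compared. It is an untrained toy under stated assumptions, not the notebook's model. The class `TinyLSTMEncoder`, the layer sizes, and the similarity `exp(-L1 distance)` are all choices made for this example; the notebook may use a different framework, loss, or distance.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTMEncoder:
    """Forward-only LSTM mapping a sequence of token ids to its final hidden state."""

    def __init__(self, vocab_size, embed_dim=8, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_dim = hidden_dim
        self.embedding = rng.normal(0.0, 0.1, (vocab_size, embed_dim))
        # One stacked weight matrix for the input, forget, cell and output gates.
        self.W = rng.normal(0.0, 0.1, (embed_dim + hidden_dim, 4 * hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def encode(self, token_ids):
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        for idx in token_ids:
            z = np.concatenate([self.embedding[idx], h]) @ self.W + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
            h = sigmoid(o) * np.tanh(c)                   # update hidden state
        return h


def siamese_similarity(encoder, question_a, question_b):
    """Encode both questions with the SAME weights, then compare the vectors."""
    h_a = encoder.encode(question_a)
    h_b = encoder.encode(question_b)
    # Assumed similarity: exp of the negative L1 distance, which lies in (0, 1].
    return float(np.exp(-np.abs(h_a - h_b).sum()))
```

Because both branches share weights, two identical token sequences always score exactly 1. Training would then push duplicate pairs toward 1 and non-duplicates toward 0.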

Attention mechanism and BERT

The task in this file is to implement an attention mechanism with NumPy and to use pre-trained models for text processing.
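
A minimal NumPy sketch of one common variant, scaled dot-product attention, is shown here for orientation. Which attention variant the notebook implements is not stated, so treat the choice, the function names, and the 2-D input shapes as assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # query-key similarity
    weights = softmax(scores, axis=-1)  # each row sums to 1 over the keys
    return weights @ V, weights         # weighted average of the values
```

Each output row is a weighted average of the rows of `V`, with weights given by how well the query matches each key.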

Fine tuning

The task in this file is to divide documents into 10 topic categories using the Hugging Face Datasets library.

Topic Modeling

In this file I've applied topic modeling with NMF (using sklearn.decomposition.NMF) and with LDA (using the gensim implementation), and evaluated both with two quality functions: coherence and normalized PMI.
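
For reference, normalized PMI for a word pair is PMI(w1, w2) / -log p(w1, w2), where PMI(w1, w2) = log(p(w1, w2) / (p(w1) p(w2))). The sketch below averages it over all pairs of a topic's top words, estimating probabilities from document-level co-occurrence. It is an illustration only: the function name and the document-level counting are assumptions, and library implementations such as gensim's may estimate the probabilities differently (for example with sliding windows).

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of top words (needs at least two words)."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every given word.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # words never co-occur: NPMI limit
        elif p12 == 1.0:
            scores.append(1.0)   # words always co-occur: avoid dividing by log(1) = 0
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)
```

Scores range from -1 (the words never appear together) through 0 (independent) to 1 (the words only appear together), so a higher average suggests a more coherent topic.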
