Tasks for Advance Natural Language Processing Course @ ITMO University

nemat-al/Advance-Natural-Language-Processing

Advance Natural Language Processing

Tasks for Advance Natural Language Processing Course at ITMO University.

Index

  1. Data cleaning and duplicates detection
  2. Quora duplicates detection
  3. Attention mechanism and BERT
  4. Fine-tuning and classifying documents into 10 topic categories
  5. Topic Modeling

This file contains:

  • Data cleaning:
      1. Remove non-English words
      2. Remove HTML tags (try a regular expression, or the BeautifulSoup library)
      3. Apply lemmatization / stemming
      4. Remove stop words
  • Duplicates detection using LSH
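
The cleaning steps and the LSH search above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: the stop-word list is a tiny hypothetical sample, keeping only ASCII alphabetic tokens is a crude stand-in for removing non-English words, lemmatization is left out (it needs a library such as NLTK), and the MinHash-with-banding scheme is one common way to do LSH that the task description does not confirm.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations

# Tiny illustrative stop-word list (assumption, not a complete list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}


def clean(text):
    """Strip HTML tags, lowercase, keep alphabetic tokens, drop stop words."""
    text = re.sub(r"<[^>]+>", " ", text)          # remove HTML tags
    tokens = re.findall(r"[a-z]+", text.lower())  # ASCII alphabetic tokens only
    return [t for t in tokens if t not in STOP_WORDS]


def shingles(tokens, k=2):
    """Return the set of k-token shingles of a token list."""
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash_signature(shingle_set, num_hashes=32):
    """One minimum per seeded hash function; similar sets share many minima."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.md5(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingle_set
        ))
    return signature


def lsh_candidate_pairs(signatures, bands=16):
    """Documents whose signatures agree on at least one band become candidates."""
    rows = len(signatures[0]) // bands
    candidates = set()
    for b in range(bands):
        buckets = defaultdict(list)
        for doc_id, sig in enumerate(signatures):
            buckets[tuple(sig[b * rows:(b + 1) * rows])].append(doc_id)
        for ids in buckets.values():
            candidates.update(combinations(ids, 2))
    return candidates
```

Candidate pairs still need a final check (for example, exact Jaccard similarity on the shingle sets), since banding can produce false positives.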

The task in this file is to build an LSTM-based Siamese network and search for duplicates in the Quora question pairs dataset.
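
A Siamese LSTM for this kind of task might look like the sketch below. It assumes PyTorch, which the task description does not name, and the class name, layer sizes, and the Manhattan-distance similarity are illustrative choices rather than the notebook's actual architecture.

```python
import torch
import torch.nn as nn


class SiameseLSTM(nn.Module):
    """Encodes both questions with the same LSTM and compares the encodings."""

    def __init__(self, vocab_size, embed_dim=32, hidden_dim=64):
        super().__init__()
        # Index 0 is reserved for padding (assumption for this sketch).
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def encode(self, token_ids):
        # token_ids: (batch, seq_len) integer tensor.
        embedded = self.embedding(token_ids)
        _, (h_n, _) = self.lstm(embedded)
        # Final hidden state of the last layer: (batch, hidden_dim).
        # Note: padded positions are not masked here; a fuller version
        # would use pack_padded_sequence.
        return h_n[-1]

    def forward(self, q1, q2):
        h1 = self.encode(q1)
        h2 = self.encode(q2)
        # exp(-L1 distance): 1.0 for identical encodings, towards 0 otherwise.
        return torch.exp(-torch.sum(torch.abs(h1 - h2), dim=1))
```

Because both inputs pass through the same weights, the score is symmetric in the two questions, and it can be trained against the 0/1 duplicate label with a mean squared error or binary cross-entropy loss.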

The task in this file is to implement an attention mechanism with NumPy and to use pre-trained models for text processing.
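
As a rough illustration of attention in NumPy, here is a sketch of scaled dot-product attention. The task description does not say which attention variant the notebook implements, so this choice and the function names are assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights         # output: (n_queries, d_v)
```

Each output row is a weighted average of the rows of V, with weights given by how strongly the corresponding query matches each key.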

The task in this file is to classify documents into 10 topic categories using the Hugging Face Datasets library.

In this file I applied topic modeling with NMF (using sklearn.decomposition.NMF) and with LDA (using the gensim implementation), and applied two quality functions: coherence and normalized PMI.
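
To make the normalized PMI measure concrete, here is a small from-scratch sketch that scores a topic's top words by document-level co-occurrence. It is independent of whatever implementation the notebook uses; the function name and the handling of the edge cases are assumptions made for this example.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of a topic's top words.

    documents: list of token lists; probabilities are document frequencies.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: NPMI lower bound
        elif p12 == 1:
            scores.append(0.0)   # 0/0 case; treated as 0 in this sketch
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)
```

NPMI ranges from -1 (the words never appear together) through 0 (independent) to 1 (the words always appear together), so higher averages suggest a more coherent topic.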
