Swedish Augmentation Packages

A collection of text data augmentation techniques for Swedish.

How do I set it up?

Step 1

!git clone https://github.com/mosh98/swe_aug.git

The augmentations are built on top of a Swedish word2vec model [1], so download the word vectors first.

Step 2

!wget https://www.ida.liu.se/divisions/hcs/nlplab/swectors/swectors-300dim.txt.bz2
!bzip2 -dk /content/swectors-300dim.txt.bz2
!pip install -r swe_aug/reqs.txt


word_vec_path = '/content/swectors-300dim.txt'  # path to the decompressed .txt vector file

# You can also point word_vec_path to your own pretrained word2vec vectors (they must be a .txt file).
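If you want to inspect the vectors yourself, or check that your own pretrained file is readable before handing it to the package, a minimal loader could look like the sketch below. It assumes the common word2vec text layout (an optional "<vocab_size> <dim>" header line, then one word per line followed by its numbers); this layout is an assumption and has not been checked against swectors or against how swe_aug itself parses the file. The function name load_text_vectors is hypothetical, and it reads the .bz2 archive directly as well, so this check works even before the bzip2 step.

```python
import bz2


def load_text_vectors(path):
    """Read word vectors from a plain-text word2vec-style file into a dict.

    Assumed format (not verified for swectors): an optional header line
    "<vocab_size> <dim>", then "<word> <float> <float> ..." on each line.
    Paths ending in .bz2 are decompressed on the fly.
    """
    opener = bz2.open if path.endswith(".bz2") else open
    vectors = {}
    with opener(path, "rt", encoding="utf-8") as f:
        for line_no, line in enumerate(f):
            parts = line.split()
            # Skip the optional "<vocab_size> <dim>" header on the first line.
            if line_no == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
                continue
            if len(parts) < 2:
                continue  # blank or malformed line
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors
```

For example, load_text_vectors(word_vec_path) would return a dict mapping each word to a list of floats, which for swectors-300dim should have 300 entries if the file follows the assumed layout. Plain Python lists are memory-hungry for a full vocabulary, so this is meant for inspection rather than production use.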

Then use the augmentation technique of your choice.

EDA

EDA: Easy Data Augmentation in Swedish

What is EDA? [2]

EDA is a set of simple operations for augmenting text data that are easy to understand and use. It has four main components:

Synonym Replacement (SR)
Random Insertion (RI)
Random Swap (RS)
Random Deletion (RD)
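To make the four operations concrete, here is a minimal pure-Python sketch of synonym replacement, random insertion, random swap and random deletion on a list of tokens, modelled on the EDA paper [2]. It is not the swe_aug implementation: the synonym table is a hypothetical hand-written dictionary standing in for wherever the package gets its Swedish synonyms, and all function names are illustrative.

```python
import random


def synonym_replacement(words, n, synonyms, rng):
    """SR: replace up to n words that have synonyms with a random synonym."""
    new = list(words)
    candidates = [i for i, w in enumerate(new) if synonyms.get(w)]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        new[i] = rng.choice(synonyms[new[i]])
    return new


def random_insertion(words, n, synonyms, rng):
    """RI: n times, insert a synonym of a random word at a random position."""
    new = list(words)
    for _ in range(n):
        candidates = [w for w in new if synonyms.get(w)]
        if not candidates:
            break
        synonym = rng.choice(synonyms[rng.choice(candidates)])
        new.insert(rng.randint(0, len(new)), synonym)
    return new


def random_swap(words, n, rng):
    """RS: n times, swap the positions of two randomly chosen words."""
    new = list(words)
    if len(new) < 2:
        return new
    for _ in range(n):
        i, j = rng.sample(range(len(new)), 2)
        new[i], new[j] = new[j], new[i]
    return new


def random_deletion(words, p, rng):
    """RD: delete each word with probability p, but never return an empty sentence."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy synonym table; swe_aug uses its own Swedish synonym source.
    synonyms = {"hunden": ["vovven", "jycken"], "snabbt": ["fort", "kvickt"]}
    words = "den stora hunden springer snabbt".split()
    print(" ".join(synonym_replacement(words, 1, synonyms, rng)))
    print(" ".join(random_insertion(words, 1, synonyms, rng)))
    print(" ".join(random_swap(words, 1, rng)))
    print(" ".join(random_deletion(words, 0.2, rng)))
```

Passing an explicit random.Random instance keeps the augmentations reproducible, which is handy when comparing augmented training sets.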

from swe_aug import EDA
aug = EDA.Enkel_Data_Augmentation(word_vec_path)

txt = "enter your desired text. It can be a sentence or a paragraph"

augmented_sentences = aug.enkel_augmentation(txt, alpha_sr=0.1, 
                                             alpha_ri=0.3, alpha_rs=0.2, 
                                             alpha_rd=0.1, num_aug=4)
# returns a list of augmented sentences
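The alpha_* arguments control how aggressively each operation edits a sentence. The sketch below reproduces the heuristic from the reference EDA code released with the paper [2]; that swe_aug's enkel_augmentation uses exactly the same formulas is an assumption, and eda_operation_counts is a hypothetical helper, not part of the package.

```python
def eda_operation_counts(text, alpha_sr, alpha_ri, alpha_rs, alpha_rd, num_aug):
    """Estimate how strongly each EDA operation edits one input text.

    Mirrors the heuristic of the reference EDA code (assumed, not verified, for swe_aug):
    SR, RI and RS each change n = max(1, int(alpha * num_words)) words,
    alpha_rd acts as a per-word deletion probability, and
    int(num_aug / 4) + 1 candidates are generated per operation before the
    pool is shuffled and cut down to num_aug sentences.
    """
    num_words = len(text.split())
    return {
        "sr_words": max(1, int(alpha_sr * num_words)),
        "ri_words": max(1, int(alpha_ri * num_words)),
        "rs_swaps": max(1, int(alpha_rs * num_words)),
        "rd_expected_deletions": alpha_rd * num_words,
        "candidates_per_operation": int(num_aug / 4) + 1,
    }


if __name__ == "__main__":
    txt = "enter your desired text. It can be a sentence or a paragraph"
    print(eda_operation_counts(txt, alpha_sr=0.1, alpha_ri=0.3,
                               alpha_rs=0.2, alpha_rd=0.1, num_aug=4))
```

Under this heuristic, the 12-word example text with the call above would give SR variants that replace 1 word, RI variants that insert 3 synonyms, RS variants that make 2 swaps, and RD variants that delete about 1.2 words on average. The max(1, ...) floor means even very short texts are always changed by at least one edit.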

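Text Fragmenter

Splits a text into smaller fragments; percent sets the size of each fragment relative to the whole text. [3]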

from swe_aug.Other_Techniques import Text_Cropping

frag = Text_Cropping.cropper(percent = 0.25)
list_of_fragmented_sentence = frag.text_fragmeter(txt)
# with percent = 0.25, the text is chopped into 4 pieces
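The README does not spell out exactly how cropper splits the text. One plausible reading of percent, sketched below as an assumption, is that each fragment holds about that fraction of the words, so percent=0.25 yields roughly four consecutive chunks. crop_text is a hypothetical re-implementation for illustration only; the real Text_Cropping may, for example, split on characters or sentences instead.

```python
import math


def crop_text(text, percent=0.25):
    """Split text into consecutive word chunks of about `percent` of the words each.

    Hypothetical sketch: assumes word-level splitting, which may differ from
    swe_aug's Text_Cropping.cropper.
    """
    if not 0 < percent <= 1:
        raise ValueError("percent must be in (0, 1]")
    words = text.split()
    if not words:
        return []
    # Round the chunk size up so we never produce more than ~1/percent chunks.
    size = max(1, math.ceil(len(words) * percent))
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


if __name__ == "__main__":
    print(crop_text("den stora hunden springer snabbt över den gröna ängen", 0.25))
```

Rounding the chunk size up is a design choice: with 10 words and percent=0.25 it gives chunks of 3, 3, 3 and 1 words, so the number of fragments stays at four rather than growing to five.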

Type-Specific Similar Word Replacement

The idea is to replace words with words that are close to them in the embedding space and carry the same part-of-speech (POS) tag. [4]

# POS tags you can perturb (case-sensitive!):
# "NOUN", "VERB", "ADJ", "ADV", "PROPN", "CONJ"

from swe_aug.Other_Techniques import Type_SR
aug = Type_SR.type_DA(word_vec_path)

list_of_augs = aug.type_synonym_sr(txt, token_type = "NOUN", n = 2)
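The sketch below illustrates the idea with cosine similarity over word vectors: a word is swapped for its nearest neighbour only among candidates carrying the same POS tag, so a noun is never replaced by a verb that merely happens to sit close by in the embedding space. The vectors and the word-to-tag dictionary are hypothetical toy stand-ins; the real package loads word_vec_path and presumably tags the text with a POS tagger, and the meaning of its n argument is not reproduced here. All function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_same_pos(word, vectors, pos_of, k=1):
    """Return the k nearest neighbours of `word` that share its POS tag."""
    tag = pos_of.get(word)
    if tag is None or word not in vectors:
        return []
    scored = [
        (cosine(vectors[word], vec), other)
        for other, vec in vectors.items()
        if other != word and pos_of.get(other) == tag
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [w for _, w in scored[:k]]


def type_specific_replace(text, token_type, vectors, pos_of):
    """Replace every word tagged `token_type` with its nearest same-POS neighbour."""
    out = []
    for w in text.split():
        if pos_of.get(w) == token_type:
            neighbours = most_similar_same_pos(w, vectors, pos_of)
            out.append(neighbours[0] if neighbours else w)
        else:
            out.append(w)
    return " ".join(out)


if __name__ == "__main__":
    # Hypothetical 2-d toy vectors and a fixed word -> POS lookup.
    vectors = {"hund": [1.0, 0.0], "katt": [0.8, 0.6], "bil": [0.0, 1.0],
               "springer": [0.99, 0.1], "går": [0.1, 1.0]}
    pos_of = {"hund": "NOUN", "katt": "NOUN", "bil": "NOUN",
              "springer": "VERB", "går": "VERB"}
    print(type_specific_replace("hund springer", "NOUN", vectors, pos_of))
```

In the toy data, springer is the closest vector to hund overall, yet the NOUN replacement picks katt because only nouns are considered as candidates.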

References

[1] Swedish word2vec: https://www.ida.liu.se/divisions/hcs/nlplab/swectors/

[2] EDA: https://aclanthology.org/D19-1670/

[3] Text Fragmenter: developed by the author of this package

[4] Type-Specific Similar Word Replacement: developed by the author of this package

How do I cite this?

@software{Mahamud2022,
  author = {Mahamud, Mosleh},
  title = {Swedish Augmentation Packages},
  year = {2022},
  publisher = {GitHub},
  journal = {Not Decided yet},
  howpublished = {\url{https://github.com/mosh98/swe_aug}},
}