# FastText
https://fasttext.cc/
0. FastText - Python
1. Word representation
2. Text classification

# FastText - Python

## 1. Installing fastText
https://pypi.org/project/fasttext/
```bash
$ pip install fasttext
```
<b>※ If installation fails on Windows <br></b>
1. https://pypi.org/project/fasttext-win/
```bash
$ pip install fasttext-win
```
2. https://github.com/ageitgey/fastText-windows-binaries/releases
* python3.6/64bit : fasttext-0.9.1-cp36-cp36m-win_amd64.whl
* python3.7/64bit : fasttext-0.9.1-cp37-cp37m-win_amd64.whl
* python3.8/64bit : fasttext-0.9.1-cp38-cp38-win_amd64.whl
```bash
$ pip install fasttext-0.9.1-cp38-cp38-win_amd64.whl
```
3. Download https://github.com/xiamx/fastText/releases/download/fastText-latest-build43/fasttext-win64-latest-Release.zip <br>+ The extracted directory path must then be added to the environment variables.

## 2. Import

In [None]:
# Import the module
import fasttext

## 3. Model

In [None]:
# Word representation model
model = fasttext.train_unsupervised('data.txt')

In [None]:
# skipgram
model = fasttext.train_unsupervised('data.txt', model='skipgram')

In [None]:
# cbow
model = fasttext.train_unsupervised('data.txt', model='cbow')

In [None]:
# train_unsupervised parameters
input             # training file path (required)
model             # unsupervised fasttext model {cbow, skipgram} [skipgram]
lr                # learning rate [0.05]
dim               # size of word vectors [100]
ws                # size of the context window [5]
epoch             # number of epochs [5]
minCount          # minimal number of word occurrences [5]
minn              # min length of char ngram [3]
maxn              # max length of char ngram [6]
neg               # number of negatives sampled [5]
wordNgrams        # max length of word ngram [1]
loss              # loss function {ns, hs, softmax, ova} [ns]
bucket            # number of buckets [2000000]
thread            # number of threads [number of cpus]
lrUpdateRate      # change the rate of updates for the learning rate [100]
t                 # sampling threshold [0.0001]
verbose           # verbose [2]
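To make `minn` and `maxn` more concrete, here is a minimal pure-Python sketch of how a word can be broken into character n-grams. It assumes the usual fastText convention of wrapping each word in `<` and `>` boundary markers. The helper `char_ngrams` is hypothetical and not part of the `fasttext` package, and it simplifies the real extraction (for example, it works on Python characters and ignores the special handling of single-character n-grams).

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return the character n-grams of `word` with lengths in [minn, maxn].

    Illustrative helper only; not part of the fasttext package.
    """
    # Wrap the word in boundary markers so prefixes and suffixes are distinguishable
    wrapped = f"<{word}>"
    ngrams = []
    for start in range(len(wrapped)):
        # max(minn, 1) avoids empty strings; maxn=0 yields no n-grams at all
        for n in range(max(minn, 1), maxn + 1):
            if start + n <= len(wrapped):
                ngrams.append(wrapped[start:start + n])
    return ngrams


print(char_ngrams("where", minn=3, maxn=3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

With `minn=0` and `maxn=0` the sketch returns an empty list, which corresponds to using no subword information.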

In [None]:
# Text classification model
model = fasttext.train_supervised('data.train.txt')

In [None]:
# train_supervised parameters
input             # training file path (required)
lr                # learning rate [0.1]
dim               # size of word vectors [100]
ws                # size of the context window [5]
epoch             # number of epochs [5]
minCount          # minimal number of word occurrences [1]
minCountLabel     # minimal number of label occurrences [1]
minn              # min length of char ngram [0]
maxn              # max length of char ngram [0]
neg               # number of negatives sampled [5]
wordNgrams        # max length of word ngram [1]
loss              # loss function {ns, hs, softmax, ova} [softmax]
bucket            # number of buckets [2000000]
thread            # number of threads [number of cpus]
lrUpdateRate      # change the rate of updates for the learning rate [100]
t                 # sampling threshold [0.0001]
label             # label prefix ['__label__']
verbose           # verbose [2]
pretrainedVectors # pretrained word vectors (.vec file) for supervised learning []
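The `label` parameter implies a specific training file layout: each line holds one example, with every label written using the prefix (`__label__` by default) followed by the text. The sketch below writes such a file using only the standard library. The helper `to_fasttext_line` and the toy examples are hypothetical and serve only as illustration.

```python
import os
import tempfile


def to_fasttext_line(labels, text, prefix="__label__"):
    """Format one example as '<prefix>label ... text' on a single line."""
    label_part = " ".join(f"{prefix}{label}" for label in labels)
    # Each example must fit on one line, so collapse newlines and extra spaces
    clean_text = " ".join(text.split())
    return f"{label_part} {clean_text}"


# Made-up toy examples: (labels, text)
examples = [
    (["positive"], "great movie, loved it"),
    (["negative"], "boring and\ntoo long"),
    (["positive", "comedy"], "very funny"),
]

train_path = os.path.join(tempfile.mkdtemp(), "data.train.txt")
with open(train_path, "w", encoding="utf-8") as f:
    for labels, text in examples:
        f.write(to_fasttext_line(labels, text) + "\n")

with open(train_path, encoding="utf-8") as f:
    print(f.read())
```

A file written this way could then be passed as the `input` argument, for example `fasttext.train_supervised(train_path)`.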

In [None]:
# Load a previously saved model
model = fasttext.load_model("model_filename.bin")

## 4. Model object functions

In [None]:
get_dimension           # Get the dimension (size) of a lookup vector (hidden layer).
                        # This is equivalent to `dim` property.
get_input_vector        # Given an index, get the corresponding vector of the Input Matrix.
get_input_matrix        # Get a copy of the full input matrix of a Model.
get_labels              # Get the entire list of labels of the dictionary
                        # This is equivalent to `labels` property.
get_line                # Split a line of text into words and labels.
get_output_matrix       # Get a copy of the full output matrix of a Model.
get_sentence_vector     # Given a string, get a single vector representation. This function
                        # assumes to be given a single line of text. We split words on
                        # whitespace (space, newline, tab, vertical tab) and the control
                        # characters carriage return, formfeed and the null character.
get_subword_id          # Given a subword, return the index (within input matrix) it hashes to.
get_subwords            # Given a word, get the subwords and their indices.
get_word_id             # Given a word, get the word id within the dictionary.
get_word_vector         # Get the vector representation of word.
get_words               # Get the entire list of words of the dictionary
                        # This is equivalent to `words` property.
is_quantized            # Whether the model has been quantized.
predict                 # Given a string, get a list of labels and a list of corresponding probabilities.
quantize                # Quantize the model, reducing its size and memory footprint.
save_model              # Save the model to the given path.
test                    # Evaluate a supervised model using the file given by path.
test_label              # Return the precision and recall score for each label.
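Since `predict` returns labels together with their probabilities, and the labels carry the `__label__` prefix, a small post-processing step is often convenient. The sketch below uses a hand-written stand-in for the returned pair rather than a trained model. The helper `strip_label_prefix` is hypothetical and not part of the `fasttext` package.

```python
def strip_label_prefix(labels, probs, prefix="__label__"):
    """Pair each label (without its prefix) with its probability."""
    pairs = []
    for label, prob in zip(labels, probs):
        name = label[len(prefix):] if label.startswith(prefix) else label
        pairs.append((name, float(prob)))
    return pairs


# Hand-written stand-in for the (labels, probabilities) pair from predict
labels = ("__label__positive", "__label__negative")
probs = [0.9, 0.1]

print(strip_label_prefix(labels, probs))
# [('positive', 0.9), ('negative', 0.1)]
```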

In [None]:
# Example: get the vector for a single word
model.get_word_vector('environment')

In [None]:
# equivalent to model.get_words()
model.words

In [None]:
# equivalent to model.get_labels()
model.labels

In [None]:
# equivalent to model.get_word_vector('king')
model['king']

In [None]:
# equivalent to `'king' in model.get_words()`
'king' in model