Decoded strings seem to be in reverse order #1

Closed
ftkurt opened this issue Dec 2, 2017 · 5 comments

Comments

@ftkurt

ftkurt commented Dec 2, 2017

Hi there,

I really appreciate your work, and I am currently using this for my thesis, but I ran into something that I think might be a bug:

image
The tokens seem to be in reverse order:

Examples:

  • bastan --> an bas
  • duygu --> u duyg
  • sömürüsü --> s s ömürüü

I am not sure if this is expected output. But something seems to be wrong.

@soaxelbrooke
Owner

Hey there! Thank you for creating this issue, and sorry it took me so long to respond - I had this GitHub repo configured incorrectly. I'll check it out!

@soaxelbrooke
Owner

Hey @ftkurt, I couldn't reproduce it on my computer - would you mind sharing a small dataset/code snippet I can copy to reproduce locally? Thanks!

@ftkurt
Author

ftkurt commented May 6, 2018

Hey @soaxelbrooke,

Thank you for the response. I tried again, but I am still getting the same output. I am attaching the corpus (a pickle in a zip) and adding some sample code below.

import pickle

from bpe import Encoder

# Load the attached corpus: a pickled string, split into lines.
with open("corpus.pickle", "rb") as f:
    corpus = pickle.load(f).split("\n")

# Fit an encoder with a 1000-entry vocabulary.
bpe_1k = Encoder(1000, ngram_max=10)
bpe_1k.fit(corpus)

# Round-trip each word: encode it, then decode it back to text.
for word in ["bastan", "duygu", "sömürüsü"]:
    print(next(bpe_1k.inverse_transform(bpe_1k.transform([word]))))

The output I get is as follows:

an bast
u duyg
s s ömürüü

corpus.zip
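
For anyone who wants to check this kind of round trip without the attached corpus, here is a minimal sketch. It does not use the bpe library at all: `toy_encode`, `good_decode`, and `reversed_decode` are hypothetical stand-ins, and the check only assumes that decoding joins subword tokens with spaces. A word counts as a failure when its decoded form, with whitespace removed, no longer matches the input.

```python
def round_trip_failures(words, encode, decode):
    """Return (word, decoded) pairs where decoding does not restore the word.

    Whitespace between subword tokens is ignored in the comparison.
    """
    failures = []
    for word in words:
        decoded = decode(encode(word))
        if "".join(decoded.split()) != word:
            failures.append((word, decoded))
    return failures


# Hypothetical stand-ins for illustration only; not the bpe library's API.
def toy_encode(word):
    # Split the word into chunks of up to three characters.
    return [word[i:i + 3] for i in range(0, len(word), 3)]


def good_decode(tokens):
    # Join tokens in their original order.
    return " ".join(tokens)


def reversed_decode(tokens):
    # Join tokens in reverse order, mimicking the symptom reported here.
    return " ".join(reversed(tokens))


words = ["bastan", "duygu", "sömürüsü"]
print(round_trip_failures(words, toy_encode, good_decode))  # → []
print(round_trip_failures(words, toy_encode, reversed_decode))
```

The same helper could in principle be pointed at a real encoder by passing small wrappers around its encode and decode calls in place of the toy functions.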

@soaxelbrooke
Owner

Hey @ftkurt, I believe I've fixed the problem - can you reinstall and try it again?

@ftkurt
Author

ftkurt commented May 18, 2018

Yes, it is fixed now. Great job!
