Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with memory mapped files and vectors.most_similar() #4

Closed
yassin-taskin opened this issue Mar 11, 2018 · 12 comments
Closed

Problems with memory mapped files and vectors.most_similar() #4

yassin-taskin opened this issue Mar 11, 2018 · 12 comments
Assignees
Labels
bug Something isn't working windows

Comments

@yassin-taskin
Copy link

Hello. I am on windows using python 3.6

There seems to be a problem with creating the magmap files for using the vectors.most_similar() method. Error:

  File "\pymagnitude\__init__.py", line 1074, in get_vectors_mmap
    os.rename(path_to_mmap_temp, self.path_to_mmap)

PermissionError: [WinError 32] The process cannot access the file because it is in use by another process \\AppDataTemp\\82ce74f40baf842d23c45b0e90688b9f.magmmap.tmp' -> '\AppData\\Local\\Temp\\82ce74f40baf842d23c45b0e90688b9f.magmmap'

I looked into the code and it seems you are creating and filling these files in the background.
It works fine for the vectors.most_similar_approx() method. The memmap file for approx seems to be created just fine.

The problem persists even when i use blocking=True when constructing the magnitude object.
Then i can't even use the approx method. It just hangs while initializing the object.

Thank you for your time.

@AjayP13
Copy link
Contributor

AjayP13 commented Mar 11, 2018

I use the Python fastener library to create a InterProcess lock and a threading.Lock to ensure only 1 process and 1 thread ever calls os.rename(path_to_mmap_temp, self.path_to_mmap). This seems to work on Mac and Linux.

It seems like fastener only supports Mac and Linux with fcntl. However, there is a way to implement InterProcess file locking with msvcrt on Windows.

If I gave you contributor access on the main GitLab repository, would you be comfortable creating a branch that fixes this issue and Issue #2 by checking if WINDOWS: # do windows specific code? That would be greatly appreciated as I don't have a Windows computer to test this code.

@AjayP13 AjayP13 added bug Something isn't working windows labels Mar 11, 2018
@AjayP13 AjayP13 self-assigned this Mar 11, 2018
@yassin-taskin
Copy link
Author

yassin-taskin commented Mar 11, 2018

If you can point me to the specific parts in the code that would need to be modified concerning the lock i could try and see if it works.
I haven't worked with github branches and merging yet however.
But i would be willing to give it a shot.

@AjayP13
Copy link
Contributor

AjayP13 commented Mar 11, 2018

Thanks. Can you sign up to GitLab and give me your username? I'll add you to the repository as a contributor and comment the parts of the code that need to be changed.

@yassin-taskin
Copy link
Author

yassin-taskin commented Mar 11, 2018

Or would it be possible to implement the locking with this?
https://pythonhosted.org/lockfile/lockfile.html

This seems to be a crossplatform solution that abstracts

Unix fcntl.flock(), fcntl.lockf()
and
msvcrt.locking()

EDIT:
https://pypi.python.org/pypi/filelock
This seems to be another library thats better maintained and also crossplatform

@AjayP13
Copy link
Contributor

AjayP13 commented Mar 11, 2018

Possibly, I'll give that library a try and see if that works first.

@AjayP13
Copy link
Contributor

AjayP13 commented Mar 11, 2018

I actually just went ahead and made the changes. I integrated that lockfile library and added an if-statement that opens self.path instead of /dev/fd/ when the OS is detected as Windows.

The diff can be seen here:
https://gitlab.com/Plasticity/magnitude/commit/41a409942896dde314a4966df3fe294756b80ce0

Test cases are running on the CI server, once they pass, it will be deployed to PyPI as version 0.1.10. Once it is deployed, you can upgrade magnitude on your system by running pip3 install pymagnitude -U. Please post back here if the changes fixed the issue, and I'll close out this issue and Issue #2.

@yassin-taskin
Copy link
Author

Thank you very much for your time Ajay. I will post as soon as i can, most likely tomorrow.

@AjayP13
Copy link
Contributor

AjayP13 commented Mar 11, 2018

The update is now deployed to PyPI. I actually continued using the fastener library because it does apparently support Windows. Not sure why you were getting the process locking error.

Please update magnitude and restart your computer (that will get rid of temp files). And hopefully, the package should be compatible with Windows now.

@yassin-taskin
Copy link
Author

Damn. The bug was still there after the test but i figured it out. It doesn't seem to have anything to do with the locks.

You need to delete the mmap reference to close it.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html
Says here right at the end of the class description.

So i simply added

                                all_vectors.flush()
                                del all_vectors

at line 1129

I don't understand why there would be different behavior on windows vs mac and linux.
It is also weird that the connection persists through the with statement that opens the file.

Maybe mac/linux and windows python have different GC implementations so the reference gets cleaned up immedietly on max/linux? Would be my best guess.

Anyway. All the examples in the documentation work now except slice indexing of vector.

plasticity-admin pushed a commit that referenced this issue Mar 12, 2018
@AjayP13
Copy link
Contributor

AjayP13 commented Mar 12, 2018

Hm, okay, that seems fine to me. I guess the GC is different on Windows or the open file reference from NumPy doesn't cause a conflict on macOS / Unix.

Fixed on this commit:
https://gitlab.com/Plasticity/magnitude/commit/3a73d0b4545e6d7da700c95f0e1c22fbfea3cb5d

And deploying to version 0.1.14 after the tests run.

@yassin-taskin
Copy link
Author

Well this seems to be the last of it then. Thank you for your time and swift responses.

This library is very helpful for me since i can use it in my master thesis.

It is easier to use and more ergonomic and has more functionality than the official fasttext library. Also the documentation is a lot better/ easier to navigate. And now it works on windows, which the original does not. 👍

Thank you for all the work you put into it.

@AjayP13
Copy link
Contributor

AjayP13 commented Mar 12, 2018

Your welcome and thanks for the help in making it Windows compatible! I added your handle as a Developer on the GitLab repository. If you come across any other issues, feel free to submit a pull request / merge request to get it merged into master and deployed to PyPI.

Would love to see what you working on / end up doing with Magnitude in your master's thesis, e-mail us at opensource@plasticity.ai if you can share it.

@AjayP13 AjayP13 closed this as completed Mar 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working windows
Projects
None yet
Development

No branches or pull requests

2 participants