What about performance !? #14

Closed
florindumitru opened this issue May 15, 2015 · 9 comments

@florindumitru

Hi,
First of all, you are the best :)!
Second: please tell me about performance, e.g. how many photos can I put in my index?

Thanks

@florindumitru florindumitru changed the title Index dump and performance !? What about performance !? May 15, 2015
@magwyz
Owner

magwyz commented May 19, 2015

It depends a lot on your server's performance.
I strongly discourage using a virtual machine on cloud infrastructure, as Pastec will be very slow there; it runs much better on a dedicated server. Typically, you can load an instance with up to 1,000,000 images on a server with 40 GB of RAM (yes, it needs RAM...). With a good CPU, it should answer a query in less than 4 seconds in this case.
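As a rough sizing aid, the figure above (about 1,000,000 images on a 40 GB server) works out to roughly 40 KB of RAM per indexed image. The sketch below turns that into capacity and server-count estimates. It assumes RAM usage grows linearly with the number of images and treats 40 KB/image as a rule of thumb derived from this comment, not a measured Pastec constant; real usage will vary with the number of features per image and with OS overhead.

```python
# Rule of thumb derived from the comment above: ~1,000,000 images fit in ~40 GB of RAM.
# Assumption: RAM usage grows linearly with the number of indexed images.
REFERENCE_IMAGES = 1_000_000
REFERENCE_RAM_GB = 40


def ram_per_image_kb() -> float:
    """Approximate RAM per indexed image in KB (using 1 GB = 1,000,000 KB)."""
    return REFERENCE_RAM_GB * 1_000_000 / REFERENCE_IMAGES


def max_images(ram_gb: float) -> int:
    """Estimate how many images fit in `ram_gb` gigabytes of RAM."""
    return int(ram_gb * REFERENCE_IMAGES / REFERENCE_RAM_GB)


def servers_needed(num_images: int, ram_per_server_gb: float) -> int:
    """Estimate how many servers with `ram_per_server_gb` of RAM hold `num_images`."""
    per_server = max_images(ram_per_server_gb)
    return -(-num_images // per_server)  # ceiling division


if __name__ == "__main__":
    print(ram_per_image_kb())                  # 40.0
    print(max_images(16))                      # 400000
    print(servers_needed(1_000_000_000, 64))   # 625
```

Under these assumptions, a 16 GB machine tops out around 400,000 images, and a billion images would need on the order of hundreds of 64 GB servers.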

@magwyz magwyz closed this as completed May 19, 2015
@ecdeveloper

ecdeveloper commented May 15, 2016

I have a few questions too.

  1. Is there a way to improve the performance of adding images to the index? Currently, adding an image takes about a second (running on OS X, 2.5 GHz Intel Core i7, 16 GB). I may have to index millions of images, and 1 s per image is not acceptable. Is there any way to speed this up?
  2. Also, is there a cap on the number of images I can store in an index? I will eventually have to store billions of images, but judging by your last comment, I'm afraid that may not even be possible. Can you suggest anything here?
  3. Is there a way to estimate the index size based on the number of images? Currently I have an index with a few thousand images, and the index file is ~23 MB. If it keeps growing at that rate, I may end up with an index weighing terabytes for hundreds of millions of images.

Thanks in advance!

@magwyz
Owner

magwyz commented May 15, 2016

  1. You can first try to multithread your image insertion code.
    There are also some possible optimizations in the code that I still need to implement properly and push.
  2. The maximum number of images you can store in an index is limited by your computer's amount of RAM. Given the signature size, storing billions of images indeed requires a lot of servers... You could, however, try rewriting the index to store the signatures on disk, but search would probably be very slow.
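Following the first suggestion, here is a minimal sketch of multithreaded insertion in Python. It assumes Pastec's HTTP API accepts a new image as `PUT /index/images/<image id>` with the raw image bytes as the request body, on a local server at the default port 4212; check both against the documentation for your Pastec version. `pastec_add_image`, `add_images_parallel`, and the default of 8 workers are hypothetical names and values chosen for illustration. The upload function is injectable, so the thread pool can be exercised without a running server.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple, Union

# Assumption: a local Pastec server listening on its default port.
PASTEC_URL = "http://localhost:4212"


def pastec_add_image(image_id: int, image_bytes: bytes) -> int:
    """Upload one image via PUT /index/images/<id> (assumed endpoint); return HTTP status."""
    req = urllib.request.Request(
        f"{PASTEC_URL}/index/images/{image_id}",
        data=image_bytes,
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def add_images_parallel(
    images: Iterable[Tuple[int, bytes]],
    add_image: Callable[[int, bytes], int] = pastec_add_image,
    workers: int = 8,
) -> Dict[int, Union[int, Exception]]:
    """Insert images concurrently; map each image ID to its result or exception."""
    results: Dict[int, Union[int, Exception]] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(add_image, i, data): i for i, data in images}
        for fut in as_completed(futures):
            image_id = futures[fut]
            try:
                results[image_id] = fut.result()
            except Exception as exc:  # record the failure and keep going
                results[image_id] = exc
    return results
```

This version submits every image up front, so each pending task holds its image bytes in memory; for millions of images, feed the pool in batches instead. Whether it actually speeds things up depends on how the server handles concurrent requests, so measure with a few worker counts.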

@ecdeveloper

I got it, thanks! And what about the 3rd question? :)

@magwyz
Owner

magwyz commented May 15, 2016

3 - It will keep growing at that rate. Also keep in mind that the size of the index saved on disk is different from its size in RAM. Besides, you will never be able to index hundreds of millions of images with the current Pastec on a single computer.

@ecdeveloper

Got it, thank you. What about scaling the current Pastec app? Can I scale it across multiple servers?

@magwyz
Owner

magwyz commented May 15, 2016

Scaling is currently not supported. You have to manage several instances on your own.
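One common way to manage several instances on your own is client-side sharding: route each image ID to exactly one instance when adding it, then send each query to every instance and merge the matches. The sketch below covers only the routing and merging logic. The instance URLs are hypothetical, and per-instance search results are modeled as plain lists of `(image_id, score)` pairs rather than Pastec's actual response format; it also assumes scores from different instances are directly comparable.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

# Hypothetical Pastec instances, one per server.
INSTANCES = [
    "http://pastec-0:4212",
    "http://pastec-1:4212",
    "http://pastec-2:4212",
]


def instance_for(image_id: int, instances: Sequence[str] = INSTANCES) -> str:
    """Pick the instance that stores `image_id` (simple modulo sharding)."""
    return instances[image_id % len(instances)]


def merge_results(
    per_instance: Dict[str, List[Tuple[int, float]]], top_k: int = 10
) -> List[Tuple[int, float]]:
    """Merge (image_id, score) matches from all instances, highest score first."""
    all_matches = (m for matches in per_instance.values() for m in matches)
    return heapq.nlargest(top_k, all_matches, key=lambda m: m[1])
```

Modulo sharding is the simplest choice, but adding an instance reassigns most IDs; consistent hashing avoids that at the cost of more bookkeeping.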

Adrien Maglo, Ph.D.
Pastec developer
http://www.pastec.io
+33 6 27 94 34 41

@ecdeveloper

Got it, thank you.
So let's assume I figure out how to scale it. Is there a way to reduce the index file size somehow? So far I have indexed only 13K images, and the index size on disk is about 70 MB. At this rate, 1M images would take about 5 GB.
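The estimate above is a straight linear extrapolation. As a quick check with the numbers given (13,000 images, about 70 MB on disk), assuming the on-disk index grows at a constant size per image:

```python
def extrapolate_size_mb(known_images: int, known_size_mb: float, target_images: int) -> float:
    """Linearly extrapolate on-disk index size; assumes a constant size per image."""
    return known_size_mb / known_images * target_images


size_mb = extrapolate_size_mb(13_000, 70, 1_000_000)
print(f"{size_mb / 1000:.1f} GB")  # prints 5.4 GB
```

This only estimates the file on disk; the in-memory footprint of the loaded index is a different quantity.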

@magwyz
Copy link
Owner

magwyz commented May 17, 2016

There is no easy way.
But worrying about this makes little sense since, once again, the size of the index in RAM is different from what is written on disk... What matters is the size in RAM, since you often have less RAM than disk space.
