Azure files share #10
Before we get into I/O performance, I just want to reference the documentation on the Azure File Service benchmarks/limits:
Can you go into some details about your test?
The tests were performed on a Standard_D11 sized worker role, and the only difference between the deployments was the data path. I ran the test with 4 nodes and 5 shards. I also tried mounting a unique share to each instance; this had no significant impact on the write speed.
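The exact node settings are not given in this thread, but a deployment like the one described (5 shards, data path pointed at a mounted file share) might be sketched in `elasticsearch.yml` roughly as follows. All names, paths, and the drive letter here are illustrative assumptions, not values taken from the actual deployment:

```yaml
# Hypothetical sketch -- cluster name, paths, and drive letter are assumed.
cluster.name: azure-files-test
index.number_of_shards: 5        # 5 shards, as in the test above
index.number_of_replicas: 1
# Point the data path at the mounted Azure file share:
path.data: Z:\elasticsearch\data
```

Switching the deployment between local storage and the file share would then come down to changing `path.data`.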
So that's interesting. We should probably seek the guidance of the Azure team on this one. For your local storage, did you use a local resource or the temp disk on the role? I know the new D-series instances have SSD-based temp disks. That would make a significant difference. I am shocked that assigning a share to each node made no significant difference. Were you using a public dataset that I can download? Can you share your index/node configuration and stats from a pretty print? There may be other configuration options we can tweak to improve the performance.

On the broader subject of benchmarking, we must be careful not to impose requirements we may not need in production. The mere fact that the local disk allows you to create 13 million records 2 1/2 times faster than the file share doesn't necessarily make the file share slow. It could also mean that the local disk is fast. What other points of comparison do we have to validate a conclusion that the file share's performance is not good?
For the local test I used the local resource "ElasticRoot". I have created a new pull request with code for storing ES data locally. I also bundled the Marvel plugin for easier monitoring.
Unfortunately, Azure worker roles are a PaaS solution. You have to think of these virtual machines as completely disposable. The local storage is not persistent and is therefore not suitable for data storage. The ElasticRoot resource is actually emptied out on every recycle.
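For context, a local resource like "ElasticRoot" is declared in the cloud service's `ServiceDefinition.csdef`. The `cleanOnRoleRecycle` attribute controls whether the directory is wiped on recycle, but even with it set to `false` the disk is still not durable storage. The size below is an assumption for illustration:

```xml
<!-- Sketch of the local resource declaration; sizeInMB is assumed. -->
<LocalResources>
  <LocalStorage name="ElasticRoot" cleanOnRoleRecycle="true" sizeInMB="20480" />
</LocalResources>
```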
That is true, but you have a scenario where an external database will serve as the persistent storage. The worker roles will only store a recreatable index based on the DB. If all nodes hold a copy of this index, it should be no problem if a node goes down, and new nodes added to the cluster will automatically get their own copy of the index. A problem may arise if all nodes go down, but according to the Azure SLA this is unlikely.
But if that is the case, the time difference for creating the index is less of an issue. You have an automated process bootstrapping your index data when you first upload the role. Once the index is created, write throughput becomes less of a priority. I understand that we want things to be as fast as possible, but what harm does the extra hour do if it only happens once, on initial setup, when the service is not being used? Realistically, how many records will be created initially? The CourtListener dataset I plan to test this with only has 2 million records. Do you intend to periodically trash and rebuild the entire index? What is your plan for keeping these things in sync as your persistent store gets updated?
Hi @tormodu, can you please send an email to mastoragequestions@microsoft.com with your storage account name and the approximate time you ran your tests? Reference this issue and let them know you are looking for guidance on performance improvements. I contacted them as well, and they asked me to send them my info when I get a chance to run my tests. They also suggested perfmon and redirector for monitoring performance on our end.
I have contacted Microsoft and will perform further tests. |
I have tested some more, and I think my problem has nothing to do with write speed to Azure Files. This behavior does not occur when using local storage, so I don't think either the bulk indexing or Elasticsearch is to blame. At this point the conclusion points toward Azure Files. Could it be that the worker role loses its connection to the share and has to reconnect?
I know you tested with one unique share per worker role from a single storage account. I think the results of tests using a unique share per worker role from different storage accounts would be informative to the Azure team. Underneath Azure Search (https://azure.microsoft.com/en-us/services/search/) Microsoft uses a custom implementation of Elasticsearch, and I am assuming they would be able to give you better insight into this problem. Also, Azure Files is still in preview, and I assume the Azure team would like to hear about this performance issue.
It will be some time before I can get dedicated time to run some tests myself. However, I was hoping that we could establish some sort of baseline or determine an acceptable rate of document inserts per minute/second. @tormodu the role may not be losing a connection; the requests may simply be queued up. A couple of questions:
I am using a console application using NEST to index my data. The console application measures the time it takes for each batch to complete. I have looked at the performance counters and have found nothing out of the ordinary. What I have found is that when everything is working as expected, the Azure Files performance is quite good. I manage to index about 2000 documents per second using Azure Files; using the local SSD disk this number grows to about 3000 documents per second. I think this means we can conclude that write performance to Azure Files is not an issue. I will pick this up again after Christmas.
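The measurement approach described above (time each bulk batch end to end, derive documents per second) can be sketched as follows. The real tool is a C# console app using NEST; this is a hypothetical Python stand-in where `index_batch` only simulates the bulk call:

```python
import time

def index_batch(docs):
    """Stand-in for the real bulk-index call (NEST against the cluster);
    here it just simulates a small amount of work per batch."""
    time.sleep(0.001)
    return len(docs)

def measure_throughput(total_docs=10_000, batch_size=1_000):
    """Time batches end to end and return documents indexed per second."""
    start = time.perf_counter()
    indexed = 0
    for i in range(0, total_docs, batch_size):
        batch = list(range(i, min(i + batch_size, total_docs)))
        indexed += index_batch(batch)
    elapsed = time.perf_counter() - start
    return indexed / elapsed
```

Comparing the rate reported by this kind of harness across the two data paths is what yields the ~2000 vs ~3000 docs/sec figures above.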
It seems like the issue is related to the following error message: java.io.IOException: An existing connection was forcibly closed by the remote host |
This seems to be an issue with Elasticsearch when communicating with other nodes. The only connections here are between the nodes and the client. Strange that it goes away when writing to local disk. Perhaps one of the nodes is locking up while rapidly writing to the file share. Could there be a performance counter we are neglecting to look into?
Also, although I don't think we should abandon the file share, I have been doing some reading and I am leaning more toward the idea of keeping Elasticsearch in sync with a primary store such as bulk import files or some sort of database. You can still use the file share to store snapshots for quick recovery. At least quicker than it would take to rebuild the entire index with bulk inserts. |
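Elasticsearch supports shared-filesystem snapshot repositories, so the file share could hold snapshots as suggested above. Registering one might look roughly like this (repository name and path are assumptions, shown in the request-body style used by the Elasticsearch docs):

```json
PUT _snapshot/azure_share_backup
{
  "type": "fs",
  "settings": {
    "location": "Z:\\elasticsearch\\snapshots"
  }
}
```

Restoring from a snapshot on the share should be considerably faster than replaying the full bulk import from the primary store.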
Use local store plus data load instead of shares. |
I have been testing the performance of Elasticsearch using an Azure Files share as persistent storage for the Elasticsearch indexes, and the performance is not good.
I ran two tests, one using local storage and one using the file share; both tests indexed 13 million documents. Using local storage, indexing took about 45 minutes; using the share, it took over 2 hours.
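As a back-of-envelope check on these numbers (the share's elapsed time is an assumption, since only "over 2 hours" is stated):

```python
docs = 13_000_000
local_secs = 45 * 60           # ~45 minutes on local storage
share_secs = 120 * 60          # "over 2 hours" -- 120 minutes assumed for illustration

local_rate = docs / local_secs     # roughly 4800 docs/sec
share_rate = docs / share_secs     # roughly 1800 docs/sec
speedup = local_rate / share_rate  # roughly 2.7x
```

The ratio is consistent with the "2 1/2 times faster" figure mentioned later in the thread.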
There should be a config value specifying whether to use local storage or Azure Files.
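Such a toggle could be as simple as a setting that resolves to one of the two data paths. A minimal hypothetical sketch (the paths, drive letter, and environment variable are all assumptions; a real worker role would read the setting from its cloud service configuration):

```python
import os

def resolve_data_path(use_azure_files: bool) -> str:
    """Pick the Elasticsearch data path based on a deployment setting.
    Paths are illustrative assumptions, not the project's actual values."""
    if use_azure_files:
        # Mounted Azure file share (drive letter assumed).
        return r"Z:\elasticsearch\data"
    # Fall back to the role's local resource directory.
    root = os.environ.get("ELASTIC_ROOT", r"C:\Resources\ElasticRoot")
    return os.path.join(root, "data")
```

The role startup code would then pass the resolved path to Elasticsearch as `path.data`.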
For systems using an external data store to populate the index, there may not be any need for a persistent disk for the index.