
Any documentation? #257

Closed
ashald opened this issue Mar 16, 2015 · 39 comments
@ashald commented Mar 16, 2015

Erm... Correct me if I'm wrong, but I've looked everywhere - the official site, the GitHub repo and wiki, Google - and still wasn't able to find any documentation or user guide for LizardFS. Is it hidden somewhere, or are there no docs at all? How do people get familiar with it? o_O

@kazik208 (Contributor)

There is a LizardFS whitepaper available on our site. Currently there is no other documentation, but we are happy to answer any of your questions either here or on our IRC channel - #lizardfs on FreeNode.

@psarna (Member) commented Mar 16, 2015

As mentioned in #205, some documentation will hopefully be created on the GitHub wiki soon. For now, besides the whitepaper, you can also refer to our man pages.

@ashald (Author) commented Mar 16, 2015

Compared to alternative solutions it's kinda frustrating... :(
The whitepaper is good for advertising LizardFS features, but has no information about deployment.
As for the wiki pages - they do exist, but there's no useful info there. :(

Let's say I have a bunch of VPS servers and I want to set up a distributed FS to share storage between them. Is that possible with LizardFS?

Assuming it's possible, how do I set up LizardFS across my servers? Which services do I need to configure and run? How can I mount LizardFS volumes via fstab?

And how can I configure transport-layer encryption for LizardFS?

@psarna (Member) commented Mar 18, 2015

Hi Ashald,

First of all, I started working on a quick start guide: https://github.com/lizardfs/lizardfs/wiki/Quick-Start-Guide

Secondly, how do you wish to share storage between your VPS servers? The master server and shadow masters may need a lot of memory and work best with SSD drives, so they should probably get separate, dedicated machines. You could install chunk servers on some of your machines and clients on all of them, and you would get shared storage between them. Refer to the quick start guide for how to install the master, shadows, chunk servers, clients, etc.

Example fstab entry (assuming mfsmaster is in /etc/hosts and default master port):
mfsmount /mnt/mfs-test fuse _netdev 0 0
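A variant with the master host and port spelled out as mount options (mfsmaster and mfsport are standard mfsmount options; the address here is an example):

mfsmount /mnt/mfs-test fuse mfsmaster=192.168.0.10,mfsport=9421,_netdev 0 0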

@ashald (Author) commented Mar 18, 2015

@psarna great, thanks! I think it will be very useful for LizardFS to have a quick start guide!

Well, my servers run on SSDs, though CPU and memory are somewhat limited. On the other hand, the total disk capacity is not that big, and I'm planning to use the distributed FS to store files from several MB up to several GB.

I think I'll give it a try and then post test results and a comparison with XtreemFS, if you're interested.

@psarna (Member) commented Mar 18, 2015

@ashald any feedback will be welcome :)

@onlyjob (Member) commented Mar 18, 2015

Compared to alternative solutions it's kinda frustrating... :(

What alternative solutions? There is only one -- GfarmFS -- unless you are talking about worthless crap like Ceph or XtreemFS. RozoFS and QuantcastFS are so buggy and immature that they are not to be taken seriously, and OrangeFS is not there yet.

Let's say I have a bunch of VPS servers and I want to set up a distributed FS to share storage between them. Is that possible with LizardFS?

Yes.

[...] no information about deployment.

We need a quick start guide -- the one that was just created is not very quick and not too accurate...

how can I setup LizardFS across my servers?
What services I need to configure and run?

You will need to choose an IP for the master and name it "mfsmaster" in your DNS configuration. Install the master, then the CGI server to monitor it. Then add chunkservers, one by one, as many as you need.

The master does not need much RAM -- it uses only about 3 GiB for 5 million files.
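A minimal sketch of those steps on Debian (package, service, and file names as shipped in the LizardFS 2.5 Debian packages discussed later in this thread; the master address 192.168.0.10 and the data directory are examples):

# on every machine: make the default master name resolvable
echo "192.168.0.10 mfsmaster" >> /etc/hosts

# master (enable the service in /etc/default/lizardfs-master if required)
apt-get install lizardfs-master lizardfs-cgiserv
cd /etc/mfs && cp mfsmaster.cfg.dist mfsmaster.cfg && cp mfsexports.cfg.dist mfsexports.cfg
cp /var/lib/mfs/metadata.mfs.empty /var/lib/mfs/metadata.mfs   # empty metadata for the first start
service lizardfs-master start
service lizardfs-cgiserv start   # web interface, http://mfsmaster:9425 by default

# each chunkserver
apt-get install lizardfs-chunkserver
cd /etc/mfs && cp mfschunkserver.cfg.dist mfschunkserver.cfg
echo "/mnt/hdd1" >> /etc/mfs/mfshdd.cfg   # directory where chunks will be stored
chown mfs:mfs /mnt/hdd1
service lizardfs-chunkserver start

# each client
apt-get install lizardfs-client
mkdir -p /mnt/mfs && mfsmount /mnt/mfs   # connects to "mfsmaster" by default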

@onlyjob (Member) commented Mar 18, 2015

@psarna, thanks very much for making the quick start page in the wiki.
Please restructure it so the quick start is really a quick start, with the minimum components like I described in my previous post.
Please add a paragraph for the recommended install describing the optional components (i.e. shadow master, metalogger, etc.). Avoid technical details in that paragraph.
Then continue with installation instructions, but split them into two sections, for Debian and for RHEL/CentOS/Fedora. Describe the installation and initial configuration of the services.

Remove the hardware requirements (IMHO they are best described on a standalone page) and avoid suggesting that an SSD or 32 GiB of RAM is necessary. That is not true and may scare away our users.

Thanks.

@onlyjob (Member) commented Mar 18, 2015

I'm also interested in reading about transport-layer encryption in LizardFS. Anyone?

@onlyjob (Member) commented Mar 18, 2015

@ashald remember to use the man pages and common sense. Setting up LizardFS is the easiest of all the alternatives you mentioned. ;)

@ashald (Author) commented Mar 18, 2015

What alternative solutions? There is only one -- GfarmFS

I hadn't heard of that one. :)

unless you are talking about worthless crap like Ceph or XtreemFS.

Yes, I'm talking about those two. I discarded the first because it seems they don't provide anything for transport-layer security (as I said, I want to encrypt traffic between my servers). As for the second - maybe it's not the best solution, but it has a quick start guide and I was able to get it working by following it. IMO, that's one of the advantages of XtreemFS over LizardFS, though it can easily be fixed by adding a quick start guide for LizardFS. :) I'm about to try LizardFS and hope I'll be able to make it work with the draft quick start guide. I'll share my feedback afterwards.

@ashald (Author) commented Mar 18, 2015

I'm also interested in reading about transport-layer encryption in LizardFS. Anyone?

As I said in my previous comment, it's one of my requirements for a distributed FS, so I'm highly interested in how LizardFS deals with it.

@ashald (Author) commented Mar 18, 2015

@ashald remember to use the man pages and common sense. Setting up LizardFS is the easiest of all the alternatives you mentioned. ;)

Yeah, will do. But I also wanted some kind of quick start so I can try the solution before diving deep into the man pages. :)

@onlyjob (Member) commented Mar 18, 2015

I recommend avoiding Ceph and XtreemFS for too many reasons to describe here in detail, but primarily due to their gross disregard for data integrity, design flaws, and an intolerable amount of hopeless critical bugs. Besides, Ceph and XtreemFS are much slower than LizardFS.

@ashald (Author) commented Mar 18, 2015

I have great hopes for LizardFS. :)

@ashald (Author) commented Mar 19, 2015

So I was able to do a minimal setup of LizardFS and it's pretty fast, but... it seems that there is no transport-layer encryption at all. :( Let's try IPsec...

@ciroiriarte

Any comments on GfarmFS? It seems interesting, but the project also seems dead.

Regards,
Ciro


@onlyjob (Member) commented Mar 19, 2015

To @ciroiriarte:

Any comments on GfarmFS?

A great, reliable system; it keeps metadata in PostgreSQL and has optional file-based data integrity. It's available in Debian right away, although the latest version is in the "experimental" suite.

It seems interesting, but the project also seems dead.

Dead? You must be kidding - they released version 2.6.1 earlier this month (March 2015). The project is very much alive and active, but little known -- just like LizardFS. It seems that Ceph attracts the most (undeserved) attention among parallel storage systems these days... :(

@onlyjob (Member) commented Mar 19, 2015

On Wed, 18 Mar 2015 17:28:10 Ashald wrote:

So I was able to do a minimal setup of LizardFS and it's pretty fast, but... it seems that there is no transport-layer encryption at all. :(

No worries, don't expect too much from a file system... I have a feeling that Gfarm may have what you need (but I don't remember for sure). If I had to choose between the two, I'd try LizardFS first and investigate whether the limitation can be overcome with a VPN.

@ashald (Author) commented Mar 19, 2015

So, here are my results.

I did a basic setup with 4 VPS servers - each of them runs a chunk server and a client, and one of them also hosts the master.
All the VPS servers have the same configuration - 2 vcores, 512 MB RAM. So far everything is awesome: write speed is ~30 MB/s over a 1 Gbit link and read speed is ~70 MB/s over the same link (within the same DC); writing from a remote location over a 25 Mbit link runs at ~2 MB/s. CPU utilization is ~25% and RAM usage is ~30 MB.
For comparison, after some tuning, XtreemFS managed 2 MB/s write and 1 MB/s read while using almost all available RAM and CPU.

As I need to encrypt my traffic, I figured there are two possible ways to do that: IPsec or a VPN. Since my VPS provider uses OpenVZ with an old kernel, I cannot use IPsec, so I ended up using OpenVPN (I just followed this guide: https://www.digitalocean.com/community/tutorials/how-to-secure-traffic-between-vps-using-openvpn) and everything is awesome. I didn't even notice any performance degradation.
I think it would be nice to add this info to the documentation.
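One configuration detail worth documenting (a hedged sketch - the *_LISTEN_HOST and MASTER_HOST options are standard mfsmaster.cfg(5)/mfschunkserver.cfg(5) settings; the 10.8.0.x addresses are typical OpenVPN defaults used here as examples): binding the LizardFS services to the VPN addresses keeps all traffic inside the tunnel.

# /etc/mfs/mfsmaster.cfg - listen only on the VPN address
MATOML_LISTEN_HOST = 10.8.0.1
MATOCS_LISTEN_HOST = 10.8.0.1
MATOCL_LISTEN_HOST = 10.8.0.1

# /etc/mfs/mfschunkserver.cfg - reach the master and listen via the VPN
MASTER_HOST = 10.8.0.1
CSSERV_LISTEN_HOST = 10.8.0.2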

Though I have some questions and issues.
The question is whether it is possible to set quotas for disk space usage in mfshdd.cfg?

And I have an issue with the master server. From time to time it just dies, and as a result clients hang.
When I try to start the master service, I get the following message:

Restarting lizardfs-master:
working directory: //var/lib/mfs
lockfile created and locked
initializing mfsmaster modules ...
loading sessions ... ok
sessions file has been loaded
exports file has been loaded
mfstopology configuration file (//etc/mfstopology.cfg) not found - using defaults
No custom goal configuration file specified and the default file doesn't exist - using builtin defaults
init: file system manager failed: Stale lockfile exists., consider running `mfsmetarestore -a' to fix problems with your datadir. !!!
error occured during initialization - exiting

Running mfsmetarestore fixes things up and I'm able to start the master server afterwards, but obviously that's not the desired behavior. :) I tried to look for the master server's log files but wasn't able to find any. Can you advise me where I can find the master server's logs and how I can increase logging verbosity, in order to find the cause of this issue?

Thanks for your help!

@AmokHuginnsson (Contributor)

Can you provide more information regarding when/how/why the master dies? Do you know exactly when it happens, and can you provide system logs from that moment? lizardfs-master logs via syslog(), so if your system is a Linux with a default configuration, the master's logs should be in /var/log/syslog (or whichever file your syslog daemon writes to).
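If you want the master's messages in a file of their own, a minimal sketch using rsyslog's property-based filter (the destination path is an example; the legacy syntax shown works on the rsyslog shipped with Debian 7):

# /etc/rsyslog.d/lizardfs.conf - route mfsmaster messages to a dedicated file
:programname, isequal, "mfsmaster" /var/log/lizardfs-master.log
& ~

service rsyslog restart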

@onlyjob (Member) commented Mar 19, 2015

On Thu, 19 Mar 2015 00:26:33 Ashald wrote:

For comparison, after some tuning, XtreemFS managed 2 MB/s write and 1 MB/s read while using almost all available RAM and CPU.

It is no surprise that XtreemFS is slow, because it is written in a retarded programming language. What is surprising is just how slow it is...

As I need to encrypt my traffic, I figured there are two possible ways to do that: IPsec or a VPN. Since my VPS provider uses OpenVZ with an old kernel, I cannot use IPsec, so I ended up using OpenVPN (I just followed this guide: https://www.digitalocean.com/community/tutorials/how-to-secure-traffic-between-vps-using-openvpn) and everything is awesome. I didn't even notice any performance degradation. I think it would be nice to add this info to the documentation.

Awesome feedback, thanks.

And I have an issue with the master server. From time to time it just dies, and as a result clients hang.

I have been using LizardFS extensively since December and my master has never died, even though it was stressed thoroughly for weeks (while I was moving a dozen terabytes from Ceph) and has been under constant heavy load ever since.

Unless you built your own LizardFS packages, I suspect this issue may be due to a mismatch between the binaries and the OS libraries, or to packages that were not built in a clean environment. On Debian I use soon-to-be-released packages with many changes since the original .deb packages.

Normally I would recommend using only official Debian packages from the native Debian repository, but because they are not uploaded yet, I can give you a link to my pre-release packages (if you send me your email), or you can get the updated packaging from

http://anonscm.debian.org/cgit/collab-maint/lizardfs.git

and build your own packages (if you know how).


@onlyjob (Member) commented Mar 19, 2015

On Thu, 19 Mar 2015 00:26:33 Ashald wrote:

The question is whether it is possible to set quotas for disk space usage in mfshdd.cfg?

No, mfshdd.cfg is for setting the data directories (i.e. HDD mount points) for chunkservers. All available space will be used as you add more data to LizardFS, unless you add more HDDs as well.

For quotas, see mfsrepquota(1).
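A hedged sketch of the quota tools that ship with the client (the uid and the limits are examples; check mfssetquota(1)/mfsrepquota(1) for the exact argument order):

# set soft and hard limits for uid 1000 (size limits, then inode limits)
mfssetquota -u 1000 10GiB 12GiB 100000 120000 /mnt/mfs
# report quotas for that user, or for everyone
mfsrepquota -u 1000 /mnt/mfs
mfsrepquota -a /mnt/mfs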

@andypl78

I've read the whitepaper from lizardfs.com, but I still don't fully understand to what extent LizardFS is fault-tolerant at the disk level.
Example: 2 chunk servers with 5 hard drives each, no RAID, and goal = 2.
In such a scenario, can one whole chunk server fail? And what about the case where, on only one of the chunk servers, for example 3 hard drives fail while the other 2 keep working properly?

@onlyjob (Member) commented Mar 19, 2015

On Thu, 19 Mar 2015 04:24:32 andypl78 wrote:

Read whitepaper from lizardfs.com I do not understand until the end of the
LFS is a fault-tolerant disk. Example: 2 chunk servers in each 5 hard
drives without RAID settings and LFS has set goal = 2 In such a scenario
can damage one chunk server and in the case where only one of the servers
chunk of damage, for example 3 hard drives and the other 2 are working
properly?

It is quite hard to understand what your question is.

Each chunk server is a place for one replica. All HDDs will be used (more or less) evenly and I/O will be balanced among the chunkservers. In the event of an HDD failure, missing or damaged chunks will be copied from another chunkserver to restore the goal, and the remaining HDDs will be used to store the replicas. All of this works transparently and automatically; clients usually won't even notice an HDD failure.
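For reference, a short sketch of how the goal is inspected and changed with the standard client tools (paths are examples):

mfssetgoal -r 2 /mnt/mfs         # keep 2 copies of every chunk, recursively
mfsgetgoal /mnt/mfs              # show the goal of a file or directory
mfscheckfile /mnt/mfs/somefile   # show how many copies each chunk really has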

@ashald (Author) commented Mar 19, 2015

@AmokHuginnsson I checked syslog and don't see anything unusual... Is there a way to increase logging verbosity for the master?

@ashald (Author) commented Mar 19, 2015

Hm, the master just went down one more time. These are the only two lines it wrote to syslog during the shutdown:

Mar 19 10:32:52 my-host mfsmaster[6807]: ML(198.35.46.18) packet too long (790644820/1500000)
Mar 19 10:32:53 my-host mfsmaster[6807]: connection with ML(198.35.46.18) has been closed by peer

@marcinsulikowski (Contributor)

This looks like an HTTP request sent to the matoml port (9419). Such a request would typically begin with:

GET / HTTP/1.0

which is:

00000000  47 45 54 20 2f 20 48 54  54 50 2f 31 2e 30 0a     |GET / HTTP/1.0.|

LizardFS packets begin with two 32-bit big-endian values: the first is the packet's type, the second is the packet's length. Your length is 790644820, which would be encoded as 2f204854, and this exactly matches bytes 4-8 of a standard HTTP GET request. This should not cause the server to crash -- it just closes the connection with the offending client.
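A quick way to check that decoding from a shell (just od and printf; output trimmed):

$ printf 'GET / HTTP/1.0\n' | od -A d -t x1
0000000 47 45 54 20 2f 20 48 54 54 50 2f 31 2e 30 0a
$ printf '%d\n' 0x2f204854   # bytes 4-8 ("/ HT") read as a big-endian length
790644820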

@ashald (Author) commented Mar 19, 2015

@marcinsulikowski Thanks for the quick reply.
What is matoml? It sounds like the metalogger, though I haven't set one up, and I don't recognize the IP address in the log message.

I guess it's my fault for not having set up authentication and firewall rules. I'll do it now and see what happens.

As I said, those are the only log messages from the master that I see, apart from the ones logged on startup. Is there a way to change the logging level to something like debug? Or do I need to run the master in the foreground to see more detailed logs?

Just in case: I'm running Debian 7 and installed LizardFS from the provided repo (in the Downloads section).
My versions are:

$ dpkg -s lizardfs-master
Package: lizardfs-master
Status: install ok installed
Priority: extra
Section: admin
Installed-Size: 1194
Maintainer: Adam Ochmanski <contact@lizardfs.org>
Architecture: amd64
Source: lizardfs
Version: 2.5.4-2
Replaces: mfs-master
Provides: mfs-master
Depends: libc6 (>= 2.2.5), libgcc1 (>= 1:4.1.1), libstdc++6 (>= 4.6), zlib1g (>= 1:1.1.4), lizardfs-common
Conflicts: mfs-master
Conffiles:
 /etc/init.d/lizardfs-master 597c57533b9714c68602f5225aca4372
 /etc/default/lizardfs-master 3b45d5226561635869f11a3e26245ab2
 /etc/mfs/mfsgoals.cfg.dist f7772bd90316e06aa89509e99e6eaaf6
 /etc/mfs/mfsexports.cfg.dist f264cd9d158c3e4da020172fcb124559
 /etc/mfs/mfstopology.cfg.dist ccec359983fd3934a86339c96b3576b6
 /etc/mfs/mfsmaster.cfg.dist ec179d68d8a3134803f95ab0761210bd
Description: LizardFS master server
 LizardFS master (metadata) server.
Homepage: http://lizardfs.org/

and

$ dpkg -s lizardfs-client
Package: lizardfs-client
Status: install ok installed
Priority: extra
Section: admin
Installed-Size: 639
Maintainer: Adam Ochmanski <contact@lizardfs.org>
Architecture: amd64
Source: lizardfs
Version: 2.5.4-2
Replaces: mfs-client
Provides: mfs-client
Depends: libc6 (>= 2.3.2), libfuse2 (>= 2.8.1), libgcc1 (>= 1:4.1.1), libstdc++6 (>= 4.6), zlib1g (>= 1:1.1.4)
Conflicts: mfs-client
Conffiles:
 /etc/mfs/mfsmount.cfg.dist 64bf072f27aa301021703edd56b12159
Description: LizardFS client
 LizardFS clients: mfsmount and mfstools.
Homepage: http://lizardfs.org/

@ashald (Author) commented Mar 19, 2015

Btw, I see an mfspassword option in mfsmount.cfg, but I don't see such an option in the master config. Can I use this to set up authentication with the master server?

Also, is it possible to protect the CGI stats server with a password? Or do I need to use something like nginx/apache for that?

And I just tried 'lizardfs-admin -h' (from the quick start guide) and it doesn't work:

command not found: lizardfs-admin

@psarna (Member) commented Mar 19, 2015

  1. You can set passwords for resources via mfsexports.cfg - see man mfsexports.cfg (example below)
  2. The CGI server does not offer password protection
  3. My mistake, try lizardfs-probe
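A hedged example of a password-protected export in mfsexports.cfg (format per mfsexports.cfg(5); the network and password are placeholders) - the client then supplies it via the mfspassword option you found in mfsmount.cfg:

# clients from this network may mount / read-write if they know the password
192.168.0.0/24  /  rw,alldirs,password=SECRET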

@ashald (Author) commented Mar 20, 2015

It turned out that there was a mistake in my configuration and some traffic was going outside of the VPN.

After proper configuration, iperf estimated network performance at 145 Mbit/s (unfortunately I cannot change many of the settings imposed by the VPS provider), and LizardFS r/w performance is 10-12 MB/s, which I consider good given how limited the resources are. Though I think IPsec could give better performance than OpenVPN.

After the proper VPN setup I isolated my LizardFS cluster from the outside world, and from that strange host that was crashing my master. Now the master is rock solid.

In general I'm happy with LizardFS, but I wish there was better documentation.

The only issue I still have is that it seems I need to mount the meta filesystem in order to clear the trash (I expected a tool in lizardfs-adm for that). I also expected that, upon running out of free space, the trash would be auto-cleaned starting from the oldest entries, but instead I got a segmentation fault on the client, though I wasn't able to reproduce it.
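For reference, a sketch of the meta-mount approach (mfsmount's -m/mfsmeta mode exposes the trash; the paths and file name are examples):

mkdir -p /mnt/mfsmeta
mfsmount -m /mnt/mfsmeta                             # mount the metadata filesystem
ls /mnt/mfsmeta/trash                                # deleted files awaiting expiry
rm /mnt/mfsmeta/trash/FILE                           # purge a file for good
mv /mnt/mfsmeta/trash/FILE /mnt/mfsmeta/trash/undel  # or undelete it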

Thanks for your help!

@ashald (Author) commented Mar 21, 2015

Ah, sorry, one more question. Assuming I have a user and a corresponding group with the same uid/gid on all machines that have LizardFS mounted (and on the master), is it possible to change the owner of all files in LizardFS to this user/group? I'm using this user and group in the mapall option of my mfs export, but I also want to change the ownership of all existing files to them.

@psarna (Member) commented Mar 25, 2015

You can change the owner of LizardFS files just like in any other file system; just run:

chown -R username mountpoint
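To change the group at the same time, and for reference the mapall export being discussed (hedged - mapall syntax per mfsexports.cfg(5); uid/gid 1000 and the network are examples):

chown -R username:groupname /mnt/mfs

# mfsexports.cfg - present all client operations as uid/gid 1000
192.168.0.0/24  /  rw,alldirs,mapall=1000:1000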

@psarna (Member) commented Mar 25, 2015

Also, this issue is oversized, so I'm going to close it. If you have further questions, please open a new one.

@psarna closed this as completed Mar 25, 2015
@viktor-zhuromskyy

Most of the MooseFS documentation is a good fit!

@Zorlin commented May 14, 2015

I'm using tinc to provide an encrypted VPN, but I think you're talking about something else.

On 19/03/2015 6:11 am, Dmitry Smirnov wrote:

I'm also interested in reading about transport-layer encryption in LizardFS. Anyone?



@psarna (Member) commented Oct 15, 2015

@onlyjob (Member) commented Oct 15, 2015

Very nice, but it would be even better to have free documentation with a license and the source of the PDF file. Why not a wiki format?

Also, I'm concerned about the installation instructions for Debian -- they conflict with the official Debian packages that are available from the "testing" and "jessie-backports" suites. Once a package is in Debian, the official Debian repositories become authoritative, so there is no need for vendor-provided packages any more (otherwise duplication of effort and deviations are inevitable). Please coordinate with the Debian package maintainer (yours truly, right here at your service). ;-)
