
Increase Performance restic-0.9.1-2 #1976

Closed
ifelsefi opened this Issue Aug 30, 2018 · 28 comments

@ifelsefi commented Aug 30, 2018

Hi

I am running restic to a local repository backed by RAID6. We are backing up about 150 TB of data from GPFS, a mixture of large and small files. The network connection is FDR InfiniBand, though we have some blocking, so it's more like 30 Gb/s from the restic client to the GPFS NSDs.

Looking at the client, I am only seeing around 100 MB/s being received from the NSDs.

Is there any way to improve performance over the wire?

IB test between NSD and backup client:

---------------------------------------------------------------------------------------
                    Send BW Test
 Dual-port       : OFF          Device         : mlx4_0
 Number of qps   : 1            Transport type : IB
 Connection type : UD           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 2048[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x7d QPN 0x9679 PSN 0x6d91b6
 remote address: LID 0x15 QPN 0x0254 PSN 0xe3be9f
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 2048       1000             39.12              0.51               0.031373
---------------------------------------------------------------------------------------

Local Restic repo:

/dev/mapper/VG_dat01-LV_amhold /restic xfs rw,relatime,attr2,inode64,noquota 0 0

/dev/mapper/VG_dat01-LV_amhold  236T   73T  163T  31% /restic

[root@server restic]# dd bs=1M count=4096 if=/dev/zero of=test conv=fdatasync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 18.5111 s, 332 MB/s

At other times I see very bad performance, such as sub-10 MB/s.

strace shows files being read from GPFS and then written to local storage.

@fd0 (Member) commented Aug 30, 2018

Since you didn't fill out the issue template I need to ask: which version of restic are you using?

I'm not sure what GPFS and NSDs are, but I'll try to answer anyway :)

Within restic, there are several constants which may limit throughput in this case; the following ones come to mind:
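As a rough illustration of the kind of constant involved (the names below are assumptions modeled on the FileReadConcurrency setting discussed later in this thread, not a verbatim excerpt of restic's source), the archiver's file-read concurrency might look like this:

// Sketch only: a hypothetical archiver option controlling how many
// files are read in parallel. Raising it helps parallel filesystems
// such as GPFS, at the cost of more memory and more concurrent I/O.
package archiver

type Options struct {
    // FileReadConcurrency is the number of files read in parallel.
    FileReadConcurrency uint
}

func (o Options) ApplyDefaults() Options {
    if o.FileReadConcurrency == 0 {
        // Assumed default of 2; changing this means recompiling
        // restic, as noted further down in this thread.
        o.FileReadConcurrency = 2
    }
    return o
}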

@fd0 fd0 added the question label Aug 30, 2018

@ifelsefi commented Aug 30, 2018

Sorry, I put it in the title.

0.9.2-1

GPFS is a parallel filesystem that's been around for over 9000 months: developed at IBM, used for HPC. NSDs are the physical servers that clients connect to for I/O operations. We have a pretty fast connection to them, so I'm unsure what's happening.

Currently restic shows 3.13% complete with an ETA of 150 hours. We are at 524,070 files (3.8 TB) out of 8,142,422 files (121 TB).

At the same time I am only seeing KB/s of network traffic.

I will try changing these settings. GPFS is designed for many thousands of files to be read in parallel!

[screenshots from 2018-08-30 14:26:47 and 14:27:07]

@ifelsefi commented Aug 30, 2018

@fd0

Do I need to change these settings in the source and then recompile?

@ifelsefi commented Aug 30, 2018

OK @fd0

I upped the number of concurrent files to 64.

GO IS SO FAST!

Let's see if the ETA is now under 150 hours.

@fd0 (Member) commented Aug 30, 2018

Yes, you need to recompile restic to change the constants in the source :)

At the moment, restic isn't optimized for reading many files in parallel, because that isn't helpful for most users.

I have a different idea that may explain what you're seeing: are the files you're saving maybe sparse, or filled mostly with zeroes or the same data over and over again? restic cannot detect that, so it reads a lot of zeroes, which it happily deduplicates, and you don't see much written to the backend...

With a local storage server via sftp, I'm getting around 45 MiB/s of outgoing data traffic for a normal backup.
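To illustrate the deduplication point: identical chunks hash to the same ID and are stored only once, so a file consisting mostly of zeroes produces almost no new repository data even though every byte is read. A minimal, self-contained sketch (fixed-size chunks and SHA-256 purely for illustration; restic actually uses content-defined chunking):

package main

import (
    "crypto/sha256"
    "fmt"
)

func main() {
    // 64 MiB of zeroes, split into 1 MiB chunks.
    data := make([]byte, 64<<20)
    const chunkSize = 1 << 20

    unique := make(map[[32]byte]bool)
    for off := 0; off < len(data); off += chunkSize {
        unique[sha256.Sum256(data[off:off+chunkSize])] = true
    }

    // Every chunk is identical, so only one unique chunk is stored:
    // 64 MiB is read, but only ~1 MiB lands in the repository.
    fmt.Printf("chunks read: %d, unique chunks stored: %d\n",
        len(data)/chunkSize, len(unique))
}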

@ifelsefi commented Aug 31, 2018

Hi @fd0

Got the go run restic.go workflow down!

So with GPFS:

[Client w local RAID6 XFS repository ] --------- [ very fast low-latency network ] --------- [ GPFS Cluster ]

I have seen restic do well over 100 MB/s using sftp, so I'm hoping we can get more in this case. In that scenario the clients would have local disk as the bottleneck, since they typically have a single drive or RAID1 of 7.2K-15K SAS/SATA disks, whereas our GPFS cluster has 400 disks across 10 RAID6 LUNs with around 20 GB/s in aggregate. On a good day we can see 1-2 GB/s async sequential writes, with reads being comparable. I am likely not a typical user, so I have no problem being a guinea pig if you have some ideas you want to test.

We do not have many sparse files. We are backing up a lot of large genomic files which are several GB to tens of gigabytes in size. It's possible deduplication isn't writing much data, since the differences between file contents might not be that large. For example, when I strace restic I see a lot of open read descriptors on the GPFS file system as well as writes on the local filesystem that holds the repository, yet not much activity over the network where my data actually lives.

Is there any way to get more logging that shows this type of activity? I see verbose says whether a file is new, but can it record how much data changed in the file and thus how much was saved to the repo?

Is there any way to make restic print the backup-completion ETA without printing each file that's being backed up, like a progress report lite? When I raise the concurrency really high, the per-file output makes the ETA hard to read.

Can restic print top throughput in MB/s, maybe current vs. peak?

Thank you!

@ifelsefi commented Sep 14, 2018

Any idea what could be causing these errors:

error: NodeFromFileInfo: Listxattr: xattr.list /athena/hwlab/store/user : cannot allocate memory

Fatal: unable to save snapshot: NodeFromFileInfo: Listxattr: xattr.list /athena/hwlab : cannot allocate memory

When the backups were running, the server still had around 50% of its memory available as disk cache.

@ifelsefi commented Oct 3, 2018

Hi

Do you have any idea how to fix the "cannot allocate memory" error?

@fd0 (Member) commented Oct 3, 2018

Install more memory? I'm not entirely sure why it occurs while reading the extended attributes, but the error means that restic needs more RAM than is available at the moment. In general, restic's memory usage is a known issue (and we still need to track it down...).

@rawtaz (Contributor) commented Oct 3, 2018

@ifelsefi You can always try setting the environment variable GOGC=20 to see if that helps.
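For context: GOGC=20 tells the Go runtime to start a garbage collection once the heap has grown 20% beyond the live data (the default is 100), which lowers peak memory usage at the cost of more CPU spent in GC. The programmatic equivalent is runtime/debug.SetGCPercent; a minimal sketch, not restic code:

package main

import (
    "fmt"
    "runtime/debug"
)

func main() {
    // Equivalent of running with GOGC=20: collect once the heap has
    // grown 20% beyond the live set, trading CPU for lower peak RSS.
    previous := debug.SetGCPercent(20)
    fmt.Println("previous GOGC value:", previous)
}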

@ifelsefi commented Oct 3, 2018

OK, I did export GOGC=20. Trying another run.

The server has 64 GB of RAM. During the backup, half of it was being used as disk cache, so restic wasn't limited by the physical RAM present on the server.

@dionorgua commented Oct 4, 2018

@ifelsefi what is your repository size? Also, are you sure you downloaded the 64-bit version of restic?
From my experience, on a ~1 TB repo restic consumes up to 10 GB of RAM (for prune, without GOGC=20). That's too much, but it definitely works on a machine with 16 GB.

@ifelsefi commented Oct 4, 2018

@dionorgua

Yes, 64-bit CentOS 7:

[restic@adleman ~]$ rpm -qa | grep restic
restic-0.9.2-1.el7.x86_64

I ran another backup with GOGC=20, though I'm still getting "cannot allocate memory" on some files.

The server has 64 GB of memory, and sar, a Linux utility that reports system resource utilization, shows that around 20 GB of memory remained free as disk cache during the backup.

@fd0 (Member) commented Oct 4, 2018

@ifelsefi just to confirm, you're running restic 0.9.2 on Linux/amd64 on CentOS 7, right? The error message comes from https://github.com/pkg/xattr, a Go library we're using to read the extended attributes. I've opened an issue there: pkg/xattr#35
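To reproduce the failure outside of restic, the call in question is essentially xattr.List from that library. A small standalone sketch (error handling only, nothing restic-specific; pass one of the failing paths as the argument):

package main

import (
    "fmt"
    "os"

    "github.com/pkg/xattr"
)

func main() {
    if len(os.Args) != 2 {
        fmt.Fprintln(os.Stderr, "usage: listxattr <path>")
        os.Exit(2)
    }
    // Call the same library function restic uses to read extended
    // attributes; on the failing GPFS paths this should surface the
    // ENOMEM ("cannot allocate memory") error directly.
    names, err := xattr.List(os.Args[1])
    if err != nil {
        fmt.Fprintln(os.Stderr, "xattr.List:", err)
        os.Exit(1)
    }
    fmt.Println("extended attributes:", names)
}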

@ifelsefi commented Oct 4, 2018

Hi @fd0 correct, CentOS 7.5 with that architecture:

[root@adleman adleman.pbtech]# uname -a 
Linux adleman.pbtech 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

The repo is 90TB:

 [root@adleman adleman.pbtech]# du -sh /restic/scu-backups/adleman.pbtech
90T   

Thanks for opening the issue!

@fd0 (Member) commented Oct 4, 2018

@ifelsefi can you retry with one of the beta builds, e.g. https://beta.restic.net/restic-v0.9.2-129-gb1e1b71b/

It uses the xattr library in version 0.3.1, which is the latest.

@ifelsefi commented Oct 5, 2018

Thanks @fd0, running with restic_v0.9.2-129-gb1e1b71b_linux_amd64 now.

@fd0 fd0 added bug needs triage and removed question labels Oct 5, 2018

@fd0 (Member) commented Oct 8, 2018

Any interesting results so far?

@ifelsefi commented Oct 12, 2018

I am still getting the error:

[restic@adleman ~]$ crontab -l
#Ansible: restic
0 2 * * * GOGC=20 bash -l /usr/local/sbin/run_restic

Running:

sudo /opt/restic_v0.9.2-129-gb1e1b71b_linux_amd64

Example errors:

error: NodeFromFileInfo: Listxattr: xattr.list /athena/accardilab/store/maf2080/Ca_all/Class3D/job029 : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/accardilab/store/maf2080/Ca_all/Class3D/job017/run_it007_class012_angdist.bild : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/accardilab/store/maf2080/Ca_all/Class3D/job017/run_it007_data.star : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/accardilab/store/maf2080/Ca_all/Class3D/job017/run_it007_model.star : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/accardilab/store/maf2080/Ca_all/Class3D/job017/run_it007_optimiser.star : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/accardilab/store/maf2080/Ca_all/Class3D/job017/run_it007_sampling.star : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/accardilab/store/maf2080/Ca_all/Class3D/job017/run_it008_class001.mrc : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/accardilab/store/maf2080/Ca_all/Class3D/job017/run_it008_class001_angdist.bild : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/accardilab/store/maf2080/Ca_all/Class3D/job017/run_it008_class002.mrc : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/accardilab/store/maf2080/Ca_all/Class3D/job017/run_it008_class002_angdist.bild : cannot allocate memory

The file isn't that big:

[root@adleman ~]# ls -tlrh /athena/accardilab/store/maf2080/Ca_all/Class3D/job017/run_it007_data.star
-rw-r--r-- 1 maf2080 student 162M Oct  5 12:13 /athena/accardilab/store/maf2080/Ca_all/Class3D/job017/run_it007_data.star

I do not see any point where the server runs out of memory. Summing kbmemfree and kbcached gives roughly how much memory is effectively free (e.g., at 02:10 AM: 413408 + 6035320 kB ≈ 6.4 GB).

[root@adleman ~]# sar -r
Linux 3.10.0-862.11.6.el7.x86_64 (adleman.pbtech)       10/12/2018      _x86_64_        (24 CPU)

12:00:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
12:10:01 AM   1354828  47923648     97.25         0  27283960  29866528     55.85  10886512  29457896        32
12:20:01 AM   1457888  47820588     97.04         0  27251172  29849832     55.82  10810788  29427268        28
12:30:01 AM   1443540  47834936     97.07         0  27212464  29850024     55.82  10843504  29410348        24
12:40:01 AM   1860956  47417520     96.22         0  27212684  29850248     55.82  10456576  29381052        32
12:50:01 AM   1843108  47435368     96.26         0  27212892  29850436     55.82  10475124  29381688        32
01:00:01 AM   1851212  47427264     96.24         0  27213100  29850344     55.82  10461320  29381588        36
01:10:01 AM   1857872  47420604     96.23         0  27213320  29867872     55.86  10427612  29410732        12
01:20:01 AM   1720264  47558212     96.51         0  27286572  29850308     55.82  10558148  29411012        12
01:30:01 AM   1707324  47571152     96.54         0  27286788  29850420     55.82  10567360  29416680        28
01:40:01 AM   1780248  47498228     96.39         0  27287008  29850208     55.82  10494656  29417560        32
01:50:02 AM   1782628  47495848     96.38         0  27287208  29850252     55.82  10489760  29417472        16
02:00:01 AM   1746048  47532428     96.46         0  27261020  29855216     55.83  10509776  29437132        32
02:10:01 AM    413408  48865068     99.16         0   6035320  51149612     95.66  31378644   8518128        24
02:20:01 AM   9195108  40083368     81.34         0   6062512  51141660     95.64  22552104   8544564        28
02:30:01 AM   8587936  40690540     82.57         0   6065556  51136912     95.63  23152796   8546760       472
02:40:01 AM   8548596  40729880     82.65         0   6069172  51136616     95.63  23202528   8546828        20
02:50:01 AM   8812736  40465740     82.12         0   6069568  51136840     95.63  22942696   8546760        88
03:00:01 AM   8832716  40445760     82.08         0   6070544  51136868     95.63  22920228   8546852        44
03:10:01 AM   8966896  40311580     81.80         0   6071976  51137304     95.63  22777824   8547960        20
03:20:01 AM   8717540  40560936     82.31         0   6073112  51153076     95.66  23033184   8548072        52
03:30:01 AM   8990748  40287728     81.76         0   6075132  51152464     95.66  22761040   8548148        60
03:40:01 AM   9130956  40147520     81.47         0   6075456  51154792     95.67  22623256   8548108         4
03:50:01 AM   8376896  40901580     83.00         0   6082364  51154800     95.67  23375968   8548860       168
04:00:01 AM   8456896  40821580     82.84         0   6092384  51147672     95.65  23293100   8550692        20
04:10:01 AM   8030300  41248176     83.70         0   6109484  51165256     95.68  23705872   8553536        32
04:20:01 AM   8346220  40932256     83.06         0   6128664  51147856     95.65  23394656   8555868        28
04:30:01 AM   8495200  40783276     82.76         0   6144408  51148076     95.65  23244052   8557976        60
04:40:02 AM   9473896  39804580     80.77         0   6149184  51147888     95.65  22267284   8558100        28
04:50:01 AM   8839028  40439448     82.06         0   6149696  51147724     95.65  22898164   8558112        32
05:00:01 AM   9098024  40180452     81.54         0   6149940  51147920     95.65  22648292   8558068        24
05:10:01 AM   8538692  40739784     82.67         0   6150196  51165468     95.69  23203864   8558072        16
05:20:01 AM   8757040  40521436     82.23         0   6150436  51148244     95.65  22982988   8558096        20
05:30:02 AM   8766864  40511612     82.21         0   6150684  51148188     95.65  22971188   8558080        44
05:40:01 AM   8953168  40325308     81.83         0   6150964  51148144     95.65  22781092   8558032        20
05:50:01 AM  15590872  33687604     68.36         0  12531856  27396916     51.24  12234140  12347348        20
06:00:01 AM  14843876  34434600     69.88         0  12532116  27392160     51.23  12980344  12347356        36
06:10:01 AM  15585556  33692920     68.37         0  12532416  27391740     51.23  12235188  12347344        28
06:20:01 AM  15523924  33754552     68.50         0  12534400  27392300     51.23  12296872  12347464       204
06:30:01 AM  15451108  33827368     68.65         0  12537092  27392176     51.23  12363464  12347600        32
06:40:01 AM  15450452  33828024     68.65         0  12537500  27391728     51.23  12363424  12347572        36
06:50:01 AM  15564040  33714436     68.42         0  12537768  27392160     51.23  12252384  12347584        44
07:00:01 AM    202768  49075708     99.59         0  21264952  29263704     54.73  17444324  22790924     41344
07:10:01 AM    213596  49064880     99.57         0  19261428  32681392     61.12  20871332  20955732     47692
07:20:02 AM    182724  49095752     99.63         0  19286036  32661772     61.08  20681568  21184988     65476
07:30:01 AM    207068  49071408     99.58         0  19300520  32662372     61.08  20715424  21187848    122336
07:40:01 AM    252540  49025936     99.49         0  21928876  32661856     61.08  17855928  23925512     33436
07:50:01 AM    193460  49085016     99.61         0  21925272  32661964     61.08  18043312  23814104    125908
08:00:01 AM    226432  49052044     99.54         0  21621760  32661964     61.08  18182456  23641720     34104
08:10:01 AM    184848  49093628     99.62         0  21595860  32679464     61.11  18236132  23632256     36680
08:20:01 AM    212812  49065664     99.57         0  21590672  32662084     61.08  18152380  23698720    102744
08:30:01 AM    221364  49057112     99.55         0  21292772  32661856     61.08  18346888  23494036     26940
08:40:01 AM    207708  49070768     99.58         0  21306144  32662064     61.08  18356476  23497104     49992
08:50:01 AM    203776  49074700     99.59         0  21343156  32661788     61.08  18317316  23557248     46636
09:00:02 AM    200980  49077496     99.59         0  21322764  32662140     61.08  18360300  23504392     97496
09:10:01 AM    199184  49079292     99.60         0  21654260  32661716     61.08  17996112  23863728     51296
09:20:02 AM    180980  49097496     99.63         0  21672108  32661988     61.08  18001792  23884476     55552
Average:      5639908  43638568     88.56         0  15453476  38603792     72.19  17988920  17395199     16780

@fd0 (Member) commented Oct 12, 2018

OK, next step: back up a single file for which the error occurs while running restic via strace -f -y -e listxattr,getxattr restic backup [...].

@ifelsefi commented Oct 15, 2018

Here are some results:

[pid  4640] listxattr("/athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_genes_quot_pValAdj_high_resolution.png", 0x1049ef0, 0) = -1 ENOMEM (Cannot allocate memory)
[pid  4327] listxattr("/athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_genes_quot_pVal_high_resolution.png", 0x1049ef0, 0) = -1 ENOMEM (Cannot allocate memory)
[pid  4341] listxattr("/athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_genes_quot_quotSig.png",  <unfinished ...>
[pid  4327] listxattr("/athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_genes_quot_quotSig_high_resolution.png", 0x1049ef0, 0) = -1 ENOMEM (Cannot allocate memory)
[pid  4328] listxattr("/athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_promoters_diff_pVal.png", 0x1049ef0, 0) = -1 ENOMEM (Cannot allocate memory)
[pid  4341] <... listxattr resumed> 0x1049ef0, 0) = -1 ENOMEM (Cannot allocate memory)
[pid  4341] listxattr("/athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_promoters_diff_pValAdj.png", 0x1049ef0, 0) = -1 ENOMEM (Cannot allocate memory)
[pid  4341] listxattr("/athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_promoters_diff_pValAdj_high_resolution.png", 0x1049ef0, 0) = -1 ENOMEM (Cannot allocate memory)
[pid  4328] listxattr("/athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_promoters_diff_pVal_high_resolution.png",  <unfinished ...>
[pid  4327] listxattr("/athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_promoters_diff_quotSig.png",  <unfinished ...>
[pid  4328] <... listxattr resumed> 0x1049ef0, 0) = 0
[pid  4327] <... listxattr resumed> 0x1049ef0, 0) = 0
[pid  4326] listxattr("/athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_promoters_diff_quotSig_high_resolution.png", 0x1049ef0, 0) = 0
[pid  4341] listxattr("/athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_promoters_quot_pVal.png", 0x1049ef0, 0) = 0
[pid  4334] listxattr("/athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_promoters_quot_pValAdj.png", 0x1049ef0, 0) = 0
[pid  4326] listxattr("/athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_promoters_quot_pValAdj_high_resolution.png", 0x1049ef0, 0) = 0
[pid  4640] listxattr("/athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_promoters_quot_pVal_high_resolution.png", 0x1049ef0, 0) = 0

Also:

error: NodeFromFileInfo: Listxattr: xattr.list /athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp1_tiling_quot_pVal.png : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp1_tiling_quot_pValAdj.png : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp1_tiling_quot_pValAdj_high_resolution.png : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp1_tiling_quot_pVal_high_resolution.png : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp1_tiling_quot_quotSig.png : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp1_tiling_quot_quotSig_high_resolution.png : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_cpgislands_diff_pVal.png : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_cpgislands_diff_pValAdj.png : cannot allocate memory
error: NodeFromFileInfo: Listxattr: xattr.list /athena/abc/store/luce/abc-projects/gudas_errbs2015/results/RnBeads/RnBeads_report_pairwise/differential_methylation_images/diffMeth_region_volcano_cmp2_cpgislands_diff_pValAdj_high_resolution.png : cannot allocate memory

These PNG files are only a few hundred KB each.

I ran this backup at 9:30 AM today. The server still had around 4 GB of physical RAM free:

09:20:01 AM    182480  49095996     99.63         0  23340220  30077320     56.25  16163896  25659076     47344
09:30:01 AM    205876  49072600     99.58         0  26136432  30079476     56.25  13093192  28508236      9572
09:40:01 AM    554900  48723576     98.87         0   3862672  52717672     98.59  34875524   6653512     13604
09:50:01 AM  10126904  39151572     79.45         0  16947532  30093220     56.28  11936892  19692864     21060
10:00:01 AM    186160  49092316     99.62         0  23996000  30093132     56.28  14778536  26886412     73868
10:10:01 AM    513288  48765188     98.96         0  22068804  30097696     56.29  16397544  24993856        32
Average:      4219737  45058739     91.44         0  21345367  25526927     47.74  14261871  23051806     39589

There's also swap on the server, which was not being used.

@ifelsefi ifelsefi closed this Oct 15, 2018

@ifelsefi ifelsefi reopened this Oct 15, 2018

@fd0 (Member) commented Oct 15, 2018

It looks like the kernel and/or the file system you're using is returning an odd error code. Can you try running getfacl <file>? It's not exactly the same syscall, though.

@ifelsefi commented Oct 17, 2018

Sure, though I am running the backups using "sudo restic backup", so ACLs shouldn't be an issue.

[root@adleman ~]# getfacl /athena/abc/store/luce/abc-projects/gudas_errbs2015/
getfacl: Removing leading '/' from absolute path names
# file: athena/abc/store/luce/abc-projects/gudas_errbs2015/
# owner: luce
# group: abc
user::rwx
group::r-x
other::r-x

@fd0 (Member) commented Oct 17, 2018

Hm, odd. What about strace -f -y -e listxattr,getxattr <file> on a file for which an error was reported?

@bkmeneguello commented Nov 8, 2018

@fd0, I did some tests with the FileReadConcurrency constant and noticed it can improve the upload rate when the endpoint limits each TCP connection individually (as Google Drive seems to do).
I suggest exposing this to the user to improve these edge cases.

@ifelsefi commented Dec 11, 2018

Hi

I am no longer seeing memory allocation issues.

However, it's taking several days to remove old snapshots, as shown below:

[screenshot from 2018-12-11 08:15:20]

Is there any way to speed this up?

@askielboe (Contributor) commented Dec 12, 2018

You can take a look at the prune threshold pull request, #1994; that should speed up the prune process significantly, depending on your settings.

@ifelsefi commented Jan 11, 2019

Thanks everyone

Closing the issue

@ifelsefi ifelsefi closed this Jan 11, 2019
