Reduce mem usage #43

hawson · 2015-09-12T04:43:05Z

This pull aims to reduce the RAM usage needed by pihole when parsing new/updated block lists.

The files are downloaded locally and then operated on, instead of being stored as (very large) variables in the shell script. Some simple tests (query /proc//status for RSS size) show that the code in master uses ~345,000KB when running (that's the high water mark). When processing locally, RAM usage in this case is somewhere around 8,600KB.

It's important to note that the files are downloaded only if the upstream copies are newer, which should address some of the concerns mentioned in #37. That said, if SD card IO is really that much of a concern, there should be full dependency processing to eliminate unnecessary writes.

Storing the output from 'curl' commands directly as shell variables is very inefficient, and makes gravity.sh require much more RAM any time there is an update to the block lists (and especially on the first run). Store the raw blocklists in a temporary file on disk, and process those.

Remove extraneous calls to several programs (cat, uniq).
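
A minimal sketch of the approach described above, not the actual patch: stream each list to a file on disk and run the text tools over that file, so the shell never holds a whole list in a variable. The function names (fetch_list, extract_domains) and the hosts-style input format are assumptions for illustration. `curl -z FILE` is curl's time-condition option, which requests the document only if it is newer than FILE.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and input format are illustrative only.

# Download url to dest, but only if the remote copy is newer than dest.
fetch_list() {
    local url=$1 dest=$2
    if [ -r "$dest" ]; then
        # -z FILE: make the request conditional on FILE's modification time.
        curl -s -z "$dest" -o "$dest" "$url"
    else
        curl -s -o "$dest" "$url"
    fi
}

# Read a hosts-style list ("IP domain" per line) from disk and print the
# unique domains. awk reads the file directly, so no cat is needed, and
# sort -u replaces a separate uniq.
extract_domains() {
    local raw=$1
    awk 'NF >= 2 && $1 !~ /^#/ { print $2 }' "$raw" | sort -u
}
```

Because every stage is a pipe reading from a file, peak memory is bounded by what sort needs rather than by the size of a shell variable.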

jacobsalmela · 2015-09-17T01:47:27Z

Initially, we switched to putting them in a variable (data) to prevent writing to the SD card. Later, the feature was added to only download the lists if there were changes. I have personally never had a failed SD card, but I have read that it happens to people often enough.

What do you mean by dependency processing?

hawson · 2015-09-17T02:13:24Z

By "dependency processing", I mean something like a makefile: you have outputs that require various actions and other dependencies.
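
In shell terms, a makefile-style dependency check can be sketched with the `-nt` (newer than) file test: an output is rebuilt only when it is missing or older than its input. The needs_rebuild and rebuild_if_stale names are hypothetical and not part of gravity.sh, and `sort -u` stands in for whatever processing step produces the output.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of make-like dependency handling.

# Succeeds (returns 0) if out is missing or older than src.
needs_rebuild() {
    local src=$1 out=$2
    [ ! -e "$out" ] || [ "$src" -nt "$out" ]
}

# Regenerate out from src only when it is stale; otherwise skip the write.
rebuild_if_stale() {
    local src=$1 out=$2
    if needs_rebuild "$src" "$out"; then
        sort -u "$src" > "$out"
    fi
}
```

When nothing upstream has changed, no output file is rewritten, which is the point of doing this on an SD card.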

Storing the data all in RAM (in bash variables), thus forcing a host to use additional swapfiles, in order to save writes to an SD card seems backwards.

korhadris · 2015-09-18T06:59:34Z

I think some of your memory usage stats may be low. The sorting step alone of supernova uses 134168 kB on my system. That said, I agree we should have a goal to get away from storing data in memory.

Here are my stats on sorting using time (not the bash built-in time, but the GNU time command, which may need to be installed):

command time -v sort -u pihole.2.supernova.txt > /dev/null
        Command being timed: "sort -u pihole.2.supernova.txt"
        User time (seconds): 17.21
        System time (seconds): 0.47
        Percent of CPU this job got: 308%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.72
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 134168
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 33727
        Voluntary context switches: 1854
        Involuntary context switches: 555
        Swaps: 0
        File system inputs: 62520
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
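
As a cross-check that does not need GNU time, the kernel's own high-water mark can be read from a process's status file under /proc on Linux, which is the kind of measurement the opening comment describes. A minimal sketch: the peak_rss_kb helper is hypothetical, and it assumes the VmHWM field (peak resident set size, in kB) is present in the status file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the peak RSS (VmHWM, in kB) found in a file
# that uses the /proc/<pid>/status format. Prints nothing if the field
# is absent.
peak_rss_kb() {
    local status_file=$1
    awk '$1 == "VmHWM:" { print $2 }' "$status_file"
}

# Example (Linux only): peak RSS of the current shell.
# peak_rss_kb "/proc/$$/status"
```

Note that this reports the peak for one process only; a pipeline such as the sort above would need each stage sampled separately.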

jacobsalmela · 2015-09-20T23:56:10Z

I haven't forgotten about this. I'm just waiting until I have more time to test it. It looks pretty good from my initial scan, though.

jacobsalmela · 2015-09-26T14:44:18Z

Your commits are on the right track, but there are some issues:

1. Lines 47-70 can be removed since we won't be storing the variables in memory any more
2. On line 103, I would prefer to have the variables named to fit the theme of the script (sci-fi, Star Trek-esque). I think patternBuffer would be the ideal variable name.
3. On line 104, again, a themed variable name such as temporalDistortion or tachyonEmissions
4. On lines 108-109, you display the entire command that is running. I prefer to keep the echos simple and clean to reduce on-screen clutter
5. Lines 143-144 store the filename in the variable, so it displays as ** 1679450 /etc/pihole/pihole.1.andLight.txt domains being pulled in by gravity... instead of just ** 1679450 domains being pulled in by gravity...
6. Same thing for 152-153
7. I'm far from a sed expert, so could you please explain how line 119 works?
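
On the filename showing up in the domain count (lines 143-144 and 152-153): that output matches what `wc -l FILE` prints, which is the count followed by the file name. Assuming that is how the count is obtained, feeding the file on standard input makes wc print the number alone. The count_lines helper and the numberOf variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the number of lines in a file.
count_lines() {
    local file=$1
    # With input on stdin, wc has no file name to echo back.
    # The arithmetic expansion strips padding that some wc builds add.
    echo $(( $(wc -l < "$file") ))
}

# numberOf=$(count_lines /etc/pihole/pihole.1.andLight.txt)
# echo "** $numberOf domains being pulled in by gravity..."
```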

hawson · 2015-09-26T16:58:51Z

1. The patch aims to have minimal changes, and removing swap support is outside the scope of this.
2. I can update the names, but I will note that the...unusual names (even for someone versed in Star Trek Technobabble)...made coding slower than usual.
3. ditto
4. Helps with debugging (short of running with -x, which is entirely too verbose). Easy enough to comment out.
5. Okay
6. ditto
7. Sure:

-n means "don't print by default" (Perl's -n option is similar).
-r means use extended regexes, instead of the very limited set normally supported.
Then there are two different statements, one following each of the two -e options.
-- The first statement finds 2 or more literal "." characters in a row, and replaces them with just one. I found several cases where there are multiple periods in a row, and this fixes that problem.
-- The second looks for lines with a literal "." character, and prints them (this is needed because we used "-n" earlier on to suppress printing; it also handles blank lines).
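
Put together, the command described above would look roughly like the sketch below. It is reconstructed from the explanation, not copied from line 119 of the patch, so the exact expressions (including the `g` flag) are an assumption, and the clean_domains wrapper name is hypothetical. `-r` is the GNU sed spelling of the extended-regex option.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two-statement sed described above.
# Reads lines on stdin, writes cleaned lines on stdout.
clean_domains() {
    # -n       do not print lines unless told to
    # -r       extended regexes (GNU sed)
    # 1st -e   collapse runs of two or more dots into a single dot
    # 2nd -e   print only lines that contain a dot (drops blank lines)
    sed -nr -e 's/\.{2,}/./g' -e '/\./p'
}

# printf 'ads..example.com\n\ntracker.example\n' | clean_domains
```

One side effect worth knowing: any entry without a dot at all, such as a bare hostname, is dropped along with the blank lines.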

jacobsalmela · 2015-10-10T17:56:34Z

Did you want to make the changes?

jacobsalmela · 2015-10-16T17:27:14Z

I have some time this weekend if you wanted to make the changes.

dschaper · 2015-11-06T01:24:58Z

I'd be happy to make a PR with the changes you requested if this issue is still open for consideration.

jacobsalmela · 2015-11-06T02:07:40Z

Yes, it is. Thanks!

dschaper · 2015-11-06T02:37:40Z

Okay, PR #68 opened with @jacobsalmela's requested changes to @hawson's memory reductions.

jacobsalmela · 2015-11-06T12:13:50Z

Cool. I'll close this one then. Thanks!

hawson added 5 commits September 11, 2015 23:26

Simplify (and speed up slightly) awk/sed domain name extraction

1e7f843

Check if files are readable, not just present

a5f2305

Remove duplicate -s in curl command

e1b9ed4

Simplify gravity_advanced.

2d92a03

Remove extraneous calls to several programs (cat, uniq).

This was referenced Sep 18, 2015

Allow for local settings to disable swap #42

Merged

Getting adblock.mahakala.is list... Killed #46

Closed

jacobsalmela added the Enhancement label Oct 14, 2015

jacobsalmela closed this Nov 6, 2015
