python-based graphing tool #10

Merged

17 commits merged on Sep 26, 2015

anarcat commented Sep 26, 2015

this tool will allow graphing lots of data while fixing some of the issues with the current gnuplot-based grapher, most notably the scale of the X axis.

eventually, this can be enhanced to perform linear regressions that will allow guessing when the battery will need to be replaced, but so far i have focused only on replacing the existing model.
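A minimal sketch of that regression idea, assuming the log can be reduced to (time, full-charge capacity) pairs: fit an ordinary least-squares line and solve for the time at which it drops to a chosen threshold. The helper name `estimate_replacement_time` and the threshold value are hypothetical, not part of the script in this PR.

```python
def estimate_replacement_time(samples, threshold):
    """Fit capacity = slope * t + intercept by ordinary least squares and
    return the time t at which the fitted line reaches `threshold`.

    `samples` is a list of (t, capacity) pairs with t in any numeric unit
    (e.g. Unix seconds). Returns None if capacity is not decreasing.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    # Covariance and variance terms of the least-squares slope.
    cov = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one point in time")
    slope = cov / var
    intercept = mean_c - slope * mean_t
    if slope >= 0:
        return None  # no degradation trend, so no crossing to predict
    return (threshold - intercept) / slope
```

For example, capacities of 100, 98, 96 and 94 at times 0 to 3 give a slope of -2 per time unit, so a threshold of 50 is predicted at t = 25. A straight line is the simplest possible model; real battery wear need not be linear.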

please test with your dataset to see if it scales as well.

anarcat added some commits Sep 26, 2015

argument parsing in graphing script
this allows us to test other files easily, or output to arbitrary images
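A minimal `argparse` sketch of that kind of argument parsing, assuming two optional positional arguments (log file, then output image) in the same order as the `./battery-status-graph.py log /dev/null` invocation used later in this thread. The default file names are hypothetical placeholders.

```python
import argparse


def parse_args(argv=None):
    """Parse the grapher's command line.

    Both positionals are optional; the defaults below are hypothetical
    placeholders, not the script's real paths.
    """
    parser = argparse.ArgumentParser(description="Graph battery status logs.")
    parser.add_argument("logfile", nargs="?", default="battery.csv",
                        help="CSV log to read (default: %(default)s)")
    parser.add_argument("output", nargs="?", default="battery.png",
                        help="image file to write (default: %(default)s)")
    return parser.parse_args(argv)
```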
add fake logfile generation script
this can be used to benchmark the various graphing tools i built

right now, the gnuplot one outperforms the python one by nearly an order of magnitude (1.75s vs 11.67s on 100k fake entries, roughly 6.7 times faster), which shows that the python script has some performance issues
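A sketch of such a generator, assuming a CSV log with a header row. The column names in `FIELDS`, the 10-minute interval and the capacity curve are hypothetical stand-ins, not the real battery-status log format.

```python
import csv
import datetime
import random

# Hypothetical column layout; the real battery-status log format may differ.
FIELDS = ["timestamp", "energy_full", "energy_full_design", "energy_now"]


def fake_rows(count, seed=0,
              start=datetime.datetime(2012, 1, 1),
              interval=datetime.timedelta(minutes=10)):
    """Yield `count` fake log rows, one every `interval` from `start`.

    Full-charge capacity decays linearly by 20% over the whole run, plus a
    little noise; the current charge is a random fraction of it.
    """
    rng = random.Random(seed)  # seeded so benchmark runs are repeatable
    design = 50000
    for i in range(count):
        stamp = start + i * interval
        full = design * (1.0 - 0.2 * i / count) + rng.uniform(-100, 100)
        now = full * rng.random()
        yield [stamp.strftime("%Y-%m-%dT%H:%M:%S"), round(full), design, round(now)]


def write_fake_log(path, count, seed=0):
    """Write a header plus `count` fake rows to the CSV file at `path`."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)
        writer.writerows(fake_rows(count, seed))
```

Something like `write_fake_log("fake.csv", 100000)` would produce a file with the same number of entries as the 100k benchmark.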

anarcat force-pushed the anarcat:pygraph branch from a8d9526 to dffb09c Sep 26, 2015

anarcat commented Sep 26, 2015

so, as the last commit shows, the new script is actually around 8 times slower than the gnuplot one. not sure why, i guess i'll need to profile this. but at least now we have a generation script as well. :)

petterreinholdtsen added a commit that referenced this pull request Sep 26, 2015

Merge pull request #10 from anarcat/pygraph
Add python-based graphing tool and data set generator.

petterreinholdtsen merged commit 723e6a4 into petterreinholdtsen:master Sep 26, 2015

petterreinholdtsen commented Sep 26, 2015

Tested, and noticed a new dependency on python-matplotlib to get it running. Mentioning it here to increase the chance of us remembering.

petterreinholdtsen commented Sep 26, 2015

Is it possible to make the graph lines thinner and the grey a lighter shade?
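In matplotlib both properties can be set per line through the `linewidth` (in points) and `color` keyword arguments of `Axes.plot`; a string such as `"0.7"` is a grey level between 0 (black) and 1 (white). A minimal sketch, assuming the grey in question is the plotted line itself; `draw_series` and the exact values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, suitable for writing PNG files
import matplotlib.pyplot as plt


def draw_series(ax, x, y):
    """Draw one data series as a thin, light-grey line on `ax`."""
    # linewidth is in points; a string such as "0.7" is a grey level on a
    # 0 (black) to 1 (white) scale, so larger values give lighter greys.
    (line,) = ax.plot(x, y, linewidth=0.5, color="0.7")
    return line
```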

anarcat commented Sep 26, 2015

sure, let me check...

anarcat commented Sep 26, 2015

i am still analysing the performance, but basically, it looks like the date parsing takes 3 seconds in Python, and then the loop overhead takes a full second just to iterate over all the timestamps. this would be hard to work around, unless, of course, matplotlib can parse only the dates it shows in the labels, something i am not sure of.
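On that open question: matplotlib can be made to convert only the dates shown in the labels, by plotting plain numbers on the X axis and attaching a tick formatter, which is called once per visible tick rather than once per data point. A sketch, assuming the timestamps can be obtained cheaply as Unix seconds (for example if the log stored them as numbers); `epoch_label` and `plot_epoch` are hypothetical helpers.

```python
import datetime

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def epoch_label(value, pos=None):
    """Turn an X-axis value in Unix seconds into a UTC date label."""
    stamp = datetime.datetime.fromtimestamp(value, tz=datetime.timezone.utc)
    return stamp.strftime("%Y-%m-%d")


def plot_epoch(ax, seconds, values):
    """Plot against raw Unix seconds; only tick values become dates."""
    ax.plot(seconds, values, linewidth=0.5)
    # The formatter runs once per visible tick, not once per data point.
    ax.xaxis.set_major_formatter(FuncFormatter(epoch_label))
```

One trade-off: tick positions are then picked as round numbers of seconds rather than round dates, so labels may fall at odd times of day.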

then drawing the PNG takes another 3 seconds, so that's another hard limit. parsing the CSV is ~2s, which roughly accounts for the 9s the script takes (3 + 1 + 3 + 2). so i am not sure there's much more optimising i can realistically do here. 9s seems like a fair delay, considering this represents a few years of data being crunched, a worst-case scenario...
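A per-phase breakdown like this can also be collected without a profiler by timing each step directly. A minimal sketch using `time.perf_counter`; the `phase` and `report` helpers and any phase names are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def phase(name, timings):
    """Record the wall-clock duration of the enclosed block in `timings`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


def report(timings):
    """Return one line per phase, slowest first, followed by the total."""
    ordered = sorted(timings.items(), key=lambda kv: -kv[1])
    lines = ["%-12s %6.2fs" % (name, secs) for name, secs in ordered]
    lines.append("%-12s %6.2fs" % ("total", sum(timings.values())))
    return "\n".join(lines)
```

Wrapping each step, e.g. `with phase("parse csv", timings): ...`, and printing `report(timings)` at the end gives a table whose total should roughly match the wall-clock run time.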

i was running my tests with:

python -m cProfile -s time ./battery-status-graph.py log /dev/null > out; less out
python -m cProfile -s cumulative ./battery-status-graph.py log /dev/null > out; less out
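The same reports can be produced from inside Python, which helps when only one step (say, the plotting function) is of interest. A sketch using the standard-library `cProfile` and `pstats` modules; `profile_call` is a hypothetical helper.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, sort="cumulative", limit=10):
    """Run func(*args) under cProfile and return (result, report text).

    `sort` takes pstats sort keys such as "time" or "cumulative", as with
    `python -m cProfile -s`; `limit` trims the listing, much like `head`.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats(sort).print_stats(limit)
    return result, out.getvalue()
```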

anarcat commented Sep 26, 2015

hmm... i pushed two more commits here for profiling and the dependencies, not sure why they don't show up...

anarcat commented Sep 27, 2015

for future reference, here are the two profiling runs i did, before and after the csv importer rewrite:

[1052]anarcat@angela:battery-status$ head out-parse_csv_np
         3072466 function calls (3067633 primitive calls) in 15.228 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.007    0.007   15.235   15.235 battery-status-graph.py:3(<module>)
        1    0.014    0.014   14.471   14.471 battery-status-graph.py:46(plot)
        1    0.072    0.072    8.251    8.251 battery-status-graph.py:21(parse_csv_np)
        1    3.236    3.236    8.179    8.179 npyio.py:1172(genfromtxt)
  1000010    3.747    0.000    3.747    0.000 _iotools.py:655(_loose_call)
[1053]anarcat@angela:battery-status$ head out-parse_csv
         2171997 function calls (2167184 primitive calls) in 9.896 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.006    0.006    9.902    9.902 battery-status-graph.py:3(<module>)
        1    0.036    0.036    9.138    9.138 battery-status-graph.py:55(plot)
        1    0.567    0.567    3.008    3.008 battery-status-graph.py:35(parse_csv)
        1    0.000    0.000    2.907    2.907 pyplot.py:574(savefig)
        3    0.000    0.000    2.596    0.865 backend_agg.py:458(draw)

The details of the above breakdown:

         2171998 function calls (2167185 primitive calls) in 16.451 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   300083    3.322    0.000    4.255    0.000 dates.py:192(_to_ordinalf)
       39    3.302    0.085    3.305    0.085 {built-in method draw_path}
   100002    1.827    0.000    2.421    0.000 csv.py:104(next)
       14    1.026    0.073    5.371    0.384 function_base.py:1628(_vectorize_call)
   601369    0.729    0.000    0.729    0.000 {hasattr}
        1    0.721    0.721    3.572    3.572 battery-status-graph.py:35(parse_csv)
     3404    0.537    0.000    0.537    0.000 {numpy.core.multiarray.array}
        2    0.305    0.153    0.305    0.153 {built-in method write_png}
   100027    0.278    0.000    0.278    0.000 {zip}
   200003    0.257    0.000    0.257    0.000 csv.py:86(fieldnames)
   300110    0.207    0.000    0.207    0.000 {method 'toordinal' of 'datetime.date' objects}
       74    0.181    0.002    0.181    0.002 {built-in method call}
      264    0.140    0.001    0.140    0.001 {method 'set_text' of 'FT2Font' objects}
        1    0.133    0.133    0.133    0.133 {_tkinter.create}
     2018    0.108    0.000    0.108    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        6    0.094    0.016    0.094    0.016 {built-in method update_path_extents}
     2424    0.083    0.000    0.107    0.000 weakref.py:47(__init__)
221074/220745    0.070    0.000    0.071    0.000 {len}
   110909    0.069    0.000    0.069    0.000 {method 'append' of 'list' objects}
       69    0.067    0.001    0.067    0.001 {built-in method draw_text_image}
     2422    0.066    0.000    0.173    0.000 transforms.py:85(__init__)
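In the internal-time listing, `csv.py:104(next)`, `csv.py:86(fieldnames)` and the roughly one-per-row `zip` calls look like the per-row dict building done by `csv.DictReader`. One possible follow-up, not something tried in this thread and with no measured speedup, is to read rows with plain `csv.reader` and pick columns by position. `read_columns` is a hypothetical helper, and the column names in the usage are assumptions.

```python
import csv


def read_columns(lines, wanted):
    """Read the named CSV columns into lists of strings.

    `lines` is any iterable of text lines (an open file works) whose first
    line is the header. Column positions are looked up once, so no dict is
    built per row as csv.DictReader does.
    """
    reader = csv.reader(lines)
    header = next(reader)
    indexes = [header.index(name) for name in wanted]
    # Pick each requested field by position.
    picked = ([row[i] for i in indexes] for row in reader)
    # Transpose rows into columns; fall back to empty columns if no rows.
    columns = list(zip(*picked)) or [() for _ in wanted]
    return {name: list(col) for name, col in zip(wanted, columns)}
```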

anarcat commented Sep 28, 2015

for comparison, here's the graph for 1000k entries as rendered by gnuplot and by pygraph:

[graph image: gnuplot]

[graph image: pygraph]

gnuplot is still significantly faster here (1.63s vs 7.62s), but, as mentioned elsewhere, about 3 seconds of the pygraph run is spent drawing the PNG graph, something we can't work around. plus, the graph is prettier and has a more readable X axis. i don't quite understand how gnuplot can read that CSV file so fast (reading the CSV in Python takes longer than the whole gnuplot run)... maybe some of its work runs in parallel?

but anyway, gnuplot doesn't give us the expiration time (#12), so i think pygraph still wins. :)
