
Database performance issues #2714

Closed
melroy89 opened this issue Oct 10, 2019 · 22 comments
Labels
stale No replies or comments. Will be auto-closed in 14 days.

Comments

@melroy89
Contributor

melroy89 commented Oct 10, 2019

Bug description
I am running Firefly III version 4.8.1.4, and my problem is:

During import I get a time-out (error: Gateway Timeout).

Steps to reproduce

  1. Downloading a "huge" CSV file from the bank - In my case I want to import for the first time: 2366 lines.
  2. Import it - I'm using the SNS config file

I get the error message. At the same time, Firefly did import some records/transactions. Currently Firefly shows me my balance of -245,68 euros which is incorrect, I have a positive balance on my bank account.

How to increase time-out? If that is the solution...

Extra info

Debug information generated at 2019-10-10 15:05:38 Europe/Amsterdam for Firefly III version 4.8.1.4.

Variable Content
FF version 4.8.1.4
FF API version 0.10.3
App environment local
App debug mode false
App cache driver file
App logging , daily
PHP version 7.3.9
Display errors Off
Session start 2019-10-01 00:00:00
Session end 2019-10-31 23:59:59
Session first 2019-10-10 14:44:02
Error reporting ALL errors
Host Linux
Interface apache2handler
UserID 1
Attempt at "en" false
Attempt at "English" false
Attempt at "en_US.utf8" 'en_US.utf8'
Attempt at "en_US.UTF-8" 'en_US.UTF-8'
DB drivers mysql, pgsql, sqlite
Current driver pgsql
Login provider
Storage disks local-upload
Using Sandstorm? no
Is Sandstorm (.env) false
Is Docker (.env) true
bunq uses sandbox false
Trusted proxies (.env) **
User agent Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0
Loaded extensions Core, date, libxml, openssl, pcre, sqlite3, zlib, ctype, curl, dom, fileinfo, filter, ftp, hash, iconv, json, mbstring, SPL, PDO, session, posix, Reflection, standard, SimpleXML, pdo_sqlite, Phar, tokenizer, xml, xmlreader, xmlwriter, mysqlnd, apache2handler, bcmath, gd, intl, ldap, memcached, pdo_mysql, pdo_pgsql, sodium, Zend OPcache

Bonus points

Error message:
[screenshot]

Console log:
[screenshot]

The console is being spammed a lot, apparently because it thinks some transactions have a zero amount:
[screenshot]

@melroy89
Contributor Author

I also got a mail from the Mail Bot: "Firefly III has created 2363 transactions for you.".
It looks like it parsed all transactions correctly, as far as I can see. I think I filled in the start amount incorrectly. Actually, Firefly could use the CSV file for the bank balance as well, since that is also listed in this file.

Column #9 in the CSV file is the balance, but during import I can't find a column role that matches 'Balance'. Such a role would fix the negative balance, since the current balance could be retrieved from the CSV file.

@JC5
Member

JC5 commented Oct 10, 2019

A quick reply from my phone. Firefly III won’t be able to use the balance column to self-correct. If the start balance is correct, the balance should still be correct after the import. If it isn’t, “correcting” it through the balance column will only hide the original problem.

I’ll do a deep dive when I have more time.

@melroy89
Contributor Author

melroy89 commented Oct 10, 2019

Yea, thanks in advance. The balance is easily solved. EDIT: I solved it by entering the correct start balance from the year 2017.

The import problem resulting in a 'Gateway Timeout' is the more urgent issue I'm facing.

@JC5
Member

JC5 commented Oct 10, 2019

Make sure your front end or web server doesn’t time out at 30 seconds, which is usually the limit.

Another option is to use the command line to import your data where there usually is no time limit at all.

@melroy89
Contributor Author

Ok, in the Docker image (firefly_iii_app) I increased the Apache timeout in /etc/apache2/apache2.conf, as well as the PHP timeouts in /usr/local/etc/php/php.ini:

max_execution_time = 800
max_input_time = 800

I stopped and restarted the Docker containers.

On the reverse proxy side (Nginx), which is outside the Docker, I increased the proxy settings in /etc/nginx/nginx.conf:

        proxy_connect_timeout       600;
        proxy_send_timeout          600;
        proxy_read_timeout          600;
        send_timeout                600;

Let's see if this works. Still, I have no clue what Firefly was doing during the import, since it did take all transactions into account. 😕

@JC5
Member

JC5 commented Oct 10, 2019

Web interfaces are less than ideal for long synchronous operations, I’m afraid.

@melroy89
Contributor Author

melroy89 commented Oct 10, 2019

Well, I'm hosting Nextcloud and whatnot on the same server. I can upload 20 GB files without issues ;) We are talking about a CSV file of 511 kB.

Do you know what operations Firefly is performing that are spamming the JavaScript console?

@JC5
Member

JC5 commented Oct 10, 2019

Yes, if only it were just uploading the file.

What’s your question exactly?

@melroy89
Contributor Author

What could cause this issue? What is Firefly executing during the import? It causes this gateway timeout, yet the transactions are imported.

https://user-images.githubusercontent.com/628926/66571677-fc66bf00-eb6f-11e9-9990-4053b92204a1.png

https://user-images.githubusercontent.com/628926/66571145-25d31b00-eb6f-11e9-9520-d3bf6d65adb0.png

@melroy89
Contributor Author

melroy89 commented Oct 10, 2019

As you can see, Firefly is taking about 1 second per request. Is that 1 transaction per request? That would mean every transaction during the import costs 1 second, which adds up very fast 😖: 400 transactions == 400 seconds, 1500 transactions == 1500 seconds. This is not really scalable. So what is it doing?
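The back-of-the-envelope math can be made concrete with a quick sketch (Python, purely illustrative; the 1 s/request figure is the one observed in the access log in this thread):

```python
# Rough estimate of import wall-clock time at ~1 second of server time per line,
# compared against a typical 30-second proxy timeout. Figures are illustrative.

SECONDS_PER_LINE = 1.0   # observed: roughly one request per second in the log
PROXY_TIMEOUT = 30       # common default for nginx/apache, in seconds

def import_duration(lines: int, seconds_per_line: float = SECONDS_PER_LINE) -> float:
    """Estimated total import time in seconds."""
    return lines * seconds_per_line

for n in (400, 1500, 2366):
    t = import_duration(n)
    print(f"{n:>5} lines -> {t:6.0f} s ({t / 60:4.1f} min), "
          f"over the {PROXY_TIMEOUT}s timeout: {t > PROXY_TIMEOUT}")
```

At that rate even a modest import blows well past any default gateway timeout, which matches the behaviour reported above.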

Logging from docker:

sudo docker logs --tail 30 -f <username>_firefly_iii_app_1                                                                                                                                                      
172.24.0.1 - - [10/Oct/2019:17:34:46 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2961 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:34:47 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2973 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:34:48 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2967 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:34:49 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2971 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:34:51 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2961 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:34:52 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2965 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:34:53 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2967 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:34:54 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2967 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:34:55 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2967 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:34:57 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2977 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:34:58 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2967 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:00 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2971 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:03 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2969 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:06 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2973 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:08 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2971 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:09 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2967 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:10 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2975 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:11 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2963 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:12 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2973 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:13 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2965 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:15 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2975 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:16 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2967 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:17 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2965 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:20 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2963 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:22 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2969 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:24 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2969 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:25 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2973 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:26 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2957 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:27 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2961 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:29 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2965 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:30 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2961 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:31 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2973 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:32 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2959 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"
172.24.0.1 - - [10/Oct/2019:17:35:34 +0200] "GET /import/job/json/liMKyAjc569y HTTP/1.1" 200 2971 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"

@melroy89
Contributor Author

melroy89 commented Oct 10, 2019

Nevertheless, increasing all these timeouts worked, but I think 475 requests (475 is taken from the logging) at 1 second each is a bit too much. Why would it take so long? What is it doing?

[screenshot]

[screenshot]

PS: I looked up the transactions with a zero amount. Apparently these were messages from the bank (that I disabled/enabled mobile payments). No idea why they added these to the bank records; they were just informational messages.

@JC5
Member

JC5 commented Oct 10, 2019

Ok, let’s take one step back. Importing CSV data doesn’t scale. One second per line isn’t even too bad, depending on the complexity of the data and the speed of your system. Each individual cell has its own pre- and post-processors, database mappings and storage routines. All things considered, 1 s per line is pretty fast.

And yes, your Nextcloud is faster dumping data to a drive.

As for the import itself, it’s a bad idea regardless of Firefly III’s speed to import 2800 lines at once. Whatever error there is, it’ll be repeated 2800 times.

Part of the problem is of course your bank. Adding human-readable informational messages to a machine-readable data format is a terribly stupid idea.

Personally I suggest you don’t import data at all. Start fresh, get to know your own finances and act on what you see there.

If you feel you must import transactions, do so in batches. Feel free to inspect the code to find out what needs to be done to reliably import data into Firefly III.
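Splitting the export into smaller files before importing is one way to batch. A minimal Python sketch (a hypothetical helper, not part of Firefly III; it repeats the header row so each batch imports on its own):

```python
import csv
import io

def split_csv(text: str, batch_size: int) -> list[str]:
    """Split CSV text into batches of at most batch_size data rows,
    repeating the header row in every batch so each part is a valid file."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    batches = []
    for i in range(0, len(data), batch_size):
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow(header)           # every batch gets its own header
        writer.writerows(data[i:i + batch_size])
        batches.append(out.getvalue())
    return batches
```

Each resulting batch can then be imported separately, so a single bad line only affects one small run instead of being repeated thousands of times.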

@JC5 JC5 added the question label Oct 10, 2019
@melroy89
Contributor Author

melroy89 commented Oct 11, 2019

Personally I suggest you don’t import data at all.

I also read your post: https://www.patreon.com/posts/30012174. I understand what you mean by doing your finances by hand, one transaction at a time. 😄 I get the point; sure, you get a better feeling for your money flow. However, in our case it is quite time-consuming otherwise.

Whatever error there is, it’ll be repeated 2800 times.

Yea, maybe it's a good idea to clean up this error-log pollution, and to see why a 3 GHz quad-core computer needs 1 second to parse a single comma-separated line. I think there are ways to optimize this.

If you feel you must import transactions, do so in batches.

I will from now on; the first batch is the biggest (from 2017 until now), and I wouldn't like importing that by hand.

Feel free to inspect the code to find out what needs to be done to reliably import data into Firefly III.

I will. Is it still relevant though, since you are also working on this other CSV import tool? Will this new project solve all of these issues?

https://github.com/JC5/firefly-iii-csv-importer

@melroy89
Contributor Author

melroy89 commented Oct 11, 2019

I already prepared some clean-up actions in a PR as a start, so it's not spamming the console too much. I hope you like it!

Could you point me in the right direction in the app itself, to where the 'magic' of processing the CSV import happens? I really mean the parsing, adding transactions to the DB, and so on.

I couldn't find the files I'm looking for. Thanks in advance, James!

@JC5
Member

JC5 commented Oct 11, 2019

Check out these files. Mind you, this code will move to a new project.

  • app/Import/Routine/FileRoutine.php
  • app/Import/JobConfiguration/FileJobConfiguration.php
  • app/Import/Specifics/SnsDescription.php
  • app/Import/Storage/ImportArrayStorage.php
  • app/Support/Import/JobConfiguration/File
  • app/Support/Import/Placeholder/ImportTransaction.php
  • app/Support/Import/Routine/File

@melroy89
Contributor Author

melroy89 commented Oct 11, 2019

Thanks. I think there is also a performance issue in general, e.g. when applying rules. I do not know what the bottleneck currently is, but my focus would go to:

  • Import (as shown above)
  • Database management (connection open/close states, performance-optimization settings for PostgreSQL/cache)
  • Queries (SQL query performance)
  • Reads/writes from disk
  • Code loops

I know it still sounds a bit vague, but I think I will take a broader look at performance, since I really have the feeling a lot could be improved here. It is currently hampering me quite a bit, not only during import but also now when applying rules.

EDIT: Pressing the 'Check which transactions are matching' button on the create-rule page is actually pretty fast. I know it only checks the last 200 records. But applying just one category takes very long. Nevertheless, I will dive into it.

@melroy89 melroy89 changed the title Import failed! Database performance issues Oct 11, 2019
@melroy89
Contributor Author

melroy89 commented Oct 11, 2019

  • app/Import/Routine/FileRoutine.php => Relatively simple code
  • app/Import/JobConfiguration/FileJobConfiguration.php => Interface only
  • app/Import/Specifics/SnsDescription.php => Easy mutations
  • app/Import/Storage/ImportArrayStorage.php => Probably storeGroupArray() takes too long. Could you measure the execution time?
  • app/Support/Import/JobConfiguration/File => getValuesForMapping() in ConfigureMappingHandler.php is also not written efficiently. In general, too many (nested) foreach loops in ConfigureRolesHandler.php. The rest looks fine.
  • app/Support/Import/Placeholder/ImportTransaction.php => Maybe calculateAmount(), if external APIs are used.
  • app/Support/Import/Routine/File => Again, a lot of foreach loops inside foreach loops

Maybe we should run Xdebug (more info about the profiler) and/or Blackfire (which has a free Hack edition).

Xdebug can be even more useful with xdebug_start_trace() and xdebug_stop_trace() statements.

Pump Xdebug's profile data into webgrind, and hopefully we can find the bottlenecks much more easily:

[screenshot]

Stackoverflow..

@JC5
Member

JC5 commented Oct 12, 2019

I appreciate the effort. But before you dive into this code, realize I'm building a new import routine over here. Feel free to check it out.

@melroy89
Contributor Author

As I said before, I think it's a generic performance issue. Dive into xdebug for tracing measurements.

@melroy89
Contributor Author

melroy89 commented Oct 15, 2019

Findings so far:

  • Did you create indexes on the tables? I didn't see CREATE INDEX.
  • Did you add limits where needed? Otherwise it will try to fetch all results; use pagination instead.
  • Bad practices like foreach loops over query results should be avoided at all costs, otherwise you end up creating objects for all results in memory, which is very slow.
  • Ideally, search on the indexes.
  • Use DB::raw() in cases where you really need advanced database queries. Eloquent is handy, but costly at times.
  • Create relationships in Laravel and use the ->with() function (eager loading).
  • Introduce PostgreSQL production settings:
# DB Version: 12
# OS Type: linux
# DB Type: oltp
# Total Memory (RAM): 8 GB
# CPUs num: 4
# Data Storage: hdd

max_connections = 300
shared_buffers = 2GB
effective_cache_size = 6GB
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 3495kB
min_wal_size = 2GB
max_wal_size = 4GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4

More info:

Hint:

  1. Try the debug bar, more info..

  2. Read more about optimizations
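The pagination/limits point above can be illustrated language-neutrally. A minimal Python sketch with sqlite3 as a stand-in backend (Firefly III itself uses Laravel's Eloquent, where chunk() or cursor() plays this role):

```python
import sqlite3

def iter_rows(conn, query, params=(), chunk_size=500):
    """Yield result rows in fixed-size chunks via fetchmany() instead of
    materializing the whole result set with fetchall(), keeping memory flat."""
    cursor = conn.execute(query, params)
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        yield from chunk

# Tiny demo with an in-memory database and 2366 fake transactions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO transactions (amount) VALUES (?)",
                 [(i * 1.5,) for i in range(2366)])
total = sum(amount for _id, amount in
            iter_rows(conn, "SELECT id, amount FROM transactions"))
print(total)  # 4196692.5
```

The point is not the specific library but the pattern: never pull an unbounded result set into memory at once when a bounded cursor will do.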

@JC5
Member

JC5 commented Oct 15, 2019

Thanks. Those tips are really helpful.

@stale

stale bot commented Oct 22, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale No replies or comments. Will be auto-closed in 14 days. label Oct 22, 2019
@stale stale bot closed this as completed Oct 29, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jan 28, 2020