Controller segfault on learn #331

Closed
cedricdubois opened this issue Jul 6, 2015 · 9 comments

@cedricdubois

I'm testing rspamd, and while learning ham and spam directories with rspamc I noticed that a batch of messages sometimes returned "IO read error: unexpected EOF". It seems that the controller crashes when learning a multipart message that contains empty parts.

The crash occurs up to version 0.9.9.

I ran gdb on the controller process and got this:

Program received signal SIGSEGV, Segmentation fault.
0x000000000045e713 in rspamd_stat_cache_sqlite3_process (task=<optimized out>, is_spam=0, c=0x26d7380)
    at /vagrant/rspamd/src/libstat/learn_cache/sqlite3_cache.c:205
205             for (i = 0; i < part->words->len; i ++) {
(gdb) bt
#0  0x000000000045e713 in rspamd_stat_cache_sqlite3_process (task=<optimized out>, is_spam=0, c=0x26d7380)
    at /vagrant/rspamd/src/libstat/learn_cache/sqlite3_cache.c:205
#1  0x000000000045ac70 in rspamd_stat_learn (task=0x29d3100, spam=0, L=0x270c0c0, err=0x7fffdd8daa30)
    at /vagrant/rspamd/src/libstat/stat_process.c:549
#2  0x000000000041cf43 in rspamd_controller_learn_fin_task (ud=0x29d3100) at /vagrant/rspamd/src/controller.c:923
#3  0x0000000000450f45 in check_session_pending (session=session@entry=0x29c78c0)
    at /vagrant/rspamd/src/libserver/events.c:229
#4  0x0000000000451080 in remove_normal_event (session=0x29c78c0, fin=0x44f600 <rspamd_dns_fin_cb>, ud=0x29e69a8)
    at /vagrant/rspamd/src/libserver/events.c:168
#5  0x00000000004ae343 in rdns_process_read (fd=<optimized out>, arg=<optimized out>)
    at /vagrant/rspamd/contrib/librdns/resolver.c:276
#6  0x00007fd1a051c3dc in event_base_loop () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
#7  0x000000000041d7c5 in start_controller_worker (worker=0x29bb4c0) at /vagrant/rspamd/src/controller.c:1987
#8  0x0000000000421d69 in fork_worker (rspamd=0x26d8d50, cf=0x29bb510) at /vagrant/rspamd/src/main.c:469
#9  0x000000000041c033 in fork_delayed (rspamd=<optimized out>) at /vagrant/rspamd/src/main.c:589
#10 main (argc=43786624, argv=0x29bb510, env=0x718690 <do_terminate>) at /vagrant/rspamd/src/main.c:1370
(gdb) p part->words->len
Cannot access memory at address 0x8
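
The failed read at address 0x8 is consistent with `part->words` being NULL: GLib's public `GArray` struct stores the `data` pointer first and `len` right after it, so on a 64-bit build reading `len` through a NULL pointer touches offset 8. Below is a minimal sketch of the kind of guard that avoids this. The types and the `count_words` function are hypothetical stand-ins, not rspamd's real structures, and this is not necessarily the fix that was committed.

```c
#include <stddef.h>

/* Hypothetical stand-in whose layout mirrors GLib's GArray (data, then len). */
struct word_array {
    char *data;
    unsigned int len;
};

/* Hypothetical stand-in for a text part of a message. */
struct text_part {
    struct word_array *words; /* assumed to be NULL for an empty part */
};

/* Sum the word counts of all parts, skipping parts that have no word array. */
size_t
count_words(struct text_part **parts, size_t nparts)
{
    size_t total = 0;

    for (size_t i = 0; i < nparts; i++) {
        struct text_part *part = parts[i];

        if (part == NULL || part->words == NULL) {
            /* An unguarded loop would dereference part->words->len here. */
            continue;
        }
        total += part->words->len;
    }

    return total;
}
```

With a check like this, an empty part contributes nothing instead of faulting.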

An example mail that crashes the controller is available at:
https://gist.github.com/cedricdubois/7fef1eb81cb5175dbd35

Let me know if you need any more information.

Thanks!

@vstakhov
Member

vstakhov commented Jul 6, 2015

Oh, I see the problem. Will fix it shortly.

@vstakhov
Member

vstakhov commented Jul 6, 2015

This should be fixed in master and 0.9 branch. Tests would follow soon.

@moisseev
Member

moisseev commented Jul 7, 2015

I've tested the 0.9 branch at 0227d24. Not fixed.

@cedricdubois
Author

The specific case in the original report seems to be fixed. I'm still seeing segfaults, but in another location. I haven't been able to find a specific mail causing the crash (or even if it is a specific mail). The crash happens when learning a directory containing subdirectories with ham mails. Here's the backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x000000000045bf39 in bayes_learn_ham_callback (key=<optimized out>, value=<optimized out>, data=<optimized out>)
    at /vagrant/rspamd/src/libstat/classifiers/bayes.c:288
288         if (!res->st_runtime->st->is_spam) {
(gdb) bt
#0  0x000000000045bf39 in bayes_learn_ham_callback (key=<optimized out>, value=<optimized out>,
    data=<optimized out>) at /vagrant/rspamd/src/libstat/classifiers/bayes.c:288
#1  0x00007f817d09e804 in g_tree_foreach () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x000000000045c9fe in bayes_learn_spam (ctx=<optimized out>, input=<optimized out>, rt=<optimized out>,
    task=<optimized out>, is_spam=<optimized out>, err=<optimized out>)
    at /vagrant/rspamd/src/libstat/classifiers/bayes.c:316
#3  0x000000000045adf8 in rspamd_stat_learn (task=0x1a183a0, spam=0, L=<optimized out>, err=0x7ffc14c0dd50)
    at /vagrant/rspamd/src/libstat/stat_process.c:579
#4  0x000000000041cf43 in rspamd_controller_learn_fin_task (ud=0x1a183a0) at /vagrant/rspamd/src/controller.c:923
#5  0x0000000000450f45 in check_session_pending (session=session@entry=0x1a031b0)
    at /vagrant/rspamd/src/libserver/events.c:229
#6  0x0000000000451080 in remove_normal_event (session=0x1a031b0, fin=0x44f600 <rspamd_dns_fin_cb>, ud=0x17d8b98)
    at /vagrant/rspamd/src/libserver/events.c:168
#7  0x00000000004ae343 in rdns_process_read (fd=<optimized out>, arg=<optimized out>)
    at /vagrant/rspamd/contrib/librdns/resolver.c:276
#8  0x00007f817c6be3dc in event_base_loop () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
#9  0x000000000041d7c5 in start_controller_worker (worker=0x1a02da0) at /vagrant/rspamd/src/controller.c:1987
#10 0x0000000000421d69 in fork_worker (rspamd=rspamd@entry=0x171bd50, cf=cf@entry=0x17b08d8)
    at /vagrant/rspamd/src/main.c:469
#11 0x00000000004221b3 in spawn_workers (rspamd=0x171bd50) at /vagrant/rspamd/src/main.c:689
#12 0x000000000041be7d in main (argc=101, argv=0x171bd50, env=0x7ffc14c0f3a0) at /vagrant/rspamd/src/main.c:1282
(gdb) p res
$1 = (struct rspamd_token_result *) 0x1fe5150
(gdb) p res->st_runtime
$2 = (struct rspamd_statfile_runtime *) 0x0
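
In this second trace `res` itself is valid but `res->st_runtime` is NULL, so the callback faults on its first dereference. Below is a minimal sketch of a defensive per-token callback. The struct names, the `value` counter and `learn_ham_token` are hypothetical stand-ins for illustration, not rspamd's actual code. The return convention follows GLib's `GTraverseFunc`, where returning FALSE continues the `g_tree_foreach` traversal and TRUE stops it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the structures seen in the backtrace. */
struct statfile {
    bool is_spam;
};

struct statfile_runtime {
    struct statfile *st;
};

struct token_result {
    struct statfile_runtime *st_runtime; /* assumed NULL when no statfile is attached */
    unsigned long value;                 /* hypothetical per-token counter */
};

/* Returns false to continue the traversal, mirroring GTraverseFunc semantics. */
bool
learn_ham_token(struct token_result *res)
{
    if (res == NULL || res->st_runtime == NULL || res->st_runtime->st == NULL) {
        /* No usable statfile for this token: skip it instead of crashing. */
        return false;
    }

    if (!res->st_runtime->st->is_spam) {
        res->value++; /* only ham statfiles are updated when learning ham */
    }

    return false;
}
```

Skipping the token keeps the traversal going; whether the real code should skip or report an error in this state is a design decision this sketch does not settle.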

@moisseev
Member

moisseev commented Jul 7, 2015

I think subdirectories don't matter. I am using stdin:

# doveadm fetch -u $USER text mailbox-guid $GUID uid $UID | \
    rspamc learn_spam
2015-07-07 08:52:26 #81109(main) main: controller process 81510 terminated abnormally by signal: 11
2015-07-07 08:52:28 #81109(main) print_signals_info: got signal: 'Alarm clock'; received from pid: 0; uid: 0
2015-07-07 08:52:28 #81528(controller) fork_worker: starting controller process 81528

Mail message sample:
https://gist.github.com/moisseev/9d8611e1b4c94115bf2b

@cedricdubois
Author

That message indeed causes the crash I'm seeing now, in bayes_learn_ham_callback() and bayes_learn_spam_callback().

@vstakhov
Member

vstakhov commented Jul 7, 2015

I cannot reproduce this error with this message. However, I've found another related bug when classifying this message.

vstakhov added a commit that referenced this issue Jul 8, 2015
vstakhov added a commit that referenced this issue Jul 8, 2015
vstakhov added a commit that referenced this issue Jul 9, 2015
vstakhov added a commit that referenced this issue Jul 9, 2015
vstakhov added a commit that referenced this issue Jul 22, 2015
vstakhov added a commit that referenced this issue Jul 22, 2015
@moisseev
Member

I've tested the 0.9 branch at 4f9000c. Rspamd no longer crashes on learning, but rspamc throws an error:

 # rspamc learn_spam msg.eml
Results for file: msg.eml
HTTP parser error: invalid HTTP status code

Debug output:
https://gist.github.com/moisseev/8bef62e9b18c8b45dcad

@vstakhov
Member

Well, it seems to be a reporting issue now. The problem is that the message contains too little textual data to learn Bayes.

vstakhov closed this as completed Sep 8, 2015
vstakhov added a commit that referenced this issue Sep 17, 2015
* Rework symbols processing:
	- Improve sorting logic for symbols
	- Organize processing into multiple stages
	- Added asynchronous watchers for symbols
	- Added ability to organize dependencies between symbols
* Fixed URL redirector:
	- Use optimized POE loop
	- Organize dependencies
	- Fix startup
* New sqlite3 backend:
	- Allow to have per-languages and per-user statistics
	- Allow sqlite3 to be used as statistics backend
* Store tokenizer configuration within statfiles
* Improve bayes statistics:
	- Use headers and images metainformation in bayes
	- Suggest using of pre-processed tokens for statistics
	- Fix tokens normalization for OSB algorithm
* Rewrite url parsing:
	- Fix numerous issues with url extraction and normalization
	- Fix mailto urls
* Fix settings plugin to allow custom actions scores
* Improve rbl plugin
* Allow capturing patterns in rspamd lua regexp library
* Add GTUBE support
* Fix spamc legacy support
* Add DKIM support to RBL module
* Fix issues with multiple DKIM signatures
* Fix issue if rspamd cannot create statfiles (#331)
* Rework parts and task structure:
	- Now text_parts, parts and received are arrays
	- Pre-allocate arrays with some reasonable defaults
	- Use arrays instead of lists in plugins and checks
	- Remove unused fields from task structure
	- Rework mime_foreach callback function
	- Remove deprecated scan_milliseconds field
* Add ip_score plugin support (not enabled by default):
	- Can check for asn/country and network using DNS lookups
	- Can store and load reputation from redis server
* Improve PARTS_DIFFER rule to count merely different words
* New HTML parser:
	- Parses HTML parts using a set of state machines
	- Extracts useful data and exports it to lua functions:
		+ Styles
		+ Images
		+ URLs
		+ Colors
		+ Structure elements
	- Added HTML rules for some checks
* New version of LUA DNS API
* Table versions of many functions in LUA API
* Improve rspamc client:
	- Print execution time
	- Allow executing of external commands and passing output to them
	- Allow mime output mode when rspamc alters message according to rspamd
		checks and send it to an external command or stdout
* Allow scanning of local files using HTTP requests
* Rework configuration system:
	- Rules are now moved from the $CONFDIR to $RULESDIR to avoid ambiguity
	- All modules configurations are now split in $CONFDIR/modules.d/* to
		simplify upgrades
	- Move hfilter to plugins
	- Allow plugins and rules to define default scores to simplify metrics
		setup
	- Include overrides for all modules to honor local/automatic parameters
	- Tune scores for many modules
* Rework and enable DMARC plugin
* Add whitelist plugin for SPF/DKIM/DMARC based whitelisting
* Add some common domains to whitelists shipped with rspamd
* Rework logging:
	- Now each log entry supports module name and a `tag`. Tag is used to
		identify unique objects (such as tasks) when checking log files
	- It is possible to turn on debugging for the specific modules
	- Systemd logging is fixed
* Improve spamassassin plugin.
	- Now headers are matched more like SA
	- Improve support of Message-ID
	- Add support of ToCc header type
	- Fix :addr and :name in headers regexps
* Resurrect rrd support code
* Save controller stats between restarts
* Fixed tons of bugs
* Added tons of minor improvements and features
* Added more unit tests
* Create functional tests framework
* Added documentation for missing modules
* Added rpm/deb repositories and scripts
* Updated WebUI and libucl externals

Signed-off-by: Vsevolod Stakhov <vsevolod@highsecure.ru>