
Captagent is utilizing 100% CPU when sip-parse = true #5

Closed
marrold opened this issue Jan 23, 2015 · 25 comments

@marrold

marrold commented Jan 23, 2015

Captagent is utilizing 100% CPU when sip-parse = true

This doesn't happen as soon as Captagent is started. It can take 2-5 minutes before it pegs at 100% CPU utilization.

Server Details:
Intel(R) Xeon(R) CPU E5504 @ 2.00GHz (Quad Core)
4 GB RAM

Traffic Details:
rx: 119.94 Mbit/s 77492 p/s
SIP MPS: approx. 160

captagent.xml:

    <configuration name="core.conf" description="CORE Settings">
      <settings>
        <param name="debug" value="-3"/>
        <param name="daemon" value="true"/>
        <param name="syslog" value="false"/>
        <param name="pid_file" value="/var/run/captagent1.pid"/>
        <param name="path" value="/usr/local/lib/captagent/modules"/>
      </settings>
    </configuration>

    <configuration name="modules.conf" description="Modules">
      <modules>
            <load module="core_hep"/>
            <load module="proto_uni"/>
      </modules>
    </configuration>

    <!-- CORE MODULES -->

    <configuration name="core_hep.conf" description="HEP Socket">
      <settings>
        <param name="version" value="3"/>
        <param name="capture-host" value="127.0.0.1"/>
        <param name="capture-port" value="9061"/>
        <param name="capture-proto" value="udp"/>
        <param name="capture-id" value="1"/>
        <param name="capture-password" value="myHep"/>
        <param name="payload-compression" value="false" />
      </settings>
    </configuration>

    <!-- PROTOCOLS -->

    <configuration name="proto_uni.conf" description="UNI Proto Basic capture">
      <settings>
        <param name="portrange" value="5060-5080"/>
        <!-- <param name="portrange" value="5060-5090"/> -->
        <!--
            use -D flag for pcap import
            use "any" for all interfaces in your system
        -->
        <param name="dev" value="em2"/>
        <param name="promisc" value="true"/>
        <!--
            comment it if you want to see all IPProto (tcp/udp)
        -->
        <!--<param name="ip-proto" value="udp"/>-->
        <param name="proto-type"  value="sip"/>
        <param name="sip-parse"  value="true"/>
        <param name="rtcp-tracking"  value="false"/>
        <param name="reasm"  value="false"/>
        <param name="tcpdefrag"  value="true"/>
        <param name="debug" value ="false"/>
        <param name="buildin-reasm-filter" value="false"/>
        <!--
            <param name="expire-timer" value="60"/>
            <param name="expire-rtcp" value="120"/>
        -->
        <!-- <param name="filter" value="not src port 5099"/> -->
            <!-- <param name="vlan" value="false"/> -->
            <!--
                    ((ip[6:2] &amp; 0x3fff != 0) - syntax for REASM packets
                     if capturing sip messages, you can filter by method
                     you can specify which method to NOT match with !
                     <param name="sip_method" value="INVITE"/>
            -->
      </settings>
    </configuration>

Thanks in advance.

@adubovikov adubovikov added the bug label Jan 23, 2015
@adubovikov
Member

I have checked, but unfortunately I couldn't reproduce the issue. To me it looks like bad SIP parsing. Can you please create a pcap dump with all SIP messages from start-up until the high CPU usage? Then I will check again. Thank you.

@marrold
Author

marrold commented Jan 23, 2015

Hi Alexandr,

That's plausible. I'm afraid sending a full PCAP isn't really possible, as it contains sensitive data, as I'm sure you can imagine. Would increasing the verbosity on the captagent logs help?

I could try capturing an offending packet this evening when there is less traffic, to make it more straightforward to redact any numbers / IPs etc.

Thanks


@adubovikov
Member

Please check the latest git. I added a workaround; please check your syslog/log file to see if you get something like this:
"TOO MANY LOOPS [10]"

Thank you.
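The workaround described here — bounding the parse loop so a message that never advances the cursor cannot spin forever — can be sketched as follows. This is an illustrative Python sketch with hypothetical names, not captagent's actual C code:

```python
MAX_LOOPS = 10  # give up on a packet after this many zero-progress iterations

def parse_step(buf: bytes, offset: int) -> int:
    """Toy parser step: bytes consumed for one message, 0 if stuck."""
    end = buf.find(b"\r\n\r\n", offset)
    return 0 if end == -1 else (end + 4) - offset

def parse_messages(buf: bytes) -> list[bytes]:
    """Extract raw messages from buf, bailing out instead of spinning at 100% CPU."""
    out, offset, loops = [], 0, 0
    while offset < len(buf):
        consumed = parse_step(buf, offset)
        if consumed <= 0:
            loops += 1
            if loops >= MAX_LOOPS:  # the workaround: break rather than loop forever
                print(f"TOO MANY LOOPS [{loops}]")
                break
            continue
        out.append(buf[offset:offset + consumed])
        offset += consumed
        loops = 0  # progress made, reset the stall counter
    return out
```

The key design point is that the counter only grows while the parser reports zero progress, so well-formed traffic is unaffected and a stuck packet costs at most `MAX_LOOPS` iterations.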

@marrold
Author

marrold commented Jan 24, 2015

Thanks, I will test and report back.

@marrold
Author

marrold commented Jan 26, 2015

The 100% CPU usage recurred today. Please see the logs: http://pastebin.com/raw.php?i=nHSCKdk5

Also, captagent died with a segfault, possibly due to the 100% CPU usage:

Jan 26 10:38:09 CLL-Tracing kernel: [926734.544000] captagent[6523]: segfault at 7fb51c12230f ip 00007fb4248a9b8e sp 00007fb4239c05c8 error 4 in libc-2.19.so[7fb424811000+1bb000]

I don't have the packet capture for this occurrence; I can grab one next time if required.

Thanks

@adubovikov
Member

No, it's OK for now. I see that the len is bigger than the message itself. I have added more debug output; can you please pull git again and check one more time?

@marrold
Author

marrold commented Jan 26, 2015

Pulled and running, Thanks


@marrold
Author

marrold commented Jan 28, 2015

Please see a small selection of the latest log showing the common messages. As you believe it's a len issue, I've been careful to use the same number of characters when redacting numbers / IPs.

http://pastebin.com/raw.php?i=GnraNnMW

Thanks

@marrold
Author

marrold commented Feb 8, 2015

Hi, is there any update on this? I noticed it's no longer utilising 100% CPU thanks to the added 'break', but I'm curious whether a full fix is on its way.

Thanks

@adubovikov
Member

Yes, we will update tomorrow. Sorry but this week was very busy.


@marrold
Author

marrold commented Feb 8, 2015

No need to apologise, appreciated as always.


@adubovikov
Member

So, please take the latest git. I also see that these TCP messages are broken. Probably it was a bad message len, but it's hard to say.

Anyway, I am waiting on your feedback. Thank you!

@marrold
Author

marrold commented Feb 9, 2015

Thanks, I will test tomorrow. Is there any way to disregard the broken TCP and parse anyway?

@adubovikov
Member

Hard to say. If only some lines of SDP are missing, it's OK, but if the RURI or several important headers are missing, that is a completely different story. Anyway, kamailio will drop it :-)

@marrold
Author

marrold commented Feb 9, 2015

What I am trying to understand is: why wasn't the example below (from the previous pastebin) parsed? It looks RFC compliant. Was this a TCP LEN vs actual data LEN issue? Will the latest code mean such SIP messages are parsed?

Thanks

Jan 27 18:25:44 CLL-Tracing captagent[27177]: [ERR] proto_uni.c:328 TOO MANY LOOP LEN [1386] vs NEWLEN: [0] vs SKIP: [1383] vs PARSED: [0]
Jan 27 18:25:44 CLL-Tracing captagent[27177]: [ERR] proto_uni.c:329 PACKET
INVITE sip:111111111111@172.16.1.23:5060;transport=tcp SIP/2.0
Via: SIP/2.0/TCP 11.111.1.111:5060;branch=z9hG4bK-524287-1---d5a2e626e94acc0d;rport
Via: SIP/2.0/UDP 11.111.1.111:5061;branch=z9hG4bK-6x3eokgxtich7y4w;rport=5061
Max-Forwards: 69
Record-Route: sip:11.111.1.111:5060;transport=tcp;lr;drr
Record-Route: sip:11.111.1.111:5060;lr;transport=UDP;drr
Contact: "Anonymous"sip:11.111.1.111:5061
To: sip:111111111111@172.16.1.23
From: +111111111111 sip:+111111111111@11.111.1.111;tag=nkgcuhqzbwycb74v.o
Call-ID: 456b4fac-333cbed0-7f5fb838-a72a@10.16.231.133
CSeq: 781 INVITE
Expires: 300
Content-Disposition: session
Content-Type: application/sdp
User-Agent: Sippy
P-Asserted-Identity: +111111111111 <sip:+111111111111@11.111.1.111;user=phone>
h323-conf-id: 235399844-542757103-3421031868-2401107647
Portasip-3264-action: offer 1
cisco-GUID: 235399844-542757103-3421031868-2401107647
Content-Length: 445

v=0
o=Sippy 2712874956379544410 0 IN IP4 11.111.1.111
s=Cisco SDP 0
t=0 0
m=audio 64644 RTP/AVP 8 18 0 101
c=IN IP4 11.111.1.111
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=sqn:0
a=cdsc: 1 audio RTP/AVP 8 18 0 101
a=cdsc: 5 image udptl t38
a=cpar: a=T38FaxVersion:0
a=cpar: a=T38FaxRateManagement:transferredTCF
a=cpar: a=T38FaxMaxDatagram:160
a=cpar: a=T38FaxUdpEC:t38UDPRedundancy
a=X-sqn:0
a=X-cap: 1 image udptl t38


@adubovikov
Member

The message len is 1386, but we have only 1383; somewhere 3 characters are gone. Anyway, with the new git, this message will be parsed.
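The kind of mismatch described here — a declared length larger than the bytes that actually arrived — can be detected up front before parsing. A minimal sketch, assuming a simple CRLF-delimited message (illustrative only, not captagent's code):

```python
def content_length_mismatch(raw: bytes) -> int:
    """Return declared Content-Length minus actual body length (0 = consistent)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # split headers from body
    declared = 0
    for line in head.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            declared = int(value.strip())
    return declared - len(body)
```

A positive return value means the sender promised more bytes than were captured (in the logged example the overall discrepancy was 3 bytes), which is exactly the situation where a parser that trusts the declared length can walk past the end of its buffer.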

@adubovikov
Member

Any feedback? Can I close the issue?

@marrold
Author

marrold commented Feb 10, 2015

I spotted one entry in the logs regarding the loop, which is much less than before. I was going to leave it 24 hours before providing feedback.

There still seem to be some issues with TCP packets, which may or may not be related.

I don't suppose someone has created a HEP dissector for Wireshark?

Thanks


@adubovikov
Member

Yeah, we planned to do it a long time ago, but unfortunately have had no time for it. If you can do it, it would be much appreciated!
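For anyone attempting that dissector: per the published HEP/EEP specification, a HEPv3 packet is the magic "HEP3", a big-endian 16-bit total length, then a sequence of chunks, each carrying a 16-bit vendor ID, 16-bit type ID, and a 16-bit length that includes the 6-byte chunk header. A minimal Python decoder sketch (function name is my own; verify the details against the spec before relying on it):

```python
import struct

def parse_hep3(pkt: bytes) -> dict[int, bytes]:
    """Decode HEPv3 generic chunks into {chunk_type: payload}."""
    if pkt[:4] != b"HEP3":
        raise ValueError("not a HEPv3 packet")
    (total,) = struct.unpack("!H", pkt[4:6])  # total length, big-endian
    chunks, off = {}, 6
    while off < total:
        vendor, ctype, clen = struct.unpack("!HHH", pkt[off:off + 6])
        chunks[ctype] = pkt[off + 6:off + clen]  # payload follows 6-byte chunk header
        off += clen
    return chunks
```

In the generic (vendor 0) chunk set, chunk type 0x0f is the captured payload, so `parse_hep3(pkt)[0x0f]` should recover the raw SIP message.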

@marrold
Author

marrold commented Feb 10, 2015

I plan to give it a go but can't guarantee any success!

In the meantime, should I raise a new issue for the missing TCP packets once I've gathered some examples? I'm happy to wait, as you've been busy.


@adubovikov
Member

Sure, contact me any time. Currently we have a number of requests, but we always find a bit of time to help our users :-)

@marrold
Author

marrold commented Feb 11, 2015

Please find the latest log here: https://gist.github.com/marrold/00e7e647ae0ece7e46f8

Looks like only 3 packets got caught in the loop out of thousands, so this is much better.

Thanks

@adubovikov
Member

OK, looks like it was fixed. Please check the latest git.

thank you very much!

@marrold
Author

marrold commented Feb 17, 2015

Pulled and running. Thanks for your assistance.


@adubovikov
Member

You're welcome. Thanks for the bug report.
