RFE: normalize AUDIT_NETFILTER_PKT events #11

Closed
pcmoore opened this issue Apr 7, 2016 · 15 comments

Comments

@pcmoore
Member

pcmoore commented Apr 7, 2016

The AUDIT_NETFILTER_PKT audit events are not normalized. They swing fields in and out based on settings instead of always emitting each field and changing its value. Here's one example in ./net/netfilter/xt_AUDIT.c:

if (ntohs(ih->frag_off) & IP_OFFSET) {
    audit_log_format(ab, " frag=1");
    return;
}

frag should always be set like:

audit_log_format(ab, " frag=%d", ntohs(ih->frag_off) & IP_OFFSET);
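
A minimal userspace sketch of the same idea, for illustration only. It assumes the field should stay a 0/1 flag (as the original " frag=1" suggests), so it collapses the masked offset with `!!`; the suggestion above would instead print the masked offset value itself. The helper name `format_frag` and the constant `SKETCH_IP_OFFSET` are hypothetical and not part of xt_AUDIT.c.

```c
#include <arpa/inet.h>   /* htons, ntohs */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Low 13 bits of the IPv4 frag_off field hold the fragment offset.
 * Same value as the kernel's IP_OFFSET; redefined here (assumption)
 * so the sketch builds in userspace. */
#define SKETCH_IP_OFFSET 0x1FFF

/* Hypothetical helper: the field is always written, only its value
 * changes. frag_off_be is the header field in network byte order. */
static int format_frag(char *buf, size_t len, uint16_t frag_off_be)
{
    int is_frag = !!(ntohs(frag_off_be) & SKETCH_IP_OFFSET);

    return snprintf(buf, len, " frag=%d", is_frag);
}
```

With this shape a consumer can rely on `frag=` being present in every record rather than inferring "not a fragment" from a missing field.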
@pcmoore
Member Author

pcmoore commented Apr 7, 2016

Older upstream discussion regarding AUDIT_NETFILTER_PKT events:

@rgbriggs
Member

Posted a query to linux-audit and netfilter-devel mailing lists requesting some guidance on how to proceed with pulling in fields and how many combinations are acceptable for one message type:
https://www.redhat.com/archives/linux-audit/2017-January/msg00074.html

@rgbriggs
Member

Guidance received; working on the design, either a number of different messages or one combined message.

@rgbriggs
Member

Posted proposed message format with some length stats and more questions:
https://www.redhat.com/archives/linux-audit/2017-January/msg00092.html

@rgbriggs
Member

Posted preliminary test framework patch for comment:
https://github.com/rgbriggs/audit-testsuite/tree/ghak11-AUDIT_NETFILTER_PKT-normalization

@pcmoore
Member Author

pcmoore commented Jan 25, 2017

> Posted preliminary test framework patch for comment:
> https://github.com/rgbriggs/audit-testsuite/tree/ghak11-AUDIT_NETFILTER_PKT-normalization

I realize this is still just an early draft, but did you mean to submit this as a PR against the audit-testsuite?

@rgbriggs
Member

Not yet. Just looking for some initial feedback that I'm on the right path...

@rgbriggs
Member

Posted RFC patch for message format upstream on linux-audit and netfilter-devel lists:
https://www.redhat.com/archives/linux-audit/2017-January/msg00118.html

fengguang pushed a commit to 0day-ci/linux that referenced this issue Jan 27, 2017
Eliminate flipping in and out of message fields.

linux-audit/audit-kernel#11

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
@pcmoore
Member Author

pcmoore commented Jan 30, 2017

@rgbriggs okay, I see the mailing list posting, I guess I'm just trying to figure out that previous comment here ... can we comment on branches if they are not PRs? (It wasn't immediately obvious to me) Or was this just a FYI that a mailing list post was due very soon?

Any reason is fine - I've never complained about someone being too verbose when it comes to status reporting - I just want to make sure I'm clear on what you are expecting from me/us :)

@rgbriggs
Member

It wasn't intended as an FYI that a mailing list posting was due. It was more wondering aloud if anyone knew how to comment on a push to a personal fork without making a PR. If you know how, I'd like to know how to do so.

Aside from that, I'm looking for feedback on that testsuite patch either on that personal fork in github, or on the mailing list testsuite patch RFC.

rgbriggs added a commit to rgbriggs/audit-testsuite that referenced this issue Feb 23, 2017
Test for simplified normalized NETFILTER_PKT audit message.
Check for receipt of each nfmarked packet and for correct number of fields.

See: linux-audit/audit-kernel#11
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
@rgbriggs
Member

Message format simplified considerably to include only:
mark, saddr, daddr, proto

Patch v2 posted upstream Cc:netfilter-devel
https://www.redhat.com/archives/linux-audit/2017-February/msg00146.html

fengguang pushed a commit to 0day-ci/linux that referenced this issue Feb 23, 2017
Simplify and eliminate flipping in and out of message fields, relying on nfmark
the way we do for audit_key.

linux-audit/audit-kernel#11

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
@rgbriggs
Member

Original requirements for RHEL6.1, 2010: https://bugzilla.redhat.com/show_bug.cgi?id=642391

@rgbriggs
Member

Patch v3 posted upstream Cc: netfilter-devel
https://www.redhat.com/archives/linux-audit/2017-February/msg00170.html

fengguang pushed a commit to 0day-ci/linux that referenced this issue Feb 26, 2017
Eliminate flipping in and out of message fields, dropping fields in the process.

Sample raw message format IPv4 UDP:
type=NETFILTER_PKT msg=audit(1487874761.386:228):  mark=0xae8a2732 saddr=127.0.0.1 daddr=127.0.0.1 proto=17^]
Sample raw message format IPv6 ICMP6:
type=NETFILTER_PKT msg=audit(1487874761.381:227):  mark=0x223894b7 saddr=::1 daddr=::1 proto=58^]

Issue: linux-audit/audit-kernel#11
Test case: linux-audit/audit-testsuite#43

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
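
Because the normalized record always carries the same four fields (mark, saddr, daddr, proto), a consumer can parse it with a fixed field list. The following is a rough sketch under that assumption; `struct nfpkt`, `get_field` and `parse_nfpkt` are hypothetical names, not part of the audit userspace tools, and a value is taken to end at whitespace or at the 0x1D separator that appears as `^]` in the samples above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the four normalized fields. */
struct nfpkt {
    unsigned long mark;
    char saddr[64];
    char daddr[64];
    int proto;
};

/* Copy the value following " key=" into out. Returns 0 on success,
 * -1 if the field is missing or does not fit in out. */
static int get_field(const char *rec, const char *key, char *out, size_t len)
{
    char pat[32];
    const char *p;
    size_t n;

    snprintf(pat, sizeof(pat), " %s=", key);
    p = strstr(rec, pat);
    if (!p)
        return -1;
    p += strlen(pat);
    n = strcspn(p, " \t\n\x1d");   /* value ends at whitespace or 0x1D */
    if (n >= len)
        return -1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}

/* Parse one record; every field is required, so a missing one is an error. */
static int parse_nfpkt(const char *rec, struct nfpkt *out)
{
    char tmp[32];

    if (get_field(rec, "mark", tmp, sizeof(tmp)))
        return -1;
    out->mark = strtoul(tmp, NULL, 16);   /* base 16 accepts the 0x prefix */
    if (get_field(rec, "saddr", out->saddr, sizeof(out->saddr)) ||
        get_field(rec, "daddr", out->daddr, sizeof(out->daddr)))
        return -1;
    if (get_field(rec, "proto", tmp, sizeof(tmp)))
        return -1;
    out->proto = atoi(tmp);
    return 0;
}
```

Treating a missing field as an error is the design point: with the older format a parser could not tell an absent field from a malformed record.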
@pcmoore
Member Author

pcmoore commented Mar 23, 2017

Merged into audit/next via 36fe46d.

@pcmoore pcmoore closed this as completed Mar 23, 2017
pcmoore pushed a commit that referenced this issue Mar 23, 2017
Eliminate flipping in and out of message fields, dropping fields in the
process.

Sample raw message format IPv4 UDP:
type=NETFILTER_PKT msg=audit(1487874761.386:228):  mark=0xae8a2732 saddr=127.0.0.1 daddr=127.0.0.1 proto=17^]
Sample raw message format IPv6 ICMP6:
type=NETFILTER_PKT msg=audit(1487874761.381:227):  mark=0x223894b7 saddr=::1 daddr=::1 proto=58^]

Issue: #11
Test case: linux-audit/audit-testsuite#43

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
pcmoore pushed a commit that referenced this issue Mar 27, 2017
As Eric Dumazet pointed out, this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/IPv6 dccp patches as well.

We have seen a few incidents lately where a dst_entry has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A common crashing back trace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

The freed dst_entry was found here:

static bool tcp_in_quickack_mode(struct sock *sk)
{
        const struct inet_connection_sock *icsk = inet_csk(sk);
        const struct dst_entry *dst = __sk_dst_get(sk);

        return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
                (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
}

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry is added for each such host, creating
more dst_entry objects with lower reference counts and making the crash more
probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_cache can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this, skip do_redirect() if user space has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out, the following commit now invalidates routes, which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pcmoore pushed a commit that referenced this issue Apr 11, 2017
Eliminate flipping in and out of message fields, dropping fields in the
process.

Sample raw message format IPv4 UDP:
type=NETFILTER_PKT msg=audit(1487874761.386:228):  mark=0xae8a2732 saddr=127.0.0.1 daddr=127.0.0.1 proto=17^]
Sample raw message format IPv6 ICMP6:
type=NETFILTER_PKT msg=audit(1487874761.381:227):  mark=0x223894b7 saddr=::1 daddr=::1 proto=58^]

Issue: #11
Test case: linux-audit/audit-testsuite#43

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
pcmoore pushed a commit to pcmoore/misc-audit_testsuite that referenced this issue Apr 18, 2017
Test for simplified normalized NETFILTER_PKT audit message.
Check for receipt of each nfmarked packet and for correct number of fields.

See: linux-audit/audit-kernel#11

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
sudipm-mukherjee pushed a commit to sudipm-mukherjee/parport that referenced this issue May 3, 2017
Eliminate flipping in and out of message fields, dropping fields in the
process.

Sample raw message format IPv4 UDP:
type=NETFILTER_PKT msg=audit(1487874761.386:228):  mark=0xae8a2732 saddr=127.0.0.1 daddr=127.0.0.1 proto=17^]
Sample raw message format IPv6 ICMP6:
type=NETFILTER_PKT msg=audit(1487874761.381:227):  mark=0x223894b7 saddr=::1 daddr=::1 proto=58^]

Issue: linux-audit/audit-kernel#11
Test case: linux-audit/audit-testsuite#43

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
fengguang pushed a commit to 0day-ci/linux that referenced this issue May 5, 2017
GIT 46f0537b1ecf672052007c97f102a7e6bf0791e4

commit ec8a09fbbeff252c80daf62c7a78342003dddf9c
Author: Jon Paul Maloy <jon.maloy@ericsson.com>
Date:   Tue May 2 18:16:54 2017 +0200

    tipc: refactor function tipc_sk_recv_stream()
    
    We try to make this function more readable by improving variable names
    and comments, using more stack variables, and doing some smaller changes
    to the logics. We also rename the function to make it consistent with
    naming conventions used elsewhere in the code.
    
    Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
    Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit e9f8b10101c6da3ab000a2fb17162374c9bd2c69
Author: Jon Paul Maloy <jon.maloy@ericsson.com>
Date:   Tue May 2 18:16:53 2017 +0200

    tipc: refactor function tipc_sk_recvmsg()
    
    We try to make this function more readable by improving variable names
    and comments, plus some minor changes to the logics.
    
    Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
    Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 773225388dae15e72790d6f573e2e70e96292b6b
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:58 2017 +0530

    net: thunderx: Optimize page recycling for XDP
    
    Driver follows a method of taking one extra reference on the
    page for recycling which is fine in usual packet path where
    each 64KB page is segmented into multiple receive buffers.
    
    But in XDP mode since there is just one receive buffer per
    page taking extra page reference itself becomes big bottleneck
    consuming ~50% of CPU cycles due to atomic operations.
    
    This patch adds an internal ref count in pgcache for each
    page and additional page references are taken in a batch
    instead of just one at a time. Internal i.e 'pgcache->ref_count'
    and page's i.e 'page->_refcount' counters are compared to check
    page's recyclability.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit e3d06ff9ec9400b93bacf8fa92f3985c9412e282
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:57 2017 +0530

    net: thunderx: Support for XDP header adjustment
    
    When in XDP mode reserve XDP_PACKET_HEADROOM bytes at the start
    of receive buffer for XDP program to modify headers and adjust
    packet start. Additional code changes done to handle such packets.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 16f2bccda75da48888772c4829a468be620c5d79
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:56 2017 +0530

    net: thunderx: Add support for XDP_TX
    
    Adds support for XDP_TX i.e transmits packet out of
    the XDP TX queue mapped to the corresponding Rx queue
    on which packet is received.
    
    Since SQ for XDP TX will be used only on a single cpu i.e
    SQ description creation and freeing, using atomic free count
    is not necessary and will become a bottleneck. Hence added
    a separate 'xdp_free_cnt' used for SQs designated for XDP
    to track descriptor free count.
    
    Changes also include
    - A new entry 'xdp_page' is added to save transmitted packet's
      page pointer for later cleanup.
    - XDP Tx SQ's doorbell is rung once per NAPI instance.
    - Retrieving designated SQ for packets being sent out by stack
      via 'nicvf_xmit'.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit c56d91ce38d54c0c0dd8d0e4c6a9e0cfa557152f
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:55 2017 +0530

    net: thunderx: Add support for XDP_DROP
    
    Adds support for XDP_DROP.
    Also since in XDP mode there is just a single buffer per page,
    made changes to recycle DMA mapping info as well along with pages.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 05c773f52b96ef3fbc7d9bfa21caadc6247ef7a8
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:54 2017 +0530

    net: thunderx: Add basic XDP support
    
    Adds basic XDP support i.e attaching a BPF program to an
    interface. Also takes care of allocating separate Tx queues
    for XDP path and for network stack packet transmission.
    
    This patch doesn't support handling of any of the XDP actions,
    all are treated as XDP_PASS i.e packets will be handed over to
    the network stack.
    
    Changes also involve allocating one receive buffer per page in XDP
    mode and multiple in normal mode i.e when no BPF program is attached.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 927987f39f116db477fcd74ced2a2aea940e585c
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:53 2017 +0530

    net: thunderx: Cleanup receive buffer allocation
    
    Get rid of unnecessary double pointer references and type casting
    in receive buffer allocation code.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 0dada88b8cd74569abc3dda50f1b268a5868f6f2
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:52 2017 +0530

    net: thunderx: Optimize CQE_TX handling
    
    Optimized CQE handling with below changes
    - Freeing descriptors back to SQ in bulk i.e once per NAPI
      instance instead for every CQE_TX, this will reduce number
      of atomic updates to 'sq->free_cnt'.
    - Checking errors in CQE_TX and CQE_RX before calling appropriate
      fn()s to update error stats i.e reduce branching.
    
    Also removed debug messages in packet handling path which otherwise
    causes issues if DEBUG is enabled.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 5e848e4c5d77438e126c97702ec3bea477f550a9
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:51 2017 +0530

    net: thunderx: Optimize RBDR descriptor handling
    
    Receive buffer's physical address or iova will anyway not
    go beyond 49bits, since it is the max supported HW address.
    As per perf, updating bitfields i.e buf_addr:42 in RBDR
    descriptor entry consumes lots of cpu cycles, hence changed
    it to a 64bit field with alignment requirements taken care of.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 5836b4429777bf57ca8fc02b154263aa54d97508
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:50 2017 +0530

    net: thunderx: Support for page recycling
    
    Adds support for page recycling for allocating receive buffers
    to reduce cost of refilling RBDR ring. Also got rid of using
    compound pages when pagesize is 4K, only order-0 pages now.
    
    Only page is recycled, DMA mappings still needs to be done for
    every receive buffer allocated due to following constraints
    - Cannot have just one receive buffer per 64KB page.
    - There is just one buffer ring shared across 8 Rx queues, so
      buffers of same page can go to any Rx queue.
    - HW gives buffer address where packet has been DMA'ed and not
      the index into buffer ring.
    This makes it impossible to reuse DMA mapping info. So unfortunately
    have to go through costly mapping route for every buffer.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit ee0d8d8482345ff97a75a7d747efc309f13b0d80
Author: Dan Carpenter <dan.carpenter@oracle.com>
Date:   Tue May 2 13:58:53 2017 +0300

    ipx: call ipxitf_put() in ioctl error path
    
    We should call ipxitf_put() if the copy_to_user() fails.
    
    Reported-by: 李强 <liqiang6-s@360.cn>
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 9da3242e6a83b6f315aa9c394c939da8e4ad7774
Author: Jiri Pirko <jiri@mellanox.com>
Date:   Tue May 2 10:12:00 2017 +0200

    net: sched: add helpers to handle extended actions
    
    Jump is now the only one using value action opcode. This is going to
    change soon. So introduce helpers to work with this. Convert TC_ACT_JUMP.
    
    This also fixes the TC_ACT_JUMP check, which is incorrectly done as a
    bit check, not a value check.
    
    Fixes: e0ee84ded796 ("net sched actions: Complete the JUMPX opcode")
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 8d3f87d8cd0a16c58ae7e4410938528866c1c0db
Author: sudarsana.kalluru@cavium.com <sudarsana.kalluru@cavium.com>
Date:   Tue May 2 01:11:03 2017 -0700

    qed*: Fix issues in the ptp filter config implementation.
    
    PTP hardware filter configuration performed by the driver for a given
    user requested config is not correct for some of the PTP modes.
    Following changes are needed for PTP config-filter implementation.
     1. NIG_REG_TX_PTP_EN register - Bits 0/1/2 respectively enables
        TimeSync/"V1 frame format support"/"V2 frame format support" on
        the TX side. Set the associated bits based on the user request.
     2. ptp4l application fails to operate in Peer Delay mode. Following
        changes are needed to fix this,
        a. Driver should enable (set to 0) DA #1-related bits for IPv4,
           IPv6 and MAC destination addresses in these registers:
             NIG_REG_TX_LLH_PTP_RULE_MASK
             NIG_REG_LLH_PTP_RULE_MASK
        b. NIG_REG_LLH_PTP_PARAM_MASK/NIG_REG_TX_LLH_PTP_PARAM_MASK should
           be set to 0x0 in all modes.
    
    Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 461eec12012c29b66525c270208d30be8f6da8e7
Author: sudarsana.kalluru@cavium.com <sudarsana.kalluru@cavium.com>
Date:   Tue May 2 01:11:02 2017 -0700

    qede: Fix concurrency issue in PTP Tx path processing.
    
    PTP Tx timestamping data structures are not protected against the
    concurrent access in the Tx paths. Protecting the same using atomic
    bit locks.
    
    Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 212c7fd614377fef4415d94856a59e9f484aa439
Author: Jan Kiszka <jan.kiszka@siemens.com>
Date:   Tue May 2 09:58:00 2017 +0200

    stmmac: Add support for SIMATIC IOT2000 platform
    
    The IOT2000 is an industrial controller platform, derived from the Intel
    Galileo Gen2 board. The variant IOT2020 comes with one LAN port, the
    IOT2040 has two of them. They can be told apart based on the board asset
    tag in the DMI table.
    
    Based on patch by Sascha Weisenberger.
    
    Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
    Signed-off-by: Sascha Weisenberger <sascha.weisenberger@siemens.com>
    Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 412b65d15a7f8a93794653968308fc100f2aa87c
Author: Timmy Li <lixiaoping3@huawei.com>
Date:   Tue May 2 10:46:52 2017 +0800

    net: hns: fix ethtool_get_strings overflow in hns driver
    
    hns_get_sset_count() returns HNS_NET_STATS_CNT and the data space allocated
    is not enough for ethtool_get_strings(), which will cause random memory
    corruption.
    
    When SLAB and DEBUG_SLAB are both enabled, memory corruptions like the
    following can be observed without this patch:
    [   43.115200] Slab corruption (Not tainted): Acpi-ParseExt start=ffff801fb0b69030, len=80
    [   43.115206] Redzone: 0x9f911029d006462/0x5f78745f31657070.
    [   43.115208] Last user: [<5f7272655f746b70>](0x5f7272655f746b70)
    [   43.115214] 010: 70 70 65 31 5f 74 78 5f 70 6b 74 00 6b 6b 6b 6b  ppe1_tx_pkt.kkkk
    [   43.115217] 030: 70 70 65 31 5f 74 78 5f 70 6b 74 5f 6f 6b 00 6b  ppe1_tx_pkt_ok.k
    [   43.115218] Next obj: start=ffff801fb0b69098, len=80
    [   43.115220] Redzone: 0x706d655f6f666966/0x9f911029d74e35b.
    [   43.115229] Last user: [<ffff0000084b11b0>](acpi_os_release_object+0x28/0x38)
    [   43.115231] 000: 74 79 00 6b 6b 6b 6b 6b 70 70 65 31 5f 74 78 5f  ty.kkkkkppe1_tx_
    [   43.115232] 010: 70 6b 74 5f 65 72 72 5f 63 73 75 6d 5f 66 61 69  pkt_err_csum_fai
    
    Signed-off-by: Timmy Li <lixiaoping3@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit a9f11f963a546fea9144f6a6d1a307e814a387e7
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon May 1 15:29:48 2017 -0700

    tcp: fix wraparound issue in tcp_lp
    
    Be careful when comparing tcp_time_stamp to some u32 quantity,
    otherwise result can be surprising.
    
    Fixes: 7c106d7e782b ("[TCP]: TCP Low Priority congestion control")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit ddc665a4bb4b728b4e6ecec8db1b64efa9184b9c
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Tue May 2 20:34:54 2017 +0200

    bpf, arm64: fix jit branch offset related to ldimm64
    
    When the instruction right before the branch destination is
    a 64 bit load immediate, we currently calculate the wrong
    jump offset in the ctx->offset[] array as we only account
    one instruction slot for the 64 bit load immediate although
    it uses two BPF instructions. Fix it up by setting the offset
    into the right slot after we incremented the index.
    
    Before (ldimm64 test 1):
    
      [...]
      00000020:  52800007  mov w7, #0x0 // #0
      00000024:  d2800060  mov x0, #0x3 // #3
      00000028:  d2800041  mov x1, #0x2 // #2
      0000002c:  eb01001f  cmp x0, x1
      00000030:  54ffff82  b.cs 0x00000020
      00000034:  d29fffe7  mov x7, #0xffff // #65535
      00000038:  f2bfffe7  movk x7, #0xffff, lsl #16
      0000003c:  f2dfffe7  movk x7, #0xffff, lsl #32
      00000040:  f2ffffe7  movk x7, #0xffff, lsl #48
      00000044:  d29dddc7  mov x7, #0xeeee // #61166
      00000048:  f2bdddc7  movk x7, #0xeeee, lsl #16
      0000004c:  f2ddddc7  movk x7, #0xeeee, lsl #32
      00000050:  f2fdddc7  movk x7, #0xeeee, lsl #48
      [...]
    
    After (ldimm64 test 1):
    
      [...]
      00000020:  52800007  mov w7, #0x0 // #0
      00000024:  d2800060  mov x0, #0x3 // #3
      00000028:  d2800041  mov x1, #0x2 // #2
      0000002c:  eb01001f  cmp x0, x1
      00000030:  540000a2  b.cs 0x00000044
      00000034:  d29fffe7  mov x7, #0xffff // #65535
      00000038:  f2bfffe7  movk x7, #0xffff, lsl #16
      0000003c:  f2dfffe7  movk x7, #0xffff, lsl #32
      00000040:  f2ffffe7  movk x7, #0xffff, lsl #48
      00000044:  d29dddc7  mov x7, #0xeeee // #61166
      00000048:  f2bdddc7  movk x7, #0xeeee, lsl #16
      0000004c:  f2ddddc7  movk x7, #0xeeee, lsl #32
      00000050:  f2fdddc7  movk x7, #0xeeee, lsl #48
      [...]
    
    Also, add a couple of test cases to make sure JITs pass
    this test. Tested on Cavium ThunderX ARMv8. The added
    test cases all pass after the fix.
    
    Fixes: 8eee539ddea0 ("arm64: bpf: fix out-of-bounds read in bpf2a64_offset()")
    Reported-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Cc: Xi Wang <xi.wang@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 85f68fe89832057584a9e66e1e7e53d53e50faff
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Mon May 1 02:57:20 2017 +0200

    bpf, arm64: implement jiting of BPF_XADD
    
    This work adds BPF_XADD for BPF_W/BPF_DW to the arm64 JIT and therefore
    completes JITing of all BPF instructions, meaning we can thus also remove
    the 'notyet' label and do not need to fall back to the interpreter when
    BPF_XADD is used in a program!
    
    This now also brings arm64 JIT in line with x86_64, s390x, ppc64, sparc64,
    where all current eBPF features are supported.
    
    BPF_W example from test_bpf:
    
      .u.insns_int = {
        BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
        BPF_ST_MEM(BPF_W, R10, -40, 0x10),
        BPF_STX_XADD(BPF_W, R10, R0, -40),
        BPF_LDX_MEM(BPF_W, R0, R10, -40),
        BPF_EXIT_INSN(),
      },
    
      [...]
      00000020:  52800247  mov w7, #0x12 // #18
      00000024:  928004eb  mov x11, #0xffffffffffffffd8 // #-40
      00000028:  d280020a  mov x10, #0x10 // #16
      0000002c:  b82b6b2a  str w10, [x25,x11]
      // start of xadd mapping:
      00000030:  928004ea  mov x10, #0xffffffffffffffd8 // #-40
      00000034:  8b19014a  add x10, x10, x25
      00000038:  f9800151  prfm pstl1strm, [x10]
      0000003c:  885f7d4b  ldxr w11, [x10]
      00000040:  0b07016b  add w11, w11, w7
      00000044:  880b7d4b  stxr w11, w11, [x10]
      00000048:  35ffffab  cbnz w11, 0x0000003c
      // end of xadd mapping:
      [...]
    
    BPF_DW example from test_bpf:
    
      .u.insns_int = {
        BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
        BPF_ST_MEM(BPF_DW, R10, -40, 0x10),
        BPF_STX_XADD(BPF_DW, R10, R0, -40),
        BPF_LDX_MEM(BPF_DW, R0, R10, -40),
        BPF_EXIT_INSN(),
      },
    
      [...]
      00000020:  52800247  mov w7,  #0x12 // #18
      00000024:  928004eb  mov x11, #0xffffffffffffffd8 // #-40
      00000028:  d280020a  mov x10, #0x10 // #16
      0000002c:  f82b6b2a  str x10, [x25,x11]
      // start of xadd mapping:
      00000030:  928004ea  mov x10, #0xffffffffffffffd8 // #-40
      00000034:  8b19014a  add x10, x10, x25
      00000038:  f9800151  prfm pstl1strm, [x10]
      0000003c:  c85f7d4b  ldxr x11, [x10]
      00000040:  8b07016b  add x11, x11, x7
      00000044:  c80b7d4b  stxr w11, x11, [x10]
      00000048:  35ffffab  cbnz w11, 0x0000003c
      // end of xadd mapping:
      [...]
    
    Tested on Cavium ThunderX ARMv8, test suite results after the patch:
    
      No JIT:   [ 3751.855362] test_bpf: Summary: 311 PASSED, 0 FAILED, [0/303 JIT'ed]
      With JIT: [ 3573.759527] test_bpf: Summary: 311 PASSED, 0 FAILED, [303/303 JIT'ed]
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 586f8525979ad9574bf61637fd58c98d5077f29d
Author: David Miller <davem@davemloft.net>
Date:   Tue May 2 11:36:45 2017 -0400

    bpf: Align packet data properly in program testing framework.
    
    Make sure we apply NET_IP_ALIGN when reserving headroom for SKB
    and XDP test runs, just like a real driver would.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>

commit 78e5227237cae9172dd50c3ebb08d4fb31530676
Author: David Miller <davem@davemloft.net>
Date:   Tue May 2 11:36:33 2017 -0400

    bpf: Do not dereference user pointer in bpf_test_finish().
    
    Instead, pass the kattr in which has a kernel side copy of this
    data structure from userspace already.
    
    Fix based upon a suggestion from Alexei Starovoitov.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>

commit 4e9c3a667135799d50f2778a8a8dae2ca13aafd0
Author: David S. Miller <davem@davemloft.net>
Date:   Tue May 2 07:52:01 2017 -0700

    selftests: bpf: Use bpf_endian.h in test_xdp.c
    
    This fixes the testcase on big-endian.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 48d0e023af9799cd7220335baf8e3ba61eeafbeb
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue May 2 10:16:05 2017 -0400

    audit: fix the RCU locking for the auditd_connection structure
    
    Cong Wang correctly pointed out that the RCU read locking of the
    auditd_connection struct was wrong; this patch corrects this by
    adopting a more traditional, and correct, RCU locking model.
    
    This patch is heavily based on an earlier prototype by Cong Wang.
    
    Cc: <stable@vger.kernel.org> # 4.11.x-
    Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
    Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit 8cc96382d9a7fe1746286670dd5140c3b12638ae
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue May 2 10:16:05 2017 -0400

    audit: use kmem_cache to manage the audit_buffer cache
    
    The audit subsystem implemented its own buffer cache mechanism which
    is a bit silly these days when we could use the kmem_cache construct.
    
    Some credit is due to Florian Westphal for originally proposing that
    we remove the audit cache implementation in favor of simple
    kmalloc()/kfree() calls, but I would rather have a dedicated slab
    cache to ease debugging and future stats/performance work.
    
    Cc: Florian Westphal <fw@strlen.de>
    Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit 2115bb250f260089743e26decfb5f271ba71ca37
Author: Deepa Dinamani <deepa.kernel@gmail.com>
Date:   Tue May 2 10:16:05 2017 -0400

    audit: Use timespec64 to represent audit timestamps
    
    struct timespec is not y2038 safe.
    Audit timestamps are recorded in string format into
    an audit buffer for a given context.
    These mark the entry timestamps for the syscalls.
    Use y2038 safe struct timespec64 to represent the times.
    The log strings can handle this transition as strings can
    hold up to 1024 characters.
    
    Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
    Reviewed-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: Paul Moore <paul@paul-moore.com>
    Acked-by: Richard Guy Briggs <rgb@redhat.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit b6c7c115c2ce679ac536f0adf0ff518fcd939196
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue May 2 10:16:05 2017 -0400

    audit: store the auditd PID as a pid struct instead of pid_t
    
    This is arguably the right thing to do, and will make it easier when
    we start supporting multiple audit daemons in different namespaces.
    
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit 45a0642b4d021a2f50d5db9c191b5bfe60bfa1c7
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue May 2 10:16:05 2017 -0400

    audit: kernel generated netlink traffic should have a portid of 0
    
    We were setting the portid incorrectly in the netlink message headers,
    fix that to always be 0 (nlmsg_pid = 0).
    
    Signed-off-by: Paul Moore <paul@paul-moore.com>
    Reviewed-by: Richard Guy Briggs <rgb@redhat.com>

commit a9d1620877748375cf60b43ef3fa5f61ab6d9f24
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue May 2 10:16:05 2017 -0400

    audit: combine audit_receive() and audit_receive_skb()
    
    There is no reason to have both of these functions, combine the two.
    
    Signed-off-by: Paul Moore <paul@paul-moore.com>
    Reviewed-by: Richard Guy Briggs <rgb@redhat.com>

commit bd120ded6a6af61ad342a8a95b36b64bd1e2f9e6
Author: Elena Reshetova <elena.reshetova@intel.com>
Date:   Tue May 2 10:16:05 2017 -0400

    audit: convert audit_watch.count from atomic_t to refcount_t
    
    refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.
    
    Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
    Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: David Windsor <dwindsor@gmail.com>
    [PM: fix subject line, add #include]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit 9d2378f8c8f1a3fcfab681fd90c139d90dca7b69
Author: Elena Reshetova <elena.reshetova@intel.com>
Date:   Tue May 2 10:16:04 2017 -0400

    audit: convert audit_tree.count from atomic_t to refcount_t
    
    refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.
    
    Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
    Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: David Windsor <dwindsor@gmail.com>
    [PM: fix subject line, add #include]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit 2173c519d5e912a6e2934bb04255fcd36c1591c8
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Tue May 2 10:16:04 2017 -0400

    audit: normalize NETFILTER_PKT
    
    Eliminate flipping in and out of message fields, dropping fields in the
    process.
    
    Sample raw message format IPv4 UDP:
    type=NETFILTER_PKT msg=audit(1487874761.386:228):  mark=0xae8a2732 saddr=127.0.0.1 daddr=127.0.0.1 proto=17^]
    Sample raw message format IPv6 ICMP6:
    type=NETFILTER_PKT msg=audit(1487874761.381:227):  mark=0x223894b7 saddr=::1 daddr=::1 proto=58^]
    
    Issue: https://github.com/linux-audit/audit-kernel/issues/11
    Test case: https://github.com/linux-audit/audit-testsuite/issues/43
    
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>
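
To illustrate what "normalized" means for the sample records above, here is a minimal userspace sketch, not the kernel code itself. It always emits the same four fields in the same order, so a parser never has to guess which fields are present. The `pkt_info` struct and `format_pkt_record()` are hypothetical names invented for this illustration; only the field names (`mark`, `saddr`, `daddr`, `proto`) come from the sample messages.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for the packet data behind one record. */
struct pkt_info {
    unsigned int mark;   /* packet mark, printed in hex */
    const char *saddr;   /* source address, already rendered as text */
    const char *daddr;   /* destination address, already rendered as text */
    unsigned int proto;  /* IP protocol number, e.g. 17 for UDP */
};

/*
 * Write the fixed field set into buf. Every field is always present;
 * only the values change from packet to packet.
 * Returns the number of characters that snprintf() reports.
 */
static int format_pkt_record(char *buf, size_t len, const struct pkt_info *p)
{
    return snprintf(buf, len, "mark=0x%x saddr=%s daddr=%s proto=%u",
                    p->mark, p->saddr, p->daddr, p->proto);
}
```

With the values from the IPv4 UDP sample, the function yields the same field sequence as the record shown in the commit message.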

commit 0cb88b6ff054ccfa30e0fd7f7b42ee9f088db432
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Tue May 2 10:16:04 2017 -0400

    netfilter: use consistent ipv4 network offset in xt_AUDIT
    
    Even though the skb->data pointer has been moved from the link layer
    header to the network layer header, use the same method to calculate the
    offset in ipv4 and ipv6 routines.
    
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    [PM: munged subject line]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit f6276ac95bde4312251535904af32b1de9d54949
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Tue May 2 10:16:04 2017 -0400

    audit: log module name on delete_module
    
    When a sysadmin wishes to monitor module unloading with a syscall rule such as:
     -a always,exit -F arch=x86_64 -S delete_module -F key=mod-unload
    the SYSCALL record doesn't tell us what module was requested for unloading.
    
    Use the new KERN_MODULE auxiliary record to record it.
    The SYSCALL record result code will list the return code.
    
    See: https://github.com/linux-audit/audit-kernel/issues/37
        https://github.com/linux-audit/audit-kernel/issues/7
        https://github.com/linux-audit/audit-kernel/wiki/RFE-Module-Load-Record-Format
    
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    Acked-by: Jessica Yu <jeyu@redhat.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit 9aab4f4ea7a4ca80ec3e0269ce2eb71a24f6fef9
Author: Nicholas Mc Guire <der.herr@hofr.at>
Date:   Tue May 2 10:16:04 2017 -0400

    audit: remove unnecessary semicolon in audit_watch_handle_event()
    
    The excess ; after the closing parenthesis is just code noise; it has no
    effect and can be removed.
    
    Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
    [PM: tweaked subject line]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit b5239fba69949a44290d4af517fc1c2eff3e36f6
Author: Nicholas Mc Guire <der.herr@hofr.at>
Date:   Tue May 2 10:16:04 2017 -0400

    audit: remove unnecessary semicolon in audit_mark_handle_event()
    
    The excess ; after the closing parenthesis is just code noise; it has no
    effect and can be removed.
    
    Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
    [PM: tweaked subject line]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit b7a84deaf8d1b0e62b437a290a40d6380975f126
Author: Nicholas Mc Guire <der.herr@hofr.at>
Date:   Tue May 2 10:16:03 2017 -0400

    audit: remove unnecessary semicolon in audit_field_valid()
    
    The excess ; after the closing parenthesis is just code noise; it has no
    effect and can be removed.
    
    Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
    [PM: tweak subject line]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit b5d60989c6f7501af72cb65893c02621dd16fd84
Author: Jakub Kicinski <jakub.kicinski@netronome.com>
Date:   Mon May 1 15:53:43 2017 -0700

    xdp: fix parameter kdoc for extack
    
    Fix kdoc parameter spelling from extact to extack.
    
    Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit eb6211d3606971a957fea28f7532687f9d0f93f2
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Tue May 2 00:47:09 2017 +0200

    bpf, samples: fix build warning in cookie_uid_helper_example
    
    Fix the following warnings triggered by 51570a5ab2b7 ("A Sample of
    using socket cookie and uid for traffic monitoring"):
    
      In file included from /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:54:0:
      /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c: In function 'prog_load':
      /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:119:27: warning: overflow in implicit constant conversion [-Woverflow]
         -32 + offsetof(struct stats, uid)),
                               ^
      /home/foo/net-next/samples/bpf/libbpf.h:135:12: note: in definition of macro 'BPF_STX_MEM'
       .off   = OFF,     \
                ^
      /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:121:27: warning: overflow in implicit constant conversion [-Woverflow]
         -32 + offsetof(struct stats, packets), 1),
                               ^
      /home/foo/net-next/samples/bpf/libbpf.h:155:12: note: in definition of macro 'BPF_ST_MEM'
       .off   = OFF,     \
                ^
      /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:129:27: warning: overflow in implicit constant conversion [-Woverflow]
         -32 + offsetof(struct stats, bytes)),
                               ^
      /home/foo/net-next/samples/bpf/libbpf.h:135:12: note: in definition of macro 'BPF_STX_MEM'
       .off   = OFF,     \
                ^
      HOSTLD  /home/foo/net-next/samples/bpf/per_socket_stats_example
    
    Fixes: 51570a5ab2b7 ("A Sample of using socket cookie and uid for traffic monitoring")
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit e3bf4c61da801c7967d0efff0c3f6b22d2c0e544
Author: David S. Miller <davem@davemloft.net>
Date:   Mon May 1 20:26:02 2017 -0700

    sparc64: Fix BPF JIT wrt. branches and ldimm64 instructions.
    
    Like other JITs, sparc64 maintains an array of instruction offsets but
    stores the entries off by one.  This is done because jumps to the
    exit block are indexed to one past the last BPF instruction.
    
    So if we size the array by the program length, we need to record
    the previous instruction in order to stay within the array bounds.
    
    This is explained in ARM JIT commit 8eee539ddea0 ("arm64: bpf: fix
    out-of-bounds read in bpf2a64_offset()").
    
    But this scheme requires a little bit of careful handling when
    the instruction before the branch destination is a 64-bit load
    immediate.  It takes up 2 BPF instruction slots.
    
    Therefore, we have to fill in the array entry for the second
    half of the 64-bit load immediate instruction rather than for
    the one for the beginning of that instruction.
    
    Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.")
    Signed-off-by: David S. Miller <davem@davemloft.net>
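
The off-by-one bookkeeping described above can be modelled in userspace as follows. This is a simplified sketch under stated assumptions, not the sparc64 JIT source: `struct slot`, `build_offsets()` and `target_offset()` are hypothetical names, and native code sizes are just counted in abstract units. The point it shows is that `offset[i]` records where instruction `i + 1` begins, and that for a two-slot 64-bit load immediate the entry is written at the second slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of one BPF instruction slot. */
struct slot {
    int native_len;  /* native units emitted for this BPF instruction */
    int is_ldimm64;  /* nonzero if this slot starts a two-slot 64-bit load */
};

/*
 * Fill offset[] (n entries) so that offset[i] is the native offset just
 * past BPF instruction i, i.e. the start of instruction i + 1.
 * For a 64-bit load immediate the value goes into the entry of its
 * second slot; the first slot's entry is marked -1 because no valid
 * branch lookup ever reads it.
 */
static void build_offsets(const struct slot *prog, size_t n, int *offset)
{
    int idx = 0;

    for (size_t i = 0; i < n; i++) {
        idx += prog[i].native_len;
        if (prog[i].is_ldimm64 && i + 1 < n) {
            offset[i] = -1;
            i++;  /* skip to the second half of the load */
        }
        offset[i] = idx;
    }
}

/* Native offset of branch target t; t == n addresses the exit block. */
static int target_offset(const int *offset, size_t t)
{
    return t == 0 ? 0 : offset[t - 1];
}
```

Because a branch to the exit block looks up `offset[n - 1]`, the array never needs more than `n` entries, which is the bounds property the commit message refers to.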

commit a25fb8508c1b80dce742dbeaa4d75a1e9f2c5617
Author: Sergei Trofimovich <slyfox@gentoo.org>
Date:   Mon May 1 11:51:55 2017 -0700

    ia64: fix module loading for gcc-5.4
    
    Starting from gcc-5.4+ gcc generates MLX instructions in more cases to
    refer local symbols:
    
        https://gcc.gnu.org/PR60465
    
    That caused ia64 module loader to choke on such instructions:
    
        fuse: invalid slot number 1 for IMM64
    
    The Linux kernel used to handle only case where relocation pointed to
    slot=2 instruction in the bundle.  That limitation was fixed in linux by
    commit 9c184a073bfd ("[IA64] Fix 2.6 kernel for the new ia64 assembler")
    See
    
        http://sources.redhat.com/bugzilla/show_bug.cgi?id=1433
    
    This change lifts the slot=2 restriction from the kernel module loader.
    
    Tested on 'fuse' and 'btrfs' kernel modules.
    
    Cc: Markus Elfring <elfring@users.sourceforge.net>
    Cc: H J Lu <hjl.tools@gmail.com>
    Cc: Fenghua Yu <fenghua.yu@intel.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Bug: https://bugs.gentoo.org/601014
    Tested-by: Émeric MASCHINO <emeric.maschino@gmail.com>
    Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit 48e75b430670ebdbb00ba008e1d3690f61ab9824
Author: Florian Westphal <fw@strlen.de>
Date:   Mon May 1 22:18:01 2017 +0200

    rhashtable: compact struct rhashtable_params
    
    By using smaller datatypes this (rather large) struct shrinks considerably
    (80 -> 48 bytes on x86_64).
    
    As this is embedded in other structs, this also reduces the size of several
    others, e.g. cls_fl_head or nft_hash.
    
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit e06422c43968916dc945018fd9220f60866470b1
Author: David S. Miller <davem@davemloft.net>
Date:   Mon May 1 12:58:21 2017 -0700

    bpf: Include bpf_endian.h in test_progs.c too.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit bc1bafbbe9b3558d7789ff151ef4f185b6ad21f3
Author: David S. Miller <davem@davemloft.net>
Date:   Mon May 1 12:43:49 2017 -0700

    bpf: Move endianness BPF helpers out of bpf_util.h
    
    We do not want to include things like stdio.h and friends into
    eBPF program builds.  bpf_util.h is for host compiled programs,
    so eBPF C-code helpers don't really belong there.
    
    Add a new bpf_endian.h as a quick fix for this for now.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 310b4816a5d8082416b4ab83e5a7b3cb92883a4d
Author: Tejun Heo <tj@kernel.org>
Date:   Mon May 1 15:24:14 2017 -0400

    cgroup: mark cgroup_get() with __maybe_unused
    
    a590b90d472f ("cgroup: fix spurious warnings on cgroup_is_dead() from
    cgroup_sk_alloc()") converted most cgroup_get() usages to
    cgroup_get_live() leaving cgroup_sk_alloc() the sole user of
    cgroup_get().  When !CONFIG_SOCK_CGROUP_DATA, this ends up triggering
    unused warning for cgroup_get().
    
    Silence the warning by adding __maybe_unused to cgroup_get().
    
    Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Link: http://lkml.kernel.org/r/20170501145340.17e8ef86@canb.auug.org.au
    Signed-off-by: Tejun Heo <tj@kernel.org>

commit 5b8481fa42ac58484d633b558579e302aead64c1
Author: David S. Miller <davem@davemloft.net>
Date:   Mon May 1 15:10:20 2017 -0400

    ipv6: Need to export ipv6_push_frag_opts for tunneling now.
    
    Since that change also made the nfrag function not necessary
    for exports, remove it.
    
    Fixes: 89a23c8b528b ("ip6_tunnel: Fix missing tunnel encapsulation limit option")
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 931d18223998c5360b960d1ce247579f6152ad8f
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:27 2017 -0400

    net: dsa: mv88e6xxx: add VTU support for 88E6390
    
    The 6390 family of chips use only 2 of the 3 VTU Data registers to pack
    the MemberTag and PortState VLAN data. This means that they must be
    written or read before or after each VTU/STU operations.
    
    Implement this variant to add support for VTU with such chips. These
    chips have a 13th bit for the VID thus set their max_vid to 8191.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 1ac758648b574d3d01a648fc7018fc8b0bb7454a
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:26 2017 -0400

    net: dsa: mv88e6xxx: support the VTU Page bit
    
    Newer chips such as the 88E6390 have a VTU Page bit in the VTU VID
    register to specify a 13th bit for the VID. This can be used to support
    8K VLANs.
    
    When dumping the whole VTU, all VID bits must be set to one, including
    this VTU Page bit. Add support for VID greater than 4095.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
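
A rough sketch of how a 13-bit VID could be packed into a register with a 12-bit VID field plus a separate page bit is shown below. The bit positions used here (`VTU_VID_VALID` at bit 12, `VTU_VID_PAGE` at bit 13) and the helper names are assumptions made for illustration; the commit message only states that a Page bit supplies the 13th VID bit, so check the driver or datasheet for the real layout.

```c
#include <assert.h>
#include <stdint.h>

#define VTU_VID_MASK  0x0fffu  /* low 12 VID bits */
#define VTU_VID_VALID 0x1000u  /* assumed position of a valid bit */
#define VTU_VID_PAGE  0x2000u  /* assumed position of the page bit */

/* Pack a VID in the range 0..8191 into the hypothetical register layout. */
static uint16_t vid_to_reg(uint16_t vid, int valid)
{
    uint16_t reg = vid & VTU_VID_MASK;

    if (vid & 0x1000u)          /* 13th VID bit selects the upper page */
        reg |= VTU_VID_PAGE;
    if (valid)
        reg |= VTU_VID_VALID;
    return reg;
}

/* Recover the 13-bit VID from the hypothetical register layout. */
static uint16_t reg_to_vid(uint16_t reg)
{
    uint16_t vid = reg & VTU_VID_MASK;

    if (reg & VTU_VID_PAGE)
        vid |= 0x1000u;
    return vid;
}
```

Under this layout, setting all VID bits including the page bit gives 8191, which matches the "all VID bits must be set to one" rule for dumping the whole VTU.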

commit 567aa59a8b055bb5853bc7e5d5f8ac191216c9c7
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:25 2017 -0400

    net: dsa: mv88e6xxx: simplify VTU entry getter
    
    Make the code which fetches or initializes a new VTU entry more concise.
    This allows us to get rid of the old underscore prefix naming.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit bf7d71c0451776143cdbd9a42a5bcbd6478da3c8
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:24 2017 -0400

    net: dsa: mv88e6xxx: make VTU helpers static
    
    Now that we have chip operations for VTU accesses, mark all helpers from
    global1_vtu.c as static. Only the various implementations of the
    GetNext, LoadPurge and Flush operations need to be exposed.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 0ad5daf6ba80af4a8d72b4284079357c4e3b9e4a
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:23 2017 -0400

    net: dsa: mv88e6xxx: add VTU Load/Purge operation
    
    Add a new vtu_loadpurge operation to the chip info structure to
    differentiate the various implementations of the VTU accesses.
    
    Now that the STU handling is abstracted behind VTU operations, kill the
    obsolete MV88E6XXX_FLAG_STU flag.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit f1394b78a602bae124a9b8473465ba48f4a5d5b2
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:22 2017 -0400

    net: dsa: mv88e6xxx: add VTU GetNext operation
    
    Add a new vtu_getnext operation to the chip info structure to
    differentiate the various implementations of the VTU accesses.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 021e64ff7676ad5183c1845d4b316df20175702a
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:21 2017 -0400

    net: dsa: mv88e6xxx: load STU entry with VTU entry
    
    Now that the code writes both VTU and STU data when loading a VTU entry,
    load the corresponding STU entry at the same time.
    
    This allows us to get rid of the STU management in the
    _mv88e6xxx_vtu_new helper and thus remove the separate implementations
    of STU Load/Purge and STU GetNext, as well as the unused family checks.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit ef6fcea37f014ec54a0a9f7eaecc78cdb6ffc71e
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:20 2017 -0400

    net: dsa: mv88e6xxx: get STU entry on VTU GetNext
    
    Now that the code reads both VTU and STU data on VTU GetNext operation,
    fetch the STU entry data of a VTU entry at the same time.
    
    The STU data bits are masked with the VTU data bits and they are now all
    read at the same time a VTU GetNext operation is issued.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 66a8e1f93319b8e3f5b6e81c06a2534c1491157c
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:19 2017 -0400

    net: dsa: mv88e6xxx: move STU GetNext operation
    
    Extract the generic portion of code to issue an STU GetNext operation,
    which will be used in other implementations.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit c499a64f349d063d8cdb40c0b96e84c35bbc414c
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:18 2017 -0400

    net: dsa: mv88e6xxx: move VTU Data accessors
    
    The code to access the VTU Data registers currently only supports the
    88E6185 family and alike: 2-bit membership adjacent to 2-bit port state.
    
    Even though the 88E6352 family introduced an indirect table to program
    the VLAN Spanning Tree states, the usage of the VTU Data registers
    remains the same regardless of the VTU or STU operation.
    
    Now that the mv88e6xxx_vtu_entry structure contains both port membership
    and states data, factorize the code to access them in global1_vtu.c.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit f169e5ee5f29f81c1c8d647fa6fb5387ee793131
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:17 2017 -0400

    net: dsa: mv88e6xxx: move generic VTU GetNext
    
    Even though every switch model has a different way to access the VTU
    Data bits, the base implementation of the VTU GetNext operation remains
    the same: wait, write the first VID to iterate from, start the
    operation, and read the next VID.
    
    Move this generic implementation into global1_vtu.c and abstract the
    handling of the start VID (similarly to the ATU GetNext implementation),
    before introducing a new chip operation for specific chips.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 3afb4bde6fe8f2d43b2153cc2672d07477729cca
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:16 2017 -0400

    net: dsa: mv88e6xxx: move VTU VID accessors
    
    Add helpers to access the VTU VID register in the global1_vtu.c file.
    
    At the same time, move mv88e6xxx_g1_vtu_vid_write at the beginning of
    _mv88e6xxx_vtu_loadpurge, which adds no functional changes but makes
    future patches simpler.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit d2ca1ea18db6a90475b983e65e8435632fe3d57e
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:15 2017 -0400

    net: dsa: mv88e6xxx: move VTU SID accessors
    
    Add helpers to access the VTU SID register in the global1_vtu.c file.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 8ee51f6b4f0819fd3d6a4143222be796779cf501
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:14 2017 -0400

    net: dsa: mv88e6xxx: move VTU FID accessors
    
    Add helpers to access the VTU FID register in the global1_vtu.c file.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit b486d7c95cc8336aa91813a156f9385439bde2fc
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:13 2017 -0400

    net: dsa: mv88e6xxx: move VTU flush
    
    Move the VTU flush operation to global1_vtu.c and call it from a
    mv88e6xxx_vtu_setup helper, similarly to the ATU and PVT setup.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 332aa5ccc82005de9441c1b65ca546488c7f8730
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:12 2017 -0400

    net: dsa: mv88e6xxx: move VTU Operation accessors
    
    Move the helper functions to access the Global 1 VTU Operation register
    to a new global1_vtu.c file, and get rid of the old underscore prefix
    naming convention. This file will be extended with all VTU/STU related
    code.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit bd00e053ae29d3215a9085b5c0add7298bbc417c
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:11 2017 -0400

    net: dsa: mv88e6xxx: split VTU entry data member
    
    VLAN aware Marvell chips can program 802.1Q VLAN membership as well as
    802.1s per VLAN Spanning Tree state using the same 3 VTU Data registers.
    
    Some chips such as 88E6185 use different Data registers offsets for
    ports state and membership, and program them in a single operation.
    
    Other chips such as 88E6352 use the same register layout but program
    them in distinct operations (an indirect table is used for 802.1s.)
    
    Newer chips such as 88E6390 use the same offsets for both state and
    membership in distinct operations, thus require multiple data accesses.
    
    To correctly abstract this, split the "data" structure member of
    mv88e6xxx_vtu_entry in two "state" and "member" members, before adding
    VTU support for newer chips.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 3cf3c8469f70d18f8bbcdf8361e62812ebc571cd
Author: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date:   Mon May 1 14:05:10 2017 -0400

    net: dsa: mv88e6xxx: add max VID to info
    
    Some chips don't have a VLAN Table Unit, most of them do have a 4K
    table, and some others, such as the 88E6390 family, have a 13th bit
    for the VID.
    
    Add a new max_vid member to the info structure, used to check the
    presence of a VTU as well as the value used to iterate from in VTU
    GetNext operations.
    
    This makes the MV88E6XXX_FLAG_VTU obsolete, thus remove it.
    
    Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 152afb9b45a8af4a93699a15925c392a28182a26
Author: Ilan Tayari <ilant@mellanox.com>
Date:   Sun Apr 30 16:51:19 2017 +0300

    xfrm: Indicate xfrm_state offload errors
    
    Current code silently ignores driver errors when configuring
    IPSec offload xfrm_state, and falls back to host-based crypto.
    
    Fail the xfrm_state creation if the driver has an error, because
    the NIC offloading was explicitly requested by the user program.
    
    This will communicate back to the user that there was an error.
    
    Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
    Signed-off-by: Ilan Tayari <ilant@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 67d349ed603d5ce4a6f1722b1736e2bcef0e8690
Author: Ilan Tayari <ilant@mellanox.com>
Date:   Sun Apr 30 16:34:38 2017 +0300

    net/esp4: Fix invalid esph pointer crash
    
    Both esp_output and esp_xmit take a pointer to the ESP header
    and place it in esp_info struct prior to calling esp_output_head.
    
    Inside esp_output_head, the call to esp_output_udp_encap
    makes sure to update the pointer if it gets invalid.
    However, if esp_output_head itself calls skb_cow_data, the
    pointer is not updated and stays invalid, causing a crash
    after esp_output_head returns.
    
    Update the pointer if it becomes invalid in esp_output_head
    
    Fixes: fca11ebde3f0 ("esp4: Reorganize esp_output")
    Signed-off-by: Ilan Tayari <ilant@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 89a23c8b528bd2c89f3981573d6cd7d23840c8a6
Author: Craig Gallek <cgallek@google.com>
Date:   Wed Apr 26 14:37:45 2017 -0400

    ip6_tunnel: Fix missing tunnel encapsulation limit option
    
    The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and
    IPV6_TLV_PADN options when an encapsulation limit is defined (the
    default is a limit of 4).  An MTU adjustment is done to account for
    these options as well.  However, the options are never present in the
    generated packets.
    
    The issue appears to be a subtlety between IPV6_DSTOPTS and
    IPV6_RTHDRDSTOPTS defined in RFC 3542.  When the IPIP tunnel driver was
    written, the encap limit options were included as IPV6_RTHDRDSTOPTS in
    dst0opt of struct ipv6_txoptions.  Later, ipv6_push_nfrags_opts was
    (correctly) updated to require IPV6_RTHDR options when IPV6_RTHDRDSTOPTS
    are to be used.  This caused the options to no longer be included in v6
    encapsulated packets.
    
    The fix is to use IPV6_DSTOPTS (in dst1opt of struct ipv6_txoptions)
    instead.  IPV6_DSTOPTS do not have the additional IPV6_RTHDR requirement.
    
    Fixes: 1df64a8569c7: ("[IPV6]: Add ip6ip6 tunnel driver.")
    Fixes: 333fad5364d6: ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542)")
    Signed-off-by: Craig Gallek <kraig@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit a6a5993243550b09f620941dea741b7421fdf79c
Author: Ding Tianhong <dingtianhong@huawei.com>
Date:   Sat Apr 29 10:38:48 2017 +0800

    iov_iter: don't revert iov buffer if csum error
    
    The patch 327868212381 (make skb_copy_datagram_msg() et.al. preserve
    ->msg_iter on error) reverts the iov buffer if the copy to the iter
    fails, but no datagram is copied when skb_checksum_complete reports
    an error, so there is no need to revert any data at this place.
    
    v2: Sabrina noticed that returning -EFAULT on a checksum error is not
        correct here; it would confuse the caller about the return value,
        so fix it.
    
    Fixes: 327868212381 ("make skb_copy_datagram_msg() et.al. preserve->msg_iter on error")
    Cc: stable@vger.kernel.org # v4.11
    Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
    Acked-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit d5066c467ee3b8eb8716e776584b3953a0bb218a
Author: Liam Beguin <lbeguin@tycoint.com>
Date:   Mon May 1 11:02:01 2017 -0400

    switchdev: documentation: fix whitespace issues
    
    Figure 1 is full of whitespaces; fix it
    
    Signed-off-by: Liam Beguin <lbeguin@tycoint.com>
    Signed-off-by: Sylvain Lemieux <slemieux@tycoint.com>
    Acked-by: Ivan Vecera <ivecera@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit b1e455260c9187b16dd4ebc428b817ebac322043
Author: Ido Schimmel <idosch@mellanox.com>
Date:   Sun Apr 30 19:47:14 2017 +0300

    mlxsw: spectrum_router: Simplify VRF enslavement
    
    When a netdev is enslaved to a VRF master, its router interface (RIF)
    needs to be destroyed (if exists) and a new one created using the
    corresponding virtual router (VR).
    
    From the driver's perspective, the above is equivalent to an inetaddr
    event sent for this netdev. Therefore, when a port netdev (or one of
    its uppers) is enslaved to a VRF master, call the same function that
    would've been called had a NETDEV_UP been sent for this netdev in the
    inetaddr notification chain.
    
    This patch also fixes a bug when a LAG netdev with an existing RIF is
    enslaved to a VRF. Before this patch, each LAG port would drop the
    reference on the RIF, but would re-join the same one (in the wrong VR)
    soon after. With this patch, the corresponding RIF is first destroyed
    and a new one is created using the correct VR.
    
    Fixes: 7179eb5acd59 ("mlxsw: spectrum_router: Add support for VRFs")
    Signed-off-by: Ido Schimmel <idosch@mellanox.com>
    Reviewed-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 07ff2ed03bb874a5bb97361a5a07ee28f1afa574
Author: Mintz, Yuval <Yuval.Mintz@cavium.com>
Date:   Sun Apr 30 12:14:44 2017 +0300

    qed: Prevent warning without CONFIG_RFS_ACCEL
    
    After removing the PTP related initialization from slowpath start,
    the remaining PTT entry is required only in case CONFIG_RFS_ACCEL is set.
    Otherwise, it leads to a warning due to it being unused.
    
    Fixes: d179bd1699fc ("qed: Acquire/release ptt_ptp lock when enabling/disabling PTP")
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 20b1bd96e9f4feeffc9206284df3c6a4438e9ca8
Author: Ram Amrani <Ram.Amrani@cavium.com>
Date:   Sun Apr 30 11:49:10 2017 +0300

    qed: output the DPM status and WID count
    
    Output to the RDMA driver whether DPM mode is enabled or disabled in
    the HW and, if it is enabled, how many WIDs it supports.
    
    Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 107392b75ffc96a2418d5382e52b08c598575e1b
Author: Ram Amrani <Ram.Amrani@cavium.com>
Date:   Sun Apr 30 11:49:09 2017 +0300

    qed: align DPI configuration to HW requirements
    
    When calculating doorbell BAR partitioning round up the number of
    CPUs to the nearest power of 2 so the size of the DPI (per user
    section) configured in the hardware will be stored properly and
    not truncated.
    
    Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
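
As a rough illustration of the rounding step described above, the helper below returns the smallest power of two that is not less than its argument. `sketch_roundup_pow_of_two` is a hypothetical userspace stand-in (the kernel has its own `roundup_pow_of_two()`), and how the driver uses the result is not shown here.

```c
#include <stdint.h>

/* Smallest power of two >= n. Valid for 1 <= n <= 2^31; larger inputs
 * would overflow the 32-bit result and are out of scope for this sketch. */
static uint32_t sketch_roundup_pow_of_two(uint32_t n)
{
    uint32_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}
```

With 12 CPUs, for example, the rounded value is 16, so a size derived from it is always an exact power of two.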

commit e015d58b44a93a3fd89ed910d68659dfdc57237c
Author: Ram Amrani <Ram.Amrani@cavium.com>
Date:   Sun Apr 30 11:49:08 2017 +0300

    qed: verify RoCE resource bitmaps are released
    
    Add mechanism to verify RoCE resources are released prior to freeing the
    bitmaps. If this is not the case, print what resources were not released.
    
    Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 105361943d3036f00f70a6621983b98673839591
Author: Ram Amrani <Ram.Amrani@cavium.com>
Date:   Sun Apr 30 11:49:07 2017 +0300

    qed: add error handling flow to TID deregistration posting failure
    
    If the posting of the ramrod for the purpose of TID deregistration
    fails, abort the deregistration operation without using the FW's
    return code.
    
    Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit ba0154e96449a5be3360d3a07bc4b6d476e2667e
Author: Ram Amrani <Ram.Amrani@cavium.com>
Date:   Sun Apr 30 11:49:06 2017 +0300

    qed: remove unused SQ error state
    
    The internal RoCE SQE QP state isn't being used. Instead we mark the
    QP as in regular error state.
    
    Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 793ea8a9c7b4b4383348a717cb5c86c1bbdba30a
Author: Ram Amrani <Ram.Amrani@cavium.com>
Date:   Sun Apr 30 11:49:05 2017 +0300

    qed: configure the RoCE max message size
    
    Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 332270fdc8b6fba07d059a9ad44df9e1a2ad4529
Author: Yonghong Song <yhs@fb.com>
Date:   Sat Apr 29 22:52:42 2017 -0700

    bpf: enhance verifier to understand stack pointer arithmetic
    
    LLVM 4.0 and above generates code like the following:
    ....
    440: (b7) r1 = 15
    441: (05) goto pc+73
    515: (79) r6 = *(u64 *)(r10 -152)
    516: (bf) r7 = r10
    517: (07) r7 += -112
    518: (bf) r2 = r7
    519: (0f) r2 += r1
    520: (71) r1 = *(u8 *)(r8 +0)
    521: (73) *(u8 *)(r2 +45) = r1
    ....
    and the verifier complains "R2 invalid mem access 'inv'" for insn #521.
    This is because the verifier marks register r2 as an unknown value after
    #519, where r2 is a stack pointer and r1 holds a constant value.
    
    Teach the verifier to recognize "stack_ptr + imm" and
    "stack_ptr + reg with const val" as a valid stack_ptr with a new offset.
    
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
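
The rule described in the commit above can be illustrated with a toy register tracker. This is only a sketch under stated assumptions: `sketch_reg`, `sketch_add` and `sketch_stack_access_ok` are hypothetical names, and the real verifier tracks far more state than a type and one number.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified register states. */
enum sketch_reg_type { SK_UNKNOWN, SK_CONST, SK_STACK_PTR };

struct sketch_reg {
    enum sketch_reg_type type;
    int64_t val; /* constant value, or offset from the frame pointer */
};

/* Model "dst += src" for the cases the commit message names. */
static void sketch_add(struct sketch_reg *dst, const struct sketch_reg *src)
{
    if (dst->type == SK_STACK_PTR && src->type == SK_CONST) {
        dst->val += src->val;   /* still a stack pointer, new offset */
    } else if (dst->type == SK_CONST && src->type == SK_CONST) {
        dst->val += src->val;   /* constant folding */
    } else {
        dst->type = SK_UNKNOWN; /* anything else loses tracking */
        dst->val = 0;
    }
}

/* Accept an access of `size` bytes at ptr+off only inside the frame,
 * which spans [-stack_size, 0) relative to the frame pointer. */
static bool sketch_stack_access_ok(const struct sketch_reg *ptr, int64_t off,
                                   int64_t size, int64_t stack_size)
{
    int64_t start;

    if (ptr->type != SK_STACK_PTR)
        return false;
    start = ptr->val + off;
    return start >= -stack_size && start + size <= 0;
}
```

Replaying insns 516 to 521 from the trace with r1 = 15 leaves r2 as a stack pointer at offset -97, so the one-byte store at r2 + 45 lands at -52 and is accepted.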

commit 2faf26575350e49c1e242b7a464e9302f78b9b15
Author: Karim Eshapa <karim.eshapa@gmail.com>
Date:   Mon May 1 15:58:08 2017 +0200

    benet: Use time_before_eq for time comparison
    
    Use time_before_eq for the time comparison; it is safer and handles
    timer wrapping, which makes the code future-proof.
    
    Signed-off-by: Karim Eshapa <karim.eshapa@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
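
A minimal userspace sketch of why a wrap-safe comparison matters. The two helpers are modeled on the kernel's `time_after_eq()`/`time_before_eq()` macros but are hypothetical stand-ins without the kernel's type checking; they rely on the usual two's-complement conversion from `unsigned long` to `long`.

```c
#include <limits.h>

/* True if a is at or after b, even if the counter wrapped in between. */
static int sketch_time_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* True if a is at or before b; simply the mirrored comparison. */
static int sketch_time_before_eq(unsigned long a, unsigned long b)
{
    return sketch_time_after_eq(b, a);
}

/* The naive form, which gives the wrong answer across a wrap. */
static int sketch_naive_before_eq(unsigned long a, unsigned long b)
{
    return a <= b;
}
```

With `now = ULONG_MAX - 5` and `deadline = now + 10` (which wraps to 4), the naive comparison claims the deadline has already passed, while the subtraction-based form still reports `now` as before it.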

commit 1a7fca63cd405fc77a9c9ce9291792f2a7d222a4
Author: Benjamin LaHaise <benjamin.lahaise@netronome.com>
Date:   Mon May 1 09:58:40 2017 -0400

    flower: check unused bits in MPLS fields
    
    Since several of the netlink attributes used to configure the flower
    classifier's MPLS TC, BOS and Label fields have additional bits which are
    unused, check those bits to ensure that they are actually 0 as suggested
    by Jamal.
    
    Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
    Cc: David Miller <davem@davemloft.net>
    Cc: Jamal Hadi Salim <jhs@mojatatu.com>
    Cc: Simon Horman <simon.horman@netronome.com>
    Cc: Jakub Kicinski <kubakici@wp.pl>
    Cc: Jiri Pirko <jiri@resnulli.us>
    Signed-off-by: David S. Miller <davem@davemloft.net>
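
The check described above might look roughly like the sketch below. The field widths (20-bit label, 3-bit TC, 1-bit BOS) come from the MPLS label stack entry layout; the mask names, the function name, and the assumption that the label arrives as a u32 and TC/BOS as u8 values are illustrative only.

```c
#include <stdint.h>

#define SKETCH_MPLS_LABEL_MASK 0x000FFFFFu /* 20-bit label */
#define SKETCH_MPLS_TC_MASK    0x07u       /* 3-bit traffic class */
#define SKETCH_MPLS_BOS_MASK   0x01u       /* 1-bit bottom-of-stack flag */

/* Return 0 if every unused bit is clear, -1 otherwise
 * (-1 is a stand-in for an error such as -EINVAL). */
static int sketch_check_mpls(uint32_t label, uint8_t tc, uint8_t bos)
{
    if (label & ~SKETCH_MPLS_LABEL_MASK)
        return -1;
    if (tc & ~SKETCH_MPLS_TC_MASK)
        return -1;
    if (bos & ~SKETCH_MPLS_BOS_MASK)
        return -1;
    return 0;
}
```

Rejecting set unused bits, rather than silently masking them, keeps those bits available for future use without changing the meaning of existing requests.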

commit f76254a845a661ccdb9fa246ee72197f90a8d3dd
Author: Jesper Dangaard Brouer <brouer@redhat.com>
Date:   Mon May 1 11:26:20 2017 +0200

    samples/bpf: fix XDP_FLAGS_SKB_MODE detach for xdp_tx_iptunnel
    
    The xdp_tx_iptunnel program can be terminated in two ways, after
    N seconds or via Ctrl-C SIGINT.  The SIGINT code path does not
    handle detaching the correct XDP program in case the program
    was attached with XDP_FLAGS_SKB_MODE.
    
    Fix this by storing the XDP flags as a global variable, which is
    available for the SIGINT handler function.
    
    Fixes: 3993f2cb983b ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 6387d0111ca4740b69a082a92fc373185af11133
Author: Jesper Dangaard Brouer <brouer@redhat.com>
Date:   Mon May 1 11:26:15 2017 +0200

    samples/bpf: fix SKB_MODE flag to be a 32-bit unsigned int
    
    The kernel side of XDP_FLAGS_SKB_MODE is unsigned, and the rtnetlink
    IFLA_XDP_FLAGS is defined as NLA_U32. Thus, userspace programs under
    samples/bpf/ should use the correct type.
    
    Fixes: 3993f2cb983b ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 9861ce039c2a4ff3e50eb7c679dadc15a8469e3f
Author: Jakub Kicinski <jakub.kicinski@netronome.com>
Date:   Sun Apr 30 21:46:48 2017 -0700

    virtio_net: make use of extended ack message reporting
    
    Try to carry error messages to the user via the netlink extended
    ack message attribute.
    
    Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit d957c0f711aaeaac6bbffd82098737ac10b7985d
Author: Jakub Kicinski <jakub.kicinski@netronome.com>
Date:   Sun Apr 30 21:46:47 2017 -0700

    nfp: make use of extended ack message reporting
    
    Try to carry error messages to the user via the netlink extended
    ack message attribute.
    
    Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit ddf9f970764f4390aba767e77fddaaced4a6760d
Author: Jakub Kicinski <jakub.kicinski@netronome.com>
Date:   Sun Apr 30 21:46:46 2017 -0700

    xdp: propagate extended ack to XDP setup
    
    Drivers usually have a number of restrictions for running XDP
    - most common being buffer sizes, LRO and number of rings.
    Even though some drivers try to be helpful and print error
    messages, experience shows that users don't often consult
    kernel logs on netlink errors.  Try to use the new extended
    ack mechanism to carry the message back to user space.
    
    Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 45d9b378e85f1b00ac047626827c68589168936c
Author: Jakub Kicinski <jakub.kicinski@netronome.com>
Date:   Sun Apr 30 21:46:45 2017 -0700

    netlink: add NULL-friendly helper for setting extended ACK message
    
    As we propagate extended ack reporting throughout various paths in
    the kernel it may be that the same function is called with the
    extended ack parameter passed as NULL.  One place where that happens
    is in drivers which have a centralized reconfiguration function
    called both from ndos and from ethtool_ops.  Add a new helper for
    setting the error message in such conditions.
    
    The existing helper is left as is to encourage propagating the ext ack
    fully wherever possible.  It also makes it clear in the code which
    messages may be lost due to ext ack being NULL.
    
    Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
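
A small sketch of the two-helper idea from the commit above. All names (`sketch_extack`, `SKETCH_SET_ERR_MSG`, `SKETCH_TRY_SET_ERR_MSG`, `sketch_reconfigure`) and the MTU limit are hypothetical and stand in for the kernel's extended ack structure and helpers.

```c
#include <stddef.h>

/* Hypothetical stand-in for the extended ack structure. */
struct sketch_extack {
    const char *msg;
};

/* Strict variant: the caller guarantees extack is non-NULL. */
#define SKETCH_SET_ERR_MSG(extack, text) \
    do { (extack)->msg = (text); } while (0)

/* NULL-friendly variant: silently drops the message when there is
 * nowhere to store it. */
#define SKETCH_TRY_SET_ERR_MSG(extack, text)       \
    do {                                           \
        struct sketch_extack *e__ = (extack);      \
        if (e__)                                   \
            e__->msg = (text);                     \
    } while (0)

/* A centralized reconfiguration function that may be reached from a
 * path that has an extack and from one that passes NULL. */
static int sketch_reconfigure(struct sketch_extack *extack, unsigned int mtu)
{
    if (mtu > 1500) { /* illustrative limit only */
        SKETCH_TRY_SET_ERR_MSG(extack, "MTU too large for XDP");
        return -1;
    }
    return 0;
}
```

Keeping the strict variant separate makes the call sites where a message can be lost easy to find, which is the design point the commit message makes.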

commit 8eeef2350453aa012d846457eb6ecd012a35d99b
Author: Liping Zhang <zlpnobody@gmail.com>
Date:   Sat Apr 29 21:59:49 2017 +0800

    netfilter: nf_ct_ext: invoke destroy even when ext is not attached
    
    For NF_NAT_MANIP_SRC, we will insert the ct to the nat_bysource_table,
    then remove it from the nat_bysource_table via nat_extend->destroy.
    
    But now, the nat extension is attached on demand, so if the nat extension
    is not attached, we will not be notified when the ct is destroyed, i.e.
    we may fail to remove ct from the nat_bysource_table.
    
    So just keep it simple, even if the extension is not attached, we will
    still invoke the related ext->destroy. And this will also preserve the
    flexibility for the future extension.
    
    Fixes: 9a08ecfe74d7 ("netfilter: don't attach a nat extension by default")
    Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit 0e72f55f3510e722e725c54678800e99853faa3b
Author: Florian Westphal <fw@strlen.de>
Date:   Thu Apr 27 16:39:43 2017 +0200

    netfilter: snmp: avoid stack size warning
    
    net/ipv4/netfilter/nf_nat_snmp_basic.c:1158:1: warning: the frame size
    of 1160 bytes is larger than 1024 bytes
    
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit 039b40ee5854dc733cf786fee4a88e240a012115
Author: Florian Westphal <fw@strlen.de>
Date:   Mon Apr 24 15:37:41 2017 +0200

    netfilter: nf_queue: only call synchronize_net twice if nf_queue is active
    
    nf_unregister_net_hook(s) can avoid a second call to synchronize_net,
    provided there is no nfqueue active in that net namespace (which is
    the common case).
    
    This also gets rid of the extra arg to nf_queue_nf_hook_drop(), normally
    this gets called during netns cleanup so no packets should be queued.
    
    For the rare case of base chain being u…
fengguang pushed a commit to 0day-ci/linux that referenced this issue May 5, 2017
GIT d979d0dd9ff332bf1106a60e3841430fc2cbd9ef

commit 3a158a62da0673db918b53ac1440845a5b64fd90
Author: James Hogan <james.hogan@imgtec.com>
Date:   Tue May 2 19:41:06 2017 +0100

    metag/uaccess: Check access_ok in strncpy_from_user
    
    The metag implementation of strncpy_from_user() doesn't validate the src
    pointer, which could allow reading of arbitrary kernel memory. Add a
    short access_ok() check to prevent that.
    
    It's still possible for it to read across the user/kernel boundary, but
    it will invariably reach a NUL character after only 9 bytes, leaking
    only a static kernel address being loaded into D0Re0 at the beginning of
    __start, which is acceptable for the immediate fix.
    
    Reported-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: James Hogan <james.hogan@imgtec.com>
    Cc: linux-metag@vger.kernel.org
    Cc: stable@vger.kernel.org

commit ec8a09fbbeff252c80daf62c7a78342003dddf9c
Author: Jon Paul Maloy <jon.maloy@ericsson.com>
Date:   Tue May 2 18:16:54 2017 +0200

    tipc: refactor function tipc_sk_recv_stream()
    
    We try to make this function more readable by improving variable names
    and comments, using more stack variables, and making some smaller changes
    to the logic. We also rename the function to make it consistent with
    naming conventions used elsewhere in the code.
    
    Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
    Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit e9f8b10101c6da3ab000a2fb17162374c9bd2c69
Author: Jon Paul Maloy <jon.maloy@ericsson.com>
Date:   Tue May 2 18:16:53 2017 +0200

    tipc: refactor function tipc_sk_recvmsg()
    
    We try to make this function more readable by improving variable names
    and comments, plus some minor changes to the logic.
    
    Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
    Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 2da711ac3f6dfa379ccac427a435a558284eab1d
Author: Marcel Holtmann <marcel@holtmann.org>
Date:   Tue May 2 12:43:31 2017 -0700

    Bluetooth: Skip vendor diagnostic configuration for HCI User Channel
    
    When the HCI User Channel access is requested, then do not try to
    undermine it with vendor diagnostic configuration. The exclusive user
    is required to configure its own vendor diagnostic in that case and
    cannot rely on the host stack support.
    
    Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>

commit 773225388dae15e72790d6f573e2e70e96292b6b
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:58 2017 +0530

    net: thunderx: Optimize page recycling for XDP
    
    Driver follows a method of taking one extra reference on the
    page for recycling which is fine in usual packet path where
    each 64KB page is segmented into multiple receive buffers.
    
    But in XDP mode, since there is just one receive buffer per
    page, taking the extra page reference itself becomes a big bottleneck,
    consuming ~50% of CPU cycles due to atomic operations.
    
    This patch adds an internal ref count in pgcache for each
    page, and additional page references are taken in a batch
    instead of just one at a time. The internal counter, i.e.
    'pgcache->ref_count', and the page's counter, i.e. 'page->_refcount',
    are compared to check the page's recyclability.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
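
The batching idea can be sketched as below. This is a simplified illustration, not the driver's actual scheme: the structures, the batch size, and the exact recyclability condition are assumptions chosen to show why one shared-counter update per batch is cheaper than one per buffer.

```c
#define SKETCH_BATCH 64 /* hypothetical batch size */

/* Stand-in for struct page; refcount models page->_refcount and starts
 * at 1 for the owner's base reference. */
struct sketch_page {
    int refcount;
};

struct sketch_pgcache {
    struct sketch_page *page;
    int ref_count; /* references already taken but not yet handed out */
};

/* Hand out one buffer reference. The shared page counter (atomic in a
 * real driver) is touched only once per SKETCH_BATCH calls. */
static void sketch_get_ref(struct sketch_pgcache *pc)
{
    if (pc->ref_count == 0) {
        pc->page->refcount += SKETCH_BATCH;
        pc->ref_count = SKETCH_BATCH;
    }
    pc->ref_count--; /* cheap, local bookkeeping */
}

/* The consumer of a buffer drops its reference when done. */
static void sketch_put_ref(struct sketch_page *page)
{
    page->refcount--;
}

/* The page can be reused once only the base reference and the
 * not-yet-handed-out references remain. */
static int sketch_can_recycle(const struct sketch_pgcache *pc)
{
    return pc->page->refcount == 1 + pc->ref_count;
}
```

Comparing the two counters replaces a per-buffer atomic increment with a plain decrement plus an equality test.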

commit e3d06ff9ec9400b93bacf8fa92f3985c9412e282
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:57 2017 +0530

    net: thunderx: Support for XDP header adjustment
    
    When in XDP mode reserve XDP_PACKET_HEADROOM bytes at the start
    of receive buffer for XDP program to modify headers and adjust
    packet start. Additional code changes done to handle such packets.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 16f2bccda75da48888772c4829a468be620c5d79
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:56 2017 +0530

    net: thunderx: Add support for XDP_TX
    
    Adds support for XDP_TX i.e transmits packet out of
    the XDP TX queue mapped to the corresponding Rx queue
    on which packet is received.
    
    Since SQ for XDP TX will be used only on a single cpu i.e
    SQ description creation and freeing, using atomic free count
    is not necessary and will become a bottleneck. Hence added
    a separate 'xdp_free_cnt' used for SQs designated for XDP
    to track descriptor free count.
    
    Changes also include
    - A new entry 'xdp_page' is added to save transmitted packet's
      page pointer for later cleanup.
    - XDP Tx SQ's doorbell is rung once per NAPI instance.
    - Retrieving designated SQ for packets being sent out by stack
      via 'nicvf_xmit'.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit c56d91ce38d54c0c0dd8d0e4c6a9e0cfa557152f
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:55 2017 +0530

    net: thunderx: Add support for XDP_DROP
    
    Adds support for XDP_DROP.
    Also since in XDP mode there is just a single buffer per page,
    made changes to recycle DMA mapping info as well along with pages.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 05c773f52b96ef3fbc7d9bfa21caadc6247ef7a8
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:54 2017 +0530

    net: thunderx: Add basic XDP support
    
    Adds basic XDP support i.e attaching a BPF program to an
    interface. Also takes care of allocating separate Tx queues
    for XDP path and for network stack packet transmission.
    
    This patch doesn't support handling of any of the XDP actions,
    all are treated as XDP_PASS i.e packets will be handed over to
    the network stack.
    
    Changes also involve allocating one receive buffer per page in XDP
    mode and multiple in normal mode i.e when no BPF program is attached.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 927987f39f116db477fcd74ced2a2aea940e585c
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:53 2017 +0530

    net: thunderx: Cleanup receive buffer allocation
    
    Get rid of unnecessary double pointer references and type casting
    in receive buffer allocation code.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 0dada88b8cd74569abc3dda50f1b268a5868f6f2
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:52 2017 +0530

    net: thunderx: Optimize CQE_TX handling
    
    Optimized CQE handling with the changes below:
    - Freeing descriptors back to SQ in bulk, i.e. once per NAPI
      instance instead of for every CQE_TX; this will reduce the number
      of atomic updates to 'sq->free_cnt'.
    - Checking errors in CQE_TX and CQE_RX before calling the appropriate
      fn()s to update error stats, i.e. reduce branching.
    
    Also removed debug messages in packet handling path which otherwise
    causes issues if DEBUG is enabled.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 5e848e4c5d77438e126c97702ec3bea477f550a9
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:51 2017 +0530

    net: thunderx: Optimize RBDR descriptor handling
    
    Receive buffer's physical address or iova will anyway not
    go beyond 49bits, since it is the max supported HW address.
    As per perf, updating bitfields i.e buf_addr:42 in RBDR
    descriptor entry consumes lots of cpu cycles, hence changed
    it to a 64bit field with alignment requirements taken care of.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 5836b4429777bf57ca8fc02b154263aa54d97508
Author: Sunil Goutham <sgoutham@cavium.com>
Date:   Tue May 2 18:36:50 2017 +0530

    net: thunderx: Support for page recycling
    
    Adds support for page recycling for allocating receive buffers
    to reduce cost of refilling RBDR ring. Also got rid of using
    compound pages when pagesize is 4K, only order-0 pages now.
    
    Only the page is recycled; DMA mappings still need to be done for
    every receive buffer allocated due to the following constraints:
    - Cannot have just one receive buffer per 64KB page.
    - There is just one buffer ring shared across 8 Rx queues, so
      buffers of the same page can go to any Rx queue.
    - HW gives the buffer address where the packet has been DMA'ed and not
      the index into the buffer ring.
    This makes it impossible to reuse DMA mapping info. So unfortunately
    we have to go through the costly mapping route for every buffer.
    
    Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit ee0d8d8482345ff97a75a7d747efc309f13b0d80
Author: Dan Carpenter <dan.carpenter@oracle.com>
Date:   Tue May 2 13:58:53 2017 +0300

    ipx: call ipxitf_put() in ioctl error path
    
    We should call ipxitf_put() if the copy_to_user() fails.
    
    Reported-by: 李强 <liqiang6-s@360.cn>
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 9da3242e6a83b6f315aa9c394c939da8e4ad7774
Author: Jiri Pirko <jiri@mellanox.com>
Date:   Tue May 2 10:12:00 2017 +0200

    net: sched: add helpers to handle extended actions
    
    Jump is currently the only action using a value opcode. This is going to
    change soon, so introduce helpers to work with this. Convert TC_ACT_JUMP.
    
    This also fixes the TC_ACT_JUMP check, which is incorrectly done as a
    bit check, not a value check.
    
    Fixes: e0ee84ded796 ("net sched actions: Complete the JUMPX opcode")
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
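
The difference between a bit check and a value check can be shown with a small sketch. The layout assumed here (an opcode in the top four bits, a parameter in the low 28 bits) and all `SK_`/`sketch_` names are illustrative and are not taken from the kernel headers.

```c
#include <stdint.h>

#define SK_EXT_SHIFT    28
#define SK_EXT(op)      ((uint32_t)(op) << SK_EXT_SHIFT)
#define SK_EXT_VAL_MASK ((1u << SK_EXT_SHIFT) - 1)

#define SK_ACT_JUMP     SK_EXT(1)
#define SK_ACT_OTHER    SK_EXT(3) /* hypothetical opcode sharing bit 28 */

/* Buggy form: tests a single bit, so any opcode with that bit set matches. */
static int sketch_is_jump_bitcheck(uint32_t action)
{
    return (action & SK_ACT_JUMP) != 0;
}

/* Value check: strip the parameter bits, then compare the whole opcode. */
static int sketch_is_jump_valuecheck(uint32_t action)
{
    return (action & ~SK_EXT_VAL_MASK) == SK_ACT_JUMP;
}
```

Both forms agree while jump is the only extended opcode; they diverge as soon as a second opcode such as the hypothetical `SK_ACT_OTHER` exists.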

commit 8d3f87d8cd0a16c58ae7e4410938528866c1c0db
Author: sudarsana.kalluru@cavium.com <sudarsana.kalluru@cavium.com>
Date:   Tue May 2 01:11:03 2017 -0700

    qed*: Fix issues in the ptp filter config implementation.
    
    PTP hardware filter configuration performed by the driver for a given
    user requested config is not correct for some of the PTP modes.
    Following changes are needed for PTP config-filter implementation.
     1. NIG_REG_TX_PTP_EN register - Bits 0/1/2 respectively enables
        TimeSync/"V1 frame format support"/"V2 frame format support" on
        the TX side. Set the associated bits based on the user request.
     2. The ptp4l application fails to operate in Peer Delay mode. The
        following changes are needed to fix this:
        a. Driver should enable (set to 0) DA #1-related bits for IPv4,
           IPv6 and MAC destination addresses in these registers:
             NIG_REG_TX_LLH_PTP_RULE_MASK
             NIG_REG_LLH_PTP_RULE_MASK
        b. NIG_REG_LLH_PTP_PARAM_MASK/NIG_REG_TX_LLH_PTP_PARAM_MASK should
           be set to 0x0 in all modes.
    
    Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 461eec12012c29b66525c270208d30be8f6da8e7
Author: sudarsana.kalluru@cavium.com <sudarsana.kalluru@cavium.com>
Date:   Tue May 2 01:11:02 2017 -0700

    qede: Fix concurrency issue in PTP Tx path processing.
    
    PTP Tx timestamping data structures are not protected against
    concurrent access in the Tx paths. Protect them using atomic
    bit locks.
    
    Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
    Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
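
An atomic bit lock of the kind mentioned above can be approximated in userspace with C11 atomics. This is a sketch only: the function names are hypothetical and the kernel has its own bit-lock primitives with their own semantics.

```c
#include <stdatomic.h>

/* Try to take the lock held in bit `bit` of *flags.
 * Returns 1 if this caller set the bit, 0 if it was already set. */
static int sketch_trylock_bit(atomic_ulong *flags, unsigned int bit)
{
    unsigned long mask = 1UL << bit;
    unsigned long old = atomic_fetch_or_explicit(flags, mask,
                                                 memory_order_acquire);

    return (old & mask) == 0;
}

/* Release the lock by clearing the bit. */
static void sketch_unlock_bit(atomic_ulong *flags, unsigned int bit)
{
    atomic_fetch_and_explicit(flags, ~(1UL << bit), memory_order_release);
}
```

A Tx path that fails the trylock can skip timestamping for that packet instead of blocking, which suits a context where sleeping is not allowed.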

commit 212c7fd614377fef4415d94856a59e9f484aa439
Author: Jan Kiszka <jan.kiszka@siemens.com>
Date:   Tue May 2 09:58:00 2017 +0200

    stmmac: Add support for SIMATIC IOT2000 platform
    
    The IOT2000 is an industrial controller platform, derived from the Intel
    Galileo Gen2 board. The variant IOT2020 comes with one LAN port, the
    IOT2040 has two of them. They can be told apart based on the board asset
    tag in the DMI table.
    
    Based on patch by Sascha Weisenberger.
    
    Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
    Signed-off-by: Sascha Weisenberger <sascha.weisenberger@siemens.com>
    Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 412b65d15a7f8a93794653968308fc100f2aa87c
Author: Timmy Li <lixiaoping3@huawei.com>
Date:   Tue May 2 10:46:52 2017 +0800

    net: hns: fix ethtool_get_strings overflow in hns driver
    
    hns_get_sset_count() returns HNS_NET_STATS_CNT and the data space allocated
    is not enough for ethtool_get_strings(), which will cause random memory
    corruption.
    
    When SLAB and DEBUG_SLAB are both enabled, memory corruptions like the
    following can be observed without this patch:
    [   43.115200] Slab corruption (Not tainted): Acpi-ParseExt start=ffff801fb0b69030, len=80
    [   43.115206] Redzone: 0x9f911029d006462/0x5f78745f31657070.
    [   43.115208] Last user: [<5f7272655f746b70>](0x5f7272655f746b70)
    [   43.115214] 010: 70 70 65 31 5f 74 78 5f 70 6b 74 00 6b 6b 6b 6b  ppe1_tx_pkt.kkkk
    [   43.115217] 030: 70 70 65 31 5f 74 78 5f 70 6b 74 5f 6f 6b 00 6b  ppe1_tx_pkt_ok.k
    [   43.115218] Next obj: start=ffff801fb0b69098, len=80
    [   43.115220] Redzone: 0x706d655f6f666966/0x9f911029d74e35b.
    [   43.115229] Last user: [<ffff0000084b11b0>](acpi_os_release_object+0x28/0x38)
    [   43.115231] 000: 74 79 00 6b 6b 6b 6b 6b 70 70 65 31 5f 74 78 5f  ty.kkkkkppe1_tx_
    [   43.115232] 010: 70 6b 74 5f 65 72 72 5f 63 73 75 6d 5f 66 61 69  pkt_err_csum_fai
    
    Signed-off-by: Timmy Li <lixiaoping3@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit a9f11f963a546fea9144f6a6d1a307e814a387e7
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon May 1 15:29:48 2017 -0700

    tcp: fix wraparound issue in tcp_lp
    
    Be careful when comparing tcp_time_stamp to some u32 quantity,
    otherwise the result can be surprising.
    
    Fixes: 7c106d7e782b ("[TCP]: TCP Low Priority congestion control")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit ddc665a4bb4b728b4e6ecec8db1b64efa9184b9c
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Tue May 2 20:34:54 2017 +0200

    bpf, arm64: fix jit branch offset related to ldimm64
    
    When the instruction right before the branch destination is
    a 64 bit load immediate, we currently calculate the wrong
    jump offset in the ctx->offset[] array as we only account for
    one instruction slot for the 64 bit load immediate although
    it uses two BPF instructions. Fix it up by setting the offset
    in the right slot after we have incremented the index.
    
    Before (ldimm64 test 1):
    
      [...]
      00000020:  52800007  mov w7, #0x0 // #0
      00000024:  d2800060  mov x0, #0x3 // #3
      00000028:  d2800041  mov x1, #0x2 // #2
      0000002c:  eb01001f  cmp x0, x1
      00000030:  54ffff82  b.cs 0x00000020
      00000034:  d29fffe7  mov x7, #0xffff // #65535
      00000038:  f2bfffe7  movk x7, #0xffff, lsl #16
      0000003c:  f2dfffe7  movk x7, #0xffff, lsl #32
      00000040:  f2ffffe7  movk x7, #0xffff, lsl #48
      00000044:  d29dddc7  mov x7, #0xeeee // #61166
      00000048:  f2bdddc7  movk x7, #0xeeee, lsl #16
      0000004c:  f2ddddc7  movk x7, #0xeeee, lsl #32
      00000050:  f2fdddc7  movk x7, #0xeeee, lsl #48
      [...]
    
    After (ldimm64 test 1):
    
      [...]
      00000020:  52800007  mov w7, #0x0 // #0
      00000024:  d2800060  mov x0, #0x3 // #3
      00000028:  d2800041  mov x1, #0x2 // #2
      0000002c:  eb01001f  cmp x0, x1
      00000030:  540000a2  b.cs 0x00000044
      00000034:  d29fffe7  mov x7, #0xffff // #65535
      00000038:  f2bfffe7  movk x7, #0xffff, lsl #16
      0000003c:  f2dfffe7  movk x7, #0xffff, lsl #32
      00000040:  f2ffffe7  movk x7, #0xffff, lsl #48
      00000044:  d29dddc7  mov x7, #0xeeee // #61166
      00000048:  f2bdddc7  movk x7, #0xeeee, lsl #16
      0000004c:  f2ddddc7  movk x7, #0xeeee, lsl #32
      00000050:  f2fdddc7  movk x7, #0xeeee, lsl #48
      [...]
    
    Also, add a couple of test cases to make sure JITs pass
    this test. Tested on Cavium ThunderX ARMv8. The added
    test cases all pass after the fix.
    
    Fixes: 8eee539ddea0 ("arm64: bpf: fix out-of-bounds read in bpf2a64_offset()")
    Reported-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Cc: Xi Wang <xi.wang@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
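
The offset bookkeeping described above can be sketched with a toy offset table. The structures and function names are hypothetical, and the sketch fills both slots of a 64 bit load immediate for simplicity; what matters for the branch computation is that the second slot holds the index reached after the whole load.

```c
#include <stddef.h>

struct sketch_insn {
    int is_ldimm64; /* first slot of a two-slot 64 bit load immediate */
    int native_len; /* native instructions emitted for this BPF insn */
};

/* offset[i] = native instructions emitted once BPF slot i is complete. */
static void sketch_build_offsets(const struct sketch_insn *prog, size_t len,
                                 int *offset)
{
    int idx = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        idx += prog[i].native_len;
        offset[i] = idx;
        if (prog[i].is_ldimm64 && i + 1 < len) {
            i++;             /* skip the slot holding the upper half */
            offset[i] = idx; /* and record the offset there as well */
        }
    }
}

/* Native branch distance for a BPF branch in slot `from` with BPF
 * offset `off`: both ends use the "after this slot" index. */
static int sketch_branch_delta(const int *offset, size_t from, int off)
{
    return offset[from + off] - offset[from];
}
```

If the second slot were left at zero, a branch whose target follows the load would compute a distance relative to the start of the image, which matches the wrong `b.cs 0x00000020` target in the "Before" listing.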

commit 85f68fe89832057584a9e66e1e7e53d53e50faff
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Mon May 1 02:57:20 2017 +0200

    bpf, arm64: implement jiting of BPF_XADD
    
    This work adds BPF_XADD for BPF_W/BPF_DW to the arm64 JIT and therefore
    completes JITing of all BPF instructions, meaning we can thus also remove
    the 'notyet' label and do not need to fall back to the interpreter when
    BPF_XADD is used in a program!
    
    This now also brings arm64 JIT in line with x86_64, s390x, ppc64, sparc64,
    where all current eBPF features are supported.
    
    BPF_W example from test_bpf:
    
      .u.insns_int = {
        BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
        BPF_ST_MEM(BPF_W, R10, -40, 0x10),
        BPF_STX_XADD(BPF_W, R10, R0, -40),
        BPF_LDX_MEM(BPF_W, R0, R10, -40),
        BPF_EXIT_INSN(),
      },
    
      [...]
      00000020:  52800247  mov w7, #0x12 // #18
      00000024:  928004eb  mov x11, #0xffffffffffffffd8 // #-40
      00000028:  d280020a  mov x10, #0x10 // #16
      0000002c:  b82b6b2a  str w10, [x25,x11]
      // start of xadd mapping:
      00000030:  928004ea  mov x10, #0xffffffffffffffd8 // #-40
      00000034:  8b19014a  add x10, x10, x25
      00000038:  f9800151  prfm pstl1strm, [x10]
      0000003c:  885f7d4b  ldxr w11, [x10]
      00000040:  0b07016b  add w11, w11, w7
      00000044:  880b7d4b  stxr w11, w11, [x10]
      00000048:  35ffffab  cbnz w11, 0x0000003c
      // end of xadd mapping:
      [...]
    
    BPF_DW example from test_bpf:
    
      .u.insns_int = {
        BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
        BPF_ST_MEM(BPF_DW, R10, -40, 0x10),
        BPF_STX_XADD(BPF_DW, R10, R0, -40),
        BPF_LDX_MEM(BPF_DW, R0, R10, -40),
        BPF_EXIT_INSN(),
      },
    
      [...]
      00000020:  52800247  mov w7,  #0x12 // #18
      00000024:  928004eb  mov x11, #0xffffffffffffffd8 // #-40
      00000028:  d280020a  mov x10, #0x10 // #16
      0000002c:  f82b6b2a  str x10, [x25,x11]
      // start of xadd mapping:
      00000030:  928004ea  mov x10, #0xffffffffffffffd8 // #-40
      00000034:  8b19014a  add x10, x10, x25
      00000038:  f9800151  prfm pstl1strm, [x10]
      0000003c:  c85f7d4b  ldxr x11, [x10]
      00000040:  8b07016b  add x11, x11, x7
      00000044:  c80b7d4b  stxr w11, x11, [x10]
      00000048:  35ffffab  cbnz w11, 0x0000003c
      // end of xadd mapping:
      [...]
    
    Tested on Cavium ThunderX ARMv8, test suite results after the patch:
    
      No JIT:   [ 3751.855362] test_bpf: Summary: 311 PASSED, 0 FAILED, [0/303 JIT'ed]
      With JIT: [ 3573.759527] test_bpf: Summary: 311 PASSED, 0 FAILED, [303/303 JIT'ed]
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
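
For readers less used to arm64 assembly, the ldxr/add/stxr/cbnz sequence in the BPF_XADD commit above is a retry loop that performs an atomic add. The following is only a user-space C11 sketch of that semantics, not the JIT code itself; the helper names `sketch_xadd32` and `sketch_xadd64` are made up for illustration, and relaxed ordering is an assumption of this sketch.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: atomically add val to the 32-bit word at addr.
 * The compare-exchange loop plays the role of ldxr (load), add, stxr
 * (conditional store) and cbnz (retry if the store did not succeed).
 */
static void sketch_xadd32(_Atomic uint32_t *addr, uint32_t val)
{
    uint32_t old = atomic_load_explicit(addr, memory_order_relaxed);

    /* On failure, old is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak_explicit(addr, &old, old + val,
                                                   memory_order_relaxed,
                                                   memory_order_relaxed))
        ;
}

/* Same idea for the 64-bit (BPF_DW) case. */
static void sketch_xadd64(_Atomic uint64_t *addr, uint64_t val)
{
    uint64_t old = atomic_load_explicit(addr, memory_order_relaxed);

    while (!atomic_compare_exchange_weak_explicit(addr, &old, old + val,
                                                   memory_order_relaxed,
                                                   memory_order_relaxed))
        ;
}
```

With the values from the test_bpf examples (0x10 stored in memory, 0x12 in the source register), either helper leaves 0x22 in memory.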

commit 59b5c13899989e3d8f68a3d19c68044cce8254c0
Author: Tobias Regnery <tobias.regnery@gmail.com>
Date:   Tue May 2 15:15:01 2017 +0200

    Bluetooth: hci_uart: fix kconfig dependency
    
    We see the following link error with CONFIG_BT_HCIUART=y,
    CONFIG_BT_HCIUART_LL=y and CONFIG_SERIAL_DEV_BUS=m:
    
    drivers/built-in.o: In function 'll_close':
    supp.c:(.text+0x55add4): undefined reference to 'serdev_device_close'
    supp.c:(.text+0x55add4): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol 'serdev_device_close'
    drivers/built-in.o: In function 'll_open':
    supp.c:(.text+0x55aed0): undefined reference to 'serdev_device_open'
    supp.c:(.text+0x55aed0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol 'serdev_device_open'
    drivers/built-in.o: In function `hci_ti_probe':
    supp.c:(.text+0x55b00c): undefined reference to 'hci_uart_register_device'
    supp.c:(.text+0x55b00c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol 'hci_uart_register_device'
    drivers/built-in.o: In function `ll_setup':
    supp.c:(.text+0x55b08c): undefined reference to 'serdev_device_set_flow_control'
    supp.c:(.text+0x55b08c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol 'serdev_device_set_flow_control'
    supp.c:(.text+0x55b324): undefined reference to 'serdev_device_set_baudrate'
    supp.c:(.text+0x55b324): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol 'serdev_device_set_baudrate'
    drivers/built-in.o: In function 'll_init':
    supp.c:(.init.text+0x1b508): undefined reference to '__serdev_device_driver_register'
    supp.c:(.init.text+0x1b508): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol '__serdev_device_driver_register'
    
    Fix this by making BT_HCIUART_LL depend on the BT_HCIUART_SERDEV symbol.
    This implies a dependency on BT_HCIUART, and hci_ll.c is only compiled in
    if SERIAL_DEV_BUS is built in or SERIAL_DEV_BUS and BT_HCIUART are
    modules.
    
    Fixes: 371805522f87 ("bluetooth: hci_uart: add LL protocol serdev driver support")
    Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
    Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

commit 8a8b56638bcac4e64cccc88bf95a0f9f4b19a2fb
Author: James Hogan <james.hogan@imgtec.com>
Date:   Fri Apr 28 10:50:26 2017 +0100

    metag/uaccess: Fix access_ok()
    
    The __user_bad() macro used by access_ok() has a few corner cases
    noticed by Al Viro where it doesn't behave correctly:
    
     - The kernel range check has off by 1 errors which permit access to the
       first and last byte of the kernel mapped range.
    
     - The kernel range check ends at LINCORE_BASE rather than
       META_MEMORY_LIMIT, which is ineffective when the kernel is in global
       space (an extremely uncommon configuration).
    
    There are a couple of other shortcomings here too:
    
     - Access to the whole of the other address space is permitted (i.e. the
       global half of the address space when the kernel is in local space).
       This isn't ideal as it could theoretically still contain privileged
       mappings set up by the bootloader.
    
     - The size argument is unused, permitting user copies which start on
       valid pages at the end of the user address range and cross the
       boundary into the kernel address space (e.g. addr = 0x3ffffff0, size
       > 0x10).
    
    It isn't very convenient to add size checks when disallowing certain
    regions, and it seems far safer to be sure and explicit about what
    userland is able to access, so invert the logic to allow certain regions
    instead, and fix the off by 1 errors and missing size checks. This also
    allows the get_fs() == KERNEL_DS check to be more easily optimised into
    the user address range case.
    
    We now have 3 such allowed regions:
    
     - The user address range (incorporating the get_fs() == KERNEL_DS
       check).
    
     - NULL (some kernel code expects this to work, and we'll always catch
       the fault anyway).
    
     - The core code memory region.
    
    Fixes: 373cd784d0fc ("metag: Memory handling")
    Reported-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: James Hogan <james.hogan@imgtec.com>
    Cc: linux-metag@vger.kernel.org
    Cc: stable@vger.kernel.org
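
The "allow certain regions" logic described in the metag access_ok() commit above can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the metag implementation: every `SKETCH_*` constant and function name is hypothetical, and the `limit` parameter stands in for the current address limit (which would extend further when get_fs() == KERNEL_DS).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address map, for illustration only. */
#define SKETCH_USER_BASE   0x08000000u  /* first user address */
#define SKETCH_USER_LIMIT  0x40000000u  /* first address past user space */
#define SKETCH_CORE_BASE   0x80000000u  /* start of core code memory */
#define SKETCH_CORE_END    0x80010000u  /* first address past core code */

/*
 * True when [addr, addr + size) lies inside [start, end).  The size is
 * compared against the remaining room instead of computing addr + size,
 * which could wrap around.
 */
static bool sketch_in_region(uint32_t addr, uint32_t size,
                             uint32_t start, uint32_t end)
{
    return addr >= start && addr < end && size <= end - addr;
}

/* Allow-list check: anything not explicitly allowed is rejected. */
static bool sketch_access_ok(uint32_t addr, uint32_t size, uint32_t limit)
{
    /* 1. The user address range, bounded by the current limit. */
    if (sketch_in_region(addr, size, SKETCH_USER_BASE, limit))
        return true;
    /* 2. NULL is let through; the access itself will fault. */
    if (addr == 0)
        return true;
    /* 3. The core code memory region. */
    return sketch_in_region(addr, size, SKETCH_CORE_BASE, SKETCH_CORE_END);
}
```

With these made-up constants, the case from the commit message (addr = 0x3ffffff0) is accepted for size 0x10 and rejected for anything larger, because the size now takes part in the check.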

commit a3a8e90a11e7d5245fc985c2e920b196b388022e
Author: Javier Martinez Canillas <javier@dowhile0.org>
Date:   Fri Apr 28 00:22:15 2017 -0400

    MAINTAINERS: Remove myself as reviewer for Exynos
    
    I left Samsung and lost access to most Exynos hardware and documentation.
    Also, I likely won't be able to keep an eye on the platform anymore in the
    short term so remove myself as a reviewer for Exynos.
    
    Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org>
    Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>

commit 1dbdcc810928a2c1acdd0bbfce9495f63610a0d1
Author: Timur Tabi <timur@codeaurora.org>
Date:   Mon May 1 14:23:04 2017 -0500

    selftests: watchdog: accept multiple params on command line
    
    Watchdog drivers are not required to retain programming information,
    such as timeouts, after the watchdog device is closed.  Therefore,
    the watchdog test should be able to perform multiple actions after
    opening the watchdog device.
    
    For example, to set the timeout to 10s and ping every 5s:
    
            watchdog-test -t 10 -p 5 -e
    
    Also, display the periodic decimal point only if the keep-alive call
    succeeds.
    
    Signed-off-by: Timur Tabi <timur@codeaurora.org>
    Reviewed-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>

commit 933dfbd7c437bbbf65caae785dfa105fbfaa8485
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue May 2 08:48:33 2017 -0700

    rcu: Open-code the rcu_cblist_n_lazy_cbs() function
    
    Because rcu_cblist_n_lazy_cbs() just samples the ->len_lazy counter,
    and because the rcu_cblist structure is quite straightforward, it makes
    sense to open-code rcu_cblist_n_lazy_cbs(p) as p->len_lazy, cutting out
    a level of indirection.  This commit makes this change.
    
    Reported-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>

commit 4b27f20b40a23f03df682eb1f69e9dc3da7d3b93
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue May 2 08:45:25 2017 -0700

    rcu: Open-code the rcu_cblist_n_cbs() function
    
    Because rcu_cblist_n_cbs() just samples the ->len counter, and
    because the rcu_cblist structure is quite straightforward, it makes
    sense to open-code rcu_cblist_n_cbs(p) as p->len, cutting out a level
    of indirection.  This commit makes this change.
    
    Reported-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>

commit 586f8525979ad9574bf61637fd58c98d5077f29d
Author: David Miller <davem@davemloft.net>
Date:   Tue May 2 11:36:45 2017 -0400

    bpf: Align packet data properly in program testing framework.
    
    Make sure we apply NET_IP_ALIGN when reserving headroom for SKB
    and XDP test runs, just like a real driver would.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>

commit 78e5227237cae9172dd50c3ebb08d4fb31530676
Author: David Miller <davem@davemloft.net>
Date:   Tue May 2 11:36:33 2017 -0400

    bpf: Do not dereference user pointer in bpf_test_finish().
    
    Instead, pass the kattr in which has a kernel side copy of this
    data structure from userspace already.
    
    Fix based upon a suggestion from Alexei Starovoitov.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>

commit 8ef0f37efb7863a04b1e4102d42b7c0b1a59d40f
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue May 2 08:18:40 2017 -0700

    rcu: Open-code the rcu_cblist_empty() function
    
    Because rcu_cblist_empty() just samples the ->head pointer, and
    because the rcu_cblist structure is quite straightforward, it makes
    sense to open-code rcu_cblist_empty(p) as !p->head, cutting out a
    level of indirection.  This commit makes this change.
    
    Reported-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>

commit 4e9c3a667135799d50f2778a8a8dae2ca13aafd0
Author: David S. Miller <davem@davemloft.net>
Date:   Tue May 2 07:52:01 2017 -0700

    selftests: bpf: Use bpf_endian.h in test_xdp.c
    
    This fixes the testcase on big-endian.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 24b43c99647bf9be4995e6a6c9c3a923c147770a
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Tue May 2 16:03:58 2017 +0200

    infiniband: avoid dereferencing uninitialized dst on error path
    
    With commit eea40b8f624f ("infiniband: call ipv6 route lookup
    via the stub interface"), if the route lookup fails due to
    ipv6 being disabled, the dst variable is left untouched, and
    the following dst_release() may access uninitialized memory.
    
    Since ipv6_dst_lookup() always sets dst to NULL in case of
    lookup failure with ipv6 enabled, fix the above by just
    returning the error code if the lookup fails.
    
    Fixes: eea40b8f624 ("infiniband: call ipv6 route lookup via the stub interface")
    Reported-by: Sabrina Dubroca <sd@queasysnail.net>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
    Signed-off-by: Doug Ledford <dledford@redhat.com>

commit 98059b98619d093366462ff0a4e1258e946accb9
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue May 2 06:30:12 2017 -0700

    rcu: Separately compile large rcu_segcblist functions
    
    This commit creates a new kernel/rcu/rcu_segcblist.c file that
    contains non-trivial segcblist functions.  Trivial functions
    remain as static inline functions in kernel/rcu/rcu_segcblist.h.
    
    Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>

commit 534c01ecdc458bf67e33b3815f1445025ab43719
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue May 2 10:16:05 2017 -0400

    audit: fix the RCU locking for the auditd_connection structure
    
    Cong Wang correctly pointed out that the RCU read locking of the
    auditd_connection struct was wrong; this patch corrects this by
    adopting a more traditional, and correct, RCU locking model.
    
    This patch is heavily based on an earlier prototype by Cong Wang.
    
    Cc: <stable@vger.kernel.org> # 4.11.x-
    Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
    Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit 8cc96382d9a7fe1746286670dd5140c3b12638ae
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue May 2 10:16:05 2017 -0400

    audit: use kmem_cache to manage the audit_buffer cache
    
    The audit subsystem implemented its own buffer cache mechanism which
    is a bit silly these days when we could use the kmem_cache construct.
    
    Some credit is due to Florian Westphal for originally proposing that
    we remove the audit cache implementation in favor of simple
    kmalloc()/kfree() calls, but I would rather have a dedicated slab
    cache to ease debugging and future stats/performance work.
    
    Cc: Florian Westphal <fw@strlen.de>
    Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit 2115bb250f260089743e26decfb5f271ba71ca37
Author: Deepa Dinamani <deepa.kernel@gmail.com>
Date:   Tue May 2 10:16:05 2017 -0400

    audit: Use timespec64 to represent audit timestamps
    
    struct timespec is not y2038 safe.
    Audit timestamps are recorded in string format into
    an audit buffer for a given context.
    These mark the entry timestamps for the syscalls.
    Use y2038 safe struct timespec64 to represent the times.
    The log strings can handle this transition as strings can
    hold up to 1024 characters.
    
    Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
    Reviewed-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: Paul Moore <paul@paul-moore.com>
    Acked-by: Richard Guy Briggs <rgb@redhat.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit b6c7c115c2ce679ac536f0adf0ff518fcd939196
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue May 2 10:16:05 2017 -0400

    audit: store the auditd PID as a pid struct instead of pid_t
    
    This is arguably the right thing to do, and will make it easier when
    we start supporting multiple audit daemons in different namespaces.
    
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit 45a0642b4d021a2f50d5db9c191b5bfe60bfa1c7
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue May 2 10:16:05 2017 -0400

    audit: kernel generated netlink traffic should have a portid of 0
    
    We were setting the portid incorrectly in the netlink message headers,
    fix that to always be 0 (nlmsg_pid = 0).
    
    Signed-off-by: Paul Moore <paul@paul-moore.com>
    Reviewed-by: Richard Guy Briggs <rgb@redhat.com>

commit a9d1620877748375cf60b43ef3fa5f61ab6d9f24
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue May 2 10:16:05 2017 -0400

    audit: combine audit_receive() and audit_receive_skb()
    
    There is no reason to have both of these functions, combine the two.
    
    Signed-off-by: Paul Moore <paul@paul-moore.com>
    Reviewed-by: Richard Guy Briggs <rgb@redhat.com>

commit bd120ded6a6af61ad342a8a95b36b64bd1e2f9e6
Author: Elena Reshetova <elena.reshetova@intel.com>
Date:   Tue May 2 10:16:05 2017 -0400

    audit: convert audit_watch.count from atomic_t to refcount_t
    
    The refcount_t type and its corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.
    
    Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
    Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: David Windsor <dwindsor@gmail.com>
    [PM: fix subject line, add #include]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit 9d2378f8c8f1a3fcfab681fd90c139d90dca7b69
Author: Elena Reshetova <elena.reshetova@intel.com>
Date:   Tue May 2 10:16:04 2017 -0400

    audit: convert audit_tree.count from atomic_t to refcount_t
    
    The refcount_t type and its corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.
    
    Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
    Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: David Windsor <dwindsor@gmail.com>
    [PM: fix subject line, add #include]
    Signed-off-by: Paul Moore <paul@paul-moore.com>
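
The overflow concern behind the two refcount_t conversions above can be illustrated with a deliberately simplified counter. This is a single-threaded, non-atomic sketch with made-up names (`sketch_ref`, `sketch_ref_inc`, `sketch_ref_dec_and_test`); it is not the kernel's refcount_t code, only an assumption-laden picture of why saturating is safer than wrapping.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical reference counter, not thread-safe. */
struct sketch_ref {
    unsigned int count;
};

/*
 * Take a reference.  A plain counter would wrap from UINT_MAX to 0 and
 * later be "released" while still in use; here it sticks at UINT_MAX.
 */
static void sketch_ref_inc(struct sketch_ref *r)
{
    if (r->count != UINT_MAX)
        r->count++;
}

/*
 * Drop a reference.  Returns true only when the last reference is gone
 * and the object may be freed.  A saturated or already-zero counter is
 * never reported as releasable, so the object leaks instead of being
 * freed early.
 */
static bool sketch_ref_dec_and_test(struct sketch_ref *r)
{
    if (r->count == UINT_MAX || r->count == 0)
        return false;
    return --r->count == 0;
}
```

The design trade-off is a possible memory leak on overflow in exchange for never freeing an object that still has users.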

commit 2173c519d5e912a6e2934bb04255fcd36c1591c8
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Tue May 2 10:16:04 2017 -0400

    audit: normalize NETFILTER_PKT
    
    Eliminate flipping in and out of message fields, dropping fields in the
    process.
    
    Sample raw message format IPv4 UDP:
    type=NETFILTER_PKT msg=audit(1487874761.386:228):  mark=0xae8a2732 saddr=127.0.0.1 daddr=127.0.0.1 proto=17^]
    Sample raw message format IPv6 ICMP6:
    type=NETFILTER_PKT msg=audit(1487874761.381:227):  mark=0x223894b7 saddr=::1 daddr=::1 proto=58^]
    
    Issue: https://github.com/linux-audit/audit-kernel/issues/11
    Test case: https://github.com/linux-audit/audit-testsuite/issues/43
    
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>
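
The point of the normalization commit above is that every NETFILTER_PKT record carries the same fields in the same order, whatever the packet looks like. A rough user-space sketch of that idea follows; `struct sketch_pkt`, `sketch_format_pkt` and the "?" placeholder for an unknown address are assumptions made for illustration and are not taken from xt_AUDIT.c.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical holder for the fields seen in the sample records. */
struct sketch_pkt {
    uint32_t mark;
    const char *saddr;   /* NULL when the address could not be read */
    const char *daddr;   /* NULL when the address could not be read */
    unsigned int proto;  /* IP protocol number, e.g. 17 for UDP */
};

/*
 * Always emit all four fields.  A missing value changes the field's
 * value, never whether the field is present.  Returns what snprintf
 * returns: the length the full string needs, excluding the NUL.
 */
static int sketch_format_pkt(char *buf, size_t len,
                             const struct sketch_pkt *p)
{
    return snprintf(buf, len,
                    "mark=0x%" PRIx32 " saddr=%s daddr=%s proto=%u",
                    p->mark,
                    p->saddr ? p->saddr : "?",
                    p->daddr ? p->daddr : "?",
                    p->proto);
}
```

A parser can then split on spaces and `=` without first checking which optional fields happen to be present.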

commit 0cb88b6ff054ccfa30e0fd7f7b42ee9f088db432
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Tue May 2 10:16:04 2017 -0400

    netfilter: use consistent ipv4 network offset in xt_AUDIT
    
    Even though the skb->data pointer has been moved from the link layer
    header to the network layer header, use the same method to calculate the
    offset in ipv4 and ipv6 routines.
    
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    [PM: munged subject line]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit f6276ac95bde4312251535904af32b1de9d54949
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Tue May 2 10:16:04 2017 -0400

    audit: log module name on delete_module
    
    When a sysadmin wishes to monitor module unloading with a syscall rule such as:
     -a always,exit -F arch=x86_64 -S delete_module -F key=mod-unload
    the SYSCALL record doesn't tell us what module was requested for unloading.
    
    Use the new KERN_MODULE auxiliary record to record it.
    The SYSCALL record result code will list the return code.
    
    See: https://github.com/linux-audit/audit-kernel/issues/37
        https://github.com/linux-audit/audit-kernel/issues/7
        https://github.com/linux-audit/audit-kernel/wiki/RFE-Module-Load-Record-Format
    
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    Acked-by: Jessica Yu <jeyu@redhat.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit 9aab4f4ea7a4ca80ec3e0269ce2eb71a24f6fef9
Author: Nicholas Mc Guire <der.herr@hofr.at>
Date:   Tue May 2 10:16:04 2017 -0400

    audit: remove unnecessary semicolon in audit_watch_handle_event()
    
    The excess ; after the closing parenthesis is just code noise; it has no
    effect and can be removed.
    
    Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
    [PM: tweaked subject line]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit b5239fba69949a44290d4af517fc1c2eff3e36f6
Author: Nicholas Mc Guire <der.herr@hofr.at>
Date:   Tue May 2 10:16:04 2017 -0400

    audit: remove unnecessary semicolon in audit_mark_handle_event()
    
    The excess ; after the closing parenthesis is just code noise; it has no
    effect and can be removed.
    
    Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
    [PM: tweaked subject line]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit b7a84deaf8d1b0e62b437a290a40d6380975f126
Author: Nicholas Mc Guire <der.herr@hofr.at>
Date:   Tue May 2 10:16:03 2017 -0400

    audit: remove unnecessary semicolon in audit_field_valid()
    
    The excess ; after the closing parenthesis is just code noise; it has no
    effect and can be removed.
    
    Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
    [PM: tweak subject line]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit b5d60989c6f7501af72cb65893c02621dd16fd84
Author: Jakub Kicinski <jakub.kicinski@netronome.com>
Date:   Mon May 1 15:53:43 2017 -0700

    xdp: fix parameter kdoc for extack
    
    Fix kdoc parameter spelling from extact to extack.
    
    Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit eb6211d3606971a957fea28f7532687f9d0f93f2
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Tue May 2 00:47:09 2017 +0200

    bpf, samples: fix build warning in cookie_uid_helper_example
    
    Fix the following warnings triggered by 51570a5ab2b7 ("A Sample of
    using socket cookie and uid for traffic monitoring"):
    
      In file included from /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:54:0:
      /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c: In function 'prog_load':
      /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:119:27: warning: overflow in implicit constant conversion [-Woverflow]
         -32 + offsetof(struct stats, uid)),
                               ^
      /home/foo/net-next/samples/bpf/libbpf.h:135:12: note: in definition of macro 'BPF_STX_MEM'
       .off   = OFF,     \
                ^
      /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:121:27: warning: overflow in implicit constant conversion [-Woverflow]
         -32 + offsetof(struct stats, packets), 1),
                               ^
      /home/foo/net-next/samples/bpf/libbpf.h:155:12: note: in definition of macro 'BPF_ST_MEM'
       .off   = OFF,     \
                ^
      /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:129:27: warning: overflow in implicit constant conversion [-Woverflow]
         -32 + offsetof(struct stats, bytes)),
                               ^
      /home/foo/net-next/samples/bpf/libbpf.h:135:12: note: in definition of macro 'BPF_STX_MEM'
       .off   = OFF,     \
                ^
      HOSTLD  /home/foo/net-next/samples/bpf/per_socket_stats_example
    
    Fixes: 51570a5ab2b7 ("A Sample of using socket cookie and uid for traffic monitoring")
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 45753c5f315749711b935a2506ee5c10eef5c23d
Author: Ingo Molnar <mingo@kernel.org>
Date:   Tue May 2 10:31:18 2017 +0200

    srcu: Debloat the <linux/rcu_segcblist.h> header
    
    Linus noticed that <linux/rcu_segcblist.h> has huge inline functions
    which should not be inline at all.
    
    As a first step in cleaning this up, move them all to kernel/rcu/ and
    only keep an absolute minimum of data type defines in the header:
    
      before:   -rw-r--r-- 1 mingo mingo 22284 May  2 10:25 include/linux/rcu_segcblist.h
       after:   -rw-r--r-- 1 mingo mingo  3180 May  2 10:22 include/linux/rcu_segcblist.h
    
    More can be done, such as uninlining the large functions, whose inlining
    is unjustified even if it's an RCU internal matter.
    
    Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

commit 5c9cfda13dc503f4b25b486a57974e0414adb6f1
Author: Karim Eshapa <karim.eshapa@gmail.com>
Date:   Tue May 2 13:47:54 2017 +0200

    drivers/video/fbdev/omap/lcd_mipid.c: Use time comparison kernel macros
    
    Use the kernel-defined time_before_eq() time comparison macro,
    which has a safety check.
    
    Signed-off-by: Karim Eshapa <karim.eshapa@gmail.com>
    Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
    Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
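
As background to the lcd_mipid commit above: tick counters such as jiffies eventually wrap, so a plain `a <= b` gives the wrong answer around the wrap point. The sketch below shows the usual subtraction trick in user-space C. The function names are made up, and it does not reproduce the kernel macro or its type checking; it assumes the two timestamps are less than half the counter range apart and that converting to `long` behaves as two's complement, as it does with GCC and Clang.

```c
#include <stdbool.h>

/* Naive comparison: breaks once the counter has wrapped past its maximum. */
static bool sketch_naive_before_eq(unsigned long a, unsigned long b)
{
    return a <= b;
}

/*
 * Wraparound-tolerant comparison: true when a is at or before b.
 * The unsigned subtraction wraps modulo 2^N, and reading the result as
 * a signed value tells us which side of a the timestamp b falls on.
 */
static bool sketch_time_before_eq(unsigned long a, unsigned long b)
{
    return (long)(b - a) >= 0;
}
```

For a timestamp taken just before the wrap and a deadline just after it, the naive version reports the deadline as already passed while the subtraction-based version does not.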

commit dc85e9a87420613b3129d5cc5ecd79c58351c546
Author: Alexey Khoroshilov <khoroshilov@ispras.ru>
Date:   Tue May 2 13:47:53 2017 +0200

    sm501fb: don't return zero on failure path in sm501fb_start()
    
    If fbmem iomemory mapping failed, sm501fb_start() breaks off
    initialization, deallocates resources, but returns zero.
    As a result, double deallocation can happen in sm501fb_stop().
    
    Found by Linux Driver Verification project (linuxtesting.org).
    
    Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
    Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
    Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>

commit 45f580c42e5c125d55dbd8099750a1998de3d917
Author: Maksim Salau <maksim.salau@gmail.com>
Date:   Tue May 2 13:47:53 2017 +0200

    video: fbdev: udlfb: Fix buffer on stack
    
    Allocate the buffer on the heap instead of the stack for the local
    array that is to be sent using usb_control_msg().
    
    Signed-off-by: Maksim Salau <maksim.salau@gmail.com>
    Cc: Bernie Thompson <bernie@plugable.com>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>

commit 2f9d31b67ed3342e7297878716196bcb5e88ab64
Author: Marcel Holtmann <marcel@holtmann.org>
Date:   Mon May 1 23:54:19 2017 -0700

    Bluetooth: Set LE Default PHY preferences
    
    If the LE Set Default PHY command is supported, then indicate to the
    controller that the host has no preferences for transmitter PHY or
    receiver PHY selection.
    
    Issuing this command gives the controller a clear indication that other
    PHY can be selected if available.
    
    Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>

commit d284f54e7ab1598b23dff8593401c3c497672ec9
Author: Marcel Holtmann <marcel@holtmann.org>
Date:   Mon May 1 23:54:18 2017 -0700

    Bluetooth: Enable LE PHY Update Complete event
    
    If either the LE Set Default PHY command or the LE Set PHY command is
    supported, then enable the LE PHY Update Complete event.
    
    Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>

commit 606f8eefacb58d3754a89d453744f78b0714f0b4
Author: Marcel Holtmann <marcel@holtmann.org>
Date:   Mon May 1 23:54:17 2017 -0700

    Bluetooth: Enable LE Channel Selection Algorithm event
    
    If the Channel Selection Algorithm #2 feature is supported, then enable
    the new LE Channel Selection Algorithm event.
    
    Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>

commit e371fd7607999fabbd955b4d22c8e912594a7997
Author: Julien Grall <julien.grall@arm.com>
Date:   Mon Apr 24 18:58:39 2017 +0100

    xen: Implement EFI reset_system callback
    
    When rebooting DOM0 with ACPI on ARM64, the kernel crashes with the stack
    trace below.
    
    This happens because, when EFI runtime services are enabled, the reset code
    (see machine_restart) will first try to use the EFI restart method.
    
    However, the EFI restart code expects the reset_system callback to
    always be set. This is not the case for Xen and leads to a crash.
    
    The EFI restart helper is used in multiple places and some of them do
    not have a fallback (see machine_power_off). So implement the reset_system
    callback as a call to xen_reboot when using EFI Xen.
    
    [   36.999270] reboot: Restarting system
    [   37.002921] Internal error: Attempting to execute userspace memory: 86000004 [#1] PREEMPT SMP
    [   37.011460] Modules linked in:
    [   37.014598] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 4.11.0-rc1-00003-g1e248b60a39b-dirty #506
    [   37.023903] Hardware name: (null) (DT)
    [   37.027734] task: ffff800902068000 task.stack: ffff800902064000
    [   37.033739] PC is at 0x0
    [   37.036359] LR is at efi_reboot+0x94/0xd0
    [   37.040438] pc : [<0000000000000000>] lr : [<ffff00000880f2c4>] pstate: 404001c5
    [   37.047920] sp : ffff800902067cf0
    [   37.051314] x29: ffff800902067cf0 x28: ffff800902068000
    [   37.056709] x27: ffff000008992000 x26: 000000000000008e
    [   37.062104] x25: 0000000000000123 x24: 0000000000000015
    [   37.067499] x23: 0000000000000000 x22: ffff000008e6e250
    [   37.072894] x21: ffff000008e6e000 x20: 0000000000000000
    [   37.078289] x19: ffff000008e5d4c8 x18: 0000000000000010
    [   37.083684] x17: 0000ffffa7c27470 x16: 00000000deadbeef
    [   37.089079] x15: 0000000000000006 x14: ffff000088f42bef
    [   37.094474] x13: ffff000008f42bfd x12: ffff000008e706c0
    [   37.099870] x11: ffff000008e70000 x10: 0000000005f5e0ff
    [   37.105265] x9 : ffff800902067a50 x8 : 6974726174736552
    [   37.110660] x7 : ffff000008cc6fb8 x6 : ffff000008cc6fb0
    [   37.116055] x5 : ffff000008c97dd8 x4 : 0000000000000000
    [   37.121453] x3 : 0000000000000000 x2 : 0000000000000000
    [   37.126845] x1 : 0000000000000000 x0 : 0000000000000000
    [   37.132239]
    [   37.133808] Process systemd-shutdow (pid: 1, stack limit = 0xffff800902064000)
    [   37.141118] Stack: (0xffff800902067cf0 to 0xffff800902068000)
    [   37.146949] 7ce0:                                   ffff800902067d40 ffff000008085334
    [   37.154869] 7d00: 0000000000000000 ffff000008f3b000 ffff800902067d40 ffff0000080852e0
    [   37.162787] 7d20: ffff000008cc6fb0 ffff000008cc6fb8 ffff000008c7f580 ffff000008c97dd8
    [   37.170706] 7d40: ffff800902067d60 ffff0000080e2c2c 0000000000000000 0000000001234567
    [   37.178624] 7d60: ffff800902067d80 ffff0000080e2ee8 0000000000000000 ffff0000080e2df4
    [   37.186544] 7d80: 0000000000000000 ffff0000080830f0 0000000000000000 00008008ff1c1000
    [   37.194462] 7da0: ffffffffffffffff 0000ffffa7c4b1cc 0000000000000000 0000000000000024
    [   37.202380] 7dc0: ffff800902067dd0 0000000000000005 0000fffff24743c8 0000000000000004
    [   37.210299] 7de0: 0000fffff2475f03 0000000000000010 0000fffff2474418 0000000000000005
    [   37.218218] 7e00: 0000fffff2474578 000000000000000a 0000aaaad6b722c0 0000000000000001
    [   37.226136] 7e20: 0000000000000123 0000000000000038 ffff800902067e50 ffff0000081e7294
    [   37.234055] 7e40: ffff800902067e60 ffff0000081e935c ffff800902067e60 ffff0000081e9388
    [   37.241973] 7e60: ffff800902067eb0 ffff0000081ea388 0000000000000000 00008008ff1c1000
    [   37.249892] 7e80: ffffffffffffffff 0000ffffa7c4a79c 0000000000000000 ffff000000020000
    [   37.257810] 7ea0: 0000010000000004 0000000000000000 0000000000000000 ffff0000080830f0
    [   37.265729] 7ec0: fffffffffee1dead 0000000028121969 0000000001234567 0000000000000000
    [   37.273651] 7ee0: ffffffffffffffff 8080000000800000 0000800000008080 feffa9a9d4ff2d66
    [   37.281567] 7f00: 000000000000008e feffa9a9d5b60e0f 7f7fffffffff7f7f 0101010101010101
    [   37.289485] 7f20: 0000000000000010 0000000000000008 000000000000003a 0000ffffa7ccf588
    [   37.297404] 7f40: 0000aaaad6b87d00 0000ffffa7c4b1b0 0000fffff2474be0 0000aaaad6b88000
    [   37.305326] 7f60: 0000fffff2474fb0 0000000001234567 0000000000000000 0000000000000000
    [   37.313240] 7f80: 0000000000000000 0000000000000001 0000aaaad6b70d4d 0000000000000000
    [   37.321159] 7fa0: 0000000000000001 0000fffff2474ea0 0000aaaad6b5e2e0 0000fffff2474e80
    [   37.329078] 7fc0: 0000ffffa7c4b1cc 0000000000000000 fffffffffee1dead 000000000000008e
    [   37.336997] 7fe0: 0000000000000000 0000000000000000 9ce839cffee77eab fafdbf9f7ed57f2f
    [   37.344911] Call trace:
    [   37.347437] Exception stack(0xffff800902067b20 to 0xffff800902067c50)
    [   37.353970] 7b20: ffff000008e5d4c8 0001000000000000 0000000080f82000 0000000000000000
    [   37.361883] 7b40: ffff800902067b60 ffff000008e17000 ffff000008f44c68 00000001081081b4
    [   37.369802] 7b60: ffff800902067bf0 ffff000008108478 0000000000000000 ffff000008c235b0
    [   37.377721] 7b80: ffff800902067ce0 0000000000000000 0000000000000000 0000000000000015
    [   37.385643] 7ba0: 0000000000000123 000000000000008e ffff000008992000 ffff800902068000
    [   37.393557] 7bc0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    [   37.401477] 7be0: 0000000000000000 ffff000008c97dd8 ffff000008cc6fb0 ffff000008cc6fb8
    [   37.409396] 7c00: 6974726174736552 ffff800902067a50 0000000005f5e0ff ffff000008e70000
    [   37.417318] 7c20: ffff000008e706c0 ffff000008f42bfd ffff000088f42bef 0000000000000006
    [   37.425234] 7c40: 00000000deadbeef 0000ffffa7c27470
    [   37.430190] [<          (null)>]           (null)
    [   37.434982] [<ffff000008085334>] machine_restart+0x6c/0x70
    [   37.440550] [<ffff0000080e2c2c>] kernel_restart+0x6c/0x78
    [   37.446030] [<ffff0000080e2ee8>] SyS_reboot+0x130/0x228
    [   37.451337] [<ffff0000080830f0>] el0_svc_naked+0x24/0x28
    [   37.456737] Code: bad PC value
    [   37.459891] ---[ end trace 76e2fc17e050aecd ]---
    
    Signed-off-by: Julien Grall <julien.grall@arm.com>
    
    --
    
    Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Cc: Juergen Gross <jgross@suse.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: x86@kernel.org
    
    The x86 code theoretically has a similar issue, although EFI does not
    seem to be the preferred method. I have only build-tested it on x86.
    
    This should probably also be fixed in the stable tree.
    
        Changes in v2:
            - Implement xen_efi_reset_system using xen_reboot
            - Move xen_efi_reset_system in drivers/xen/efi.c
    Signed-off-by: Juergen Gross <jgross@suse.com>

commit fa12a870a9d594ba458242a04a4d17a76fc816a4
Author: Julien Grall <julien.grall@arm.com>
Date:   Mon Apr 24 18:58:38 2017 +0100

    arm/xen: Consolidate calls to shutdown hypercall in a single helper
    
    Signed-off-by: Julien Grall <julien.grall@arm.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>

commit 5d9404e1185de8d508cd042761306495f727d7eb
Author: Julien Grall <julien.grall@arm.com>
Date:   Mon Apr 24 18:58:37 2017 +0100

    xen: Export xen_reboot
    
    The helper xen_reboot will be called by the EFI code in a later patch.
    
    Note that the ARM version does not yet exist and will be added in a
    later patch too.
    
    Signed-off-by: Julien Grall <julien.grall@arm.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>

commit f31b969217b42df605b2e0e64aa6b3e03e781a4f
Author: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Date:   Wed Apr 26 09:42:48 2017 -0400

    xen/x86: Call xen_smp_intr_init_pv() on BSP
    
    Recent code rework that split handling of PV, HVM and PVH guests into
    separate files missed calling xen_smp_intr_init_pv() on CPU0.
    
    Add this call.
    
    Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
    Signed-off-by: Juergen Gross <jgross@suse.com>

commit 84d582d236dc1f9085e741affc72e9ba061a67c2
Author: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Date:   Mon Apr 24 15:04:53 2017 -0400

    xen: Revert commits da72ff5bfcb0 and 72a9b186292d
    
    Recent discussion (http://marc.info/?l=xen-devel&m=149192184523741)
    established that commit 72a9b186292d ("xen: Remove event channel
    notification through Xen PCI platform device") (and thus commit
    da72ff5bfcb0 ("partially revert "xen: Remove event channel
    notification through Xen PCI platform device"")) are unnecessary and,
    in fact, prevent HVM guests from booting on Xen releases prior to 4.0.
    
    Therefore we revert both of those commits.
    
    The summary of that discussion is below:
    
      Here is the brief summary of the current situation:
    
      Before the offending commit (72a9b186292):
    
      1) INTx does not work because of the reset_watches path.
      2) The reset_watches path is only taken if you have Xen > 4.0
      3) The Linux kernel by default will use vector injection if the
         hypervisor supports it. So even though INTx does not work, nobody
         running the kernel with Xen > 4.0 would notice, unless they
         explicitly disabled this feature either in the kernel or in Xen
         (and this can only be disabled by modifying the code; there is no
         user-supported way to do it).
    
      After the offending commit (+ partial revert):
    
      1) INTx is no longer supported for HVM (only for PV guests).
      2) An HVM guest kernel will not boot on Xen < 4.0, which does not
         have vector injection support, since the only other supported
         mode is INTx, which (per point 1) no longer works for HVM.
    
      So based on this summary, I think before commit (72a9b186292) we were
      in much better position from a user point of view.
    
    Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Reviewed-by: Juergen Gross <jgross@suse.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>

commit 5f6a1614fab801834e32b420b60acdc27acfcdec
Author: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Date:   Fri Apr 21 11:13:14 2017 -0400

    xen/pvh: Do not fill kernel's e820 map in init_pvh_bootparams()
    
    e820 map is updated with information from the zeropage (i.e. pvh_bootparams)
    by default_machine_specific_memory_setup(). With the way things are done
    now, we end up with a duplicated e820 map.
    
    Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Reviewed-by: Juergen Gross <jgross@suse.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>

commit 6483e3135a693548874429db901c0544d3a9b4cd
Author: Geliang Tang <geliangtang@gmail.com>
Date:   Sat Apr 22 09:21:13 2017 +0800

    xen/scsifront: use offset_in_page() macro
    
    Use offset_in_page() macro instead of open-coding.
    
    Signed-off-by: Geliang Tang <geliangtang@gmail.com>
    Reviewed-by: Juergen Gross <jgross@suse.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>

commit d5ff5061c35448525fcb38950f06af6b9ae12c04
Author: Stefano Stabellini <sstabellini@kernel.org>
Date:   Thu Apr 13 14:04:22 2017 -0700

    xen/arm,arm64: rename __generic_dma_ops to xen_get_dma_ops
    
    Now that __generic_dma_ops is a xen specific function, rename it to
    xen_get_dma_ops. Change all the call sites appropriately.
    
    Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
    Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    CC: linux@armlinux.org.uk
    CC: catalin.marinas@arm.com
    CC: will.deacon@arm.com
    CC: boris.ostrovsky@oracle.com
    CC: jgross@suse.com
    CC: Julien Grall <julien.grall@arm.com>

commit e058632670b709145730a134acc3f83f392f7aa7
Author: Stefano Stabellini <sstabellini@kernel.org>
Date:   Thu Apr 13 14:04:21 2017 -0700

    xen/arm,arm64: fix xen_dma_ops after 815dd18 "Consolidate get_dma_ops..."
    
    The following commit:
    
      commit 815dd18788fe0d41899f51b91d0560279cf16b0d
      Author: Bart Van Assche <bart.vanassche@sandisk.com>
      Date:   Fri Jan 20 13:04:04 2017 -0800
    
          treewide: Consolidate get_dma_ops() implementations
    
    rearranges get_dma_ops in a way that xen_dma_ops are not returned when
    running on Xen anymore, dev->dma_ops is returned instead (see
    arch/arm/include/asm/dma-mapping.h:get_arch_dma_ops and
    include/linux/dma-mapping.h:get_dma_ops).
    
    Fix the problem by storing dev->dma_ops in dev_archdata, and setting
    dev->dma_ops to xen_dma_ops. This way, xen_dma_ops is returned naturally
    by get_dma_ops. The Xen code can retrieve the original dev->dma_ops from
    dev_archdata when needed. It also allows us to remove __generic_dma_ops
    from common headers.
    
    Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
    Tested-by: Julien Grall <julien.grall@arm.com>
    Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Cc: <stable@vger.kernel.org>        [4.11+]
    CC: linux@armlinux.org.uk
    CC: catalin.marinas@arm.com
    CC: will.deacon@arm.com
    CC: boris.ostrovsky@oracle.com
    CC: jgross@suse.com
    CC: Julien Grall <julien.grall@arm.com>

commit 4a806016223c6df131280c82f1ed69c820b6a9ff
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Wed Apr 19 19:06:39 2017 +0200

    xen/9pfs: select CONFIG_XEN_XENBUS_FRONTEND
    
    All Xen frontends need to select this symbol to avoid a link error:
    
    net/built-in.o: In function `p9_trans_xen_init':
    :(.text+0x149e9c): undefined reference to `__xenbus_register_frontend'
    
    Fixes: d4b40a02f837 ("xen/9pfs: build 9pfs Xen transport driver")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

commit 65f9d65443b5512f74248a3eb56731fbb0b337b8
Author: Juergen Gross <jgross@suse.com>
Date:   Thu Apr 13 09:42:03 2017 +0200

    x86/cpu: remove hypervisor specific set_cpu_features
    
    There is no user of x86_hyper->set_cpu_features() any more. Remove it.
    
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: x86@kernel.org
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>

commit d40342a2ac035444897e5952ea72a50440a2a028
Author: Juergen Gross <jgross@suse.com>
Date:   Thu Apr 13 09:37:20 2017 +0200

    vmware: set cpu capabilities during platform initialization
    
    There is no need to set the same capabilities for each cpu
    individually. This can be done for all cpus in platform initialization.
    
    Cc: Alok Kataria <akataria@vmware.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: x86@kernel.org
    Cc: virtualization@lists.linux-foundation.org
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Acked-by: Alok Kataria <akataria@vmware.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>

commit 6807cf65f5ba6f2902ab64355d71506b9c14a9dd
Author: Juergen Gross <jgross@suse.com>
Date:   Wed Apr 12 15:12:09 2017 +0200

    x86/xen: use capabilities instead of fake cpuid values for xsave
    
    When running as pv domain xen_cpuid() is being used instead of
    native_cpuid(). In xen_cpuid() the xsave feature availability is
    indicated by special casing the related cpuid leaf.
    
    Instead of delivering fake cpuid values set or clear the cpu
    capability bits for xsave instead.
    
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>

commit e657fccb799b970bd1f152e22e13f20e0de7adb5
Author: Juergen Gross <jgross@suse.com>
Date:   Wed Apr 12 12:45:57 2017 +0200

    x86/xen: use capabilities instead of fake cpuid values for x2apic
    
    When running as pv domain xen_cpuid() is being used instead of
    native_cpuid(). In xen_cpuid() the x2apic feature is indicated as not
    being present by special casing the related cpuid leaf.
    
    Instead of delivering fake cpuid values clear the cpu capability bit
    for x2apic instead.
    
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>

commit ea01598b4bc453ed513e9d482a43c14557042ec9
Author: Juergen Gross <jgross@suse.com>
Date:   Wed Apr 12 12:37:00 2017 +0200

    x86/xen: use capabilities instead of fake cpuid values for mwait
    
    When running as pv domain xen_cpuid() is being used instead of
    native_cpuid(). In xen_cpuid() the mwait feature is indicated to be
    present or not by special casing the related cpuid leaf.
    
    Instead of delivering fake cpuid values use the cpu capability bit
    for mwait instead.
    
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>

commit b778d6bf63e0a07ceb2258ce9bd996dbaaa11bfa
Author: Juergen Gross <jgross@suse.com>
Date:   Wed Apr 12 09:27:47 2017 +0200

    x86/xen: use capabilities instead of fake cpuid values for acpi
    
    When running as pv domain xen_cpuid() is being used instead of
    native_cpuid(). In xen_cpuid() the acpi feature is indicated as not
    being present by special casing the related cpuid leaf in case we
    are not the initial domain.
    
    Instead of delivering fake cpuid values clear the cpu capability bit
    for acpi instead.
    
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>

commit aa1071562937d1b66b07db48e3dbafe136027f01
Author: Juergen Gross <jgross@suse.com>
Date:   Wed Apr 12 09:24:01 2017 +0200

    x86/xen: use capabilities instead of fake cpuid values for acc
    
    When running as pv domain xen_cpuid() is being used instead of
    native_cpuid(). In xen_cpuid() the acc feature (thermal monitoring)
    is indicated as not being present by special casing the related
    cpuid leaf.
    
    Instead of delivering fake cpuid values clear the cpu capability bit
    for acc instead.
    
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Signed-off-by: Juergen Gross <jgross@suse.com>

commit 88f3256f21d958d0773bf93523ad12d2ddaf3006
Author: Juergen Gross <jgross@suse.com>
Date:   Wed Apr 12 09:21:05 2017 +0200

    x86/xen: use capabilities instead of fake cpuid values for mtrr
    
    When running as pv domain xen_cpuid() is being used instead of
    native_cpuid(). In xen_cpuid() the mtrr feature is indicated as not
    being present by special casing the related cpuid leaf.
    
    Instead of delivering fake cpuid values clear the cpu capability bit
    for mtrr instead.
    
    Signed-off-by: Juergen Gross <jgro…
pcmoore pushed a commit that referenced this issue May 8, 2017
mipsxx_pmu_handle_shared_irq() calls irq_work_run() while holding the
pmuint_rwlock for read.  irq_work_run() can, via perf_pending_event(),
call try_to_wake_up() which can try to take rq->lock.

However, perf can also call perf_pmu_enable() (and thus take the
pmuint_rwlock for write) while holding the rq->lock, from
finish_task_switch() via perf_event_context_sched_in().

This leads to an ABBA deadlock:

 PID: 3855   TASK: 8f7ce288  CPU: 2   COMMAND: "process"
  #0 [89c39ac8] __delay at 803b5be4
  #1 [89c39ac8] do_raw_spin_lock at 8008fdcc
  #2 [89c39af8] try_to_wake_up at 8006e47c
  #3 [89c39b38] pollwake at 8018eab0
  #4 [89c39b68] __wake_up_common at 800879f4
  #5 [89c39b98] __wake_up at 800880e4
  #6 [89c39bc8] perf_event_wakeup at 8012109c
  #7 [89c39be8] perf_pending_event at 80121184
  #8 [89c39c08] irq_work_run_list at 801151f0
  #9 [89c39c38] irq_work_run at 80115274
 #10 [89c39c50] mipsxx_pmu_handle_shared_irq at 8002cc7c

 PID: 1481   TASK: 8eaac6a8  CPU: 3   COMMAND: "process"
  #0 [8de7f900] do_raw_write_lock at 800900e0
  #1 [8de7f918] perf_event_context_sched_in at 80122310
  #2 [8de7f938] __perf_event_task_sched_in at 80122608
  #3 [8de7f958] finish_task_switch at 8006b8a4
  #4 [8de7f998] __schedule at 805e4dc4
  #5 [8de7f9f8] schedule at 805e5558
  #6 [8de7fa10] schedule_hrtimeout_range_clock at 805e9984
  #7 [8de7fa70] poll_schedule_timeout at 8018e8f8
  #8 [8de7fa88] do_select at 8018f338
  #9 [8de7fd88] core_sys_select at 8018f5cc
 #10 [8de7fee0] sys_select at 8018f854
 #11 [8de7ff28] syscall_common at 80028fc8

The lock seems to be there to protect the hardware counters so there is
no need to hold it across irq_work_run().
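
The ordering change described above can be illustrated with a small user-space sketch. It is only a model: `pmu_lock_held`, `run_pending_work()` and `handle_shared_irq()` are hypothetical stand-ins for the read-locked section, `irq_work_run()` and the interrupt handler, and no real kernel locking is involved. The point is that the deferred work, which may end up taking rq->lock, runs only after the counter lock has been dropped.

```c
#include <stdbool.h>

/* Hypothetical model: true while "pmuint_rwlock" is held for read. */
bool pmu_lock_held;
/* Set if deferred work ever runs while the lock is held (the bug). */
bool work_ran_under_lock;
/* Number of times the deferred work has run. */
int work_runs;

/* Stand-in for irq_work_run(): in the kernel this may take rq->lock. */
void run_pending_work(void)
{
    if (pmu_lock_held)
        work_ran_under_lock = true;
    work_runs++;
}

/* Stand-in for the interrupt handler, in the order the commit describes. */
void handle_shared_irq(void)
{
    pmu_lock_held = true;   /* models read_lock() on the counter lock */
    /* ... read and reset the hardware counters here ... */
    pmu_lock_held = false;  /* models read_unlock() */

    /* Deferred work runs with the counter lock already released. */
    run_pending_work();
}
```

With this ordering the handler never holds the counter lock while acquiring rq->lock, so the two lock orders shown in the traces can no longer cross.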

Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
pcmoore pushed a commit to linux-audit/audit-testsuite that referenced this issue Jun 1, 2017
Test for simplified normalized NETFILTER_PKT audit message.
Check for receipt of each nfmarked packet and for correct number of fields.

See: linux-audit/audit-kernel#11

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
pcmoore pushed a commit that referenced this issue Oct 22, 2018
This reverts commit e70a3aa.

This change causes use-after-free on dst->_metrics.
The crash trace looks like this:
[   97.763269] BUG: KASAN: use-after-free in ip6_mtu+0x116/0x140
[   97.769038] Read of size 4 at addr ffff881781d2cf84 by task svw_NetThreadEv/8801

[   97.777954] CPU: 76 PID: 8801 Comm: svw_NetThreadEv Not tainted 4.15.0-smp-DEV #11
[   97.777956] Hardware name: Default string Default string/Indus_QC_02, BIOS 5.46.4 03/29/2018
[   97.777957] Call Trace:
[   97.777971]  [<ffffffff895709db>] dump_stack+0x4d/0x72
[   97.777985]  [<ffffffff881651df>] print_address_description+0x6f/0x260
[   97.777997]  [<ffffffff88165747>] kasan_report+0x257/0x370
[   97.778001]  [<ffffffff894488e6>] ? ip6_mtu+0x116/0x140
[   97.778004]  [<ffffffff881658b9>] __asan_report_load4_noabort+0x19/0x20
[   97.778008]  [<ffffffff894488e6>] ip6_mtu+0x116/0x140
[   97.778013]  [<ffffffff892bb91e>] tcp_current_mss+0x12e/0x280
[   97.778016]  [<ffffffff892bb7f0>] ? tcp_mtu_to_mss+0x2d0/0x2d0
[   97.778022]  [<ffffffff887b45b8>] ? depot_save_stack+0x138/0x4a0
[   97.778037]  [<ffffffff87c38985>] ? __mmdrop+0x145/0x1f0
[   97.778040]  [<ffffffff881643b1>] ? save_stack+0xb1/0xd0
[   97.778046]  [<ffffffff89264c82>] tcp_send_mss+0x22/0x220
[   97.778059]  [<ffffffff89273a49>] tcp_sendmsg_locked+0x4f9/0x39f0
[   97.778062]  [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20
[   97.778066]  [<ffffffff89273550>] ? tcp_sendpage+0x60/0x60
[   97.778070]  [<ffffffff881cb359>] ? rw_copy_check_uvector+0x69/0x280
[   97.778075]  [<ffffffff8873c65f>] ? import_iovec+0x9f/0x430
[   97.778078]  [<ffffffff88164be7>] ? kasan_slab_free+0x87/0xc0
[   97.778082]  [<ffffffff8873c5c0>] ? memzero_page+0x140/0x140
[   97.778085]  [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20
[   97.778088]  [<ffffffff89276f6c>] tcp_sendmsg+0x2c/0x50
[   97.778092]  [<ffffffff89276f6c>] ? tcp_sendmsg+0x2c/0x50
[   97.778098]  [<ffffffff89352d43>] inet_sendmsg+0x103/0x480
[   97.778102]  [<ffffffff89352c40>] ? inet_gso_segment+0x15b0/0x15b0
[   97.778105]  [<ffffffff890294da>] sock_sendmsg+0xba/0xf0
[   97.778108]  [<ffffffff8902ab6a>] ___sys_sendmsg+0x6ca/0x8e0
[   97.778113]  [<ffffffff87dccac1>] ? hrtimer_try_to_cancel+0x71/0x3b0
[   97.778116]  [<ffffffff8902a4a0>] ? copy_msghdr_from_user+0x3d0/0x3d0
[   97.778119]  [<ffffffff881646d1>] ? memset+0x31/0x40
[   97.778123]  [<ffffffff87a0cff5>] ? schedule_hrtimeout_range_clock+0x165/0x380
[   97.778127]  [<ffffffff87a0ce90>] ? hrtimer_nanosleep_restart+0x250/0x250
[   97.778130]  [<ffffffff87dcc700>] ? __hrtimer_init+0x180/0x180
[   97.778133]  [<ffffffff87dd1f82>] ? ktime_get_ts64+0x172/0x200
[   97.778137]  [<ffffffff8822b8ec>] ? __fget_light+0x8c/0x2f0
[   97.778141]  [<ffffffff8902d5c6>] __sys_sendmsg+0xe6/0x190
[   97.778144]  [<ffffffff8902d5c6>] ? __sys_sendmsg+0xe6/0x190
[   97.778147]  [<ffffffff8902d4e0>] ? SyS_shutdown+0x20/0x20
[   97.778152]  [<ffffffff87cd4370>] ? wake_up_q+0xe0/0xe0
[   97.778155]  [<ffffffff8902d670>] ? __sys_sendmsg+0x190/0x190
[   97.778158]  [<ffffffff8902d683>] SyS_sendmsg+0x13/0x20
[   97.778162]  [<ffffffff87a1600c>] do_syscall_64+0x2ac/0x430
[   97.778166]  [<ffffffff87c17515>] ? do_page_fault+0x35/0x3d0
[   97.778171]  [<ffffffff8960131f>] ? page_fault+0x2f/0x50
[   97.778174]  [<ffffffff89600071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   97.778177] RIP: 0033:0x7f83fa36000d
[   97.778178] RSP: 002b:00007f83ef9229e0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[   97.778180] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f83fa36000d
[   97.778182] RDX: 0000000000004000 RSI: 00007f83ef922f00 RDI: 0000000000000036
[   97.778183] RBP: 00007f83ef923040 R08: 00007f83ef9231f8 R09: 00007f83ef923168
[   97.778184] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f83f69c5b40
[   97.778185] R13: 000000000000001c R14: 0000000000000001 R15: 0000000000004000

[   97.779684] Allocated by task 5919:
[   97.783185]  save_stack+0x46/0xd0
[   97.783187]  kasan_kmalloc+0xad/0xe0
[   97.783189]  kmem_cache_alloc_trace+0xdf/0x580
[   97.783190]  ip6_convert_metrics.isra.79+0x7e/0x190
[   97.783192]  ip6_route_info_create+0x60a/0x2480
[   97.783193]  ip6_route_add+0x1d/0x80
[   97.783195]  inet6_rtm_newroute+0xdd/0xf0
[   97.783198]  rtnetlink_rcv_msg+0x641/0xb10
[   97.783200]  netlink_rcv_skb+0x27b/0x3e0
[   97.783202]  rtnetlink_rcv+0x15/0x20
[   97.783203]  netlink_unicast+0x4be/0x720
[   97.783204]  netlink_sendmsg+0x7bc/0xbf0
[   97.783205]  sock_sendmsg+0xba/0xf0
[   97.783207]  ___sys_sendmsg+0x6ca/0x8e0
[   97.783208]  __sys_sendmsg+0xe6/0x190
[   97.783209]  SyS_sendmsg+0x13/0x20
[   97.783211]  do_syscall_64+0x2ac/0x430
[   97.783213]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

[   97.784709] Freed by task 0:
[   97.785056] knetbase: Error: /proc/sys/net/core/txcs_enable does not exist
[   97.794497]  save_stack+0x46/0xd0
[   97.794499]  kasan_slab_free+0x71/0xc0
[   97.794500]  kfree+0x7c/0xf0
[   97.794501]  fib6_info_destroy_rcu+0x24f/0x310
[   97.794504]  rcu_process_callbacks+0x38b/0x1730
[   97.794506]  __do_softirq+0x1c8/0x5d0

Reported-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
comps pushed a commit to comps/audit-test that referenced this issue Jul 4, 2019
This renders the audit-searching code mostly useless, but it
reflects the level of support (or lack thereof) for NETFILTER_PKT
that newer kernel versions have.

linux-audit/audit-kernel#11

RHBZ#1382494

Signed-off-by: Jiri Jaburek <jjaburek@redhat.com>
pcmoore pushed a commit that referenced this issue Sep 17, 2019
Revert the commit bd293d0. The proper
fix has been made available with commit d0a255e ("loop: set
PF_MEMALLOC_NOIO for the worker thread").

Note that the fix offered by commit bd293d0 doesn't really prevent
the deadlock from occurring - if we look at the stacktrace reported by
Junxiao Bi, we see that it hangs in bit_wait_io and not on the mutex -
i.e. it has already successfully taken the mutex. Changing the mutex
from mutex_lock to mutex_trylock won't help with deadlocks that happen
afterwards.

PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
   #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
   #1 [ffff8813dedfb990] schedule at ffffffff8173fa27
   #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
   #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
   #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
   #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
   #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
   #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
   #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
   #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
  #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
  #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
  #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
  #13 [ffff8813dedfbec0] kthread at ffffffff810a8428
  #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Fixes: bd293d0 ("dm bufio: fix deadlock with loop device")
Depends-on: d0a255e ("loop: set PF_MEMALLOC_NOIO for the worker thread")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
pcmoore pushed a commit that referenced this issue Jan 27, 2020
Some time ago the block layer was modified such that timeout handlers are
called from thread context instead of interrupt context. Make it safe to
run the iSCSI timeout handler in thread context. This patch fixes the
following lockdep complaint:

================================
WARNING: inconsistent lock state
5.5.1-dbg+ #11 Not tainted
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
kworker/7:1H/206 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff88802d9827e8 (&(&session->frwd_lock)->rlock){+.?.}, at: iscsi_eh_cmd_timed_out+0xa6/0x6d0 [libiscsi]
{IN-SOFTIRQ-W} state was registered at:
  lock_acquire+0x106/0x240
  _raw_spin_lock+0x38/0x50
  iscsi_check_transport_timeouts+0x3e/0x210 [libiscsi]
  call_timer_fn+0x132/0x470
  __run_timers.part.0+0x39f/0x5b0
  run_timer_softirq+0x63/0xc0
  __do_softirq+0x12d/0x5fd
  irq_exit+0xb3/0x110
  smp_apic_timer_interrupt+0x131/0x3d0
  apic_timer_interrupt+0xf/0x20
  default_idle+0x31/0x230
  arch_cpu_idle+0x13/0x20
  default_idle_call+0x53/0x60
  do_idle+0x38a/0x3f0
  cpu_startup_entry+0x24/0x30
  start_secondary+0x222/0x290
  secondary_startup_64+0xa4/0xb0
irq event stamp: 1383705
hardirqs last  enabled at (1383705): [<ffffffff81aace5c>] _raw_spin_unlock_irq+0x2c/0x50
hardirqs last disabled at (1383704): [<ffffffff81aacb98>] _raw_spin_lock_irq+0x18/0x50
softirqs last  enabled at (1383690): [<ffffffffa0e2efea>] iscsi_queuecommand+0x76a/0xa20 [libiscsi]
softirqs last disabled at (1383682): [<ffffffffa0e2e998>] iscsi_queuecommand+0x118/0xa20 [libiscsi]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&session->frwd_lock)->rlock);
  <Interrupt>
    lock(&(&session->frwd_lock)->rlock);

 *** DEADLOCK ***

2 locks held by kworker/7:1H/206:
 #0: ffff8880d57bf928 ((wq_completion)kblockd){+.+.}, at: process_one_work+0x472/0xab0
 #1: ffff88802b9c7de8 ((work_completion)(&q->timeout_work)){+.+.}, at: process_one_work+0x476/0xab0

stack backtrace:
CPU: 7 PID: 206 Comm: kworker/7:1H Not tainted 5.5.1-dbg+ #11
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: kblockd blk_mq_timeout_work
Call Trace:
 dump_stack+0xa5/0xe6
 print_usage_bug.cold+0x232/0x23b
 mark_lock+0x8dc/0xa70
 __lock_acquire+0xcea/0x2af0
 lock_acquire+0x106/0x240
 _raw_spin_lock+0x38/0x50
 iscsi_eh_cmd_timed_out+0xa6/0x6d0 [libiscsi]
 scsi_times_out+0xf4/0x440 [scsi_mod]
 scsi_timeout+0x1d/0x20 [scsi_mod]
 blk_mq_check_expired+0x365/0x3a0
 bt_iter+0xd6/0xf0
 blk_mq_queue_tag_busy_iter+0x3de/0x650
 blk_mq_timeout_work+0x1af/0x380
 process_one_work+0x56d/0xab0
 worker_thread+0x7a/0x5d0
 kthread+0x1bc/0x210
 ret_from_fork+0x24/0x30

Fixes: 287922e ("block: defer timeouts to a workqueue")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Chris Leech <cleech@redhat.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191209173457.187370-1-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
pcmoore pushed a commit that referenced this issue Jan 27, 2020
Properly initialize refcount to 1 when hardware queue arrays for
TC-MQPRIO offload have been freshly allocated. Otherwise, following
warning is observed. Also fix up error path to only free hardware
queue arrays when refcount reaches 0.

[  130.075342] ------------[ cut here ]------------
[  130.075343] refcount_t: addition on 0; use-after-free.
[  130.075355] WARNING: CPU: 0 PID: 10870 at lib/refcount.c:25
refcount_warn_saturate+0xe1/0x100
[  130.075356] Modules linked in: sch_mqprio iptable_nat ib_iser
libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_umad iw_cxgb4 libcxgb
ib_uverbs x86_pkg_temp_thermal cxgb4 igb
[  130.075361] CPU: 0 PID: 10870 Comm: tc Kdump: loaded Not tainted
5.5.0-rc1+ #11
[  130.075362] Hardware name: Supermicro
X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.2
01/16/2015
[  130.075363] RIP: 0010:refcount_warn_saturate+0xe1/0x100
[  130.075364] Code: e8 14 41 c1 ff 0f 0b c3 80 3d 44 f4 10 01 00 0f 85
63 ff ff ff 48 c7 c7 38 9f 83 8c 31 c0 c6 05 2e f4 10 01 01 e8 ef 40 c1
ff <0f> 0b c3 48 c7 c7 10 9f 83 8c 31 c0 c6 05 17 f4 10 01 01 e8 d7 40
[  130.075365] RSP: 0018:ffffa48d00c0b768 EFLAGS: 00010286
[  130.075366] RAX: 0000000000000000 RBX: 0000000000000008 RCX:
0000000000000001
[  130.075366] RDX: 0000000000000001 RSI: 0000000000000096 RDI:
ffff8a2e9fa187d0
[  130.075367] RBP: ffff8a2e93890000 R08: 0000000000000398 R09:
000000000000003c
[  130.075367] R10: 00000000000142a0 R11: 0000000000000397 R12:
ffffa48d00c0b848
[  130.075368] R13: ffff8a2e94746498 R14: ffff8a2e966f7000 R15:
0000000000000031
[  130.075368] FS:  00007f689015f840(0000) GS:ffff8a2e9fa00000(0000)
knlGS:0000000000000000
[  130.075369] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  130.075369] CR2: 00000000006762a0 CR3: 00000007cf164005 CR4:
00000000001606f0
[  130.075370] Call Trace:
[  130.075377]  cxgb4_setup_tc_mqprio+0xbee/0xc30 [cxgb4]
[  130.075382]  ? cxgb4_ethofld_restart+0x50/0x50 [cxgb4]
[  130.075384]  ? pfifo_fast_init+0x7e/0xf0
[  130.075386]  mqprio_init+0x5f4/0x630 [sch_mqprio]
[  130.075389]  qdisc_create+0x1bf/0x4a0
[  130.075390]  tc_modify_qdisc+0x1ff/0x770
[  130.075392]  rtnetlink_rcv_msg+0x28b/0x350
[  130.075394]  ? rtnl_calcit.isra.32+0x110/0x110
[  130.075395]  netlink_rcv_skb+0xc6/0x100
[  130.075396]  netlink_unicast+0x1db/0x330
[  130.075397]  netlink_sendmsg+0x2f5/0x460
[  130.075399]  ? _copy_from_user+0x2e/0x60
[  130.075400]  sock_sendmsg+0x59/0x70
[  130.075401]  ____sys_sendmsg+0x1f0/0x230
[  130.075402]  ? copy_msghdr_from_user+0xd7/0x140
[  130.075403]  ___sys_sendmsg+0x77/0xb0
[  130.075404]  ? ___sys_recvmsg+0x84/0xb0
[  130.075406]  ? __handle_mm_fault+0x377/0xaf0
[  130.075407]  __sys_sendmsg+0x53/0xa0
[  130.075409]  do_syscall_64+0x44/0x130
[  130.075412]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  130.075413] RIP: 0033:0x7f688f13af10
[  130.075414] Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff
eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
[  130.075414] RSP: 002b:00007ffe6c7d9988 EFLAGS: 00000246 ORIG_RAX:
000000000000002e
[  130.075415] RAX: ffffffffffffffda RBX: 00000000006703a0 RCX:
00007f688f13af10
[  130.075415] RDX: 0000000000000000 RSI: 00007ffe6c7d99f0 RDI:
0000000000000003
[  130.075416] RBP: 000000005df38312 R08: 0000000000000002 R09:
0000000000008000
[  130.075416] R10: 00007ffe6c7d93e0 R11: 0000000000000246 R12:
0000000000000000
[  130.075417] R13: 00007ffe6c7e9c50 R14: 0000000000000001 R15:
000000000067c600
[  130.075418] ---[ end trace 8fbb3bf36a8671db ]---

v2:
- Move the refcount_set() closer to where the hardware queue arrays
  are being allocated.
- Fix up error path to only free hardware queue arrays when refcount
  reaches 0.
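
The pattern described in this commit message (start the count at 1 on a fresh allocation, free only when it drops to 0) can be sketched in plain C. This is not the cxgb4 code: `struct hwq_arrays`, `hwq_get()` and `hwq_put()` are hypothetical names, and a plain `int` stands in for the kernel's `refcount_t`.

```c
#include <stdlib.h>

/* Hypothetical container for shared, reference-counted queue arrays. */
struct hwq_arrays {
    int refcnt;   /* plain int here; the kernel would use refcount_t */
    int *queues;  /* shared allocation, NULL when not allocated */
};

/* Take a reference, allocating the arrays on first use. */
int hwq_get(struct hwq_arrays *a, size_t n)
{
    if (a->queues) {
        a->refcnt++;          /* already allocated: just add a user */
        return 0;
    }

    a->queues = calloc(n, sizeof(*a->queues));
    if (!a->queues)
        return -1;

    a->refcnt = 1;            /* fresh allocation starts at 1, not 0 */
    return 0;
}

/* Drop a reference; free the arrays only when the last user is gone. */
void hwq_put(struct hwq_arrays *a)
{
    if (a->refcnt == 0)
        return;               /* nothing held, nothing to release */

    if (--a->refcnt > 0)
        return;

    free(a->queues);
    a->queues = NULL;
}
```

Initializing the count right next to the allocation keeps the first user from incrementing from 0, which is what triggers the "addition on 0" warning shown above.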

Fixes: 2d0cb84 ("cxgb4: add ETHOFLD hardware queue support")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pcmoore pushed a commit that referenced this issue Jun 2, 2020
Stefano reported a crash with using SQPOLL with io_uring:

  BUG: kernel NULL pointer dereference, address: 00000000000003b0
  CPU: 2 PID: 1307 Comm: io_uring-sq Not tainted 5.7.0-rc7 #11
  RIP: 0010:task_numa_work+0x4f/0x2c0
  Call Trace:
   task_work_run+0x68/0xa0
   io_sq_thread+0x252/0x3d0
   kthread+0xf9/0x130
   ret_from_fork+0x35/0x40

which is task_numa_work() oopsing on current->mm being NULL.

The task work is queued by task_tick_numa(), which checks if current->mm is
NULL at the time of the call. But this state isn't necessarily persistent,
if the kthread is using use_mm() to temporarily adopt the mm of a task.

Change the task_tick_numa() check to exclude kernel threads in general,
as it doesn't make sense to attempt to balance for kthreads anyway.
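
A minimal user-space sketch of the idea, assuming a simplified task structure: `struct task` and `should_queue_numa_work()` are hypothetical, and the flag values are illustrative rather than taken from the kernel headers. It shows why testing a "kernel thread" flag is more robust than testing whether `mm` is NULL at tick time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values for this sketch only. */
#define PF_EXITING 0x00000004u
#define PF_KTHREAD 0x00200000u

/* Hypothetical, heavily reduced task structure. */
struct task {
    unsigned int flags;
    void *mm;   /* may be temporarily non-NULL for a kthread using use_mm() */
};

/*
 * Decide whether NUMA balancing work should be queued for this task.
 * The check keys on the task's flags, so a kernel thread is excluded
 * even while it has temporarily adopted another task's mm.
 */
bool should_queue_numa_work(const struct task *t)
{
    return !(t->flags & (PF_EXITING | PF_KTHREAD));
}
```

A check on `t->mm` alone would accept a kthread during the window in which it borrows an mm, and the queued work could then run after the mm has been dropped again.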

Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/865de121-8190-5d30-ece5-3b097dc74431@kernel.dk
pcmoore pushed a commit that referenced this issue Jan 25, 2022
If the key is already present then free the key used for lookup.

Found with:
$ perf stat -M IO_Read_BW /bin/true

==1749112==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 32 byte(s) in 4 object(s) allocated from:
    #0 0x7f6f6fa7d7cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x55acecd9d7a6 in check_per_pkg util/stat.c:343
    #2 0x55acecd9d9c5 in process_counter_values util/stat.c:365
    #3 0x55acecd9e0ab in process_counter_maps util/stat.c:421
    #4 0x55acecd9e292 in perf_stat_process_counter util/stat.c:443
    #5 0x55aceca8553e in read_counters ./tools/perf/builtin-stat.c:470
    #6 0x55aceca88fe3 in __run_perf_stat ./tools/perf/builtin-stat.c:1023
    #7 0x55aceca89146 in run_perf_stat ./tools/perf/builtin-stat.c:1048
    #8 0x55aceca90858 in cmd_stat ./tools/perf/builtin-stat.c:2555
    #9 0x55acecc05fa5 in run_builtin ./tools/perf/perf.c:313
    #10 0x55acecc064fe in handle_internal_command ./tools/perf/perf.c:365
    #11 0x55acecc068bb in run_argv ./tools/perf/perf.c:409
    #12 0x55acecc070aa in main ./tools/perf/perf.c:539
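
The ownership rule behind this fix can be sketched in plain C. The tiny fixed-size set below is a hypothetical stand-in for the hash map used by perf, and all names are illustrative assumptions; the point is only that a key allocated for the lookup must be freed on every path where the container does not take ownership of it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, tiny "seen" set; a stand-in for a real hash map. */
#define SKETCH_MAX_KEYS 16

struct sketch_set {
    uint64_t *keys[SKETCH_MAX_KEYS]; /* owned, heap-allocated keys */
    size_t count;
};

static bool sketch_set_contains(const struct sketch_set *s, uint64_t value)
{
    for (size_t i = 0; i < s->count; i++)
        if (*s->keys[i] == value)
            return true;
    return false;
}

/*
 * Returns 1 if value was newly added, 0 if already present, -1 on error.
 * The key is allocated up front for the lookup, so it has to be freed on
 * every path that does not hand ownership to the set.
 */
static int sketch_mark_seen(struct sketch_set *s, uint64_t value)
{
    uint64_t *key = malloc(sizeof(*key));

    if (!key)
        return -1;
    *key = value;

    if (sketch_set_contains(s, *key)) {
        free(key);              /* already present: free the lookup key */
        return 0;
    }
    if (s->count == SKETCH_MAX_KEYS) {
        free(key);              /* set is full: nothing took ownership */
        return -1;
    }
    s->keys[s->count++] = key;  /* ownership moves to the set */
    return 1;
}

/* Releases every key the set still owns. */
static void sketch_set_destroy(struct sketch_set *s)
{
    for (size_t i = 0; i < s->count; i++)
        free(s->keys[i]);
    s->count = 0;
}
```

Omitting the first `free(key)` reproduces the kind of leak reported above: each repeated lookup of an existing key would leave one small allocation behind.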

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Vineet Singh <vineet.singh@intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: zhengjun.xing@intel.com
Link: https://lore.kernel.org/r/20220105061351.120843-24-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
pcmoore pushed a commit that referenced this issue May 17, 2022
The kernel panics when injecting memory_failure for the global
huge_zero_page, when CONFIG_DEBUG_VM is enabled, as follows.

  Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
  page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
  head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
  flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
  raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
  raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
  page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
  ------------[ cut here ]------------
  kernel BUG at mm/huge_memory.c:2499!
  invalid opcode: 0000 [#1] PREEMPT SMP PTI
  CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
  Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
  RIP: 0010:split_huge_page_to_list+0x66a/0x880
  Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
  RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
  RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
  RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
  R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
  R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
  FS:  00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   try_to_split_thp_page+0x3a/0x130
   memory_failure+0x128/0x800
   madvise_inject_error.cold+0x8b/0xa1
   __x64_sys_madvise+0x54/0x60
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7fc3754f8bf9
  Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
  RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
  RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
  RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
  R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
  R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000

This makes huge_zero_page bail out explicitly before split in
memory_failure(), thus the panic above won't happen again.

Link: https://lkml.kernel.org/r/497d3835612610e370c74e697ea3c721d1d55b9c.1649775850.git.xuyu@linux.alibaba.com
Fixes: 6a46079 ("HWPOISON: The high level memory error handler in the VM v7")
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Reported-by: Abaci <abaci@linux.alibaba.com>
Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pcmoore pushed a commit that referenced this issue May 17, 2022
The kernel panics when injecting memory_failure for the global huge_zero_page,
when CONFIG_DEBUG_VM is enabled, as follows.

  Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
  page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
  head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
  flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
  raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
  raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
  page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
  ------------[ cut here ]------------
  kernel BUG at mm/huge_memory.c:2499!
  invalid opcode: 0000 [#1] PREEMPT SMP PTI
  CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
  Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
  RIP: 0010:split_huge_page_to_list+0x66a/0x880
  Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
  RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
  RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
  RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
  R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
  R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
  FS:  00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
  try_to_split_thp_page+0x3a/0x130
  memory_failure+0x128/0x800
  madvise_inject_error.cold+0x8b/0xa1
  __x64_sys_madvise+0x54/0x60
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7fc3754f8bf9
  Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
  RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
  RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
  RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
  R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
  R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000

We think that raising a BUG is overkill for splitting huge_zero_page: the
huge_zero_page can't be reached from normal paths other than memory failure,
but memory failure is a valid caller.  So replace the BUG with a WARN and
return -EBUSY, so that the panic above won't happen again.

Link: https://lkml.kernel.org/r/f35f8b97377d5d3ede1bc5ac3114da888c57cbce.1651052574.git.xuyu@linux.alibaba.com
Fixes: d173d54 ("mm/memory-failure.c: skip huge_zero_page in memory_failure()")
Fixes: 6a46079 ("HWPOISON: The high level memory error handler in the VM v7")
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Suggested-by: Yang Shi <shy828301@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
pcmoore pushed a commit that referenced this issue May 23, 2022
Do not allow timestamps to be written on RX rings while the PF is being
configured. While the PF is being configured, RX rings can be freed or
rebuilt. If timestamps are updated at the same time, the kernel will crash
by dereferencing a null RX ring pointer.

PID: 1449   TASK: ff187d28ed658040  CPU: 34  COMMAND: "ice-ptp-0000:51"
 #0 [ff1966a94a713bb0] machine_kexec at ffffffff9d05a0be
 #1 [ff1966a94a713c08] __crash_kexec at ffffffff9d192e9d
 #2 [ff1966a94a713cd0] crash_kexec at ffffffff9d1941bd
 #3 [ff1966a94a713ce8] oops_end at ffffffff9d01bd54
 #4 [ff1966a94a713d08] no_context at ffffffff9d06bda4
 #5 [ff1966a94a713d60] __bad_area_nosemaphore at ffffffff9d06c10c
 #6 [ff1966a94a713da8] do_page_fault at ffffffff9d06cae4
 #7 [ff1966a94a713de0] page_fault at ffffffff9da0107e
    [exception RIP: ice_ptp_update_cached_phctime+91]
    RIP: ffffffffc076db8b  RSP: ff1966a94a713e98  RFLAGS: 00010246
    RAX: 16e3db9c6b7ccae4  RBX: ff187d269dd3c180  RCX: ff187d269cd4d018
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ff187d269cfcc644   R8: ff187d339b9641b0   R9: 0000000000000000
    R10: 0000000000000002  R11: 0000000000000000  R12: ff187d269cfcc648
    R13: ffffffff9f128784  R14: ffffffff9d101b70  R15: ff187d269cfcc640
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ff1966a94a713ea0] ice_ptp_periodic_work at ffffffffc076dbef [ice]
 #9 [ff1966a94a713ee0] kthread_worker_fn at ffffffff9d101c1b
 #10 [ff1966a94a713f10] kthread at ffffffff9d101b4d
 #11 [ff1966a94a713f50] ret_from_fork at ffffffff9da0023f

Fixes: 77a7811 ("ice: enable receive hardware timestamping")
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Dave Cain <dcain@redhat.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
pcmoore pushed a commit that referenced this issue Jun 15, 2022
OFFLOADS pairing using devcom is possible only on devices
that support LAG. Filter based on lag capabilities.

This fixes an issue where mlx5_get_next_phys_dev() was
called without holding the interface lock.

This issue was found when commit
bc4c2f2 ("net/mlx5: Lag, filter non compatible devices")
added an assert that verifies the interface lock is held.

WARNING: CPU: 9 PID: 1706 at drivers/net/ethernet/mellanox/mlx5/core/dev.c:642 mlx5_get_next_phys_dev+0xd2/0x100 [mlx5_core]
Modules linked in: mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm ib_uverbs ib_core overlay fuse [last unloaded: mlx5_core]
CPU: 9 PID: 1706 Comm: devlink Not tainted 5.18.0-rc7+ #11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5_get_next_phys_dev+0xd2/0x100 [mlx5_core]
Code: 02 00 75 48 48 8b 85 80 04 00 00 5d c3 31 c0 5d c3 be ff ff ff ff 48 c7 c7 08 41 5b a0 e8 36 87 28 e3 85 c0 0f 85 6f ff ff ff <0f> 0b e9 68 ff ff ff 48 c7 c7 0c 91 cc 84 e8 cb 36 6f e1 e9 4d ff
RSP: 0018:ffff88811bf47458 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88811b398000 RCX: 0000000000000001
RDX: 0000000080000000 RSI: ffffffffa05b4108 RDI: ffff88812daaaa78
RBP: ffff88812d050380 R08: 0000000000000001 R09: ffff88811d6b3437
R10: 0000000000000001 R11: 00000000fddd3581 R12: ffff88815238c000
R13: ffff88812d050380 R14: ffff8881018aa7e0 R15: ffff88811d6b3428
FS:  00007fc82e18ae80(0000) GS:ffff88842e080000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9630d1b421 CR3: 0000000149802004 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 mlx5_esw_offloads_devcom_event+0x99/0x3b0 [mlx5_core]
 mlx5_devcom_send_event+0x167/0x1d0 [mlx5_core]
 esw_offloads_enable+0x1153/0x1500 [mlx5_core]
 ? mlx5_esw_offloads_controller_valid+0x170/0x170 [mlx5_core]
 ? wait_for_completion_io_timeout+0x20/0x20
 ? mlx5_rescan_drivers_locked+0x318/0x810 [mlx5_core]
 mlx5_eswitch_enable_locked+0x586/0xc50 [mlx5_core]
 ? mlx5_eswitch_disable_pf_vf_vports+0x1d0/0x1d0 [mlx5_core]
 ? mlx5_esw_try_lock+0x1b/0xb0 [mlx5_core]
 ? mlx5_eswitch_enable+0x270/0x270 [mlx5_core]
 ? __debugfs_create_file+0x260/0x3e0
 mlx5_devlink_eswitch_mode_set+0x27e/0x870 [mlx5_core]
 ? mutex_lock_io_nested+0x12c0/0x12c0
 ? esw_offloads_disable+0x250/0x250 [mlx5_core]
 ? devlink_nl_cmd_trap_get_dumpit+0x470/0x470
 ? rcu_read_lock_sched_held+0x3f/0x70
 devlink_nl_cmd_eswitch_set_doit+0x217/0x620

Fixes: dd3fddb ("net/mlx5: E-Switch, handle devcom events only for ports on the same device")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
pcmoore pushed a commit that referenced this issue Aug 22, 2022
With special lengths supplied by user space, register_shm_helper() has
an integer overflow when calculating the number of pages covered by a
supplied user space memory region.

This causes internal_get_user_pages_fast(), a helper function of
pin_user_pages_fast(), to do a NULL pointer dereference:

  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
  Modules linked in:
  CPU: 1 PID: 173 Comm: optee_example_a Not tainted 5.19.0 #11
  Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
  pc : internal_get_user_pages_fast+0x474/0xa80
  Call trace:
   internal_get_user_pages_fast+0x474/0xa80
   pin_user_pages_fast+0x24/0x4c
   register_shm_helper+0x194/0x330
   tee_shm_register_user_buf+0x78/0x120
   tee_ioctl+0xd0/0x11a0
   __arm64_sys_ioctl+0xa8/0xec
   invoke_syscall+0x48/0x114

Fix this by adding an explicit call to access_ok() in
tee_shm_register_user_buf() to catch an invalid user space address
early.
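
The wrap-around can be reproduced in user space with a short sketch. The page size, the function names and the range check below are assumptions made for illustration; the check is a stand-in for the idea of rejecting an invalid range early, not the kernel's access_ok() and not the actual patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for the sketch. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Naive page count: addr + length may wrap around and give a bogus result. */
static uint64_t sketch_num_pages_naive(uint64_t addr, uint64_t length)
{
    uint64_t start = addr & ~(SKETCH_PAGE_SIZE - 1);          /* round down */
    uint64_t end = (addr + length + SKETCH_PAGE_SIZE - 1)
                   & ~(SKETCH_PAGE_SIZE - 1);                  /* round up */

    return (end - start) / SKETCH_PAGE_SIZE;
}

/*
 * Checked variant: refuse ranges whose end would wrap, then fall back to
 * the same arithmetic. Returns false if the range is rejected.
 */
static bool sketch_num_pages_checked(uint64_t addr, uint64_t length,
                                     uint64_t *num_pages)
{
    uint64_t end;

    if (length == 0 || addr > UINT64_MAX - length)
        return false;                   /* addr + length would overflow */
    end = addr + length;
    if (end > UINT64_MAX - (SKETCH_PAGE_SIZE - 1))
        return false;                   /* rounding up would overflow */
    *num_pages = sketch_num_pages_naive(addr, length);
    return true;
}
```

For example, with `addr = 0x1001` and `length = UINT64_MAX` the naive version computes zero pages because the sum wraps, while the checked version rejects the range before any page arithmetic is done.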

Fixes: 033ddf1 ("tee: add register user memory")
Cc: stable@vger.kernel.org
Reported-by: Nimish Mishra <neelam.nimish@gmail.com>
Reported-by: Anirban Chakraborty <ch.anirban00727@gmail.com>
Reported-by: Debdeep Mukhopadhyay <debdeep.mukhopadhyay@gmail.com>
Suggested-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pcmoore pushed a commit that referenced this issue Jan 3, 2023
Ido Schimmel says:

====================
bridge: mcast: Extensions for EVPN

tl;dr
=====

This patchset creates feature parity between user space and the kernel
and allows the former to install and replace MDB port group entries with
a source list and associated filter mode. This is required for EVPN use
cases where multicast state is not derived from snooped IGMP/MLD
packets, but instead derived from EVPN routes exchanged by the control
plane in user space.

Background
==========

IGMPv3 [1] and MLDv2 [2] differ from earlier versions of the protocols
in that they add support for source-specific multicast. That is, hosts
can advertise interest in listening to a particular multicast address
only from specific source addresses or from all sources except for
specific source addresses.

In kernel 5.10 [3][4], the bridge driver gained the ability to snoop
IGMPv3/MLDv2 packets and install corresponding MDB port group entries.
For example, a snooped IGMPv3 Membership Report that contains a single
MODE_IS_EXCLUDE record for group 239.10.10.10 with sources 192.0.2.1,
192.0.2.2, 192.0.2.20 and 192.0.2.21 would trigger the creation of these
entries:

 # bridge -d mdb show
 dev br0 port veth1 grp 239.10.10.10 src 192.0.2.21 temp filter_mode include proto kernel  blocked
 dev br0 port veth1 grp 239.10.10.10 src 192.0.2.20 temp filter_mode include proto kernel  blocked
 dev br0 port veth1 grp 239.10.10.10 src 192.0.2.2 temp filter_mode include proto kernel  blocked
 dev br0 port veth1 grp 239.10.10.10 src 192.0.2.1 temp filter_mode include proto kernel  blocked
 dev br0 port veth1 grp 239.10.10.10 temp filter_mode exclude source_list 192.0.2.21/0.00,192.0.2.20/0.00,192.0.2.2/0.00,192.0.2.1/0.00 proto kernel

While the kernel can install and replace entries with a filter mode and
source list, user space cannot. It can only add EXCLUDE entries with an
empty source list, which is sufficient for IGMPv2/MLDv1, but not for
IGMPv3/MLDv2.

Use cases where the multicast state is not derived from snooped packets,
but instead derived from routes exchanged by the user space control
plane require feature parity between user space and the kernel in terms
of MDB configuration. Such a use case is detailed in the next section.

Motivation
==========

RFC 7432 [5] defines a "MAC/IP Advertisement route" (type 2) [6] that
allows NVE switches in the EVPN network to advertise and learn
reachability information for unicast MAC addresses. Traffic destined to
a unicast MAC address can therefore be selectively forwarded to a single
NVE switch behind which the MAC is located.

The same is not true for IP multicast traffic. Such traffic is simply
flooded as BUM to all NVE switches in the broadcast domain (BD),
regardless if a switch has interested receivers for the multicast stream
or not. This is especially problematic for overlay networks that make
heavy use of multicast.

The issue is addressed by RFC 9251 [7] that defines a "Selective
Multicast Ethernet Tag Route" (type 6) [8] which allows NVE switches in
the EVPN network to advertise multicast streams that they are interested
in. This is done by having each switch suppress IGMP/MLD packets from
being transmitted to the NVE network and instead communicate the
information over BGP to other switches.

As far as the bridge driver is concerned, the above means that the
multicast state (i.e., {multicast address, group timer, filter-mode,
(source records)}) for the VXLAN bridge port is not populated by the
kernel from snooped IGMP/MLD packets (they are suppressed), but instead
by user space. Specifically, by the routing daemon that is exchanging
EVPN routes with other NVE switches.

Changes are obviously also required in the VXLAN driver, but they are
the subject of future patchsets. See the "Future work" section.

Implementation
==============

The user interface is extended to allow user space to specify the filter
mode of the MDB port group entry and its source list. Replace support is
also added so that user space would not need to remove an entry and
re-add it only to edit its source list or filter mode, as that would
result in packet loss. Example usage:

 # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 permanent \
	source_list 192.0.2.1,192.0.2.3 filter_mode exclude proto zebra
 # bridge -d -s mdb show
 dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.3 permanent filter_mode include proto zebra  blocked    0.00
 dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto zebra  blocked    0.00
 dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode exclude source_list 192.0.2.3/0.00,192.0.2.1/0.00 proto zebra     0.00

The netlink interface is extended with a few new attributes in the
RTM_NEWMDB request message:

[ struct nlmsghdr ]
[ struct br_port_msg ]
[ MDBA_SET_ENTRY ]
	struct br_mdb_entry
[ MDBA_SET_ENTRY_ATTRS ]
	[ MDBE_ATTR_SOURCE ]
		struct in_addr / struct in6_addr
	[ MDBE_ATTR_SRC_LIST ]		// new
		[ MDBE_SRC_LIST_ENTRY ]
			[ MDBE_SRCATTR_ADDRESS ]
				struct in_addr / struct in6_addr
		[ ...]
	[ MDBE_ATTR_GROUP_MODE ]	// new
		u8
	[ MDBE_ATTR_RTPORT ]		// new
		u8

No changes are required in RTM_NEWMDB responses and notifications, as
all the information can already be dumped by the kernel today.

Testing
=======

Tested with existing bridge multicast selftests: bridge_igmp.sh,
bridge_mdb_port_down.sh, bridge_mdb.sh, bridge_mld.sh,
bridge_vlan_mcast.sh.

In addition, added many new test cases for existing as well as for new
MDB functionality.

Patchset overview
=================

Patches #1-#8 are non-functional preparations for the core changes in
later patches.

Patches #9-#10 allow user space to install (*, G) entries with a source
list and associated filter mode. Specifically, patch #9 adds the
necessary kernel plumbing and patch #10 exposes the new functionality to
user space via a few new attributes.

Patch #11 allows user space to specify the routing protocol of new MDB
port group entries so that a routing daemon could differentiate between
entries installed by it and those installed by an administrator.

Patch #12 allows user space to replace MDB port group entries. This is
useful, for example, when user space wants to add a new source to a
source list. Instead of deleting a (*, G) entry and re-adding it with an
extended source list (which would result in packet loss), user space can
simply replace the current entry.

Patches #13-#14 add tests for existing MDB functionality as well as for
all new functionality added in this patchset.

Future work
===========

The VXLAN driver will need to be extended with an MDB so that it could
selectively forward IP multicast traffic to NVE switches with interested
receivers instead of simply flooding it to all switches as BUM.

The idea is to reuse the existing MDB interface for the VXLAN driver in
a similar way to how the FDB interface is shared between the bridge and
VXLAN drivers.

From command line perspective, configuration will look as follows:

 # bridge mdb add dev br0 port vxlan0 grp 239.1.1.1 permanent \
	filter_mode exclude source_list 198.50.100.1,198.50.100.2

 # bridge mdb add dev vxlan0 port vxlan0 grp 239.1.1.1 permanent \
	filter_mode include source_list 198.50.100.3,198.50.100.4 \
	dst 192.0.2.1 dst_port 4789 src_vni 2

 # bridge mdb add dev vxlan0 port vxlan0 grp 239.1.1.1 permanent \
	filter_mode exclude source_list 198.50.100.1,198.50.100.2 \
	dst 192.0.2.2 dst_port 4789 src_vni 2

Where the first command is enabled by this set, but the next two will be
the subject of future work.

From netlink perspective, the existing PF_BRIDGE/RTM_*MDB messages will
be extended to the VXLAN driver. This means that a few new attributes
will be added (e.g., 'MDBE_ATTR_SRC_VNI') and that the handlers for
these messages will need to move to net/core/rtnetlink.c. The rtnetlink
code will call into the appropriate driver based on the ifindex
specified in the ancillary header.

iproute2 patches can be found here [9].

Changelog
=========

Since v1 [10]:

* Patch #12: Remove extack from br_mdb_replace_group_sg().
* Patch #12: Change 'nlflags' to u16 and move it after 'filter_mode' to
  pack the structure.

Since RFC [11]:

* Patch #6: New patch.
* Patch #9: Use an array instead of a list to store source entries.
* Patch #10: Use an array instead of list to store source entries.
* Patch #10: Drop br_mdb_config_attrs_fini().
* Patch #11: Reject protocol for host entries.
* Patch #13: New patch.
* Patch #14: New patch.

[1] https://datatracker.ietf.org/doc/html/rfc3376
[2] https://www.rfc-editor.org/rfc/rfc3810
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6af52ae2ed14a6bc756d5606b29097dfd76740b8
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=68d4fd30c83b1b208e08c954cd45e6474b148c87
[5] https://datatracker.ietf.org/doc/html/rfc7432
[6] https://datatracker.ietf.org/doc/html/rfc7432#section-7.2
[7] https://datatracker.ietf.org/doc/html/rfc9251
[8] https://datatracker.ietf.org/doc/html/rfc9251#section-9.1
[9] https://github.com/idosch/iproute2/commits/submit/mdb_v1
[10] https://lore.kernel.org/netdev/20221208152839.1016350-1-idosch@nvidia.com/
[11] https://lore.kernel.org/netdev/20221018120420.561846-1-idosch@nvidia.com/
====================

Link: https://lore.kernel.org/r/20221210145633.1328511-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
pcmoore pushed a commit that referenced this issue Jan 3, 2023
We need to check if we have an OS prefix, otherwise we stumble on a
metric segv that I'm now seeing in Arnaldo's tree:

  $ gdb --args perf stat -M Backend true
  ...
  Performance counter stats for 'true':

          4,712,355      TOPDOWN.SLOTS                    #     17.3 % tma_core_bound

  Program received signal SIGSEGV, Segmentation fault.
  __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:77
  77      ../sysdeps/x86_64/multiarch/strlen-evex.S: No such file or directory.
  (gdb) bt
  #0  __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:77
  #1  0x00007ffff74749a5 in __GI__IO_fputs (str=0x0, fp=0x7ffff75f5680 <_IO_2_1_stderr_>)
  #2  0x0000555555779f28 in do_new_line_std (config=0x555555e077c0 <stat_config>, os=0x7fffffffbf10) at util/stat-display.c:356
  #3  0x000055555577a081 in print_metric_std (config=0x555555e077c0 <stat_config>, ctx=0x7fffffffbf10, color=0x0, fmt=0x5555558b77b5 "%8.1f", unit=0x7fffffffbb10 "%  tma_memory_bound", val=13.165355724442199) at util/stat-display.c:380
  #4  0x00005555557768b6 in generic_metric (config=0x555555e077c0 <stat_config>, metric_expr=0x55555593d5b7 "((CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES))"..., metric_events=0x555555f334e0, metric_refs=0x555555ec81d0, name=0x555555f32e80 "TOPDOWN.SLOTS", metric_name=0x555555f26c80 "tma_memory_bound", metric_unit=0x55555593d5b1 "100%", runtime=0, map_idx=0, out=0x7fffffffbd90, st=0x555555e9e620 <rt_stat>) at util/stat-shadow.c:934
  #5  0x0000555555778cac in perf_stat__print_shadow_stats (config=0x555555e077c0 <stat_config>, evsel=0x555555f289d0, avg=4712355, map_idx=0, out=0x7fffffffbd90, metric_events=0x555555e078e8 <stat_config+296>, st=0x555555e9e620 <rt_stat>) at util/stat-shadow.c:1329
  #6  0x000055555577b6a0 in printout (config=0x555555e077c0 <stat_config>, os=0x7fffffffbf10, uval=4712355, run=325322, ena=325322, noise=4712355, map_idx=0) at util/stat-display.c:741
  #7  0x000055555577bc74 in print_counter_aggrdata (config=0x555555e077c0 <stat_config>, counter=0x555555f289d0, s=0, os=0x7fffffffbf10) at util/stat-display.c:838
  #8  0x000055555577c1d8 in print_counter (config=0x555555e077c0 <stat_config>, counter=0x555555f289d0, os=0x7fffffffbf10) at util/stat-display.c:957
  #9  0x000055555577dba0 in evlist__print_counters (evlist=0x555555ec3610, config=0x555555e077c0 <stat_config>, _target=0x555555e01c80 <target>, ts=0x0, argc=1, argv=0x7fffffffe450) at util/stat-display.c:1413
  #10 0x00005555555fc821 in print_counters (ts=0x0, argc=1, argv=0x7fffffffe450) at builtin-stat.c:1040
  #11 0x000055555560091a in cmd_stat (argc=1, argv=0x7fffffffe450) at builtin-stat.c:2665
  #12 0x00005555556b1eea in run_builtin (p=0x555555e11f70 <commands+336>, argc=4, argv=0x7fffffffe450) at perf.c:322
  #13 0x00005555556b2181 in handle_internal_command (argc=4, argv=0x7fffffffe450) at perf.c:376
  #14 0x00005555556b22d7 in run_argv (argcp=0x7fffffffe27c, argv=0x7fffffffe270) at perf.c:420
  #15 0x00005555556b26ef in main (argc=4, argv=0x7fffffffe450) at perf.c:550
  (gdb)

Fixes: f123b2d ("perf stat: Remove prefix argument in print_metric_headers()")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: http://lore.kernel.org/lkml/CAP-5=fUOjSM5HajU9TCD6prY39LbX4OQbkEbtKPPGRBPBN=_VQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
pcmoore pushed a commit that referenced this issue May 8, 2023
Sai Krishna says:

====================
octeontx2: Miscellaneous fixes

This patchset includes following fixes.

Patch #1 Fix for the race condition while updating APR table

Patch #2 Fix end bit position in NPC scan config

Patch #3 Fix depth of CAM, MEM table entries

Patch #4 Fix to increase the size of DMAC filter flows

Patch #5 Fix driver crash resulting from invalid interface type
information retrieved from firmware

Patch #6 Fix incorrect mask used while installing filters involving
fragmented packets

Patch #7 Fixes for NPC field hash extract w.r.t IPV6 hash reduction,
         IPV6 field hash configuration.

Patch #8 Fix for NPC hardware parser configuration destination
         address hash, IPV6 endianness issues.

Patch #9 Fix for skipping mbox initialization for PFs disabled by firmware.

Patch #10 Fix disabling packet I/O in case of mailbox timeout.

Patch #11 Fix detaching LF resources in case of VF probe fail.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
pcmoore pushed a commit that referenced this issue Mar 25, 2024
Locally generated packets can increment the new nexthop statistics from
process context, resulting in the following splat [1] due to preemption
being enabled. Fix by using get_cpu_ptr() / put_cpu_ptr(), which take
care of disabling / enabling preemption.

BUG: using smp_processor_id() in preemptible [00000000] code: ping/949
caller is nexthop_select_path+0xcf8/0x1e30
CPU: 12 PID: 949 Comm: ping Not tainted 6.8.0-rc7-custom-gcb450f605fae #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xbd/0xe0
 check_preemption_disabled+0xce/0xe0
 nexthop_select_path+0xcf8/0x1e30
 fib_select_multipath+0x865/0x18b0
 fib_select_path+0x311/0x1160
 ip_route_output_key_hash_rcu+0xe54/0x2720
 ip_route_output_key_hash+0x193/0x380
 ip_route_output_flow+0x25/0x130
 raw_sendmsg+0xbab/0x34a0
 inet_sendmsg+0xa2/0xe0
 __sys_sendto+0x2ad/0x430
 __x64_sys_sendto+0xe5/0x1c0
 do_syscall_64+0xc5/0x1d0
 entry_SYSCALL_64_after_hwframe+0x63/0x6b
[...]

Fixes: f4676ea ("net: nexthop: Add nexthop group entry stats")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240311162307.545385-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>