RFC: Refactor parser to minimize exposure to parsing bugs for unwanted record types #265

Open
briansmith opened this issue Oct 25, 2017 · 4 comments

Comments

@briansmith
Contributor

briansmith commented Oct 25, 2017

Consider the implementation of RData::read() given below.

There are two related issues with this code:

  1. There is no way to minimize code size by stripping out unneeded code for unused features. For example, consider a user of -resolver that doesn't enable DNSSEC: the compiler won't be able to drop the DNSSEC-specific parsing logic (or the other DNSSEC processing, but in this issue we're concerned mainly with parsing).

  2. More seriously, let's say there is an exploitable security vulnerability in the MX parsing code, and my application only uses -resolver and only resolves A/AAAA/SRV records. For this application, ideally the MX parsing code would not even be reachable, since MX records are not relevant to A/AAAA/SRV queries, and so ideally my application wouldn't need a security update. However, with the current design, my application will still parse MX records if they are (maliciously) present in the input, even though they're irrelevant to A/AAAA/SRV lookups.

I propose that we solve both issues in the following way:

  • Add a new argument to RData::read() that takes a structure that looks something like this:
struct RDataParser {
    a: Option<AParser>,
    aaaa: Option<AAAAParser>,
    srv: Option<SRVParser>,
    // ...
}

impl Default for RDataParser {
    fn default() -> Self {
        Self { a: None, aaaa: None, srv: None, /* ... */ }
    }
}
  • Factor out each case of RData::read() into AParser, AAAAParser, etc.

  • Change the boolean option that determines whether or not DNSSEC validation is done into a reference to a Validator that either does nothing (NoDNSSEC) or does DNSSEC validation (DNSSECValidator). The validator will return an RDataParser that handles the relevant DNSSEC record types (all fields None for NoDNSSEC, and the full set of DNSSEC parsers for DNSSECValidator).

  • Change the resolver's ipv4_lookup() so that it passes something like this to RData::read():

RDataParser {
    a: Some(AParser::new()),
    cname: Some(CNameParser::new()),
    ..dnssec_validator.rdata_parser.clone()
}

That is, ipv4_lookup() would parse CNAME and A records, and maybe DNSSEC records depending on whether the user chose to enable validation (a rough sketch of how these pieces fit together follows this list).

  • Change ipv6_lookup(), etc. similarly.
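
To make the shape of this more concrete, here is a rough, compilable sketch; AParser, AAAAParser, CNameParser, RDataParser, Validator, NoDNSSEC, and ipv4_lookup_parsers() are hypothetical names used only to illustrate the proposal, not the crate's actual API:

    // Each record type gets its own reader, factored out of RData::read().
    #[derive(Clone, Default)]
    struct AParser;
    #[derive(Clone, Default)]
    struct AAAAParser;
    #[derive(Clone, Default)]
    struct CNameParser;

    // The table of readers a caller opts into; anything left as `None` is
    // never parsed and, ideally, never linked into the final binary.
    #[derive(Clone, Default)]
    struct RDataParser {
        a: Option<AParser>,
        aaaa: Option<AAAAParser>,
        cname: Option<CNameParser>,
        // ... one field per supported record type
    }

    // The DNSSEC on/off boolean becomes a validator that contributes its own
    // set of readers.
    trait Validator {
        fn rdata_parser(&self) -> RDataParser;
    }

    struct NoDNSSEC;

    impl Validator for NoDNSSEC {
        fn rdata_parser(&self) -> RDataParser {
            RDataParser::default() // no DNSSEC record types at all
        }
    }

    // ipv4_lookup() would enable only the readers it actually needs, plus
    // whatever the chosen validator adds, via struct-update syntax.
    fn ipv4_lookup_parsers(validator: &dyn Validator) -> RDataParser {
        RDataParser {
            a: Some(AParser),
            cname: Some(CNameParser),
            ..validator.rdata_parser()
        }
    }

    fn main() {
        let parsers = ipv4_lookup_parsers(&NoDNSSEC);
        assert!(parsers.a.is_some());
        assert!(parsers.aaaa.is_none());
    }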

The effects would be:

  1. If one uses only NoDNSSEC, then none of the DNSSEC record parsing code (and, later, none of the DNSSEC processing code in general) would get linked into the application, saving code size, and removing any exposure to DNSSEC-specific vulnerabilities.

  2. If one uses only ipv4_lookup(), ipv6_lookup(), and service_lookup(), then none of the parsing logic for record types irrelevant to A/AAAA/SRV queries (and, later, none of the processing code for those types) would get linked in, and the application wouldn't be exposed to any vulnerabilities in that dropped code.

For reference, here's the RData::read() code as it is today:

    /// Read the RData from the given Decoder
    pub fn read(
        decoder: &mut BinDecoder,
        record_type: RecordType,
        rdata_length: u16,
    ) -> ProtoResult<Self> {
        let start_idx = decoder.index();

        let result = match record_type {
            RecordType::A => {
                debug!("reading A");
                RData::A(try!(rdata::a::read(decoder)))
            }
            RecordType::AAAA => {
                debug!("reading AAAA");
                RData::AAAA(try!(rdata::aaaa::read(decoder)))
            }
            rt @ RecordType::ANY => {
                return Err(ProtoErrorKind::UnknownRecordTypeValue(rt.into()).into())
            }
            rt @ RecordType::AXFR => {
                return Err(ProtoErrorKind::UnknownRecordTypeValue(rt.into()).into())
            }
            RecordType::CNAME => {
                debug!("reading CNAME");
                RData::CNAME(try!(rdata::name::read(decoder)))
            }
            RecordType::DNSKEY => {
                debug!("reading DNSKEY");
                RData::DNSKEY(try!(rdata::dnskey::read(decoder, rdata_length)))
            }
            RecordType::DS => {
                debug!("reading DS");
                RData::DS(try!(rdata::ds::read(decoder, rdata_length)))
            }
            rt @ RecordType::IXFR => {
                return Err(ProtoErrorKind::UnknownRecordTypeValue(rt.into()).into())
            }
            RecordType::KEY => {
                debug!("reading KEY");
                RData::KEY(try!(rdata::key::read(decoder, rdata_length)))
            }
            RecordType::MX => {
                debug!("reading MX");
                RData::MX(try!(rdata::mx::read(decoder)))
            }
            RecordType::NULL => {
                debug!("reading NULL");
                RData::NULL(try!(rdata::null::read(decoder, rdata_length)))
            }
            RecordType::NS => {
                debug!("reading NS");
                RData::NS(try!(rdata::name::read(decoder)))
            }
            RecordType::NSEC => {
                debug!("reading NSEC");
                RData::NSEC(try!(rdata::nsec::read(decoder, rdata_length)))
            }
            RecordType::NSEC3 => {
                debug!("reading NSEC3");
                RData::NSEC3(try!(rdata::nsec3::read(decoder, rdata_length)))
            }
            RecordType::NSEC3PARAM => {
                debug!("reading NSEC3PARAM");
                RData::NSEC3PARAM(try!(rdata::nsec3param::read(decoder)))
            }
            RecordType::OPT => {
                debug!("reading OPT");
                RData::OPT(try!(rdata::opt::read(decoder, rdata_length)))
            }
            RecordType::PTR => {
                debug!("reading PTR");
                RData::PTR(try!(rdata::name::read(decoder)))
            }
            RecordType::RRSIG => {
                debug!("reading RRSIG");
                RData::SIG(try!(rdata::sig::read(decoder, rdata_length)))
            }
            RecordType::SIG => {
                debug!("reading SIG");
                RData::SIG(try!(rdata::sig::read(decoder, rdata_length)))
            }
            RecordType::SOA => {
                debug!("reading SOA");
                RData::SOA(try!(rdata::soa::read(decoder)))
            }
            RecordType::SRV => {
                debug!("reading SRV");
                RData::SRV(try!(rdata::srv::read(decoder)))
            }
            RecordType::TXT => {
                debug!("reading TXT");
                RData::TXT(try!(rdata::txt::read(decoder, rdata_length)))
            }
        };
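
For contrast, here is a minimal, hedged sketch of how one arm could instead dispatch through an optional reader; Decoder, MxRecord, MxReader, Readers, and ParseError are stand-ins invented for this sketch, not the crate's BinDecoder, RData, or error types:

    struct Decoder;   // stand-in for BinDecoder
    struct MxRecord;  // stand-in for rdata::mx's parsed data

    enum ParseError {
        UnsupportedRecordType,
    }

    struct MxReader;

    impl MxReader {
        fn read(&self, _decoder: &mut Decoder) -> Result<MxRecord, ParseError> {
            Ok(MxRecord) // real MX parsing would happen here
        }
    }

    struct Readers {
        mx: Option<MxReader>,
    }

    fn read_mx(readers: &Readers, decoder: &mut Decoder) -> Result<MxRecord, ParseError> {
        // If the caller never enabled the MX reader, the MX parsing code is
        // unreachable for this lookup and the record is rejected up front.
        match readers.mx {
            Some(ref reader) => reader.read(decoder),
            None => Err(ParseError::UnsupportedRecordType),
        }
    }

    fn main() {
        // An A/AAAA/SRV-only lookup simply leaves the MX reader disabled.
        let readers = Readers { mx: None };
        assert!(read_mx(&readers, &mut Decoder).is_err());
    }
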
@briansmith briansmith changed the title Refactor parser to minimize exposure to parsing bugs for unwanted record types RFC: Refactor parser to minimize exposure to parsing bugs for unwanted record types Oct 25, 2017
@bluejekyll
Member

bluejekyll commented Oct 25, 2017

This is an interesting idea. I worry that the indirection might be confusing to new people coming to the project, but enough docs should make that ok.

I'm wondering if there is a way to skip the use of Option, and instead just have different impls of the read method associated with the Parsers. Also, I tend to reserve parser for text-based input, and deserializer/reader for binary (not sure what best practice is here in Rust, but I think that would be the right convention).

I get the value of reducing the potential attack vectors based on usage, but I worry that it's a little excessive. Should we just use the --features=dnssec-* as the method for enabling/disabling dnssec support? As to each lookup variant having its own set of parsers, one challenge will be that for the generic lookup it will potentially be hard to get the set of parsers correct.

Question: is it better to harden all the parsers against possible exploits, or should we actually limit the exposure of each request to these individualized sets? This may incur a cost of needing to test more variations of possible results for each request type. As a side note, many DNS servers/resolvers will return additional records in the nameservers and additionals sections of the response Message. My basic concern is that the set of parsers would end up only varying by 1 or 2 supported record types.

I'm not against this change; I just worry that it might be costly to guarantee correctness (I may be overly concerned about this, but I'm not sure).

@briansmith
Contributor Author

First, I should have mentioned that this idea was motivated by CVE-2017-15650.

This is an interesting idea. I worry that the indirection might be confusing to new people coming to the project, but enough docs should make that ok.

I expect that it won't affect the public interface of -resolver at all, and it probably won't affect the exposed interface of anything other than -proto either.

I'm wondering if there is a way to skip the use of Option, and instead just have different impls of the read method associated with the Parsers.

I don't think the use of Option is going to cause any problem. I don't quite understand what you're suggesting, but I'm guessing you mean that we'd have a trait and then different implementations of that trait for each kind of query. I'm not against that, but I doubt it would result in less source code or less generated code, and I don't see a runtime performance benefit to it either; that said, I'm likely overlooking something.
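
For the sake of discussion, here is one guess at what that trait-based shape could look like; ReadRData, Ipv4LookupReader, and the stub types are all invented names for this sketch, not anything that exists in -proto:

    struct Decoder; // stand-in for BinDecoder

    enum LookupData {
        A([u8; 4]),
        Cname(String),
    }

    enum ReadError {
        Unsupported,
    }

    // One trait with a distinct implementation per kind of query, instead of
    // a struct full of Option fields.
    trait ReadRData {
        fn read(&self, record_type: u16, decoder: &mut Decoder) -> Result<LookupData, ReadError>;
    }

    struct Ipv4LookupReader;

    impl ReadRData for Ipv4LookupReader {
        fn read(&self, record_type: u16, _decoder: &mut Decoder) -> Result<LookupData, ReadError> {
            match record_type {
                1 => Ok(LookupData::A([127, 0, 0, 1])),             // A
                5 => Ok(LookupData::Cname("example.".to_string())), // CNAME
                _ => Err(ReadError::Unsupported), // everything else never parses
            }
        }
    }

    fn main() {
        let reader = Ipv4LookupReader;
        // MX (type 15) is rejected without ever reaching MX-specific code.
        assert!(reader.read(15, &mut Decoder).is_err());
    }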

Also, I tend to reserve parser for text-based input, and deserializer/reader for binary (not sure what best practice is here in Rust, but I think that would be the right convention).

Noted. I'll try to use your terminology.

(Quoted out of order):

Should we just use the --features=dnssec-* as the method for enabling/disabling dnssec support?

I just did that in #267. That idea is not mutually exclusive with this one, though I think a good implementation of this idea will make the need for #[cfg(feature = "dnssec")] mostly unnecessary in most of the code.
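
As a rough illustration of how that gating works (assuming, for simplicity, a single feature named dnssec rather than the actual dnssec-* features; the enum and function below are stand-ins, not the crate's real items):

    enum Record {
        A([u8; 4]),
        #[cfg(feature = "dnssec")]
        Dnskey(Vec<u8>),
    }

    fn describe(record: &Record) -> &'static str {
        match record {
            Record::A(_) => "address record",
            // This arm, and the DNSKEY handling it would call into, is
            // compiled out entirely when the `dnssec` feature is disabled.
            #[cfg(feature = "dnssec")]
            Record::Dnskey(_) => "DNSSEC public key",
        }
    }

    fn main() {
        println!("{}", describe(&Record::A([192, 0, 2, 1])));
    }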

I get the value of reducing the potential attack vectors based on usage, but I worry that it's a little excessive.

One of the challenges to maintaining a security-critical library is that people who are hardcore about minimalism and security are always at odds with people who want more features. The idea here is to placate the people who value extreme security and minimalism while still making it easy to maintain and add non-core features.

Also, as is usually the case in security engineering, we take a layered approach, because the whole idea of security engineering is that you don't know where the (security) bugs in your code are, so you have to do things that generally mitigate vulnerabilities.

As to each lookup variant having its own set of parsers, one challenge will be that for the generic lookup it will potentially be hard to get the set of parsers correct.

The generic lookup function could be changed to delegate to the more specific lookup functions, so that it would do the minimally risky thing for free. And/or, we could just warn people that the general lookup function has maximum exposure to risks related to processing hostile inputs, and recommend they use the more specific functions whenever practical. In any case, we don't have to optimize for the security of the general lookup function first; we can do lookup_ipv4(), lookup_ipv6(), lookup_service(), etc. first. (More generally, in priority order according to contributors' priorities and the resources available.)

Question: is it better to harden all the parsers against possible exploits, or should we actually limit the exposure of each request to these individualized sets?

In the long term, it is better to harden all the parsers because that benefits the most people. However, these two things are not mutually exclusive. Even with hardened parsers, people want minimal exposure to risk.

This may incur a cost of needing to test more variations of possible results for each request type. As a side note, many DNS servers/resolvers will return additional records in the nameservers and additionals sections of the response Message. My basic concern is that the set of parsers would end up only varying by 1 or 2 supported record types.

That's a good point, and that might make this idea moot. My thinking was specifically that I'd like to not think about MX or SOA or other domain-management records when doing A/AAAA/SRV lookups.

I'm not against this change; I just worry that it might be costly to guarantee correctness (I may be overly concerned about this, but I'm not sure).

When I have time, I'll try to make a minimal proof-of-concept PR that does this for lookup_ipv4() to show the advantages and costs.

@bluejekyll
Member

I really do like all the reasoning here. In terms of results coming back, though:

My thinking was specifically that I'd like to not think about MX or SOA or other domain-management records when doing A/AAAA/SRV lookups.

It's definitely possible to remove SRV and SOA from A or AAAA queries, though lookup_ip is dual-stack and will actually return both A and AAAA, albeit over multiple requests. One thing that will happen is that CNAMEs will potentially (very likely) be returned. Additionally, NS records and SOA records are both useful as references to the actual zone authorities. SOA specifically has default caching information for the zone, which should be useful in the long run.

Basically, I think what we'll end up with is a core set of RData types present in all requests (SOA, NS, CNAME, etc.), and then the variations would be on the terminal record types, like A, AAAA, SRV, etc. Does that make sense?
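
One very rough way to picture that split, with entirely hypothetical names: a shared core() constructor plus per-lookup builders for the terminal types:

    #[derive(Default)]
    struct EnabledTypes {
        // core types shared by every lookup
        soa: bool,
        ns: bool,
        cname: bool,
        // terminal types that vary per lookup
        a: bool,
        aaaa: bool,
        srv: bool,
    }

    impl EnabledTypes {
        // The core set every request needs to be able to read.
        fn core() -> Self {
            EnabledTypes { soa: true, ns: true, cname: true, ..Default::default() }
        }
        // Each lookup variant only adds its terminal record types on top.
        fn with_a(mut self) -> Self { self.a = true; self }
        fn with_aaaa(mut self) -> Self { self.aaaa = true; self }
        fn with_srv(mut self) -> Self { self.srv = true; self }
    }

    fn main() {
        let ipv4 = EnabledTypes::core().with_a();
        let ipv6 = EnabledTypes::core().with_aaaa();
        let srv = EnabledTypes::core().with_srv();
        assert!(ipv4.soa && ipv4.ns && ipv4.cname && ipv4.a && !ipv4.aaaa);
        assert!(ipv6.aaaa && !ipv6.a);
        assert!(srv.srv && !srv.a);
    }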

@bluejekyll
Member

Ok, as I started working on #280 I started thinking about this more.

  1. I think we should create concrete types for each RData variant, separate from the RData enum.
  2. Implement a macro that would define the set of RDatas valid for each lookup, like:
lookup_types!{
    LookupA; A, CNAME, SOA, NS,
} 

This would produce an enum like RData today:

enum LookupARData {
    A(A), CNAME(CNAME), SOA(SOA), NS(NS),
}
  3. Implement the read() for these variants in the macro as well, which would do the correct match on RecordType, etc. (need to think about this one more; a rough macro sketch follows this list).

  4. Pass this into the lookup() methods as a type parameter to restrict the valid set of RDatas parsed during a query, and then call it through the read method, etc.
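
For what it's worth, here is a very rough macro_rules! guess at points 2 and 3, with name generation simplified (the full enum name is passed in directly) and read() omitted; the concrete A/CNAME/SOA/NS structs are placeholders for the real per-record-type types:

    #[derive(Debug)] struct A(std::net::Ipv4Addr);
    #[derive(Debug)] struct CNAME(String);
    #[derive(Debug)] struct SOA(String);
    #[derive(Debug)] struct NS(String);

    // Generates one enum per lookup, restricted to the listed record types.
    macro_rules! lookup_types {
        ($name:ident; $($variant:ident),+ $(,)?) => {
            #[derive(Debug)]
            enum $name {
                $($variant($variant)),+
            }
        };
    }

    lookup_types! {
        LookupARData; A, CNAME, SOA, NS,
    }

    fn main() {
        let answers = vec![
            LookupARData::A(A("127.0.0.1".parse().unwrap())),
            LookupARData::CNAME(CNAME("example.net.".to_string())),
            LookupARData::SOA(SOA("ns1.example.net.".to_string())),
            LookupARData::NS(NS("ns1.example.net.".to_string())),
        ];
        println!("{:?}", answers);
    }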
