
Add limit option to hab pkg search #6581

Merged
merged 3 commits into master from dmcneil/improve-search on Jun 13, 2019

Conversation

@davidMcneil
Contributor

commented May 23, 2019

This adds a --limit option to hab pkg search that lets users set an arbitrary limit when searching for packages. Previously, results were capped at 50.

This uses the existing Builder API. Future work could improve the Builder API and clean up the CLI logic; for example, it is currently necessary to make multiple HTTP requests to get the results.

I also made output that does not directly print package idents go to stderr instead of stdout. I have found this makes tools easier to use in conjunction with other command-line utilities.
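Roughly, the convention looks like this (a minimal sketch, not the actual CLI code; print_results is a hypothetical name):

    // Package idents go to stdout so they can be piped into other tools;
    // everything else (status text, summaries) goes to stderr.
    fn print_results(packages: &[PackageIdent], total: usize) {
        for ident in packages {
            println!("{}", ident); // data -> stdout
        }
        eprintln!("Showing {} of {} packages", packages.len(), total); // chatter -> stderr
    }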

Resolves #56

@chef-expeditor


commented May 23, 2019

Hello davidMcneil! Thanks for the pull request!

Here is what will happen next:

  1. Your PR will be reviewed by the maintainers.
  2. If everything looks good, one of them will approve it, and your PR will be merged.

Thank you for contributing!

@baumanj
Member

left a comment

Since we're making a change here, I'd like to take the opportunity to add tests. I think the best way to get this into a testable state would be to factor out the API call through dependency injection. We can pull the API-specific code into its own function like:

    fn search_package_with_range(&self,
                                 search_term: &str,
                                 token: Option<&str>,
                                 range: usize)
                                 -> Result<(PackageResults<PackageIdent>, bool)> {
        let req = self.0
                      .get_with_custom_url(&package_search(search_term), |url| {
                          url.set_query(Some(&format!("range={:?}&distinct=true", range)));
                      });
        let mut res = self.maybe_add_authz(req, token).send()?;
        let mut encoded = String::new();
        res.read_to_string(&mut encoded)
           .map_err(Error::BadResponseBody)?;
        let package_results = serde_json::from_str(&encoded)?;
        match res.status {
            StatusCode::Ok => Ok((package_results, false)),
            StatusCode::PartialContent => Ok((package_results, true)),
            _ => Err(err_from_response(res)),
        }
    }

and then the original code becomes:

    pub fn search_package(&self,
                          search_term: &str,
                          limit: usize,
                          token: Option<&str>)
                          -> Result<(Vec<PackageIdent>, usize)> {
        self.search_package_impl(search_term, limit, token, Self::search_package_with_range)
    }

Now search_package_impl can be easily unit tested (and benchmarked) by providing a closure that simulates various API responses.
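For example, a unit test could look roughly like this (a sketch only; test_client is a hypothetical helper, and the PackageResults fields other than data and total_count are assumptions):

    #[test]
    fn search_package_impl_stops_at_limit() {
        let client = test_client();

        // Fake API: every call returns 50 idents starting at `range` and
        // always reports that more results are available.
        let fake_range = |_: &Client, _: &str, _: Option<&str>, range: usize| {
            let data: Vec<PackageIdent> =
                (range..range + 50).map(|i| {
                                       format!("core/pkg{}/1.0.0/20190523000000", i).parse()
                                                                                    .unwrap()
                                   })
                                   .collect();
            Ok((PackageResults { range_start: range as isize,
                                 range_end: (range + 49) as isize,
                                 total_count: 1000,
                                 data },
                true))
        };

        let (packages, _total) =
            client.search_package_impl("core", 120, None, fake_range).unwrap();

        // Three simulated requests of 50 idents each, truncated to the limit.
        assert_eq!(packages.len(), 120);
    }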

Once I tried factoring search_package_with_range out, a different approach to search_package_impl revealed itself. Since the original while loop expressed a do-while operation, I found the logic could be simplified by switching to a simple loop (leaving out the optimization suggestions and the isize → usize fix to show the structure more clearly):

    fn search_package_impl(&self,
                           search_term: &str,
                           limit: usize,
                           token: Option<&str>,
                           search_with_range: impl Fn(&Client,
                              &str,
                              Option<&str>,
                              usize)
                              -> Result<(PackageResults<PackageIdent>, bool)>)
                           -> Result<(Vec<PackageIdent>, usize)> {
        let mut packages = Vec::new();

        loop {
            let (mut package_results, more_to_come) =
                search_with_range(self, search_term, token, packages.len())?;

            packages.append(&mut package_results.data);

            if packages.len() < limit && more_to_come {
                continue;
            } else {
                packages.truncate(limit);
                return Ok((packages, package_results.total_count));
            }
        }
    }
            status_code = res.status;
        }
        _ => return Err(err_from_response(res)),
    };

@baumanj

baumanj May 23, 2019

Member
Suggested change

    };
    }

A small thing, but I find being intentional about statement vs expression can allow the compiler to be more helpful in spotting errors.

       .map_err(Error::BadResponseBody)?;
    let package_results: PackageResults<PackageIdent> =
        serde_json::from_str(&encoded)?;
    packages.extend(package_results.data);

@baumanj

baumanj May 23, 2019

Member
Suggested change

    packages.extend(package_results.data);
    packages.append(&mut package_results.data);

I believe this should allow more optimizations since package_results.data doesn't need to remain independent.

@baumanj

baumanj May 23, 2019

Member

Alternatively, we can use a different approach that may not be quite as optimizable (though it does eliminate the explicit copy), but prevents over-adding, allowing us to remove the packages.truncate(limit); below:

Suggested change

    packages.extend(package_results.data);
    let num_until_limit = package_results.data.len().min(limit - packages.len());
    packages.extend(package_results.data.drain(..num_until_limit));
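To make the bound concrete, a small worked example with hypothetical numbers:

    // Suppose packages already holds 40 idents, the limit is 50, and the
    // latest response carried 30 more idents in package_results.data.
    let (already, limit, just_fetched) = (40usize, 50usize, 30usize);
    let num_until_limit = just_fetched.min(limit - already); // = 10
    // drain(..10) moves only the first 10 results, so packages never grows
    // past the limit and the later truncate() becomes unnecessary.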
    let mut encoded = String::new();
    res.read_to_string(&mut encoded)
       .map_err(Error::BadResponseBody)?;
    let package_results: PackageResults<PackageIdent> =

@baumanj

baumanj May 23, 2019

Member

I'd check this with a super large result set to see whether it makes a perceptible difference in response time*, but it may be more efficient to allocate the vector only once (allowing for the remote possibility of the total count increasing enough during the loop that a reallocation is required):

                    if package_results.total_count > 0 {
                        packages.reserve(limit.min(package_results.total_count as usize));
                    }

While we're at it, can we look at possibly changing PackageResults' isize fields to usize? I don't see any reason why any of those fields should ever be negative, and it clutters the consuming code to deal with that possibility. If they were usize, the reservation code could be simplified to:

                    packages.reserve(limit.min(package_results.total_count));

* I didn't bother with release, but in debug, with a query for "core" (~750 results), it made the difference between a 2.5 and 3 second runtime.
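If we did change the fields, the result type might look roughly like this (a sketch; the fields other than data and total_count are assumptions about the current struct):

    #[derive(Debug, Deserialize)]
    pub struct PackageResults<T> {
        pub range_start: usize,
        pub range_end:   usize,
        pub total_count: usize,
        pub data:        Vec<T>,
    }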

@chefsalim

chefsalim May 23, 2019

Member

There's likely a big difference between debug and release timings, from what I've found testing perf in the past.

@davidMcneil

davidMcneil May 29, 2019

Author Contributor

@baumanj I was unable to reproduce these results; both versions ran at around 4.25 seconds on my machine. How are you doing the test?

@baumanj

baumanj May 29, 2019

Member

I was just running

    time env RUST_LOG= ./target/debug/hab pkg search --limit=1000 core

But the proper way to compare would be to add benchmark tests that compare the performance. Probably one that includes the network call and one that doesn't would be the most illustrative.
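A network-free benchmark could be sketched against the dependency-injected search_package_impl from the earlier suggestion (assumes a nightly toolchain for the unstable test crate; test_client and fake_range_response are hypothetical helpers):

    #![feature(test)]
    extern crate test;

    use test::Bencher;

    #[bench]
    fn bench_search_package_impl(b: &mut Bencher) {
        let client = test_client();
        // Measures only the paging/truncation logic; no HTTP requests.
        b.iter(|| {
            client.search_package_impl("core", 1000, None, fake_range_response)
                  .unwrap()
        });
    }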

    @@ -477,6 +477,8 @@ pub fn get(feature_flags: FeatureFlag) -> App<'static, 'static> {
                endpoint. If not specified, the value will be taken from the HAB_BLDR_URL \
                environment variable if defined. (default: https://bldr.habitat.sh)")
            (@arg AUTH_TOKEN: -z --auth +takes_value "Authentication token for Builder")
            (@arg LIMIT: -l --limit +takes_value default_value("50") {valid_numeric::<usize>}
                "Limit how many packages to retrieve (default: 50)")

@baumanj

baumanj May 23, 2019

Member
Suggested change

    "Limit how many packages to retrieve (default: 50)")
    "Limit how many packages to retrieve")

When using clap's default_value method, it will automatically add that information to the help output, so this code results in redundancy:

    -l, --limit <LIMIT>        Limit how many packages to retrieve (default: 50) [default: 50]
    if more {
        println!("Search returned too many items, only showing the first {}",
                 packages.len());
    if packages.len() != total as usize {

@baumanj

baumanj May 23, 2019

Member

This cast isn't safe. I'd rather deal with the isize weirdness within search_package and have it return a usize here. Once that's in place, this should more accurately be:

Suggested change

    if packages.len() != total as usize {
    if packages.len() < total {

because it's not impossible for the total number of packages matching the query to go down during the course of the query, and in that case it would be confusing to print "Search returned too many items".

@chefsalim

chefsalim May 24, 2019

Member

I think the cast is safe, but Jon's suggestion is better overall

    @@ -784,8 +784,12 @@ fn sub_pkg_provides(m: &ArgMatches<'_>) -> Result<()> {
    fn sub_pkg_search(m: &ArgMatches<'_>) -> Result<()> {
        let url = bldr_url_from_matches(&m)?;
        let search_term = m.value_of("SEARCH_TERM").unwrap(); // Required via clap
        let limit = m.value_of("LIMIT").unwrap().parse().unwrap(); // Required via clap

@baumanj

baumanj May 23, 2019

Member

I learned my lesson a while back about using expect even when failure should be impossible, since it makes zeroing in on the issue in release builds way easier. I'd suggest changing both this and the above line.

Suggested change

    let limit = m.value_of("LIMIT").unwrap().parse().unwrap(); // Required via clap
    let limit = m.value_of("LIMIT")
                 .expect("required opt LIMIT")
                 .parse()
                 .expect("valid LIMIT");

    }
    packages.truncate(limit);
    Ok((packages, total))

@baumanj

baumanj May 23, 2019

Member

In order to avoid the nonsensical isize leaking out to callers, I think it would be better to clean things up here (if we can't change PackageResults itself):

Suggested change

    Ok((packages, total))
    total = if package_results.total_count > 0 {
        packages.len().max(package_results.total_count as usize)
    } else {
        packages.len()
    };
    Ok((packages, total))

@chefsalim

chefsalim May 24, 2019

Member

The isize is due to PG/Diesel. I think it might be fine to change PackageResults itself, but it's probably safest to just adjust the return here to be usize. It should also be safe to cast to usize, as we're never actually going to return negative counts.

@davidMcneil davidMcneil force-pushed the dmcneil/improve-search branch from cf41b62 to 863c5f3 May 29, 2019

@davidMcneil davidMcneil force-pushed the dmcneil/improve-search branch 3 times, most recently from cdf26b3 to 2abbd8a Jun 10, 2019

@baumanj
Member

left a comment

I'd suggest one tiny tweak. Otherwise this looks awesome! Thanks for incorporating the feedback.

        assert_eq!(r.1, 0);
    }

    #[ignore]

@baumanj

baumanj Jun 12, 2019

Member

Why is this ignored? I'm guessing it's because the test tends to take over 60s and gives the "package_search_large has been running for over 60 seconds" warning, but on my machine it completes in ~90 seconds, so I think it's fine to put it in the normal suite. It doesn't ever fail, does it?

I'd say #[ignore] is for things that are inconsistent, but we'd still like to fix them up. Tests like that won't be noticed if they catch a regression; I'd want to know if this started failing.

@davidMcneil

davidMcneil Jun 13, 2019

Author Contributor

No, it does not ever fail. This relates to our conversation on benchmarking. Ideally, this would be a benchmark and could be run separately from the tests. I like tests to have as fast a feedback loop as possible, but this is probably abusing the ignore attribute.

davidMcneil added some commits May 23, 2019

Add limit option to hab pkg search
Signed-off-by: David McNeil <dmcneil@chef.io>
Add package search tests
Signed-off-by: David McNeil <dmcneil@chef.io>
Remove ignore from test
Signed-off-by: David McNeil <mcneil.david2@gmail.com>

@davidMcneil davidMcneil force-pushed the dmcneil/improve-search branch from 2abbd8a to 4018c9c Jun 13, 2019

@davidMcneil davidMcneil merged commit 56a135d into master Jun 13, 2019

5 checks passed

DCO This commit has a DCO Signed-off-by
buildkite/habitat-sh-habitat-master-verify Build #2339 passed (59 minutes, 51 seconds)
buildkite/habitat-sh-habitat-master-website Build #2690 passed (43 seconds)
continuous-integration/travis-ci/pr The Travis CI build passed
expeditor/config-validation Validated your Expeditor config file

chef-ci added a commit that referenced this pull request Jun 13, 2019

Update CHANGELOG.md with details from pull request #6581
Obvious fix; these changes are the result of automation not creative thinking.
@bixu

Contributor

commented Jun 13, 2019

🎉

@davidMcneil davidMcneil deleted the dmcneil/improve-search branch Jul 1, 2019
