Skip to content

(PUP-8008) Avoid double-encoding Forge API pagination URIs#6287

Closed
scotje wants to merge 2 commits intopuppetlabs:masterfrom
scotje:pup8008_forge_pagination
Closed

(PUP-8008) Avoid double-encoding Forge API pagination URIs#6287
scotje wants to merge 2 commits intopuppetlabs:masterfrom
scotje:pup8008_forge_pagination

Conversation

@scotje
Copy link
Contributor

@scotje scotje commented Oct 18, 2017

This should resolve the issue with PMT following the pagination links returned by the Forge API.

@scotje scotje requested a review from Iristyle October 18, 2017 16:56
@scotje
Copy link
Contributor Author

scotje commented Oct 18, 2017

Oops, looks like there are 2 sets of unit tests for the Forge code. I'll fix up the other tests.

@scotje scotje force-pushed the pup8008_forge_pagination branch from 26ee9dc to 5c7139d Compare October 18, 2017 17:46
@puppetcla
Copy link

Waiting for CLA signature by @scotje

@scotje - We require a Contributor License Agreement (CLA) for people who contribute to Puppet, but we have an easy click-through license with instructions, which is available at https://cla.puppet.com/

Note: if your contribution is trivial and you think it may be exempt from the CLA, please post a short reply to this comment with details. http://docs.puppet.com/community/trivial_patch_exemption.html

def fetch(input)
name = input.tr('/', '-')
uri = "/v3/releases?module=#{name}"
uri = "/v3/releases?module=#{name}&sort_by=version"
Copy link
Contributor

@MosesMendoza MosesMendoza Oct 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why did you add the sort by query param here? (could this go in the commit message?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add some info to the commit message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you query directly by version number here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but PMT currently collects all the available versions in case it needs them when resolving dependencies.

uri = "/v3/releases?module=#{name}&sort_by=version"
if Puppet[:module_groups]
uri += "&module_groups=#{Puppet[:module_groups].gsub('+', ' ')}"
uri += "&module_groups=#{Puppet[:module_groups]}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any downside to leaving these gsub's here in case something else is using this uri?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-added the gsubs.

@scotje scotje force-pushed the pup8008_forge_pagination branch from 5c7139d to 6fb7f2d Compare October 18, 2017 19:57
@scotje
Copy link
Contributor Author

scotje commented Oct 18, 2017

Updated commit message for sort_by change.

@puppetcla
Copy link

CLA signed by all contributors.

…ersion

If there is a problem traversing additional pages of results from the
Forge, this at least will ensure that the latest versions of the
requested modules are found and installable. This also enables a future
optimization of only fetching the first page of results for each module
and seeing if that is sufficient to resolve dependencies.
@scotje scotje force-pushed the pup8008_forge_pagination branch from 6fb7f2d to 86345b0 Compare October 18, 2017 20:05
@repository.make_http_request(*args)
def make_http_request(uri, *args)
# Change plus to space so it can be encoded properly
uri = uri.gsub('+', ' ')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is called by download as well, I'm not sure that this is the safest location to make this change. I think it was intentional to make that mod inline before.

@Iristyle
Copy link
Contributor

Iristyle commented Oct 18, 2017

I think it would be good to look at a couple of things before making changes here:

  • Can we upload a module to the forge that contains UTF-8 characters (and will match a local query) for the sake of some additional testing / debugging?
  • Can we look at how the old PMT client behaved ... prior to any of these changes going in? Something still doesn't totally add up here for me, and I think it would helpful to have the actual client / server interactions (at least in terms of request URIs) to understand this better.
    • For instance, in Puppet 4.8, the request trickles through to
      request = Net::HTTP::Get.new(Puppet::Util.uri_encode(path), headers)
    • In current code, the request trickles through to
      request = Net::HTTP::Get.new(Puppet::Util.uri_encode(path), headers)
    • Those look the same (though the underlying Puppet::Util code changed), and we weren't decoding URIs before... what exactly about the interaction has changed? (Is it just the bit about the + or were other things also subtly wrong before that we just didn't run into)
  • What other clients are consuming the JSON payload? Given the request code mentioned in prior bullet, why do URIs need to be passed encoded in the JSON, given they're being encoded again on the way out of PMT?

@scotje
Copy link
Contributor Author

scotje commented Oct 19, 2017

Can we upload a module to the forge that contains UTF-8 characters (and will match a local query) for the sake of some additional testing / debugging?

In the actual module name? I don't think non-ASCII characters have ever been allowed in module names on the Forge. (See https://github.com/puppetlabs/puppet-forge-api/blob/master/app/validators.rb for the accepted regexes) But we can upload whatever we want to one of the non-public Forge instances for testing.

Can we look at how the old PMT client behaved ... prior to any of these changes going in? Something still doesn't totally add up here for me, and I think it would helpful to have the actual client / server interactions (at least in terms of request URIs) to understand this better. For instance, in Puppet 4.8...

I'm guessing the pagination has probably been broken since then? It only manifests if the requested release isn't in the first page of results so it may have taken that long for the issue to surface.

The previous commit to that:

request = Net::HTTP::Get.new(URI.escape(path), headers)

Uses URI.escape which leaves the pluses alone:

irb(main):002:0> URI.escape('module_groups=base+pe_only')
=> "module_groups=base+pe_only"

You can see the issue pretty clearly if you adjust the debug logging the way I did in the PR without any of the other changes:

$ bx bin/puppet module install puppetlabs-apache -v 2.3.0 --modulepath=/Users/jesse/sandbox/pmt --debug --trace --module_groups="base+pe_only"
Debug: Runtime environment: puppet_version=5.3.2, ruby_version=2.4.0, run_mode=user, default_encoding=UTF-8
Notice: Preparing to install into /Users/jesse/sandbox/pmt ...
Notice: Downloading from https://forgeapi.puppet.com ...
Debug: Evicting cache entry for environment 'production'
Debug: Caching environment 'production' (ttl = 0 sec)
Debug: HTTP GET https://forgeapi.puppet.com/v3/releases?module=puppetlabs-apache&module_groups=base%20pe_only
Debug: HTTP GET https://forgeapi.puppet.com/v3/releases?module=puppetlabs-apache&module_groups=base%2Bpe_only&limit=20&offset=20
Debug: HTTP GET https://forgeapi.puppet.com/v3/releases?module=puppetlabs-stdlib&module_groups=base%20pe_only
Debug: HTTP GET https://forgeapi.puppet.com/v3/releases?module=puppetlabs-stdlib&module_groups=base%2Bpe_only&limit=20&offset=20
Debug: HTTP GET https://forgeapi.puppet.com/v3/releases?module=puppetlabs-concat&module_groups=base%20pe_only
Debug: HTTP GET https://forgeapi.puppet.com/v3/releases?module=puppetlabs-concat&module_groups=base%2Bpe_only&limit=20&offset=20
Debug: HTTP GET https://forgeapi.puppet.com/v3/releases?module=puppetlabs-firewall&module_groups=base%20pe_only
Debug: HTTP GET https://forgeapi.puppet.com/v3/releases?module=puppetlabs-firewall&module_groups=base%2Bpe_only&limit=20&offset=20
Error: Could not install 'puppetlabs-apache' (v2.3.0)
  No releases matching '2.3.0' are available from https://forgeapi.puppet.com

Specifically, the first request for each page has module_groups=base%20pe_only (the plus was converted to a space and then encoded) but the subsequent requests have module_groups=base%2Bpe_only (literal plus encoded as %2B).

What other clients are consuming the JSON payload? Given the request code mentioned in prior bullet, why do URIs need to be passed encoded in the JSON, given they're being encoded again on the way out of PMT?

It's a public API that has been documented and recommended for ~4 years so it's hard to definitively answer that question. At the least there is r10k, forge-ruby, librarian-puppet, the new Artifactory stuff.

@scotje
Copy link
Contributor Author

scotje commented Oct 19, 2017

I still think this change is worth making, but in typing out my last response I realized we can probably work around this to some extent by accepting a literal + as a module group separator on the Forge as well.

@scotje
Copy link
Contributor Author

scotje commented Oct 24, 2017

Closing this, opening a new fix targeted at 4.10.x first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants