Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use rebar_packages_cdn for fetching package resource #2479

Merged
merged 4 commits into from
Feb 5, 2021

Conversation

belltoy
Copy link
Contributor

@belltoy belltoy commented Jan 30, 2021

Not only r3_hex_repo:get_tarball need to use rebar_packages_cdn, but also all the other exported functions from r3_hex_repo.

Since only get_package and get_tarball are used, I apply cdn option with them but the rests.

Maybe this should be a config option inside the vender hex_core.

Not only `r3_hex_repo:get_tarball` need to use `rebar_packages_cdn`, but also all the other exported functions from `r3_hex_repo`.

Since only `get_package` and `get_tarball` are used, I apply cdn option with them but the rests.

Maybe this should be a config option inside the vender hex_core.
@tsloughter
Copy link
Collaborator

Hm, I had to look into the code because I don't remember much of how this is supposed to work, particularly in relation to the support for a priority list of repos that exists now.

But first thing I noticed is in rebar3.erl we do:

HexCDN = case os:getenv("HEX_CDN") of
                 false -> ?DEFAULT_CDN;
                 CDN -> CDN
             end,
    State2 = rebar_state:set(State1, rebar_packages_cdn, HexCDN),

So I'm not sure this is needed because it should be setting rebar_packages_cdn to the right value on startup. Are you getting different results when trying to use it?

@tsloughter
Copy link
Collaborator

Oh wait, this would mean you couldn't set rebar_packages_cdn but had to use the OS var HEX_CDN...

@belltoy
Copy link
Contributor Author

belltoy commented Jan 30, 2021

Yes, the state was set on startup, but not use it correctly when performing a request to repo_url. The only one place to use it was performing get_tarball request. It should be used for all requests to the repo_url. That's the option mean to be, right?

In my patch, I get the rebar_packages_cdn from the state and replace repo_url to it before performing request get_package.

@ferd
Copy link
Collaborator

ferd commented Jan 30, 2021

I think this option is probably a leftover of the time before we started supporting many indexes and mirrors as part of config and doesn't really make sense in a world where you can have 3-4 hex configs at once but would override it with a single CDN URL. See http://rebar3.org/docs/configuration/configuration/#hex-repos-and-indexes

I'm not sure we can keep supporting the HEX_CDN behaviour correctly with that stuff.

@belltoy
Copy link
Contributor Author

belltoy commented Jan 31, 2021

According to #2204 and #1884, and the hex.pm docs, the name of rebar_packages_cdn should means the mirror of the official hex.pm repo.
Most people use the default repo, so a simple HEX_CDN environment variable would works fine for it. For compatibility and convenience, I think the rebar_packages_cdn being applied to the default hex repo only should be the right thing to do.

If this make sense, I will update the codes in this PR. updated 2ea46f2.

This patch should fix #2346 as well.

@gilbertwong96
Copy link

I think this option is probably a leftover of the time before we started supporting many indexes and mirrors as part of config and doesn't really make sense in a world where you can have 3-4 hex configs at once but would override it with a single CDN URL. See http://rebar3.org/docs/configuration/configuration/#hex-repos-and-indexes

I'm not sure we can keep supporting the HEX_CDN behaviour correctly with that stuff.

HEX_CDN(mirror_url) is definitely useful in the china mainland where GFW exists.

@belltoy
Copy link
Contributor Author

belltoy commented Feb 1, 2021

HEX_CDN(mirror_url) is definitely useful in the china mainland where GFW exists.

+1

That influence us all, who were behind that wall, deeply in daily development and production deployment.

Please help to confirm this patch soon. cc @ferd @tsloughter

Copy link
Collaborator

@ferd ferd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is fine as long as HEX_CDN does only override the default repo_url.

There are however misleading debug comments and details about the implementation that I think need to be changed to be clearer. This is a bit unfortunate because you're the one currently fixing technical debt we introduced back then. Sorry, and thanks for your work there!

If you'd rather not do more work on this PR, I'm fine doing the changes myself before merging in the main branch.

@@ -243,6 +243,7 @@ parse_checksum(Checksum) ->

update_package(Name, RepoConfig=#{name := Repo}, State) ->
?MODULE:verify_table(State),
?DEBUG("Getting package ~ts resource from repo ~ts", [Name, maps:get(repo_url, RepoConfig, undefined)]),
Copy link
Collaborator

@ferd ferd Feb 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that it makes the most sense to add this debug statement here. This call is internal and can get a lot of calls, with some of this information not actually being accurate. The update_package call here for example gets a debug message when it fails in rebar_prv_upgrade which refers to the repo's name and not the URL. That's because internally, the call gets the package definition and not the actual package, and the URL is not just the base value configured here, but has a subpath that may be organisation specific as well.

This is likely to be more misleading than it helps if something goes wrong. It might make more sense to have something like the following, maybe:

Suggested change
?DEBUG("Getting package ~ts resource from repo ~ts", [Name, maps:get(repo_url, RepoConfig, undefined)]),
?DEBUG("Getting definition for package ~ts from repo ~ts", [Name, maps:get(name, RepoConfig)]),

Copy link
Contributor Author

@belltoy belltoy Feb 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as below, RepoConfig#{repo_url := Value} here is use for build the URL for making request in r3_hex_repo as a prefix. The question is its name, repo_url is just a prefix, not a hole url for a request. But it's still useful in the debug output for people who want to check the http requests detail. Maybe describe it repo_url directly won't make misleadings. So how about

Suggested change
?DEBUG("Getting package ~ts resource from repo ~ts", [Name, maps:get(repo_url, RepoConfig, undefined)]),
?DEBUG("Getting definition for package ~ts from repo ~ts via repo_url ~ts",
[Name, maps:get(name, RepoConfig, undefined), maps:get(repo_url, RepoConfig, undefined)]),

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that can work. One of the things I might do while merging is to follow the precedent I started setting in #2375 and #2393 to make it more obvious which settings are active at download time. I'll print the configured map with private fields elided and use some caching to make sure they're not printed more than once per run or something. I don't yet know which shape I want it to take, so I'll avoid blocking your PR on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. Debugging format won't affect most uses I guess. But they are still worth to be considered and well-designed.

src/rebar_hex_repos.erl Outdated Show resolved Hide resolved
CDN = maybe_default_cdn(State),
case request(RepoConfig#{repo_url => CDN}, Name, Vsn, ETag) of
RepoUrl = maps:get(repo_url, RepoConfig, undefined),
?DEBUG("Downloading package ~ts from repo ~ts", [Name, RepoUrl]),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment review here about the debug message. This isn't accurate and can be misleading. I understand you want to see the hex cdn being active, but this isn't a good way to get it because for other issues (i.e. people configuring the hex repo config without that URL), it actually obscures the information that's going on.

Do note that the package may also not be downloaded, but just fetched from cache. The debug call about downloading should happen nearer to the actual download call.

Copy link
Contributor Author

@belltoy belltoy Feb 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agreed with you about this. repo_url isn't the actual requesting URL but only a prefix endpoint. I think we can debug repo_url right before calling r3_hex_repo:some_function which do the actual request.

But here is the appropriate place to print debug info I think, because the next request call do the actual http request to the tarball via r3_hex_repo:get_tarball. And more, the next debug info right after the request call will show the actual place the tarball got from, which was added by me.

        {ok, Body, NewETag} ->
            ?DEBUG("Downloaded package from repo ~ts, caching at ~ts", [CDN, CachePath]),

I notice that the RepoConfig#{repo_url := URI} was used by the build_url in vendored/r3_hex_repo.erl, which means repo_url here must have a value.

The hard thing to me is how to describe this debug info, does it acceptable below?

Suggested change
?DEBUG("Downloading package ~ts from repo ~ts", [Name, RepoUrl]),
?DEBUG("Making request to get package ~ts tarball from repo ~ts via repo_url ~ts",
[Name, maps:get(name, RepoConfig, undefined), maps:get(repo_url, RepoConfig, undefined)]),

src/rebar_state.erl Show resolved Hide resolved
@belltoy
Copy link
Contributor Author

belltoy commented Feb 1, 2021

Thanks for the review, I will update the PR soon.

@belltoy
Copy link
Contributor Author

belltoy commented Feb 2, 2021

image

That's why I need to apply HEX_CDN on r3_hex_repo:get_package()

Copy link
Collaborator

@ferd ferd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving this. I'll try and run it tonight (I've got a few other changesets to fold in) and see if I end up doing a few changes as mentioned in the review, such as tweaking the debug output to line up with newer approaches, and possibly moving the new rebar_state function to utils but as far as what we expect of a contributor to be done, this is good to go.

Thanks for the contribution :)

@ferd
Copy link
Collaborator

ferd commented Feb 5, 2021

The changes for formatting will be a bit more significant (I caught a weird edge case where we show private keys in debug output) so I'll make a follow-up PR to cover this one. Merging as is.

@ferd ferd merged commit 5603911 into erlang:master Feb 5, 2021
@ferd
Copy link
Collaborator

ferd commented Feb 5, 2021

follow-up in #2484

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants