Noarch repodata.json is empty for proxy channels #527

Closed
janjagusch opened this issue May 10, 2022 · 11 comments
@janjagusch
Collaborator

In our quetz server, we proxy conda-forge into a channel of the same name:

  {
    "name": "conda-forge",
    "description": null,
    "private": true,
    "size_limit": null,
    "ttl": 36000,
    "mirror_channel_url": "https://conda.anaconda.org/conda-forge",
    "mirror_mode": "proxy",
    "members_count": 1,
    "packages_count": 0
  },

Installation often fails because certain packages cannot be found. We narrowed the issue down to the noarch repodata.json being empty. Navigating to /get/conda-forge/noarch/repodata.json yields:

{
  "info": {
    "subdir": "noarch"
  },
  "packages": {},
  "packages.conda": {},
  "repodata_version": 1
}
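
For reference, a minimal way to reproduce the check (the server hostname is a placeholder for our deployment):

    import requests

    # fetch the proxied noarch repodata from the quetz server
    resp = requests.get("https://quetz.example.com/get/conda-forge/noarch/repodata.json")
    repodata = resp.json()
    # both package maps come back empty when the bug hits
    print(len(repodata["packages"]), len(repodata["packages.conda"]))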

This problem only occurs for the noarch platform. Also, our package backend (GCS) holds a cached version of the repodata.json, which is not empty (it is 55 MB).

Any idea what might be causing this?

@wolfv
Member

wolfv commented May 10, 2022

hmmm, this is the code in question:

quetz/quetz/main.py

Lines 1645 to 1654 in 05dbfc6

    if channel.mirror_channel_url and channel.mirror_mode == "proxy":
        repository = RemoteRepository(channel.mirror_channel_url, session)
        if not pkgstore.file_exists(channel.name, path):
            download_remote_file(repository, pkgstore, channel.name, path)
        elif path.endswith(".json"):
            # repodata.json and current_repodata.json are cached locally
            # for channel.ttl seconds
            _, fmtime, _ = pkgstore.get_filemetadata(channel.name, path)
            if time.time() - fmtime >= channel.ttl:
                download_remote_file(repository, pkgstore, channel.name, path)

I wonder if the gzip magic does something bad here:

quetz/quetz/main.py

Lines 1673 to 1683 in 05dbfc6

    if accept_encoding and 'gzip' in accept_encoding and path.endswith('.json'):
        # return gzipped response
        try:
            package_content_iter = iter_chunks(
                pkgstore.serve_path(channel.name, path + '.gz')
            )
            path += '.gz'
            headers['Content-Encoding'] = 'gzip'
            headers['Content-Type'] = 'application/json'
        except FileNotFoundError:
            pass

Since we might not have the proper repodata.json.gz file on the package store ...
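
That would also explain why the failure is silent: an empty repodata stub gzips to perfectly valid gzip, so neither the server nor the client raises an error and the solver just sees zero packages. A quick illustration:

    import gzip, json

    # the empty stub observed at /get/conda-forge/noarch/repodata.json
    stub = {
        "info": {"subdir": "noarch"},
        "packages": {},
        "packages.conda": {},
        "repodata_version": 1,
    }

    # its .gz sidecar is valid gzip, so serving it produces no error anywhere
    blob = gzip.compress(json.dumps(stub).encode())
    print(json.loads(gzip.decompress(blob))["packages"])  # {} -> nothing to install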

@janjagusch
Collaborator Author

> I wonder if the gzip magic does something bad here [...] Since we might not have the proper repodata.json.gz file on the package store ...

I just checked the repodata.json.gz and it's empty (see the file size):

[screenshot: file listing showing an empty noarch/repodata.json.gz]

Interestingly, the other platforms don't contain a repodata.json.gz, only a repodata.json.

@janjagusch
Collaborator Author

janjagusch commented May 10, 2022

One more thing: repodata.json.gz also doesn't seem to exist on the upstream channel, see: https://conda.anaconda.org/conda-forge/noarch/repodata.json.gz

Deleting repodata.json.bz2 and repodata.json.gz seems to solve the issue for me. The question remains where these files come from, though.
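
For reference, roughly what that cleanup looks like against a GCS-backed store (bucket name and object path are illustrative, not quetz's actual layout):

    from google.cloud import storage

    bucket = storage.Client().bucket("my-quetz-pkgstore")
    for name in ("repodata.json.gz", "repodata.json.bz2"):
        blob = bucket.get_blob(f"conda-forge/noarch/{name}")
        if blob is not None:
            print(name, blob.size)  # the stale sidecars show up suspiciously small
            blob.delete()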

@wolfv
Member

wolfv commented May 10, 2022

OK, so the bug goes as follows:

  • we added a repodata.json.gz file to support returning / streaming gzip-compressed files from S3 buckets and similar
  • we initialize every channel with an empty noarch repodata.json, because the existence of the noarch/repodata.json is what marks a channel as "existing"
  • apparently we initialize even a proxy-mirror channel with the static noarch/repodata.json files (including the .gz / .bz2 ones); we should not do that
  • the repodata.json.gz is a quetz-specific extension (and a bit of a workaround for OVH, because they don't properly support setting the Content-Encoding header for a given file)

@wolfv
Member

wolfv commented May 10, 2022

This is where we call update_indexes for all kinds of channels:

indexing.update_indexes(dao, pkgstore, new_channel.name)

That creates the empty noarch/repodata.json ...

Should we just not call that for a proxy-mirror channel?
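
A minimal sketch of that guard, reusing the proxy check from the excerpt above (hypothetical, just to illustrate the idea):

    # only build local indexes for channels that actually host packages;
    # proxy mirrors serve the upstream repodata instead
    if not (new_channel.mirror_channel_url and new_channel.mirror_mode == "proxy"):
        indexing.update_indexes(dao, pkgstore, new_channel.name)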

@janjagusch
Collaborator Author

> Should we just not call that for a proxy-mirror channel?

Sounds reasonable to me. 👍

@wolfv
Member

wolfv commented May 11, 2022

On the other hand, a better solution might be to create the .gz files so that we can serve gzipped repodata, which saves a lot: the gzipped file is 20% or so of the full repodata. E.g. instead of downloading 120 MB, you only need 20 or so.
Did you experience long downloads or are you gzipping the responses through nginx?
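
Creating the sidecar at download time would be cheap; a minimal sketch (file paths illustrative):

    import gzip
    import shutil

    # write repodata.json.gz next to the freshly fetched repodata.json so the
    # gzip branch in main.py has a real, non-empty file to serve
    with open("repodata.json", "rb") as src, gzip.open("repodata.json.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)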

@janjagusch
Collaborator Author

> Did you experience long downloads or are you gzipping the responses through nginx?

So far I don't think long download times have been an issue for us. But if we could build it in a way that sends a lot less data over the network, I would be all in favour of that.

@wolfv
Member

wolfv commented May 12, 2022

quetz is released, with this bug fixed.

@wolfv wolfv closed this as completed May 12, 2022
@janjagusch
Collaborator Author

thank you!
