Fixed Caltech-101 and Caltech-256 broken links with the official ones #9205

hrsvrn · 2025-09-04T19:40:54Z

Context

With reference to PR #9192

Issue

The Google Drive links are broken and the datasets no longer download.

How to use?

import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
    # Some Caltech images are grayscale; force 3 channels so the 3-channel Normalize below works
    transforms.Lambda(lambda img: img.convert("RGB")),
    transforms.Resize((224, 224)),  # Resize to a common size for pre-trained models
    transforms.ToTensor(),          # Convert the PIL image to a float tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),  # Per-channel ImageNet mean/std
])


caltech101_dataset = torchvision.datasets.Caltech101(
    root="./data",        # Directory to store the dataset
    download=True,        # Download if not already present
    transform=transform,  # Apply the defined transformations
)

caltech256_dataset = torchvision.datasets.Caltech256(
    root="./data",        # Directory to store the dataset
    download=True,        # Download if not already present
    transform=transform,  # Apply the defined transformations
)

Fix

Replaced the dead links with the official downloads from the Caltech data repository and updated the download and extraction logic to match the new archives.
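
The extraction step of this fix can be sketched with the standard library alone. This is a hypothetical sketch, not the torchvision implementation: it assumes the official download is a zip archive whose top-level `caltech-101/` folder holds `101_ObjectCategories.tar.gz` and `Annotations.tar.gz` (the layout implied by the reviewed diff), it uses `zipfile`/`tarfile` in place of torchvision's `extract_archive`, and `extract_caltech101` is a made-up helper name.

```python
import os
import shutil
import tarfile
import zipfile


def extract_caltech101(root: str, zip_path: str) -> None:
    """Hypothetical helper: unpack a downloaded Caltech-101 archive into root.

    Assumes zip_path is a zip file with a top-level "caltech-101/" folder
    containing the nested tarballs "101_ObjectCategories.tar.gz" and
    "Annotations.tar.gz".
    """
    # Step 1: unpack the outer zip, producing root/caltech-101/*.tar.gz.
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(root)

    extracted_dir = os.path.join(root, "caltech-101")

    # Step 2: unpack each nested tarball directly into root.
    for name in ("101_ObjectCategories.tar.gz", "Annotations.tar.gz"):
        with tarfile.open(os.path.join(extracted_dir, name), "r:gz") as tf:
            tf.extractall(root)

    # Step 3: delete the intermediate folder that only held the tarballs.
    shutil.rmtree(extracted_dir)
```

Extracting the inner tarballs into `root` and then deleting `caltech-101/` leaves `101_ObjectCategories/` and `Annotations/` directly under `root`, the same end state the reviewed diff produces.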

pytorch-bot · 2025-09-04T19:40:58Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9205

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 3bb37f4 with merge base 7bd8066:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

CMake / windows (windows.g5.4xlarge.nvidia.gpu, cuda, 12.6) / windows-job (gh) (trunk failure)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla · 2025-09-04T19:41:01Z

Hi @hrsvrn!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

meta-cla · 2025-09-04T21:18:30Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

AntoineSimoulin · 2025-09-05T13:46:46Z

torchvision/datasets/caltech.py

+        extracted_dir = os.path.join(self.root, "caltech-101")
+        extract_archive(os.path.join(extracted_dir, "101_ObjectCategories.tar.gz"), self.root)
+        extract_archive(os.path.join(extracted_dir, "Annotations.tar.gz"), self.root) # Note: Annotations is now also .tar.gz in the new archive
+        shutil.rmtree(extracted_dir)


Should we reuse the pattern from other datasets, for instance in "mnist.py", to extract every archive in the folder?

for gzip_file in os.listdir(gzip_folder):
    if gzip_file.endswith(".gz"):
        extract_archive(os.path.join(gzip_folder, gzip_file), self.raw_folder)
shutil.rmtree(gzip_folder)

Yes, that works as well. I will change this one :)
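
The loop-over-folder suggestion can be sketched as a small reusable helper that unpacks every archive in the intermediate folder without hard-coding file names. This is a stdlib-only sketch under stated assumptions: every `.gz` file in the folder is taken to be a gzipped tarball (torchvision's `extract_archive` also handles plain `.gz` files, which this sketch does not), and `extract_all_gz` and its `cleanup` flag are hypothetical names, not part of torchvision.

```python
import os
import shutil
import tarfile
from typing import List


def extract_all_gz(gzip_folder: str, dest: str, cleanup: bool = True) -> List[str]:
    """Hypothetical helper: extract every *.gz tarball in gzip_folder into dest.

    Returns the names of the archives that were extracted.
    """
    extracted = []
    # sorted() makes the extraction order deterministic across platforms.
    for gzip_file in sorted(os.listdir(gzip_folder)):
        if gzip_file.endswith(".gz"):
            # Assumption: each .gz file is a gzipped tar archive.
            with tarfile.open(os.path.join(gzip_folder, gzip_file), "r:gz") as tf:
                tf.extractall(dest)
            extracted.append(gzip_file)
    if cleanup:
        # Remove the intermediate folder once all archives are unpacked.
        shutil.rmtree(gzip_folder)
    return extracted
```

Looping over the folder means extra or renamed tarballs in a future archive layout would still be extracted, at the cost of also unpacking any unrelated `.gz` file that happens to sit in the same folder.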

AntoineSimoulin · 2025-09-05T14:01:50Z

I realize we have two proposed fixes for this problem: issue #9097 raises a similar issue, and #9098 implements a similar fix.

hrsvrn · 2025-09-05T14:06:50Z

Should I still work on this or not?

hrsvrn · 2025-09-06T06:18:42Z

@AntoineSimoulin any updates on this one?

JonasKlotz · 2025-09-08T09:57:54Z

Hey @hrsvrn
@AntoineSimoulin asked whether it is OK with me if we use your code instead (see my PR #9098).
I am fine with it; your code looks good so far! I just don't know how to officially review it.

AntoineSimoulin · 2025-09-08T21:49:47Z

@hrsvrn yes please let's make it to the finish line for this PR! I just adjusted the linting but I think we are pretty close now! @JonasKlotz I will credit you when merging this PR to acknowledge your suggested changes in #9098!

hrsvrn · 2025-09-08T23:21:46Z

Hey @AntoineSimoulin and @JonasKlotz

Sorry for the late reply.
It's totally okay to use my code.
You can merge the PR :)

github-actions · 2025-09-09T12:59:39Z

Hey @AntoineSimoulin!

You merged this PR, but no labels were added.
The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

Fixed Caltech-101 and Caltech-256 broken links with the official ones

7607454

meta-cla bot added the cla signed label Sep 4, 2025

AntoineSimoulin reviewed Sep 5, 2025


AntoineSimoulin mentioned this pull request Sep 5, 2025

Fix broken download URLs for Caltech101 and Caltech256 datasets #9098

Closed

Used the recommended method for unzipping the files

c99aaa0

AntoineSimoulin added 2 commits September 8, 2025 16:24

Merge branch 'main' into main

bb31716

lint

3bb37f4

AntoineSimoulin merged commit cdc1fee into pytorch:main Sep 9, 2025
59 of 60 checks passed


3 participants
