How well does CLIP perform at detecting global image properties? For example, can it tell whether an image is corrupted by noise, downsampled, or hazy, and, beyond that, pick the right corruption parameters such as the noise std? I tried images with different types of noise (Gaussian, Poisson, gamma) and other corruptions such as downsampling and haze, with text prompts like [gaussian noise with std=25, gaussian noise with std=50] and [noisy, hazy], but the inference results are poor. Am I missing a key step in the way I am testing?
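
A minimal sketch of the kind of test described above may help pin down where things go wrong. It is not the exact code used. The helper names (`add_gaussian_noise`, `downsample`, `zero_shot_probs`) are hypothetical, the noise std is assumed to be on the 0-255 pixel scale, and `logit_scale=100.0` is an assumed value for the similarity temperature. The image and text embeddings are assumed to come from whichever CLIP encoder is being tested; the sketch only covers corruption generation and prompt scoring.

```python
import numpy as np


def add_gaussian_noise(img, std, rng):
    """Add zero-mean Gaussian noise to a uint8 image (std on the 0-255 scale, an assumption)."""
    noisy = img.astype(np.float64) + rng.normal(0.0, std, size=img.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


def downsample(img, factor):
    """Naive downsampling: keep every `factor`-th pixel along height and width."""
    return img[::factor, ::factor]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image embedding
    (shape [D]) and N prompt embeddings (shape [N, D])."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = logit_scale * (text_embs @ image_emb)
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()
```

With this layout, each corrupted image would be encoded once and scored against the embeddings of one prompt set at a time, for example [noisy, hazy] or [gaussian noise with std=25, gaussian noise with std=50]. One thing worth checking in such a setup is whether the model's own resize and crop preprocessing changes the corruption before the image encoder sees it.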