In this project, I compare the performance of a scene-classification computer vision model, based on the low-frequency components of the amplitude spectrum (Oliva & Torralba, 2001), with fast human perception of images. I found that human perception differs markedly from the model: human performance relies more heavily on the phase (local) information in the image, whereas the gist-based computer vision model relies mainly on the amplitude spectrum (global features) of the image.
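To make the amplitude/phase distinction concrete, here is a minimal sketch of how an image can be split into its Fourier amplitude and phase, and how the two can be recombined across images. It is an illustration only, not the code used in this project or the Oliva & Torralba model; the function names (`split_spectrum`, `combine_spectrum`, `swap_amplitude_phase`) are hypothetical, and random arrays stand in for real grayscale scene images.

```python
import numpy as np


def split_spectrum(image):
    """Return (amplitude, phase) of the 2-D Fourier transform of a grayscale image."""
    spectrum = np.fft.fft2(image)
    return np.abs(spectrum), np.angle(spectrum)


def combine_spectrum(amplitude, phase):
    """Rebuild an image from an amplitude spectrum and a phase spectrum."""
    spectrum = amplitude * np.exp(1j * phase)
    # For spectra taken from real images, the imaginary part is numerical noise.
    return np.fft.ifft2(spectrum).real


def swap_amplitude_phase(image_a, image_b):
    """Hybrid image with the amplitude of image_a and the phase of image_b."""
    amplitude_a, _ = split_spectrum(image_a)
    _, phase_b = split_spectrum(image_b)
    return combine_spectrum(amplitude_a, phase_b)


# Assumption: random arrays stand in for two same-sized grayscale scenes.
rng = np.random.default_rng(0)
scene_a = rng.random((64, 64))
scene_b = rng.random((64, 64))
hybrid = swap_amplitude_phase(scene_a, scene_b)
```

A hybrid built this way carries the global amplitude statistics of one scene and the local phase structure of the other, so it gives one way to probe which of the two an observer or a classifier depends on.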
You can find a detailed report here: https://longluu.github.io/Scene%20classification.html
The same report is also available as "Scene classification.pdf" (but without animation).
You can also look at my slides "Scene gist recognition.pptx" for more context.
The data can be found here: https://drive.google.com/drive/folders/1i1ZcbUroGzzG2NrD1uaYidb8FtQtf2Ei?usp=sharing