What happens when you switch the Fourier magnitudes of two pictures, while leaving their phases intact? Are the pictures still recognizable? If so, which is which? Is the magnitude or the phase more important to human visual perception?
Have a look at this Jupyter notebook to find out.
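For readers who want to try the swap outside the notebook, here is a minimal NumPy sketch. It is not taken from the notebook: the function name `swap_magnitude_phase` is hypothetical, and it assumes the two pictures are already loaded as grayscale 2-D arrays of the same shape.

```python
import numpy as np

def swap_magnitude_phase(img_a, img_b):
    """Combine the Fourier magnitude of one image with the phase of the other.

    Returns (magnitude of A with phase of B, magnitude of B with phase of A).
    Both inputs are assumed to be real 2-D arrays of identical shape.
    """
    img_a = np.asarray(img_a, dtype=float)
    img_b = np.asarray(img_b, dtype=float)
    if img_a.shape != img_b.shape:
        raise ValueError("images must have the same shape")

    fa = np.fft.fft2(img_a)
    fb = np.fft.fft2(img_b)
    mag_a, phase_a = np.abs(fa), np.angle(fa)
    mag_b, phase_b = np.abs(fb), np.angle(fb)

    # For real inputs the recombined spectra stay conjugate-symmetric, so the
    # inverse transforms are real up to rounding error; keep the real part.
    a_mag_b_phase = np.fft.ifft2(mag_a * np.exp(1j * phase_b)).real
    b_mag_a_phase = np.fft.ifft2(mag_b * np.exp(1j * phase_a)).real
    return a_mag_b_phase, b_mag_a_phase
```

The outputs are not guaranteed to lie in the original pixel range, so they may need rescaling or clipping before being displayed.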
I'm not sure whom to attribute this "parlor trick" to - it's been around for a while and I've seen it in at least a couple of talks by different people. The earliest reference I could find is this Berkeley course from 1997, taught by S. Shankar Sastry.
Update: After looking some more, I believe the original source is this: A. V. Oppenheim and J. S. Lim, "The importance of phase in signals," in Proceedings of the IEEE, vol. 69, no. 5, pp. 529-541, May 1981. doi: 10.1109/PROC.1981.12022 (paywall)
There are some earlier papers that look at reconstructing images from phase and unit magnitude, but this seems to be the first that does the swapping.
- Hillary_Clinton.jpg: resized version of https://en.wikipedia.org/wiki/Hillary_Clinton#/media/File:Hillary_Clinton_official_Secretary_of_State_portrait_crop.jpg (public domain)
- Bernie_Sanders.jpg: https://en.wikipedia.org/wiki/Bernie_Sanders#/media/File:Bernie_Sanders.jpg (public domain)
- frogs.jpg: resized version of https://en.wikipedia.org/wiki/Frog#/media/File:Anoures.jpg (public domain)
Resized using the ImageMagick identify and convert commands to the dimensions of Bernie_Sanders.jpg:
DIMS=$(identify -format '%wx%h!' Bernie_Sanders.jpg)
convert Hillary_Clinton_official_Secretary_of_State_portrait_crop.jpg \
    -resize "$DIMS" Hillary_Clinton.jpg
convert Anoures.jpg -resize "$DIMS" frogs.jpg
Trying the same trick with audio is a little more complicated. The Fourier transform is nonlocal, but humans do not hear a whole song at once. So in audio processing, instead of taking the Fourier transform of the whole signal, it is common to break up the signal into short time windows and do a Fourier transform on each of these windows.
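The windowing idea can be sketched in a few lines of NumPy. This is only an illustration, not the code from this repository: the helper names `frame_spectra` and `from_frame_spectra` are hypothetical, and it assumes a mono signal stored as a 1-D array, cut into non-overlapping rectangular windows.

```python
import numpy as np

def frame_spectra(signal, window_size):
    """Cut a 1-D signal into non-overlapping windows and FFT each window.

    Returns a complex array of shape (n_frames, window_size // 2 + 1).
    Samples that do not fill a whole final window are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // window_size
    frames = signal[: n_frames * window_size].reshape(n_frames, window_size)
    return np.fft.rfft(frames, axis=1)

def from_frame_spectra(spectra, window_size):
    """Invert frame_spectra: inverse-FFT each window and concatenate."""
    frames = np.fft.irfft(spectra, n=window_size, axis=1)
    return frames.ravel()
```

Each row of the returned array is the spectrum of one short stretch of audio, so magnitude and phase can be manipulated window by window instead of over the whole recording.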
A naive implementation of swapping magnitude and phase of two audio files on such windows may be found in this Python script. The results sound a little smoother using the Short-Time Fourier Transform (essentially: overlapping windows and a more carefully designed filter than simply a rectangular indicator). An implementation using this Python stft package may be found in this Python script.
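As a rough sketch of the overlapping-window variant, the following uses plain NumPy rather than the stft package mentioned above, so it should not be read as the implementation in the linked scripts. The function names, the Hann window, and the default `n_fft=1024` and `hop=256` are all assumptions chosen for illustration; the inputs are assumed to be mono signals at the same sample rate.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Short-time Fourier transform with a Hann window (illustrative only)."""
    x = np.asarray(x, dtype=float)
    if len(x) < n_fft:
        raise ValueError("signal is shorter than one window")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=1024, hop=256):
    """Weighted overlap-add inverse of stft above."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(spec) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += window ** 2
    # Avoid dividing by zero at the very edges, where the window vanishes.
    return out / np.maximum(norm, 1e-8)

def swap_audio(x, y, n_fft=1024, hop=256):
    """Return (magnitude of x with phase of y, magnitude of y with phase of x)."""
    n = min(len(x), len(y))  # truncate to the shorter signal
    sx = stft(x[:n], n_fft, hop)
    sy = stft(y[:n], n_fft, hop)
    x_mag_y_phase = np.abs(sx) * np.exp(1j * np.angle(sy))
    y_mag_x_phase = np.abs(sy) * np.exp(1j * np.angle(sx))
    return istft(x_mag_y_phase, n_fft, hop), istft(y_mag_x_phase, n_fft, hop)
```

Changing `n_fft` and `hop` is an easy way to hear how much the result depends on the window size.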
The results depend strongly on the window size and are much harder to describe than in the case of images, but they are still quite interesting and fun. I put some short examples in the speech and music folders.
- Carefree Kevin MacLeod (incompetech.com)
  Licensed under Creative Commons: By Attribution 3.0 License
  http://creativecommons.org/licenses/by/3.0/
- Gymnopedie No. 2 Kevin MacLeod (incompetech.com)
  Licensed under Creative Commons: By Attribution 3.0 License
  http://creativecommons.org/licenses/by/3.0/
- Address to the Women of America - Gloria Steinem
  https://ia802302.us.archive.org/10/items/Greatest_Speeches_of_the_20th_Century/AddresstotheWomenofAmerica_64kb.mp3
  Public domain
- Address to Congress - Hank Aaron
  https://ia802302.us.archive.org/10/items/Greatest_Speeches_of_the_20th_Century/AddresstoCongress-1974_64kb.mp3
  Public domain
These files were processed using sox. See the music/music.sh and speech/speech.sh scripts for details.
MIT. See the LICENSE file for more details.