
Music Interaction Design

by Guillermo Montecinos, NYU ITP Spring 2019

This document and the contents of this repo correspond to the Music Interaction Design class taught by Luisa Pereira at NYU ITP during the Spring 2019 term.

Week 1

Music Interaction Exercise

We were expected to design a music interaction based on one musical piece and an oblique strategy. The song I chose is Sulky by the Argentinian musician Gustavo Cerati. Sulky is a 4:18-minute electro-rock chacarera –a folk music and dance from the countryside of the Río de la Plata region– written in 6/8 at a tempo of 115 bpm.

The song is built from samples of typical instruments such as the legüero bass drum and guitar, plus piano and a bunch of synths and unrecognizable samples that together create a dreamlike experience of traveling through a dark, open, flat land.

For this exercise, the oblique strategy I got was: "Look closely at the most embarrassing details and amplify them". Responding to that, I think this song seeks to prepare the listener, through a gentle and dreamlike atmosphere, to face –probably– the darkness and doubt of life. Darkness and dark feelings are constantly in the back of our heads and we are usually not proud of them. In fact, many people avoid talking about, or even recognizing, their darkness. But embracing darkness can make us look inside. So, in this experiment I'd like to design an interaction that –through Sulky– invites one person to travel through his/her emotions, being able to go back and face darkness any time it is needed.

The interaction –for only one person– consists of a dark room with one chair, three screens (one in front of the chair displaying visuals that evoke comfortable feelings, and one at each side displaying visuals that evoke uncomfortable feelings) and a quadraphonic sound system. The person will be invited to travel through a reconstruction of the song in which the sound layers are spatialized and synchronized to immerse the listener in 3D in the world of light and darkness created by Cerati. As the song plays, the listener will be exposed to comfortable visuals on the front screen, but will be tempted to look sideways at the uncomfortable visuals that –like our own darkness– will be constantly calling for attention. When the user looks at either of the side screens –which means he/she is paying attention to the darkness– the uncomfortable/dark/weird part of the song will be built (played) in real time by the system. For this purpose, the user's head will be tracked with a kinetic sensor placed behind the chair.

Final Project Iteration #1

For this project I'm interested in designing an interaction that, through music, can explore concepts such as identity, community, oppression and migration, and how these concepts are understood in the post-globalization world, where hate speech against underrepresented and discriminated communities has risen. From a southern perspective, I'm concerned with engaging territorial and musical communitarian memory, and with how that memory can crop up when identity and community are threatened. Through this interaction I also want to challenge the role of technology as a medium used to create interactive installations or applications.

I haven't yet settled on the final format this interaction will take, but there are some hints. According to Towards a Dimension Space for Musical Devices, musical interactions can be parametrized along 7 dimensions that can be plotted in a dimension space chart: role of sound, required expertise, musical control, degrees of freedom, feedback modalities, inter-actors and distribution in space. Although it is not yet clear what the interaction will look like, it is certain that the number of inter-actors can be high, because it will be a community-based interaction. The role of sound also has to be expressive, to engage with the community's imaginary, and the distribution in space –whether physical or virtual– has to be distributed and decentralized. Finally, as this interaction will seek to engage with primal aspects of the community's identity, no expertise will be expected from the users.

Some references, such as Voluspa Jarpa

Week 2 - Collective orchestra

Final Project Iteration #2

For this iteration of the project I'd like to keep an eye on the concept of community and imagine an interaction around it. According to Wikipedia, a community is "a small or large social unit (a group of living things) that has something in common, such as norms, religion, values, or identity" –and usually a sense of territoriality and a common memory as well. The reason we humans live in communities is that –ideally– together we can achieve more, and do it better, than apart.

A community-based –or communitarian– music interaction can then be one that shows its best only when there is an engaged community interacting with it. An interesting goal to pursue is to connect people around a musical interaction even if they don't know each other –or don't share a common memory– and make them realize that the sound landscape they are experiencing exists only because they are part of that interaction. As a draft title, this interaction can be called Collective Orchestra, even though it may not sound like a classical orchestra.

Regarding the above, this interaction can be experienced by any person from any cultural background, so there are no restrictions for play testing.

Musical User Path

During this iteration I approached the idea of an interactive installation –I'm not quite sure if physical, virtual or maybe both– where music is created collectively. The way the music is constructed will vary depending on how many users are interacting in the space. In this regard, musical parameters may be organized hierarchically and may follow this pattern:

  • Rhythm
  • Melody
  • Harmony

The above means that, for this instance, we can think of rhythm as the base of music, so as soon as a person appears the rhythm will start with a kick drum. As more people join the interaction, more instruments will start to play and the performance/interaction will vary. As we can see in the next video, the kick is triggered by the appearance of the first person, whilst the lead synth sequencer and the bass synth (sequencer) are triggered by a 2nd and a 3rd user. This is just a conceptual representation of the importance of collective engagement for this interaction.
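The layering rule described above can be sketched in a few lines of Python. This is only an illustration: the layer names come from the text, while the one-layer-per-person rule, the cap at three layers and the function name `active_layers` are assumptions made for the sketch.

```python
# Hypothetical layer order: the names come from the text; the
# one-layer-per-person rule is an assumption made for illustration.
LAYERS = ["kick drum", "lead synth sequencer", "bass synth sequencer"]


def active_layers(user_count):
    """Return the instrument layers that should play for a given head count."""
    if user_count <= 0:
        return []  # empty room: nothing plays
    # Each additional person unlocks the next layer, capped at the list length.
    return LAYERS[:min(user_count, len(LAYERS))]


for n in range(5):
    print(n, active_layers(n))
```

Keeping the layers in one ordered list makes the rhythm, melody, harmony hierarchy explicit and easy to reorder while play testing.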

As a long-term design goal, the idea is that every inter-actor can control certain parameters of their musical representation, such as timbre, melody sequence, EQ, etc. I would also like to allow one user –not sure if any user or just a specific one– to control the beat and time signature.

Aural Mood Board

As an aural mood board I have selected some projects/bands/songs/pieces that inspire me for the design of this project:

Week 3 - One person interaction

During this week's iteration I explored what a one-person interaction would be like, building on the idea that the first person can represent rhythm. The original goal was to allow an inter-actor to control play/pause of an Ableton Live arrangement by being or not being in a physical space, as well as to control tempo and time signature with his/her hands. For this purpose, the user's pose was to be detected using the PoseNet (ML) model running on the Runway app.

In the original scheme, after the pose was detected with PoseNet, that information had to be sent to Max/MSP via OSC, where it would be interpreted, converted to MIDI and sent to Ableton Live to control the aforementioned parameters. Due to some protocol issues I couldn't get Runway to work for pose detection, because the messages weren't received in Max (now I know that Max should send a start message to Runway's OSC server to begin communication, so I'll implement it in further iterations). So I decided to work with an ml5.js implementation of PoseNet, which sends the data via Socket/OSC to Max.
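To make the OSC step more concrete, here is a minimal sketch of how an OSC 1.0 message carrying two integers is laid out as bytes. It is written in Python for illustration and is not the code used in the project (which relied on ml5.js and Max); the address `/pose` and the argument order are hypothetical.

```python
import struct


def osc_string(text):
    """Encode an OSC string: ASCII bytes, null-terminated, padded to a multiple of 4."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address, *ints):
    """Build an OSC 1.0 message whose arguments are all 32-bit integers."""
    type_tags = "," + "i" * len(ints)  # e.g. ",ii" for two int32 arguments
    payload = b"".join(struct.pack(">i", value) for value in ints)  # big-endian
    return osc_string(address) + osc_string(type_tags) + payload


# "/pose" is a hypothetical address; the arguments stand for the left-hand
# value and the number of detected poses.
packet = osc_message("/pose", 64, 1)
print(len(packet), packet)
```

A packet like this could be sent over UDP with the standard `socket` module to whatever port the receiving patch listens on.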

In the above patch, udpreceive 7500 listens on port 7500 for OSC messages. The information sent from the JS sketch contains the vertical position of the left hand in a range of 0 - 127, and the number of poses detected. The first is passed to ctlout –which sends MIDI control messages– while the second is used as a decision variable to play or stop the track, controlled via MIDI by notes #27 and #28 –sent by noteout.
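The mapping logic can be sketched as follows. This is a Python illustration, not the actual Max patch or JS sketch: the frame height of 480 pixels, the inversion (a higher hand gives a larger value) and the choice to send a note only when presence changes are all assumptions. The note numbers follow the MIDI mapping used in Live (27 = D#0 = Stop, 28 = E0 = Start).

```python
STOP_NOTE, START_NOTE = 27, 28  # D#0 and E0 in Live's note naming


def hand_to_cc(y, frame_height=480):
    """Scale a vertical pixel position to a 0-127 MIDI controller value.

    frame_height and the inversion (higher hand = larger value) are assumptions.
    """
    y = min(max(y, 0), frame_height)  # clamp to the frame
    return round((1 - y / frame_height) * 127)


def transport_note(previous_poses, current_poses):
    """Return the note to send when presence changes, or None if it did not."""
    was_present = previous_poses > 0
    is_present = current_poses > 0
    if is_present and not was_present:
        return START_NOTE  # someone entered the space: start playback
    if was_present and not is_present:
        return STOP_NOTE   # the space is empty again: stop playback
    return None            # no change, so no note is sent


print(hand_to_cc(0), hand_to_cc(480))
print(transport_note(0, 1), transport_note(1, 0), transport_note(1, 2))
```

Sending the note only on a change avoids retriggering Start on every video frame in which a pose is detected.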

During the process I also realized that Ableton Live's time signature can't be modified remotely through a MIDI mapping –as I could do with tempo and play/stop– because there is no routing option for time signature values. So I had to rule out this interaction option. As can be seen when the MIDI Mapping menu is opened, CC 20 is mapped to Song Tempo, Note D#0 to Stop and Note E0 to Start. It can also be noticed that those controls are highlighted in blue, whilst the time signature has no color because it can't be mapped.

Finally, regarding the assignment of making music with chord progressions, I just explored some standard progressions but didn't create an actual composition:


  • I - IV - V
  • I - IIm - V
  • I - VIm - IIm - V
  • I - IIIm - VIm - IIm - V


  • Im - VIm - Vm
  • Im - IIdim - V (harmonic)
  • Im - VIdim - IIm - V (melodic)

For the interaction sample, however, I used a basic major chord progression played on piano, as can be seen below.

Week 5 (& week 4) - Reshaping concept

During the last two weeks I struggled with the idea of designing an interaction that doesn't really have a powerful meaning to me. In the previous iterations I tried to build an idea from the concepts of community and collaboration, and how they become meaningful when people gather in a space to create music. But although these concepts are still very powerful to me, the interactions I'm imagining from them in this creative process aren't as strong as I would like. In the class Recurring Concepts in Art we had Tony Martin as a guest last week, an interactive artist who 50 years ago created an exhibition called Game Room in which people walking in a room triggered sound and lights with their movements. Seeing his artwork made me think that I don't have a strong justification for using a technique that has been in use for almost 50 years, unless I have a strong concept that makes it new.

Because of this, I spent almost two weeks looking for a question from which I could build a design that made sense to work on. I remembered a conversation I had with Francisca Cabrera –a Chilean Special Educator– about being able to perceive music through non-traditional senses, i.e. without using hearing. That idea relates closely to the sensation of perceiving music with the body at a concert, where the sound waves make your body vibrate, or to an installation in which music can be heard by hugging a tree. So, the question I started with was: how can music be perceived without hearing it? I decided to add an accessibility component to that question, because most music interactions are designed for people who can hear, not for deaf people. An outstanding approach to this was made by Myles de Bastion, a deaf musician who makes and plays music using light as a complementary medium. So, why not design a framework that allows anyone to perceive music without needing to hear the sound?

To address the question, I would like to use the concept of synesthesia, a perceptual phenomenon in which stimulation of one sensory or cognitive pathway leads to automatic, involuntary experiences in a second sensory or cognitive pathway. This means that a person can have a perceptual experience in one sense because of an experience in another sense. Going further, it suggests that a sensory experience can be translated or interpreted from one sense to another, for example a musician who sees colors or patterns while listening to or creating music.

Building on the above, I will design an experience in which the sound information of music is translated into other perceptual elements, but which elements? To make the experience as affordable and accessible as possible, I will design an application that can run on any mobile device –although at an early stage it will be designed just for OSx. Given this constraint, the senses that a phone or a tablet can stimulate apart from hearing are vision and touch, so how can music information be transmitted through these senses?

A technique that I feel brings both senses together is AR. By processing music I can isolate its most important features and map them to a visualization displayed in space, complemented at the same time with vibrations. Sound visualization and spatialization is a well-explored field with plenty of material for inspiration, from spectrograms 1, 2, to abstract visualizations 1, 2, 3, and ML-based music visualization. In terms of AR sound visualization there are interesting references such as (of course) Zach Lieberman, AR Sound Vis + 3d Printing and Wearable AR Sound Visualization.

Verplank's IxD


  • How do I DO?
    • User experiences a physical/virtual musical space
  • How do I FEEL?
    • Vision to see and touch to perceive music information via both visualization and vibration
    • FEEL: Cold, because it invites the user to pick up the device and explore the music
  • How do I KNOW?
    • Turn App On -> Choose song/mic -> Choose visualization mode -> Perceive

Regarding the above, the main question is how to keep the user's attention.

Design framework

  • A non-traditional music perceiving tool
  • Synesthesia
  • A software-based system that extracts music information and transduces it into visual and mechanical stimuli
  • AR app for iPhone (prototype)
  • There are no displays to perceive music other than hearing, and deaf people are not included in design
  • A musician sees music in space when creating and/or hearing
  • Sound capture / FFT / Space measuring / Visual and mechanical stimuli generation
  • Display movement in space used to see sound in space / Output modes selection

7-axis dimension space diagram

Week 6 - Midterm

To explore ways of approaching the idea proposed during week 5, I decided to dive into 2D and 3D visuals generation. As a reference for sound visualization techniques I picked Seeing sound by Zimmerman, Mann, Kearney-Volpe, Pereira & Phillips, whilst for 3D AR sound visualization I picked Zach Lieberman's work (picture below) and HoloDecks by Lukasz Karluk. Additionally, I drew inspiration from Kazimir Malevich, a Suprematist artist whose work was based on the abstraction of elemental shapes and the use of mainly warm colors.

I split the work into 2D and 3D, in order to explore different visual aesthetics in a simpler context (2D) and to learn how to work with a more complex tool that allows me to display 3D graphics, such as Unity (3D). The workflow consisted of analyzing the music with a Fast Fourier Transform so I could track how different frequency bands evolve in time. I then connected the information from some of the frequency bands to elements in the space.

I learned how to build a 3D sound visualization in Unity, in which the sound was analyzed in 8 frequency bands. Through this I learned how to place a volume in space and how to animate it with sound data. I also learned basic C# coding skills.
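The analysis step (FFT, then grouping the spectrum into 8 bands) can be sketched in plain Python. This is an illustration under stated assumptions, not the Unity/C# code from the project: it uses a textbook recursive radix-2 FFT, a window of 512 samples, and equal-width bands, whereas a real visualizer may well group bins differently (for example with wider bands at higher frequencies).

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def band_levels(samples, bands=8):
    """Average spectrum magnitude in `bands` equal-width groups of bins."""
    half = len(samples) // 2  # bins above this mirror the lower half
    magnitudes = [abs(c) for c in fft(samples)[:half]]
    width = half // bands     # assumes half is divisible by bands
    return [sum(magnitudes[b * width:(b + 1) * width]) / width
            for b in range(bands)]


# Test signal: a sine that completes 40 cycles in 512 samples (FFT bin 40).
N = 512
signal = [math.sin(2 * math.pi * 40 * i / N) for i in range(N)]
levels = band_levels(signal)
print(levels.index(max(levels)))  # index of the loudest band
```

Each value in `levels` could then drive one visual parameter, such as the scale of one object per band.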

Finally I built a 2D animation of a Malevich painting, in order to explore how that kind of aesthetic can respond to music.

I got the following notes from this process:

  • FFT may not be the best tool to extract perceptual elements of music
    • Explore ML techniques
    • Other techniques
    • Beat detection
  • Learn more about perception and synesthesia
    • Other ways of perceiving music through shapes and color

Week 8 - Midterm Evaluation

This is the evaluation of Ian MacDougald's VR Composition System midterm presentation, following the rubric developed during week 7:

Does it work?

The prototype is currently working and was presented to the class through a video capture.


The Ants topic addresses whether the project sounds/feels like ants. I think this is achieved in terms of sound, because the instrument tends to sound like a bunch of ants talking to each other. From a visual standpoint, however, the project still looks like a bunch of purple balls, which hopefully will be improved.

Does it engage/inspire prolonged interest?

The project seems to catch the user's attention –at least it caught mine– but I think it's very important that its visual representation be improved in the future. Additionally, I'd suggest exploring other ways of interacting with the sound besides throwing the balls against the walls.

To what extent does an arbitrary user understand how their interactions affect the system?

Maybe for this point it would be useful to have a description or a set of instructions at the beginning of the experience. Then again, I think it is equally interesting just to expose a user to a 3D grid of balls in a VR space and let them explore how to interact. It depends on the gesture you want to induce in the user.

Does it make you feel a spatial experience?

Totally. I'd think a bit more about whether the walls have to be black or could have some texture. I'd like to see how an infinite space can deform the user's perception in your project.

Visual-side: is the "visual language" coherent, and does it accomplish the unique goals you want it to accomplish?

I think the choice of this visual language is arbitrary because the project is at a prototype stage, but I'd expect a visual language that challenges the user to explore it.

Audio-side: if the "audio-language" is as important as the visual language theoretically, in your realized system, are they truly equal?

I think in this case the audio side of the project is as important as the visual language, or even more important, because it is what currently sets the sensation of space.

Midterm-Final Plan

Week 9 - Basic sound analysis and mesh rendering in oF and AR

During the weeks after the midterm I started creating 3D textures in openFrameworks. For that I followed the chapter dedicated to meshes in the oF book, from which I built an animated 3D mesh.

Then, I used that code as a base to place a spectrograph in a 3D space.
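The idea of turning spectrum data into a 3D surface can be sketched independently of openFrameworks. The Python below is an illustration, not the project's oF code: it builds the vertex and triangle-index lists for a grid whose height follows the magnitude of each band over time. The axis layout and the `spacing` and `height_scale` parameters are assumptions; in oF the resulting lists would typically be fed to a mesh through calls such as `addVertex` and `addIndex`.

```python
def spectrogram_mesh(frames, spacing=1.0, height_scale=1.0):
    """Turn a list of spectrum frames into grid vertices and triangle indices.

    frames[t][b] is the magnitude of band b at time step t. x follows the
    band, z follows time, and y (height) follows the magnitude.
    """
    rows, cols = len(frames), len(frames[0])
    vertices = [
        (b * spacing, frames[t][b] * height_scale, t * spacing)
        for t in range(rows)
        for b in range(cols)
    ]
    indices = []
    for t in range(rows - 1):
        for b in range(cols - 1):
            i = t * cols + b  # first corner of this grid cell
            # Two triangles cover each cell of the grid.
            indices += [i, i + 1, i + cols]
            indices += [i + 1, i + cols + 1, i + cols]
    return vertices, indices


# Two made-up frames of three bands each, just to exercise the function.
frames = [[0.0, 0.5, 1.0], [0.2, 0.4, 0.1]]
vertices, indices = spectrogram_mesh(frames)
print(len(vertices), len(indices) // 3)  # vertex count, triangle count
```

Appending a new frame and dropping the oldest one on every analysis step would make the surface scroll through time.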

Finally I worked on porting the previous advances to AR. For this I decided to work with Android –basically because I don't need a paid license to develop prototypes. The framework consists of oF running on Android via the Android-oF library, while in parallel the addon ofxARCore communicates between the oF code and ARCore –the Android framework that analyses the spatial motion of the device. This process was a bit tricky because there are many layers of knowledge that I still don't understand, but I'm on it. In the end I could place the mesh in the space, and this is how it looks.
