# Data Visualization With Soundscapes
## Introduction 

## Data 
For this project I will be using the Theater History of Operations Reports (THOR) dataset of aerial bombing operations during World War I, World War II, the Korean War, and the Vietnam War undertaken by the US and other Allied nations. The dataset is available on data.world and can be accessed [here](https://data.world/datamil/world-war-ii-thor-data).

I will be using an abridged version of the dataset, which I received from [this tutorial](https://programminghistorian.org/en/lessons/visualizing-with-bokeh#the-wwii-thor-dataset) courtesy of The Programming Historian. Here is a link to the CSV file of the abridged dataset: [thor_wwii.csv](https://raw.githubusercontent.com/programminghistorian/ph-submissions/gh-pages/assets/visualizing-with-bokeh/thor_wwii.csv)

---------------------------------------------------------------------------------------------------------------------------

## Visualizing The Data

First I need to visualize the dataset so that I can get a better idea of what it looks like. (duh) 

I first visualized it as bombs dropped per day, since ideally I would like my soundscape to follow this scale, with each day of data represented by one second of sound.

That might make for a very long listening time, so I also visualized the data as weekly and biweekly to see how those distributions looked. 
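
To get a rough feel for that trade-off, the listening time at one second per period can be estimated by counting periods. This is only a sketch: `soundscape_seconds` is a hypothetical helper, and the one-year date span is made up for illustration rather than read from the THOR file.

```python
import pandas as pd

def soundscape_seconds(dates, freq, seconds_per_period=1.0):
    """Estimate soundscape length when each period gets a fixed slice of time."""
    # Resampling a dummy series yields one row per period at this frequency
    periods = pd.Series(1, index=pd.DatetimeIndex(dates)).resample(freq).sum()
    return len(periods) * seconds_per_period

# Made-up one-year span, for illustration only (not taken from the dataset)
example_dates = pd.date_range('1943-01-01', '1943-12-31', freq='D')

for label, freq in [('daily', 'D'), ('weekly', 'W'), ('bi-weekly', '2W')]:
    print(label, soundscape_seconds(example_dates, freq), 'seconds')
```

Swapping in the real `df['MSNDATE']` column for `example_dates` should give the estimate for the actual dataset.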

### Bombings Over Time - Days

In [None]:
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import Category10
output_notebook()

df = pd.read_csv('data/thor_wwii.csv') #Import the data from CSV into a dataframe

#make sure MSNDATE is a datetime format
df['MSNDATE'] = pd.to_datetime(df['MSNDATE'], format='%m/%d/%Y')

## Here we group the data into a more usable format by aggregating specified time periods together 
 # and specifying what columns we are interested in. The result is a smaller dataframe. 
 ##   
bombs=['TOTAL_TONS', 'TONS_IC', 'TONS_FRAG']
grouped = df.groupby('MSNDATE')[bombs].sum()
grouped = grouped/1000 #Convert to kilotons

source = ColumnDataSource(grouped) # set grouped as the bokeh data source 

p = figure(x_axis_type='datetime')

p.line(x='MSNDATE', y='TOTAL_TONS', line_width=2, source=source, color=Category10[3][0], legend_label='All Munitions')
p.line(x='MSNDATE', y='TONS_FRAG', line_width=2, source=source, color=Category10[3][2], legend_label='Fragmentation')
p.line(x='MSNDATE', y='TONS_IC', line_width=2, source=source, color=Category10[3][1], legend_label='Incendiary')

p.yaxis.axis_label = 'Kilotons of Munitions Dropped'

show(p)

### Bombings Over Time - Weeks

In [None]:
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import Category10
output_notebook()

df = pd.read_csv('data/thor_wwii.csv') #Import the data from CSV into a dataframe

#make sure MSNDATE is a datetime format
df['MSNDATE'] = pd.to_datetime(df['MSNDATE'], format='%m/%d/%Y')

## Here we group the data into a more usable format by aggregating specified time periods together 
 # and specifying what columns we are interested in. The result is a smaller dataframe. 
 ##   
bombs=['TOTAL_TONS', 'TONS_IC', 'TONS_FRAG']
grouped = df.groupby(pd.Grouper(key='MSNDATE', freq='W'))[bombs].sum()
grouped = grouped/1000 #Convert to kilotons

source = ColumnDataSource(grouped)# set grouped as the bokeh data source 

p = figure(x_axis_type='datetime')

p.line(x='MSNDATE', y='TOTAL_TONS', line_width=2, source=source, color=Category10[3][0], legend_label='All Munitions')
p.line(x='MSNDATE', y='TONS_FRAG', line_width=2, source=source, color=Category10[3][2], legend_label='Fragmentation')
p.line(x='MSNDATE', y='TONS_IC', line_width=2, source=source, color=Category10[3][1], legend_label='Incendiary')

p.yaxis.axis_label = 'Kilotons of Munitions Dropped'

show(p)

### Bombings Over Time - Bi-weekly

In [None]:
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import Category10
output_notebook()

df = pd.read_csv('data/thor_wwii.csv') #Import the data from CSV into a dataframe

#make sure MSNDATE is a datetime format
df['MSNDATE'] = pd.to_datetime(df['MSNDATE'], format='%m/%d/%Y')

## Here we group the data into a more usable format by aggregating specified time periods together 
 # and specifying what columns we are interested in. The result is a smaller dataframe. 
 ##   
bombs=['TOTAL_TONS', 'TONS_IC', 'TONS_FRAG']
grouped = df.groupby(pd.Grouper(key='MSNDATE', freq='2W'))[bombs].sum() #Grouping the data
grouped = grouped/1000 #Convert to kilotons

source = ColumnDataSource(grouped)# set grouped as the bokeh data source 

p = figure(x_axis_type='datetime')

p.line(x='MSNDATE', y='TOTAL_TONS', line_width=2, source=source, color=Category10[3][0], legend_label='All Munitions')
p.line(x='MSNDATE', y='TONS_FRAG', line_width=2, source=source, color=Category10[3][2], legend_label='Fragmentation')
p.line(x='MSNDATE', y='TONS_IC', line_width=2, source=source, color=Category10[3][1], legend_label='Incendiary')

p.yaxis.axis_label = 'Kilotons of Munitions Dropped'

show(p)
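
The three plotting cells differ only in the grouping frequency, so the shared grouping steps could be folded into one function. A sketch, assuming the same column names as the abridged CSV; `group_tonnage` and the toy rows are hypothetical and exist only so the example runs on its own.

```python
import pandas as pd

def group_tonnage(df, freq, columns=('TOTAL_TONS', 'TONS_IC', 'TONS_FRAG')):
    """Sum the tonnage columns per period and convert tons to kilotons."""
    grouped = df.groupby(pd.Grouper(key='MSNDATE', freq=freq))[list(columns)].sum()
    return grouped / 1000

# Invented rows standing in for the real CSV (two missions in one week, one in the next)
toy = pd.DataFrame({
    'MSNDATE': pd.to_datetime(['01/04/1943', '01/05/1943', '01/12/1943'], format='%m/%d/%Y'),
    'TOTAL_TONS': [1000.0, 500.0, 2000.0],
    'TONS_IC': [100.0, 0.0, 200.0],
    'TONS_FRAG': [0.0, 50.0, 0.0],
})

weekly = group_tonnage(toy, 'W')
print(weekly)
```

One difference to keep in mind: `freq='D'` through `pd.Grouper` fills days without missions with zeros, whereas grouping directly on the `MSNDATE` column only keeps days that appear in the data.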

---------------------
## Creating a Soundscape with Scaper

This visual representation of the data is nice and all, but it feels dry and emotionless. I want to try to make the data more engaging and accessible by transforming it into something you can hear. 

For this I'm going to employ an open-source Python package called [Scaper](https://scaper.readthedocs.io/en/latest/index.html). Scaper will allow me to use existing sounds as "source material" to generate soundscapes on the fly. I want to try to feed the data I've analyzed into Scaper to generate a soundscape representation of that data. 

First I want to install Scaper and its related dependencies. 

In [None]:
#!pip install scaper
#!pip install Sox
#!pip install FFmpeg

#import sys
#!{sys.executable} -m pip install FFmpeg
#!{sys.executable} -m pip install SoX
#!{sys.executable} -m pip install scaper

import sys
!conda install --yes --prefix {sys.prefix} FFmpeg
!{sys.executable} -m pip install scaper
!{sys.executable} -m pip install playsound
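
Before moving on, it can help to confirm that the pieces are actually visible from the notebook's environment. The check below is a sketch: `missing_dependencies` is a hypothetical helper, and the binary names `sox` and `ffmpeg` are assumptions about how the command-line tools are named on your system.

```python
import importlib.util
import shutil

def missing_dependencies(binaries=('sox', 'ffmpeg'), modules=('scaper', 'playsound')):
    """Return the names of command-line tools and Python modules that cannot be found."""
    # shutil.which returns None when an executable is not on the PATH
    missing = [name for name in binaries if shutil.which(name) is None]
    # find_spec returns None when a top-level module is not importable
    missing += [name for name in modules if importlib.util.find_spec(name) is None]
    return missing

print(missing_dependencies() or 'All dependencies found')
```

An empty result does not guarantee that Scaper will run, but a non-empty one points at what still needs installing.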



Now that Scaper is installed I need to write a function to create my Scaper object.


In [None]:
import scaper

def createScaper():
    path_to_audio = 'audio'
    soundscape_duration = 10.0
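    seed = 123 # static seed so the random draws are reproducible
    foreground_folder = 'audio/foreground'
    background_folder = 'audio/background'
    # Pass the seed along; this assumes the installed Scaper version accepts random_state
    sc = scaper.Scaper(soundscape_duration, foreground_folder, background_folder, random_state=seed)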
    sc.ref_db = -20
    if sc: 
        print("Scaper created successfully")
        return sc
    else:
        print("Error initializing Scaper object")
    
sc = createScaper()

Next I need to be able to add background and foreground events to my soundscape. 

I'll start with the background since in this basic version of the soundscape it won't be dynamic. I've chosen an air raid siren that I can have repeat in the background. 

In [None]:
sc.add_background(label=('const', 'siren'),
                  source_file=('const', 'audio/background/siren/warningSiren01.wav'),
                  source_time=('const', 0))

Now to add a foreground event. I'm going to give this event a variable volume so that when Scaper generates the soundscape it contains explosions at randomly varying volumes.

In [None]:
sc.add_event(label=('const', 'bomb'),
             source_file=('const', 'audio/foreground/bomb/blast01.wav'),
             source_time=('const', 0),
             event_time=('uniform', 0, 9),
             event_duration=('const', 2.2),
             snr=('normal', 10, 3),
             pitch_shift=(None),
             time_stretch=(None))

Running the above code multiple times will add a new bomb event each time. I recommend running it 3-5 times.
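
The tuples passed to `add_event` describe distributions rather than fixed values. The sketch below imitates how such a tuple could be turned into a concrete number, using only the standard library. It is an illustration of the idea and not Scaper's own implementation; `sample_spec` is a hypothetical helper.

```python
import random

def sample_spec(spec, rng):
    """Draw one value from a ('const' | 'uniform' | 'normal', ...) tuple."""
    kind, *params = spec
    if kind == 'const':
        return params[0]
    if kind == 'uniform':
        low, high = params
        return rng.uniform(low, high)
    if kind == 'normal':
        mean, std = params
        return rng.gauss(mean, std)
    raise ValueError(f'unsupported distribution: {kind}')

rng = random.Random(123)  # fixed seed so the draws repeat between runs
for _ in range(3):
    onset = sample_spec(('uniform', 0, 9), rng)  # like event_time above
    snr = sample_spec(('normal', 10, 3), rng)    # like snr above
    print(f'bomb at {onset:.2f}s, snr {snr:.1f} dB')
```

Each added event gets its own draw, which is why adding the same event specification several times yields explosions at different times and volumes.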

Now all that I need to do is generate my soundscape with the Scaper.generate() function. This will create my soundscape using the random elements I specified. It will also generate a JAMS (JSON Annotated Music Specification) and text file of the soundscape.  

(Note: I used a static seed here, so each call to generate() will result in the same soundscape. Using a random seed instead would create a unique soundscape each time generate() is called.) 

In [None]:
audiofile = 'soundscape.wav'
jamsfile = 'soundscape.jams'
txtfile = 'soundscape.txt'
sc.generate(audiofile, jamsfile,
            allow_repeated_label=True,
            allow_repeated_source=True,
            reverb=0.1,
            disable_sox_warnings=True,
            no_audio=False,
            txt_path=txtfile,
            disable_instantiation_warnings=True)

If everything worked as it should, a text file containing a textual representation of the newly generated soundscape should have been written out. 
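
That text file can be read back to see where each event landed. The sketch below assumes the file uses tab-separated onset, offset, and label columns, which is my understanding of Scaper's default; `read_events` is a hypothetical helper and the sample rows are invented.

```python
import csv
import io

def read_events(handle):
    """Parse rows of onset, offset, label into (float, float, str) tuples."""
    events = []
    for row in csv.reader(handle, delimiter='\t'):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        onset, offset, label = row[:3]
        events.append((float(onset), float(offset), label))
    return events

# Invented sample standing in for soundscape.txt
sample = io.StringIO('1.5\t3.7\tbomb\n6.0\t8.2\tbomb\n')
for onset, offset, label in read_events(sample):
    print(f'{label}: {onset}s to {offset}s')
```

To read the real file, open it and pass the handle: `with open('soundscape.txt') as fh: events = read_events(fh)`.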

YAY! The first soundscape is complete! Find it in the project folder and give it a listen, or run the code snippet below to listen to it.

## WARNING: VERY LOUD, PLAY THE RECORDING WITH CAUTION

In [None]:
from playsound import playsound # Import the package needed to play sounds from python

# This while loop verifies that the user wants to play the sound. No one likes autoplay.
while True:
    userInput = input("Do you want to listen to the soundscape? WARNING: it can be loud. Enter 'y' for yes or 'n' for no: ")
    if userInput == 'y':
        playsound(audiofile)
        break
    elif userInput == 'n':
        break
    else:
        continue

---------------------------------
## Turning Data Into Sound

Now that we've set up both our data modeling using pandas and bokeh, and Scaper to manage our soundscape, all that is left to do is to put them together. 

For this we are going to iterate through the `grouped` data frame we created. In this example we will be using the code that grouped the data in two-week increments so that our generated soundscape isn't too long. 

In [None]:
# initialize (so this code can be run separately from everything else)
import pandas as pd
df = pd.read_csv('data/thor_wwii.csv')

#make sure MSNDATE is a datetime format
df['MSNDATE'] = pd.to_datetime(df['MSNDATE'], format='%m/%d/%Y')

# group the data
bombs=['TOTAL_TONS', 'TONS_IC', 'TONS_FRAG']
grouped = df.groupby(pd.Grouper(key='MSNDATE', freq='2W'))[bombs].sum()
grouped = grouped/1000 #Convert to kilotons

## loop that iterates each row in grouped. 
 # For now it just outputs the contents of the tuple, followed by a new line with the time and total_tons
i=0
for row in grouped.itertuples():
    print(row)
    print(row[0], " - ", row[1]) #prints the key (timestamp) and the first object in the tuple (total_tons)
    i += 1
    print(i)
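
The positional lookups `row[0]` and `row[1]` work, but `itertuples()` also exposes the index and columns by name, which reads more clearly. A sketch on an invented two-row frame shaped like `grouped`; `describe_rows` is a hypothetical helper.

```python
import pandas as pd

# Invented rows with the same layout as the grouped frame
toy = pd.DataFrame(
    {'TOTAL_TONS': [1.5, 2.0], 'TONS_IC': [0.1, 0.2], 'TONS_FRAG': [0.05, 0.0]},
    index=pd.DatetimeIndex(['1943-01-10', '1943-01-24'], name='MSNDATE'),
)

def describe_rows(grouped):
    """Pair each period with the second at which it would play."""
    lines = []
    # enumerate replaces the manual counter; fields are read by name
    for second, row in enumerate(grouped.itertuples()):
        lines.append(f'{second}s: {row.Index.date()} - {row.TOTAL_TONS} kilotons')
    return lines

print('\n'.join(describe_rows(toy)))
```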

Now that we can extract the data we need, we need to use it to create our foreground sound events. 

In [None]:
# First we need to start scaper
import scaper
def createScaper():
    path_to_audio = 'audio'
    soundscape_duration = 170 # We need to change the length to accommodate the new data. 167 items + 3 buffer seconds
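    seed = 123 # static seed so the random pitch shifts are reproducible
    foreground_folder = 'audio/foreground'
    background_folder = 'audio/background'
    # Pass the seed along; this assumes the installed Scaper version accepts random_state
    sc = scaper.Scaper(soundscape_duration, foreground_folder, background_folder, random_state=seed)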
    sc.ref_db = -70
    if sc: 
        print("Scaper created successfully")
        return sc
    else:
        print("Error initializing Scaper object") 
sc = createScaper()

x = 0 #create a variable to keep track of the number of iterations

for row in grouped.itertuples(): #begin our loop
    sc.add_event(label=('const', 'bomb'),
             source_file=('const', 'audio/foreground/bomb/blast01.wav'),
             source_time=('const', 0),
             event_time=('const', x), # This ensures we have one item playing every second (2 weeks war == 1 second soundscape)
             event_duration=('const', 2), # Sounds will overlap a bit
             snr=('const', (row[1]/5)), # Changes the volume of the sound based on the tons of bombs
             pitch_shift=('normal', 2, 1), # pitch shift slightly for some variety 
             time_stretch=(None))
    x += 1 #increment our counter before looping again
print("Added ",x," bomb explosion sound events")
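
The mapping from data to sound in that loop can be checked without touching any audio. The sketch below mirrors the loop's arithmetic (one event per second, SNR equal to kilotons divided by 5) and computes how long the soundscape has to be for the last event to fit. Both helpers are hypothetical and the tonnage values are invented.

```python
def event_parameters(tonnages, seconds_per_period=1, snr_divisor=5):
    """Return (event_time, snr) pairs, one per period, mirroring the loop above."""
    return [(i * seconds_per_period, tons / snr_divisor)
            for i, tons in enumerate(tonnages)]

def required_duration(n_events, event_duration=2, seconds_per_period=1):
    """Seconds needed so the final event can finish playing."""
    # The last event starts at (n_events - 1) periods and lasts event_duration
    return (n_events - 1) * seconds_per_period + event_duration

print(event_parameters([5.0, 10.0, 40.0]))  # invented kilotons per period
print(required_duration(167))               # 167 periods, as in the comment above
```

With 167 two-second events the last one ends at 168 seconds, so a 170 second soundscape leaves a small margin.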

Now we just need to add our background sound and generate the soundscape.

NOTE: It can take a couple of minutes to generate, so be patient. It also uses a lot of memory, especially if run over and over. It might be best to restart the kernel, then run this section (Turning Data Into Sound) on its own. Hopefully it doesn't crash...  : )

In [None]:
sc.add_background(label=('const', 'siren'),
                  source_file=('const', 'audio/background/siren/warningSiren01.wav'),
                  source_time=('const', 0))

audiofile = 'bombsOfWW2.wav'
jamsfile = 'bombsOfWW2.jams'
txtfile = 'bombsOfWW2.txt'
sc.generate(audiofile, jamsfile,
            allow_repeated_label=True,
            allow_repeated_source=True,
            reverb=0.1,
            disable_sox_warnings=True,
            no_audio=False,
            txt_path=txtfile,
            disable_instantiation_warnings=True)

That is it! 

If the generate method executed correctly, there should now be a WAV file 2 min 50 sec long titled bombsOfWW2 in the project folder. You can listen to it there, or once again use the code below to give it a listen. I recommend playing it from your project folder: you can control the sound more easily, the progress bar helps visualize where you are on the timeline, and there is no way to stop or pause playback here. 
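
One quick way to confirm the length is to read the WAV header with the standard library. This is a sketch that assumes the output is a plain PCM WAV file; `wav_duration_seconds` is a hypothetical helper, and the demo writes one second of silence to a temporary file so it can run without the generated audio.

```python
import os
import tempfile
import wave

def wav_duration_seconds(path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, 'rb') as wav:
        return wav.getnframes() / wav.getframerate()

# Demo file: one second of 16-bit mono silence at 8 kHz
demo_path = os.path.join(tempfile.mkdtemp(), 'silence.wav')
with wave.open(demo_path, 'wb') as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(8000)
    wav.writeframes(b'\x00\x00' * 8000)

print(wav_duration_seconds(demo_path))
```

Calling `wav_duration_seconds('bombsOfWW2.wav')` on the generated file should report roughly 170 seconds.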

Try looking at the bi-weekly graph we made with Bokeh at the beginning and see if you can match up the explosions to the data they represent.

In [None]:
from playsound import playsound # Import the package needed to play sounds from python

# This while loop verifies that the user wants to play the sound. No one likes autoplay.
while True:
    userInput = input("Do you want to listen to the soundscape? WARNING: it can be loud. Enter 'y' for yes or 'n' for no: ")
    if userInput == 'y':
        playsound(audiofile)
        break
    elif userInput == 'n':
        break
    else:
        continue

---------