<a href="https://colab.research.google.com/github/olaviinha/SloppyNoto/blob/master/sloppyNoto.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<font face="Trebuchet MS" size="6">Sloppy Noto <font color="#999" size="3">v0.0.2</font><font color="#999" size="4">&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;</font><a href="https://github.com/olaviinha/SloppyNoto" target="_blank"><font color="#999" size="4">Github</font></a><font color="#999" size="4">&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;</font><font size="3" color="#999"><a href="https://inha.se" target="_blank"><font color="#999">O. Inha</font></a></font></font>

Sloppy Noto converts raw data from space probes into audio.

- **Noto does not** oscillate or generate sound waves by other means _based_ on given data, modify any such waves, or take artistic liberties.

- **Noto does** interpret given series of numbers directly as digital audio signal sample magnitudes. _What you see is what you hear._
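
As a rough illustration of the principle above, the sketch below rescales a series of numbers into a fixed amplitude range and treats each value as one audio sample. The function name `series_to_samples` and the example values are hypothetical. The peak of 0.45 and the 44100 Hz sample rate mirror the defaults used in the cells of this notebook.

```python
import numpy as np

SAMPLE_RATE = 44100  # samples per second, same default as the notebook

def series_to_samples(values, peak=0.45):
    """Linearly rescale a numeric series into [-peak, peak] (hypothetical helper)."""
    x = np.asarray(values, dtype=np.float64)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # A constant series carries no waveform; return silence.
        return np.zeros_like(x)
    return (x - lo) / (hi - lo) * 2 * peak - peak

samples = series_to_samples([3.0, 7.0, 5.0, 11.0])
duration_secs = len(samples) / SAMPLE_RATE  # one data row becomes one sample
print(samples, duration_secs)
```

Since every row becomes exactly one sample, a file needs 44100 numeric rows to yield one second of audio at the default sample rate.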

While Sloppy Noto can interpret any csv-like data files containing large quantities of numeric data, it was created primarily to produce audio files out of the raw datasets of various space missions by [The European Space Agency](https://www.esa.int/), [The National Aeronautics and Space Administration](https://nasa.gov), etc.
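
To give an idea of what "csv-like" means here, the following sketch reads a small whitespace-delimited sample with pandas and replaces every non-numeric value with 0. The sample text is made up, and skipping header lines with `comment='#'` is a simplification assumed for this example; the notebook itself counts leading `#` lines and skips them.

```python
import io
import pandas as pd

# Made-up sample: one comment line, then rows mixing dates, numbers and text.
raw = "# header comment\n2020-01-01 1.5 x\n2020-01-02 2.5 4\n"

df = pd.read_csv(io.StringIO(raw), sep=r'\s+', header=None, comment='#')

# Anything that cannot be parsed as a number becomes NaN, then 0.
numeric = df.apply(pd.to_numeric, errors='coerce').fillna(0)
print(numeric)
```

Columns that hold timestamps or labels therefore end up as flat lines of zeros, which is why the column selection step matters.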

While Sloppy Noto can download, unzip and use parts of datasets from any suitable file URL, these steps can still fail. If they do, download the file to your local computer, extract it to your Google Drive, and enter the extracted file's path in the `data_file` field.
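
If automatic extraction fails for a gzip-compressed file, one way to extract it yourself is Python's standard `gzip` module. This is a minimal sketch, not the notebook's own download routine; `extract_gz` and the file names are hypothetical, and the demo works on a throwaway file in a temporary directory.

```python
import gzip
import shutil
import tempfile
from pathlib import Path

def extract_gz(src, dst):
    """Decompress the .gz file at src into dst and return dst as a Path."""
    with gzip.open(src, 'rb') as f_in, open(dst, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
    return Path(dst)

# Demo: create a small gzip file, then extract it again.
tmp = Path(tempfile.mkdtemp())
with gzip.open(tmp / 'sample.csv.gz', 'wb') as f:
    f.write(b'1,2,3\n4,5,6\n')

out = extract_gz(tmp / 'sample.csv.gz', tmp / 'sample.csv')
print(out.read_bytes())
```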

For data exploration, you may also be interested in the accompanying [FTP crawler util](https://colab.research.google.com/github/olaviinha/SloppyNoto/blob/master/ftp_crawler.ipynb).

<font size="5">Howto</font>
- Run one cell at a time and follow instructions per cell.

---

In [None]:
#@title #Setup
#@markdown This cell needs to be run only once. It will:
#@markdown 1. Connect your Google Drive.<br>
#@markdown 2. Import [inhagcutils](https://inha.asia/c/inhagcutils).

import os
from google.colab import output
force_setup = False

pip_packages = 'pysoundfile'

# inhagcutils
# Install when the utilities are missing, or whenever force_setup is enabled.
if force_setup or not os.path.isfile('/content/inhagcutils.ipynb'):
  %cd /content/
  !pip -q install import-ipynb {pip_packages}
  !curl -s -O https://raw.githubusercontent.com/olaviinha/inhagcutils/master/inhagcutils.ipynb
import import_ipynb
from inhagcutils import *

# Mount Drive
if force_setup or not os.path.isdir('/content/drive'):
  from google.colab import drive
  drive.mount('/content/drive')

# Drive symlink
if not os.path.isdir('/content/mydrive'):
  os.symlink('/content/drive/My Drive', '/content/mydrive')
  drive_root_set = True
drive_root = '/content/mydrive/'

dir_tmp = '/content/tmp/'
create_dirs([dir_tmp])
last_data_file = ''

output.clear()
op(c.ok, 'Setup finished.')

In [None]:
#@title #1. Select Data
#@markdown <small>`data_file` may be a URL (HTTP or FTP) pointing to a file, or the path of a file located in your Google Drive. The file can have any extension, but its content should be in a CSV-like format. _zip_ and _gz_ files are automatically extracted before processing. Whenever you change `data_file`, remember to reset the other settings in this cell as well if you're unsure about them.</small><br>
#@markdown <small>You may run this cell after filling in `data_file`. It will print out a preview of your data to determine the delimiter.</small>
data_file = "" #@param {type:"string"}
#@markdown <small>You may again run this cell after selecting `delimiter`. It will print out a preview of your data to determine which columns to use.</small>
delimiter = "None" #@param ["None", "whitespace", "tab", "semicolon", "comma", "pipe", "double_pipe"]
#use = "columns" #@param ["columns", "rows"]
#@markdown <small>Enter the columns you are interested in. List column numbers separated by commas (e.g. _1, 3, 5_). You may also include ranges (e.g. _1, 4-8, 12, 13-16_) or select all by just typing in _all_. These columns will be the candidates for sound file creation. You will make more specific selections later.</small>
preview_columns = "" #@param {type:"string"}
#@markdown <small>Run this cell one more time after typing in `preview_columns`.</small>

columns = preview_columns
data = ''
prev_rows = 10
separator = ''
global_sr = 44100
secs_warn_limit = 0.8


columnlist = []
if "," in columns:
  columns = columns.split(',')
  for col in columns:
    if "-" in col:
      cr = [int(i) for i in col.split('-')]
      columnlist.extend(list(range(cr[0], cr[1]+1)))
    else:
      columnlist.append(int(col))
  columns = columnlist
elif "-" in columns:
  cr = [int(i) for i in columns.split('-')]
  columnlist.extend(list(range(cr[0], cr[1]+1)))
  columns = columnlist
elif columns == '':
  columns = ''
elif columns.lower() == 'all':
  columns = 'all' #list(range(0, 999))
else:
  columns = [int(columns)]

if data_file != last_data_file:
  input_type = check_input_type(data_file)
  source_id = rnd_str(6)  
  last_data_file = data_file
  if input_type == 'link':
    op(c.title, 'Downloading...')
    if is_zip(data_file):
      zip_ext = path_ext(data_file, True)
      !wget "{data_file}" -O {dir_tmp}{source_id}.{zip_ext}
      op(c.title, 'Extracting...')
      !gunzip {dir_tmp}{source_id}.{zip_ext}
      !mv {dir_tmp}{source_id} {dir_tmp}{source_id}.csv
    else:
      !wget "{data_file}" -O {dir_tmp}{source_id}.csv
    use_file = dir_tmp+source_id+'.csv'
    input_type = check_input_type(use_file)
  elif input_type == 'unknown':
    use_file = drive_root+data_file
    input_type = check_input_type(use_file)
    if input_type == 'file':
      source_id = slug(basename(use_file))
  else:
    use_file = data_file

if input_type != 'file':
  op(c.fail, 'FILE NOT FOUND:', use_file)
else:

  skip_rows = 1
  with open(use_file, 'r') as f:
    for line in f:
      if line.startswith('#'):
        skip_rows += 1
      else:
        break

  if delimiter == 'None':
    output.clear()
    op(c.ok, 'Input file:', use_file)
    
    print('Previewing rows', skip_rows, 'to', skip_rows+prev_rows-1, '(0-indexed):\n')
    with open(use_file) as f:
      #data_head = [next(f) for x in range(skip_rows, skip_rows+prev_rows)]
      import itertools
      data_head = itertools.islice(f, skip_rows, skip_rows+prev_rows)
      for line in data_head:
        print(line.replace('\n', ''))

    op(c.warn, '\nPlease select delimiter and run this cell again before proceeding.')
  elif delimiter == 'whitespace':
    separator = r'\s+'
  elif delimiter == 'tab':
    separator = '\t'
  elif delimiter == 'semicolon':
    separator = ';'
  elif delimiter == 'comma':
    separator = ','  
  elif delimiter == 'pipe':
    separator = '|'
  elif delimiter == 'double_pipe':
    # Multi-character separators are treated as regular expressions, so the pipes must be escaped.
    separator = r'\|\|'
  if separator != '':
    
    import pandas as pd
    pd.set_option('display.max_columns', None)
    colselect_warn = False
    if columns == '' or columns == 'all':
      data = pd.read_csv(use_file, sep=separator, on_bad_lines='skip', skiprows=skip_rows, skipfooter=1, header=None, index_col=False, skipinitialspace=True, skip_blank_lines=True, engine='python')
      if columns == '':
        colselect_warn = True
      if columns == 'all':
        columns = list(range(len(data.columns)))
    else:
      data = pd.read_csv(use_file, sep=separator, on_bad_lines='skip', skiprows=skip_rows, skipfooter=1, header=None, index_col=False, skipinitialspace=True, skip_blank_lines=True, engine='python', usecols=columns)
    
    #data = data[data.iloc[:,0].str.startswith('#').ne(True)]
    data = data.apply(pd.to_numeric, errors='coerce')
    data = data.fillna(0)
    #data = data.dropna()
    #data = data.reset_index(drop=True)
    
    output.clear()
    op(c.title, 'Data preview\n')
    print('Note that the columns in this preview may be split across multiple rows.\n')
    print(data.head(skip_rows+prev_rows))

    print('\nSeparator:', separator, '('+delimiter+')')

    precise_secs = data.shape[0]/global_sr
    if colselect_warn == True:
      op(c.warn, '\nPlease select which columns to use by typing their numbers or \'all\' into preview_columns field and run this cell again before proceeding.')
    else:
      if precise_secs > secs_warn_limit:
        print('Estimated audio duration:', str(precise_secs), 'seconds.')
      op(c.ok, '\nAll set. You may proceed to next step.')
    
    if precise_secs < secs_warn_limit:
      op(c.fail, '\nWARN:', 'Your data will produce only '+str(precise_secs)+' seconds of audio. Try with a file with more numeric rows if you want more.')
      
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

info_file_created = False
source_run = 0


In [None]:
#@title #2. Visualize columns
#@markdown - This cell creates visual waveform previews of all selected columns.<br> 
#@markdown - DC offsets won't be as wonky in the final sound files as they likely appear in these previews.<br>
#@markdown - Column numbers located above each waveform will be used in the next step to determine which columns will be used as left and right channels to create a stereo audio file.<br>

%matplotlib inline
import math
import datetime
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import scipy.signal
plt.rcParams['figure.figsize'] = [25, 6]
plt.rcParams.update({"axes.facecolor": "black"})
spheight = 6 * data.shape[1]
#data.plot(figsize=(25,spheight), subplots=True)

precise_secs = data.shape[0]/global_sr
if precise_secs < secs_warn_limit:
  secs = precise_secs
else:
  secs = math.floor(precise_secs)

op(c.title, 'Generating visual previews...\n')
for col in columns:
  op(c.warn, 'Column number: '+str(col))
  data[col].plot(color=np.random.rand(1,3))
  plt.show()

def pretty(secs):
  return str(datetime.timedelta(seconds=secs))

op(c.title, 'Estimated audio duration\n')
if precise_secs < secs_warn_limit:
  print(secs, 'seconds')
else:
  print(pretty(secs), '(h:mm:ss)')

print('\n')
# op(c.title, '\nExamples of time-stretched durations:')
# print('  2 x:', pretty(secs*2))
# print('  5 x:', pretty(secs*5))
# print(' 10 x:', pretty(secs*10))
# print(' 25 x:', pretty(secs*25))
# print(' 50 x:', pretty(secs*50))
# print(' 75 x:', pretty(secs*75))
# print('100 x:', pretty(secs*100))


In [None]:
#@title #3. Output
#@markdown - You may run this cell multiple times with different selections and settings. No need to run previous cells.
#@markdown - Audio creation is limited to a maximum duration of 3 minutes by default to prevent time-stretching accidents and Colab runtime crashes.

#@markdown <br>

#@markdown ##Audio settings
#@markdown Select waveforms from above by typing in their column numbers. They will be the left and right channel of the created soundfile. Leave `right_channel` blank if you want the same waveform on both channels (mono sound).<br>
left_channel = "" #@param {type:"string"}
right_channel = "" #@param {type:"string"}
stereo_width = 8 #@param {type:"slider", min:0, max:10, step:1}
time_stretch = 1 #@param {type:"slider", min:1, max:100, step:1}
#@markdown <small>Default sample rate is 44100 Hz. The `stretch_type: sample_rate` option stretches the sound by reducing the sample rate (fast but lo-fi). With this option, the maximum `time_stretch` factor is 14. `stretch_type: linear_fill` retains the sample rate of 44100 Hz and fills in the gaps between samples by linear interpolation (hi-fi but slow).<br>**I.e. the `stretch_type: linear_fill` option produces higher-quality audio.**</small>
stretch_type = "linear_fill" #@param ["sample_rate", "linear_fill"]
#@markdown <small>Use this to increase the sample rate if the sound feels too "slow". This setting is ignored when `time_stretch` is higher than 1, since there is no sense in using both.</small>
speed_up = 1 #@param {type:"slider", min:1, max:4, step:1}

#@markdown <br>

#@markdown ##Save files
save_to_drive = False #@param {type:"boolean"}
timestamp_output_files = False #@param {type:"boolean"}
#@markdown <small>Enter a directory path pointing somewhere in your Google Drive. All sound files will be saved in this directory as WAV.</small>
output_dir = "" #@param {type:"string"}
#@markdown <small>Save accompanying .txt file containing information about your settings. May come in handy one day in the distant future.</small>
save_info_txt = False #@param {type:"boolean"}
#@markdown <small>Optional note to be included in the information txt file. Sloppy Noto has no idea what this data is.</small>
free_note = "" #@param {type:"string"}
#@markdown <small>Audio files are automatically clipped to a maximum duration of 3 minutes. This is to 1) prevent accidental creation of excessively long files, such as 45 seconds of audio time-stretched by a factor of 100, resulting in a sound file 1 hour and 15 minutes long, and 2) prevent the Colab runtime from crashing. The High-RAM runtime (Colab Pro only) can handle more data, but the free Standard-RAM runtime will crash after computing around 20 minutes of audio. Check `allow_long` if you want to save audio files longer than 3 minutes. Use this setting with caution, even with the High-RAM runtime.<br>**I.e. checking `allow_long` is not recommended.**</small>
allow_long = False #@param {type:"boolean"}

op(c.title, 'Processing...')

output_dir = fix_path(drive_root+output_dir)

stereo_sep = stereo_width
maxv = 0.45
max_dur_min = 3
detail_view = False
run_id = rnd_str(6)

plt.rcParams.update({"axes.facecolor": "black"})

def appendTxt(file, content):
  # Append to the file passed in as an argument rather than the global txt_file.
  with open(file, 'a+') as txt:
    txt.write(content+'\n')
  
def swf(sig1, sig2=None, sr=global_sr):
  #yellowgreen, salmon
  # One time stamp per sample; np.arange with a float step can come out one element off.
  time = np.arange(len(sig1))/sr
  plt.rcParams.update({"axes.facecolor": "black"})
  plt.ylim(-1, 1)
  plt.plot(time, sig1, color=np.random.rand(3), linewidth=1, alpha=1)
  # Compare against None: != on a NumPy array is elementwise and has no single truth value.
  if sig2 is not None:
    plt.plot(time, sig2, color=np.random.rand(3), linewidth=1, alpha=0.55)
  plt.show()

def query_yes_no(question, default="yes"):
  valid = {"yes": True, "y": True, "ye": True,
            "no": False, "n": False}
  if default is None:
    prompt = " [y/n] "
  elif default == "yes":
    prompt = " [Y/n] "
  elif default == "no":
    prompt = " [y/N] "
  else:
    raise ValueError("invalid default answer: '%s'" % default)

  while True:
    # input() prints the prompt itself, so the sys module is not needed here.
    choice = input(question + prompt).lower()
    if default is not None and choice == '':
      return valid[default]
    elif choice in valid:
      return valid[choice]
    else:
      print("Please respond with 'yes' or 'no' (or 'y' or 'n').")
            
channels = []
ready = True
cmp_time_stretch = time_stretch
max_duration = max_dur_min*60*global_sr

if secs*cmp_time_stretch > 1200 and allow_long == True:
  op(c.fail, '\n\nWARN:', 'You are about to time-stretch the soundfile to over 20 minutes (to '+pretty(secs*time_stretch)+' to be exact).\nIf you are rocking the free standard RAM version of Colab, there is a good chance Colab will run out of RAM and crash.\n')
  ready = query_yes_no('Want to take your chances anyway?')
  error = 'timestretch_length'
if ready == False:
  if error == 'timestretch_length':
    op(c.fail, 'Try reducing time-stretch and run cell again.')
else:

  left = int(left_channel)
  if right_channel == '':
    right = int(left_channel)
    stereo_width = 0
  else:
    right = int(right_channel)

  channels.append(data[left])
  channels.append(data[right])

  if time_stretch > 1:
    speed_up = 1
  for i, chan in enumerate(channels):
    if time_stretch > 1 and stretch_type == "linear_fill":
      real_duration = len(chan)
      max_duration = math.floor((max_dur_min*60*global_sr)/cmp_time_stretch)
      # Clip before stretching unless long files were explicitly allowed.
      if real_duration > max_duration and allow_long == False:
        print('Channel', i, ':', real_duration, '->', max_duration)
        chan = chan[:max_duration]
      # Fill each gap between consecutive samples with time_stretch evenly spaced
      # points. endpoint=False keeps the sample shared by two neighbouring
      # segments from being emitted twice.
      stretcher = []
      prv = None
      for cur in chan:
        if prv is not None:
          stretcher.extend(np.linspace(prv, cur, time_stretch, endpoint=False))
        prv = cur
      chan = np.array(stretcher).astype(np.float64)
    print('Channel', i, ':', chan.min(), chan.max(), '->', np.negative(maxv), maxv)
    chan = np.interp(chan, (chan.min(), chan.max()), (np.negative(maxv), maxv))
    sos = scipy.signal.butter(10, 15, 'hp', fs=global_sr, output='sos')
    chan = scipy.signal.sosfilt(sos, chan)
    channels[i] = chan

  xsr = global_sr
  if time_stretch > 1 and stretch_type == "sample_rate":
    xsr = math.floor(global_sr/time_stretch)
    print('New sample rate:', xsr)
  
  if stereo_sep == 0:
    fin_left = channels[0]+channels[1]
    fin_right = fin_left
  elif stereo_sep == 10:
    fin_left = channels[0]
    fin_right = channels[1]
  else:
    fin_left = channels[0]+channels[1]/stereo_sep
    fin_right = channels[0]/stereo_sep+channels[1]

  if detail_view == True:
    dvend = math.floor(xsr/2)
    swf(fin_left[0:dvend])
    swf(fin_right[0:dvend])

  if fin_left[0] < -0.4 or fin_left[0] > 0.4 or fin_right[0] < -0.4 or fin_right[0] > 0.4:
    clip_point = math.floor(xsr/2)
    fin_left = fin_left[clip_point:]
    fin_right = fin_right[clip_point:]

  if detail_view == True:
    dvend = math.floor(xsr/2)
    swf(fin_left[0:dvend])
    swf(fin_right[0:dvend])
    swf(fin_left)
    swf(fin_right)

  output.clear()

  if time_stretch > 14 and stretch_type == 'sample_rate':
    op(c.fail, 'ERROR:', 'You may not set time_stretch higher than 14 if you use \'sample_rate\' as stretch_type. Reduce time_stretch or change stretch_type to \'linear_fill\'')
  else:

    if speed_up > 1:
      xsr = xsr * speed_up

    max_duration = max_dur_min*60*xsr
    real_duration = len(fin_left)
    max_secs = max_dur_min*60
    real_secs = real_duration/xsr
    # Clip to max_dur_min minutes unless allow_long is checked.
    clipped = real_duration > max_duration and allow_long == False

    if clipped:
      fin_left = fin_left[:max_duration]
      fin_right = fin_right[:max_duration]
      
    swf(fin_left, fin_right, xsr)
    if clipped:
      print('Audio duration:', pretty(max_secs), '(clipped from', pretty(real_secs)+')\n')
    else:
      print('Audio duration:', pretty(real_secs), '\n')
    audio = np.array([fin_left, fin_right], np.float64)

    audio_player(audio, sr=xsr)

    source_run += 1
    if save_to_drive == True:
      import soundfile
      import datetime
      if timestamp_output_files == True:
        wav_timestamp = datetime.datetime.today().strftime('%Y-%m-%d-%H-%M-%S')+'_'
      else:
        wav_timestamp = ''
      wav_file = output_dir+'noto_'+wav_timestamp+source_id+'_'+run_id+'.wav'
      soundfile.write(wav_file, audio.T, xsr)
      view_out = wav_file.replace(drive_root, '')
      op(c.ok, '\n\nFile saved:', view_out)
      if save_info_txt == True:
        if info_file_created == False:
          if timestamp_output_files == True:
            txt_timestamp = datetime.datetime.today().strftime('%Y-%m-%d-%H-%M-%S')+'_'
          else:
            txt_timestamp = ''
          txt_file = output_dir+'noto_'+txt_timestamp+source_id+'.txt'
          txt = open(txt_file,'w') 
          params = ['started:            '+datetime.datetime.today().strftime('%Y-%m-%d %H:%M:%S')+'\n\n',
                    free_note+'\n\n',
                    'data_file:          '+data_file+'\n',
                    'columns:            '+str(preview_columns)+'\n',
                    'delimiter:          '+delimiter+' ('+separator+')\n\n',
                    '---\n\n']
          txt.writelines(params)
          txt.close()
          view_txt = txt_file.replace(drive_root, '')
          op(c.ok, 'Info file saved:', view_txt) 
          last_free_note = free_note
          info_file_created = True
        
        txt = open(txt_file,'a+') 
        params = ['VARIANT #'+str(source_run)+': '+path_leaf(wav_file)+'\n',
                  'created:            '+datetime.datetime.today().strftime('%H:%M:%S')+'\n',
                  'left_channel:       column '+str(left_channel)+'\n',
                  'right_channel:      column '+str(right_channel)+'\n',
                  'stereo_width:       '+str(stereo_width)+'\n',
                  'time_stretch:       '+str(time_stretch)+'x\n',
                  'stretch_type:       '+stretch_type+'\n\n']
        txt.writelines(params)
        txt.close()
        if free_note != last_free_note:
          appendTxt(txt_file, free_note+'\n')
          last_free_note = free_note

op(c.ok, '\nFIN.')
