This code generates various classes of complex-valued time-series signals that are similar to signals observed at the Allen Telescope Array, operated by the SETI Institute. You can see what some of these signal classes look like by reading this blog post. We have also published a paper that contains information on the analytical model of these signals.
The output simulation files (named <uuid>.dat) are simple: a JSON header, followed by a newline (\n), an optional second JSON header followed by a newline, and then some number of bytes that hold the complex-valued time-series data. Each time step is stored as a 2-byte pair, where the first byte is the real value and the second byte is the imaginary value. These data files can be read with the ibmseti Python package. That Python package can also be used to do some basic signal processing and calculate spectrograms.
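The layout described above can also be parsed by hand. The following is a minimal sketch, not the ibmseti implementation: the function name read_sim_file and its n_headers argument are hypothetical, the header keys in the demo are made up, and the code assumes each byte is a signed 8-bit integer, which the description above does not state.

```python
import json
import os
import struct
import tempfile


def read_sim_file(path, n_headers=1):
    """Return (headers, samples) from a simulation file.

    n_headers is 1 or 2, depending on whether the optional second JSON
    header is present. ASSUMPTION: each byte is a signed 8-bit integer.
    """
    with open(path, "rb") as f:
        # Each header is one JSON document terminated by a newline.
        headers = [json.loads(f.readline().decode("utf-8"))
                   for _ in range(n_headers)]
        raw = f.read()
    usable = len(raw) - (len(raw) % 2)  # ignore a dangling odd byte
    values = struct.unpack("%db" % usable, raw[:usable])
    # Even positions hold real parts, odd positions hold imaginary parts.
    samples = [complex(re, im) for re, im in zip(values[0::2], values[1::2])]
    return headers, samples


# Round trip on a tiny synthetic file (hypothetical header contents).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.dat")
with open(demo_path, "wb") as f:
    f.write(json.dumps({"uuid": "demo"}).encode("utf-8") + b"\n")
    f.write(json.dumps({"note": "private"}).encode("utf-8") + b"\n")
    f.write(struct.pack("4b", 1, -2, 3, 4))

demo_headers, demo_samples = read_sim_file(demo_path, n_headers=2)
print(demo_headers, demo_samples)
```

Passing the header count explicitly avoids guessing whether the first data byte happens to look like the start of a JSON object.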
This code is in relatively poor shape. One may be inclined to call it "research-level code" (i.e. not consumer-friendly and no unit tests) and there are no guarantees. We really only know that it works on our local systems and an external Apache Spark (2.1.0) cluster. (Also, please note that we are not expert programmers and have added Scala code as a learning exercise.) Please do not hesitate to contact the authors or submit Issues or PRs if you have problems!
This code was developed on a Mac with JDK SE version 8 and Scala Build Tool (SBT) version 0.13.
There are three "modes" under which this code can be run: spark, serial or local. If you're just starting to use this code, you should first get this working in local mode and move on from there.
In spark mode, the code should be running on a Spark cluster. An RDD is created and .map functions are used to farm out the simulations to the executor nodes in order to parallelize the task. The <uuid>.dat simulation files are stored in an OpenStack Swift Object Store. The parameters that control the simulation are stored separately in an IBM DB2 database. Credentials for DB2 should be set in the resources/simulation.properties file. Credentials and container names for Object Storage should be set in the same file. See the example_spark_submit.sh script.
In serial mode, Spark is not used and all simulations are run in one thread. The data are still stored into the external Object Storage and DB2 systems.
In local mode, neither Spark nor the Object Store and DB2 systems are used. All data are stored locally. A second, "private", JSON header is included in the output <uuid>.dat file. Despite not using Object Storage or DB2, you will still need a resources file because the code tries to open it anyway (another casualty of "research-level" coding and motivation to fix all things). You don't need to set any values for the credentials; just run cp resources/simulation.properties.template resources/simulation.properties. The simulations are performed in one local thread and stored to the local file system.
Also note that the <SNR> setting (see below) is only available in local mode, as I didn't have time to add it to the serial and spark modes and test it out.
Instructions to install SBT: http://www.scala-sbt.org/release/docs/Setup.html
The command below will download the dependencies, compile the code and package it into an uber jar file. However, you must first create the file resources/simulation.properties. There is a .template file in the resources folder. If you are running the simulations in local mode, you will not need to fill in the values. Otherwise, fill in the values if you are planning to store the output data files in OpenStack Object Storage and IBM DB2 tables. The structure of the IBM DB2 table is described below. The simulation.properties file will be packaged into the resulting .jar file and opened at run time.
sbt clean assembly
Note that these instructions do not create an uber jar; they just compile the core .java classes. They are here in order to support the original authors of the core .java code.
These instructions encourage all dependency libraries to be downloaded and installed in the dependencies folder. The setup.sh script adds that folder to the CLASSPATH environment variable.
As of this writing, the Java code depends only on the Jackson tools for generating JSON.
source setup.sh #adds dependencies to CLASSPATH
javac apps/simulate/*.java
jar cfm setisimulator.jar MANIFEST.MF apps/simulate/*.class
If you've used sbt to package the code, the resulting jar file is target/scala-2.11/signalsimulation-assembly-8.0.jar.
The main class for this jar file is spark/SETISim.scala.
java -jar <jar file> <parameters>
In the example below, a narrowband signal class is simulated. The range of simulation parameters for each class is hard-coded in the classes here. (This is less than ideal coding practice, but worked for our purposes.)
The training option tells the program to report the signal class in the public header and specifies a particular range of signal amplitudes that may be simulated (the basic option would use a larger range of amplitudes).
Two (2) simulations will be performed.
The noise will be gaussian, defined by the GaussianNoise.java class. (You'll almost always use this as your noise model unless you have a data file that can be read with the FileNoise class, in which case you can pass in the name of the file that holds that data.)
java -jar target/scala-2.11/signalsimulation-assembly-8.0.jar training serial 2 narrowband gaussian
The set of parameters that you can use are briefly described below.
java -jar target/scala-2.11/signalsimulation-assembly-8.0.jar <data_class> <mode> <number_of_partitions> <number_of_simulations> <signal_class> <noise> <SNR>
<data_class>: one of training, test, basic, basictest, private. You should probably just use training, basic or test. In test mode, the output data files do not contain the signal class in the first public header (though the class name does exist in the second private header when in local mode). In basic and basictest, the range of signal amplitudes of some signal classes is significantly larger, making them easier to classify. Confusingly, the private mode is similar to training except that it saves output data files into a different Object Store container, as specified in the properties file. It has no effect when in local mode. :/
<mode>: either local, serial or spark, as explained above.
<number_of_partitions>: number of Spark partitions to use IF mode=spark, otherwise DO NOT INCLUDE this value in the command.
<number_of_simulations>: number of signals to simulate.
<signal_class>: see SignalDefFactory.scala for the list of available classes.
<noise>: one of gaussian, sunnoise or the path to a file. If sunnoise, the code will attempt to access the Object Storage instance for the data file.
<SNR>: if mode=local, then one can specify a fixed SNR value to use for all simulations. This ONLY works in local mode. If this is not specified, a range of SNR values will be simulated.
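The positional rules above (partitions only in spark mode, SNR only in local mode) are easy to get wrong when scripting runs. The sketch below is one hedged way to assemble the argument list in Python; build_args is a hypothetical helper that is not part of this repository, and it only encodes the rules stated above.

```python
def build_args(data_class, mode, n_simulations, signal_class, noise,
               n_partitions=None, snr=None):
    """Return the positional arguments for the simulator jar as strings.

    Hypothetical helper: it mirrors the parameter order described above
    and rejects combinations the description says are not allowed.
    """
    if mode not in ("local", "serial", "spark"):
        raise ValueError("mode must be local, serial or spark")
    args = [data_class, mode]
    if mode == "spark":
        if n_partitions is None:
            raise ValueError("spark mode needs number_of_partitions")
        args.append(str(n_partitions))
    elif n_partitions is not None:
        raise ValueError("number_of_partitions is only used in spark mode")
    args += [str(n_simulations), signal_class, noise]
    if snr is not None:
        if mode != "local":
            raise ValueError("SNR can only be fixed in local mode")
        args.append(str(snr))
    return args


spark_args = build_args("training", "spark", 1000, "narrowband", "gaussian",
                        n_partitions=20)
local_args = build_args("basic", "local", 10, "narrowband", "gaussian",
                        snr=0.15)
print(" ".join(spark_args))
print(" ".join(local_args))
```

The resulting list can be appended to java -jar followed by the jar path, for example when launching runs with subprocess.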
Note that SNR is defined as the amplitude of the signal relative to the standard deviation of the noise amplitude. For gaussian white noise, that amplitude is fixed at a value of 13.0 for both the real and imaginary components. The signal amplitude is the amplitude of the sine wave that is added to the white noise at each time sample. You should use SNRs in the range from 0.05 to 0.75, depending on the signal class.
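As a quick worked example of that definition, the sine-wave amplitude is the SNR multiplied by the noise standard deviation. The sketch below uses the value 13.0 stated above; the function name signal_amplitude is hypothetical and the code is not taken from the simulator itself.

```python
NOISE_SIGMA = 13.0  # standard deviation of the gaussian white noise, per the text


def signal_amplitude(snr, sigma=NOISE_SIGMA):
    """Sine-wave amplitude implied by an SNR, using SNR = amplitude / sigma."""
    return snr * sigma


# Amplitudes at the two ends of the suggested SNR range.
low_amp = signal_amplitude(0.05)
high_amp = signal_amplitude(0.75)
print(low_amp, high_amp)
```

Under this definition, the suggested SNR range corresponds to sine-wave amplitudes well below the noise standard deviation.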
The different signal classes that [have been defined](spark/signaldef/SignalDefFactory.scala) so far are:
narrowband
squarepulsednarrowband
sinepulsednarrowband
squigglesquarepulsednarrowband
squigglesinepulsednarrowband
squiggle
narrowbanddrd
squigglesquarepulsednarrowbanddrd
squigglesinepulsednarrowbanddrd
brightpixel
noise
The following examples assume the code is running on a system with Apache Spark 2.0 or greater installed.
Generate 1,000 test narrowband signals with sun noise, and run on Spark with 20 separate partitions.
java -jar target/scala-2.11/signalsimulation-assembly-8.0.jar test spark 20 1000 narrowband sunnoise
The sunnoise option is a special case. We created noise files by observing the Sun for a number of hours. These noise files were stored in Object Storage and retrieved at run time (the Object Storage container is set in the properties file). Unless you work at the SETI Institute, you probably won't use this option!
Generate 1,000 training narrowband signals with gaussian white noise, and run on Spark with 20 separate partitions.
java -jar target/scala-2.11/signalsimulation-assembly-8.0.jar training spark 20 1000 narrowband gaussian
Generate 10 "basic" narrowband simulations, all with a fixed signal amplitude of 0.15.
java -jar target/scala-2.11/signalsimulation-assembly-8.0.jar basic local 10 narrowband gaussian 0.15
Generate 10 "training" narrowband simulations with a fixed signal amplitude of 0.2.
java -jar target/scala-2.11/signalsimulation-assembly-8.0.jar training local 10 narrowband gaussian 0.2
Generate 10 "training" narrowband simulations with a range of signal amplitudes.
java -jar target/scala-2.11/signalsimulation-assembly-8.0.jar training local 10 narrowband gaussian
Generate 10 "training" squiggle simulations with a range of signal amplitudes.
java -jar target/scala-2.11/signalsimulation-assembly-8.0.jar training local 10 squiggle gaussian
We used an IBM Spark Enterprise service (30 executor cluster) to perform our simulations. We leave this example command here for documentation. Note that the spark-submit.sh script here is the shell script from IBM to run code on the IBM Spark service and not the spark-submit script included in the Apache Spark distribution.
./spark-submit.sh --vcap vcap.enterprise.json --deploy-mode cluster --conf spark.service.spark_version=2.0 --class org.seti.simulator.SETISim target/scala-2.11/signalsimulation-assembly-8.0.jar training spark 20 1000 narrowband gaussian
If you did not package the compiled .class files into a jar file, you can call the main class directly.
source setup.sh
java apps.simulate.DataSimulator <all individual parameters>
//example
java apps.simulate.DataSimulator 13 "" 100 0.4 -0.0001 -0.0002 0.0001 792576 square 61440 .5 squiggle_pulsed test.data
You'll need to read the DataSimulator code class to decipher all of these values. :)
Alternatively, run the packaged jar file:
java -jar setisimulator.jar 13 "" 100 0.3 -0.0001 -0.0002 0.0001 792576 square 61440 .5 squiggle_pulsed test.data
To get 129 raster lines with 6144 frequency bins, which is the size of an archive-compamp file with the over-sampled frequencies removed (aka a waterfall plot), the output data length is the product of these two numbers: 129 * 6144 = 792576.
Also, in this example, I've added a square wave amplitude modulation with a periodicity of 61440 samples (equivalent to 10 raster lines) with a duty cycle of 0.5. One can also add a sine wave amplitude modulation (in the case of a sine modulation, the duty cycle value is ignored).
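The arithmetic above, and the shape of a square-wave envelope with a given period and duty cycle, can be checked with a short Python sketch. The function square_envelope is hypothetical and assumes the "on" part comes first in each period with zero phase offset; the simulator's own modulation code may differ.

```python
RASTER_LINES = 129
FREQ_BINS = 6144

# Total number of complex samples needed for the waterfall plot.
total_samples = RASTER_LINES * FREQ_BINS

# A modulation period of 61440 samples expressed in raster lines.
period_in_lines = 61440 // FREQ_BINS


def square_envelope(n_samples, period, duty):
    """Return 1.0 during the first `duty` fraction of each period, else 0.0.

    ASSUMPTION: the envelope starts in the "on" state at sample 0.
    """
    on_samples = period * duty
    return [1.0 if (i % period) < on_samples else 0.0
            for i in range(n_samples)]


# Tiny illustration: period of 4 samples, duty cycle 0.5, 8 samples long.
small_envelope = square_envelope(8, 4, 0.5)
print(total_samples, period_in_lines, small_envelope)
```

With the values from the example command, each period spans 10 raster lines, so the signal would be on for 5 lines and off for 5 lines under the zero-phase assumption.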
The output file contains one or two JSON headers separated by a newline (\n). The first header is called the "public" header, and the second header is the "private" header. In spark or serial mode, the information from the private header will be saved to a database and removed from the simulation file and the public header will remain. In test mode, the signal class name will be removed from the public header.
The ibmseti Python package can read these simulation data files and calculate spectrograms.
From the command line, one can skip both headers and stream the remainder of the data with the tail command, then pipe the data into the standard SETI command-line tools. If the data files were created in local mode, then be sure to use tail -n +3 to skip both headers. If there is only one JSON header in the data, then use tail -n +2 to skip just one header.
len=6144
tail -n +3 test.data | sqsample -l $len | sqwindow -l $len | sqfft -l $len | sqabs -l $len | sqreal -l $len | sqpnm -c $len -r 32 -p > wf1.pgm
XView will display the PGM file with
xv wf1.pgm
In Python,
from PIL import Image
# Open the PGM waterfall image written by the pipeline above
im = Image.open('wf1.pgm')
# Display it in the default image viewer
im.show()
When running in either spark or serial mode, the code expects the existence of an IBM DB2 database table with the following structure.
create table setiusers.simsignal (
uuid VARCHAR(128) not null,
sigma_noise DECIMAL(31,10),
noise_name VARCHAR(128),
delta_phi DECIMAL(31,10),
signal_to_noise_ratio DECIMAL(31,10),
drift DECIMAL(31,10),
drift_rate_derivative DECIMAL(31,10),
jitter DECIMAL(31,10),
len BIGINT,
amp_modulation_type VARCHAR(128),
amp_modulation_period DECIMAL(31,10),
amp_modulation_duty DECIMAL(31,10),
amp_phase DECIMAL(31,10),
amp_phase_square DECIMAL(31,10),
amp_phase_sine DECIMAL(31,10),
signal_classification VARCHAR(128),
seed BIGINT,
drift_divisor DECIMAL(31,10),
initial_sine_drift DECIMAL(31,10),
initial_cosine_drift DECIMAL(31,10),
simulator_software_version INT,
simulator_software_version_date VARCHAR(128),
date_created TIMESTAMP(10),
container VARCHAR(128),
objectname VARCHAR(128),
etag VARCHAR(256),
noise_file_uuid VARCHAR(128)
);
All documentation and software in this repository is licensed under the Apache License, Version 2.0.