This repository has been archived by the owner on Mar 28, 2022. It is now read-only.

Integration between echoprint-codegen and Java #68

Open: wants to merge 40 commits into base: master

40 commits
f82d57e  bump version numbers to 4.12 (Mar 13, 2012)
b1605d5  update examples for 4.12 (Mar 13, 2012)
e3916cd  LDFLAGS fix for ubuntu (danicuki, Feb 14, 2014)
b9bc614  Integrating echoprint codegen with Java via JNI (danicuki, Mar 12, 2014)
5d8032f  Merge branch 'master' of github.com:playax/echoprint-codegen into rel… (danicuki, Mar 12, 2014)
a25af87  removed printf (danicuki, Mar 12, 2014)
0c0dd47  Update README.md (danicuki, Mar 12, 2014)
c2abaf2  changed package name to fingerprint (danicuki, Mar 12, 2014)
9e9dcea  Merge branch 'master' of github.com:playax/echoprint-codegen (danicuki, Mar 12, 2014)
5c4f531  fix memory leak (danicuki, Mar 13, 2014)
71f772c  trying to avoid memory leak (danicuki, Mar 30, 2014)
076d0d2  fixed wrong free command (danicuki, Mar 30, 2014)
8f6a55e  params to work on both linux and mac (danicuki, Mar 30, 2014)
fc19d1e  fixed yosemit dependencies (danicuki, Feb 23, 2015)
4e6099c  merge echonest (Nov 14, 2015)
6d63c48  JSON unhashed codes in [b, t, d1, d2] format (adrianomitre, Sep 8, 2015)
5b0b157  added explanatory comment to tiny shell one-line "script" (adrianomitre, Sep 8, 2015)
2e47e15  minor adjustments to conform to Java JSON parser (adrianomitre, Nov 5, 2015)
f0d20de  parallelizable script now assumes executable is in path no current dir (adrianomitre, Nov 20, 2015)
b08cac8  Code format now defaults to compressed urlsafe base64 encoded mode, but (adrianomitre, Nov 21, 2015)
ecb7ed5  moved hash related stuff into proper conditional compiling clause (adrianomitre, Nov 21, 2015)
121436a  moved assert to outmost scope and added saturation for -DNDEBUG (adrianomitre, Nov 21, 2015)
ec1360c  fix for the very shortest duration with no codes at all (adrianomitre, Nov 21, 2015)
4c4f720  added '-h' to codegen.sh (adrianomitre, Nov 21, 2015)
5255d79  minor adjustments/fixes for compatibility with legacy codebase (adrianomitre, Nov 23, 2015)
e2643ad  createCodeStringJSON: bugfix for very small number of codes (< 3) (adrianomitre, Nov 30, 2015)
7f517df  discard wasteful zero IOI codes (adrianomitre, Nov 30, 2015)
6076032  moved stray comment back to where it belongs (adrianomitre, Nov 30, 2015)
2d63026  Merge pull request #1 from playax/unhashed_codes (danicuki, Dec 4, 2015)
58e7e1a  fix double quotes in jni lib (Jan 12, 2016)
e0d82e3  Fix the #include headers in the whitening files (danilobellini, Aug 23, 2016)
1e52695  Add a validation test w/ 3 public domain MP3 files (danilobellini, Aug 23, 2016)
38e036e  Revert "Fix the #include headers in the whitening files" (Aug 27, 2018)
d101d49  add cloudbuild recipe for the echoprint-codegen binary (gmega, Oct 26, 2021)
a3e1f9d  give exec permissions to build-codegen.sh, set correct libboost version (gmega, Oct 26, 2021)
858f1f4  add openjdk to build (gmega, Oct 27, 2021)
8ab93d4  set JAVA_HOME for make script (gmega, Oct 27, 2021)
fa7eca3  use gsutil steps instead of artifact so we can set HTTPS acls (gmega, Oct 27, 2021)
75792f1  fix gsutil image reference (gmega, Oct 27, 2021)
6158381  minor changes to docs (gmega, Oct 27, 2021)
35 changes: 33 additions & 2 deletions README.md
@@ -59,16 +59,47 @@ The makefile builds an example code generator that uses libcodegen, called "code

Will take 30 seconds of audio from 10 seconds into the file and output JSON suitable for querying:

{"metadata":{"artist":"Michael jackson", "release":"800 chansons des annes 80", "title":"Billie jean", "genre":"", "bitrate":192, "sample_rate":44100, "seconds":294, "filename":"billie_jean.mp3", "samples_decoded":220598, "given_duration":30, "start_offset":10, "version":4.00}, "code_count":846, "code":"JxVlIuNwzAMQ1fxCDL133+xo1rnGqNAEcWy/ERa2aKeZmW...
[
{"metadata":{"artist":"Michael Jackson", "release":"Thriller", "title":"Billie Jean", "genre":"", "bitrate":128,"sample_rate":44100, "duration":294, "filename":"billie_jean.mp3", "samples_decoded":330902, "given_duration":30, "start_offset":10, "version":4.12, "codegen_time":0.087329, "decode_time":0.297166}, "code_count":906, "code":"eJztmm2OZacORafEt2E4YGD-Q8jCt1UnXdKlIlVa ..."
]

You can host your own [Echoprint server](http://github.com/echonest/echoprint-server "echoprint-server") and ingest or query to that.
You can POST this JSON directly to the Echo Nest's [song/identify](http://developer.echonest.com/docs/v4/song.html#identify "song/identify") endpoint (which runs an Echoprint server), for example:

curl -F "query=@post_string" http://developer.echonest.com/api/v4/song/identify?api_key=YOUR_KEY
{"response": {"status": {"version": "4.2", "code": 0, "message": "Success"}, "songs": [{"tag": 0, "score": 273, "title": "Billie Jean", "message": "OK (match type 6)", "artist_id": "ARXPPEY1187FB51DF4", "artist_name": "Michael Jackson", "id": "SOJIZLV12A58A78309"}]}}
(You can also use GET; see the API description.)

Or you can host your own [Echoprint server](http://github.com/echonest/echoprint-server "echoprint-server") and ingest or query to that.

Codegen also runs in a multithreaded mode for bulk resolving:

./echoprint-codegen -s 10 30 < file_list

This computes codes for every file listed in file_list, using 30 seconds of audio starting at 10 seconds, and outputs a JSON list. (It tries to choose a sensible number of threads.)

## Integration with Scala and Java via JNI
You can use echoprint-codegen inside a JVM. To do so, create a class named `Echoprint` in the package `com.playax.fingerprint`. Here is a Scala example of this class:

package com.playax.fingerprint

class Echoprint {
  @native def code(fileName: String): String
}

object Echoprint {
  val EP = new Echoprint

  System.load("/path/to/libcodegen.4.1.2.dylib")

  def code(fileName: String) = EP.code(fileName)
}

Then call the method on the `Echoprint` object:

Echoprint.code("/path/to/file.mp3")

It returns the fingerprint data as a JSON string.
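Why the package and class names matter: a JVM links a `native` method to a C symbol derived from the fully qualified class name and the method name (the JNI "short name"), unless the native library registers its methods explicitly with `RegisterNatives`. The C++ sketch below uses a hypothetical helper, `jniShortName`, which is not part of libcodegen. It ignores JNI's escape sequences (`_1`, `_2`, `_3`, `_0xxxx`), so it only holds for ASCII identifiers without underscores.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of libcodegen): builds the JNI "short name"
// a JVM looks up when it links a native method.
// Simplified: assumes ASCII identifiers without '_', so none of JNI's
// escape sequences (_1, _2, _3, _0xxxx) are needed.
std::string jniShortName(const std::string& qualifiedClass,
                         const std::string& method) {
    assert(qualifiedClass.find('_') == std::string::npos);
    assert(method.find('_') == std::string::npos);

    std::string symbol = "Java_";
    for (char c : qualifiedClass) {
        // Package separators ('.') become underscores in the symbol name.
        symbol += ((c == '.') ? '_' : c);
    }
    symbol += '_';  // separates the class name from the method name
    symbol += method;
    return symbol;
}
```

For the Scala class above, `jniShortName("com.playax.fingerprint.Echoprint", "code")` yields `Java_com_playax_fingerprint_Echoprint_code`, so moving the class to another package would presumably also require renaming the exported function in the native library.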

## Statistics

### Speed
16 changes: 16 additions & 0 deletions build-debian-buster.sh
@@ -0,0 +1,16 @@
#!/usr/bin/env bash
#
# Simple recipe for building echoprint-codegen on Debian Buster. Called
# from our Cloudbuild config.
set -e

apt update
apt install -y build-essential ffmpeg libboost1.67-dev libtag1-dev zlib1g-dev

# This thing needs JNI headers to be built, even if you're building only the CLI.
# Fortunately for us it compiles with the version in Buster.
apt install -y openjdk-11-jdk
export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64/"

cd src
make
30 changes: 30 additions & 0 deletions cloudbuild.yml
@@ -0,0 +1,30 @@
# This config builds echoprint-codegen for a Linux distro and uploads it
# into GCS. This is NOT a good solution, as the binary will probably break
# down if libraries in the OS change even by an inch.
#
# Ideally, this should be repackaged as a microservice and built as a regular
# container image deployed as a sidecar, and the main Playax app should access
# this as an API instead of expecting to have the binary available locally.
#
# Other options would include building an APT or AppImage package, but clearly a
# microservice would provide better isolation for dependencies.
options:
  dynamic_substitutions: true

substitutions:
  _DISTRO: 'debian'
  _RELEASE: 'buster'
  _BINARIES_BUCKET: 'gs://files.playax.com/binaries'
  _ARTIFACT_URL: '${_BINARIES_BUCKET}/echoprint-codegen/${_DISTRO}-${_RELEASE}/echoprint-codegen'

steps:
  - name: '${_DISTRO}:${_RELEASE}'
    args: ['bash', '-c', './build-${_DISTRO}-${_RELEASE}.sh']

  # We unfortunately cannot use the cloudbuild artifact copy because
  # we need to set the ACLs.
  - name: 'gcr.io/cloud-builders/gsutil'
    args: ['cp', './echoprint-codegen', '${_ARTIFACT_URL}']

  - name: 'gcr.io/cloud-builders/gsutil'
    args: ['acl', 'ch', '-u', 'AllUsers:r', '${_ARTIFACT_URL}']
6 changes: 6 additions & 0 deletions codegen.sh
@@ -0,0 +1,6 @@
#!/bin/sh
#
# To make it easy to use with GNU Parallel, e.g.,
# parallel codegen.sh ::: *.mp3
#
echoprint-codegen -h "$1" > "$1.json"
4 changes: 2 additions & 2 deletions examples/lookup.py
@@ -21,7 +21,7 @@ def lookup(file):
fp = song.util.codegen(file)
if len(fp) and "code" in fp[0]:
# The version parameter to song/identify indicates the use of echoprint
result = song.identify(query_obj=fp, version="4.11")
result = song.identify(query_obj=fp, version="4.12")
print "Got result:", result
if len(result):
print "Artist: %s (%s)" % (result[0].artist_name, result[0].artist_id)
@@ -36,4 +36,4 @@ def lookup(file):
if len(sys.argv) < 2:
print >>sys.stderr, "Usage: %s <audio file>" % sys.argv[0]
sys.exit(1)
lookup(sys.argv[1])
lookup(sys.argv[1])
47 changes: 47 additions & 0 deletions src/Codegen.cxx
@@ -23,6 +23,10 @@ using std::string;
using std::vector;

Codegen::Codegen(const float* pcm, unsigned int numSamples, int start_offset) {
for (int i = 0; i < 2; ++i) {
is_code_string_cached[i] = false;
}

if (Params::AudioStreamInput::MaxSamples < (uint)numSamples)
throw std::runtime_error("File was too big\n");

@@ -38,7 +42,11 @@ Codegen::Codegen(const float* pcm, unsigned int numSamples, int start_offset) {
Fingerprint *pFingerprint = new Fingerprint(pSubbandAnalysis, start_offset);
pFingerprint->Compute();

#if defined(UNHASHED_CODES)
_CodeString = createCodeStringJSON(pFingerprint->getCodes());
#else
_CodeString = createCodeString(pFingerprint->getCodes());
#endif
_NumCodes = pFingerprint->getCodes().size();

delete pFingerprint;
@@ -63,6 +71,24 @@ string Codegen::createCodeString(vector<FPCode> vCodes) {
return compress(codestream.str());
}

string Codegen::createCodeStringJSON(vector<FPCode> vCodes) {
std::ostringstream codestream;
codestream << "[";
for (uint i = 0; i < vCodes.size(); i++) {
int hash = vCodes[i].code;
// codestream << std::setw(5) << hash;
codestream << "[" << vCodes[i].frame << ", "
<< ((hash >> 20) & 7) << ", "
<< ((hash >> 10) & 1023) << ", "
<< ((hash >> 0) & 1023)
<< "]";
if (i < vCodes.size()-1) {
codestream << ", ";
}
}
codestream << "]";
return codestream.str();
}

string Codegen::compress(const string& s) {
long max_compressed_length = s.size()*2;
@@ -89,3 +115,24 @@ string Codegen::compress(const string& s) {
delete [] compressed;
return encoded;
}

std::string Codegen::getCodeString(bool human_readable) {
const uint n = human_readable;
if (!is_code_string_cached[n]) {
is_code_string_cached[n] = true;
if (human_readable) {
if (_CodeString.size() > 0) {
code_string_cache[n] = _CodeString;
} else {
code_string_cache[n] = "[]";
}
} else {
if (_CodeString.size() > 0) {
code_string_cache[n] = '"' + compress(_CodeString) + '"';
} else {
code_string_cache[n] = "\"\"";
}
}
}
return code_string_cache[n];
}
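With `UNHASHED_CODES` defined, `createCodeStringJSON` turns each code into a `[frame, band, d1, d2]` array by unpacking the 23-bit code word. The standalone sketch below mirrors that logic so the output format can be checked without building libcodegen. `SketchCode` and `codesToJSON` are hypothetical stand-ins for `FPCode` and `createCodeStringJSON`, and both fields are assumed to be plain unsigned integers.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for FPCode: quantized onset time plus the 23-bit
// unhashed code word laid out as (band << 20) | (d1 << 10) | d2.
struct SketchCode {
    unsigned int frame;
    unsigned int code;
};

// Mirrors the unpacking performed by createCodeStringJSON.
std::string codesToJSON(const std::vector<SketchCode>& codes) {
    std::ostringstream out;
    out << "[";
    for (std::size_t i = 0; i < codes.size(); ++i) {
        const unsigned int c = codes[i].code;
        assert(c < (1u << 23));  // band (3 bits) + two 10-bit deltas
        out << "[" << codes[i].frame << ", "
            << ((c >> 20) & 7) << ", "     // band
            << ((c >> 10) & 1023) << ", "  // first time delta
            << (c & 1023) << "]";          // second time delta
        if (i + 1 < codes.size()) {
            out << ", ";  // separator only between elements
        }
    }
    out << "]";
    return out.str();
}
```

Writing the separator test as `i + 1 < codes.size()` avoids computing `size() - 1` on an empty vector, which would wrap around for an unsigned size type; the original is safe only because its loop body never runs when there are no codes.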
6 changes: 5 additions & 1 deletion src/Codegen.h
@@ -33,16 +33,20 @@ class CODEGEN_API Codegen {
public:
Codegen(const float* pcm, unsigned int numSamples, int start_offset);

std::string getCodeString(){return _CodeString;}
std::string getCodeString(bool human_readable);

int getNumCodes(){return _NumCodes;}
static double getVersion() { return ECHOPRINT_VERSION; }
private:
Fingerprint* computeFingerprint(SubbandAnalysis *pSubbandAnalysis, int start_offset);
std::string createCodeString(std::vector<FPCode> vCodes);
std::string createCodeStringJSON(std::vector<FPCode> vCodes);

std::string compress(const std::string& s);
std::string _CodeString;
int _NumCodes;
bool is_code_string_cached[2];
std::string code_string_cache[2];
};

#endif
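`getCodeString(bool human_readable)` keeps one cached string per output mode in `code_string_cache[2]`, indexed by the `bool` converted to 0 or 1, so each encoding is produced at most once per `Codegen` object. A minimal sketch of that lazy two-slot cache follows. `CachedCodes`, `render` and the `renderCalls` counter are hypothetical, and the compact form simply quotes the JSON instead of calling the real `compress()`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class illustrating the lazy two-slot cache behind
// Codegen::getCodeString(bool): slot 0 holds the compact form,
// slot 1 the human-readable form.
class CachedCodes {
public:
    explicit CachedCodes(std::string json) : json_(std::move(json)) {}

    const std::string& get(bool humanReadable) {
        const unsigned n = humanReadable;  // false -> 0, true -> 1
        if (!cached_[n]) {
            cache_[n] = render(humanReadable);
            cached_[n] = true;
        }
        return cache_[n];
    }

    int renderCalls = 0;  // how many times the (expensive) render ran

private:
    std::string render(bool humanReadable) {
        ++renderCalls;
        if (humanReadable) {
            // Readable form: the JSON itself, or "[]" if there is nothing.
            return json_.empty() ? std::string("[]") : json_;
        }
        // Compact-form placeholder: the real code quotes compress(json);
        // an empty pair of quotes is used when there is nothing.
        return json_.empty() ? std::string("\"\"") : "\"" + json_ + "\"";
    }

    std::string json_;
    bool cached_[2] = {false, false};
    std::string cache_[2];
};
```

Repeated calls in the same mode return the cached string without re-running `render`, which is the point of caching the comparatively costly compressed form.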
28 changes: 24 additions & 4 deletions src/Fingerprint.cxx
@@ -12,6 +12,8 @@
#include "win_funcs.h"
#endif

#define SATURATE(var, val) if ((var) > (val)) (var) = (val);

unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed ) {
// MurmurHash2, by Austin Appleby http://sites.google.com/site/murmurhash/
// m and r are constants set by austin
@@ -182,13 +184,18 @@ uint Fingerprint::quantized_time_for_frame_absolute(uint frame) {

void Fingerprint::Compute() {
uint actual_codes = 0;
#if !defined(UNHASHED_CODES)
unsigned char hash_material[5];
for(uint i=0;i<5;i++) hash_material[i] = 0;
#endif
uint * onset_counter_for_band;
matrix_u out;
uint onset_count = adaptiveOnsets(345, out, onset_counter_for_band);
_Codes.resize(onset_count*6);

#if defined(UNHASHED_CODES)
assert(SUBBANDS <= 8);
#endif
for(unsigned char band=0;band<SUBBANDS;band++) {
if (onset_counter_for_band[band]>2) {
for(uint onset=0;onset<onset_counter_for_band[band]-2;onset++) {
@@ -200,7 +207,7 @@ void Fingerprint::Compute() {
p[0][i] = 0;
p[1][i] = 0;
}
int nhashes = 6;
uint nhashes = 6;

if ((int)onset == (int)onset_counter_for_band[band]-4) { nhashes = 3; }
if ((int)onset == (int)onset_counter_for_band[band]-3) { nhashes = 1; }
@@ -222,16 +229,29 @@ void Fingerprint::Compute() {
}

// For each pair emit a code
for(uint k=0;k<6;k++) {
for(uint k=0; k < nhashes; k++) {
// Quantize the time deltas to 23ms
short time_delta0 = (short)quantized_time_for_frame_delta(p[0][k]);
short time_delta1 = (short)quantized_time_for_frame_delta(p[1][k]);
if (k == 0 && time_delta0 == 0 && time_delta1 == 0) {
continue;
}
uint hashed_code;
#if defined(UNHASHED_CODES)
assert(time_delta0 <= 1023);
assert(time_delta1 <= 1023);
#if defined(NDEBUG)
SATURATE(time_delta0, 1023);
SATURATE(time_delta1, 1023);
#endif
hashed_code = ((band & 7) << 20) | ((time_delta0 & 1023) << 10) | (time_delta1 & 1023);
#else
// Create a key from the time deltas and the band index
memcpy(hash_material+0, (const void*)&time_delta0, 2);
memcpy(hash_material+2, (const void*)&time_delta1, 2);
memcpy(hash_material+4, (const void*)&band, 1);
uint hashed_code = MurmurHash2(&hash_material, 5, HASH_SEED) & HASH_BITMASK;

hashed_code = MurmurHash2(&hash_material, 5, HASH_SEED) & HASH_BITMASK;
#endif
// Set the code alongside the time of onset
_Codes[actual_codes++] = FPCode(time_for_onset_ms_quantized, hashed_code);
//fprintf(stderr, "whee %d,%d: [%d, %d] (%d, %d), %d = %u at %d\n", actual_codes, k, time_delta0, time_delta1, p[0][k], p[1][k], band, hashed_code, time_for_onset_ms_quantized);
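In the `UNHASHED_CODES` branch the code word is no longer a MurmurHash2 digest but a direct bit-packing of the band index and the two quantized time deltas, which is what lets `createCodeStringJSON` recover them later. The `& 1023` masks alone would silently wrap an out-of-range delta (1030 & 1023 is 6), hence the `SATURATE` clamp. Below is a minimal C++ sketch of the packing, using hypothetical `packCode`/`unpackCode` helpers and unsigned deltas (the diff uses `short`). Unlike the diff, where the clamp is compiled only under `NDEBUG` and debug builds assert instead, this sketch always clamps.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the UNHASHED_CODES branch of
// Fingerprint::Compute(): bits 20-22 = band, bits 10-19 = first time delta,
// bits 0-9 = second time delta.
std::uint32_t packCode(unsigned band, unsigned delta0, unsigned delta1) {
    assert(band < 8);  // the diff asserts SUBBANDS <= 8
    // Clamp like SATURATE(var, 1023); the masks alone would wrap 1030 to 6.
    if (delta0 > 1023) delta0 = 1023;
    if (delta1 > 1023) delta1 = 1023;
    return ((band & 7u) << 20) | ((delta0 & 1023u) << 10) | (delta1 & 1023u);
}

// Inverse of packCode, using the same shifts as createCodeStringJSON.
void unpackCode(std::uint32_t code, unsigned& band, unsigned& delta0,
                unsigned& delta1) {
    band = (code >> 20) & 7u;
    delta0 = (code >> 10) & 1023u;
    delta1 = code & 1023u;
}
```

Saturating keeps an oversized delta at the largest representable value instead of turning it into an unrelated small one, so an out-of-range interval still produces a code close to its true value.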
18 changes: 11 additions & 7 deletions src/Makefile
@@ -1,18 +1,19 @@
# Version of echoprint, as a list. Is expanded out
# for various version numbers.
EP_VERSION := 4 1 1
EP_VERSION := 4 1 2
VERSION := $(word 1, $(EP_VERSION)).$(word 2, $(EP_VERSION)).$(word 3, $(EP_VERSION))
VERSION_MAJ := $(word 1, $(EP_VERSION))
VERSION_COMPAT := $(word 1, $(EP_VERSION)).$(word 2, $(EP_VERSION))
UNAME := $(shell uname -s)
CXX=g++
CC=gcc
#OPTFLAGS=-g -O0
OPTFLAGS=-O3 -DBOOST_UBLAS_NDEBUG -DNDEBUG
BOOST_CFLAGS=-I/usr/local/include/boost-1_35
#OPTFLAGS=-g -O0 -DUNHASHED_CODES
OPTFLAGS=-O3 -DBOOST_UBLAS_NDEBUG -DNDEBUG -DUNHASHED_CODES
BOOST_CFLAGS=-I/usr/local/include/boost
JNI_CFLAGS=-I$$JAVA_HOME/include/ -I$$JAVA_HOME/include/darwin/ -I$$JAVA_HOME/include/linux/
TAGLIB_CFLAGS=`taglib-config --cflags`
TAGLIB_LIBS=`taglib-config --libs`
CXXFLAGS=-Wall $(BOOST_CFLAGS) $(TAGLIB_CFLAGS) -fPIC $(OPTFLAGS)
CXXFLAGS=-Wall $(BOOST_CFLAGS) $(TAGLIB_CFLAGS) $(JNI_CFLAGS) -fPIC $(OPTFLAGS)
CFLAGS=-Wall -fPIC $(OPTFLAGS)
LDFLAGS=$(TAGLIB_LIBS) -lz -lpthread $(OPTFLAGS)
LIBNAME=libcodegen.so
@@ -26,7 +27,10 @@ MODULES_LIB = \
Fingerprint.o \
MatrixUtility.o \
SubbandAnalysis.o \
Whitening.o
Whitening.o \
functions.o \
echoprint.o

MODULES = $(MODULES_LIB) Metadata.o

all: libcodegen echoprint-codegen
@@ -41,7 +45,7 @@ else
endif

echoprint-codegen: $(MODULES) main.o
$(CXX) $(MODULES) main.o $(LDFLAGS) -o ../echoprint-codegen
$(CXX) $(MODULES) main.o -o ../echoprint-codegen $(LDFLAGS)

%.o: %.c %.h
$(CC) $(CFLAGS) -c -o $@ $<