(ns clj-synapses.codec
  "A codec can encode and decode every data point.
One-hot encoding turns each discrete attribute into a list of 0.0 and 1.0 values.
Minmax normalization scales each continuous attribute into a value between 0.0 and 1.0.
```clojure
(require '[clj-synapses.codec :as codec])
```
There are two ways to create a codec:
1. By providing a list of pairs that define the name and the type of each attribute:
```clojure
(def preprocessor
  (codec/->codec
    [[\"petal_length\" false]
     [\"species\" true]]
    [{\"petal_length\" \"1.5\"
      \"species\" \"setosa\"}
     {\"petal_length\" \"3.8\"
      \"species\" \"versicolor\"}]))
```
2. By providing its JSON representation.
```clojure
(def preprocessor
  (codec/json->
    \"[{\\\"Case\\\" : \\\"SerializableContinuous\\\",
       \\\"Fields\\\" : [{\\\"key\\\" : \\\"petal_length\\\",\\\"min\\\" : 1.5,\\\"max\\\" : 3.8}]},
      {\\\"Case\\\" : \\\"SerializableDiscrete\\\",
       \\\"Fields\\\" : [{\\\"key\\\" : \\\"species\\\",\\\"values\\\" : [\\\"setosa\\\",\\\"versicolor\\\"]}]}]\"))
```
EXAMPLES
Encode a data point:
```clojure
(codec/encode
  preprocessor
  {\"petal_length\" \"1.5\"
   \"species\" \"setosa\"})
;;=> [0.0, 1.0, 0.0]
```
Decode a data point:
```clojure
(codec/decode
  preprocessor
  [0.0, 1.0, 0.0])
;;=> {\"petal_length\" \"1.5\", \"species\" \"setosa\"}
```
Get the JSON representation of the codec:
```clojure
(codec/->json preprocessor)
```"
  (:import (synapses.custom AttributeWithFlag)
           (synapses.jvm CodecJ)))

(defn ->codec
  "Returns a codec that can encode and decode every data point.
`attributes` is a vector of pairs that define the name and the type (discrete or not) of each attribute.
`data-points` is a collection of maps of strings from which the codec is built.
```clojure
(codec/->codec
  [[\"petal_length\" false]
   [\"species\" true]]
  [{\"petal_length\" \"1.5\"
    \"species\" \"setosa\"}
   {\"petal_length\" \"3.8\"
    \"species\" \"versicolor\"}])
```"
  [attributes data-points]
  (let [;; Pass the element type explicitly so that an empty `attributes`
        ;; still yields an AttributeWithFlag[] rather than an Object[].
        attrs (into-array
                AttributeWithFlag
                (map
                  (fn [[attr flag]]
                    (AttributeWithFlag. attr flag))
                  attributes))
        stream (.stream data-points)]
    (CodecJ/apply attrs stream)))

(defn json->
  "Returns a codec that can encode and decode every data point.
`json` is the JSON representation of a codec.
```clojure
(codec/json->
  \"[{\\\"Case\\\" : \\\"SerializableContinuous\\\",
     \\\"Fields\\\" : [{\\\"key\\\" : \\\"petal_length\\\",\\\"min\\\" : 1.5,\\\"max\\\" : 3.8}]},
    {\\\"Case\\\" : \\\"SerializableDiscrete\\\",
     \\\"Fields\\\" : [{\\\"key\\\" : \\\"species\\\",\\\"values\\\" : [\\\"setosa\\\",\\\"versicolor\\\"]}]}]\")
```"
  [json]
  (CodecJ/apply json))

(defn ->json
  "Returns the JSON representation of the codec."
  [codec]
  (.json codec))

(defn encode
  "Accepts the `data-point` as a map of strings
and returns the encoded data point as a vector of numbers between 0.0 and 1.0."
  [codec data-point]
  (vec
    (.encode codec data-point)))

(defn decode
  "Accepts the `encoded-values` as a vector of numbers between 0.0 and 1.0
and returns the decoded data point as a map of strings."
  [codec encoded-values]
  (->> encoded-values
       double-array
       (.decode codec)
       (into {})))
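
;; Illustrative sketch only, not part of the clj-synapses API: the two
;; transformations named in the namespace docstring, written in plain Clojure
;; so they can be tried at a REPL without the Java library.
;; `minmax` and `one-hot` are hypothetical helper names. How the real codec
;; orders its encoded values, or handles unseen values, is not assumed here.
(comment
  ;; Minmax normalization: scale `v` into [0.0, 1.0] given the observed
  ;; minimum `lo` and maximum `hi`. Assumes `hi` is greater than `lo`;
  ;; with doubles, equal bounds would produce NaN.
  (defn minmax [lo hi v]
    (/ (- v lo) (- hi lo)))

  (minmax 1.5 3.8 1.5) ;;=> 0.0
  (minmax 1.5 3.8 3.8) ;;=> 1.0

  ;; One-hot encoding: 1.0 at the position of the matching value,
  ;; 0.0 everywhere else.
  (defn one-hot [values v]
    (mapv #(if (= % v) 1.0 0.0) values))

  (one-hot ["setosa" "versicolor"] "setosa") ;;=> [1.0 0.0]
  )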