---
layout: layout.njk
permalink: "{{ page.filePathStem }}.html"
---
{% include "toc.njk" %}
<div class="col-md-9 col-md-pull-3">
<h1 id="faq-top" class="title">FAQ</h1>
<h2 class="question" id="cite">How should I cite Smile?</h2>
<p class="answer">Please cite Smile in your publications if it helps your research. Here is an example BibTeX entry:</p>
<pre><code>
@misc{Li2014Smile,
title={Smile},
author={Haifeng Li},
year={2014},
howpublished={\url{https://haifengl.github.io}},
}
</code></pre>
<h2 class="question" id="link-with-smile">Link with Smile</h2>
<p class="answer">
Smile artifacts are hosted in <a href="https://oss.sonatype.org/#nexus-search;quick~smile-core">Sonatype Nexus</a>.
If you're using Maven, add the following dependency to your pom.xml:</p>
<pre><code>
&lt;dependency&gt;
  &lt;groupId&gt;com.github.haifengl&lt;/groupId&gt;
  &lt;artifactId&gt;smile-core&lt;/artifactId&gt;
  &lt;version&gt;3.0.2&lt;/version&gt;
&lt;/dependency&gt;
</code></pre>
<p>If you're using Gradle, add the following line into your build
file's <code>dependencies</code> section:</p>
<pre><code>
implementation("com.github.haifengl:smile-core:3.0.2")
</code></pre>
<p>If you're using SBT, add the following line into your build file:</p>
<pre><code>
libraryDependencies += "com.github.haifengl" % "smile-core" % "3.0.2"
</code></pre>
<p>For the Scala API, add:</p>
<pre><code>
libraryDependencies += "com.github.haifengl" %% "smile-scala" % "3.0.2"
</code></pre>
<p>Some algorithms rely on BLAS and LAPACK (e.g., manifold learning,
some clustering algorithms, Gaussian process regression, and MLP).
To use these algorithms, include OpenBLAS for optimized
matrix computation:</p>
<pre><code>
libraryDependencies ++= Seq(
"org.bytedeco" % "javacpp" % "1.5.8" classifier "macosx-x86_64" classifier "windows-x86_64" classifier "linux-x86_64" classifier "linux-arm64" classifier "linux-ppc64le" classifier "android-arm64" classifier "ios-arm64",
"org.bytedeco" % "openblas" % "0.3.21-1.5.8" classifier "macosx-x86_64" classifier "windows-x86_64" classifier "linux-x86_64" classifier "linux-arm64" classifier "linux-ppc64le" classifier "android-arm64" classifier "ios-arm64",
"org.bytedeco" % "arpack-ng" % "3.8.0-1.5.8" classifier "macosx-x86_64" classifier "windows-x86_64" classifier "linux-x86_64" classifier "linux-arm64" classifier "linux-ppc64le" classifier ""
)
</code></pre>
<p>This example includes all supported 64-bit platforms and leaves out
the 32-bit ones. To save space, include only the platforms
you need.</p>
<p>If you prefer another BLAS implementation, you can use any library found
on <code>java.library.path</code> or on the class path by naming it in
the <code>org.bytedeco.openblas.load</code> system property. For example, to use
the BLAS library from the Accelerate framework on macOS, pass
options such as <code>-Djava.library.path=/usr/lib/ -Dorg.bytedeco.openblas.load=blas</code>.</p>
<p>For a default installation of MKL, the option is <code>-Dorg.bytedeco.openblas.load=mkl_rt</code>.
Alternatively, include the <code>smile-mkl</code> module in your project, which
bundles the MKL binaries. With <code>smile-mkl</code> on the class path, Smile
switches to MKL automatically.</p>
<pre><code>
libraryDependencies += "com.github.haifengl" % "smile-mkl" % "3.0.2"
</code></pre>
<h2 class="question" id="model-serialization">Model serialization</h2>
<p class="answer">To serialize a model, you may use</p>
<pre class="prettyprint lang-scala"><code>
write(model, file)
</code></pre>
<p>This method belongs to the <code>smile.write</code> object in the Scala API and serializes the model in the Java
serialization format. This is handy if you want to use a model in Spark.</p>
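<p>Because the output is in the Java serialization format, the same round trip can be sketched with the JDK alone.
The example below is an illustration under assumptions, not Smile's implementation: <code>ToyModel</code>,
<code>save</code>, and <code>load</code> are hypothetical names, and <code>ToyModel</code> merely stands in for
a trained model that implements <code>java.io.Serializable</code>.</p>

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

final class SerializationSketch {
    /** Hypothetical stand-in for a trained model; not a Smile class. */
    static final class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] weights;

        ToyModel(double[] weights) {
            this.weights = weights;
        }

        /** Returns the dot product of the weights and the input. */
        double predict(double[] x) {
            double sum = 0.0;
            for (int i = 0; i < weights.length; i++) {
                sum += weights[i] * x[i];
            }
            return sum;
        }
    }

    /** Writes the object to the file in the Java serialization format. */
    static void save(Serializable model, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(model);
        }
    }

    /** Reads back whatever object was serialized into the file. */
    static Object load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("toy-model", ".ser");
        save(new ToyModel(new double[] {0.5, 2.0}), file);
        ToyModel restored = (ToyModel) load(file);
        System.out.println(restored.predict(new double[] {2.0, 1.0}));
        Files.delete(file);
    }
}
```

<p>Java serialization ties the file to the class definitions on the class path, so the reading side needs
the same classes that were used when writing.</p>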
<h2 class="question" id="data-format">Data Format</h2>
<p class="answer">
Most Smile algorithms take plain <code>double[]</code> arrays as input, so you can import data with any method
or library you like, as long as the samples end up in double arrays. Smile also provides
parsers in <code>smile.io</code> for popular data formats, such as CSV, Weka's ARFF files,
LibSVM's file format, delimited text files, SAS, Parquet, Avro, Arrow, JSON, and binary sparse data.
</p>
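<p>As an illustration of loading data without <code>smile.io</code>, here is a minimal sketch that turns
delimited numeric text into one <code>double[]</code> per line. It assumes no header row and purely numeric
fields; <code>DoubleArrayLoader.parse</code> is a hypothetical helper, not part of Smile.</p>

```java
import java.util.Arrays;

final class DoubleArrayLoader {
    /** Parses delimited numeric text into one double[] per non-blank line. */
    static double[][] parse(String text, String delimiterRegex) {
        return text.lines()
                .filter(line -> !line.isBlank())
                .map(line -> Arrays.stream(line.trim().split(delimiterRegex))
                        .mapToDouble(Double::parseDouble)
                        .toArray())
                .toArray(double[][]::new);
    }

    public static void main(String[] args) {
        // Two samples with two tab-separated features each.
        String text = "1.0\t2.5\n3.0\t4.5\n";
        double[][] x = parse(text, "\t");
        System.out.println(Arrays.deepToString(x));
    }
}
```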
<h2 class="question" id="pom">Cannot build Smile with Maven</h2>
<p class="answer">
Smile is now built with SBT. The Maven pom.xml files were deprecated and then removed in v1.2.0.
</p>
<h2 class="question" id="headless-plot">Headless Plot</h2>
<p class="answer">
If your environment has no display, or you need to generate and save
many plots without showing them on screen, you can run Smile in headless mode.
</p>
<ul class="nav nav-tabs">
<li class="active"><a href="#scala_1" data-toggle="tab">Scala</a></li>
<li><a href="#java_1" data-toggle="tab">Java</a></li>
</ul>
<div class="tab-content">
<div class="tab-pane active" id="scala_1">
<div class="code" style="text-align: left;">
<pre class="prettyprint lang-sh"><code>
bin/smile -Djava.awt.headless=true
</code></pre>
</div>
</div>
<div class="tab-pane" id="java_1">
<div class="code" style="text-align: left;">
<pre class="prettyprint lang-sh"><code>
bin/jshell.sh -R-Djava.awt.headless=true
</code></pre>
</div>
</div>
</div>
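<p>In a standalone Java program, the same property can usually be set from code, provided this happens before
any AWT class initializes the toolkit. The sketch below uses only the JDK and is an illustration rather than
Smile code: it sets <code>java.awt.headless</code>, checks the result with
<code>GraphicsEnvironment.isHeadless()</code>, and draws into an off-screen <code>BufferedImage</code>.
<code>HeadlessSketch</code> and the output file name are hypothetical.</p>

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.GraphicsEnvironment;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

final class HeadlessSketch {
    /** Draws a blue disc on a white background off-screen; no display is needed. */
    static BufferedImage render(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            g.setColor(Color.BLUE);
            g.fillOval(width / 4, height / 4, width / 2, height / 2);
        } finally {
            g.dispose();
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        // Must run before any AWT class initializes the toolkit.
        System.setProperty("java.awt.headless", "true");
        System.out.println("headless: " + GraphicsEnvironment.isHeadless());
        ImageIO.write(render(400, 400), "png", new File("headless-sketch.png"));
    }
}
```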
<p>The following example shows how to save a plot in headless mode.</p>
<ul class="nav nav-tabs">
<li class="active"><a href="#scala_2" data-toggle="tab">Scala</a></li>
<li><a href="#java_2" data-toggle="tab">Java</a></li>
</ul>
<div class="tab-content">
<div class="tab-pane active" id="scala_2">
<div class="code" style="text-align: left;">
<pre class="prettyprint lang-scala"><code>
val toy = read.csv("data/classification/toy/toy-train.txt", delimiter='\t', header=false)
val canvas = plot(toy, "V2", "V3", "V1", '.')
val image = canvas.toBufferedImage(400, 400)
javax.imageio.ImageIO.write(image, "png", new java.io.File("headless.png"))
</code></pre>
</div>
</div>
<div class="tab-pane" id="java_2">
<div class="code" style="text-align: left;">
<pre class="prettyprint lang-java"><code>
import java.awt.Color;
import smile.io.*;
import smile.plot.swing.*;
import org.apache.commons.csv.CSVFormat;
var toy = Read.csv("data/classification/toy/toy-train.txt", CSVFormat.DEFAULT.withDelimiter('\t'));
var canvas = ScatterPlot.of(toy, "V2", "V3", "V1", '.').canvas();
var image = canvas.toBufferedImage(400, 400);
javax.imageio.ImageIO.write(image, "png", new java.io.File("headless.png"));
</code></pre>
</div>
</div>
</div>
<h2 class="question" id="seed">How can I set the random number generator seed for random forest?</h2>
<p class="answer">
This is a common question for stochastic algorithms such as random forest.
In general, setting the seed is discouraged because people often choose poor seeds
for lack of familiarity with random number generation. However, you may want repeatable
results for testing purposes. In that case, call <code>smile.math.MathEx.setSeed</code> before
training the model.
</p>
<p>
Note that we don't provide a method to set the seed for a particular algorithm.
Many algorithms are multithreaded, and each thread has its own random number
generator. We chose this design because each random number generator maintains
internal state and is therefore not thread-safe. If multiple threads shared one
random number generator, we would have to use locks, which would significantly reduce
performance.
</p>
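<p>The idea of one generator per thread, rather than one shared generator behind a lock, can be sketched with
the JDK alone. This is an illustration of the design, not Smile's actual implementation:
<code>PerThreadRandom</code> and its seeding scheme are assumptions, and <code>java.util.Random</code> stands in
for Smile's generators.</p>

```java
import java.util.Random;

final class PerThreadRandom {
    // Supplies a fresh seed to each new thread. It is consulted once per thread,
    // so sharing it does not slow down the actual number generation.
    private static final Random SEEDER = new Random();

    // Each thread lazily creates its own generator; drawing numbers needs no lock.
    private static final ThreadLocal<Random> LOCAL =
            ThreadLocal.withInitial(() -> new Random(SEEDER.nextLong()));

    /** Returns the generator that belongs to the calling thread. */
    static Random current() {
        return LOCAL.get();
    }

    static double nextDouble() {
        return current().nextDouble();
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(
                Thread.currentThread().getName() + " drew " + nextDouble());
        Thread a = new Thread(task, "worker-a");
        Thread b = new Thread(task, "worker-b");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```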
<p>
A <code>setSeed()</code> method on the algorithm itself would also be troublesome.
For algorithms like random forest, it is wrong to initialize every thread
with the same seed: the threads would build identical decision trees,
and we would lose the randomness of the "random" forest. Passing a sequence
of random numbers as seeds is also complicated because it is not clear
how many random number generators a given algorithm needs. Even worse, it breaks
encapsulation, as the caller has to know the details of the algorithm.
</p>
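<p>The problem with giving every thread the same seed is easy to demonstrate. In the sketch below,
<code>java.util.Random</code> stands in for Smile's generators and <code>bootstrap</code> is a hypothetical
helper: two generators built from the same seed emit the same stream, so two trees drawing their bootstrap
samples from them would be trained on identical data.</p>

```java
import java.util.Arrays;
import java.util.Random;

final class SameSeedDemo {
    /** Draws n sample indices in [0, size) from a generator created with the given seed. */
    static int[] bootstrap(long seed, int n, int size) {
        Random rng = new Random(seed);
        int[] sample = new int[n];
        for (int i = 0; i < n; i++) {
            sample[i] = rng.nextInt(size);
        }
        return sample;
    }

    public static void main(String[] args) {
        int[] tree1 = bootstrap(42L, 5, 100);
        int[] tree2 = bootstrap(42L, 5, 100); // same seed as tree1
        int[] tree3 = bootstrap(43L, 5, 100); // different seed
        System.out.println("tree1 == tree2: " + Arrays.equals(tree1, tree2));
        System.out.println("tree1 == tree3: " + Arrays.equals(tree1, tree3));
    }
}
```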
</div>
<script type="text/javascript">
$('#toc').toc({exclude: 'h1, h5, h6', context: '', autoId: true, numerate: false});
</script>