forked from apache/madlib
-
Notifications
You must be signed in to change notification settings - Fork 0
/
mainpage.dox.in
278 lines (229 loc) · 10.3 KB
/
mainpage.dox.in
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
/**
@mainpage
Apache MADlib (incubating) is an open-source library for scalable
in-database analytics. It provides data-parallel implementations of
mathematical, statistical and machine learning methods for structured
and unstructured data.
The MADlib mission: to foster widespread development of scalable analytic
skills, by harnessing efforts from commercial practice, academic research,
and open-source development.
Useful links:
<ul>
<li><a href="http://madlib.incubator.apache.org">MADlib web site</a></li>
<li><a href="https://cwiki.apache.org/confluence/display/MADLIB">MADlib wiki</a></li>
<li><a href="https://issues.apache.org/jira/browse/MADLIB/">JIRAs for reporting bugs and reviewing backlog</a></li>
<li><a href="https://mail-archives.apache.org/mod_mbox/incubator-madlib-user/">User mailing list</a></li>
<li><a href="https://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/">Dev mailing list</a></li>
<li>User documentation for earlier releases:
<a href="../v1.8/index.html">v1.8</a>,
<a href="../v1.7.1/index.html">v1.7.1</a>,
<a href="../v1.7/index.html">v1.7</a>,
<a href="../v1.6/index.html">v1.6</a>,
<a href="../v1.5/index.html">v1.5</a>,
<a href="../v1.4/index.html">v1.4</a>,
<a href="../v1.3/index.html">v1.3</a>,
<a href="../v1.2/index.html">v1.2</a>
</li>
</ul>
Please refer to the <a href="https://github.com/apache/incubator-madlib/blob/master/ReadMe.txt">Read-Me</a> file for information
about incorporated third-party material. License information regarding MADlib
and included third-party libraries can be found inside the
<a href="https://github.com/apache/incubator-madlib/blob/master/LICENSE">License</a> directory.
@defgroup grp_datatrans Data Types and Transformations
@{Data types and transformation operations @}
@defgroup grp_arraysmatrix Arrays and Matrices
@ingroup grp_datatrans
@brief Mathematical operations for arrays and matrices
@details
These modules provide basic mathematical operations to be run on array and matrices.
For a distributed system, a matrix cannot simply be represented as a 2D array of numbers in memory.
<b>We provide two forms of distributed representation of a matrix</b>:
- Dense: The matrix is represented as a distributed collection of 1-D arrays.
An example 3x10 matrix would be the below table:
<pre>
row_id | row_vec
--------+-------------------------
1 | {9,6,5,8,5,6,6,3,10,8}
2 | {8,2,2,6,6,10,2,1,9,9}
3 | {3,9,9,9,8,6,3,9,5,6}
</pre>
- Sparse: The matrix is represented using the row and column indices for each
non-zero entry of the matrix. Example:
<pre>
row_id | col_id | value
--------+--------+-------
1 | 1 | 9
1 | 5 | 6
1 | 6 | 6
2 | 1 | 8
3 | 1 | 3
3 | 2 | 9
4 | 7 | 0
(6 rows)
</pre>
All matrix operations work with either form of representation.
In many cases, a matrix function can be <b>decomposed to vector operations
applied independently on each row of a matrix (or corresponding rows of two
matrices)</b>. We have also provided access to these internal vector operations
(\ref grp_array) for greater flexibility. Matrix operations like
<em>matrix_add</em> use the corresponding vector operation (<em>array_add</em>)
and also include additional validation and formating. Other functions like
<em>matrix_mult</em> are complex and use a combination of such vector operations
and other SQL operations.
<b>It's important to note</b> that these array functions are only available for the
dense format representation of the matrix. In general, the scope of a single
array function invocation is limited to only an array (1-dimensional or
2-dimensional) that fits in memory. When such function is executed on a table of
arrays, the function is called multiple times - once for each array (or pair of
arrays). On contrary, scope of a single matrix function invocation is the
complete matrix stored as a distributed table.
@{
@defgroup grp_array Array Operations
@defgroup grp_matrix Matrix Operations
@defgroup grp_matrix_factorization Matrix Factorization
@brief Matrix Factorization methods including Singular Value Decomposition and Low-rank Matrix Factorization
@{
@defgroup grp_lmf Low-rank Matrix Factorization
@defgroup grp_svd Singular Value Decomposition
@}
@defgroup grp_linalg Norms and Distance functions
@defgroup grp_svec Sparse Vectors
@}
@defgroup grp_pca Dimensionality Reduction
@ingroup grp_datatrans
@brief A collection of methods for dimensionality reduction.
@details A collection of methods for dimensionality reduction.
@{
@defgroup grp_pca_train Principal Component Analysis
@defgroup grp_pca_project Principal Component Projection
@}
@defgroup grp_data_prep Encoding Categorical Variables
@ingroup grp_datatrans
@defgroup grp_pivot Pivoting
@ingroup grp_datatrans
@defgroup grp_stemmer Stemming
@ingroup grp_datatrans
@defgroup grp_mdl Model Evaluation
@{Contains functions for evaluating accuracy and validation of predictive methods. @}
@defgroup grp_validation Cross Validation
@ingroup grp_mdl
@defgroup grp_pred Prediction Metrics
@ingroup grp_mdl
@defgroup grp_stats Statistics
@{Contains statistics modules @}
@defgroup grp_desc_stats Descriptive Statistics
@ingroup grp_stats
@{A collection of methods to compute descriptive statistics of the dataset @}
@defgroup grp_correlation Pearson's Correlation
@ingroup grp_desc_stats
@defgroup grp_summary Summary
@ingroup grp_desc_stats
@defgroup grp_inf_stats Inferential Statistics
@ingroup grp_stats
@{A collection of methods to compute inferential statistics on a dataset.@}
@defgroup grp_stats_tests Hypothesis Tests
@ingroup grp_inf_stats
@defgroup grp_prob Probability Functions
@ingroup grp_stats
@defgroup grp_super Supervised Learning
@{Contains methods which perform supervised learning tasks @}
@defgroup grp_crf Conditional Random Field
@ingroup grp_super
@defgroup grp_regml Regression Models
@ingroup grp_super
A collection of methods for modeling conditional expectation of a response variable.
@{
@defgroup grp_clustered_errors Clustered Variance
@defgroup grp_cox_prop_hazards Cox-Proportional Hazards Regression
@defgroup grp_elasticnet Elastic Net Regularization
@defgroup grp_glm Generalized Linear Models
@defgroup grp_linreg Linear Regression
@defgroup grp_logreg Logistic Regression
@defgroup grp_marginal Marginal Effects
@defgroup grp_multinom Multinomial Regression
@defgroup grp_ordinal Ordinal Regression
@defgroup grp_robust Robust Variance
@}
@defgroup grp_svm Support Vector Machines
@ingroup grp_super
@defgroup grp_tree Tree Methods
@ingroup grp_super
@{A collection of recursive partitioning (tree) methods. @}
@defgroup grp_decision_tree Decision Tree
@ingroup grp_tree
@defgroup grp_random_forest Random Forest
@ingroup grp_tree
@defgroup grp_tsa Time Series Analysis
@{A collection of methods to analyze time series data. @}
@defgroup grp_arima ARIMA
@ingroup grp_tsa
@defgroup grp_unsupervised Unsupervised Learning
@{A collection of methods for unsupervised learning tasks @}
@defgroup grp_association_rules Association Rules
@ingroup grp_unsupervised
@{A collection of methods used to uncover interesting patterns in transactional datasets. @}
@defgroup grp_assoc_rules Apriori Algorithm
@ingroup grp_association_rules
@defgroup grp_clustering Clustering
@ingroup grp_unsupervised
@{A collection of methods for clustering data@}
@defgroup grp_kmeans k-Means Clustering
@ingroup grp_clustering
@defgroup grp_topic_modelling Topic Modelling
@ingroup grp_unsupervised
@{A collection of methods to uncover abstract topics in a document corpus @}
@defgroup grp_lda Latent Dirichlet Allocation
@ingroup grp_topic_modelling
@defgroup grp_utility_functions Utility Functions
@defgroup @grp_utilities Developer Database Functions
@ingroup grp_utility_functions
@defgroup grp_linear_solver Linear Solvers
@ingroup grp_utility_functions
@{A collection of methods that implement solutions for systems of consistent linear equations. @}
@defgroup grp_dense_linear_solver Dense Linear Systems
@ingroup grp_linear_solver
@defgroup grp_sparse_linear_solver Sparse Linear Systems
@ingroup grp_linear_solver
@defgroup grp_path Path Functions
@ingroup grp_utility_functions
@defgroup grp_sessionize Sessionize Functions
@ingroup grp_utility_functions
@defgroup grp_pmml PMML Export
@ingroup grp_utility_functions
@defgroup grp_text_analysis Text Analysis
@ingroup grp_utility_functions
@{A collection of methods to find patterns in textual data. @}
@defgroup grp_text_utilities Term Frequency
@ingroup grp_text_analysis
@defgroup grp_early_stage Early Stage Development
@brief A collection of implementations which are in early stage of development.
There may be some issues that will be addressed in a future version.
Interface and implementation are subject to change.
@{
@defgroup grp_sketches Cardinality Estimators
@{
@defgroup grp_countmin CountMin (Cormode-Muthukrishnan)
@defgroup grp_fmsketch FM (Flajolet-Martin)
@defgroup grp_mfvsketch MFV (Most Frequent Values)
@}
@defgroup grp_cg Conjugate Gradient
@defgroup grp_bayes Naive Bayes Classification
@defgroup grp_sample Random Sampling
@defgroup grp_kernmach Support Vector Machines
@}
@defgroup grp_deprecated Deprecated Modules
@{A collection of deprecated modules. @}
@defgroup grp_dectree Decision Tree (old C4.5 implementation)
@ingroup grp_deprecated
@defgroup grp_svdmf Matrix Factorization
@ingroup grp_deprecated
@defgroup grp_mlogreg Multinomial Logistic Regression
@ingroup grp_deprecated
@defgroup grp_profile Profile
@ingroup grp_deprecated
@defgroup grp_quantile Quantile
@ingroup grp_deprecated
@defgroup grp_rf Random Forest (old implementation)
@ingroup grp_deprecated
*/