Update readme, paper and add performance analysis
gmrandazzo committed Aug 18, 2023
1 parent f033f8d commit 78af325
Showing 8,864 changed files with 1,789,612 additions and 14 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
15 changes: 11 additions & 4 deletions README.md
@@ -152,14 +152,19 @@ The required dependencies to use libscientific are:
Install
=======

Compile from source
Manual Installation
-------------------

```
cmake -DCMAKE_INSTALL_PREFIX=/usr/local/ ..
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr/ ..
make -j5
mate test
make test # optional
sudo make install
cd ../src/python_bindings/
sudo pip install -e .
pytest # optional
```


@@ -212,7 +217,9 @@ How to write a unit test?

Please first read the cmake documentation about [testing with cmake and ctest](https://cmake.org/cmake/help/book/mastering-cmake/chapter/Testing%20With%20CMake%20and%20CTest.html)

Then write a test for the algorithm you propose and save it in "src/tests" directory.
Then write a test for the proposed algorithm and save it in "src/tests" directory.
Please write a test in src/python_bindings/tests for the python binding. The test should work using pytest.

Run and submit the resulting output in the pull request specifying:
- What the algorithm does
- What the unit tests represent and what they prove.
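
As a rough illustration of a test that works with pytest, the sketch below checks a small routine against a hand-computed result. The function `column_means` is a hypothetical stand-in defined inline; it is not part of the libscientific Python binding, whose API is not shown here. A real test would import the binding and call the algorithm under test instead.

```python
import math


def column_means(matrix):
    """Hypothetical stand-in for the routine under test (not a libscientific call)."""
    nrows = len(matrix)
    return [sum(column) / nrows for column in zip(*matrix)]


def test_column_means_small_matrix():
    # Small input whose expected result can be verified by hand.
    data = [[1.0, 2.0], [3.0, 6.0]]
    means = column_means(data)
    assert math.isclose(means[0], 2.0)
    assert math.isclose(means[1], 4.0)
```

Saved as a file whose name starts with `test_`, a function like this is collected automatically when `pytest` is run in that directory.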
28 changes: 18 additions & 10 deletions paper.md
@@ -37,7 +37,7 @@ Additionally, libscientific comes with a foreign function Python bindings, makin
One of the main advantages of libscientific is its performance. Because the library is written in C, it is highly optimized for performance.
This means that large data sets can be analyzed quickly and efficiently, making it an ideal choice for applications where speed is critical.
The library depends only on lapack for SVD and eigenvalues decomposition and can be easily integrated into embedded systems.
The current library version is 1.4.1, and here is a list of the current library features:
The current library version is 1.5.2, and here is a list of the current library features:

* Principal Component Analysis (PCA)
* Consensus Principal Component Analysis (CPCA)
@@ -86,24 +86,32 @@ Moreover, multi-thread cross-validation methodologies such as "Bootstrap k-fold"

Since we are dealing with numerical analysis, unit tests are crucial to ensure correctness, stability, and reproducibility.
Libscientific tests range from simple matrix-vector multiplication to the correctness of complex algorithms using ad hoc torture toy examples.
Every algorithm is then tested to answer the following questions:

1. Does the algorithm fit and predict correctly?
2. Can the algorithm handle large data?
3. Does the algorithm leak memory?

# Speed and Memory Comparison
<WORK HERE>
for PCA, PLS, CPCA, UPCA, UPLS, MaxDis, MostDesc, LDA
Calculation time v.s n istances v.s. n column v.s. memory in use

To assess performance, every algorithm in libscientific was run in several simulations on data of different sizes (input size), and the CPU time of each run was recorded.
The resulting plots show an approximately linear trend, which suggests that the algorithm's time complexity is linear, denoted as O(n) in computational complexity analysis.



This means that as the input size (often termed "problem size") increases by a constant factor, the execution time increases by roughly the same factor.
Linear algorithms have notable characteristics:

* Linear Time Complexity (O(n)): Execution time grows linearly with input size.
* Constant Work per Input Element: Each input element requires a roughly constant amount of processing.
* Stable Performance Impact: Doubling input size roughly doubles execution time, facilitating performance estimation.
* Optimal Scaling: Linear-time solutions efficiently handle larger inputs.

These results suggest that libscientific scales approximately linearly with the size of the data.
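
One way to check a linear-scaling claim numerically is to fit a straight line to log(time) against log(input size): a slope near 1 is consistent with O(n), while a clearly different slope would suggest another growth rate. The sketch below is a minimal illustration under two assumptions: the input size is taken to be rows × cols, which the text does not define explicitly, and only four rows copied from performance/cpca_results.csv are used. The helper name `loglog_slope` is made up for this example.

```python
import math

# (rows, cols, seconds) copied from performance/cpca_results.csv
samples = [
    (100, 200, 8.675820),
    (400, 400, 26.831416),
    (800, 800, 93.865083),
    (1200, 1000, 182.789533),
]


def loglog_slope(points):
    """Least-squares slope of log(seconds) against log(rows * cols)."""
    xs = [math.log(rows * cols) for rows, cols, _ in points]
    ys = [math.log(seconds) for _, _, seconds in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


slope = loglog_slope(samples)
print(f"log-log slope: {slope:.2f}")
```

Running the same fit over the full result files, rather than a handful of rows, would give a more reliable estimate, since individual timings in the table are noisy.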

# Usage

For usage in either C or Python, please refer to the official documentation: [https://libscientific.readthedocs.io/en/latest/](https://libscientific.readthedocs.io/en/latest)

# Conclusions

Libscientific is a powerful library that provides a comprehensive set of multivariate analysis tools for researchers and analysts. Whether a scientists work on research or data analytics, libscientific can help gain deeper insights into the data. Its C-based implementation and Python bindings offer high performance and ease of use, making it an ideal choice for data-driven applications.
Libscientific is a powerful library that provides a comprehensive set of multivariate analysis tools for researchers and analysts to gain insights from any tabular data.
Its C-based implementation and Python bindings offer high performance and ease of use, making it an ideal choice for data-driven applications.


# Acknowledgements
4 changes: 4 additions & 0 deletions performance/compile.sh
@@ -0,0 +1,4 @@
gcc pca_tests.c -o pca_tests -lscientific -L/usr/local/lib/ -I/usr/local/include/
gcc pls_tests.c -o pls_tests -lscientific -L/usr/local/lib/ -I/usr/local/include/
gcc cpca_tests.c -o cpca_tests -lscientific -L/usr/local/lib/ -I/usr/local/include/
gcc mlr_tests.c -o mlr_tests -lscientific -L/usr/local/lib -I/usr/local/include
Binary file added performance/cpca_input_vs_cputime.png
80 changes: 80 additions & 0 deletions performance/cpca_results.csv
@@ -0,0 +1,80 @@
row,cols,memory(bytes),time(sec)
100,200, 1327604328,8.675820
100,400, 4343863800,15.620278
100,600, 9469970216,24.797488
100,800, 9444818704,21.428896
100,1000, 14679033600,29.023142
200,200, 2965168784,10.391137
200,400, 9306730392,18.765008
200,600, 23925804480,36.730166
200,800, 17041610520,26.095095
200,1000, 29505105984,39.799624
300,200, 6444664312,16.355999
300,400, 12457558600,19.630145
300,600, 22591082208,29.081535
300,800, 46314158880,49.794156
300,1000, 38102937936,44.495233
400,200, 5377502568,10.464380
400,400, 21954380952,26.831416
400,600, 43482496344,44.829673
400,800, 60654977952,57.379893
400,1000, 100764812600,87.782441
500,200, 11454195000,18.254532
500,400, 18024871112,20.246205
500,600, 39702301912,38.584180
500,800, 70849195160,63.459719
500,1000, 95715655112,82.131486
600,200, 12851324240,18.171843
600,400, 24825323968,25.162617
600,600, 64717963288,56.559039
600,800, 88243551864,73.796681
600,1000, 104756525608,92.067282
700,200, 29737531448,37.433778
700,400, 50093274200,44.573169
700,600, 57177372480,50.111591
700,800, 85246084632,73.842536
700,1000, 125326545936,106.062573
800,200, 29113398280,32.523147
800,400, 49861044200,41.241217
800,600, 111347667912,82.745866
800,800, 123620046656,93.865083
800,1000, 200797621872,145.274660
900,200, 33546609400,34.839483
900,400, 311732657728,223.466877
900,600, 143180104568,102.193757
900,800, 145327781176,107.601150
900,1000, 206451557936,154.146450
1000,200, 23914041344,23.505789
1000,400, 71758345688,52.504927
1000,600, 275805646808,174.307242
1000,800, 218349124024,146.587305
1000,1000, 212373730312,157.059295
1100,200, 23959793040,22.733436
1100,400, 104773618808,75.058265
1100,600, 97053250608,72.575148
1100,800, 396649624744,239.698296
1100,1000, 977641098376,549.429024
1200,200, 22959225472,20.809504
1200,400, 100922773984,67.903802
1200,600, 122430253864,85.987361
1200,800, 245683873624,163.482132
1200,1000, 245595544992,182.789533
1300,200, 51509652648,42.084883
1300,400, 109712117048,72.398606
1300,600, 184093795408,119.438574
1300,800, 201707767840,148.449587
1300,1000, 255990556032,207.471233
1400,200, 27295490504,22.938381
1400,400, 104797239544,68.939476
1400,600, 181877234544,127.348124
1400,800, 452044676376,293.104443
1400,1000, 300913572840,242.864053
1500,200, 53044481232,44.361880
1500,400, 167188560768,108.709916
1500,600, 151846866264,117.049791
1500,800, 277789405952,200.623615
1500,1000, 621108719096,432.582696
1600,200, 51911369608,42.139743
1600,400, 89615501512,60.704128
1600,600, 175751520648,122.046247
1600,800, 743978558304,432.372454
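
The result files have a header row and a space after the second comma in each data row. As a minimal sketch of how such a file might be read, the snippet below parses the first three rows of cpca_results.csv (copied verbatim) with Python's standard `csv` module and prints the time per matrix element. The function name `load_results` and the dictionary keys are choices made for this example, and it is not the plot.py script referenced by make_plots.sh, which is not shown in this diff.

```python
import csv
import io

# First rows of performance/cpca_results.csv, copied verbatim.
RAW = """row,cols,memory(bytes),time(sec)
100,200, 1327604328,8.675820
100,400, 4343863800,15.620278
100,600, 9469970216,24.797488
"""


def load_results(text):
    """Parse benchmark rows into dictionaries with numeric fields."""
    # skipinitialspace drops the blank that follows the delimiter.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    results = []
    for record in reader:
        results.append({
            "rows": int(record["row"]),
            "cols": int(record["cols"]),
            "memory_bytes": int(record["memory(bytes)"]),
            "seconds": float(record["time(sec)"]),
        })
    return results


results = load_results(RAW)
for entry in results:
    elements = entry["rows"] * entry["cols"]
    print(elements, entry["seconds"] / elements)
```

To read a file on disk instead, the same reader can be built from `open("cpca_results.csv", newline="")` in place of the `io.StringIO` wrapper.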
Binary file added performance/cpca_tests
Binary file not shown.
38 changes: 38 additions & 0 deletions performance/cpca_tests.c
@@ -0,0 +1,38 @@
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <time.h>     /* time */
#include <sys/time.h> /* gettimeofday */
#include <scientific.h>

int main(int argc, char **argv)
{
  if(argc != 4){
    fprintf(stderr, "Usage: %s <nobj> <nvars> <norder>\n", argv[0]);
    return 1;
  }

  tensor *t; // Input tensor holding norder matrices of nobj x nvars
  CPCAMODEL *model; // CPCA model
  int i, j, k;
  int nobj = atoi(argv[1]);
  int nvars = atoi(argv[2]);
  int norder = atoi(argv[3]);
  struct timeval tv1, tv2;

  NewTensor(&t, norder);
  srand_(time(0)); // Seed once, before filling the blocks

  // Fill every block with values drawn from randDouble(0, 100)
  for(k = 0; k < norder; k++){
    NewTensorMatrix(t, k, nobj, nvars);
    for(i = 0; i < nobj; i++){
      for(j = 0; j < nvars; j++){
        t->m[k]->data[i][j] = randDouble(0,100);
      }
    }
  }

  // Time the model allocation plus the CPCA calculation
  gettimeofday(&tv1, NULL);
  NewCPCAModel(&model);
  CPCA(t, 1, 5, model);
  gettimeofday(&tv2, NULL);
  fprintf( stdout, "Total time (sec): %f\n",
          (double) (tv2.tv_usec - tv1.tv_usec) / 1000000 +
          (double) (tv2.tv_sec - tv1.tv_sec));

  DelCPCAModel(&model);
  DelTensor(&t);
  return 0;
}


4 changes: 4 additions & 0 deletions performance/make_plots.sh
@@ -0,0 +1,4 @@
python3 plot.py pca_results.csv; mv input_vs_cputime.png pca_input_vs_cputime.png
python3 plot.py cpca_results.csv; mv input_vs_cputime.png cpca_input_vs_cputime.png
python3 plot.py pls_results.csv; mv input_vs_cputime.png pls_input_vs_cputime.png
python3 plot.py mlr_results.csv; mv input_vs_cputime.png mlr_input_vs_cputime.png
Binary file added performance/mlr_input_vs_cputime.png