In [2]:
:dep dami = {version="0.1.0",path="/home/caleb/dami"}

In [3]:
:dep ndarray-rand = "0.11.0"
:dep ndarray = "0.13.1"
:dep rand_isaac = "0.2.0"

In [4]:
extern crate dami;
use ndarray::{Array1,Array2};
use ndarray_rand::RandomExt;
use ndarray_rand::rand_distr::Uniform;
use ndarray_rand::rand::SeedableRng;
use dami::prelude::*;
use rand_isaac::isaac64::Isaac64Rng;

# Object creation
Methods for creating a Series include:
* From a `Vec<T>`
* From an ndarray `Array1<T>`
* From a generic slice (`&[T]`)
* From a `HashMap<&str, T>`
* From an array of up to 32 elements, e.g. `Series::from([1,2,3,5,7])`

Methods for creating DataFrames include:
* From an `Array2<T>`
* From a `Vec<Vec<T>>` (though not recommended)
* From a `HashMap<&str, Array1<T>>` using `try_from`; with this method the keys are sorted first
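The note about sorted keys can be illustrated with plain `std` types. This is not dami's code: `sorted_columns` is a hypothetical helper, and the reason it gives for sorting (a `HashMap` iterates in an unspecified order, so sorting makes the resulting column order deterministic) is an assumption about why dami sorts the keys.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not dami's API): split a map of named columns into
/// (names, columns), sorting the keys first so the column order is deterministic.
fn sorted_columns<T: Clone>(map: &HashMap<&str, Vec<T>>) -> (Vec<String>, Vec<Vec<T>>) {
    let mut keys: Vec<&str> = map.keys().copied().collect();
    // HashMap iteration order is unspecified, so sort the names explicitly.
    keys.sort_unstable();
    let columns = keys.iter().map(|k| map[k].clone()).collect();
    let names = keys.iter().map(|k| k.to_string()).collect();
    (names, columns)
}

fn main() {
    let mut map = HashMap::new();
    map.insert("b", vec![3, 4]);
    map.insert("a", vec![1, 2]);
    let (names, cols) = sorted_columns(&map);
    println!("{:?} {:?}", names, cols); // ["a", "b"] [[1, 2], [3, 4]]
}
```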

In [5]:
let seed = 42;
let mut rng = Isaac64Rng::seed_from_u64(seed);
let series = Series::from(Array1::random_using(100, Uniform::new(0., 10.), &mut rng)); // Create a Series of 100 random numbers in [0, 10)

let df = DataFrame::from(Array2::random_using((5000, 4), Uniform::new(0., 10.), &mut rng)); // Create a 5000 x 4 DataFrame from a 2-dimensional array

# Basic functionality

## Viewing data

Use `Series::head(n)` or `DataFrame::head(n)` to view the first `n` rows, or `Series::tail(n)` or `DataFrame::tail(n)` to view the last `n` rows.
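The selection rule behind `head(n)` and `tail(n)` can be sketched on a plain slice. This is an illustration, not dami's implementation (dami's methods print the selected rows, as the cells below show); the free functions `head` and `tail` here are hypothetical, and clamping `n` to the length is an assumption.

```rust
/// Sketch of head/tail semantics on a slice (not dami's code):
/// `head` returns at most the first `n` items, `tail` at most the last `n`.
fn head<T>(data: &[T], n: usize) -> &[T] {
    // Clamp `n` so asking for more items than exist does not panic.
    &data[..n.min(data.len())]
}

fn tail<T>(data: &[T], n: usize) -> &[T] {
    // `saturating_sub` keeps the start index at 0 when `n` exceeds the length.
    &data[data.len().saturating_sub(n)..]
}

fn main() {
    let v: Vec<u32> = (0..10).collect();
    println!("{:?}", head(&v, 3)); // [0, 1, 2]
    println!("{:?}", tail(&v, 3)); // [7, 8, 9]
}
```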

In [14]:
series.head(5); // Print first 5 elements

 0  9.5799465 
 1  4.709879 
 2  3.991177 
 3  7.7817917 
 4  7.337223 



In [15]:
df.tail(4);

       0      1      2      3 
 4996  5.577  8.571  4.985  0.517 
 4997  2.130  0.635  8.496  9.514 
 4998  4.572  6.353  9.921  8.895 
 4999  6.388  2.541  9.106  0.412 



The `to_ndarray()` method converts the DataFrame into a 2-dimensional ndarray (`Array2<T>`) of type `T`.

In [8]:
let arr2: Array2<f64> = df.to_ndarray().unwrap(); // Convert the DataFrame to an ndarray
println!("{}", arr2[[0, 0]]); // Print the element at row 0, column 0

6.238008172545168
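The resulting array is indexed as `[[row, column]]`. Assuming a DataFrame that keeps each column in its own vector (an assumption about dami's internals that this notebook does not show), the conversion amounts to packing the columns into one row-major buffer, which is the default memory layout of an ndarray `Array2`. `to_row_major` and `at` are hypothetical names used only for this sketch.

```rust
/// Hypothetical sketch: pack equal-length columns into one row-major buffer.
/// Returns (rows, columns, buffer).
fn to_row_major(columns: &[Vec<f64>]) -> (usize, usize, Vec<f64>) {
    let ncols = columns.len();
    let nrows = columns.first().map_or(0, |c| c.len());
    let mut buf = Vec::with_capacity(nrows * ncols);
    // Row-major: all values of row 0, then all values of row 1, and so on.
    for r in 0..nrows {
        for c in columns {
            buf.push(c[r]);
        }
    }
    (nrows, ncols, buf)
}

/// Element at [[row, col]] in a row-major buffer with `ncols` columns.
fn at(buf: &[f64], ncols: usize, row: usize, col: usize) -> f64 {
    buf[row * ncols + col]
}

fn main() {
    let columns = vec![vec![1.0, 2.0, 3.0], vec![10.0, 20.0, 30.0]];
    let (nrows, ncols, buf) = to_row_major(&columns);
    println!("{} x {}: {:?}", nrows, ncols, buf); // 3 x 2: [1.0, 10.0, 2.0, 20.0, 3.0, 30.0]
    println!("{}", at(&buf, ncols, 2, 1)); // 30
}
```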


Printing a large DataFrame or Series shows only the first and last `5` rows.

**Note**: Printing is an expensive operation; avoid it when it is not needed.

In [28]:
println!("{:?}",df);

        0      1      2      3 
 0      6.238  3.541  2.578  2.977 
 1      1.670  8.913  4.563  6.787 
 2      3.508  0.842  6.366  3.504 
 3      6.711  5.527  5.262  6.728 
 4      8.475  7.133  9.196  8.163 
 ....   ....   ....   ....   .... 
 4995   9.361  1.599  4.003  1.189 
 4996   5.577  8.571  4.985  0.517 
 4997   2.130  0.635  8.496  9.514 
 4998   4.572  6.353  9.921  8.895 
 4999   6.388  2.541  9.106  0.412 
                              
 types  f64    f64    f64    f64 

[5000 rows x 4 columns]
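The truncation visible in this output can be sketched as a rule for choosing which row indices to display. This is not dami's code: `rows_to_print` is hypothetical, and the threshold for printing a frame in full (at most `2 * edge` rows) is an assumption.

```rust
/// Hypothetical sketch of the truncated-print rule (not dami's code).
/// Returns the row indices to display; `None` stands for the "...." separator line.
fn rows_to_print(n_rows: usize, edge: usize) -> Vec<Option<usize>> {
    // Small frames are printed in full (the exact threshold is an assumption).
    if n_rows <= 2 * edge {
        return (0..n_rows).map(Some).collect();
    }
    let mut rows: Vec<Option<usize>> = (0..edge).map(Some).collect();
    rows.push(None);
    rows.extend((n_rows - edge..n_rows).map(Some));
    rows
}

fn main() {
    // Mirrors the 5000-row frame above: rows 0-4, a separator, rows 4995-4999.
    for row in rows_to_print(5000, 5) {
        match row {
            Some(i) => println!("{}", i),
            None => println!("...."),
        }
    }
}
```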


# Applying operations

Dami supports several methods for applying functions to data.

The methods include:
 * `apply()`, which takes a function that accepts an `Array1<T>` and returns a scalar
 
 * `apply_map()`, which takes a function that maps a scalar `T` to another scalar of the same type
 
 * `par_apply_map()`, like `apply_map()` but uses rayon for parallelism
 
 * `transform()`, which takes a function that accepts an `Array1<T>` and returns a new `Array<P>`; the results are used to build a new DataFrame, and the work is done on parallel threads
 
Functions that take an array also take a boolean `axis` argument: if `axis` is `false`, the function receives columns; if `true`, it receives rows.
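The `axis` convention can be sketched with plain nested vectors, one inner `Vec` per row. This mimics the behaviour described above but is not dami's API; the free function `apply` and the row-of-vectors layout are assumptions made for the illustration.

```rust
/// Sketch of the `axis` convention (not dami's API): `rows` holds equal-length rows.
/// axis = false feeds each column to `f`; axis = true feeds each row.
fn apply<F>(rows: &[Vec<f64>], axis: bool, f: F) -> Vec<f64>
where
    F: Fn(&[f64]) -> f64,
{
    if axis {
        // Row-wise: one result per row.
        rows.iter().map(|r| f(r.as_slice())).collect()
    } else {
        // Column-wise: gather column `c` from every row, one result per column.
        let ncols = rows.first().map_or(0, |r| r.len());
        (0..ncols)
            .map(|c| {
                let column: Vec<f64> = rows.iter().map(|r| r[c]).collect();
                f(&column[..])
            })
            .collect()
    }
}

fn main() {
    let rows = vec![vec![1.0, 2.0], vec![3.0, 4.0], vec![5.0, 6.0]];
    let col_sums = apply(&rows, false, |x| x.iter().sum());
    let row_sums = apply(&rows, true, |x| x.iter().sum());
    println!("{:?}", col_sums); // [9.0, 12.0]
    println!("{:?}", row_sums); // [3.0, 7.0, 11.0]
}
```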

In [29]:
let sum_cols= df.apply::<f64,_>(false,|f| f.sum()); // Sum all values column-wise
let sum_rows = df.apply::<f64,_>(true,|f| f.sum()); // Sum all values row-wise
println!("{:?}",sum_cols.unwrap());
println!("{:?}",sum_rows.unwrap());

 index        values 
 0            24918.2925 
 1            24994.5661 
 2            25075.6544 
 3            24931.8165 
               
 name:series  dtype:f64 

 index        values      
 0            15.3354     
 1            21.9323     
 2            14.2213     
 3            24.2274     
 4            32.9673     
                          
 4995         16.1531     
 4996         19.6507     
 4997         20.7750     
 4998         29.7420     
 4999         18.4479     
 name:series  dtype:f64  length:5000 
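`apply_map()` and `par_apply_map()` from the list above are not run in a cell here. The difference between a sequential and a parallel element-wise map can be sketched with the standard library. Note the swapped-in technique: dami's `par_apply_map()` uses rayon, while this sketch splits the data into chunks and maps each chunk on a scoped `std::thread`. `map_seq` and `map_par` are hypothetical names.

```rust
use std::thread;

/// Sequential element-wise map, similar in spirit to `apply_map()`.
fn map_seq(data: &[f64], f: impl Fn(f64) -> f64) -> Vec<f64> {
    data.iter().map(|&x| f(x)).collect()
}

/// Parallel element-wise map over chunks using scoped std threads.
/// (dami's `par_apply_map()` uses rayon; this only illustrates the idea.)
fn map_par<F>(data: &[f64], n_threads: usize, f: F) -> Vec<f64>
where
    F: Fn(f64) -> f64 + Sync,
{
    let n = n_threads.max(1);
    // Ceiling division so every element lands in some chunk; never 0.
    let chunk = ((data.len() + n - 1) / n).max(1);
    let f = &f;
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().map(|&x| f(x)).collect::<Vec<f64>>()))
            .collect();
        // Joining in spawn order keeps the output in the input order.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let data: Vec<f64> = (1..=8).map(|i| i as f64).collect();
    let seq = map_seq(&data, |x| x * x);
    let par = map_par(&data, 4, |x| x * x);
    assert_eq!(seq, par); // same values, same order
    println!("{:?}", par);
}
```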



# Getting Data in

## Reading CSV files

The parser uses the first 10 values of each column to infer that column's type.
Integers become `i32`, floats become `f64`, and strings remain `String`.
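The `i32` / `f64` / `String` rule can be sketched as a check over a sample of each column's raw text values. This is a hypothetical sketch, not dami's parser: `infer_type` and `ColType` are illustrative names, and trying `i32` before `f64` is an assumption (every integer also parses as a float, so the narrower type is checked first).

```rust
/// Hypothetical column types produced by the sketch below.
#[derive(Debug, PartialEq)]
enum ColType {
    I32,
    F64,
    Str,
}

/// Infer a column's type from its first `sample` raw values:
/// i32 if all parse as i32, else f64 if all parse as f64, else String.
fn infer_type(column: &[&str], sample: usize) -> ColType {
    let head = &column[..sample.min(column.len())];
    if head.iter().all(|v| v.trim().parse::<i32>().is_ok()) {
        ColType::I32
    } else if head.iter().all(|v| v.trim().parse::<f64>().is_ok()) {
        ColType::F64
    } else {
        ColType::Str
    }
}

fn main() {
    // Values like those in the `id` and `TV` columns of the CSV below.
    let id = ["1", "2", "3", "4", "5"];
    let tv = ["230.1", "44.5", "17.2", "151.5", "180.8"];
    println!("{:?} {:?}", infer_type(&id, 10), infer_type(&tv, 10)); // I32 F64
}
```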

In [11]:
let df_csv=read_csv("./a.csv",None);
println!("{:?}", df_csv); // By default, the first row is used as column names

        id    TV       radio   newspaper  sales 
 0      1     230.100  37.800  69.200     22.100 
 1      2     44.500   39.300  45.100     10.400 
 2      3     17.200   45.900  69.300     9.300 
 3      4     151.500  41.300  58.500     18.500 
 4      5     180.800  10.800  58.400     12.900 
 ....   ....  ....     ....    ....       .... 
 195    196   38.200   3.700   13.800     7.600 
 196    197   94.200   4.900   8.100      9.700 
 197    198   177.000  9.300   6.400      12.800 
 198    199   283.600  42.000  66.200     25.500 
 199    200   232.100  8.600   8.700      13.400 
                                           
 types  i32   f64      f64     f64        f64 

[200 rows x 5 columns]


Reading JSON files is also supported out of the box, with some current limitations:
* Array-like JSON files cannot be read yet
* Date-time values are not parsed
* Oriented files cannot be read yet

In [17]:
let df_json=read_json("./test.json",true);
println!("{:?}",df_json);

        authors                            category       date        headline                           link                               short_description 
 0      Melissa Jeltsen                    CRIME          2018-05-26  There Were 2 Mass Shootings In...  https://www.huffingtonpost.com...  She left her husband. He kille... 
 1      Andy McDonald                      ENTERTAINMENT  2018-05-26  Will Smith Joins Diplo And Nic...  https://www.huffingtonpost.com...  Of course it has a song. 
 2      Ron Dicker                         ENTERTAINMENT  2018-05-26  Hugh Grant Marries For The Fir...  https://www.huffingtonpost.com...  The actor and his longtime gir... 
 3      Ron Dicker                         ENTERTAINMENT  2018-05-26  Jim Carrey Blasts 'Castrato' A...  https://www.huffingtonpost.com...  The actor gives Dems an ass-ki... 
 4      Ron Dicker                         ENTERTAINMENT  2018-05-26  Julianna Margulies Uses Donald...  https://www.huffingtonpost.com...  The "Dietl

# Plotting

Plotting is supported for both Series and DataFrames.

Only numeric data is plotted: `f32`, `f64`, `i32`, and `i64` columns in DataFrames, and any type implementing the `Num` trait in a Series.

Use `plot()` in a normal environment and `plot_evcxr()` in Jupyter. The latter requires the `plotly` extension to work; see [here](https://github.com/plotly/plotly.py).
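The DataFrame rule amounts to filtering columns by type before plotting. The sketch below is hypothetical: `DType`, `is_plottable`, and `plottable_columns` are illustrative names, not dami's types, and they encode only the four numeric column types listed above.

```rust
/// Hypothetical column dtypes (not dami's own type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DType {
    I32,
    I64,
    F32,
    F64,
    Str,
}

/// Only the numeric column types listed in the text are plotted.
fn is_plottable(t: DType) -> bool {
    matches!(t, DType::I32 | DType::I64 | DType::F32 | DType::F64)
}

/// Names of the columns that would be included in a DataFrame plot.
fn plottable_columns<'a>(cols: &[(&'a str, DType)]) -> Vec<&'a str> {
    cols.iter()
        .filter(|(_, t)| is_plottable(*t))
        .map(|(name, _)| *name)
        .collect()
}

fn main() {
    let cols = [("id", DType::I32), ("TV", DType::F64), ("authors", DType::Str)];
    println!("{:?}", plottable_columns(&cols)); // ["id", "TV"]
}
```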


In [16]:
series.plot_evcxr("bar"); // Bar plot

In [19]:
series.plot_evcxr("line");

In [21]:
series.plot_evcxr("hist");

In [23]:
df_csv.plot_evcxr("bar");

In [30]:
df.plot_evcxr("box"); // Box plots are only available for DataFrames