# Download Sample Project


In [None]:
%%shell
rm -f dsatutorialv3.zip
rm -rf ./dsatutorials/
mkdir ./dsatutorials/
#pip -q install --upgrade --no-cache-dir gdown
gdown -q 1hJYcNFfwPyk_u3PwtJ91Hs8Vq6CLcuaH
unzip -q dsatutorialv3.zip -d dsatutorials
echo '====================INFO===================='
echo 'Current Folder:' && pwd
echo 'CPP code: stored in ./dsatutorials'
echo '============================================'

Current Folder:
/content
CPP code: stored in ./dsatutorials




# Tensor When Not Initialized
* In the xtensor library (the examples here use ```xt::xarray```):
  * When not initialized, or when initialized with a scalar value (a single number), the tensor **is just a scalar**:
    * Its ```shape``` is an empty tuple.
    * Its ```dimension``` is ```0```.
    * Its ```size``` is ```1```, because it contains one number.
    * The scalar can be accessed via ```x``` or ```x[0]```, where ```x``` is the variable name.

  * When a tensor is created from or initialized with tensor data, its ```shape```, ```dimension``` (number of dimensions), and ```size``` (number of elements) follow from how it was initialized/created.

  * There are three common ways to create a tensor:
    1. Assign an existing tensor, [learn more](https://xtensor.readthedocs.io/en/latest/numpy.html#containers)
    2. Initialize using [number-generating functions](https://xtensor.readthedocs.io/en/latest/numpy.html#initializers) (generated based on rules)
    3. Initialize using [random number-generating functions](https://xtensor.readthedocs.io/en/latest/numpy.html#random).

In [None]:
%%writefile dsatutorials/src/main.cpp

#include <iostream>
#include <iomanip>
#include <sstream>
#include <string>
using namespace std;

#include "sformat/fmt_lib.h"
#include "tensor/xtensor_lib.h"


void tensor_status(){
  xt::xarray<double> data;
  double d = 123;

  cout << "NOT BEEN INITIALIZED/CREATED: Tensor is a scalar (dimension==0)" << endl;
  cout << "data: " << data << endl;
  cout << "data.shape(): " << shape2str(data.shape()) << endl;
  cout << "data.dimension(): " << data.dimension() << endl;
  cout << "data.size(): " << data.size() << endl;
  cout << "d + data: " << d + data << endl;
  cout << "d + data[0]: " << d + data[0] << endl << endl;

  data = 10;
  cout << "INITIALIZED/CREATED WITH SCALAR: Tensor is a scalar (dimension==0)" << endl;
  cout << "data: " << endl << data << endl;
  cout << "data.shape(): " << shape2str(data.shape()) << endl;
  cout << "data.dimension(): " << data.dimension() << endl;
  cout << "data.size(): " << data.size()  << endl;
  cout << "d + data: " << d + data << endl;
  cout << "d + data[0]: " << d + data[0] << endl << endl;

  cout << "INITIALIZED/CREATED WITH A TENSOR: dimension!=0, shape is not empty " << endl;
  data = xt::arange(10).reshape({2,5});
  cout << "A non-empty tensor: (dimension!=0)" << endl;
  cout << "data: " << endl << data << endl;
  cout << "data.shape(): " << shape2str(data.shape()) << endl;
  cout << "data.dimension(): " << data.dimension()<< endl;
  cout << "data.size(): " << data.size() << endl << endl;
}

int main(int argc, char** argv) {
    tensor_status();

    return 0;
}


Overwriting dsatutorials/src/main.cpp


In [None]:
%%shell
cd dsatutorials/
#make clean
make

mkdir -p obj/
g++ -std=c++17 -pthread  -Iinclude -Iinclude/tensor -Iinclude/sformat -Isrc -c   src/main.cpp -o obj/main.o
mkdir -p obj/tensor/
g++ -std=c++17 -pthread  -Iinclude -Iinclude/tensor -Iinclude/sformat -Isrc -c   src/tensor/xtensor_lib.cpp -o obj/tensor/xtensor_lib.o
mkdir -p obj/tensor/
g++ -std=c++17 -pthread  -Iinclude -Iinclude/tensor -Iinclude/sformat -Isrc -c   src/tensor/SampleT.cpp -o obj/tensor/SampleT.o
mkdir -p obj/ann/
g++ -std=c++17 -pthread  -Iinclude -Iinclude/tensor -Iinclude/sformat -Isrc -c   src/ann/SampleB.cpp -o obj/ann/SampleB.o
mkdir -p obj/ann/
g++ -std=c++17 -pthread  -Iinclude -Iinclude/tensor -Iinclude/sformat -Isrc -c   src/ann/SampleA.cpp -o obj/ann/SampleA.o
g++ -std=c++17 -pthread  -Iinclude -Iinclude/tensor -Iinclude/sformat -Isrc   obj/main.o  obj/tensor/xtensor_lib.o  obj/tensor/SampleT.o  obj/ann/SampleB.o  obj/ann/SampleA.o -o program -lm -lpthread 




In [None]:
!./dsatutorials/program

NOT BEEN INITIALIZED/CREATED: Tensor is a scalar (dimension==0)
data:  0.
data.shape(): ()
data.dimension(): 0
data.size(): 1
d + data:  123.
d + data[0]: 123

INITIALIZED/CREATED WITH SCALAR: Tensor is a scalar (dimension==0)
data: 
 10.
data.shape(): ()
data.dimension(): 0
data.size(): 1
d + data:  133.
d + data[0]: 133

INITIALIZED/CREATED WITH A TENSOR: dimension!=0, shape is not empty 
A non-empty tensor: (dimension!=0)
data: 
{{ 0.,  1.,  2.,  3.,  4.},
 { 5.,  6.,  7.,  8.,  9.}}
data.shape(): (2, 5)
data.dimension(): 2
data.size(): 10



# Dataset


## Instructions
* Provided:
  * Source code for ```DataLabel```
  * Source code for ```Batch```
  * Source code for ```Dataset```, which is the parent class of all future ```dataset``` classes.
* Requirement:
  * Complete the source code for the ```TensorDataset``` class, which is a subclass of ```Dataset```.

## Source code
```
template<typename DType, typename LType>
class DataLabel{
private:
    xt::xarray<DType> data;
    xt::xarray<LType> label;
public:
    DataLabel(xt::xarray<DType> data,  xt::xarray<LType> label):
    data(data), label(label){
    }
    xt::xarray<DType> getData() const{ return data; }
    xt::xarray<LType> getLabel() const{ return label; }
};

template<typename DType, typename LType>
class Batch{
private:
    xt::xarray<DType> data;
    xt::xarray<LType> label;
public:
    Batch(xt::xarray<DType> data,  xt::xarray<LType> label):
    data(data), label(label){
    }
    virtual ~Batch(){}
    xt::xarray<DType>& getData(){return data; }
    xt::xarray<LType>& getLabel(){return label; }
};


template<typename DType, typename LType>
class Dataset{
private:
public:
    Dataset(){};
    virtual ~Dataset(){};
    
    virtual int len()=0;
    virtual DataLabel<DType, LType> getitem(int index)=0;
    virtual xt::svector<unsigned long> get_data_shape()=0;
    virtual xt::svector<unsigned long> get_label_shape()=0;
    
};
```

## TensorDataset




### Instructions
* As a subclass of ```Dataset```
* ```TensorDataset``` receives (via constructor):
  * A tensor containing the sample data
  * A tensor containing the labels

* Implementation of ```len()```: **refer to the sample project**
* Implementation of ```getitem(int index)``` includes two cases:
  1. Data is available but labels are not. In this case, the dataset is used only for prediction and labels are not needed for evaluation.
    * To detect this case: the label tensor passed to the constructor is uninitialized, so it holds only a scalar (its ```dimension``` is ```0```).
    * Implementation:
      * In the ```DataLabel``` object, only the data is assigned, and the label is not.
  
  2. Data and labels are both available.
    * To check: the ```dimension``` of both the data tensor and the label tensor passed to the constructor is not ```0```.
      * The size of dimension ```0``` **must** be the same in the data and label tensors. The data sample at ```index=i``` along dimension 0 of the data tensor has its label at ```index=i``` along dimension 0 of the label tensor.
    * Implementation:
      * Select the data sample at the given ```index``` in the data tensor.
      * Select the corresponding label at the same ```index``` in the label tensor.
      * Return a ```DataLabel``` object containing the selected data and label.
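
The two cases above can be sketched without xtensor. The block below is only an illustration under stated assumptions: ```MiniTensor``` and ```get_item``` are hypothetical names, a flat ```std::vector``` plus a shape stands in for ```xt::xarray```, and a ```std::pair``` stands in for ```DataLabel```. It is not the sample project's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for xt::xarray<T>: a flat buffer plus a shape.
// An empty shape means "scalar" (dimension 0, size 1), mirroring the
// behaviour printed by tensor_status().
template <typename T>
struct MiniTensor {
    std::vector<std::size_t> shape;  // empty => scalar
    std::vector<T> values{T{}};      // a scalar holds exactly one value

    std::size_t dimension() const { return shape.size(); }

    // Number of elements in one slice along dimension 0.
    std::size_t row_size() const {
        std::size_t n = 1;
        for (std::size_t d = 1; d < shape.size(); ++d) n *= shape[d];
        return n;
    }

    // Slice `index` along dimension 0; the result drops that dimension.
    MiniTensor row(std::size_t index) const {
        assert(dimension() > 0 && index < shape[0]);
        MiniTensor out;
        out.shape.assign(shape.begin() + 1, shape.end());
        const std::size_t n = row_size();
        out.values.assign(values.begin() + index * n,
                          values.begin() + (index + 1) * n);
        return out;
    }
};

// Returns (data, label) for one sample.
template <typename D, typename L>
std::pair<MiniTensor<D>, MiniTensor<L>>
get_item(const MiniTensor<D>& data, const MiniTensor<L>& label, std::size_t index) {
    if (label.dimension() == 0) {
        // Case 1: no labels were supplied, so only the data is filled in
        // and the label stays a default scalar.
        return {data.row(index), MiniTensor<L>{}};
    }
    // Case 2: data and labels must agree on the size of dimension 0.
    assert(data.shape[0] == label.shape[0]);
    return {data.row(index), label.row(index)};
}
```

With a data tensor of shape ```(3, 2)``` and a label tensor of shape ```(3)```, ```get_item(data, label, 1)``` yields the second row of the data together with the second label; passing a default ```MiniTensor``` as the label takes the first branch.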

### Source code
```
//////////////////////////////////////////////////////////////////////
template<typename DType, typename LType>
class TensorDataset: public Dataset<DType, LType>{
private:
    xt::xarray<DType> data;
    xt::xarray<LType> label;
    xt::svector<unsigned long> data_shape, label_shape;
    
public:
    /* TensorDataset:
     * need to initialize:
     * 1. data, label;
     * 2. data_shape, label_shape
    */
    TensorDataset(xt::xarray<DType> data, xt::xarray<LType> label){
        /* TODO: your code is here for the initialization
         */
    }
    /* len():
     *  return the size of dimension 0
    */
    int len(){
        /* TODO: your code is here to return the dataset's length
         */
        return 0; //remove it when complete
    }
    
    /* getitem:
     * return the data item (of type: DataLabel) that is specified by index
     */
    DataLabel<DType, LType> getitem(int index){
        /* TODO: your code is here
         */
    }
    
    xt::svector<unsigned long> get_data_shape(){
        /* TODO: your code is here to return data_shape
         */
    }
    xt::svector<unsigned long> get_label_shape(){
        /* TODO: your code is here to return label_shape
         */
    }
};
```

# DataLoader



## Instructions

* Notes:
  1. The ```DataLoader``` is responsible for **loading** the ```dataset``` passed to its constructor, while ```Dataset``` only provides two important methods (besides ```get_data_shape()``` and ```get_label_shape()```):
    1. ```len()```: returns the total number of data samples
    2. ```getitem()```: returns a pair of data and label; the label may be missing (its ```dimension``` is ```0```)
  2. **MOST IMPORTANTLY**, your implementation must support the following syntax in ```DataLoader```:

  ```
  DataLoader<double, double>* pLoader; //initialized
  //Hidden code 1
  for(auto batch: *pLoader){
            xt::xarray<double> X = batch.getData();
            xt::xarray<double> t = batch.getLabel();
            //Hidden code 2
  }
  //Hidden code 3
  ```

* The above syntax has the following meaning:
    * **For each batch in the dataset (which contains both data and labels), we process the batch using the code marked ```Hidden code 2```.**
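
To see what the range-based ```for``` requires from a class, the sketch below writes the same loop twice: once with the range-based syntax and once expanded by hand into ```begin()```, ```end()```, ```!=```, ```*``` and ```++```. The class ```Countdown``` and both helper functions are hypothetical and unrelated to the sample project; they only illustrate the protocol.

```cpp
#include <vector>

// Hypothetical container: iterating Countdown(3) visits 3, 2, 1.
class Countdown {
    int start_;
public:
    // Negative starts are clamped to 0 so that iteration always terminates.
    explicit Countdown(int start) : start_(start < 0 ? 0 : start) {}

    class Iterator {
        int current_;
    public:
        explicit Iterator(int current) : current_(current) {}
        bool operator!=(const Iterator& other) const { return current_ != other.current_; }
        int operator*() const { return current_; }
        Iterator& operator++() { --current_; return *this; }
    };

    Iterator begin() const { return Iterator(start_); }
    Iterator end() const { return Iterator(0); }
};

// Range-based form, the same shape as `for (auto batch : *pLoader)`.
std::vector<int> collect_with_range_for(const Countdown& c) {
    std::vector<int> out;
    for (auto value : c) out.push_back(value);
    return out;
}

// Roughly what the compiler expands the loop above into.
std::vector<int> collect_by_hand(const Countdown& c) {
    std::vector<int> out;
    for (auto it = c.begin(), last = c.end(); it != last; ++it) {
        auto value = *it;
        out.push_back(value);
    }
    return out;
}
```

Both functions return the same sequence, which is why supplying these five pieces is enough for the loop syntax to compile.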

* Implementation:
  * **With Note 2 above in mind**, we deduce that we must implement an ```iterator``` mechanism for the DataLoader class.
    * Sample code for implementing this mechanism is already provided in the ```XArrayList``` class; you have also practiced developing an ```iterator``` for the ```DLinkedList``` class.
    * Basic idea:
      * Add two methods ```begin()``` and ```end()``` to ```DataLoader``` (refer to ```XArrayList``` for details).
      * Add the ```Iterator``` class: you must define at least 3 operators for the ```Iterator``` class:
        1. Inequality operator: ```!=```
        2. Dereference operator ```*```: Note, it must return ```Batch<DType, LType>```
        3. Increment operator: ```++```
  * **With Note 1 above in mind**, it is best to store the index list in the ```DataLoader```. The index list is a sequence of numbers: ```0, 1, 2, ..., (N-1)```, where ```N``` is the total number of samples (returned by the ```len()``` method of the dataset passed into the constructor). Some guidelines are:
    * The data type for this list should be ```xt::xarray<unsigned long>```, to be compatible with the ```xt::random::shuffle``` function used for shuffling data randomly.
    * Use the ```xt::arange``` function to generate the index sequence.
    * Since we are only managing the index list, the DataLoader shuffles the index list randomly using the ```xt::random::shuffle``` function, rather than shuffling the actual data and labels.
    * The ```Iterator``` class should contain the **index of the current batch**.
    * In the implementation of the dereference operator ```*```, use the stored batch index to determine which entries of the index list belong to the current batch. Then, for each of those sample indices, retrieve the ```DataLabel``` from the dataset and combine the results into a ```Batch``` to return.
  * Notes:
    1. **shuffle** should only be performed when the ```shuffle``` parameter passed to the ```DataLoader``` constructor is **true**. Think about where to call it in the ```DataLoader```!
    2. **IMPORTANT NOTE**: There are several ways to shuffle data. To ensure uniformity and compatibility with the grading system, **ALL STUDENTS** are REQUIRED to use the ```xt::random::shuffle``` function for shuffling data samples in the **DataLoader**.
      * An update to the sample project may require an additional **seed** variable to ensure that the shuffle order is the same as the grading key. Further announcements will follow regarding this.
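
The index-list idea can be sketched with the standard library alone. Everything in the block below is an assumption made for illustration: ```make_indices``` and ```IndexBatcher``` are hypothetical names, ```std::vector``` stands in for ```xt::xarray<unsigned long>```, and ```std::shuffle``` stands in for ```xt::random::shuffle``` (the assignment requires the xtensor function, and the two will not produce the same order). The notes above do not define ```drop_last```; the sketch assumes it discards a trailing partial batch, which the assignment may define differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: builds 0..n-1 and, if requested, shuffles the
// indices only (the samples themselves are never moved).
std::vector<std::size_t> make_indices(std::size_t n, bool shuffle, unsigned seed = 0) {
    std::vector<std::size_t> indices(n);
    std::iota(indices.begin(), indices.end(), std::size_t{0});
    if (shuffle) {
        std::mt19937 rng(seed);
        std::shuffle(indices.begin(), indices.end(), rng);
    }
    return indices;
}

// Hypothetical loader over an index list. Each "batch" here is just the
// list of sample indices it covers; a real DataLoader would call getitem()
// for each index and combine the results into a Batch<DType, LType>.
class IndexBatcher {
    std::vector<std::size_t> indices_;
    std::size_t batch_size_;
    std::size_t num_batches_;
public:
    IndexBatcher(std::vector<std::size_t> indices, std::size_t batch_size, bool drop_last)
        : indices_(std::move(indices)), batch_size_(batch_size), num_batches_(0) {
        assert(batch_size_ > 0);
        const std::size_t n = indices_.size();
        // Assumed policy: drop_last discards a trailing partial batch;
        // otherwise the partial batch is emitted as a smaller final batch.
        num_batches_ = drop_last ? n / batch_size_
                                 : (n + batch_size_ - 1) / batch_size_;
    }

    class Iterator {
        const IndexBatcher* owner_;
        std::size_t batch_index_;  // index of the current batch
    public:
        Iterator(const IndexBatcher* owner, std::size_t batch_index)
            : owner_(owner), batch_index_(batch_index) {}
        bool operator!=(const Iterator& other) const {
            return batch_index_ != other.batch_index_;
        }
        Iterator& operator++() { ++batch_index_; return *this; }
        // Returns the sample indices that belong to the current batch.
        std::vector<std::size_t> operator*() const {
            const std::size_t first = batch_index_ * owner_->batch_size_;
            const std::size_t last =
                std::min(first + owner_->batch_size_, owner_->indices_.size());
            return std::vector<std::size_t>(owner_->indices_.begin() + first,
                                            owner_->indices_.begin() + last);
        }
    };

    Iterator begin() const { return Iterator(this, 0); }
    Iterator end() const { return Iterator(this, num_batches_); }
};
```

Under these assumptions, 10 samples with a batch size of 4 give batches of 4, 4 and 2 indices, or only the two full batches when ```drop_last``` is true. Because the iterator stores just a batch index, shuffling once before iteration starts is enough to change which samples land in each batch.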

## Source code
```
template<typename DType, typename LType>
class DataLoader{
public:
    
private:
    Dataset<DType, LType>* ptr_dataset;
    int batch_size;
    bool shuffle;
    bool drop_last;
    /*TODO: add more member variables to support the iteration*/
public:
    DataLoader(Dataset<DType, LType>* ptr_dataset,
            int batch_size,
            bool shuffle=true,
            bool drop_last=false){
        /*TODO: Add your code to do the initialization */
    }
    virtual ~DataLoader(){}
    
    /////////////////////////////////////////////////////////////////////////
    // The section for supporting the iteration and for-each to DataLoader //
    /// START: Section                                                     //
    /////////////////////////////////////////////////////////////////////////
    
    /*TODO: Add your code here to support iteration on batch*/
    
    /////////////////////////////////////////////////////////////////////////
    // The section for supporting the iteration and for-each to DataLoader //
    /// END: Section                                                       //
    /////////////////////////////////////////////////////////////////////////
};
```