# Create Random Data

This notebook creates a simple synthetic dataset that can be used for training a model. The data is generated from a linear model with the following quantities:

1. $\mathbf y$ is an (N,1) vector
2. $\mathbf X$ is an (N,M) matrix, and
3. $\mathbf a$ is an (M,1) vector

\begin{equation}
\mathbf{y} = \mathbf{X} \mathbf{a}
\end{equation}

This model is deliberately simple and is meant for demo purposes only. The data will be created with the following predetermined value of $\mathbf a$:

```
a = np.array([2, 3]).reshape((-1, 1))
```

Since $\mathbf y$ and $\mathbf X$ are related through this simple model, we can create a simple PyTorch model that learns the value of $\mathbf a$ from the data. We know what we are trying to predict: the vector $[2,3]^T$.

In [3]:
import numpy as np

M, N = 2, 5000

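a = np.array([2, 3]).reshape((-1, 1))  # true coefficients, shape (M, 1)
X = np.random.rand(N, M)               # uniform samples in [0, 1), shape (N, M)
# np.random.rand() returns one scalar, so the same offset is added to every row of y
y = X @ a + np.random.rand()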
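
As a quick sanity check before any PyTorch model is involved, the generating coefficients can be recovered with ordinary least squares. The sketch below is not part of the original workflow: it regenerates data the same way as the cell above, but with a seeded generator (the seed and the names `rng`, `offset`, `X_aug`, `a_hat` and `b_hat` are assumptions made for illustration). Because the cell above adds one scalar offset to every row of $\mathbf y$, the fit includes a column of ones so that the offset is absorbed as an intercept.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, used only to make the sketch reproducible

M, N = 2, 5000
a = np.array([2, 3]).reshape((-1, 1))
X = rng.random((N, M))
offset = rng.random()           # a single scalar shared by all rows, as in the cell above
y = X @ a + offset

# Append a column of ones so the least-squares fit can estimate the shared offset.
X_aug = np.hstack([X, np.ones((N, 1))])
coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

a_hat = coef[:M]      # estimated coefficients, shape (M, 1)
b_hat = coef[M, 0]    # estimated intercept (the scalar offset)

print("estimated a:", a_hat.ravel())
print("estimated offset:", b_hat)
```

With no per-row noise in the data, the estimate should match $[2,3]^T$ up to floating-point error. A model without an intercept term would instead fold the offset into its coefficient estimates.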

# Saving the data

After computing $\mathbf X$ and $\mathbf y$, we save them to the S3 bucket `sankha-test-data-folder`. The upload is done outside the Python code, with the AWS CLI, because we want to be able to download the data directly from the S3 bucket through a shell script.

In [6]:
import os

os.makedirs('../../tempData', exist_ok=True)
np.save('../../tempData/y.npy', y)
np.save('../../tempData/X.npy', X)

In [7]:
!ls ../../tempData

X.npy  y.npy
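
Before uploading, it can be worth confirming that the saved files load back unchanged. The following is a minimal sketch of such a round-trip check. It writes to a temporary directory instead of `../../tempData`, and the helper `save_and_reload` and the `*_demo` arrays are hypothetical names introduced only for this example.

```python
import os
import tempfile

import numpy as np


def save_and_reload(X, y, folder):
    """Save X and y as .npy files in `folder`, then load them back."""
    os.makedirs(folder, exist_ok=True)
    x_path = os.path.join(folder, "X.npy")
    y_path = os.path.join(folder, "y.npy")
    np.save(x_path, X)
    np.save(y_path, y)
    return np.load(x_path), np.load(y_path)


# Stand-in data with the same shapes as the arrays created earlier.
X_demo = np.random.rand(5000, 2)
y_demo = X_demo @ np.array([[2], [3]])

with tempfile.TemporaryDirectory() as tmp:
    X_back, y_back = save_and_reload(X_demo, y_demo, tmp)
    saved_files = sorted(os.listdir(tmp))

print(saved_files)
print(X_back.shape, y_back.shape)
```

`np.save` stores the shape and dtype alongside the values, so the reloaded arrays should compare equal to the originals element by element.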


## Push the data into the S3 bucket

In [11]:
! aws s3 ls | grep sankha

2021-03-01 21:19:08 sankha-test-data-folder
2021-03-01 21:18:27 sankha-test-models-folder


In [21]:
!aws s3 cp ../../tempData s3://sankha-test-data-folder/ --recursive

upload: ../../tempData/X.npy to s3://sankha-test-data-folder/X.npy
upload: ../../tempData/y.npy to s3://sankha-test-data-folder/y.npy
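
The notebook performs the upload with the AWS CLI. For cases where the same step has to run from Python, a rough equivalent of the recursive copy can be sketched as below. This is an illustration under assumptions, not the method used above: `upload_directory` and `RecordingClient` are hypothetical names, and the function only expects an object exposing `upload_file(Filename, Bucket, Key)`, which is the call shape of a boto3 S3 client. The demo at the bottom is a dry run with a stand-in client, so it needs neither boto3 nor AWS credentials.

```python
import os
import tempfile


def upload_directory(client, local_dir, bucket, prefix=""):
    """Upload every file under local_dir, using relative paths as object keys."""
    uploaded = []
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # S3 keys use forward slashes regardless of the local OS.
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            key = prefix + rel
            client.upload_file(path, bucket, key)
            uploaded.append(key)
    return uploaded


class RecordingClient:
    """Stand-in client for a dry run: records calls instead of contacting S3."""

    def __init__(self):
        self.calls = []

    def upload_file(self, Filename, Bucket, Key):
        self.calls.append((Filename, Bucket, Key))


# Dry run against a temporary folder holding two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("X.npy", "y.npy"):
        open(os.path.join(tmp, name), "wb").close()
    dry_run = RecordingClient()
    keys = upload_directory(dry_run, tmp, "sankha-test-data-folder")

print(keys)
```

For a real upload, one would pass `boto3.client("s3")` in place of `RecordingClient()` and `'../../tempData'` as `local_dir`, assuming boto3 is installed and credentials are configured.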


In [22]:
!aws s3 ls sankha-test-data-folder

2021-03-01 21:52:46      80128 X.npy
2021-03-01 21:52:46      40128 y.npy
