# linear modeling in high dimensions

So far, we have learned how to fit a 1-dimensional linear neural network to data using TensorFlow. As with most 'general' computational frameworks, it takes a fair amount of code just to implement a linear model as a neural network. Perhaps this particular framework is a bit of 'overkill' for simple 1D linear modeling? However, using a general framework to implement and fit our model makes extending it to higher dimensions *much* easier.

In this notebook, you'll build, train and use multi-dimensional linear neural networks. This will expand the capability of our network to handle data of *any* dimension, while constraining the model to assume a linear relationship between the explanatory and response variables.
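Concretely, a linear model with $d$ explanatory variables predicts $\hat{y} = w_1 x_1 + \dots + w_d x_d + b$. For a whole batch of samples stored as an $n \times d$ array, that is a single matrix-vector product plus a bias. The sketch below uses NumPy; the helper `linear_predict` and the weights `w` and bias `b` are hypothetical values chosen only for illustration, not fitted to any data.

```python
import numpy as np

def linear_predict(x, w, b):
    """Hypothetical helper: linear model prediction for a batch.

    x: array of shape (n_samples, n_features)
    w: array of shape (n_features,), one weight per explanatory variable
    b: scalar bias
    Returns an array of shape (n_samples,), one response per sample.
    """
    return x @ w + b

# Made-up batch: 3 samples, each with 2 explanatory variables
x_batch = np.array([[1.0, 2.0],
                    [0.0, -1.0],
                    [3.0, 0.5]])
w = np.array([2.0, -1.0])  # illustrative weights
b = 0.5                    # illustrative bias

# expected values: 0.5, 1.5, 6.0
print(linear_predict(x_batch, w, b))
```

The same function works unchanged for any number of explanatory variables, as long as `w` has one entry per column of `x`; only the linear form of the relationship is fixed.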

We'll start with 2-dimensional explanatory variables, which we can visualize in a 3D graph, with the response variable on the z-axis.

The following code cell uses scikit-learn's 'make_regression' function to generate a 3D dataset, with 2D explanatory variables and a 1D response.

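```python
import sklearn.datasets
import matplotlib.pyplot as plt

# 100 samples, each with 2 explanatory variables and 1 response,
# generated from a linear model with bias 50 plus Gaussian noise (std 10)
x, y = sklearn.datasets.make_regression(n_samples=100,
                                        n_features=2,
                                        bias=50.0,
                                        noise=10.0,
                                        random_state=301918)

# let's plot the data in 3D for visualization:
# explanatory variables on the x- and y-axes, response on the z-axis
fig = plt.figure()
axs = fig.add_subplot(projection='3d')
axs.scatter(x[:, 0], x[:, 1], y, marker='o')
axs.set_xlabel('x[:, 0]')
axs.set_ylabel('x[:, 1]')
axs.set_zlabel('y')
plt.show()
```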

Running the code cell above, you should see a 3D graph with blue dots lying roughly on a tilted plane (with two explanatory variables, a linear relationship describes a plane rather than a line). It can be tough to 'see' the plane in a 2D projection of 3D data, but maybe if you squint hard enough...
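Because 'make_regression' builds the response as a linear combination of the explanatory variables, plus a bias and Gaussian noise, we can check that relationship directly. This is a minimal sketch, assuming the `coef=True` option of 'make_regression', which additionally returns the true coefficients used to generate the data. Subtracting the noise-free linear part `x @ coef + 50.0` (geometrically, a tilted plane over the two explanatory variables) should leave residuals with a standard deviation near the `noise=10.0` we asked for.

```python
import sklearn.datasets

# Same arguments as before, plus coef=True to also get the true coefficients
x, y, coef = sklearn.datasets.make_regression(n_samples=100,
                                              n_features=2,
                                              bias=50.0,
                                              noise=10.0,
                                              random_state=301918,
                                              coef=True)

# Noise-free linear part: one weight per explanatory variable, plus the bias
y_linear = x @ coef + 50.0

# What is left over is the Gaussian noise that make_regression added
residuals = y - y_linear

print("true coefficients:", coef)
print("residual std:", residuals.std())
```

The residual standard deviation will not be exactly 10, since it is estimated from only 100 noisy samples.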

In any case, you can see that the actual *generation* of the data is *very* simple; the *only* thing we needed to change was

    n_features=1,

for 1-dimensional x values, to

    n_features=2,

for 2-dimensional x values.
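Since `n_features` is the only argument that sets the dimension of the explanatory variables, a short loop makes the pattern explicit. This is a sketch; the dimensions 1, 2 and 5 are arbitrary choices for illustration.

```python
import sklearn.datasets

shapes = {}
for d in (1, 2, 5):
    # Only n_features changes; everything else matches the cell above
    x_d, y_d = sklearn.datasets.make_regression(n_samples=100,
                                                n_features=d,
                                                bias=50.0,
                                                noise=10.0,
                                                random_state=301918)
    shapes[d] = (x_d.shape, y_d.shape)
    # x grows one column per explanatory variable; y stays one value per sample
    print(d, x_d.shape, y_d.shape)
```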

Plotting the data in 3 dimensions was a bit trickier; luckily, this isn't a course on data visualization!

Because the "make_regression" function returns NumPy arrays, we can check the 'shape' of the "x" and "y" values using their ".shape" attribute:

```python
x.shape
```

The output should be (100, 2): the "x" variable is a rank-2 tensor (aka a "matrix") of values. Its first axis has length 100 and its second axis has length 2, so the overall tensor is 100x2.

Here, the first axis holds 100 replicate data samples (i.e., independent samples from the *same* noisy linear model). Each sample is a 2-dimensional vector.
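NumPy indexing follows the same layout: indexing the first axis picks out a single sample, while slicing the second axis picks out a single explanatory variable across all samples. A short sketch, regenerating "x" with the same arguments so the cell runs on its own:

```python
import sklearn.datasets

x, y = sklearn.datasets.make_regression(n_samples=100, n_features=2,
                                        bias=50.0, noise=10.0,
                                        random_state=301918)

print(x.ndim)         # number of axes (the rank): 2
print(x.shape)        # (100, 2)
print(x[0].shape)     # one sample, a vector with 2 values: (2,)
print(x[:, 0].shape)  # first explanatory variable for every sample: (100,)
```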

You can 'see' the values stored in the "x" variable printed to the screen by executing the following code cell.

```python
print(x)
```

That's 100 lines of output, one per sample, with each line showing a vector of 2 values.
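Scrolling through 100 rows is tedious; slicing the first axis shows just a few samples instead. A small sketch (the choice of 5 rows is arbitrary), regenerating "x" so the cell is self-contained:

```python
import sklearn.datasets

x, y = sklearn.datasets.make_regression(n_samples=100, n_features=2,
                                        bias=50.0, noise=10.0,
                                        random_state=301918)

first_five = x[:5]       # first 5 samples, all explanatory variables
print(first_five.shape)  # (5, 2)
print(first_five)
```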

The shape of the "y" variable is a bit different:

```python
y.shape
```

Its first axis has length 100, just like the first axis of the "x" variable, because the "y" variable holds the corresponding response for *each* sample in "x".
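Row i of "x" and entry i of "y" belong together: they are one sample's explanatory variables and its response. A minimal sketch that pairs them up with Python's `zip`, showing only the first three samples (again regenerating the data so the cell runs on its own):

```python
import sklearn.datasets

x, y = sklearn.datasets.make_regression(n_samples=100, n_features=2,
                                        bias=50.0, noise=10.0,
                                        random_state=301918)

# Both arrays share the same first-axis length: one response per sample
assert x.shape[0] == y.shape[0]

pairs = list(zip(x[:3], y[:3]))
for xi, yi in pairs:
    # xi is a length-2 vector of explanatory variables, yi is a scalar response
    print(xi, "->", yi)
```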

It's a little 'weird' that the output, (100,), has a comma after the first dimension but nothing after the comma (except the closing parenthesis).

This is because the "y" variable is a rank-1 tensor (i.e., a "vector" of values), so its shape is a tuple with a single entry. The ".shape" attribute is always a Python "tuple", and Python writes a one-element tuple with a trailing comma: "(100)" would just be the integer 100 in parentheses, so the comma is what marks it as a tuple.
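The trailing comma is ordinary Python tuple syntax, so it can be checked without NumPy at all. The sketch below also uses `reshape` to turn a rank-1 vector into an explicit 100x1 column, which holds the same 100 numbers but has a different shape. Here `y` is a stand-in built with `np.zeros` (an assumption for illustration) that has the same shape as the response vector above.

```python
import numpy as np

a = (100,)  # a one-element tuple: the comma makes it a tuple
b = (100)   # just the integer 100 wrapped in parentheses

print(type(a), len(a))  # <class 'tuple'> 1
print(type(b))          # <class 'int'>

y = np.zeros(100)       # stand-in with the same shape as the response vector
print(y.shape, y.ndim)  # (100,) 1

y_col = y.reshape(-1, 1)        # same 100 values, arranged as one column
print(y_col.shape, y_col.ndim)  # (100, 1) 2
```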

You can confirm that "y" is a rank-1 tensor by printing it to the screen:

```python
print(y)
```