To standardize a dataset, you need to perform the following steps:

- Calculate the mean and the standard deviation of each feature (column) in the dataset. The mean is the average of all the values in the feature, and the standard deviation measures how much the values vary around the mean. The formula below is the population standard deviation, which divides by $n$ rather than $n - 1$. Mathematically, you can write them as:

$$\mu_j = \frac{1}{n} \sum_{i=1}^n x_{ij}$$

$$\sigma_j = \sqrt{\frac{1}{n} \sum_{i=1}^n (x_{ij} - \mu_j)^2}$$

where $x_{ij}$ is the value of the $j$-th feature for the $i$-th sample, $n$ is the number of samples, $\mu_j$ is the mean of the $j$-th feature, and $\sigma_j$ is the standard deviation of the $j$-th feature.

- Subtract the mean from each value in the feature and divide the result by the standard deviation. This centers the feature at zero mean and scales it to unit variance. Mathematically, you can write this as:

$$z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}$$

where $z_{ij}$ is the standardized value of the $j$-th feature for the $i$-th sample.

- Repeat this process for all the features in the dataset.
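
The steps above can be sketched directly in NumPy. This is a minimal illustration, not a library implementation: the function name standardize is hypothetical, and the sketch assumes that no feature is constant (a constant column has zero standard deviation and would cause a division by zero).

```python
import numpy as np

def standardize(X):
    """Standardize each column of X to zero mean and unit variance.

    Hypothetical helper for illustration; assumes no column is constant.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)      # per-feature mean, mu_j
    sigma = X.std(axis=0)    # per-feature population std, sigma_j (divides by n)
    return (X - mu) / sigma  # broadcasting applies mu_j and sigma_j to every row

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
Z = standardize(X)
print(Z)
```

Because the mean and standard deviation are computed for all columns at once, the "repeat for every feature" step happens implicitly through broadcasting.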

sklearn.preprocessing.scale() is a function from the scikit-learn library that standardizes a dataset along a chosen axis. By default it uses axis=0, which standardizes each feature (column) independently. It centers the data to zero mean and scales it to unit variance.

In [5]:
# Import the library
from sklearn import preprocessing
import numpy as np

# Create some sample data
X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Convert the list to a numpy array
X = np.array(X)

# Print the mean and standard deviation of each column
print("Mean:", X.mean(axis=0))
print("Std:", X.std(axis=0))

# Scale the data using the function
X_scaled = preprocessing.scale(X)

# Print the scaled data, mean and standard deviation
print("Scaled data:\n", X_scaled)
print("Mean:", X_scaled.mean(axis=0))
print("Std:", X_scaled.std(axis=0))

Mean: [4. 5. 6.]
Std: [2.44948974 2.44948974 2.44948974]
Scaled data:
 [[-1.22474487 -1.22474487 -1.22474487]
 [ 0.          0.          0.        ]
 [ 1.22474487  1.22474487  1.22474487]]
Mean: [0. 0. 0.]
Std: [1. 1. 1.]
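
The standard deviation of 2.449 printed above comes from the population formula, which divides by n. As a small sketch of the difference, the example below compares NumPy's default (ddof=0) with the sample standard deviation (ddof=1) on the same data. The variable names are illustrative only.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

pop_std = X.std(axis=0)             # divides by n (ddof=0, NumPy's default)
sample_std = X.std(axis=0, ddof=1)  # divides by n - 1

print("Population std:", pop_std)
print("Sample std:", sample_std)
```

For the first column, the squared deviations from the mean are 9, 0, and 9, so the population value is the square root of 18 / 3 = 6 (about 2.449), while the sample value is the square root of 18 / 2 = 9, which is 3. The scaled output above has a standard deviation of exactly 1 under NumPy's default, which is consistent with the population formula being used for scaling.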


axis is a parameter accepted by many NumPy functions that specifies the direction along which an operation is performed. For a 2-dimensional array, axis=0 runs down the rows (vertically, giving one result per column), and axis=1 runs across the columns (horizontally, giving one result per row).

When you use axis=0 in a NumPy reduction such as a sum or a mean, the function collapses the rows and operates on each column separately, returning an array with one fewer dimension. For example, np.sum(a, axis=0) sums the elements in each column of the array a and returns a 1-dimensional array of column sums.

In [4]:
# Import the NumPy library
import numpy as np

# Create a 2-dimensional array
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Print the shape of the array
print("Shape of a:", a.shape)

# Sum the elements along axis=0 (one sum per column)
b = np.sum(a, axis=0)

# Sum the elements along axis=1 (one sum per row)
c = np.sum(a, axis=1)

# Print the results and their shapes
print("Sum along axis=0:", b)
print("Sum along axis=1:", c)
print("Shape of b:", b.shape)
print("Shape of c:", c.shape)

Shape of a: (4, 3)
Sum along axis=0: [22 26 30]
Sum along axis=1: [ 6 15 24 33]
Shape of b: (3,)
Shape of c: (4,)
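
To connect the axis argument back to standardization, the sketch below standardizes each row instead of each column. It assumes the same array a as above (converted to floats) and uses keepdims=True so that the reduced results keep a length-1 axis and broadcast against the original array. The variable names are illustrative only.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=float)

# Reduce along axis=1: one mean and one std per row.
# keepdims=True keeps the result 2-dimensional, with shape (4, 1).
row_mean = a.mean(axis=1, keepdims=True)
row_std = a.std(axis=1, keepdims=True)

# Broadcasting stretches the (4, 1) arrays across the 3 columns.
a_rows = (a - row_mean) / row_std

print("Shape of row_mean:", row_mean.shape)
print("Row-standardized data:\n", a_rows)
```

Without keepdims=True, row_mean would have shape (4,), which does not broadcast against an array of shape (4, 3), so the subtraction would raise an error.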
