## Sparse
This notebook simulates how a logistic regression model behaves when one feature is sparse but highly discriminative. For example:

| x_1 | ... | x_n | y   |
| --- | --- | --- | --- |
| 1   | ... | 1   | 1   |
| 2   | ... | 1   | 1   |
| 3   | ... | 0   | 0   |
| 4   | ... | 0   | 0   |
| ... | ... | ... | ... |
| n-2 | ... | 0   | 1   |
| n-1 | ... | 0   | 1   |
| n   | ... | 0   | 1   |

where, given feature ``x_1`` alone, the decision boundary lies somewhere around ``n/2``. However, whenever ``x_n`` equals 1, ``y`` is in the positive class, even though ``x_n`` is rarely non-zero. The relationship between each feature and ``y`` is linear in these cases.
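A minimal sketch of data with this shape. It assumes a dense ramp feature ``x_1`` running from 1 to ``n`` and a single sparse binary flag ``x_n`` that is switched on for a few rows below the midpoint. The helper `make_sparse_dataset` and its parameters are hypothetical and not part of the notebook's own code.

```python
import numpy as np


def make_sparse_dataset(n=1000, n_sparse=10, seed=42):
    """Hypothetical helper: dense ramp x_1 plus a sparse, discriminative flag x_n."""
    rng = np.random.default_rng(seed)
    # Dense feature: 1, 2, ..., n; on its own it is positive above n/2
    x_1 = np.arange(1, n + 1, dtype=float)
    y = (x_1 > n / 2).astype(int)
    # Sparse feature: switched on for a few rows drawn from the lower half only
    x_n = np.zeros(n, dtype=int)
    rows = rng.choice(n // 2, size=n_sparse, replace=False)
    x_n[rows] = 1
    # Wherever x_n == 1 the label is positive, regardless of x_1
    y[rows] = 1
    return np.column_stack([x_1, x_n]), y


X, y = make_sparse_dataset()
print(X.shape, int(X[:, 1].sum()), int(y.sum()))  # (1000, 2) 10 510
```

Drawing the sparse rows only from the lower half means every positive label there is explained by ``x_n`` alone, which is the situation the table describes.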

In [2]:
import numpy as np
np.random.seed(42)

## Null
First, let's test the impact of completely non-pertinent features.
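One way to build a completely non-pertinent feature is to draw it independently of the labels, so any weight a model assigns to it reflects noise rather than signal. A minimal sketch, assuming standard-normal noise and a balanced label vector; `make_null_feature` is a hypothetical helper.

```python
import numpy as np


def make_null_feature(n_samples, seed=0):
    """Hypothetical helper: a feature column drawn independently of any label."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_samples, 1))


# Labels built without reference to the null feature (half negative, half positive)
n = 100_000
y = np.repeat([0.0, 1.0], n // 2).reshape(-1, 1)
x_null = make_null_feature(n)

# Sample correlation between the null feature and the labels; it should be close to 0
corr = np.corrcoef(x_null[:, 0], y[:, 0])[0, 1]
print(f"correlation: {corr:.4f}")
```

With 100,000 samples the sample correlation of independent variables has a standard deviation of roughly 1/sqrt(n), about 0.003, so values far from zero would point to a bug in how the feature was generated.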

In [350]:
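class LogisticRegression:
    """Logistic regression without an intercept, fitted by batch gradient ascent."""

    def __init__(self, learning_step, learning_rate):
        self.learning_step = learning_step  # number of gradient-ascent iterations
        self.learning_rate = learning_rate
        self.weights = None

    def linear_reg(self, w, x):
        # Linear predictor z = x @ w; x is (n_samples, n_features), w is (n_features, 1)
        return np.dot(x, w)

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def fit(self, x, y):
        # One weight per feature, stored as a column vector so x @ w is (n_samples, 1)
        w = np.zeros((x.shape[1], 1))
        for _ in range(self.learning_step):
            z = self.linear_reg(w, x)
            sig = self.sigmoid(z)
            # Gradient of the mean log-likelihood: x^T (y - sigmoid(z)) / n_samples
            gradient = np.dot(x.T, y - sig) / x.shape[0]
            w += gradient * self.learning_rate
        self.weights = w

    def predict(self, x):
        z = self.linear_reg(self.weights, x)
        return self.sigmoid(z)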
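For logistic regression fitted by maximizing the mean log-likelihood, the gradient with respect to the weights is `x.T @ (y - sigmoid(x @ w)) / N`. A central finite-difference check is a common way to confirm that an update rule like the one in `fit` uses the right expression. A minimal sketch on random data; the function names are illustrative, not part of the notebook.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mean_log_likelihood(w, x, y):
    # log(sigmoid(z)) = -logaddexp(0, -z) and log(1 - sigmoid(z)) = -logaddexp(0, z)
    z = x @ w
    return np.mean(-y * np.logaddexp(0, -z) - (1 - y) * np.logaddexp(0, z))


def analytic_gradient(w, x, y):
    # Gradient of the mean log-likelihood: x^T (y - sigmoid(x w)) / n_samples
    return x.T @ (y - sigmoid(x @ w)) / x.shape[0]


def numeric_gradient(w, x, y, h=1e-6):
    # Central differences, perturbing one weight at a time
    grad = np.zeros_like(w)
    for i in range(w.shape[0]):
        step = np.zeros_like(w)
        step[i] = h
        grad[i] = (mean_log_likelihood(w + step, x, y)
                   - mean_log_likelihood(w - step, x, y)) / (2 * h)
    return grad


rng = np.random.default_rng(0)
x = rng.standard_normal((20, 3))
y = (rng.random((20, 1)) < 0.5).astype(float)
w = rng.standard_normal((3, 1))
print(np.max(np.abs(analytic_gradient(w, x, y) - numeric_gradient(w, x, y))))
```

Dropping the `x.T` factor would give every weight the same update, which a check like this exposes immediately once there is more than one feature.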

In [351]:
x = np.vstack(
    (
        np.zeros((3, 1)), 
        np.ones((50, 1))
    )
)
y = x

In [352]:
lr = LogisticRegression(10, 0.2)
lr.fit(x, y)
lr.predict(x)

array([[0.5       ],
       [0.5       ],
       [0.5       ],
       [0.77058072],
       [0.77058072],
       ...
       [0.77058072]])

In [336]:
lr.weights

array([[1.44136242]])

In [259]:
x

array([[0],
       [0],
       [1]])

In [254]:
z = lr.linear_reg(lr.weights, x)
z

array([[0.        ],
       [0.        ],
       [1.25547868]])

In [260]:
lr.sigmoid(0)

0.5
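Because `sigmoid(0)` is 0.5 and the model above has no intercept, any row whose features are all zero gets `z = 0` and a prediction of exactly 0.5, however long training runs. A common remedy is to prepend a constant column of ones so the model learns a bias. A minimal sketch, assuming the same 3-zeros / 50-ones toy data; `fit_logistic` and `add_intercept` are hypothetical helpers, and 5000 steps at learning rate 0.5 are arbitrary choices.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_intercept(x):
    # Prepend a column of ones; its weight acts as the bias term
    return np.hstack([np.ones((x.shape[0], 1)), x])


def fit_logistic(x, y, steps=5000, learning_rate=0.5):
    # Batch gradient ascent on the mean log-likelihood
    w = np.zeros((x.shape[1], 1))
    for _ in range(steps):
        w += learning_rate * x.T @ (y - sigmoid(x @ w)) / x.shape[0]
    return w


# Same shape as the notebook's toy data: 3 zeros followed by 50 ones, with y equal to x
x = np.vstack([np.zeros((3, 1)), np.ones((50, 1))])
y = x.copy()

w_plain = fit_logistic(x, y)
w_bias = fit_logistic(add_intercept(x), y)

p_plain = sigmoid(x @ w_plain)
p_bias = sigmoid(add_intercept(x) @ w_bias)
print(p_plain[:3].ravel(), p_bias[:3].ravel())
```

Without the bias column the zero rows stay pinned at 0.5; with it, the learned (negative) bias pushes them below the decision threshold.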

In [86]:
# Linear predictor with an explicit bias: w·x + b = (3*1 + 2*2 + 1*3) + 5 = 15
z = np.dot(np.array([3, 2, 1]), np.array([1, 2, 3])) + 5
lr.sigmoid(z)

0.999999694097773
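The `sigmoid` used in this notebook computes `1 / (1 + np.exp(-z))`, which is fine for moderate inputs like 15 but emits an overflow `RuntimeWarning` from `np.exp` for large negative `z` (the result still rounds to 0). A minimal sketch of a numerically stable alternative, assuming array-like inputs; `stable_sigmoid` is a hypothetical name. If SciPy is available, `scipy.special.expit` computes the same function.

```python
import numpy as np


def stable_sigmoid(z):
    """Logistic function that never calls np.exp on a large positive argument."""
    z = np.asarray(z, dtype=float)
    # exp(-|z|) lies in (0, 1], so it cannot overflow
    e = np.exp(-np.abs(z))
    # For z >= 0: 1 / (1 + e^-z); for z < 0: e^z / (1 + e^z)
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))


print(stable_sigmoid([-1000.0, 0.0, 15.0, 1000.0]))
```

Both branches of `np.where` are evaluated, which is why the exponent is computed once on `-|z|` rather than separately on `z` and `-z`.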

In [50]:
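def generate_features(feature_ranges, samples=10000):
    """Return (name, values) pairs, each with `samples` evenly spaced values from start to end inclusive."""
    feature_vec = []
    for name, start, end in feature_ranges:
        # Step size chosen so the first value is `start` and the last is `end`
        increm = (end - start) / (samples - 1)
        feature_vec.append(
            (
                name,
                [(increm * i) + start for i in range(samples)]
            )
        )
    return feature_vec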
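The same evenly spaced grid (up to floating-point rounding) is what NumPy's `np.linspace` returns, with both endpoints included by default. A minimal sketch of a vectorised variant that also stacks the features into a `(samples, n_features)` matrix ready for the model; `feature_matrix` is a hypothetical helper.

```python
import numpy as np


def feature_matrix(feature_ranges, samples=10000):
    """Hypothetical helper: one evenly spaced column per (name, start, end) tuple."""
    names = [name for name, _, _ in feature_ranges]
    columns = [np.linspace(start, end, samples) for _, start, end in feature_ranges]
    # Each range becomes a column: the result has shape (samples, len(feature_ranges))
    return names, np.column_stack(columns)


names, X = feature_matrix([('x_1', 1, 100000)])
print(names, X.shape, X[0, 0], X[-1, 0])
```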

In [61]:
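def generate_y(pos_ratio, sample=10000):
    # Negative labels first, then positives; rounding the positive count keeps the total equal to `sample`
    n_pos = round(sample * pos_ratio)
    return [0] * (sample - n_pos) + [1] * n_pos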

In [62]:
features = [
    ('x_1', 1, 100000)
]

In [63]:
x = generate_features(features)

[0, 0, 0, 0, 0, 0, 0, 0, 0]