### I have a gaussian distribution of angles between 0 and 360. It is peaked at a reference. How do I extract a uniform distribution from it? I was thinking of using inverse transform sampling

To extract a uniform distribution from a Gaussian distribution of
angles, you can indeed use the inverse transform sampling method, but
with some adjustments given that the original distribution is not
uniform.

Here’s a concise approach:

1.  **Normalize the Gaussian**: Ensure your Gaussian distribution's
    probability density function (PDF) is normalized over the range \[0,
    360\] degrees.

2.  **Compute the Cumulative Distribution Function (CDF)**: Determine
    the CDF of your Gaussian distribution over the range. This should be
    straightforward using numerical integration or using a cumulative
    normal distribution formula adjusted for your mean and variance.

3.  **Apply Inverse Transform Sampling**:

    -   Draw a uniform random sample $u$ from the interval \[0, 1\].
    -   Use the inverse of the CDF to find the corresponding angle, that
        is, $\theta = \mathrm{CDF}^{-1}(u)$.

Because the original Gaussian is bounded and wraps at 360 degrees, pay
careful attention to edge cases where numerical inaccuracies might cause
values near 0 or 360 to behave unexpectedly. You may want to numerically
compute or piecewise define the inverse if a closed form is not
available. Also, ensure that any angles wrap correctly between 0 and 360
by using modulo arithmetic where needed.

### How do you implement this in python

To implement inverse transform sampling in Python to extract samples
from a Gaussian distribution of angles and map them to a uniform
distribution, you'll typically follow these steps:

1.  Define your Gaussian PDF over the range \[0, 360\].
2.  Compute its CDF.
3.  Use the inverse of this CDF (or a numerical approximation) to
    transform uniform random samples to follow the Gaussian
    distribution.

Here is a simple implementation outline using libraries like `numpy` and
`scipy`:

``` python
import numpy as np
from scipy.stats import norm

# Parameters for the Gaussian distribution
mean_angle = 180  # example mean angle
stddev_angle = 30  # example standard deviation

# Define a function for the wrapped Gaussian PDF
def wrapped_gaussian_pdf(angles, mean, stddev):
    return norm.pdf((angles + 360) % 360, mean, stddev)

# Define the CDF using numerical integration
def wrapped_gaussian_cdf(angles, mean, stddev):
    # Normalization to ensure the distribution sums to 1 over the [0, 360) range
    angle_range = np.linspace(0, 360, 1000)
    pdf_values = wrapped_gaussian_pdf(angle_range, mean, stddev)
    cdf_values = np.cumsum(pdf_values) / np.sum(pdf_values)
    return np.interp(angles, angle_range, cdf_values)

# Inverse transform sampling
def inverse_transform_sampling(num_samples, mean, stddev):
    # Generate uniform samples
    uniform_samples = np.random.rand(num_samples)

    # Map the uniform samples to Gaussian distributed angles using the CDF
    angle_range = np.linspace(0, 360, 1000)
    cdf_values = wrapped_gaussian_cdf(angle_range, mean, stddev)
    sampled_angles = np.interp(uniform_samples, cdf_values, angle_range)  # inverse CDF

    return sampled_angles

# Example usage
num_samples = 1000
sampled_angles = inverse_transform_sampling(num_samples, mean_angle, stddev_angle)

# Note: 'sampled_angles' follows the Gaussian restricted to [0, 360],
# not a uniform distribution; inverse transform sampling reproduces the target PDF.
print(sampled_angles)
```

### Key Points:

-   **Numerical Integration**: The CDF is computed numerically since no
    straightforward closed form exists, due to the wrap-around nature.
-   **Interpolation**: `numpy.interp` is used to approximate the inverse
    CDF. This works by inverting the mapping from the uniform samples to
    the angles.
-   **Normalization**: Ensure that your PDF is properly normalized
    across the interval to accurately reflect the wrapped nature.

This approach uses interpolation and numerical techniques appropriate
for the fact that the distribution is bounded and needs careful handling
at the edges.
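
Note that `wrapped_gaussian_pdf` above only applies a modulo to the angle, so it does not carry probability mass across the 0/360 boundary. As a minimal sketch of what a wrapped normal density could look like, the snippet below sums shifted copies of the Gaussian and integrates with the trapezoidal rule. The function name `wrapped_normal_pdf`, the `n_wraps` parameter, and the example mean of 350 are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

def wrapped_normal_pdf(theta, mean, stddev, n_wraps=3):
    """Approximate wrapped normal density on [0, 360] by summing shifted Gaussian copies."""
    theta = np.asarray(theta, dtype=float)
    shifts = 360.0 * np.arange(-n_wraps, n_wraps + 1)
    # Broadcast to shape (..., 2 * n_wraps + 1) and sum over the copies
    return norm.pdf(theta[..., None] + shifts, loc=mean, scale=stddev).sum(axis=-1)

# A peak close to the 0/360 boundary, where wrapping matters most
grid = np.linspace(0.0, 360.0, 3601)
pdf = wrapped_normal_pdf(grid, mean=350.0, stddev=30.0)

# Trapezoidal cumulative integral, so the CDF starts at 0 and ends at 1
cdf = np.concatenate(([0.0], np.cumsum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid))))
total_mass = cdf[-1]
cdf /= total_mass

print("mass on [0, 360] before normalization:", total_mass)
print("density at 0 and 360:", pdf[0], pdf[-1])
```

The density takes the same value at 0 and 360, which is the property the modulo-only version lacks when the peak sits near the boundary.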

### This still gives me a gaussian distribution

My apologies for the confusion. It seems I misunderstood your original
request. If you want to transform a Gaussian distribution into a uniform
distribution, a direct approach like applying inverse transform sampling
will not achieve this because it's used to generate samples following
the original distribution, rather than transforming a given distribution
itself.

Instead, if your data is Gaussian-distributed and you want to
effectively "flatten" it to become uniformly distributed, you can do the
following:

1.  Normalize your Gaussian distribution data using its CDF to map it to
    a uniform distribution in the unit interval \[0,1\].
2.  Scale this to your desired interval, such as \[0, 360\].

Here’s how you might achieve this transformation using Python:

``` ipython
import numpy as np
from scipy.stats import norm

# Original Gaussian parameters
mean_angle = 180
stddev_angle = 30

# Generate Gaussian-distributed data
num_samples = 1000
gaussian_data = np.random.normal(loc=mean_angle, scale=stddev_angle, size=num_samples)

# Convert Gaussian data to uniform distribution using its CDF
uniform_data_01 = norm.cdf(gaussian_data, loc=mean_angle, scale=stddev_angle)

# Map the uniform distribution [0,1] to [0, 360]
uniform_data_scaled = uniform_data_01 * 360

print(uniform_data_scaled)
```

### Explanation

-   **`norm.cdf(...)`**: This step maps the Gaussian data to a uniform
    distribution over \[0,1\]. The CDF computes the probability
    (fraction) that the Gaussian variable is below a certain value,
    which maps directly to a uniform distribution.

-   **Scaling**: After the data is converted to a uniform \[0,1\]
    distribution, multiply by 360 to stretch this to the desired angular
    range.

This method will effectively "flatten" the Gaussian to uniform. Please
make sure that the distribution properties (e.g., its domain and
wrap-around behavior) are correctly accounted for during this process.
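
One way to check that the CDF mapping really flattens the data is a Kolmogorov-Smirnov test against the uniform distribution on \[0, 1\]. The sketch below assumes the mean and standard deviation are known exactly, and also shows what happens when the assumed mean is off by 20 degrees (a made-up mismatch for illustration).

```python
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(0)
mean_angle, stddev_angle = 180.0, 30.0
gaussian_data = rng.normal(mean_angle, stddev_angle, size=5000)

# Map through the CDF with the correct parameters and compare to U(0, 1)
u = norm.cdf(gaussian_data, loc=mean_angle, scale=stddev_angle)
result = kstest(u, "uniform")
print("correct parameters:", result.statistic, result.pvalue)

# Same mapping with a deliberately wrong mean: the output is no longer flat
u_bad = norm.cdf(gaussian_data, loc=mean_angle + 20.0, scale=stddev_angle)
result_bad = kstest(u_bad, "uniform")
print("wrong mean:", result_bad.statistic, result_bad.pvalue)
```

A small KS statistic means the mapped values are close to uniform. The mapping is only as good as the parameters (or the fitted CDF) you feed into it.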

### But I get the indices from the original distribution that are kept in the new sample? I can't just generate new data.

To address your requirement where you need to transform existing
Gaussian-distributed data into a uniform distribution while keeping
track of indices, you can follow this procedure, which will leverage the
cumulative distribution function (CDF) to map your existing data into a
uniform distribution:

1.  Compute the CDF values of your existing data.
2.  Use these CDF values as your new uniform samples.
3.  Optionally scale these uniform samples to your desired range, such
    as \[0, 360\].

Here's how you can do this while maintaining the indices of the original
data:

``` ipython
import numpy as np
from scipy.stats import norm

# Original Gaussian-distributed data (e.g., angles)
mean_angle = 180
stddev_angle = 30
num_samples = 1000

# Simulate Gaussian data (for demonstration; replace with your actual data)
gaussian_data = np.random.normal(loc=mean_angle, scale=stddev_angle, size=num_samples)

# Sort the data to properly map their CDF values
sorted_indices = np.argsort(gaussian_data)
sorted_data = gaussian_data[sorted_indices]

# Compute the CDF values for the sorted data
cdf_values = norm.cdf(sorted_data, loc=mean_angle, scale=stddev_angle)

# Map CDF values to [0, 360] to get uniform distribution
uniform_mapped = cdf_values * 360

# Reorder the uniform distribution to match original indices
uniform_data_with_original_indices = np.empty_like(uniform_mapped)
uniform_data_with_original_indices[sorted_indices] = uniform_mapped

# Output the uniform distribution and original indices
print(uniform_data_with_original_indices)
```

This approach retains the original indices of the data:

-   **Sorting**: Sorting is optional here, because `norm.cdf` is applied
    elementwise. It only makes the CDF values monotonically increasing,
    which can be convenient for inspection.

-   **Index Tracking**: Use `np.argsort` to keep track of the original
    indices, and then reorder the mapped data back to the original order
    using these indices.

This method transforms your existing Gaussian-distributed data to a
uniform distribution by leveraging the properties of the CDF to map
percentiles properly. This ensures you are not generating new data and
can track transformations to the original samples.
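
Because the CDF is applied elementwise, the sort and unsort steps should not change the result. The following sketch compares the direct mapping with the sort-map-unsort round trip; the names `direct` and `roundtrip` are introduced here only for the comparison.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mean_angle, stddev_angle = 180.0, 30.0
gaussian_data = rng.normal(mean_angle, stddev_angle, size=1000)

# Direct mapping: output element i belongs to input element i
direct = norm.cdf(gaussian_data, loc=mean_angle, scale=stddev_angle) * 360.0

# Sort, map, and write back to the original positions
order = np.argsort(gaussian_data)
roundtrip = np.empty_like(direct)
roundtrip[order] = norm.cdf(gaussian_data[order], loc=mean_angle, scale=stddev_angle) * 360.0

print("identical results:", np.allclose(direct, roundtrip))
```

In both cases position `i` of the output corresponds to position `i` of the original data, so no separate index bookkeeping is needed for this transformation.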

### This is not exactly what I want. I want to subsample from the original gaussian and the subsample angles to be uniformly distributed

To subsample from an existing Gaussian distribution in such a way that
the angles in the subsample are uniformly distributed, you can use the
following approach:

1.  **Calculate the CDF**: Compute the cumulative distribution function
    (CDF) for your Gaussian data.

2.  **Generate Uniform Random Values**: Generate uniform random values
    that cover the complete domain of the CDF.

3.  **Inverse of CDF**: Use these uniform random values to find
    corresponding values in your Gaussian data by inverting the CDF.

Here’s how to implement it in Python:

``` python
import numpy as np
from scipy.stats import norm

# Original Gaussian-distributed data (angles)
mean_angle = 180
stddev_angle = 30
num_samples = 1000

# Simulate Gaussian data (for demonstration; replace with your actual data)
gaussian_data = np.random.normal(loc=mean_angle, scale=stddev_angle, size=num_samples)

# Sort the data and compute CDF values
sorted_indices = np.argsort(gaussian_data)
sorted_data = gaussian_data[sorted_indices]
cdf_values = norm.cdf(sorted_data, loc=mean_angle, scale=stddev_angle)

# Generate uniformly spaced CDF values corresponding to the number of samples you want to extract
sample_size = 500  # Number of uniformly distributed samples needed
uniform_cdf_values = np.linspace(0, 1, sample_size, endpoint=False)[1:]  # Avoid exact 0 and 1 for CDF

# Find the indices where these uniform CDF values would fit into the sorted CDF
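# Clip so that a target above the largest CDF value cannot index past the end
subsampled_indices = np.clip(np.searchsorted(cdf_values, uniform_cdf_values), 0, len(cdf_values) - 1)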

# Get the corresponding samples from the original data
uniformly_distributed_samples = sorted_data[subsampled_indices]

# Optionally restore to original positions
original_indices_subsampled = sorted_indices[subsampled_indices]

# Output the subsampled angles and their indices in the original dataset
print("Subsampled Angles:", uniformly_distributed_samples)
print("Original Indices of Subsampled Data:", original_indices_subsampled)
```

### Explanation:

-   **Sorting**: First, sort your Gaussian data to compute the proper
    CDF values.

-   **Uniform CDF Values**: Generate uniformly spaced CDF values
    corresponding to the part of the distribution you want to sample.

-   **Indexing with `np.searchsorted`**: Find indices in the sorted CDF
    that approximate these uniform CDF values, effectively mapping
    uniform distribution to your Gaussian data.

-   **Subsample Extraction**: Use these indices to extract the subsample
    from your sorted dataset, resulting in a uniform distribution of
    subsampled angles.

This approach allows you to extract a uniformly distributed subsample
from an existing set of Gaussian-distributed angles without generating
new data.

### This does not work

I apologize for the previous confusion. Let's try a different approach
to subsample from your existing Gaussian distribution such that the
subsample should have its values (angles) uniformly distributed across
the range. Here’s a method that might better achieve what you’re aiming
for:

1.  **Sort and Rank Data**: Start by sorting the Gaussian data. This
    will allow us to map ranks to intervals.

2.  **Partition into Equal Intervals**: The sorted data is partitioned
    so that equal numbers of samples fall into each interval that spans
    from the minimum to the maximum value.

3.  **Randomly Select Within Each Interval**: From each interval,
    randomly pick a sample. This ensures uniformity in the overall
    subsample distribution.

Here is the implementation in Python:

``` ipython
import numpy as np
import matplotlib.pyplot as plt

# Example Gaussian-distributed data
mean_angle = 180
stddev_angle = 30
num_samples = 1000

# Simulate Gaussian data (for demonstration; replace with your actual data)
gaussian_data = np.random.normal(loc=mean_angle, scale=stddev_angle, size=num_samples)

# Number of uniform samples you want
sample_size = 100

# Sort data
sorted_data = np.sort(gaussian_data)

# Partition into uniform intervals
step = num_samples // sample_size

# Choose one random index from each interval
selected_indices = [np.random.randint(i * step, (i + 1) * step) for i in range(sample_size)]
subsample = sorted_data[selected_indices]

plt.hist(subsample)
plt.show()
```

### Explanation

-   **Sorting**: The data is sorted to make partitioning effective.

-   **Equal Intervals**: The total number of samples is divided by the
    desired sample size, determining step size, which defines the
    intervals in the sorted data.

-   **Random Selection**: Within each interval, a random selection
    guarantees that each interval contributes to the final uniform
    subsample, simulating a more uniform distribution overall.

This approach ensures the selected subsample is spread across the entire
range of the original dataset, approximating a uniform distribution by
selecting from equal quantiles. Adjust the `sample_size` parameter to
control the granularity of uniformity.
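
Rather than judging the histogram by eye, you could quantify how flat a subsample is with a chi-square test on equal-width bins. This is a rough sketch: the helper name `uniformity_pvalue`, the choice of 12 bins, and the synthetic `flat` and `peaked` arrays are all assumptions for illustration.

```python
import numpy as np
from scipy.stats import chisquare

def uniformity_pvalue(angles, num_bins=12, low=0.0, high=360.0):
    """Chi-square p-value for 'counts in equal-width bins are equal'.

    Values outside [low, high] are ignored by np.histogram.
    """
    counts, _ = np.histogram(angles, bins=num_bins, range=(low, high))
    return chisquare(counts).pvalue

rng = np.random.default_rng(2)
flat = rng.uniform(0.0, 360.0, size=2000)            # reference: truly uniform angles
peaked = rng.normal(180.0, 30.0, size=2000) % 360.0  # reference: Gaussian angles

p_flat = uniformity_pvalue(flat)
p_peaked = uniformity_pvalue(peaked)
print("uniform sample p-value:", p_flat)
print("gaussian sample p-value:", p_peaked)
```

Passing your own `subsample` to `uniformity_pvalue` gives a number to compare against these two references: a p-value near zero indicates the subsample is still far from uniform.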

### the sample is clearly gaussian

I apologize for the oversight. Let's refine the approach to achieve the
desired uniform distribution from your Gaussian sample:

1.  **Histogram Binning**: Divide the range of your data into
    equal-sized bins.
2.  **Select a Fixed Number of Samples From Each Bin**: Randomly pick a
    fixed number of samples from each bin to ensure uniform
    representation across the bins.

Here's the corrected implementation:

``` ipython
import numpy as np
import matplotlib.pyplot as plt

# Original Gaussian-distributed data
mean_angle = 180
stddev_angle = 30
num_samples = 1000

# Simulate Gaussian data (replace with your actual data)
gaussian_data = np.random.normal(loc=mean_angle, scale=stddev_angle, size=num_samples)

# Number of bins and subsamples per bin
num_bins = 10
samples_per_bin = 10

# Create histogram bins
counts, bin_edges = np.histogram(gaussian_data, bins=num_bins)

# Collect uniform samples from each bin
subsample = []
for i in range(num_bins):
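    # np.histogram closes the last bin on the right, so include the maximum there
    is_last = i == num_bins - 1
    upper = (gaussian_data <= bin_edges[i+1]) if is_last else (gaussian_data < bin_edges[i+1])
    bin_mask = (gaussian_data >= bin_edges[i]) & upper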
    bin_data = gaussian_data[bin_mask]
    if len(bin_data) >= samples_per_bin:
        chosen_samples = np.random.choice(bin_data, samples_per_bin, replace=False)
    else:
        chosen_samples = bin_data  # take whatever is available for smaller bins
    subsample.extend(chosen_samples)

plt.hist(subsample, bins=num_bins)
plt.title("Uniform Subsampled Data")
plt.show()
```

### Explanation:

-   **Histogram Binning**: The data is divided into bins to partition it
    into segments.

-   **Random Sampling**: A fixed number of points are randomly selected
    from each bin to ensure each bin contributes equally to the final
    subsample, leading to a uniform distribution.

-   **Handling Sparse Bins**: If a bin has fewer points than required,
    all available points are taken.

This approach attempts to create uniformity by sampling evenly across
the distribution’s span. You may need to adjust `num_bins` and
`samples_per_bin` to suit your dataset's characteristics.
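
If you need the indices of the kept samples rather than their values, the same binning idea can be written to return indices into the original array. The sketch below is one possible variant, not a definitive implementation: the helper name `uniform_subsample_indices` and its `low`/`high` arguments are hypothetical, and it takes the same number of samples from every bin by letting the sparsest bin set the quota.

```python
import numpy as np

def uniform_subsample_indices(angles, num_bins=10, low=None, high=None, rng=None):
    """Return sorted indices into `angles` with equal counts per equal-width bin."""
    rng = np.random.default_rng() if rng is None else rng
    angles = np.asarray(angles, dtype=float)
    low = angles.min() if low is None else low
    high = angles.max() if high is None else high
    edges = np.linspace(low, high, num_bins + 1)
    # Bin label per sample; clip so values equal to `high` land in the last bin
    labels = np.clip(np.digitize(angles, edges) - 1, 0, num_bins - 1)
    in_range = (angles >= low) & (angles <= high)
    members = [np.flatnonzero(in_range & (labels == b)) for b in range(num_bins)]
    per_bin = min(len(m) for m in members)  # the sparsest bin sets the quota
    if per_bin == 0:
        raise ValueError("at least one bin is empty; use fewer bins or a narrower range")
    picked = [rng.choice(m, size=per_bin, replace=False) for m in members]
    return np.sort(np.concatenate(picked))

rng = np.random.default_rng(3)
gaussian_data = rng.normal(180.0, 30.0, size=20000)  # stand-in for your data

# Restrict to mean +/- 2 sigma so that every bin is reasonably populated
keep = uniform_subsample_indices(gaussian_data, num_bins=10, low=120.0, high=240.0, rng=rng)
subsample = gaussian_data[keep]
counts, _ = np.histogram(subsample, bins=10, range=(120.0, 240.0))
print("kept", keep.size, "of", gaussian_data.size, "samples; counts per bin:", counts)
```

The trade-off is that the subsample can only be uniform over a range where data exist, and its size is limited by the least populated bin, so wide ranges that reach into the Gaussian tails keep very few samples.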

**\***

To extract indices from the original Gaussian sample that would form a
uniform distribution when transformed, you can use the following
approach:

1.  **Generate Gaussian Samples**: First, create your
    Gaussian-distributed data.
2.  **Sort the Samples**: Sort these samples and associate the original
    indices.
3.  **Select the Indices Using Uniform Quantiles**: Select indices at
    uniform intervals from the sorted array.

Here's a Python implementation:

``` ipython
import numpy as np

# Parameters for the Gaussian distribution
mean_angle = 180
stddev_angle = 30

# Generate Gaussian-distributed data
num_samples = 1000
gaussian_data = np.random.normal(loc=mean_angle, scale=stddev_angle, size=num_samples)

# Track indices of the original data points
indices = np.arange(num_samples)

# Sort the data and keep track of indices
sorted_indices = np.argsort(gaussian_data)
sorted_data = gaussian_data[sorted_indices]

# Desired number of uniform samples
num_uniform_samples = 100

# Calculate indices for uniform sampling using quantiles
uniform_sampled_indices = np.round(np.linspace(0, num_samples - 1, num_uniform_samples)).astype(int)

# Get the original indices that correspond to the uniform samples
original_indices_for_uniform = sorted_indices[uniform_sampled_indices]

# The samples which will form a uniform distribution
uniform_samples = gaussian_data[original_indices_for_uniform]

print("Uniform Samples:", uniform_samples)
print("Original Indices:", original_indices_for_uniform)
```

**Explanation**:

-   **Sorting & Indexing:** We first sort the Gaussian data and keep
    track of the indices. This step allows us to select a specific
    number of evenly spaced samples from the sorted Gaussian data.
-   **Quantiles:** By selecting indices at regular intervals (using
    `np.linspace`), you mimic selecting uniform quantiles from the
    original sorted data.
-   **Original Indices:** The indices let you trace each selected
    sample back to its position in the original, unsorted dataset.

This method allows you to retain the original indices that relate to the
samples you keep, helping in tracing back to the original dataset.

### This does not work

I apologize for any confusion. Let's go through a corrected approach to
find indices from the original Gaussian-distributed data that would
produce uniformly distributed angles:

1.  **Generate Gaussian Data:** Start with your Gaussian distribution.
2.  **Sort and Compute CDFs:** Sort the data and calculate the
    cumulative distribution function (CDF).
3.  **Select Uniformly using Quantiles:** Obtain indices corresponding
    to uniform quantile intervals.

Here's a more accurate implementation to achieve this:

``` ipython
import numpy as np
from scipy.stats import norm

# Parameters for the Gaussian distribution
mean_angle = 180
stddev_angle = 30

# Generate Gaussian-distributed data
num_samples = 1000
gaussian_data = np.random.normal(loc=mean_angle, scale=stddev_angle, size=num_samples)

# Track indices of the original data points
indices = np.arange(num_samples)

# Sort the data and indices
sorted_indices = np.argsort(gaussian_data)
sorted_data = gaussian_data[sorted_indices]

# Calculate the CDF of the sorted data
cdf_values = norm.cdf(sorted_data, loc=mean_angle, scale=stddev_angle)

# Desired number of uniform samples
num_uniform_samples = 100

# Compute uniform quantiles
uniform_quantiles = np.linspace(0, 1, num_uniform_samples, endpoint=False)

# Find indices closest to the uniform quantiles
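# Clip so that a quantile above the largest CDF value cannot index past the end
uniform_indices = np.clip(np.searchsorted(cdf_values, uniform_quantiles), 0, len(cdf_values) - 1)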

# Get the original indices for these quantile positions
original_indices_for_uniform = sorted_indices[uniform_indices]

# The samples from the Gaussian distribution that map to a uniform distribution via CDF
uniform_samples = gaussian_data[original_indices_for_uniform]

print("Uniform Samples:", uniform_samples)
print("Original Indices:", original_indices_for_uniform)
```

**Explanation:**

-   **CDF Calculation:** By using the CDF, we map our Gaussian data to a
    uniform distribution.
-   **Uniform Quantiles:** Use these to find the positions in the sorted
    CDF that correspond to uniform probabilities.
-   **Searchsorted:** Finds where these quantiles would fit in the
    sorted CDF, giving indices that you can trace back to the original
    data.

This approach should help you take indices from the original data such
that, when picked, they produce angles distributed uniformly.

**\***

To extract a uniform distribution from your angle distribution, you can
try the following approach:

1.  **Identify the Reference Angle**: Determine the reference angle
    where your original distribution is peaked.

2.  **Modify Probability Distribution**: Instead of drawing directly
    from your original angle distribution, apply a transformation to
    alter the probability of selecting each angle. One way is to define
    a probability density function (PDF) that favors angles around the
    reference + 180°.

3.  **Gaussian Transformation**: Create a Gaussian distribution centered
    at the reference + 180°. This will serve as your weighting function.

    $$
      f(\theta) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(\theta - (\text{reference} + 180^\circ))^2}{2\sigma^2}\right)
      $$

    Here, $\sigma$ is the standard deviation that controls the spread of
    your weighting function.

4.  **Resampling**: Use the weighting function $f(\theta)$ to adjust the
    likelihood of selecting each angle from your distribution. This
    means you'll have higher probabilities of selecting angles around
    reference + 180°, effectively reducing the contribution from the
    peak at the reference angle.

5.  **Normalization**: Since you're changing the distribution by
    considering a new PDF, ensure the resulting set maintains proper
    normalization. You might need to renormalize the resulting weights
    to sum up to one when generating the final set of angles.

6.  **Inverse Transformation Sampling**: If necessary, use inverse
    transform sampling or rejection sampling with these new weights to
    generate angles that follow the modified distribution.

By applying this strategy, angles around the initial peak become less
likely, and those around the reference + 180° become more prominent,
helping achieve a more uniform-like distribution across the 0 to 360°
range.

### How do I implement this in python?

To implement the extraction of a uniform-like distribution from your
angular distribution in Python, you can follow these steps. I'll provide
a basic outline using NumPy, assuming you've established the initial
angle distribution and identified the reference angle.

### Step-by-step Python Implementation:

``` ipython
import numpy as np
import matplotlib.pyplot as plt

# Generate a sample distribution of angles with Gaussian noise centered around a reference angle
n_samples = 10000  # Number of samples in the distribution
reference_angle = 30  # Example reference angle in degrees
sigma_original = 10  # Standard deviation of the original angle distribution

# Create the original angle distribution centered at the reference angle
original_angles = np.random.normal(reference_angle, sigma_original, n_samples) % 360

# Set up parameters for the Gaussian weighting function centered at reference + 180 degrees
offset_angle = (reference_angle + 180) % 360
sigma_weight = 30  # Standard deviation for the weighting function

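def gaussian_weight(theta, center, sigma):
    # Signed circular difference in [-180, 180), so the weight peaks at `center`
    delta = (theta - center + 180) % 360 - 180
    return np.exp(-0.5 * delta**2 / sigma**2)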

# Calculate weights for the inverse probability distribution
weights = gaussian_weight(original_angles, offset_angle, sigma_weight)

# Resample according to the inverse likelihood (e.g., higher weight means less probability to select)
prob_weights = 1 - (weights / weights.max())
prob_weights /= prob_weights.sum()  # Normalize probabilities

# Draw samples based on the calculated probabilities
uniform_like_samples = np.random.choice(original_angles, size=n_samples, p=prob_weights)

# Plot results to visualize
plt.hist(uniform_like_samples, bins=360, range=(0, 360), alpha=0.5, label="Transformed Distribution")
plt.hist(original_angles, bins=360, range=(0, 360), alpha=0.5, label="Original Distribution")
plt.axvline(reference_angle, color='r', linestyle='dashed', linewidth=1, label='Reference Angle')
plt.axvline(offset_angle, color='g', linestyle='dashed', linewidth=1, label='Offset Angle')
plt.legend()
plt.xlabel("Angle (Degrees)")
plt.ylabel("Frequency")
plt.show()
```

### Key Points:

-   **Original Sample Generation**: Uses a Gaussian distribution to
    generate initial angle samples centered at a reference angle.
-   **Weighting Function**: Defines a Gaussian weighting function
    centered at the reference angle + 180°.
-   **Probabilistic Resampling**: Resamples angles using a probability
    inversely proportional to the weighting function values, encouraging
    uniformity.
-   **Result Visualization**: Plots both the original and the
    transformed distributions for comparison.

This example assumes you have a clear peak in the initial distribution
and want to work with the angular space defined in degrees. Adjust
`n_samples`, `sigma_original`, and `sigma_weight` as needed to better
match the specifics of your dataset or desired results.

### There must be an error: the resulting distribution is far from uniform and still peaked at the reference

It seems like the approach used is not having the intended effect. Let's
refine the method to better flatten the distribution. A different
strategy involves using the Gaussian weighting function to selectively
reduce the sampling probability from the original peak, encouraging
selection of angles further away from it. Here is a refined approach:

``` ipython
import numpy as np
import matplotlib.pyplot as plt

# Parameters
n_samples = 10000  # Number of samples to generate
reference_angle = 30  # Example reference angle in degrees
sigma_original = 10  # Std deviation of original Gaussian peak
sigma_weight = 30  # Std deviation for the weighting function

# Generate a sample distribution of angles centered around the reference angle
original_angles = np.random.normal(reference_angle, sigma_original, n_samples) % 360

# Calculate weights for each angle in the transformed distribution
def gaussian_weight(theta, center, sigma):
    return np.exp(-0.5 * ((theta - center + 180) % 360 - 180)**2 / sigma**2)

weights = gaussian_weight(original_angles, reference_angle, sigma_weight)

# Invert the weights to reduce probability near the original peak
inverted_weights = 1.0 / (weights + np.finfo(float).eps)  # Avoid division by zero

# Normalize the inverted weights to form a probability distribution
prob_weights = inverted_weights / inverted_weights.sum()

# Resample according to the modified weights
uniform_like_samples = np.random.choice(original_angles, size=n_samples, p=prob_weights)

# Plot results
plt.figure(figsize=(10, 5))
plt.hist(original_angles, bins=360, range=(0, 360), alpha=0.5, label="Original Distribution")
plt.hist(uniform_like_samples, bins=360, range=(0, 360), alpha=0.5, label="Transformed Distribution")
plt.axvline(reference_angle, color='r', linestyle='dashed', linewidth=1, label='Reference Angle')
plt.legend()
plt.xlabel("Angle (Degrees)")
plt.ylabel("Frequency")
plt.title("Original and Transformed Angle Distributions")
plt.show()
```

### Key Changes and Explanation:

-   **Weight Inversion**: We're now inverting the weights directly
    derived from the Gaussian around the reference angle to reduce their
    influence in that region.
-   **Normalization**: The inverted weights are normalized to form a
    valid probability distribution for resampling.
-   **Sampling Process**: The resampling step now uses the modified
    weights to select angles with reduced probabilities near the
    original peak.

This should help in flattening the original peak and result in a more
uniform-like distribution, with higher probabilities assigned to angles
previously less represented. Adjust `sigma_weight` to control the spread
of the new weighting function and achieve the desired level of
uniformity.
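
The weight inversion described above can also be tied to the local density of the data itself, instead of a separately chosen `sigma_weight`. As a hedged sketch, the snippet below estimates the density with a histogram and gives each sample a weight of one over the population of its bin. The 36 bins, the resample size of 5000, and the wide `sigma_original = 75` (chosen so that every bin is occupied) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
reference_angle, sigma_original = 30.0, 75.0
original_angles = rng.normal(reference_angle, sigma_original, size=20000) % 360.0

# Estimate the local density with a histogram over the full circle
num_bins = 36
counts, edges = np.histogram(original_angles, bins=num_bins, range=(0.0, 360.0))
labels = np.clip(np.digitize(original_angles, edges) - 1, 0, num_bins - 1)

# Weight = 1 / (population of the sample's bin), so every occupied bin
# carries the same total selection probability
weights = 1.0 / counts[labels]
weights /= weights.sum()

# Draw indices (with replacement) so the link to the original data is kept
picked = rng.choice(original_angles.size, size=5000, replace=True, p=weights)
flattened = original_angles[picked]

flat_counts, _ = np.histogram(flattened, bins=num_bins, range=(0.0, 360.0))
print("original counts  min/max:", counts.min(), counts.max())
print("resampled counts min/max:", flat_counts.min(), flat_counts.max())
```

Because sampling is done with replacement, samples from sparse regions are repeated, and with a narrow peak (for example `sigma_original = 10`) most of the circle contains no data at all, so no reweighting can fill it.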

### This is almost working but now it samples too many angles at reference + 180. I am sure that there should be a way to make it uniform

To achieve a more uniform distribution, we might need a different
approach that avoids biasing the sampling to specific regions, such as
the reference + 180°. A potential method involves:

1.  Using a cumulative distribution function (CDF) to approximate the
    original distribution.
2.  Applying transformations to produce a uniform distribution from the
    CDF.

Here's a step-by-step guide to implement this:

``` python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Parameters
n_samples = 10000
reference_angle = 30
sigma_original = 75

# Generate the original angle distribution centered at the reference angle
original_angles = np.random.normal(reference_angle, sigma_original, n_samples) % 360

# Kernel Density Estimate to approximate the PDF of the original distribution
kde = gaussian_kde(original_angles, bw_method='silverman')
x_values = np.linspace(0, 360, 1000)
pdf_values = kde(x_values)

# Normalize PDF to create a valid probability distribution function
pdf_values /= np.sum(pdf_values)

# Compute cumulative distribution function (CDF)
cdf_values = np.cumsum(pdf_values)
cdf_values /= cdf_values[-1]  # Normalize to ensure it ends at 1

# Inverse transform sampling: map uniform random numbers through the inverse CDF
random_values = np.random.rand(n_samples)
uniform_like_samples = np.interp(random_values, cdf_values, x_values)

# Plot the results
plt.figure(figsize=(10, 5))
plt.hist(original_angles, bins=360, range=(0, 360), alpha=0.5, label="Original Distribution")
plt.hist(uniform_like_samples, bins=360, range=(0, 360), alpha=0.5, label="Transformed Distribution")
plt.axvline(reference_angle, color='r', linestyle='dashed', linewidth=1, label='Reference Angle')
plt.legend()
plt.xlabel("Angle (Degrees)")
plt.ylabel("Frequency")
plt.title("Original and Transformed Angle Distributions")
plt.show()
```

**Key Steps and Explanation:**

-   **Kernel Density Estimation (KDE):** Approximates the PDF of the
    original angle distribution.
-   **CDF Computation:** The CDF is derived from the approximated PDF to
    map distribution to a uniform space.
-   **Inverse Transform Sampling:** Uses the CDF to draw samples
    uniformly across the angle space.

This process should help in creating a more balanced angle distribution
by transforming the original distribution into uniform-like samples.
Adjust the bandwidth in the KDE if necessary for better PDF
approximation.

### This does not work

I apologize for the confusion. Let's revisit the approach to ensure a
truly uniform distribution across angles from your original
distribution. Here's a refined method that leverages histogram
equalization, a classical method to achieve uniform distributions from
an existing data distribution:

``` python
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d

# Parameters
n_samples = 10000  # Number of samples to generate
reference_angle = 30  # Example reference angle in degrees
sigma_original = 75  # Standard deviation of the original distribution

# Generate a sample distribution centered around the reference angle
original_angles = np.random.normal(reference_angle, sigma_original, n_samples) % 360

# Compute histogram of the original angles
hist_values, bin_edges = np.histogram(original_angles, bins=360, range=(0, 360), density=True)

# Compute the cumulative distribution function (CDF)
cdf_values = np.cumsum(hist_values)
cdf_values /= cdf_values[-1]  # Normalize to make it a valid CDF

# Interpolate the inverse CDF
inverse_cdf = interp1d(cdf_values, bin_edges[:-1])

# Generate uniform random samples
random_values = np.random.rand(n_samples)

# Map uniform samples through the inverse CDF to transform them into angle samples
uniform_like_samples = inverse_cdf(random_values)

# Plot results
plt.figure(figsize=(10, 5))
plt.hist(original_angles, bins=360, range=(0, 360), alpha=0.5, label="Original Distribution")
plt.hist(uniform_like_samples, bins=360, range=(0, 360), alpha=0.5, label="Transformed Distribution")
plt.axvline(reference_angle, color='r', linestyle='dashed', linewidth=1, label='Reference Angle')
plt.legend()
plt.xlabel("Angle (Degrees)")
plt.ylabel("Frequency")
plt.title("Original and Transformed Angle Distributions")
plt.show()
```

**Explanation:**

-   **Histogram and CDF Calculation**: This approach uses the histogram
    of your original distribution to create a cumulative distribution
    function (CDF).

-   **Inverse CDF Interpolation**: By interpolating the inverse CDF, we
    can map uniform random samples into the angle space of the original
    distribution.

-   **Uniform Sampling**: The uniform random values are mapped through
    the inverse CDF to result in transformed samples that should ideally
    spread more uniformly across the entire 0 to 360-degree range.

This approach aims at distributing samples more uniformly by equalizing
them across the cumulative distribution of the input data. Adjust
parameters such as the histogram bins for finer granularity to achieve
the desired uniformity.
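
For comparison, histogram equalization in the usual sense applies the CDF forward to the data rather than drawing new samples through its inverse. The following is a minimal sketch of that forward mapping, assuming a rank-based empirical CDF; the helper name `equalize_angles` is hypothetical and not part of the code above.

``` python
import numpy as np

def equalize_angles(angles_deg):
    """Map each angle to 360 * F(angle), where F is the rank-based empirical CDF."""
    angles_deg = np.asarray(angles_deg, dtype=float)
    n = angles_deg.size
    # argsort of argsort gives each sample's rank (0 .. n-1) in sorted order
    ranks = np.argsort(np.argsort(angles_deg))
    # Mid-rank convention keeps the outputs strictly inside (0, 360)
    return 360.0 * (ranks + 0.5) / n

rng = np.random.default_rng(0)
original_angles = rng.normal(30, 75, 10_000) % 360  # example data (assumed)
equalized = equalize_angles(original_angles)

counts, _ = np.histogram(equalized, bins=36, range=(0, 360))
print(counts.min(), counts.max())
```

The output values are no longer the original angles; they are CDF positions rescaled to degrees, so this suits reshaping a dataset rather than selecting trials from it.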

### This raises an error

``` example
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[125], line 27
     24 random_values = np.random.rand(n_samples)
     26 # Map uniform samples through the inverse CDF to transform them into angle samples
---> 27 uniform_like_samples = inverse_cdf(random_values)
     29 # Plot results
     30 plt.figure(figsize=(10, 5))

File ~/mambaforge/envs/torch/lib/python3.10/site-packages/scipy/interpolate/_polyint.py:80, in _Interpolator1D.__call__(self, x)
     59 """
     60 Evaluate the interpolant
     61
   (...)
     77
     78 """
     79 x, x_shape = self._prepare_x(x)
---> 80 y = self._evaluate(x)
     81 return self._finish_y(y, x_shape)

File ~/mambaforge/envs/torch/lib/python3.10/site-packages/scipy/interpolate/_interpolate.py:755, in interp1d._evaluate(self, x_new)
    753 y_new = self._call(self, x_new)
    754 if not self._extrapolate:
--> 755     below_bounds, above_bounds = self._check_bounds(x_new)
    756     if len(y_new) > 0:
    757         # Note fill_value must be broadcast up to the proper size
    758         # and flattened to work here
    759         y_new[below_bounds] = self._fill_value_below

File ~/mambaforge/envs/torch/lib/python3.10/site-packages/scipy/interpolate/_interpolate.py:784, in interp1d._check_bounds(self, x_new)
    782 if self.bounds_error and below_bounds.any():
    783     below_bounds_value = x_new[np.argmax(below_bounds)]
--> 784     raise ValueError("A value ({}) in x_new is below "
    785                      "the interpolation range's minimum value ({})."
    786                      .format(below_bounds_value, self.x[0]))
    787 if self.bounds_error and above_bounds.any():
    788     above_bounds_value = x_new[np.argmax(above_bounds)]

ValueError: A value (0.0015298750278947004) in x_new is below the interpolation range's minimum value (0.004300000000000001).
```

The error occurs because some entries of `random_values` fall below the
smallest value in `cdf_values`. `np.random.rand` draws from `[0, 1)`,
but the CDF built from the histogram starts at the probability mass of
the first bin (about 0.0043 here), so `interp1d` has no data point to
interpolate from for smaller inputs. To fix this, extend the
interpolation so that `inverse_cdf` is defined on the whole interval
from 0 to 1.

Here’s a revised version of the code to address this:

``` ipython
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d

# Parameters
n_samples = 10000  # Number of samples to generate
reference_angle = 30  # Example reference angle in degrees
sigma_original = 75  # Standard deviation of the original distribution

# Generate a sample distribution centered around the reference angle
original_angles = np.random.normal(reference_angle, sigma_original, n_samples) % 360

# Compute histogram of the original angles
hist_values, bin_edges = np.histogram(original_angles, bins=360, range=(0, 360), density=True)

# Compute the cumulative distribution function (CDF)
cdf_values = np.cumsum(hist_values)
cdf_values /= cdf_values[-1]  # Normalize to make it a valid CDF

# Adjust bin edges and CDF values for interpolation
extended_bin_edges = np.concatenate(([0], bin_edges[1:], [360]))
extended_cdf_values = np.concatenate(([0], cdf_values, [1]))

# Interpolate the inverse CDF over its full range
inverse_cdf = interp1d(extended_cdf_values, extended_bin_edges, bounds_error=False, fill_value=(0, 360))

# Generate uniform random samples
random_values = np.random.rand(n_samples)

# Map uniform samples through the inverse CDF to transform them into angle samples
uniform_like_samples = inverse_cdf(random_values)

# Plot results
plt.figure(figsize=(10, 5))
plt.hist(original_angles, bins=360, range=(0, 360), alpha=0.5, label="Original Distribution")
plt.hist(uniform_like_samples, bins=360, range=(0, 360), alpha=0.5, label="Transformed Distribution")
plt.axvline(reference_angle, color='r', linestyle='dashed', linewidth=1, label='Reference Angle')
plt.legend()
plt.xlabel("Angle (Degrees)")
plt.ylabel("Frequency")
plt.title("Original and Transformed Angle Distributions")
plt.show()
```

### Key Changes:

-   **Extended CDF and Bin Edges:** Adding `[0]` to the start and
    `[360]` to the end of the bin edges, accompanied by `0` and `1` for
    the CDF, ensures that the interpolation is defined across the entire
    `[0, 1]` interval.
-   **Fill Value for Interpolation:** Setting `bounds_error=False` with
    `fill_value=(0, 360)` in `interp1d` makes any out-of-range input
    return 0 or 360 instead of raising an error.

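This revised code runs without interpolation boundary errors. Keep in
mind that mapping uniform draws through the inverse CDF produces samples
that follow the original histogram, so the transformed distribution will
resemble the original one rather than a flat one.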
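
An alternative that sidesteps the bounds check altogether is `np.interp`, which clamps out-of-range inputs to the end values. A minimal sketch, assuming 36 bins so that every bin is populated and the CDF is strictly increasing:

``` python
import numpy as np

rng = np.random.default_rng(0)
original_angles = rng.normal(30, 75, 10_000) % 360  # example data (assumed)

counts, bin_edges = np.histogram(original_angles, bins=36, range=(0, 360))
# CDF evaluated at every bin edge: 0 at 0 degrees, 1 at 360 degrees
cdf_at_edges = np.concatenate(([0.0], np.cumsum(counts) / counts.sum()))

# np.interp(x, xp, fp): xp is the CDF and fp the angle, so this evaluates the inverse CDF
u = rng.random(10_000)
resampled = np.interp(u, cdf_at_edges, bin_edges)

print(resampled.min(), resampled.max())
```

Because the CDF array already starts at 0 and ends at 1, no padding or `fill_value` handling is needed.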

**\***

To balance the distribution and prevent over-sampling around reference +
180°, we can refine the method by combining the current weight inversion
technique with a broader and more moderate adjustment of probabilities.
This can be done by weighting less heavily against the region around the
reference and more uniformly reducing the differences across the range.

Here's an updated version:

``` python
import numpy as np
import matplotlib.pyplot as plt

# Parameters
n_samples = 10000  # Number of samples to generate
reference_angle = 30  # Example reference angle in degrees
sigma_original = 10  # Std deviation of original Gaussian peak
sigma_adjust = 180  # Increased standard deviation for adjustment

# Generate a sample distribution of angles centered around the reference angle
original_angles = np.random.normal(reference_angle, sigma_original, n_samples) % 360

# Weight adjustment function
def uniform_adjustment(theta, center, sigma):
    # Create a broader distribution that moderately favors uniformity
    adjustment = np.exp(-0.5 * ((theta - center + 180) % 360 - 180)**2 / sigma**2)
    return 1.0 / (adjustment + np.finfo(float).eps)

# Calculate moderate adjustment weights
weights = uniform_adjustment(original_angles, reference_angle, sigma_adjust)
prob_weights = weights / weights.sum()  # Normalize

# Resample according to the adjusted weights
uniform_like_samples = np.random.choice(original_angles, size=n_samples, p=prob_weights)

# Plot results
plt.figure(figsize=(10, 5))
plt.hist(original_angles, bins=360, range=(0, 360), alpha=0.5, label="Original Distribution")
plt.hist(uniform_like_samples, bins=360, range=(0, 360), alpha=0.5, label="Transformed Distribution")
plt.axvline(reference_angle, color='r', linestyle='dashed', linewidth=1, label='Reference Angle')
plt.legend()
plt.xlabel("Angle (Degrees)")
plt.ylabel("Frequency")
plt.title("Original and Transformed Angle Distributions")
plt.show()
```

### Key Adjustments:

-   **Broader Sigma (`sigma_adjust`)**: Instead of strictly focusing on
    reducing probabilities around the initial peak and reference + 180°,
    apply a broader smoothing to moderately make the distribution more
    uniform without overly biasing.
-   **Adjust Weights Gradually**: By setting `sigma_adjust` to a larger
    value (e.g., 180), we're applying a gentler adjustment, allowing
    more gradual changes across angles rather than sharply favoring any
    specific range. This should smooth the transition and help stabilize
    the tendency around reference + 180°.
-   **Normalized Probability Weights**: Ensure the weights are properly
    normalized to provide a valid probability distribution for sampling
    the angles.

These changes should help prevent excessive concentration of samples
around reference + 180°. Adjusting the `sigma_adjust` parameter can
further fine-tune the spread and smoothness of the distribution.
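
The weights can also be derived from the data instead of a hand-tuned `sigma_adjust`: estimate the density with a histogram and weight each sample by the inverse of its bin count. A minimal sketch, assuming 36 equal-width bins and a broad example distribution; the name `inverse_density_weights` is hypothetical.

``` python
import numpy as np

def inverse_density_weights(angles_deg, num_bins=36):
    """Weight each sample by 1 / (count of its bin), normalized to sum to 1."""
    counts, edges = np.histogram(angles_deg, bins=num_bins, range=(0, 360))
    # Bin index of every sample; clip guards against an angle equal to 360
    bin_idx = np.clip(np.digitize(angles_deg, edges) - 1, 0, num_bins - 1)
    weights = 1.0 / counts[bin_idx]  # every sample sits in a bin with count >= 1
    return weights / weights.sum()

rng = np.random.default_rng(0)
original_angles = rng.normal(30, 75, 10_000) % 360  # example data (assumed)
p = inverse_density_weights(original_angles)

# Resample indices with replacement; each occupied bin is drawn equally often on average
idx = rng.choice(original_angles.size, size=10_000, p=p)
flat_counts, _ = np.histogram(original_angles[idx], bins=36, range=(0, 360))
print(flat_counts.min(), flat_counts.max())
```

Sampling is with replacement, so points from sparse bins are repeated many times. With a narrow peak (for example `sigma_original = 10`) most bins are empty, and no reweighting can fill them.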

**\***

To extract a uniform distribution from a peaked distribution of angles,
you can use one of several methods. Here are four approaches:

1.  **Random Sampling**: If you only need a uniform sample as opposed to
    reshaping the entire distribution into a uniform one, you can
    randomly sample from the angle space between 0 and 360 degrees,
    ignoring the shape of the original distribution.

2.  **Inverse Transform Sampling**:

    -   Calculate the CDF (Cumulative Distribution Function) of the
        original distribution.
    -   Generate uniform random numbers between 0 and 1.
    -   Use the inverse of the CDF to map these uniform numbers to your
        original distribution's range of angles. This method transforms
        the non-uniform distribution into a target distribution (in this
        case, you want the output to follow a uniform distribution
        perhaps for simplicity or computational reasons).

3.  **Histogram Equalization** (if you need a processed dataset):

    -   Treat the problem as a histogram equalization task. Convert your
        angles into a form where their cumulative histogram becomes
        approximately linear, which corresponds to a uniform
        distribution.

4.  **Data Rejection**:

    -   Use rejection sampling: generate a candidate angle uniformly
        over 0-360, and accept it with probability inversely
        proportional to the value of the original distribution’s
        probability density (e.g., if the density is high, then the
        acceptance probability is lower, balancing it to form a flat
        distribution).

Choose the method that aligns best with your specific needs (whether
it's maintaining data fidelity or simply generating
uniformly-distributed samples).
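
When the goal is to keep a subset of existing angles rather than generate new ones, the rejection idea in item 4 can be applied to the data points themselves: each point is kept with a probability inversely proportional to the estimated density at its location. A minimal sketch, assuming a histogram density estimate with 36 bins; the function name `thin_to_uniform` is hypothetical.

``` python
import numpy as np

def thin_to_uniform(angles_deg, num_bins=36, rng=None):
    """Return indices of angles kept so that occupied bins end up with similar counts."""
    rng = np.random.default_rng() if rng is None else rng
    counts, edges = np.histogram(angles_deg, bins=num_bins, range=(0, 360))
    bin_idx = np.clip(np.digitize(angles_deg, edges) - 1, 0, num_bins - 1)
    # Acceptance probability: (smallest occupied bin count) / (count of this sample's bin)
    min_count = counts[counts > 0].min()
    accept_prob = min_count / counts[bin_idx]
    return np.flatnonzero(rng.random(angles_deg.size) < accept_prob)

rng = np.random.default_rng(0)
angles = rng.normal(30, 75, 10_000) % 360  # example data (assumed)
kept = thin_to_uniform(angles, rng=rng)

kept_counts, _ = np.histogram(angles[kept], bins=36, range=(0, 360))
print(len(kept), kept_counts.min(), kept_counts.max())
```

The returned array holds indices into the original vector, and each index appears at most once. The price is that the sparsest occupied bin sets how many points survive everywhere else.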

### How do I implement 2 in Python

To implement inverse transform sampling in Python to generate a uniform
distribution from a given non-uniform (peaked) distribution over angles,
you can follow these steps:

1.  Generate points based on the CDF of the original distribution.
2.  Use the inverse of the CDF to map uniform samples to your
    distribution.

Here's a basic implementation using numpy and scipy:

``` python
import numpy as np
from scipy.interpolate import interp1d

# Let's assume `angles` is your array of angles (in degrees) from the peaked distribution.
# Replace it with your actual data.
angles = np.random.normal(loc=180, scale=10, size=1000) % 360  # example data

# Step 1: Calculate the histogram and CDF of the original distribution
hist, bin_edges = np.histogram(angles, bins=360, range=(0, 360), density=True)
cdf = np.cumsum(hist) / np.sum(hist)

# Step 2: Generate uniform random numbers between 0 and 1
uniform_randoms = np.random.rand(len(angles))

# Step 3: Invert the CDF to map uniform randoms to angles
# Create an inverse CDF function using interpolation
inverse_cdf = interp1d(cdf, bin_edges[1:], kind='linear', fill_value='extrapolate')

# Map uniform randoms to the angle domain using the inverse CDF
uniform_angles = inverse_cdf(uniform_randoms)

# `uniform_angles` holds the angles obtained through the inverse CDF
print(uniform_angles)
```

This script maps uniform random numbers through the inverse of the
empirical CDF. Note that angles drawn this way follow the histogram of
the original data, so the peak is preserved; flattening the data
requires selecting or reweighting the original angles instead. Adjust
the binning based on your dataset size and characteristics for better
accuracy.

### What I need is the indices of the angles that were kept from the original vector of angles

To extract indices of the original angles such that their distribution
becomes uniform, you can employ a technique similar to inverse transform
sampling to repeatedly select indices. Here's an approach to achieve
that:

1.  **Calculate the Histogram and CDF**: Compute the CDF of your
    original angle data.
2.  **Inverse Sampling on the CDF**: Use the CDF to select indices such
    that the resulting distribution is uniform.

Here's how you can implement this in Python:

``` python
import numpy as np
from scipy.interpolate import interp1d

# Let's assume `angles` is your array of angles (in degrees) from the peaked distribution.
# Replace it with your actual data.
angles = np.random.normal(loc=180, scale=10, size=1000) % 360  # Example data

# Sort angles and get original indices
sorted_indices = np.argsort(angles)
sorted_angles = angles[sorted_indices]

# Step 1: Calculate the histogram and CDF of the sorted distribution
hist, bin_edges = np.histogram(sorted_angles, bins=360, range=(0, 360), density=True)
cdf = np.cumsum(hist) / np.sum(hist)

# Step 2: Generate uniform random numbers between 0 and 1
uniform_randoms = np.random.rand(len(angles))

# Step 3: Use the CDF to select indices
# Create an inverse CDF (for index selection) function using interpolation
inverse_cdf = interp1d(cdf, bin_edges[1:], kind='linear', bounds_error=False, fill_value=(0, 360))

# Map uniform randoms to indices
selected_bins = np.digitize(inverse_cdf(uniform_randoms), bin_edges) - 1
selected_bins = np.clip(selected_bins, 0, len(hist) - 1)  # Ensure valid bin indices
selected_indices = np.unique(sorted_indices[selected_bins])  # Unique indices

# `selected_indices` now contains indices that would help create a uniform distribution
print(selected_indices)
```

Explanation:

-   We first sort the angles and keep track of their original indices.
-   The CDF is computed on the sorted data.
-   We perform inverse sampling by generating uniform random values and
    interpolating via the inverse CDF.
-   The result is the set of indices from the sorted angles that can be
    used to yield a uniformly distributed set of angles.

Note: This technique attempts to achieve uniformity; however, due to the
discrete nature of the data and the histogram binning, the results might
not be perfectly uniform.

### This technique does not seem to be working. the distribution of selected angles is still peaked

Let's refine the approach to better balance the distribution. The goal
is to select data points such that the resulting set of angles is
uniformly distributed, which can be challenging due to sampling
variability and the discrete nature of data. Here's one strategy to try:

1.  **Equal-width Bin Selection**: Divide the angle range into
    equal-width bins and randomly pick angles from each bin.

2.  **Random Sampling from Bins**:

    -   Bin the angles.
    -   Randomly sample equal numbers of angles from each bin, if
        possible.

Here's a Python implementation of this strategy:

``` python
import numpy as np

# Let's assume `angles` is your array of angles (in degrees) from the peaked distribution.
angles = np.random.normal(loc=180, scale=10, size=1000) % 360  # Example data

# Number of bins (you can adjust this based on your data size and spread)
num_bins = 20

# Step 1: Bin the data
bin_edges = np.linspace(0, 360, num_bins + 1)
binned_indices = np.digitize(angles, bin_edges) - 1

# Step 2: Select up to a fixed number of samples from each bin
selected_indices = []

for i in range(num_bins):
    # Get indices of angles in the current bin
    indices_in_bin = np.where(binned_indices == i)[0]

    # Sample a fixed number or all if fewer samples are available
    if len(indices_in_bin) > 0:
        num_to_sample = min(5, len(indices_in_bin))  # Adjust `5` as needed
        sampled_indices = np.random.choice(indices_in_bin, num_to_sample, replace=False)
        selected_indices.extend(sampled_indices)

selected_indices = np.array(selected_indices)

# `selected_indices` contains indices of the original angles in the desired uniform manner
print(selected_indices)
```

**Explanation**:

-   **Binning**: Divide the data into a specified number of bins. This
    defines "horizontal" slices of the data, giving a structure to
    select from.

-   **Random Sampling from Bins**: From each bin, select a fixed number
    of samples (e.g., 5 per bin, adjustable). This ensures that each
    section of your distribution has a roughly equal representation in
    the sampled set.

-   **Final Indices**: Compile selected indices. The resulting set of
    angles should approximate a more uniform distribution when plotted.

Note that the number of bins and the number of samples per bin can be
adjusted according to your data size and distribution characteristics to
achieve a better uniformity balance.

**\***

### Does bins need to be 360 in `hist, bin_edges = np.histogram(sorted_angles, bins=360, range=(0, 360), density=True)`?

The choice of 360 bins in the histogram calculation is related to how
finely you want to discretize your angle space. In theory, using 360
bins corresponds to separating each degree individually, which might be
appropriate given the range of angles from 0 to 360.

However, whether you need 360 bins depends on:

1.  **Data Density**: If you have a large number of data points (e.g.,
    thousands of angles) and they cover the range densely, 360 bins
    could be appropriate. For sparse data, fewer bins might help prevent
    empty or near-empty bins.

2.  **Desired Resolution**: If you need a finer granularity in
    representing the angle distribution, you might opt for more bins.
    Conversely, fewer bins (e.g., 36 or 72) could be used for broader
    grouping of angles.

3.  **Computational Considerations**: More bins increase computational
    complexity and may require more adjustments in mapping.

You might start with a smaller number of bins and adjust based on the
resolution your analysis requires. If uniformity isn't achieved to
satisfaction, tweaking the number of bins and analyzing the results
could help optimize the outcomes.
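
A quick way to compare candidate bin counts is to look at how many bins are empty and how few samples the sparsest occupied bin holds. A minimal sketch, assuming example data of 1000 angles with a standard deviation of 10 degrees:

``` python
import numpy as np

rng = np.random.default_rng(0)
angles = rng.normal(loc=180, scale=10, size=1000) % 360  # example data (assumed)

for num_bins in (12, 36, 72, 360):
    counts, _ = np.histogram(angles, bins=num_bins, range=(0, 360))
    n_empty = int(np.sum(counts == 0))
    sparsest = int(counts[counts > 0].min())
    print(f"{num_bins:>3} bins: {n_empty:>3} empty, "
          f"sparsest occupied bin holds {sparsest} samples")
```

With data this narrow, most bins stay empty at every resolution, so the choice of `bins` only matters over the range that the data actually spans.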

**\***

### Is sorting necessary?

No, sorting isn't strictly necessary if you only want the indices of a
subset of angles. Sorting is a common convenience when computing
cumulative distributions, but the histogram and CDF can be computed from
the data in its original order, which keeps the original indices
directly usable.

If you want to avoid sorting and directly work with indices, here's an
adjusted approach:

1.  Compute the histogram and CDF directly from unsorted data.
2.  Using uniform random values, directly sample indices from the CDF to
    achieve a uniform selection.

Here's how you could implement it:

``` python
import numpy as np
from scipy.interpolate import interp1d

# Example data: angles in degrees from a peaked distribution
angles = np.random.normal(loc=180, scale=10, size=1000) % 360  # Replace with your actual data

# Step 1: Calculate the histogram and CDF of the original distribution
hist, bin_edges = np.histogram(angles, bins=360, range=(0, 360), density=False)
cdf = np.cumsum(hist) / np.sum(hist)

# Step 2: Generate uniform random numbers between 0 and 1
uniform_randoms = np.random.rand(len(angles))

# Step 3: Map uniform random numbers to indices in the original array using the inverse CDF
inverse_cdf = interp1d(cdf, np.arange(len(cdf)), kind='nearest', bounds_error=False, fill_value='extrapolate')
selected_bins = inverse_cdf(uniform_randoms).astype(int)

# Find the original indices corresponding to these bins
selected_indices = []
for b in selected_bins:
    # Find the indices for angles in this bin
    bin_indices = np.where((angles >= bin_edges[b]) & (angles < bin_edges[b+1]))[0]
    if len(bin_indices) > 0:
        selected_indices.append(np.random.choice(bin_indices))

# Convert selected indices to a numpy array and remove duplicates
selected_indices = np.unique(selected_indices)

print(selected_indices)
```

Explanation of changes:

-   The key difference here is that rather than sorting the angles and
    then referencing indices, we calculate the CDF with the data in
    place and sample from it to get indices.
-   Instead of sorting, we rely on identifying the bin each uniform
    random number corresponds to, and then we select indices directly
    based on those bins.
-   We then resolve specific indices within each bin using sampling,
    ensuring unique selections.

This approach should provide you with a set of indices that can sample a
uniform-like distribution from your original data.
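
Looking up the bin for each uniform draw does not need an interpolator. Because the histogram CDF is a step function, `np.searchsorted` returns the bin directly and is unaffected by the repeated CDF values that empty bins produce. A minimal sketch of that lookup, using the same kind of example data:

``` python
import numpy as np

rng = np.random.default_rng(0)
angles = rng.normal(loc=180, scale=10, size=1000) % 360  # example data (assumed)

counts, bin_edges = np.histogram(angles, bins=360, range=(0, 360))
cdf = np.cumsum(counts) / counts.sum()

u = rng.random(1000)
# First bin whose cumulative probability exceeds u; empty bins are never selected
selected_bins = np.searchsorted(cdf, u, side="right")

print(np.all(counts[selected_bins] > 0))
```

Bins are still drawn in proportion to their counts, so this lookup on its own reproduces the shape of the original histogram.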

**\***

### You are a computational neuroscientist. You have a dataset composed of neuronal activities. Neurons are selective to angular locations and you are trying to decode the neurons' preferred locations, given that a non-uniform distribution of stimuli is used across trials.

You came up with this to account for the non-uniform distribution:

``` ipython
import numpy as np
from sklearn.linear_model import RidgeCV

def generate_balanced_subsample(firing_rates, angles_rad, num_bins=96, sample_size=0.5):
    # Discretize the angles into bins
    bins = np.linspace(0, 2 * np.pi, num_bins + 1)
    digitized = np.digitize(angles_rad, bins, right=True)

    # List to store subsampled indices
    subsample_indices = []

    # Sample equally from each bin
    for b in range(1, len(bins)):
        bin_indices = np.where(digitized == b)[0]
        # Sample a balanced number from each bin (floor of the sample_size proportion)
        num_to_sample = int(sample_size * len(bin_indices))
        if num_to_sample > 0 and len(bin_indices) > 0:
            subsample_indices.extend(np.random.choice(bin_indices, size=num_to_sample, replace=False))

    # Convert to numpy array
    subsample_indices = np.asarray(subsample_indices)
    # print(subsample_indices.shape)

    # Create subsample
    X_sub = firing_rates[subsample_indices]
    y_sub = angles_rad[subsample_indices]

    return X_sub, y_sub

def average_neuron_selectivity_uniform(firing_rates, angles_rad, num_subsamples=1000, num_bins=96, sample_size=0.5):
    coeffs_cos = []
    coeffs_sin = []

    for _ in range(num_subsamples):
        # Generate a balanced subsample
        X_sub, y_sub = generate_balanced_subsample(firing_rates, angles_rad, num_bins, sample_size)

        # Prepare targets for cosine and sine components
        Y_sub = np.column_stack((np.cos(y_sub), np.sin(y_sub)))
        # Train Ridge regression
        reg = RidgeCV()
        reg.fit(X_sub, Y_sub)

        # print(X_sub.shape, y_sub.shape, reg.coef_.shape)

        # Store coefficients
        coeffs_cos.append(reg.coef_[0])  # Cosine coefficients
        coeffs_sin.append(reg.coef_[1])  # Sine coefficients

    # Average coefficients
    avg_coeffs_cos = np.mean(coeffs_cos, axis=0)
    avg_coeffs_sin = np.mean(coeffs_sin, axis=0)

    predicted_angles_rad = np.arctan2(avg_coeffs_sin, avg_coeffs_cos)
    return np.degrees(predicted_angles_rad) % 360
```

But I believe that the subsampling approach is not solving the imbalance
problem

Your approach addresses the imbalance by generating balanced subsamples,
but it's true that subsampling might not fully solve the problem or be
the most efficient solution. Here are a few potential concerns and
alternatives:

1.  **Data Reduction**: By subsampling, you effectively discard a
    significant portion of your data, which might lead to a loss of
    valuable information, especially if some bins are already sparsely
    populated.

2.  **Limited Re-sampling**: If certain bins are very underrepresented,
    random sampling might not capture the underlying distribution well
    enough, even after many subsamples. It could lead to high variance
    in the estimated coefficients.

3.  **Alternatives to Subsampling**:

    -   **Weighting**: Instead of balancing through subsampling, apply
        weights to the samples during training, inversely proportional
        to their frequency. This way, rare samples have a greater
        influence without discarding data.
    -   **Data Augmentation**: Create synthetic samples for
        underrepresented bins through techniques like bootstrapping or
        generating new samples using the statistics of existing data.
    -   **Smoothed Regression**: Incorporate priors or regularization
        terms that account for known biases or patterns in the data
        distribution directly in the model.

4.  **Cross-validation and Diagnostics**: Always validate the
    performance improvements through cross-validated prediction metrics
    and consider visual diagnostics to ensure the approach truly
    addresses the imbalance.

If weighting interests you, many machine learning libraries, including
scikit-learn, provide support for weighting during model fitting, which
might be beneficial in your case.
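
As a rough illustration of the weighting idea, per-trial weights can be computed from the bin counts of the stimulus angles. This is a minimal sketch, assuming equal-width bins and angles in radians within `[0, 2*pi)`; the function name `inverse_frequency_weights` is hypothetical.

``` python
import numpy as np

def inverse_frequency_weights(angles_rad, num_bins=96):
    """Per-trial weights inversely proportional to the trial count of each angle bin."""
    edges = np.linspace(0, 2 * np.pi, num_bins + 1)
    # Bin index in 0 .. num_bins-1; clip guards against an angle equal to 2*pi
    bin_idx = np.clip(np.digitize(angles_rad, edges) - 1, 0, num_bins - 1)
    counts = np.bincount(bin_idx, minlength=num_bins)
    weights = 1.0 / counts[bin_idx]
    # Rescale so that the weights average to 1
    return weights * len(weights) / weights.sum()

rng = np.random.default_rng(0)
angles_rad = np.radians(rng.normal(90, 60, 2000) % 360)  # hypothetical stimulus angles
weights = inverse_frequency_weights(angles_rad, num_bins=24)
print(weights.min(), weights.max())
```

The resulting array can be passed as `sample_weight` to `RidgeCV.fit`, for example `reg.fit(firing_rates, Y, sample_weight=weights)`, so that every trial is used while each occupied bin contributes the same total weight.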

**\***

Your approach aims to estimate the preferred angular location of
neurons by accounting for a non-uniform distribution of stimuli across
trials. Here’s a brief explanation of your method:

1.  **Balanced Subsampling:**
    -   You discretize stimulus angles into equal-width bins.
    -   For each bin, you sample a proportion (indicated by
        `sample_size`) of instances to ensure balanced representation,
        handling the non-uniform distribution.
2.  **Cosine and Sine Regression:**
    -   You prepare targets as cosine and sine components of the
        subsampled angles.
    -   Using Ridge regression, you fit these components to the
        corresponding firing rates, obtaining coefficients indicative of
        selectivity.
3.  **Coefficient Averaging:**
    -   You perform subsampling and regression multiple times
        (`num_subsamples`), collecting coefficients for both components
        across iterations.
    -   The final preference angle for each neuron is computed by
        averaging these coefficients, and you convert these averaged
        values back to angles using `arctan2`.

This technique aims to reduce the effect of stimulus imbalance by
resampling within angle bins, targeting a more robust estimate of
neuronal preferences. Note, however, that drawing a fixed proportion
from every bin leaves the relative bin counts unchanged, so each
subsample keeps the original non-uniform stimulus distribution. If you
have any specific questions or concerns about this method, feel free to
ask!

### How could this be improved?

Here are a few suggestions to potentially improve or extend your current
approach:

1.  **Cross-Validation:**
    -   Implement cross-validation to better estimate the robustness of
        your ridge regression model. This will help ensure the model
        generalizes well and that predicted angles are not due to
        overfitting.
2.  **Hyperparameter Tuning:**
    -   Expand beyond the default hyperparameter selection of RidgeCV by
        evaluating different regularization strengths, or use
        cross-validated methods like GridSearchCV for fine-tuning.
3.  **Bootstrap Aggregation (Bagging):**
    -   In addition to averaging coefficients, consider bootstrapping
        the selection of samples and combining predictions for better
        stability and more robust estimates.
4.  **Increase Sample Size Dynamically:**
    -   Instead of a fixed proportion (as defined by `sample_size`),
        adaptively determine the sample size based on the total
        available samples in each bin to maximize the data usage while
        maintaining balance.
5.  **Neuron-Specific Adjustments:**
    -   You could explore neuron-specific adjustments or weightings if
        you suspect particular neurons have differing levels of
        reliability or selectivity.
6.  **Non-Linear Models:**
    -   Experiment with non-linear models, such as kernel ridge
        regression or neural networks, if neuronal responses exhibit
        non-linearities not captured by linear regression.
7.  **Data Augmentation:**
    -   If feasible, augment the dataset using techniques such as
        jittering (adding small noise) to potentially improve model
        robustness and performance.
8.  **Feature Engineering:**
    -   Explore additional features that may provide predictive power,
        such as temporal dynamics if your dataset is time-resolved, or
        interaction terms between neurons.
9.  **Alternative Binning Strategies:**
    -   Experiment with adaptive binning techniques or clustering for
        stimulus angles that account for uneven distribution and
        differences in angle density more naturally.
10. **Consider Using Circular Statistics:**
    -   Given the circular nature of the data, consider incorporating
        circular statistical methods or models that can directly handle
        such data, like circular-linear regression techniques.

Implementing any of these ideas could help refine your model, improve
its accuracy, and contribute to the robustness of your preferred
location estimates.
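
As an illustration of item 10, a preferred angle can be estimated per neuron from its binned tuning curve. Averaging the firing rate within each stimulus bin first removes the dependence on how often each angle was shown, and the circular mean of the bin centers, weighted by those mean rates, gives the preferred angle. This is a minimal sketch of a different estimator from the ridge-based one above, assuming `firing_rates` has shape `(n_trials, n_neurons)` with non-negative entries; the function name `preferred_angle_from_tuning` and the synthetic neurons are hypothetical.

``` python
import numpy as np

def preferred_angle_from_tuning(firing_rates, angles_rad, num_bins=24):
    """Preferred angle (degrees) per neuron: rate-weighted circular mean of bin centers."""
    edges = np.linspace(0, 2 * np.pi, num_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    bin_idx = np.clip(np.digitize(angles_rad, edges) - 1, 0, num_bins - 1)

    occupied = np.unique(bin_idx)
    # Mean rate per occupied bin: shape (n_occupied_bins, n_neurons)
    tuning = np.stack([firing_rates[bin_idx == b].mean(axis=0) for b in occupied])

    # Resultant vector of the tuning curve on the unit circle
    x = (tuning * np.cos(centers[occupied])[:, None]).sum(axis=0)
    y = (tuning * np.sin(centers[occupied])[:, None]).sum(axis=0)
    return np.degrees(np.arctan2(y, x)) % 360

# Synthetic check with non-uniform stimuli and known preferred angles
rng = np.random.default_rng(0)
angles_rad = np.radians(rng.normal(90, 60, 5000) % 360)
true_pref_deg = np.array([0.0, 120.0, 250.0])
rates = np.exp(2.0 * (np.cos(angles_rad[:, None] - np.radians(true_pref_deg)[None, :]) - 1))
rates += 0.05 * rng.random(rates.shape)  # small non-negative noise

estimated = preferred_angle_from_tuning(rates, angles_rad)
print(np.round(estimated, 1))
```

If some bins are empty, the vector sum runs over the occupied bins only, which can bias the estimate toward the sampled side of the circle.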

### How would you implement

4.  **Increase Sample Size Dynamically:**

To implement the idea of dynamically adjusting the sample size based on
the total available samples in each bin, the sampling should consider
the number of stimuli in each bin relative to the overall dataset,
potentially maximizing the available data while maintaining balance.
Here’s how you might adjust the existing code:

``` python
import numpy as np
from sklearn.linear_model import RidgeCV

def generate_balanced_subsample_dynamic(firing_rates, angles_rad, num_bins=96):
    # Discretize the angles into bins
    bins = np.linspace(0, 2 * np.pi, num_bins + 1)
    digitized = np.digitize(angles_rad, bins, right=True)

    # List to store subsampled indices
    subsample_indices = []

    # Target count per bin: the average number of trials per bin
    global_sample_size = len(angles_rad) // num_bins

    # Sample up to the target count from each bin
    for b in range(1, len(bins)):
        bin_indices = np.where(digitized == b)[0]
        # Take the target count, or every trial in the bin if it holds fewer
        num_to_sample = min(global_sample_size, len(bin_indices))
        if num_to_sample > 0:
            subsample_indices.extend(np.random.choice(bin_indices, size=num_to_sample, replace=False))

    # Convert to numpy array
    subsample_indices = np.asarray(subsample_indices)

    # Create subsample
    X_sub = firing_rates[subsample_indices]
    y_sub = angles_rad[subsample_indices]

    return X_sub, y_sub

def average_neuron_selectivity_uniform(firing_rates, angles_rad, num_subsamples=1000, num_bins=96):
    coeffs_cos = []
    coeffs_sin = []

    for _ in range(num_subsamples):
        # Generate a balanced subsample
        X_sub, y_sub = generate_balanced_subsample_dynamic(firing_rates, angles_rad, num_bins)

        # Prepare targets for cosine and sine components
        Y_sub = np.column_stack((np.cos(y_sub), np.sin(y_sub)))

        # Train Ridge regression
        reg = RidgeCV()
        reg.fit(X_sub, Y_sub)

        # Store coefficients
        coeffs_cos.append(reg.coef_[0])  # Cosine coefficients
        coeffs_sin.append(reg.coef_[1])  # Sine coefficients

    # Average coefficients
    avg_coeffs_cos = np.mean(coeffs_cos, axis=0)
    avg_coeffs_sin = np.mean(coeffs_sin, axis=0)

    predicted_angles_rad = np.arctan2(avg_coeffs_sin, avg_coeffs_cos)
    return np.degrees(predicted_angles_rad) % 360
```

### Key Changes and Explanation:

1.  **Dynamic Sampling in Each Bin:**
    -   Instead of a fixed `sample_size` proportion, we calculate a
        `global_sample_size`, which is the average intended samples per
        bin (total data divided by number of bins).
2.  **Adaptive Selection:**
    -   For each bin, we draw `min(global_sample_size, len(bin_indices))`
        samples: dense bins are capped at `global_sample_size`, while
        sparse bins contribute every sample they have.

This flattens the angular distribution without discarding data from
sparse bins. Bins holding fewer than `global_sample_size` samples
remain under-represented, so the result is approximately rather than
exactly uniform.
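
To see what capping each bin at the per-bin target does to the angular histogram, here is a minimal sketch on synthetic data. The von Mises sample, the 12 bins, and the variable names are assumptions made for illustration; they are not taken from your dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: angles peaked around pi, wrapped to [0, 2*pi)
angles_rad = np.mod(rng.vonmises(mu=np.pi, kappa=2.0, size=5000), 2 * np.pi)

num_bins = 12
bins = np.linspace(0, 2 * np.pi, num_bins + 1)
# Bin index in 0..num_bins-1; clip guards against a value landing exactly on 2*pi
bin_ids = np.clip(np.digitize(angles_rad, bins) - 1, 0, num_bins - 1)

counts_before = np.bincount(bin_ids, minlength=num_bins)
cap = len(angles_rad) // num_bins            # plays the role of global_sample_size
counts_after = np.minimum(counts_before, cap)  # dense bins capped, sparse bins kept whole

ratio_before = counts_before.max() / max(counts_before.min(), 1)
ratio_after = counts_after.max() / max(counts_after.min(), 1)

print("per-bin counts before:", counts_before)
print("per-bin counts after: ", counts_after)
print("max/min ratio before:", ratio_before)
print("max/min ratio after: ", ratio_after)
```

The max/min ratio shrinks after capping but does not reach 1, because bins with fewer than `cap` samples cannot be topped up without sampling with replacement.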

**\***

1.  **Alternative Binning Strategies:**

To implement alternative binning strategies, you can use density-based
or adaptive binning methods that account for the distribution of your
angular data more naturally. One approach is to use techniques like
clustering to find natural groupings in the data. Here’s how you could
do this using a simple clustering approach:

### Implementation Steps

1.  **Use Clustering for Binning:**

    You can apply a clustering algorithm, such as K-Means, to identify
    clusters of angular locations. This method will adaptively determine
    bin sizes based on data density rather than using fixed-width bins.

2.  **Clustering Implementation:**

    ``` python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import RidgeCV

    def generate_clustered_subsample(firing_rates, angles_rad, num_clusters=10, sample_size=0.5):
        # Embed each angle on the unit circle so that K-Means, which uses
        # Euclidean distance, treats 0 and 2*pi as neighbours
        angle_points = np.column_stack((np.cos(angles_rad), np.sin(angles_rad)))

        # Apply K-Means clustering (n_init set explicitly; its default differs across scikit-learn versions)
        kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=42)
        clusters = kmeans.fit_predict(angle_points)

        # List to store subsampled indices
        subsample_indices = []

        # Sample the same fraction (`sample_size`) from each cluster
        for cluster_id in range(num_clusters):
            cluster_indices = np.where(clusters == cluster_id)[0]
            num_to_sample = int(sample_size * len(cluster_indices))
            if num_to_sample > 0:
                subsample_indices.extend(
                    np.random.choice(cluster_indices, size=num_to_sample, replace=False)
                )

        # Convert to an integer index array (dtype=int keeps indexing valid even if empty)
        subsample_indices = np.asarray(subsample_indices, dtype=int)

        # Create subsample
        X_sub = firing_rates[subsample_indices]
        y_sub = angles_rad[subsample_indices]

        return X_sub, y_sub

    def average_neuron_selectivity_clustered(firing_rates, angles_rad, num_subsamples=1000, num_clusters=10, sample_size=0.5):
        coeffs_cos = []
        coeffs_sin = []

        for _ in range(num_subsamples):
            # Generate a clustered subsample
            X_sub, y_sub = generate_clustered_subsample(firing_rates, angles_rad, num_clusters, sample_size)

            # Prepare targets for cosine and sine components
            Y_sub = np.column_stack((np.cos(y_sub), np.sin(y_sub)))

            # Train Ridge regression
            reg = RidgeCV()
            reg.fit(X_sub, Y_sub)

            # Store coefficients
            coeffs_cos.append(reg.coef_[0])  # Cosine coefficients
            coeffs_sin.append(reg.coef_[1])  # Sine coefficients

        # Average coefficients
        avg_coeffs_cos = np.mean(coeffs_cos, axis=0)
        avg_coeffs_sin = np.mean(coeffs_sin, axis=0)

        predicted_angles_rad = np.arctan2(avg_coeffs_sin, avg_coeffs_cos)
        return np.degrees(predicted_angles_rad) % 360
    ```

### Explanation

-   **Clustering with K-Means:**
    -   The angles are grouped into a specified number of clusters.
        Each cluster collects angles that lie close together, giving
        data-driven "bins". Because K-Means measures Euclidean
        distance, cluster the points $(\cos\theta, \sin\theta)$ rather
        than the raw angles; otherwise angles just above 0 and just
        below $2\pi$ are treated as far apart.
-   **Subsampling from Clusters:**
    -   As in your original subsample method, the same fraction
        (`sample_size`) is drawn from each cluster. This keeps the
        relative weight of each cluster; to equalize representation,
        draw a fixed number of samples per cluster instead.
-   **Robust Coefficient Estimation:**
    -   With the new subsampled data, you continue to apply Ridge
        regression and average the coefficients across subsamples.
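
As a quick numerical check of how the input representation affects distances near the wrap-around point, the sketch below compares raw-angle distance with distance after mapping angles onto the unit circle. The helper name `embed_on_circle` and the three test angles are hypothetical, chosen only for illustration.

```python
import numpy as np

def embed_on_circle(angles_rad):
    """Map each angle to the point (cos(theta), sin(theta)) on the unit circle."""
    return np.column_stack((np.cos(angles_rad), np.sin(angles_rad)))

# Two angles on either side of the 0 / 2*pi seam, plus one on the opposite side
angles = np.array([0.05, 2 * np.pi - 0.05, np.pi])

# Distance as K-Means would see it on raw angles: the seam neighbours look far apart
raw_dist_seam = abs(angles[0] - angles[1])

# Distances after embedding: seam neighbours are close, opposite angles are far
points = embed_on_circle(angles)
emb_dist_seam = np.linalg.norm(points[0] - points[1])
emb_dist_opposite = np.linalg.norm(points[0] - points[2])

print("raw distance across the seam:     ", raw_dist_seam)
print("embedded distance across the seam:", emb_dist_seam)
print("embedded distance to opposite:    ", emb_dist_opposite)
```

The array returned by `embed_on_circle` has shape `(n_samples, 2)`, so it can be passed to `KMeans.fit_predict` in place of the reshaped raw angles.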

### Benefits

-   **Dynamic Adaptation:**
    -   The clusters naturally adapt to the distribution of your data,
        potentially improving the balance and robustness of your
        preferred location estimation.
-   **Flexibility:**
    -   K-Means places narrower clusters where the data are dense, so
        a highly non-uniform angle distribution gets finer resolution
        near its peak than fixed-width bins would give.

Feel free to modify parameters like `num_clusters` based on your dataset
and analysis needs.
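
If the goal is equal representation per cluster rather than a fixed fraction of each, one option is to draw a fixed number of samples from every cluster. The following is a sketch under that assumption; `equal_count_indices`, `per_group`, and the toy labels are hypothetical names and values, not part of the code above.

```python
import numpy as np

def equal_count_indices(labels, per_group, rng):
    """Draw up to `per_group` indices from every label, without replacement."""
    chosen = []
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        take = min(per_group, len(members))  # small groups contribute all they have
        chosen.append(rng.choice(members, size=take, replace=False))
    return np.concatenate(chosen)

rng = np.random.default_rng(0)

# Hypothetical cluster labels with very unequal sizes: 900, 80 and 20 members
labels = np.repeat([0, 1, 2], [900, 80, 20])

idx = equal_count_indices(labels, per_group=20, rng=rng)
counts = np.bincount(labels[idx])
print(counts)  # [20 20 20]
```

Setting `per_group` to the size of the smallest cluster gives exactly equal counts; a larger value trades some balance for more data.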

**\***

1.  **Bootstrap Aggregation (Bagging):**

To implement bootstrap aggregation (bagging) for your neuronal activity
data, you can repeatedly sample with replacement from your dataset to
build multiple models and then average their predictions. Here is how
you might incorporate bagging into your existing pipeline:

``` python
import numpy as np
from sklearn.linear_model import RidgeCV

def bootstrap_aggregation_predict(firing_rates, angles_rad, num_models=100, num_bins=96, sample_size=0.5):
    coeffs_cos = []
    coeffs_sin = []

    for _ in range(num_models):
        # Create a bootstrap sample (rows drawn with replacement)
        boot_indices = np.random.choice(len(angles_rad), size=len(angles_rad), replace=True)
        boot_firing_rates = firing_rates[boot_indices]
        boot_angles_rad = angles_rad[boot_indices]

        # Draw the same fraction from each angle bin of the bootstrap sample
        # (binning happens inside generate_balanced_subsample)
        X_sub, y_sub = generate_balanced_subsample(boot_firing_rates, boot_angles_rad, num_bins, sample_size)

        # Prepare targets for cosine and sine components
        Y_sub = np.column_stack((np.cos(y_sub), np.sin(y_sub)))

        # Train Ridge regression
        reg = RidgeCV()
        reg.fit(X_sub, Y_sub)

        # Store coefficients
        coeffs_cos.append(reg.coef_[0])  # Cosine coefficients
        coeffs_sin.append(reg.coef_[1])  # Sine coefficients

    # Average coefficients
    avg_coeffs_cos = np.mean(coeffs_cos, axis=0)
    avg_coeffs_sin = np.mean(coeffs_sin, axis=0)

    predicted_angles_rad = np.arctan2(avg_coeffs_sin, avg_coeffs_cos)
    return np.degrees(predicted_angles_rad) % 360

# Helper defined at module level so other functions can reuse it.
def generate_balanced_subsample(firing_rates, angles_rad, num_bins=96, sample_size=0.5):
    bins = np.linspace(0, 2 * np.pi, num_bins + 1)
    # Half-open bins [lo, hi): an angle of exactly 0 falls in bin 1 instead of being dropped.
    # Assumes angles_rad lies in [0, 2*pi).
    digitized = np.digitize(angles_rad, bins)
    subsample_indices = []

    for b in range(1, len(bins)):
        bin_indices = np.where(digitized == b)[0]
        num_to_sample = int(sample_size * len(bin_indices))
        if num_to_sample > 0:
            subsample_indices.extend(np.random.choice(bin_indices, size=num_to_sample, replace=False))

    subsample_indices = np.asarray(subsample_indices)
    X_sub = firing_rates[subsample_indices]
    y_sub = angles_rad[subsample_indices]

    return X_sub, y_sub

# Usage
# predicted_angles = bootstrap_aggregation_predict(firing_rates, angles_rad)
```

### Key Components:

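1.  **Bootstrap Sampling:** Each model is trained on a resample of the
    full dataset, drawn with replacement and of the same size as the
    original.
2.  **Balanced Subsample:** From each bootstrap sample, we draw the same
    fraction (`sample_size`) of every angle bin before fitting the
    model. Because the draw is proportional, it keeps the bin
    proportions of the bootstrap sample rather than flattening them.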
3.  **Model Building:** Fit a Ridge regression model on each balanced
    sample and collect the cosine and sine coefficients.
4.  **Aggregation:** Average the coefficients from all models to compute
    the final predicted angles.

Because each model sees a different resample, averaging their
coefficients reduces the variance of the final estimate and makes it
less sensitive to individual trials.
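
To make concrete how much the training sets vary, the sketch below measures what fraction of the original rows appears in a single bootstrap resample. The dataset size `n` is an arbitrary value chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000  # hypothetical number of trials
boot_indices = rng.choice(n, size=n, replace=True)

# Rows that appear at least once in the resample; the rest are left out,
# and some of the included rows appear more than once
unique_fraction = len(np.unique(boot_indices)) / n
print(f"fraction of distinct rows in the resample: {unique_fraction:.3f}")
```

For large `n` this fraction approaches $1 - 1/e \approx 0.632$, so each model omits roughly a third of the trials. Note that a resample contains duplicated rows, so a later draw with `replace=False` inside a bin can still pick the same original trial more than once.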

**\***