# Clustering methods

For the satellite data parameter constraint problem, we want to decide which parameters can reasonably be constrained at each GeoSurfaceTimePoint (GSTP). To do so, we inspect the length scales of the Gaussian Process emulator fitted to the data at that GSTP and select the few parameters that are most important, where small length scales indicate more importance than large ones. These most important parameters are the ones that can reasonably be constrained.
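As an illustration of the selection step, the sketch below ranks the parameters at a single GSTP by length scale and keeps the smallest few. The function name `most_important`, the parameter names, and the numeric values are all hypothetical; the cutoff `n` is an assumption, since the text does not say how many parameters to keep.

```python
def most_important(length_scales, n=2):
    """Return the names of the n parameters with the smallest length scales."""
    # Sort parameter names by their length scale, smallest (most important) first.
    ranked = sorted(length_scales, key=length_scales.get)
    return ranked[:n]


# Hypothetical length scales for one GSTP (names and values are made up).
example = {"param_a": 0.4, "param_b": 12.0, "param_c": 1.5, "param_d": 30.0}

selected = most_important(example, n=2)
print(selected)  # ['param_a', 'param_c']
```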

The difficulty is that different parameters are important at different GSTPs. We therefore cluster the GSTPs by their length scales in order to obtain a few general regions over which the same parameters may be constrained. In other words, the clustering breaks the parameter constraint problem into a separate problem for each cluster.

In [35]:
def k_means(length_scales, k=2):
    """
    K-means clustering.

    Given a DataFrame of length scales (one row per GSTP), return a list
    [labels, cluster_centers], where labels[i] is the cluster index of row i
    and cluster_centers holds one centroid per cluster.
    """
    from sklearn.cluster import KMeans

    my_k_means = KMeans(n_clusters=k).fit(length_scales)

    return [my_k_means.predict(length_scales), my_k_means.cluster_centers_]


def agglom(length_scales, k=2):
    """
    Agglomerative clustering (i.e. hierarchical clustering).

    Return an array of cluster labels, one per row of length_scales.
    """
    from sklearn.cluster import AgglomerativeClustering as Agglom

    my_agglom = Agglom(n_clusters=k).fit(length_scales)

    return my_agglom.labels_


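def dbscan(length_scales, eps=3, min_samples=100):
    """
    DBSCAN (density-based) clustering.

    Return an array of cluster labels, one per row of length_scales. Rows
    classified as noise are labelled -1. The number of clusters is not chosen
    in advance; it follows from eps and min_samples.
    """
    from sklearn.cluster import DBSCAN

    my_dbscan = DBSCAN(eps=eps, min_samples=min_samples).fit(length_scales)

    return my_dbscan.labels_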
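
To show how the three methods behave on the same input, here is a minimal sketch on synthetic data. The feature matrix is invented for illustration: 60 hypothetical GSTPs described by 3 inverse length scales, forming two well-separated groups. The `eps` and `min_samples` values are tuned to this toy data only and are not a recommendation for the real length scales.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN

rng = np.random.default_rng(0)

# Synthetic inverse length scales: group A is dominated by parameter 0,
# group B by parameter 1. All values are made up.
centre_a = np.array([5.0, 0.1, 0.1])
centre_b = np.array([0.1, 5.0, 0.1])
features = np.vstack([
    centre_a + rng.uniform(-0.05, 0.05, size=(30, 3)),
    centre_b + rng.uniform(-0.05, 0.05, size=(30, 3)),
])

# K-means and agglomerative clustering need the number of clusters up front.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
agglom_labels = AgglomerativeClustering(n_clusters=2).fit_predict(features)

# DBSCAN infers the number of clusters from the point density instead.
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(features)

for name, labels in [("k-means", kmeans_labels),
                     ("agglomerative", agglom_labels),
                     ("DBSCAN", dbscan_labels)]:
    print(name, "found clusters:", sorted(set(labels.tolist())))
```

On data this cleanly separated all three methods recover the same two groups, although the integer assigned to each group is arbitrary and may differ between methods.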

# Helpers

In [None]:
def flip_length_scales(length_scales):
    """
    Return the inverse of each entry in the 'length_scale' column as a list.
    """
    # NOTE: the length scales should be ordered by input name, not by length,
    # so that they are comparable across GSTPs. That ordering is not done here.

    # Since small length scales indicate more importance than large ones,
    # we perform clustering on the vectors of inverse length scales.
    flipped_lengths = [1 / x for x in length_scales['length_scale']]

    return flipped_lengths
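
The ordering by input name mentioned in the comment above could be done along the following lines. This is only a sketch under assumptions: the long-form layout and the column names `gstp` and `input` are hypothetical (only `length_scale` appears in the helper), and the values are made up. The idea is to pivot to one row per GSTP with columns sorted by input name, so that each column means the same parameter in every row, and then invert.

```python
import pandas as pd

# Hypothetical long-form table: one row per (GSTP, input parameter) pair.
long_form = pd.DataFrame({
    "gstp": [0, 0, 0, 1, 1, 1],
    "input": ["param_b", "param_a", "param_c", "param_c", "param_a", "param_b"],
    "length_scale": [2.0, 0.5, 4.0, 0.25, 8.0, 1.0],
})

# One row per GSTP, one column per input parameter.
wide = long_form.pivot(index="gstp", columns="input", values="length_scale")

# Sort columns by input name so the vectors are comparable across GSTPs.
wide = wide.sort_index(axis=1)

# Small length scales mean high importance, so cluster on the inverses.
inverse = 1.0 / wide
print(inverse)
```

The resulting `inverse` DataFrame has the shape the clustering functions expect: rows are GSTPs and columns are features.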