In [1]:
import yaml
import os

In [6]:
# YAML generator for a single MP/OMP model

# Configure the output path
output_dir = "./single_model_configs/"

# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)


# The base configuration for your YAML files
base_config = {
    "SIGNAL":{
        "N": 100000,
        "d": 300,
        "noise_level": [0, 0.01, 0.05, 0.1],
        "true_sparsity": [2, 5, 10]
    },
    "MODEL": {
        "method": "MP",
        "signal_bag_flag": True,
        "signal_bag_percent": 0.7,
        "atom_bag_percent": 0.7,
        "select_atom_percent": 0,
        "replace_flag": True,
        "agg_func": "weight"
    },
    "TEST": {
        "trial_num": 100,
    },
    "hydra": {
        "hydra_logging": {
            "level": "CRITICAL"
        },
        "job_logging": {
            "level": "CRITICAL"
        },
        "run": {
            "dir": "./outputs",
        }
    }
}
# Write the base configuration to a YAML file in the output directory
output_file = os.path.join(output_dir, "test.yaml")
with open(output_file, "w") as f:
    yaml.dump(base_config, f)

In [None]:
"""
Here is how I am thinking about running these tests:

We could be testing a great many combinations of hyperparameters,
but they all fall into 4 groups: SIGNAL, MODEL, TEST, and hydra.
We should start with the simplest idea: for every single combination of hyperparameters, we run 100 trials,
and we use the index of the ith trial as the seed for that trial. That way, if we later decide to run 1000 trials, the trials we have already run are not wasted.
This leads to the next interesting question: how do we reuse the results we have already computed?

As you can see, I found a way to hash a dictionary. We can hash a dictionary built from SIGNAL and MODEL
and use that value as the filename. To be safe, even when we find the same hash value we should still compare the hyperparameters one by one.

All test results and their corresponding hyperparameters should be stored in a .pkl file.
Each .pkl file should correspond to the (100) trial results we have for one possible combination of (SIGNAL, MODEL).

Configs are totally different between single MP/OMP and bunches of MP/OMP.
But I still suggest we should separate their config and result paths.

More details:
We pass many of the hyperparameters as lists, so the main function needs to be robust:
main.py should handle every parameter in every combination of (100) trials we are going to run.
So I suggest we make a class/function for testing that does the following:
1. Find all combinations of hyperparameters in the YAML file and distribute the jobs.
2. Feed each combination to a function named run_one_possiblity(SIGNAL,MODEL,TEST): this function iterates over all (100) trials for one specific combination of hyperparameters.
3. run_one_possiblity(SIGNAL,MODEL,TEST) first looks in the output directory for previously stored results to avoid redundant computation, then runs a for loop over three functions: Load_Data(SIGNAL,seed = trial_num), Load_Model(MODEL,seed=trial_num), Predict(). This process could use parallel computation.
4. After run_one_possiblity(), we collect the results (details left to decide, but I think the hyperparameters, MSE, and sparsity recovery ratio are enough) and dump everything into a file named by the hash value of the hyperparameters.
"""

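Step 1 of the plan above (finding all combinations of the list-valued hyperparameters) could be sketched roughly as follows. This is only a minimal illustration: `expand_grid`, `iter_jobs`, and `example_config` are hypothetical names, and treating scalar values as single-element lists is an assumption, not something the notes specify.

```python
import itertools


def expand_grid(section):
    """Yield one dict per combination of the list-valued entries in `section`.

    Assumption: a scalar value is treated as a list with one element.
    """
    keys = list(section)
    value_lists = [v if isinstance(v, list) else [v] for v in section.values()]
    for combo in itertools.product(*value_lists):
        yield dict(zip(keys, combo))


def iter_jobs(config):
    """Yield (SIGNAL, MODEL) pairs, one per hyperparameter combination."""
    signals = list(expand_grid(config["SIGNAL"]))
    models = list(expand_grid(config["MODEL"]))
    for signal, model in itertools.product(signals, models):
        yield signal, model


# Illustrative config with the same shape as base_config (values shortened).
example_config = {
    "SIGNAL": {"N": 100000, "d": 300, "noise_level": [0, 0.1], "true_sparsity": [2, 5, 10]},
    "MODEL": {"method": "MP", "signal_bag_percent": 0.7},
    "TEST": {"trial_num": 100},
}

jobs = list(iter_jobs(example_config))
# 2 noise levels x 3 sparsity levels x 1 model setting = 6 jobs
print(len(jobs))

# Seeds are the trial indices 0..trial_num-1, so raising trial_num later
# keeps the seeds of the trials that were already run.
seeds = range(example_config["TEST"]["trial_num"])
```

Each `(signal, model)` pair would then be handed to `run_one_possiblity` together with the TEST section; how the jobs are distributed is left open here.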
In [None]:
### TODO: YAML Generator For Bagging of MP/OMP

In [11]:
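# Hashing
# Note: the built-in hash() is salted per process for str keys/values (see PYTHONHASHSEED),
# so this value is only stable within one interpreter session. Every value must also be
# hashable, which rules out list values and nested dicts.
def hash_dict(dictionary):
    """Return an order-independent hash of a flat dict with hashable values."""
    return hash(frozenset(dictionary.items()))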

my_dict = {"name": "John", "age": 25, "country": "USA"}
my_dict1 = {"age": 25, "country": "USA", "name": "John"}
a = hash_dict(my_dict)
b = hash_dict(my_dict1)

In [12]:
a

-6702312827062868661

In [13]:
b

-6702312827062868661
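
If the hash is meant to name result files that are looked up again in later runs, a digest that does not depend on the interpreter session may be a safer choice. The following is one possible sketch, not the method used above: it serializes the dictionary to canonical JSON and hashes that with SHA-256. The name `stable_hash_dict`, the example dictionaries, and the truncated filename are all hypothetical.

```python
import hashlib
import json


def stable_hash_dict(dictionary):
    """Return a hex digest that is identical across runs and key orderings."""
    # sort_keys=True makes the serialization independent of insertion order;
    # json also handles nested dicts and lists, which frozenset(d.items()) cannot.
    canonical = json.dumps(dictionary, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


signal = {"N": 100000, "d": 300, "noise_level": 0.01, "true_sparsity": [2, 5, 10]}
signal_reordered = {"true_sparsity": [2, 5, 10], "noise_level": 0.01, "d": 300, "N": 100000}

key = stable_hash_dict(signal)
# Same content in a different key order gives the same digest.
assert key == stable_hash_dict(signal_reordered)

# Truncated digest used as a hypothetical result filename.
filename = f"{key[:16]}.pkl"
```

One caveat with this approach: JSON serializes `0` and `0.0` differently, so an integer and a float with the same value produce different digests. Comparing the stored hyperparameters one by one after a filename match remains a useful safeguard.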