<a href="https://colab.research.google.com/github/fbeilstein/presentations/blob/master/cosmological_problem_for_August_8_2023.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Problem description

The problem is based on my latest article, applied to a different dataset. I have **not** tried this dataset myself, so this is a pure research task: nobody knows what you may discover.

The workflow is expected to be as follows:
$$\require{AMScd}
\begin{CD}
\text{CAMELS} @>\text{download}>> \text{Dataset} @>\text{Persistent Homology}>>\text{Topological Features}@>\text{Wasserstein Distance}>>\text{Distance from Parameters}@>\text{Statistics}>>\text{Result}\\
\end{CD}$$

## 1. Getting the Data
* Familiarize yourself with the CAMELS dataset [LINK](https://camels.readthedocs.io/en/latest/description.html)
* We are interested in simulations with only one cosmological parameter changing at a time [LINK](https://users.flatironinstitute.org/~camels/Rockstar/SIMBA/1P/)
  - SIMBA simulation
  - Rockstar Data
  - 1P subset
* Learn how to work with the 1P subset of the data. The information we are interested in is located in list files, which are plain text files with columns of data. You can open them in a text editor to understand the structure. We are primarily interested in the $x$, $y$, and $z$ coordinates (but you may come up with ideas that use any of the remaining columns).
* Learn how to associate parameters with each of the files you download [LINK](https://github.com/franciscovillaescusa/CAMELS/blob/master/docs/params/CosmoAstroSeed_SIMBA.txt)
* Two of the listed parameters are cosmological: $\Omega_m$ and $\sigma_8$. We are primarily interested in these two. You may want to look up their physical meaning on Wikipedia.
* Each file is a large pointcloud (halo positions $x$, $y$, and $z$) in a large cube. Learn how to "chop" this cube into a few smaller parts, say $8$ (each side split into two), so that you have several different "subsimulations" for each set of cosmological parameters.
* Create a dataset: a number of pointclouds (halo positions) with one cosmological parameter varying. For each value of the chosen cosmological parameter you should have a few pointclouds.
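
The "chopping" step can be sketched with NumPy as follows. This is only a minimal illustration: `chop_cube` is a hypothetical helper name, the positions are assumed to be already loaded into an `(N, 3)` array (for example with `numpy.loadtxt`, after you have checked in the file header which columns hold $x$, $y$, $z$), and the box size of `25.0` used in the demo is an assumption that you should verify against the CAMELS documentation and the units of your files.

```python
import numpy as np


def chop_cube(points, box_size, n_per_side=2):
    """Split an (N, 3) pointcloud living in [0, box_size]^3 into
    n_per_side**3 equal subcubes and return them as a list of arrays."""
    points = np.asarray(points, dtype=float)
    cell = box_size / n_per_side
    # Integer index of the subcube along each axis.
    idx = np.floor(points / cell).astype(int)
    # Points lying exactly on the upper face go to the last subcube.
    idx = np.clip(idx, 0, n_per_side - 1)
    # Combine the three axis indices into one label per point.
    flat = np.ravel_multi_index(tuple(idx.T), (n_per_side,) * 3)
    return [points[flat == k] for k in range(n_per_side ** 3)]


# Demo on random points (placeholder data, NOT a CAMELS catalogue).
rng = np.random.default_rng(0)
demo_points = rng.uniform(0.0, 25.0, size=(1000, 3))  # assumed box size
subclouds = chop_cube(demo_points, box_size=25.0)
print([len(s) for s in subclouds])
```

Note that the subcubes are no longer periodic, and that each one keeps its original coordinates; this does not matter for the topological features, which depend only on distances between points.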

## 2. Processing Topological Data

* Familiarize yourself with the general workflow, i.e. what we are trying to do: check my article [LINK](https://arxiv.org/pdf/2301.09411.pdf). Your goal is to get something that looks like figure 6. Since the cosmological simulations have already been performed for you, you may skip that part of the article.
* Familiarize yourself with the GUDHI library [LINK](https://gudhi.inria.fr/python/latest/). You should learn how to:
   - build a complex, e.g. an alpha complex
   - calculate persistence intervals
   - generate a persistence diagram
   - calculate Wasserstein and bottleneck distances between persistence diagrams
* Generate persistence intervals for each of your pointclouds
* Calculate $1$-Wasserstein distances between these sets of intervals

## 3. Processing Statistical Data

What you have now looks as follows:
$$
\begin{array}{llllllll}
\text{Cosmological Parameters 1} & \xrightarrow{\text{simulation}} &\text{datapoints 1} &\xrightarrow{\text{TDA}} &\text{Persistence Diagram 1} & \searrow \\
 & & & & & \text{Wasserstein Distance} & & \\
\text{Cosmological Parameters 2} & \xrightarrow{\text{simulation}} &\text{datapoints 2} &\xrightarrow{\text{TDA}} &\text{Persistence Diagram 2} & \nearrow \\
\end{array}
$$

In a sense, we are comparing cosmological parameters, but in a rather convoluted way:
$$
\begin{array}{llllllll}
\text{Cosmological Parameters 1} &  \searrow \\
 & \text{Wasserstein Distance} & & \\
\text{Cosmological Parameters 2} & \nearrow \\
\end{array}
$$

* Check whether the proposed workflow is good for distinguishing between simulations with different cosmological parameters.
* Check whether you are able to predict cosmological parameters given a simulation.
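
One possible way to start on both checks is sketched below. It is not the analysis from the article, only an assumption about what a first look could be: the helper names are hypothetical, and the distance matrix is synthetic placeholder data built so that distance grows with the parameter difference. In your work, `dist` would be the matrix of Wasserstein distances and `params` the value of $\Omega_m$ or $\sigma_8$ for each pointcloud.

```python
import numpy as np


def distance_vs_parameter(dist, params):
    """For every unordered pair (i, j) return |p_i - p_j| and D_ij."""
    dist = np.asarray(dist, dtype=float)
    params = np.asarray(params, dtype=float)
    i, j = np.triu_indices(len(params), k=1)
    return np.abs(params[i] - params[j]), dist[i, j]


def nearest_neighbour_prediction(dist, params):
    """Leave-one-out prediction: each pointcloud gets the parameter of its
    closest *other* pointcloud in the given distance matrix."""
    d = np.array(dist, dtype=float)  # copy, so the input is not modified
    np.fill_diagonal(d, np.inf)      # a cloud may not pick itself
    return np.asarray(params, dtype=float)[np.argmin(d, axis=1)]


# Synthetic placeholder: 5 parameter values, 4 pointclouds each.
rng = np.random.default_rng(0)
params = np.repeat([0.1, 0.2, 0.3, 0.4, 0.5], 4)
noise = rng.uniform(0.0, 0.01, size=(20, 20))
dist = np.abs(params[:, None] - params[None, :]) + (noise + noise.T) / 2
np.fill_diagonal(dist, 0.0)

dparam, dpair = distance_vs_parameter(dist, params)
correlation = float(np.corrcoef(dparam, dpair)[0, 1])
predicted = nearest_neighbour_prediction(dist, params)
print(correlation, np.mean(np.abs(predicted - params)))
```

A scatter plot of `dpair` against `dparam` shows at a glance whether the distance tracks the parameter difference, and the prediction error of the nearest-neighbour rule gives a crude baseline for the second check.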

## 4. Open Questions

* How sensitive are TDA methods to changes in cosmological parameters?
* How well can you predict cosmological parameters given a simulation?
* Is filtration by mass or any other parameter helpful?
