## Introduction

This self-guided, step-by-step Jupyter notebook shows you how to run the Python scripts saved in the ```/scripts``` directory of the [precog-data-intake](https://github.com/LeoBertini/precog-data-intake/blob/761bb9ef9dcebc816b8981579e63ca1cb8ed2f21/) repository.
The scripts are run from within this notebook for illustrative purposes.


To run each ```.py``` script, run the corresponding cell in this notebook and follow the in-cell prompts as you would on any ```Terminal prompt```.

You can also run the equivalent commands on any ```Terminal prompt``` using slightly different syntax (shown below).

## Step 1. Check Python Environment
Run the following cell to activate the virtual environment from within your Jupyter notebook server:

In [3]:
%%bash
source .venv/bin/activate

The cell above is equivalent to running the following on the ```Terminal prompt```:
```bash 
source .venv/bin/activate
```
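A ```%%bash``` cell runs in its own shell process, so activating the environment there does not change the interpreter used by the notebook's Python kernel. The snippet below is a minimal sketch for confirming which interpreter the kernel is actually using; the helper name ```in_virtualenv``` is hypothetical and not part of the repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If the second line prints ```False```, the notebook server was probably started outside ```.venv```; starting Jupyter from a terminal where the environment is already active is one way to address that.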

## Step 2. Create directory to save search results

Create a directory named ```test_search``` on your Desktop.

PS. This is equivalent to running ```> mkdir -p ~/Desktop/test_search``` on a ```Terminal prompt```.

In [4]:
%%bash
mkdir -p ~/Desktop/test_search
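The same directory can also be created from a regular Python cell, which avoids depending on a bash shell. This is a minimal sketch; the function name ```make_results_dir``` and its parameters are hypothetical.

```python
from pathlib import Path


def make_results_dir(base: Path = Path.home() / "Desktop",
                     name: str = "test_search") -> Path:
    """Create base/name (like `mkdir -p`) and return the resulting path."""
    target = base / name
    # parents=True creates missing parent folders; exist_ok=True makes
    # the call safe to repeat when the directory already exists.
    target.mkdir(parents=True, exist_ok=True)
    return target
```

Calling ```make_results_dir()``` with no arguments targets ```~/Desktop/test_search```, matching the bash cell above.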

## Step 3. ESGF Catalogue sweep for ESM outputs of interest

Now run the next cell in the notebook to execute the program ```intake_CatalogueSearch.py``` and follow the in-prompt instructions:

PS. This is equivalent to running the command ```> python scripts/intake_CatalogueSearch.py``` on a ```Terminal prompt```

In [5]:
%run scripts/intake_CatalogueSearch.py

Now open the ```path``` you indicated and inspect the files created. You should have the following:

- ```ESGF_search_<datetime_stamp>.xlsx``` ==> A dataframe with the raw search results from all ESGF nodes.
- ```ESGF_search_<datetime_stamp>_<varstamp>.log``` ==> A text file with the log of the ESGF sweep, including the results of the grid consistency tests and the checks on time-stamp continuity across files.
- ```DF_Downloadable_XXX.xlsx``` ==> A dataframe with the filtered and tested URLs for the variables you searched for.
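To check quickly that all three kinds of file were written, you could glob the output directory for the name patterns listed above. This is a minimal sketch: the glob patterns are inferred from those file names, and the names ```EXPECTED_PATTERNS``` and ```find_search_outputs``` are hypothetical.

```python
from pathlib import Path

# Glob patterns inferred from the file names listed above (an assumption).
EXPECTED_PATTERNS = {
    "raw search results": "ESGF_search_*.xlsx",
    "sweep log": "ESGF_search_*.log",
    "downloadable URLs": "DF_Downloadable_*.xlsx",
}


def find_search_outputs(search_dir):
    """Map each expected output type to the matching files in search_dir."""
    search_dir = Path(search_dir).expanduser()
    if not search_dir.is_dir():
        # Nothing to report when the directory does not exist.
        return {label: [] for label in EXPECTED_PATTERNS}
    return {
        label: sorted(search_dir.glob(pattern))
        for label, pattern in EXPECTED_PATTERNS.items()
    }
```

For example, ```find_search_outputs("~/Desktop/test_search")``` returns a dictionary whose empty lists point to outputs that are missing.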


## Step 4. Fetch the data

The next step is to run the downloader script.

The program downloads the filtered search results listed in the ```ESGF_search_<varstamp>.xlsx``` dataframe.

You can indicate where you'd like the files to be downloaded, or keep the ```~/Desktop/test_search``` directory created in [Step 2](#Step-2.-Create-directory-to-save-search-results) as your default.

Downloads will trigger in parallel, and files will be organised under a directory tree that has a directory named ```CMIP6``` at the top.

Run the following cell:

PS. This is equivalent to running the command ```> python scripts/intake_OceanVarsDL.py``` on a ```Terminal prompt```


In [6]:
%run scripts/intake_OceanVarsDL.py

When the program finishes running, a folder named ```CMIP6``` should have been created within your ```download_path```, with the data organised per model.
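One way to inspect the result is to count the downloaded NetCDF files per model. The sketch below assumes the files keep their standard CMIP6 names, in which underscore-separated fields give the variable, table, model (source id), experiment, member and grid label; it does not assume any particular folder layout below ```CMIP6```. The function name ```count_files_per_model``` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_files_per_model(download_path):
    """Count .nc files under <download_path>/CMIP6, grouped by model name."""
    root = Path(download_path).expanduser() / "CMIP6"
    counts = Counter()
    if not root.is_dir():
        return counts
    for nc_file in root.rglob("*.nc"):
        # Standard CMIP6 file name:
        # <variable>_<table>_<model>_<experiment>_<member>_<grid>[_<time>].nc
        fields = nc_file.stem.split("_")
        if len(fields) >= 6:
            counts[fields[2]] += 1
        else:
            # File names that do not follow the convention are grouped apart.
            counts["unrecognised"] += 1
    return counts
```

Printing ```count_files_per_model(download_path).most_common()``` lists the models with the most files first.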

## Step 5. Fetch Grid cell measures (`areacello` and `volcello`)

Run the next script to fetch corresponding grid cell measures ```areacello``` and ```volcello``` for the downloaded ESM outputs.

The program will fetch the grid cell measures and create a new dataframe, ```DF_Downloadable_<cellmeasure_stamp>.xlsx```, in the chosen ```download_path```.

Then you'll be prompted to indicate the path to this newly created dataframe, and the cell measure downloads will trigger in parallel. Files will be organised under a directory tree that has a directory ```CMIP6``` at the top. 
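Once the cell-measure downloads have finished, you may want to see which models ended up with ```areacello``` and ```volcello``` files. The sketch below assumes the downloaded files keep their standard CMIP6 names, where the first underscore-separated field is the variable and the third is the model; the function name ```cell_measures_by_model``` is hypothetical.

```python
from pathlib import Path

CELL_MEASURES = ("areacello", "volcello")


def cell_measures_by_model(download_path):
    """Return {model: set of cell measures found} under <download_path>/CMIP6."""
    root = Path(download_path).expanduser() / "CMIP6"
    found = {}
    if not root.is_dir():
        return found
    for nc_file in root.rglob("*.nc"):
        fields = nc_file.stem.split("_")
        # Keep only files whose leading field names a cell measure.
        if len(fields) >= 6 and fields[0] in CELL_MEASURES:
            found.setdefault(fields[2], set()).add(fields[0])
    return found
```

A model whose set lacks one of the two measures is a candidate for a closer look at the log files.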

Run the following cell:

PS. This is equivalent to running the command ```> python scripts/intake_CellMeasuresDL.py``` on a ```Terminal prompt```

In [7]:
%run scripts/intake_CellMeasuresDL.py