## The Big Picture

The goal of this workflow is, in short, to share what we have learned, and to provide as much context and guidance as possible so that this research does not sit on a shelf somewhere.

To do that, we are going to create a knowledge repository. This takes a little work upfront, and you will have to spend some time learning a few new things. I assure you, though, that the hard work will have a great payoff.

The workflow demonstrated here works well with both Python (like this notebook) and R. In fact, Quarto (our open-source technical publishing system) is designed as the next generation of R Markdown.

Here are some examples of what you can do with GitHub and Quarto (click the links below):

#### Websites

[Practical Deep Learning](https://course.fast.ai/)
[NASA-Openscapes](https://nasa-openscapes.github.io/)

#### Articles & Reports

[A Sample Title - The SocioEconomic Aspects of Stock Assessments](https://github.com/nmfs-opensci/quarto_titlepages/blob/main/example_1.pdf)
[HTML for web publishing](https://quarto-dev.github.io/quarto-gallery/articles/html/html.html)

#### Presentations

[An Educator's Perspective of the tidyverse](https://mine-cetinkaya-rundel.github.io/tidyperspective/talks/dagstat-2022.html#/title-slide)

#### Books 

[Hands-On Programming with R](https://jjallaire.github.io/hopr/)

#### Interactive Docs

[Shiny web framework for R](https://jjallaire.shinyapps.io/diamonds-explorer/)
[Jupyter interactive widgets](https://quarto.org/docs/interactive/widgets/jupyter.html)


## The Nuts and Bolts - How it fits together

This section gives a brief overview of how these programs and platforms fit together.

### The Ecosystem

This ecosystem contains several components, programs, and environments:

#### For Documentation and Version Control

**Quarto**

An open-source scientific and technical publishing system.

![](quartologo_rz.png)

>- Create dynamic content using Python, R, Julia, and Observable.
>- Publish reproducible, production-quality articles, presentations, dashboards, websites, blogs, and books in HTML, PDF, MS Word, ePub, and more.
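
As a rough illustration of the "publish in several formats" point above, here is a minimal sketch that builds `quarto render` commands from Python. It assumes the Quarto command-line tool is installed; the file name `analysis.ipynb` and the helper names `build_render_command` and `render` are hypothetical and used only for this example.

```python
import shutil
import subprocess


def build_render_command(source, output_format):
    """Return the Quarto CLI call that renders `source` to `output_format`."""
    return ["quarto", "render", source, "--to", output_format]


def render(source, output_format="html"):
    """Run `quarto render` if the Quarto CLI is on PATH; return True on success."""
    if shutil.which("quarto") is None:
        print("Quarto CLI not found on PATH; skipping render.")
        return False
    subprocess.run(build_render_command(source, output_format), check=True)
    return True


# "analysis.ipynb" is a hypothetical notebook name used for illustration.
# This loop only prints the commands; call render(...) to actually run them.
for fmt in ("html", "docx"):
    print(" ".join(build_render_command("analysis.ipynb", fmt)))
```

The same source file is passed each time and only the `--to` value changes, which is the main idea: one document, many output formats.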



**GitHub**

A platform for version control and collaboration using Git. It allows you to host and share your data repository, track changes, and collaborate with others. GitHub Pages can be used to host your Quarto-generated documentation and wiki.

![](githublogo_rz.png)

#### Python workflow

**Python**

A versatile programming language widely used in data science, machine learning, and scientific computing. It has a rich ecosystem of libraries for data analysis, visualization, and machine learning, making it an essential tool for building and analyzing your data repository.

![](Pythonlogo_rz.png)

**Anaconda**

A distribution of Python and R for scientific computing and data science. It simplifies package management and deployment. Anaconda can be used to set up your data science environment and manage dependencies.

![](anacondalogo_rz.png)
 
**Jupyter**

An open-source project providing interactive notebooks for code, visualizations, and narrative text. Jupyter Notebooks are useful for data exploration, analysis, and sharing interactive reports.

![](Jupyterlogo_rz.png)

#### R workflow

**R** 

A programming language for statistical computing and graphics. It can be used for data analysis and visualization within your data repository.


![](R_logo_rz.png)

**RStudio**

An integrated development environment (IDE) for R. It provides a user-friendly interface for coding in R and supports R Markdown, which can be used with Quarto for creating dynamic documents.


![](RStudiologo_rz.png)


:::{.callout-note}
Anaconda is optional for the R workflow.
:::


#### For File Storage

**CyVerse/AWS (Cloud Storage):**

Cloud storage solutions for hosting large files that cannot be stored on GitHub. These platforms provide URLs to access the files, enabling integration with your data repository and documentation.

![](cyverse_logo_rz.png.png)
![](awslogo_rz.png)

### How they fit together

![Ecosystem](ecosystem_simple.png){width=90%}


>- **Anaconda:** Use Anaconda to create and manage your Python and R environments, ensuring all necessary packages are installed.
>- **Jupyter:** Use Jupyter Notebooks for data analysis and creating interactive reports. These notebooks can be converted to Quarto documents.
>- **Python:** Use Python's extensive libraries and tools for data analysis, machine learning, and visualization within Jupyter notebooks or standalone scripts.
>- **R and RStudio:** Use RStudio for R-based data analysis and creating R Markdown documents. These can also be integrated into Quarto.
>- **Quarto:** Use Quarto to compile Jupyter Notebooks and R Markdown documents into a cohesive set of documentation and reports. Quarto can generate static websites, PDFs, and more.
>- **GitHub:** Host your data repository on GitHub. Use GitHub for version control and collaboration. Host your Quarto-generated documentation and wiki on GitHub Pages for easy access and sharing.
>- **CyVerse/AWS (Cloud Storage):** Store large files that cannot be accommodated on GitHub. Use CyVerse or AWS S3 to host these files and generate URLs for access. Include these URLs in your Quarto documentation, Jupyter Notebooks, and R Markdown files to provide seamless access to the data.
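
The cloud-storage step in the list above can be sketched as follows: a notebook reads a file directly from the URL that the storage platform provides. This is a minimal sketch using only the Python standard library; the URL in `DATA_URL` is a placeholder and the helper name `read_csv_from_url` is hypothetical, so substitute the link that CyVerse or AWS gives you.

```python
import csv
import io
from urllib.request import urlopen

# Placeholder URL (an assumption for illustration): replace it with the
# link provided by your cloud storage platform.
DATA_URL = "https://example.org/path/to/large_file.csv"


def read_csv_from_url(url, encoding="utf-8"):
    """Download a CSV file from `url` and return its rows as a list of dicts."""
    with urlopen(url) as response:
        text = response.read().decode(encoding)
    # DictReader uses the first row of the file as the column names.
    return list(csv.DictReader(io.StringIO(text)))


# Uncomment once DATA_URL points at a real, publicly readable file:
# rows = read_csv_from_url(DATA_URL)
# print(rows[:5])
```

Keeping the URL in one variable near the top of the notebook makes it easy to update if the file moves, and the large file itself never has to be committed to GitHub.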

## Getting Started

In the next notebook I will describe how to set up this ecosystem on your computer, specifically using Python, Anaconda, Jupyter notebooks, GitHub, and Quarto.


:::{.callout-note}
Perhaps someone else can demonstrate an RStudio workflow and CyVerse integration.
:::


[link to next page]()