# Collaborative data science

In this notebook, we'll learn about some ways to work on data science projects in a collaborative setting.

---

In the real world, we often work on data science tasks as part of a team. It helps to share your work quickly with the relevant people and to incorporate their feedback early.

At minimum, you may want to share files (notebooks, scripts, etc.), environments, and dashboards. Let's look at one convenient way (we're biased!) to do this. :)

Also, you have all been _collaborating_ through the Jitsi chat already!

## Share files

There is a `shared` folder present by default at the root of the file browser.

Files and directories in this folder can be viewed by everyone in your "group". In our case, the group includes all the SciPy participants. :)

### 💻 Your turn: Create a new text file with your name in the title and share it with the person next to you!
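
If you prefer to do this from a notebook cell instead of the file browser, here is a minimal sketch. The helper `write_shared_note` and the file name pattern are made up for illustration, and the `~/shared` path in the usage comment is an assumption: check the file browser for where the `shared` folder actually lives on your deployment.

```python
from pathlib import Path


def write_shared_note(shared_dir, name, message="Hello from the tutorial!"):
    """Create a small text file named after `name` inside `shared_dir`."""
    shared_dir = Path(shared_dir)
    # Do not create the shared folder ourselves: it should already exist.
    if not shared_dir.is_dir():
        raise FileNotFoundError(f"Shared folder not found: {shared_dir}")
    note_path = shared_dir / f"hello-from-{name}.txt"
    note_path.write_text(message + "\n", encoding="utf-8")
    return note_path


# Assumed location of the shared folder; adjust it to what your file browser shows.
# write_shared_note(Path.home() / "shared", "your-name")
```

The call is left commented out so that running the cell does not write anything until you have confirmed the path.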

## Share environments

We can use and share `environment.yaml` or `requirements.txt` files with team members. However, environments created from these files aren't always reproducible, especially if library versions aren't pinned or if you change your environment as you work.

The safest way to ensure reproducibility is to pin the exact version of every library in the dependency tree. In conda, you can do this with lockfiles. Here is the lockfile for the `data-of-unusual-size` environment we have been using today: https://scipy.quansight.dev/conda-store/api/v1/build/3/lockfile/
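
To get a feel for what pinning looks like, the sketch below reads the versions installed in the current kernel and prints them as `name==version` lines, the format used in `requirements.txt` (conda writes `name=version`). The helper `pin_installed` is made up for this example. It pins only the packages you list, not their dependencies, so it is a rough illustration and not a substitute for a lockfile.

```python
from importlib import metadata


def pin_installed(packages, get_version=metadata.version):
    """Return `name==version` lines for packages installed in this environment."""
    pinned = []
    for name in packages:
        try:
            pinned.append(f"{name}=={get_version(name)}")
        except metadata.PackageNotFoundError:
            # Leave a visible marker instead of guessing a version.
            pinned.append(f"# {name} is not installed in this environment")
    return pinned


print("\n".join(pin_installed(["numpy", "pandas", "xarray"])))
```

The `get_version` parameter exists only so the lookup can be swapped out, for example when testing the helper without the packages installed.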

### (Optional) Demonstrate environment creation

If we have time, the instructors will demonstrate how to create and share a new environment with a tool called `conda-store`.

Instructors, ensure the new environment has `numpy`, `pandas`, `xarray`, and `matplotlib` (xarray needs it for the `data.plot()` call below).

### (Optional) 💻 Your turn: Switch your kernel to the newly created environment and run the following cell

In [None]:
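# pandas is imported only to confirm it is available in the new environment
import numpy as np
import pandas as pd
import xarray as xr

# A 2x3 array of random values with named dimensions and coordinates on "x"
data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
# Plotting relies on matplotlib being installed alongside xarray
data.plot()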
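
If you're not sure whether the kernel switch worked, one quick check is to ask Python where it is running from. This is a small sketch using only the standard library; the helper name `describe_kernel` is made up, and how the environment name shows up in the path depends on how your platform lays out its environments.

```python
import sys
from pathlib import Path


def describe_kernel():
    """Return the Python executable and environment prefix of the running kernel."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # The last path component of the prefix often contains the environment name.
        "prefix_name": Path(sys.prefix).name,
    }


for key, value in describe_kernel().items():
    print(f"{key}: {value}")
```

Run it once before and once after changing the kernel: the prefix should change if the new environment is in use.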

## Data science platforms

### JupyterHub

JupyterHub is a project by the Jupyter team that provides deployable Jupyter Notebook/Lab IDEs for collaboration.

> JupyterHub brings the power of notebooks to groups of users. It gives users access to computational environments and resources without burdening the users with installation and maintenance tasks. Users - including students, researchers, and data scientists - can get their work done in their own workspaces on shared resources which can be managed efficiently by system administrators.
> 
> ~ [Project Jupyter](https://jupyter.org/hub)

### Nebari

[Nebari](https://nebari.dev) is an open source project that makes deploying your own JupyterHub faster, removes the need for DevOps expertise, and ships with sane defaults (Dask, dashboard sharing, environment sharing, etc.). As you may have guessed, we have been running this tutorial on Nebari.

If you're considering a platform for your team, you can check out Nebari, as well as our friends in the ecosystem: [2i2c](https://2i2c.org/service/), [Anaconda](https://www.anaconda.com/products/enterprise), [Coiled](https://www.coiled.io/), [Domino Data Lab](https://www.dominodatalab.com/product/domino-enterprise-mlops-platform), [SaturnCloud](https://saturncloud.io/), and more!

---