# Setup Google Colab Infrastructure

## Python Environment

When you open a new notebook in Google Colab, it starts a brand-new Python session with only a few packages pre-installed. Reinstalling the necessary packages each time is a pain and wastes several minutes. To avoid that, this notebook sets up a persistent "environment" that can be reused on Colab.

To do so, we first "mount" our university Google Drive (i.e., we give Colab permission to read from and write to Google Drive). You need to log in **using your university Google account** and grant permissions. Once that's finished, we can navigate the Drive folder with Python in Colab.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Now we create a folder where we will store our Python packages.

In [None]:
import os
package_dir = '/content/drive/MyDrive/uppp_214/python_packages'
os.makedirs(package_dir, exist_ok=True)

The last step is to install the packages we use for the course using `pip`, setting the target (i.e., where the packages will be installed) to the folder we just created. This will take a few minutes to run and may issue a few warnings, none of which are critical.

In [None]:
!pip install pysal geosnap statsmodels lonboard osmnx geodatasets --target=$package_dir

Finally, to use these packages, we add the folder where they are stored to the Python path. Because `sys.path.append` places the folder at the end of the path, an `import` statement searches Colab's default locations first and then falls back to this directory.

In [None]:
import sys
sys.path.append(package_dir)
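
Notebook cells are often re-run, and each run of the cell above adds the same folder to the path again. That is harmless, but a slightly more defensive variant checks first. The following is only a minimal sketch; the helper name `add_to_path` is hypothetical and not part of the course materials.

```python
import sys

def add_to_path(folder):
    """Append folder to sys.path only if it is not already listed."""
    if folder not in sys.path:
        sys.path.append(folder)
    return folder in sys.path

package_dir = '/content/drive/MyDrive/uppp_214/python_packages'
add_to_path(package_dir)
add_to_path(package_dir)  # the second call leaves sys.path unchanged
```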

The setup is now complete!

To use these packages in any other notebook without reinstalling, we need to mount the drive and update the path; that is, in every new notebook we need to execute:

```python
import sys
from google.colab import drive
package_dir = '/content/drive/MyDrive/uppp_214/python_packages'
drive.mount('/content/drive')
sys.path.append(package_dir)
```

Then we can proceed as normal. If you want to write code in this notebook, you can do so immediately. Note, however, that the packages need a few minutes to upload to your Google Drive before they are accessible in other notebooks, because each Colab notebook operates in its own container. That means if you want to run the `01_test_colab_pkgs` notebook, you should wait 5 minutes or so first.
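
One way to tell whether the packages are visible yet from another notebook is to ask Python whether it can locate them, without actually importing anything. This is a rough sketch using the standard library's `importlib.util.find_spec`; the helper name `missing_packages` is hypothetical, and the list simply mirrors the `pip install` command above.

```python
import importlib.util

def missing_packages(names):
    """Return the names that Python cannot currently locate on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

course_packages = ['pysal', 'geosnap', 'statsmodels', 'lonboard', 'osmnx', 'geodatasets']
print(missing_packages(course_packages))
```

An empty list suggests every package can be found; any names that are printed are either still uploading or on a path that has not been added yet.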

## Data Resources

The `geosnap` package provides access to a [wide variety](https://open.quiltdata.com/b/spatial-ucr) of geospatial datasets focused on demographic, economic, institutional, environmental, and land-use characteristics for use in applied research or pedagogical examples. In the general case, or for one-off uses, all datasets can be streamed directly over the web. For repeated use (and quicker access), the datasets can also be stored on a local drive. When the data are stored on disk and the user does not define a specific location, the [`platformdirs`](https://platformdirs.readthedocs.io/en/latest/) package is used to determine a reasonable storage location. Alternatively, a user can define a specific directory where datasets are placed. In the latter scenario, it's important for the user to specify this directory *each time* they instantiate a `geosnap.DataStore`. Otherwise, geosnap will search the default location and, finding no data, revert to streaming.
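
As a rough illustration of that last point, a small wrapper can make it harder to forget the directory. This sketch assumes `DataStore` accepts the folder through a `data_dir` keyword, matching the `data_dir` argument used by geosnap's storage functions; the helper name `open_store` is hypothetical.

```python
def open_store(data_dir):
    """Create a geosnap DataStore that reads from a user-chosen folder.

    Assumes DataStore takes a `data_dir` keyword argument.
    """
    # Imported inside the function so the helper can be defined
    # before the package folder has been added to the Python path.
    import geosnap as gsp
    return gsp.DataStore(data_dir=data_dir)
```

Calling `open_store(...)` with the same folder in every notebook keeps geosnap reading from disk rather than streaming.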

For consistent use on Google Colab, we will store all necessary data in the course folder (created above) in the user's Google Drive, in a directory called `data/geosnap`.

In [None]:
data_dir = '/content/drive/MyDrive/uppp_214/data/geosnap'
os.makedirs(data_dir, exist_ok=True)

In [None]:
import geosnap as gsp

In [None]:

# store ACS
gsp.io.store_acs(years=2021, level='tract', data_dir=data_dir)

# store EPA EJSCREEN
gsp.io.store_ejscreen(years=2021, level='tract', data_dir=data_dir)

# store SEDA school achievement data
gsp.io.store_seda(data_dir, accept_eula=True)

# store school locations
gsp.io.store_nces(dataset='schools', data_dir=data_dir, years='1516')
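
To confirm that the downloads actually landed in the Drive folder, it can help to list what is there. The sketch below uses only the standard library and makes no assumption about the file names or formats geosnap writes; the helper name `summarize_dir` is hypothetical.

```python
from pathlib import Path

def summarize_dir(folder):
    """Return sorted (relative path, size in MB) pairs for every file under folder."""
    root = Path(folder)
    if not root.is_dir():
        return []  # nothing stored yet, or the drive is not mounted
    return sorted(
        (str(p.relative_to(root)), round(p.stat().st_size / 1e6, 2))
        for p in root.rglob('*') if p.is_file()
    )

for name, size_mb in summarize_dir('/content/drive/MyDrive/uppp_214/data/geosnap'):
    print(f'{size_mb:>10.2f} MB  {name}')
```

If nothing is printed, the folder is empty or the drive has not been mounted in the current session.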