# Geospatial data I/O

Data analysis tasks involve reading geospatial data that is stored in files on disk, held on cloud servers, or recorded by sensors. We also need to save the results of our analysis, or the datasets we generate, to files.

This lab will introduce:

* geospatial data file formats
* techniques for reading geospatial data from files and writing it to files
* Python data structures for representing vector and raster data

## Setup

### Run the labs

You can run the labs locally on your machine or you can use cloud environments provided by Google Colab. **If you're working with Google Colab be aware that your sessions are temporary and you'll need to take care to save, backup, and download your work.**

<a href="https://colab.research.google.com/github/geog3300-agri3003/coursebook/blob/main/docs/notebooks/week-3_2.ipynb" target="_blank">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Download data

If you need to download the data for this lab, run the following code snippet. 

In [None]:
import os
import subprocess

if "data_lab-3_2" not in os.listdir(os.getcwd()):
    subprocess.run('wget "https://github.com/geog3300-agri3003/lab-data/raw/main/data_lab-3_2.zip"', shell=True, capture_output=True, text=True)
    subprocess.run('unzip "data_lab-3_2.zip"', shell=True, capture_output=True, text=True)
    if "data_lab-3_2" not in os.listdir(os.getcwd()):
        print("Has a directory called data_lab-3_2 been downloaded and placed in your working directory? If not, try re-executing this code chunk")
    else:
        print("Data download OK")

### Install packages

If you're working in Google Colab, you'll need to install the required packages that don't come with the Colab environment.

In [None]:
if 'google.colab' in str(get_ipython()):
    !pip install rioxarray
    !pip install mapclassify
    !pip install rasterio

## Import modules

In [None]:
import os
import pprint
import rioxarray as rxr
import xarray as xr
import pandas as pd
import geopandas as gpd

## Raster data

Raster data represents geographic features and variables (e.g. elevation, reflectance in UAV images) as a grid of values (pixels).

### GeoTIFF

Many geospatial datasets are based on the raster data model where values are assigned to pixels and pixels represent locations on the Earth's surface. 

A common source of raster data is remote sensing images captured by sensors on uncrewed aerial vehicles, aircraft, or satellites. Optical remote sensing images store the measured reflectance of light off the Earth's land surface in different wavelengths. Raster remote sensing images are often stored using the <a href="https://gdal.org/drivers/raster/gtiff.html" target="_blank">GeoTIFF</a> format. 

A GeoTIFF file is based on the Tagged Image File Format (or .tiff file) which is a general format for storing image data. A TIFF file comprises:

* a **TIFF header** which includes 8 bytes that tell us that the file is in TIFF format and where in the file (what byte number / byte offset from 0) the first Image File Directory is stored.
* **Image File Directories** which contain image metadata, a pointer to where the image data is in the file (what byte number / byte offset from 0), and the location of the next Image File Directory if there is more than one image stored in the TIFF file. Metadata is stored as fields which comprise a TIFF tag and its corresponding value.
* **Image Data** - the values associated with each pixel in the image. A single TIFF file can store multiple images.

![](https://github.com/geog3300-agri3003/coursebook/raw/main/docs/img/week-3-geotiff.jpg)
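
As a rough illustration of the TIFF header described above, the sketch below builds an 8-byte header by hand and parses it with Python's built-in `struct` module. The function name `parse_tiff_header` and the example bytes are made up for this illustration. It assumes the classic TIFF layout (a 2-byte byte-order mark, the 2-byte number 42, then a 4-byte offset to the first Image File Directory) and is not how rioxarray reads files.

```python
import struct


def parse_tiff_header(header: bytes) -> dict:
    """Parse the first 8 bytes of a classic TIFF file (illustrative helper)."""
    if len(header) < 8:
        raise ValueError("A TIFF header is 8 bytes long")

    # Bytes 0-1: byte order. "II" = little-endian, "MM" = big-endian.
    byte_order = header[:2]
    if byte_order == b"II":
        endian = "<"
    elif byte_order == b"MM":
        endian = ">"
    else:
        raise ValueError("Not a TIFF file: unknown byte-order mark")

    # Bytes 2-3: the number 42. Bytes 4-7: byte offset of the first IFD.
    magic, first_ifd_offset = struct.unpack(endian + "HI", header[2:8])
    if magic != 42:
        raise ValueError("Not a classic TIFF file: magic number is not 42")

    return {
        "byte_order": byte_order.decode("ascii"),
        "magic": magic,
        "first_ifd_offset": first_ifd_offset,
    }


# A hand-made little-endian header whose first IFD starts at byte 8.
example_header = b"II" + struct.pack("<HI", 42, 8)
info = parse_tiff_header(example_header)
print(info)
```

To try this on a real file, you could pass the first 8 bytes read with `open(path, "rb").read(8)`.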

GeoTIFF files include extra information (metadata) as tags which describe the coordinate reference system (CRS) of the image data (i.e. where on the Earth's surface the image data corresponds to), spatial resolution, no data values, and various other configurations described <a href="https://gdal.org/drivers/raster/gtiff.html" target="_blank">here</a>. 

GeoTIFF files can store multiple images (i.e. raster layers) in a single file. This makes them well suited for storing remote sensing image data where each raster layer corresponds to measured reflectance in a particular wavelength. 

We can use functions provided by the <a href="https://corteva.github.io/rioxarray/stable/" target="_blank">rioxarray</a> package to read raster data in GeoTIFF format into Python programs and to write raster data back to GeoTIFF files.

Using <a href="https://corteva.github.io/rioxarray/stable/" target="_blank">rioxarray's</a> <a href="" target="_blank">`open_rasterio()` function</a> we can read raster data stored on disk as a GeoTIFF file into an `xarray.Dataset` or `xarray.DataArray` object in our Python program. 

The GeoTIFF file `"week-3-s2-summer-2020.tif"` in the `data_lab-3_2` directory stores remote sensing data covering a field in Western Australia. The remote sensing data was captured by the <a href="https://sentinel.esa.int/web/sentinel/missions/sentinel-2" target="_blank">European Space Agency's Sentinel-2 satellite</a> (10 m spatial resolution for red, green, blue, and near infrared bands). This is the path to the GeoTIFF file:

In [None]:
# path to the GeoTIFF file
s2_summer_path = os.path.join(os.getcwd(), "data_lab-3_2", "week-3-s2-summer-2020.tif")

#### Recap quiz

**These are rioxarray's <a href="https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html#rioxarray" target="_blank">`open_rasterio()` docs</a>. Can you use this function to read the GeoTIFF file referenced by `s2_summer_path` into an `xarray.DataArray`?** 

**Make sure the data is read to an `xarray.DataArray` object referenced by the variable name `s2_summer`.**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
s2_summer = rxr.open_rasterio(s2_summer_path)
```
</details>

### Xarray recap

<details>
    <summary>These are notes repeated from week 2 that provide an overview of xarray and its classes for storing multidimensional arrays as objects in Python programs. (click the arrow to display notes).</summary>


<a href="" target="_blank">Xarray</a> is a Python package that builds on top of NumPy's array-based data structures, but provides extra tools and functions that are useful for working with geospatial and Earth Science datasets. For example, `xarray.DataArray` data structures are objects that store multidimensional arrays of raster values and also store metadata information that describe the raster values. 

`xarray` also provides convenient functions for reading raster data from geospatial data files on disk into memory as `xarray.DataArray` objects which we can use in our Python programs while retaining geographic and temporal information about the raster values stored in the array.

Specifically, while a NumPy `ndarray` stores just the raster values and has some properties such as the `shape` (number of elements along each axis) and `ndim` (the number of dimensions of the array), it does not explicitly store any geospatial, temporal, or other geographic metadata. `xarray` solves this problem by reading raster data into an `xarray.DataArray` object with:

* `values`: the multidimensional array of raster values
* `dims`: a tuple of names for the dimensions of the array (e.g. instead of axis 0 describing the 0th (row) dimension of an array, that dimension can have a descriptive label such as latitude)
* `coords` (coordinates): a dict-like container of array-like objects that describe the location of an array element along each dimension (e.g. a 1D array of latitude values describing the location on the Earth's surface for each row in the array)
* `attrs`: a `dict` of metadata attributes describing the dataset

`xarray.DataArray` objects can be stored within a larger container called `xarray.Dataset`. An `xarray.Dataset` can store many `xarray.DataArray` objects that share `dims` and `coordinates`. This is useful if you have different arrays of different `Variables` that correspond to the same locations and time-periods (e.g. you could have a separate array for temperature and precipitation values organised within a single `xarray.Dataset`).
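
The pieces listed above can be seen in a small hand-made example. This is only a sketch: the array values, the coordinate values, and the names `toy_elevation` and `toy_dataset` are invented for illustration and do not come from the lab data.

```python
import numpy as np
import xarray as xr

# values: a 2 x 3 grid of made-up elevation values
values = np.arange(6, dtype="float32").reshape(2, 3)

toy_elevation = xr.DataArray(
    values,
    dims=("y", "x"),  # descriptive names instead of axis 0 and axis 1
    coords={"y": [-31.0, -31.1], "x": [116.0, 116.1, 116.2]},  # location of each row / column
    attrs={"units": "metres"},  # free-form metadata
    name="elevation",
)

# Label-based selection: pick the cell nearest to a location, not an index position.
cell = toy_elevation.sel(y=-31.1, x=116.2, method="nearest")

# A Dataset groups DataArrays that share dims and coords.
toy_dataset = xr.Dataset({"elevation": toy_elevation, "slope": toy_elevation * 0.1})

print(toy_elevation.dims, toy_elevation.shape)
print(cell.item())
print(list(toy_dataset.data_vars))
```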

![Schematic of an xarray.Dataset (source: xarray Getting Started)](https://docs.xarray.dev/en/stable/_images/dataset-diagram.png)

**Why is `xarray` useful for geospatial data?** 

* The `dims` and `coordinates` of an `xarray.DataArray` mean we can subset values from an array using latitude, longitude, time, or whatever a coordinate describes; we're not just restricted to subsetting values based on their index location within an array
* `xarray.Dataset` objects provide a container to store multidimensional arrays (e.g. many variables and time points) that are common in geography, Earth Sciences, meteorology, and agriculture (for example, multispectral satellite images of the same location over time, or arrays of different meteorological variables)
* useful functions for reading, analysing and visualising raster or array-like geospatial data that are common across many spatial data science workflows
    
</details>

Now we've opened the GeoTIFF file `"week-3-s2-summer-2020.tif"` as an `xarray.DataArray` referenced by the variable `s2_summer`, let's explore the dataset using the attributes and methods of the `xarray.DataArray` object.

#### Recap quiz

**To answer these questions, you will need to look things up in the xarray docs. The user guide on <a href="" target="_blank">Data Structures</a> and the <a href="" target="_blank">`xarray.DataArray` API reference</a> will be useful.**

**How do `xarray.DataArray` objects store raster data in Python programs?**

<details>
    <summary><b>answer</b></summary>
    
`xarray.DataArray` objects store raster values as a multidimensional NumPy `ndarray` (or another array-like object), which can be accessed via the `values` property:

```python
s2_summer.values
```
</details>

**What are the `dims` for the `xarray.DataArray` object storing the raster values from the GeoTIFF file `"week-3-s2-summer-2020.tif"`?**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>
    
`xarray.DataArray` objects store descriptive dimension labels as a tuple under the `dims` attribute. These dimension labels are more descriptive and informative than the axis numbering of NumPy `ndarray`s.   

```python
s2_summer.dims
```
</details>

**What is the size, in terms of the number of elements along each dimension, of the `xarray.DataArray` object storing the raster values from the GeoTIFF file `"week-3-s2-summer-2020.tif"`?**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

We can return the raster values as a NumPy `ndarray` object and then access the `shape` property of the `ndarray`.

```python
s2_summer.values.shape
```
</details>

### rio accessor

The <a href="https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray-rio-accessors" target="_blank">`rio` accessor</a> from the rioxarray package extends the `xarray.DataArray` class with extra properties and methods that are useful for retrieving information about an `xarray.DataArray` object when it contains raster geospatial data. 

For example, the `rio.crs` property will return the coordinate reference system (CRS) of the raster data in the `xarray.DataArray` object which was retrieved when reading the data from the GeoTIFF file:

In [None]:
s2_summer.rio.crs

We can see the CRS of the dataset is EPSG:4326, which represents locations using latitude and longitude on the WGS84 ellipsoid and datum. Measuring distance or computing area can be tricky with this CRS, as the data's positional units are decimal degrees on a 3D surface rather than metric units on a 2D surface. Therefore, we often want to reproject raster data to a projected CRS on a 2D surface. 
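
To see why decimal degrees are awkward for measuring distance, the sketch below estimates the ground length of one degree of longitude at two latitudes. It uses only Python's `math` module and assumes a spherical Earth with a 6,371 km radius, which is a simplification of the WGS84 ellipsoid. The function name and the approximate latitude used for Perth are illustrative choices.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)


def metres_per_degree_longitude(latitude_deg: float) -> float:
    """Approximate ground length (m) of one degree of longitude at a given latitude."""
    return math.radians(1) * EARTH_RADIUS_M * math.cos(math.radians(latitude_deg))


at_equator = metres_per_degree_longitude(0.0)
at_perth = metres_per_degree_longitude(-31.95)  # roughly the latitude of Perth

print(f"One degree of longitude at the equator: {at_equator / 1000:.1f} km")
print(f"One degree of longitude near Perth:     {at_perth / 1000:.1f} km")
```

The same step of one degree covers less ground further from the equator, so pixel sizes and distances expressed in degrees are not directly comparable across a dataset.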

#### Recap quiz

**Can you use the <a href="https://corteva.github.io/rioxarray/stable/examples/reproject.html#Reproject" target="_blank">rio.reproject() method</a> to reproject the raster data to `"EPSG:32750"` (UTM Zone 50S)?**

**Save the reprojected `xarray.DataArray` object to a variable referenced by `s2_summer_utm`.**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
s2_summer_utm = s2_summer.rio.reproject("EPSG:32750")
# check it reprojected OK
s2_summer_utm.rio.crs
```
</details>

**Above, we knew the EPSG code for the CRS we wished to reproject our `xarray.DataArray` object to. What happens if we don't have this information? Can you look in the <a href="https://corteva.github.io/rioxarray/stable/modules.html" target="_blank">rio accessor docs</a> and see if you can spot a method, and implement it, that will estimate a suitable UTM CRS for an `xarray.DataArray` object?**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
est_utm = s2_summer.rio.estimate_utm_crs()
est_utm
```
</details>

### NetCDF

<a href="eccodes" target="_blank">Network Common Data Form (NetCDF)</a> is a commonly used file format in climatology, meteorology, oceanography, and the geosciences and Earth sciences more broadly, where there is a need to store data as multidimensional arrays. The NetCDF file format comprises:

* *variables* - multidimensional arrays of data values (including 1D arrays for the dimensions with the same name as the corresponding dimension)
* *dimensions* - have a name and a value and describe the size and shape of the dataset
* *attributes* - additional metadata to describe the dataset

The NetCDF format is very similar to the `xarray.Dataset` class (the NetCDF model was the basis for designing the `xarray.Dataset` class). However, `xarray.Dataset` objects are designed for working with multidimensional arrays in memory from within Python programs, whereas NetCDF is a format for storing multidimensional arrays on disk. This is a useful description of the <a href="https://pro.arcgis.com/en/pro-app/3.1/help/data/multidimensional/fundamentals-of-netcdf-data-storage.htm" target="_blank">NetCDF format</a> provided by ESRI.

Let's open a NetCDF file which stores some data from ERA5-Land reanalysis weather data covering Western Australia. The data was downloaded from the <a href="https://cds.climate.copernicus.eu/cdsapp#!/home" target="_blank">Copernicus Climate Data Store</a> with the following characteristics:

* Product type: Monthly averaged reanalysis by hour of day
* Variable: 2m temperature (t2m), Total precipitation (tp)
* Year: 2023
* Month: January, February, March, April, May, June, July, August, September, October, November, December
* Time: 00:00, 06:00, 12:00, 18:00
* Sub-region extraction: North -9°, West 109°, South -36°, East 129°
* Format: Zipped NetCDF-3 (experimental)

In [None]:
era5 = xr.open_dataset(os.path.join(os.getcwd(), "data_lab-3_2", "era-5-western-australia-monthly-2023.nc"))
era5

You might spot that `era5` references an `xarray.Dataset` object with two data variables: `tp` and `t2m`. Each of `tp` and `t2m` is an `xarray.DataArray` object; they store precipitation and air temperature at 2 m respectively. We can access each of these `xarray.DataArray` objects by name. For example, to retrieve the temperature data as an `xarray.DataArray` we use dot notation to access the `t2m` variable: 

In [None]:
era5.t2m

We can select a 2D slice of the multidimensional array of temperature values to visualise. `xarray.DataArray` objects have a `sel()` method which lets us select elements from the array using the values of its coordinates. Let's slice out the 2D array corresponding to the time `"2023-12-01T12:00:00"`. We also need to select a value for `expver`, which distinguishes between the initial release of the data (`expver=5`) and a validated release (`expver=1`). Let's use `5` here.

Finally, let's use `plot.imshow()` to visualise temperature values across Western Australia for a single time slice.

In [None]:
era5.t2m.sel(time="2023-12-01T12:00:00", expver=5).plot.imshow()
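
If you'd like to see how `sel()` behaves without the ERA5 file, here is a small sketch on a synthetic array. The array `toy_t2m`, its values, and its coordinate values are invented for illustration. Only the dimension names (`time`, `expver`, `latitude`, `longitude`) are chosen to resemble the description above, and they are an assumption about how the real dataset is organised.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Four time steps, two expver values, and a 2 x 3 latitude / longitude grid.
times = pd.to_datetime(
    [
        "2023-12-01T00:00:00",
        "2023-12-01T06:00:00",
        "2023-12-01T12:00:00",
        "2023-12-01T18:00:00",
    ]
)
data = np.arange(4 * 2 * 2 * 3).reshape(4, 2, 2, 3)

toy_t2m = xr.DataArray(
    data,
    dims=("time", "expver", "latitude", "longitude"),
    coords={
        "time": times,
        "expver": [1, 5],
        "latitude": [-30.0, -31.0],
        "longitude": [115.0, 116.0, 117.0],
    },
    name="t2m",
)

# Selecting a single time and a single expver drops those two dimensions,
# leaving a 2D (latitude, longitude) array that could be passed to plot.imshow().
slice_2d = toy_t2m.sel(time="2023-12-01T12:00:00", expver=5)
print(slice_2d.dims, slice_2d.shape)
```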

#### Recap quiz

**Can you select the precipitation data from the `xarray.Dataset` `era5` and visualise precipitation data for a single time slice in June across Western Australia?**

**Use a sensible colour palette for precipitation values. You will need to set `expver=1`.**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
era5.tp.sel(time="2023-06-01T18:00:00", expver=1).plot.imshow(cmap="Blues")
```
</details>

### ZARR

<a href="https://zarr.readthedocs.io/en/stable/tutorial.html" target="_blank">Zarr</a> is a modern cloud-optimised format and specification for storing chunked and compressed multidimensional arrays. It's useful for working with big datasets which have an array-like structure and in cloud computing / web environments. 

For example, outputs from weather and climate models often comprise a large number of multidimensional arrays (e.g. dimensions for latitude, longitude, and time and arrays for a wide range of variables such as temperature, precipitation, wind speed, pressure, and so on). One way of conceptualising a zarr dataset is to think of it as a directory of compressed array files. 

Compressing arrays means that their storage size is reduced. This reduces the costs associated with storing large datasets and means it is quicker to transfer data over networks. 

Chunking the arrays in storage means you don't need to be able to read the entire array into memory in your Python programs. The memory limits on personal computers / laptops can prohibit reading in entire datasets stored as arrays as dataset sizes increase. 

Zarr datasets are well suited to cloud storage buckets (e.g. Google Cloud Storage, Amazon S3) and they also support parallel read and writes. This means that multiple clients (e.g. users, applications) can read data concurrently from the same zarr dataset stored in the cloud. 
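
The ideas of chunking and compression can be sketched with NumPy and Python's built-in `zlib` module. This is a conceptual toy and not the Zarr implementation or its on-disk layout: the dictionary `store`, the chunk keys, and the chunk shape are all assumptions made for illustration.

```python
import zlib

import numpy as np

array = np.arange(24, dtype="float32").reshape(4, 6)
chunk_shape = (2, 3)  # split the 4 x 6 array into four 2 x 3 chunks

# "Write": compress each chunk separately and store it under its chunk index.
store = {}
for i in range(0, array.shape[0], chunk_shape[0]):
    for j in range(0, array.shape[1], chunk_shape[1]):
        chunk = np.ascontiguousarray(array[i:i + chunk_shape[0], j:j + chunk_shape[1]])
        key = (i // chunk_shape[0], j // chunk_shape[1])
        store[key] = zlib.compress(chunk.tobytes())

# "Read": to get element (3, 4) we only decompress the one chunk that holds it.
row, col = 3, 4
key = (row // chunk_shape[0], col // chunk_shape[1])
chunk = np.frombuffer(zlib.decompress(store[key]), dtype="float32").reshape(chunk_shape)
value = chunk[row % chunk_shape[0], col % chunk_shape[1]]

print(f"{len(store)} chunks stored; element ({row}, {col}) read from chunk {key}")
```

Because each chunk is stored and compressed independently, a reader can fetch only the chunks it needs, and several readers can fetch different chunks at the same time.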

In the `data_lab-3_2` directory is a zarr dataset named `nuist_cmip6_wa_2100_tasmax.zarr`. It stores an array of maximum temperature data for the year 2100 covering Western Australia extracted from the <a href="https://planetarycomputer.microsoft.com/dataset/cil-gdpcir-cc-by#overview" target="_blank">Climate Impact Lab Global Downscaled Projections for Climate Impacts Research</a>. This data is from the World Climate Research Programme's 6th Coupled Model Intercomparison Project (CMIP6) and is generated by the NUIST NESM3 model and the ssp585 scenario (this dataset has been made available with a <a href="https://github.com/ClimateImpactLab/downscaleCMIP6/blob/master/data_licenses/NESM3.txt" target="_blank">Creative Commons 4.0 International License</a>).

xarray provides functionality for reading zarr datasets (<a href="https://docs.xarray.dev/en/stable/user-guide/io.html#zarr" target="_blank">see docs here</a>).

#### Recap quiz

**Can you read in the zarr dataset `nuist_cmip6_wa_2100_tasmax.zarr` to an `xarray.Dataset`?**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
cmip6_wa = xr.open_dataset(os.path.join(os.getcwd(), "data_lab-3_2", "nuist_cmip6_wa_2100_tasmax.zarr"), engine="zarr")
print(cmip6_wa)
cmip6_wa.tasmax.sel(time="2100-01-01T12:00:00").plot()
```
</details>

## Vector data

Vector data uses point, line, or polygon geometries to represent geographic features.

### GeoPandas GeoDataFrame

A GeoPandas `GeoDataFrame` is a tabular data structure for storing vector geospatial data and is based on a regular pandas `DataFrame`. 

A `GeoDataFrame` consists of columns of non-spatial attributes similar to a pandas `DataFrame`. However, a `GeoDataFrame` also has a `geometry` column which is a `GeoSeries` of geometries for the spatial data associated with each row.

In Python, geometries are represented as <a href="https://shapely.readthedocs.io/en/stable/geometry.html" target="_blank">Shapely</a> `Geometry` objects. The `geometry` column in a GeoPandas `GeoDataFrame` is a `GeoSeries` of Shapely `Geometry` objects. Printing a <a href="https://shapely.readthedocs.io/en/stable/geometry.html" target="_blank">Shapely</a> `Geometry` object displays a Well Known Text (WKT) string description of the geometry (e.g. `POINT (0 1)`; note that WKT separates the x and y values with a space, not a comma). The `geometry` column of a `GeoDataFrame` (or a `GeoSeries`) can be viewed as a sequence of Shapely `Geometry` objects:

```
a_geoseries = [POINT (0 1), POINT (0 2), POINT (2 3)]
```

Shapely provides tools for representing geometries in Python programs. It does not provide tools for reading geometry data from disk or handling attribute data. GeoPandas `GeoDataFrame` and `GeoSeries` combine Shapely's functionality for handling geometries with tools for reading and writing vector data, handling attributes, and visualisation. Therefore, we will focus on using `GeoDataFrame`s in these labs. 
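
As a brief sketch of what Shapely on its own offers, the example below creates a few geometries and prints their WKT descriptions. The coordinates are arbitrary values chosen for illustration.

```python
from shapely.geometry import LineString, Point, Polygon

# Geometries are built from coordinate pairs (x, y).
point = Point(0, 1)
line = LineString([(0, 0), (1, 1)])
polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 0)])

# The wkt property returns the Well Known Text description as a string.
print(point.wkt)
print(line.wkt)
print(polygon.wkt)

# Geometries also expose simple geometric properties.
print(polygon.geom_type, polygon.area)
```

Note that none of these objects carry attribute data or a CRS; that is what `GeoSeries` and `GeoDataFrame` add.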

Let's convert a CSV file with longitude, latitude, and elevation columns into a `GeoDataFrame`. First, let's read the CSV file in as a pandas `DataFrame`. 

In [None]:
elev_df = pd.read_csv(os.path.join(os.getcwd(), "data_lab-3_2", "week-3-bf66-elevation.csv"))
elev_df.head()

Now, let's use the longitude and latitude columns in the `DataFrame` to convert the elevation data into a GeoPandas `GeoDataFrame`.

In [None]:
# Convert the elevation data to a spatial format
points = gpd.points_from_xy(elev_df["Lon"], elev_df["Lat"], crs="EPSG:4326")
print(f"points is of type {type(points)}")

elev_gdf = gpd.GeoDataFrame(elev_df, geometry=points)
print(f"elev_gdf is of type {type(elev_gdf)}")

elev_gdf.head()

#### Recap quiz

You will need to refer to the GeoPandas documentation to answer these questions.

<details>
    <summary><b>What does executing the GeoPandas function <a href="https://geopandas.org/en/stable/docs/reference/api/geopandas.points_from_xy.html" target="_blank"><code>points_from_xy()</code></a> return?</b></summary>
<code>points_from_xy()</code> expects array-like objects (e.g. pandas <code>Series</code>) of x and y coordinates and, optionally, a coordinate reference system. It returns a GeoPandas <code>GeometryArray</code> object which stores a POINT geometry object for each x and y pair and can be converted into a <code>GeoSeries</code> object.
</details>

<p></p>
    
<details>
    <summary><b>The <a href="https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.html#geopandas.GeoDataFrame" target="_blank"><code>GeoDataFrame()</code></a> constructor function can take three arguments. What are these arguments and how do they enable the creation of a <code>GeoDataFrame</code> object?</b></summary>
    The <code>GeoDataFrame()</code> constructor function takes the data, typically a pandas <code>DataFrame</code>, as its first argument. This data holds the non-spatial attributes. The second (optional) argument is a GeoPandas object which stores <code>geometry</code> objects associated with each row (this could also be a string denoting the column of a <code>DataFrame</code> storing geometries). The third (optional) argument is a crs denoting the coordinate reference system for the geometry data.
</details>

### GeoJSON

JSON data (JavaScript Object Notation for its full name) is a widely used format for data interchange (exchanging data between programs, computers, clients, and servers). 

JSON represents data as key:value pairs enclosed within curly brackets `{}` (you might notice the similarity with Python's dictionary data structure). 

This is an example of JSON data:

```
{
    "title": "Introducing JSON",
    "url": "https://www.json.org/json-en.html"
}
```

The values in JSON data can include text (strings), numbers, arrays (lists), and nested JSON objects. Like the CSV format, JSON is a text-based format where human-readable characters are encoded in binary using UTF-8 or UTF-16.
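
The similarity between JSON and Python dictionaries can be seen with Python's built-in `json` module. In this sketch the `title` and `url` keys come from the example above, while the `tags` and `version` keys are hypothetical additions used to show array and number values.

```python
import json

json_text = """
{
    "title": "Introducing JSON",
    "url": "https://www.json.org/json-en.html",
    "tags": ["data", "interchange"],
    "version": 1
}
"""

# Parse the JSON text: objects become dicts, arrays become lists.
record = json.loads(json_text)
print(type(record), type(record["tags"]))
print(record["title"])

# Convert the dict back into JSON text.
round_trip_text = json.dumps(record, indent=4)
print(round_trip_text)
```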

GeoJSON is an extension of the JSON format for storing and exchanging spatial data. One of GeoJSON's uses is sending spatial data to web browsers to render as layers on web maps.

GeoJSON represents geographic features as vector data (points, lines, and polygon geometries) and can also store non-spatial attribute information.
 
Spatial data in GeoJSON are represented using `geometry` types which include:

`Point`

```
{"type": "Point", "coordinates": [1, 1]}
```

`LineString`

```
{"type": "LineString", "coordinates": [[1, 1], [2, 2]]}
```

`Polygon`

```
{"type": "Polygon", "coordinates": [[[1, 1], [2, 2], [1, 2], [1, 1]]]}
```

`Feature` types include attribute data as `properties` with `geometry` types.

```
{
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [0, 0]
    }, 
    "properties": {
        "name": "Perth Airport"
    }
}
```

A `FeatureCollection` is a collection of `Feature`s.

```
{
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [0, 0]
            }, 
            "properties": {
                "name": "Perth Airport"
            }
        },
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [10, 1]
            }, 
            "properties": {
                "name": "Broome Airport"
            }
        }
        
    ]
}
```

You can read <a href="https://macwright.com/2015/03/23/geojson-second-bite.html#featurecollection" target="_blank">More than you ever wanted to know about GeoJSON</a> for a description of the GeoJSON format. 

Let's get the first two rows of the `GeoDataFrame` and convert them to GeoJSON format. `GeoDataFrame`s have a `to_json()` method which can be used to convert the data in the `GeoDataFrame` into a string object in GeoJSON format. 

In [None]:
# Get the first two rows of the elevation GeoDataFrame and convert to GeoJSON
elev_gdf_2 = elev_gdf.iloc[0:2, :]
elev_gdf_2

In [None]:
elev_geojson_2 = elev_gdf_2.to_json()
print(f"The GeoJSON data is stored as a {type(elev_geojson_2)} type object")
print("")
pprint.pprint(elev_geojson_2)

In Python, the GeoJSON data that we have generated from our `GeoDataFrame` is stored as a string object. GeoJSON (and JSON) is a text-based data format similar to CSV files. However, unlike the CSV format, where data has a tabular structure with records arranged by row, GeoJSON data is based around nested objects of key:value pairs.

As we have subsetted the first two rows of our `GeoDataFrame` and converted them to GeoJSON we have generated a `FeatureCollection` object with two `Feature`s. 

Each row in the `GeoDataFrame` is converted to a `Feature` and each `Feature` has the column values per row stored in a `properties` object - these are the non-spatial attributes associated with each `Point` feature. The spatial information is stored in a `geometry` object which contains two key:value pairs. The value associated with the `type` key tells us this is a `Point` geometry and the array value associated with `coordinates` key defines the location.

Compare the tabular display of the `GeoDataFrame` to the print of the GeoJSON to see how the non-spatial and spatial information in the table structure is converted to the GeoJSON nested format. 
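
The mapping between the nested GeoJSON structure and table rows can also be sketched by hand with the `json` module. The example below reuses the two-airport `FeatureCollection` shown earlier (with its placeholder coordinates) and flattens each `Feature` into a row. It only handles `Point` geometries and is an illustration of the idea, not how GeoPandas converts data internally.

```python
import json

feature_collection_text = """
{
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [0, 0]},
            "properties": {"name": "Perth Airport"}
        },
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [10, 1]},
            "properties": {"name": "Broome Airport"}
        }
    ]
}
"""

feature_collection = json.loads(feature_collection_text)

rows = []
for feature in feature_collection["features"]:
    # The properties object supplies the non-spatial attribute columns.
    row = dict(feature["properties"])
    # The geometry object supplies the spatial column (written here as WKT text).
    x, y = feature["geometry"]["coordinates"]
    row["geometry"] = f"POINT ({x} {y})"
    rows.append(row)

for row in rows:
    print(row)
```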

We can save a `GeoDataFrame` to GeoJSON using the `GeoDataFrame`'s `to_file()` method and setting the driver argument to GeoJSON.

In [None]:
# Save the elevation GeoDataFrame to a GeoJSON file
elev_gdf.to_file(os.path.join(os.getcwd(), "data_lab-3_2", "week-3-bf66-elevation.geojson"), driver="GeoJSON")

Check the GeoJSON file has saved to the directory specified. As it is text data, if you click on it you should be able to inspect its format in a text editor. 

#### Recap quiz

<details>
    <summary><b>Identify two differences between the GeoJSON file format and a GeoPandas <code>GeoDataFrame</code></b></summary>
<ul>
<li>A <code>GeoDataFrame</code> is used to store geospatial data in memory for Python programs. A GeoJSON file format describes how geospatial data should be encoded when it is stored on disk.</li>
<li>A <code>GeoDataFrame</code> uses a tabular structure to organise non-spatial and spatial attributes with each row corresponding to a feature. The GeoJSON format uses a dictionary-like structure of key:value pairs with geographic data (coordinates) stored as values with a <code>geometry</code> key and attribute data stored as values with a <code>properties</code> key.</li>
</ul>
</details>

**You saved the elevation data to a GeoJSON file at this path: `os.path.join(os.getcwd(), "data_lab-3_2", "week-3-bf66-elevation.geojson")`.**

**Head to the <a href="https://geopandas.org/en/stable/getting_started/introduction.html#Reading-and-writing-files" target="_blank">GeoPandas documentation</a> and look up how to read files into `GeoDataFrame` objects. Read the *elevation.geojson* file into a `GeoDataFrame` referenced by the variable `elev_from_file`.**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>

```python
elev_from_file = gpd.read_file(os.path.join(os.getcwd(), "data_lab-3_2", "week-3-bf66-elevation.geojson"))
elev_from_file.head()
```
    
Note, this answer assumes GeoPandas has been imported as gpd. 
</details>

**Write the data referenced by `elev_from_file` to disk as a GeoPackage.**

In [None]:
## ADD CODE HERE

<details>
    <summary><b>answer</b></summary>
    
```python
elev_from_file.to_file(os.path.join(os.getcwd(), "data_lab-3_2", "week-3-bf66-elevation.gpkg"), driver="GPKG")
```
</details>