# Entering data

You will be manipulating, working with, and visualizing a lot of different kinds of data in GIS. You will translate all of those findings, insights, and interpretations into a few measurements that you will record in a spreadsheet that we build collaboratively. I encourage you to keep your own notes as you work. For example, I have a research notebook that I maintain at all times, where I give the date and take notes on whatever project I happen to be working on. I take these notes in a computer document, but you can use whatever format works for you. The only requirement really is that your notes be informative to you and keep track of what you are doing.

However, the end data product that we will be working towards is the collaborative spreadsheet. All of your measurements, interpretations, and notes should ultimately end up in that collaborative document.

You should have gotten a link to edit a Google Sheet. If you have not, [contact Eric](mailto:ebarefoo@iu.edu).

This spreadsheet is organized so that each row represents a reach of river. Each individual sample that we will be working with has a unique identification number. Thus, each row begins with the identification number, and every column thereafter represents a new piece of information about this reach of river. For example, one of the columns is entitled "viable." A value of `FALSE` in that column indicates that the river reach in that row is not a viable reach for analysis. Below, I will go into detail about what factors disqualify a river reach from being viable.

Many aspects of this spreadsheet are restricted. Some columns cannot be edited by anybody except the project lead. This is because most of those fields are automatically generated. Many of the columns that you as the mappers will interact with have restricted data entry, meaning that only a limited set of responses is allowed. This is mostly to ensure that the data collection process is consistent, and that typos don't cause problems later on.

Entering data is not a glamorous or particularly fun process, but we deeply value your contributions, and your care and attention are essential to the success of the project. Thank you, thank you!

## When to leave notes

In general, more documentation and more detail is always better. In our collaborative spreadsheet, we have three columns dedicated to these kinds of notes. They are all on the far right hand side of the spreadsheet. If you scroll to the right, you should see three columns entitled "observations", "interpretations", and "notes".

These columns support free text entry, so please leave whatever thoughts, observations, or interpretations that you may have when evaluating the data that we have. These notes are essential for us when placing the data in context and interpreting why some datapoints may be useful or not. 

Common things that you may want to note in these columns include:
 
 1. Reasons for disqualifying a river reach
 1. Concerns about the quality of data or the availability of some piece of information
 1. Notes about duplicated data.
 1. Notes about landforms and relevant topography or the geographic setting
 
Oftentimes, the most important piece of information available for a river reach is the notes. If you have determined that a reach should be disqualified, your notes are going to be essential for placing that decision in context. So please be detailed, but also concise.

```{note}
If you would like to raise a question about a particular data point, row, or observation, but it's not worth an email, then comment on the cell you have a question about. It will start a discussion between you, your fellow mappers and me without having to have a reply-all email.
```

When you are loading in data, and searching for the right rows of the database, you may find that some of the datapoints are duplicated. You may find that the datapoint you are about to fill in has one or more rows already recorded. I have tried to eliminate these automatically, but often miss them. If you see them, please leave a comment on the duplicated ID, so that I can reconcile the data. 

Now, let's talk about what kind of determinations and interpretations you're going to have to make. You should consider these a hierarchical set, which is to say that this is almost like a decision tree. At each stage in this process, your work on a reach ends if you make certain determinations. It's at that point that you will want to leave a note, mark down the time and note your authorship, then move on to the next reach.

## Timestamps and authorship

After you have made any notes, and when you are finished noting down all of the data that you want to include, you should note your authorship of that data, and note the time and date. In the collaborative spreadsheet that we are working on, this information is recorded in columns **`O`** and **`P`** (`mapper` and `date`, respectively).

In the `mapper` column, there is a drop-down menu, and your name should appear somewhere in that drop-down menu. If it does not, [contact Eric](mailto:ebarefoo@iu.edu). Select your name.

In the `date` column, please put a timestamp for the current time and date. You could do this manually if you so choose, but luckily there is a nice shortcut. If you select that cell in the spreadsheet and then use the key combination `Ctrl + Alt + Shift + ;`, the time and date should appear in that cell.

These two pieces of information should be recorded whenever you add data to a row.

## Data viability

The first determination that you should make is whether the data for the reach of river you're working on are viable or not. 

First of all, if some crucial pieces of data are missing, please leave a note in the spreadsheet and move on. The two crucial pieces of information are the REM and the river centerline. As you learned when we introduced you to the research project, a river REM is a digital representation of the topography relative to the water surface of the river. That is, it is an image where every pixel represents an elevation, and that elevation is the height above the water surface of the river. This is the essential data that you'll be working with. The river centerline will also ultimately be very important for you. If either of these is missing, then something is wrong, and you should leave a note in the spreadsheet and move on to the next folder.

Just to reiterate, if you open up the data folder, and either of the two following files are absent, then note their absence and move on:

```
dem_XXXXXXXXXX_prj_REM.tif
centerline_XXXXXXXXXX_prj.geojson
```
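If you would rather check a folder programmatically than by eye, here is a minimal Python sketch of the same check. The two filename patterns come from the listing above (with `*` standing in for the reach ID); the helper name `missing_required_files` is hypothetical and not part of any project tooling.

```python
from pathlib import Path

# Patterns based on the two filenames listed above; "*" replaces the reach ID.
REQUIRED_PATTERNS = ("dem_*_prj_REM.tif", "centerline_*_prj.geojson")


def missing_required_files(folder):
    """Return the required filename patterns that match nothing in `folder`."""
    folder = Path(folder)
    return [
        pattern
        for pattern in REQUIRED_PATTERNS
        if not any(folder.glob(pattern))
    ]
```

An empty list means both files are present. Anything else is the list of patterns to mention in your note before moving on.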

If the data are all present, there are two main disqualification criteria: (1) poor data resolution and (2) human impacts.

### Poor data resolution
(low-res-info)=

We have attempted to download data with a per-pixel resolution of 1 meter or better. In some cases, though, the data we download is lower resolution: 5 or 10 meters per pixel or more. You will learn to recognize when the data is lower resolution than desired. Here is an example of high resolution topography, followed by the same location at much lower resolution. Notice that when the data is high resolution you can see a lot more detail, and that, at the same scale, features that are visible and clear in the high resolution data are pixelated and difficult to see in the low resolution data. Ask one of the other mappers working on the project if you are not certain whether the data is of sufficient quality to move forward. In the meantime, if you are uncertain, leave a note and move on.

![high resolution data](screenshots/usgs_1m.png)

![low resolution data](screenshots/usgs_10m.png)

<!-- ![](screenshots/) -->

### Human impacts 

The other main factor that we are concerned about is human infrastructure or human modification that has substantially altered the course of a river. Examples include dams, trenching, or substantial urban buildup. If the river reach you are working with passes through a human-made lake, if there is substantial construction all the way up to the riverbanks along a large portion of the river's length, or if the river's natural course has been substantially modified by people, you will want to disqualify that reach.

Here are some quick examples of each of those different things.

**Here, the river centerline is shown as a white line, and a dam is marked in neon red. (imagery)**

![dam](screenshots/dam.png)

**Here is a highly urbanized river. (imagery)**

![urban](screenshots/indy.png)

**Here is a river that was straightened. (lidar data)**

![modified](screenshots/kankakee.png)

### Summary

In both of these cases (poor data resolution and human modification), mark `FALSE` in the `viable` column, provide a note describing why you disqualified the reach, note your authorship and the date, then move on.

## River Confinement

Even if a reach is a viable candidate for analysis, we are not interested in measuring river levees in places where river levees cannot exist. The most obvious way this happens is when a river is confined by the valley that it flows in. The most intuitive example of this might be the Grand Canyon. The Colorado River (which flows through the Grand Canyon) cannot form a levee because the walls of the canyon are the same as the banks of the river.

Not all canyons need to be as dramatic as the Grand Canyon, and many rivers are restricted by their valleys. In general, when the area surrounding the river is mostly flat and its elevation is not much more than a few meters above the river surface, we call it a "floodplain". We are only interested in measuring river levees when the floodplain is at least about two to three times as wide as the river. If you find that the river is confined by a narrow floodplain along most of its length in the data that you have, then mark that river as nonviable (`viable`=`FALSE`), mark `TRUE` in the `confined` column, note your authorship and time, then move on.
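As a rough illustration of the width comparison, here is a minimal Python sketch. The function name `is_confined` and the default ratio of 2.0 are assumptions made for this example (the guidance above is "about two to three times"), and in practice the judgment should reflect most of the reach's length rather than a single pair of numbers.

```python
def is_confined(floodplain_width_m, river_width_m, min_ratio=2.0):
    """Return True when the floodplain is narrower than `min_ratio` river widths.

    `min_ratio` is an assumed cutoff for illustration, not a project constant.
    """
    if river_width_m <= 0:
        raise ValueError("river width must be positive")
    return floodplain_width_m < min_ratio * river_width_m


# Hypothetical widths: a 100 m wide river in a 150 m wide floodplain.
print(is_confined(150.0, 100.0))  # True
```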

## Levee Visual Assessment

If both tests have been met---that is, the river reach is both viable and unconfined---then we would like you to make an assessment of how abundant natural levees are on the river.

To do this, you are going to have to visualize the river REM and make evaluations on the map. Here's the procedure you should follow for visualizing the REM. It is at this point that you may often discover that the data set is of low resolution ([see above](low-res-info)).

### Visualizing lidar REMs
(visualizing_REMs)=

You can load data into QGIS in a number of different ways; the simplest and most intuitive is to drag and drop it from whatever folder you have it in into the data screen. Once you have done so, however, you will often not be looking at the right location to view the data, so you will want to zoom to the location where the data exist. Once you have done this, you will want to change how the data is displayed, so that river levees are more evident in the image. The basic sequence is as follows:

1. load REM
2. zoom to data
3. convert REM visualization to "hillshade"
4. duplicate the REM layer
5. change the blending mode of the uppermost REM layer to "multiply"
6. change the render type of the uppermost REM layer to "singleband pseudocolor"
7. change the limits of the colorbar to span from 0 to some small number (2-10 are usually good options)

Here's an example video of how to do this sequence.

<video controls width=100%>
    <source src="_static/screencasts/visualizing_rem.webm"/>
</video>

You will find that each reach requires a different color threshold to make the topography clear. 

### Identifying levees

What is a levee? What does it look like? This is a little complicated, so it is the subject of its [own page](identifying_levees).

### Making your assessment and rating your confidence

Once you have identified whether there are levees on your stretch of river or not, we want you to make an assessment of (1) abundance and (2) confidence.

**Abundance**

If you think that overall, levees are present on half or more of the length of the river, then we will say that levees are `abundant`. If levees are present, but only on less than half of the river's length, then they are `sparse`. If there are no levees, then levees are `absent`. Initially, please make this determination based on an overall assessment. 

When you have arrived at your assessment, enter it in the drop-down menu in the `visual_levee_assessment` column. You will only be able to enter one of those three values.
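To make the three categories concrete, here is a minimal Python sketch that maps a fraction of reach length to a category. It is only an illustration: you are asked for an overall visual assessment, not a measured fraction, and the function name `classify_abundance` is hypothetical.

```python
def classify_abundance(levee_fraction):
    """Map the fraction of reach length with levees (0 to 1) to a category."""
    if not 0.0 <= levee_fraction <= 1.0:
        raise ValueError("levee_fraction must be between 0 and 1")
    if levee_fraction == 0.0:
        return "absent"      # no levees anywhere on the reach
    if levee_fraction >= 0.5:
        return "abundant"    # levees on half or more of the length
    return "sparse"          # levees present, but on less than half


print(classify_abundance(0.2))  # sparse
```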

**Confidence**

Since identifying if a levee is present or not is a bit subjective, it is likely that sometimes you will be very uncertain of your assessment. Your confidence in the abundance level represents both your confidence in identifying levees where they are ambiguous, and also your confidence in whether the levees are abundant or sparse. _This is not a subjective measure of how confident you are in yourself, but rather whether you think the determination was ambiguous or not._ Rate your confidence as either `high`, `medium`, or `low` in the `confidence` column. Once again, there is a dropdown menu so that you can only enter one of those three values. 
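The whole hierarchy described so far (missing files, viability, confinement, then the levee assessment) can be sketched as a small decision function. This is an illustrative Python sketch under stated assumptions: the function name `triage_reach` and the `action` strings are hypothetical, and entering `TRUE` for `viable` and `FALSE` for `confined` on a passing reach is implied by the text rather than spelled out.

```python
def triage_reach(files_present, low_resolution, human_modified, confined):
    """Walk the decision hierarchy and return the values to enter for a reach."""
    if not files_present:
        # REM or centerline missing: nothing to evaluate yet.
        return {"action": "note the missing file(s) and move on"}
    if low_resolution or human_modified:
        # Either disqualification criterion makes the reach non-viable.
        return {"viable": False, "action": "note the reason and move on"}
    if confined:
        # Confined reaches are non-viable but flagged in their own column.
        return {"viable": False, "confined": True, "action": "move on"}
    # Assumption: a reach that passes every test is viable and unconfined.
    return {
        "viable": True,
        "confined": False,
        "action": "assess levee abundance and confidence",
    }
```

Whichever branch you end on, record your name in `mapper` and a timestamp in `date` before moving to the next reach.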

## Nearest USGS gage

The USGS maintains a network of river discharge measuring stations called gages. We would also like you to locate the nearest USGS gage to the river reach you are working on. The easiest way to do this is to open up Google Earth Pro. In the Slate-Project folder, we have a folder called `gages_usa`, that contains 51 `.kml` files, one for each of the 50 US states (plus DC). To locate the nearest gage to your river reach:

1. Open Google Earth Pro (if it is not installed, follow the instructions [here](https://www.google.com/earth/about/versions/) to download Google Earth Pro for desktop).
2. Drag and drop the `centerline_XXXXXXXXXXXXX.geojson` file into Google Earth Pro.
3. Identify which state the centerline lies in.
4. Drag and drop the corresponding file from `gages_usa` into Google Earth Pro.
5. Return to the location of your river centerline, and find the nearest marker indicating a USGS gage on the same river that the centerline represents. You may have to go up or downstream some distance to find it.
6. Once you have it, copy the identification code (a number) and paste it into the `nearest_gage` column in the spreadsheet.

```{attention}
If you copy and paste the number in, please ensure that it is formatted as plain text, and that the entire code is entered, including any zero at the front. For example, a gage identification code could be 04269000, and 4269000 is _not_ the same thing. If you enter the data and find that a leading zero disappears, try selecting the cell in the spreadsheet, clicking the `Format` menu, then selecting `Number` then `Plain Text` to convert it. See below for a screenshot. You can often circumvent this by copying and pasting with `Ctrl + Shift + V` instead of simply `Ctrl + V`.

![](screenshots/format_text.png)
```
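The same pitfall shows up if you ever handle gage codes in a script: converting the code to a number drops the leading zero. Here is a minimal Python sketch of the problem and of a guard against it. The helper name `gage_id_as_text` is hypothetical.

```python
def gage_id_as_text(value):
    """Return a gage ID as a string of digits, keeping any leading zeros."""
    if not isinstance(value, str):
        raise TypeError("gage IDs must be kept as text so leading zeros survive")
    cleaned = value.strip()
    if not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError(f"unexpected characters in gage ID: {value!r}")
    return cleaned


print(gage_id_as_text("04269000"))  # 04269000
print(int("04269000"))              # 4269000 (leading zero lost)
```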

## Valley width

If we have asked you to measure the valley width, open and visualize the REM as above, and then play around with the elevation colorbar until you are convinced that you can see where the valley walls stop and the floodplain begins. Floodplains are relatively flat and valley walls are steep, so the easiest way to do this is usually to set the lower limit of the colorbar to zero (the river) and adjust the upper limit until you see a broad area surrounding the river colored in. If this area is relatively insensitive to the value you choose for the upper limit, then you're in good shape.

Next, choose a few locations along the length of the river, approximately every 2-3 kilometers. Measure the valley width using the 'Measure' tool in QGIS. It has this icon: ![](screenshots/measure_icon.png). It allows you to measure between two points you click. Click on each side of the valley, attempting to measure approximately perpendicular to the overall valley trend. Here's an example. The gold line represents the two points chosen as the measuring end points. 

![valley width example](screenshots/valley_width_example.png)

Once you make several measurements like this, and note down the values that you get, take the average, and enter it in the `valley_width_m` column of the collaborative spreadsheet. The units should be **meters**. 
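The averaging step is simple arithmetic, sketched below in Python. The function name `mean_valley_width_m` and the three example widths are hypothetical.

```python
from statistics import mean


def mean_valley_width_m(measurements_m):
    """Average several valley-width measurements, all given in meters."""
    if not measurements_m:
        raise ValueError("need at least one measurement")
    return mean(measurements_m)


# Hypothetical measurements taken every 2-3 km along a reach.
print(mean_valley_width_m([410.0, 385.0, 450.0]))  # 415.0
```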

## Migration rate

Coming soon....
