# Example weightGIS

In this file we are going to construct the weights used for weighting regional data on places that change
over time. Let's take a fictional island nation that has three core regions, where one of the three expands its borders
sometime between 1931 and 1951. For this example we also have the underlying city administrative regions from 1921, along
with the population of these regions, so we can do both area and population weighting.

If you want to see what this location looks like, you can view the image [here][im]. If you want to follow
along, all the example data is in a folder on the [github page][repo] under ExampleData. This includes all the
results as well as the raw data you need to follow along. **However**, keep in mind that this directory includes results
from all the tutorials, so don't worry about the files not mentioned yet!

[im]: https://github.com/sbaker-dev/weightGIS/blob/master/Example/Images/ExampleChanges.png

## Construct base weights

First you will want to make a project folder; in our case it's called ExampleData on github. Within it you will want
to put all the shapes you want to compare in a folder called "Shapefiles", or tell ConstructWeights the name of the folder
by setting the keyword arg shape_file_folder_name.

If you want to do population sub weighting, you also need to put a population shapefile in the project directory, not
the shapefile directory, and set a weight index: the base zero column index that holds the population information in
this population shapefile. For example, if the population figures for your sub-units are in column 3
(base zero indexing), then you need to provide a weight_index of 3.

The GID (numeric identifier of the place) and the name are by default assumed to be at indexes 0 and 1 of your
shapefiles. If that is not the case for your data you can change these to the columns they are in, but each must be at
the same index across all shapefiles.
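
As a rough illustration of what these zero-based indexes refer to, the sketch below treats one attribute record of a
shapefile as a plain Python list. The record values and variable names are hypothetical; only the default positions of
the GID (0) and name (1), and the idea of a separate weight_index, come from the text above.

```python
# Hypothetical attribute record from a population shapefile:
# column 0 = GID, column 1 = place name, column 2 = sub-unit population.
record = [7, "Example Town", 1250]

gid_index = 0      # default position of the GID
name_index = 1     # default position of the place name
weight_index = 2   # position of the population column in this made-up record

gid = record[gid_index]
name = record[name_index]
population = record[weight_index]

print(f"{gid}__{name}: population {population}")
```

If your own population column sat one position further along, the only thing that would change here is the value of
weight_index.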

In larger datasets you may have names that are not unique, which is what the GID is for, but you may also have a
dataset-specific classifier such as a place type. For example, you may have rural or urban places with a classifier in
another column. If you want to maintain this information, you can set name_class to the index of this column; but we are
not using that within this tutorial series.

To see when the changes occur we will be using the ConstructWeights class. To do this we need to compare a set of
shapefiles and see if a polygon stays the same in the next iteration of that place in time. We will undertake this
process in the following cell. In this case most of the indexes are left at their default values, but remember to set
them if your attributes are not at the default positions.

[repo]: https://github.com/sbaker-dev/weightGIS/Example/ExampleData

In [3]:
from weightGIS import ConstructWeights
from weightGIS import AdjustWeights

project_directory = "ExampleData"
base_shape = "1951.shp"
population_shape = "1921.shp"

In [7]:
ConstructWeights(project_directory, base_shape, population_shape, weight_index=2).construct_base_weights()

1 / 3
2 / 3
3 / 3


### The differences between area and population

This will construct a json file called BaseWeights_0.txt within a folder called BaseWeights within your project folder.
Let's look specifically at Ecanlor in 1931 before it changes, shown below. As you can see, Ecanlor gains a large
amount of Danlhigh based on area. However, as this area is mostly open mountains and grassland, the share of the
population that has been re-assigned is drastically smaller, as few people lived in these rural areas. This example
has been constructed to be an extreme case, but it shows that when the areas transferred are large but do not hold an
equivalently large share of the population, area weights may be a poor choice.
In general, the larger the geographical generalisation you use, the more dangerous area weights become.

*Ecanlor 1931 from within BaseWeights*
```json
{
    "1__Ecanlor": {
        "Area": 100.0,
        "Population": 100.0
    },
    "3__Danlhigh": {
        "Area": 41.76016134968026,
        "Population": 1.8336986193489935
    }
}

```
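
To make the difference concrete, here is a minimal sketch of how the two percentages can diverge. The sub-unit figures
are invented for illustration, and this is not a description of how weightGIS computes its weights internally; it only
shows the arithmetic of dividing the transferred part by the whole.

```python
# Hypothetical sub-units of a region: (area, population, transferred?)
sub_units = [
    (400.0, 200, True),     # large, sparsely populated area that is transferred
    (300.0, 6000, False),   # town that stays
    (300.0, 3800, False),   # town that stays
]


def percent_transferred(values_and_flags):
    """Share (0-100) of the total that belongs to transferred sub-units."""
    total = sum(value for value, _ in values_and_flags)
    moved = sum(value for value, flag in values_and_flags if flag)
    return 100.0 * moved / total


area_weight = percent_transferred([(a, t) for a, _, t in sub_units])
population_weight = percent_transferred([(p, t) for _, p, t in sub_units])

print(f"Area weight: {area_weight:.1f}%")
print(f"Population weight: {population_weight:.1f}%")
```

With these made-up numbers the transferred land is 40% of the area but only 2% of the population, which is the same
kind of gap seen in the Danlhigh entry above.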

## Determine Changes

Now we know that a change occurs, but we don't know exactly when; all we know is that it occurs between
our observed periods in the shapefiles of 1931 and 1951. In this case we have a small number of changes that we could
see by eye, but with larger or more numerous shapefiles we will need to output all the changes we need to find dates
for. We can do this via the write_out_changes command. In this case you just need to provide the path to the file you
just created. We can just save the information to our working directory.

In [8]:
weights_path = f"{project_directory}/BaseWeights/BaseWeights_0.txt"
AdjustWeights(project_directory, weights_path).write_out_changes("ChangeLog")

Written out changes!
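
Conceptually, a place needs a date when its weights draw on any place other than itself. The sketch below illustrates
that idea on a small dictionary. The nesting used here (place, then contributing place, then "Area" and "Population")
and the Nirghol values are assumptions made for the example, and the function places_with_changes is hypothetical; it
is not part of weightGIS, and the real BaseWeights file may be laid out differently.

```python
# Assumed layout: {place: {contributing place: {"Area": %, "Population": %}}}.
# The real BaseWeights file may nest these entries differently.
weights = {
    "1__Ecanlor": {
        "1__Ecanlor": {"Area": 100.0, "Population": 100.0},
        "3__Danlhigh": {"Area": 41.76016134968026, "Population": 1.8336986193489935},
    },
    "2__Nirghol": {
        "2__Nirghol": {"Area": 100.0, "Population": 100.0},
    },
}


def places_with_changes(weights):
    """Return places that draw weight from any place other than themselves."""
    return [
        place
        for place, contributors in weights.items()
        if any(other != place for other in contributors)
    ]


print(places_with_changes(weights))
```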


## Assigning dates
From ChangeLog we can see both Ecanlor and Danlhigh experience a change, so we now need to go and find out when this
occurs. Let's say we dig through the archives and find that between 1931 and 1951 there were actually two changes, even
though we were only expecting one. This is an important time to bring up another limitation of *observable changes*.
When we use shapefiles we only see the cumulative effect of all the changes, and can only act upon these observed changes.

So, let's say the first change happened in 1938, and then we have another change in 1939. We only observe the shape
after the 1939 change, so the 1938 change cannot be given its own weight unless you construct a fix. In larger projects,
or in areas of the distant past, you may not always be able to find reliable information on what the interim changes
look like, so you may have to accept that some of these changes are dropped from your analysis. In this example, the
1938 date will be dropped as we cannot observe this weight based on the data you have provided. Whether you want to go
out of your way to record the changes that will not be used is up to you, although it can be important for transparency
so it is recommended.

This means that you will produce a file that looks like the following, which can be seen in Weight_Dates.csv. We
will now proceed having dropped this information, but will cover how to fix this problem afterwards.

| GID         | Place Name | Changes1    | Changes2    |
|:------------|:-----      |:-----       |:-----       |
| 1           | Ecanlor    | 01/04/1938  | 01/04/1939  |
| 2           | Nirghol    | -           | -           |
| 3           | Danlhigh   | 01/04/1938  | 01/04/1939  |
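
If you want to inspect such a file yourself, the following sketch reads it with Python's standard csv module. It
assumes the CSV has the same four columns as the table above, that dates are written as day/month/year, and that "-"
marks no change; the helper read_change_dates is hypothetical and is not part of weightGIS.

```python
import csv
import io
from datetime import datetime

# Assumed CSV content mirroring the table above; the real file may differ.
csv_text = """GID,Place Name,Changes1,Changes2
1,Ecanlor,01/04/1938,01/04/1939
2,Nirghol,-,-
3,Danlhigh,01/04/1938,01/04/1939
"""


def read_change_dates(handle):
    """Map 'GID__Name' to a list of change dates as YYYYMMDD strings."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    changes = {}
    for gid, name, *dates in reader:
        changes[f"{gid}__{name}"] = [
            datetime.strptime(date.strip(), "%d/%m/%Y").strftime("%Y%m%d")
            for date in dates
            if date.strip() not in ("", "-")
        ]
    return changes


change_dates = read_change_dates(io.StringIO(csv_text))
print(change_dates["1__Ecanlor"])
```

To read a file on disk instead, you would pass an open file handle in place of the io.StringIO object.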


## Construct weighted database

For now, we want to take these weights as they are and construct a database that holds the weights relative to the dates
on which places change. First we need to load the weights generated by ConstructWeights and the dates we
recorded in Weight_Dates.csv. These will then be used to assign the weights to these dates.

It may be the case that you only observe the dates of a census in a general year format, but the changes you have are
more specific, in terms of year-month-day. If this is the case, you need to extend the year format by passing a month
and day to the assign_weights_dates method so we can look at changes occurring between the census dates.
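
The idea can be sketched in a few lines of plain Python. This is only an illustration of why a month and day are
needed, not the library's implementation; reading "0401" as month then day is inferred from the YYYYMMDD-style keys in
the tutorial's output, and the variable names are hypothetical.

```python
census_years = [1931, 1951]                 # years observed in the shapefiles
month_day = "0401"                          # month and day appended to each year
change_dates = ["19380401", "19390401"]     # change dates as YYYYMMDD strings

# YYYYMMDD strings sort chronologically, so plain string comparison works.
census_dates = [f"{year}{month_day}" for year in census_years]
start, end = census_dates

# Changes that fall on or after the first census date and before the second.
between = [date for date in change_dates if start <= date < end]

print(census_dates)
print(between)
```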

In [None]:
from weightGIS import AssignWeights

dates_path = f"{project_directory}/Weight_Dates.csv"
AssignWeights(weights_path, project_directory, "1951_weights_by_dates", dates_path
              ).assign_weights_dates("0401")

### Output
This will write out a json database for each ID and place showing when the changes occur, starting from the first census
year provided. As we can see in the json data below, each place in our reference shape now has dates assigned to
each change, along with the places involved in that change in the form of the change place ID, the change place name,
and the weight that was specified.

```json
{
    "1__Ecanlor": {
        "19310401": {
            "1__Ecanlor": 100.0,
            "3__Danlhigh": 1.8336986193489935
        },
        "19390401": {
            "1__Ecanlor": 100.0
        }
    },
    "2__Nirghol": {
        "19310401": {
            "2__Nirghol": 100.0
        }
    },
    "3__Danlhigh": {
        "19310401": {
            "3__Danlhigh": 98.16443083327889
        },
        "19390401": {
            "3__Danlhigh": 100.0
        }
    }
}
```
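
One way to read this structure is to look up which set of weights applies on a given date. The sketch below does this
for the Ecanlor entry, assuming that each dated entry applies from its date until the next dated entry. That
interpretation and the helper weights_on are assumptions for illustration, not part of weightGIS.

```python
import json

# The Ecanlor entry from the database shown above.
database = json.loads("""
{
    "1__Ecanlor": {
        "19310401": {"1__Ecanlor": 100.0, "3__Danlhigh": 1.8336986193489935},
        "19390401": {"1__Ecanlor": 100.0}
    }
}
""")


def weights_on(database, place, date):
    """Return the weights recorded for a place on a YYYYMMDD date string.

    Assumes each dated entry applies from its date until the next dated entry.
    """
    applicable = [d for d in sorted(database[place]) if d <= date]
    if not applicable:
        raise ValueError(f"{date} is before the first recorded date for {place}")
    return database[place][applicable[-1]]


print(weights_on(database, "1__Ecanlor", "19350101"))
print(weights_on(database, "1__Ecanlor", "19400101"))
```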


## Correcting for un-observed changes

If you want to ensure you capture each change, you can do so by drawing additional shapefiles in between each
change. For example, let's say Ecanlor gained part of Danlhigh in 1938, and then the rest of the observed change in 1939.
If you are working with a large shapefile with hundreds or even thousands of places, it is recommended that you slice
out the area of the changes you need to fix, as we can then update the master database without having to rerun it. So in
this case we just isolate Ecanlor and Danlhigh from the shapefile, as Nirghol isn't affected.

Once you have your area to fix, you would need to find the borders of the place in 1938 and draw them as a new
shapefile. In our example case you would create a new folder 'Ecanlor' with shapefiles for this fix in 1931, 1938, and
then 1951. This way we can observe all changes. An example of these shapefiles has been placed within 'Ecanlor' to give
you an idea of how to structure this change. Once you have made the change you can run the process of constructing the
weights again.

In [5]:
base_shape = "1951.shp"
ConstructWeights(project_directory, base_shape, population_shape, weight_index=2, 
                 shape_file_folder_name="Ecanlor").construct_base_weights()


1 / 2
2 / 2


### Adjusting the weights

This will produce a file called Ecanlor_1.txt in our BaseWeights folder. What we can now do is adjust our original
BaseWeights to include this additional information. For the purpose of this tutorial I have duplicated and then renamed
the file to BaseWeights_Adjusted.txt so that we have a clear before and after. We can then use AdjustWeights as shown
below, where we say specifically which places we want to update via the 'update' list. Each update is handled
individually, so run this method via a for loop.

You can then re-run AssignWeights, changing the weights_path to the new file, and see what the differences would be.
This double change is only used in this example to show how to fix it, and will not be reflected in future tutorials,
so it is not included within the example data.

In [7]:
update = ["1__Ecanlor", "3__Danlhigh"]
fixed_path = f"{project_directory}/BaseWeights/Ecanlor_1.txt"
for ud in update:
    AdjustWeights(f"{project_directory}/BaseWeights", f"{project_directory}/BaseWeights/BaseWeights_Adjusted.txt"
                  ).replace_assigned_weight(fixed_path, ud)
print("Adjusted Weights")

Adjusted Weights
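
In spirit, each update swaps one place's entry in the master weights for the entry from the fix file and leaves every
other place untouched. The sketch below shows that idea on plain dictionaries. The function replace_entries and the
simplified dictionary contents are hypothetical, and this is not how replace_assigned_weight is implemented.

```python
import copy


def replace_entries(master, fixed, keys):
    """Return a copy of master with the entries for keys taken from fixed."""
    updated = copy.deepcopy(master)
    for key in keys:
        updated[key] = copy.deepcopy(fixed[key])
    return updated


# Hypothetical, heavily simplified weight tables keyed by place.
master = {
    "1__Ecanlor": {"source": "original"},
    "2__Nirghol": {"source": "original"},
    "3__Danlhigh": {"source": "original"},
}
fixed = {
    "1__Ecanlor": {"source": "fix"},
    "3__Danlhigh": {"source": "fix"},
}

adjusted = replace_entries(master, fixed, ["1__Ecanlor", "3__Danlhigh"])
print(adjusted)
```

Working on a copy mirrors the tutorial's choice of keeping the original weights file alongside the adjusted one.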


### End of tutorial 1

From this you now have your assigned weights, but now we want to actually use them to weight some data. The next step
that we will need to do is create a geographic lookup so that we can converge all data to a single unique entry that we
can use.