# Weighting External Data

## Preparing data for weights

To use the weights on external data, the data first needs to be formatted, checked, and cleaned into a fixed json
format akin to the one below. You need to structure your data in a manner that is geo-level specific. For example, if
you have state-level and district-level data, each needs its own group. Even if you only have one level, you **must**
group the data by that level. Within each level, every place needs a unique name, and every attribute within a place
needs a unique name; otherwise entries will overwrite each other, because keys in a json object must be unique.
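
As a rough illustration of the overwriting problem, the sketch below uses only the standard library json module. The geo-level name, place name, and the helper names reject_duplicate_keys and load_strict are made up for this example and are not part of weightGIS.

```python
import json


def reject_duplicate_keys(pairs):
    """Build a dict from (key, value) pairs, raising if a key repeats."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"Duplicate key in json object: {key!r}")
        seen[key] = value
    return seen


def load_strict(text):
    """Parse json text, failing on repeated keys instead of silently overwriting."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)


# Hypothetical input: the same place appears twice within one geo-level
duplicated = '{"District": {"1__Ecanlor": {"Births": {}}, "1__Ecanlor": {"Deaths": {}}}}'

# The standard parser keeps only the last entry, so Births is lost
print(json.loads(duplicated))

try:
    load_strict(duplicated)
except ValueError as error:
    print(error)
```

Running a check like this over a file before weighting it is one way to catch places that would otherwise disappear without warning.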

Place names need to be in the format ID__NAME specifically. You can have additional information in the name, but you
must make sure that the ID is the first element, as it will be extracted by .split("__")[0] with the rest of the
information discarded. An ID on its own isn't very human readable, which goes against the principle of json, so if you
want the place name to be assigned, make sure to have an attribute within the PlaceName that is assigned the name.
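
A small sketch of how the ID is pulled out of a key, using the split call quoted above. The third key, with extra information after the name, and the helper name place_id are hypothetical.

```python
def place_id(place_key):
    """Return the ID portion of an ID__NAME key (everything before the first '__')."""
    return place_key.split("__")[0]


# The third key is a hypothetical example carrying extra information after the name
place_keys = ["1__Ecanlor", "2__Nirghol", "3__Danlhigh__ExtraInfo"]

for key in place_keys:
    parts = key.split("__")
    # parts[0] is the ID that is kept; parts[1:] is everything that is discarded
    print(place_id(key), parts[1:])
```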
  
```json
{
    "Geo-level1": {
        "PlaceName": {
            "AttributeA": {
                "Dates": [],
                "Values": []
            },
            "AttributeB": {
                "Dates": [],
                "Values": []
            },
            "AttributeC": {
                "Dates": [],
                "Values": []
            }
        }
    },
    "Geo-level2": {
        "PlaceName": {
            "AttributeA": {
                "Dates": [],
                "Values": []
            }
        },
        "PlaceName2": {
            "AttributeA": {
                "Dates": [],
                "Values": []
            }
        }
    }
}
```
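
One way to produce a file in this layout is sketched below with the standard library. The geo-level name, the values, and the check_layout helper are made up for illustration. The check assumes every attribute is a Dates/Values pair of equal length, which is an assumption of this sketch rather than a documented requirement of weightGIS.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical values, only to illustrate the layout
external = {
    "District": {
        "1__Ecanlor": {
            "Births": {"Dates": [19380701, 19390701], "Values": [850.0, 900.0]},
            "Deaths": {"Dates": [19380701, 19390701], "Values": [400.0, 420.0]},
        },
        "2__Nirghol": {
            "Births": {"Dates": [19380701], "Values": [300.0]},
        },
    }
}


def check_layout(data):
    """Raise ValueError if a place key lacks '__' or Dates and Values differ in length."""
    for level, places in data.items():
        for place, attributes in places.items():
            if "__" not in place:
                raise ValueError(f"{level}/{place}: place key is not in ID__NAME form")
            for attribute, series in attributes.items():
                if len(series["Dates"]) != len(series["Values"]):
                    raise ValueError(f"{level}/{place}/{attribute}: Dates and Values differ in length")


check_layout(external)

# Write to a temporary directory here; point this at your own project directory instead
out_path = Path(tempfile.mkdtemp()) / "ExternalData.txt"
out_path.write_text(json.dumps(external, indent=4))
print(out_path)
```

If a place also carries a plain name attribute, as suggested above, the check would need to skip it.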
##### Example

Now we are going to use the example data of ExternalData within our Example directory to construct a weighted dataset
given these weights. The file contains some information before 19390401 (when our change occurs) and some after, so
it's the basic use case for this pipeline.

Here we provide the path to the external data file that follows the schema above, the weights by dates file we created
earlier, and a cut-off date. If you want to include all the data, just set the cut-off to something larger than your
date range **Note - we should just make this an optional arg, this will change in future**.

Then provide the write directory and write name and let the program run. Dates before 19390401 are now weighted based
on the weights that you provided, standardising them across the time period.


In [2]:
from weightGIS import WeightExternal

project_directory = "ExampleData"

WeightExternal("ExampleData/ExternalData.txt", "ExampleData/1951_weights_by_dates.txt", 19390601).weight_external(
    project_directory, "WeightedDatabase")


Loading External Data
1__Ecanlor
2__Nirghol
3__Danlhigh


### Accessing values

Json may not be an easy way for you to access the data, so there are several methods to access it. The first is to
parse it out of the json data directly, which you can do by using the supporting access_weighted method. First we load
the data into memory so that we can access it as much as we want without having to re-load it. Here we use load_json
from miscSupports to load the json data. miscSupports is a required package for weightGIS, so you should already have
it installed.

In [3]:
from miscSupports import load_json
weighted = load_json("ExampleData/WeightedDatabase.txt")
print("Loaded Data")

Loaded Data


#### Retrieving data

Let's say you want to extract the data from a given attribute for all places and all times. In this case you just need
to change the values of the keys set below and the script will do the rest for you. Keep in mind that keys are
**Case Sensitive**, so they must be copied or typed correctly. You can include as many names as you want, but you must
leave them within the list, even if you only want a single attribute. You can also use the indexer to isolate any
single part of the name that is delimited by '__'; in this case we use 1 to extract the name without the ID.

This data is formatted in rows, so it's very easy to write it out as a csv if that is the desired end state.

In [4]:
from weightGIS import access_weighted
from csvObject import write_csv

data_request = ["Births"]
data_out = access_weighted(weighted, data_request, 1)
print(data_out[0])
write_csv(project_directory, "Retrieved_Data", ["Place", "Date", "Births"], data_out)


Retrieved Weighted Data 17:5
['Ecanlor', 19380701, 859.5424985563471]
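
To show what flattening nested json into rows involves, here is a minimal sketch using only the standard library. It is not the implementation of access_weighted. It assumes the data follows the Dates/Values layout shown at the top of this page, which the weighted file may or may not share, and the flatten_attribute helper, the geo-level name, and all values are hypothetical.

```python
import csv
import io

# Hypothetical data in the layout shown at the top of this page
database = {
    "District": {
        "1__Ecanlor": {
            "Births": {"Dates": [19380701, 19390701], "Values": [100.0, 110.0]},
            "Deaths": {"Dates": [19380701], "Values": [80.0]},
        },
        "2__Nirghol": {
            "Births": {"Dates": [19380701], "Values": [50.0]},
        },
    }
}


def flatten_attribute(data, attribute, name_index=1):
    """Return [place, date, value] rows for one attribute across all places."""
    rows = []
    for places in data.values():
        for place_key, attributes in places.items():
            if attribute not in attributes:
                continue  # this place has no data for the attribute
            # name_index picks one '__'-delimited part of the key: 0 is the ID, 1 the name
            place = place_key.split("__")[name_index]
            series = attributes[attribute]
            for date, value in zip(series["Dates"], series["Values"]):
                rows.append([place, date, value])
    return rows


rows = flatten_attribute(database, "Births")
print(rows[0])

# Write the rows as csv text; swap io.StringIO for an open file to save to disk
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Place", "Date", "Births"])
writer.writerows(rows)
csv_text = buffer.getvalue()
```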


### Converting to SQL database

Whilst json and accessing via csv may be fine if you just want to extract a column of data and then move on, if you
need more complex queries then it may be better to convert the data into an SQL database. The steps to do this are
quite simple.

First we use access_weighted to request all the data we want, so in this case we will extract Births and Deaths so
that we have all the data.

In [5]:
data_out = access_weighted(weighted, ["Births", "Deaths"], 1)

Retrieved Weighted Data 17:5


#### Writing data to SQL

We will need the sqlite3 package, which is part of the Python 3 standard library, for this. Set the connection to the
location where you want to save the information, with the name you want on the end. Then extract the cursor element
from the connection so that we can access and add information to it.

Then we want to add a table with a column for each element in our data. The first two columns will always be Place and
Date by default, which should receive the types TEXT and INTEGER (more on SQL types [here][sql_types]). You can edit
these out should you wish, but most of the methods that the SQL reader provided within this package has to help you use
these datasets will not work if you do. We then need to add both Births and Deaths as REALs, which act as floats.

It's important that you declare the table columns in the same order as the row data, as the for loop after the table
will add each tuple of information into these slots in this order. If you want to use the custom SQL parser provided
for data like this, you can use the ExampleSQL notebook within the examples directory of this repository.

[sql_types]: https://www.sqlite.org/datatype3.html

In [6]:
from miscSupports import terminal_time
import sqlite3

connection = sqlite3.connect("ExampleData/SQLData")
c = connection.cursor()

# Make the table
c.execute("""
    CREATE TABLE WEIGHTED
    (
    Place TEXT,
    Date INTEGER,
    Births REAL,
    Deaths REAL 
    )
    """)

# Insert each row via placeholders so values are bound by sqlite3 rather than formatted into the SQL text
for row in data_out:
    c.execute("INSERT INTO WEIGHTED VALUES (?, ?, ?, ?)", tuple(row))

# Commit the file
connection.commit()
connection.close()
print(f"Finished at {terminal_time()}")



Finished at 17:5
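
Once the table exists, it can be queried with ordinary SQL. The sketch below rebuilds a table with the same columns in an in-memory database so that it runs on its own; the rows are made-up values, not the contents of ExampleData/SQLData. To query the real file, connect to its path instead of ":memory:" and skip the table creation and inserts.

```python
import sqlite3

# In-memory database so the sketch does not touch any file on disk
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE WEIGHTED (Place TEXT, Date INTEGER, Births REAL, Deaths REAL)")

# Hypothetical rows in the same column order as the table
rows = [
    ("Ecanlor", 19380701, 100.0, 80.0),
    ("Ecanlor", 19390701, 110.0, 85.0),
    ("Nirghol", 19380701, 50.0, 40.0),
]
cursor.executemany("INSERT INTO WEIGHTED VALUES (?, ?, ?, ?)", rows)
connection.commit()

# All births for a single place, in date order
cursor.execute("SELECT Date, Births FROM WEIGHTED WHERE Place = ? ORDER BY Date", ("Ecanlor",))
ecanlor_births = cursor.fetchall()
print(ecanlor_births)

# Mean deaths per place, the kind of query that is awkward to do from json or csv
cursor.execute("SELECT Place, AVG(Deaths) FROM WEIGHTED GROUP BY Place ORDER BY Place")
mean_deaths = cursor.fetchall()
print(mean_deaths)

connection.close()
```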
