Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Choropleth - stroke by postcode #3

Closed
mdsumner opened this issue Oct 8, 2018 · 7 comments
Closed

Choropleth - stroke by postcode #3

mdsumner opened this issue Oct 8, 2018 · 7 comments

Comments

@mdsumner
Copy link
Collaborator

mdsumner commented Oct 8, 2018

(I'll make some numbered suggestions for discussion, if you agree just thumbs-up and I will modify the example accordingly, otherwise please discuss!)

How about this for the stroke_count_estimate?

It simplifies the code somewhat by putting the array of incidence constants in directly against the right age-specific column, and doesn't require an indirect use of apply, or the extra function ComputeStrokeCount.

The current code https://github.com/SymbolixAU/GeospatialStroke/blob/ChoroplethMMC/mmc_surrounds.R#L54-L77 then becomes:

#################
## --- 1) 
basicDemographicsVIC <- mutate(basicDemographicsVIC, 
                       Age_0_24_yr_P = Age_0_4_yr_P + Age_5_14_yr_P + 
                         Age_15_19_yr_P + Age_20_24_yr_P)
basicDemographicsVIC <- mutate(basicDemographicsVIC, stroke_count_estimate = ( 
                   Age_0_24_yr_P  * 5 +  
                   Age_25_34_yr_P * 30 +   
                   Age_35_44_yr_P * 44 +  
                   Age_45_54_yr_P * 111 +  
                   Age_55_64_yr_P * 299 +  
                   Age_65_74_yr_P * 747 + 
                   Age_75_84_yr_P * 1928 +  
                   Age_85ov_P     * 3976) / 100000)

The other "tidyverse" way to map those values to the right column would require a named vector of the constants, and conversion to long form - which I think is less clear. The text might say that though, "in a database context these columns would be stored in long-form, with incidence as a join-able second table" - we can move from interactive exploratory mode to production mode.

@mdsumner
Copy link
Collaborator Author

mdsumner commented Oct 8, 2018

Use read_sf rather than st_read to avoid factor creation from strings by default in https://github.com/SymbolixAU/GeospatialStroke/blob/ChoroplethMMC/mmc_surrounds.R#L44

#################
## --- 2) 
postcodeboundariesAUS <- sf::read_sf("ABSData/Boundaries/POA_2016_AUST.shp")

This avoids the downstream message from the right_join

basicDemographicsVIC <- right_join(postcodeboundariesAUS, basicDemographicsVIC, 
                                   by=c("POA_CODE" = "POA_CODE_2016"))
Warning message:
Column `POA_CODE`/`POA_CODE_2016` joining factor and character vector, coercing into character vector 

@mdsumner
Copy link
Collaborator Author

mdsumner commented Oct 8, 2018

Consider plot of DistanceToMMC at https://github.com/SymbolixAU/GeospatialStroke/blob/ChoroplethMMC/mmc_surrounds.R#L93

#################
## --- 3) 
plot(basicDemographicsVIC["DistanceToMMC"])

Note that we didn't have to worry about the map projection, st_distance is correct as per great circle assumptions in longlat data.

This is where it needs to be said that crow-distance and transport-distance are obviously different.

@mdsumner mdsumner mentioned this issue Oct 8, 2018
@richardbeare
Copy link
Owner

Stroke count computation looks nicer.

I think we'll need comments on projections in the main text, with reference to st_distance.

@richardbeare
Copy link
Owner

I think you can merge your other suggestions into the code.

We'll cover crow-distance and transport-distance and transport time in a separate example.

mdsumner added a commit that referenced this issue Oct 10, 2018
@richardbeare
Copy link
Owner

@mdsumner - a query on crs. I'm experimenting with including osm geocoding (with tmap_tools, since we're using tmap anyway) to avoid the need for keys in this example. If I do:

MMCLocation <- tmaptools::geocode_OSM("Monash Medical Centre, Clayton, Victoria, Australia", as.sf=TRUE)
MMCLocation

Simple feature collection with 1 feature and 7 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 145.1207 ymin: -37.92093 xmax: 145.1207 ymax: -37.92093
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
                                                query       lat      lon   lat_min   lat_max  lon_min  lon_max                   geometry
1 Monash Medical Centre, Clayton, Victoria, Australia -37.92093 145.1207 -37.92098 -37.92088 145.1207 145.1208 POINT (145.1207 -37.92093)

Whereas the census postcode data is:

postcodeboundariesAUS

Simple feature collection with 2670 features and 4 fields (with 2 geometries empty)
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 96.81694 ymin: -43.74051 xmax: 167.998 ymax: -9.142176
epsg (SRID):    4283
proj4string:    +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
# A tibble: 2,670 x 5
   POA_CODE POA_CODE16 POA_NAME  AREA_SQKM                                                                                geometry
   <chr>    <chr>      <chr>         <dbl>                                                                      <MULTIPOLYGON [°]>
 1 POA0800  0800       0800          3.17  (((130.8502 -12.45301, 130.8496 -12.45622, 130.8504 -12.45637, 130.8501 -12.45824, 1...
 2 POA0810  0810       0810         23.8   (((130.8715 -12.39532, 130.8708 -12.39539, 130.8628 -12.39544, 130.8609 -12.39645, 1...

What is the correct way of ensuring crs are consistent?

@mdsumner
Copy link
Collaborator Author

Strictly, the right way is to transform one to the other (you could use sf::st_crs(postcodeboundariesAUS) rather than prj, or you could enter 4283 directly):

prj <- "+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs"
 format(unclass(sf::st_geometry(sf::st_transform(MMCLocation, prj))), digits = 15)
[1] "145.1207033, -37.9209311"
> format(unclass(sf::st_geometry(MMCLocation)),  digits = 15)
[1] "145.1207033, -37.9209311"

But, as you can see they aren't different. I don't know exactly on this stuff, but the +towgs84=<all zeros> to me means that it's equivalent to WGS84 in 2D, ( if surveyor accuracy is required we'd be using GDA94 (the datum)and a real 3D transformation grid would be in use).

I'm not quite sure what the advice should be generally, use "+proj=longlat +datum=WGS84 +no_defs" ("+init=epsg:4326", or st_crs(4326)) unless you have source data that uses this locally specific one. I think probably we should use 4283 generally, but it never actually matters in my work these days.

Take home: keep the metadata strictly for use by the sf tools, and always check if you have to apply it or modify it manually.

@richardbeare
Copy link
Owner

Ah, I'll use st_transform to spread good practices.

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants