
Final steps #12

Closed
richardbeare opened this issue Feb 7, 2019 · 30 comments

@richardbeare
Owner

@njtierney @mpadge @SymbolixAU @mdsumner @gboeing

Hi Everyone,
We're coming to the end of our extra time, I'm afraid, and we need to knock this off. I've reorganised the paper text and I have a set of tasks that will give us the final pieces. I'm taking an executive decision to assign tasks as follows:

For the tasks that involve writing, don't worry too much about flow; get the ideas down in a coherent form in a section at the end of the document. Use Nick's morgue section for any new sections I assign.

  1. @mpadge Finish off the catchment basin example by using the incidence ideas from the choropleth to compute per-postcode stroke rates; we then compute strokes per service centre based on the proportion of randomly sampled cases that are assigned to each service centre, i.e. if 30% of the random cases from a postcode are assigned to service centre A, then 30% of that postcode's predicted stroke cases are assigned to it (a rough sketch of this allocation step follows the task list).

Ensure that neither example uses API keys.

Have a test section at the beginning that checks for required packages.

  2. @SymbolixAU A discussion paragraph on API keys: what they are needed for and which common platforms require them.

Next, modify both examples to use keyed services only - specifically Google for the various geocoding and distance calculations (you might need to drop the number of samples per suburb to avoid getting charged) - and replace all of the visualization steps with mapdeck.

  3. @njtierney Write a section on curated data. Ideally a US and/or Canadian example and a European example to match what we've already mentioned, i.e. census data, including boundaries, address/position databases, plus anything else you can think of.

  4. @gboeing Python versions of both examples without API keys, plus the versions with keys once they are available, if that makes sense. Send any ideas you have about the curated data sets, especially Python-friendly forms, to @njtierney.

Still to do: once the examples are running, we'll edit the text around them to suit.

  5. @mdsumner You're off the hook for now, but we'll be leaning heavily on you for proofreading, I think.
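
For the allocation step in task 1, this is roughly what I have in mind (a sketch only; the object and column names are hypothetical stand-ins for whatever the example produces):

```r
library(dplyr)

# samples: one row per randomly sampled address, with its postcode and the
# service centre it was routed to; strokes: predicted stroke cases per postcode.
shares <- samples %>%
  count(postcode, centre) %>%
  group_by(postcode) %>%
  mutate(share = n / sum(n)) %>%   # e.g. 0.3 if 30% of a postcode's samples go to centre A
  ungroup()

centre_load <- shares %>%
  left_join(strokes, by = "postcode") %>%
  group_by(centre) %>%
  summarise(predicted_cases = sum(share * predicted_strokes))
```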
@gboeing
Collaborator

gboeing commented Feb 7, 2019

@richardbeare I'll be adding my work in #13

@gboeing
Collaborator

gboeing commented Feb 7, 2019

@njtierney I made some additions to the manuscript text (in "suggestion" mode so they are highlighted) about US spatial data sets.

@gboeing
Collaborator

gboeing commented Feb 7, 2019

@richardbeare Python versions of both examples have been completed in #13, and merged into master.

@richardbeare
Owner Author

Thanks - will test ASAP.

@njtierney
Collaborator

@gboeing thanks! :) Can you add citations or links to the work you referenced?

Eurostat + GISCO is the main source of GIS data for Europe; I have also provided recommended links for France, Germany, and Switzerland, which have similar levels of data. Would you like me to explore specifically the type of data they provide and critique it? Or is this more so that the reader knows where to look?

@richardbeare
Owner Author

I think it is worth checking to the level of pointing out approximate parallels to what we've demonstrated. i.e. verify that there are boundaries of various types and corresponding demographics. If anything else jumps out as interesting then think about going a bit deeper.

mpadge added a commit that referenced this issue Feb 11, 2019
@mpadge
Collaborator

mpadge commented Feb 11, 2019

That commit should polish my bit off, with the whole shebang directly viewable in the README - let me know of any other potential things you might like in there @richardbeare.

Interestingly, final case loads to each rehab centre comparing my R way of doing things with the python code of @gboeing look like this:

| Destination | R (abs) | R (rel %) | Python (abs) | Python (rel %) |
|---|---|---|---|---|
| CaseyHospital | 10817 | 19.4 | 479 | 12.8 |
| DandenongHospital | 16325 | 29.4 | 1419 | 37.9 |
| KingstonHospital | 28479 | 51.2 | 1851 | 49.4 |

Given that I don't think differences in sample sizes are likely to generate the observed degree of difference in proportional allocation, potential origins of these discrepancies could be:

  1. Thanks to input from @richardbeare, the R code uses the PSMA::fetch_postcode() function, which generates random samples of actual street addresses within a postcode, whereas the Python code uses a simple sample of network nodes.
  2. I've been uncertain for a while whether osmnx.graph_from_xxx -> networkx.shortest_path_length actually does the same thing as dodgr_dists(), and suspect in fact not. @gboeing Your wisdom greatly appreciated here, but in my potentially uninformed view, osmnx.graph_from_xxx extracts the specified part of the network (here, network_type = "drive"), but does not actually weight it to generate weighted route preferences. The networkx.shortest_path_length calculation is then in linear km (or whatever), but absent a preferential weighting scheme. In contrast, dodgr calculates a dual graph, with a weighted version used for preferential routing and an unweighted version used for distance calculations (a minimal sketch of the dodgr side follows this list). That could be the source of some discrepancy here?
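
For reference, a minimal sketch of the dodgr side of this, with a hypothetical bounding box and points rather than the ones used in the example:

```r
library(dodgr)

# Extract a small street network and weight it for motorcar routing.
# weight_streetnet() produces both d (the unweighted distance) and d_weighted;
# route preferences use d_weighted, while reported distances use d.
net <- dodgr_streetnet(c(145.05, -37.95, 145.20, -37.85))
graph <- weight_streetnet(net, wt_profile = "motorcar")

# Hypothetical origin and destination coordinates (lon, lat); dodgr_dists()
# snaps them to the nearest network vertices.
from <- matrix(c(145.08, -37.92), ncol = 2)
to   <- matrix(c(145.15, -37.88), ncol = 2)
dodgr_dists(graph, from = from, to = to)
```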

It is very interesting, regardless, to have this forum to provide such a concrete contrast between the otherwise strongly parallel work of @gboeing and myself. Geoff, it'd be great one day to have a chance to merge minds on this stuff!

@gboeing
Collaborator

gboeing commented Feb 11, 2019

@mpadge yep, I think that's generally right. Correct on 1. Regarding 2, OSMnx uses a directed graph weighted by length, so the shortest path algorithm minimizes distance traveled. There are lots of different ways to tweak that analysis, with trade-offs between theoretical soundness, labor, time complexity, etc. Our estimates are in the same ballpark, which is good, but also (usefully, actually) demonstrates how hundreds of tiny methodological decisions help shape analytical results.

@richardbeare
Owner Author

We'll see how different the distance API approach is. @SymbolixAU - perhaps check the feasibility of using the same addresses as the Python version for comparison purposes.

@SymbolixAU
Contributor

@richardbeare I've added a paragraph on API keys in the 'APIKeys' folder. Let me know if you want more details.

mpadge mentioned this issue Feb 12, 2019
@richardbeare
Owner Author

OK,
Progress is looking good. I've added a pointwise methods section for each example. @gboeing, can you modify your notebooks to provide numbered headings that match the numbered sections in the methods? A slight reordering may be required.

@mpadge Same for you - please restructure the catchment basin example to match the order in the methods description (the main change is so that the first couple of steps of each example are the same). Ensure that we have numbered sections and that they match the methods description. Also, let's get rid of all non-essential visualisation steps. If you like mapview for the rehab example, let's stick with only that; we'll have a separate mapdeck example. Don't remove the extra description you've included around dodgr etc. - it is good and we won't have room in the main paper.

@SymbolixAU Once @mpadge has finished tweaking the rehab example, please look at reproducing both examples using services requiring keys: the choropleth with mapdeck visualization and Google geocoding, and the catchment areas with mapdeck and Google too. Be sure to reduce the number of sampled addresses to something much smaller (10 per postcode?). Write a section on how to set up the keys in the supplementary material (bottom of the Google doc) - something like the sketch below - and we'll copy it into the website too.
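
Something along these lines might be enough for the key setup section (the environment variable names are just placeholders, not what we'll necessarily use):

```r
library(googleway)
library(mapdeck)

# Keys live in environment variables (e.g. set in ~/.Renviron) so that no key
# ever appears in the Rmd itself; the variable names here are placeholders.
set_key(Sys.getenv("GOOGLE_MAPS_KEY"))   # googleway: geocoding / distance APIs
set_token(Sys.getenv("MAPBOX_TOKEN"))    # mapdeck: Mapbox access token

# Example call once the key is set (this will incur API usage):
# res <- google_geocode("Monash Medical Centre, Clayton VIC, Australia")
```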

@njtierney @SymbolixAU @mpadge @mdsumner - we need consensus about how the data folder is referenced. To decide this we need to make some guesses about how people are most likely to use the examples. If they are clicking on the Rmd files in RStudio, then a ../ or here:: approach is probably OK; however, I'm not sure whether here:: will work if they download the zip file from GitHub. In addition, we want all this material to end up on a GitHub Pages website, so the decisions need to be compatible with that.

@richardbeare
Owner Author

Finally, @gboeing, @njtierney, @SymbolixAU, @mpadge, @mdsumner - calling for votes: the paper is at the point where we need to start consolidating it with citations etc. I'm open to suggestions, but I suspect that LaTeX is probably the easiest. If a few of you agree, we'll put a copy into a folder on the repo and work on it from there. Any volunteers for the first pass - perhaps one person to do the initial translation and someone else to do the BibTeX?

@SymbolixAU
Contributor

@richardbeare Is it ok if I get my sections completed by Friday 15th Feb?

@richardbeare
Owner Author

@SymbolixAU - I'm trying to either reset my mapdeck account password or create a new sign-in. Have you had problems lately?

@SymbolixAU
Contributor

@richardbeare no, I've not had issues with the access token. Feel free to open an issue on my mapdeck issues page and we can work through it (or a separate issue in this repo would be fine too).

@richardbeare
Owner Author

@SymbolixAU Ah - mapbox.com, not mapdeck.com

SymbolixAU pushed a commit that referenced this issue Feb 14, 2019
@mpadge
Collaborator

mpadge commented Feb 14, 2019

@richardbeare just to confirm: do you mean that I should align structure with the section headings in Choropleth/mmc_surrounds?

@gboeing
Collaborator

gboeing commented Feb 14, 2019

> @gboeing, can you modify your notebooks to provide numbered headings that match the numbered sections in the methods? A slight reordering may be required.

@richardbeare this is completed in #20

@richardbeare
Owner Author

@mpadge Please try to align with the methods description in the Google doc. The first 3 steps will be the same as mmc_surrounds.

mpadge added a commit that referenced this issue Feb 15, 2019
@mpadge
Collaborator

mpadge commented Feb 15, 2019

That commit should do it. I've also incorporated some of the insights from the rvspy stuff, so final results are generated in two ways (roughly contrasted in the sketch after this list):

  1. Estimates through direct sampling of random addresses, with each postcode weighted equally; and
  2. Estimates weighted by estimated stroke incidence based on per-postcode demographics.
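
In rough terms the difference between the two is this (object and column names are hypothetical, not those in the example code):

```r
library(dplyr)

# shares: per (postcode, centre) proportion of sampled addresses routed to
# that centre; incidence: estimated stroke cases per postcode from demographics.

# 1. Each postcode weighted equally
load_equal <- shares %>%
  group_by(centre) %>%
  summarise(load = sum(share) / n_distinct(shares$postcode))

# 2. Weighted by estimated per-postcode stroke incidence
load_incidence <- shares %>%
  left_join(incidence, by = "postcode") %>%
  group_by(centre) %>%
  summarise(load = sum(share * est_strokes))
```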

@richardbeare
Owner Author

@njtierney @mdsumner @mpadge @gboeing @SymbolixAU - any preference for the tool used for the final efforts on the manuscript?

Frontiers has LaTeX support.

Happy to stick with Google Docs if anyone knows how to do citations properly.

@mpadge
Collaborator

mpadge commented Feb 16, 2019

LaTeX :+1:

@gboeing
Collaborator

gboeing commented Feb 16, 2019

I like LaTeX. A service like Overleaf makes collaboration easy.

@richardbeare
Owner Author

I'm happy if we treat the document like everything else in the repo. No need to involve any other service?

@gboeing
Collaborator

gboeing commented Feb 17, 2019

Sure.

@njtierney
Collaborator

I'd be happy with LaTeX; markdown or R Markdown is also fine.

Something like Overleaf would be great with LaTeX, but I understand if you'd prefer to keep things as they are.

@richardbeare
Owner Author

OK, @njtierney, please do the initial version in LaTeX + BibTeX and commit it to an article folder. I aim to submit by the end of next week.

@njtierney
Collaborator

OK @richardbeare - working on this right now

@njtierney
Collaborator

OK, submitted a PR here: #24

On another note:

  • I am back in Melbourne this week but am taking one month of leave from Saturday.
  • My current to-do items are:
    • Evaluate the data from the French, German, and Swiss statistics departments, further to what @richardbeare said: "I think it is worth checking to the level of pointing out approximate parallels to what we've demonstrated. i.e. verify that there are boundaries of various types and corresponding demographics. If anything else jumps out as interesting then think about going a bit deeper."
    • Re referencing data: ../ or here::here(). I would suggest here::here() if it is in R Markdown (a minimal sketch is below).
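
A minimal sketch of the here::here() approach, assuming a top-level data/ folder (the file name is only illustrative):

```r
library(here)
library(sf)

# here() resolves paths from the project root (the directory containing the
# .Rproj or .here file), so the same call works whether the Rmd is knitted
# from a subdirectory or run line-by-line from the console.
boundaries <- read_sf(here("data", "postcode_boundaries.geojson"))

# Relative-path alternative, which only works when the working directory is
# the Rmd's own folder (e.g. when knitting directly in RStudio):
# boundaries <- read_sf("../data/postcode_boundaries.geojson")
```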

@njtierney
Collaborator

I've added more to the PR. I ended up going on a bit of a journey to find geospatial data in Europe, and collated more detail here:

https://docs.google.com/document/d/1RQ_W09fzcBwJh3zYkZlmwNi9b6sUWfU7_O3rZrJU8Yk/edit

SymbolixAU removed their assignment Jan 16, 2021