
Election tutorial #2041

Merged
9 commits merged into pyro-ppl:dev on Nov 6, 2019

Conversation

@ae-foster
Contributor

Here is the second tutorial for pyro.contrib.oed.

In this tutorial, we introduce the posterior estimator and show how it could be used to design a polling strategy that can lead to accurate predictions of the outcome of a US presidential election.

Collaborator

@martinjankowiak left a comment

this is great! some initial comments/questions:

  • @neerajprad should we move these data files (*.pickle) somewhere else?
  • @neerajprad what's the optimal way to link to the other OED tutorial?
  • at the top add some caveat about "We make a number of simplifying assumptions... exploratory..."?
  • "We will take historical election data 19760-2012 as our prior" => "We will use historical election data 1976-2012 to construct a plausible prior"
  • print out some of the dataframes instead of the whole thing where appropriate?
  • explain why your covariance is in logits space
  • missing word "we are assuming that people respond to the ...."
  • "# Now compute w according to the (approximate) electoral college formula" pull this deterministic calculation out as a helper function?
  • strange comment? " (This kind of posthoc analysis is, of course, only engaged in by people of low statistical morals.)"
  • "One such variational estimator is " + "the"
  • "This $q$ performs" arguably q doesn't perform anything although it might be used to perform something?
  • can you call the intermediates in compute_dem_probability h1, h2 etc
  • re: "def h(p)" why not use bernoulli.entropy
  • explain swing_score
  • why eig=False and use of ape?
  • why do the code snippet that starts best_eig = 0 programmatically?
  • i'm confused by what's going on in the second-to-last section
  • the final sentence reads a bit wonky
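
For the bernoulli.entropy point, I mean something along these lines (just a sketch; the example probability tensor is illustrative):

```python
import torch
from torch.distributions import Bernoulli

def h(p):
    # Binary entropy computed via the library distribution,
    # instead of a hand-rolled -p*log(p) - (1-p)*log(1-p).
    return Bernoulli(probs=p).entropy()

print(h(torch.tensor([0.1, 0.5, 0.9])))
```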

@neerajprad
Member

@neerajprad should we move these data files (*.pickle) somewhere else?

Even though these are small binaries, we should upload them to AWS like the other datasets and read them from there, rather than committing them in this PR.

@neerajprad what's the optimal way to link to the other OED tutorial?

For the time being, how about just pointing to the GitHub link? Once this is published, we can change those links to point to the website.

@jpchen
Member

jpchen commented Sep 19, 2019

what's the optimal way to link to the other OED tutorial?

look at the links in the other intro tutorials. iirc a normal md link, e.g. `[first tutorial](first_tutorial.ipynb)`, should work, and nbconvert will automatically create the correct html extensions. alternatively, you could hardcode the url pyro.ai/examples/tutorial_name.html

@martinjankowiak
Collaborator

@ae-foster when you get to this let's also fix the formatting errors in http://pyro.ai/examples/working_memory.html

@ae-foster
Contributor Author

ae-foster commented Oct 10, 2019

I've gone through and applied @martinjankowiak's suggestions.

why eig=False and use of ape?

Since w is defined only implicitly by the model, we need a large number of samples to compute the prior entropy, so it's better to estimate it once than to add extra noise when comparing designs. I can add a sentence to this effect, though it may confuse some people.
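
Concretely, writing APE for the average posterior entropy, the identity is

$$\mathrm{EIG}(d) \;=\; H\big[p(w)\big] \;-\; \mathbb{E}_{p(y \mid d)}\Big[H\big[p(w \mid y, d)\big]\Big] \;=\; H\big[p(w)\big] - \mathrm{APE}(d),$$

and since $H[p(w)]$ does not depend on the design $d$, ranking designs by APE is equivalent to ranking them by EIG while avoiding a separate noisy prior-entropy estimate for each design.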

i'm confused by what's going on in the second-to-last section

I rewrote this section. Have a look and see if it reads better.

Even though these are small binaries, we should upload them to AWS like the other datasets and read them from there, rather than committing them in this PR.

I take it you guys have to do that as you have the necessary AWS keys

a normal md link should work

I'm sticking with this then

let's also fix the formatting errors in http://pyro.ai/examples/working_memory.html

The only one I could see was a malformed numbered list near the beginning. I tried to fix this by throwing in some extra new lines, though it's hard to tell how it'll render. Were there other formatting errors we need to pick up?

@martinjankowiak
Collaborator

@neerajprad can you please help get the pickle files in this PR onto aws?

@martinjankowiak
Collaborator

@ae-foster looks good.

  • are you able to make docs locally? i think there's still some weird formatting
  • i still find the second-to-last section somewhat confusing. what is the message you're trying to get across?
  • re: "I can add a sentence to this effect, though it may confuse some people." i think you should explain this a bit in the text (although i guess you should keep the explanation short)

@ae-foster
Contributor Author

ae-foster commented Nov 4, 2019

Just pushed some more updates.

are you able to make docs locally? i think there's still some weird formatting

I went through the existing tutorial online and picked up a few other issues, which are now fixed.

i still find the second-to-last section somewhat confusing

I had another go at rewriting and simplifying this section. The general point is: having done the experimental design, we simulate actually running the experiment and conducting the data analysis, then compare the posterior to the prior. Nothing really intellectual here, just showing the complete experiment loop. If you think it's still confusing, it's no problem to drop this section altogether.
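
To spell out that flow with a toy stand-in (a single latent logit rather than the tutorial's actual model; the names `theta`, `y` and the numbers here are purely illustrative):

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoDiagonalNormal
from pyro.optim import Adam

def model(n_respondents):
    # Toy polling model: one latent logit for the Democrat vote share.
    theta = pyro.sample("theta", dist.Normal(0., 0.2))
    with pyro.plate("respondents", n_respondents):
        return pyro.sample("y", dist.Bernoulli(logits=theta))

pyro.clear_param_store()
n = 1000
y_sim = model(n)  # 1. simulate running the chosen poll (a prior predictive draw)

# 2. condition on the simulated responses and fit a variational posterior
conditioned = pyro.condition(model, data={"y": y_sim})
guide = AutoDiagonalNormal(conditioned)
svi = SVI(conditioned, guide, Adam({"lr": 0.01}), Trace_ELBO())
for _ in range(2000):
    svi.step(n)

# 3. compare the fitted posterior over theta to the prior (theta is unconstrained,
# so the guide's posterior is directly over theta)
posterior = guide.get_posterior()
print("prior:     Normal(0.00, 0.20)")
print("posterior: Normal({:.2f}, {:.2f})".format(
    posterior.mean.item(), posterior.variance.sqrt().item()))
```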

i think you should explain this a bit in the text

I've added explanation as comments near where I put eig=False so it's clear what I'm referring to.

Overall, I think the tutorial is looking close to ready now. Did you get the files up on AWS?

@codecov-io

codecov-io commented Nov 4, 2019

Codecov Report

No coverage uploaded for pull request base (dev@31a72da).
The diff coverage is n/a.


@@          Coverage Diff           @@
##             dev    #2041   +/-   ##
======================================
  Coverage       ?   94.47%           
======================================
  Files          ?      201           
  Lines          ?    12781           
  Branches       ?        0           
======================================
  Hits           ?    12075           
  Misses         ?      706           
  Partials       ?        0


@martinjankowiak
Collaborator

looking great. a few last nits:

  • the 'previous tutorial' link and the 'working memory tutorial' link should point to http://pyro.ai/examples/working_memory.html
  • can you call the intermediates in compute_dem_probability something other than y
  • typos: 'If we have the option of first design the polling strategy using our prior'

i'll let you know once we have the AWS pickles sorted

@neerajprad
Member

@ae-foster, @martinjankowiak - The dataset files should be in https://d2hg8soec8ck9v.cloudfront.net/datasets/us_elections/ directory on aws.

e.g. https://d2hg8soec8ck9v.cloudfront.net/datasets/us_elections/us_presidential_election_data_test.pickle

@ae-foster
Contributor Author

Thanks, I've sorted those comments from @martinjankowiak and removed the data from the repo; instead we now have

!curl -sO "https://d2hg8soec8ck9v.cloudfront.net/datasets/us_elections/electoral_college_votes.pickle"
etc.

which should download the data to the right directory at the start

@martinjankowiak
Collaborator

great, thanks @ae-foster! can you please make the data downloading pythonic? see e.g. here
once that's done we'll get this merged!

@ae-foster
Contributor Author

Now using urlopen 👍
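
Roughly like this (a sketch; the `fetch` helper and the local caching are illustrative, the URLs are the CloudFront ones above):

```python
import os
from urllib.request import urlopen

BASE_URL = "https://d2hg8soec8ck9v.cloudfront.net/datasets/us_elections/"

def fetch(name):
    # Download a pickle from the CloudFront bucket unless it is already cached locally.
    if not os.path.exists(name):
        with urlopen(BASE_URL + name) as response, open(name, "wb") as f:
            f.write(response.read())
    return name

fetch("electoral_college_votes.pickle")
fetch("us_presidential_election_data_test.pickle")
```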

Collaborator

@martinjankowiak left a comment

great, thanks adam! i'll merge this as-is, although i notice some of the numbers have shifted a bit. did something change, is this just inherent stochasticity, or do things need to be trained for longer or what?

@martinjankowiak merged commit b7288f4 into pyro-ppl:dev on Nov 6, 2019
@ae-foster
Contributor Author

It's inherently stochastic because it depends on the simulated outcome of the experiment
