
add last two days
irenetrampoline committed Jun 11, 2018
1 parent d3b3940 commit c7da156
Showing 6 changed files with 55 additions and 1 deletion.
4 changes: 4 additions & 0 deletions README.md
@@ -2,6 +2,10 @@
My goal is to read an academic paper every day. Here I keep myself accountable.

## Papers
**Jun 11, 2018:** [Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected Value of Perfect Information.](writeups/RaoDau18.md) S. Rao, H. Daume. 2018. [[pdf]](https://arxiv.org/pdf/1805.04655.pdf)

**Jun 10, 2018:** [An Alternative View: When Does SGD Escape Local Minima?](writeups/KleLiYua18.md) R. Kleinberg, Y. Li, Y. Yuan. 2018. [[pdf]](https://arxiv.org/pdf/1802.06175.pdf)

**Jun 05, 2018:** [Do CIFAR-10 Classifiers Generalize to CIFAR-10?](writeups/RecEtAl18.md) B. Recht, R. Roelofs, L. Schmidt, V. Shankar. 2018. [[pdf]](https://arxiv.org/pdf/1806.00451.pdf)

**Jun 03, 2018:** [Large-scale Analysis of Counseling Conversations: An Application of Natural Language Processing to Mental Health.](writeups/AltClaLes16.md) T. Althoff, K. Clark, J. Leskovec. 2016. [[pdf]](http://www.aclweb.org/anthology/Q16-1033)
Binary file added pdfs/KleLiYua18.pdf.pdf
Binary file not shown.
Binary file added pdfs/RaoDau18.pdf.pdf
Binary file not shown.
3 changes: 2 additions & 1 deletion publish.py
@@ -34,7 +34,8 @@ def main():

        year = all_nums[-1]
    except:
-       raise ValueError('Year must end with e.g. "2014."')
+       pdb.set_trace()
+       raise ValueError('Year must end with digits, e.g. "2014."')

if authors_N < 4:
md_title = []
17 changes: 17 additions & 0 deletions writeups/KleLiYua18.md
@@ -0,0 +1,17 @@
# An Alternative View: When Does SGD Escape Local Minima?

Robert Kleinberg, Yuanzhi Li, Yang Yuan. [An Alternative View: When Does SGD Escape Local Minima?](https://arxiv.org/pdf/1802.06175.pdf) ICML 2018.

## tl;dr
- Tackling the question "Why does deep learning work?", the paper asks whether stochastic gradient descent (SGD) escapes local minima: yes for convex functions, and usually yes for non-convex ones.
- "Usually yes" means the gradient, averaged over a neighborhood of the current point, must be one-point convex with respect to the desired minimum x-star.
- Empirically, the authors show that neural network loss surfaces exhibit one-point convexity locally.

## One-point convexity
Informally, a function f is c-one-point convex with respect to a fixed point x-star if, for the update y = x - n\*grad(f(x)) with step size n and noise W(x), the inner product between the negative expected gradient over a neighborhood of y and the direction x-star - y is at least c times the squared 2-norm of x-star - y. When this holds, y converges to x-star with decent probability.
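
To make the condition concrete, here is a tiny numerical check on a toy function (my own example, not from the paper; the function, neighborhood radius, and c are arbitrary choices):

```python
import numpy as np

def grad_f(x):
    # Toy objective f(x) = x**2 + 0.05*sin(40*x): a parabola with spiky noise.
    return 2 * x + 2 * np.cos(40 * x)

def smoothed_grad(y, radius=0.1, n_samples=2000, rng=None):
    # SGD noise effectively averages the gradient over a neighborhood of y.
    rng = np.random.default_rng(0) if rng is None else rng
    return grad_f(y + rng.uniform(-radius, radius, size=n_samples)).mean()

def one_point_convex_at(y, x_star=0.0, c=1.0):
    # Check: <-E[grad f near y], x_star - y>  >=  c * ||x_star - y||^2
    g = smoothed_grad(y)
    return -g * (x_star - y) >= c * (x_star - y) ** 2

print(one_point_convex_at(0.8))  # True: the smoothed gradient points toward x_star = 0
```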

## Motivating example
The authors use the motivating example of a parabola with added spiky noise. Since it is conjectured that flat local minima lead to better generalization, the spiky noise represents the sharpest, most extreme local minima for SGD to overcome (a toy illustration follows below).
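
Here is a toy version of that picture (my own construction, not the authors' experiment): on the same spiky parabola as above, plain gradient descent gets trapped in a shallow local minimum, while SGD's noise averages the spikes away and settles around the broad minimum at 0.

```python
import numpy as np

def grad_f(x):
    # f(x) = x**2 + 0.05*sin(40*x)  =>  f'(x) = 2*x + 2*cos(40*x)
    return 2 * x + 2 * np.cos(40 * x)

rng = np.random.default_rng(0)
lr, steps = 0.01, 5000
x_gd = x_sgd = 1.0
sgd_trace = []
for _ in range(steps):
    x_gd -= lr * grad_f(x_gd)                          # deterministic gradient descent
    x_sgd -= lr * (grad_f(x_sgd) + rng.normal(0, 5))   # noisy, SGD-like update
    sgd_trace.append(x_sgd)

print(f"GD ends near x = {x_gd:.2f}")                             # trapped at a spiky local minimum
print(f"SGD averages near x = {np.mean(sgd_trace[-1000:]):.2f}")   # hovers around 0
```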

## Main theorem
The main theorem shows that the iterates y converge to the fixed point x-star and, once they arrive, stay there. The main assumption is L-smoothness.
32 changes: 32 additions & 0 deletions writeups/RaoDau18.md
@@ -0,0 +1,32 @@
# Learning to Ask Good Questions

Sudha Rao, Hal Daume. [Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected Value of Perfect Information.](https://arxiv.org/pdf/1805.04655.pdf) ACL 2018.

## tl;dr
- As the title suggests, the authors rank candidate clarification questions for an original post so that the asker can get more useful help.
- Since I did an NLP final project similar to this project, I am particularly intrigued.
- The neural network architecture is novel, and the real challenge seems to be dataset generation.

## Test Time Pipeline
1. Given a post p, retrieve 10 similar posts in the training set using Lucene
2. The questions q_i asked on those 10 similar posts p_i form the candidate question set Q; the edits made to each post in response to its question form the candidate answer set A.
3. For each candidate clarification question q_i, we generate an answer representation F(p, q_i) and calculate how close each answer candidate a_j is to F(p, q_i).
4. We then calculate the utility gain to post p if it were updated with answer a_j.
5. Finally, we rank the candidate questions by their expected utility (a runnable schematic of the whole loop follows below).
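
A self-contained schematic of these five steps, with toy stand-ins (word-overlap similarity instead of Lucene, and placeholder answer/utility models) in place of the paper's trained components:

```python
def similarity(a, b):
    # Toy stand-in for Lucene / learned similarity: word overlap (Jaccard).
    a, b = set(a.split()), set(b.split())
    return len(a & b) / max(len(a | b), 1)

def rank_questions(post, train_posts, k=10):
    neighbors = sorted(train_posts, key=lambda t: similarity(post, t["post"]),
                       reverse=True)[:k]                                 # step 1
    candidates = [(t["question"], t["answer_edit"]) for t in neighbors]  # step 2
    scored = []
    for q_i, _ in candidates:
        f_pq = post + " " + q_i                                # step 3: stand-in for F(p, q_i)
        evpi = sum(similarity(f_pq, a_j)                       # step 4: stand-in for P(a_j | p, q_i)
                   * min(len(a_j.split()) / 20.0, 1.0)         # step 4: stand-in for utility U(p + a_j)
                   for _, a_j in candidates)
        scored.append((evpi, q_i))
    return [q for _, q in sorted(scored, reverse=True)]        # step 5: rank by expected utility

posts = [{"post": "ubuntu fails to boot after update",
          "question": "which ubuntu version are you running?",
          "answer_edit": "I am running ubuntu 16.04 with the 4.4 kernel"},
         {"post": "wifi drops every few minutes",
          "question": "which wireless driver do you use?",
          "answer_edit": "the driver is ath9k on kernel 4.4"}]
print(rank_questions("my ubuntu laptop fails to boot", posts))
```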

## Loss functions
From the pipeline, we see that a lot of steps involve quantifying "better" questions, answers, and utility functions. Without getting too mathematical (mostly because I haven't figured out how to implement MathJax on my Github pages), we describe the functions.

1. Lucene uses a variant of TF-IDF to find related documents
2. No math, just aggregation.
3. The answer representation F(p, q_i) comes from a neural network. Closeness of an answer candidate is calculated with the cosine similarity between F(p, q_i) and the average word vector of answer a_j.
4. Expected utility depends on the probability of answer candidate a_j being the answer to question q_i and the utility value of adding that information. We model the probability P(a_j | p, q_i) as a negative exponential of the distance between F(p, q_i) and a-hat_j, the average word vector of a_j. The utility function is defined as the sigmoid of F_util, where F_util is also a neural network.
5. Finally, we sort the candidate questions by expected utility (the scoring functions are sketched below).
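
A small sketch of these scoring functions as I read them (my own code, not the authors'; `f_util_score` stands in for the output of the trained F_util network):

```python
import numpy as np

def avg_word_vector(tokens, embeddings):
    # a-hat_j: average of the word vectors of the answer's tokens (assumes all tokens are in vocab).
    return np.mean([embeddings[t] for t in tokens], axis=0)

def cosine(u, v):
    # Step 3: closeness of a candidate answer a-hat_j to F(p, q_i).
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def prob_answer(f_pq, a_hat_j):
    # Step 4: P(a_j | p, q_i) proportional to exp(-distance(F(p, q_i), a-hat_j)).
    return float(np.exp(-np.linalg.norm(f_pq - a_hat_j)))

def utility(f_util_score):
    # Step 4: U(p + a_j) = sigmoid(F_util(p, a_j)); F_util itself is a trained network.
    return 1.0 / (1.0 + np.exp(-f_util_score))

def expected_utility(f_pq, answer_vecs, f_util_scores):
    # Step 5: EVPI of asking q_i = sum_j P(a_j | p, q_i) * U(p + a_j).
    return sum(prob_answer(f_pq, a) * utility(s)
               for a, s in zip(answer_vecs, f_util_scores))
```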

The neural networks mentioned above form one joint model, built from a question LSTM and an answer LSTM, trained with a loss function based on the existing (post, question, answer) triples.
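
Roughly the kind of shape I imagine for that joint model (a PyTorch-flavored sketch of my own; the layer sizes, the pooling of final hidden states, and the two heads are guesses, not the authors' architecture):

```python
import torch
import torch.nn as nn

class JointQuestionAnswerModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=200, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.question_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)  # encodes post + question
        self.answer_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)    # encodes answer
        self.answer_head = nn.Linear(hidden, emb_dim)     # F(p, q_i): predicted answer representation
        self.util_head = nn.Linear(2 * hidden, 1)         # F_util score, squashed by a sigmoid

    def forward(self, post_question_ids, answer_ids):
        _, (pq_h, _) = self.question_lstm(self.embed(post_question_ids))
        _, (a_h, _) = self.answer_lstm(self.embed(answer_ids))
        f_pq = self.answer_head(pq_h[-1])
        util = torch.sigmoid(self.util_head(torch.cat([pq_h[-1], a_h[-1]], dim=-1)))
        return f_pq, util
```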

## Evaluation
Because of the tricky nature of the experimental setup, expert annotators are vital. Evaluations are conducted with expert annotations, against the original question, and excluding the original question.

## Next steps
For my personal research, it might be useful to think about health question answering and how to automate a health knowledge graph. Isn't this classification model another type of HKG?
