unclear reference to lambda baseline #2
In https://github.com/kbenoit/sophistication/blob/master/R/predict.R#L138, we refer to `reference`, but this was from older code before we changed the arguments to `reference_top` and `reference_bottom`. @ArthurSpirling @kmunger can you recall which one this is supposed to be?
|
Comments
Let me take a look tomorrow.
|
Actually, wasn't this user defined? That is, it was up to the user to decide what they wanted to compare a given lambda to; for example, one could specify an interest in comparing a particular snippet to, say, one by Eisenhower (assuming that was already in the data) as a reference.

If I have that wrong, then I'm pretty certain it was intended to default to the fifth grade text, which is the hardcoded -2.17... figure.

Can you clarify what reference_bottom is? (I assume it's the hardest snippet in the data, or something.)

AS
|
That's right: the "reference" call is from older code. The current code has the hardcoded top and bottom values that were derived by simply sorting the extreme lambdas on the SOTU corpus. When I did this, I left the older code in there and just added an extra column with the new, hardcoded approach.

The solution is just to get rid of the old code, which I can do easily. But the longer-term question is whether we should allow this to be user defined: should we use the SOTU values as defaults and allow users to specify if they want to change them?
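A minimal sketch of that derivation (`lambda_sotu` here is a hypothetical stand-in for the snippet-level estimates, and the thread doesn't pin down the sign convention, so which extreme counts as "top" is an assumption):

```r
# Hypothetical vector of lambda estimates for SOTU snippets; the real
# values come from the fitted model, not from this stand-in.
lambda_sotu <- c(-2.3, -0.8, 0.1, 1.7, 2.9)

# The hardcoded endpoints are simply the extremes after sorting,
# assuming the larger lambda corresponds to the "top" endpoint:
reference_top    <- max(lambda_sotu)
reference_bottom <- min(lambda_sotu)
```
|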
Yes, I think that's what we want: default to the present values, but allow users to specify something other than defaults should they want.
|
OK, I'm on that. And I've realized what the problem is: we've confused the "reference" (used to compute the probability scores) with the endpoints for rescaling. These don't necessarily have to come from the same source, but all three do need to be input as defaults (or user provided).

I'm currently rewriting the documentation to reflect what we're doing. The default value for "reference" is the lambda across the fifth grade texts; our "prob" output thus calculates the probability that a text is easier than these. The default values for "reference_top" and "reference_bottom" come from the extremes of the SOTU corpus, and are used to rescale texts on the 0-100 scale.

Are these the defaults we want?
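To make the two roles concrete, here's a minimal sketch of how the three arguments might enter the computation (the function names are illustrative, and the logistic form for the probability assumes a Bradley-Terry-style comparison on the lambdas, which may differ from the package's exact implementation):

```r
# Probability that a text is easier than the reference baseline,
# assuming a Bradley-Terry-style pairwise comparison on the lambdas:
prob_easier <- function(lambda, reference) {
  plogis(lambda - reference)
}

# Rescale a lambda onto 0-100: reference_top maps to 100 (easiest),
# reference_bottom to 0 (hardest).
rescale_0_100 <- function(lambda, reference_top, reference_bottom) {
  100 * (lambda - reference_bottom) / (reference_top - reference_bottom)
}
```
|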
Don't we use the fifth grade texts as 100? That's what the paper implies, no?
|
Reading back over the documentation, yes, that seems to be the case, and I just checked the numbers, which do match up.

So, are *these* the defaults we want: the baseline probability comparison is the same as the 100 on the scaled version?
|
That's what makes sense to me, yes: 100 is the 5th grade text, 0 is the hardest SOTU text (which is at college level, by FRE standards). Those being the default ends of the 0-100 scale makes sense, with the fifth grade texts being the default comparison for the probability calculations.
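Wiring those agreed defaults into the sketch functions above (the numbers are placeholders: -2.17 is the thread's truncated fifth-grade figure, and the hardest-SOTU value is entirely hypothetical):

```r
# Placeholder default lambdas; the real hardcoded values live in the package.
lambda_fifthgrade   <- -2.17  # truncated in the thread; full value elided
lambda_sotu_hardest <- -4.5   # hypothetical hardest-SOTU endpoint

# A text with lambda = -3.0, compared against the defaults:
prob_easier(lambda = -3.0, reference = lambda_fifthgrade)
rescale_0_100(lambda = -3.0,
              reference_top    = lambda_fifthgrade,
              reference_bottom = lambda_sotu_hardest)
```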
|
Ok, made these changes. |
Thanks, I think that corrected it. @kmunger: with e7ac504, the package now passes the CRAN check, except for the too-large data objects. Note that I removed the article_manuscript and manuscript_chapter folders, since these should only be in the *sophistication-papers* repository.
|
Very good. So this will now appear on CRAN as a package?

best
AS
|
No, we would need to submit it, but first cut out the large data objects. There is a 5 MB size limit on CRAN packages and we are way over that (26.1 MB). Most of those objects were for replicating our analysis, however, and they could be removed from the package.

There are also some documentation and robustness (testing!) issues that need to be addressed before it's released as a general tool. I've spoken to @kmunger about this and am happy to guide work in this area.
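For anyone picking this up, a quick way to see which data objects are driving the size, plus a compression pass that sometimes helps before deleting anything (a sketch; run from the package root, standard data/ layout assumed):

```r
# List the package's .rda files by size, largest first:
rda_files <- list.files("data", pattern = "\\.rda$", full.names = TRUE)
sort(setNames(file.size(rda_files), basename(rda_files)), decreasing = TRUE)

# Re-save with xz compression; this alone sometimes shaves a lot off:
tools::resaveRdaFiles("data", compress = "xz")
```
|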
Thanks for the clarification - that makes sense.
|
Indeed, I'm happy to start working on this, and @ken any guidance would be appreciated. I'll go ahead and start removing the large data objects, to get it down to size.
|
Best would be to create the replication materials needed for our chapter and paper, removing the larger objects from the package as needed, but using the package functions to get the results. Each time you make a data object local, you can remove it from the package.
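A sketch of that workflow (the object and file names are hypothetical, just to illustrate the move-then-remove step):

```r
# Load a large object currently shipped in the package's data/ dir:
load("data/some_large_object.rda")   # hypothetical file and object name

# Save a local copy into the replication materials...
dir.create("replication", showWarnings = FALSE)
save(some_large_object, file = "replication/some_large_object.rda")

# ...then drop it from the package so the tarball shrinks:
file.remove("data/some_large_object.rda")
```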