Commit
Transferred URLs to .name. Slight CSS tweaks for comment counts. New posts
Mark Reid committed Jan 20, 2009
1 parent c18f97c, commit c1c7d01
Showing 8 changed files with 225 additions and 568 deletions.
Original file line number | Original file line | Diff line number | Diff line change |
---|---|---|---|
@@ -0,0 +1,78 @@ | |||
require 'rubygems' | |||
require 'rest_client' | |||
require 'json' | |||
|
|||
DISQUS_BASE = 'http://disqus.com/api/' | |||
DISQUS = RestClient::Resource.new DISQUS_BASE | |||
|
|||
SOURCE_URL = 'http://mark.reid.dev/iem/' | |||
TARGET_URL = 'http://mark.reid.name/iem/' | |||
|
|||
THREADS = { | |||
10211725 => 'http://mark.reid.name/iem/behold-jensens-inequality.html', | |||
10211748 => 'http://mark.reid.name/iem/feed-bag-a-simple-rss-archiver.html', | |||
10211737 => 'http://mark.reid.name/iem/visualising-reading.html', | |||
10211738 => 'http://mark.reid.name/iem/snuck-flied-and-wedded.html', | |||
10211739 => 'http://mark.reid.name/iem/super-crunchers.html', | |||
10211728 => 'http://mark.reid.name/iem/colt-2008-highlights.html', | |||
10211784 => 'http://mark.reid.name/iem/staying-organised-with-citeulike-and-bibdesk.html', | |||
10211740 => 'http://mark.reid.name/iem/constructive-and-classical-mathematics.html', | |||
10211730 => 'http://mark.reid.name/iem/the-earth-is-round.html', | |||
10211753 => 'http://mark.reid.name/iem/information-divergence-and-risk.html', | |||
10211742 => 'http://mark.reid.name/iem/ml-and-stats-people-on-twitter.html', | |||
10211720 => 'http://mark.reid.name/iem/a-meta-index-of-data-sets.html', | |||
10211710 => 'http://mark.reid.name/iem/introducing-inductio-ex-machina.html', | |||
10211755 => 'http://mark.reid.name/iem/artificial-ai.html', | |||
10211733 => 'http://mark.reid.name/iem/machine-learning-summer-school-2009.html', | |||
10211711 => 'http://mark.reid.name/iem/clarity-and-mathematics.html', | |||
10211713 => 'http://mark.reid.name/iem/a-cute-convexity-result.html', | |||
} | |||
|
|||
# Gets the first forum key associated with USER_KEY | |||
def forum_key | |||
forum_list = get('get_forum_list', :user_api_key => USER_KEY) | |||
forum_id = forum_list[0]['id'] | |||
get('get_forum_api_key', :user_api_key => USER_KEY, :forum_id => forum_id) | |||
end | |||
|
|||
# Encapsulates the request, JSON parsing and error checking for a REST call to Disqus | |||
def get(command, args) | |||
path = command + '?' + args.map {|k,v| "#{k}=#{v}"}.join('&') | |||
response = JSON.parse( DISQUS[path].get ) | |||
raise "Bad response to #{path}" unless response['succeeded'] | |||
response['message'] | |||
end | |||
|
|||
# Gets the list of threads for the forum identified by FORUM_KEY | |||
def threads | |||
get('get_thread_list', :forum_api_key => FORUM_KEY) | |||
end | |||
|
|||
# Set the URL of the Disqus thread to the given value | |||
def update(thread_id, url) | |||
data = { | |||
:forum_api_key => FORUM_KEY, | |||
:thread_id => thread_id, | |||
:url => url | |||
} | |||
|
|||
puts "Updating thread #{thread_id} with URL = #{url}" | |||
response = JSON.parse( DISQUS['update_thread'].post(data) ) | |||
end | |||
|
|||
USER_KEY = ENV['DISQUS_KEY'] | |||
FORUM_KEY = forum_key | |||
|
|||
# Set the new URLs | |||
# threads.each do |t| | |||
# url = THREADS[t['id'].to_i] | |||
# next if url.nil? | |||
# update(t['id'], url) | |||
# puts "Set thread #{t['id']} to #{url}" | |||
# end | |||
|
|||
# Check everything worked | |||
threads.each do |t| | |||
url = THREADS[t['id'].to_i] | |||
next if url.nil? | |||
puts "Thread #{t['id']} has #{url}" | |||
end |
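The `get` helper in the script above interpolates argument values into the query string without escaping them, so a key containing a space or `&` would produce a malformed request. A minimal sketch of one way to guard against that, assuming a Ruby whose standard `uri` library provides `URI.encode_www_form`; the helper name `build_path` and the sample key are hypothetical and not part of the original script:

```ruby
require 'uri'

# Hypothetical helper (not in the original script): builds "command?k=v&..."
# with every key and value form-encoded.
def build_path(command, args)
  # Convert keys and values to strings and sort by key so the output is
  # deterministic regardless of hash ordering.
  pairs = args.map { |k, v| [k.to_s, v.to_s] }.sort
  "#{command}?#{URI.encode_www_form(pairs)}"
end

# Made-up key containing a space and an ampersand, for illustration only.
puts build_path('get_thread_list', :forum_api_key => 'abc 123&x')
```

If this approach were adopted, the first line of `get` could call such a helper instead of joining the pairs by hand; the rest of the method would stay the same.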
37 changes: 37 additions & 0 deletions
iem/_posts/2009-01-06-information-divergence-and-risk.markdown
Original file line number | Original file line | Diff line number | Diff line change |
---|---|---|---|
@@ -0,0 +1,37 @@ | |||
--- | |||
layout: post | |||
|
|||
title: Information, Divergence and Risk for Binary Experiments | |||
excerpt: A summary of a recent paper Bob and I posted to arXiv. | |||
location: Canberra, Australia | |||
|
|||
wordpress_url: http://conflate.net/inductio/?p=175 | |||
wordpress_id: 175 | |||
--- | |||
[Bob Williamson][bob] and I have finished a [report][] outlining what we have been looking at for the last year or so and uploaded it to the arXiv. Weighing in at 89 pages, it covers a lot of ground in an attempt to unify a number of different classes of measures for problems that can be expressed as binary experiments, that is, problems where instances are drawn from two distributions. This includes binary classification, class probability estimation, and hypothesis testing. | |||
|
|||
We show that many of the usual measures of difficulty for these problems (divergence, information and Bayes risk) are very closely related. We also look at ways in which members of each class of measure can be expressed in terms of "primitive" members of those classes. In particular, Fisher-consistent losses (also known as proper scoring rules) can be written as weighted sums of cost-sensitive losses, while all f-divergences can be written as weighted sums of something akin to cost-sensitive variational divergence. These "Choquet representations" make it easy to derive Pinsker-like bounds for arbitrary f-divergences (not just KL divergence) as well as results similar to those of Bartlett et al. in their "[Convexity, classification and Risk Bounds][bartlett]". | |||
|
|||
It should be made clear that many of these results are not new. However, what I like about our approach is that almost all of the results in the paper stem from two observations about convex functions: they are invariant under the Legendre-Fenchel bidual, and they have a second-order integral Taylor expansion with non-negative weights. | |||
|
|||
If any of this sounds interesting, you should grab the full paper from the [arXiv][report]. Here's the abstract: | |||
|
|||
> We unify f-divergences, Bregman divergences, surrogate loss bounds (regret bounds), | |||
> proper scoring rules, matching losses, cost curves, ROC-curves and information. We | |||
> do this by systematically studying integral and variational representations of these | |||
> objects and in so doing identify their primitives which all are related to cost-sensitive | |||
> binary classification. As well as clarifying relationships between generative and | |||
> discriminative views of learning, the new machinery leads to tight and more general | |||
> surrogate loss bounds and generalised Pinsker inequalities relating f-divergences to | |||
> variational divergence. The new viewpoint illuminates existing algorithms: it provides a | |||
> new derivation of Support Vector Machines in terms of divergences and relates | |||
> Maximum Mean Discrepancy to Fisher Linear Discriminants. It also suggests new | |||
> techniques for estimating f-divergences. | |||
Now that we have a good understanding of binary experiments, the aim is to build on these results and extend this type of work to other forms of machine learning problems. High on the list are multi-category classification, ranking and regression problems. | |||
|
|||
Questions, criticism, suggestions and pointers to related work we may have missed are all welcome. | |||
|
|||
[bartlett]: http://www.citeulike.org/user/mdreid/article/510440 | |||
[report]: http://arxiv.org/abs/0901.0356 | |||
[bob]: http://axiom.anu.edu.au/~williams/ |
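The two convexity facts that the post above credits for most of its results can be written out explicitly. The following is a sketch in standard textbook notation, which is not necessarily the notation used in the report, and it assumes $f$ is a closed convex function that is twice differentiable on an interval containing $t_0$ and $t$:

```latex
% Legendre-Fenchel conjugate, and invariance of a closed convex f under the bidual
f^{*}(s) = \sup_{t} \{\, s t - f(t) \,\}, \qquad
f^{**}(t) = \sup_{s} \{\, s t - f^{*}(s) \,\} = f(t).

% Second-order Taylor expansion about t_0 with integral remainder
f(t) = f(t_0) + f'(t_0)\,(t - t_0) + \int_{t_0}^{t} (t - s)\, f''(s)\, ds.
```

Convexity gives $f''(s) \ge 0$, so the remainder is a non-negatively weighted integral of simple piecewise-linear terms. For $t < t_0$ the integral can be rewritten as $\int_{t}^{t_0} (s - t)\, f''(s)\, ds$, which is also non-negative.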
78 changes: 78 additions & 0 deletions
iem/_posts/2009-01-16-ml-and-stats-people-on-twitter.markdown
Original file line number | Original file line | Diff line number | Diff line change |
---|---|---|---|
@@ -0,0 +1,78 @@ | |||
--- | |||
layout: post | |||
|
|||
title: ML and Stats People on Twitter | |||
excerpt: Wherein I compile a list of interesting people who use Twitter to discuss machine learning and statistics. | |||
location: Canberra, Australia | |||
|
|||
wordpress_url: http://conflate.net/inductio/?p=171 | |||
wordpress_id: 171 | |||
--- | |||
I started using the social, "micro-blogging" service [Twitter][] in February this year simply because I had been seeing so much commentary about it — both good and bad. Since then, I've posted [800+ updates][me], amassed over 100 [followers][] and [follow][] nearly that many myself. | |||
|
|||
[twitter]: http://twitter.com/ | |||
[me]: http://twitter.com/mdreid/ | |||
[follow]: http://twitter.com/mdreid/friends | |||
[followers]: http://twitter.com/mdreid/followers | |||
|
|||
What has surprised me about Twitter is how many people I have found on there who are active, or at least interested, in machine learning and statistics. The day-to-day discussions, questions, advice and pointers I've got via Twitter have been illuminating and fun. | |||
|
|||
In an effort to get to know some of these people a bit better I followed the links they provided in their respective profiles to see what they had to say about themselves. The descriptions below are based only on those links as I don't find Google-stalking very friendly. | |||
|
|||
So, in no particular order, here they are: | |||
|
|||
Students | |||
---------- | |||
* [Tim Danford](http://twitter.com/arthegall) | |||
A computer science [Ph.D. student at MIT](http://people.csail.mit.edu/tdanford/) | |||
|
|||
* [Mark James Adams](http://twitter.com/mja) | |||
"[I am a student of quantitative genetics and a temperamental psychologist](http://affinity.raysend.com/record/about/author)" | |||
|
|||
* [Dave Warde-Farley](http://twitter.com/dwf) | |||
[Computer science Masters student at Toronto](http://www.cs.toronto.edu/~dwf/) working in machine learning | |||
|
|||
* [Amir massoud Farahmand](http://twitter.com/SoloGen) | |||
Ph.D. student looking at manifold learning (amongst other things) at the [University of Alberta](http://www.cs.ualberta.ca/~amir/). Runs the blog [thesilog](http://thesilog.sologen.net/). | |||
|
|||
* [Markus Weimer](http://twitter.com/markusweimer) | |||
Graduate student working on "[applications of machine learning to eLearning](http://weimo.de/about)". Also runs a [blog](http://weimo.de/). | |||
|
|||
* [Ryan Rosario](http://twitter.com/DataJunkie) | |||
Statistics and computer science graduate student. | |||
|
|||
* [A.M. Santos](http://twitter.com/ansate) | |||
Maths and statistics graduate student. | |||
|
|||
Non-students | |||
--------------- | |||
* [Neal Richter](http://twitter.com/nealrichter) | |||
Runs the blog [aicoder](http://aicoder.blogspot.com/). | |||
|
|||
* [Brendan O'Connor](http://twitter.com/brendan642) | |||
[Research assistant](http://anyall.org/) in NLP at Stanford and consultant at [Dolores Labs](http://blog.doloreslabs.com/) | |||
|
|||
* [Daniel Tunkelang](http://twitter.com/dtunkelang) | |||
Chief scientist at the information retrieval company Endeca and owner of the blog [The Noisy Channel](http://thenoisychannel.com/) | |||
|
|||
* [Jason Adams](http://twitter.com/ealdent) | |||
Computational linguist working on sentiment analysis. Runs the blog [The Mendicant Bug](http://mendicantbug.com/). | |||
|
|||
* [Mikio Braun](http://twitter.com/mikiobraun) | |||
Post-doc at Technische Universität Berlin and a machine learning blogger at [Marginally Interesting](http://mikiobraun.blogspot.com/). | |||
|
|||
* [Daniel Lemire](http://twitter.com/lemire) | |||
Professor of computer science at the University of Quebec at Montreal and [blogger](http://www.daniel-lemire.com/blog/). | |||
|
|||
* [Jason H. Moore](http://twitter.com/moorejh) | |||
Professor of Genetics, Director of Bioinformatics at Dartmouth Medical School. Works on the [Multi-factor Dimensionality Reduction](http://sourceforge.net/projects/mdr/) software MDR and blogs at [Epistasis](http://compgen.blogspot.com/). | |||
|
|||
* [Pete Skomoroch](http://twitter.com/peteskomoroch) | |||
Director of analytics at Juice Analytics and [Data Wrangling](http://www.datawrangling.com/) blogger. | |||
|
|||
* [Alex Smola](http://twitter.com/smolix) | |||
Principal Researcher at Yahoo! Research and ex-colleague of mine at [NICTA](http://nicta.com.au) and the [ANU](http://anu.edu.au), a.k.a. "Mr. Kernel". | |||
|
|||
If you are not on this list but think you should be, leave a comment below and I'll update this list. Conversely, if I've put you on this list and you don't wish to be associated with these sorts of people, leave a comment or send me an email and I'll remove you. | |||
|
|||
Of course, feel free to follow [me][] if you'd like to keep up with what I'm doing. |