This repository has been archived by the owner on Jun 30, 2018. It is now read-only.

Bulk import with Bonsai.io/Heroku and Tire #327

Closed
toddwschneider opened this issue Apr 29, 2012 · 15 comments

Comments

@toddwschneider

I have set up Bonsai on my Heroku app following this gist: https://gist.github.com/2041121, and everything seems to work except for the bulk import of existing posts. Any new post gets put into the ES index automatically, but when I try to import old posts, either directly from the Heroku console, e.g.:

posts = Post.limit(100)
Tire.index BONSAI_INDEX_NAME do
  import posts
end

or by running heroku run rake environment tire:import CLASS=Post FORCE=true, I get this error:

[ERROR] Too many exceptions occured, giving up. The HTTP response was: 401 > {"error": "Not authorized: Some endpoints are admin-only, ask support@onemorecloud.com."}

One strange feature is that some of my old posts get imported, but not all, and I can't see any reason why some fail while others succeed. Has anyone else had any luck doing a bulk import into Bonsai? Many thanks for any help!

cc @nz

@danthompson

I too have this issue, and request some direction.

@nz
Contributor

nz commented Apr 30, 2012

What Tire needs (and as @karmi and I have discussed) is a way to specify a global default index. When present, a model should use that as its base, otherwise models should create their own indices and use the cluster as their base.

From the user's end, this behavior should be invoked with something like this:

Tire.configure do
  index_url '…'
end

From there, it's a matter of tracing through the method calls that either find/create a per-model index, or talk to the cluster directly, and have them check for the presence of a globally defined index.
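
The fallback described above can be sketched roughly as follows. This is only an illustration, not Tire code: `index_for`, its arguments, and the naive pluralization are assumptions made for the example, and the `index_url` setting shown above is a proposal rather than an existing option.

```ruby
# Hypothetical sketch of the proposed lookup; none of these names are Tire's API.
# If a global default index is configured, every model uses it as its base.
# Otherwise each model falls back to an index of its own.
def index_for(model_name, default_index = nil)
  # Naive pluralization, assumed for illustration only.
  default_index || "#{model_name.downcase}s"
end

puts index_for('Post')                      # no global default configured
puts index_for('Post', 'my_account_index')  # global default configured
```

With a global default in place, models would be told apart by document type inside that one index rather than by index name.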

Unfortunately, neither @karmi nor I have had the time to really dig into this yet. I'm guessing a contribution would be welcome? I'd certainly be happy to help test or review any pull requests, just cc me.

@karmi
Owner

karmi commented Apr 30, 2012

More specifically, Tire needs to separate models by type by default, and use a common index for them, let's say "application wide". Of course, users must be able to specify separate indices for separate models when needed (e.g. when one index would hold millions of documents and the other mere thousands).

I'll definitely have a look into this, but probably not during May.

@nz, I think there are two questions related to Bonsai.io at the moment:

  1. Why some data were imported, but not all?
  2. How can users work with Bonsai.io and Tire currently? After all, setting MyModel.index_name to a specific, user-based value should work?

@nz
Contributor

nz commented Apr 30, 2012

> Why some data were imported, but not all?

That I have no answer to, sorry @toddwschneider. I promise all the requests to /_bulk failed with a 401 — so maybe some one-off single-model indexing calls slipped through?

> How can users work with Bonsai.io and Tire currently? After all, setting MyModel.index_name to a specific, user-based value should work?

It looks like Index#bulk_store ignores the index name (index.rb L123). Pull request incoming.

@karmi
Owner

karmi commented May 1, 2012

So, the issue here is that Index#bulk_store connects to /_bulk, not to <MY INDEX>/_bulk, correct? Therefore, bulk store / importing with current Tire and Bonsai.io doesn't work.
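
To illustrate the difference between the two endpoints, here is a minimal sketch. `bulk_url` is a hypothetical helper, not Tire's actual implementation, and the host and index name are placeholders.

```ruby
# Hypothetical helper for illustration only; not taken from Tire.
# Builds the bulk endpoint, scoped to an index when one is given.
def bulk_url(base_url, index_name = nil)
  [base_url.chomp('/'), index_name, '_bulk'].compact.join('/')
end

puts bulk_url('http://localhost:9200')              # cluster-level: .../_bulk
puts bulk_url('http://localhost:9200', 'my_index')  # index-level:   .../my_index/_bulk
```

A service that only exposes a single index per account would be expected to reject the first form and accept the second, which matches the 401 reported above.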

And, basically, setting MyModel.index_name is the way to make Tire work with Bonsai.io in general?

(I do like the implementation of Index#url in #327, it's much cleaner, will review it. I wouldn't mix it with such a big overall code change, but I can split & edit those commits myself.)

@karmi karmi closed this as completed in e3fbd2c May 3, 2012
@karmi
Owner

karmi commented May 3, 2012

@toddwschneider Hi, should be resolved on current master and the 0.4.1 version.

@karmi
Owner

karmi commented May 6, 2012

@toddwschneider Could you confirm the current master and release work with Bonsai?

@caged

caged commented May 6, 2012

I'm running 0.4.2 and haven't been able to get this to work yet, although it's complaining about a missing index this time.

  Tire.configure do
    url BONSAI_URL
    logger STDERR
  end

  articles = [
    { :id => '1', :type => 'article', :title => 'one',   :tags => ['ruby'],           :published_on => '2011-01-01' },
    { :id => '2', :type => 'article', :title => 'two',   :tags => ['ruby', 'python'], :published_on => '2011-01-02' },
    { :id => '3', :type => 'article', :title => 'three', :tags => ['java'],           :published_on => '2011-01-02' },
    { :id => '4', :type => 'article', :title => 'four',  :tags => ['ruby', 'php'],    :published_on => '2011-01-03' }
  ]

  Tire.index 'articles' do
    delete
    create

    import articles
  end
#
curl -X POST BONSAI_URL/articles -d '{}'

# 2012-05-06 09:40:55:975 [201]

[ERROR] 404 > {"error":"IndexMissingException[[articles] missing]","status":404}, retrying (1)...
[ERROR] 404 > {"error":"IndexMissingException[[articles] missing]","status":404}, retrying (2)...
[ERROR] 404 > {"error":"IndexMissingException[[articles] missing]","status":404}, retrying (3)...
[ERROR] 404 > {"error":"IndexMissingException[[articles] missing]","status":404}, retrying (4)...
[ERROR] 404 > {"error":"IndexMissingException[[articles] missing]","status":404}, retrying (5)...
[ERROR] Too many exceptions occured, giving up. The HTTP response was: 404 > {"error":"IndexMissingException[[articles] missing]","status":404}
# 2012-05-06 09:40:57:727 [BULK] ("articles")
#
curl -X POST BONSAI_URL/articles/_bulk -d '{... data omitted ...}'

# 2012-05-06 09:40:57:727 [201]

@nz
Contributor

nz commented May 6, 2012

I'll try to get some more thorough manual testing in this week.

Justin (@caged), can you confirm that the full BONSAI_URL, including the index name, is in that curl command?

Maybe the bulk import method is setting _index incorrectly? Should be easy to verify.

Nick Zadrozny


@caged

caged commented May 6, 2012

Yeah, it has Heroku's generated index.


@toddwschneider
Author

@karmi I just gave it a try with 0.4.2 and it works! Thanks for the help!

@karmi
Owner

karmi commented May 7, 2012

@toddwschneider Great!
@caged I don't think an articles index is available at Bonsai; AFAIK the index name is tied to your account. You should probably leave the index empty. Please see this gist by @nz for more info: https://gist.github.com/2041121.

@nz
Contributor

nz commented May 7, 2012

@caged and @karmi, I think I see what's happening here.

@caged, your example code is probably better written like this…

Tire.configure do
  url "http://index.bonsai.io/"
  logger STDERR
end

articles = [
  { :id => '1', :type => 'article', :title => 'one',   :tags => ['ruby'],           :published_on => '2011-01-01' },
  { :id => '2', :type => 'article', :title => 'two',   :tags => ['ruby', 'python'], :published_on => '2011-01-02' },
  { :id => '3', :type => 'article', :title => 'three', :tags => ['java'],           :published_on => '2011-01-02' },
  { :id => '4', :type => 'article', :title => 'four',  :tags => ['ruby', 'php'],    :published_on => '2011-01-03' }
]

Tire.index 'pvz4bww6kn87gz50s6h43tb' do
  import articles
end

I ran this against an index of mine and it worked. Let me break down a few points about how Tire and Bonsai are interacting here…

  1. Tire.configure only takes a cluster-level URL right now. However, it's easy to use an index URL there and be only partially successful, since the URL is just concatenated in where it's needed.
  2. Tire.index takes an index name for its parameter. You supplied articles here, but you would want to pass your Bonsai index name instead.
  3. The index-level delete and create operations aren't available on Bonsai yet (working on that), hence I've omitted them from my example code.

So it looks like my hypothesis was partially correct: an incorrect index name was getting passed in for the _index key in the bulk import payload. But that was correct behavior based on the example code — after reworking it I'm not seeing any bugs here.
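
For context, an Elasticsearch bulk payload pairs each document with an action line that names its target index, which is why the name passed to Tire.index ends up in the request body. A rough sketch of that shape follows; `bulk_payload` and the index name `my_bonsai_index` are hypothetical, and the payload Tire actually generates may differ in detail.

```ruby
require 'json'

# Hypothetical helper for illustration; not Tire's implementation.
# Emits one action line plus one source line per document, newline-delimited.
def bulk_payload(index_name, documents)
  lines = documents.flat_map do |doc|
    action = { :index => { :_index => index_name, :_type => doc[:type], :_id => doc[:id] } }
    [JSON.generate(action), JSON.generate(doc)]
  end
  lines.join("\n") + "\n"
end

articles = [
  { :id => '1', :type => 'article', :title => 'one' },
  { :id => '2', :type => 'article', :title => 'two' }
]

puts bulk_payload('my_bonsai_index', articles)
```

Passing 'articles' as the first argument would put "_index":"articles" on every action line, so the request targets an index that does not exist on the account.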

Maybe I'll tackle that application-single-default-index thing this week…

@caged

caged commented May 7, 2012

@nz - awesome, that worked. I had assumed the URL was structured /BONSAI_KEY/INDEX_NAME, but the key is the index name. Thanks for the thorough explanation. Feel free to close this.

@jch

jch commented Oct 22, 2012

In case anyone is still running into this issue: if you upgrade to a newer version of Bonsai (> v530), accounts are now scoped by subdomain instead of path, which works with Tire's assumptions about URL paths. https://devcenter.heroku.com/articles/bonsai
