Support solr 4.0 #450

Closed
jcoyne opened this issue Aug 9, 2012 · 10 comments

@jcoyne
Member

jcoyne commented Aug 9, 2012

No description provided.

@cbeer
Member

cbeer commented Nov 2, 2012

I think Blacklight works fine with Solr 4 (or, we're doing it, and I don't think we have any special code to my knowledge). Is this actually a ticket to update blacklight-jetty to solr 4?

@jcoyne
Member Author

jcoyne commented Nov 2, 2012

I think the real issue is having configs that work with solr 4. It's my
understanding that the config format has changed a bit. Perhaps having a
branch of blacklight-jetty with solr4 is the best way to demonstrate/test
this.

-Justin

@pristinenoise

I also think that switching to solr 4 as a default is a good example going
forward.

@ghost ghost assigned cbeer Nov 5, 2012
@cbeer
Member

cbeer commented Nov 13, 2012

I updated blacklight-jetty to have a Solr 4.0.0 flavor:

projectblacklight/blacklight-jetty@97707a2

Unfortunately, SolrMarc doesn't work with Solr 4 (or, doesn't fully work). It definitely doesn't work in the SolrMarc embedded mode. If you use the non-embedded mode, you get this (@ndushay reported this error to the SolrMarc list.)

$ rake solr:marc:index  MARC_FILE=../../test_support/data/test_data.utf8.mrc 
WARNING: Cucumber-rails required outside of env.rb.  The rest of loading is being defered until env.rb is called.
  To avoid this warning, move 'gem cucumber-rails' under only group :test in your Gemfile
java -Xmx512m  -Dsolr.hosturl=http://127.0.0.1:8983/solr  -jar /Volumes/TempStorage/Projects/blacklight/lib/SolrMarc.jar /Volumes/TempStorage/Projects/blacklight/tmp/test_app/config/SolrMarc/config.properties ../../test_support/data/test_data.utf8.mrc

 INFO [main] (MarcImporter.java:816) - Starting SolrMarc indexing.
 INFO [main] (Utils.java:191) - Opening file: /Volumes/TempStorage/Projects/blacklight/tmp/test_app/config/SolrMarc/config.properties
 INFO [main] (MarcImporter.java:749) -  Connecting to remote Solr server at URL http://127.0.0.1:8983/solr/update
java.io.IOException: Server returned HTTP response code: 500 for URL: http://127.0.0.1:8983/solr/admin/registry.jsp
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    at java.net.URL.openStream(URL.java:1010)
    at org.solrmarc.solr.SolrCoreLoader.loadRemoteSolrServer(SolrCoreLoader.java:351)
    at org.solrmarc.marc.MarcImporter.getSolrServerProxy(MarcImporter.java:750)
    at org.solrmarc.marc.MarcImporter.loadLocalProperties(MarcImporter.java:193)
    at org.solrmarc.marc.MarcHandler.loadProperties(MarcHandler.java:172)
    at org.solrmarc.marc.MarcHandler.init(MarcHandler.java:118)
    at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:822)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.simontuffs.onejar.Boot.run(Boot.java:334)
    at com.simontuffs.onejar.Boot.main(Boot.java:170)
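
For reference, the non-embedded invocation shown in the log above can be assembled roughly as follows. This is only a minimal sketch: the method name `solrmarc_command` and its keyword arguments are hypothetical, not part of Blacklight's rake tasks, and the only flags used are the ones that appear in the logged `java` command.

```ruby
# Hypothetical helper: build the argument list for a remote (non-embedded)
# SolrMarc run, mirroring the `java` command printed in the log above.
def solrmarc_command(jar:, config:, marc_file:,
                     solr_url: "http://127.0.0.1:8983/solr",
                     max_heap: "512m")
  [
    "java",
    "-Xmx#{max_heap}",             # JVM heap limit, as in the log
    "-Dsolr.hosturl=#{solr_url}",  # the running Solr that SolrMarc posts to
    "-jar", jar,
    config,                        # SolrMarc config.properties
    marc_file                      # MARC file to index
  ]
end

cmd = solrmarc_command(
  jar: "lib/SolrMarc.jar",
  config: "config/SolrMarc/config.properties",
  marc_file: "test_support/data/test_data.utf8.mrc"
)
puts cmd.join(" ")
```

Keeping the command as an array avoids shell quoting problems if it is later passed to `system(*cmd)`.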

@cbeer cbeer closed this as completed in f6db584 Nov 13, 2012
@jcoyne
Member Author

jcoyne commented Nov 13, 2012

Is it okay to switch blacklight to Solr 4 if SolrMarc doesn't work with Solr 4 yet?

@jrochkind
Member

Forgive me if I'm saying something you already know, but...

SolrMarc uses the SolrJ Java library from the Solr project to actually talk to Solr. Solr distributes different versions of SolrJ for different versions of Solr -- they are not wire-compatible, nor are the index files byte-compatible (SolrJ can be used both for talking to Solr over the wire, and for directly writing Solr-compatible index files; and SolrMarc actually does both things depending on how it's been configured.)

Normally you'd have a SolrJ jar file, and make sure it's in your Java class path when you run whatever Java code you are running that's going to use it to talk to Solr, and make sure it's the right SolrJ for the Solr version you are talking to. SolrMarc, distributed as a single .jar file, bundles the SolrJ jar file inside it, because it's too much of a pain to make users download SolrJ themselves and worry about Java class paths, when we just want to give them an indexer that works.

That was when Solr 1.4 was all there was. Then multiple versions of Solr were out there, and SolrMarc needed to work with all of them. What to do? There are a variety of hypothetical options, but the one chosen by Bob was to do some rather tricky stuff involving Java reflection, to somehow bundle a SolrMarc.jar that can talk to multiple versions of Solr using multiple versions of SolrJ (multiple versions bundled with SolrMarc.jar... I think?), based on configuration variables. I don't entirely understand the details.

So, long story short, I think the first problem is that the SolrMarc you are using (the one bundled with Blacklight) is not including/using a proper version of SolrJ to talk to Solr 4. So the first step would be making it do so, by interfacing with Bob's clever Java reflection stuff, or whatever, I don't know. Blacklight's indexing rake tasks will use a custom local version of SolrMarc.jar rather than the one it's bundled with, if you put it at your local app's ./config/SolrMarc/solrmarc.jar (I think that's the right path, there is such a path anyway).
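
The "local jar overrides bundled jar" behavior described above could look something like the sketch below. It is an illustration only, not Blacklight's actual rake task code: the method name `solrmarc_jar_path` is hypothetical, and the local path is the one guessed at above, which may not be exactly right.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: prefer a SolrMarc jar placed in the local app,
# otherwise fall back to the jar bundled with Blacklight.
# The local path below is the one mentioned above and is unverified.
def solrmarc_jar_path(app_root, bundled_jar)
  local = File.join(app_root, "config", "SolrMarc", "solrmarc.jar")
  File.exist?(local) ? local : bundled_jar
end

# Usage: demonstrate both branches against a throwaway directory.
Dir.mktmpdir do |app_root|
  bundled = "/path/to/blacklight/lib/SolrMarc.jar" # placeholder path
  puts solrmarc_jar_path(app_root, bundled)        # falls back to bundled jar

  FileUtils.mkdir_p(File.join(app_root, "config", "SolrMarc"))
  FileUtils.touch(File.join(app_root, "config", "SolrMarc", "solrmarc.jar"))
  puts solrmarc_jar_path(app_root, bundled)        # now the local override
end
```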

It's possible even once you've done that, it still wouldn't work -- it depends on if SolrJ's Java API is backwards compatible in Solr 4 or not. But even if SolrJ's Java API is entirely backwards compatible, it still won't work to use a pre-Solr-4 version of SolrJ with a Solr 4. Which is probably what you're doing now.

SolrMarc is.... difficult for me to deal with. For many reasons. But in retrospect, I don't think Bob's complex custom java reflection layer was the right approach for supporting multiple versions of Solr/SolrJ. And probably nobody but Bob is capable of figuring out how to add support for Solr4 into the current architecture.

@jrochkind
Member

> Is it okay to switch blacklight to Solr 4 if SolrMarc doesn't work with Solr 4 yet?

This is a complicated topic. There are some people that would like to make Blacklight have nothing to do with SolrMarc at all. And "switch blacklight to Solr4" could mean several things.

Let's unpack exactly what Blacklight's relationship to both Solr and SolrMarc is.

Blacklight itself actually doesn't care what version of Solr you are using. Nor does it care if you are using SolrMarc at all, really. Blacklight right now does not, by default, use any Solr features that aren't present in Solr 1.4 -- and I think it should probably remain that way, mostly because there's nothing Blacklight itself needs to do out of the box that requires any special Solr4 features -- and because several of the committers are still deploying against Solr 1.4 (erik, let's not have a debate about how there's no excuses for that, we have our reasons, and it's a different topic at any rate). And cbeer's work he's reporting on, at any rate, has nothing to do with changing that. Blacklight just talks to any Solr you want over HTTP.

However, Blacklight's automated tests need to have a Solr index to test against. Which means they need to create a Solr index. (Another option would be trying to completely mock Solr, but that's not the option we've used, and would have its own serious challenges, and would probably still require a way to automatically create a Solr index that BL's tests will pass against).

Currently, Blacklight uses SolrMarc to create that Solr index (in a Solr 3.x in current versions? I think?). And there's a git project that includes a Solr all set up, so the automated tests can just check that out, use SolrMarc to index the test data in there, and run the tests. So long as Blacklight is using SolrMarc to create the Solr index it's testing against, that git project needs to have a Solr compatible with it, for sure.

Does Blacklight need to use SolrMarc to index its test data for testing purposes? Of course not technically. Some of us (or at least me?) think it's important that Blacklight tests involve SolrMarc indexing though, because we think it's important that Blacklight supports the library catalog MARC use case, and that BL's automated tests ensure it supports the library catalog MARC use case. So you could add more tests that indexed using things other than SolrMarc, but some of us (me anyway?) think SolrMarc-based tests need to stay there. To understand why, we need to go to the next thing BL has to do with SolrMarc....

BL also ships with optional installers for either or both of: 1) A Solr in a jetty. 2) SolrMarc with some getting-started indexing files for the library catalog MARC use case.

As it happens, the jetty-solr (and solr config) and SolrMarc (and solrmarc config) the BL installer will optionally install is the same one used for BL tests. It doesn't technically need to be. You could ship a jetty-solr and/or SolrMarc for users to install to get started that is an entirely different one than the one used for BL tests. However, some of us (or at least me?) think it's important that BL support the library catalog use case by shipping with an optional install of a jetty-solr and SolrMarc that work for that use case. And it would be more work to support one variation for user install, and a different variation for BL testing. As well as, if you're not testing what you ship, how would you know it works?

By testing BL with the same SolrMarc and jetty-solr we ship, we can ensure it works as expected with the jetty-Solr and SolrMarc that BL optionally installs, and only have one jetty-solr and SolrMarc to maintain.

If you didn't think it was important for BL to support the library catalog marc use case, you could remove SolrMarc entirely from it, there's no reason BL needs to test with SolrMarc, or ship a SolrMarc indexer.

But if you do think this is important, then BL needs to ship with an optional jetty-solr that works with SolrMarc, and an optional SolrMarc that works with that jetty-solr. BL could additionally ship with a different jetty-solr that doesn't necessarily work with SolrMarc -- there could be multiple getting-started jetty-solr install options. But then there'd be more things to maintain (and test), and we're not doing great even with the ones we've got.

Or we could fix SolrMarc to work with Solr 4.0. But that's kind of a pain, because other developers who are not Bob have all found SolrMarc's source code difficult to work with. Or we could create an alternative to SolrMarc, still supporting library catalog MARC as a use case, but something other than SolrMarc. In JRuby? That's something many of us have talked about for some time, but nobody's had time to do (at least do in a way that it's integrated totally with BL, so BL ships it as an install option and/or tests under it).

But, to be clear, there is absolutely nothing stopping anyone from using BL with Solr4 now (and I think some people are?) -- BL just won't give you a Solr4 as an optional install option like it gives you Solr3, and BL won't give you a MARC indexer that works with Solr4 as it gives you SolrMarc, and BL's integration testing isn't against Solr4.

Phew. That's my piece.

@cbeer
Member

cbeer commented Nov 13, 2012

(I confess I haven't read your full responses yet.)

SolrMarc works with Solr 4.x in streaming mode (with the exception of the stack trace that is ugly but doesn't seem to affect functionality). Embedded mode breaks badly, but we don't need to support embedded mode, right?

@jrochkind
Member

I think embedded mode was a huge SolrMarc mistake to begin with, so I think it's okay if we don't. But embedded mode does, in some circumstances, result in faster indexing. More importantly, it allows you to index without a running Solr -- I'm not sure if our testing procedures and instructions, as well as our shipped rake tasks and instructions for indexing, may assume that indexing will work without a running Solr. So that would all need to be fixed without embedded mode.

But even more importantly, the tl;dr version of my last post is that the SolrMarc you are using is probably using the wrong version of the SolrJ Java library to talk to Solr. This is not supported by Solr; SolrJ libraries are specific to Solr versions. So even if it appears to work but for an apparently spurious error message, I think it's dangerous, not supported by the Solr project, and would be a mistake to rely upon. A mistake to ship with instructions for other people that end up using the wrong SolrJ, and a mistake to have our own integration tests use the wrong SolrJ. It would be hard to know if there aren't all sorts of other things going wrong not being noticed; our own integration tests are certainly not sufficient to catch any possible bugs and edge cases due to violating SolrJ's instructions and using a version of SolrJ from one Solr version to talk to a different Solr version.
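
One way to make that mismatch visible would be a guard that refuses to index when the SolrJ and Solr versions disagree. The sketch below is purely illustrative: the method name is hypothetical, how the two version strings would be obtained is left out, and treating "same major version" as the compatibility rule is an assumption, not something the Solr project or this thread specifies.

```ruby
# Hypothetical guard: treat SolrJ and Solr as compatible only when their
# major versions match. The "same major version" rule is an assumption
# made for illustration; the real compatibility rules may be stricter.
def same_major_version?(solrj_version, solr_version)
  solrj_version.split(".").first == solr_version.split(".").first
end

puts same_major_version?("4.0.0", "4.0.0") # matching majors
puts same_major_version?("3.6.1", "4.0.0") # pre-4 SolrJ against Solr 4
```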

@pristinenoise

Yeah, we're ingesting just fine versus solr 4. I didn't even notice it was
ugly.

If solrmarc works via streaming then I feel like that's fine. I'd rather
put people on solr 4 with slower ingests than old solrs to preserve an
optional solrmarc functionality.
