Provide PS, type and field code searching #506

Closed
traviscb opened this Issue · 13 comments

4 participants

@traviscb
Collaborator

Originally on 2011-02-24

Depends on #505

Once #505 is completed, add KBs for the type code and field code abbreviations, as well as aliases in the SPIRES search syntax for:

PS, SCL, type, TC -> 690C_a (via doctype.kb)

FC, field -> 65017 (via a classifications.kb)

Both KBs are published in Travis's public INSPIRE repo, in the PS_FC_kbs branch.

Additionally, indexes should be made for both of these MARC codes.

After this is all done, we should look at standardizing the coding across TC/Note/etc. and FC/archive category/collection etc.
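
The alias mapping requested above can be pictured as a small lookup table from SPIRES keywords to MARC fields. This is only an illustrative sketch: `SPIRES_ALIASES` and `translate` are hypothetical names, not Invenio code, and the real translation would live in the SPIRES syntax converter together with the two KBs.

```python
# Hypothetical alias table; the keywords and MARC fields are the ones
# listed in this ticket, the structure itself is an assumption.
SPIRES_ALIASES = {
    "ps": "690C_a",
    "scl": "690C_a",
    "type": "690C_a",
    "tc": "690C_a",
    "fc": "65017",
    "field": "65017",
}


def translate(keyword, value):
    """Turn a SPIRES keyword/value pair into a 'field:value' query term."""
    field = SPIRES_ALIASES.get(keyword.lower())
    if field is None:
        raise KeyError("unknown SPIRES keyword: %r" % keyword)
    return "%s:%s" % (field, value)


print(translate("TC", "P"))  # 690C_a:P
print(translate("fc", "E"))  # 65017:E
```

The values (`P`, `E`) are passed through untouched here; expanding them to full names would be the job of doctype.kb and classifications.kb.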

@jrbl jrbl was assigned by traviscb
@invenio-developers
Collaborator

Originally by valkyrie (@valkyriesavage) on 2011-04-21

I don't know how to build the indexes, but this is available in my public INSPIRE branch (/afs/slac.stanford.edu/public/groups/library/valkyrie-public-git/inspire-valkyrie.git/) as knowledge-bases.

@tiborsimko
Owner

Originally on 2011-05-05

1) You can make indexes by inserting proper configuration statements
to the top-level Makefile; see for example what I did earlier for
the "firstauthor" index. (INSPIRE repo, commit 2146ab19)

2) The journal index is already made, so you can test the journal
synonym searching (including volume and pages) even without creating
the other indexes. In the journal index synonym configuration, the
branch currently uses the massaging function leading_to_number, but
I think you should rather use leading_to_comma, because INSPIRE
convention for journal index is to separate journal,volume,page
values by commas. So with leading_to_number, journal searches
including volume and page would not work. For the two other indexes,
the exact massaging function seems appropriate.

3) For the two other classification/doctype indexes, we may perhaps
consider using the index-time synonyms instead of search-time
synonyms, especially if people are used to values like E from SPIRES
times.

4) But I'd like to clarify the terminology regarding
classification/doctype indexes first.

WRT "classification" KB, it generates values for 65017 field which is
called "subject" in cataloguing tools. What do we want to call the new
index in the user-facing parts of INSPIRE: "classification",
"subject", "fc", or something else? Consider that people may be
typing and/or seeing query terms like classification:E or
classification:"Experiment-HEP", so we'd better choose something
nice. Maybe stick to "subject" like in cataloguing tools, maybe stick
to "fc" if we choose this to be the user-facing canonical index name
and not only an alias, etc.

WRT "doctype", there is a similar naming mismatch. Moreover, here the
word doctype has a very concrete meaning in Invenio, namely the type
of a document attached to a record. So it may be misleading to call
it that. (BTW, see also somewhat related filetype/doctype index issue
in #473.)

5) When updating KB/index names, we may want to amend the following
description TALKTYPEDESC='Mapping of... something?' a bit. :)

6) You should also document in the commit log the weblinks/oalinks fix
done alongside the process. Ideally, we should perhaps separate this
fix into a commit of its own, since it is unrelated to synonym KBs.
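
To illustrate point 2 above, here is a minimal sketch of why the choice of massaging function matters for comma-separated journal values. The regular expressions are one reading of how the three match types split a term, and `JOURNAL_KB`, `split_term` and `expand` are hypothetical names, so treat this as an assumption-laden illustration rather than Invenio's actual code.

```python
import re

# Hypothetical journal KB: abbreviation -> canonical short title.
JOURNAL_KB = {"PRL": "Phys.Rev.Lett."}


def split_term(term, match_type):
    """Return (part looked up in the KB, untouched remainder)."""
    if match_type == "leading_to_comma":
        m = re.match(r"^(.*?)(\s*,.*)$", term)
    elif match_type == "leading_to_number":
        m = re.match(r"^(.*?)(\s*\d.*)$", term)
    else:  # "exact": the whole term is looked up
        m = None
    return (m.group(1), m.group(2)) if m else (term, "")


def expand(term, match_type, kb=JOURNAL_KB):
    """Replace the looked-up part by its synonym, if the KB has one."""
    head, rest = split_term(term, match_type)
    return kb[head] + rest if head in kb else term


print(expand("PRL,105,122001", "leading_to_comma"))   # Phys.Rev.Lett.,105,122001
print(expand("PRL,105,122001", "leading_to_number"))  # PRL,105,122001
```

With leading_to_number the lookup key becomes `PRL,` (everything before the first digit, comma included), which is not in the KB, so the term is left unexpanded.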

@invenio-developers
Collaborator

Originally by valkyrie (@valkyriesavage) on 2011-05-17

Ok, with those comments in mind, I renamed the doctype and classification indices to "media" and "subject", and I corrected all the other comments. I didn't separate out the weblinks/oalinks fix, although if you feel strongly about it I can.

I am having trouble testing this fix, since there doesn't seem to be any information in 690__c. http://inspirebeta.net/search?ln=en&ln=en&p=690__c%3AReview&action_search=Search&sf=&so=d&rm=&rg=25&sc=0&of=hb

?

Anyway, there are branches for both Invenio (for SPIRES syntax to translate the new index names correctly) and Inspire available in my afs repo as knowledge-bases.

@traviscb
Collaborator

Originally on 2011-05-17

Replying to [comment:4 valkyrie]:

Ok, with those comments in mind, I renamed the doctype and classification indices to "media"

Hmm, "media" sounds odd. Can we call it "type" or "type-code"? (I prefer "type".)

I am having trouble testing this fix, since there doesn't seem to be any information in 690__c. http://inspirebeta.net/search?ln=en&ln=en&p=690__c%3AReview&action_search=Search&sf=&so=d&rm=&rg=25&sc=0&of=hb

Try http://inspirebeta.net/search?ln=en&ln=en&p=690C%3AReview&action_search=Search&sf=&so=d&rm=&rg=25&sc=0&of=hb

It is 690C, not 690__c. That's because the "C" is not a subfield. To be precise, it is 690C_a.

This may require a bit of checking/changing in your configuration...
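
For readers unfamiliar with the notation, the difference between 690C_a and 690__c can be sketched as follows, assuming the usual layout of a three-character tag, two indicator characters and one subfield code, with `_` standing for an empty position. `parse_marc_field` is a hypothetical helper written for this illustration.

```python
def parse_marc_field(spec):
    """Split a spec such as '690C_a' into tag, indicators and subfield code."""
    spec = spec.ljust(6)  # pad so short specs like '65017' still slice cleanly

    def clean(ch):
        # '_' and ' ' both mean "no value" in this sketch.
        return "" if ch in "_ " else ch

    return {
        "tag": spec[0:3],
        "ind1": clean(spec[3]),
        "ind2": clean(spec[4]),
        "subfield": clean(spec[5]),
    }


print(parse_marc_field("690C_a"))
print(parse_marc_field("690__c"))
```

In 690C_a the "C" lands in the first indicator position and the value sits in subfield a, whereas 690__c asks for subfield c of a 690 field with no indicators, which is why the earlier test query found nothing.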

As a side comment: I don't think we find these layers of MARC complexity valuable. I imagine only 1-2 people in INSPIRE know why there is a "C" there. In the long run, separating Invenio from MARC would be desirable in my opinion, although the MARC codes could still be useful in the occasional case of exporting to others who use MARC.

@invenio-developers
Collaborator

Originally by valkyrie (@valkyriesavage) on 2011-05-17

Ok, I renamed the index.

Sadly, I still can't make it work. Could one of you take a look at my indexing stuff in the Makefile? The subject and journal indexes work just fine, but the type index isn't behaving. Is "type" a reserved word for some other reason? Anyway, if you can take a look, the code is in the inspire/invenio branches called knowledge-bases.

@invenio-developers
Collaborator

Originally by valkyrie (@valkyriesavage) on 2011-06-16

ok, y'all, this is now working

INSPIRE and invenio branches called knowledge-bases available in AFS

@jrbl
Collaborator

Originally on 2011-08-24

It is worth noting that to work correctly this branch requires configuration directives to be set in invenio-local.conf, like:

CFG_WEBSEARCH_SYNONYM_KBRS = {
  'journal': ['JOURNALS', 'leading_to_comma'],
  'collection': ['COLLECTION', 'exact'],
  'subject': ['SUBJECT', 'exact'],
}
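
Since a typo in this setting would silently disable synonym expansion for an index, a quick sanity check can help before deploying. The sketch below is hypothetical: `check_synonym_config` is not an Invenio function, and the set of accepted match types is assumed from the values mentioned in this thread.

```python
# Match types accepted by this sketch (assumed from this thread).
KNOWN_MATCH_TYPES = {"exact", "leading_to_comma", "leading_to_number"}

CFG_WEBSEARCH_SYNONYM_KBRS = {
    "journal": ["JOURNALS", "leading_to_comma"],
    "collection": ["COLLECTION", "exact"],
    "subject": ["SUBJECT", "exact"],
}


def check_synonym_config(cfg):
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []
    for index, entry in sorted(cfg.items()):
        if len(entry) != 2:
            problems.append("%s: expected [kb_name, match_type]" % index)
            continue
        kb_name, match_type = entry
        if match_type not in KNOWN_MATCH_TYPES:
            problems.append("%s: unknown match type %r" % (index, match_type))
    return problems


print(check_synonym_config(CFG_WEBSEARCH_SYNONYM_KBRS))  # []
```
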
@jrbl
Collaborator

Originally on 2011-08-25

I think that this actually does work. I failed it so I could reassign it to myself, because I'm doing some cleanups and chasing down problems with the unit tests (the problems appear to be in the tests themselves.) I'm snapshotting to my github inspire and invenio repositories, in 506-knowledge_bases-rebased. I'll be deploying to inspire-hep-dev in a minute so people can check this out.

@jrbl
Collaborator

Originally on 2011-08-26

I have now deployed this on prod, as per INSPIRE RT#148083.

I have cherry-picked the Invenio patch into inspire-ops on branch rebased-20110816 (which is still our latest deployment target) and Travis has put the Inspire patch into the inspire repository.

The branches are on my AFS and github as 506-knowledge-bases-rebased.

@invenio-developers
Collaborator

Originally by hoc on 2011-08-26

OK, for this part:
PS, SCL, type, TC -> 690C_a (via doctype.kb)

SCL searching doesn't work yet.

the search
find a smith and scl s (or scl p)
should simply be an alias for
find a smith and tc p

There are other SCL values (as Travis pointed out on EVO, SCL is a blend of FC and TC), but the "published" one is the most important, and we want to seriously deprecate other uses of SCL.

@invenio-developers
Collaborator

Originally by hoc on 2011-08-26

Here's a problem with conference papers:

find a witten and tc c [does not work]
http://inspirebeta.net/search?ln=en&ln=en&p=find+a+witten+and+tc+c

find a witten and tc conference paper [does not work, this used to work]
http://inspirebeta.net/search?ln=en&ln=en&p=find+a+witten+and+tc+conference+paper

find a witten and tc conference [works, this is new]
http://inspirebeta.net/search?ln=en&ln=en&p=find+a+witten+and+tc+conference

@jrbl
Collaborator

Originally on 2011-08-30

I've moved Heath's comments to #791 because that's where I mean to take care of them. I think this ticket is still ready for merge. Tibor, you'll want to make sure you fetch the very latest version of the 506-knowledge_bases-rebased branches, because I did some squashing tonight.

@invenio-developers
Collaborator

Originally by Valkyrie Savage vasavage@gmail.com on 2011-08-30

In [c5c240e]:

#CommitTicketReference repository="" revision="c5c240e3e3ad27db467926d04de21aa2f84478a9"
WebSearch: type and field codes

- updated SPIRES mappings to reflect different indices for type
  and field codes and journal codens
* tests for same (fixes #506)(fixes #521)
* changes behavior of bibknowledge slightly so that exact kbr search by
  key, for the empty string, returns hits only if the empty string is
  actually a kbr key.
* and tests for this behavior

Co-authored-by: Joe Blaylock <jrbl@slac.stanford.edu>
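
The empty-string rule in the commit message above can be illustrated with a small sketch. Whether the old behavior actually came from substring-style matching is an assumption made here for illustration, not something the commit states. `KB`, `lookup_substring` and `lookup_exact` are hypothetical names, not bibknowledge functions.

```python
# Hypothetical KB contents: key -> value.
KB = {"c": "conference", "p": "published", "r": "review"}


def lookup_substring(kb, key):
    """Substring match: the empty string is contained in every key."""
    return sorted(k for k in kb if key in k)


def lookup_exact(kb, key):
    """Exact match: a key (even '') hits only if it is literally present."""
    return [key] if key in kb else []


print(lookup_substring(KB, ""))  # ['c', 'p', 'r']
print(lookup_exact(KB, ""))      # []
```

With exact matching, an empty query term produces hits only when a KB deliberately defines the empty string as a key.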