Remove delimiter from subjectLabel #312

Closed
acka47 opened this issue Jul 19, 2016 · 11 comments
@acka47
Contributor

acka47 commented Jul 19, 2016

Reported by @aquast via email:

For the resource http://lobid.org/resource/HT018433961/about, slashes appear to
have been used in the "subjectLabel" field to separate the items:

JLD display:

"subjectLabel" : [ "1400-1468; Guttemberg, Giovanni", "1400-1468; Gensfleisch,
Johann", "1400-1468; Gutenberg, Giovanni", "1400-1468; Guttenbergius,
Joannes", "1400-1468; Gensfleisch ZurLaden, Johannes", "1400-1468; Gensfleisch
zur Laden, Johannes", "1400-1468; Gutenberg, Johann", "1400-1468; Gensfleisch
zur Lade, Johannes", "1400-1468; Gutenberg, Jean", "1400-1468", "1400-1468;
Gutemberg, Joannes", "1400-1468; Gutenberg, Iogann", "1400-1468; Gensfleisch,
Johannes", "1400-1468; Gensfleisch von Sorgenloch, Johann", "1400-1468;
Guttenberg, Johann", "1400-1468; Gutenberg, Johann G.", "Gutenberg, Johannes",
"1400-1468; Gensfleisch zum Gutenberg, Johann" ],

The producer of the data (LBZ) actually intended the presumably customary
separation by semicolons.

Is there a reason for the splitting in general, and for splitting at "/" in
particular? Or could you perhaps fix this quickly?

When I asked, Stephani recommended that the data coming from field 710 a
should not be split at all. That would also be my preference.

@aquast says we should either store this as one string or use ";" as the delimiter. See Morph lines 1264-1268.
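The effect reported above can be reproduced with a small sketch. The input value below is a hypothetical, shortened reconstruction of what field 710 a might contain; the actual field content and the Morph rule are not shown in this thread, and `split_on` is an illustrative helper, not part of lodmill.

```shell
# Hypothetical, shortened 710 a value (an assumption; the real field content is not shown here).
value='Gutenberg, Johannes / 1400-1468; Guttemberg, Giovanni / 1400-1468'

# split_on DELIM STRING: print one whitespace-trimmed item per line.
# DELIM must be a single non-space character, which awk treats literally.
split_on() {
  printf '%s\n' "$2" | awk -v d="$1" '{
    n = split($0, parts, d)
    for (i = 1; i <= n; i++) {
      gsub(/^[ \t]+|[ \t]+$/, "", parts[i])
      print parts[i]
    }
  }'
}

echo 'Split at "/":'
split_on "/" "$value"
# Gutenberg, Johannes
# 1400-1468; Guttemberg, Giovanni
# 1400-1468

echo 'Split at ";":'
split_on ";" "$value"
# Gutenberg, Johannes / 1400-1468
# Guttemberg, Giovanni / 1400-1468
```

Under this assumption, splitting at "/" yields the same pattern as the reported labels (a bare name, a bare date range, and "date; name" fragments), while splitting at ";" keeps each name together with its dates.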

@acka47 acka47 self-assigned this Jul 19, 2016
@acka47 acka47 added ready and removed ready labels Jul 19, 2016
acka47 added a commit to lobid/lodmill that referenced this issue Jul 20, 2016
@fsteeg
Member

fsteeg commented Jul 20, 2016

Build for lobid/lodmill#791 was passing, merged.

Weekly index creation pulls master, so the change should be automatically deployed over the weekend.

@fsteeg
Member

fsteeg commented Jul 22, 2016

Index size for lobid-resources is about 61 GB, see: http://gaia.hbz-nrw.de:9200/_plugin/head/

Free space on gaia is 57 GB, so new index over the weekend will be an issue.

Deleting the current staging index, setting alias to production index.
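The space check behind this decision could be sketched as follows, assuming the new index will be roughly as large as the current one (61 GB). The function names `fits_on_disk` and `free_gb` are illustrative and not part of the deployment scripts.

```shell
# fits_on_disk NEEDED_GB FREE_GB: succeed only if FREE_GB is at least NEEDED_GB.
fits_on_disk() {
  [ "$2" -ge "$1" ]
}

# free_gb DIR: free space on the filesystem holding DIR, in whole GB.
# Uses POSIX "df -P -k"; column 4 is the available space in 1024-byte blocks.
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Numbers from the comment above: the index needs about 61 GB, 57 GB are free.
if fits_on_disk 61 57; then
  echo "enough space for a second index"
else
  echo "not enough space: need 61 GB, have 57 GB"
fi
```

On a live machine, `fits_on_disk 61 "$(free_gb /path/to/data)"` would run the same check against the actual free space; the data path is a placeholder.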

@acka47
Contributor Author

acka47 commented Jul 25, 2016

No changes at http://lobid.org/resource?id=HT018433961&format=full. Did the full re-indexing run as planned during the weekend?

@fsteeg
Member

fsteeg commented Jul 25, 2016

No, index creation failed for both API 1.x and 2.0.

The issue seems to be that the actual baseline dump file is missing:

ls -al /files/open_data/closed/hbzvk/index.hbz-nrw.de/alephxml/clobs/baseline/aliasNewestFulldump.tar.gz
lrwxrwxrwx 1 800 800 87 Jul 24 06:44 /files/open_data/closed/hbzvk/index.hbz-nrw.de/alephxml/clobs/baseline/aliasNewestFulldump.tar.gz -> /files/open_data/open/DE-605/mabxml/DE-605-aleph-baseline-marcxchange-2016072318.tar.gz

tail /files/open_data/open/DE-605/mabxml/DE-605-aleph-baseline-marcxchange-2016072318.tar.gz
tail: cannot open `/files/open_data/open/DE-605/mabxml/DE-605-aleph-baseline-marcxchange-2016072318.tar.gz' for reading: No such file or directory
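The listing above shows a symlink whose target does not exist. A check for this situation, which an indexing job could run before it starts, might look like the following sketch. The helper name `is_dangling_symlink` and the temporary paths are illustrative assumptions.

```shell
# is_dangling_symlink PATH: succeed if PATH is a symlink whose target does not exist.
is_dangling_symlink() {
  [ -L "$1" ] && [ ! -e "$1" ]
}

# Demonstration in a temporary directory: the link points at a file that was never created.
demo_dir=$(mktemp -d)
ln -s "$demo_dir/missing-baseline.tar.gz" "$demo_dir/aliasNewestFulldump.tar.gz"

if is_dangling_symlink "$demo_dir/aliasNewestFulldump.tar.gz"; then
  echo "baseline symlink is dangling, aborting before indexing"
fi

rm -rf "$demo_dir"
```

`-L` tests the link itself, while `-e` follows the link, so the combination is true only for a link with a missing target.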

@dr0i: Could this be related to our new setup for getting the catalog data (see hbz/lobid-resources#91)?

@acka47: The transformation change in lobid/lodmill@e3f4bd5 was only for API 1.x, we'll need to do the same in https://github.com/hbz/lobid-resources, or are we taking a different approach for API 2.0?

@fsteeg fsteeg removed their assignment Jul 25, 2016
@fsteeg fsteeg added ready and removed deploy labels Jul 25, 2016
@acka47
Contributor Author

acka47 commented Jul 25, 2016

The transformation change in lobid/lodmill@e3f4bd5 was only for API 1.x, we'll need to do the same in https://github.com/hbz/lobid-resources, or are we taking a different approach for API 2.0?

The plan was to get rid of subjectLabel altogether in API 2.0 (see hbz/lobid-resources#8). Thus, we will have to think about where to put the contents of 710. I already addressed the problem in hbz/lobid-resources#8 (comment).

@fsteeg
Member

fsteeg commented Jul 25, 2016

Manually triggered new index creation from tar.gz baseline at http://index.hbz-nrw.de/alephxml/export/baseline/2016072214/, based on what happens in the weekly crontab job for hduser@weywot1:

cd /home/hduser/git/lodmill/lodmill-rd/doc/scripts/hbz01

wget http://index.hbz-nrw.de/alephxml/export/baseline/2016072214/DE-605-aleph-baseline-marcxchange-2016072214.tar.gz

DATE=$(date "+%Y%m%d-%H%M")

BRANCH=master

bash -x startHbz01ToLobidResources.sh $BRANCH /home/hduser/git/lodmill/lodmill-rd/doc/scripts/hbz01/DE-605-aleph-baseline-marcxchange-2016072214.tar.gz lobid-resources-$DATE "-staging" quaoar2.hbz-nrw.de quaoar create doc/scripts/hbz01/toBeUpdateFilesXmlClobs_afterBasedump.txt > $DATE-$BRANCH.staging.log.startHbz01ToLobidResources.sh 2>&1 &

Update file used above looks good:

cat ../../../doc/scripts/hbz01/toBeUpdateFilesXmlClobs_afterBasedump.txt
/files/open_data/open/DE-605/mabxml/DE-605-aleph-update-marcxchange-20160723-20160724.tar.gz
/files/open_data/open/DE-605/mabxml/DE-605-aleph-update-marcxchange-20160724-20160725.tar.gz

Indexing into lobid-resources-20160725-1244, see http://quaoar2.hbz-nrw.de:9200/_plugin/head/

@fsteeg fsteeg added deploy and removed ready labels Jul 25, 2016
@fsteeg
Member

fsteeg commented Jul 26, 2016

Deployed to staging, see: http://test.lobid.org/resource?id=HT018433961&format=full

@acka47
Contributor Author

acka47 commented Jul 26, 2016

Looks good. Did you take care of all the updates? If yes: +1

@acka47 acka47 assigned fsteeg and unassigned acka47 Jul 26, 2016
@fsteeg
Member

fsteeg commented Jul 26, 2016

Logs for the updates that ran as part of the manual indexing (see #312 (comment)) on weywot1 in /home/hduser/git/lodmill/lodmill-rd/doc/scripts/hbz01/20160725-1244-master.staging.log.startHbz01ToLobidResources.sh (finished at 04:33) and regular automated updates in /home/hduser/git/lodmill/lodmill-rd/doc/scripts/hbz01/log/20160726-071001-master.staging.log.startHbz01ToLobidResources.sh (starting at 07:10) look good.

@fsteeg
Member

fsteeg commented Jul 26, 2016

Deployed to production, closing. See:

http://lobid.org/resource?id=HT018433961&format=full

(Added hbz/lobid-resources#91 (comment) to open issue about the baseline file problem.)

@fsteeg fsteeg closed this as completed Jul 26, 2016
@fsteeg fsteeg removed the review label Jul 26, 2016
@aquast

aquast commented Aug 3, 2016

thx 1+
