Implement regular DBS updates #94
I will speak to Ira soon about this. There are different options, short-term and mid-/long-term:
- Short-term: notify DBS of changes
- Mid- to long-term: move management of the DBS master data (DBS-Stammdaten) to another system
We will have a meeting with DBS in October where we will probably get access to the database so that we can set up regular exports of DBS data.
Renamed the issue to "set up regular DBS updates". We now have access to the database. @dr0i had some questions regarding the next steps. The current version of the DBS export we use for the transformation can be found at http://quaoar1.hbz-nrw.de:7001/assets/data/dbs.zip (see https://github.com/hbz/lobid-organisations/blob/master/README.textile#deployment). I think we agreed in the past that we want the DBS export to be as close to the original data as possible. Thus, the column names in the CSV should be the same as those in the database. In the past we exported the following information (though we don't use all of it): inr, iso, bib_typ, nam, plz, ort, str, stk_2007, sbi, isil, tvw, tel, fax, ema, url, opa, sta, oef, typ_text, utr_text, gro_text, leitung
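To check that a new export keeps the database column names, one could compare the CSV header against the list above. This is a minimal Python sketch, not part of lobid-organisations; the function name `header_diff` is hypothetical, and it assumes a comma-separated file whose first row is the header.

```python
import csv
import io

# Columns listed in this issue as those exported in the past.
EXPECTED_COLUMNS = [
    "inr", "iso", "bib_typ", "nam", "plz", "ort", "str", "stk_2007",
    "sbi", "isil", "tvw", "tel", "fax", "ema", "url", "opa", "sta",
    "oef", "typ_text", "utr_text", "gro_text", "leitung",
]


def header_diff(csv_text, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) header names compared to `expected`.

    Hypothetical helper: reads only the first CSV row (the header).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    missing = [name for name in expected if name not in header]
    unexpected = [name for name in header if name not in expected]
    return missing, unexpected
```

For a real export, the text would be read with something like `open("dbs.csv", newline="", encoding="utf-8").read()`; the UTF-8 encoding is an assumption about the dump.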
Note that the file at http://quaoar1.hbz-nrw.de:7001/assets/data/dbs.zip contains manual changes, which ideally should be merged with a new database export.
I communicated those and, especially, other changes to the DBS colleagues, so the DBS source should be much better than what we currently have in dbs.zip. No need to merge.
@acka47 for your initial test use
We get nearly all relevant data with:

Filter for active libraries with:

We will have to adjust the transformation a bit, as the column names differ from those in the current dbs.csv file. Here is a comparison old -> new:
Missing field: "Unterhaltsträger" (maintaining body)
What is missing in *
Additional fields
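As an alternative to adjusting the transformation itself, the header of a new export could be renamed back to the old column names before the existing transformation runs. This is a minimal Python sketch of that swapped-in approach; the actual old -> new comparison is not reproduced in this text, so `COLUMN_MAPPING` below holds hypothetical placeholder names only.

```python
import csv
import io

# Hypothetical placeholders: new column name -> old column name.
# The real mapping comes from the old -> new comparison, not from this sketch.
COLUMN_MAPPING = {
    "NEW_NAME_1": "nam",
    "NEW_NAME_2": "ort",
}


def rename_header(csv_text, mapping=COLUMN_MAPPING):
    """Rename header columns using `mapping`; unknown names stay unchanged."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    rows[0] = [mapping.get(name, name) for name in rows[0]]
    out = io.StringIO()
    # csv.writer re-quotes fields that contain commas or quotes.
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Renaming only touches the first row, so data rows pass through unchanged apart from CSV re-quoting.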
I tried to join the tables without any previous SQL experience, and this finally worked:
Here is the query that filters out non-active libraries:
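The actual join and filter queries are not reproduced in this text, so the sketch below only illustrates the general technique: a `LEFT JOIN` that keeps every library even without a matching row in the second table, plus a `WHERE` clause on a status column. It uses Python's `sqlite3` with an in-memory database as a stand-in for the DBS database; the table names `bibliothek` and `traeger`, the value `"active"` and all sample rows are hypothetical, and only the column names `inr`, `nam`, `sta` and `utr_text` are taken from the export list above.

```python
import sqlite3

ACTIVE_STATUS = "active"  # hypothetical status value, not the real DBS code


def build_demo_db():
    """Create a small in-memory database with hypothetical tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bibliothek (inr INTEGER PRIMARY KEY, nam TEXT, sta TEXT);
        CREATE TABLE traeger (inr INTEGER, utr_text TEXT);
        INSERT INTO bibliothek VALUES
            (1, 'Bibliothek A', 'active'),
            (2, 'Bibliothek B', 'closed'),
            (3, 'Bibliothek C', 'active');
        INSERT INTO traeger VALUES (1, 'Municipality'), (2, 'State');
    """)
    return conn


def active_libraries(conn, status=ACTIVE_STATUS):
    """Join libraries with their maintaining body and keep only active ones."""
    query = """
        SELECT b.inr, b.nam, t.utr_text
        FROM bibliothek AS b
        LEFT JOIN traeger AS t ON t.inr = b.inr
        WHERE b.sta = ?
        ORDER BY b.inr
    """
    return conn.execute(query, (status,)).fetchall()
```

With the demo data, library 2 is dropped by the status filter, and library 3 is kept with `None` as its maintaining body because of the `LEFT JOIN`.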
Because of the DSGVO (GDPR), it was decided not to allow any updates. Closing.
Reopening, as the status of DBS data access is still unclear.
It looks like there is currently no DBS data at all in lobid-organisations: https://lobid.org/organisations/search?q=_exists_%3AdbsID+AND+NOT+_exists_%3Aisil
Transformation fails with (see application.log):
Comparing the old csv and the new one:
and such fields are then enclosed in quotes ("CSV with enclosed fields"). In the new DBS dump these quotes are missing.
I am not sure you interpreted the problem correctly. We actually removed one column (fax) from the SQL query, so it is correct that there is one column fewer. So we must adjust the code to expect 18 columns.
Yes, I too first saw that as the most plausible source of the error. I admit I had trouble finding where this number is specified, but I believe the first line (the header describing the CSV columns) is parsed. "fax" is now correctly missing there, and counting the commas gives exactly 18 defined columns. However, my "opening hours" example shows that the data actually has 19 columns, because the quotes are missing.
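The column-count mismatch described above can be reproduced in a small Python sketch: a field that contains a comma, such as opening hours, must be enclosed in quotes, otherwise a CSV parser splits it into an extra column. The opening-hours value is made up for illustration, and this is not the lobid-organisations transformation code; `rows_with_wrong_width` is a hypothetical check that reports data rows whose field count differs from the header.

```python
import csv
import io


def naive_line(fields):
    """Join fields with commas and no quoting (breaks on embedded commas)."""
    return ",".join(fields)


def quoted_line(fields):
    """Write one CSV line, quoting only fields that need it."""
    out = io.StringIO()
    csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n").writerow(fields)
    return out.getvalue().rstrip("\n")


def column_count(line):
    """Number of fields a CSV parser sees in a single line."""
    return len(next(csv.reader([line])))


def rows_with_wrong_width(csv_text):
    """Return (line number, field count) for rows not matching the header width.

    Assumes one record per physical line.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return [(reader.line_num, len(row)) for row in reader if len(row) != len(header)]


fields = ["1", "Mo-Fr 9-12, 14-18"]
print(column_count(naive_line(fields)))   # 3: the comma splits the field
print(column_count(quoted_line(fields)))  # 2: the quoted field stays intact
```

Checking every row against the header width makes a missing quote show up as a concrete line number instead of a generic failure in the transformation.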
Sent Therese this regex:
Update finished. Looks good to me! Assigning @acka47 for review.
+1. Have you already implemented an automatic update procedure for DBS data?
As discussed offline, @dr0i has already implemented automatic updates. We will have to check next week whether this works.
Seems to work, at least:
Closing.
To be discussed with DBS colleagues. Questions (amongst others):