Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When I run postgresql/sql/CreateFKConstraints.sql some fail because of problems with data #125

Open
ijabz opened this issue Sep 7, 2020 · 5 comments

Comments

@ijabz
Copy link
Collaborator

ijabz commented Sep 7, 2020

When I run postgresql/sql/CreateFKConstraints.sql some fail because of problems with data, should we run something to fix these data anomalies ?

This was with the 20200901 data but Im sure I have seen these errors before

ERROR: insert or update on table "release_artist" violates foreign key constraint "release_artist_fk_artist"
DETAIL: Key (artist_id)=(0) is not present in table "artist".
ERROR: insert or update on table "release_track_artist" violates foreign key constraint "release_track_artist_fk_artist"
DETAIL: Key (artist_id)=(118760) is not present in table "artist".
ERROR: insert or update on table "release_company" violates foreign key constraint "release_company_fk_company"
DETAIL: Key (company_id)=(670384) is not present in table "label".

@philipmat
Copy link
Owner

Could you please post the output of the following commands:

  • select * from release_artist where artist_id = '0' (might be where artist_id = 0)
  • select * from release_track_artist_fk where artist_id = '118760'
  • select * from release_company where company_id = '670384'

@MarkInNVA
Copy link

Hi, I just imported discogs_20200901 and had similar issues, I believe they are db problems, not script problems.

For you requested more info, using pgadmin4 against discogs_20200901 imported via discogs-xml2db, I got the following results:

select count(*) from public.release_artist where artist_id = 0;
returns 243416

select count(*) from release_track_artist where artist_id = '118760' returns 64110

select * from release_company where company_id = '670384' returns one record with what looks like valid entries in each field.

@philipmat
Copy link
Owner

Thank you, @MarkInNVA - could you post some examples of results for each one of those selects?

@ijabz
Copy link
Collaborator Author

ijabz commented Nov 16, 2021

Artist 0 referred to in release_artist seems to be release credits that dont match to an artist in discogs database.

e.g

jthinksearch=# select * from release_artist where release_id=107;
id | release_id | artist_id | artist_name | extra | anv | position | join_string | role | tracks
-----+------------+-----------+------------------+-------+-----+----------+-------------+----------------------+--------
393 | 107 | 202 | Kid Scientific | 0 | | 1 | | |
394 | 107 | 6871553 | Fran Englehardt | 1 | | 1 | | Executive Producer |
395 | 107 | 0 | Matthew Sordillo | 1 | | 2 | | Executive Producer |
396 | 107 | 147370 | Mike Walsh | 1 | | 3 | | Executive Producer |
397 | 107 | 0 | Renae Loguidice | 1 | | 4 | | Photography |
398 | 107 | 202 | Kid Scientific | 1 | | 5 | | Producer, Written-By |
399 | 107 | 283613 | Skydiver (5) | 1 | | 6 | | Producer, Written-By |

If we look at https://www.discogs.com/release/107 we can see that the two artists with artist_id of zero (Matthew Sordillo and Renae Loguidice) are listed on webpage, but they are not hyperlinks

Not sure what do about this.

Artist 118760 referred to release_track_artist is the meta artist 'No Artist', but i think it makes sense for our database to add this an artist

select * from release_track_artist where artist_id = 118760

200 |       267 |         56 |              1 |    118760 | No Artist   | f     |                                                               |        1 |                          |                                                                                         |
  833 |      1387 |        319 |              3 |    118760 | No Artist   | f     |                                                               |        1 |                          |                                                                                         |
 5794 |      7393 |       1241 |              1 |    118760 | No Artist   | f     |                                                               |        1 |                          |                                                                                         |
12916 |     16576 |       2964 |             14 |    118760 | No Artist   | f     |                                                               |        1 |                          |                                                                                         |
13760 |     17546 |       3122 |              1 |    118760 | No Artist   | f     |                                                               |        1 |                          |           

e.g see first track on this release
https://www.discogs.com/release/56-Dino-Terry-House-De-Luxe-Volume-2

@ijabz
Copy link
Collaborator Author

ijabz commented Nov 16, 2021

Tried adding

insert into artist(id,name) values(118760, 'No Artist');

but release_track_artist_fk_artist fails because there are other rows that use artist_id=0 just like release_artist does

Maybe release_artist and release_track_artist should both allow artist_id to be nullable ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants