Raw data #19
Comments
Hi @mx03, I appreciate that you would like to improve the merging, but I would like to point out a few things. First, I would say that it is not possible to achieve a 100% match with an automated, lightweight process. Merging a data set of 52776 entries into 29702 is pretty okay, I guess. From the logs I can say that the logic itself seems to be pretty good. A quick look at the log files revealed that in 7257 cases a merge was actively prevented because properties differed. Of course, there are also cases in which an anime exists on only one of the anime database sites, and so on. Second, you have to decide on your priority. Mine is to make correct merges rather than to achieve a high density, hence the actively prevented merges mentioned above. I would rather not merge an entry than merge it with an entry that is not really the same production. That happened in the past, and that is what led me to this decision.
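To illustrate the "rather not merge than merge wrongly" priority, here is a minimal sketch. It is not the project's actual merge code: the function name `try_merge`, the dictionary layout, and the compared properties (`type`, `episodes`) are all assumptions made for the example.

```python
from typing import Optional


def try_merge(a: dict, b: dict, keys=("type", "episodes")) -> tuple[Optional[dict], list[str]]:
    """Merge two entries only if every compared property agrees.

    Returns (merged_entry, []) on success, or (None, differing_keys)
    when the merge is actively prevented. The keys are hypothetical.
    """
    differing = [k for k in keys if a.get(k) != b.get(k)]
    if differing:
        # Conservative choice: keep both entries separate instead of guessing.
        return None, differing
    merged = {**a, "sources": a["sources"] + b["sources"]}
    return merged, []


# Made-up entries from two hypothetical sites.
site_a = {"title": "Example Show", "type": "TV", "episodes": 12, "sources": ["site-a/1"]}
site_b = {"title": "Example Show", "type": "TV", "episodes": 13, "sources": ["site-b/9"]}

merged, reasons = try_merge(site_a, site_b)
print(merged, reasons)
```

Returning the differing keys alongside the result makes it easy to log why a merge was prevented, which is the kind of information the log files mentioned above would need.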
I can understand your concern; correct merges are the most important thing for a database like yours. I was looking for a database like this to avoid vendor lock-in of my data. But my two test picks, "Death Note" and "Cowboy Bebop", are each split into two data sets (I think because of the active merge prevention). So my thought was: either improve the automatic merge or force data sets together by hand.
I can totally relate. I started with MAL and added more sites because I wanted to prevent vendor lock-in for myself as well, or at least to be able to switch smoothly within a short period of time. And of course because I thought others might appreciate it ;) Personally, I think doing additional merging manually is a crazy amount of work. Back to your test picks:
Am I wrong, or is the duration of a standard TV anime show on anidb always 25 min? In my opinion, duration is a bad factor, because most TV anime series run 23-25 min per episode. For the type "TV" the value is more of a problem, but for web/ONA, OVA and movie it should be a required check.
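A rough sketch of what I mean, purely as an illustration: the `Entry` class, the type names, and both tolerance values are my own assumptions, not anything taken from the project's code.

```python
from dataclasses import dataclass

# Hypothetical set of types where duration is treated as a hard requirement.
STRICT_DURATION_TYPES = {"MOVIE", "OVA", "ONA"}


@dataclass
class Entry:
    title: str
    type: str
    duration_minutes: int


def durations_compatible(a: Entry, b: Entry,
                         tv_tolerance: int = 2,
                         strict_tolerance: int = 0) -> bool:
    """Compare durations with a tolerance that depends on the type.

    TV episodes get a small tolerance (assumed 2 min, covering 23-25 min);
    movies, OVAs and ONAs must match within strict_tolerance (assumed 0).
    """
    if a.type != b.type:
        return False
    tolerance = strict_tolerance if a.type in STRICT_DURATION_TYPES else tv_tolerance
    return abs(a.duration_minutes - b.duration_minutes) <= tolerance


# Made-up durations, only to show the behaviour.
print(durations_compatible(Entry("Some Show", "TV", 23), Entry("Some Show", "TV", 25)))
print(durations_compatible(Entry("Some Film", "MOVIE", 115), Entry("Some Film", "MOVIE", 120)))
```

With these assumed tolerances the two TV entries count as compatible, while the two movie entries do not, so duration would stop blocking TV merges but still guard the other types.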
I'm not sure about anidb. I would need to check that specifically.
This repo looks very interesting. Is it possible to get the raw data from before the merge, in order to optimize the merge between the anime sites?