Any concern about IMDb Dataset ToS? #8

Open
ckabalan opened this Issue Jan 29, 2018 · 8 comments


ckabalan commented Jan 29, 2018

Hi!

I recently wrote a similar application after Kevinformatic's Graph TV went away. After his ominous update to the webpage ("reasons outside my control", "as soon as I am able"), I did some research and found that my app was actually in violation of the IMDb ToS, so I immediately disabled any data retrieval functionality. I had originally just downloaded the files from https://datasets.imdbws.com/, but I did more investigating and found that their licensing states:

  3. The data can only be used for personal and non-commercial use and must not be altered/republished/resold/repurposed to create any kind of online/offline database of movie information (except for individual personal use). Please refer to the copyright/license information enclosed in each file for further instructions and limitations on allowed usage.
    Source: Can I use IMDb data in my software?

As I interpret that, you can only use the data for your own private exploration/curiosity, so you couldn't even publish ratings publicly because it would no longer be "individual personal use". I also wonder how specifically calling out "movie information" relates to TV shows, actors, etc. The strange thing is the very next item...

  4. You must acknowledge the source of the data by including the following statement:
    Information courtesy of IMDb (http://www.imdb.com). Used with permission.

If it's only for your own "individual personal use", why would you need to attribute it? No one except yourself is going to see it. I don't understand #4 in combination with #3 unless there are scenarios where IMDb is OK with public display of data taken from the data sets (i.e., NOT personal use).

I talked to another individual familiar with IMDb data and they indicated that the Amazon legal team was actively reaching out to people using IMDb data publicly.

I only bring this up because it sounds to me like @kevinwuhoo may have been hit up by IMDb lawyers/DMCAed/etc and is reluctant to speak publicly. Feel free to close this issue if you feel it is irrelevant to the Git repo and open-source aspect of your project. You may also be outside the US and these laws are irrelevant. Just thought I would chime in.

Contributor

utkarshkukreti commented Jan 29, 2018

I think Kevin's application was using web scraping to get the data because AFAIK it stopped working at the same time as epdate/eprate pages were removed. Kevin also tweeted 4 days ago that graphtv will be back "hopefully soon".

Have you tried contacting IMDb to get a definite answer?

In the meantime I'll add an acknowledgement that the data is from IMDb and also add a contact link to the footer of this app so they can contact me if they want. If they contact me and ask me to remove the data I'll switch to using some other API like trakt.

@utkarshkukreti, I contacted the IMDb Helpdesk privately with the following:

To: IMDb Helpdesk
From: Caesar Kabalan / Dandelock
Subject: IMDb Ratings Data Fair Use
Body:
I would like to write a small website which uses the publicly available IMDb
dataset (http://www.imdb.com/interfaces/) to show ratings data for TV Shows in
graph form. You type in a TV Show name and it shows a chart with the ratings
for each episode which gives you an idea on how well received the show was as
it progressed.

This website would be 100% non-commercial and would be cheap enough to host
that it would not require ads or any monetization. It would however be
displaying data from the public IMDb dataset. All data would be clearly marked
as sourced from IMDb and kept up to date.

Does IMDb have a stance on whether we can use a subset of the data this way?

Any response would be helpful!

I've posed the question publicly if you're looking for additional details:
https://getsatisfaction.com/imdb/topics/imdb-ratings-data-fair-use

They responded to the forum post linked at the bottom of the email and then privately to my helpdesk ticket a few hours later:

To: Caesar Kabalan / Dandelock
From: IMDb Helpdesk
Subject: Re: IMDb Ratings Data Fair Use
Body:
Hi Caesar,

No, your usage wouldn't qualify for our intended usage of our free dataset. That dataset has very limited allowed uses -- namely, private & personal use (meaning, no one else sees our data in your work except you) or in-the-classroom academic work (for example, a paper or thesis for a class).

The fact that you won't monetize your website doesn't mean that the usage isn't commercial. You would require our commercial content license for your use case.

Our license product is aimed at large companies. Our licensees include The New York Times, Viacom, United Airlines, and Verizon among many others. As such, there is a license fee that starts at five figures. We assume that this is well beyond your means for your project, but if it's not, please let us know and we'll put you in touch with the right licensing people here.

Regards,
The IMDb Help Desk

I marked my question as resolved. Unfortunately, it looks like these GraphTV-type websites aren't possible using IMDb's data directly.

@ckabalan ckabalan referenced this issue in graphtv/website Jan 30, 2018

Open

Same Origin Policy causing issues? #1

Contributor

utkarshkukreti commented Jan 31, 2018

Thanks for asking them. That's really unfortunate. I'm going to shut down the site and redirect it to this page until I find a different source for the ratings.

Sorry to see this, I was really enjoying this site with GraphTV gone. I've dealt with ratings sites and had similar results. I really wish there wasn't so much gray area around this. From my experience and research, it seems like you can display publicly accessible data if you scrape it (which is simple enough) but who wants to take that risk?

@jessejoe Actually, if you look at their site Conditions of Use...

Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.

And then...

Licensing IMDb Content; Consent to Use Robots and Crawlers: If you are interested in receiving our express written permission to use IMDb content for your non-personal (including commercial) use, please visit our Content Licensing section or contact our Licensing Department. We do allow the limited use of robots and crawlers, such as those from certain search engines, with our express written consent. If you are interested in receiving our express written permission to use robots or crawlers on our site, please contact our Licensing Department.

Ironically, the Content Licensing section / Licensing Department are the ones who answered the emails above... So the answer is that you cannot do it and still have everything be "on the up-and-up".

jessejoe commented Feb 2, 2018

@ckabalan it is not uncommon for companies to put unenforceable or even flat-out wrong restrictions in their ToS. There is plenty of research, and there are anecdotal (or even actual legal) cases out there, supporting the idea that publicly available information is generally safe to gather and/or scrape. The problem is that there doesn't seem to be a clear precedent or legal foundation set for it, so you wind up trying to wedge internet/computer activity into other laws. And again, it's just not worth the risk.

You could come up with your own ratings and put the site back up using those. Maybe your ratings just happen to be very similar to what the IMDB ratings are...

niamiot commented Feb 22, 2018

As @MatthewPDingle suggests, the point is to use a different rating source.
E.g., it seems TVtime (https://www.tvtime.com/fr) uses its own rating information. Maybe it would be easier to find an arrangement with them?
