Any concern about IMDb Dataset ToS? #8

Open
ckabalan opened this Issue Jan 29, 2018 · 14 comments

ckabalan commented Jan 29, 2018

Hi!

I recently wrote a similar application after Kevinformatic's Graph TV went away, and after his ominous update to the webpage ("reasons outside my control", "as soon as I am able") I did some research and found that my app was actually in violation of the IMDb ToS, so I immediately disabled any data retrieval functionality. I had originally just downloaded the files from https://datasets.imdbws.com/ but I did more investigating and found their licensing states:

  3. The data can only be used for personal and non-commercial use and must not be altered/republished/resold/repurposed to create any kind of online/offline database of movie information (except for individual personal use). Please refer to the copyright/license information enclosed in each file for further instructions and limitations on allowed usage.
    Source: Can I use IMDb data in my software?

As I interpret that, you can only use the data for your own private exploration/curiosity, so you couldn't even publish ratings publicly because it would no longer be "individual personal use". I also wonder how specifically calling out "movie information" relates to TV shows, actors, etc. The strange thing is the very next item...

  4. You must acknowledge the source of the data by including the following statement:
    Information courtesy of IMDb (http://www.imdb.com). Used with permission.

If it's only for your own "individual personal use", why would you need to attribute it? No one except yourself is going to see it. I don't understand #4 in combination with #3 unless there are scenarios where IMDb is OK with public display of data scraped from the data sets (i.e., NOT personal use).

I talked to another individual familiar with IMDb data and they indicated that the Amazon legal team was actively reaching out to people using IMDb data publicly.

I only bring this up because it sounds to me like @kevinwuhoo may have been hit up by IMDb lawyers/DMCAed/etc and is reluctant to speak publicly. Feel free to close this issue if you feel it is irrelevant to the Git repo and open-source aspect of your project. You may also be outside the US and these laws are irrelevant. Just thought I would chime in.

utkarshkukreti Jan 29, 2018

Contributor

I think Kevin's application was using web scraping to get the data because AFAIK it stopped working at the same time as epdate/eprate pages were removed. Kevin also tweeted 4 days ago that graphtv will be back "hopefully soon".

Have you tried contacting IMDb to get a definite answer?

In the meantime I'll add an acknowledgement that the data is from IMDb and also add a contact link to the footer of this app so they can contact me if they want. If they contact me and ask me to remove the data I'll switch to using some other API like trakt.

ckabalan Jan 30, 2018

@utkarshkukreti, I contacted the IMDb Helpdesk privately with the following:

To: IMDb Helpdesk
From: Caesar Kabalan / Dandelock
Subject: IMDb Ratings Data Fair Use
Body:
I would like to write a small website which uses the publicly available IMDb
dataset (http://www.imdb.com/interfaces/) to show ratings data for TV Shows in
graph form. You type in a TV Show name and it shows a chart with the ratings
for each episode which gives you an idea on how well received the show was as
it progressed.

This website would be 100% non-commercial and would be cheap enough to host
that it would not require ads or any monetization. It would however be
displaying data from the public IMDb dataset. All data would be clearly marked
as sourced from IMDb and kept up to date.

Does IMDb have a stance on whether we can use a subset of the data this way?

Any response would be helpful!

I've posed the question publicly if you're looking for additional details:
https://getsatisfaction.com/imdb/topics/imdb-ratings-data-fair-use

They responded to the forum post linked at the bottom of the email and then privately to my helpdesk ticket a few hours later:

To: Caesar Kabalan / Dandelock
From: IMDb Helpdesk
Subject: Re: IMDb Ratings Data Fair Use
Body:
Hi Caesar,

No, your usage wouldn't qualify for our intended usage of our free dataset. That dataset has very limited allowed uses -- namely, private & personal use (meaning, no one else sees our data in your work except you) or in-the-classroom academic work (for example, a paper or thesis for a class).

The fact that you won't monetize your website doesn't mean that the usage isn't commercial. You would require our commercial content license for your use case.

Our license product is aimed at large companies. Our licensees include The New York Times, Viacom, United Airlines, and Verizon among many others. As such, there is a license fee that starts at five figures. We assume that this is well beyond your means for your project, but if it's not, please let us know and we'll put you in touch with the right licensing people here.

Regards,
The IMDb Help Desk

I marked my question as resolved. Unfortunately, it looks like these GraphTV-type websites aren't possible using IMDb's data directly.

utkarshkukreti Jan 31, 2018

Contributor

Thanks for asking them. That's really unfortunate. I'm going to shut down the site and redirect it to this page until I find a different source for the ratings.

jessejoe Jan 31, 2018

Sorry to see this, I was really enjoying this site with GraphTV gone. I've dealt with ratings sites and had similar results. I really wish there wasn't so much gray area around this. From my experience and research, it seems like you can display publicly accessible data if you scrape it (which is simple enough) but who wants to take that risk?

ckabalan Jan 31, 2018

@jessejoe Actually, if you look at their site Conditions of Use...

Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.

And then...

Licensing IMDb Content; Consent to Use Robots and Crawlers: If you are interested in receiving our express written permission to use IMDb content for your non-personal (including commercial) use, please visit our Content Licensing section or contact our Licensing Department. We do allow the limited use of robots and crawlers, such as those from certain search engines, with our express written consent. If you are interested in receiving our express written permission to use robots or crawlers on our site, please contact our Licensing Department.

Ironically, the Content Licensing section / Licensing Department are the ones who answered the emails above... So the answer is that you cannot do it and still have everything be "on the up-and-up".

jessejoe Feb 2, 2018

@ckabalan it is not uncommon for companies to put unenforceable or even flat-out wrong restrictions in their ToS. There is plenty of research, and there are anecdotal (or even actual legal) cases, supporting the view that publicly available information is generally safe to gather and/or scrape. The problem is that there doesn't seem to be a clear precedent or legal foundation set for it, so you wind up trying to wedge internet/computer activity into other laws. And again, it's just not worth the risk.

MatthewPDingle Feb 10, 2018

You could come up with your own ratings and put the site back up using those. Maybe your ratings just happen to be very similar to what the IMDb ratings are...

niamiot Feb 22, 2018

As @MatthewPDingle suggests, the point is to use a different rating source.
For example, it seems TVtime (https://www.tvtime.com/fr) uses its own rating information. Maybe it would be easier to find an arrangement with them?

tweakign commented Apr 12, 2018

Can't http://www.omdbapi.com/ be used for this?

utkarshkukreti Apr 13, 2018

Contributor

@tweakign I actually contacted the owner of omdbapi for this last week. I haven't heard back from him yet.

tweakign Apr 21, 2018

@utkarshkukreti Any news? It would seem to be pretty ideal.

utkarshkukreti Apr 21, 2018

Contributor

@tweakign I heard back from Brian a couple of days ago. He clarified that the "imdbRating" and "imdbVotes" fields returned from omdbapi are actually not IMDb's data but data collected from other free sources. They do not provide any data sourced from IMDb anymore.
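
For anyone consuming those two fields, here is a minimal sketch of turning them into numbers. It assumes the values arrive as strings such as "8.1" and "1,234", with "N/A" standing in for a missing value; the thread does not show an actual response, so the sample dictionary and the helper names `parse_rating` and `parse_votes` are made up for illustration. Only the field names "imdbRating" and "imdbVotes" come from the comment above.

```python
from typing import Optional


def parse_rating(raw: str) -> Optional[float]:
    """Convert a rating string such as "8.1" to a float; None if it is "N/A"."""
    return None if raw == "N/A" else float(raw)


def parse_votes(raw: str) -> Optional[int]:
    """Convert a vote count such as "1,234" to an int; None if it is "N/A"."""
    return None if raw == "N/A" else int(raw.replace(",", ""))


# Made-up response fragment (assumption): not real data from any service.
sample = {"Title": "Example Show", "imdbRating": "8.1", "imdbVotes": "1,234"}

rating = parse_rating(sample["imdbRating"])
votes = parse_votes(sample["imdbVotes"])
```

Whatever the field names suggest, the numbers parsed this way would reflect the API's own sources, as described above, and not IMDb's dataset.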

afanjul May 7, 2018

+1 for trying with another rating source...

phiresky Sep 9, 2018

Just as a note, you can still get the old IMDb data from this ftp server [1] - that data is allowed to be used for non-commercial use (I got explicit permission to use it for my tv-show-ratings site in 2016). It hasn't been updated since Nov 2017 though.

For the newer data, I avoid the license by not hosting any data myself - instead I just parsed the data, hashed it, and put that hash into my website - the website then tries to fetch the data on page load in a decentralized manner using that hash. As long as a single person has the website open it will load :)

It's not ideal, but it mostly works fine - see https://phiresky.github.io/tv-show-ratings/

1: ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata
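
The hash-based lookup described above can be sketched roughly as follows. This is only an illustration of content addressing in general, not the site's actual code: the comment does not say which hash function or transport is used, so SHA-256 and the helper names `content_hash` and `verify_fetched` are assumptions, and the sample row is a made-up placeholder.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest that serves as the content address (assumed hash)."""
    return hashlib.sha256(data).hexdigest()


def verify_fetched(data: bytes, expected_hash: str) -> bytes:
    """Accept bytes received from an untrusted peer only if they match the published hash."""
    if content_hash(data) != expected_hash:
        raise ValueError("fetched data does not match the published hash")
    return data


# Made-up placeholder for the parsed data; the page would embed only the hash.
parsed = b"show-id\tseason\tepisode\trating\n"
published = content_hash(parsed)

# On page load, whatever a peer returns is checked against the embedded hash.
accepted = verify_fetched(parsed, published)
```

Because the page carries only the hash, a visitor can check that whatever a peer sends is exactly the data the author parsed, without the author serving it.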
