Welcome to the Git Hub repository for the "Mapping the Television Mega-Text" project. This repository contains a release of our dataset of programming information about 1950's television. This dataset was created by merging together metadata gathered from IMDB.com and elsewhere in an effort to collect, collate, and publish a compendium of metadata about television in the fifties, much of which does not survive in any watchable form today.
In this repository, you will find files containing all of the records from our dataset as CSVs, as well as the Jupyter Notebooks that we wrote in Python3 in order to merge the different sources of television metadata into a single dataset, and that we also used to run a preliminary round of textual analysis working-class content in the Action & Adventure genres.
The notebooks for these python scripts and their accompanying CSVs and image outputs are located in the "merge" and "analysis" folders respectively. The notebook merge.ipynb pulls from a local sqlite database of imdb metadata downloaded on 09/2017. The "merge" folder also contains the jupyter notebook "wiki.ipynb" that contains exploratory work undertaken towards incorporating scheduling information gathered from wikipedia into the dataset. This scheduling information is NOT incorporated into the dataset as it currently is constituted in the file "title5.csv".
Additionally, this README file contains a description of the data, the data structure, and some guidelines on using the data. Please take a minute to briefly read over the sections below carefully.
The compiled data is released as a CSV dump. The file "title5.csv" contains the merged dataset of 1950's programming, and can be found in the merge folder.
Programming information was gathered from 2 sources:
- (1) a bulk download of publicly available data from IMDB
- (2) a public domain scan of Vincent Terrace's Encyclopedia Of Television Shows 1925-2007 on the Internet Archive.
The merged dataset contains program information for 2563 records. Each program record contains 14 different metadata attributes.
In the dataset, these series are displayed as the following columns:
Series | Description |
---|---|
realtitle | assigned by us (defaults to Encyclopedia's title when matched otherwise displays imdb title) |
program_type | from Encyclopedia |
program_genre | from Encyclopedia |
network | from Encyclopedia |
program_description | from Encyclopedia |
first_air_year | from Encyclopedia |
last_air_year | from Encyclopedia |
genre(3) | from IMDB |
plot(98) | from IMDB |
trivia | from IMDB |
program_title | from Encyclopedia |
movie_id | from IMDB |
kind_id | from IMDB (denotes the type of programming content, i.e. series, episode or tv movie) |
title | from IMDB |
There is often more than one genre tags associated with a program. This include the single genre assignment taken from Terrace, and the multiple genre tags assigned on IMDB. In the CSV, these are separated into separate series, and appear as two different columns. The genre tags from Terrace appear under "program_genre," while the IMDB genre tags are collated under genre(3).
Note that the program description field contains the richest amount of textual information pertaining to a given program. Note also, that a range of different sub-types of metadata exist within "program_description" as well. In addition to the prose descriptions of a given program's premise, flavor, and content, the description field may also contain numerous lists of categorized participants and creators. These lists are denoted by unique titles followed by a colon, and may be separated out manually by this punctuation. Types of participant categories occurring in the "program_description" field include:
- "Host:"
- "Hostess:"
- "Starring:"
- "Regulars:"
- "Guests:"
- "Music:"
- "Orchestra:"
- "Vocalists:"
This dataset is provided for the purposes of further exploration, education, experimentation regarding the largely lost world of early television programming.
If you have identified errors in the dataset, or have additional information to add, we welcome your feedback! Please contact us at kn4@andrew.cmu.edu
Thanks!
Please note that we will not accept pull requests for the data in this repository.
If you have corrections, please email them to us at kn4@andrew.cmu.edu and we will consider suggested corrections for inclusion in a future release.
Our dataset is being offered under the CC-BY 4.0 Creative Commons License:
We respectfully ask that you acknowledge "Mapping the Television Mega-Text" and dSHARP at CMU as a source wherever possible, in order to preserve a link to the dataset.
If this data is to be cited in a publication, please cite it using this DOI #ADD DOI from KiltHub
Use of this dataset does not grant or imply the approval, commission, or support of your work by the researchers, Carnegie Mellon University, or dSHARP at CMU. If you transform or modify to the dataset, you must clearly distinguish the resulting work as having been modified from this dataset.
The writers would like to thank and acknowledge the funding of this project by the Andrew W. Mellon Foundation, as well as the technical and professional support of digital humanists and specialists at Carnegie Mellon University and the University of Pittsburgh including Scott Weingart, Daniel J. Evans, Emma Slayton, Matt Lavin, and Matthew Lincoln.