sgotzler/megaText


Mapping the Television Mega-Text

Welcome to the GitHub repository for the "Mapping the Television Mega-Text" project. This repository contains a release of our dataset of programming information about 1950s television. The dataset was created by merging metadata gathered from IMDB.com and elsewhere, in an effort to collect, collate, and publish a compendium of metadata about television in the fifties, much of which does not survive in any watchable form today.

In this repository, you will find files containing all of the records from our dataset as CSVs, as well as the Jupyter Notebooks that we wrote in Python 3 to merge the different sources of television metadata into a single dataset. We also used these notebooks to run a preliminary round of textual analysis of working-class content in the Action & Adventure genres.

The notebooks for these Python scripts, along with their accompanying CSVs and image outputs, are located in the "merge" and "analysis" folders respectively. The notebook merge.ipynb pulls from a local SQLite database of IMDB metadata downloaded in 09/2017. The "merge" folder also contains the Jupyter notebook "wiki.ipynb", which holds exploratory work towards incorporating scheduling information gathered from Wikipedia into the dataset. This scheduling information is NOT incorporated into the dataset as it is currently constituted in the file "title5.csv".

Additionally, this README file contains a description of the data, the data structure, and some guidelines on using the data. Please take a minute to read over the sections below.

Data Structure

The compiled data is released as a CSV dump. The file "title5.csv" contains the merged dataset of 1950s programming, and can be found in the "merge" folder.

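Programming information was gathered from two sources: IMDB, and the encyclopedia by Terrace that the column descriptions below refer to as "Encyclopedia".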

Metadata Format

The merged dataset contains program information for 2563 records. Each program record contains 14 different metadata attributes.

In the dataset, these attributes appear as the following columns:

| Series | Description |
| --- | --- |
| realtitle | assigned by us (defaults to Encyclopedia's title when matched otherwise displays imdb title) |
| program_type | from Encyclopedia |
| program_genre | from Encyclopedia |
| network | from Encyclopedia |
| program_description | from Encyclopedia |
| first_air_year | from Encyclopedia |
| last_air_year | from Encyclopedia |
| genre(3) | from IMDB |
| plot(98) | from IMDB |
| trivia | from IMDB |
| program_title | from Encyclopedia |
| movie_id | from IMDB |
| kind_id | from IMDB (denotes the type of programming content, i.e. series, episode or tv movie) |
| title | from IMDB |
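
As a minimal sketch of reading the release, the following uses Python's standard `csv` module to load rows and check that the columns listed above are present. It assumes that the header row of the CSV uses exactly those column names, which is not confirmed here. The helper name `load_records` is ours, for illustration only.

```python
import csv

# Column names as listed above; assumed to match the CSV header exactly.
EXPECTED_COLUMNS = [
    "realtitle", "program_type", "program_genre", "network",
    "program_description", "first_air_year", "last_air_year",
    "genre(3)", "plot(98)", "trivia", "program_title",
    "movie_id", "kind_id", "title",
]

def load_records(lines):
    """Read CSV rows from an iterable of lines (e.g. an open file) into dicts.

    Returns (records, missing), where `missing` lists the expected
    columns that do not appear in the header row.
    """
    reader = csv.DictReader(lines)
    records = list(reader)
    header = reader.fieldnames or []  # None when the input is empty
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    return records, missing
```

To try it on the release, pass an open file handle for "title5.csv" from the "merge" folder (opened with `newline=""`, as the `csv` module recommends). An empty `missing` list would suggest the header matches the description in this README.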

Note: Program Genre

There is often more than one genre tag associated with a program. These include the single genre assignment taken from Terrace, and the multiple genre tags assigned on IMDB. In the CSV, these are kept as separate series, and appear as two different columns. The genre tags from Terrace appear under "program_genre," while the IMDB genre tags are collated under "genre(3)".
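
A short sketch of gathering the genre tags from both columns for a single record is given here. How multiple IMDB tags are delimited inside "genre(3)" is not documented in this README, so the comma used below is purely an assumption and is exposed as a parameter. The function name `genre_tags` is hypothetical.

```python
def genre_tags(record, delimiter=","):
    """Collect the genre tags of one record, grouped by source.

    `record` is a dict keyed by column name. The delimiter used inside
    "genre(3)" is an ASSUMPTION; adjust it to match the actual file.
    """
    terrace = (record.get("program_genre") or "").strip()
    imdb_raw = record.get("genre(3)") or ""
    imdb = [tag.strip() for tag in imdb_raw.split(delimiter) if tag.strip()]
    return {"terrace": [terrace] if terrace else [], "imdb": imdb}
```

Keeping the two sources under separate keys mirrors the two-column layout of the CSV and avoids silently merging tags that come from different vocabularies.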

Note: Program Description

The "program_description" field contains the richest textual information about a given program. A range of different sub-types of metadata also exists within this field. In addition to prose descriptions of a program's premise, flavor, and content, the field may contain numerous lists of categorized participants and creators. Each list is introduced by a unique title followed by a colon, and the lists may be separated out manually using this punctuation. Types of participant categories occurring in the "program_description" field include:

  • "Host:"
  • "Hostess:"
  • "Starring:"
  • "Regulars:"
  • "Guests:"
  • "Music:"
  • "Orchestra:"
  • "Vocalists:"
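
One possible way to separate these lists programmatically, rather than manually, is sketched below with Python's standard `re` module. It assumes the labels appear capitalized exactly as listed above, and it only knows about those eight labels; any other category in the data would stay in the surrounding text. The function name `split_description` and the sample text in the usage note are ours, not drawn from the dataset.

```python
import re

# Participant labels listed above (without the trailing colon).
LABELS = ["Host", "Hostess", "Starring", "Regulars",
          "Guests", "Music", "Orchestra", "Vocalists"]

# Match any label as a whole word followed directly by a colon.
# Requiring the colon keeps "Host" from matching inside "Hostess:".
_LABEL_RE = re.compile(r"\b(" + "|".join(map(re.escape, LABELS)) + r"):")

def split_description(text):
    """Split a description into its prose and its labeled participant lists.

    Returns (prose, sections): `prose` is the text before the first label,
    and `sections` maps each label to the raw text chunks that follow it.
    """
    # With one capturing group, re.split yields
    # [prose, label, body, label, body, ...].
    parts = _LABEL_RE.split(text)
    prose = parts[0].strip()
    sections = {}
    for label, body in zip(parts[1::2], parts[2::2]):
        sections.setdefault(label, []).append(body.strip())
    return prose, sections
```

For a made-up description such as "A weekly variety hour. Host: Jane Doe. Regulars: A, B.", this returns the prose "A weekly variety hour." together with one entry each for "Host" and "Regulars". The bodies are left as raw strings because the README does not specify how individual names are delimited within a list.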

Dataset Integrity

This dataset is provided for the purposes of further exploration, education, and experimentation regarding the largely lost world of early television programming.

If you have identified errors in the dataset, or have additional information to add, we welcome your feedback! Please contact us at kn4@andrew.cmu.edu

Thanks!

Pull Requests

Please note that we will not accept pull requests for the data in this repository.

If you have corrections, please email them to us at kn4@andrew.cmu.edu and we will consider suggested corrections for inclusion in a future release.

Attribution

Our dataset is offered under the Creative Commons CC BY 4.0 License.

We respectfully ask that you acknowledge "Mapping the Television Mega-Text" and dSHARP at CMU as a source wherever possible, in order to preserve a link to the dataset.

If this data is to be cited in a publication, please cite it using this DOI #ADD DOI from KiltHub

Use of this dataset does not grant or imply the approval, commission, or support of your work by the researchers, Carnegie Mellon University, or dSHARP at CMU. If you transform or modify the dataset, you must clearly distinguish the resulting work as having been modified from this dataset.

Acknowledgement

The writers would like to thank and acknowledge the funding of this project by the Andrew W. Mellon Foundation, as well as the technical and professional support of digital humanists and specialists at Carnegie Mellon University and the University of Pittsburgh including Scott Weingart, Daniel J. Evans, Emma Slayton, Matt Lavin, and Matthew Lincoln.
